experiment

This module provides the functions for initiating pipelines and training.

class PipeLine(pplid=None)[source]

Bases: object


clean()[source]
get_path(of, pplid=None, args=None)[source]

Generate a standardized file path for various experiment artifacts.

Constructs and returns a file path based on the type of file (of), experiment ID, epoch number, and batch index, where applicable. Automatically creates necessary directories if they do not exist.

Parameters:
  • of (str) – The type of file to retrieve the path for. Supported values:
      - “config”: Configuration file path.
      - “weight”: Model weights file path.
      - “gradient”: Saved gradients file path.
      - “history”: Training history file path.
      - “quick”: Quick config file path.

  • pplid (str, optional) – Experiment ID. If not provided, uses the currently set self.pplid.

  • epoch (int, optional) – Epoch number. Required for weight and gradient file paths. For weights, if not specified, the best epoch from config is used.

  • batch (int, optional) – Batch index, required for gradient file paths.

  • args (Dict | None)

Returns:

Full path to the specified artifact as a string with forward slashes.

Return type:

str

Raises:

ValueError – If pplid is not set or is invalid, if required parameters (epoch, batch) are missing for gradient paths, or if the of argument is not one of the supported values.
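
The path rules described above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name build_artifact_path, the root argument, the folder layout, and the file extensions are all assumptions made for illustration. Only the supported of values and the error conditions follow the description above. How epoch and batch are actually passed to get_path (the signature shows args) is not stated here, so the sketch takes them as plain keyword arguments.

```python
from pathlib import Path
from typing import Optional

# Hypothetical layout: folder names and extensions are illustrative only.
_LAYOUT = {
    "config": "Configs/{pplid}.json",
    "weight": "Weights/{pplid}/epoch_{epoch}.pth",
    "gradient": "Gradients/{pplid}/epoch_{epoch}_batch_{batch}.grad",
    "history": "Histories/{pplid}.csv",
    "quick": "Quick/{pplid}.json",
}


def build_artifact_path(
    root: str,
    of: str,
    pplid: Optional[str],
    epoch: Optional[int] = None,
    batch: Optional[int] = None,
) -> str:
    """Return a forward-slash path for one artifact, creating parent folders."""
    if not pplid:
        raise ValueError("pplid is not set")
    if of not in _LAYOUT:
        raise ValueError(f"unsupported artifact type: {of!r}")
    if of == "weight" and epoch is None:
        # The documented method falls back to the best epoch from the config;
        # this sketch has no config, so it asks for the epoch explicitly.
        raise ValueError("epoch is required for weight paths in this sketch")
    if of == "gradient" and (epoch is None or batch is None):
        raise ValueError("epoch and batch are required for gradient paths")

    relative = _LAYOUT[of].format(pplid=pplid, epoch=epoch, batch=batch)
    path = Path(root) / relative
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
    return path.as_posix()  # forward slashes on every platform
```

Keeping the layout in one table means that adding an artifact type only requires a new entry, and the validation of of comes for free from the dictionary lookup.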

is_running()[source]
load(pplid, prepare=False)[source]

Load a pipeline configuration from disk.

Parameters:
  • pplid (str)

  • prepare (bool)

load_component(loc, args=None, setup=True)[source]
Parameters:
  • loc (str)

  • args (Dict[str, Any] | None)

  • setup (bool)

new(pplid=None, args=None, prepare=False)[source]

Create a new experiment configuration and initialize its tracking files.

Parameters:
  • pplid (str, optional) – Unique experiment identifier. Raises ValueError if it already exists.

  • args (dict, optional) – Configuration arguments for the experiment.

  • prepare (bool, optional) – If True, calls self.prepare() after creation. Defaults to False.

Raises:
  • ValueError – If the experiment ID already exists or if monitor mode is invalid.

  • KeyError – If the ‘metrics’ key is missing from settings.

Behavior:
  • Checks if the experiment ID already exists; raises an error if so.

  • Checks if the same configuration already exists using verify.

  • Initializes configuration dictionary with metadata.

  • Saves the configuration.

  • Creates an empty history CSV with columns for training and validation metrics and loss.

  • Initializes quick checkpoint file with default best and last epoch metrics.

  • Appends experiment metadata to the main experiments CSV.

  • Optionally calls self.prepare() if prepare=True.

Return type:

None
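
The tracking-file steps described for new() (an empty history CSV and a quick checkpoint file) can be sketched in isolation. This is a minimal illustration, not the method's actual code: the function name init_tracking_files, the file names, the column naming scheme, and the contents of the quick file are assumptions. The only behavior taken from the description is that a missing ‘metrics’ key raises KeyError and that the history file starts with a header row and no data.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List


def init_tracking_files(directory: str, settings: Dict[str, Any]) -> List[str]:
    """Create an empty history CSV and a default quick file; return the columns."""
    metrics = settings["metrics"]  # raises KeyError when the key is missing

    # Hypothetical column naming: loss first, then a train/val pair per metric.
    columns = ["epoch", "train_loss", "val_loss"]
    for name in metrics:
        columns += [f"train_{name}", f"val_{name}"]

    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)

    # Header only: rows would be appended as training progresses.
    with (out / "history.csv").open("w", newline="") as fh:
        csv.writer(fh).writerow(columns)

    # Hypothetical default state for the quick checkpoint file.
    quick = {"best": {"epoch": 0}, "last": {"epoch": 0}}
    (out / "quick.json").write_text(json.dumps(quick, indent=2))
    return columns
```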

property paths
prepare()[source]

Prepare the experiment by loading model, optimizer, metrics, loss, and data loaders.

Loads components according to current configuration, initializes data loaders, and sets the best metric value based on the stored history and strategy.

Raises:

ValueError – If strategy monitor mode is not ‘min’ or ‘max’.

Behavior:
  • Loads model and moves it to device.

  • Loads optimizer with model parameters.

  • Loads metrics and loss functions to device.

  • Creates training and validation data loaders.

  • Loads last saved model weights.

  • Initializes the best metric value from saved checkpoints or sets default.

  • Sets internal flag _prepared to True on success.

Return type:

None
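
The handling of the monitor mode in prepare() can be sketched as follows. The function names initial_best and is_improvement are hypothetical, and the choice of positive or negative infinity as the default is an assumption; the description only states that a default is set when no checkpoint value is available and that any mode other than ‘min’ or ‘max’ raises ValueError.

```python
import math
from typing import Optional


def initial_best(mode: str, saved_best: Optional[float] = None) -> float:
    """Pick the starting 'best' value for a monitored metric."""
    if mode not in ("min", "max"):
        raise ValueError(f"monitor mode must be 'min' or 'max', got {mode!r}")
    if saved_best is not None:
        # Resume from the value stored with an earlier checkpoint.
        return float(saved_best)
    # Assumed default: any real measurement beats an infinite starting value.
    return math.inf if mode == "min" else -math.inf


def is_improvement(mode: str, value: float, best: float) -> bool:
    """Return True when value is strictly better than best under the mode."""
    if mode not in ("min", "max"):
        raise ValueError(f"monitor mode must be 'min' or 'max', got {mode!r}")
    return value < best if mode == "min" else value > best
```

With an infinite default, the first recorded epoch always counts as an improvement, so no special case is needed for a fresh experiment.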

reset()[source]
run()[source]
Return type:

None

property should_running
status()[source]
stop_running()[source]
verify(*, pplid=None, cnfg=None)[source]

Check whether a given experiment ID exists in the experiment database.

Queries the experiments table to verify whether the specified experiment ID is recorded.

Parameters:
  • pplid (str) – The experiment ID to check.

  • cnfg (Dict | None)

Returns:

Returns the pplid if it exists in the database, otherwise returns False.

Return type:

Union[str, bool]

Examples

>>> pipeline.verify("exp_001")
'exp_001'
>>> pipeline.verify("nonexistent_exp")
False
class TransferContext[source]

Bases: object

Runtime context for remapping paths and components on remote.

map_cnfg(cnfg)[source]
map_loc(loc, pplid)[source]
Parameters:
  • loc (str)

  • pplid (str)

Return type:

str

map_src(src, pplid)[source]
Parameters:
  • src (str)

  • pplid (str)

Return type:

str

archive_ppl(ppls, reverse=False)[source]
Parameters:
  • ppls (List[str])

  • reverse (bool)

Return type:

None

delete_ppl(ppls)[source]

Permanently delete archived pipelines, including config files, logging files, and database records.

Parameters:

ppls (list[str]) – List of pipeline IDs to delete from archive.

Return type:

None

filter_ppls(query, ppls=None, params=False)[source]

Filters pipelines based on a query string applied to their configurations.

Parameters:
  • query (str) – A query string used to filter pipeline configurations.

  • ppls (list or None, optional) – List of pipeline IDs to filter. If None, all pipelines are considered.

  • params (bool, optional) – Whether to return parameters of matching pipelines along with their IDs.

Returns:

Filtered list of pipeline IDs or tuples of (pplid, params) if params is True.

Return type:

list

get_ppl_details(ppls=None)[source]
Parameters:

ppls (list | None)

Return type:

DataFrame

get_ppl_status(ppls=None)[source]
Parameters:

ppls (list | None)

Return type:

DataFrame

get_ppls()[source]

Retrieves a list of all pipeline IDs from the database.

Returns:

A list containing all pipeline IDs.

Return type:

list of str

group_by_common_columns(records)[source]

Group pipeline records by their common set of DataFrame columns.

Parameters:

records (Dict[str, DataFrame]) – Mapping of pipeline IDs to DataFrames (e.g., training histories with various metrics).

Returns:

A dictionary mapping each unique set of column names (as a frozenset) to a list of pipeline IDs sharing that column structure.

Return type:

dict

Example

>>> records = {
...     "exp1": pd.DataFrame(columns=["epoch", "train_loss", "val_loss"]),
...     "exp2": pd.DataFrame(columns=["epoch", "train_loss", "val_loss"]),
...     "exp3": pd.DataFrame(columns=["epoch", "accuracy", "val_accuracy"])
... }
>>> group_by_common_columns(records)
{
    frozenset({'epoch', 'train_loss', 'val_loss'}): ['exp1', 'exp2'],
    frozenset({'epoch', 'accuracy', 'val_accuracy'}): ['exp3']
}
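
One way to obtain the grouping shown above is to key a dictionary on the frozenset of each record's column names. The sketch below is an assumption about how such a function could be written, not the library's source. The name group_by_columns_sketch is hypothetical, and the example uses SimpleNamespace objects as stand-ins for DataFrames so that it runs without pandas; any object with a columns attribute, including a pandas DataFrame, would work the same way.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, FrozenSet, List


def group_by_columns_sketch(records: Dict[str, Any]) -> Dict[FrozenSet[str], List[str]]:
    """Group record IDs by the set of column names of their table."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for pplid, frame in records.items():
        # A frozenset is hashable and ignores column order, so two tables
        # with the same columns in a different order share one key.
        groups[frozenset(frame.columns)].append(pplid)
    return dict(groups)


# Stand-ins for DataFrames: only the .columns attribute is used.
demo = {
    "exp1": SimpleNamespace(columns=["epoch", "train_loss", "val_loss"]),
    "exp2": SimpleNamespace(columns=["val_loss", "epoch", "train_loss"]),
    "exp3": SimpleNamespace(columns=["epoch", "accuracy", "val_accuracy"]),
}
grouped = group_by_columns_sketch(demo)
```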
transfer_ppl(ppls, transfer_type='export', mode='copy', env=True)[source]

Transfers pipeline data between main storage and transfer folder.

Parameters:
  • ppls (List[str])

  • transfer_type (str, optional) – ‘export’ moves data from main storage to transfer folder, ‘import’ moves data from transfer folder back to main storage.

  • mode (str, optional) – Transfer mode, either ‘copy’ (default) or ‘move’. ‘copy’ duplicates files, ‘move’ relocates files.

Raises:

ValueError – If transfer_type or mode is invalid, or if any pipeline ID is not found in the source records.

Return type:

None
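
The interaction between transfer_type and mode can be sketched with the standard library. This is a simplified illustration under stated assumptions, not the function's real behavior: transfer_files_sketch, main_dir, and transfer_dir are hypothetical names, each pipeline is reduced to a single file, and no database records or env handling are involved. It follows the description only in the direction of the transfer, the copy versus move distinction, and the ValueError conditions.

```python
import shutil
from pathlib import Path
from typing import List


def transfer_files_sketch(
    names: List[str],
    main_dir: str,
    transfer_dir: str,
    transfer_type: str = "export",
    mode: str = "copy",
) -> None:
    """Copy or move the named files between two folders."""
    if transfer_type not in ("export", "import"):
        raise ValueError(f"invalid transfer_type: {transfer_type!r}")
    if mode not in ("copy", "move"):
        raise ValueError(f"invalid mode: {mode!r}")

    main, transfer = Path(main_dir), Path(transfer_dir)
    # 'export' goes main -> transfer; 'import' goes transfer -> main.
    src, dst = (main, transfer) if transfer_type == "export" else (transfer, main)

    # Validate everything first so a bad name does not leave a partial transfer.
    missing = [name for name in names if not (src / name).is_file()]
    if missing:
        raise ValueError(f"not found in source: {missing}")

    dst.mkdir(parents=True, exist_ok=True)
    for name in names:
        if mode == "copy":
            shutil.copy2(src / name, dst / name)  # source file is kept
        else:
            shutil.move(str(src / name), str(dst / name))  # source file is removed
```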