Framework Overview
This framework provides a modular and reproducible environment for managing experiments. It introduces three core concepts — Component, Workflow, and Pipeline — each representing a different layer of abstraction and flexibility in the experimental process.
These concepts allow users to design, execute, and reproduce complex computational or machine learning experiments with a clear separation of concerns between logic, configuration, and execution history.
1. Component
A Component is the smallest reusable unit in the framework. It represents a functional block — such as a data loader, model, optimizer, or evaluator — that performs a single, well-defined task.
A component can be:
A Python function or class encapsulated in a file.
A reference to a shared resource, library, or custom code base.
A configurable unit, whose behavior depends on arguments specified at runtime.
Key Ideas
Reusability: Components can be reused across different workflows and pipelines.
Configurability: Each component accepts a configuration dictionary (args) that sets its parameters at runtime.
Isolation: Components do not depend directly on other components — they only define what they do, not when they run.
Example
{
  "loc": "components.data.load_dataset",
  "args": {
    "path": "./data/train.csv",
    "batch_size": 32
  }
}
In this example, the component loads a dataset and exposes it to downstream steps. The “loc” field points to the function to execute, and “args” defines how it behaves.
Separating location from arguments lets the same component be reused in many workflows with different parameters, which is what makes components modular.
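To make the "loc" plus "args" idea concrete, here is a minimal sketch of how such a spec could be resolved and called. It is not the framework's loader: `resolve_loc` and `run_component` are hypothetical helper names, and a standard-library function (`statistics.mean`) stands in for a real component so the example runs on its own.

```python
import importlib
from typing import Any, Callable


def resolve_loc(loc: str) -> Callable[..., Any]:
    """Import the object named by a dotted path such as 'package.module.func'."""
    module_path, _, attr_name = loc.rpartition(".")
    if not module_path:
        raise ValueError(f"loc must be a dotted path, got {loc!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


def run_component(spec: dict) -> Any:
    """Call the callable at spec['loc'] with spec['args'] as keyword arguments."""
    func = resolve_loc(spec["loc"])
    return func(**spec.get("args", {}))


# Stand-in spec (assumption): a standard-library function plays the role
# of a component so the sketch does not need the framework's modules.
spec = {"loc": "statistics.mean", "args": {"data": [1.0, 2.0, 3.0]}}
result = run_component(spec)
```

Changing only the "args" dictionary reuses the same callable with different behavior, which is the reuse pattern described above.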
2. Workflow
A Workflow defines the structure and order in which components are executed. While a component is a single building block, a workflow describes how components connect.
Think of a workflow as a template or blueprint for a process. It defines what happens and in what sequence but does not fix the data or model parameters.
Key Ideas
Declarative Composition: A workflow lists components and their dependencies.
Dynamic Construction: Workflows can be loaded from JSON, YAML, or Python definitions.
Reproducibility: The same workflow can be reused across multiple runs, ensuring consistency.
Example
{
  "workflow": {
    "loc": "components.training.supervised_training",
    "template": ["load_data", "build_model", "train", "evaluate"]
  },
  "args": {
    "load_data": {"loc": "components.data.load_dataset", "args": {"path": "data/train.csv"}},
    "build_model": {"loc": "components.model.create_cnn", "args": {"num_layers": 5}},
    "train": {"loc": "components.training.train_epoch", "args": {"epochs": 10}},
    "evaluate": {"loc": "components.eval.compute_accuracy", "args": {}}
  }
}
Here, the workflow specifies the topology of execution (the “template”) and how each step is realized via components.
Each step can be substituted or reconfigured without breaking the structure. This makes workflows flexible, composable, and shareable across projects.
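The following sketch shows one way a template of step names could be walked in order, with each step realized by its component spec. It rests on several assumptions that the text above does not confirm: steps share state through a plain `context` dictionary, components are looked up in an in-memory `REGISTRY` rather than imported from "loc" paths, and the workflow's own "loc" is ignored. All function names are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory registry standing in for importable "loc" paths.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(loc: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that stores a function in REGISTRY under the given loc."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[loc] = func
        return func
    return decorator


@register("demo.load")
def load(context: dict, n: int) -> None:
    context["data"] = list(range(n))


@register("demo.total")
def total(context: dict) -> None:
    context["total"] = sum(context["data"])


def run_workflow(config: dict) -> dict:
    """Run each template step in order, passing a shared context (assumption)."""
    template = config["workflow"]["template"]
    steps = config["args"]
    missing = [name for name in template if name not in steps]
    if missing:
        raise KeyError(f"template steps without a component: {missing}")
    context: dict = {}
    for name in template:
        spec = steps[name]
        REGISTRY[spec["loc"]](context, **spec.get("args", {}))
    return context


config = {
    "workflow": {"template": ["load_data", "evaluate"]},
    "args": {
        "load_data": {"loc": "demo.load", "args": {"n": 4}},
        "evaluate": {"loc": "demo.total", "args": {}},
    },
}
result = run_workflow(config)
```

Swapping the entry under "evaluate" for another registered component changes what the step does without touching the template, which mirrors the substitution property described above.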
3. Pipeline
A Pipeline is a runtime instantiation of a workflow with concrete settings, logs, and status tracking.
While a workflow defines what should happen, a pipeline defines what actually happened.
It binds together:
- The workflow definition.
- The specific component configurations.
- Metadata about the environment, logs, and execution history.
Each pipeline has a unique identifier (pplid) and is tracked in a database (ppls.db), which stores:
- Pipeline metadata (hashes, creation time, status).
- Relationships between pipelines (via the edges table).
- Active runs (runnings table).
Pipeline Lifecycle
Creation: A pipeline is created using PipeLine(pplid=…), loading its configuration and workflow.
Preparation: It sets up its directories, loads components, and initializes resources.
Execution: The pipeline runs through its workflow components in order (or dynamically).
Status Tracking: Each run is recorded in the database, making pipelines reproducible and auditable.
Archival / Transfer: Finished pipelines can be archived, deleted, or transferred between environments, preserving their full state.
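The lifecycle above can be pictured as a small state machine. The sketch below is only an illustration of that idea: the stage names, the allowed transitions, and the `LifecycleTracker` class are all hypothetical and are not taken from the framework's PipeLine class.

```python
from enum import Enum


class Stage(Enum):
    CREATED = "created"
    PREPARED = "prepared"
    RUNNING = "running"
    FINISHED = "finished"
    ARCHIVED = "archived"


# Assumed transition rules: a finished pipeline may be rerun or archived.
ALLOWED = {
    Stage.CREATED: {Stage.PREPARED},
    Stage.PREPARED: {Stage.RUNNING},
    Stage.RUNNING: {Stage.FINISHED},
    Stage.FINISHED: {Stage.RUNNING, Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}


class LifecycleTracker:
    """Tracks which stage a pipeline is in and keeps a history of stages."""

    def __init__(self, pplid: str) -> None:
        self.pplid = pplid
        self.stage = Stage.CREATED
        self.history = [Stage.CREATED]

    def advance(self, new_stage: Stage) -> None:
        """Move to new_stage, rejecting transitions the rules do not allow."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.pplid}: cannot go from {self.stage.value} to {new_stage.value}"
            )
        self.stage = new_stage
        self.history.append(new_stage)


tracker = LifecycleTracker("ppl_demo")
for stage in (Stage.PREPARED, Stage.RUNNING, Stage.FINISHED, Stage.ARCHIVED):
    tracker.advance(stage)
```

Keeping the full history, rather than only the current stage, is what would make a run auditable after the fact.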
Flexibility
You can create many pipelines from a single workflow with different component arguments.
You can rerun or resume a pipeline at any stage.
Pipelines can be programmatically filtered, grouped, and compared using utilities like:
- experiment.get_ppl_status()
- experiment.filter_ppls()
- experiment.group_by_common_columns()
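The signatures of the utilities named above are not given here, so the sketch below does not call them. Instead it illustrates, in plain Python, what filtering and grouping pipeline records can look like when several pipelines come from one workflow with different arguments. The record fields, sample values, and the helper names `filter_records` and `group_records` are all made up for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Made-up records: three pipelines from one workflow with different arguments.
records: List[Dict[str, Any]] = [
    {"pplid": "ppl_a", "status": "finished", "num_layers": 5, "epochs": 10},
    {"pplid": "ppl_b", "status": "finished", "num_layers": 5, "epochs": 20},
    {"pplid": "ppl_c", "status": "running", "num_layers": 3, "epochs": 10},
]


def filter_records(rows: List[Dict[str, Any]], **conditions: Any) -> List[Dict[str, Any]]:
    """Keep the rows whose fields equal every given condition."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in conditions.items())
    ]


def group_records(rows: List[Dict[str, Any]], key: str) -> Dict[Any, List[str]]:
    """Map each distinct value of `key` to the pplids that share it."""
    groups: Dict[Any, List[str]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["pplid"])
    return dict(groups)


finished = [row["pplid"] for row in filter_records(records, status="finished")]
by_layers = group_records(records, "num_layers")
```

Grouping by a shared argument makes it easy to compare pipelines that differ in only one setting, such as the number of epochs.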
4. Hierarchical View
| Level | Represents | Purpose |
|---|---|---|
| Component | A single reusable operation | Define atomic behavior (e.g., load, preprocess, train, evaluate). |
| Workflow | A structured composition of components | Define how components connect. Manage process logic and dependencies. |
| Pipeline | A concrete, executable instance of a workflow | Execute and track a specific run. Store results and ensure reproducibility. |
Together, these layers create a flexible, declarative, and traceable experimental system that supports both research iteration and production reproducibility.
5. Design Philosophy
Modularity: Each layer is independent and composable.
Transparency: All configurations and runs are logged and queryable.
Reproducibility: Every experiment can be reloaded, re-executed, or audited.
Portability: Pipelines and workflows can be transferred between machines or environments.
Scalability: Supports many experiments with shared or divergent configurations.
This design makes it easy to iterate on ideas quickly while preserving the integrity and traceability of experimental data — crucial for scientific and ML workflows alike.