# Airflow

Orchestration layer for ML pipelines. Contains the `Dockerfile` and DAG
definitions. No ML dependencies; pipelines run via `BashOperator` + `uv run`.
## DAG Flow

```mermaid
graph LR
    subgraph "Materialize (schedule=None, manual)"
        S[seed] --> M[materialize]
    end
    subgraph "Training (schedule=None, manual)"
        T[train]
    end
    subgraph "Promotion (schedule=None, manual)"
        P[promote]
    end
    subgraph "Pipeline (schedule=None, orchestrator)"
        TT[trigger_training] --> TP[trigger_promotion]
    end
    P -.->|Redis pub/sub| SRV[Serving reloads model]
```
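The dotted edge is the only cross-service hop: the `promote` task publishes on a Redis channel, and the serving process listens and reloads the champion model. A minimal redis-py sketch, assuming a channel named `model-promotions` (the real channel name and payload may differ):

```python
import redis

r = redis.Redis(host="redis", port=6379)

# Promotion side (end of the promote task): announce the new champion.
r.publish("model-promotions", "champion-updated")

# Serving side (background listener): block on the channel and reload.
pubsub = r.pubsub()
pubsub.subscribe("model-promotions")
for message in pubsub.listen():
    if message["type"] == "message":
        reload_model()  # hypothetical helper that re-pulls the champion model
```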
## DAGs

| DAG | Schedule | Tasks | Description |
|---|---|---|---|
| `vroom_forecast_materialize` | Manual | `seed`, `materialize` | Seed DB + compute features, write Parquet + Redis |
| `vroom_forecast_training` | Manual | `train` | Train from offline store, register candidate |
| `vroom_forecast_promotion` | Manual | `promote` | Compare candidate vs champion, promote if better, notify via Redis |
| `vroom_forecast_pipeline` | Manual | `trigger_training`, `trigger_promotion` | Orchestrator: chains training → promotion (used by UI "Train" button) |
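The orchestrator is just two `TriggerDagRunOperator` tasks chained together. A sketch of what `dags/vroom_forecast_pipeline.py` might look like; the actual file may differ in options such as retries or waiting behaviour:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="vroom_forecast_pipeline",
    schedule=None,  # manual only: triggered by the UI "Train" button
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    trigger_training = TriggerDagRunOperator(
        task_id="trigger_training",
        trigger_dag_id="vroom_forecast_training",
        wait_for_completion=True,  # don't promote until training finishes
    )
    trigger_promotion = TriggerDagRunOperator(
        task_id="trigger_promotion",
        trigger_dag_id="vroom_forecast_promotion",
        wait_for_completion=True,
    )
    trigger_training >> trigger_promotion
```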
## Materialization Pipeline

The materialize DAG runs two steps that populate the feature stores:

```mermaid
graph LR
    CSV[vehicles.csv<br/>reservations.csv] -->|seed.py| DB[(SQLite)]
    UI[Serving API] -->|save vehicle| DB
    DB -->|pipeline.py| COMPUTE[Compute Features<br/><i>price_diff<br/>num_reservations</i>]
    COMPUTE --> PQ[Parquet<br/>offline store<br/><i>fleet vehicles only</i>]
    COMPUTE --> RD[(Redis<br/>online store<br/><i>new arrivals only</i>)]
```
### Step 1: Seed (`seed.py`)

Loads `vehicles.csv` and `reservations.csv` into SQLite. Idempotent; safe to
run multiple times. Vehicles are inserted with `source='csv'`.
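A minimal sketch of the idempotent load, assuming an `id` primary key on `vehicles`; column names here are illustrative and `seed.py` is the source of truth:

```python
import csv
import sqlite3

conn = sqlite3.connect("/feast-data/vehicles.db")
with open("../data/vehicles.csv", newline="") as f:
    for row in csv.DictReader(f):
        # INSERT OR IGNORE makes re-runs no-ops for rows that already exist.
        conn.execute(
            "INSERT OR IGNORE INTO vehicles (id, actual_price, recommended_price, source) "
            "VALUES (?, ?, ?, 'csv')",
            (row["id"], row["actual_price"], row["recommended_price"]),
        )
conn.commit()
```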
### Step 2: Materialize (`pipeline.py`)
Reads all vehicles + reservations from SQLite, computes derived features, and writes to both stores:
- Fleet vehicles (observed `num_reservations`) → Parquet (offline store); used for training and fleet display
- New arrivals only (`num_reservations IS NULL`) → Redis via `store.write_to_online_store()`; used for real-time inference
```bash
cd features && uv run python pipeline.py \
    --db /feast-data/vehicles.db \
    --feast-repo feature_repo \
    --parquet-path /feast-data/vehicle_features.parquet
```
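Internally the split is a filter on `num_reservations`. A minimal sketch, assuming a feature view named `vehicle_features` and a hypothetical `read_features_from_sqlite` helper; the real `pipeline.py` may structure this differently:

```python
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo")
features = read_features_from_sqlite("/feast-data/vehicles.db")  # hypothetical helper returning a DataFrame

# Fleet vehicles (observed num_reservations) -> Parquet offline store.
fleet = features[features["num_reservations"].notna()]
fleet.to_parquet("/feast-data/vehicle_features.parquet")

# New arrivals (num_reservations IS NULL) -> Redis online store.
arrivals = features[features["num_reservations"].isna()]
store.write_to_online_store("vehicle_features", arrivals)
```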
### Derived features computed

| Feature | Formula |
|---|---|
| `price_diff` | `actual_price - recommended_price` |
| `num_reservations` | `COUNT(reservations)` per vehicle; `NULL` for `ui`-sourced vehicles, `0` for `csv` vehicles with no bookings |
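The same formulas in pandas form, as a sketch; column names follow the table above, the `vehicle_id` join key is an assumption, and the real computation lives in `pipeline.py`:

```python
import pandas as pd

def add_derived_features(vehicles: pd.DataFrame, reservations: pd.DataFrame) -> pd.DataFrame:
    out = vehicles.copy()
    out["price_diff"] = out["actual_price"] - out["recommended_price"]

    # Count reservations per vehicle; vehicles with no rows stay NaN after the merge.
    counts = reservations.groupby("vehicle_id").size().rename("num_reservations")
    out = out.merge(counts, left_on="id", right_index=True, how="left")

    # csv-sourced vehicles with no bookings get 0; ui-sourced stay NaN (NULL),
    # which is what routes them to the online store as new arrivals.
    no_bookings = out["source"].eq("csv") & out["num_reservations"].isna()
    out.loc[no_bookings, "num_reservations"] = 0
    return out
```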
## Running

```bash
# Via Airflow (recommended; handles seed + materialize in order):
docker compose exec airflow airflow dags trigger vroom_forecast_materialize

# Or manually:
cd features && uv run python seed.py --data-dir ../data --db /feast-data/vehicles.db
cd features && uv run python pipeline.py --db /feast-data/vehicles.db \
    --feast-repo feature_repo --parquet-path /feast-data/vehicle_features.parquet
```
## How it works

Airflow doesn't install any ML dependencies. Each task runs:

```bash
uv run --project <project> python -m <module> [args]
# or
cd features && uv run python pipeline.py [args]
```

`uv` creates an isolated venv inside the container on first run.
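In DAG terms each task is a plain `BashOperator` that shells out to `uv`. A hypothetical task definition (the project path and module name are illustrative):

```python
from airflow.operators.bash import BashOperator

train = BashOperator(
    task_id="train",
    # uv resolves and caches the project's own venv on first run, so the
    # Airflow image itself never needs the ML dependencies installed.
    bash_command="uv run --project /opt/projects/training python -m training.train",
)
```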
## Key files

- `Dockerfile`: Extends `apache/airflow:2.10.5-python3.12`, adds `uv`, copies sub-projects
- `dags/vroom_forecast_materialize.py`: Feature materialization DAG
- `dags/vroom_forecast_training.py`: Training DAG
- `dags/vroom_forecast_promotion.py`: Promotion DAG
- `dags/vroom_forecast_pipeline.py`: Pipeline orchestrator DAG
## Credentials

The admin user is created explicitly in `docker-compose.yml` with credentials
`admin` / `admin`. The password is also written to: