Training Pipeline
Independent uv project that trains a RandomForest model to predict vehicle reservation counts from listing attributes.
Data Flow
graph LR
PQ[Parquet<br/>offline store] -->|--feature-store| Train
Train -->|log metrics + model| MLflow[(MLflow)]
Train -->|set candidate alias| MLflow
Running
# From the offline feature store (materialized by the feature pipeline):
uv run --project training python -m training \
--feature-store feast-data/vehicle_features.parquet \
--mlflow-uri http://localhost:5001
CLI options
| Flag | Default | Description |
|---|---|---|
| --feature-store | (required) | Path to the offline store Parquet file |
| --mlflow-uri | http://localhost:5001 | MLflow tracking server URI |
| --experiment | vroom-forecast | MLflow experiment name |
| --model-name | vroom-forecast | Registered model name |
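
The flags and defaults above could be declared roughly as follows. This is a minimal sketch that assumes the entry point uses the standard-library argparse module, which this page does not confirm; the function name build_parser and the help strings are illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="training")
    # Required: there is no default path for the offline store.
    parser.add_argument("--feature-store", required=True,
                        help="Path to the offline store Parquet file")
    parser.add_argument("--mlflow-uri", default="http://localhost:5001",
                        help="MLflow tracking server URI")
    parser.add_argument("--experiment", default="vroom-forecast",
                        help="MLflow experiment name")
    parser.add_argument("--model-name", default="vroom-forecast",
                        help="Registered model name")
    return parser


# Only the required flag is passed, so the other three fall back to defaults.
args = build_parser().parse_args(
    ["--feature-store", "feast-data/vehicle_features.parquet"]
)
print(args.feature_store, args.mlflow_uri, args.experiment, args.model_name)
```

With a parser like this, omitting --feature-store exits with a usage error, while the other three flags can be left out.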
What it does
- Loads features from the offline store (Parquet file written by the feature pipeline)
- Trains a RandomForestRegressor (200 trees, max_depth=10) with 5-fold CV
- Logs params, metrics, and model artifact to MLflow
- Registers the model version and sets the candidate alias
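
The train and evaluate steps can be sketched with scikit-learn as below. This is an illustration rather than the contents of train.py: the data is synthetic, the function name train_and_evaluate is hypothetical, and mean absolute error is an assumed metric, since this page does not say which metrics are logged. The MLflow logging and registration steps are left out because they need a running tracking server.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

FEATURES = ["technology", "num_images", "street_parked", "description", "price_diff"]


def train_and_evaluate(X, y, seed=42):
    """Score a 200-tree, max_depth=10 forest with 5-fold CV, then fit on all rows."""
    model = RandomForestRegressor(n_estimators=200, max_depth=10, random_state=seed)
    # ASSUMPTION: MAE is used as the CV metric; the source does not name one.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    # cross_val_score fits clones, so the returned model is fitted separately.
    model.fit(X, y)
    return model, float(-scores.mean())


# Synthetic stand-in for the offline store: 100 rows, one column per feature.
rng = np.random.default_rng(0)
X = rng.random((100, len(FEATURES)))
y = rng.poisson(5, size=100)

model, cv_mae = train_and_evaluate(X, y)
print(f"5-fold CV MAE: {cv_mae:.3f}")
```

Fitting on the full dataset after cross-validation is a common choice: the CV score estimates generalization error, and the final model uses every available row.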
Key files
- train.py: Pipeline logic (load, train, evaluate, register)
- __main__.py: CLI entry point
- pyproject.toml: Dependencies (pandas, scikit-learn, mlflow, pyarrow)
Feature columns
technology, num_images, street_parked, description, price_diff
Target: num_reservations
Raw prices (actual_price, recommended_price) are vehicle attributes used to compute price_diff but are not model inputs: price_diff captures the full pricing signal with less collinearity. See exploration.
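
The split between raw attributes and model inputs can be illustrated as follows. This sketch assumes price_diff is the plain difference actual_price - recommended_price, which this page does not confirm. The helper to_model_row and the sample values are hypothetical, and description is assumed to already be numeric.

```python
FEATURES = ["technology", "num_images", "street_parked", "description", "price_diff"]
TARGET = "num_reservations"


def to_model_row(listing: dict) -> tuple[list[float], float]:
    """Derive price_diff, then keep only the model inputs and the target."""
    row = dict(listing)
    # ASSUMPTION: price_diff is a simple difference; the real formula may differ.
    row["price_diff"] = row["actual_price"] - row["recommended_price"]
    # The raw prices are dropped here because they are not in FEATURES.
    features = [float(row[name]) for name in FEATURES]
    return features, float(row[TARGET])


# Hypothetical listing; the values are made up for illustration.
listing = {
    "technology": 1,
    "num_images": 8,
    "street_parked": 0,
    "description": 120,
    "actual_price": 55.0,
    "recommended_price": 50.0,
    "num_reservations": 4,
}

features, target = to_model_row(listing)
print(features, target)
```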