Header image generated by Dall-E 2
Did you know that DVC can track experiments? You can now start tracking them by changing just a few lines of your Python code.
And with the optional DVC extension for VS Code, you have a full-fledged experiment tracking interface in your IDE!
Why?
We want to bring the DVC ethos to experiment tracking, but the learning curve for DVC can be steep. That's why we built our Python logging library, DVCLive, to make getting started easy.
source: https://twitter.com/untitled01ipynb/status/1593911944989270016
All you need to start is a Git repo. There are no logins, servers, databases, or UIs to spin up. Every experiment run is saved as a Git commit, but those commits are hidden, so they don't clutter your repo the way saving each run to a separate directory or creating a Git branch per run would.
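Hidden commits don't show up in git log or git branch. As far as we know, DVC 2.x keeps them as custom Git refs laid out as refs/exps/<first two characters of the baseline commit SHA>/<rest of the SHA>/<experiment name>; treat that layout as an assumption about DVC internals rather than a documented interface. The minimal sketch below lists those refs by parsing the output of git for-each-ref. The names ExpRef, parse_exp_refs, and list_exp_refs are hypothetical helpers, not part of DVC.

```python
import subprocess
from typing import NamedTuple


class ExpRef(NamedTuple):
    """One experiment ref: the commit it points to, its baseline commit, and its name."""
    commit: str
    baseline: str
    name: str


# Assumption: DVC 2.x stores experiments under refs/exps/<sha[:2]>/<sha[2:]>/<name>,
# where <sha> is the baseline commit the experiment was run from.
EXPS_PREFIX = "refs/exps/"


def parse_exp_refs(for_each_ref_output: str) -> list[ExpRef]:
    """Parse lines produced by: git for-each-ref --format='%(objectname) %(refname)'."""
    refs = []
    for line in for_each_ref_output.splitlines():
        if not line.strip():
            continue
        commit, refname = line.split(maxsplit=1)
        if not refname.startswith(EXPS_PREFIX):
            continue
        parts = refname[len(EXPS_PREFIX):].split("/")
        if len(parts) != 3:
            continue  # skip refs that don't match the assumed <sha2>/<sha38>/<name> layout
        sha_head, sha_tail, name = parts
        refs.append(ExpRef(commit, sha_head + sha_tail, name))
    return refs


def list_exp_refs(repo_dir: str = ".") -> list[ExpRef]:
    """Ask Git for every ref under refs/exps in repo_dir and parse the result."""
    out = subprocess.run(
        ["git", "for-each-ref", "--format=%(objectname) %(refname)", "refs/exps"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_exp_refs(out)
```

In a repo with saved experiments, list_exp_refs() should return one entry per experiment across all baselines. Because the refs live outside refs/heads, they never appear as branches, which is what keeps the repo uncluttered.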
From that simple starting point, DVC experiment tracking grows with your project. You don't have to decide today whether you will need to share with your team or back up to cloud storage, because DVC builds on top of the tools you already use and lets you integrate them incrementally.
When you need to share, push existing experiments to your Git provider (GitHub/GitLab). When you need artifact storage, add your own cloud provider and push your existing artifacts. When you need a UI, use VS Code or add Iterative Studio for a collaborative interface.
How to start
Check out the example repo, try it out in a Colab notebook, or follow the steps below to start with your own model training code.
1. Install DVC>=2.38.0 as a library in your Python environment.
$ pip install --upgrade dvc
2. Set up a DVC repo where your model training code lives (or use an existing repo).
$ git init
$ dvc init
$ git add -A
$ git commit -m "setup dvc repo"
3. In your code, enable DVC experiment tracking by passing save_dvc_exp=True to DVCLive. Use the DVCLive logger or callback for your framework, or log your own metrics. Examples for several frameworks follow (DVCLive supports other frameworks too):
PyTorch Lightning:
from pytorch_lightning import Trainer
from dvclive.lightning import DVCLiveLogger
...
trainer = Trainer(logger=DVCLiveLogger(save_dvc_exp=True))
trainer.fit(model)
Hugging Face Transformers:
from dvclive.huggingface import DVCLiveCallback
...
trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))
trainer.train()
Keras:
from dvclive.keras import DVCLiveCallback
...
model.fit(
    train_dataset,
    validation_data=validation_dataset,
    callbacks=[DVCLiveCallback(save_dvc_exp=True)],
)
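Custom training code:
from dvclive import Live

# NUM_EPOCHS, train_model, and evaluate_model stand in for your own code.
with Live(save_dvc_exp=True) as live:
    live.log_param("epochs", NUM_EPOCHS)
    for epoch in range(NUM_EPOCHS):
        train_model(...)
        metrics = evaluate_model(...)  # e.g. a dict mapping metric name -> value
        for metric_name, value in metrics.items():
            live.log_metric(metric_name, value)
        live.next_step()  # advance DVCLive's step counter once per epoch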
4. Run your code and track the experiment results.
# Show the experiments table in the terminal.
$ dvc exp show
────────────────────────────────────────────────────────────────────────────────────
Experiment Created train_loss epoch step encoder_size
────────────────────────────────────────────────────────────────────────────────────
workspace - 0.020196 4 500 512
main Dec 06, 2022 - - - -
├── c1759a5 [quare-foil] 08:55 PM 0.020196 4 500 512
├── affedee [bitty-tass] 08:55 PM 0.02038 4 500 256
├── a5bdc18 [murky-emeu] 08:55 PM 0.016396 4 500 128
├── 744f3b6 [sworn-wage] 08:54 PM 0.01972 4 500 64
└── 0c3ac81 [named-gaby] 08:54 PM 0.031206 4 500 32
────────────────────────────────────────────────────────────────────────────────────
# Plot the diff of all experiments in an HTML file.
$ dvc plots diff $(dvc exp list --name-only)
file:///Users/dave/Code/dvclive-exp-tracking/dvc_plots/index.html
Open the HTML file in a browser to see the plots.
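Step 1 asks for DVC 2.38.0 or newer. One way to confirm that from the same Python environment is to read the installed package version with importlib.metadata. This is a simplified sketch: it compares only the leading numeric release components, so a pre-release such as 2.38.0rc1 counts as 2.38.0. The packaging library's Version class implements the full version rules if you need them. parse_release, meets_minimum, and check_dvc are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_DVC = (2, 38, 0)  # minimum version from step 1


def parse_release(ver: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '2.38.1' -> (2, 38, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(ver: str, minimum: tuple[int, ...] = MIN_DVC) -> bool:
    """True if ver >= minimum, padding the shorter tuple with zeros before comparing."""
    release = parse_release(ver)
    width = max(len(release), len(minimum))
    padded = release + (0,) * (width - len(release))
    wanted = tuple(minimum) + (0,) * (width - len(minimum))
    return padded >= wanted


def check_dvc() -> str:
    """Report whether the installed dvc package satisfies the minimum version."""
    try:
        installed = version("dvc")
    except PackageNotFoundError:
        return "dvc is not installed"
    status = "OK" if meets_minimum(installed) else "too old, upgrade with pip"
    return f"dvc {installed}: {status}"
```

Running print(check_dvc()) after the pip install should report OK for any release from 2.38.0 onward.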
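The dvc exp show table in step 4 already holds what you need to compare runs. As a small worked example, here are its train_loss and encoder_size columns copied into Python and ranked. On these five runs the lowest training loss comes from murky-emeu with encoder_size 128, not from the largest encoder. The values are typed in by hand from the table above; results and ranked are illustrative names only.

```python
# Values copied from the `dvc exp show` table above: name -> (train_loss, encoder_size).
results = {
    "quare-foil": (0.020196, 512),
    "bitty-tass": (0.02038, 256),
    "murky-emeu": (0.016396, 128),
    "sworn-wage": (0.01972, 64),
    "named-gaby": (0.031206, 32),
}

# Rank experiments from lowest (best) to highest training loss.
ranked = sorted(results, key=lambda name: results[name][0])
best = ranked[0]

for name in ranked:
    loss, encoder_size = results[name]
    print(f"{name:<12} encoder_size={encoder_size:<4} train_loss={loss:.6f}")
```

Training loss alone says little about how well a model generalizes, so if your code logs a validation metric, rank on that instead.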
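The shell one-liner in step 4 passes every experiment name printed by dvc exp list --name-only to dvc plots diff. From a notebook or a Python script, the same command can be assembled with subprocess. This sketch assumes dvc is on your PATH and that --name-only prints one experiment name per line; build_plots_diff_cmd and plot_all_experiments are hypothetical helpers.

```python
import subprocess


def build_plots_diff_cmd(exp_list_output: str) -> list[str]:
    """Turn `dvc exp list --name-only` output into a `dvc plots diff` argv."""
    # Assumption: one experiment name per line; blank lines are ignored.
    names = [line.strip() for line in exp_list_output.splitlines() if line.strip()]
    return ["dvc", "plots", "diff", *names]


def plot_all_experiments(repo_dir: str = ".") -> str:
    """Python equivalent of: dvc plots diff $(dvc exp list --name-only)"""
    listing = subprocess.run(
        ["dvc", "exp", "list", "--name-only"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    result = subprocess.run(
        build_plots_diff_cmd(listing),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout  # DVC's output, which should include the path to the HTML file
```

Passing an argument list to subprocess.run avoids shell word-splitting entirely, so the result does not depend on which shell you use.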
Stay tuned
That's all there is to it! There's lots more coming for DVC experiment tracking, including:
- Showing you where to go from here: share your experiments, add data or pipelines, and use DVC without ever leaving your notebook or Python IDE.
- Adding more DVCLive features: share real-time updates to Iterative Studio, log data and model artifacts, and compare experiments in Python.
Try out the example repo or Colab notebook, and let us know what you think on Discord or GitHub.