When I run `dvc repro` on a stage, does it automatically push any outputs to my remote?
Great question from @tina_rey!
The `dvc repro` command doesn't automatically push any outputs or data to your remote. The outputs are stored in the cache until you run `dvc push`, which then pushes them from your cache to your remote.
Is `dvc dag` based on `deps` and `outs`, so that a stage that depends on the output of another stage will always be executed after the former has finished?
This is a good question from @johnysku!
That is correct! A stage that depends on the output of another stage runs only after that stage has finished. Stages or pipelines that are independent of each other may run in any order, so without an explicit dependency link between them you shouldn't rely on a particular execution order.
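To make the ordering rule concrete, here is a minimal Python sketch of how an execution order can be derived from `deps` and `outs`. It is not DVC's implementation. The stage names, file paths, and the `execution_order` helper are all hypothetical, and the sketch only illustrates that a stage consuming another stage's output is ordered after it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages: each lists the files it reads (deps) and writes (outs).
STAGES = {
    "prepare": {"deps": ["data/raw.csv"], "outs": ["data/clean.csv"]},
    "train": {"deps": ["data/clean.csv"], "outs": ["model.pkl"]},
    "evaluate": {"deps": ["model.pkl", "data/clean.csv"], "outs": ["metrics.json"]},
}


def execution_order(stages):
    """Return stage names so that producers come before their consumers."""
    # Map every output path to the stage that produces it.
    producer = {out: name for name, stage in stages.items() for out in stage["outs"]}
    # A stage's predecessors are the stages producing its deps.
    # Deps that no stage produces (e.g. raw data) add no ordering constraint.
    graph = {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(STAGES)
print(order)
```

Here `train` can only come after `prepare` because it reads `data/clean.csv`. If two stages shared no files, nothing in the graph would constrain their relative order.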
If I want to use the `foreach` utility in `dvc repro`, is there a way I can use glob patterns to create the list DVC needs to iterate over?
Another interesting question from @copah!
If you have `mystage` which uses `foreach`, you can run `dvc repro mystage` to iterate over every `mystage` stage.
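For the glob part of the question, one possible workaround is to expand the pattern in a small script before running `dvc repro`. This is a sketch, not something confirmed by the answer above. It assumes your DVC version lets `foreach` reference a list from `params.yaml` through templating (for example `foreach: ${files}`). The `files` key and both helper functions are hypothetical. The script writes file stems rather than full paths so that the generated stage names stay simple.

```python
import glob
import json
from pathlib import Path


def build_foreach_list(pattern):
    """Expand a glob pattern into a sorted list of file stems (no dir, no extension)."""
    return sorted(Path(p).stem for p in glob.glob(pattern))


def write_params(items, params_path="params.yaml", key="files"):
    """Write the list under `key` as a YAML sequence.

    This overwrites params_path, so merge by hand if other params live there.
    json.dumps produces double-quoted strings, which are also valid YAML scalars.
    """
    if items:
        lines = [f"{key}:"] + [f"  - {json.dumps(item)}" for item in items]
    else:
        lines = [f"{key}: []"]  # explicit empty list instead of a null value
    Path(params_path).write_text("\n".join(lines) + "\n")
```

Calling `write_params(build_foreach_list("data/*.csv"))` would then refresh the list each time the set of matching files changes, and the stage's `cmd` could rebuild each path from `${item}`.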
How does DVC handle files that have been deleted from remote storage?
Really good question from @Meme Philosopher!
DVC will fail when you try to pull files that have been deleted from the remote and notify you that those files are missing in remote storage.
Can I move CML runs off the GitHub Actions VM and onto GCP or AWS, so that training and testing happen in those cloud environments?
Thanks for the question @Atsu!
This is supported out-of-the-box! Here's how it works:
- Within GitHub Actions, CML launches a self-hosted runner on GCP or AWS using `cml runner --labels=cml --cloud=gcp` (or `--cloud=aws`).
- GitHub Actions runs the rest of the workflow on the self-hosted runner using `runs-on: [self-hosted, cml]` and the maximum allowable `timeout-minutes: 4320`.
- If GitHub Actions is about to time out, CML will restart the workflow, so make sure your code regularly caches and restores data if it's expected to take more than 3 days to run.
You can follow along with this doc to get started.
The key is requesting GitHub's maximum `timeout-minutes: 4320`. This signals to CML to restart the workflow just before the timeout. You'll also have to write your code to cache results so that the restarted workflow will use previous results (e.g. see https://dvc.org/doc/user-guide/experiment-management/checkpoints#caching-checkpoints and https://github.com/iterative/dvc/issues/6823).
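As a rough illustration of what "cache and restore" can look like in training code, here is a minimal Python sketch of a loop that saves its progress after every epoch and resumes from the saved state on the next run. It does not use CML or DVC checkpoint APIs. The `train` function, the state file name, and the toy loss are all hypothetical, and in a real workflow the state file would need to be stored somewhere that survives the runner being replaced.

```python
import json
from pathlib import Path


def train(total_epochs, state_path="train_state.json"):
    """Run a toy training loop that resumes from the last saved epoch."""
    path = Path(state_path)
    # Restore progress if an earlier (interrupted) run left a state file.
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"epoch": 0, "loss": None}

    for epoch in range(state["epoch"], total_epochs):
        # Placeholder for one real epoch of training.
        state = {"epoch": epoch + 1, "loss": 1.0 / (epoch + 1)}
        # Persist after every epoch so a restart loses at most one epoch.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(path)  # rename over the old file to avoid a half-written state

    return state
```

If the workflow is restarted after epoch 2 of 5, calling `train(5)` again picks up at epoch 3 instead of starting over.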
When running an experiment from the web interface with DVC, is there any way to get the new metrics to show on the commit created by Iterative Studio for the experiment?
Awesome question about Studio from @Benjamin-Etheredge!
In order to show the experiment results in Studio, you would have to commit and push the results as part of your CI (continuous integration) action. Here's an example GitHub action script that does this.
We do understand that it is not ideal that there are 2 commits, one with your changes and one with the results. We have been thinking about how this can be improved and it would be great to hear if you have any thoughts/ideas!
Is there a way to get DVC to import from a private repository?
Good question from @qubvel!
You can use SSH to handle this and run the following command:
$ dvc import git@github.com:<repository location> <data_path>
If I use a local remote and a shared cache, will the data be symlinked from the remote to the cache?
Very interesting question from @cajoek!
The data will not be symlinked from the remote to the cache.
The cache can be treated as temporary storage, so it may fill up with data that will never be used again, for example from failed experiments. In that case, it helps to have a local remote that keeps the important data for the important versions of your project.
That way, later when your cache is too big and the project takes up too much space, you can remove `.dvc/cache` and download the latest important version from the remote.
At our May Office Hours Meetup, Matt Squire of Fuzzy Labs will join us to share his view on open source MLOps tools! RSVP for the Meetup here to stay up to date with specifics as we get closer to the event!
Join us in Discord to get all your DVC and CML questions answered!