Is it possible to export a plot generated using dvc plots diff HEAD main to vega-lite for use in CML?
Thanks for the awesome question @dominic!
You can use the dvc plots diff --show-vega command to export the plot as a single vega-lite graph. You'll need to run the following command:
$ dvc plots diff HEAD main --targets prediction.json --show-vega > vega.json
You can also include this plot in a comment with CML so that it appears on your pull requests in GitHub.
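As a rough sketch of that last step, the function below renders the exported spec to an image and posts it as a comment. It assumes the vl2png tool from the Vega-Lite command-line utilities is installed and that your CML version provides cml publish and cml send-comment (command names have changed between CML releases). The function name and the plot.png and report.md file names are placeholders.

```shell
# Hypothetical helper: post the diff plot as a CML comment.
# Assumes dvc, vl2png (Vega-Lite CLI tools), and cml are on PATH.
post_plot_comment() {
  # Export the plot comparing HEAD with main as a vega-lite spec
  dvc plots diff HEAD main --targets prediction.json --show-vega > vega.json &&
  # Render the spec to a PNG image
  vl2png vega.json > plot.png &&
  # Upload the image and append a markdown link to the report
  cml publish plot.png --md >> report.md &&
  # Post the report as a comment on the commit or pull request
  cml send-comment report.md
}
```

You would call post_plot_comment from a step in your CI workflow, after checking out the repository and pulling the data the plot depends on.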
What is the difference between dvc pull and dvc checkout?
Great question @Derek!
Here are some explanations around how dvc pull and dvc checkout work. They're comparable to git pull and git checkout.
dvc pull fetches data from your remote cache to your local cache and syncs it to your workspace.
dvc checkout syncs data from your local cache to your workspace.
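One way to see the relationship is to split dvc pull into its two steps, since it behaves like dvc fetch followed by dvc checkout. The function name below is only for illustration.

```shell
# Hypothetical helper: the two steps that `dvc pull` combines.
pull_in_two_steps() {
  dvc fetch &&    # download tracked data from remote storage into the local cache
  dvc checkout    # sync the workspace with what is in the local cache
}
```

If the data is already in your local cache, for example after switching Git branches, dvc checkout on its own is enough.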
Is there a way to add all of the outs of a foreach job to the deps of a downstream stage?
Very interesting question from @mathematiguy!
One way to do this is to have all foreach stages write out to different paths within the same directory and then track the entire directory as a dependency of your downstream stage.
Here's an example of how that might look in your dvc.yaml file.
stages:
  cleanups:
    foreach:
      - raw1
      - labels1
      - raw2
    do:
      cmd: echo "${item}" > "data/${item}"
      outs:
        - data/${item}
  reduce:
    cmd: echo file > file
    deps:
      - data
    outs:
      - file
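To see why tracking the directory works, the sketch below runs the same command as the do block by hand for each item, without DVC. The temporary directory is only there to keep the demo out of your project. Each generated stage (DVC names them cleanups@raw1, cleanups@labels1, and cleanups@raw2) writes one file, and all of them land under data.

```shell
# Emulate the three generated stages by hand (no DVC needed).
workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p data

for item in raw1 labels1 raw2; do
  # Same command as the `do` block in dvc.yaml
  echo "${item}" > "data/${item}"
done

# Everything the downstream stage depends on is now inside data/
ls data
```

Because reduce lists data as a dependency, a change to any one of these files makes the stage out of date.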
Is there a way to version and move data from one cloud storage to another with DVC remotes?
Wonderful question from @Hisham!
There are a couple of ways you can do this. One approach is to use dvc add --to-remote. The other approach is to use the dvc import-url --to-remote functionality. The main difference between these approaches is that dvc import-url has the added benefit of keeping a connection to the data source so it can be updated later with dvc update.
You can see an example of how to do this in the docs. Just make sure that you have your remotes set up!
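Here is a minimal sketch of both approaches, wrapped in functions so nothing runs until you call one. The bucket URL, the dataset output name, and the target-storage remote name are made up for illustration, and the remote is assumed to be configured already with dvc remote add.

```shell
# Hypothetical names: replace the source URL and remote name with your own.
SOURCE_URL="s3://source-bucket/dataset"
TARGET_REMOTE="target-storage"

# Option 1: track the data and transfer it straight to the remote.
add_to_remote() {
  dvc add "$SOURCE_URL" -o dataset --to-remote -r "$TARGET_REMOTE"
}

# Option 2: same transfer, but the source URL is recorded for later updates.
import_to_remote() {
  dvc import-url "$SOURCE_URL" dataset --to-remote -r "$TARGET_REMOTE"
}
```

Either way you end up with a dataset.dvc file to commit to Git, which is what versions the data.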
If I'm using Feast feature store, is it possible to version datasets with DVC?
This is a good integration question from @Bernardo Galvao!
If you want to fetch historical features from the offline store to generate training data, a typical pattern would be to write the script to do so and set up a DVC pipeline stage to track that script and version the output file. This is similar to how a lot of people use DVC alongside SQL databases.
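As a sketch of that pattern, the function below registers such a stage with dvc stage add. The script name fetch_features.py, the stage name, and the output path are hypothetical; the script itself would be your own code that queries the Feast offline store and writes the training file.

```shell
# Hypothetical names: fetch_features.py is your own script that queries the
# Feast offline store and writes data/training.parquet.
add_feature_stage() {
  dvc stage add -n fetch_features \
    -d fetch_features.py \
    -o data/training.parquet \
    python fetch_features.py
}
```

After calling it, dvc repro runs the stage and records the version of the output in dvc.lock.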
How can I run a DVC pipeline in a Docker container?
Nice question from @Anudeep!
Here's an example of a Dockerfile with a simple DVC setup.
FROM ubuntu:latest
RUN apt-get update && apt-get install -y python-is-python3 python3-pip
WORKDIR /dvc_project
COPY . .
# assuming your requirements, including dvc, are here
RUN pip install -r requirements.txt
CMD dvc pull && dvc exp run
You would save this file and then run the following commands in your terminal.
$ docker build -t "myproject-dvc-exp-run" .
$ docker run myproject-dvc-exp-run
You could also use the dvc repro command or any of the other DVC commands.
How can I reset a repository and start fresh with DVC?
Nice question from @strickvl!
The best approach for resetting a repo is to run the dvc destroy command, which will remove all DVC files and internals from your repository.
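A minimal sketch of a full reset, assuming you run it from the root of your Git repository. The function name is only for illustration, and the wrapper keeps the destructive command from running until you call it.

```shell
# Hypothetical helper: wipe the DVC setup and start over.
reset_dvc() {
  dvc destroy &&   # asks for confirmation, then removes DVC files and internals
  dvc init         # create a fresh DVC setup in the same repository
}
```

Commit the result to Git afterwards so the removal of the old DVC files is recorded.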
Is there an example of using CML with GCP that can be used as a reference?
Excellent question from @sabygo!
Here is a GitHub Actions snippet to get you started:
jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - name: Deploy runner
        env:
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GCP_CML_RUNNER_KEY }}
        run: |
          cml runner \
          --single \
          --labels=cml-gcp \
          --token=${{ secrets.GCP_SECRET }} \
          --cloud=gcp \
          --cloud-region=us-west \
          --cloud-type=e2-highcpu-2
  test:
    needs: [setup]
    runs-on: [self-hosted, cml-gcp]
    steps:
      - uses: actions/checkout@v2
      # - uses: iterative/setup-cml@v1
      - run: |
          echo "model training"
Can I use preemptible instances provided by GCP as a cml-runner?
Good question from @Atsu!
Yes! You can use cml runner --cloud-spot to request a preemptible instance.
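For example, here is the runner command from the GCP workflow above with the spot flag added. It's only a sketch: the function name is hypothetical, the region, machine type, and label are reused from that workflow, and credentials and the token are left out, so pass them the same way the workflow does.

```shell
# Hypothetical helper: request a spot (preemptible) GCP instance as a runner.
launch_spot_runner() {
  cml runner \
    --cloud=gcp \
    --cloud-region=us-west \
    --cloud-type=e2-highcpu-2 \
    --cloud-spot \
    --labels=cml-gcp
}
```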
At our June Office Hours Meetup, we will be hosting the launch party for our new MLOps tool! Make sure you join us to find out what it is! RSVP for the Meetup here to stay up to date with specifics as we get closer to the event!
Join us in Discord to get all your DVC and CML questions answered!