DataChain Blog

Here you will find DataChain and DVC news, findings, interesting reads, community takeaways, and deep dives into machine learning workflows, from data versioning and processing to model productionization.

Moving Local Experiments to the Cloud with Terraform Provider Iterative (TPI) and Docker

Tutorial for easily running experiments in the cloud with the help of Terraform Provider Iterative (TPI) and Docker.

Casper da Costa-Luis
May 24, 2022 • 3 min read

May '22 Heartbeat

Monthly updates are here! You will find a link to Chip Huyen's new book, great guides and frameworks on the iterative nature of AI, tons of company news, Dmitry on TFIR, beyond machine learning use cases and more! Welcome to May!

Jeny De Figueiredo
May 16, 2022 • 8 min read

Moving Local Experiments to the Cloud with Terraform Provider Iterative (TPI)

Tutorial for easily moving a local ML experiment to a remote cloud machine with the help of Terraform Provider Iterative (TPI).

Maria Khalusova
May 12, 2022 • 7 min read

End-to-End Computer Vision API, Part 3: Remote Experiments & CI/CD For Machine Learning

In this final part, we will focus on leveraging cloud infrastructure with CML; enabling automatic reporting (graphs, images, reports and tables with performance metrics) for PRs; and the eventual deployment process.

Alex Kim
May 09, 2022 • 6 min read

Training and saving models with CML on a dedicated AWS EC2 runner (part 2)

Use CML to automatically retrain a model on a provisioned AWS EC2 instance and export the model to a DVC remote storage on Google Drive.

Rob de Wit
May 06, 2022 • 6 min read

End-to-End Computer Vision API, Part 2: Local Experiments

In part 1, we talked about effective management and versioning of large datasets and the creation of reproducible ML pipelines. Here we'll learn about experiment management: generation of many experiments by tweaking configurations and hyperparameters; comparison of experiments based on their performance metrics; and persistence of the most promising ones.

Alex Kim
May 05, 2022 • 5 min read

End-to-End Computer Vision API, Part 1: Data Versioning and ML Pipelines

In most cases, training a well-performing Computer Vision (CV) model is not the hardest part of building a Computer Vision-based system. The hardest parts are usually about incorporating this model into a maintainable application that runs in a production environment, bringing value to the customers and our business.