Welcome to Summer!
From the Community
As usual we have a ton of goodness from the Community! Let's jump in!
Antoine Toubhans' Post Combining Streamlit and DVC!
Antoine Toubhans of Sicara wrote a fantastic and detailed tutorial entitled "How to Build Customizable Web UI for Machine Learning with Streamlit and DVC," which brings together the best of DVC and integrates it with Streamlit to provide a customizable UI. The tutorial goes through the steps of setting up a pipeline, splitting a dataset, training and evaluating a model, tracking changes to data and model, and using DVC metrics and plots, and then bridges the gap in visualizations using Streamlit. You won't want to miss this one!
DVC + Streamlit = ♥️! Source link
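To give a flavor of the "splitting a dataset" step mentioned above, here is a minimal sketch of the kind of script a pipeline stage might run. It is not taken from Antoine's tutorial: the function name `split_dataset`, its parameters, and the fixed seed are all assumptions made for illustration. The idea it shows is that a seeded, deterministic split gives the same output for the same input, which is what makes a stage reproducible.

```python
import random


def split_dataset(items, test_fraction=0.2, seed=42):
    """Return (train, test) lists from a seeded, reproducible shuffle.

    Hypothetical helper for illustration only; not part of DVC or Streamlit.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(items)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # local RNG, no global state
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_dataset(range(10))
    print(len(train), "training items,", len(test), "test items")
```

Because the seed is an explicit argument, it could be stored in a parameters file and tracked alongside the data, so that a changed seed counts as a changed input.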
DVC and CML in Japanese!
For our friends who speak Japanese, these slides created by Yusuke Shibui walk you through a machine-learning-to-production project using DVC and CML. We love seeing our tools being used all around the world! 🌏
DVC and CML in Japanese! Source link
Miguel Méndez' DVC Tutorial
Miguel Méndez and his team at Gradiant struggled with reproducibility before using DVC for versioning their image dataset and annotations. The dataset and annotations are held in a shared storage space and used by the whole team. DVC enables the team to track changes and know which versions of the dataset produce the best results. His tutorial walks you through the steps to set it up!
Version Control Your Dataset with DVC
Jobs requiring DVC!
We have been seeing an uptick in the number of jobs requiring knowledge of DVC. It's exciting to see that our tools are helping these companies in their MLOps workflows! 🎉
Learning Opportunities
With all those DVC job opportunities out there, you better get on it! 😉
A New Udacity Course Incorporating DVC!
Just this month, a new Udacity Nanodegree program entitled Machine Learning DevOps Engineer came out, and it teaches DVC as part of the program. This course includes sections on:
- Clean Code Principles
- Building a Reproducible Model Workflow
- Deploying a Scalable ML Pipeline in Production
- Automated Model Scoring and Monitoring
Machine Learning DevOps Engineer
DVC Learn
This week we kicked off our new DVC Learn Meetup series with Milecia McGregor. This set of three short, half-hour classes is designed to get you up and running in DVC. If you are just getting started with DVC or kicking the tires, this Meetup series is for you! Our next class on August 4th will get you started with experiments.
If you are interested in weighing in on what kinds of educational content you would like to see from us, we'd be grateful if you'd fill out this survey to help us plan! 🙏🏼
DVC Learn - Getting Started: Experiments
Data Science Journal Article on Reproducibility Practices in Research
New research presented in the Data Science Journal aims to provide best practices for ensuring reproducibility in research datasets. This is necessary to pinpoint the version of the dataset that grounds any research. In this work, the authors reviewed 39 use cases from 33 organizations to arrive at six principles for versioning datasets: Revision, Release, Granularity, Manifestation, Provenance, and Citation. See the full work below. 👇🏼
Versioning Data is About More Than Revisions: A Conceptual Framework and Proposed Principles
June Office Hours Meetup
The June Office Hours Meetup was 🔥! Amazing discussion on experiments ignited by Sami Jawhar of Kernel around experiment use cases and workflows. You can find the repo for his presentation here and watch all the great DVC discussion below.
DVC News
Summer and vaccinations mean travel! ☀️💉 And that travel has enabled some of our team members to get together! Pictured below are Dmitry Petrov, Alexander Guschin, Max Shmakov, Mikhail Rozhkov, Sergey Kryukov, Mikhail Sveshnikov, and Guro Bokum… But not necessarily in that order.
The first person to guess the correct order of our teammates starting from the upper right of the picture moving clockwise, and post in the corresponding Twitter Heartbeat post, will win some DVC SWAG! Hint: If you've been wondering why there are random purple letters in this blog post, they're a clue to this cipher. 🧐
Team Meetup in Moscow! (hand signals obscured for our UK friends, because we care! 🤗)
New Team Member
David de la Iglesia Castro is the third teammate joining us from Spain! 🇪🇸 And also the third David! He hails from Galicia and has been an active member of our Community for over two years. We are so excited to have him join the team as a software engineer, where he will work to improve DVC Live. When he's not contributing to DVC, David likes to go climbing, surfing, or just hiking whenever he can! Welcome, David!
Open Positions
And yes indeed, we are still hiring! Use this link to find details of all the positions including:
- Senior Front-End Engineer (TypeScript, Node, React)
- Senior Software Engineer (ML, Dev Tools, Python)
- Senior Software Engineer (ML, Data Infra, GoLang)
- Machine Learning Engineer/Field Data Scientist
- Developer Advocate (ML)
- Director/VP of Engineering (ML, DevTools)
- Director/VP of Product (ML, Data Infra, SaaS)
- Director/VP of Operations/Chief of Staff
Please pass this info on to anyone you know who may fit the bill. We look forward to new team members! 🎉
Next Meetup
Don't miss our Meetup on July 28th at 2:00 pm UTC (7:00 am PDT), where João Santiago of Billie will present "DVThis," a set of utility functions for DVC pipelines using R scripts. The project also aims to document the usual workflows of a DVC pipeline using these scripts and to create templates for using DVC and R together.
Following Santiago, team member Tapa Dipti Sitaula will give a demo of DVC Studio! Bring your questions; we look forward to seeing you!
DVThis
Tweet Love ❤️
Fantastically detailed tutorial from @AntoineToubhans on how to build a customizable web UI for #MachineLearning with @Streamlit and @DVCorg! 🐍🎈https://t.co/zrZCueWk0n
— DataChazGPT (not a bot) (@DataChaz) June 30, 2021
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.