From the Community
This month we have been flooded with content from our Community. We are grateful and inspired to keep serving you!
Ricardo Manhães Savii: Trying to turn Machine Learning into value
If we can't turn machine learning into value, what good are we? Ricardo Manhães Savii wrote a piece on Medium in which he tackles how to technically and visually define the steps to deliver an Intelligent System with the same level of best-practice maturity that software development has today. He combines and synthesizes the ideas of some of the best-known thinkers in the space to build a thorough architecture of machine learning best practices. You won't want to miss this post, so take the time to wrap your head around these diagrams!
Ricardo Manhães Savii's addendum to [François Chollet's](https://medium.com/@francois.chollet) figure on the result of machine learning (Source link)
RappiBank: How to build an efficient machine learning project workflow
Continuing the theme of ML workflow complexity, Daniel Baena wrote a great overview and tutorial outlining the challenges his team at RappiBank encountered and solved with DVC, including:
- confusing experiment files with different names
- disjointed communication about training, model, and dataset changes
- progress kept in your head or in personal notes, invisible to the rest of the team
- long run and re-run times without a modular pipeline
Daniel shows how DVC solves all of these problems.
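The last item on that list is what DVC pipelines target: `dvc repro` re-executes only the stages whose dependencies, parameters, or commands changed since the previous run, using the hashes recorded in `dvc.lock`. The sketch below illustrates just that skip-if-unchanged idea in plain Python, under simplifying assumptions: it is not DVC's implementation, the `Stage` class and `run_pipeline` function are hypothetical, inputs are in-memory strings rather than files, and changes to a stage's code are not tracked.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    """A hypothetical pipeline stage: named inputs, one named output, a function."""
    name: str
    deps: List[str]            # names of the inputs this stage reads
    out: str                   # name of the output this stage writes
    func: Callable[..., str]   # computes the output from the inputs


def fingerprint(values: List[str]) -> str:
    """Hash a stage's input contents (DVC hashes files; this sketch hashes strings)."""
    h = hashlib.md5()
    for v in values:
        h.update(v.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def run_pipeline(stages: List[Stage], data: Dict[str, str],
                 lock: Dict[str, str]) -> List[str]:
    """Run stages in order, skipping any whose input fingerprint is unchanged.

    `lock` maps stage name -> input fingerprint from the previous run, loosely
    mirroring the role of dvc.lock. Returns the names of the stages that ran.
    """
    ran = []
    for stage in stages:
        inputs = [data[d] for d in stage.deps]
        fp = fingerprint(inputs)
        if lock.get(stage.name) == fp and stage.out in data:
            continue  # inputs unchanged and output present: reuse it
        data[stage.out] = stage.func(*inputs)
        lock[stage.name] = fp
        ran.append(stage.name)
    return ran


if __name__ == "__main__":
    stages = [
        Stage("clean", ["raw"], "clean", lambda raw: raw.strip().lower()),
        Stage("train", ["clean", "params"], "model",
              lambda c, p: f"model({c},{p})"),
    ]
    data = {"raw": "  Hello ", "params": "lr=0.1"}
    lock: Dict[str, str] = {}
    print(run_pipeline(stages, data, lock))  # ['clean', 'train']
    print(run_pipeline(stages, data, lock))  # [] (nothing changed)
    data["params"] = "lr=0.01"
    print(run_pipeline(stages, data, lock))  # ['train'] (only params changed)
```

Because each stage's output feeds the next stage's fingerprint, a change upstream naturally cascades downstream, while an edit that touches only a late stage's inputs leaves earlier stages cached.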
How to Build an Efficient Machine Learning Project Workflow Using Data Version Control (DVC)
DAGsHub: Production Oriented Work
Next up, Nir Barazida from DAGsHub created a video on production-oriented work that uses a monorepo strategy and focuses on moving from research to production-ready code with Git and DVC. If you are a data scientist trying to wrap your head around going from your notebook to production, this may help!
Production-Oriented Work with Git, DVC and DAGsHub
ML Data Versioning with DVC: How to Manage Machine Learning Data
Piotr Storożenko of Appsilon wrote a great tutorial that addresses the many challenges data scientists and ML engineers encounter in their data versioning efforts and shows how DVC solves them. Do these scenarios from his article look familiar?
- "Was it in model_3final.pth or model_last.pth that I used a bigger learning rate?"
- "When did I start using data preprocessing, during model_2a.pth or model_2aa.pth?"
- "Is model_7.pth trained on the new dataset or on the old one?"
- "Oh, gosh, which set of parameters and data have I used to train model_2.pth? It was pretty good in the end…"
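DVC answers questions like these by identifying each version of a data file by a hash of its contents and committing only small metafiles (.dvc files or dvc.lock) to Git alongside the code. The sketch below illustrates only that core idea in plain Python; it is not DVC's implementation, and the `record_lineage` function and the `<model>.lineage.json` format are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record_lineage(model_path: Path, data_path: Path, params: dict) -> dict:
    """Write a hypothetical <model>.lineage.json next to the model file.

    The record ties the model to the exact data contents (by hash) and the
    parameters used, so the file name no longer has to carry that information.
    """
    record = {
        "model": model_path.name,
        "model_md5": file_md5(model_path),
        "data": data_path.name,
        "data_md5": file_md5(data_path),
        "params": params,
    }
    out = model_path.parent / (model_path.name + ".lineage.json")
    out.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        data.write_bytes(b"x,y\n1,2\n")
        model = root / "model_2.pth"
        model.write_bytes(b"fake weights")  # stand-in for real weights
        print(json.dumps(record_lineage(model, data, {"learning_rate": 0.01}),
                         indent=2))
```

With a record like this, "was model_7.pth trained on the new dataset?" becomes a hash comparison instead of a memory exercise; DVC goes further by also storing each data version in a content-addressed cache so it can be restored.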
Learning Opportunities
Raviraja Ganta's 10-week course on Basic MLOps
Twitter and LinkedIn were ablaze last month when Raviraja Ganta announced his 10-Week Course on MLOps basics. This course is chock-full of resources and practical tutorials to build your MLOps platform and knowledge. Week 3 of the course is about DVC and its ability to solve your versioning and reproducibility challenges. Be sure to check out the course repo as well!
The MLOps Community is hosting him to speak about his course on October 20th. Sign up to attend here!
Raviraja Ganta's 10-Week Course on MLOps Basics (Source link)
Josh Wills video on COVID simulations with DVC
This week, this Tweet comment led me to this work by Josh Wills. Early in the pandemic, DJ Patil tapped Josh to take part in COVID simulation research, for which he used DVC. In his presentation about the project, he describes the tools he used and the challenges of the use case. Nice DVC shout-out at 19:56! Ah, the fruits of Twitter!
September Office Hours Video: Transfer Learning with Milecia McGregor
If you missed last month's Office Hours Meetup, you can now catch the video! Milecia's presentation was based on her blog post on the same topic: Using Experiments for Transfer Learning. If you're curious about transfer learning in general, AlexNet and SqueezeNet in particular, or using DVC experiments and checkpoints to track all that you do, this video's for you!
Quoc-Tien Au: Continuously Learning on the Job as a Data Scientist
This Towards Data Science article by Quoc-Tien Au, "The What, Where, and How about continuously learning on the job as a data scientist," makes the case for a continuous-learning mindset in the data science field. It's packed with great thought processes and resources on what to learn, where to learn, and how to keep learning while still getting your work done. Who else struggles with this?
DVC News
Amsterdam Off-site
Most of our team members from Europe recently got together in Amsterdam for a couple of days of brainstorming and team bonding. They went on a treasure hunt, ate ramen (a team favorite), and had great discussions on how to make our tools and our team even better! Pictured below, starting at the front left of the room and going clockwise to the back and back up again, are David Ortega, Helio Machado, David de la Iglesia Castro, Laurens Duijvesteijn, Ruslan Kupriev (hidden), Dmitry Petrov, Jelle Bouwman, Batuhan Taskaya, Svetlana Sachkovskaya, and Paweł Redzyński.
Be sure to check out this section next month, when our team members from the Americas meet in San Francisco!
Iterative Team Members meet in Amsterdam (Source: David Ortega)
New Team Members
Jordan Weber joins us from Los Angeles, California as our new Chief of Staff. She has previously held similar roles at venture capital and FinTech firms. In her free time, Jordan enjoys cooking, tennis, dance, and hiking!
Ken Thom joins us from Palo Alto, California as our new Director of Operations. His past work includes business operations, product management, and software and hardware development. In his free time, he enjoys being with his family, swimming, skiing, and hiking! 🥾
Jon Burdo joins us from Boston, Massachusetts as a Senior Software Engineer. For the past few years he has worked as a machine learning engineer with a focus on NLP. In his last role he used DVC and loved it, which is how he eventually ended up here! In his spare time, Jon likes learning about open source software, tinkering with Linux, and inline skating.
Stephanie Roy joins the team from Quebec, Canada as a Senior Software Engineer. She is our first Canadian team member! She previously worked at LogMeIn on one of their mobile apps. In her spare time, she likes caring for the plants in her indoor grow house, playing roller derby, and discovering new things to watch, listen to, and eat!
Welcome to all our new team members! We are so glad you are here!
Open Positions
And wouldn't you know it? We're still hiring! Use this link to find details of all the positions including:
- Senior Software Engineer (ML, Labeling, Python)
- Senior Software Engineer (ML, DevTools, Python)
- Field Data Scientist / Sales Engineer
- Developer Advocate (ML)
- Director / VP of Engineering (ML, DevTools)
- Director / VP of Product (ML, Data Infra, SaaS)
- Head of Talent
- Head of DevRel
Please pass this info on to anyone you know who may fit the bill. We look forward to welcoming new team members!
Docs Updates
Here are a few important docs updates you may want to take a look at this month!
PyTorch Lightning
We all have Ilia Sirotkin to thank for his contribution to our docs. He created the PyTorch Lightning integration docs for all to use!
CML with DVC Guide
Our updated CML with DVC Guide provides refreshed code and streamlined information on cloud storage provider credentials and GitHub Actions setup.
```yaml
name: CML & DVC
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    container: docker://ghcr.io/iterative/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Train model
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          pip install -r requirements.txt  # Install dependencies
          dvc pull data --run-cache        # Pull data & run-cache from S3
          dvc repro                        # Reproduce pipeline
      - name: Create CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          echo "## Metrics" >> report.md
          dvc metrics diff master --show-md >> report.md

          # Publish confusion matrix diff
          echo "## Plots" >> report.md
          echo "### Class confusions" >> report.md
          dvc plots diff \
            --target classes.csv \
            --template confusion \
            -x actual \
            -y predicted \
            --show-vega master > vega.json
          vl2png vega.json -s 1.5 > plot.png
          cml publish --md plot.png >> report.md

          # Publish regularization function diff
          echo "### Effects of regularization" >> report.md
          dvc plots diff \
            --target estimators.csv \
            -x Regularization \
            --show-vega master > vega.json
          vl2png vega.json -s 1.5 > plot.png
          cml publish --md plot.png >> report.md

          cml send-comment report.md
```
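In the report step of that workflow, `dvc metrics diff master --show-md` compares the workspace metrics against the master branch and appends a Markdown table to report.md, which `cml send-comment` then posts as a comment. To make that table concrete, here is a hypothetical Python sketch that builds a similar comparison from two flat metric dictionaries; the `metrics_diff_md` name and the exact column layout are assumptions for illustration, not DVC's actual output format.

```python
from typing import Dict, Optional


def metrics_diff_md(old: Dict[str, float], new: Dict[str, float],
                    precision: int = 4) -> str:
    """Return a Markdown table comparing two flat metric dictionaries.

    A metric present on only one side shows "-" for the missing value and
    for the change; values are formatted to `precision` significant digits.
    """
    def fmt(v: Optional[float]) -> str:
        return "-" if v is None else f"{v:.{precision}g}"

    rows = ["| Metric | Old | New | Change |", "|---|---|---|---|"]
    for name in sorted(set(old) | set(new)):
        o, n = old.get(name), new.get(name)
        change = None if o is None or n is None else n - o
        rows.append(f"| {name} | {fmt(o)} | {fmt(n)} | {fmt(change)} |")
    return "\n".join(rows)


if __name__ == "__main__":
    baseline = {"acc": 0.9, "loss": 0.3}     # e.g. metrics on master
    candidate = {"acc": 0.92, "f1": 0.8}     # e.g. metrics in this commit
    print(metrics_diff_md(baseline, candidate))
```

Appending this string to report.md would play the same role as the `>> report.md` redirection in the workflow; DVC itself reads the metrics files declared in your pipeline rather than Python dictionaries.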
Shtab
Team member Casper da Costa-Luis has created a docs website for shtab, his Python tab-completion script generator project. For more info, check out the original blog post about it as well.
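shtab generates shell tab-completion scripts from a Python `argparse` parser. To give a feel for what that means, here is a deliberately tiny, stdlib-only sketch that turns a parser's option strings into a one-line Bash completion using the `complete -W` builtin. The `bash_completion` function is hypothetical and is not shtab's API; shtab itself handles much more, such as subcommands and several shells.

```python
import argparse
import shlex


def bash_completion(parser: argparse.ArgumentParser, prog: str) -> str:
    """Return a one-line Bash completion command offering the parser's options.

    argparse has no public list of actions, so this reads the private
    `_actions` attribute (a common approach, but not a stable API).
    Positional arguments have no option strings and are skipped.
    """
    options = sorted(
        opt for action in parser._actions for opt in action.option_strings
    )
    words = " ".join(options)
    # `complete -W WORDLIST NAME` makes Bash complete NAME's arguments
    # from WORDLIST, filtered by whatever prefix the user has typed.
    return f"complete -W {shlex.quote(words)} {shlex.quote(prog)}"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog="train")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("-v", "--verbose", action="store_true")
    print(bash_completion(parser, "train"))
    # prints: complete -W '--epochs --help --verbose -h -v' train
```

Sourcing that line in a Bash session makes `train --<TAB>` suggest the options; a generator like shtab earns its keep once a CLI grows subcommands and file arguments.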
Next Meetups
For the second class of DVC Learn, join us to learn how to get started with running experiments! This lesson will also cover how to use our checkpoints feature. We look forward to seeing you there!
DVC Learn - Getting Started with Running Experiments
Be sure to join us at the November Office Hours Meetup, where Maykon Shots will talk about how he used DVC and CML to create an internal Kaggle competition for his team to arrive at their best models in their work for the largest bank in Brazil.
DVC Office Hours - Creating an Internal Kaggle Competition with DVC and CML
Tweet Love ❤️
This month, it was exceedingly hard to pick just one Tweet. I'm leaving you with the one that helped balloon our follower count over the last month, but there have been many! I encourage you to visit our newly created Wall of Love ❤️ to see all the wonderful love for Iterative tools. ❤️
Startups I'm *incredibly* bullish about: @Stripe, @IterativeAI, @HuggingFace, and @Explosion_AI.
Paige Bailey (@DynamicWebPaige), September 7, 2021
If you're an engineer/PM considering a career change (and it's that time of the year again, no?), but want to opt away from FAAMG, definitely consider one of the companies above.
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.