August '21 Heartbeat
This month you will find:
- Data-centric for the win,
- Comparison of DVC, MLFlow and Metaflow,
- Tutorials and Tool Stacks,
- DVC + Streamlit = ❤️,
- Doc Updates,
- July Meetup Video available,
- and more!
It's all about that Data!
From the Community
This month we follow up on a couple of pieces from the June Heartbeat and check out a use case, a tool stack, and some great tutorials from our Community members.
LJ Miranda synthesizes the MLOps space once again!
LJ Miranda has written another amazing article, following his series on the MLOps tools landscape that we covered in the June Heartbeat. This time he focuses on the data-centric wave taking over the space, reviewing the methods, approaches, and techniques that ensure quality data for ML projects. If the adroit summaries of complex concepts don't thrill you, the links to no fewer than 63 resources will get you on your way to data-centric nirvana.
LJ Miranda's Framework for putting data-centric machine learning into context Source link
Neda Sultova's Comparison of DVC, MLFlow and Metaflow
Also covered in the June Heartbeat was Neda Sultova's piece on the rubric she is using to decide which MLOps tools to use for the teams at Helmholtz AI. This next article reviews her research into DVC, MLFlow and Metaflow and offers a thorough analysis of the tools across multiple dimensions. Beyond the article, check out her MLOps Comparison repository as well as her Comparison Table. They will not disappoint!
Machine Learning Lifecycle Source link
Amit Kulkarni's Tutorials
Writing for the Analytics Vidhya Data Science Blogathon, Amit Kulkarni created two tutorials on DVC. Tracking ML Experiments with Data Version Control reviews DVC and takes you through getting started, setup, fetching data and pre-processing, and the steps of an ML project. Next it sets up DVC and the pipeline, and shows how to run model metrics and plots. In MLOps | Versioning Datasets with Git & DVC, Amit continues with an explanation of how data and model versioning works with GitHub paired with DVC.
In a previous article entitled Bring DevOps to Data Science with MLOps, Amit walks through a tutorial using CML to bring CI/CD functionality to your ML project and automate the process. All great posts to check out!
Tracking ML Experiments With Data Version Control
MLOps | Versioning Datasets with Git & DVC
Bring DevOps To Data Science With MLOps
Andreas Malekos' MLOps Tool Stack at Continuum Industries
Last but not least, we bring you a great article from Andreas Malekos, Chief Scientist at Continuum Industries. In the post he outlines the tool stack and MLOps platform they use to automate and optimize the design of linear infrastructure assets like water pipelines, overhead transmission lines, subsea power lines, or telecommunication cables.
Among their tool stack are DVC and CML, and the article outlines what they like (spoiler alert: DVC making repeatability achievable) and the things they don't like that still need to be improved.
Continuum Industries MLOps Tool Stack Source link
DVC News
Though the team has been taking some vacation time in the last month, there's still a lot going on!
Docs Updates
This month we are introducing docs updates so that you will always be aware of what has changed as our open source projects mature.
Our docs team, made up of Jorge Orpinel, Emre Şahin, Casper da Costa-Luis, and David de la Iglesia-Castro, has been hard at work updating our docs to make sure you have what you need to be successful using our tools! Updates include:
- Complete DVCLive docs
- We have a new Glossary page and a first Basic Concepts page (DVC Workspace)
- CML Docs migration to CML.Dev
- Added Videos to Get Started: Metrics and Experiments pages and Checkpoints Guide
- Authentication examples for Azure Blob remote storage from Community member @meierale ❤️
Batuhan Taskaya's Refactor Project hits the Front Page of Hacker News!
The Refactor project, created by team member Batuhan Taskaya (AKA @isidentical), was shared by someone on Hacker News and made it to the main page! You can catch all the comments here!
Explanation of the project:
refactor is an end-to-end refactoring framework that is built on top of the 'simple but effective refactorings' assumption. It is much easier to write a simple script with it rather than trying to figure out what sort of a regex you need in order to replace a pattern (if it is even matchable with regexes).
Every refactoring rule offers a single entrypoint, match(), where they accept an AST node (from the ast module in the standard library) and respond with either returning an action to refactor or nothing. If the rule succeeds on the input, then the returned action will build a replacement node and refactor will simply replace the code segment that belong to the input with the new version.
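To make the idea concrete, here is a minimal sketch of the "single match() entrypoint" pattern described above. It uses only the standard library's ast module and is not the refactor package's actual API: the class name PrintToLog, the helper apply_rule, and the choice to return a replacement node directly (rather than an action object) are all assumptions made for illustration. It needs Python 3.9+ for ast.unparse.

```python
import ast
from typing import Optional


class PrintToLog:
    """Hypothetical rule: rewrite print(...) calls as log(...)."""

    def match(self, node: ast.AST) -> Optional[ast.AST]:
        # Accept any AST node; return a replacement node, or None for "no match".
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            return ast.Call(
                func=ast.Name(id="log", ctx=ast.Load()),
                args=node.args,
                keywords=node.keywords,
            )
        return None


def apply_rule(source: str, rule) -> str:
    """Hypothetical driver: replace the first matching code segment.

    Column offsets from ast are UTF-8 byte offsets, so this sketch
    assumes ASCII-only source code.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    for node in ast.walk(tree):
        replacement = rule.match(node)
        if replacement is None:
            continue
        # Translate (line, column) positions into offsets within the source string.
        start = sum(len(line) for line in lines[: node.lineno - 1]) + node.col_offset
        end = sum(len(line) for line in lines[: node.end_lineno - 1]) + node.end_col_offset
        # Swap only the segment that belongs to the matched node.
        return source[:start] + ast.unparse(replacement) + source[end:]
    return source


if __name__ == "__main__":
    print(apply_rule("x = 1\nprint(x)\n", PrintToLog()))
```

Because the rule works on syntax nodes rather than text, it does not have to worry about whitespace or nesting the way a regex would, which is the trade-off the project description points at.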
Way to go, Batuhan!
July Office Hour Meetup
If you missed our July Office Hours, good news! It's now available on our YouTube Channel, where you can see João Santiago share about {dvthis} and how his team at Billie.io uses DVC to productionize rstats.
Also in the Meetup is a DVC Studio demo by Tapa Dipti Situala, Senior Product Engineer for Studio. You can catch the presentations along with great questions and discussion from the Community!
Next Meetup
So remember when I told you last month about DVC + Streamlit = ❤️? Well, at our August Office Hours Meetup, Antoine Toubhans of Sicara will be presenting his tutorial on how to do just that! Join us in the integrating fun on August 19th at 3:00 pm UTC! RSVP at the link below!
DVC Office Hours - DVC and Streamlit Integration
Learning Opportunities
This week's DVC Learn Meetup (August 18th) will be the last in our series of DVC Learn Meetups designed to get teams up and running with DVC. We will digest our learnings from this first cohort and revamp for the next set of three classes that will begin in September. Subscribe to our Meetup group and follow us on Twitter and LinkedIn to stay in the know about all of our upcoming events!
If you are interested in weighing in on what kinds of educational content you would like to see from us, we'd be grateful if you'd fill out this survey to help us plan!
Help us plan our Online Course! (Source link)
Open Positions
Looking for a great opportunity at an amazing company? Check out our open positions at this link to find details of all the positions, including:
- Senior Front-End Engineer (TypeScript, Node, React)
- Senior Software Engineer (ML, Dev Tools, Python)
- Senior Software Engineer (ML, Data Infra, GoLang)
- Machine Learning Engineer/Field Data Scientist
- Developer Advocate (ML)
- Director/VP of Engineering (ML, DevTools)
- Director/VP of Product (ML, Data Infra, SaaS)
- Director/VP of Operations/Chief of Staff
Please pass this info on to anyone you know who may fit the bill. We look forward to new team members!
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.