January '23 Heartbeat
This month you will find:
🎥 MLEM tutorial video from a Community member,
🥇 Top Python tools for 2022 from Tryolabs,
🎅🏼 Naughty or Nice MLEM project,
❣️ Unstructured Data Query Language coming,
🎥 Sami Jawhar's Running Parallel Pipelines with DVC & TPI Video,
🎥 Casper da Costa-Luis' MLOps Summit presentation video,
👀 DVC tutorial, and more!

Happy New Year! We are looking forward to what’s going to be a stellar year for us and for all of you! We are hoping for peace to reign, the recession to subside, and success aplenty. 🤞🏼 Are you ready? Let’s do this!
From the Community
We always start with DVC, but this month, in this new year, we’ll start with MLEM! We released MLEM in June of last year and have made some advances to it already. It seems the Community is learning about it and recognizing its benefits. We are thrilled to see that!
MLEM Tutorial Video from JCharis Jesse
JCharis Jesse created the FIRST video tutorial from the Community for MLEM! In this well-explained and well-recorded video, Jesse takes you through what MLEM is and where it fits in the path from machine learning to production. He then shows the different options for saving a model, where to find the model metadata and how it works, how to load the ML model, examples of serving with FastAPI and Docker, and finally how to apply the model to data for prediction. If you are interested in using MLEM for serving your models, this will definitely help get you started! You can find a ton of other great content on his YouTube channel.
Tryolabs Top Python Libraries of 2022
From our friends at Tryolabs, Alan Descoins and Facundo Lezama round out 2022 with Tryolabs’ annual picks for the best Python libraries of the year. To make the cut, a library must have launched or gained popularity within the year. Their top 10 list is worth a look and includes LineaPy, which helps you convert notebooks to production pipelines. MLEM also made the list in the Tools & Enablers category.
Bex Tuychiev - Data Version Control: Learn What Other Data Scientists Are Ignoring
Aryan Jadon - Survey of Data Versioning Tools for Machine Learning Operations
For a very nice comparison of data versioning tools, look to Aryan Jadon’s recent post on the subject. He seems to hit them all, providing information about their benefits and the things to be cautious about. Naturally, DVC makes the list, with the only caution being, “you need to use a Git repository to use DVC’s versioning features.” Isn’t Git a part of every modern tech stack? 😉 We are staying true to our mission to deliver the best developer experience for machine learning teams by creating an ecosystem of open, modular ML tools!
Sami Jawhar - Running Parallel Pipelines with DVC and TPI
If you couldn’t make the December Meetup, good news! The video is already out! Sami Jawhar joined us to share a solution he built to run parallel pipelines with DVC and TPI, saving time when processing the massive amount of data his team uses in their brain research at Kernel. He describes the context and constraints of his situation and then the details of the solution, named “Neuromancer” after the famous sci-fi novel. Get ready for some mind-blowing engineering! 🤯
Company News
MLEM Christmas Project
In case you missed it while you were out for the holidays, Alex Guschin and Mike Sveshnikov, your friendly neighborhood MLEM creators, put together a fun project using MLEM to determine if you had been naughty or nice just ahead of Santa’s trot around the globe in 2022. In the blog post, you will learn how they DDoS’ed Santa’s website, trained a Christmas (decision) tree, and deployed an ML service with MLEM to Streamlit to see the predictions.
You can try it out here. And check out how some of our team members fared in this LinkedIn post. Spoiler alert: I’m naughty and nice?
Casper da Costa-Luis at MLOps Summit - Painless cloud experiments without leaving your IDE
Our CML Product Manager, Casper da Costa-Luis, presented in November at MLOps Summit on “Painless cloud experiments without leaving your IDE.” The presentation is now available on YouTube here. Does full lifecycle management of computing resources (including GPUs and auto-respawning spot instances) from several cloud vendors (AWS, Azure, GCP, K8s)… without needing to be a cloud expert appeal to you? Then this talk is for you! He discusses how to move experiments seamlessly between a local laptop, a powerful cloud machine, and your CI/CD of choice.
New Unstructured Data Query Language
Do you use Amazon S3, Azure Blob Storage, or Google Cloud Storage? We have a new solution for finding and managing your datasets of unstructured data like images, audio files, and PDFs! Extend your DVC environment with the first unstructured data query language (think SQL -> DQL) for machine learning. We are looking for beta customers for this new tool.
Schedule a meeting with us if that's what you're needing! Find more info here.
✍🏼 Doc Updates!
Tweet Love ❤️
Our favorite Tweet this month is from Osman Bayram, who mentions he plans to use CML with Hugging Face GPU. We are looking forward to that! 🍿 I'm seeing a lot of popcorn eating in our future. See you next month!
Have something great to say about our tools? We'd love to hear it! Head to this page to record or write a Testimonial! Join our Wall of Love ❤️
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.