MLEM + Modal + nanoGPT
If you haven't heard all the recent fuss about the ChatGPT model from OpenAI, you must have been living under a rock. You might even have seen this video from Andrej Karpathy on how those GPT models work. In this post, I will show how easy it is to train your own GPT model and share it with your friends via a nice Streamlit app in the cloud (see this one as an example!). All you need is an idea of what you want to generate and a couple of bucks for renting a GPU if you don't have access to your own.
Writing docs with nanoGPT
Preparing data
To kick off the process, you basically just need a single text file that you want your model to be trained on. For example, I often struggle with writing docs for the MLEM framework, so I will try to generate those.
Here you can find my code that clones the mlem.ai repo, compiles every .md file from the docs directory into a single text file, and then creates a train set using the same code as the example Shakespeare dataset. I also prepended each file's content with the path to that file, so I can condition the generation on a specific file.
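The gist of that preparation step fits in a few lines - this is a sketch with assumed paths and file names, not the exact script (see the linked code for the real version):
from pathlib import Path

# Walk the cloned repo's docs directory, prepend each file's path,
# and concatenate everything into one training text file.
docs_dir = Path("mlem.ai/content/docs")  # assumed clone location
with open("input.txt", "w") as out:
    for md_file in sorted(docs_dir.rglob("*.md")):
        out.write(str(md_file.relative_to(docs_dir)) + "\n")
        out.write(md_file.read_text() + "\n\n")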
Of course, for your own experiments, you can provide different data and train a GPT model for a different task.
Training the model
Thanks to Andrej’s original repo, it’s as easy as cloning and running a couple of commands. My fork has some additional stuff to make it even easier.
$ git clone https://github.com/mike0sv/nanoGPT && cd nanoGPT/ && git checkout -b mlem origin/mlem
$ pip install -r requirements-mlem.txt
# Prepare mlem docs dataset
# Alternatively, you can compile your own training data for a different task
$ python data/mlem-docs/prepare.py char
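Passing char here means the dataset is tokenized at the character level, the same way the Shakespeare char example does it: build a vocabulary from the unique characters in the text, encode everything as integer ids, and dump train/val splits as binary files. Roughly, as a sketch of the approach rather than the exact script (the real one also stores the vocabulary for decoding later):
import numpy as np

data = open("input.txt").read()
chars = sorted(set(data))                     # character-level vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
ids = np.array([stoi[c] for c in data], dtype=np.uint16)
n = int(0.9 * len(ids))                       # 90/10 train/val split
ids[:n].tofile("train.bin")
ids[n:].tofile("val.bin")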
If you don't have access to a GPU, you can use modal.com to train your model without any infrastructure setup. Just register there, wait for approval, and use this script to run the training and download the resulting model checkpoint:
$ modal token new # approve in browser
$ python modal_train.py # you can edit paths or other parameters
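For the curious, modal_train.py essentially wraps the nanoGPT training run in a Modal function that requests a GPU. What follows is a rough sketch of the idea only - the image contents and GPU choice are assumptions, and the real script also mounts the repo code and pulls the checkpoint back down afterwards:
import modal

stub = modal.Stub("nanogpt-train")
# Container image with training dependencies (an assumed list)
image = modal.Image.debian_slim().pip_install("torch", "numpy")

@stub.function(image=image, gpu="any", timeout=60 * 60)
def train():
    import subprocess
    # Run the nanoGPT training entry point inside the remote container
    subprocess.run(
        ["python", "train.py", "config/train_mlemai.py",
         "--device=cuda", "--max_iters=3000", "--init_from=scratch"],
        check=True,
    )

if __name__ == "__main__":
    with stub.run():
        train.call()  # executes remotely on Modal's GPU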
Or, if you are already working on a machine with a GPU, just run the training locally:
# train model
$ python train.py config/train_mlemai.py --device=cuda --dtype=float32 --max_iters=3000 --init_from=scratch
After training, your model will be saved at out-mlemai-char/ckpt.pt, and you can sample from it with:
# sample model
$ python sample.py --out_dir=out-mlemai-char --dtype=float32
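Since each file's content was prepended with its path during data prep, you can nudge generation toward a particular doc by passing a path as the prompt via nanoGPT's --start flag (the path below is just an illustration):
# condition generation on a specific doc path
$ python sample.py --out_dir=out-mlemai-char --dtype=float32 --start="get-started.md"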
Deploying your model
Now, to show off your model to friends and colleagues, we will deploy it as a Streamlit application to https://fly.io. It's very easy with the MLEM Streamlit extension. First, we need to save the model as a MLEM model - here is the script for that:
$ python wrapper.py out-mlemai-char mlem_char
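Under the hood, the wrapper loads the trained checkpoint and saves a text-in, text-out callable with mlem.api.save so MLEM knows how to serve it. A minimal sketch of the saving pattern - the function body below is a stand-in, while the real wrapper encodes the prompt, samples from the GPT checkpoint, and decodes the output:
from mlem.api import save

# MLEM can persist a plain Python callable as a model;
# the body here is a placeholder for actual GPT sampling.
def generate(prompt: str) -> str:
    return prompt + " ..."  # stand-in for real generation

save(generate, "mlem_char", sample_data="get-started.md")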
Now, set up and log in to fly.io and run the mlem deploy command. I also prepared a custom Streamlit application template you can use to give it a more ChatGPT-like feel:
# setup flyio
$ flyctl auth login
$ mlem deploy run flyio app -m mlem_char \
--app_name mlem-nanogpt --scale_memory 1024 \
--server streamlit --server.ui_port 8080 \
--server.server_port 8081 --server.template app.py
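A quick note on the flags: ui_port is where the Streamlit UI itself is served, server_port is where the underlying model server the UI talks to runs, and --server.template points MLEM at the custom app.py page.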
After the command finishes, just go to https://<app_name>.fly.dev - in my case it's https://mlem-nanogpt.fly.dev/ - and start chatting.
Well, I guess if this is what generated docs look like, I still have a job! 🤣
But just for lulz, I re-generated the whole MLEM documentation with this model - you can check it out here.
Conclusion
Nowadays, it's really easy to recreate someone else's work thanks to open source software. And thanks to folks like Andrej and companies like Modal and Fly, building and deploying ML models keeps getting faster. We are happy to be part of this with tools like MLEM, DVC, CML, and others. Long live open source!