Using Experiments to Improve Pre-trained Models
Intro
Plenty of machine learning models have already been trained to solve one problem, and the knowledge they gained can be applied to a new but related problem. For example, a model like AlexNet has been trained on millions of images, so you could potentially use it to classify cars, animals, or even people. This is called transfer learning, and it can save a lot of time compared with developing a model from scratch.
To take advantage of transfer learning, we can use fine-tuning to adapt the model to our new problem. In many cases, we start by replacing the last layer of the model. With the AlexNet example, this might mean the last layer was previously used to classify cars, but our new problem is classifying animals.
Even though we already have the bulk of the model defined, we'll still have to do some experimentation to determine whether we need to replace more layers in the model or if any other changes need to be made.
In this post, we'll go through an example of fine-tuning AlexNet and SqueezeNet to classify bees and ants. We'll use DVC to handle experiments for us and we'll compare the results of both models at the end.
Initialize the pre-trained model
We'll be fine-tuning the AlexNet model and the SqueezeNet model to classify images of bees and ants. You can find the project we're working with in this repo, which is based on the tutorial over at this post.
In the pretrained_model_tuner.py file, you'll find the code that defines both the AlexNet and SqueezeNet models. We start by initializing these models so we can get the number of model features and the input size we need for fine-tuning.
Since the project has everything we need to initialize the models, we can start training and comparing the differences between them with the ants/bees dataset. Running experiments to get the best tuning for each model can make it difficult to see which changes led to a better result. That's why we will be using DVC to track changes in the code and the data.
Adding the train stage
Stages in DVC let us define individual data processes and can be used to build detailed machine learning pipelines. You have the ability to define the different steps of model creation like preprocessing, featurization, and training.
We currently have a train stage in the dvc.yaml file. If you take a look at it, you'll see something like:
stages:
  train:
    cmd: python pretrained_model_tuner.py
    deps:
      - data/hymenoptera_data
      - pretrained_model_tuner.py
    params:
      - lr
      - momentum
      - model_name
      - num_classes
      - batch_size
      - num_epochs
    outs:
      - model.pt:
          checkpoint: true
    live:
      results:
        summary: true
        html: true
The reason we need this dvc.yaml file is so DVC knows what to pay attention to in our workflow. It will start managing data, understand which metrics to pay attention to, and what the expected output for each step is.
You'll typically add stages to dvc.yaml using the dvc stage add command. It's one of the ways to add new stages or update existing ones; editing dvc.yaml by hand is another.
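As a rough sketch of what such a call might look like for the train stage above, the Python snippet below assembles the command line instead of running it. The helper name build_stage_add_command is made up for illustration, and the --checkpoints flag is an assumption tied to the DVC 2.x series that this post uses, so check dvc stage add --help for your version. The live section of the stage is left out here.

```python
import shlex

# Values copied from the dvc.yaml shown above.
deps = ["data/hymenoptera_data", "pretrained_model_tuner.py"]
params = ["lr", "momentum", "model_name", "num_classes", "batch_size", "num_epochs"]


def build_stage_add_command(name, deps, params, checkpoint, script):
    """Assemble the argv list for a `dvc stage add` call (hypothetical helper)."""
    argv = ["dvc", "stage", "add", "-n", name]
    for dep in deps:
        argv += ["-d", dep]                    # one -d per dependency
    argv += ["-p", ",".join(params)]           # keys read from params.yaml
    argv += ["--checkpoints", checkpoint]      # assumption: DVC 2.x checkpoint output
    argv += ["python", script]                 # the stage command itself
    return argv


command = build_stage_add_command(
    "train", deps, params, "model.pt", "pretrained_model_tuner.py"
)
print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch safe to run outside a DVC project.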
With the train stage defined, let's look at where the metrics actually come from in the code. If you open pretrained_model_tuner.py, you'll see where we dump the accuracy and loss for the training epochs into the results.json file. We're also saving the model after every training epoch and recording metrics for each epoch using dvclive logging.
if phase == 'train':
    torch.save(model.state_dict(), "model.pt")
    dvclive.log('acc', epoch_acc.item())
    dvclive.log('loss', epoch_loss)
    dvclive.log('training_time', epoch_time_elapsed)
if phase == 'val':
    dvclive.log('val_acc', epoch_acc.item())
    dvclive.log('val_loss', epoch_loss)
    val_acc_history.append(epoch_acc)
    dvclive.next_step()
This code is needed to let DVC access the metrics in the project because it will read the metrics from the results.json file.
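To make the log / next_step pattern easier to picture, here is a toy stand-in written in plain Python. It is not how dvclive is implemented, and the StepLogger class is a hypothetical name; it only illustrates the idea that metrics accumulate for the current step and that calling next_step closes the step. The sample values are the first two epochs of the AlexNet run from the experiments table in this post.

```python
class StepLogger:
    """Toy stand-in for step-based metric logging (not dvclive's implementation)."""

    def __init__(self):
        self.step = 0
        self.current = {}
        self.history = []

    def log(self, name, value):
        # Record a metric for the step that is still in progress.
        self.current[name] = value

    def next_step(self):
        # Close the current step: store its metrics, then start a new one.
        self.history.append({"step": self.step, **self.current})
        self.current = {}
        self.step += 1


logger = StepLogger()
epochs = [
    {"acc": 0.73361, "loss": 0.5271, "val_acc": 0.86928, "val_loss": 0.36408},
    {"acc": 0.89754, "loss": 0.26993, "val_acc": 0.89542, "val_loss": 0.31113},
]
for metrics in epochs:
    logger.log("acc", metrics["acc"])            # logged in the train phase
    logger.log("loss", metrics["loss"])
    logger.log("val_acc", metrics["val_acc"])    # logged in the val phase
    logger.log("val_loss", metrics["val_loss"])
    logger.next_step()                           # one record per epoch

print(logger.history[-1])
```

Calling next_step once per epoch, after both phases have logged, is what gives each epoch a single row of train and validation metrics.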
Since we have several hyperparameters set in params.yaml, we need to use those values when we run the training stage. The following code makes the hyperparameter values accessible in the train function.
from ruamel.yaml import YAML

# Load the hyperparameters that DVC tracks for the train stage
with open("params.yaml") as f:
    yaml = YAML(typ='safe')
    params = yaml.load(f)
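Once the values are loaded, it can help to check them before training starts. The sketch below is one possible approach, not code from the project: TrainParams and parse_params are hypothetical names, and the dictionary stands in for the result of loading params.yaml, using the values from the first experiment in this post.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainParams:
    """Hypothetical typed view of the keys listed under params in dvc.yaml."""
    lr: float
    momentum: float
    model_name: str
    num_classes: int
    batch_size: int
    num_epochs: int


def parse_params(raw):
    """Build TrainParams from a plain dict, failing early on missing keys."""
    missing = [f.name for f in fields(TrainParams) if f.name not in raw]
    if missing:
        raise KeyError(f"params.yaml is missing: {', '.join(missing)}")
    return TrainParams(
        lr=float(raw["lr"]),
        momentum=float(raw["momentum"]),
        model_name=str(raw["model_name"]),
        num_classes=int(raw["num_classes"]),
        batch_size=int(raw["batch_size"]),
        num_epochs=int(raw["num_epochs"]),
    )


# Stand-in for the dict returned by yaml.load(f).
raw = {"lr": 0.001, "momentum": 0.09, "model_name": "alexnet",
       "num_classes": 2, "batch_size": 8, "num_epochs": 5}
params = parse_params(raw)
print(params.model_name, params.lr, params.num_epochs)
```

Failing on a missing key up front is cheaper than discovering it several epochs into a run.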
With all of this in place, we can finally start running experiments to fine-tune the two models.
Fine-tuning AlexNet
You can find the code that initializes the AlexNet model in the initialize_model function in pretrained_model_tuner.py. Since we have DVC set up, we can jump straight into fine-tuning this model to see which hyperparameters give us the best accuracy.
We'll run the first experiment with the following command.
$ dvc exp run
This will execute the pretrained_model_tuner.py script and run for 5 epochs since that's what we defined in params.yaml. When this finishes, you can check out the metrics from this run with the current hyperparameter values.
$ dvc exp show
You'll see a table similar to this.
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Experiment                Created    step   acc       loss      training_time   val_acc   val_loss   lr       momentum   model_name   num_classes   batch_size   num_epochs
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  workspace                 -          4      0.92623   0.19567   229.18          0.9085    0.25145    0.001    0.09       alexnet      2             8            5
  main                      01:58 PM   -      -         -         -               -         -          0.001    0.09       alexnet      2             8            5
  │ ╓ bf81637 [exp-a1f53]   02:05 PM   4      0.92623   0.19567   229.18          0.9085    0.25145    0.001    0.09       alexnet      2             8            5
  │ ╟ 9ca3fb8               02:04 PM   3      0.89344   0.27423   178.34          0.90196   0.26965    0.001    0.09       alexnet      2             8            5
  │ ╟ a34ead1               02:03 PM   2      0.87295   0.29018   127.36          0.9085    0.2796     0.001    0.09       alexnet      2             8            5
  │ ╟ ae382c7               02:02 PM   1      0.89754   0.26993   76.419          0.89542   0.31113    0.001    0.09       alexnet      2             8            5
  ├─╨ a95260d               02:01 PM   0      0.73361   0.5271    25.71           0.86928   0.36408    0.001    0.09       alexnet      2             8            5
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Now let's update the hyperparameters and run another experiment. There are several ways to do this with DVC:
- Change the hyperparameter values directly in params.yaml
- Update the values using the --set-param option, or the shorthand -S, on dvc exp run
- Queue multiple experiments with different values using the --queue option on dvc exp run
We'll do an example of each of these throughout the rest of this article.
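Conceptually, each key=value pair passed on the command line replaces the matching entry in params.yaml for that one experiment. The following sketch imitates that idea on a plain dictionary. It is only an illustration under simple assumptions, not DVC's parsing logic, and parse_override and apply_overrides are hypothetical helpers.

```python
import ast


def parse_override(text):
    """Split 'key=value' and convert the value to a Python literal when possible."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {text!r}")
    try:
        return key, ast.literal_eval(value)    # numbers such as 0.025 or 2
    except (ValueError, SyntaxError):
        return key, value                      # plain strings such as model names


def apply_overrides(params, overrides):
    """Return a copy of params with every override applied."""
    updated = dict(params)
    for text in overrides:
        key, value = parse_override(text)
        if key not in updated:
            raise KeyError(f"unknown parameter: {key}")
        updated[key] = value
    return updated


base = {"lr": 0.001, "momentum": 0.09, "model_name": "alexnet", "num_epochs": 5}
updated = apply_overrides(base, ["lr=0.025", "momentum=0.5", "num_epochs=2"])
print(updated)
```

Working on a copy mirrors the fact that an override belongs to a single experiment, so the base values stay available for comparison.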
Let's start by updating the hyperparameter values in params.yaml. You should have these values in your file.
lr: 0.009
momentum: 0.017
Now run another experiment with dvc exp run. To make the table more readable, we're going to specify the parameters we want to show and take a look at the metrics with:
$ dvc exp show --no-timestamp --include-params lr,momentum,model_name
Your table should look something like this now. Since we're using checkpoints, note that we continue training additional epochs on top of your previous experiment. You'll see what it takes to start training from scratch later.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Experiment                step   acc       loss      training_time   val_acc   val_loss   lr       momentum   model_name
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  workspace                 9      0.91803   0.27989   228.59          0.82353   0.69077    0.009    0.017      alexnet
  main                      -      -         -         -               -         -          0.001    0.09       alexnet
  │ ╓ 2361cff [exp-c0b11]   9      0.91803   0.27989   228.59          0.82353   0.69077    0.009    0.017      alexnet
  │ ╟ 7686d2f               8      0.90984   0.23496   177.65          0.87582   0.50887    0.009    0.017      alexnet
  │ ╟ 671f8cd               7      0.88934   0.39237   126.7           0.86928   0.47856    0.009    0.017      alexnet
  │ ╟ ea1bf61               6      0.84836   0.4195    75.834          0.91503   0.30885    0.009    0.017      alexnet
  │ ╟ a9f8dab (bf81637)     5      0.79508   0.72891   25.219          0.66667   1.0311     0.009    0.017      alexnet
  │ ╓ bf81637 [exp-a1f53]   4      0.92623   0.19567   229.18          0.9085    0.25145    0.001    0.09       alexnet
  │ ╟ 9ca3fb8               3      0.89344   0.27423   178.34          0.90196   0.26965    0.001    0.09       alexnet
  │ ╟ a34ead1               2      0.87295   0.29018   127.36          0.9085    0.2796     0.001    0.09       alexnet
  │ ╟ ae382c7               1      0.89754   0.26993   76.419          0.89542   0.31113    0.001    0.09       alexnet
  ├─╨ a95260d               0      0.73361   0.5271    25.71           0.86928   0.36408    0.001    0.09       alexnet
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Finding good values for hyperparameters can take a few iterations, even when you're working with a pretrained model. So we'll run one more experiment to fine-tune this AlexNet model. This time we'll do it using the -S option.
$ dvc exp run -S lr=0.025 -S momentum=0.5 -S num_epochs=2
The updated table will have values similar to this.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Experiment                step   acc       loss      training_time   val_acc   val_loss   lr       momentum   model_name
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  workspace                 11     0.88525   1.1355    76.799          0.9085    1.7642     0.025    0.5        alexnet
  main                      -      -         -         -               -         -          0.001    0.09       alexnet
  │ ╓ 54e87bc [exp-52406]   11     0.88525   1.1355    76.799          0.9085    1.7642     0.025    0.5        alexnet
  │ ╟ b2b9ad0 (2361cff)     10     0.79098   2.9427    25.715          0.8366    1.4148     0.025    0.5        alexnet
  │ ╓ 2361cff [exp-c0b11]   9      0.91803   0.27989   228.59          0.82353   0.69077    0.009    0.017      alexnet
  │ ╟ 7686d2f               8      0.90984   0.23496   177.65          0.87582   0.50887    0.009    0.017      alexnet
  │ ╟ 671f8cd               7      0.88934   0.39237   126.7           0.86928   0.47856    0.009    0.017      alexnet
  │ ╟ ea1bf61               6      0.84836   0.4195    75.834          0.91503   0.30885    0.009    0.017      alexnet
  │ ╟ a9f8dab (bf81637)     5      0.79508   0.72891   25.219          0.66667   1.0311     0.009    0.017      alexnet
  │ ╓ bf81637 [exp-a1f53]   4      0.92623   0.19567   229.18          0.9085    0.25145    0.001    0.09       alexnet
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
If you take a look at the metrics and the corresponding hyperparameter values, you'll see which direction you should try next with your values. That's one way we can use DVC to fine-tune AlexNet for this particular dataset.
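One simple way to read the table is to rank the final checkpoint of each experiment by validation accuracy and break ties with validation loss. The sketch below does that with the three AlexNet experiments from the tables in this post. The ranking rule is just one reasonable choice, not something DVC prescribes, and since each checkpointed experiment continued training from the previous one, the runs are not independent trials.

```python
# Final-checkpoint metrics copied from the experiments tables in this post.
runs = [
    {"name": "exp-a1f53", "lr": 0.001, "momentum": 0.09,
     "val_acc": 0.9085, "val_loss": 0.25145},
    {"name": "exp-c0b11", "lr": 0.009, "momentum": 0.017,
     "val_acc": 0.82353, "val_loss": 0.69077},
    {"name": "exp-52406", "lr": 0.025, "momentum": 0.5,
     "val_acc": 0.9085, "val_loss": 1.7642},
]

# Highest validation accuracy first; lower validation loss breaks ties.
ranked = sorted(runs, key=lambda run: (-run["val_acc"], run["val_loss"]))

for run in ranked:
    print(f'{run["name"]}: lr={run["lr"]} momentum={run["momentum"]} '
          f'val_acc={run["val_acc"]} val_loss={run["val_loss"]}')

best = ranked[0]
```

Two runs tie on validation accuracy here, and the large gap in validation loss between them suggests that the higher learning rate is less stable on this dataset.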
Fine-tuning SqueezeNet
We'll switch over to fine-tuning SqueezeNet now that you've seen how the process works in DVC. You'll need to update the model_name hyperparameter in params.yaml to squeezenet if you're following along. The other hyperparameter values can stay the same for now.
This is a good time to note that DVC tracks not only the hyperparameter changes for each experiment but also any code and dataset changes.
Let's run one experiment with dvc exp run --reset just to show the difference in the metrics between the two models. Remember, since we're using checkpoints, training continues on top of the previous experiment. That's why we're using the --reset option here, so that we can start a fresh experiment for the new model. You should see results similar to this in your table.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Experiment                step   acc       loss      training_time   val_acc   val_loss   lr       momentum   model_name
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  workspace                 1      0.85656   0.35667   83.414          0.87582   0.34273    0.025    0.5        squeezenet
  main                      -      -         -         -               -         -          0.001    0.09       squeezenet
  │ ╓ 87ccd2e [exp-95f0f]   1      0.85656   0.35667   83.414          0.87582   0.34273    0.025    0.5        squeezenet
  ├─╨ 7d2fafc               0      0.80328   0.50723   29.165          0.89542   0.3987     0.025    0.5        squeezenet
  │ ╓ 54e87bc [exp-52406]   11     0.88525   1.1355    76.799          0.9085    1.7642     0.025    0.5        alexnet
  │ ╟ b2b9ad0 (2361cff)     10     0.79098   2.9427    25.715          0.8366    1.4148     0.025    0.5        alexnet
  │ ╓ 2361cff [exp-c0b11]   9      0.91803   0.27989   228.59          0.82353   0.69077    0.009    0.017      alexnet
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
The newest experiment has an accuracy that's significantly different since we switched models. That tells us that the hyperparameter values that were good for AlexNet might not work the best for SqueezeNet.
So we'll need to run a few experiments to find the best hyperparameter values. This time, we'll take advantage of queues in DVC to set up the experiments first and then run them all in one go. To queue an experiment, we'll run this command.
$ dvc exp run --queue -S lr=0.0001 -S momentum=0.9 -S num_epochs=2
Running this sets up an experiment for future execution, so we'll go ahead and run the command one more time with different values.
$ dvc exp run --queue -S lr=0.001 -S momentum=0.09 -S num_epochs=2
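If you want to queue more than a couple of combinations, typing each command by hand gets tedious. As a small sketch, the snippet below generates one queue command per combination of learning rate and momentum. The helper queue_commands is hypothetical, the value grid is only an example built from the numbers used above, and the commands are printed rather than executed.

```python
import itertools
import shlex

# Example grid built from the values used in the two commands above.
learning_rates = [0.0001, 0.001]
momenta = [0.9, 0.09]


def queue_commands(learning_rates, momenta, num_epochs):
    """Return one `dvc exp run --queue` command line per parameter combination."""
    commands = []
    for lr, momentum in itertools.product(learning_rates, momenta):
        argv = [
            "dvc", "exp", "run", "--queue",
            "-S", f"lr={lr}",
            "-S", f"momentum={momentum}",
            "-S", f"num_epochs={num_epochs}",
        ]
        commands.append(shlex.join(argv))
    return commands


commands = queue_commands(learning_rates, momenta, num_epochs=2)
for command in commands:
    print(command)
```

A full grid grows quickly, so with long training runs it is worth queueing only the combinations you actually expect to learn something from.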
You can check out the details of the queued experiments by looking at the experiments table with dvc exp show. You'll see something like this.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Experiment                step   acc       loss      training_time   val_acc   val_loss   lr       momentum   model_name
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  workspace                 1      0.85656   0.35667   83.414          0.87582   0.34273    0.025    0.5        squeezenet
  main                      -      -         -         -               -         -          0.001    0.09       squeezenet
  │ ╓ 87ccd2e [exp-95f0f]   1      0.85656   0.35667   83.414          0.87582   0.34273    0.025    0.5        squeezenet
  ├─╨ 7d2fafc               0      0.80328   0.50723   29.165          0.89542   0.3987     0.025    0.5        squeezenet
  │ ╓ 54e87bc [exp-52406]   11     0.88525   1.1355    76.799          0.9085    1.7642     0.025    0.5        alexnet
  │ ╟ b2b9ad0 (2361cff)     10     0.79098   2.9427    25.715          0.8366    1.4148     0.025    0.5        alexnet
  │ ╓ 2361cff [exp-c0b11]   9      0.91803   0.27989   228.59          0.82353   0.69077    0.009    0.017      alexnet
  │ ╟ 7686d2f               8      0.90984   0.23496   177.65          0.87582   0.50887    0.009    0.017      alexnet
  │ ╟ 671f8cd               7      0.88934   0.39237   126.7           0.86928   0.47856    0.009    0.017      alexnet
  │ ╟ ea1bf61               6      0.84836   0.4195    75.834          0.91503   0.30885    0.009    0.017      alexnet
  ...
  ├── *2df7fa5              -      -         -         -               -         -          0.0001   0.9        squeezenet
  ├── *699dcae              -      -         -         -               -         -          0.001    0.09       squeezenet
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Then you can execute all of the queued experiments with this command.
$ dvc exp run --run-all
Now if you take a look at your table, you'll see the metrics from all 3 SqueezeNet experiments: the first run plus the two queued ones.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Experiment                step   acc       loss      training_time   val_acc   val_loss   lr       momentum   model_name
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  workspace                 5      0.76639   0.49865   85.705          0.81699   0.4518     0.001    0.09       squeezenet
  main                      -      -         -         -               -         -          0.001    0.09       squeezenet
  │ ╓ 699dcae [exp-8322f]   5      0.76639   0.49865   85.705          0.81699   0.4518     0.001    0.09       squeezenet
  │ ╟ d26c25b (2df7fa5)     4      0.60246   0.68464   29.243          0.69935   0.55156    0.001    0.09       squeezenet
  │ ╓ 2df7fa5 [exp-d1c65]   3      0.78689   0.488     83.929          0.83007   0.41527    0.0001   0.9        squeezenet
  │ ╟ 05e1b41 (87ccd2e)     2      0.59016   0.76999   28.455          0.75163   0.49807    0.0001   0.9        squeezenet
  │ ╓ 87ccd2e [exp-95f0f]   1      0.85656   0.35667   83.414          0.87582   0.34273    0.025    0.5        squeezenet
  ├─╨ 7d2fafc               0      0.80328   0.50723   29.165          0.89542   0.3987     0.025    0.5        squeezenet
  │ ╓ 54e87bc [exp-52406]   11     0.88525   1.1355    76.799          0.9085    1.7642     0.025    0.5        alexnet
  │ ╟ b2b9ad0 (2361cff)     10     0.79098   2.9427    25.715          0.8366    1.4148     0.025    0.5        alexnet
  │ ╓ 2361cff [exp-c0b11]   9      0.91803   0.27989   228.59          0.82353   0.69077    0.009    0.017      alexnet
  │ ╟ 7686d2f               8      0.90984   0.23496   177.65          0.87582   0.50887    0.009    0.017      alexnet
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Then you'll be able to decide which way to go with your fine-tuning efforts and which model works best for your project. In this case, it seems like SqueezeNet might be the winner!
You can take all of the DVC setup and apply this to your own custom fine-tuning use case.
Conclusion
When you're working with pretrained models, it can be hard to fine-tune them to give you the results you need. You might end up replacing the last layer of the model to fit your problem or you might need to dig deeper. Then you have to consider updating the hyperparameter values until you get the best model you can.
That's why it's important to research tools that make this process more efficient. Using DVC to help with this kind of experimentation will give you the ability to reproduce any experiment you run, making it easier to collaborate with others on a project. It will also help you keep track of what you've already tried in previous experiments.