This is the second of a three-part series of articles about transfer learning. In this series, we will detail what transfer learning is and how you can explore it. You can check the other articles of this series in the links below once they are available.
In the first post of the series, I wrote about how transfer learning can help you to achieve better performance by importing and using a pre-trained model. In this post, I will go further into the details of where you can find those pre-trained models and how you can use them effectively.
Before talking about the models, the first step is to get your dataset. In most situations, the data comes from your business, but if you want to do a personal project to learn the technology, there is a lot of data available online. One example of those public datasets is the Kaggle dataset for Dogs vs. Cats, which has 25,000 training images. Each image contains either a dog or a cat, and the objective is to train a model to correctly recognize each one. Below are some image examples:
After finding your dataset, one important and sometimes overlooked step is to create a benchmark model: a simple model that you build and train on your own without using transfer learning. This model serves as a baseline, so that you can assess the relative performance of your final model using transfer learning. For example, a transfer learning model that achieves an accuracy of 80% is good if your baseline has an accuracy of 60%, but not so good if your baseline has an accuracy of 78%. Many tools and programming languages can be used to train this model (and for transfer learning in the next steps), but the most common choice is Python with either the TensorFlow or PyTorch deep learning framework.
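To make the comparison above concrete, here is a minimal sketch in plain Python. The function name `compare_to_baseline` is hypothetical, and expressing the gain as a relative reduction in error rate is just one reasonable way to read the numbers, not the only one.

```python
def compare_to_baseline(baseline_acc, model_acc):
    """Return (gain in percentage points, relative reduction in error rate).

    Both accuracies are fractions between 0 and 1. The baseline must be
    below 1.0, otherwise there is no error left to reduce.
    """
    gain_points = (model_acc - baseline_acc) * 100
    baseline_error = 1.0 - baseline_acc
    model_error = 1.0 - model_acc
    error_reduction = (baseline_error - model_error) / baseline_error
    return gain_points, error_reduction


# The two scenarios from the text: an 80% model against a 60% or 78% baseline.
for baseline in (0.60, 0.78):
    gain, reduction = compare_to_baseline(baseline, 0.80)
    print(f"baseline {baseline:.0%}: +{gain:.0f} points, {reduction:.0%} fewer errors")
```

Against the 60% baseline the transfer learning model removes half of the errors, while against the 78% baseline it removes fewer than one in ten, which is why the same 80% can be a strong or a weak result.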
Now that you have a baseline for your expected performance, it is time to use transfer learning to improve on it. There are lots of pre-trained models available, so you will have to decide which one to use. If you are using PyTorch, you can find the available models, along with more details about each one, in the torchvision models documentation. If you are using TensorFlow instead, you can import those models through Keras (a deep learning library that is commonly used with TensorFlow); they are listed on the Keras Applications page.
The question that now arises is how to select the models best suited to your problem. The most common criterion is the pre-trained model's performance (in terms of accuracy or error) on a benchmark dataset such as ImageNet, because better performance on those datasets tends to carry over to a new problem. You should also consider the number of parameters (more parameters increase training and prediction time) and the model size, which can become a problem depending on the capacity of the environment where you deploy the model. It is also good practice to import more than one model, because you often have to try several to find the best one for your data. To illustrate this process, the table below shows a comparison of the models available in Keras:
If we wanted to choose three models and model size were not a problem, we would select NASNetLarge, InceptionResNetV2, and Xception. However, if we did have a restriction on model size, we would select NASNetMobile, MobileNetV2, and DenseNet121.
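The selection above can be sketched as a small filter-and-sort in plain Python. The size and top-1 accuracy figures below are assumed from the Keras Applications documentation and may have changed, so check the current table before relying on them. The `pick_models` helper and the 40 MB size limit are hypothetical and used only for illustration.

```python
# (model name, size in MB, ImageNet top-1 accuracy)
# Assumed values from the Keras Applications table; verify before use.
CANDIDATES = [
    ("Xception", 88, 0.790),
    ("VGG16", 528, 0.713),
    ("ResNet50", 98, 0.749),
    ("InceptionV3", 92, 0.779),
    ("InceptionResNetV2", 215, 0.803),
    ("MobileNetV2", 14, 0.713),
    ("DenseNet121", 33, 0.750),
    ("NASNetMobile", 23, 0.744),
    ("NASNetLarge", 343, 0.825),
]


def pick_models(candidates, k=3, max_size_mb=None):
    """Return the names of the k most accurate models within the size limit."""
    eligible = [
        entry for entry in candidates
        if max_size_mb is None or entry[1] <= max_size_mb
    ]
    # Rank by top-1 accuracy, best first.
    ranked = sorted(eligible, key=lambda entry: entry[2], reverse=True)
    return [name for name, _size, _accuracy in ranked[:k]]


no_limit = pick_models(CANDIDATES)
small_only = pick_models(CANDIDATES, max_size_mb=40)  # hypothetical 40 MB budget
print(no_limit)
print(small_only)
```

Ranking on accuracy alone is a simplification. In practice you may also want to weigh the number of parameters or the prediction time, as discussed above.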
After selecting the models, we should look into the details of each one to find out what input it expects. Using the three models selected in the last example, we can see that NASNetLarge has an input size of 331×331, whereas the Xception and InceptionResNetV2 models use an input size of 299×299. After finding those sizes, we should resize our images so that their dimensions match the input size of the model. Both TensorFlow and PyTorch have built-in tools for this, so the step is easy to carry out whichever framework you choose.
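As a minimal sketch of this resizing step, assuming TensorFlow is installed, `tf.image.resize` can bring a batch of images to the size a model expects. The `INPUT_SIZES` lookup and the `resize_for` helper are hypothetical names, and the random array stands in for real photos.

```python
import numpy as np
import tensorflow as tf

# Input sizes quoted in the text above.
INPUT_SIZES = {
    "NASNetLarge": (331, 331),
    "Xception": (299, 299),
    "InceptionResNetV2": (299, 299),
}


def resize_for(model_name, images):
    """Resize a batch of images shaped (N, H, W, C) to the model's input size."""
    height, width = INPUT_SIZES[model_name]
    return tf.image.resize(images, (height, width))


# Stand-in batch: four random 375x500 RGB "images" instead of real data.
batch = np.random.rand(4, 375, 500, 3).astype("float32")
resized = resize_for("Xception", batch)
print(resized.shape)
```

Note that resizing only fixes the dimensions. Most pre-trained models also expect pixel values scaled in a specific way, which Keras exposes as a `preprocess_input` function in each model's module.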
The next step after rescaling the images is to import the pre-trained models and train them to classify our images. One step that should always be done is to remove the last layer of the imported model and add a fully connected layer with as many neurons as there are classes in your problem. Models trained on ImageNet have a final layer with 1,000 neurons, because that is the number of classes in that dataset, but if you are using the Dogs vs. Cats dataset, you should use only two neurons.
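One possible way to replace the final layer in Keras is sketched below, assuming TensorFlow is installed and using Xception as the example. The function name `build_classifier` and the layer name `new_head` are hypothetical. Passing `include_top=False` drops the original 1,000-class classifier, and `pooling="avg"` keeps a global average pooling step so the new dense layer receives a flat feature vector.

```python
import tensorflow as tf


def build_classifier(num_classes, weights="imagenet"):
    """Xception without its ImageNet classifier, plus a new dense output layer."""
    base = tf.keras.applications.Xception(
        include_top=False,          # drop the original 1,000-neuron classifier
        weights=weights,            # "imagenet" loads the pre-trained weights
        input_shape=(299, 299, 3),  # Xception's input size
        pooling="avg",              # global average pooling -> flat feature vector
    )
    outputs = tf.keras.layers.Dense(
        num_classes, activation="softmax", name="new_head"
    )(base.output)
    return tf.keras.Model(inputs=base.input, outputs=outputs)
```

Calling `build_classifier(2)` gives a two-class model for Dogs vs. Cats and downloads the ImageNet weights on first use. Passing `weights=None` builds the same architecture with random weights, which is only useful for checking shapes, not for transfer learning.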
Finally, another important decision is how many layers of the imported model you will train and how many you will freeze (keeping their weights as they are). This depends mostly on the size of your dataset, as explained in the first post of this series. After changing the last layer, freezing some layers, and training the others, you should apply your model to your test set and compare the results with your baseline model. For more detailed instructions on how to import and train those models, see the transfer learning tutorials in the TensorFlow and PyTorch documentation.
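Freezing can be sketched in Keras by setting each layer's `trainable` attribute, assuming TensorFlow is installed. The helper `freeze_first_layers` is a hypothetical name, and the tiny network below is only a stand-in for an imported pre-trained model so that the example runs quickly.

```python
import tensorflow as tf


def freeze_first_layers(model, n_frozen):
    """Freeze the first n_frozen layers and leave the remaining ones trainable."""
    for index, layer in enumerate(model.layers):
        layer.trainable = index >= n_frozen
    return model


# Small stand-in network; in practice this would be the imported model
# with its new final layer.
demo = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])

freeze_first_layers(demo, n_frozen=2)

# Compile after changing the trainable flags so that training respects them.
demo.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

trainable_flags = [layer.trainable for layer in demo.layers]
print(trainable_flags)
```

With a small dataset you would typically freeze most or all of the imported layers and train only the new final layer; with more data, a smaller `n_frozen` lets more of the network adapt.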
With those instructions, you should be able to obtain good performance on your data using transfer learning. But if you still want to do better, there are two more techniques that can help you with your data. I’ll discuss these and more in my next post of the series.
About the author
Luiz Nonenmacher is a Data Scientist at Poatek. He has a Master’s Degree in Production Engineering in the area of Machine Learning and Quantitative Methods. In his spare time, he likes to read about a lot of different subjects, including (but not limited to) classical philosophy (especially Stoicism), science, history, fantasy, and science fiction.