This is the third article in a three-part series about transfer learning. In this series, we detail what transfer learning is and how you can explore it. You can check the other articles of this series in the links below once they are available.
In the last post of the series, I talked about how to create and train a model using transfer learning. Now, in this post, I will talk about two techniques that can be used to improve your model's results.
When you use transfer learning to import a pre-trained model and adapt it to your problem, one very important step is to define how many layers you will freeze (that is, not train). In general, the frozen layers are at the beginning of the network, so you only train the last layers.
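To make the idea concrete, here is a minimal sketch in plain NumPy (not a real deep learning framework; all names, shapes, and data are made up for illustration) of a tiny two-layer network where the first layer plays the role of the pre-trained, frozen part and only the new last layer is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": 64 samples with 10 features each, and binary labels.
X = rng.normal(size=(64, 10))
y = (X.sum(axis=1) > 0).astype(float)

# Pretend this layer comes from a pre-trained model: we freeze it.
W_frozen = rng.normal(size=(10, 5))
W_frozen_before = W_frozen.copy()

# The new last layer, trained from scratch.
W_head = np.zeros(5)

def loss(W_h):
    hidden = np.maximum(0, X @ W_frozen)      # frozen layer + ReLU
    p = 1 / (1 + np.exp(-(hidden @ W_h)))     # trainable head (sigmoid)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

loss_before = loss(W_head)
lr = 0.1
for _ in range(200):
    hidden = np.maximum(0, X @ W_frozen)      # frozen weights never change
    p = 1 / (1 + np.exp(-(hidden @ W_head)))
    W_head -= lr * hidden.T @ (p - y) / len(y)  # update the head only
loss_after = loss(W_head)
```

The loss goes down even though `W_frozen` never receives a gradient update, which is exactly the situation in transfer learning with frozen layers.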
In a very simplified way, when you are training a convolutional model (a type of deep learning model used for images), the first step is to take an image (or a batch of images) and pass it through the entire network, applying different filters to obtain your output (for example, whether the image is a cat or a dog). The next step is to compare this output to the expected output and use this information to go back through the network and update the weights of every layer – in other words, to train the model. You then repeat this process for several epochs, and in each one, every image passes through the network again.
The problem is that passing an image through the entire network (especially for bigger models) is time-consuming, and we have to do it several times for each image, because each time an image passes through the network, the layers are different (their weights have been updated in the meantime). The only exception occurs when the model has layers that don't change over time – and this is exactly what happens when we use transfer learning and decide to freeze some layers.
In that situation, we can take all the images, pass them through all the frozen layers once, and save the results (these results are called bottleneck features). Then, we train the rest of the model with those bottleneck features as inputs (in some sense, we are using the first layers as a preprocessing step applied to the input images before they reach the trainable part of the model). Using bottleneck features reduces the time needed to train the model, so we can train it for more epochs and, with the same computational power, obtain better results in the end.
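A rough sketch of this caching idea, again in plain NumPy with made-up names, shapes, and data: the frozen forward pass runs exactly once for the whole dataset, and every epoch afterwards trains only on the cached output.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(128, 20))            # toy "images"
y = (X[:, 0] > 0).astype(float)           # toy labels

W_frozen = rng.normal(size=(20, 8))       # frozen pre-trained layer

# The expensive frozen forward pass runs ONCE for the whole dataset...
bottleneck = np.maximum(0, X @ W_frozen)  # these are the bottleneck features

# ...and every training epoch reuses the cached features, so epochs are cheap.
W_head = np.zeros(8)
lr = 0.1
for epoch in range(300):
    p = 1 / (1 + np.exp(-(bottleneck @ W_head)))
    W_head -= lr * bottleneck.T @ (p - y) / len(y)

p = 1 / (1 + np.exp(-(bottleneck @ W_head)))
final_loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
```

Without the cache, each of the 300 epochs would repeat the `X @ W_frozen` pass; with it, the frozen computation cost is paid a single time no matter how many epochs we run.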
One of the main reasons to use transfer learning is having too little data to train an entire model from scratch. The problem with using a small number of images (even with transfer learning) is that we have to train the model for a small number of epochs to avoid overfitting – a problem where our model "memorizes" the training data instead of learning its structure, resulting in poor accuracy when classifying new data. However, without training for a reasonable number of epochs, we tend to get poor results, which limits the usefulness of the model.
The most common way to resolve this tradeoff is to collect more data, but in many situations we cannot do that (or it is very expensive to do so). So, what do we do? We use data augmentation. Data augmentation is a technique for creating more "fake" data from the "real" data that we have. This process is easier for images, because we can create new images by taking the original ones and changing their colors (or converting them to grayscale), enhancing their edges, rotating, flipping, cropping, or shifting them, among several other techniques. The image below illustrates this with an original image and several images that can be generated from it.
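Several of these transformations are just array operations. As a small illustration, here is a plain-NumPy sketch on a fake random image (in practice you would usually rely on library helpers, such as Keras's `ImageDataGenerator` or torchvision's `transforms`, rather than hand-rolling these):

```python
import numpy as np

rng = np.random.default_rng(0)
# A fake 32x32 RGB "image" standing in for a real training image.
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

flipped_h = image[:, ::-1]                     # horizontal flip
flipped_v = image[::-1, :]                     # vertical flip
rotated = np.rot90(image)                      # 90-degree rotation
top, left = rng.integers(0, 9, size=2)
cropped = image[top:top + 24, left:left + 24]  # random 24x24 crop
shifted = np.roll(image, shift=4, axis=1)      # shift 4 pixels sideways (wraps)
gray = image.mean(axis=2).astype(np.uint8)     # grayscale conversion

# Each transformed array can be added to the training set as a new sample.
augmented = [flipped_h, flipped_v, rotated, cropped, shifted]
```

Each variant still shows the same underlying object, so the label stays the same while the model sees more diverse inputs.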
With this post, we finish this three-part series on transfer learning. In the first post of the series, we talked about what transfer learning is, in the second one I showed you how to start with it, and in this one, I talked about two techniques that can be used to improve your results. There are several other techniques and details that could be discussed, but with these three posts, you have enough information to start creating your own models using transfer learning and solving real-world problems!
About the author
Luiz Nonenmacher is a Data Scientist at Poatek. He has a Master's Degree in Production Engineering in the area of Machine Learning and Quantitative Methods. In his spare time, he likes to read about a lot of different subjects, including (but not limited to) classical philosophy (especially Stoicism), science, history, fantasy, and science fiction.