Recently, I had the opportunity to participate in the 14th AWS Porto Alegre MeetUp, where I explained how SageMaker works and how it can help Machine Learning projects. Below are some of the topics I covered:
Amazon Web Services (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms. Inside AWS, there are 165 different services that can be used for a wide range of objectives, such as computing, storage, databases, and deployment. Eighteen of those 165 services are focused on Machine Learning and can be used to tackle different tasks, such as NLP (Comprehend, Lex, Polly), Image Processing (Rekognition, Textract), Time Series (Forecast), and Recommender Systems (Personalize).
Most of those Machine Learning services can be used directly as applications, with no need to create and train an ML model. For example, if you want to use Amazon Rekognition to recognize whether a person belongs to your organization, you only need to send images of everyone in your organization to Rekognition, and AWS handles all the model training. However, there are situations in which it is necessary to create custom models to solve a specific problem, and because of this, AWS created SageMaker.
At the meetup, we talked about Machine Learning, cloud migration, and data extraction using AWS.
SageMaker is a machine learning platform launched in November 2017 that enables developers to create, train, and deploy ML models in the cloud (it can also be used to deploy those models on embedded systems and edge devices). Its main advantage is that you operate at a higher level of abstraction, so you can quickly train and deploy your model (and use only the amount of hardware your problem needs).
As illustrated by the picture below, SageMaker consists of four main parts: notebook instances, jobs, models, and endpoints. Notebook instances are used for data processing; models can be imported from SageMaker or other sources; jobs can be created to train those models; and endpoints are the result of the deployment of a model.
Main components of SageMaker. You can check the original file here.
We can conceptually illustrate how to use SageMaker in 5 simple steps:
Step 1: Process your data. You can do this by creating a notebook instance, importing your data into it (either from an S3 bucket or directly from the web), and processing the columns so that the final format is ready to feed the ML model.
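As a rough sketch of what this first step might look like inside a notebook instance (the file name, column names, and processing choices below are made up for illustration):

```python
import pandas as pd

# Load the raw data that was downloaded into the notebook instance.
# "raw_data.csv" and the column names are placeholders for your own dataset.
df = pd.read_csv("raw_data.csv")

# Example processing: drop rows with missing values and one-hot encode a
# categorical column so the final matrix is purely numeric.
df = df.dropna()
df = pd.get_dummies(df, columns=["category"])

# Many built-in SageMaker algorithms expect the target as the first column
# and no header row, so reorder the columns before saving.
df = df[["target"] + [c for c in df.columns if c != "target"]]
df.to_csv("train.csv", index=False, header=False)
```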
Step 2: Save the processed data to S3. After processing the data, save it to an S3 bucket, where it will be read by your model in the next steps.
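A minimal sketch of the upload using the SageMaker Python SDK (v2-style calls; the bucket name and key prefix are placeholders):

```python
import sagemaker

session = sagemaker.Session()

# Upload the processed file to an S3 bucket so the training job can read it.
# "my-sagemaker-bucket" and the "data" prefix are placeholders.
train_s3_uri = session.upload_data(
    path="train.csv",
    bucket="my-sagemaker-bucket",
    key_prefix="data",
)
print(train_s3_uri)  # e.g. s3://my-sagemaker-bucket/data/train.csv
```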
Step 3: Create your model. In this step, you can create a custom model using any library you want (such as scikit-learn, TensorFlow, or PyTorch), or you can import a built-in algorithm from SageMaker. SageMaker offers many models for different tasks (you can find some of them here), and using those built-in algorithms is recommended because they are optimized for the AWS hardware used for training in the next step.
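As an illustration of the built-in route, this is roughly how you could instantiate the built-in XGBoost algorithm with the SageMaker Python SDK; the algorithm choice, version, instance type, and bucket are assumptions, not part of the original post:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # IAM role attached to the notebook instance

# Retrieve the container image of a built-in algorithm (XGBoost as an example).
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=session.boto_region_name,
    version="1.5-1",  # assumed version; pick one available in your region
)

xgb = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",                   # assumed training instance
    output_path="s3://my-sagemaker-bucket/output",  # placeholder bucket
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="reg:squarederror", num_round=100)
```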
Step 4: Train the model. After creating your ML model, create a training job that will fit the model to your data. Once the training job is finished, you can also create a tuning job that searches for the hyperparameters that give the best results on your data.
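Continuing the hypothetical XGBoost estimator and S3 path from the previous sketches, the training and tuning jobs could look roughly like this (the metric name and hyperparameter ranges are illustrative):

```python
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import (
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)

# Training job: fit the estimator to the data saved to S3 in step 2.
train_input = TrainingInput(s3_data=train_s3_uri, content_type="text/csv")
xgb.fit({"train": train_input})

# Optional tuning job: search the hyperparameter space for the best model.
tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="train:rmse",  # illustrative metric for XGBoost
    objective_type="Minimize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=10,
    max_parallel_jobs=2,
)
tuner.fit({"train": train_input})
```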
Step 5: Deploy the model. Finally, when you are done with your model, you can deploy it to the cloud with a single line of code, which creates an endpoint you can use for inference on new data.
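That single line is essentially the deploy call. A sketch, continuing the same hypothetical estimator (the instance type is an assumption, and the payload format depends on the algorithm):

```python
from sagemaker.serializers import CSVSerializer

# Deploy the trained model to a real-time endpoint.
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",  # assumed inference instance type
)

# Run inference on new data; the CSV-based XGBoost container expects a
# comma-separated row of features.
predictor.serializer = CSVSerializer()
print(predictor.predict("0.5,1.2,3.4"))

# Delete the endpoint when you are done to stop paying for the instance.
predictor.delete_endpoint()
```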
One important aspect of SageMaker is that you can use only the parts of the pipeline that you need for your problem. For example, you can create and train your model locally and use SageMaker only for the deployment, or the other way around (use AWS to build and train your model, then download it and deploy it locally). Another important feature is that you can use different machines for each step of the process: one for the notebook instance used in steps 1 to 3, another for the training in step 4, and a third for the deployment in step 5.
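For example, to use SageMaker only for the deployment step, you can wrap a model artifact trained elsewhere in a generic Model object; the container image, artifact path, and endpoint name below are placeholders:

```python
import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()

# Wrap a model artifact trained outside SageMaker (packaged as a model.tar.gz
# and uploaded to S3) in a serving container, then create an endpoint from it.
model = Model(
    image_uri="<serving-container-image-uri>",                 # placeholder
    model_data="s3://my-sagemaker-bucket/model/model.tar.gz",  # placeholder
    role=role,
)
model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",                # assumed instance type
    endpoint_name="my-external-model-endpoint",  # placeholder name
)
```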
If you want to know more and get your hands dirty with SageMaker, you can follow this Amazon tutorial, which guides you through the whole process.
About the author
Luiz Nonenmacher is a Data Scientist at Poatek. He has a Master’s Degree in Production Engineering in the area of Machine Learning and Quantitative Methods. In his spare time, he likes to read about many different subjects, including (but not limited to) classical philosophy (especially Stoicism), science, history, fantasy, and science fiction.