Matplotlib is the foundational Python library for data visualization: it is easy to get started with, and many other data viz libraries are built on top of it. It is a great option for data science enthusiasts and for anyone analyzing data with Python.
As we saw in the last blog post, a picture is worth a thousand words. Working with data requires good visualizations for a ton of things: generating business value, finding insights, supporting statistical inferences and many more. Nowadays there are plenty of tools for this task, and Matplotlib is one of the simplest and most widely used Python libraries for it.
This huge and powerful library was first developed by John D. Hunter to give Python users a tool similar in approach to MATLAB’s visualization features. Today it is a well-documented, open source library maintained by a large community of developers.
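A module worth knowing about within Matplotlib is pylab. It was built to mimic MATLAB’s naming style by pulling NumPy and Matplotlib’s plotting functions into a single namespace. Keep in mind that the official documentation now discourages pylab in favor of importing matplotlib.pyplot directly.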
How can we plot beautiful charts and graphs with it?
You can definitely do it with one-liners sometimes, but in other cases you’ll have to use the remaining 99% of the library (which can be a lot more powerful), so let’s first understand how it works.
Matplotlib has plenty of objects, but the main ones are Figure and Axes.
Figure: think of it as a box that can contain multiple Axes.
Axes: these are the actual plots we usually have in mind (each Axes contains an x-axis and a y-axis).
Axis: the x-axis and y-axis we are used to, which take care of the scale, ticks and tick labels.
Artist: basically everything visible on a Figure (including the Figure, Axes and Axis objects themselves), which is why everything can be customized.
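To make the hierarchy above concrete, here is a minimal sketch that creates one Figure with one Axes and inspects how the pieces relate. The numbers are toy values of my own, and the matplotlib.use("Agg") line is only there so the snippet runs without a display (you can drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

# One Figure (the "box") holding one Axes (the actual plot)
fig, ax = plt.subplots()

# Toy values, only for illustration
line, = ax.plot([1, 2, 3], [2, 4, 8], label="toy data")

# The Figure keeps a list of the Axes it contains
print(fig.axes == [ax])

# Each Axes owns an x-axis and a y-axis object
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)

# Figure, Axes, Axis and the plotted line are all Artists
print(all(isinstance(obj, Artist) for obj in (fig, ax, ax.xaxis, line)))
```

Because everything is an Artist, the same style of call (set_color, set_visible, set_alpha and so on) works on most of these objects.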
An example that illustrates this hierarchy was taken from Real Python’s blog.
Every object named above is customizable, leaving endless possibilities for the user. The official Matplotlib documentation shows the many components of a Figure, most of which can be changed quite easily with one line of code. See the image below:
Remember I said that one Figure can have multiple Axes? This is a very versatile feature for showing different metrics within the same “image”, and these Axes are called subplots. They are really easy to create, and code examples are coming next. Check out the example below, which contains 3 Axes in the same Figure:
To create this example, I took public data on crime in the city of Porto Alegre, Brazil. If you want to check out this example’s 154-line code, you can find it here.
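As a rough sketch of the idea (not the code behind the figure in this post), this is one way to get 3 Axes into a single Figure with plt.subplots. The metric names and values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

# Made-up numbers, only for illustration
days = [1, 2, 3, 4, 5]
metric_a = [10, 12, 9, 14, 11]
metric_b = [3, 5, 4, 6, 2]
metric_c = [7, 7, 8, 6, 9]

# 3 rows x 1 column: three Axes stacked in one Figure, sharing the x-axis
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(6, 8), sharex=True)

# Each Axes is an independent plot and can use a different chart type
axes[0].plot(days, metric_a)
axes[0].set_title("Metric A (line)")
axes[1].bar(days, metric_b)
axes[1].set_title("Metric B (bars)")
axes[2].scatter(days, metric_c)
axes[2].set_title("Metric C (scatter)")
axes[2].set_xlabel("Day")

fig.tight_layout()             # avoid overlapping titles and labels
fig.savefig("three_axes.png")  # hypothetical output file name
```

Here plt.subplots returns the Figure plus an array of Axes, so each subplot is customized through its own entry in axes.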
As can be seen in the plot below, a plot can be highly customized. You can even combine different kinds of plots in the same figure (scatter and simple line plots in this example). You can also see that the title and subtitle, as well as the footnotes, can be made with Matplotlib’s text commands.
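The snippet below is a small sketch of that combination, not the original 154-line code: a line plot and a scatter plot share one Axes, and the title, subtitle and footnote are placed with suptitle, set_title and fig.text. All values, labels and positions are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

# Made-up daily counts, only for illustration
days = list(range(1, 11))
counts = [12, 15, 11, 18, 25, 22, 14, 13, 16, 12]

fig, ax = plt.subplots(figsize=(8, 4))

# Two kinds of plots drawn on the same Axes
ax.plot(days, counts, color="tab:blue", label="daily count")
ax.scatter([5, 6], [25, 22], color="tab:red", zorder=3, label="highlighted days")

# Title, subtitle and footnote are all just text placed on the Figure or Axes
fig.suptitle("Main title", x=0.125, ha="left", fontsize=14, fontweight="bold")
ax.set_title("A smaller subtitle with extra context", loc="left", fontsize=10)
fig.text(0.125, 0.01, "Source: made-up numbers", ha="left", fontsize=8)

ax.legend()
fig.savefig("line_and_scatter.png")  # hypothetical output file name
```

The x=0.125 value is just a guess that roughly lines the text up with the left edge of the default Axes; adjust it to your layout.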
For styling reasons, I removed the right and top spines and edited the x and y ticks and labels. If you want more tips on data visualization, you can check out my first blog post here.
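A minimal sketch of that kind of styling, assuming toy data and tick labels of my own choosing:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [3, 1, 4, 2])  # toy values

# Hide the right and top borders (spines) of the plot
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)

# Choose where the ticks go and what their labels say
ax.set_xticks([0, 1, 2, 3])
ax.set_xticklabels(["Mon", "Tue", "Wed", "Thu"])
ax.set_yticks([1, 2, 3, 4])

# Smaller tick labels and no tick marks
ax.tick_params(axis="both", labelsize=9, length=0)

ax.set_xlabel("Day")
ax.set_ylabel("Value")
fig.savefig("clean_spines.png")  # hypothetical output file name
```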
Back to our example: since I needed to highlight the Carnaval 🎉 national holiday in the plot, I used the fill_between method. This is useful for (guess what) filling specific parts of a plot to highlight them. Another cool feature is annotate, which can take the arrowprops argument and draw arrows that point things out in your plot.
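Here is a short sketch of both features together. The data and the highlighted range (days 6 to 9) are invented for illustration and are not the real Carnaval dates or crime figures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

# Made-up daily counts, only for illustration
days = list(range(1, 15))
counts = [12, 14, 13, 15, 16, 24, 28, 30, 26, 17, 15, 14, 13, 12]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(days, counts, color="tab:blue")

# Shade the area under the curve only where the condition is True
holiday = [6 <= d <= 9 for d in days]  # assumed holiday window
band = ax.fill_between(days, counts, 0, where=holiday,
                       color="gold", alpha=0.3, label="holiday")

# Text placed at xytext with an arrow pointing to the data point at xy
note = ax.annotate("peak", xy=(8, 30), xytext=(11, 31),
                   arrowprops=dict(arrowstyle="->", color="black"))

ax.legend()
fig.savefig("highlight.png")  # hypothetical output file name
```

The where argument takes one True/False value per x position, so any condition on your data can define the highlighted region.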
Even though this example might not be the easiest to start with, it sure demonstrates the power of Matplotlib’s customizability. Now that you know the basic concepts Matplotlib is built on, plus some cool tips on customizing your plots, I encourage you to start a new data analysis on whatever data you’d like. Nevertheless, by now you should be thinking…
Wooow, that’s a whole other topic to study. There are actually a ton of tools that can do this job, even within Python’s universe. One example is Plotly Dash, which I will cover in an upcoming blog post.
I hope you enjoyed reading a bit more about data viz, and I hope to see you in the next blog post.