The rise of Data Science in recent years has left many people wondering what it is and why everyone is talking about it. Job sites are filled with previously unheard-of vacancies, some of which even the hiring companies struggle to define properly. In this first article, we give an overview of the data scientist’s role: what a data scientist does, and how.
The “Data Scientist” title is seen almost as a myth. That is no surprise: it is a relatively new job that arose from new technologies and situations still under development. The need for data scientists has grown much faster than universities and schools have been able to train specialized professionals, which may be one of the reasons why data scientists come from so many different backgrounds. According to the job-seeking platform Indeed, data scientists come from multiple areas of knowledge, and their previous jobs also vary considerably. Compared to Software Engineers, Data Scientists and Data Analysts are around three times less likely to come from a Computer Science background. Another interesting data point is that approximately 20% of data scientists hold a Ph.D. (and 50% hold a master’s degree), a much higher percentage than in other IT jobs.
Take the Wikipedia definition of Data Science:
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from many structural and unstructured data.
That is a rather general definition, and not without reason: the tasks attributed to data scientists could be handled by many kinds of professionals from the natural sciences. A data scientist’s skills usually rest on three pillars: statistics, programming, and knowledge of the business domain. These professionals are commonly placed in critical positions to analyze business cases and develop solutions based on data.
Although the profession is new and booming, data scientists use centuries-old tools that have stood the test of time. Even Artificial Intelligence and Machine Learning are not that new. Neither are most of our regression techniques (some date back to Legendre’s least squares method, formulated in the early 1800s). One thing, however, is very new: big data. The internet and social media create huge amounts of data, very fast, and in every format you can imagine. That is what we call Big Data.
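To give a feeling for how simple some of these old tools are, here is a minimal sketch of a least squares line fit in plain Python. The function name `fit_line` and the ad spend and sales figures are made up for illustration; in practice one would probably reach for a statistics library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y co-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, invented for this example: ad spend vs. sales
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

The fitted line is the one that minimizes the sum of squared vertical distances between the points and the line, which is the idea behind the method mentioned above.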
Nowadays we have lots of data, and we know it says a lot about what people want, think, love and hate. But there is so much of it that we cannot analyze it by ourselves. So, what do we do when a job is too intellectually demanding and time-consuming for humans? Well, we train computers to do it. That is one of the primary jobs of a Data Scientist: make things make sense. Find patterns. Find out what consumers want and what they hate. Solve problems.
But the possible applications of data science are not limited to modeling consumer behavior. Assessing the risk of diseases or modeling how an epidemic evolves is data science. Extracting data from scanned documents to improve processes in a bank or law firm is also data science. Credit decisions can be framed as a data science problem. The fact is: if you have enough data to describe your business problem, a data scientist can use it to find ways to optimize your business. If you do not have the data, you can still use data scientists to develop a data model for your business.
So far, we have presented several definitions of a data scientist and a variety of scenarios in which data science can provide state-of-the-art business solutions. But if there are so many different areas of knowledge, problems, and applications, how can one professional acquire such a wide range of skills? This is a very recent discussion, and the emerging conclusion is that one professional cannot. Or at least it makes sense to have more “specialized data scientists.” Data engineers, machine learning engineers, machine learning scientists, and data analysts are some of the roles being advertised nowadays. All of them could be described as data scientist roles, but both businesses and professionals are coming to understand the value of a specialized set of skills. Although these roles overlap to some extent, each demands a significant level of specialized skills that the others do not. Data science projects are also getting more complex and thus require these types of specialists.
We hope this first article has shed some light on data science and on what a data scientist does. In the following blog posts, we will go into more detail on how these data science specializations are defined and what each one does.