We have all heard how difficult it is to manage big data. We have heard of parallel computing, which in practice means tools such as Hadoop and Spark.
The lesser known aspect of the job
The lesser-known part of a Data Scientist’s job is Aggregation and Labelling. Surprisingly, it is one of the most important things for companies, because this is where you tell the company what to do with its product. It covers Analytics, which answers the question of what insights the data can give, for example what is happening to the users. It covers metrics, which tell you what is happening with the product and whether you are succeeding or not. It also covers A/B testing and experimentation, which let you know which product versions are the best. These things are really important, but they are not well covered in the media. What is covered in the media is Artificial Intelligence and Deep Learning; we hear about it again and again. But when you think about it, for a company and for the industry, it is actually not the highest priority, or at least it is not the thing that yields the most results for the least amount of effort.
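To make the A/B testing idea a bit more concrete, here is a minimal sketch of how two product versions might be compared. It assumes a two-proportion z-test on conversion rates, which is one common choice and not the only one. The function name and the visitor and conversion counts are invented purely for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of version A and version B.

    Returns the z statistic and a two-sided p-value.
    Hypothetical helper, written only for this illustration.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data, the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Made-up numbers: 5000 visitors per version,
# 200 conversions on version A and 250 on version B.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two versions is unlikely to be pure chance. In practice you would also decide the sample size and the success metric before the experiment starts.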
What does a Data Scientist really do?
This depends on the size of the company. A startup lacks resources, so it will probably have only one Data Scientist, and that one person does the work of every data science role. They may not be doing Artificial Intelligence or Deep Learning, because that may not be the priority right now. They have to set up the whole data infrastructure. They may even have to write some software code to add logging, and then do the Analytics, build the metrics and run the A/B tests on their own.
A medium-sized company has a lot more resources and can separate the data engineers from the data scientists. Collection will probably be handled by Software Engineering, and the Moving/Storing and Exploring/Transforming jobs by Data Engineers. The Data Scientist takes up the rest of the work. The Data Scientist role can get very technical, which is why companies mostly hire PhD or Master's degree holders for it: they want you to be able to do the more complicated things.
Now take the case of a large company. It tends to have a lot more money and can spend it on a lot more employees, so different people can work on different areas. That way, employees do not need to think about the work they do not want to do and can focus on the things they are best at.
So, Data Science is all of this, and what you actually do depends on the company you work for.