A data scientist is someone who conducts research on data: asking questions, exploring data, generating hypotheses, and building models that provide insight. Data scientists are the sexy heroes of the growing big data world. They go beyond statisticians because they often need to find the right data first. Data science is also a difficult field to define because it combines multiple disciplines.
Data scientists typically have advanced degrees, for example an MS in computer science and a PhD in statistics or applied mathematics. According to recruitment firm Burtch Works, 88% of data science candidates have at least a Master’s and 46% have PhDs, with the most common fields of study being mathematics and statistics (32%), computer science (19%), and engineering (16%).
A good data scientist is beyond valuable to an employer because their work can uncover new business opportunities or find new revenue sources in existing lines of business. Because of this, data scientists command top dollar, with salaries ranging from $90K to $300K depending on industry and experience.
There’s no set list of skills required for data science, but there are some tools and technologies that are most helpful for data scientists.
- SAS and/or R are required skills in the data science world. I’m seeing a preference for R emerging, which is good because it is free and there are a ton of tutorials available online.
- Python is the most common programming language required in data science, with Java, Perl, and C/C++ also widely used.
- Hadoop has become the infrastructure of choice for static big data analysis. Hive and Pig are also nice skills to have.
- Spark and Storm are becoming the infrastructure of choice for streaming big data analysis.
- SQL and relational database technologies are still common, especially in established businesses. The ability to write complex SQL queries is critical.
- Analysis of unstructured data is getting a lot of attention these days. Analysis of social media, video, and audio feeds is rapidly becoming a critical skill for the data scientist. There’s a lot to this realm, but good places to start are MongoDB and Cassandra.
- A general understanding of APIs and how to work with them to access data sources is also growing in importance.
- Above all else, an insatiable intellectual curiosity is required of data scientists. You’ve got to be the kind of person who wants to unlock the secrets that data holds. If you’ve never stayed up all night looking for something which most people wouldn’t even pay attention to, then you don’t want to be a data scientist.
- Data science also requires a keen understanding of specific vertical industries. You have to understand the business in order to provide insight into it. In particular, you’ll have to understand which problems are important for the business to solve and to identify new ways the business can leverage its data.
- As with most professional positions, data scientists absolutely must have strong communication skills. The ability to work with engineers and business people is critical. You’ll have to be able to explain your findings to non-technical sales and marketing colleagues.
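To make the SQL and Python items above a bit more concrete, here is a minimal sketch of the kind of query a data scientist might write to summarize revenue. Everything in it is an assumption made up for illustration: the `customers` and `orders` tables, their columns, the sample rows, and the `revenue_by_region` helper. It uses Python's built-in `sqlite3` module with an in-memory database so it runs without any setup; a real project would point at whatever database the business already has.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'East'), (3, 'West');
    INSERT INTO orders VALUES
        (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, 300.0), (5, 3, 100.0);
""")


def revenue_by_region(conn, min_total=0.0):
    """Return (region, order_count, total_revenue) rows, largest total first.

    Regions whose total revenue is below min_total are filtered out.
    """
    query = """
        SELECT c.region,
               COUNT(o.id)   AS order_count,
               SUM(o.amount) AS total_revenue
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.region
        HAVING SUM(o.amount) >= ?
        ORDER BY total_revenue DESC
    """
    # The threshold is passed as a bound parameter rather than pasted into the SQL.
    return conn.execute(query, (min_total,)).fetchall()


for region, order_count, total in revenue_by_region(conn):
    print(region, order_count, total)
```

The join, grouping, and `HAVING` filter are the building blocks of most reporting queries, and the same pattern scales up to the larger relational systems found in established businesses.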
In essence, data scientists leverage data to solve business problems. A large part of what they do is build predictive models that help businesses make intelligent decisions about their future. They draw on a variety of tools and technologies to accomplish this, yet the most important tool in the data scientist’s arsenal is a burning desire to ask questions and seek out answers.
Matthew David Sarrel has been practicing and writing about network and information security for over 20 years. He is Executive Director of Sarrel Group, an editorial services/content marketing, product test lab, and information technology consulting company. He is a Contributing Editor for PCMag.com, Triple-G Editor for Backayard Magazine, and contributor to Infoworld, Programmable Web, and numerous other sites and publications. Previously, he was a technical director for PC Magazine Labs. Prior to joining PC Magazine, he served as VP of Engineering and IT Manager at two Internet startups. Earlier, he spent almost 10 years providing IT solutions in HIV-and-TB-related medical research settings at the New Jersey Medical School. Mr. Sarrel has a BA (History) from Cornell University, an MPH (Epidemiology) from Columbia University, and is also a Certified Information Systems Security Professional (CISSP). Mr. Sarrel has written for and spoken to numerous international audiences about information technology and information security. He participated as an expert in two Federal Trade Commission workshops, one about spam in 2003 and one about spyware in 2004.