With great amounts of data comes a great need for data analysts. Organizations generate and collect an exponentially growing amount of data: wringing actionable answers and insights out of the chaos is a valuable and in-demand skill set to have.
Organizations across industries need these answers and insights to improve the decisions they make. B2B and B2C commerce, health care, manufacturing, and marketing all use data analytics to improve processes and enhance profits.
For example, "Medicine uses data analytics in clinical studies to predict the efficacy of medicines and survival rates," says Carl Howe, director of education at RStudio, a company that provides open-source and enterprise tools for use with the R programming language. "Factories are always looking to improve production yields — if you can improve yield by one to two percent, that can mean millions of dollars to a chip or drug manufacturer."
And while companies are working on automating data analytics, "around 80% of the job hasn't been automated, and the 20% that is being automated still isn't automated really well," says Matthew May, lead data scientist at URSA. "More importantly, complex problems take one or more people to work on. Jobs doing data analytics aren't going away."
Note that data analytics is sometimes confused with data science. Data science, says Howe, "is about 'can we model the world, and use these models to make predictions,' while data analytics is more about extracting insights from big datasets."
Now the question: What skills and experience do you need to succeed in a data analytics career? To help answer this, we collected advice from four reputable data analytics professionals:
Dr. Rosaria Silipo - principal data scientist at KNIME, which provides the KNIME Analytics Platform tool for data science. She's also the author of six books, including Practicing Data Science: A Collection of Studies, as well as the three-hour video course KNIME: A Data Science Approach to Analytics and dozens of other technical publications.
Matthew May - lead data scientist at URSA, Inc (for Unmanned Robotics Systems Analysis), an unmanned systems data analytics and visualization company that helps users unlock value from their telemetry data.
Carl Howe - a data scientist and director of education for RStudio, Inc., which provides open-source and enterprise tools for use with the R programming language.
Dr. Kristen Sosulski - a professor of tech operations and statistics at NYU Stern and author of the book Data Visualization Made Simple: Insights Into Becoming Visual.
So, what do you need to succeed in a data analytics career?
1. The ability to tell a story out of numbers
"Doing data analytics makes use of two skills," Howe says: "One, statistics, and two, telling a story with those statistics in ordinary words."
"If you're going to be a data analyst, you must know how to use statistical techniques accurately. You have to like and be good at working with numbers. You have to be able to see data like a mystery or puzzle, and think, 'There's something in here that I want to discover.' Then you apply your math skills to find clues and eventually solve that mystery."
But that's only half the story. Jobs in data analytics focus not only on the numbers but also on how we communicate insight, Howe says. "You're turning data and statistics into a story that can influence others. That story probably has to be told in pictures, because that's the way we internalize information quickly."
Conveying the meaning of results in a way that can be quickly and easily grasped is essential, Sosulski agrees. "You have to be very descriptive in the work you do, be able to visually communicate what you've learned from your analysis — for example, creating charts and graphs, and solidly interpreting them, using the data as evidence."
In general, says May, "You have to be curious and inquisitive, and enjoy not knowing how to solve a problem, not knowing the answer, and working through that to get to a usable solution or usable actionable answer."
What kind of modeling have you done already?
How comfortable are you working with a dataset — specifically, gleaning insights using statistical models and techniques, and creating insights that are interpretable by people who aren't quantitative? To do this, you need to have a solid foundation in coding, modeling, analysis, and data presentation, including data visualization.
Your answers can help you decide what new topics to learn about. Beyond this, May says, "There are varying levels of technical ability. Statistics helps. So does having a behavioral analysis background."
Behavioral analysis "is concerned with describing, understanding, predicting, and changing behavior." That, May says, "is a big part of data science/analytics. Often we are looking to be able to influence user behavior, i.e., ‘get them to click on this, or that’ or answer the question ‘why did the user click on this, or that?’ So having a background in a science that is all about behavior can be very useful."
Also valuable, May says: "Knowing a lot of the high-level math to do analysis using probability and statistics. To do the modeling that's at the core of machine learning models, you need linear algebra along with calculus, statistics, and probability. The better you understand this math, the better you can understand the underlying behavior of the algorithm you are using, and the positives and negatives of a certain algorithm. Most people have the ability to learn calculus, linear algebra, statistics, and probability... But the desire to actually do that can't be learned, although it can be fostered."
2. Knack for coding (but not necessarily computer science)
Along with a love for numbers, data analysts need an affinity for working with them programmatically.
"You should learn to code, for reproducibility so others can build on what you've done," Howe says. If you can't write down a program that does what you are doing, "you're left with two choices: teach others how to do it or keep doing it yourself forever."
What computer languages and other software tools are most likely to be useful for a data analyst? SQL is essential — it's the standard language for data manipulation. Other useful options:
However, notes Carl Howe, "Many of those tools are simply cloud-based versions of point-and-click visualization tools, which rely on manual and irreproducible processes for analyzing data. If you're an analyst who knows how to use a programming language, you'll have no trouble picking up those tools if you need them. On the other hand, if your skills are primarily in the point-and-click world, you'll find it difficult to make the transition to a code-based analysis environment, which is where hardcore data analysts work."
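As a small illustration of the SQL-based data manipulation mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical and exist only for this example. Because the whole analysis is written down as code, anyone can rerun it and get the same answer, which is the reproducibility point Howe makes.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of orders, invented for illustration only.
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [
        ("east", 120.0),
        ("west", 80.0),
        ("east", 60.0),
        ("west", 40.0),
        ("north", 50.0),
    ],
)

# Summarize the whole table in one declarative query:
# number of orders and total amount per region, largest total first.
rows = conn.execute(
    "SELECT region, COUNT(*) AS n, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

for region, n, total in rows:
    print(region, n, total)
```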
While knowing how to code and knowing a programming language or three is essential to being a data analyst, coding for data analytics doesn't require the same depth of knowledge required for a degree in computer science.
"Data analytics and computer science are different disciplines," Howe says. "Data analytics is more about understanding large datasets. In a computer science course, you'd be introduced to the concept of loops and loop statements, but in data analytics, you might not encounter this concept until the end, because data analytics operations process a whole set at once; looping is only used rarely. So while a data analyst needs to be able to write code, they don't necessarily need a computer science background."
3. Communication skills and curiosity
You may have the technical skills to handle data analytics, but that might not be enough to get hired. What else do you need to ace an interview?
"One, make sure you can talk your way through a number of machine learning algorithms," says URSA's May. "Two, be able to speak to the bias-variance tradeoff in prediction models — and what you can do to/about it using SQL. And three, be able to talk through an end-to-end data science or data analytics problem that you've solved — what the problem was, your solution, and how you dealt with the roadblocks you encountered along the way."
Silipo says, "I look for many different things when I run interviews for these people and positions. First of all I look for technical skills. I give them an exercise and see how they approach it, how their way of thinking is, and whether they have the right math background. This applies to both data analytics and data science.
"Then I check their communication skills. It's true that a data analyst’s role, like that of a data scientist, is mainly technical, but for both roles, a minimum level of communication will be required to explain the results of a project or even to promote the project itself."
Communication skills cover a range of factors. Do you have a design sense to create visualizations? Can you communicate with non-technical colleagues?
"And then finally, and most importantly, I check their attitude," says Silipo. "Data science is constantly evolving and there will be new concepts and new algorithms to learn every year. A curious attitude is what I need. I need somebody who isn't afraid of saying, 'I don't know. I'll research it.' It's impossible to know everything in the data science space, so a healthy humble attitude mixed with a self-starting curious attitude is the right combination."
But attitude only goes so far. "People want to see demonstrable evidence of your skills," NYU's Sosulski says. "How do you do this? Build portfolios of data projects from start to finish."
May agrees. "It's good to have one — and it shouldn't just be code that you've written. You should have writeups to accompany your code, using words to explain — in a concise way — what you did. Code can get pretty long, and can take a while to digest if you don't comment it correctly. And even then, nobody has hours to dig through your code. So you have to be able to explain what you did.
"Your portfolio should have at least two classification problems where you use different algorithms, and two regression problems where you use different algorithms," May advises. "And all of these problems need to have the proper data science workflow."
In terms of the datasets you use in your portfolio, "Use some nice clean datasets — but if you can, get your hands on at least one that's very dirty and raw, so you can show what you did with missing values — do you fill them in or remove them." And how can you find data and projects to work on? Many open source datasets are available online.
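The fill-or-remove decision for missing values that May mentions can be shown with a minimal sketch. The readings below are hypothetical, None stands in for a missing value, and filling with the mean is just one common choice among several (median or a model-based estimate are others). A portfolio writeup would explain which option was chosen and why.

```python
from statistics import mean

# Hypothetical raw readings; None marks a missing value.
readings = [12.0, None, 15.0, 9.0, None, 14.0]


def drop_missing(values):
    """Option 1: remove missing entries entirely (the dataset gets shorter)."""
    return [v for v in values if v is not None]


def fill_missing_with_mean(values):
    """Option 2: replace each missing entry with the mean of the observed values."""
    observed = drop_missing(values)
    if not observed:
        raise ValueError("cannot fill: no observed values to average")
    fill_value = mean(observed)
    return [fill_value if v is None else v for v in values]


dropped = drop_missing(readings)
filled = fill_missing_with_mean(readings)
print(dropped)
print(filled)
```

Dropping keeps only real observations but shrinks the dataset; filling keeps every row but introduces values that were never measured.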
4. Analysis based on healthy data
By now, you hopefully have an idea about what a data analyst does, but what you imagine might not always line up with how you actually spend your time. URSA's May says, "Mostly, you'll be thinking about a problem or question, and how you can use data to potentially solve or answer that. And you'll be doing EDA — exploratory data analysis — which means seeing if you can find a signal that can help answer that question or problem."
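One simple way to look for a signal during exploratory data analysis is to check whether two variables move together. The sketch below computes a Pearson correlation coefficient by hand using only the Python standard library. The variable names and numbers are hypothetical, and a correlation is only a first hint worth investigating, not proof of a relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: hours of product use vs. support tickets filed.
hours_of_use = [1, 2, 3, 4, 5]
support_tickets = [2, 4, 5, 4, 5]

r = pearson(hours_of_use, support_tickets)
print(f"r = {r:.2f}")
```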
Once you think you know how to answer a question, you still have a ways to go before you create a report or a visualization. "One irony of both data science and analytics is that while you need to know a great deal about models and machine learning, you could spend a great deal of your time cleaning real-world data before you analyze it. It's the old story of 'garbage in, garbage out,'" says RStudio's Howe. "You need clean data to work with before you can model it."
Without healthy data, any business decision is at risk. If data is a passion for you, consider a career with Talend!