You can use automated machine learning in a variety of applications, such as natural language processing, voice recognition, and recommendation engines. It can also support your BI and analytics needs, by using models to analyze historical data, find key drivers and patterns across large sets of business metrics, and make smart business predictions based on those patterns.
Citizen data scientists benefit from AutoML tools and processes by quickly and easily developing baseline models and acting on the results of these models. ML experts benefit by avoiding the traditional trial-and-error workflow and instead putting their time and effort toward customizing models and notebooks.
Here are the high-level benefits of automated machine learning that apply to both types of users:
Quickly apply machine learning across your organization. AutoML allows non-experts to leverage machine learning models and helps experienced developers and data scientists more quickly produce solutions that are often simpler than hand-coded models and can even perform better.
Focus effort on higher impact work. AutoML eliminates time-intensive and monotonous coding throughout the machine learning workflow, from preprocessing and cleaning the data to selecting the algorithm to optimizing and monitoring model parameters. Also, training a computer to identify content can reduce errors and save countless hours of manually curating tables, text, images and videos.
Improve business performance. AutoML makes it faster and easier to give your analytics teams the power of predictive analytics, which can significantly improve business performance. Specific applications include detecting fraud, giving consumers more personalized experiences, and better managing inventory through improved demand forecasting.
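To make the workflow automation described above more concrete, here is a minimal sketch of the part of the process that selects an algorithm and tunes its parameters. It is a toy illustration, not how any particular AutoML product works: the two model families, the function names, and the synthetic data are all assumptions made up for this example.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int]          # (single numeric feature, binary label)
Model = Callable[[float], int]   # maps a feature value to a predicted label


def fit_majority(train: List[Row]) -> Model:
    """Baseline family: always predict the most common training label."""
    ones = sum(label for _, label in train)
    majority = 1 if ones * 2 >= len(train) else 0
    return lambda x: majority


def fit_threshold(train: List[Row], threshold: float) -> Model:
    """Rule family: split on `threshold`, learn which side is the positive one."""
    above = [label for x, label in train if x >= threshold]
    positive_above = 1 if sum(above) * 2 >= len(above) else 0
    return lambda x: positive_above if x >= threshold else 1 - positive_above


def accuracy(model: Model, rows: List[Row]) -> float:
    """Fraction of rows where the prediction matches the label."""
    return sum(model(x) == label for x, label in rows) / len(rows)


def auto_select(train: List[Row], valid: List[Row]) -> Tuple[str, float]:
    """Fit every candidate on `train` and keep the best scorer on `valid`."""
    candidates: Dict[str, Model] = {"majority": fit_majority(train)}
    for threshold in range(1, 10):  # the "hyperparameter" being searched
        candidates[f"threshold>={threshold}"] = fit_threshold(train, float(threshold))
    scores = {name: accuracy(model, valid) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Synthetic, noise-free data (an assumption for illustration only):
# the feature runs from 0 to 9.95 and the label is 1 from 6.0 upward.
rows = [(i / 20, 1 if i >= 120 else 0) for i in range(200)]
train, valid = rows[0::2], rows[1::2]  # even rows for training, odd for validation

best_name, best_score = auto_select(train, valid)
print(best_name, best_score)
```

Real AutoML tools search far larger spaces of algorithms and settings, and usually add cross-validation and feature preprocessing, but the loop has the same shape: fit many candidates, score each one on held-out data, and keep the winner.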
Automated machine learning typically maps to the traditional machine learning workflow. As with other data science or data analytics projects, you should first clearly define the question you’re trying to answer or the problem you’d like to solve. This critical step will inform your data requirements.
The details of the AutoML process vary with your specific use case and type of data (structured, image, video, or language), but below is a high-level overview to get you started.
The four main types of AutoML models correspond to the type of data you’re analyzing: structured, image, video, and language.
An example of structured (or tabular) data might be historical sales data for a company. You can train your automated machine learning model to:
Image data refers to pictures stored in a database. Manually categorizing large volumes of images can lead to errors and be very time-consuming. Your model can be trained to either:
As with images, manually categorizing large volumes of videos can lead to errors and take countless hours. Your model can be trained to:
Text data, including natural language, refers to any type of message your company might have, from social media posts to the help section on your site. Your model can analyze this text data to decipher its meaning and structure:
To illustrate AutoML in action, let’s imagine you’re running a SaaS company selling monthly subscriptions to an online platform. Below we look at how you can use automated machine learning in a BI tool to evaluate customer behavior.
Evaluating customer churn
AutoML tables can help you understand the patterns and drivers that affected customer churn in the past. AutoML can also use those same patterns to predict which current customers are at the highest risk of leaving in the future.
Here is what the first 12 rows of a dataset of historic customers might look like:
Each row in the table above represents a unique, historic customer. Each column represents an attribute about the customer.
Some attributes were known the moment that person became a customer, for example CustomerID, Gender, Age, Zip, and Plan_Type. Others became available later in the customer journey, for example Logins_1M (the number of times a customer logged into the site during month one), Avg_min_log_1M (the average time, in minutes, that a customer spent on the site during month one), and Churn_1Y (whether or not the customer quit the platform within a year of becoming a customer). Churn_1Y is the column of interest because you want to predict whether a given customer is likely to leave the platform during the first 12 months.
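If it helps to see that structure in code, the sketch below models one row of such a dataset and separates the predictor columns from the Churn_1Y target. The column names come from the description above; the column types, the plan names ("Basic", "Premium"), the helper functions, and the four sample rows are invented for illustration and are not taken from the actual dataset.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Customer:
    # Known the moment the person becomes a customer
    CustomerID: int
    Gender: str
    Age: int
    Zip: str
    Plan_Type: str
    # Observed during the first month
    Logins_1M: int
    Avg_min_log_1M: float
    # Column of interest: did the customer leave within a year?
    Churn_1Y: bool


def split_features_target(customer: Customer) -> Tuple[Dict[str, Any], bool]:
    """Return (predictor columns, target) for a single row."""
    record = asdict(customer)
    target = record.pop("Churn_1Y")
    record.pop("CustomerID")  # an identifier is normally not used as a predictor
    return record, target


def churn_rate_by(customers: List[Customer], column: str) -> Dict[Any, float]:
    """Share of churned customers for each distinct value of `column`."""
    totals: Dict[Any, int] = defaultdict(int)
    churned: Dict[Any, int] = defaultdict(int)
    for customer in customers:
        key = getattr(customer, column)
        totals[key] += 1
        churned[key] += int(customer.Churn_1Y)
    return {key: churned[key] / totals[key] for key in totals}


# Made-up sample rows, for illustration only
customers = [
    Customer(1, "F", 34, "02139", "Basic", 2, 4.5, True),
    Customer(2, "M", 41, "94105", "Premium", 15, 22.0, False),
    Customer(3, "F", 29, "60601", "Basic", 9, 12.5, False),
    Customer(4, "M", 52, "73301", "Basic", 1, 3.0, True),
]

features, target = split_features_target(customers[0])
rates = churn_rate_by(customers, "Plan_Type")
print(features, target, rates)
```

A per-segment churn rate like the one computed by `churn_rate_by` is the simplest kind of pattern you can read off such a table by hand. An AutoML tool looks for the same kind of signal across all columns at once.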
Close inspection of AutoML tables reveals three key patterns in the dataset: