AutoML

What it is, why you need it, and best practices.

Diagram showing how AutoML utilizes data to generate predictive analytics and what-if scenarios.

AutoML Guide

This guide provides definitions and practical advice to help you understand and practice modern automated machine learning.

What is AutoML?

AutoML (short for automated machine learning) refers to the tools and processes which make it easy to build, train, deploy and serve custom machine learning models. AutoML provides both ML experts and citizen data scientists a simple, code-free experience to generate models, make predictions, and test business scenarios. This allows you to quickly apply machine learning across your organization.

You can use automated machine learning in a variety of applications, such as natural language processing, voice recognition, and recommendation engines. It can also support your BI and analytics needs, by using models to analyze historical data, find key drivers and patterns across large sets of business metrics, and make smart business predictions based on those patterns.

Key Benefits of AutoML

Citizen data scientists benefit from AutoML tools and processes by quickly and easily developing baseline models and acting on the results of these models. ML experts benefit by avoiding the traditional trial-and-error workflow and instead putting their time and effort toward customizing models and notebooks.

Here are the high-level benefits of automated machine learning which apply to both types of users:

Quickly apply machine learning across your organization. AutoML allows non-ML-experts to leverage machine learning models and helps ML-experienced developers and data scientists to more quickly produce solutions which are often simpler than hand-coded models and, in some cases, perform better.

Focus effort on higher impact work. AutoML eliminates time-intensive and monotonous coding throughout the machine learning workflow, from preprocessing and cleaning the data to selecting the algorithm to optimizing and monitoring model parameters. Also, training a computer to identify content can reduce errors and save countless hours of manually curating tables, text, images and videos.

Improve business performance. AutoML makes it faster and easier to give your analytics teams the power of predictive analytics, which can significantly improve business performance. Specific applications include detecting fraud, giving consumers more personalized experiences, and better managing inventory through improved demand forecasting.

How AutoML Works

Automated machine learning typically maps to the traditional machine learning workflow. As with other data science or data analytics projects, you should first clearly define the question you’re trying to answer or the problem you’d like to solve. This critical step will inform your data requirements.

The details of the AutoML process will vary depending on your specific use case and type of data (structured, image, video, or language), but the high-level overview below will get you started.

Screenshot of an AutoML data table evaluating customer churn.

Dataset. First you gather the appropriate data and prepare your dataset. Key actions include:

  • Ensuring your dataset is correctly labeled and formatted.
  • Avoiding data leakage and training-serving skew.
  • Cleaning up data which is incomplete, missing, or inconsistent.
  • Reviewing the dataset after you’ve imported it into your automated machine learning platform to ensure accuracy.
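As a rough illustration of the checks listed above, the sketch below scans a small, made-up table for missing values, inconsistent labels, and a column that would leak the outcome into training. The sample rows, the column names (such as cancel_date and churned), and the helper functions are all hypothetical; an AutoML platform would typically surface comparable checks through its interface rather than through code.

```python
# Hypothetical sample rows; a real dataset would come from your own source.
rows = [
    {"customer_id": 1, "plan": "Family", "age": 44, "cancel_date": None, "churned": "no"},
    {"customer_id": 2, "plan": "family", "age": None, "cancel_date": "2023-04-01", "churned": "yes"},
    {"customer_id": 3, "plan": "Personal", "age": 31, "cancel_date": None, "churned": "No"},
]

def rows_with_missing(rows, required):
    """Return the indexes of rows where any required column is empty."""
    return [i for i, row in enumerate(rows)
            if any(row.get(col) in (None, "") for col in required)]

def inconsistent_labels(rows, column):
    """Group label spellings that differ only by case or surrounding whitespace."""
    variants = {}
    for row in rows:
        value = row[column]
        variants.setdefault(value.strip().lower(), set()).add(value)
    return {key: sorted(seen) for key, seen in variants.items() if len(seen) > 1}

def leaky_columns(rows, target, positive, candidates):
    """Flag columns that are filled in exactly when the target is positive.

    Such a column is only known after the outcome has happened, so
    training on it would leak the answer into the model.
    """
    flagged = []
    for col in candidates:
        if all((row[col] is not None) == (row[target].strip().lower() == positive)
               for row in rows):
            flagged.append(col)
    return flagged

missing = rows_with_missing(rows, required=["plan", "age"])
plan_issues = inconsistent_labels(rows, "plan")
leaks = leaky_columns(rows, "churned", "yes", ["cancel_date", "age"])
```

Here the second row is flagged for a missing age, "Family" and "family" are reported as two spellings of one label, and cancel_date is flagged because it is only populated for customers who have already churned.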

Train and Evaluate. Now you’re ready to train your model. Most AutoML tools will get you going with a default model but you should consider adjusting parameters to better fit your particular use case. Your platform should have a simple graphical interface which allows you to build custom models.

  • Make sure you understand every feature column you’re including, and leave out columns which are not relevant to your analysis, since they will just create noise.
  • Once you’ve completed training, your tool should provide a metrics report on how your model performed on the test dataset. These metrics help you to gauge whether your model is ready to use. They include forecasting and regression metrics (such as mean absolute error and observed quantile), and classification metrics (such as prediction outcomes and score threshold).
  • In addition to this metrics report, you can further evaluate your model by using new data to run additional tests and see if the predictions generated meet what you would expect.
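To make two of the metrics mentioned above more concrete, here is a minimal sketch of how mean absolute error and threshold-based prediction outcomes can be computed by hand. The function names and the test-set values are made up for illustration; your tool's metrics report would calculate these for you.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def prediction_outcomes(labels, scores, threshold):
    """Count true/false positives and negatives at a given score threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            counts["tp"] += 1
        elif predicted_positive:
            counts["fp"] += 1
        elif label:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Made-up test-set values for illustration only.
mae = mean_absolute_error([100, 150, 200], [110, 140, 230])

labels = [True, False, True, False]
scores = [0.9, 0.6, 0.4, 0.1]
strict = prediction_outcomes(labels, scores, threshold=0.5)
lenient = prediction_outcomes(labels, scores, threshold=0.3)
```

Lowering the threshold from 0.5 to 0.3 turns the one missed positive into a correct prediction in this toy data, which shows why the score threshold is worth adjusting to fit your use case.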

Deploy and Serve. Once you’re confident in the performance of your model, you can make it available to use. You might use it for a one-time project or as part of an ongoing production process.

  • For one-time projects, an asynchronous batch prediction approach is probably most appropriate.
  • If your model will be integral to a larger process in which other applications depend on fast predictions, you should consider a synchronous, real-time deployment.
  • The best AutoML tools allow you to publish your data to other cloud platforms and directly integrate your models into BI and analytics tools for full interactive analysis. This brings you deeper insights and data-driven decisions which improve your company’s performance.
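The difference between the two serving styles described above can be sketched as follows. This is an illustration only: score_customer is a hypothetical stand-in for a trained model, the field names are invented, and a background thread stands in for whatever batch job mechanism your platform provides.

```python
from concurrent.futures import ThreadPoolExecutor

def score_customer(customer):
    """Hypothetical stand-in for a trained model: returns a churn score."""
    return 0.8 if customer["logins_1m"] < 20 else 0.2

def predict_realtime(customer):
    """Synchronous path: score one record and return to the caller at once."""
    return {"customer_id": customer["customer_id"], "score": score_customer(customer)}

def predict_batch(customers):
    """Batch path: score a whole table in a single job."""
    return [predict_realtime(c) for c in customers]

customers = [
    {"customer_id": 1, "logins_1m": 5},
    {"customer_id": 2, "logins_1m": 42},
]

# Real-time: the calling application waits for the answer.
single = predict_realtime(customers[0])

# Asynchronous batch: submit the job, carry on, and collect the results later.
with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(predict_batch, customers)
    # ... other work could happen here while the job runs ...
    batch = job.result()
```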

AutoML Model Types

The four main types of AutoML models correspond to the type of data you’re analyzing: structured, image, video, and language.

Structured Data
An example of structured (or tabular) data might be historical sales data for a company. You can train your automated machine learning model to:

  • Plan and forecast for decisions and actions by testing “what-if” scenarios. For example, AutoML can provide you guidance on how changing different parameters might affect your sales for a particular product. As you shift variables in the model it will automatically redistribute the data and re-predict sales volume so you can understand the implications of any potential action. This allows you to get a feel for what actions lead to the best outcome before making a decision.
  • Derive a numeric value by using regression analysis. For example, your model could estimate the value of a warehouse you’re considering buying.
  • Produce a list of categories which describe the data by using classification analysis. For example, your model could determine the likelihood of a given customer buying a particular product based on purchases they’ve made from you in the past.
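The "what-if" idea in the first bullet above can be sketched with a toy model. The coefficients, intercept, and input names below are invented stand-ins for a trained regression model, not output from any real tool; the point is only that changing an input and re-predicting shows the estimated effect of an action before you take it.

```python
# Hypothetical coefficients standing in for a trained regression model.
COEFFICIENTS = {"price": -40.0, "ad_spend": 0.5}
INTERCEPT = 1000.0

def predict_units(inputs):
    """Predict units sold from the input variables using the toy linear model."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in inputs.items())

def what_if(baseline, **changes):
    """Re-predict after overriding some inputs.

    Returns the new prediction and its difference from the baseline prediction.
    """
    scenario = {**baseline, **changes}
    new_prediction = predict_units(scenario)
    return new_prediction, new_prediction - predict_units(baseline)

baseline = {"price": 10.0, "ad_spend": 200.0}
base_units = predict_units(baseline)
cheaper_units, delta = what_if(baseline, price=9.0)
```

With these made-up numbers, dropping the price from 10 to 9 raises the predicted volume from 700 to 740 units.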

Image Data
Image data refers to pictures stored in a database. Manually categorizing large volumes of images can lead to errors and be very time-consuming. Your model can be trained to either:

  • Find objects in image data, such as detecting images which contain a car.
  • Classify the image data, such as listing which types of car are included in an image.

Video Data
As with image data, manually categorizing large volumes of videos can lead to errors and take countless hours to perform. Your model can be trained to:

  • Classify the video based on whatever parameters you choose. So, if your videos are from a sales conference, you could classify them as main stage, second stage, reception hall, and party.
  • Find specific actions, such as speaking at a podium, audience questions or group activities.
  • Track specific objects or people, such as your CEO during the reception.

Text Data
Text data, including natural language, refers to any type of message your company might have, from social media posts to the help section on your site. Your model can analyze this text data to decipher its meaning and structure:

  • Categorize the text data, such as by product or issue.
  • Determine the user’s attitude on the subject, such as neutral, positive, or negative.

AutoML Example

To illustrate AutoML in action, let’s imagine you’re running a SaaS company selling monthly subscriptions to an online platform. Below we look at how you can use automated machine learning in a BI tool to evaluate customer behavior.

Evaluating customer churn
AutoML tables can help you understand the patterns and drivers affecting customer churn in the past. They can also use those same patterns to predict which current customers have the highest risk of leaving in the future.

The first 12 rows of a dataset of historic customers might look like this:

Screenshot of an AutoML data table evaluating customer churn.

Each row in the table above represents a unique, historic customer. Each column represents an attribute about the customer.

Some attributes became clear the moment that person became a customer, for example CustomerID, Gender, Age, Zip, and Plan_Type. Other attributes became available later in the customer journey, for example Logins_1M (the number of times a customer logged into the site during month one), Avg_min_log_1M (the average time, in minutes, that a customer spent on the site during month one), and Churn_1Y (whether or not the customer quit the platform within a year of becoming a customer). Churn_1Y is the column of interest because you want to predict whether a given customer is likely to leave the platform during the first 12 months.

Close inspection of AutoML tables reveals three key patterns in the dataset:

  1. Customers over the age of 70 rarely churn during their first year.
Screenshot of an AutoML data table of customers over 70 during their first year.
  2. Female customers in their 40s on a "Family" plan rarely churn during their first year.
Screenshot of an AutoML data table of female customers in their 40s on a family plan during their first year.
  3. Male customers in their 30s on a "Personal" plan are more complicated. They’re likely to churn during their first year if they logged in fewer than 20 times during month one. If, however, they logged in more than 20 times during their first month, they’re not likely to leave.
Screenshot of an AutoML data table of male customers in their 30s on a personal plan during their first year.

Automated machine learning excels at finding patterns such as these. It can even discover significantly more complex patterns over a large number of columns to predict how combinations of values in the feature columns will affect the values in the target columns.
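To see what patterns of this kind look like when written down explicitly, the sketch below hand-codes the three example patterns as rules. It is not how an AutoML tool represents a model internally. The column names come from the example dataset, while the "Female"/"Male" encoding and the treatment of exactly 20 logins (not covered by the patterns as stated) are assumptions.

```python
def likely_to_churn(customer):
    """Hand-written version of the three example patterns.

    Returns True or False when a pattern applies, or None when none of
    the three patterns covers the customer.
    """
    age = customer["Age"]
    gender = customer["Gender"]
    plan = customer["Plan_Type"]

    # Pattern 1: customers over 70 rarely churn in year one.
    if age > 70:
        return False
    # Pattern 2: women in their 40s on a Family plan rarely churn.
    if gender == "Female" and 40 <= age <= 49 and plan == "Family":
        return False
    # Pattern 3: men in their 30s on a Personal plan depend on month-one logins.
    # Assumption: exactly 20 logins is treated as "not likely to churn".
    if gender == "Male" and 30 <= age <= 39 and plan == "Personal":
        return customer["Logins_1M"] < 20
    return None

senior = likely_to_churn({"Age": 74, "Gender": "Male", "Plan_Type": "Personal", "Logins_1M": 3})
quiet = likely_to_churn({"Age": 33, "Gender": "Male", "Plan_Type": "Personal", "Logins_1M": 7})
active = likely_to_churn({"Age": 33, "Gender": "Male", "Plan_Type": "Personal", "Logins_1M": 35})
other = likely_to_churn({"Age": 25, "Gender": "Female", "Plan_Type": "Family", "Logins_1M": 10})
```

Writing even three rules by hand takes care, and the last case shows how many customers they leave uncovered, which is the gap a trained model is meant to fill.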

AutoML takes a dataset (like the one in the example) and allows you to specify a target field (e.g., Churn_1Y). It then finds key drivers and patterns in the data that are often impossible for a human to visualize or detect. You can then refine and finalize the model and use it to make predictions for both forward-looking data and scenario planning.
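In code terms, specifying a target field amounts to separating the column you want to predict from the columns used to predict it. The sketch below shows this with two made-up rows that use the column names from the example; the helper function and the choice to exclude CustomerID as a non-predictive identifier are assumptions for illustration.

```python
TARGET = "Churn_1Y"
IDENTIFIERS = {"CustomerID"}  # assumption: IDs carry no predictive signal

def split_features_and_target(rows, target=TARGET, exclude=IDENTIFIERS):
    """Separate each row into its feature columns and its target value."""
    features = [
        {name: value for name, value in row.items()
         if name != target and name not in exclude}
        for row in rows
    ]
    targets = [row[target] for row in rows]
    return features, targets

# Made-up rows using the example's column names.
rows = [
    {"CustomerID": 101, "Age": 72, "Plan_Type": "Family", "Logins_1M": 12, "Churn_1Y": False},
    {"CustomerID": 102, "Age": 34, "Plan_Type": "Personal", "Logins_1M": 8, "Churn_1Y": True},
]
features, targets = split_features_and_target(rows)
```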

Taking action on your AutoML predictions
Once you learn which customers are likely to churn, you can employ customer retention strategies. You can also begin to ask questions about the purchasing patterns, demographics, and other characteristics of your most valuable customers to understand what makes them successful.

The knowledge you gain from the patterns that automated machine learning uncovers can directly impact business conversations related to your company's long-term growth strategy and financial performance.

AutoML Tool Features

The best machine learning automation platforms include the following key capabilities:

  • Allow you to quickly preprocess, clean, and connect your data.
  • Provide a simple, code-free interface to easily auto-generate and refine ML models, make predictions, and test business scenarios.
  • Automatically score and rank multiple ML models to select the best-performing model for your data set.
  • Help you influence the predicted outcomes by providing prediction-influencer data that explains outcomes at the record level, along with full explainability data.
  • Allow you to easily publish the data and/or directly integrate your models into BI and analytics tools to build interactive visualizations and dashboards with predictive insights that give transparency on which metrics drive results.
