A touch of Auto Machine Learning with Visual Analytics.

AutoML is the practice of automating machine learning pipelines for real-world problems, typically through an interactive platform that targets key steps such as:

  1. Automated data preparation
  2. Automated feature engineering
  3. Automated Model selection
  4. Hyper-parameter tuning
  5. Quality evaluation

As part of my last academic project, I had the opportunity to work on an intuitive platform that offers an AutoML-style interface to automate a few of these processes, specifically:

  1. Ability to allow users to select a classification model.
  2. Hyper-parameter tuning
  3. Ability to play with interactive Visualizations.
  4. Ability to evaluate the classifiers using on-the-fly generated charts.

The motivation behind the project was to address the problem of sarcasm detection. The solution provides a way to check whether a given comment is sarcastic, along with automated model selection and parameter tuning. We used the News Headlines dataset available on Kaggle, which comprises newspaper headlines from two news websites: HuffPost and The Onion. The dataset consists of three fields: ‘article_link’, ‘headline’, and the class label ‘is_sarcastic’. Since the dataset is textual and machine learning models need to be trained on numerical data, we applied several Natural Language Processing (NLP) techniques in the data cleaning and preparation phase.
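To make the preparation step concrete, here is a minimal sketch rather than the project's actual code. It assumes the dataset is read as one JSON object per line with the three fields named above. The sample headlines, the placeholder links, the helper names, and the tiny stopword list are all made up for illustration; the stopword list is a plain-Python stand-in for the NLTK tools the project used.

```python
import json
import re

# Hypothetical records in the dataset's three-field shape; the headlines are
# invented for illustration and the links are placeholders.
RAW_LINES = [
    '{"article_link": "https://example.com/a", "headline": "Local Man Thrilled To Spend Weekend Assembling Furniture", "is_sarcastic": 1}',
    '{"article_link": "https://example.com/b", "headline": "City council approves new budget for 2020", "is_sarcastic": 0}',
]

# Tiny stand-in stopword list (assumption); the project relied on NLTK instead.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "and", "in", "on", "is"}


def clean_headline(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


def load_records(lines):
    """Parse one JSON object per line into (tokens, label) pairs."""
    records = []
    for line in lines:
        row = json.loads(line)
        records.append((clean_headline(row["headline"]), int(row["is_sarcastic"])))
    return records


records = load_records(RAW_LINES)
```

The cleaned tokens are still text; turning them into numbers is the job of the feature extraction step.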

The project has two major components: a front-end and a back-end.

The front-end was developed with Vue.js, and the back-end is built on the Python Flask framework.

The back-end exposes multiple REST APIs developed in Python to perform the following operations:

  1. Call a predict API to check whether a given text is sarcastic.
  2. Preprocess the textual data using NLTK tools.
  3. Feature extraction (TF-IDF) and dimensionality reduction using LSA.
  4. Train the classification models.
  5. Visualize the results.
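The feature extraction, training, and prediction steps above can be sketched with scikit-learn. This is an illustration under assumptions, not the project's implementation: the toy headlines are invented, `LogisticRegression` is an arbitrary choice of classifier, `predict_sarcasm` is a hypothetical helper name, and LSA is realized here as `TruncatedSVD` applied to the TF-IDF matrix.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Invented toy headlines; 1 = sarcastic, 0 = not sarcastic.
headlines = [
    "scientists thrilled to announce nothing new this year",
    "area man heroically replies all to office email",
    "nation shocked to learn monday follows sunday again",
    "city council approves budget for road repairs",
    "local school opens new library for students",
    "health officials report decline in flu cases",
]
labels = [1, 1, 1, 0, 0, 0]

pipeline = Pipeline([
    # Turn each headline into a sparse TF-IDF vector.
    ("tfidf", TfidfVectorizer(stop_words="english")),
    # LSA: project the TF-IDF vectors onto a few latent dimensions.
    ("lsa", TruncatedSVD(n_components=2, random_state=0)),
    # Classifier choice is an assumption made for this sketch.
    ("clf", LogisticRegression()),
])
pipeline.fit(headlines, labels)


def predict_sarcasm(text):
    """Return True if the fitted pipeline labels the text as sarcastic."""
    return bool(pipeline.predict([text])[0])


is_sarcastic = predict_sarcasm("man bravely survives monday meeting")
```

A predict endpoint in the back-end could wrap a function like `predict_sarcasm` and return its result as JSON. With only six training examples the prediction itself is not meaningful; the point is the shape of the pipeline.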

The finished product offers two main functionalities: Visualization and Play with Models.

The Visualization section allows users to explore interactive visualizations. For instance, one of the charts shows the frequency of each word in the entire text corpus.
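The data behind a word-frequency chart can be computed with a few lines of standard-library Python. The sketch below is an assumption about how such counts might be produced, not the project's code; the sample headlines and the `word_frequencies` helper are hypothetical.

```python
import re
from collections import Counter

# Invented sample headlines standing in for the full corpus.
headlines = [
    "area man wins argument with himself",
    "man announces plan to win every argument",
]


def word_frequencies(texts, top_n=2):
    """Count lowercase word occurrences and return the top_n most common."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)


top_words = word_frequencies(headlines)
```

The resulting list of `(word, count)` pairs can be serialized to JSON and handed to the front-end charting code.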

The Play with Models section allows any user, expert or not, to browse through different classification algorithms and set the available hyper-parameters of a model.

After making these selections, the user can click the ‘Submit’ button to start model training. The training results are rendered on the right-hand side of the screen as charts generated on the fly.
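One way such a submit-and-train flow might look on the back-end is sketched below. Everything here is an assumption for illustration: the `MODELS` registry, the two classifiers it lists, the `train_and_evaluate` helper, and the toy data are hypothetical, and the feature step is simplified to TF-IDF only. The function returns a plain dictionary so it could be sent to the front-end as JSON and drawn as charts.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical registry of the classifiers a user could pick in the UI.
MODELS = {
    "logistic_regression": LogisticRegression,
    "random_forest": RandomForestClassifier,
}

# Invented toy data; 1 = sarcastic, 0 = not sarcastic.
TEXTS = [
    "area man thrilled to attend another meeting",
    "nation shocked that water is still wet",
    "local genius microwaves fork again",
    "man heroically finishes entire pizza alone",
    "city council approves budget for road repairs",
    "school opens new library for students",
    "officials report decline in flu cases",
    "company announces quarterly earnings results",
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def train_and_evaluate(model_name, params, texts, labels):
    """Train the chosen model with user-supplied hyper-parameters and score it."""
    model = MODELS[model_name](**params)
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(x_train, y_train)
    predicted = pipeline.predict(x_test)
    # Plain Python types only, so the result is JSON-serializable.
    return {
        "model": model_name,
        "params": params,
        "accuracy": float(accuracy_score(y_test, predicted)),
        "confusion_matrix": confusion_matrix(y_test, predicted, labels=[0, 1]).tolist(),
    }


report = train_and_evaluate("logistic_regression", {"C": 0.5}, TEXTS, LABELS)
```

Keeping the model choice and hyper-parameters as plain data (a name and a dictionary) means the front-end form can map directly onto the function arguments.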

The solution has considerable room for improvement. One of the most important additions would be the ability for users to load their own dataset and take advantage of the automated machine learning flows already implemented in the system.

This project allowed us to touch on certain features of automated machine learning. In the future, such AutoML systems will be crucial for democratizing machine learning, so that any user, irrespective of expertise, can take advantage of it.

Personal AutoML choices: H2O.ai, Auto Keras, Cloud AutoML.

I am an R&D Advocate at Qlik, where I experiment with the Engine. I am also a Visual Analytics Researcher at Dalhousie University.