Description

In this project, we use the data from the Kaggle competition Toxic Comment Classification Challenge by Jigsaw, and only its training data. We split this raw training data into train and test sets and evaluate model performance on the test set.
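
The split described above can be sketched as follows. This is a dependency-free illustration, not the notebook's actual code: the helper name `split_rows`, the 20% test fraction, and the seed are all assumptions, and the notebook may well use a library routine such as scikit-learn's `train_test_split` instead.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test).

    `test_fraction` and `seed` are illustrative defaults, not values
    taken from the notebook.
    """
    shuffled = list(rows)                  # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for the raw training rows.
rows = [f"comment {i}" for i in range(10)]
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed keeps the held-out rows the same across runs, so test scores from different models stay comparable.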

The dataset consists of comments from Wikipedia talk page edits, each tagged with the following labels:

  1. toxic
  2. severe_toxic
  3. obscene
  4. threat
  5. insult
  6. identity_hate

This is a multi-label (not multiclass) classification problem. Each text row has six binary labels, and any number of them can be 1: a comment may carry several labels at once, or none at all.
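
To make the multi-label layout concrete, here is a small sketch with made-up rows (the example texts and label vectors are hypothetical, not taken from the dataset). Each row carries one 0/1 flag per label, in the order listed above.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Made-up rows for illustration only: one 0/1 flag per label.
examples = {
    "clean comment": [0, 0, 0, 0, 0, 0],
    "single-label comment": [1, 0, 0, 0, 0, 0],
    "multi-label comment": [1, 0, 1, 0, 1, 0],
}


def active_labels(vector):
    """Return the names of the labels whose flag is 1."""
    return [name for name, flag in zip(LABELS, vector) if flag == 1]


for text, vector in examples.items():
    print(text, "->", active_labels(vector))
```

In a multiclass problem every vector would contain exactly one 1; here the number of active labels per row is unconstrained.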

Load the libraries

Useful Functions

Parameters

Load the Data

Word Embeddings

Data Processing

Parameters

Text Data Processing

Model Evaluation

Multilabel Confusion Matrix
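
A multilabel confusion matrix treats each label as its own binary problem and reports one 2x2 matrix per label. The sketch below is a plain-Python illustration on made-up predictions, not the notebook's code; it uses the `[[tn, fp], [fn, tp]]` layout, which matches scikit-learn's `multilabel_confusion_matrix`. The function name `multilabel_confusion` is hypothetical.

```python
def multilabel_confusion(y_true, y_pred):
    """Return one [[tn, fp], [fn, tp]] matrix per label column."""
    n_labels = len(y_true[0])
    matrices = []
    for j in range(n_labels):
        tn = fp = fn = tp = 0
        for t_row, p_row in zip(y_true, y_pred):
            t, p = t_row[j], p_row[j]
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1
        matrices.append([[tn, fp], [fn, tp]])
    return matrices


# Made-up targets and predictions for two labels and four rows.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[1, 0], [0, 0], [0, 1], [0, 1]]
matrices = multilabel_confusion(y_true, y_pred)
print(matrices)
```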

Classification Report
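
A classification report summarizes precision, recall, F1, and support for each label. The following is a minimal plain-Python sketch on made-up data, assuming a metric is reported as 0.0 when its denominator is zero; the helper `per_label_report` is hypothetical, and the notebook itself may rely on a library routine such as scikit-learn's `classification_report`.

```python
def per_label_report(y_true, y_pred, label_names):
    """Compute precision, recall, F1, and support for each label column."""
    report = {}
    for j, name in enumerate(label_names):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        # Assumption: a metric with a zero denominator is reported as 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        report[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true positives in the targets
        }
    return report


# Made-up targets and predictions for two of the six labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[1, 0], [0, 0], [0, 1], [0, 1]]
report = per_label_report(y_true, y_pred, ["toxic", "insult"])
for name, scores in report.items():
    print(name, scores)
```

Reporting per label matters here because rare labels such as threat can score poorly even when the overall numbers look good.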

Co-occurrence Matrix
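
One common way to build a label co-occurrence matrix is to count, for every pair of labels, the rows in which both are 1. The sketch below assumes that definition (the notebook's exact definition is not shown) and uses made-up rows; the diagonal then holds the per-label counts. It is equivalent to the matrix product of the transposed label matrix with itself.

```python
def cooccurrence(y, n_labels):
    """Count rows in which each pair of labels is active together."""
    matrix = [[0] * n_labels for _ in range(n_labels)]
    for row in y:
        active = [j for j, flag in enumerate(row) if flag == 1]
        for a in active:
            for b in active:
                matrix[a][b] += 1
    return matrix


# Made-up label rows for three labels.
y = [[1, 0, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]
matrix = cooccurrence(y, n_labels=3)
for line in matrix:
    print(line)
```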

Plotly Visualization

Time Taken