Description

This project uses the data from the Kaggle competition Toxic Comment Classification Challenge by Jigsaw. Only the competition's training data is used: we split this raw training data into train and test sets and evaluate model performance on the test set.
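As a minimal sketch of such a split, the snippet below shuffles row indices with NumPy and holds out a fraction for testing. The function name `split_train_test`, the 20% test fraction, and the seed are assumptions for illustration; the notebook may use a different method or ratio.

```python
import numpy as np


def split_train_test(n_rows, test_fraction=0.2, seed=42):
    """Return disjoint (train_idx, test_idx) arrays that together cover range(n_rows)."""
    rng = np.random.default_rng(seed)        # seeded for a reproducible split
    shuffled = rng.permutation(n_rows)       # random order of all row indices
    n_test = int(round(n_rows * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical example: 10 rows, 20% held out for testing.
train_idx, test_idx = split_train_test(n_rows=10, test_fraction=0.2)
```

The index arrays can then be used to select rows from the raw training table, so the held-out rows never influence model fitting.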

The dataset is taken from Wikipedia edit text, and each comment is labeled with any of the following six categories:

  1. toxic
  2. severe_toxic
  3. obscene
  4. threat
  5. insult
  6. identity_hate

This is a multi-label (not multi-class) classification problem. Each text row has six binary labels, and the labels are not mutually exclusive: a row can have any number of labels set to 1, including none (a clean comment).
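To make the multi-label layout concrete, here is a small sketch of a target matrix with one column per label. The rows are made up for illustration and are not taken from the dataset; `active_labels` is a hypothetical helper.

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Made-up target rows (not real data): one row per comment, one column per label.
y = np.array([
    [0, 0, 0, 0, 0, 0],  # clean comment: no label set
    [1, 0, 1, 0, 1, 0],  # toxic, obscene and insult at the same time
    [1, 1, 1, 1, 1, 0],  # five labels set
    [1, 0, 0, 0, 0, 0],  # toxic only
])

# In a multi-class problem every row would sum to exactly 1; here it can be 0..6.
labels_per_row = y.sum(axis=1)


def active_labels(row):
    """Return the names of the labels that are set to 1 in a single target row."""
    return [name for name, flag in zip(LABELS, row) if flag == 1]
```

Because each column is an independent yes/no decision, a model for this task typically uses one sigmoid output per label rather than a single softmax over the six labels.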

Keras Modelling Resources

Load the libraries

Useful Functions

Parameters

Load the Data

Text Data Processing

Word Embeddings

Clean the Text

Build Embedding Matrix

Modelling

Model Evaluation

Multilabel Confusion Matrix
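A multilabel confusion matrix is one 2x2 matrix per label, each treating that label as its own binary problem. The sketch below computes it with plain NumPy; `per_label_confusion` is a hypothetical helper, and the layout `[[TN, FP], [FN, TP]]` is chosen to match the convention of scikit-learn's `multilabel_confusion_matrix`. The inputs are made-up binary arrays, not model output.

```python
import numpy as np


def per_label_confusion(y_true, y_pred):
    """Return an array of shape (n_labels, 2, 2) laid out as [[TN, FP], [FN, TP]]."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = (y_true & y_pred).sum(axis=0)     # predicted 1, truly 1
    tn = (~y_true & ~y_pred).sum(axis=0)   # predicted 0, truly 0
    fp = (~y_true & y_pred).sum(axis=0)    # predicted 1, truly 0
    fn = (y_true & ~y_pred).sum(axis=0)    # predicted 0, truly 1
    return np.stack([tn, fp, fn, tp], axis=1).reshape(-1, 2, 2)


# Made-up example with two labels and four rows.
y_true = [[1, 0], [1, 1], [0, 0], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]
mcm = per_label_confusion(y_true, y_pred)
```

Predicted probabilities would first need to be thresholded (for example at 0.5, an assumption) to obtain the binary `y_pred` used here.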

Classification Report

Co-occurrence Matrix
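One common way to build a label co-occurrence matrix is to multiply the binary target matrix by its own transpose. The sketch below assumes that approach and uses made-up rows; the notebook's actual computation may differ.

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Made-up binary targets (not real data): rows are comments, columns are labels.
y = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
])

# cooc[i, j] = number of rows where label i and label j are both 1.
# The diagonal cooc[i, i] is simply the number of rows carrying label i.
cooc = y.T @ y

toxic = LABELS.index("toxic")
obscene = LABELS.index("obscene")
n_toxic_and_obscene = int(cooc[toxic, obscene])
```

The matrix is symmetric, so a heatmap of it only needs one triangle to show how often pairs of labels appear together.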

Plotly Visualization

Time Taken