Table of Contents

Introduction

Data Description

You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:

toxic
severe_toxic
obscene
threat
insult
identity_hate

You must create a model which predicts a probability of each type of toxicity for each comment.

Imports

Useful Scripts

Load the Data

Class distribution

Train Target Split

Text Modelling: Logistic Regression

Baseline Logistic Regression

Model 2: NBFeaturer

Model 3: NBFeaturer + Lemmatizer

Model 4: NBFeaturer + Lemmatizer + two tfidfs

Using Best Parameters

Testing the Model

Model Evaluation for Classification

Model Explanation