Table of Contents

Introduction

Data Description

You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:

toxic
severe_toxic
obscene
threat
insult
identity_hate

You must create a model that predicts the probability of each type of toxicity for each comment.
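
As a rough illustration of the task described above, the sketch below shows the shape of one prediction: a probability per label for a single comment. The function `predict_proba_placeholder` is hypothetical and only stands in for a trained model; the constant 0.5 scores are dummy values, not results. It also assumes the labels are not mutually exclusive, so the six probabilities need not sum to 1.

```python
from typing import Dict

# The six label names listed in the data description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def predict_proba_placeholder(comment: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model.

    Returns one probability per label for the given comment.
    Every score is a dummy 0.5; a real model would compute these values.
    """
    return {label: 0.5 for label in LABELS}


# One comment in, six independent probabilities out.
scores = predict_proba_placeholder("example comment")
for label, probability in scores.items():
    print(f"{label}: {probability:.2f}")
```

A real model would replace the placeholder while keeping the same output shape: one row per comment and one column per label.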

Imports

Useful Scripts

Load the Data

Class distribution

Text Preparation for spaCy

Classifying text into categories using spaCy

Named Entity Recognition

Chunking

Dependency Parsing

Verb Phrase Detection

Rule-Based Matching Using spaCy

Word vectors and similarity

Pipeline components