Introduction

Data Description

You are provided with a large number of Wikipedia comments that human raters have labeled for toxic behavior. The types of toxicity are:

toxic
severe_toxic
obscene
threat
insult
identity_hate

You must create a model that predicts the probability of each type of toxicity for each comment.
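
Because a single comment can carry several of these labels at once, the task is multi-label: each of the six probabilities is scored independently, and a row need not sum to 1. The following is a minimal sketch of that output shape, assuming predictions are held as one dict per comment keyed by the label names above. The helper name `is_valid_prediction` and the example values are hypothetical, not taken from the notebook.

```python
# The six toxicity labels listed in the data description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def is_valid_prediction(pred):
    """Check that pred holds exactly one probability in [0, 1] per label."""
    if set(pred) != set(LABELS):
        return False
    return all(0.0 <= pred[label] <= 1.0 for label in LABELS)


# Illustrative values only (not model output): labels are scored
# independently, so the probabilities do not have to sum to 1.
example = {
    "toxic": 0.91,
    "severe_toxic": 0.12,
    "obscene": 0.78,
    "threat": 0.03,
    "insult": 0.66,
    "identity_hate": 0.05,
}
print(is_valid_prediction(example))
```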

Install spaCy and the small English model:

pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download en_core_web_sm

Imports

Google Colab

Useful Functions

Load the Data

Class distribution
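
One way to inspect the class distribution is to count how many comments are flagged for each label. This is a rough sketch under the assumption that each row is a dict with a 0/1 value per label column (for example, as produced by `csv.DictReader`). The function name `class_distribution` and the sample rows are made up for illustration.

```python
# The six toxicity labels from the data description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def class_distribution(rows):
    """Count how many rows are flagged (value 1) for each label."""
    counts = {label: 0 for label in LABELS}
    for row in rows:
        for label in LABELS:
            # int() accepts both integers and the "0"/"1" strings read from CSV.
            counts[label] += int(row[label])
    return counts


# Tiny made-up sample, not real data.
all_clean = dict.fromkeys(LABELS, 0)
sample = [
    {**all_clean, "toxic": 1, "obscene": 1, "insult": 1},
    dict(all_clean),
    {**all_clean, "toxic": 1, "threat": 1},
]
print(class_distribution(sample))
```

In toxicity datasets of this kind the clean comments usually far outnumber the flagged ones, so it may be worth reporting these counts as fractions of the total as well.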

Text Preparation for spaCy

Named Entity Recognition using spaCy 3

NER using web small
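
A minimal sketch of what this step might look like, assuming the `en_core_web_sm` pipeline downloaded earlier is available. The helper `extract_entities` and the example sentence are hypothetical. The spaCy calls used here (`spacy.load`, `nlp.pipe`, `doc.ents`, `ent.text`, `ent.label_`) are part of the standard spaCy API.

```python
def extract_entities(nlp, texts):
    """Return, for each text, a list of (entity text, entity label) pairs."""
    return [[(ent.text, ent.label_) for ent in doc.ents] for doc in nlp.pipe(texts)]


try:
    import spacy

    # Small English pipeline, as downloaded with `python -m spacy download`.
    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError):
    # spaCy or the model is not installed in this environment.
    nlp = None

if nlp is not None:
    comments = ["Barack Obama visited Paris in 2015."]  # made-up example text
    for entities in extract_entities(nlp, comments):
        print(entities)
```

Using `nlp.pipe` rather than calling `nlp` on each comment processes the texts in batches, which tends to matter on a large comment corpus.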

NER using web large

NER using Transformers

Comparison