Introduction

In this project I use the simpletransformers module to model multiclass classification of text data.

Ref: https://github.com/ThilinaRajapakse/simpletransformers/

Colab

Load the libraries

Data Processing for Simpletransformers
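As far as I know, the simpletransformers classification models expect a pandas DataFrame with a "text" column and a "labels" column, where the labels are integers from 0 to n_classes - 1. The sketch below shows one way to build such rows from string labels using only the standard library. The function name, sample texts, and label names are hypothetical and not taken from this notebook.

```python
def to_simpletransformers_rows(texts, labels):
    """Pair each text with an integer label id in the range 0..n_classes-1."""
    # Sort the class names so the mapping is reproducible across runs.
    classes = sorted(set(labels))
    label_to_id = {name: idx for idx, name in enumerate(classes)}
    rows = [[text, label_to_id[label]] for text, label in zip(texts, labels)]
    return rows, label_to_id


# Hypothetical sample data, for illustration only.
sample_texts = ["card was charged twice", "parcel never arrived", "refund not received"]
sample_labels = ["billing", "shipping", "billing"]

rows, label_to_id = to_simpletransformers_rows(sample_texts, sample_labels)
print(label_to_id)
print(rows)
```

The resulting rows can then be wrapped as pandas.DataFrame(rows, columns=["text", "labels"]). Keeping label_to_id around makes it possible to map predictions back to class names later.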

Train test split
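The split code itself is not shown here. A common choice is scikit-learn's train_test_split with the stratify argument. The standard-library sketch below is only a stand-in that illustrates the same idea: each class keeps roughly the same share in the train and test sets. The function name, the 20% test fraction, and the sample rows are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split [text, label] rows so each label keeps roughly the same share."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # At least one example per class goes to the test set.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical imbalanced data: 10 rows of class 0 and 5 rows of class 1.
sample_rows = [[f"text {i}", 0] for i in range(10)] + [[f"text {i}", 1] for i in range(10, 15)]
train_rows, test_rows = stratified_split(sample_rows)
print(len(train_rows), len(test_rows))
```

Note that a class with a single example ends up entirely in the test set with this sketch, so very rare classes need separate handling.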

Modelling: Simpletransformers

Ref:

Available models:

"bert": "bert-base-cased"
"roberta":"roberta-base"
"distilbert": "distilbert-base-cased"
"distilroberta":"roberta"
"electra-base":"electra"
"electra-small":"electra"
"xlnet":"xlnet-base-cased"


# Note: XLNet needs a lot of memory, so reduce the batch size
if model_type == "xlnet":
    train_args["train_batch_size"] = 64
    train_args["gradient_accumulation_steps"] = 2
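The override above pairs a batch size with gradient accumulation. With accumulation, gradients from several batches are summed before each optimizer step, so the effective batch size is train_batch_size times gradient_accumulation_steps. The sketch below wraps the same override in a small helper and computes that product. The helper names and the base values in base_args are hypothetical, since the original train_args are not shown.

```python
def apply_memory_overrides(model_type, train_args):
    """Return a copy of train_args with the XLNet overrides applied."""
    args = dict(train_args)  # copy, so the caller's dict is left untouched
    if model_type == "xlnet":
        args["train_batch_size"] = 64
        args["gradient_accumulation_steps"] = 2
    return args


def effective_batch_size(train_args):
    """Number of examples contributing to one optimizer step."""
    return train_args["train_batch_size"] * train_args.get("gradient_accumulation_steps", 1)


# Hypothetical base settings, for illustration only.
base_args = {"train_batch_size": 128, "num_train_epochs": 1}

xlnet_args = apply_memory_overrides("xlnet", base_args)
bert_args = apply_memory_overrides("bert", base_args)
print(effective_batch_size(xlnet_args), effective_batch_size(bert_args))
```

Under these assumed base values, halving the per-step batch while accumulating over two steps keeps the effective batch size unchanged and lowers peak memory use.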

Model Evaluation

The evaluation result is returned as a dict. By default, only the Matthews correlation coefficient (MCC) is calculated for multiclass classification.
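To make the MCC value easier to interpret, here is a minimal standard-library sketch of the multiclass MCC computed from true and predicted labels. It follows the usual multiclass generalization of the formula, which is also what scikit-learn's matthews_corrcoef computes. The function name and the sample labels are hypothetical.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for two equally long label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                       # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                            # occurrences of each true class
    p_counts = Counter(y_pred)                            # times each class was predicted

    classes = set(t_counts) | set(p_counts)
    sum_pt = sum(p_counts[k] * t_counts[k] for k in classes)
    sum_pp = sum(v * v for v in p_counts.values())
    sum_tt = sum(v * v for v in t_counts.values())

    denom = math.sqrt((s * s - sum_pp) * (s * s - sum_tt))
    if denom == 0:
        return 0.0  # convention when only one class appears
    return (c * s - sum_pt) / denom


# Hypothetical labels, for illustration only.
perfect = multiclass_mcc([0, 1, 2, 2], [0, 1, 2, 2])
chance = multiclass_mcc([0, 0, 1, 1], [0, 1, 0, 1])
print(perfect, chance)
```

A value of 1 means perfect agreement and 0 means no better than chance. To my knowledge, further metrics can be requested by passing them to eval_model as keyword arguments, for example acc=sklearn.metrics.accuracy_score, in which case they appear in the result dict under that key.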