Table of Contents

Description

Twitter sentiment analysis.

Model Evaluation Metric: Weighted F1 score.
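The weighted F1 score computes F1 separately for each class and then averages the per-class scores, weighting each class by its support (the number of true samples in that class). Below is a minimal sketch using scikit-learn's f1_score. The toy labels are made up for illustration. It also recomputes the same value by hand to show what average="weighted" does.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical toy labels: 0 = negative, 1 = positive
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

# Built-in weighted average over the classes
weighted_f1 = f1_score(y_true, y_pred, average="weighted")

# Same value by hand: per-class F1 weighted by class support
per_class_f1 = f1_score(y_true, y_pred, average=None)  # F1 for class 0 and class 1
support = np.bincount(y_true)                           # true samples per class: [3, 5]
manual_f1 = np.average(per_class_f1, weights=support)

print(round(weighted_f1, 4))  # 0.75
```

Macro averaging gives every class equal weight. Weighted averaging instead lets larger classes count for more, which matters when the sentiment classes are imbalanced.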

Load the libraries

Load the data

Train test split
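The split ratio and options used in the notebook are not shown here. The sketch below assumes a 25% hold-out, a stratified split (so the test set keeps the training class balance), and a fixed random_state. The toy tweets and labels are hypothetical.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the tweets and sentiment labels
tweets = [f"tweet number {i}" for i in range(20)]
labels = [0] * 12 + [1] * 8   # imbalanced on purpose: 12 negative, 8 positive

X_train, X_test, y_train, y_test = train_test_split(
    tweets,
    labels,
    test_size=0.25,    # assumed hold-out fraction
    stratify=labels,   # preserve the 12:8 class ratio in both splits
    random_state=42,   # reproducible shuffle
)

print(len(X_train), len(X_test), sum(y_test))  # 15 5 2
```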

Modelling

BoW (CountVectorizer)

BoW + Extra Features
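The specific extra features used in the notebook are not listed here. As a hypothetical illustration, this sketch hand-crafts three per-tweet features (hashtag count, mention count, word count) and appends them as extra columns to the sparse BoW matrix using scipy.sparse.hstack. The extra_features helper is invented for this example.

```python
import re

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer


def extra_features(tweets):
    """Hypothetical hand-crafted features: [hashtag count, mention count, word count]."""
    rows = [
        [len(re.findall(r"#\w+", t)), len(re.findall(r"@\w+", t)), len(t.split())]
        for t in tweets
    ]
    return csr_matrix(np.array(rows, dtype=float))


tweets = ["@bob I love #python", "worst day ever #fail #mondays"]

vectorizer = CountVectorizer()  # strips "#" and "@", so "#python" is counted as "python"
X_bow = vectorizer.fit_transform(tweets)

# Stack the BoW columns and the 3 extra columns side by side
X_combined = hstack([X_bow, extra_features(tweets)]).tocsr()
print(X_combined.shape)  # (2, 11): 8 vocabulary words + 3 extra features
```

The count features live on a different scale from the word counts, so scaling them (for example with StandardScaler(with_mean=False), which keeps the matrix sparse) may help linear models.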

Word2Vec
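How the Word2Vec vectors are turned into tweet features is not shown here. One common approach, assumed in this sketch, is to average the vectors of a tweet's in-vocabulary words. In practice the vectors would come from a trained model (for example gensim's Word2Vec, read through model.wv). Here a hypothetical toy embedding dictionary stands in for them, so the sketch runs with NumPy alone.

```python
import numpy as np


def average_word_vectors(tokens, embeddings, dim):
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# Hypothetical 3-dimensional embeddings standing in for trained Word2Vec vectors
embeddings = {
    "love": np.array([1.0, 0.0, 0.0]),
    "phone": np.array([0.0, 1.0, 0.0]),
}

tweets_tokens = [["i", "love", "my", "phone"], ["no", "known", "words"]]
X_w2v = np.vstack([average_word_vectors(t, embeddings, dim=3) for t in tweets_tokens])
print(X_w2v)  # row 0: [0.5, 0.5, 0.0]; row 1: all zeros
```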

Tf-idf

Term Frequency (TF): This measures how often a given word appears within a document, normalized by the document's length.

$\mathrm{TF}=\frac{\text { Number of times the term appears in the doc }}{\text { Total number of words in the doc }}$

Inverse Document Frequency (IDF): This measures how rare a word is across the documents. If a term is very common among documents (e.g., “the”, “a”, “is”), it gets a low IDF score.

$\mathrm{IDF}=\ln \left(\frac{\text{Number of docs}}{\text{Number of docs the term appears in}}\right)$

Term Frequency-Inverse Document Frequency (TF-IDF): TF-IDF is the product of a term's TF and IDF scores.

$\mathrm{TF\text{-}IDF}=\mathrm{TF} \times \mathrm{IDF}$
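The three formulas above can be checked by hand. This sketch implements them literally (TF as count divided by document length, IDF as the natural log of the document ratio) on a hypothetical three-document corpus that has already been tokenized.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already tokenized
docs = [
    ["the", "movie", "was", "great"],
    ["the", "movie", "was", "bad"],
    ["the", "acting", "was", "great", "great"],
]


def tf(term, doc):
    """Times the term appears in the doc divided by the total words in the doc."""
    return Counter(doc)[term] / len(doc)


def idf(term, docs):
    """ln(number of docs / number of docs containing the term).

    Assumes the term occurs in at least one doc; otherwise this divides by zero.
    """
    n_containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / n_containing)


def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


for term in ["the", "great", "acting"]:
    print(term, round(tf_idf(term, docs[2], docs), 4))
# the 0.0
# great 0.1622
# acting 0.2197
```

Because "the" appears in every document, its IDF is ln(1) = 0 and its TF-IDF vanishes. That is the down-weighting of common words described above.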

In scikit-learn, TF-IDF features are computed with the TfidfVectorizer class. It has the following parameters:

NOTE:

LogisticRegressionCV
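LogisticRegressionCV is scikit-learn's logistic regression with built-in cross-validation over the inverse regularization strength C. This is a minimal sketch, assuming TF-IDF input features and the weighted F1 score as the selection criterion. The toy tweets, cv=3, and max_iter value are hypothetical choices for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = positive, 0 = negative
train_tweets = [
    "i love this", "great day", "so happy today",
    "love it", "best phone ever", "awesome game",
    "i hate this", "bad day", "so sad today",
    "hate it", "worst phone ever", "awful game",
]
y_train = [1] * 6 + [0] * 6

model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegressionCV(
        Cs=10,                  # 10 candidate C values on a log scale
        cv=3,                   # 3-fold (stratified) cross-validation
        scoring="f1_weighted",  # select C by weighted F1
        max_iter=1000,
    ),
)
model.fit(train_tweets, y_train)

pred = model.predict(["love this game", "hate this phone"])
print(pred, model[-1].C_)  # predicted labels and the C chosen by cross-validation
```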

Linear SVC
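Unlike LogisticRegressionCV, LinearSVC has no built-in cross-validation. One way to tune its C, assumed in this sketch, is GridSearchCV with the same weighted F1 scoring. The C grid and toy tweets are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1 = positive, 0 = negative
train_tweets = [
    "i love this", "great day", "so happy today",
    "love it", "best phone ever", "awesome game",
    "i hate this", "bad day", "so sad today",
    "hate it", "worst phone ever", "awful game",
]
y_train = [1] * 6 + [0] * 6

pipe = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))

# make_pipeline names the step "linearsvc", hence the "linearsvc__C" key
search = GridSearchCV(
    pipe,
    param_grid={"linearsvc__C": [0.1, 1, 10]},  # assumed grid
    cv=3,
    scoring="f1_weighted",
)
search.fit(train_tweets, y_train)  # refits the best C on all training data

pred = search.predict(["love this game", "hate this phone"])
print(search.best_params_, pred)
```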