Description

This project uses the Consumer Complaint Database published by the Consumer Financial Protection Bureau (CFPB).

Data Description

The Consumer Complaint Database is a collection of complaints about consumer financial products and services that we sent to companies for response. Complaints are published after the company responds, confirming a commercial relationship with the consumer, or after 15 days, whichever comes first. Complaints referred to other regulators, such as complaints about depository institutions with less than $10 billion in assets, are not published in the Consumer Complaint Database. The database generally updates daily.

Purpose

Classify consumer complaints into predefined categories.

Classification algorithms

Business Problem

Task    : Find the category of given complaint.
Metric  : TF-IDF
Cleaning: Remove punctuation, expand contractions, etc.
Question: Which class does the given complaint belong to?
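
The cleaning step listed above could look roughly like the sketch below. It is only an illustration: the function name `clean_text` and the small `CONTRACTIONS` map are hypothetical and do not come from the project, and a real pipeline would likely use a much fuller contraction list.

```python
import re
import string

# Hypothetical, deliberately small contraction map (an assumption for illustration).
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "didn't": "did not",
    "i'm": "i am",
    "it's": "it is",
    "they're": "they are",
}

# Map every ASCII punctuation character to a space so neighbouring words do not merge.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Lowercase, expand known contractions, drop punctuation, collapse whitespace."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = text.translate(PUNCT_TO_SPACE)
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("I didn't authorize this charge, and they won't refund it!"))
# prints: i did not authorize this charge and they will not refund it
```

Contractions are expanded before punctuation is removed; otherwise the apostrophes would already be gone and the map would never match.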

To load the serialized object, make sure you use the same conda environment that was used to create it.

Term Frequency : This gives how often a given word appears within a document.

$\mathrm{TF}=\frac{\text { Number of times the term appears in the doc }}{\text { Total number of words in the doc }}$

Inverse Document Frequency: This measures how rare the word is across the documents. If a term is very common among documents (e.g., “the”, “a”, “is”), then it has a low IDF score.

$\mathrm{IDF}=\ln \left(\frac{\text { Number of docs }}{\text { Number of docs the term appears in }}\right)$

Term Frequency – Inverse Document Frequency TF-IDF: TF-IDF is the product of the TF and IDF scores of the term.

$\mathrm{TF\text{-}IDF}=\mathrm{TF} \times \mathrm{IDF}$
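
As a worked illustration of the definitions above (TF as a within-document frequency, IDF as the natural log of a document ratio, and TF-IDF as their product), here is a minimal pure-Python sketch. The three toy documents and the function names `tf`, `idf`, and `tf_idf` are made up for this example and are not taken from the project.

```python
import math

# Made-up toy corpus; each document is a list of tokens.
docs = [
    "the bank charged a late fee".split(),
    "the loan servicer lost the payment".split(),
    "the credit report shows a wrong balance".split(),
]


def tf(term, doc):
    """Number of times the term appears in the doc / total number of words in the doc."""
    return doc.count(term) / len(doc)


def idf(term, docs):
    """ln(number of docs / number of docs the term appears in).

    Assumes the term occurs in at least one document; otherwise the ratio is undefined.
    """
    n_containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / n_containing)


def tf_idf(term, doc, docs):
    """Product of the TF and IDF scores of the term."""
    return tf(term, doc) * idf(term, docs)


for term in ("the", "a", "fee"):
    print(term, round(tf_idf(term, docs[0], docs), 4))
```

Here "the" occurs in every document, so its IDF is ln(3/3) = 0 and its TF-IDF is 0, while "fee" occurs in only one document and gets the highest score of the three terms in the first document.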

In scikit-learn, TF-IDF features are computed with the class TfidfVectorizer. Commonly tuned parameters include ngram_range, max_features, min_df, max_df, stop_words, and sublinear_tf.
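
A minimal sketch of how TfidfVectorizer might feed a classifier is shown below. The toy complaints, the labels, and the choice of LogisticRegression are assumptions made for illustration; the project's actual data, parameters, and models may differ. Note that with its default settings scikit-learn uses a smoothed IDF, ln((1 + n) / (1 + df)) + 1, and L2-normalizes each row, so its values will not match the plain formula exactly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy complaints and category labels, for illustration only.
complaints = [
    "the bank charged an overdraft fee on my checking account",
    "my checking account was closed without notice",
    "the credit report shows a debt that is not mine",
    "wrong late payment listed on my credit report",
    "the mortgage servicer lost my escrow payment",
    "my mortgage payment was applied to the wrong loan",
]
labels = [
    "Bank account",
    "Bank account",
    "Credit reporting",
    "Credit reporting",
    "Mortgage",
    "Mortgage",
]

# TF-IDF features (unigrams and bigrams) followed by a linear classifier.
# LogisticRegression is an assumed choice, not necessarily the project's model.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(complaints, labels)

prediction = model.predict(["the lender reported my mortgage payment as late"])
print(prediction[0])
```

Wrapping the vectorizer and the classifier in one pipeline means the vocabulary and IDF weights are learned only from the training text, and the same transformation is applied automatically at prediction time.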

Imports

Useful Scripts

Load the data

Class Distribution

EDA for Text Data

Total Time Taken