Table of Contents

Description

The data is taken from the consumer complaint database.

A single record from the original dataset looks like this:



Date received                : 2019-09-24
Product                      : Debt collection
Sub-product                  : I do not know
Issue                        : Attempts to collect debt not owed
Sub-issue                    : Debt is not yours
Consumer complaint narrative : transworld systems inc. \nis trying to collect a debt that is not mine, not owed and is inaccurate.
Company public response      : NaN
Company                      : TRANSWORLD SYSTEMS INC
State                        : FL
ZIP code                     : 335XX
Tags                         : NaN
Consumer consent provided?   : Consent provided
Submitted via                : Web
Date sent to company         : 2019-09-24
Company response to consumer : Closed with explanation
Timely response?             : Yes
Consumer disputed?           : NaN
Complaint ID                 : 3384392

In this dataset I am only interested in two columns: Product and Consumer complaint narrative. Because the dataset is large, I will work with a sample of it, keeping only these two columns and renaming them to product and complaint.
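
The column selection, renaming, and sampling described above could look roughly like the sketch below, assuming pandas. The small in-memory frame only stands in for the real file, and its values are illustrative. The helper name `sample_complaints`, the seed value, and the choice to drop rows with a missing narrative are assumptions made for this example, not something stated in the original notebook.

```python
import pandas as pd

# Tiny stand-in for the full complaints table (illustrative values only).
raw = pd.DataFrame({
    "Product": ["Debt collection", "Mortgage", "Debt collection", "Credit card"],
    "Consumer complaint narrative": [
        "trying to collect a debt that is not mine",
        None,  # many records have no narrative
        "debt is inaccurate",
        "unexpected fee on my statement",
    ],
    "State": ["FL", "CA", "NY", "TX"],
})


def sample_complaints(df, n, seed=100):
    """Hypothetical helper: keep two columns, rename them, sample n rows."""
    out = (
        df[["Product", "Consumer complaint narrative"]]
        .rename(columns={
            "Product": "product",
            "Consumer complaint narrative": "complaint",
        })
        # Assumption: rows without a narrative are not useful for text work.
        .dropna(subset=["complaint"])
    )
    n = min(n, len(out))  # never ask for more rows than are available
    return out.sample(n=n, random_state=seed).reset_index(drop=True)


df = sample_complaints(raw, n=2)
print(df)
```

Fixing `random_state` keeps the sample reproducible between runs, which helps when comparing later text-processing steps on the same rows.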

Load the libraries

Load the data

Useful Functions

Text Data Processing

Process text

Text Features Generation

Script