NLP Application: Spam vs Ham

I explore using Natural Language Processing techniques (punctuation removal, tokenization, stopword removal, lemmatization, and vectorization) to build a sample spam-trap that distinguishes between ‘spam’ and ‘ham’ text messages. I compare the performance of a Random Forest and a Gradient Boosted classifier in this context and, for my purposes, find the Random Forest algorithm to be better suited. Click here to see this repository.
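
To give a flavor of the preprocessing steps, here is a minimal sketch using only the Python standard library. It is an illustration, not the code from the repository: the stopword list, the function names (`clean_text`, `build_vocabulary`, `vectorize`), and the two sample messages are all made up for this example. Lemmatization is left out because it typically relies on an external library, and the simple count vectors here stand in for whatever vectorizer the project actually uses.

```python
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects usually
# take a much longer list from an NLP library).
STOPWORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "and", "of", "in", "at"}


def clean_text(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in alphabetical order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(tokens, vocab):
    """Return a bag-of-words count vector for one tokenized message."""
    counts = Counter(tokens)
    # vocab keys iterate in insertion order, which matches the column indices.
    return [counts.get(tok, 0) for tok in vocab]


# Hypothetical sample messages, one spam-like and one ham-like.
messages = [
    "WINNER!! Claim your FREE prize now",
    "Are you free for lunch at noon?",
]

token_lists = [clean_text(m) for m in messages]
vocab = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocab) for tokens in token_lists]

for message, vector in zip(messages, vectors):
    print(message, "->", vector)
```

Vectors like these, one row per message, are the kind of numeric input that a Random Forest or Gradient Boosted classifier can then be trained on.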

Sofia Pasquini