NLP Application: Spam vs Ham
I explore using Natural Language Processing techniques (punctuation removal, tokenization, stopword removal, lemmatization, and vectorization) to build a sample spam filter that distinguishes between ‘spam’ and ‘ham’ text messages. I compare the performance of a Random Forest classifier and a Gradient Boosted classifier in this context and, for my purposes, find the Random Forest algorithm to be better suited. Click here to see this repository.
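To give a rough idea of what the preprocessing steps listed above do to a message, here is a minimal sketch using only the Python standard library. It is not the code from the repository: the stopword list, the `naive_lemma` helper (a crude stand-in for a real lemmatizer), the `clean_text` and `vectorize` functions, and the two sample messages are all hypothetical and chosen for illustration only.

```python
import re
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real project would use a fuller one).
STOPWORDS = {"a", "an", "the", "to", "is", "you", "for", "of", "and", "in", "at", "your", "now"}


def naive_lemma(token):
    """Crude stand-in for lemmatization: strip a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def clean_text(text):
    """Apply punctuation removal, tokenization, stopword removal, and naive lemmatization."""
    # 1. Punctuation removal (ASCII punctuation only).
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    # 2. Tokenization: lowercase, then split on runs of non-word characters.
    tokens = re.split(r"\W+", no_punct.lower())
    # 3. Stopword removal and 4. naive lemmatization; empty tokens are dropped.
    return [naive_lemma(t) for t in tokens if t and t not in STOPWORDS]


def vectorize(docs):
    """Build a sorted vocabulary and one bag-of-words count vector per document."""
    token_lists = [clean_text(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)  # missing terms count as 0
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Hypothetical sample messages: one spam-like, one ham-like.
messages = ["WIN free prizes now!!!", "Are you free for lunch at noon?"]
vocab, X = vectorize(messages)
print(vocab)
print(X)
```

The resulting count vectors are the kind of numeric input a Random Forest or Gradient Boosted classifier can be trained on. In practice a library lemmatizer and a TF-IDF or count vectorizer would likely replace the hand-rolled pieces shown here.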