Machine Learning Engineer Intern

Overview: 1-2 minute read

I started with a literature survey of Natural Language Processing libraries such as spaCy, NLTK, TextBlob, and textacy, documenting the pros and cons of each, which made it easy to select the preferred library.

After selecting a library, I built a sentiment analyzer using the Twitter Sentiment140 dataset, which contains 1.6 million tweets labeled as positive, negative, or neutral sentiment. Using the bag-of-words approach, I featurized the data after normalizing, lemmatizing, stemming, and removing stopwords. I then trained several Machine Learning algorithms to build a predictive model and achieved an overall accuracy of 83%.
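
The pipeline described above can be sketched roughly as follows. This is an illustrative, dependency-free Python sketch and not the original implementation: the stopword list, the `crude_stem` suffix stripper (a stand-in for a real stemmer or lemmatizer), the choice of multinomial Naive Bayes, and the four toy training sentences are all assumptions made for the example. The real project used the Sentiment140 data and an NLP library for preprocessing.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption; a library list would be used in practice).
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "i", "to", "and", "of", "my"}


def crude_stem(token):
    """Strip a few common suffixes; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalize (lowercase, letters only), drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def bag_of_words(text):
    """Represent a text as unordered token counts."""
    return Counter(preprocess(text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word, count in bag_of_words(text).items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumption; stands in for the labeled tweets).
train_texts = [
    "I loved this flight, amazing crew",
    "Great service and friendly staff",
    "Terrible delay, I hated waiting",
    "Awful food and rude staff",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("The crew was great and I loved it"))
print(model.predict("Rude staff and a terrible delay"))
```

On real data, the same structure applies with a held-out test split used to measure accuracy; the toy sentences here only show how the pieces fit together.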

For the last step, I used similar techniques to classify airline emails by their content into classes such as "Passport", "Complaint", and "Boarding Pass", achieving an overall accuracy of 87% and clearly outperforming the previously deployed email classifier, which reached 74%.