


Building a Real-time Sentiment Pipeline for Live Tweets using Python, R, & Azure


Requirements


Twitter Account + Twitter App setup (https://apps.twitter.com/)
Anaconda (with Python 3.5) or a standalone Python 3.5 install
Azure subscription or free trial account


30 day free trial
Azure Machine Learning Studio workspace


Text editor (I'll be using Sublime Text 3)
GitHub.com account (to get the code)
PowerBI.com account (for the dashboard portion)
Windows with an up-to-date .NET Framework (for the testing portion)


Cloning the Repo for Code & Materials

git clone https://www.github.com/datasciencedojo/meetup.git
Folder: Building a Real-time Sentiment Pipeline for Live Tweets using Python, R, & Azure


The Predictive Model


Supervised Twitter Dataset


Azure ML Reader Module:


Data source: Azure Blob Storage
Authentication type: PublicOrSAS
URI: http://azuremlsampleexperiments.blob.core.windows.net/datasets/Sentiment140.tenPercent.sample.tweets.tsv

File format: TSV
URI has header row: Checked


Import and save dataset


Preprocessing & Cleaning


Azure ML Metadata Editor: Cast sentiment_label to a categorical column
Azure ML Group Categorical Values: Map '0' to Negative and '4' to Positive
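
Outside of Azure ML Studio, the same regrouping can be sketched in a few lines of Python. This is only an illustration of the mapping described above; the function name `group_label` is hypothetical and not part of the workshop code.

```python
# Mapping taken from the step above: '0' is Negative, '4' is Positive.
LABEL_NAMES = {"0": "Negative", "4": "Positive"}

def group_label(raw_label):
    """Map a raw sentiment_label value to its category name (hypothetical helper)."""
    key = str(raw_label).strip()
    if key not in LABEL_NAMES:
        raise ValueError("unexpected sentiment_label: {!r}".format(raw_label))
    return LABEL_NAMES[key]

labels = [group_label(v) for v in [0, 4, "4"]]
```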


Text Processing


Filtering using R


Removing stop words (Stop words list)
Removing special characters
Replace numbers
Convert all text to lower case
Stemming and lemmatization
Example of Cleansing Stop Words
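
The workshop does this filtering in R. As a rough Python sketch of the same idea (lower-casing, replacing numbers, removing special characters, and dropping stop words), something like the following could be used. The stop word list here is a tiny made-up sample, not the list linked above, and stemming/lemmatization is left out.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "it", "i"}

def clean_tweet(text):
    """Apply the cleaning steps listed above to a single tweet."""
    text = text.lower()                     # conform to lower case
    text = re.sub(r"\d+", " ", text)        # replace numbers with a space
    text = re.sub(r"[^a-z\s]", " ", text)   # remove special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

cleaned = clean_tweet("I LOVE the new iPhone 6!!! #awesome")
```

Note that the character filter above assumes English text, so it also strips accented letters and emoji.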


Create a term frequency matrix for English words


Azure ML's Feature Hashing Module
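
To give a feel for what feature hashing does, here is a minimal sketch that turns a cleaned tweet into a fixed-length count vector. It does not reproduce the hash function used inside the Azure ML module; MD5 is swapped in only because it ships with Python and is deterministic, and `n_bits` is a hypothetical parameter name.

```python
import hashlib

def hash_features(text, n_bits=10):
    """Hash each token into one of 2**n_bits buckets and count occurrences."""
    n_buckets = 2 ** n_bits
    vector = [0] * n_buckets
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as the bucket index.
        index = int.from_bytes(digest[:4], "big") % n_buckets
        vector[index] += 1
    return vector

vector = hash_features("love new iphone love", n_bits=4)
```

Different words can land in the same bucket (a collision); using more bits makes that less likely at the cost of a wider matrix.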


Drop the tweet_text column, since it is no longer needed


Azure ML's Project Columns module


Feature Selection & Filtering


Pick only the X most relevant columns/words to train on.
Use Azure ML's Filter Based Feature Selection module, set to Pearson correlation, to select the 5000 columns most correlated with the label
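
The idea behind this step can be sketched in plain Python: score each column by its Pearson correlation with the label and keep the top k. Ranking by absolute correlation is an assumption made for this sketch, and the names `pearson` and `top_k_columns` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)

def top_k_columns(columns, labels, k):
    """Return the names of the k columns most correlated with the labels."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy term counts for four tweets; labels use 1 = Positive, 0 = Negative.
columns = {"good": [1, 0, 1, 0], "the": [1, 1, 1, 1], "bad": [0, 1, 0, 0]}
selected = top_k_columns(columns, [1, 0, 1, 0], k=2)
```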


Normalize the Term Frequency Matrix


Normalization is a text processing best practice, though it does not matter much for tweets
Normalize Data module: Min/Max for all numeric columns
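
Min/Max normalization rescales each numeric column to the range 0 to 1. A minimal sketch for one column follows; mapping a constant column to all zeros is a choice made for this example, and the module may handle that case differently.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([0, 2, 4])
```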


Algorithm Selection


Algorithm Cheat Sheet
Beginner's Guide to Choosing Algorithms
Azure ML's Support Vector Machines
Support Vector Machines in General


Model Building


Train the model
Score the trained model against a validation set
Evaluate the performance, maximizing accuracy in this case
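
Accuracy, the metric being maximized here, is simply the share of validation tweets whose predicted label matches the actual label. A small sketch, with a hypothetical helper name and made-up labels:

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Made-up validation labels and scored labels, for illustration only.
actual = ["Positive", "Negative", "Positive", "Negative"]
predicted = ["Positive", "Negative", "Negative", "Negative"]
score = accuracy(actual, predicted)
```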


Twitter App


Creating a Twitter Account
Creating a Twitter App
Get your Twitter app's OAuth keys and tokens.


Twitter API with Python


Twitter API for all languages
Tweepy Python Package
Streaming with Tweepy
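
Whichever client library is used, each streamed tweet arrives as a JSON object. The sketch below shows one way to pull out the fields the pipeline needs from a raw line of stream output. It assumes the classic streaming payload with `id_str`, `created_at`, and `text` fields, and it uses only the standard library rather than Tweepy itself; `extract_tweet` is a hypothetical helper.

```python
import json

def extract_tweet(raw_line):
    """Return a small dict for one line of stream output, or None if it is not a tweet."""
    raw_line = raw_line.strip()
    if not raw_line:
        return None  # blank keep-alive line
    status = json.loads(raw_line)
    if "text" not in status:
        return None  # non-tweet message, such as a delete or limit notice
    return {
        "id": status.get("id_str"),
        "created_at": status.get("created_at"),
        "text": status["text"],
    }

sample = '{"id_str": "1", "created_at": "Mon Jan 01 00:00:00 +0000 2016", "text": "hello"}'
tweet = extract_tweet(sample)
```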


Azure Event Hub


Create a Service Bus namespace
Create an Azure Event Hub


Create a send key (to push data to)
Create a manage key (stream processor)
Create a listen key (to subscribe to)


Pushing to Azure Event Hub
Viewing inside of an Azure Event Hub
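
The send key created above is what authorizes a client to push events. As a sketch of how such a key is typically turned into a Shared Access Signature token for the Event Hub's HTTPS endpoint, consider the following. The namespace, hub name, key name, and key value are placeholders, and the function name `make_sas_token` is hypothetical; treat this as an illustration of the signing scheme rather than the workshop's own code.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def make_sas_token(namespace, hub_name, key_name, key_value, expiry=None):
    """Build a SharedAccessSignature token string for an Event Hub URI."""
    uri = urllib.parse.quote_plus(
        "https://{}.servicebus.windows.net/{}".format(namespace, hub_name)
    )
    if expiry is None:
        expiry = int(time.time()) + 3600  # valid for one hour
    # Sign "<url-encoded uri>\n<expiry>" with the key using HMAC-SHA256.
    string_to_sign = "{}\n{}".format(uri, expiry).encode("utf-8")
    digest = hmac.new(key_value.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        uri, signature, expiry, key_name
    )

# Placeholder values; substitute your own namespace, hub, and send key.
token = make_sas_token("my-namespace", "my-hub", "send", "not-a-real-key", expiry=1700000000)
```

The resulting string would go in the `Authorization` header of a request to the hub. In practice an official Event Hub client library would handle this step for you.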


Deploy the Model
Hook up Stream Processors