Commit 15a2fbed by Arham Akheel

Migrating meetup, datasets, web_scraping_r, IntroDataVisualizationWithRAndGgplot2 to tutorials repository
parent 22f44079
Col1
the
and
you
for.
that
have
but
just
with
get
not
day
was
now
this
can
work
all
out
are
http
today
your
too
time
what
got
thank
back
want
from
one
know
will
see
feel
com
think
about
don
realli
had
how
some
there
night
amp
make
watch
need
new
still
they
come
home
when
look
here
off
more
much
quot
twitter
morn
last
tomorrow
then
has
been
wait
sleep
again
her
onli
week
tri
whi
tonight
would
she
thing
way
did
say
follow
veri
bit
though
take
gonna
them
over
should
yeah
bed
even
start
tweet
could
school
hour
peopl
show
twitpic
didn
guy
hey
after
him
next.
weekend
play
down
final
let
cant
use
yes
were
who
soon
never
dont
life
girl
littl
everyon
year
rain
wanna
movi
first
find
where
call
done
sure
head
our
keep
ani
than
alway
his
leav
lot
talk
alreadi
won
man
readi
someth
made
anoth
live
read
eat
becaus
yet
yay
phone
ever
hous
went
song
befor
sound
thought
mayb
summer
someon
tell
give
guess
babi
check
mean
other
end
game
into
hear
listen
later
doesn
noth
while.
actual
happen
same
pic
stuff
birthday
mom
saw
weather
car
two
doe
put
stay
yesterday
world
those
run
also
might
until
gotta
meet
said
around
post
exam
monday
friday
seem
sinc
sunday
job
must
mani
updat
myself
found
haven
video
gone
such
famili
book
most
www
aww
month
their
boy
shop
move
least
dinner
total
woke
may
anyth
lunch
studi
pictur
hair
isn
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Please determine the required text preprocessing steps using the following flags
replace_special_chars <- TRUE
remove_duplicate_chars <- TRUE
replace_numbers <- TRUE
convert_to_lower_case <- TRUE
remove_default_stopWords <- TRUE
remove_given_stopWords <- TRUE
stem_words <- TRUE
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
# get the label and text columns from the input data set
text_column <- dataset1[["tweet_text"]]
#label_column <- dataset1[["label_column"]]
stopword_list <- NULL
result <- tryCatch({
  dataset2 <- maml.mapInputPort(2) # class: data.frame
  # get the stopword list from the second input data set
  stopword_list <- dataset2[[1]]
}, warning = function(war) {
  # warning handler
  print(paste("WARNING: ", war))
}, error = function(err) {
  # error handler
  print(paste("ERROR: ", err))
  stopword_list <- NULL
}, finally = {})
# Load the R script from the Zip port in ./src/
source("src/text.preprocessing.R");
text_column <- preprocessText(text_column,
                              replace_special_chars,
                              remove_duplicate_chars,
                              replace_numbers,
                              convert_to_lower_case,
                              remove_default_stopWords,
                              remove_given_stopWords,
                              stem_words,
                              stopword_list)
Sentinment <- dataset1[["sentiment_label"]]
data.set <- data.frame(
  Sentinment,
  text_column,
  stringsAsFactors = FALSE
)
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set")
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
</startup>
<system.serviceModel>
<extensions>
<!-- In this extension section we are introducing all known service bus extensions. User can remove the ones they don't need. -->
<behaviorExtensions>
<add name="connectionStatusBehavior"
type="Microsoft.ServiceBus.Configuration.ConnectionStatusElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="transportClientEndpointBehavior"
type="Microsoft.ServiceBus.Configuration.TransportClientEndpointBehaviorElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="serviceRegistrySettings"
type="Microsoft.ServiceBus.Configuration.ServiceRegistrySettingsElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</behaviorExtensions>
<bindingElementExtensions>
<add name="netMessagingTransport"
type="Microsoft.ServiceBus.Messaging.Configuration.NetMessagingTransportExtensionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="tcpRelayTransport"
type="Microsoft.ServiceBus.Configuration.TcpRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="httpRelayTransport"
type="Microsoft.ServiceBus.Configuration.HttpRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="httpsRelayTransport"
type="Microsoft.ServiceBus.Configuration.HttpsRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="onewayRelayTransport"
type="Microsoft.ServiceBus.Configuration.RelayedOnewayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</bindingElementExtensions>
<bindingExtensions>
<add name="basicHttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.BasicHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="webHttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.WebHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="ws2007HttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.WS2007HttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netTcpRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetTcpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netOnewayRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetOnewayRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netEventRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetEventRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netMessagingBinding"
type="Microsoft.ServiceBus.Messaging.Configuration.NetMessagingBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</bindingExtensions>
</extensions>
</system.serviceModel>
<appSettings>
<!-- Service Bus specific app settings for messaging connections -->
<add key="Microsoft.ServiceBus.ConnectionString"
value="Endpoint=sb://tolltest.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=V93mgRhRp0d1FkslcsyjOZNLjo5iSZ730wJuWbZIbS8="/>
<add key="storageAccountName"
value="dojodemo"/>
<add key="storageAccountKey"
value="QPALUJTeuleyZLwLQ45uT5gLIe6KcrKtpO4VpDsRs/8blwphpkySk7FQwHO4lbgp633uNEG5UFePj/p+6bDmnw=="/>
</appSettings>
</configuration>
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
</startup>
<system.serviceModel>
<extensions>
<!-- In this extension section we are introducing all known service bus extensions. User can remove the ones they don't need. -->
<behaviorExtensions>
<add name="connectionStatusBehavior"
type="Microsoft.ServiceBus.Configuration.ConnectionStatusElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="transportClientEndpointBehavior"
type="Microsoft.ServiceBus.Configuration.TransportClientEndpointBehaviorElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="serviceRegistrySettings"
type="Microsoft.ServiceBus.Configuration.ServiceRegistrySettingsElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</behaviorExtensions>
<bindingElementExtensions>
<add name="netMessagingTransport"
type="Microsoft.ServiceBus.Messaging.Configuration.NetMessagingTransportExtensionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="tcpRelayTransport"
type="Microsoft.ServiceBus.Configuration.TcpRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="httpRelayTransport"
type="Microsoft.ServiceBus.Configuration.HttpRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="httpsRelayTransport"
type="Microsoft.ServiceBus.Configuration.HttpsRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="onewayRelayTransport"
type="Microsoft.ServiceBus.Configuration.RelayedOnewayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</bindingElementExtensions>
<bindingExtensions>
<add name="basicHttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.BasicHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="webHttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.WebHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="ws2007HttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.WS2007HttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netTcpRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetTcpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netOnewayRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetOnewayRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netEventRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetEventRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netMessagingBinding"
type="Microsoft.ServiceBus.Messaging.Configuration.NetMessagingBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</bindingExtensions>
</extensions>
</system.serviceModel>
<appSettings>
<!-- Service Bus specific app settings for messaging connections -->
<add key="Microsoft.ServiceBus.ConnectionString"
value="Endpoint=sb://tolltest.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=V93mgRhRp0d1FkslcsyjOZNLjo5iSZ730wJuWbZIbS8="/>
<add key="storageAccountName"
value="dojoeventhubs"/>
<add key="storageAccountKey"
value="lrrS7WkjginKovVFS9E3J8JmYJRnEj6bsz7hGymEqwfqmbt31h5GmQwE9+SiVSC3NPQZ+FhYLtkbTkJxOBbTrg=="/>
</appSettings>
</configuration>
\ No newline at end of file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
<assemblyIdentity version="1.0.0.0" name="MyApplication.app"/>
<trustInfo xmlns="urn:schemas-microsoft-com:asm.v2">
<security>
<requestedPrivileges xmlns="urn:schemas-microsoft-com:asm.v3">
<requestedExecutionLevel level="asInvoker" uiAccess="false"/>
</requestedPrivileges>
</security>
</trustInfo>
</assembly>
This source diff could not be displayed because it is too large. You can view the blob instead.
<?xml version="1.0" encoding="utf-8"?>
<doc>
<assembly>
<name>Microsoft.ServiceBus.Messaging.EventProcessorHost</name>
</assembly>
<members>
<member name="T:Microsoft.ServiceBus.Messaging.EventProcessorHost">
<summary>Represents a host for processing Event Hubs event data.</summary>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.#ctor(System.String,System.String,System.String,System.String,System.String)">
<summary>Initializes a new instance of the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> class.</summary>
<param name="hostName">The name of the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance. This name must be unique for each instance of the host.</param>
<param name="eventHubPath">The path to the Event Hub from which to start receiving event data.</param>
<param name="consumerGroupName">The name of the Event Hubs consumer group from which to start receiving event data.</param>
<param name="eventHubConnectionString">The connection string for the Event Hub.</param>
<param name="storageConnectionString">The connection string for the Azure Blob storage account to use for partition distribution.</param>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.#ctor(System.String,System.String,System.String,System.String,System.String,System.String)">
<summary>Initializes a new instance of the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> class.</summary>
<param name="hostName">The name of the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance. This name must be unique for each instance of the host.</param>
<param name="eventHubPath">The path to the Event Hub from which to start receiving event data.</param>
<param name="consumerGroupName">The name of the Event Hubs consumer group from which to start receiving event data.</param>
<param name="eventHubConnectionString">The connection string for the Event Hub.</param>
<param name="storageConnectionString">The connection string for the Azure Blob storage account to use for partition distribution.</param>
<param name="leaseContainerName">The name of the Azure Blob container in which all lease blobs are created. If this parameter is not supplied, then the Event Hubs path is used as the name of the Azure Blob container.</param>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.EventProcessorHost.HostName">
<summary>Gets the host name, which is a unique name for the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance.</summary>
<returns>The host name.</returns>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.EventProcessorHost.PartitionManagerOptions">
<summary>Gets or sets the <see cref="T:Microsoft.ServiceBus.Messaging.PartitionManagerOptions" /> instance used by the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> object.</summary>
<returns>The <see cref="T:Microsoft.ServiceBus.Messaging.PartitionManagerOptions" /> instance.</returns>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.RegisterEventProcessorAsync``1">
<summary>Asynchronously registers the <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" /> interface implementation with the host using the <see cref="T:Microsoft.ServiceBus.Messaging.DefaultEventProcessorFactory`1" /> factory. This method also starts the host and enables it to start participating in the partition distribution process.</summary>
<returns>A task indicating that the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance has started.</returns>
<typeparam name="T">Implementation of your application-specific <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" />.</typeparam>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.RegisterEventProcessorAsync``1(Microsoft.ServiceBus.Messaging.EventProcessorOptions)">
<summary>Asynchronously registers the <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" /> interface implementation with the host using the <see cref="T:Microsoft.ServiceBus.Messaging.DefaultEventProcessorFactory`1" /> factory. This method also starts the host and enables it to start participating in the partition distribution process.</summary>
<returns>A task indicating that the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance has started.</returns>
<param name="processorOptions">An <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorOptions" /> object that controls various aspects of the event pump created when ownership is acquired for a given Event Hubs partition.</param>
<typeparam name="T">Implementation of your application-specific <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" />.</typeparam>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.RegisterEventProcessorFactoryAsync(Microsoft.ServiceBus.Messaging.IEventProcessorFactory)">
<summary>Asynchronously registers the event processor factory.</summary>
<returns>The task representing the asynchronous operation.</returns>
<param name="factory">The factory to register.</param>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.RegisterEventProcessorFactoryAsync(Microsoft.ServiceBus.Messaging.IEventProcessorFactory,Microsoft.ServiceBus.Messaging.EventProcessorOptions)">
<summary>Asynchronously registers the event processor factory.</summary>
<returns>Returns <see cref="T:System.Threading.Tasks.Task" />.</returns>
<param name="factory">The factory to register.</param>
<param name="processorOptions">An <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorOptions" /> object that controls various aspects of the event pump created when ownership is acquired for a given Event Hubs partition.</param>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.UnregisterEventProcessorAsync">
<summary>Asynchronously shuts down the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance. This method maintains the leases on all partitions currently held, and enables each <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" /> instance to shut down cleanly by invoking the <see cref="M:Microsoft.ServiceBus.Messaging.IEventProcessor.CloseAsync(Microsoft.ServiceBus.Messaging.PartitionContext,Microsoft.ServiceBus.Messaging.CloseReason)" /> method with a <see cref="F:Microsoft.ServiceBus.Messaging.CloseReason.Shutdown" /> object.</summary>
<returns>A task that indicates the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance has stopped.</returns>
</member>
<member name="T:Microsoft.ServiceBus.Messaging.PartitionManagerOptions">
<summary>Represents the options that control various aspects of partition distribution that occur within the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance.</summary>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.#ctor">
<summary>Initializes a new instance of the <see cref="T:Microsoft.ServiceBus.Messaging.PartitionManagerOptions" /> class.</summary>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.AcquireInterval">
<summary>Gets or sets the interval at which the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance begins a task to determine whether partitions are distributed evenly among known host instances.</summary>
<returns>The acquire interval of the partition.</returns>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.DefaultOptions">
<summary>Creates an instance of <see cref="P:Microsoft.ServiceBus.Messaging.EventProcessorHost.PartitionManagerOptions" /> with the following default values:<see cref="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.RenewInterval" />: 10 seconds.<see cref="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.AcquireInterval" />: 10 seconds.<see cref="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.LeaseInterval" />: 30 seconds. </summary>
<returns>The default partition manager options.</returns>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.LeaseInterval">
<summary>Gets or sets the interval at which the lease is created on an Azure Blob representing an Event Hubs partition. If the lease is not renewed within this interval, it expires, and ownership of the partition passes to another <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance.</summary>
<returns>Returns <see cref="T:System.TimeSpan" />.</returns>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.MaxReceiveClients"></member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.RenewInterval">
<summary>Gets or sets the renewal interval for all leases for partitions currently held by the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance.</summary>
<returns>The interval to renew the partition.</returns>
</member>
</members>
</doc>
\ No newline at end of file
# Building a Real-time Sentiment Pipeline for Live Tweets using Python, R, & Azure
## Requirements
* Twitter account + Twitter app setup (https://apps.twitter.com/)
* Anaconda 3.5 or Python 3.5 installed
* Azure subscription or free trial account
  * [30 day free trial](https://azure.microsoft.com/en-us/pricing/free-trial/)
* Azure Machine Learning Studio workspace
* Text editor (I'll be using Sublime Text 3)
* GitHub.com account (to receive code)
* PowerBI.com account (for the dashboard portion)
* Up-to-date .NET on Windows (for the testing portion)
## Cloning the Repo for Code & Materials
```
git clone https://www.github.com/datasciencedojo/meetup.git
```
Folder: Building a Real-time Sentiment Pipeline for Live Tweets using Python, R, & Azure
## The Predictive Model
### Supervised Twitter Dataset
* Azure ML Reader Module:
  * Data source: Azure Blob Storage
  * Authentication type: PublicOrSAS
  * URI: http://azuremlsampleexperiments.blob.core.windows.net/datasets/Sentiment140.tenPercent.sample.tweets.tsv
  * File format: TSV
  * URI has header row: Checked
* Import and save dataset
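
If you want to sanity-check the file outside of Azure ML, the Reader settings above (TSV, header row present) can be mimicked locally. The following is only a rough sketch in Python: the two sample rows are made up for illustration, and the column names `sentiment_label` and `tweet_text` are assumed from the R script in this repo rather than checked against the hosted file.

```python
import csv
import io

# Made-up stand-in for a couple of rows of the Sentiment140 sample.
# Column names are assumed from the R script (sentiment_label, tweet_text).
SAMPLE_TSV = (
    "sentiment_label\ttweet_text\n"
    "0\tmissing my bus again\n"
    "4\tloving the sunny weather\n"
)


def read_tsv(text):
    """Parse tab-separated text whose first row is a header into a list of dicts."""
    # QUOTE_NONE keeps stray quote characters in tweets from confusing the parser.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_tsv(SAMPLE_TSV)
print(len(rows), rows[0]["sentiment_label"], rows[1]["tweet_text"])
```

To try this on the real data, download the file from the URI above and pass its contents to `read_tsv` instead of `SAMPLE_TSV`.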
### Preprocessing & Cleaning
* Azure ML Metadata Editor: Cast sentiment_label to categorical
* Azure ML Group Categorical Values: Cast '0' as Negative, '4' as Positive
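
The grouping step is just a lookup from the raw label to a sentiment name. A minimal Python sketch of the same idea is below; the function name `group_label` and the choice to return `None` for unexpected values are assumptions for illustration, not what the Azure ML module does internally.

```python
# Mapping taken from the step above: '0' is Negative, '4' is Positive.
LABEL_NAMES = {"0": "Negative", "4": "Positive"}


def group_label(raw_label):
    """Return the sentiment name for a raw label, or None if it is not recognised."""
    # Accept both strings and numbers, since a TSV reader may yield either.
    return LABEL_NAMES.get(str(raw_label).strip())


labels = [group_label(value) for value in ["0", "4", 4, "2"]]
print(labels)
```

Returning `None` for anything other than 0 or 4 makes unexpected labels easy to spot and filter before training.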
### Text Processing
* Filtering using R
  * Remove stop words (stop words list)
  * Remove special characters
  * Replace numbers
  * Convert all text to lower case