Commit 15a2fbed by Arham Akheel


Migrating meetup, datasets, web_scraping_r, IntroDataVisualizationWithRAndGgplot2 to tutorials repository
parent 22f44079
Col1
the
and
you
for.
that
have
but
just
with
get
not
day
was
now
this
can
work
all
out
are
http
today
your
too
time
what
got
thank
back
want
from
one
know
will
see
feel
com
think
about
don
realli
had
how
some
there
night
amp
make
watch
need
new
still
they
come
home
when
look
here
off
more
much
quot
twitter
morn
last
tomorrow
then
has
been
wait
sleep
again
her
onli
week
tri
whi
tonight
would
she
thing
way
did
say
follow
veri
bit
though
take
gonna
them
over
should
yeah
bed
even
start
tweet
could
school
hour
peopl
show
twitpic
didn
guy
hey
after
him
next.
weekend
play
down
final
let
cant
use
yes
were
who
soon
never
dont
life
girl
littl
everyon
year
rain
wanna
movi
first
find
where
call
done
sure
head
our
keep
ani
than
alway
his
leav
lot
talk
alreadi
won
man
readi
someth
made
anoth
live
read
eat
becaus
yet
yay
phone
ever
hous
went
song
befor
sound
thought
mayb
summer
someon
tell
give
guess
babi
check
mean
other
end
game
into
hear
listen
later
doesn
noth
while.
actual
happen
same
pic
stuff
birthday
mom
saw
weather
car
two
doe
put
stay
yesterday
world
those
run
also
might
until
gotta
meet
said
around
post
exam
monday
friday
seem
sinc
sunday
job
must
mani
updat
myself
found
haven
video
gone
such
famili
book
most
www
aww
month
their
boy
shop
move
least
dinner
total
woke
may
anyth
lunch
studi
pictur
hair
isn
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Please determine the required text preprocessing steps using the following flags
replace_special_chars <- TRUE
remove_duplicate_chars <- TRUE
replace_numbers <- TRUE
convert_to_lower_case <- TRUE
remove_default_stopWords <- TRUE
remove_given_stopWords <- TRUE
stem_words <- TRUE
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
# get the label and text columns from the input data set
text_column <- dataset1[["tweet_text"]]
#label_column <- dataset1[["label_column"]]
stopword_list <- NULL
result <- tryCatch({
dataset2 <- maml.mapInputPort(2) # class: data.frame
# get the stopword list from the second input data set
stopword_list <- dataset2[[1]]
}, warning = function(war) {
# warning handler
print(paste("WARNING: ", war))
}, error = function(err) {
# error handler
print(paste("ERROR: ", err))
stopword_list <- NULL
}, finally = {})
# Load the R script from the Zip port in ./src/
source("src/text.preprocessing.R");
text_column <- preprocessText(text_column,
replace_special_chars,
remove_duplicate_chars,
replace_numbers,
convert_to_lower_case,
remove_default_stopWords,
remove_given_stopWords,
stem_words,
stopword_list)
Sentiment <- dataset1[["sentiment_label"]]
data.set <- data.frame(
Sentiment,
text_column,
stringsAsFactors = FALSE
)
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set")
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
</startup>
<system.serviceModel>
<extensions>
<!-- In this extension section we are introducing all known service bus extensions. User can remove the ones they don't need. -->
<behaviorExtensions>
<add name="connectionStatusBehavior"
type="Microsoft.ServiceBus.Configuration.ConnectionStatusElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="transportClientEndpointBehavior"
type="Microsoft.ServiceBus.Configuration.TransportClientEndpointBehaviorElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="serviceRegistrySettings"
type="Microsoft.ServiceBus.Configuration.ServiceRegistrySettingsElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</behaviorExtensions>
<bindingElementExtensions>
<add name="netMessagingTransport"
type="Microsoft.ServiceBus.Messaging.Configuration.NetMessagingTransportExtensionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="tcpRelayTransport"
type="Microsoft.ServiceBus.Configuration.TcpRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="httpRelayTransport"
type="Microsoft.ServiceBus.Configuration.HttpRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="httpsRelayTransport"
type="Microsoft.ServiceBus.Configuration.HttpsRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="onewayRelayTransport"
type="Microsoft.ServiceBus.Configuration.RelayedOnewayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</bindingElementExtensions>
<bindingExtensions>
<add name="basicHttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.BasicHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="webHttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.WebHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="ws2007HttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.WS2007HttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netTcpRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetTcpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netOnewayRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetOnewayRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netEventRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetEventRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netMessagingBinding"
type="Microsoft.ServiceBus.Messaging.Configuration.NetMessagingBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</bindingExtensions>
</extensions>
</system.serviceModel>
<appSettings>
<!-- Service Bus specific app settings for messaging connections -->
<add key="Microsoft.ServiceBus.ConnectionString"
value="Endpoint=sb://tolltest.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=V93mgRhRp0d1FkslcsyjOZNLjo5iSZ730wJuWbZIbS8="/>
<add key="storageAccountName"
value="dojodemo"/>
<add key="storageAccountKey"
value="QPALUJTeuleyZLwLQ45uT5gLIe6KcrKtpO4VpDsRs/8blwphpkySk7FQwHO4lbgp633uNEG5UFePj/p+6bDmnw=="/>
</appSettings>
</configuration>
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
</startup>
<system.serviceModel>
<extensions>
<!-- In this extension section we are introducing all known service bus extensions. User can remove the ones they don't need. -->
<behaviorExtensions>
<add name="connectionStatusBehavior"
type="Microsoft.ServiceBus.Configuration.ConnectionStatusElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="transportClientEndpointBehavior"
type="Microsoft.ServiceBus.Configuration.TransportClientEndpointBehaviorElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="serviceRegistrySettings"
type="Microsoft.ServiceBus.Configuration.ServiceRegistrySettingsElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</behaviorExtensions>
<bindingElementExtensions>
<add name="netMessagingTransport"
type="Microsoft.ServiceBus.Messaging.Configuration.NetMessagingTransportExtensionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="tcpRelayTransport"
type="Microsoft.ServiceBus.Configuration.TcpRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="httpRelayTransport"
type="Microsoft.ServiceBus.Configuration.HttpRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="httpsRelayTransport"
type="Microsoft.ServiceBus.Configuration.HttpsRelayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="onewayRelayTransport"
type="Microsoft.ServiceBus.Configuration.RelayedOnewayTransportElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</bindingElementExtensions>
<bindingExtensions>
<add name="basicHttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.BasicHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="webHttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.WebHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="ws2007HttpRelayBinding"
type="Microsoft.ServiceBus.Configuration.WS2007HttpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netTcpRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetTcpRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netOnewayRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetOnewayRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netEventRelayBinding"
type="Microsoft.ServiceBus.Configuration.NetEventRelayBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add name="netMessagingBinding"
type="Microsoft.ServiceBus.Messaging.Configuration.NetMessagingBindingCollectionElement, Microsoft.ServiceBus, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</bindingExtensions>
</extensions>
</system.serviceModel>
<appSettings>
<!-- Service Bus specific app settings for messaging connections -->
<add key="Microsoft.ServiceBus.ConnectionString"
value="Endpoint=sb://tolltest.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=V93mgRhRp0d1FkslcsyjOZNLjo5iSZ730wJuWbZIbS8="/>
<add key="storageAccountName"
value="dojoeventhubs"/>
<add key="storageAccountKey"
value="lrrS7WkjginKovVFS9E3J8JmYJRnEj6bsz7hGymEqwfqmbt31h5GmQwE9+SiVSC3NPQZ+FhYLtkbTkJxOBbTrg=="/>
</appSettings>
</configuration>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
<assemblyIdentity version="1.0.0.0" name="MyApplication.app"/>
<trustInfo xmlns="urn:schemas-microsoft-com:asm.v2">
<security>
<requestedPrivileges xmlns="urn:schemas-microsoft-com:asm.v3">
<requestedExecutionLevel level="asInvoker" uiAccess="false"/>
</requestedPrivileges>
</security>
</trustInfo>
</assembly>
<?xml version="1.0" encoding="utf-8"?>
<doc>
<assembly>
<name>Microsoft.ServiceBus.Messaging.EventProcessorHost</name>
</assembly>
<members>
<member name="T:Microsoft.ServiceBus.Messaging.EventProcessorHost">
<summary>Represents a host for processing Event Hubs event data.</summary>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.#ctor(System.String,System.String,System.String,System.String,System.String)">
<summary>Initializes a new instance of the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> class.</summary>
<param name="hostName">The name of the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance. This name must be unique for each instance of the host.</param>
<param name="eventHubPath">The path to the Event Hub from which to start receiving event data.</param>
<param name="consumerGroupName">The name of the Event Hubs consumer group from which to start receiving event data.</param>
<param name="eventHubConnectionString">The connection string for the Event Hub.</param>
<param name="storageConnectionString">The connection string for the Azure Blob storage account to use for partition distribution.</param>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.#ctor(System.String,System.String,System.String,System.String,System.String,System.String)">
<summary>Initializes a new instance of the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> class.</summary>
<param name="hostName">The name of the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance. This name must be unique for each instance of the host.</param>
<param name="eventHubPath">The path to the Event Hub from which to start receiving event data.</param>
<param name="consumerGroupName">The name of the Event Hubs consumer group from which to start receiving event data.</param>
<param name="eventHubConnectionString">The connection string for the Event Hub.</param>
<param name="storageConnectionString">The connection string for the Azure Blob storage account to use for partition distribution.</param>
<param name="leaseContainerName">The name of the Azure Blob container in which all lease blobs are created. If this parameter is not supplied, then the Event Hubs path is used as the name of the Azure Blob container.</param>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.EventProcessorHost.HostName">
<summary>Gets the host name, which is a unique name for the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance.</summary>
<returns>The host name.</returns>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.EventProcessorHost.PartitionManagerOptions">
<summary>Gets or sets the <see cref="T:Microsoft.ServiceBus.Messaging.PartitionManagerOptions" /> instance used by the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> object.</summary>
<returns>The <see cref="T:Microsoft.ServiceBus.Messaging.PartitionManagerOptions" /> instance.</returns>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.RegisterEventProcessorAsync``1">
<summary>Asynchronously registers the <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" /> interface implementation with the host using the <see cref="T:Microsoft.ServiceBus.Messaging.DefaultEventProcessorFactory`1" /> factory. This method also starts the host and enables it to start participating in the partition distribution process.</summary>
<returns>A task indicating that the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance has started.</returns>
<typeparam name="T">Implementation of your application-specific <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" />.</typeparam>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.RegisterEventProcessorAsync``1(Microsoft.ServiceBus.Messaging.EventProcessorOptions)">
<summary>Asynchronously registers the <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" /> interface implementation with the host using the <see cref="T:Microsoft.ServiceBus.Messaging.DefaultEventProcessorFactory`1" /> factory. This method also starts the host and enables it to start participating in the partition distribution process.</summary>
<returns>A task indicating that the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance has started.</returns>
<param name="processorOptions">An <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorOptions" /> object that controls various aspects of the event pump created when ownership is acquired for a given Event Hubs partition.</param>
<typeparam name="T">Implementation of your application-specific <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" />.</typeparam>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.RegisterEventProcessorFactoryAsync(Microsoft.ServiceBus.Messaging.IEventProcessorFactory)">
<summary>Asynchronously registers the event processor factory.</summary>
<returns>The task representing the asynchronous operation.</returns>
<param name="factory">The factory to register.</param>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.RegisterEventProcessorFactoryAsync(Microsoft.ServiceBus.Messaging.IEventProcessorFactory,Microsoft.ServiceBus.Messaging.EventProcessorOptions)">
<summary>Asynchronously registers the event processor factory.</summary>
<returns>Returns <see cref="T:System.Threading.Tasks.Task" />.</returns>
<param name="factory">The factory to register.</param>
<param name="processorOptions">An <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorOptions" /> object that controls various aspects of the event pump created when ownership is acquired for a given Event Hubs partition.</param>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.EventProcessorHost.UnregisterEventProcessorAsync">
<summary>Asynchronously shuts down the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance. This method maintains the leases on all partitions currently held, and enables each <see cref="T:Microsoft.ServiceBus.Messaging.IEventProcessor" /> instance to shut down cleanly by invoking the <see cref="M:Microsoft.ServiceBus.Messaging.IEventProcessor.CloseAsync(Microsoft.ServiceBus.Messaging.PartitionContext,Microsoft.ServiceBus.Messaging.CloseReason)" /> method with a <see cref="F:Microsoft.ServiceBus.Messaging.CloseReason.Shutdown" /> object.</summary>
<returns>A task that indicates the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance has stopped.</returns>
</member>
<member name="T:Microsoft.ServiceBus.Messaging.PartitionManagerOptions">
<summary>Represents the options that control various aspects of partition distribution that occur within the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance.</summary>
</member>
<member name="M:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.#ctor">
<summary>Initializes a new instance of the <see cref="T:Microsoft.ServiceBus.Messaging.PartitionManagerOptions" /> class.</summary>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.AcquireInterval">
<summary>Gets or sets the interval at which the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance begins a task to determine whether partitions are distributed evenly among known host instances.</summary>
<returns>The acquire interval of the partition.</returns>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.DefaultOptions">
<summary>Creates an instance of <see cref="P:Microsoft.ServiceBus.Messaging.EventProcessorHost.PartitionManagerOptions" /> with the following default values:<see cref="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.RenewInterval" />: 10 seconds.<see cref="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.AcquireInterval" />: 10 seconds.<see cref="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.LeaseInterval" />: 30 seconds. </summary>
<returns>The default partition manager options.</returns>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.LeaseInterval">
<summary>Gets or sets the interval at which the lease is created on an Azure Blob representing an Event Hubs partition. If the lease is not renewed within this interval, it expires, and ownership of the partition passes to another <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance.</summary>
<returns>Returns <see cref="T:System.TimeSpan" />.</returns>
</member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.MaxReceiveClients"></member>
<member name="P:Microsoft.ServiceBus.Messaging.PartitionManagerOptions.RenewInterval">
<summary>Gets or sets the renewal interval for all leases for partitions currently held by the <see cref="T:Microsoft.ServiceBus.Messaging.EventProcessorHost" /> instance.</summary>
<returns>The interval to renew the partition.</returns>
</member>
</members>
</doc>
# Building a Real-time Sentiment Pipeline for Live Tweets using Python, R, & Azure
## Requirements
* Twitter Account + Twitter App setup (https://apps.twitter.com/)
* Anaconda 3.5 or Python 3.5 Installed
* Azure subscription or free trial account
* [30 day free trial](https://azure.microsoft.com/en-us/pricing/free-trial/)
* Azure Machine Learning Studio workspace
* Text editor (I'll be using Sublime Text 3)
* GitHub.com account (to get the code)
* PowerBI.com account (for the Dashboard portion)
* .NET (up to date) + Windows (for the testing portion)
## Cloning the Repo for Code & Materials
```
git clone https://www.github.com/datasciencedojo/meetup.git
```
Folder: Building a Real-time Sentiment Pipeline for Live Tweets using Python, R, & Azure
## The Predictive Model
### Supervised Twitter Dataset
* Azure ML Reader Module:
* Data source: Azure Blob Storage
* Authentication type: PublicOrSAS
* URI: http://azuremlsampleexperiments.blob.core.windows.net/datasets/Sentiment140.tenPercent.sample.tweets.tsv
* File format: TSV
* URI has header row: Checked
* Import and save the dataset (a minimal R sketch for reading the same file locally follows this list)
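If you want to explore the same file outside of the Azure ML Reader module, here is a minimal R sketch for reading it locally, assuming the public URI above is still reachable and that the columns are named `sentiment_label` and `tweet_text` as in the experiment:

```
# Read the Sentiment140 10% sample straight from the public blob URI
uri <- "http://azuremlsampleexperiments.blob.core.windows.net/datasets/Sentiment140.tenPercent.sample.tweets.tsv"
tweets <- read.delim(uri, sep = "\t", quote = "", header = TRUE,
                     stringsAsFactors = FALSE)
str(tweets)
```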
### Preprocessing & Cleaning
* Azure ML Metadata Editor: Cast categorical sentiment_label
* Azure ML Group Categorical Values: Casting '0' as Negative, '4' as Positive (an equivalent R recode is sketched after this list)
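For reference, the experiment does this with the two modules above, but the same cast/recode can be sketched in plain R, continuing the local `tweets` sketch and assuming the raw labels are 0 and 4 as in Sentiment140:

```
# Cast the 0/4 label to a categorical Negative/Positive factor
tweets$sentiment_label <- factor(tweets$sentiment_label,
                                 levels = c(0, 4),
                                 labels = c("Negative", "Positive"))
table(tweets$sentiment_label)
```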
### Text Processing
* Filtering using R (a local R sketch of these steps appears at the end of this section)
* Removing stop words (using a stop word list)
* Removing special characters
* Replacing numbers
* Converting all text to lower case
* Stemming and lemmatization
[Example of Cleansing Stop Words](http://demos.datasciencedojo.com/demo/stopwords/)
* Create a term frequency matrix for English words
* Azure ML's [Feature Hashing Module](https://msdn.microsoft.com/library/azure/c9a82660-2d9c-411d-8122-4d9e0b3ce92a)
* Drop the tweet_text column, since it is no longer needed
* Azure ML's Project Columns module
* Feature Selection & Filtering
* Pick only the X most relevant columns/words to train on.
* Using Azure ML's [Filter Based Feature Selection](https://msdn.microsoft.com/library/azure/818b356b-045c-412b-aa12-94a1d2dad90f) module, set to Pearson's correlation, select the top 5000 most correlated columns
* Normalize the Term Frequency Matrix
* A text processing best practice, though it does not matter too much for tweets
* Normalize Data Module: Min/Max for all numeric columns
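A rough local sketch of the text processing steps above, using the `tm` and `SnowballC` packages and continuing the `tweets` sketch. This is only an illustration; the experiment itself uses `src/text.preprocessing.R` plus the Feature Hashing, Filter Based Feature Selection, and Normalize Data modules:

```
# install.packages(c("tm", "SnowballC"))
library(tm)
library(SnowballC)

# Clean the tweet text
corpus <- VCorpus(VectorSource(tweets$tweet_text))
corpus <- tm_map(corpus, content_transformer(tolower))       # lower case
corpus <- tm_map(corpus, removePunctuation)                   # special characters
corpus <- tm_map(corpus, removeNumbers)                       # numbers
corpus <- tm_map(corpus, removeWords, stopwords("english"))   # default stop words
corpus <- tm_map(corpus, stemDocument)                        # stemming
corpus <- tm_map(corpus, stripWhitespace)

# Term frequency matrix, trimmed to reasonably frequent terms
dtm <- removeSparseTerms(DocumentTermMatrix(corpus), 0.999)
tf  <- as.data.frame(as.matrix(dtm))

# Min/max normalize every numeric column
tf <- as.data.frame(lapply(tf, function(x) {
  if (max(x) == min(x)) x else (x - min(x)) / (max(x) - min(x))
}))
```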
### Algorithm Selection
* [Algorithm Cheat Sheet](https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/)
* [Beginner's Guide to Choosing Algorithms](https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-choice/)
* [Azure ML's Support Vector Machines](https://msdn.microsoft.com/en-us/library/azure/dn905835.aspx)
* [Support Vector Machines in General](https://en.wikipedia.org/wiki/Support_vector_machine)
### Model Building
* Train the model
* Score the trained model against a validation set
* Evaluate the performance, maximizing accuracy in this case (a small accuracy sketch in R follows this list)
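As a reference point for the evaluation step, accuracy can be computed from a scored validation set in a couple of lines of R. This is only a sketch: `predicted_labels` and `actual_labels` are hypothetical vectors standing in for the scored output.

```
# Hypothetical predicted/actual label vectors from a scored validation set
conf.mat <- table(predicted = predicted_labels, actual = actual_labels)
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
accuracy
```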
### Twitter App
* [Creating a Twitter Account](https://www.hashtags.org/platforms/twitter/how-to-create-a-twitter-account/)
* [Creating a Twitter App](http://www.ning.com/help/?p=4955)
* Get your [Twitter app's](https://apps.twitter.com/) OAuth keys and tokens.
### Twitter API with Python
* [Twitter API for all languages](https://dev.twitter.com/overview/api/twitter-libraries)
* [Tweepy Python Package](https://github.com/tweepy/tweepy)
* [Streaming with Tweepy](http://tweepy.readthedocs.org/en/v3.2.0/streaming_how_to.html?highlight=stream)
### Azure Event Hub
* Create a Service Bus Namespace
* Create an Azure Event Hub
* Create a send key (to push data to)
* Create a manage key (stream processor)
* Create a listen key (to subscribe to)
* [Pushing to Azure Event Hub](http://azure-sdk-for-python.readthedocs.org/en/latest/servicebus.html)
* [Viewing inside of an Azure Event Hub](https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/)
## Deploy the Model
## Hook up Stream Processors
import tweepy
# import json
# my keys
consumer_token = ''
consumer_secret = ''
key = ''
secret = ''
auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
auth.set_access_token(key, secret)
api = tweepy.API(auth)
api.verify_credentials()
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    def on_data(self, twitter_data):
        print(twitter_data)
        # tweetJSON = json.loads(twitter_data)
        # print(tweetJSON['text'].encode("utf-8"))

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
myStream.sample(async=False, languages=['en'])
# Intro to Business Data Analysis with Excel
GitHub Repository for the 03/08/2017 Meetup titled "[Business Data Analysis with Excel](https://www.meetup.com/data-science-dojo/events/236198327/)".
These materials make extensive use of the examples documented in the book "[Making Sense of Data](https://www.amazon.com/Making-Sense-Data-Donald-Wheeler/dp/0945320728/)" by Donald J. Wheeler. This book is highly recommended to all Data/Business Analysts interested in expanding the rigor of their analyses.
# datasets
A public repo of datasets
---
title: "Work and fun in Data Science Dojo"
author: your name
date:
output:
pdf_document:
toc: true
---
[linked phrase](http://datasciencedojo.com/)
# My story of Titanic tragedy
## Obtain the data
<!-- You may want to load data here -->
## Overview of the data
<!-- You may want to do the preliminary exploration of the data, using str(), summary(), head(), class(), etc. -->
<!-- Also write down your impressions of the data -->
## Modification of the original data
<!-- You can revise the data you got. -->
<!-- For example: if you feel the feature Survived would be better as a factor, you can do something like: titanic$Survived = factor(titanic$Survived, labels=c("died", "survived")) -->
## First plot of Titanic data
<!-- Make your first plot of Titanic data, and write down what you see from the plot. -->
<!-- Feel free to revise the headers to make this storybook nicer. -->
## Second plot of Titanic data
<!-- Make the 2nd, 3rd, and 4th plots from here. They don't need to be many, but try to make every single one telling. -->
## Your summary of the Titanic data (story of Titanic tragedy)
* First...
* Second...
* Third...
* Fourth...
# Another course in Data Science Dojo
<!-- Keep adding your note, code and thoughts during the bootcamp! -->
# Important contacts in DSD bootcamp
* Raja Iqbal (Instructor)
[email protected]
* Jasmine Wilkerson (Instructor)
[email protected]
* Phuc Duong (Instructor)
[email protected]
* Yuhui Zhang (Instructor)
[email protected]
* Lisa Nicholson
[email protected]
#=======================================================================================
#
# File: CustomerQuery.R
# Author: Dave Langer
# Description: This code illustrates querying a SQL Server database via the RODBC
# package for the "Introduction to R Visualization with Power BI " Meetup
# dated 03/15/2017. More details on the Meetup are available at:
#
# https://www.meetup.com/Data-Science-Dojo-Toronto/events/237952698/
#
# The code in this file leverages data from Microsoft's Wide World
# Importers sample database available at:
#
# https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0
#
# NOTE - This file is provided "As-Is" and no warranty regarding its contents is
# offered nor implied. USE AT YOUR OWN RISK!
#
#=======================================================================================
# Uncomment and run these lines of code to install required packages
#install.packages("RODBC")
library(RODBC)
# Open connection using Windows ODBC DSN
dbhandle <- odbcConnect("RConnection")
# Query database for a denormalized view of customer sales data
dataset <- sqlQuery(dbhandle,
"SELECT [C].[CustomerID]
,[C].[CustomerName]
,[C].[BuyingGroupID]
,[C].[DeliveryMethodID]
,[C].[DeliveryCityID]
,[C].[DeliveryAddressLine1]
,[C].[DeliveryAddressLine2]
,[CITY].[CityName]
,[P].[StateProvinceCode]
,[C].[DeliveryPostalCode]
,[CC].[CustomerCategoryName]
,[BG].[BuyingGroupName]
,[O].[OrderID]
,[O].[OrderDate]
,[OL].[OrderLineID]
,[OL].[Quantity]
,[OL].[UnitPrice]
,[OL].[Quantity] * [OL].[UnitPrice] AS [LineTotal]
,[SC].[SupplierCategoryName]
FROM [WideWorldImporters].[Sales].[Customers] C
INNER JOIN [WideWorldImporters].[Sales].[CustomerCategories] CC ON ([C].[CustomerCategoryID] = [CC].[CustomerCategoryID])
LEFT OUTER JOIN [WideWorldImporters].[Sales].[BuyingGroups] BG ON ([C].[BuyingGroupID] = [BG].[BuyingGroupID])
INNER JOIN [WideWorldImporters].[Sales].[Orders] O ON ([C].[CustomerID] = [O].[CustomerID])
INNER JOIN [WideWorldImporters].[Sales].[OrderLines] OL ON ([O].[OrderID] = [OL].[OrderID])
INNER JOIN [WideWorldImporters].[Warehouse].[StockItems] SI ON ([OL].[StockItemID] = [SI].[StockItemID])
INNER JOIN [WideWorldImporters].[Purchasing].[Suppliers] S ON ([SI].[SupplierID] = [S].[SupplierID])
INNER JOIN [WideWorldImporters].[Purchasing].[SupplierCategories] SC ON ([S].[SupplierCategoryID] = [SC].[SupplierCategoryID])
INNER JOIN [WideWorldImporters].[Application].[Cities] CITY ON ([C].[DeliveryCityID] = [CITY].[CityID])
INNER JOIN [WideWorldImporters].[Application].[StateProvinces] P ON ([CITY].[StateProvinceID] = [P].[StateProvinceID])",
stringsAsFactors = FALSE)
#Close DB connection
odbcClose(dbhandle)
# Save off data frame in .RData binary format
save(dataset, file = "CustomerData.RData")
#=======================================================================================
#
# File: CustomerVisualizations.R
# Author: Dave Langer
# Description: This code illustrates R visualizations used in the "Introduction to R
# Visualization with Power BI" Meetup dated 03/15/2017. More details on
# the Meetup are available at:
#
# https://www.meetup.com/Data-Science-Dojo-Toronto/events/237952698/
#
# The code in this file leverages data from Microsoft's Wide World
# Importers sample Data Warehouse available at:
#
# https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0
#
# NOTE - This file is provided "As-Is" and no warranty regarding its contents is
# offered nor implied. USE AT YOUR OWN RISK!
#
#=======================================================================================
# Uncomment and run these lines of code to install required packages
#install.packages("dplyr")
#install.packages("lubridate")
#install.packages("ggplot2")
#install.packages("scales")
#install.packages("qcc")
# NOTE - Change your working directory as needed
load("CustomerData.RData")
# Preprocessing to make dataset look like Power BI
library(dplyr)
library(lubridate)
dataset <- dataset %>%
mutate(Year = year(dataset$OrderDate),
Month = month(dataset$OrderDate, label = TRUE))
#=============================================================================
#
# Visualization #1 - Aggregated dynamic bar charts by Customer Category
#
#=============================================================================
library(dplyr)
library(ggplot2)
library(scales)
# Get total revenue by Buying Group, Supplier Category and Customer Category
customer.categories <- dataset %>%
group_by(BuyingGroupName, SupplierCategoryName, CustomerCategoryName) %>%
summarize(TotalRevenue = sum(LineTotal))
# Aggregate data across all supplier categories
all.suppliers <- dataset %>%
group_by(BuyingGroupName, CustomerCategoryName) %>%
summarize(TotalRevenue = sum(LineTotal))
all.suppliers$SupplierCategoryName <- "All Suppliers"
# Add aggregated data
customer.categories <- rbind(customer.categories,
all.suppliers)
# Format visualization title string dynamically
title.str.1 <- paste("Total Revenue for",
dataset$Year[1],
"by Buying Group and Supplier/Customer Categories for",
nrow(dataset),
"Rows of Data",
sep = " ")
# Plot
ggplot(customer.categories, aes(x = CustomerCategoryName, y = TotalRevenue, fill = BuyingGroupName)) +
theme_bw() +
coord_flip() +
facet_grid(BuyingGroupName ~ SupplierCategoryName) +
geom_bar(stat = "identity") +
scale_y_continuous(labels = comma) +
theme(text = element_text(size = 18),
axis.text.x = element_text(size = 12, angle=90, hjust=1)) +
labs(x = "Customer Category",
y = "Total Revenue",
fill = "Buying Group",
title = title.str.1)
#=============================================================================
#
# Visualization #2 - Aggregated Process Behavior Charts
#
#=============================================================================
# Add artificial filtering for example
dataset <- dataset %>%
filter(is.na(BuyingGroupName) &
(Year == 2013 | Year == 2014))
# Power BI code starts here
library(dplyr)
library(qcc)
# Grab year variables
Year1 <- min(dataset$Year)
Year2 <- max(dataset$Year)
# Accumulate totals
totals <- dataset %>%
filter(Year == Year1| Year == Year2 ) %>%
mutate(Month = substr(Month, 1, 3),
MonthNum = match(Month, month.abb)) %>%
group_by(Year, MonthNum, Month) %>%
summarize(TotalRevenue = sum(LineTotal)) %>%
mutate(Label = paste(Month, Year, sep = "-")) %>%
arrange(Year, MonthNum)
# Make labels pretty with dummy vars
Revenue.Group.1 <- totals$TotalRevenue[1:12]
Revenue.Group.2 <- totals$TotalRevenue[13:24]
title.str <- paste("Process Behavior Chart - ", Year1, " and ", Year2, " ",
dataset$CustomerCategoryName[1], " Total Revenue for Buying Group '",
dataset$BuyingGroupName[1], "'", sep = "")
# Plot
blank.super.qcc <- qcc(Revenue.Group.1, type = "xbar.one",
newdata = Revenue.Group.2,
labels = totals$Label[1:12],
newlabels = totals$Label[13:24],
title = title.str,
ylab = "Total Revenue", xlab = "Month-Year")
# Introduction to R Visualizations in Microsoft Power BI
GitHub Repository for the 03/15/2017 and 04/05/2017 Meetups titled "Introduction to R Visualizations in Microsoft Power BI". First held in [Toronto](https://www.meetup.com/Data-Science-Dojo-Toronto/events/237952698/) and subsequently in [Redmond](https://www.meetup.com/data-science-dojo/events/237941790/).
These materials make extensive use of Microsoft's [Wide World Importers](https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0) SQL Server 2016 sample database.
Additionally, the following are required to use the files for the Meetup:
* [Power BI Desktop](https://www.microsoft.com/en-us/download/details.aspx?id=45331)
* [The R programming language](https://cran.rstudio.com/)
* The [dplyr](https://cran.r-project.org/web/packages/dplyr/index.html), [lubridate](https://cran.r-project.org/web/packages/lubridate/index.html), [ggplot2](https://cran.r-project.org/web/packages/ggplot2/index.html), [scales](https://cran.r-project.org/web/packages/scales/index.html), and [qcc](https://cran.r-project.org/web/packages/qcc/index.html) packages.
While not required, [RStudio](https://www.rstudio.com/products/rstudio/download/) is highly recommended.
#=======================================================================================
#
# File: IntroToMachineLearning.R
# Author: Dave Langer
# Description: This code illustrates the usage of the caret package for the "An
# Introduction to Machine Learning with R and Caret" Meetup dated
# 06/07/2017. More details on the Meetup are available at:
#
# https://www.meetup.com/data-science-dojo/events/239730653/
#
# NOTE - This file is provided "As-Is" and no warranty regarding its contents is
# offered nor implied. USE AT YOUR OWN RISK!
#
#=======================================================================================
#install.packages(c("e1071", "caret", "doSNOW", "ipred", "xgboost"))
library(caret)
library(doSNOW)
#=================================================================
# Load Data
#=================================================================
train <- read.csv("train.csv", stringsAsFactors = FALSE)
View(train)
#=================================================================
# Data Wrangling
#=================================================================
# Replace missing embarked values with mode.
table(train$Embarked)
train$Embarked[train$Embarked == ""] <- "S"
# Add a feature for tracking missing ages.
summary(train$Age)
train$MissingAge <- ifelse(is.na(train$Age),
"Y", "N")
# Add a feature for family size.
train$FamilySize <- 1 + train$SibSp + train$Parch
# Set up factors.
train$Survived <- as.factor(train$Survived)
train$Pclass <- as.factor(train$Pclass)
train$Sex <- as.factor(train$Sex)
train$Embarked <- as.factor(train$Embarked)
train$MissingAge <- as.factor(train$MissingAge)
# Subset data to features we wish to keep/use.
features <- c("Survived", "Pclass", "Sex", "Age", "SibSp",
"Parch", "Fare", "Embarked", "MissingAge",
"FamilySize")
train <- train[, features]
str(train)
#=================================================================
# Impute Missing Ages
#=================================================================
# Caret supports a number of mechanisms for imputing (i.e.,
# predicting) missing values. Leverage bagged decision trees
# to impute missing values for the Age feature.
# First, transform all features to dummy variables.
dummy.vars <- dummyVars(~ ., data = train[, -1])
train.dummy <- predict(dummy.vars, train[, -1])
View(train.dummy)
# Now, impute!
pre.process <- preProcess(train.dummy, method = "bagImpute")
imputed.data <- predict(pre.process, train.dummy)
View(imputed.data)
train$Age <- imputed.data[, 6]
View(train)
#=================================================================
# Split Data
#=================================================================
# Use caret to create a 70%/30% split of the training data,
# keeping the proportions of the Survived class label the
# same across splits.
set.seed(54321)
indexes <- createDataPartition(train$Survived,
times = 1,
p = 0.7,
list = FALSE)
titanic.train <- train[indexes,]
titanic.test <- train[-indexes,]
# Examine the proportions of the Survived class label across
# the datasets.
prop.table(table(train$Survived))
prop.table(table(titanic.train$Survived))
prop.table(table(titanic.test$Survived))
#=================================================================
# Train Model
#=================================================================
# Set up caret to perform 10-fold cross validation repeated 3
# times and to use a grid search for optimal model hyperparameter
# values.
train.control <- trainControl(method = "repeatedcv",
number = 10,
repeats = 3,
search = "grid")
# Leverage a grid search of hyperparameters for xgboost. See
# the following presentation for more information:
# https://www.slideshare.net/odsc/owen-zhangopen-sourcetoolsanddscompetitions1
tune.grid <- expand.grid(eta = c(0.05, 0.075, 0.1),
nrounds = c(50, 75, 100),
max_depth = 6:8,
min_child_weight = c(2.0, 2.25, 2.5),
colsample_bytree = c(0.3, 0.4, 0.5),
gamma = 0,
subsample = 1)
View(tune.grid)
# Use the doSNOW package to enable caret to train in parallel.
# While there are many package options in this space, doSNOW
# has the advantage of working on both Windows and Mac OS X.
#
# Create a socket cluster using 10 processes.
#
# NOTE - Tune this number based on the number of cores/threads
# available on your machine!!!
#
cl <- makeCluster(10, type = "SOCK")
# Register cluster so that caret will know to train in parallel.
registerDoSNOW(cl)
# Train the xgboost model using 10-fold CV repeated 3 times
# and a hyperparameter grid search to train the optimal model.
caret.cv <- train(Survived ~ .,
data = titanic.train,
method = "xgbTree",
tuneGrid = tune.grid,
trControl = train.control)
stopCluster(cl)
# Examine caret's processing results
caret.cv
# Make predictions on the test set using an xgboost model
# trained on all 625 rows of the training set with the
# optimal hyperparameter values found by the grid search.
preds <- predict(caret.cv, titanic.test)
# Use caret's confusionMatrix() function to estimate the
# effectiveness of this model on unseen, new data.
confusionMatrix(preds, titanic.test$Survived)
# An Introduction to Machine Learning with R and caret
GitHub Repository for the 06/07/2017 Meetup titled "An Introduction to Machine Learning with R and caret". First held in [Redmond, WA](https://www.meetup.com/data-science-dojo/events/239730653/).
These materials make use of the data from Kaggle's [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic) competition.
Additionally, the following are required to use the files for the Meetup:
* [The R programming language](https://cran.rstudio.com/)
* While not required, [RStudio](https://www.rstudio.com/products/rstudio/download/) is highly recommended.
* The [e1071](https://cran.r-project.org/web/packages/e1071/index.html), [caret](https://cran.r-project.org/web/packages/caret/index.html), [doSNOW](https://cran.r-project.org/web/packages/doSNOW/index.html), [ipred](https://cran.r-project.org/web/packages/ipred/index.html), and [xgboost](https://cran.r-project.org/web/packages/xgboost/index.html) packages.
#Set working directory
setwd("C:\\Users\\user4\\Documents\\Mithun")
# Create new image and Save it
save.image("./randomforest4.Rdata")
# Install packages (if necessary, install using the 'Packages' tab)
install.packages("randomForest")
install.packages("caret")
install.packages("rpart")
# clear all the variables
#rm(list=ls())
train=read.csv("./data/train.csv")
test=read.csv("./data/test.csv")
head(test)
# Add "Survived" column to test, to help combine with train data
test$Survived=NA
# Combine train and test
combi=rbind(train,test)
# Convert names to character
combi$Name<-as.character(combi$Name)
# Split 'Name' to isolate a person's title using strsplit
strsplit(combi$Name[1],split='[,.]')
# test of how strsplit works
strsplit(combi$Name[1],split='[,.]')[[1]]
strsplit(combi$Name[1],split='[,.]')[[1]][2]
# apply function to dataset
# This will isolate title for all rows
combi$Title <- sapply(combi$Name, FUN=function(x){strsplit(x,split='[,.]')[[1]][2]})
# remove empty spaces from the 'Title' field
combi$Title=gsub(' ','',combi$Title)
# Review contents of 'Title' field
table(combi$Title)
# Reduce 'Title' contents into fewer categories
combi$Title[combi$Title %in% c('Mme', 'Mlle')] <- 'Mlle'
combi$Title[combi$Title %in% c('Capt', 'Don', 'Major', 'Sir')] <- 'Sir'
combi$Title[combi$Title %in% c('Dona', 'Lady', 'the Countess', 'Jonkheer')] <- 'Lady'
# Change Title to a factor
combi$Title <- factor(combi$Title)
# Combine sibling and parent/child variables into FamilySize variable
combi$FamilySize <- combi$SibSp + combi$Parch + 1
# Identifying families by combining last name and family size
# # identify surname
combi$Surname <- sapply(combi$Name, FUN=function(x) {strsplit(x, split='[,.]')[[1]][1]})
# # combine with family size
combi$FamilyID <- paste(as.character(combi$FamilySize), combi$Surname, sep="")
# Categorize family sizes of 2 or less as 'Small'
combi$FamilyID[combi$FamilySize <= 2] <- 'Small'
# Review results
table(combi$FamilyID)
# Further consolidate results (some families may have different last names)
famIDs <- data.frame(table(combi$FamilyID))
famIDs <- famIDs[famIDs$Freq <= 2,]
combi$FamilyID[combi$FamilyID %in% famIDs$Var1] <- 'Small'
combi$FamilyID <- factor(combi$FamilyID)
# Splitting this new dataset back into train and test datasets
train <- combi[1:891,]
test <- combi[892:1309,]
# PRESENTATION START
# PRESENTATION START
# The "Age" variable has a few missing values
# To use the randomForest package in R, there should be no missing values
# A quick way of dealing with missing values is to replace them with either the mean or the median of the non-missing values for the variable
# In this example, we are replacing the missing values with a prediction, using decision trees.
library(rpart)
Agefit <- rpart(Age ~ Pclass + Sex + SibSp + Parch + Fare + Embarked + Title + FamilySize,data=combi[!is.na(combi$Age),], method="anova")
combi$Age[is.na(combi$Age)] <- predict(Agefit, combi[is.na(combi$Age),])
# Check for other missing variables
## 'Embarked' has two blank values
## They are identified using the "which" command
which(combi$Embarked == '')
## rows 62 and 830 have the blank values for Embarked
## they are replaced with the mode of all the values for Embarked, which is 'S'
combi$Embarked[c(62,830)] = "S"
## convert 'Embarked' to a factor
combi$Embarked <- factor(combi$Embarked)
##'Fare' has one NA value
which(is.na(combi$Fare))
#Replace with median
combi$Fare[1044] <- median(combi$Fare, na.rm=TRUE)
## All missing values are taken care of now
# Random Forests in R can only digest factors with up to 32 levels
# If any factor variable has more than 32 levels, the levels need to be redefined to be <= 32 or the variable needs to be converted into a continuous one
# This example will redefine the levels
str(combi$FamilyID)
##increase the definition of Small from 2 to 3
combi$FamilyID2 <- combi$FamilyID
combi$FamilyID2 <- as.character(combi$FamilyID2)
combi$FamilyID2[combi$FamilySize <= 3] <- 'Small'
combi$FamilyID2 <- factor(combi$FamilyID2)
# split the dataset into training and test
train <- combi[1:891,]
test <- combi[892:1309,]
# installing the package
#install.packages('randomForest')
library(randomForest)
# to ensure reproducible results, use the set.seed function
# this will give you the same results every time you run the code
# the number inside is not important
set.seed(415)
fit <- randomForest(as.factor(Survived) ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked + Title + FamilySize +FamilyID2, data=train, importance=TRUE, ntree=2000)
## 'importance=TRUE' allows us to inspect variable importance
## ntree enables specifying how many trees we want to grow
### ALSO LOOK AT NODESIZE AND SAMPSIZE TO SIMPLIFY THE TREE, IN ORDER TO REDUCE COMPLEXITY
# Look at what variables are important
varImpPlot(fit)
## define accuracy and gini
### MeanDecreaseAccuracy: tells us how much accuracy decreases when the variable on the Y-axis is removed.
### 'Title' causes the largest decrease and is therefore the most predictive in nature
### MeanDecreaseGini: measures how pure the terminal nodes are.
### Again, the plot shows the decrease in the Gini value after removing each variable.
### Variable with the highest value has the highest predictive power
### "Title" variable is top for both measures
# Performance Evaluation
## Confusion Matrix
## The 'fit' object contains several components
names(fit)
## to review the confusion matrix
fit[5]
fit$confusion
## Confidence Interval for Accuracy
set.seed(121)
library(caret)
confusionMatrix(fit$predicted,train$Survived)
## Area under the curve
###roc(train$Survived,as.integer(fit$predicted),plot = TRUE,smooth=TRUE)
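## A minimal AUC sketch using the pROC package (an assumption: pROC must be
## installed separately; fit$votes holds the out-of-bag vote fractions per class)
# install.packages("pROC")
library(pROC)
roc.obj <- roc(response = train$Survived, predictor = fit$votes[, 2])
auc(roc.obj)
plot(roc.obj)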
# Tuning the Model
# creating a new data.frame to contain just the predictors necessary and not all the columns
# in the original training dataset
train1=data.frame(Pclass = train$Pclass,Survived =train$Survived, Sex=train$Sex,Age=train$Age,SibSp=train$SibSp,Parch=train$Parch,
Fare =train$Fare, Embarked=train$Embarked,Title=train$Title,FamilySize=train$FamilySize,FamilyID2=train$FamilyID2)
# tune to get best value of mtry
set.seed(121)
tunefit=train(as.factor(Survived)~ ., data=train1,method="rf",metric="Accuracy",tuneGrid=data.frame(mtry=c(2,3,4)))
tunefit
# Prediction
prediction=predict(tunefit, newdata=test)
head(prediction)
save.image("./randomforest4.Rdata")
# install.packages("rvest")
library(rvest)
library(stringr)
#################################################################################
# ingress
#################################################################################
# scrape date, now
now <- Sys.time()
# url to scrape, then download page
url <- "https://www.newegg.com/Desktop-Graphics-Cards/SubCategory/ID-48"
webpage <- read_html(url)
#################################################################################
# parsing elements
#################################################################################
############
# feature: card name
############
card_name <- webpage %>% html_nodes(".item-title") %>% html_text()
################
# feature: current price
################
cur_price <- webpage %>% html_nodes(".price-current strong") %>% html_text()
################
# feature: brand
################
brand <- webpage %>% html_nodes(".item-brand img") %>% html_attr("title")
################
# feature: shipping
################
shipping <- webpage %>% html_nodes(".price-ship") %>% html_text(trim=TRUE)
shipping <- str_replace_all(string = shipping, pattern = " Shipping", replacement = "")
#################################################################################
# data binding
#################################################################################
graphics_cards <- as.data.frame(card_name)
graphics_cards$scrape_date <- now
graphics_cards$cur_price <- cur_price
graphics_cards$brand <- brand
graphics_cards$shipping <- shipping
#################################################################################
# egress
#################################################################################
# change this to your own working folder
setwd("C:/Users/Phuc H Duong/Downloads/newegg")
# write file out as a csv
write.csv(
x = graphics_cards,
file = "graphics_card_report.csv",
row.names = FALSE
)
# History files
.Rhistory
.Rapp.history
# Session Data files
.RData
# Example code in package build process
*-Ex.R
# Output files from R CMD build
/*.tar.gz
# Output files from R CMD check
/*.Rcheck/
# RStudio files
.Rproj.user/
# produced vignettes
vignettes/*.html
vignettes/*.pdf
# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth
# knitr and R markdown default cache directories
/*_cache/
/cache/
# Temporary files created by R markdown
*.utf8.md
*.knit.md
# web_scraping_r
Web scraping in R
# install.packages("rvest")
library(rvest)
library(stringr)
#################################################################################
# ingress
#################################################################################
# scrape date, now
now <- Sys.time()
# url to scrape, then download page
url <- "https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38"
webpage <- read_html(url)
#################################################################################
# web scraping
#################################################################################
############
# feature: card name
############
card_name <- webpage %>% html_nodes(".item-title") %>% html_text()
################
# feature: current price
################
cur_price <- webpage %>% html_nodes(".price-current strong") %>% html_text()
################
# feature: original price
################
org_price <- webpage %>% html_nodes(".price-was") %>% html_text(trim=TRUE)
# substring search for price, using regular expression.
needle <- "\\d{1,}\\.\\d{1,}"
indexes <- str_locate(string = org_price, pattern = needle)
indexes <- as.data.frame(indexes)
org_price <- str_sub(string=org_price, start = indexes$start, end = indexes$end)
################
# feature: rating
################
# problem: not every graphics card has a rating
# solution: build a table of product id and ratings
# then join with the main table by the same product id
# product id
rate.pid <- webpage %>% html_nodes(".item-rating") %>% html_attr("href")
# format: <url><"Item='><pid><'$'><stuff>
rate.pid.split <- str_split_fixed(rate.pid, pattern = "Item=", n=2)
# result: [1] [2]
# <url> <pid><'&'><stuff>
rate.pid.split <- str_split_fixed(rate.pid.split[,2], pattern="&", n=2)
# result: [1] [2]
# <pid> <stuff>
rate.pid <- rate.pid.split[,1]
# rating
rating <- webpage %>% html_nodes(".item-rating") %>% html_attr("title")
# result: <string><+\s><rating>
rating <- str_split_fixed(string = rating, pattern="\\+\\s", n = 2)[,2]
# result: [1] [2]
# <string\s> <rating>
rating_df <- as.data.frame(cbind(rate.pid, rating))
# combine: the ratings in rating_df are meant to be joined back to the main table by product id
#################################################################################
# data binding
#################################################################################
graphics_cards <- as.data.frame(card_name)
graphics_cards$scrape_date <- now
graphics_cards$cur_price <- cur_price
graphics_cards$org_price <- org_price
graphics_cards$rating <- rating
#######################
# feature: sales price
#######################
# logic: original price - current price = sales discount
# pseudo code: replace NAs in the original price with the current price
# query org missing prices <- query cur prices of org missing prices
na.org_price <- is.na(graphics_cards$org_price)
graphics_cards[na.org_price,"org_price"] <- graphics_cards[na.org_price,"cur_price"]
# cast into numeric
graphics_cards$org_price <- as.numeric(graphics_cards$org_price)
graphics_cards$cur_price <- as.numeric(graphics_cards$cur_price)
# original price - current price = sales discount
graphics_cards$sales_amt <- graphics_cards$org_price - graphics_cards$cur_price
#######################
# feature: discount %
#######################
# logic: divide sales amount by original price
graphics_cards$discount <- graphics_cards$sales_amt / graphics_cards$org_price
#######################
# feature: on_sale
#######################
# logic: if the discount as a percentage of the original price is higher than
# a certain threshold, mark the card as being on sale
# key: 0 = not on sale
# 1 = on sale
threshold <- 0.03
graphics_cards$on_sale <- 0
graphics_cards[graphics_cards$discount > threshold, "on_sale"] <- 1
"card_name","scrape_date","cur_price","brand","shipping"
"EVGA GeForce GTX 1050 FTW GAMING ACX 3.0, 02G-P4-6157-KR, 2GB GDDR5, DX12 OSD Support (PXOC)",2017-06-27 08:31:03,"139","EVGA","$3.99"
"GIGABYTE GeForce GTX 1050 DirectX 12 GV-N1050OC-2GD 2GB 128-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card",2017-06-27 08:31:03,"119","GIGABYTE","$3.99"
"GIGABYTE GeForce GTX 1050 Ti DirectX 12 GV-N105TWF2OC-4GD 4GB 128-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card",2017-06-27 08:31:03,"159","GIGABYTE","$4.99"
"GIGABYTE GeForce GTX 1050 Ti DirectX 12 GV-N105TD5-4GD 4GB 128-Bit GDDR5 PCI Express 3.0 x16 ATX Video Cards",2017-06-27 08:31:03,"139","GIGABYTE","$4.99"
"EVGA GeForce GTX 1080 Ti SC2 HYBRID GAMING, 11G-P4-6598-KR, 11GB GDDR5X, HYBRID & LED, iCX Technology - 9 Thermal Sensors",2017-06-27 08:31:03,"809","EVGA","$4.99"
"MSI GeForce GTX 1050 DirectX 12 GTX 1050 2G OC 2GB 128-Bit GDDR5 PCI Express 3.0 x16 HDCP Ready ATX Video Card",2017-06-27 08:31:03,"103","MSI","$4.99"
"GIGABYTE Radeon RX 460 WINDFORCE OC 2GB GV-RX460WF2OC-2GD",2017-06-27 08:31:03,"109","GIGABYTE","$3.99"
"MSI GeForce GTX 1080 Ti FE DirectX 12 GTX 1080 Ti Founders Edition 11GB 352-Bit GDDR5X PCI Express 3.0 x16 HDCP Ready SLI Support Video Card",2017-06-27 08:31:03,"699","MSI","$5.92"
"SAPPHIRE Radeon RX 460 DirectX 12 100409-2GOC-2L 2GB 128-Bit GDDR5 PCI Express 3.0 CrossFireX Support Video Cards",2017-06-27 08:31:03,"99","Sapphire Tech","$3.99"
"SAPPHIRE Radeon RX 560 DirectX 12 100413P2GOCL 2GB 128-Bit GDDR5 Video Card",2017-06-27 08:31:03,"109","Sapphire Tech","$3.99"
"GIGABYTE Radeon RX 550 DirectX 12 GV-RX550D5-2GD 2GB 128-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card",2017-06-27 08:31:03,"84","GIGABYTE","$3.99"
"VisionTek Radeon RX 560 DirectX 12 900962 2GB 128-Bit GDDR5 PCI Express x16 Video Card",2017-06-27 08:31:03,"119","VisionTek","$3.99"
"EVGA GeForce 8400 GS DirectX 10 512-P3-1301-KR 512MB 32-Bit DDR3 PCI Express 2.0 x16 HDCP Ready Low Profile Ready Video Card",2017-06-27 08:31:03,"29","EVGA","$3.99"
"PNY GeForce GTX 1050 Ti DirectX 12 VCGGTX1050T4PB 4GB 128-Bit GDDR5 PCI Express 3.0 x16 HDCP Ready Video Card",2017-06-27 08:31:03,"154","PNY Technologies, Inc.","$4.99"
"GIGABYTE GeForce GTX 750Ti 4GB WINDFORCE 2X OC EDITION",2017-06-27 08:31:03,"114","GIGABYTE","$3.99"
"PNY GeForce GT 730 DirectX 12 (feature 11_0) VCGGT7301D5LXPB 1GB 64-Bit GDDR5 PCI Express 2.0 Low Profile Ready Video Card",2017-06-27 08:31:03,"64","PNY Technologies, Inc.","$2.99"
"PNY GeForce GTX 950 Graphic Card - 1.02 GHz Core - 1.19 GHz Boost Clock - 2 GB GDDR5 - PCI Express 3.0 x16",2017-06-27 08:31:03,"109","PNY Technologies, Inc.","$3.99"
"EVGA GeForce GT 1030 SC, 02G-P4-6333-KR, 2GB GDDR5, Low Profile",2017-06-27 08:31:03,"79","EVGA","$3.99"
"XFX Radeon R7 240 R7-240A-2TS2 2GB 128-Bit DDR3 PCI Express 3.0 Video Cards",2017-06-27 08:31:03,"59","XFX","$3.99"
"GIGABYTE GeForce GTX 1050 OC Low Profile 2GB Video Card",2017-06-27 08:31:03,"119","GIGABYTE","$3.99"
"EVGA GeForce 8400 GS DirectX 10 01G-P3-1302-LR 1GB 64-Bit DDR3 PCI Express 2.0 x16 HDCP Ready Low Profile Ready Video Card",2017-06-27 08:31:03,"31","EVGA","$2.99"
"GIGABYTE Ultra Durable 2 GeForce GT 740 DirectX 12 GV-N740D5OC-2GI (rev. 3.0) 2GB 128-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card",2017-06-27 08:31:03,"89","GIGABYTE","$3.99"
"EVGA GeForce GT 730 DirectX 12 04G-P3-2739-KR 4GB 128-Bit DDR3 PCI Express 2.0 Video Card",2017-06-27 08:31:03,"77","EVGA","$2.99"
"GIGABYTE Ultra Durable 2 Series GeForce GT 730 DirectX 12 GV-N730-2GI (rev. 1.0) 2GB 128-Bit DDR3 PCI Express 2.0 HDCP Ready ATX Video Card",2017-06-27 08:31:03,"59","GIGABYTE","Free"
"GIGABYTE GeForce GTX 1050 Ti DirectX 12 GV-N105TG1 GAMING-4GD 4GB 128-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card",2017-06-27 08:31:03,"169","GIGABYTE","$4.99"
"PNY GeForce GTX 1080 Ti DirectX 12 VCGGTX1080T11PB-CG2 11GB 352-Bit GDDR5X PCI Express 3.0 x16 Video Card",2017-06-27 08:31:03,"699","PNY Technologies, Inc.","$4.99"
"GIGABYTE Radeon R7 360 DirectX 12 GV-R736OC-2GD (rev. 3.0) 2GB 128-Bit GDDR5 PCI Express 3.0 ATX Video Card",2017-06-27 08:31:03,"93","GIGABYTE","$3.99"
"EVGA GeForce GTX 1050 SSC GAMING ACX 3.0, 02G-P4-6154-KR, 2GB GDDR5, DX12 OSD Support (PXOC)",2017-06-27 08:31:03,"129","EVGA","$3.99"
"SAPPHIRE Radeon RX 550 DirectX 12 100414P2GL 2GB 128-Bit GDDR5 Video Card",2017-06-27 08:31:03,"82","Sapphire Tech","$3.99"
"EVGA GeForce GTX 1050 Ti FTW GAMING ACX 3.0, 04G-P4-6258-KR, 4GB GDDR5, DX12 OSD Support (PXOC)",2017-06-27 08:31:03,"169","EVGA","$3.99"
"PowerColor RED DRAGON Radeon RX 560 DirectX 12 AXRX 560 2GBD5-DHV2/OC 2GB 128-Bit GDDR5 CrossFireX Support ATX Video Card",2017-06-27 08:31:03,"119","PowerColor","$3.99"
"EVGA GeForce GTX 1050 Ti GAMING, 04G-P4-6251-KR, 4GB GDDR5, DX12 OSD Support (PXOC)",2017-06-27 08:31:03,"139","EVGA","$4.99"
"PowerColor RED DRAGON Radeon RX 550 DirectX 12 AXRX 550 2GBD5-DH/OC 2GB 128-Bit GDDR5 PCI Express 3.0 CrossFireX Support ATX Video Card",2017-06-27 08:31:03,"89","PowerColor","$3.99"
"GIGABYTE GeForce GT 1030 Low Profile 2G",2017-06-27 08:31:03,"69","GIGABYTE","$3.99"
"EVGA GeForce GTX 1050 Ti SC GAMING, 04G-P4-6253-KR, 4GB GDDR5, DX12 OSD Support (PXOC)",2017-06-27 08:31:03,"140","EVGA","$4.99"
"PowerColor Radeon R5 230 DirectX 11 AXR5 230 2GBK3-LHE 2GB 64-Bit DDR3 PCI Express 2.1 HDCP Ready CrossFireX Support Low Profile Video Cards",2017-06-27 08:31:03,"34","PowerColor","$3.99"
# Introduction to R Programming for Excel Users
GitHub Repository for the 05/03/2017 Meetup titled "Introduction to R Programming for Excel Users". First held in [Redmond, WA](https://www.meetup.com/data-science-dojo/events/239049571/).
These materials make extensive use of Kaggle's [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic) training dataset for data wrangling, analysis, and visualization examples.
Additionally, the following are required to use the files for the Meetup:
* [Microsoft Excel](https://www.microsoftstore.com/Excel)
* [The R programming language](https://cran.rstudio.com/)
* [RStudio](https://www.rstudio.com/products/rstudio/download/)
* The following R packages: [ggplot2](https://cran.r-project.org/web/packages/ggplot2/index.html) and [dplyr](https://cran.r-project.org/web/packages/dplyr/index.html).
#=========================================================================================
#
# File: titanic.R
# Author: Dave Langer
# Description: This code illustrates R coding used in the "Introduction to R Programming
# for Excel Users" Meetup dated 05/03/2017. More details on
# the Meetup are available at:
#
# https://www.meetup.com/data-science-dojo/events/239049571/
#
# The code in this file leverages data from Kaggle's "Titanic: Machine
# Learning from Disaster" introductory competition:
#
# https://www.kaggle.com/c/titanic
#
# NOTE - This file is provided "As-Is" and no warranty regarding its contents is
# offered nor implied. USE AT YOUR OWN RISK!
#
#=========================================================================================
# Load up Titanic data into a R data frame (i.e., R's version of an Excel table)
titanic <- read.csv("titanic.csv", header = TRUE)
# Add a new feature to the data frame for SurvivedLabel
titanic$SurvivedLabel <- ifelse(titanic$Survived == 1,
"Survived",
"Died")
# Add a new feature (i.e., column) to the data frame for FamilySize
titanic$FamilySize <- 1 + titanic$SibSp + titanic$Parch
View(titanic)
# Look at the data types (i.e., R's version of Excel data formatting for cells)
str(titanic)
# Apply a row filter to the Titanic data frame - return only males
males <- titanic[titanic$Sex == "male",]
# Create summary statistics for male fares
summary(males$Fare)
var(males$Fare)
sd(males$Fare)
sum(males$Fare)
length(males$Fare)
# Ranges work just like in Excel - pick the first 5 rows of data.
first.five <- titanic[1:5,]
# View the first five columns of the first five rows.
View(first.five[, 1:5])
# Use an R package (i.e., the Excel equivalent of an Add-in) to
# make creating powerful visualizations easy.
#install.packages("ggplot2")
library(ggplot2)
ggplot(titanic, aes(x = FamilySize, fill = SurvivedLabel)) +
theme_bw() +
facet_wrap(Sex ~ Pclass) +
geom_histogram(binwidth = 1)
# Use an R package (i.e., the Excel equivalent of an Add-in) to
# make building data pivots easy.
#install.packages("dplyr")
library(dplyr)
pivot <- titanic %>%
group_by(Pclass, Sex, SurvivedLabel) %>%
summarize(AvgFamilySize = mean(FamilySize),
PassengerCount = n()) %>%
arrange(Pclass, Sex, SurvivedLabel)
View(pivot)
Guest Cluster:
HalJordonCluster
SSH Endpoint for Edge Node:
R-Server.HalJordonCluster-ssh.azurehdinsight.net:22
Cluster Login Name:
admin
Cluster Login Password:
DojoGuest123$
SSH Username:
sshguest
SSH Password:
ThisIsATerriblePassword2