Data Science Dojo
Copyright (c) 2016 - 2019
Level: Intermediate
Recommended Use: Clustering/Regression/Classification Models
Domain: Business/Finance
Dow Jones Index Data Set
Predict which stock will provide the greatest rate of return
This intermediate-level data set has 750 rows and 16 columns. It contains weekly data for the stocks in the Dow Jones Industrial Index and has been used in computational investing research. Each record (row) holds one week of data for one stock, including the percentage return that the stock has in the following week (percent_change_next_weeks_price). Ideally, this could be used to determine which stock will produce the greatest rate of return in the following week.
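The goal described above can be sketched as a simple selection: for each week, find the stock with the highest percent_change_next_weeks_price. This is only a minimal illustration of the target, not a predictive model. The rows below are hypothetical values made up for the example (they are not taken from the data set), and the function name `best_stock_per_week` is an assumption.

```python
# Hypothetical sample rows: (date, stock, percent_change_next_weeks_price).
# These numbers are invented for illustration and do not come from the data set.
rows = [
    ("2011-01-07", "AA", 1.5),
    ("2011-01-07", "BA", -0.7),
    ("2011-01-07", "INTC", 2.1),
    ("2011-01-14", "AA", -1.2),
    ("2011-01-14", "BA", 0.4),
    ("2011-01-14", "INTC", 0.1),
]


def best_stock_per_week(records):
    """Return {date: (stock, next_week_return)} for the top stock each week."""
    best = {}
    for week, stock, next_return in records:
        # Keep the record with the largest following-week return per date.
        if week not in best or next_return > best[week][1]:
            best[week] = (stock, next_return)
    return best


winners = best_stock_per_week(rows)
for week in sorted(winners):
    stock, next_return = winners[week]
    print(f"{week}: {stock} ({next_return:+.2f}%)")
```

A model trained on this data set would try to predict these winners from the other columns before the following week's prices are known.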
This data set is recommended for learning and practicing your skills in exploratory data analysis, data visualization, clustering, and regression/classification modelling techniques. Feel free to explore the data set with multiple supervised and unsupervised learning techniques. The following data dictionary gives more details on this data set:
Data Dictionary
Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratio |
---|---|---|---|---|---|
1 | quarter | Quarter: the yearly quarter (1: Jan-Mar; 2: Apr-Jun). | Quantitative | 1, 2 | 0 |
2 | stock | Stock: the stock symbol* | Qualitative | INTC, INTC, BA | 0 |
3 | date | Date: the last business day of the week (this is typically a Friday) | Quantitative | 40564, 40683, 40620 | 0 |
4 | open | Open: the price of the stock at the beginning of the week | Quantitative | $21.03, $23.32, $71.17 | 0 |
5 | high | High: the highest price of the stock during the week | Quantitative | $21.2, $23.96, $71.23 | 0 |
6 | low | Low: the lowest price of the stock during the week | Quantitative | $20.62, $23.08, $67.34 | 0 |
7 | close | Close: the price of the stock at the end of the week | Quantitative | $20.82, $23.22, $69.1 | 0 |
8 | volume | Volume: the number of shares of stock that traded hands in the week | Quantitative | 218479469, 387571150, 29746370 | 0 |
9 | percent_change_price | Percent_Change_Price: the percentage change in price throughout the week | Quantitative | -0.998573, -0.428816, -2.90853 | 0 |
10 | percent_change_volume_over_last_wk | Percent_Change_Volume_Over_Last_Week: the percentage change in the number of shares of stock that traded hands for this week compared to the previous week | Quantitative | -20.29526016, 12.41924755, 16.3954667 | 4 |
11 | previous_weeks_volume | Previous_Weeks_Volume: the number of shares of stock that traded hands in the previous week | Quantitative | 274111012, 344755154, 25556296 | 4 |
12 | next_weeks_open | Next_Weeks_Open: the opening price of the stock in the following week | Quantitative | $21.03, $22.92, $70.29 | 0 |
13 | next_weeks_close | Next_Weeks_Close: the closing price of the stock in the following week | Quantitative | $21.46, $22.21, $73.34 | 0 |
14 | percent_change_next_weeks_price | Percent_Change_Next_Weeks_Price: the percentage change in price of the stock in the following week | Quantitative | 2.0447, -3.09773, 4.33917 | 0 |
15 | days_to_next_dividend | Days_to_next_dividend: the number of days until the next dividend | Quantitative | 13, 75, 54 | 0 |
16 | percent_return_next_dividend | Percent_Return_Next_Dividend: the percentage of return on the next dividend | Quantitative | 0.864553, 0.904393, 0.607815 | 0 |
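The example values in the data dictionary need some cleaning before modelling. The sketch below rests on several assumptions: that the date examples (such as 40564) are Excel serial day numbers, that prices are stored as strings with a leading dollar sign, and that the first example in each row comes from the same record. The helper names are hypothetical. Under those assumptions, the percentage columns can be recomputed from the raw columns as (end - start) / start * 100.

```python
from datetime import date, timedelta

# Assumption: the date column uses Excel serial day numbers (1900 date system),
# for which 1899-12-30 is the conventional day-zero origin.
EXCEL_ORIGIN = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_ORIGIN + timedelta(days=serial)


def parse_price(text):
    """Turn a price string such as '$21.03' into the float 21.03."""
    return float(text.replace("$", "").replace(",", ""))


def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


# First example value from each relevant data dictionary row.
week_end = excel_serial_to_date(40564)
open_price = parse_price("$21.03")
close_price = parse_price("$20.82")
next_open = parse_price("$21.03")
next_close = parse_price("$21.46")

price_change = percent_change(open_price, close_price)
next_price_change = percent_change(next_open, next_close)
volume_change = percent_change(274111012, 218479469)

print(week_end.isoformat(), week_end.strftime("%A"))
print(round(price_change, 6), round(next_price_change, 4), round(volume_change, 4))
```

The recomputed values agree with the listed examples for percent_change_price, percent_change_next_weeks_price, and percent_change_volume_over_last_wk, which suggests these three columns are derived from the open, close, and volume columns rather than carrying independent information.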
*Stock Symbols:
3M MMM
American Express AXP
Alcoa AA
AT&T T
Bank of America BAC
Boeing BA
Caterpillar CAT
Chevron CVX
Cisco Systems CSCO
Coca-Cola KO
DuPont DD
ExxonMobil XOM
General Electric GE
Hewlett-Packard HPQ
The Home Depot HD
Intel INTC
IBM IBM
Johnson & Johnson JNJ
JPMorgan Chase JPM
Kraft KRFT
McDonald's MCD
Merck MRK
Microsoft MSFT
Pfizer PFE
Procter & Gamble PG
Travelers TRV
United Technologies UTX
Verizon VZ
Wal-Mart WMT
Walt Disney DIS
Acknowledgement
This data set has been sourced from the UCI Machine Learning Repository (University of California, Irvine), where it is listed as the Dow Jones Index Data Set. The UCI page cites the following publication as the original source of the data set:
Brown, M. S., Pelosi, M. & Dirska, H. (2013). Dynamic-radius Species-conserving Genetic Algorithm for the Financial Forecasting of Dow Jones Index Stocks. Machine Learning and Data Mining in Pattern Recognition, 7988, 27-41.