# Data Sets to Uplift your Skills 


+ Data Science Dojo has added more than 43 data sets to this repository. 
+ The repository carries a diverse range of themes, difficulty levels, sizes and attributes. 
+ They offer hands-on practice to boost your skills in exploratory data analysis, data visualization, data wrangling and machine learning.
+ The data sets below are sorted in increasing order of difficulty for convenience (Beginner, Intermediate, Advanced).

![](21.jpg)

##### To fork this repository, follow the GitLab guide [How to fork a project](https://docs.gitlab.com/ee/gitlab-basics/fork-project.html).
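
Once you have a copy of the repository, a quick numeric summary is a common first step in exploratory data analysis. The sketch below is one possible way to do it with only the Python standard library. The function name `summarize_numeric_columns`, the sample column names and the sample values are made up for illustration and are not taken from any data set in this repository.

```python
import csv
import io
import statistics


def summarize_numeric_columns(csv_text):
    """Return {column: (count, mean, min, max)} for columns whose values all parse as numbers."""
    reader = csv.DictReader(io.StringIO(csv_text))

    # Collect the raw string values of every column.
    columns = {}
    for row in reader:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    summary = {}
    for name, values in columns.items():
        try:
            numbers = [float(v) for v in values]
        except (TypeError, ValueError):
            continue  # skip columns that contain text or missing values
        summary[name] = (len(numbers), statistics.mean(numbers), min(numbers), max(numbers))
    return summary


# Hypothetical sample; real files in the repository will have different columns.
sample = """length,diameter,label
0.455,0.365,A
0.350,0.265,B
0.530,0.420,B
"""

for column, stats in summarize_numeric_columns(sample).items():
    print(column, stats)
```

To try it on a real file, read the file contents into a string first, for example with `open(path, newline="").read()`, where `path` points at a CSV file inside one of the data set folders.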

---
### Beginner:

[**Find out the age of Abalone from physical measurements**](Abalone)<br/>
Regression Models | Environment

[**Predict student's knowledge level**](User Knowledge Modeling)<br/>
Classification/Clustering | Education/Web

[**Can you predict the price of a house?**](Real Estate Valuation)<br/>
Regression Models | Real Estate

[**Can you estimate location from Wi-Fi signal strength?**](Wireless Indoor Localization)<br/>
Classification Models | Mobile/Location

[**Predict acceptability of a car**](Car Evaluation)<br/>
Classification Models | Automobile

[**Predict seminal quality of an individual**](Fertility)<br/>
Regression/Classification Models | Healthcare/Life

[**Estimate chance of bankruptcy from qualitative parameters by experts**](Qualitative Bankruptcy)<br/>
Classification Models | Finance/Banking

[**Understand driving patterns of Birmingham with respect to time and date**](Birmingham Parking Dataset)<br/>
Regression/Classification Models | Transport and Mobility

[**Explore the effect of time, date and weather on traffic volume on a US Interstate**](https://code.datasciencedojo.com/datasciencedojo/datasets/tree/patch-1/Interstate-94%20(I-94)%20Traffic%20Volume%20Dataset)<br/>
Regression Models | Transport and Mobility

[**Explore patterns in drug abuse between cities, age groups and racial groups**](Accidental Drug Related Deaths in Connecticut, US)<br/>
Classification Models | Healthcare/Social Sciences

---
### Intermediate:

[**Can you predict the fuel-efficiency of a car?**](Auto MPG)<br/>
Regression Models | Automobiles

[**Was that chest pain an indicator of heart disease?**](Heart Disease)<br/>
Classification Models | Health Sciences

[**Predict the total daily demand for orders**](Daily Demand Forecasting Orders)<br/>
Regression Models | Business

[**Find out if a donor will give blood in March 2007**](Blood Transfusion Service Center)<br/>
Classification Models | Business

[**Forecast pollution level of a city**](Beijing PM2.5)<br/>
Regression Models | Environment

[**Will the patient survive for at least one year after a heart attack?**](Echocardiogram)<br/>
Classification Models | Healthcare

[**Estimate compressive strength of concrete**](Concrete Compressive Strength)<br/>
Regression Models | Civil Engineering/Construction

[**Discover patterns relating liver disorder and alcohol consumption**](Liver Disorders)<br/>
Classification/Regression/Clustering Models | Healthcare

[**Predict which stock will provide greatest rate of return**](Dow Jones Index)<br/>
Clustering/Regression/Classification Models | Business/Finance

[**Assess heating and cooling load requirements of building**](Energy Efficiency)<br/>
Regression/Classification Models | Energy

[**Determine the type of glass using oxide content**](Glass Identification)<br/>
Classification Models | Physical

[**Predict chance of survival**](Hepatitis)<br/>
Classification Models | Healthcare

[**Find patterns from spending data at wholesale**](Wholesale Customers)<br/>
Classification/Clustering | Business/Retail

[**Group similar travel reviews**](Travel Reviews)<br/>
Clustering/Classification Models | Web

[**Relate returns of Istanbul Stock Exchange with other international indices**](Istanbul Stock Exchange)<br/>
Regression/Classification Models | Business/Finance

[**Predict bike rental count (hourly/daily) based on the environmental & seasonal settings**](Bike Sharing)<br/>
Regression Models | Social

[**Detect Room Occupancy through Light, Temperature, Humidity and CO2 sensors**](Occupancy Detection)<br/>
Classification Models | Energy/Buildings

[**Estimate whether a person’s income exceeds $50K/year**](Census Income)<br/>
Classification Models | Social/Government

[**Predict the condition of a patient's liver from their bloodwork**](https://code.datasciencedojo.com/datasciencedojo/datasets/tree/patch-1/Hepatitis%20C%20Virus%20(HCV)%20Classification%20Dataset)<br/>
Classification Models | Healthcare

[**Predict future poverty trends in EU Countries**](EU Population Poverty Status Dataset)<br/>
Regression Models | Social/Government

[**Predict the spread of Tuberculosis across the US**](US Tuberculosis Dataset)<br/>
Regression Models | Healthcare

[**Determine if smoking, invasive birth control methods and a history of STDs can lead to Cervical Cancer**](Risk Factors for Cervical Cancer)<br/>
Classification Models | Healthcare

---
### Advanced:

[**Detect Autistic Spectrum Disorder (ASD) cases**](Autism Screening Adult)<br/>
Classification Models | Healthcare/Social Sciences

[**Estimate the probability of Default**](Default of Credit Card Clients)<br/>
Classification Models | Business/Finance

[**Predict if a note is genuine**](Banknote Authentication)<br/>
Classification Models | Banking/Finance

[**Find a short term forecast on electricity consumption of a single home**](Individual Household Electric Power Consumption)<br/>
Regression/Clustering Models | Electricity

[**Predict the number of shares on social networks**](Online News Popularity)<br/>
Regression/Classification Models | Business/Web

[**Analyze the text or sentiment of products on Amazon, or recommend products**](Amazon Product Reviews)<br/>
Text Analytics/Sentiment Analysis/Recommender Systems

[**Explore predictive modelling and numerical forecasting techniques**](Portugal 2019 Election Dataset)<br/>
Regression Models | Social Sciences/Government

[**Explore changes in brain activity in humans in the presence and absence of a visual stimulus**](EEG Eye State Dataset)<br/>
Classification Models | Neuroscience/Healthcare

[**Explore patterns in brain activity based on multiple visual and non-visual stimuli**](EEG Steady State Evoked Potential Dataset)<br/>
Classification Models | Neuroscience/Healthcare

---
### Queries:

**Can I use these datasets for my project?**<br/>
Sure! You're totally free to do so.

**Can I add a dataset here?**<br/>
Send us a pull request and we'll discuss it.

**There seems to be a problem here.**<br/>
If you find an issue, please raise it by following [this guide](https://docs.gitlab.com/ee/user/project/issues/create_new_issue.html).