# Data Sets to Uplift your Skills 

+ Data Science Dojo has added 30 data sets to this repository.
+ The data sets cover a diverse range of themes, difficulty levels, sizes and attributes.
+ They offer learners hands-on practice to boost their skills in exploratory data analysis, data visualization, data wrangling and machine learning.
+ The data sets below are sorted by increasing level of difficulty for convenience (Beginner, Intermediate, Advanced).
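
As a possible starting point for exploratory data analysis, the sketch below summarizes the numeric columns of a CSV file using only the Python standard library. The sample rows, the column names and the `summarize_numeric_columns` helper are hypothetical and are not taken from any data set in this repository.

```python
import csv
import io
import statistics

# Hypothetical sample data; column names and values are invented for illustration.
SAMPLE_CSV = """height_cm,weight_kg,group
170,65.5,a
160,54.0,b
180,80.5,a
"""


def summarize_numeric_columns(handle):
    """Return {column: (min, mean, max)} for columns whose values all parse as floats."""
    rows = list(csv.DictReader(handle))
    summary = {}
    if not rows:
        return summary
    for column in rows[0]:
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns with non-numeric or missing values
        summary[column] = (min(values), statistics.mean(values), max(values))
    return summary


summary = summarize_numeric_columns(io.StringIO(SAMPLE_CSV))
for column, (low, mean, high) in summary.items():
    print(f"{column}: min={low:.2f} mean={mean:.2f} max={high:.2f}")
```

To try it on a real file, pass `open(path, newline="")` in place of `io.StringIO(SAMPLE_CSV)`, where `path` points to a CSV file with a header row.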

##### To fork this repository, follow the GitLab guide [How to fork a project](https://docs.gitlab.com/ee/gitlab-basics/fork-project.html).

---
### Beginner:

1) **Find out the age of an abalone from physical measurements**<br/>
Recommended Use: Regression Models<br/>
Domain: Environment

2) **Predict a student's knowledge level**<br/>
Recommended Use: Classification/Clustering<br/>
Domain: Education/Web

3) **Can you predict the price of a house?**<br/>
Recommended Use: Regression Models<br/>
Domain: Real Estate

4) **Can you estimate location from Wi-Fi signal strength?**<br/>
Recommended Use: Classification Models<br/>
Domain: Mobile/Location

5) **Predict acceptability of a car**<br/>
Recommended Use: Classification Models<br/>
Domain: Automobiles

6) **Predict seminal quality of an individual**<br/>
Recommended Use: Regression/Classification Models<br/>
Domain: Healthcare/Life

7) **Estimate chance of bankruptcy from qualitative parameters by experts**<br/>
Recommended Use: Classification Models<br/>
Domain: Finance/Banking

---
### Intermediate:

8) **Can you predict the fuel-efficiency of a car?**<br/>
Recommended Use: Regression Models<br/>
Domain: Automobiles

9) **Was that chest pain an indicator of heart disease?**<br/>
Recommended Use: Classification Models<br/>
Domain: Health Sciences

10) **Predict the total demand for orders**<br/>
Recommended Use: Regression Models<br/>
Domain: Business

11) **Find out if a donor will give blood in March 2007**<br/>
Recommended Use: Classification Models<br/>
Domain: Business

12) **Forecast pollution level of a city**<br/>
Recommended Use: Regression Models<br/>
Domain: Environment

13) **Will the patient survive for at least one year after a heart attack?**<br/>
Recommended Use: Classification Models<br/>
Domain: Healthcare

14) **Estimate compressive strength of concrete**<br/>
Recommended Use: Regression Models<br/>
Domain: Civil Engineering/Construction

15) **Discover patterns relating liver disorder and alcohol consumption**<br/>
Recommended Use: Classification/Regression/Clustering Models<br/>
Domain: Healthcare

16) **Predict which stock will provide the greatest rate of return**<br/>
Recommended Use: Clustering/Regression/Classification Models<br/>
Domain: Business/Finance

17) **Assess the heating and cooling load requirements of a building**<br/>
Recommended Use: Regression/Classification Models<br/>
Domain: Energy

18) **Determine the type of glass using oxide content**<br/>
Recommended Use: Classification Models<br/>
Domain: Physical

19) **Predict chance of survival**<br/>
Recommended Use: Classification Models<br/>
Domain: Healthcare

20) **Find patterns from spending data at wholesale**<br/>
Recommended Use: Classification/Clustering<br/>
Domain: Business/Retail

21) **Group similar travel reviews**<br/>
Recommended Use: Clustering/Classification Models<br/>
Domain: Web

22) **Relate returns of Istanbul Stock Exchange with other international indices**<br/>
Recommended Use: Regression/Classification Models<br/>
Domain: Business/Finance

23) **Predict bike rental count (hourly/daily) based on the environmental & seasonal settings**<br/>
Recommended Use: Regression Models<br/>
Domain: Social

24) **Detect Room Occupancy through Light, Temperature, Humidity and CO2 sensors**<br/>
Recommended Use: Classification Models<br/>
Domain: Energy/Buildings

25) **Estimate whether a person’s income exceeds $50K/year**<br/>
Recommended Use: Classification Models<br/>
Domain: Social/Government

---
### Advanced:

26) **Detect Autistic Spectrum Disorder (ASD) cases**<br/>
Recommended Use: Classification Models<br/>
Domain: Healthcare/Social Sciences

27) **Estimate the probability of default**<br/>
Recommended Use: Classification Models<br/>
Domain: Business/Finance

28) **Predict whether a banknote is genuine**<br/>
Recommended Use: Classification Models<br/>
Domain: Banking/Finance

29) **Produce a short-term forecast of electricity consumption for a single home**<br/>
Recommended Use: Regression/Clustering Models<br/>
Domain: Electricity

30) **Predict the number of shares on social networks**<br/>
Recommended Use: Regression/Classification Models<br/>
Domain: Business/Web

---

### FAQ

**Can I use these datasets for my project?**<br/>
Sure! You're totally free to do so.

**Can I add a dataset here?**<br/>
Send us a pull request and we'll discuss it.

**There seems to be a problem here.**<br/>
If you find an issue, please raise it by following [this guide](https://docs.gitlab.com/ee/user/project/issues/create_new_issue.html).