Name
Abalone
Accidental Drug Related Deaths in Connecticut, US
Amazon Product Reviews
Autism Screening Adult
Auto MPG
Banknote Authentication
Beijing PM2.5
Bike Sharing
Birmingham Parking Dataset
Blood Transfusion Service Center
Breast Cancer Wisconsin
Car Evaluation
Census Income
Concrete Compressive Strength
Coronavirus
Daily Demand Forecasting Orders
Default of Credit Card Clients
Dow Jones Index
EEG Eye State Dataset
EEG Steady State Evoked Potential Dataset
EU Population Poverty Status Dataset
Echocardiogram
Energy Efficiency
Fertility
Glass Identification
Heart Disease
Hepatitis
Hepatitis C Virus (HCV) Classification Dataset
Individual Household Electric Power Consumption
Interstate-94 (I-94) Traffic Volume Dataset
Istanbul Stock Exchange
Liver Disorders
Occupancy Detection
Online News Popularity
Portugal 2019 Election Dataset
Qualitative Bankruptcy
Real Estate Valuation
Risk Factors for Cervical Cancer
Travel Reviews
US Tuberculosis Dataset
User Knowledge Modeling
Wholesale Customers
Wireless Indoor Localization
21.jpg
README.md

Data Sets to Uplift your Skills

  • Data Science Dojo has added more than 43 data sets to this repository.
  • The data sets cover a diverse range of themes, difficulty levels, sizes and attributes.
  • They offer hands-on practice to boost your skills in exploratory data analysis, data visualization, data wrangling and machine learning.
  • For convenience, the data sets below are sorted by increasing level of difficulty (Beginner, Intermediate, Advanced).
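
If you are not sure where to start with exploratory data analysis, a first pass over any CSV file could look like the sketch below. It is only an illustration: the `summarize` helper, the column names and the sample rows are made up for this example and do not come from any data set in this repository. It uses only the Python standard library.

```python
import csv
import io
import statistics

# Made-up rows for illustration only; NOT taken from any data set in this repository.
SAMPLE_CSV = """length,weight,label
0.45,0.51,A
0.35,,B
0.53,0.68,A
0.44,0.52,
"""


def summarize(csv_text):
    """Return a per-column summary: missing count, plus mean/min/max for numeric columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        # Treat empty strings (and short rows, which DictReader fills with None) as missing.
        present = [v for v in values if v not in ("", None)]
        info = {"missing": len(values) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            # Non-numeric column: list the distinct values instead of statistics.
            info["distinct"] = sorted(set(present))
        else:
            if numbers:
                info["mean"] = statistics.mean(numbers)
                info["min"] = min(numbers)
                info["max"] = max(numbers)
        summary[column] = info
    return summary


if __name__ == "__main__":
    for name, info in summarize(SAMPLE_CSV).items():
        print(name, info)
```

To try it on a real file, read the file's text and pass it to `summarize`; the missing-value counts are usually a good hint about how much data wrangling a data set will need.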

To fork this repository, follow the guide "How to fork a project on GitLab".

Beginner:

Find out the age of an abalone from physical measurements
Regression Models | Environment

Predict student's knowledge level
Classification/Clustering Models | Education/Web

Can you predict the price of a house?
Regression Models | Real Estate

Can you estimate location from Wi-Fi signal strength?
Classification Models | Mobile/Location

Predict acceptability of a car
Classification Models | Automobile

Predict seminal quality of an individual
Regression/Classification Models | Healthcare/Life

Estimate chance of bankruptcy from qualitative parameters by experts
Classification Models | Finance/Banking

Intermediate:

Can you predict the fuel-efficiency of a car?
Regression Models | Automobiles

Was that chest pain an indicator of heart disease?
Classification Models | Health Sciences

Predict the total demand for orders
Regression Models | Business

Find out if a donor will give blood in March 2007
Classification Models | Business

Forecast pollution level of a city
Regression Models | Environment

Will the patient survive for at least one year after a heart attack?
Classification Models | Healthcare

Estimate compressive strength of concrete
Regression Models | Civil Engineering/Construction

Discover patterns relating liver disorder and alcohol consumption
Classification/Regression/Clustering Models | Healthcare

Predict which stock will provide greatest rate of return
Clustering/Regression/Classification Models | Business/Finance

Assess heating and cooling load requirements of a building
Regression/Classification Models | Energy

Determine the type of glass using oxide content
Classification Models | Physical

Predict chance of survival
Classification Models | Healthcare

Find patterns from spending data at wholesale
Classification/Clustering Models | Business/Retail

Group similar travel reviews
Clustering/Classification Models | Web

Relate returns of Istanbul Stock Exchange with other international indices
Regression/Classification Models | Business/Finance

Predict bike rental count (hourly/daily) based on environmental and seasonal settings
Regression Models | Social

Detect room occupancy through light, temperature, humidity and CO2 sensors
Classification Models | Energy/Buildings

Estimate whether a person’s income exceeds $50K/year
Classification Models | Social/Government


Advanced:

Detect Autistic Spectrum Disorder (ASD) cases
Classification Models | Healthcare/Social Sciences

Estimate the probability of default
Classification Models | Business/Finance

Predict if a note is genuine
Classification Models | Banking/Finance

Produce a short-term forecast of the electricity consumption of a single home
Regression/Clustering Models | Electricity

Predict the number of shares on social networks
Regression/Classification Models | Business/Web

Analyze the text or sentiment of product reviews on Amazon, or recommend products
Text Analytics/Sentiment Analysis/Recommender Systems


Queries:

Can I use these datasets for my project?
Sure! You're totally free to do so.

Can I add a dataset here?
Send us a pull request and we'll discuss it.

There seems to be a problem here.
If you find an issue, please raise it using this link.