Commit 42b5ec2e by Tarun Shrivas

heart disease folder updated

parent 543b7e4d
Data Science Dojo <br/>
Copyright (c) 2016 - 2019
---
**Level:** Intermediate <br/>
**Recommended Use:** Regression Models<br/>
**Domain:** Automobiles<br/>
## Auto MPG Data Set
### Can you predict the fuel efficiency of a car?
---
![](tim-mossholder-680992-unsplash.jpg)
---
This *intermediate* level data set has 398 rows and 9 columns and provides mileage, horsepower, model year, and other technical specifications for cars. This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **regression modelling techniques**. Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques. The following data dictionary gives more details on this data set:
---
### Data Dictionary
**Column Position**|**Attribute Name**|**Description** |**Examples** |**Attribute Type** |**Nulls Ratio**
|-------------------|------------------|------------------------------------------------------------|---------------------------|---------------------|----------------|
| #1 | mpg | fuel efficiency measured in miles per gallon (mpg) | 9.0, 13.0, 41.5 | quantitative | 0% |
| #2 | cylinders | number of cylinders in the engine | 3, 4, 8 | qualitative | 0% |
| #3 | displacement | engine displacement (in cubic inches) | 68.0, 112.0, 455.0 | quantitative | 0% |
| #4 | horsepower | engine horsepower | 46.0, 70.0, 230.0 | quantitative | 2% |
| #5 | weight | vehicle weight (in pounds) | 1613, 3615, 5140 | quantitative | 0% |
| #6 | acceleration | time to accelerate from 0 to 60 mph (in seconds) | 8.00, 15.50, 24.80 | quantitative | 0% |
| #7 | model year | model year (last two digits, e.g. 73 = 1973) | 73, 79, 82 | qualitative | 0% |
| #8 | origin | origin of car (1: American, 2: European, 3: Japanese) | 1, 2, 3 | qualitative | 0% |
| #9 | car name | car name | audi fox, subaru | qualitative | 0% |
---
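
The data dictionary reports about 2% missing values in `horsepower`, and the data set is recommended for regression. The standard-library sketch below shows one way to start. It assumes the rows have already been read into dicts keyed by the attribute names above (for example with `csv.DictReader`). It also assumes that a missing horsepower value appears either as an empty field or as `?`, which is the marker used in the original UCI file. Median imputation and a one-variable least-squares fit of `mpg` on `weight` are illustrative choices, not a prescribed method.

```python
import statistics

# Assumption: how a missing horsepower value is encoded in the file
MISSING_TOKENS = {"", "?"}


def impute_horsepower(rows):
    """Return copies of rows with horsepower as float, filling gaps with the median."""
    observed = [float(r["horsepower"]) for r in rows
                if r["horsepower"].strip() not in MISSING_TOKENS]
    median_hp = statistics.median(observed)
    cleaned = []
    for r in rows:
        raw = r["horsepower"].strip()
        new_row = dict(r)  # copy so the caller's rows are left untouched
        new_row["horsepower"] = median_hp if raw in MISSING_TOKENS else float(raw)
        cleaned.append(new_row)
    return cleaned


def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

A multi-feature model, for example scikit-learn's `LinearRegression`, would be the natural next step. The single-feature fit above just keeps the arithmetic visible.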
### Acknowledgement
This data set has been sourced from the Machine Learning Repository of the University of California, Irvine [Auto MPG Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/auto+mpg). The UCI page mentions [StatLib (Carnegie Mellon University)](http://lib.stat.cmu.edu/datasets/) as the original source of the data set.
## Big Mart Sales Data
### Introduction
The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Certain attributes of each product and store have also been recorded. The aim is to build a predictive model that estimates the sales of each product at a particular store. Using this model, BigMart will try to understand which properties of products and stores play a key role in increasing sales.
Note that the data contains missing values, because some stores may not have reported all of their data due to technical glitches. These values must be handled before modelling.
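
The `% Null Ratios` column in the data dictionary below can be recomputed with a short sketch like this one. It assumes the data is available as CSV text whose header row uses the attribute names listed below. It also treats an empty field as missing. Both are assumptions about the file layout rather than documented facts.

```python
import csv
import io


def null_ratios(csv_text):
    """Percentage of data rows in which each column is empty (treated as missing)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    counts = dict.fromkeys(columns, 0)
    n_rows = 0
    for row in reader:
        n_rows += 1
        for col in columns:
            # DictReader fills short rows with None, so treat None like an empty field
            if (row[col] or "").strip() == "":
                counts[col] += 1
    if n_rows == 0:
        raise ValueError("the CSV text contains no data rows")
    return {col: 100.0 * counts[col] / n_rows for col in columns}
```

For a file on disk, `null_ratios(open(path, newline="").read())` would work, where `path` is a placeholder for wherever the training CSV is stored.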
### Data Dictionary
Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios
--- | --- | --- | --- | --- | ---
1 | Item_Identifier | It is a unique product ID assigned to every distinct item. It consists of an alphanumeric string of length 5 | Alphanumeric | FDN15 | 0
2 | Item_Weight | The weight of the product | Numeric (float) | 17.5 | 17.16531738
3 | Item_Fat_Content | Categorical attribute describing whether the product is low fat. It has 2 categories: ['Low Fat', 'Regular']. Note, however, that 'Low Fat' is also written as 'low fat' and 'LF' in the data set, and 'Regular' is also written as 'reg' | Alpha | Low Fat | 0
4 | Item_Visibility | This field mentions the percentage of total display area of all products in a store allocated to the particular product | Numeric (float) | 0.01676 | 0
5 | Item_Type | This is a categorical attribute and describes the food category to which the item belongs. There are 16 different categories listed as follows: ['Dairy', 'Soft Drinks', 'Meat', 'Fruits and Vegetables', 'Household', 'Baking Goods', 'Snack Foods', 'Frozen Foods', 'Breakfast', 'Health and Hygiene', 'Hard Drinks', 'Canned', 'Breads', 'Starchy Foods', 'Others', 'Seafood'] | Alpha | Meat | 0
6 | Item_MRP | This is the Maximum Retail Price (list price) of the product | Numeric (float) | 141.618 | 0
7 | Outlet_Identifier | A unique store ID, consisting of an alphanumeric string of length 6 | Alphanumeric | OUT049 | 0
8 | Outlet_Establishment_Year | The year in which the store was established | Numeric (Integer) | 1998 | 0
9 | Outlet_Size | The size of the store in terms of ground area covered. It is categorical, with 3 categories: ['High', 'Medium', 'Small'] | Alpha | Medium | 28.27642849
10 | Outlet_Location_Type | Categorical attribute describing the size of the city in which the store is located, with 3 categories: ['Tier 1', 'Tier 2', 'Tier 3'] | Alpha | Tier 3 | 0
11 | Outlet_Type | Categorical attribute describing whether the outlet is a grocery store or a type of supermarket, with 4 categories: ['Supermarket Type1', 'Supermarket Type2', 'Grocery Store', 'Supermarket Type3'] | Alpha | Supermarket Type2 | 0
12 | Item_Outlet_Sales | The outcome variable to be predicted: the sales of the product in the particular store | Numeric (float) | 2097.27 | 0
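
`Item_Fat_Content` mixes several spellings of the same two categories, so it is worth normalising before any grouping or encoding. The sketch below maps the variants listed in the data dictionary onto the two canonical labels. It raises an error on anything unexpected, so an unlisted spelling is surfaced rather than silently kept as a third category.

```python
# Canonical labels for Item_Fat_Content, keyed by the lower-cased raw value.
# Variants from the data dictionary: 'low fat' and 'LF' mean 'Low Fat',
# and 'reg' means 'Regular'.
FAT_CONTENT_CANONICAL = {
    "low fat": "Low Fat",
    "lf": "Low Fat",
    "regular": "Regular",
    "reg": "Regular",
}


def normalize_fat_content(value):
    """Map any known spelling of Item_Fat_Content to 'Low Fat' or 'Regular'."""
    key = value.strip().lower()
    try:
        return FAT_CONTENT_CANONICAL[key]
    except KeyError:
        # Fail loudly instead of silently creating a third category
        raise ValueError(f"unexpected Item_Fat_Content value: {value!r}") from None


if __name__ == "__main__":
    raw = ["Low Fat", "low fat", "LF", "Regular", "reg"]
    print([normalize_fat_content(v) for v in raw])
    # prints ['Low Fat', 'Low Fat', 'Low Fat', 'Regular', 'Regular']
```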
### Source:
https://datahack.analyticsvidhya.com/contest/practice-problem-big-mart-sales-iii/
Data Science Dojo <br/>
Copyright (c) 2016 - 2019
---
**Level:** Advanced <br/>
**Recommended Use:** Classification Models<br/>
**Domain:** Business<br/>
## Blood Transfusion Service Center Data Set
### Predict if a donor will give blood in March 2007
---
![](hush-naidoo-1170844-unsplash.jpg)
---
This *advanced* level data set has 748 instances and 5 attributes.
This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **classification modelling techniques**.
Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques.
The following description of the data set is provided by the source:
To demonstrate the RFMTC marketing model (a modified version of RFM), this study adopted the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. The center sends its blood transfusion service bus to one university in Hsin-Chu City about every three months to collect donated blood.
To build an RFMTC model, 748 donors were selected at random from the donor database. Each of these 748 donor records includes R (Recency: months since last donation), F (Frequency: total number of donations), M (Monetary: total blood donated in c.c.), T (Time: months since first donation), and a binary variable indicating whether the donor gave blood in March 2007 (1 = donated; 0 = did not donate).
The following data dictionary gives more details on this data set:
---
### Data Dictionary
| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|------------------- |---------------------------------------------- |------------------------------------------------------------------------------------------------------ |-------------- |----------------- |--------------- |
| 1 | Recency (months) | Number of months since the particular donor's most recent donation | Quantitative | 5, 0, 2 | 0 |
| 2 | Frequency (times) | Total number of donations that the donor has made | Quantitative | 50, 10, 9 | 0 |
| 3 | Monetary (c.c. blood) | Total amount of blood that the donor has donated (cubic centimeters) | Quantitative | 4000, 2750, 500 | 0 |
| 4 | Time (months) | Number of months since the donor's first donation | Quantitative | 16, 58, 69 | 0 |
| 5 | whether he/she donated blood in March 2007 | This is a binary variable which represents whether the donor donated blood in March 2007: (1, 0) | Quantitative | 1, 0 | 0 |
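
With a binary target like this one, check the class balance before fitting a classifier. A model that always predicts the majority class already sets the accuracy bar to beat. The minimal sketch below assumes the values of the target column have already been parsed into a list of 0/1 integers.

```python
from collections import Counter


def class_balance(labels):
    """Fraction of records in each class of the target, sorted by class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(Counter(labels).values()) / len(labels)
```

If one class clearly dominates, metrics such as precision, recall, or ROC AUC are usually more informative than raw accuracy.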
### Acknowledgement
This data set has been sourced from the Machine Learning Repository of the University of California, Irvine [Blood Transfusion Service Center Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center). The UCI page lists the Blood Transfusion Service Center, Hsin-Chu City, Taiwan, as the original source of the data set.
Data Science Dojo <br/>
Copyright (c) 2016 - 2019
---
**Level:** Intermediate <br/>
**Recommended Use:** Regression Models<br/>
**Domain:** Real Estate<br/>
## Real Estate Valuation Data Set
### Can you predict the price of a house?
---
![](310.jpg)
---
This *intermediate* level data set has 414 rows and 7 columns.
It provides historical market data on real estate valuations collected from Sindian District, New Taipei City, Taiwan.
This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **regression modelling techniques**.
Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques.
The following data dictionary gives more details on this data set:
---
### Data Dictionary
| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|------------------- |---------------------------------------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |-------------- |--------------------------------- |--------------- |
| 1 | X1 transaction date | The transaction date as a decimal year (for example, 2013.250 = March 2013, 2013.500 = June 2013) | Qualitative | 2013.500, 2013.500, 2013.333 | 0 |
| 2 | X2 house age | The house age (unit: year) | Quantitative | 19.5, 13.3, 5.0 | 0 |
| 3 | X3 distance to the nearest MRT station | The distance to the nearest MRT station (unit: meter) | Quantitative | 390.5684, 405.21340, 23.38284 | 0 |
| 4 | X4 number of convenience stores | The number of convenience stores in the living circle on foot | Quantitative | 6, 8, 1 | 0 |
| 5 | X5 latitude | The geographic coordinate, latitude (unit: degree) | Quantitative | 24.97937, 24.97544, 24.94925 | 0 |
| 6 | X6 longtitude | The geographic coordinate, longitude (unit: degree) | Quantitative | 121.54243, 121.49587, 121.51151 | 0 |
| 7 | Y house price of unit area | The house price per unit area, in units of 10,000 New Taiwan Dollars per Ping (Ping is a local unit of area; 1 Ping ≈ 3.3 square meters). For example, 29.3 = 293,000 New Taiwan Dollars/Ping | Quantitative | 29.3, 33.6, 47.7 | 0 |
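
The decimal-year encoding of `X1 transaction date` and the per-Ping unit of `Y` are easier to work with once converted. The sketch below follows the dictionary's own examples: the fractional part counts months in steps of 1/12, so .250 is March and .500 is June. It also uses the stated 1 Ping ≈ 3.3 m² approximation. One case needs an assumption: treating a `.000` fraction as December of the previous year follows from that pattern but is not stated by the source.

```python
# Approximate conversion factor stated in the data dictionary
PING_IN_SQUARE_METERS = 3.3


def decimal_date_to_year_month(value):
    """Convert a decimal-year date such as 2013.250 to a (year, month) tuple."""
    year = int(value)
    # The fractional part counts months in steps of 1/12; rounding absorbs
    # three-decimal encodings such as .333 (April) and .667 (August)
    month = round((value - year) * 12)
    if month == 0:
        # Assumption: a .000 fraction denotes December of the previous year
        return year - 1, 12
    return year, month


def price_per_square_meter(y):
    """Convert Y (units of 10,000 New Taiwan Dollars per Ping) to NTD per square meter."""
    return y * 10_000 / PING_IN_SQUARE_METERS
```

For example, under this conversion the dictionary's value 29.3 (293,000 New Taiwan Dollars/Ping) corresponds to roughly 88,788 New Taiwan Dollars per square meter.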
### Acknowledgement
This data set has been sourced from the Machine Learning Repository of the University of California, Irvine [Real Estate Valuation Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set). The UCI page mentions the following as the original source of the data set: Yeh, I. C., & Hsu, T. K. (2018). Building real estate valuation models with comparative approach through case-based reasoning. Applied Soft Computing, 65, 260-271.
Data Science Dojo <br/>
Copyright (c) 2016 - 2019
---
**Level:** Intermediate <br/>
**Recommended Use:** Classification/Clustering <br/>
**Domain:** Business/Retail<br/>
## Wholesale Customers Data Set
### Discover patterns from spending data at wholesale
---
![](349.jpg)
---
This *intermediate* level data set has 440 rows and 8 columns.
The data set refers to clients of a wholesale distributor. It includes the annual spending in monetary units (m.u.) on diverse product categories.
This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, **classification modelling** and **clustering**.
Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques. The following data dictionary gives more details on this data set:
---
### Data Dictionary
| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|------------------- |------------------ |------------------------------------------------------------------------------------------------ |-------------- |-------------------- |--------------- |
| 1 | Channel | Customer's channel: Horeca (Hotel/Restaurant/Cafe) or Retail (1: Horeca, 2: Retail) | Qualitative | 1, 2 | 0 |
| 2 | Region | Customer's region: Lisbon, Oporto, or Other (1: Lisbon, 2: Oporto, 3: Other) | Qualitative | 1, 2, 3 | 0 |
| 3 | Fresh | Annual spending (monetary units) on fresh products | Quantitative | 18291, 1640, 219 | 0 |
| 4 | Milk | Annual spending (m.u.) on milk products | Quantitative | 5139, 3259, 829 | 0 |
| 5 | Grocery | Annual spending (m.u.) on grocery products | Quantitative | 6532, 4042, 3 | 0 |
| 6 | Frozen | Annual spending (m.u.) on frozen products | Quantitative | 10643, 987, 6312 | 0 |
| 7 | Detergents_Paper | Annual spending (m.u.) on detergents and paper products | Quantitative | 12034, 116, 3 | 0 |
| 8 | Delicassen | Annual spending (m.u.) on delicatessen products | Quantitative | 14472, 772, 120 | 0 |
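
For the clustering use case, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python. Several choices in it are illustrative assumptions rather than recommendations from the source:

- applying it to `log1p` of the six spending columns, to damp their wide range of values;
- the choice of `k`;
- the seeded random initialisation.

`Channel` and `Region` are left out so they can be used afterwards to interpret the clusters. In practice a library implementation such as scikit-learn's `KMeans` would normally be preferred.

```python
import math
import random

SPENDING_COLUMNS = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]


def log_features(row):
    """log1p-transform the six annual spending columns of one customer row."""
    return [math.log1p(float(row[col])) for col in SPENDING_COLUMNS]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length vectors; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise with k distinct data points chosen at random
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

A call such as `labels, centroids = kmeans([log_features(r) for r in rows], k=3)` would cluster rows read with `csv.DictReader`; `rows` and `k=3` are placeholders.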
### Acknowledgement
This data set has been sourced from the Machine Learning Repository of the University of California, Irvine [Wholesale Customers Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/Wholesale+customers).
The UCI page mentions the following as the original source of the data set: Abreu, N. (2011). Analise do perfil do cliente Recheio e desenvolvimento de um sistema promocional. Master's in Marketing (Mestrado em Marketing), ISCTE-IUL, Lisbon.
dojoHub @ 543b7e4d
Subproject commit 543b7e4d69f76937d06343811d2aebfe48c6463d