Commit 4a413c8c by Arham Akheel

### Changing folder names on the repo

parent f8c26542
 Data Science Dojo
Copyright (c) 2016 - 2019

---

**Level:** Intermediate
**Recommended Use:** Regression Models
**Domain:** Automobiles
## Auto MPG Data Set

### Can you predict the fuel efficiency of a car?

---

![](tim-mossholder-680992-unsplash.jpg)

---

This *intermediate* level data set has 398 rows and 9 columns and provides mileage, horsepower, model year, and other technical specifications for cars. This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **regression modelling techniques**. Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques. The following data dictionary gives more details on this data set:

---

### Data Dictionary

| **Column Position** | **Attribute Name** | **Description** | **Examples** | **Attribute Type** | **Nulls Ratio** |
|---------------------|--------------------|-----------------|--------------|--------------------|-----------------|
| #1 | mpg | fuel efficiency measured in miles per gallon (mpg) | 9.0, 13.0, 41.5 | quantitative | 0% |
| #2 | cylinders | number of cylinders in the engine | 3, 4, 8 | qualitative | 0% |
| #3 | displacement | engine displacement (in cubic inches) | 68.0, 112.0, 455.0 | quantitative | 0% |
| #4 | horsepower | engine horsepower | 46.0, 70.0, 230.0 | quantitative | 2% |
| #5 | weight | vehicle weight (in pounds) | 1613, 3615, 5140 | quantitative | 0% |
| #6 | acceleration | time to accelerate from 0 to 60 mph (in seconds) | 8.00, 15.50, 24.80 | quantitative | 0% |
| #7 | model year | model year | 73, 79, 82 | qualitative | 0% |
| #8 | origin | origin of car (1: American, 2: European, 3: Japanese) | 1, 2, 3 | qualitative | 0% |
| #9 | car name | car name | audi fox, subaru | qualitative | 0% |

---

### Acknowledgement

This data set has been sourced from the Machine Learning Repository of University of California, Irvine: [Auto MPG Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/auto+mpg). The UCI page mentions [StatLib (Carnegie Mellon University)](http://lib.stat.cmu.edu/datasets/) as the original source of the data set.

\ No newline at end of file
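The data dictionary reports a 2% null ratio for `horsepower`, so a regression workflow has to handle those gaps before fitting. The following is a minimal sketch of median imputation using only the Python standard library. The sample rows are illustrative and not copied from the data set, and the assumption that missing values appear as `?`, an empty string, or `NA` should be checked against the actual file.

```python
import csv
import io
from statistics import median

# Illustrative rows only (not copied from the data set), in the column order
# of the data dictionary. Assumption: a missing horsepower is stored as "?".
SAMPLE_CSV = """mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name
18.0,8,307.0,130.0,3504,12.0,73,1,car a
25.0,4,98.0,?,2046,19.0,79,1,car b
24.0,4,107.0,90.0,2430,14.5,79,2,car c
31.0,4,97.0,88.0,2130,14.5,82,3,car d
"""

MISSING_MARKERS = {"", "?", "NA"}  # assumed encodings of a null value


def load_with_imputed_horsepower(csv_text):
    """Parse the CSV text and fill missing horsepower with the column median."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Collect the horsepower values that are actually present.
    known = [
        float(r["horsepower"])
        for r in rows
        if r["horsepower"].strip() not in MISSING_MARKERS
    ]
    fill_value = median(known)
    # Convert every horsepower cell to float, substituting the median for nulls.
    for r in rows:
        raw = r["horsepower"].strip()
        r["horsepower"] = fill_value if raw in MISSING_MARKERS else float(raw)
    return rows


rows = load_with_imputed_horsepower(SAMPLE_CSV)
print([r["horsepower"] for r in rows])
```

Median imputation is only one option; dropping the affected rows is equally reasonable when so few values are missing.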
## Big Mart Sales Data

### Introduction

The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Certain attributes of each product and store have also been defined. The aim is to build a predictive model that estimates the sales of each product at a particular store. Using this model, BigMart will try to understand which properties of products and stores play a key role in increasing sales.

Please note that the data may have missing values, as some stores might not report all the data due to technical glitches. These values need to be treated accordingly.

### Data Dictionary

Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios
--- | --- | --- | --- | --- | ---
1 | Item_Identifier | A unique product ID assigned to every distinct item. It consists of an alphanumeric string of length 5 | Alphanumeric | FDN15 | 0
2 | Item_Weight | The weight of the product | Numeric (float) | 17.5 | 17.16531738
3 | Item_Fat_Content | A categorical attribute that describes whether the product is low fat or not. It has 2 categories: ['Low Fat', 'Regular']. Note, however, that 'Low Fat' is also written as 'low fat' and 'LF' in the data set, and 'Regular' is also written as 'reg' | Alpha | Low Fat | 0
4 | Item_Visibility | The percentage of the total display area of all products in a store that is allocated to the particular product | Numeric (float) | 0.01676 | 0
5 | Item_Type | A categorical attribute that describes the food category to which the item belongs. There are 16 different categories: ['Dairy', 'Soft Drinks', 'Meat', 'Fruits and Vegetables', 'Household', 'Baking Goods', 'Snack Foods', 'Frozen Foods', 'Breakfast', 'Health and Hygiene', 'Hard Drinks', 'Canned', 'Breads', 'Starchy Foods', 'Others', 'Seafood'] | Alpha | Meat | 0
6 | Item_MRP | The Maximum Retail Price (list price) of the product | Numeric (float) | 141.618 | 0
7 | Outlet_Identifier | A unique store ID. It consists of an alphanumeric string of length 6 | Alphanumeric | OUT049 | 0
8 | Outlet_Establishment_Year | The year in which the store was established | Numeric (Integer) | 1998 | 0
9 | Outlet_Size | The size of the store in terms of ground area covered. It is a categorical value with 3 categories: ['High', 'Medium', 'Small'] | Alpha | Medium | 28.27642849
10 | Outlet_Location_Type | A categorical attribute that describes the size of the city in which the store is located, using 3 categories: ['Tier 1', 'Tier 2', 'Tier 3'] | Alpha | Tier 3 | 0
11 | Outlet_Type | A categorical attribute that tells whether the outlet is a grocery store or some sort of supermarket. The data is divided into 4 categories: ['Supermarket Type1', 'Supermarket Type2', 'Grocery Store', 'Supermarket Type3'] | Alpha | Supermarket Type2 | 0
12 | Item_Outlet_Sales | The outcome variable to be predicted. It contains the sales of the product in the particular store | Numeric (float) | 2097.27 | 0

### Source: https://datahack.analyticsvidhya.com/contest/practice-problem-big-mart-sales-iii/

\ No newline at end of file
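The data dictionary notes that `Item_Fat_Content` uses several spellings for its two categories. A minimal sketch of how those variants could be collapsed before modelling is shown below. The function name and the choice to raise on unknown labels are assumptions for illustration, not part of the data set's documentation.

```python
# Canonical labels come from the data dictionary; the lookup keys are the
# variants it mentions, lower-cased for case-insensitive matching.
FAT_CONTENT_ALIASES = {
    "low fat": "Low Fat",
    "lf": "Low Fat",
    "regular": "Regular",
    "reg": "Regular",
}


def normalize_fat_content(label):
    """Map a raw Item_Fat_Content value to 'Low Fat' or 'Regular'."""
    key = label.strip().lower()
    if key not in FAT_CONTENT_ALIASES:
        # Fail loudly so unexpected spellings are noticed rather than hidden.
        raise ValueError(f"unexpected Item_Fat_Content value: {label!r}")
    return FAT_CONTENT_ALIASES[key]


raw_values = ["Low Fat", "low fat", "LF", "Regular", "reg"]
print([normalize_fat_content(v) for v in raw_values])
```

Raising on an unrecognized label is a deliberate choice here: it surfaces any spelling the data dictionary does not list instead of silently passing it through.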
 Data Science Dojo
**Recommended Use:** Classification Models
## Blood Transfusion Service Center Data Set

### Predict if a donor will give blood in March 2007

---

![](hush-naidoo-1170844-unsplash.jpg)

---

This *advanced* level data set has 748 instances and 5 attributes. This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **classification modelling techniques**. Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques.

The following information about the data set is provided in the source:

To demonstrate the RFMTC marketing model (a modified version of RFM), this study adopted the donor database of Blood Transfusion Service Center in Hsin-Chu City in Taiwan. The center passes their blood transfusion service bus to one university in Hsin-Chu City to gather blood donated about every three months. To build an RFMTC model, we selected 748 donors at random from the donor database. Each of these 748 donor records includes R (Recency - months since last donation), F (Frequency - total number of donations), M (Monetary - total blood donated in c.c.), T (Time - months since first donation), and a binary variable representing whether he/she donated blood in March 2007 (1 stands for donating blood; 0 stands for not donating blood).

The following data dictionary gives more details on this data set:

---

### Data Dictionary

| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|-----------------|----------------|------------|-----------|---------|---------------|
| 1 | Recency (months) | Number of months since the particular donor's most recent donation | Quantitative | 5, 0, 2 | 0 |
| 2 | Frequency (times) | Total number of donations that the donor has made | Quantitative | 50, 10, 9 | 0 |
| 3 | Monetary (c.c. blood) | Total amount of blood that the donor has donated (cubic centimeters) | Quantitative | 4000, 2750, 500 | 0 |
| 4 | Time (months) | Number of months since the donor's first donation | Quantitative | 16, 58, 69 | 0 |
| 5 | whether he/she donated blood in March 2007 | This is a binary variable which represents whether the donor donated blood in March 2007: (1, 0) | Quantitative | 1, 0 | 0 |

### Acknowledgement

This data set has been sourced from the Machine Learning Repository of University of California, Irvine: [Blood Transfusion Service Center Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center). The UCI page mentions Blood Transfusion Service Center, Hsin-Chu City, Taiwan as the original source of the data set.

\ No newline at end of file
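Because the target is a binary variable, it can help to know how well a trivial classifier would score before training anything. The sketch below computes the accuracy of always predicting the most common class. The function name and the sample labels are hypothetical and are not taken from the data set.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class in `labels`."""
    if not labels:
        raise ValueError("labels must not be empty")
    # most_common(1) returns [(label, count)] for the most frequent label.
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Illustrative target values only (1 = donated in March 2007, 0 = did not).
sample_targets = [1, 0, 0, 0, 1, 0, 0, 0]
print(majority_baseline_accuracy(sample_targets))
```

Any classification model trained on the real target column would need to beat this figure, computed on the actual labels, to be adding value.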
