Commit e051db09 by Rahim Rasool

Added 4 more datasets - total count = 5

parent 81405bc6


Data Science Dojo
Copyright (c) 2016 - 2019

---

**Level:** Beginner
**Recommended Use:** Classification Models
**Domain:** Automobile

## Car Evaluation Data Set

### Predict acceptability of a car

---

![](1190.jpg)

---

This *beginner* level data set was derived from a decision-making model originally developed for research on multi-attribute decision making. Decision making involves selecting between seemingly conflicting alternatives. The data set has 1728 rows and 7 columns. Six attributes, such as "Buying Price", "Maintenance", and "Safety", describe each car's price and technical characteristics, and each attribute takes one of several levels. The car's acceptability, the seventh attribute, is the outcome variable.

This data set is recommended for learning and practicing your skills in **classification modelling techniques**. Feel free to explore the data set with multiple classification methods.

The following data dictionary gives more details on this data set:

---

### Data Dictionary

| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|---|---|---|---|---|---|
| 1 | buying | Buying price of the car (v-high, high, med, low) | Qualitative | low, med, high | 0 |
| 2 | maint | Price of the maintenance of the car (v-high, high, med, low) | Qualitative | low, med, high | 0 |
| 3 | doors | Number of doors (2, 3, 4, 5-more) | Qualitative | 2, 3, 4 | 0 |
| 4 | persons | Capacity in terms of persons to carry (2, 4, more) | Qualitative | 2, 4, more | 0 |
| 5 | lug_boot | The size of the luggage boot (small, med, big) | Qualitative | small, med, big | 0 |
| 6 | safety | Estimated safety of the car (low, med, high) | Qualitative | low, med, high | 0 |
| 7 | class | Car acceptability (unacc: unacceptable, acc: acceptable, good: good, v-good: very good) | Qualitative | unacc, acc, good | 0 |

---

### Acknowledgement

This data set has been sourced from the Machine Learning Repository of University of California, Irvine: [Car Evaluation Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/Car+Evaluation). The UCI page lists the following as the donors of the data set:

+ Marko Bohanec (marko.bohanec '@' ijs.si)
+ Blaz Zupan (blaz.zupan '@' ijs.si)
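
Because every attribute in the data dictionary above is an ordered category, one common preparation step before trying a classifier is to map each level to its rank. The following is a minimal sketch of that idea, not part of the original data set documentation. It assumes the levels are spelled exactly as they appear in the data dictionary (the actual file may spell them differently, for example without the hyphen); the names `LEVELS` and `encode_row` and the example row are hypothetical.

```python
import math

# Ordered levels per attribute, copied from the data dictionary.
# Assumption: the data file spells the levels exactly as listed here.
LEVELS = {
    "buying": ["low", "med", "high", "v-high"],
    "maint": ["low", "med", "high", "v-high"],
    "doors": ["2", "3", "4", "5-more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}


def encode_row(row):
    """Map each categorical value to its rank within the attribute's ordered levels."""
    return [LEVELS[name].index(row[name]) for name in LEVELS]


# Hypothetical row, made up for illustration (not copied from the data set).
example = {
    "buying": "low",
    "maint": "med",
    "doors": "4",
    "persons": "more",
    "lug_boot": "big",
    "safety": "high",
}
encoded = encode_row(example)
print(encoded)  # [0, 1, 2, 2, 2, 2]

# Number of distinct attribute combinations implied by the level lists.
combinations = math.prod(len(levels) for levels in LEVELS.values())
print(combinations)  # 1728
```

The number of possible attribute combinations (4 x 4 x 4 x 3 x 3 x 3) equals the 1728 rows stated in the description.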
Data Science Dojo
Copyright (c) 2016 - 2019

---

**Level:** Intermediate
**Recommended Use:** Regression Models
**Domain:** Real Estate

## Real Estate Valuation Data Set

### Can you predict the price of a house?

---

![](310.jpg)

---

This *intermediate* level data set has 414 rows and 7 columns. It contains historical market data on real estate valuations collected from Sindian Dist., New Taipei City, Taiwan.

This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **regression modelling techniques**. Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques.

The following data dictionary gives more details on this data set:

---

### Data Dictionary

| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|---|---|---|---|---|---|
| 1 | X1 transaction date | The transaction date (for example, 2013.250=2013 March, 2013.500=2013 June, etc.) | Qualitative | 2013.500, 2013.500, 2013.333 | 0 |
| 2 | X2 house age | The house age (unit: year) | Quantitative | 19.5, 13.3, 5.0 | 0 |
| 3 | X3 distance to the nearest MRT station | The distance to the nearest MRT station (unit: meter) | Quantitative | 390.5684, 405.21340, 23.38284 | 0 |
| 4 | X4 number of convenience stores | The number of convenience stores in the living circle on foot | Quantitative | 6, 8, 1 | 0 |
| 5 | X5 latitude | The geographic coordinate, latitude (unit: degree) | Quantitative | 24.97937, 24.97544, 24.94925 | 0 |
| 6 | X6 longtitude | The geographic coordinate, longitude (unit: degree) | Quantitative | 121.54243, 121.49587, 121.51151 | 0 |
| 7 | Y house price of unit area | The house price per unit area (unit: 10,000 New Taiwan Dollars/Ping, where Ping is a local unit and 1 Ping = 3.3 square meters); for example, 29.3 = 293,000 New Taiwan Dollars/Ping | Quantitative | 29.3, 33.6, 47.7 | 0 |

---

### Acknowledgement

This data set has been sourced from the Machine Learning Repository of University of California, Irvine: [Real Estate Valuation Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set).

The UCI page mentions the following as the original source of the data set:

*Yeh, I. C., & Hsu, T. K. (2018). Building real estate valuation models with comparative approach through case-based reasoning. Applied Soft Computing, 65, 260-271*
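
Two columns in the data dictionary above use encodings that are easy to misread: the transaction date is a fractional year, and the price is expressed in units of 10,000 New Taiwan Dollars per Ping. The sketch below shows one way to decode both, following the examples in the dictionary (2013.250 is March 2013, and 29.3 is 293,000 New Taiwan Dollars/Ping). The function names are hypothetical, and the handling of a zero fraction is an assumption, since the dictionary does not say how such values should be read.

```python
PING_IN_SQUARE_METERS = 3.3  # conversion factor stated in the data dictionary


def decode_transaction_date(value):
    """Split a fractional-year value such as 2013.250 into (year, month)."""
    year = int(value)
    month = round((value - year) * 12)
    if month == 0:
        # Assumption: a zero fraction is read as December of the previous year.
        return year - 1, 12
    return year, month


def price_per_square_meter(y):
    """Convert Y (10,000 New Taiwan Dollars per Ping) to New Taiwan Dollars per square meter."""
    return y * 10_000 / PING_IN_SQUARE_METERS


print(decode_transaction_date(2013.250))  # (2013, 3)
print(decode_transaction_date(2013.500))  # (2013, 6)
print(round(price_per_square_meter(29.3)))
```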


Data Science Dojo
Copyright (c) 2016 - 2019

---

**Level:** Intermediate
**Recommended Use:** Classification/Clustering
**Domain:** Education/Web

## User Knowledge Modeling Data Set

### Predict student's knowledge level

---

![](342309-PA9PNI-658.jpg)

---

This *intermediate* level data set has 403 rows and 6 columns. The data set has been divided into training and testing sets (training: 258, testing: 145). It is a real data set describing students' knowledge status on the subject of Electrical DC Machines.

This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **classification** and **clustering** techniques. Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques.

The following data dictionary gives more details on this data set:

---

### Data Dictionary

| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|---|---|---|---|---|---|
| 1 | STG | The degree of study time for goal object materials | Quantitative | 0.060, 0.100, 0.080 | 0 |
| 2 | SCG | The degree of repetition number of user for goal object materials | Quantitative | 0.000, 0.100, 0.250 | 0 |
| 3 | STR | The degree of study time of user for related objects with goal object | Quantitative | 0.10, 0.15, 0.05 | 0 |
| 4 | LPR | The exam performance of user for related objects with goal object | Quantitative | 0.98, 0.10, 0.01 | 0 |
| 5 | PEG | The exam performance of user for goal objects | Quantitative | 0.66, 0.56, 0.33 | 0 |
| 6 | UNS | The knowledge level of user (Very Low, Low, Middle, High) | Qualitative | "High", "Middle", "Low" | 0 |

---

### Acknowledgement

This data set has been sourced from the Machine Learning Repository of University of California, Irvine: [User Knowledge Modeling Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/User+Knowledge+Modeling).

The UCI page mentions the following publication as the original source of the data set:

*H. T. Kahraman, Sagiroglu, S., Colak, I., Developing intuitive knowledge classifier and modeling of users' domain dependent data in web, Knowledge Based Systems, vol. 37, pp. 283-295, 2013*

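
As a small sanity check on the description above, the stated training and testing sizes can be added up and the ordered knowledge levels of `UNS` mapped to integer ranks before modelling. This is only an illustrative sketch; the names `UNS_LEVELS`, `UNS_TO_RANK`, `TRAIN_ROWS`, and `TEST_ROWS` are hypothetical, and the level spellings are assumed to match the data dictionary.

```python
# Ordered knowledge levels, taken from the UNS row of the data dictionary.
# Assumption: the data file spells the labels exactly this way.
UNS_LEVELS = ["Very Low", "Low", "Middle", "High"]
UNS_TO_RANK = {label: rank for rank, label in enumerate(UNS_LEVELS)}

# Split sizes as stated in the description.
TRAIN_ROWS, TEST_ROWS = 258, 145
total_rows = TRAIN_ROWS + TEST_ROWS
train_share = TRAIN_ROWS / total_rows

print(total_rows)             # 403
print(round(train_share, 3))  # 0.64
print(UNS_TO_RANK["Middle"])  # 2
```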
Data Science Dojo
Copyright (c) 2016 - 2019

---

**Level:** Beginner
**Recommended Use:** Classification Models
**Domain:** Mobile/Location

## Wireless Indoor Localization Data Set

### Predict location from wifi signal strength

---

![](181.jpg)

---

This *beginner* level data set has 2000 rows and 8 columns. The data set contains the signal strengths of 7 wifi devices, observed on a smartphone in an indoor space. The data can be used to estimate which of the four rooms the smartphone is in.

This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **classification modelling techniques**. Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques.

The following data dictionary gives more details on this data set:

---

### Data Dictionary

| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|---|---|---|---|---|---|
| 1 | Wifi 1 | Signal strength of wifi 1 | Quantitative | -42, -67, -50 | 0 |
| 2 | Wifi 2 | Signal strength of wifi 2 | Quantitative | -58, -64, -40 | 0 |
| 3 | Wifi 3 | Signal strength of wifi 3 | Quantitative | -65, -50, -70 | 0 |
| 4 | Wifi 4 | Signal strength of wifi 4 | Quantitative | -63, -59, -37 | 0 |
| 5 | Wifi 5 | Signal strength of wifi 5 | Quantitative | -80, -73, -56 | 0 |
| 6 | Wifi 6 | Signal strength of wifi 6 | Quantitative | -90, -75, -64 | 0 |
| 7 | Wifi 7 | Signal strength of wifi 7 | Quantitative | -91, -65, -83 | 0 |
| 8 | Room | One of the four rooms (1, 2, 3, 4) | Quantitative | 1, 2, 3 | 0 |

---

### Acknowledgement

This data set has been sourced from the Machine Learning Repository of University of California, Irvine: [Wireless Indoor Localization Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/Wireless+Indoor+Localization).

The UCI page mentions the following 2 publications as the original source of the data set:

1. *Rajen Bhatt, 'Fuzzy-Rough Approaches for Pattern Classification: Hybrid measures, Mathematical analysis, Feature selection algorithms, Decision tree algorithms, Neural learning, and Applications', Amazon Books*
2. *Jayant G Rohra, Boominathan Perumal, Swathi Jamjala Narayanan, Priya Thakur, and Rajen B Bhatt, 'User Localization in an Indoor Environment Using Fuzzy Hybrid of Particle Swarm Optimization & Gravitational Search Algorithm with Neural Networks', in Proceedings of Sixth International Conference on Soft Computing for Problem Solving, 2017, pp. 286-295.*
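
To make the classification task described above concrete, here is a minimal 1-nearest-neighbour sketch that predicts a room from seven signal strengths. It is one possible baseline, not a method prescribed by the data set authors. The readings in `train` are made up for illustration and are not rows from the data set; `predict_room` is a hypothetical helper name.

```python
import math

# Hypothetical readings (made up for illustration, NOT rows from the data set):
# seven signal strengths followed by the room label.
train = [
    ([-40, -55, -60, -62, -70, -85, -88], 1),
    ([-65, -60, -50, -38, -72, -76, -80], 2),
    ([-50, -58, -66, -60, -55, -80, -84], 3),
    ([-60, -52, -48, -55, -50, -90, -91], 4),
]


def predict_room(signals, training_rows):
    """Return the room of the training row closest to `signals` (Euclidean distance)."""
    nearest = min(training_rows, key=lambda row: math.dist(signals, row[0]))
    return nearest[1]


# A reading within one unit of the first hypothetical row on every device.
room = predict_room([-41, -56, -61, -61, -71, -84, -87], train)
print(room)  # 1
```

With the real 2000 rows, the same function could be evaluated on a held-out portion of the data to get a first accuracy figure before trying other classifiers.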