Data Science Dojo <br/>
Copyright (c) 2016 - 2019

---

**<span style="color:#E57932">Level:</span>** Intermediate <br/>
**<span style="color:#E57932">Recommended Use:</span>** Classification Models<br/>
**<span style="color:#E57932">Domain:</span>** Social<br/> 


## Census Income Data Set

### Predict whether income exceeds $50K/year:

![](rawpixel-557125-unsplash.jpg)

This *intermediate* level data set was extracted from the Census Bureau database. It contains 48,842 instances (train = 32,561, test = 16,281) with a mix of continuous and discrete attributes.
The data set has 15 attributes, including age, sex, education level, and other relevant details about a person. It will help you improve your skills in **Exploratory Data Analysis**, **Data Wrangling**, **Data Visualization**, and **Classification Models**.
Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques. The following data dictionary gives more details on this data set:


### Data Dictionary:

| **Column Position** | **Attribute Name** | **Definition** | **Data Type** | **Example** | **% Null Ratio** |
|---|---|---|---|---|---|
| 1 | age | Age (years) | Quantitative | 38, 42, 71 | 0 |
| 2 | workclass | Work class, 8 categories: (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked) | Qualitative | "Private", "Local-gov", "Never-worked" | 6 |
| 3 | fnlwgt | Final Weight* | Quantitative | 83311, 338409 | 0 |
| 4 | education | Education: (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool) | Qualitative | "Bachelors", "9th", "Preschool" | 0 |
| 5 | education-num | Years of education | Quantitative | 13, 9, 7 | 0 |
| 6 | marital-status | Marital status: (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse) | Qualitative | "Divorced", "Separated", "Widowed" | 0 |
| 7 | occupation | Occupation: (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces) | Qualitative | "Tech-support", "Armed-Forces", "Sales" | 6 |
| 8 | relationship | Relationship: (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried) | Qualitative | "Wife", "Unmarried", "Own-child" | 0 |
| 9 | race | Race: (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black) | Qualitative | "White", "Asian-Pac-Islander", "Other" | 0 |
| 10 | sex | Sex: (Male, Female) | Qualitative | "Male", "Female" | 0 |
| 11 | capital-gain | Amount of capital gained | Quantitative | 14084, 0, 5178 | 0 |
| 12 | capital-loss | Amount of capital lost | Quantitative | 0, 2042, 1902 | 0 |
| 13 | hours-per-week | Number of hours worked per week | Quantitative | 40, 50, 70 | 0 |
| 14 | native-country | Native country: (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands) | Qualitative | "China", "Italy", "Vietnam" | 2 |
| 15 | income | Whether income is greater than $50,000 or less than or equal to $50,000: (>50K, <=50K) | Qualitative | ">50K", "<=50K" | 0 |
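
The dictionary above can be turned into a small loading routine. The following is a minimal sketch using only the Python standard library. It assumes the data are stored as comma-separated rows without a header line and that missing values are written as `?` (both are assumptions about the file layout, not stated above). The helper names `read_rows` and `null_ratios` are hypothetical, and the two sample rows are made-up stand-ins rather than real records.

```python
import csv
import io

# Column names in the order given by the data dictionary.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Columns the dictionary marks as Quantitative.
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain",
           "capital-loss", "hours-per-week"}

MISSING = "?"  # assumption: marker used for missing values


def read_rows(handle):
    """Yield one dict per record: ints for numeric columns, None for missing."""
    for fields in csv.reader(handle, skipinitialspace=True):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = {}
        for name, value in zip(COLUMNS, fields):
            value = value.strip()
            if value == MISSING:
                row[name] = None
            elif name in NUMERIC:
                row[name] = int(value)
            else:
                row[name] = value
        yield row


def null_ratios(rows):
    """Return the percentage of missing values per column."""
    rows = list(rows)
    return {
        name: 100.0 * sum(r[name] is None for r in rows) / len(rows)
        for name in COLUMNS
    }


# Illustrative sample (made-up rows, not taken from the data set).
SAMPLE = io.StringIO(
    "38, Private, 83311, Bachelors, 13, Divorced, Sales, Unmarried, "
    "White, Female, 0, 0, 40, United-States, <=50K\n"
    "42, ?, 338409, 9th, 5, Widowed, ?, Wife, "
    "Other, Female, 5178, 0, 50, China, >50K\n"
)
records = list(read_rows(SAMPLE))
ratios = null_ratios(records)
print(ratios["workclass"])  # share of missing workclass values in the sample
```

To run this against a real file, replace `SAMPLE` with an opened file handle; the path depends on where the data are stored.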

*Description of fnlwgt (final weight):

The weights on the CPS files are controlled to independent estimates of the civilian noninstitutional population of the US.  These are prepared monthly for us by Population Division here at the Census Bureau.  We use 3 sets of controls.

These are:
1.  A single cell estimate of the population 16+ for each state.
2.  Controls for Hispanic Origin by age and sex.
3.  Controls by Race, age and sex.

We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used. 
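
The raking step can be illustrated with a small sketch of iterative proportional fitting. This is not the Census Bureau's weighting program: the function name `rake`, the two toy control dimensions, and all numbers below are hypothetical, and the sketch uses two sets of controls rather than the three listed above.

```python
def rake(weights, groups_a, groups_b, totals_a, totals_b, passes=6):
    """Adjust weights so weighted counts match control totals on two margins.

    weights  : starting weights, one per person
    groups_a : category label per person for the first control
    groups_b : category label per person for the second control
    totals_a : target total for each category of the first control
    totals_b : target total for each category of the second control
    """
    weights = list(weights)
    for _ in range(passes):
        for groups, totals in ((groups_a, totals_a), (groups_b, totals_b)):
            # Current weighted total in each category of this control.
            current = {}
            for w, g in zip(weights, groups):
                current[g] = current.get(g, 0.0) + w
            # Scale every weight so its category hits the control total.
            weights = [w * totals[g] / current[g]
                       for w, g in zip(weights, groups)]
    return weights


# Hypothetical toy sample: four people with equal starting weights.
sex = ["Male", "Male", "Female", "Female"]
age_band = ["16-39", "40+", "16-39", "40+"]
raked = rake(
    [1.0, 1.0, 1.0, 1.0],
    sex, age_band,
    totals_a={"Male": 60.0, "Female": 40.0},
    totals_b={"16-39": 30.0, "40+": 70.0},
)
print(raked)
```

Each pass rescales the weights against one set of controls at a time, so repeated passes pull the weighted counts toward all control totals together.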

The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population.
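
A weighted tally of this kind can be sketched in a few lines. The example assumes each record is a dict keyed by the attribute names from the data dictionary. The three records and the helper name `weighted_tally` are hypothetical and only illustrate the arithmetic.

```python
def weighted_tally(records, predicate, weight_key="fnlwgt"):
    """Sum the final weights of all records that satisfy the predicate."""
    return sum(r[weight_key] for r in records if predicate(r))


# Hypothetical records; only the fields needed here are shown.
records = [
    {"fnlwgt": 83311, "income": "<=50K"},
    {"fnlwgt": 338409, "income": ">50K"},
    {"fnlwgt": 100000, "income": ">50K"},
]

high_income = weighted_tally(records, lambda r: r["income"] == ">50K")
everyone = weighted_tally(records, lambda r: True)

# Weighted share of the population with income above $50K,
# compared with the unweighted share of sample rows (2 of 3).
weighted_share = high_income / everyone
unweighted_share = 2 / 3
print(high_income, everyone)
```

The weighted and unweighted shares differ whenever the weights are unequal, which is why population estimates should use `fnlwgt` rather than raw row counts.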

People with similar demographic characteristics should have similar weights.  There is one important caveat to remember about this statement.  That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state.


### Acknowledgement:


This data set was sourced from the Machine Learning Repository of the University of California, Irvine: [Census Income Data Set (UC Irvine)](http://mlr.cs.umass.edu/ml/datasets/Census+Income). The UCI page cites the [US Census Bureau](http://www.census.gov/ftp/pub/DES/www/welcome.html) as the original source of the data set.