Data Science Dojo <br/>
Copyright (c) 2016 - 2019

---

**<span style="color:#E57932">Level:</span>** Intermediate <br/>
**<span style="color:#E57932">Recommended Use:</span>** Classification Models<br/>
**<span style="color:#E57932">Domain:</span>** Social<br/>


## Census Income Data Set

### Predict whether income exceeds $50K/year:

![](rawpixel-557125-unsplash.jpg)

This *intermediate*-level data set was extracted from the Census Bureau database. It contains 48,842 instances with a mix of continuous and discrete attributes (train = 32,561, test = 16,281).
Each instance has 15 attributes, including age, sex, education level and other relevant details of a person. The data set will help you improve your skills in **Exploratory Data Analysis**, **Data Wrangling**, **Data Visualization** and **Classification Models**.
Feel free to explore it with multiple **supervised** and **unsupervised** learning techniques. The following data dictionary gives more details on this data set:
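
The README does not describe the file layout, so the following is only a minimal Python sketch for getting started. It assumes header-less, comma-separated records in the column order of the data dictionary, with `?` marking a missing value and income labels that may end in a period; the function name `parse_rows` and the two sample lines are hypothetical. Only the standard library is used.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Union

# Column order follows the data dictionary (positions 1-15).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]
# Columns listed as Quantitative in the data dictionary.
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain", "capital-loss", "hours-per-week"}

Value = Optional[Union[int, str]]


def parse_rows(lines: Iterable[str]) -> List[Dict[str, Value]]:
    """Parse header-less, comma-separated records into one dict per person.

    Assumptions: fields are separated by ", ", missing values are written
    as "?", and income labels may carry a trailing period.
    """
    rows: List[Dict[str, Value]] = []
    for fields in csv.reader(lines, skipinitialspace=True):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or non-record lines
        row: Dict[str, Value] = {}
        for name, raw in zip(COLUMNS, fields):
            value = raw.strip()
            if value == "?":
                row[name] = None  # missing value
            elif name in NUMERIC:
                row[name] = int(value)
            elif name == "income":
                row[name] = value.rstrip(".")  # ">50K." -> ">50K"
            else:
                row[name] = value
        rows.append(row)
    return rows


# Two hand-written lines in the assumed layout (not taken from the data files).
sample = io.StringIO(
    "38, Private, 83311, Bachelors, 13, Divorced, Tech-support, Unmarried, "
    "White, Female, 0, 0, 40, United-States, <=50K\n"
    "42, ?, 338409, HS-grad, 9, Married-civ-spouse, ?, Husband, "
    "White, Male, 5178, 0, 50, United-States, >50K.\n"
)
records = parse_rows(sample)
print(records[1]["workclass"], records[1]["income"])  # None >50K
```

Lines that do not split into exactly 15 fields are skipped, which keeps the loader tolerant of blank lines at the end of a file.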


### Data Dictionary:

| **Column Position** | **Attribute Name** | **Definition** | **Data Type** | **Example** | **% Null** |
|---|---|---|---|---|---|
| 1 | age | Age (years) | Quantitative | 38, 42, 71 | 0 |
| 2 | workclass | Work class, 8 categories: (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked) | Qualitative | "Private", "Local-gov", "Never-worked" | 6 |
| 3 | fnlwgt | Final weight* | Quantitative | 83311, 338409 | 0 |
| 4 | education | Education: (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool) | Qualitative | "Bachelors", "9th", "Preschool" | 0 |
| 5 | education-num | Education level encoded as an ordinal number, from 1 (Preschool) to 16 (Doctorate) | Quantitative | 13, 9, 7 | 0 |
| 6 | marital-status | Marital status: (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse) | Qualitative | "Divorced", "Separated", "Widowed" | 0 |
| 7 | occupation | Occupation: (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces) | Qualitative | "Tech-support", "Armed-Forces", "Sales" | 6 |
| 8 | relationship | Relationship: (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried) | Qualitative | "Wife", "Unmarried", "Own-child" | 0 |
| 9 | race | Race: (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black) | Qualitative | "White", "Asian-Pac-Islander", "Other" | 0 |
| 10 | sex | Sex: (Male, Female) | Qualitative | "Male", "Female" | 0 |
| 11 | capital-gain | Amount of capital gained | Quantitative | 14084, 0, 5178 | 0 |
| 12 | capital-loss | Amount of capital lost | Quantitative | 0, 2042, 1902 | 0 |
| 13 | hours-per-week | Number of hours worked per week | Quantitative | 40, 50, 70 | 0 |
| 14 | native-country | Native country: (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands) | Qualitative | "China", "Italy", "Vietnam" | 2 |
| 15 | income | Whether income is greater than $50,000 or less than or equal to $50,000: (>50K, <=50K) | Qualitative | ">50K", "<=50K" | 0 |
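
The null-ratio column can be reproduced by counting missing entries per column. Below is a minimal Python sketch, assuming each record has already been parsed into a dictionary with `None` standing for a missing value; the function name `null_ratios` and the four toy rows are invented for illustration.

```python
from typing import Dict, List, Optional


def null_ratios(rows: List[Dict[str, Optional[object]]]) -> Dict[str, float]:
    """Return, for each column, the percentage of rows whose value is None."""
    if not rows:
        return {}
    total = len(rows)
    return {
        col: 100.0 * sum(1 for r in rows if r.get(col) is None) / total
        for col in rows[0].keys()
    }


# Toy rows (hypothetical values); None marks a missing entry.
rows = [
    {"age": 38, "workclass": "Private", "native-country": "China"},
    {"age": 42, "workclass": None, "native-country": "Italy"},
    {"age": 71, "workclass": "Local-gov", "native-country": None},
    {"age": 50, "workclass": None, "native-country": "Vietnam"},
]
ratios = null_ratios(rows)
print({k: round(v) for k, v in ratios.items()})
# {'age': 0, 'workclass': 50, 'native-country': 25}
```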

*Description of fnlwgt (final weight):

The weights on the CPS (Current Population Survey) files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by the Population Division here at the Census Bureau. We use 3 sets of controls.

These are:
1.  A single cell estimate of the population 16+ for each state.
2.  Controls for Hispanic Origin by age and sex.
3.  Controls by Race, age and sex.

We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used.
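
Raking is also known as iterative proportional fitting: the weights are rescaled to match one set of control totals, then the next, and the cycle is repeated. The Census Bureau's actual weighting program is not described here, so this is only a minimal Python sketch of the general technique. It assumes two hypothetical controls (`sex` and `age_group`) with invented target totals in place of the three CPS control sets; the default of 6 iterations simply mirrors the "6 times" mentioned above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A control is an attribute name plus the target weighted total for each category.
Control = Tuple[str, Dict[str, float]]


def rake(records: List[Dict[str, str]], weights: List[float],
         controls: List[Control], iterations: int = 6) -> List[float]:
    """Raking (iterative proportional fitting) of survey weights.

    In each pass the weights are rescaled so that their totals match
    one control after another; repeating the passes lets all controls
    be satisfied together (approximately, in general).
    """
    w = list(weights)
    for _ in range(iterations):
        for attr, targets in controls:
            # Current weighted total for every category of this attribute.
            current: Dict[str, float] = defaultdict(float)
            for rec, wt in zip(records, w):
                current[rec[attr]] += wt
            # Scale each record's weight by target / current for its category.
            for i, rec in enumerate(records):
                cat = rec[attr]
                if cat in targets and current[cat] > 0:
                    w[i] *= targets[cat] / current[cat]
    return w


# Toy example: hypothetical controls and invented targets.
people = [
    {"sex": "Male", "age_group": "16-44"},
    {"sex": "Male", "age_group": "45+"},
    {"sex": "Female", "age_group": "16-44"},
    {"sex": "Female", "age_group": "45+"},
]
controls = [
    ("sex", {"Male": 60.0, "Female": 40.0}),
    ("age_group", {"16-44": 30.0, "45+": 70.0}),
]
raked = rake(people, [1.0] * len(people), controls)
print([round(x, 2) for x in raked])  # [18.0, 42.0, 12.0, 28.0]
```

With these two simple controls the weights settle after the first pass; with overlapping controls such as age-by-sex cells crossed with race or Hispanic origin, several passes may be needed before every control agrees.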

The term "estimate" refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population.

People with similar demographic characteristics should have similar weights. There is one important caveat to this statement: because the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement applies only within a state.
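
A weighted tally sums the weights of the records that share a characteristic instead of counting them. Below is a minimal Python sketch, assuming rows already parsed into dictionaries with an integer `fnlwgt` field; `weighted_tally` and the sample rows are hypothetical. Given the within-state caveat above, and since the data dictionary lists no state column, tallies computed from this extract are best read as illustrations of the idea rather than as official population estimates.

```python
from typing import Callable, Dict, List, Union

Row = Dict[str, Union[int, str]]


def weighted_tally(rows: List[Row], predicate: Callable[[Row], bool],
                   weight_col: str = "fnlwgt") -> int:
    """Sum the weights of all rows that satisfy `predicate`."""
    return sum(int(row[weight_col]) for row in rows if predicate(row))


# Invented rows; only the columns the example needs are included.
rows: List[Row] = [
    {"sex": "Female", "income": ">50K", "fnlwgt": 83311},
    {"sex": "Male", "income": "<=50K", "fnlwgt": 338409},
    {"sex": "Female", "income": "<=50K", "fnlwgt": 120000},
]
print(weighted_tally(rows, lambda r: r["sex"] == "Female"))  # 203311
```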


### Acknowledgement:


This data set was sourced from the University of California, Irvine Machine Learning Repository: [Census Income Data Set (UC Irvine)](http://mlr.cs.umass.edu/ml/datasets/Census+Income). The UCI page credits the [US Census Bureau](http://www.census.gov/ftp/pub/DES/www/welcome.html) as the original source of the data set.