Commit cd6d014b by Rahim Rasool

Update README.md

parent 1c5def58
......@@ -21,11 +21,11 @@ Feel free to explore the data set with multiple **supervised** and **unsupervise
### Data Dictionary:
| **Column Position** | **Attribute Name** | **Definition** | **Data Type** | **Example** | **% Null Ratios** |
|------------------- |---------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |-------------- |----------------------------------------- |--------------- |
| 1 | age | Age (years) | Quantitative | 38, 42, 71 | 0 |
| 2 | workclass | Work class, one of 8 categories: (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked) | Qualitative | "Private", "Local-gov", "Never-worked" | 6 |
| 3 | fnlwgt | Final Weight* | Quantitative | 83311, 338409 | 0 |
| 4 | education | Education: (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool) | Qualitative | "Bachelors", "9th", "Preschool" | 0 |
| 5 | education-num | Years of education | Quantitative | 13, 9, 7 | 0 |
| 6 | marital-status | Marital Status: (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse) | Qualitative | "Divorced", "Separated", "Widowed" | 0 |
......@@ -39,5 +39,21 @@ Feel free to explore the data set with multiple **supervised** and **unsupervise
| 14 | native-country | Native country: (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands) | Qualitative | "China", "Italy", "Vietnam" | 2 |
| 15 | income | Whether income is greater than $50,000, or less than or equal to $50,000: (>50K, <=50K) | Qualitative | ">50K", "<=50K" | 0 |
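
The **% Null Ratios** column can be re-derived from the raw file. The snippet below is a minimal sketch, not the method used to build the table: the sample rows are hypothetical, only a subset of columns is shown, and treating `?` as the missing-value marker is an assumption about how the file encodes nulls.

```python
import csv
import io

# Hypothetical sample with a subset of columns.
# Assumption: missing values are encoded as "?".
SAMPLE = """age,workclass,native-country,income
38,Private,United-States,<=50K
42,?,China,>50K
71,Local-gov,?,<=50K
50,?,Italy,>50K
"""

def null_ratios(text, missing="?"):
    """Return the percentage of missing values for each column."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name, value in row.items():
            if value.strip() == missing:
                counts[name] += 1
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}

ratios = null_ratios(SAMPLE)
for name, pct in ratios.items():
    print(f"{name}: {pct:.0f}% null")
```

Running the same function over the full file should give values comparable to the table, provided the missing-value marker matches.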

*Description of fnlwgt (final weight):

The weights on the CPS files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls. These are:

1. A single cell estimate of the population 16+ for each state.
2. Controls for Hispanic Origin by age and sex.
3. Controls by Race, age and sex.

We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used.

The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population.

People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement. That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state.

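
To make the "weighted tallies" idea concrete, here is a minimal sketch that sums `fnlwgt` per `income` category instead of counting rows. The rows are hypothetical (two weights are borrowed from the example column, the rest are made up), and the function names are illustrative. Because the weights only apply within a state, and this sample carries no state information, pooled totals like these are a rough illustration only.

```python
from collections import defaultdict

# Hypothetical rows of (income label, fnlwgt); values are illustrative only.
rows = [
    (">50K", 83311),
    ("<=50K", 338409),
    ("<=50K", 100000),
    (">50K", 50000),
]

def weighted_tally(rows):
    """Sum fnlwgt per category, so each row counts as `fnlwgt` people."""
    totals = defaultdict(int)
    for label, weight in rows:
        totals[label] += weight
    return dict(totals)

def weighted_shares(totals):
    """Convert weighted totals into fractions of the weighted population."""
    grand_total = sum(totals.values())
    return {label: value / grand_total for label, value in totals.items()}

tallies = weighted_tally(rows)
shares = weighted_shares(tallies)
for label in tallies:
    print(f"{label}: weighted total {tallies[label]}, share {shares[label]:.3f}")
```

An unweighted count would give each category two rows here, while the weighted tally shows the categories representing quite different population totals.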
### Acknowledgement:
Asuncion, A. & Newman, D.J. (2007). UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.
\ No newline at end of file