
Data Science Dojo
Copyright (c) 2019 - 2020

Level: Intermediate
Recommended Use: Classification
Domain: Healthcare

Risk Factors for Cervical Cancer

The Risk Factors for Cervical Cancer data set contains demographic information, habits, and historical medical records of 858 patients. Some of these patients were diagnosed with cervical cancer, cervical intraepithelial neoplasia (CIN), or HPV.

This data set can be used to examine whether smoking, invasive birth control methods, and a history of STDs are associated with a cervical cancer diagnosis. It can also be used to estimate, from a patient's medical history, whether that patient is susceptible to cancer.
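One simple way to look at such an association is to compare the diagnosis rate between patients with and without a given risk factor. The sketch below is a minimal illustration, not a prescribed method: the function name `risk_ratio` is hypothetical, the rows are synthetic stand-ins rather than real patient records, and the column names `Smokes` and `Dx:Cancer` are taken from the data dictionary. A ratio above 1 suggests an association in the sample; it does not establish causation.

```python
def risk_ratio(rows, exposure, outcome):
    """Return (risk_exposed, risk_unexposed, ratio) for two 0/1 columns.

    Rows where either value is None (missing) are skipped.
    """
    counts = {True: [0, 0], False: [0, 0]}  # exposure -> [cases, total]
    for row in rows:
        exposed, diagnosed = row.get(exposure), row.get(outcome)
        if exposed is None or diagnosed is None:
            continue
        counts[bool(exposed)][1] += 1
        counts[bool(exposed)][0] += int(bool(diagnosed))

    risks = {}
    for group, (cases, total) in counts.items():
        risks[group] = cases / total if total else float("nan")

    # The ratio is undefined when no unexposed patient has the outcome.
    ratio = risks[True] / risks[False] if risks[False] else float("nan")
    return risks[True], risks[False], ratio


# Synthetic rows for illustration only (NOT real patient data).
sample_rows = [
    {"Smokes": 1, "Dx:Cancer": 1},
    {"Smokes": 1, "Dx:Cancer": 0},
    {"Smokes": 0, "Dx:Cancer": 1},
    {"Smokes": 0, "Dx:Cancer": 0},
    {"Smokes": 0, "Dx:Cancer": 0},
    {"Smokes": 0, "Dx:Cancer": 0},
    {"Smokes": None, "Dx:Cancer": 0},  # missing exposure, skipped
]

exposed_risk, unexposed_risk, ratio = risk_ratio(sample_rows, "Smokes", "Dx:Cancer")
print(exposed_risk, unexposed_risk, ratio)  # 0.5 0.25 2.0
```

On the real data, the same function could be applied to any pair of 0/1 columns, for example `IUD` or `STDs` against one of the `Dx` columns.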

This data set is recommended for exploring data visualization techniques and implementing classification models.
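Before fitting a classification model, it can help to know how a trivial baseline scores, so that later results have a reference point. The sketch below is one possible sanity check under stated assumptions: the helper names `majority_baseline` and `evaluate` are hypothetical, the labels are synthetic, and treating a 0/1 column such as `Biopsy` as the target is an assumption, since this document does not name a target column.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common


def evaluate(y_true, y_pred):
    """Compute accuracy and recall for the positive class (label 1)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    return {
        "accuracy": correct / len(y_true),
        "recall": true_pos / positives if positives else float("nan"),
    }


# Synthetic 0/1 labels for illustration only (NOT taken from the data set).
train_labels = [0, 0, 0, 1, 0, 0, 1, 0]
test_labels = [0, 0, 1, 0, 0]

predict = majority_baseline(train_labels)
predictions = [predict(None) for _ in test_labels]
scores = evaluate(test_labels, predictions)
print(scores)  # {'accuracy': 0.8, 'recall': 0.0}
```

In this toy example the baseline reaches 80% accuracy while finding none of the positive cases. If positive labels turn out to be rare in the real data (worth checking against the actual label counts), recall is likely a more informative metric than accuracy.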

Data Dictionary

Column Number | Attribute | Attribute Description | Data Type
1 | Age | Current Age | Numeric
2 | Number of sexual partners | Number of Sexual Partners | Numeric
3 | First sexual intercourse | Age at First Sexual Intercourse | Numeric
4 | Num of pregnancies | Number of Pregnancies | Numeric
5 | Smokes | Is the Patient a Smoker? [0: No, 1: Yes] | Numeric/Bool
6 | Smokes (years) | Number of Years the Patient Has Been Smoking | Numeric
7 | Smokes (packs/year) | Number of Packs per Year | Numeric
8 | Hormonal Contraceptives | Does the Patient Use Hormonal Contraceptives? [0: No, 1: Yes] | Numeric/Bool
9 | Hormonal Contraceptives (years) | Number of Years Hormonal Contraceptives Have Been Used | Numeric
10 | IUD | Does the Patient Use an IUD? [0: No, 1: Yes] | Numeric/Bool
11 | IUD (years) | Number of Years an IUD Has Been Used | Numeric
12 | STDs | History of STDs? [0: No, 1: Yes] | Numeric/Bool
13 | STDs (number) | Number of STDs | Numeric
14 | STDs:condylomatosis | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
15 | STDs:cervical condylomatosis | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
16 | STDs:vaginal condylomatosis | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
17 | STDs:vulvo-perineal condylomatosis | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
18 | STDs:syphilis | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
19 | STDs:pelvic inflammatory disease | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
20 | STDs:genital herpes | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
21 | STDs:molluscum contagiosum | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
22 | STDs:AIDS | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
23 | STDs:HIV | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
24 | STDs:Hepatitis B | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
25 | STDs:HPV | Suffered from this specific STD? [0: No, 1: Yes] | Numeric/Bool
26 | STDs: Number of diagnosis | Number of STD Diagnoses | Numeric
27 | STDs: Time since first diagnosis | Years since First Diagnosis | Numeric
28 | STDs: Time since last diagnosis | Years since Last Diagnosis | Numeric
29 | Dx:Cancer | Cancer Diagnosis [0: No, 1: Yes] | Numeric/Bool
30 | Dx:CIN | Cervical Intraepithelial Neoplasia Diagnosis [0: No, 1: Yes] | Numeric/Bool
31 | Dx:HPV | Human Papilloma Virus Diagnosis [0: No, 1: Yes] | Numeric/Bool
32 | Dx | Other Diagnosis [0: No, 1: Yes] | Numeric/Bool
33 | Hinselmann | Hinselmann Test Recommended [0: No, 1: Yes] | Numeric/Bool
34 | Schiller | Schiller Test Recommended [0: No, 1: Yes] | Numeric/Bool
35 | Citology | Cytology Recommended [0: No, 1: Yes] | Numeric/Bool
36 | Biopsy | Biopsy Recommended [0: No, 1: Yes] | Numeric/Bool
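The data dictionary above can be turned into a small typed loader. The following is a minimal sketch rather than an official loading routine: the helper names `parse_value` and `load_rows` are hypothetical, the inline sample is synthetic, and the use of `?` as the missing-value marker is an assumption that should be checked against the actual CSV file.

```python
import csv
import io

# Columns the data dictionary marks as 0/1 flags (subset shown for brevity).
BOOL_COLUMNS = {"Smokes", "IUD", "STDs", "Dx:Cancer", "Biopsy"}

# Assumption: missing values are written as "?" in the CSV file.
MISSING = "?"


def parse_value(column, raw):
    """Convert one raw CSV cell to bool, float, or None (missing)."""
    raw = raw.strip()
    if raw == MISSING or raw == "":
        return None
    number = float(raw)  # flags may be stored as "1.0" / "0.0"
    if column in BOOL_COLUMNS:
        return number == 1.0
    return number


def load_rows(handle):
    """Read a CSV file object into a list of typed dictionaries."""
    reader = csv.DictReader(handle)
    return [
        {column: parse_value(column, value) for column, value in row.items()}
        for row in reader
    ]


# Synthetic two-row sample for illustration only (NOT real patient data).
SAMPLE = io.StringIO(
    "Age,Smokes,Smokes (years),IUD,Dx:Cancer,Biopsy\n"
    "34,1.0,12.5,0.0,0,0\n"
    "29,?,?,1.0,0,1\n"
)

rows = load_rows(SAMPLE)
print(rows[0]["Smokes"], rows[1]["Smokes"], rows[1]["Biopsy"])  # True None True
```

To read the real file, pass an open file object instead of `SAMPLE`, for example `open(path, newline="")`, and extend `BOOL_COLUMNS` to cover every 0/1 column listed in the dictionary.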


This data set has been sourced from the Machine Learning Repository of the University of California, Irvine: Risk Factors for Cervical Cancer Dataset (UC Irvine).
The UCI page mentions the following publication as the original source of the data set:
Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. 'Transfer Learning with Partial Observability Applied to Cervical Cancer Screening.' Iberian Conference on Pattern Recognition and Image Analysis. Springer International Publishing, 2017.