Data Science Dojo <br/>
Copyright (c) 2019 - 2020

---

**Level:** Intermediate <br/>
**Recommended Use:** Clustering/Classification Models<br/>
**Domain:** Web<br/> 

## Travel Reviews Data Set 

### Group similar travel reviews 


---
![](180.jpg)

---

This *intermediate* level data set has 980 rows and 11 columns.
It contains traveler reviews of destinations across East Asia, grouped into 10 categories. Each traveler rating is mapped to a number: Excellent (4), Very Good (3), Average (2), Poor (1), or Terrible (0). For each user, the average rating per category is recorded.
The data was collected by crawling TripAdvisor.com.
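
As a rough illustration of the rating scale described above, the sketch below maps rating labels to their numeric scores and averages them for one user and one category. The raw per-review labels are not part of this data set, so the `example` list and the helper name `average_rating` are hypothetical, and rounding to two decimals is an assumption based on the sample values in the data dictionary.

```python
from statistics import mean

# Rating scale as described in this README (label -> numeric score).
RATING_SCALE = {
    "Excellent": 4,
    "Very Good": 3,
    "Average": 2,
    "Poor": 1,
    "Terrible": 0,
}


def average_rating(labels):
    """Return the mean numeric score for one user's ratings in one category.

    Rounding to two decimals is an assumption, not a documented rule.
    """
    return round(mean(RATING_SCALE[label] for label in labels), 2)


# Hypothetical ratings given by a single user within a single category.
example = ["Excellent", "Average", "Poor"]
print(average_rating(example))  # (4 + 2 + 1) / 3 -> 2.33
```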

This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, **clustering**, and **classification modelling techniques**. 
Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques. The following data dictionary gives more details on this data set:

---

### Data Dictionary 

| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratio |
|-------------------	|----------------	|---------------------------------------------------	|--------------	|--------------------------------	|---------------	|
| 1                 	| User ID        	| Unique user   id                                  	| Qualitative  	| "User 4", "User 13", "User 19" 	| 0             	|
| 2                 	| Category 1     	| Average user   feedback on art galleries          	| Quantitative 	| 0.45, 0.74, 0.67               	| 0             	|
| 3                 	| Category 2     	| Average user   feedback on dance clubs            	| Quantitative 	| 1.8, 1.44, 1.36                	| 0             	|
| 4                 	| Category 3     	| Average user   feedback on juice bars             	| Quantitative 	| 0.29, 2.75, 1.36               	| 0             	|
| 5                 	| Category 4     	| Average user   feedback on restaurants            	| Quantitative 	| 0.57, 0.45, 0.38               	| 0             	|
| 6                 	| Category 5     	| Average user   feedback on museums                	| Quantitative 	| 0.46, 0.98, 0.82               	| 0             	|
| 7                 	| Category 6     	| Average user   feedback on resorts                	| Quantitative 	| 1.52, 1.74, 3.38               	| 0             	|
| 8                 	| Category 7     	| Average user   feedback on parks/picnic spots     	| Quantitative 	| 3.18, 3.2, 3.18                	| 0             	|
| 9                 	| Category 8     	| Average user   feedback on beaches                	| Quantitative 	| 2.96, 2.87, 2.86               	| 0             	|
| 10                	| Category 9     	| Average user   feedback on theaters               	| Quantitative 	| 1.57, 1.38, 1.79               	| 0             	|
| 11                	| Category 10    	| Average user   feedback on religious institutions 	| Quantitative 	| 2.86, 2.34, 2.8                	| 0             	|
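
One way to work with the columns listed above is to parse each row into a user ID plus ten ratings, and then group similar users. The sketch below is a minimal, dependency-free version of k-means (Lloyd's algorithm), not a recommended production approach; in practice a library implementation would likely be more convenient. The CSV header, the four sample rows, and the names `CATEGORY_NAMES`, `load_reviews`, and `kmeans` are all made up for illustration and are not taken from the actual file.

```python
import csv
import io
import random

# Descriptive names for Category 1..10, taken from the data dictionary.
CATEGORY_NAMES = [
    "art_galleries", "dance_clubs", "juice_bars", "restaurants", "museums",
    "resorts", "parks_picnic_spots", "beaches", "theaters",
    "religious_institutions",
]

# Made-up sample rows (NOT real records); the header layout is an assumption.
SAMPLE_CSV = """\
User ID,Category 1,Category 2,Category 3,Category 4,Category 5,Category 6,Category 7,Category 8,Category 9,Category 10
User A,0.4,1.8,0.3,0.5,0.5,1.5,3.2,2.9,1.6,2.8
User B,0.5,1.7,0.4,0.6,0.4,1.6,3.1,3.0,1.5,2.9
User C,3.5,0.2,2.8,3.0,3.4,0.3,0.5,0.4,3.0,0.2
User D,3.6,0.3,2.7,3.1,3.3,0.2,0.6,0.5,3.1,0.3
"""


def load_reviews(file_obj):
    """Parse CSV rows into (user_id, [10 floats]) pairs, skipping the header."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    rows = []
    for record in reader:
        user_id, values = record[0], [float(v) for v in record[1:]]
        if len(values) != len(CATEGORY_NAMES):
            raise ValueError(f"expected {len(CATEGORY_NAMES)} ratings for {user_id}")
        # Averages of scores on the 0-4 scale must themselves lie in 0-4.
        if not all(0.0 <= v <= 4.0 for v in values):
            raise ValueError(f"rating outside the 0-4 scale for {user_id}")
        rows.append((user_id, values))
    return rows


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rows = load_reviews(io.StringIO(SAMPLE_CSV))
labels = kmeans([values for _, values in rows], k=2)
for (user_id, _), label in zip(rows, labels):
    print(user_id, "-> cluster", label)
```

With the sample rows, users A and B end up in one cluster and users C and D in the other; which cluster gets index 0 depends on the random seed. To run this on the real file, pass an open file handle to `load_reviews` instead of the `io.StringIO` wrapper, and check that the header and column order match the assumptions above.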

---

### Acknowledgement

This data set was sourced from the Machine Learning Repository of the University of California, Irvine: [Travel Reviews Data Set (UC Irvine)](https://archive.ics.uci.edu/ml/datasets/Travel+Reviews).<br/> 
The UCI page mentions the following publication as the original source of the data set:<br/>
*Renjith, Shini, A. Sreekumar, and M. Jathavedan. 2018. Evaluation of Partitioning Clustering Algorithms for Processing Social Media Data in Tourism Domain. In 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 127-131. IEEE.*