Data Science Dojo
Copyright (c) 2019 - 2020


Level: Intermediate
Recommended Use: Clustering/Classification Models
Domain: Web

Travel Reviews Data Set

Group similar travel reviews


This intermediate-level data set has 980 rows and 11 columns. It contains traveler reviews of destinations across East Asia, grouped into 10 categories. Each traveler rating is mapped to a number: Excellent (4), Very Good (3), Average (2), Poor (1), and Terrible (0). For each user, the value stored under a category is the average of that user's ratings in that category. The data set was populated by crawling TripAdvisor.com.
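The rating scheme described above can be illustrated with a minimal sketch. The label-to-score mapping comes from the description; the function name, the rounding to two decimals, and the sample ratings are assumptions made for illustration, not the original collection code.

```python
# Label-to-score mapping as described for this data set.
RATING_SCALE = {
    "Excellent": 4,
    "Very Good": 3,
    "Average": 2,
    "Poor": 1,
    "Terrible": 0,
}


def average_rating(labels):
    """Return the mean numeric score for a list of rating labels.

    Rounding to two decimals is an assumption based on the precision
    of the example values in the data dictionary.
    """
    scores = [RATING_SCALE[label] for label in labels]
    return round(sum(scores) / len(scores), 2)


# Hypothetical ratings that one user gave to three beaches.
beach_ratings = ["Excellent", "Very Good", "Average"]
beach_average = average_rating(beach_ratings)
```

Under this reading, a category value close to 4 means the user rated most destinations in that category as Excellent, and a value close to 0 means mostly Terrible.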

This data set is recommended for learning and practicing exploratory data analysis, data visualization, and clustering and classification modelling techniques. Feel free to explore it with multiple supervised and unsupervised learning techniques. The data dictionary below gives more details on this data set.
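As one possible starting point for the unsupervised techniques mentioned above, here is a small k-means sketch in plain Python. It is an illustrative implementation, not the method used in the cited publication. The function name, parameters, and the two-dimensional sample points (imagined as beach and museum averages) are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of floats) into k groups with basic k-means."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster lost all its members.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: centroids no longer move.
        centroids = new_centroids
    return labels, centroids


# Hypothetical (beaches, museums) averages for six users.
sample_points = [
    (3.1, 0.4), (2.9, 0.5), (3.0, 0.6),  # users who favor beaches
    (0.5, 2.8), (0.6, 3.0), (0.4, 2.9),  # users who favor museums
]
labels, centroids = kmeans(sample_points, k=2)
```

On the real data, each point would be the 10 category averages of one user, and the number of clusters would need to be chosen by experiment.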


Data Dictionary

Column Position  Attribute Name  Definition                                       Data Type     Example                         % Null Ratios
1                User ID         Unique user id                                   Qualitative   "User 4", "User 13", "User 19"  0
2                Category 1      Average user feedback on art galleries           Quantitative  0.45, 0.74, 0.67                0
3                Category 2      Average user feedback on dance clubs             Quantitative  1.8, 1.44, 1.36                 0
4                Category 3      Average user feedback on juice bars              Quantitative  0.29, 2.75, 1.36                0
5                Category 4      Average user feedback on restaurants             Quantitative  0.57, 0.45, 0.38                0
6                Category 5      Average user feedback on museums                 Quantitative  0.46, 0.98, 0.82                0
7                Category 6      Average user feedback on resorts                 Quantitative  1.52, 1.74, 3.38                0
8                Category 7      Average user feedback on parks/picnic spots      Quantitative  3.18, 3.2, 3.18                 0
9                Category 8      Average user feedback on beaches                 Quantitative  2.96, 2.87, 2.86                0
10               Category 9      Average user feedback on theaters                Quantitative  1.57, 1.38, 1.79                0
11               Category 10     Average user feedback on religious institutions  Quantitative  2.86, 2.34, 2.8                 0
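Because the generic names "Category 1" through "Category 10" are hard to read during analysis, it can help to rename them on load. The sketch below assumes the data is stored as comma-separated text with the attribute names above as its header row. That file layout, the short names, and the function name are assumptions. The sample rows are assembled from the Example column and may not correspond to actual rows in the data set.

```python
import csv
import io

# Short readable names for the generic category columns (names are illustrative).
CATEGORY_NAMES = {
    "Category 1": "art_galleries",
    "Category 2": "dance_clubs",
    "Category 3": "juice_bars",
    "Category 4": "restaurants",
    "Category 5": "museums",
    "Category 6": "resorts",
    "Category 7": "parks_picnic_spots",
    "Category 8": "beaches",
    "Category 9": "theaters",
    "Category 10": "religious_institutions",
}

# Stand-in for the real file, built from the Example column of the data dictionary.
SAMPLE_CSV = """User ID,Category 1,Category 2,Category 3,Category 4,Category 5,Category 6,Category 7,Category 8,Category 9,Category 10
User 4,0.45,1.8,0.29,0.57,0.46,1.52,3.18,2.96,1.57,2.86
User 13,0.74,1.44,2.75,0.45,0.98,1.74,3.2,2.87,1.38,2.34
User 19,0.67,1.36,1.36,0.38,0.82,3.38,3.18,2.86,1.79,2.8
"""


def load_reviews(handle):
    """Read rows from a CSV file object and return dicts with readable keys."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"user_id": record["User ID"]}
        for column, name in CATEGORY_NAMES.items():
            row[name] = float(record[column])  # averages lie between 0 and 4
        rows.append(row)
    return rows


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
```

To read an actual file, the same function could be given an open file object in place of the `io.StringIO` wrapper.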

Acknowledgement

This data set has been sourced from the Travel Reviews Data Set in the Machine Learning Repository of the University of California, Irvine (UC Irvine).
The UCI page mentions the following publication as the original source of the data set:
Renjith, Shini, A. Sreekumar, and M. Jathavedan. 2018. Evaluation of Partitioning Clustering Algorithms for Processing Social Media Data in Tourism Domain. In 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 127-31. IEEE.