Data Science Dojo <br/>
Copyright (c) 2020

---

**Level:** Advanced <br/>
**Recommended Use:** Text Analytics<br/>
**Domain:** Marketing<br/>

# Amazon product reviews data

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

This dataset is likely a good fit for sentiment analysis tasks.


## Link to dataset
[Aggressively deduplicated data (18 GB)](http://snap.stanford.edu/data/amazon/productGraph/aggressive_dedup.json.gz)

This file contains no duplicates whatsoever (82.83 million reviews). It removes duplicates aggressively, dropping duplicate reviews even when they were written by different users. This accounts for users with multiple accounts and for plagiarized reviews.

The format is one review per line in (loose) JSON. See the sample review below for the fields in each record.
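Because the file is gzipped and holds one review per line, it can be streamed without loading all 18 GB into memory. The following is a minimal sketch in Python, not an official loader: the function names `parse_review_line` and `iter_reviews` are hypothetical, and the fallback to `ast.literal_eval` assumes that "loose" JSON means Python-style dict literals (for example, single-quoted strings).

```python
import ast
import gzip
import json


def parse_review_line(line):
    """Parse one line into a dict, tolerating "loose" JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Assumption: a line that is not strict JSON is a Python-style dict
        # literal. literal_eval parses literals only and runs no code.
        return ast.literal_eval(line)


def iter_reviews(path):
    """Yield one review dict per non-empty line of a gzipped file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield parse_review_line(line)
```

A typical use is to loop over `iter_reviews("aggressive_dedup.json.gz")` and stop after the first few thousand records while exploring, since the generator reads the file lazily.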

## Sample review
![Sample Amazon review record](amazon_reviews_example.PNG)

where

- reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
- asin - ID of the product, e.g. 0000013714
- reviewerName - name of the reviewer
- helpful - helpfulness rating of the review, e.g. 2/3
- reviewText - text of the review
- overall - rating of the product
- summary - summary of the review
- unixReviewTime - time of the review (unix time)
- reviewTime - time of the review (raw)
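
Two of the fields above usually need a small conversion before analysis. This Python sketch shows one way to do it. The helper names `helpful_ratio` and `review_datetime` are hypothetical, and the code assumes that `helpful` arrives either as a two-element list such as `[2, 3]` or as a string such as `"2/3"`, and that `unixReviewTime` is in seconds.

```python
from datetime import datetime, timezone


def helpful_ratio(helpful):
    """Return helpful votes / total votes, or None when there are no votes."""
    if isinstance(helpful, str):
        # Assumed string form, e.g. "2/3"
        up, total = (int(part) for part in helpful.split("/"))
    else:
        # Assumed list form, e.g. [2, 3]
        up, total = helpful
    return up / total if total else None


def review_datetime(unix_time):
    """Convert unixReviewTime (assumed to be seconds) to a UTC datetime."""
    return datetime.fromtimestamp(int(unix_time), tz=timezone.utc)
```

Returning `None` for reviews without votes keeps them distinguishable from reviews that were voted unhelpful, which matters if the ratio is later used as a feature or filter.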


## Acknowledgement

This dataset was sourced from [jmcauley.ucsd.edu/data/amazon/links.html](http://jmcauley.ucsd.edu/data/amazon/links.html)

### Use of this data requires citation

Please cite one or both of the following if you use the data in any way:

Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering <br/>
R. He, J. McAuley <br/>
WWW, 2016 <br/>
[pdf](http://cseweb.ucsd.edu/~jmcauley/pdfs/www16a.pdf)

Image-based recommendations on styles and substitutes <br/>
J. McAuley, C. Targett, J. Shi, A. van den Hengel <br/>
SIGIR, 2015 <br/>
[pdf](http://cseweb.ucsd.edu/~jmcauley/pdfs/sigir15.pdf)