Rebecca Merrett committed
# Amazon product reviews data

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

This dataset is probably preferable for sentiment-analysis tasks.


## Link to dataset
[Aggressively deduplicated data (18 GB)](http://snap.stanford.edu/data/amazon/productGraph/aggressive_dedup.json.gz)

This file contains no duplicates whatsoever (82.83 million reviews). It removes duplicates more aggressively, dropping duplicate reviews even when they were written by different users. This accounts for users with multiple accounts and for plagiarized reviews.

The format is one review per line in (loose) JSON. See the sample review below for further help reading the data.
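
As a minimal sketch of reading the one-review-per-line format, the snippet below streams a gzip-compressed file and parses each line. It assumes that "loose" JSON means some lines may use Python-style literals (for example, single-quoted strings) that a strict JSON parser rejects, so it falls back to `ast.literal_eval` when `json.loads` fails. The function names `parse_line` and `read_reviews` are hypothetical and not part of the dataset.

```python
import ast
import gzip
import json


def parse_line(line):
    """Parse one review line; fall back to Python-literal syntax if it is not strict JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Assumption: "loose" lines are valid Python dict literals.
        return ast.literal_eval(line)


def read_reviews(path):
    """Yield one review dict per non-empty line of a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield parse_line(line)
```

Because `read_reviews` is a generator, the 18 GB file can be processed one review at a time, for example `for review in read_reviews("aggressive_dedup.json.gz"): ...`, without loading it into memory.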

## Sample review
![Sample review](amazon_reviews_example.PNG)

where

- reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
- asin - ID of the product, e.g. 0000013714
- reviewerName - name of the reviewer
- helpful - helpfulness rating of the review, e.g. 2/3
- reviewText - text of the review
- overall - rating of the product
- summary - summary of the review
- unixReviewTime - time of the review (unix time)
- reviewTime - time of the review (raw)
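
The fields above can be turned into analysis-friendly values with a few lines of code. The sketch below assumes `helpful` is stored as a two-element list `[helpful_votes, total_votes]` (so the 2/3 example would appear as `[2, 3]`) and that `unixReviewTime` is in seconds since the epoch. The helper names and the `sample` record are hypothetical; only the two IDs are taken from the list above.

```python
from datetime import datetime, timezone


def helpful_ratio(review):
    """Return helpful votes divided by total votes, or None when there are no votes."""
    helpful_votes, total_votes = review["helpful"]
    if total_votes == 0:
        return None
    return helpful_votes / total_votes


def review_datetime(review):
    """Convert unixReviewTime (assumed seconds since the epoch) to a UTC datetime."""
    return datetime.fromtimestamp(review["unixReviewTime"], tz=timezone.utc)


# Hypothetical record for illustration; the rating, votes, and timestamp are made up.
sample = {
    "reviewerID": "A2SUAM1J3GNN3B",
    "asin": "0000013714",
    "helpful": [2, 3],
    "overall": 5.0,
    "unixReviewTime": 1252800000,
}

print(helpful_ratio(sample))
print(review_datetime(sample).date())
```

Returning `None` for reviews with no votes keeps them distinguishable from reviews that were voted on but never found helpful.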