Commit 73d47b3d by Rebecca Merrett

Update README.md

parent 20f823fe
-Amazon product reviews data
+# Amazon product reviews data
 This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.
@@ -7,24 +7,24 @@ This dataset includes reviews (ratings, text, helpfulness votes), product metada
 This dataset is probably preferable for sentiment analysis type tasks.
-Link to Dataset
+## Link to dataset <br/>
 aggressively deduplicated data (18gb)
 No duplicates whatsoever (82.83 million reviews). file removes duplicates more aggressively, removing duplicates even if they are written by different users. This accounts for users with multiple accounts or plagiarized reviews.
 Format is one-review-per-line in (loose) json. See examples below for further help reading the data.
-Sample review
+## Sample review <br/>
 IMAGE HERE
 where
-reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
-asin - ID of the product, e.g. 0000013714
-reviewerName - name of the reviewer
-helpful - helpfulness rating of the review, e.g. 2/3
-reviewText - text of the review
-overall - rating of the product
-summary - summary of the review
-unixReviewTime - time of the review (unix time)
-reviewTime - time of the review (raw)
\ No newline at end of file
+- reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
+- asin - ID of the product, e.g. 0000013714
+- reviewerName - name of the reviewer
+- helpful - helpfulness rating of the review, e.g. 2/3
+- reviewText - text of the review
+- overall - rating of the product
+- summary - summary of the review
+- unixReviewTime - time of the review (unix time)
+- reviewTime - time of the review (raw)
\ No newline at end of file
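The README describes the data as one review per line in "(loose) json" with the fields listed in the diff. As a rough illustration only, the sketch below shows one way such a file might be read in Python. It is not taken from the repository. The helper names (`parse_line`, `read_reviews`, `review_date`) and the file name in the usage note are hypothetical, and the fallback to `ast.literal_eval` assumes that "loose" means some lines use Python-literal syntax such as single-quoted strings.

```python
import ast
import gzip
import json
from datetime import datetime, timezone


def parse_line(line):
    """Parse one review record into a dict.

    Tries strict JSON first. Assumption: "loose" JSON means some lines may
    use Python-literal syntax (e.g. single quotes), so fall back to
    ast.literal_eval, which only evaluates literals and runs no code.
    """
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return ast.literal_eval(line)


def read_reviews(path):
    """Yield one dict per non-empty line of a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield parse_line(line)


def review_date(review):
    """Convert the unixReviewTime field (seconds since the epoch) to a UTC date."""
    return datetime.fromtimestamp(review["unixReviewTime"], tz=timezone.utc).date()
```

A caller could then iterate lazily, for example `for review in read_reviews("reviews.json.gz"): print(review["asin"], review["overall"])`, where the file name is a placeholder. Streaming line by line avoids loading the whole 18 GB file into memory.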