Commit aafd8550 by Rebecca Merrett


parent 8e9e28a5
@@ -7,8 +7,8 @@ This dataset includes reviews (ratings, text, helpfulness votes), product metada
This dataset is probably preferable for sentiment analysis type tasks.
-## Link to dataset <br/>
-aggressively deduplicated data (18gb)
+## Link to dataset
+[aggressively deduplicated data (18gb)] (
No duplicates whatsoever (82.83 million reviews). This file removes duplicates more aggressively, removing duplicates even if they are written by different users. This accounts for users with multiple accounts or plagiarized reviews.