Commit 20f823fe by Rebecca Merrett

Update Amazon product reviews dataset

parent 805534ad
## Link to Dataset
Aggressively deduplicated data (18 GB)

No duplicates whatsoever (82.83 million reviews). This file removes duplicates more aggressively, discarding a review as a duplicate even when the copies were written by different users. This accounts for users with multiple accounts and for plagiarized reviews.
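
The difference between ordinary and aggressive deduplication can be illustrated with a minimal sketch. It assumes each review is a dict with hypothetical `user` and `text` keys, and treats two reviews as duplicates when their whitespace- and case-normalized text matches. The exact matching rule used to build this file is not stated here, so that key is an assumption. The per-user version only drops repeats by the same account; the aggressive version ignores the author entirely.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Review = Dict[str, str]  # hypothetical record shape: {"user": ..., "text": ...}


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies match."""
    return " ".join(text.split()).lower()


def dedup(reviews: Iterable[Review], key: Callable[[Review], Hashable]) -> List[Review]:
    """Keep the first review for each key and drop later ones."""
    seen = set()
    kept = []
    for review in reviews:
        k = key(review)
        if k not in seen:
            seen.add(k)
            kept.append(review)
    return kept


def dedup_per_user(reviews: Iterable[Review]) -> List[Review]:
    # Duplicate only if the SAME user already posted the same text.
    return dedup(reviews, lambda r: (r["user"], normalize(r["text"])))


def dedup_aggressive(reviews: Iterable[Review]) -> List[Review]:
    # Duplicate if ANY user already posted the same text, which catches
    # one person using several accounts as well as copied reviews.
    return dedup(reviews, lambda r: normalize(r["text"]))


reviews = [
    {"user": "A", "text": "Great blender, works well."},
    {"user": "A", "text": "Great blender,  works well."},  # same user, same text
    {"user": "B", "text": "great blender, works well."},   # other account, copied text
    {"user": "C", "text": "Broke after a week."},
]

print(len(dedup_per_user(reviews)))    # 3
print(len(dedup_aggressive(reviews)))  # 2
```
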
The format is one review per line in (loose) JSON. See the examples below for further help reading the data.
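
A minimal reading sketch, assuming the file is gzip-compressed (the name `reviews.json.gz` is hypothetical) and that "loose" means some lines are Python-style dict literals (for example single-quoted strings) rather than strict JSON. Each line is tried with `json.loads` first and falls back to `ast.literal_eval`, which parses Python literals without executing arbitrary code the way `eval` would.

```python
import ast
import gzip
import json
from typing import Any, Dict, Iterable, Iterator


def parse_line(line: str) -> Dict[str, Any]:
    """Parse one review record, accepting strict JSON or a Python dict literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Loose records such as {'overall': 5.0} (single quotes, True/False)
        # are valid Python literals but not valid JSON.
        return ast.literal_eval(line)


def parse_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line (one review per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield parse_line(line)


def read_reviews(path: str) -> Iterator[Dict[str, Any]]:
    """Stream reviews from a plain or gzip-compressed file, one at a time."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        yield from parse_lines(f)
```

Iterating lazily, for example `for review in read_reviews("reviews.json.gz"): ...`, keeps memory use flat instead of loading all 18 GB at once.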