diff --git a/Amazon Product Reviews/Amazon product reviews dataset b/Amazon Product Reviews/Amazon product reviews dataset
index e69de29..8b161bf 100644
--- a/Amazon Product Reviews/Amazon product reviews dataset
+++ b/Amazon Product Reviews/Amazon product reviews dataset
@@ -0,0 +1,6 @@
+## Link to Dataset
+Aggressively deduplicated data (18 GB)
+
+No duplicates whatsoever (82.83 million reviews). This file removes duplicates more aggressively, treating reviews as duplicates even when they were written by different users. This accounts for users with multiple accounts and for plagiarized reviews.
+
+Format is one review per line in (loose) JSON. See the examples below for further help reading the data.
\ No newline at end of file
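The note says each line holds one review in "loose" JSON. Below is a minimal Python reader sketch. It assumes that "loose" means a line may be either strict JSON or a Python dict literal (for example, single-quoted strings), and that the download may or may not be gzip-compressed. The function name `parse_loose_json_lines` is hypothetical and not part of the dataset release.

```python
import ast
import gzip
import json
from typing import Iterator


def parse_loose_json_lines(path: str) -> Iterator[dict]:
    """Yield one review dict per non-empty line of a (possibly gzipped) file.

    Assumption: each line is either strict JSON or a Python dict literal,
    which is how "loose" JSON is interpreted in this sketch.
    """
    # Pick the opener by extension; both accept text mode plus an encoding.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Fall back to safely parsing a Python literal (never eval()).
                record = ast.literal_eval(line)
            yield record
```

Because it is a generator, something like `sum(1 for _ in parse_loose_json_lines("reviews.json.gz"))` (hypothetical file name) can count reviews without loading the full 18 GB into memory. `ast.literal_eval` is used instead of `eval` so that a malformed or malicious line cannot execute code.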