From 20f823fe7862c875f60a75ecc4f9825760942a50 Mon Sep 17 00:00:00 2001
From: Rebecca Merrett
Date: Mon, 10 Feb 2020 19:03:47 +0000
Subject: [PATCH] Update Amazon product reviews dataset

---
 Amazon Product Reviews/Amazon product reviews dataset | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Amazon Product Reviews/Amazon product reviews dataset b/Amazon Product Reviews/Amazon product reviews dataset
index e69de29..8b161bf 100644
--- a/Amazon Product Reviews/Amazon product reviews dataset
+++ b/Amazon Product Reviews/Amazon product reviews dataset
@@ -0,0 +1,6 @@
+## Link to Dataset
+Aggressively deduplicated data (18 GB)
+
+No duplicates whatsoever (82.83 million reviews). This file removes duplicates more aggressively, dropping them even when they were written by different users. This accounts for users with multiple accounts and for plagiarized reviews.
+
+Format is one-review-per-line in (loose) JSON. See examples below for further help reading the data.
\ No newline at end of file
-- 
libgit2 0.26.0
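
Note on reading the file: the patch describes the format as one review per line in "loose" JSON but does not define "loose". The sketch below assumes it means that some lines may be Python-style dict literals (for example, single-quoted keys) rather than strict JSON, and that the file may or may not be gzip-compressed. Both are assumptions, and the function names `parse_review_line` and `iter_reviews` are hypothetical helpers, not part of the dataset's tooling.

```python
import ast
import gzip
import json
from typing import Any, Dict, Iterator


def parse_review_line(line: str) -> Dict[str, Any]:
    """Parse one review: try strict JSON first, then a Python-literal fallback."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Assumption: "loose" lines are Python dict literals (e.g. single quotes).
        # ast.literal_eval only evaluates literals, so it is safer than eval().
        return ast.literal_eval(line)


def iter_reviews(path: str) -> Iterator[Dict[str, Any]]:
    """Yield reviews one at a time so the large file is never fully in memory."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield parse_review_line(line)
```

Streaming line by line matters at this size: with roughly 18 GB of reviews, loading the whole file at once is unlikely to fit in memory on most machines. A caller would iterate with something like `for review in iter_reviews("reviews.json.gz"): ...`, where the path is a placeholder.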