From aafd8550bc13938a22e24970899446a26cd7d0d7 Mon Sep 17 00:00:00 2001
From: Rebecca Merrett
Date: Mon, 10 Feb 2020 19:14:25 +0000
Subject: [PATCH] Update README.md

---
 Amazon Product Reviews/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Amazon Product Reviews/README.md b/Amazon Product Reviews/README.md
index ca645d2..c99b532 100644
--- a/Amazon Product Reviews/README.md
+++ b/Amazon Product Reviews/README.md
@@ -7,8 +7,8 @@ This dataset includes reviews (ratings, text, helpfulness votes), product metada
 
 This dataset is probably preferable for sentiment analysis type tasks.
 
-## Link to dataset
-aggressively deduplicated data (18gb)
+## Link to dataset
+[aggressively deduplicated data (18gb)] (http://snap.stanford.edu/data/amazon/productGraph/aggressive_dedup.json.gz)
 
 No duplicates whatsoever (82.83 million reviews).
 file removes duplicates more aggressively, removing duplicates even if they are written by different users. This accounts for users with multiple accounts or plagiarized reviews.
-- 
libgit2 0.26.0