This project demonstrates the use of Social Network Analysis in building a recommendation system using Amazon's book co-purchase data. The methodology is adaptable to other product categories such as DVDs and Music CDs.
Source: Amazon product co-purchasing network metadata
Content:
- Product ID: Numeric (0 to 548,551).
- ASIN: Amazon Standard Identification Number (10-character alphanumeric).
- Title: Product name.
- Group: Category (e.g., books, music CDs, DVDs, VHS).
- SalesRank: Sales ranking within its category.
- Similar Products: ASINs of co-purchased items.
- Categories: Product category hierarchy.
- Reviews: Number, average rating, and details of individual reviews.
- Collection Period: Summer of 2006.
Processed fields include:
- ID, ASIN, Title, SalesRank, TotalReviews, AvgRating: Extracted directly.
- Categories: Concatenated, converted to lowercase, stemmed and filtered (removing punctuation, digits, stop words) to retain unique words.
- Copurchased: Extracted from the "Similar Products" field, filtered to include only ASINs with associated metadata.
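The category cleaning described above could look roughly like the following. This is a minimal sketch using only the standard library: the stop-word list, the `naive_stem` helper, and the `clean_categories` name are all hypothetical, and the crude suffix stripper merely stands in for whatever stemmer the project actually uses (for example a Porter stemmer).

```python
import re
import string

# Assumption: a tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "by", "on"}


def naive_stem(word):
    """Very rough suffix stripper, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_categories(raw_categories):
    """Concatenate category strings and reduce them to a set of unique stemmed words."""
    text = " ".join(raw_categories).lower()
    # Replace punctuation and digits with spaces so tokens split cleanly.
    text = re.sub(r"[%s\d]" % re.escape(string.punctuation), " ", text)
    stems = (naive_stem(w) for w in text.split() if w not in STOP_WORDS)
    return {s for s in stems if s not in STOP_WORDS}


if __name__ == "__main__":
    sample = ["Books[283155]|Subjects[1000]|Religion & Spirituality[22]"]
    print(sorted(clean_categories(sample)))
```

Returning a set keeps only unique words, which makes the later category comparison between two products a simple set operation.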
- Nodes: Represent ASINs.
- Edges: Link co-purchased ASINs.
- Edge weight: Category similarity between the two connected products, where 0 ≤ Similarity ≤ 1 (0 is the least similar, 1 the most similar).
- DegreeCentrality: Number of connections each node has.
- ClusteringCoeff: Fraction of pairs of a node's neighbors that are themselves connected, indicating how tightly nodes cluster together.
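The graph structure and node measures listed above can be sketched with plain dictionaries. Note the assumptions: the exact similarity formula is not given here, so Jaccard similarity over the cleaned category word sets is used as a stand-in that happens to fall in the 0 to 1 range; the function names and the `products` layout (`Categories` as a set of words, `Copurchased` as a list of ASINs) are hypothetical. A graph library such as NetworkX provides equivalent degree and clustering routines.

```python
from itertools import combinations


def category_similarity(words_a, words_b):
    """Jaccard similarity (an assumed stand-in): shared words / all distinct words."""
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def build_graph(products):
    """Return {asin: {neighbor_asin: weight}} for an undirected co-purchase graph."""
    graph = {asin: {} for asin in products}
    for asin, info in products.items():
        for other in info["Copurchased"]:
            # Skip ASINs without metadata and self-references.
            if other in products and other != asin:
                weight = category_similarity(
                    info["Categories"], products[other]["Categories"]
                )
                graph[asin][other] = weight
                graph[other][asin] = weight
    return graph


def degree(graph, node):
    """Number of connections the node has."""
    return len(graph[node])


def clustering_coeff(graph, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))
```

Storing each edge in both directions treats co-purchase links as undirected, which is an assumption of this sketch rather than something the dataset guarantees.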
The analysis uses ego networks. An ego network consists of a focal node (the “ego”) and the nodes that share an edge with it (the “alters”). Each alter is in turn the ego of its own ego network, and the full social network is formed by these interlocking ego networks.
- Initial Step: Select a product ASIN (e.g., 0875421210) and retrieve its metadata.
- Ego Network Creation: Construct a degree-1 ego network based on co-purchased books.
- Graph Trimming: Apply the island method with a threshold ≥ 0.5 to narrow down similar books.
- Recommendations: Select the top five similar books based on "AvgRating" and "TotalReviews".
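The steps above can be sketched as follows, assuming the graph is stored as a `{asin: {neighbor_asin: weight}}` dictionary. Everything named here is hypothetical: the function names, the choice to drop nodes left without edges after trimming, and the ranking order (highest `AvgRating` first, ties broken by `TotalReviews`) are assumptions rather than the project's confirmed behavior.

```python
def ego_network(graph, ego):
    """Degree-1 ego network: the ego, its alters, and the edges among them."""
    nodes = {ego} | set(graph[ego])
    return {
        n: {m: w for m, w in graph[n].items() if m in nodes}
        for n in nodes
    }


def trim_islands(graph, threshold=0.5):
    """Island method: keep edges with weight >= threshold, drop isolated nodes."""
    kept = {
        n: {m: w for m, w in nbrs.items() if w >= threshold}
        for n, nbrs in graph.items()
    }
    return {n: nbrs for n, nbrs in kept.items() if nbrs}


def recommend(graph, products, ego, threshold=0.5, top_n=5):
    """Return up to top_n ASINs from the trimmed ego network, best rated first."""
    trimmed = trim_islands(ego_network(graph, ego), threshold)
    candidates = [n for n in trimmed if n != ego]
    # Assumed ranking: average rating first, then number of reviews.
    candidates.sort(
        key=lambda n: (products[n]["AvgRating"], products[n]["TotalReviews"]),
        reverse=True,
    )
    return candidates[:top_n]
```

Raising the threshold shrinks the candidate pool to books whose categories overlap more strongly with their neighbors, so it trades recall for closer topical matches.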