+ This notebook illustrates the final course project for the SMM638 course. The project is based on the analysis of a dataset regarding the friendship network and genre preferences of Deezer users. The first section of the notebook provides an overview of the project, the second section describes the data, the third section outlines the problem, and the fourth section describes the submission package.
1 Overview
+
Like other streaming platforms, Deezer holds a wealth of digital traces that can be used to analyze user behavior and, in turn, to create or refine products and improve business model execution (e.g., by adopting a recommendation system that helps a platform business better engage with audiences).
+
Network analysis methods and tools play a key role in analyzing digital traces like the ones in the Deezer dataset. In particular, network analysis offers an effective framework for assessing the similarity between entities (users or the genres they favor) and, possibly, for clustering these entities into homogeneous groups (e.g., users who share similar music genres, or genres that are liked by the same users). Let us consider a two-mode (or bipartite) network \(X\), where \(N\) users are connected to \(K\) genres via the ‘like’ relationship:

\[
X =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1K} \\
a_{21} & a_{22} & \cdots & a_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{NK}
\end{bmatrix}
\]

where \(a_{ij}\) records the ‘like’ relationship between user \(i\) and genre \(j\) (1 if user \(i\) likes genre \(j\), 0 otherwise). The matrix \(X\) can be used to create a user-user network \(Y\) (\(N \times N\)) and a genre-genre network \(Z\) (\(K \times K\)):

\[
Y = X \cdot X^T
\]

\[
Z = X^T \cdot X
\]

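The user-user network \(Y\) is a one-mode, undirected, weighted graph in which nodes are users and edge weights are mutual likes, i.e., the number of music genres that users \(i\) and \(j\) share. The genre-genre network \(Z\) is a one-mode, undirected, weighted graph in which nodes are genres and edge weights are mutual likers, i.e., the number of users who like both genres \(i\) and \(j\). Consider the following example of a ‘like’ network with five users and three music genres:

\[
X =
\begin{bmatrix}
1 & 0 & 1 \\
1 & 0 & 0 \\
0 & 0 & 1 \\
0 & 1 & 1 \\
0 & 1 & 1 \\
\end{bmatrix}
\]

The user-user network \(Y\) (\(X \cdot X^T\)) is:

\[
Y =
\begin{bmatrix}
2 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 1 \\
1 & 0 & 1 & 2 & 2 \\
1 & 0 & 1 & 2 & 2 \\
\end{bmatrix}
\]

whereas the genre-genre network \(Z\) (\(X^T \cdot X\)) is:

\[
Z =
\begin{bmatrix}
2 & 0 & 1 \\
0 & 2 & 2 \\
1 & 2 & 4 \\
\end{bmatrix}
\]

Both \(Y\) and \(Z\) can be further analyzed using network analysis tools (e.g., block-modeling) or conventional statistical tools (e.g., cluster analysis) to identify homogeneous groups of entities (users for \(Y\) and genres for \(Z\)).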
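
The two projections can be checked numerically. The snippet below is a minimal sketch that uses NumPy (an assumption; any matrix library would do) on a five-user, three-genre ‘like’ matrix of the kind discussed in this section. The variable names are illustrative.

```python
import numpy as np

# Toy two-mode 'like' matrix: 5 users (rows) x 3 genres (columns)
X = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
])

# One-mode projections
Y = X @ X.T  # user-user: number of genres two users share
Z = X.T @ X  # genre-genre: number of users who like both genres

print(Y)
print(Z)
# Z is:
# [[2 0 1]
#  [0 2 2]
#  [1 2 4]]
```

The diagonal of \(Y\) counts the genres each user likes and the diagonal of \(Z\) counts the likers of each genre; these entries are typically ignored or set to zero before \(Y\) or \(Z\) is treated as an adjacency matrix.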
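2 Data

The data for the final course project are stored in the data/deezer_clean_data directory of the GitHub repository of SMM638. The data, which were gathered for a network science project (see footnote 1), are also available on the website of the Stanford Network Analysis Project.

Below are some key aspects of the data:

The data were scraped from Deezer in November 2017.

The **_edges.csv files represent the friendship networks of users from three European countries: Croatia, Hungary, and Romania. Nodes represent users and edges represent mutual friendships (see footnote 2).

The **_genres.json files contain the genre preferences of users: each key is a user identifier, and the genres that the user likes are given as a list. Genre notations are consistent across users. In each dataset, users could like 84 distinct genres. The lists of liked genres were compiled from the lists of liked songs.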
+
+
+
2.1 Friendship networks
+
For illustrative purposes, let us inspect the friendship network for the case of Croatia. First, we load Pandas and NetworkX, then we load the data:
+
+
+Code
+
# load modules
import pandas as pd
import networkx as nx
# load data
fr = pd.read_csv('../data/deezer_clean_data/HR_edges.csv')
# data preview
fr.head()

|   | node_1 | node_2 |
|---|--------|--------|
| 0 | 0      | 4076   |
| 1 | 0      | 29861  |
| 2 | 0      | 53717  |
| 3 | 0      | 23820  |
| 4 | 0      | 39945  |

The data preview shows that the friendship network for Croatia is stored as an edge list, where each edge is a pair of user identifiers. The edge list can be used to create a network object with NetworkX:

Code

fr_g = nx.from_pandas_edgelist(fr, source='node_1', target='node_2')
fr_g?

Using IPython introspection (fr_g?), we can see that fr_g is a NetworkX object of type Graph with 54,573 nodes and 498,202 edges. To get familiar with the data, we test whether fr_g is connected:
+
+
+Code
+
nx.is_connected(fr_g)
+
+
+
True
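
Outside IPython, where the `?` introspection syntax is not available, the same checks can be made with explicit method calls. The following is a minimal sketch on a hypothetical toy edge list that reuses the column names of HR_edges.csv; the toy data and variable names are assumptions for illustration only.

```python
import pandas as pd
import networkx as nx

# Hypothetical toy edge list with the same column names as HR_edges.csv
toy = pd.DataFrame({"node_1": [0, 0, 1, 3], "node_2": [1, 2, 2, 4]})
toy_g = nx.from_pandas_edgelist(toy, source="node_1", target="node_2")

# Basic size of the graph
print(toy_g.number_of_nodes(), toy_g.number_of_edges())  # 5 4

# The toy graph is not connected: {0, 1, 2} and {3, 4} are separate
print(nx.is_connected(toy_g))  # False

# Size of each connected component, largest first
sizes = sorted((len(c) for c in nx.connected_components(toy_g)), reverse=True)
print(sizes)  # [3, 2]
```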
+
+
+
Then, we consider the degree distribution of the network:
+
+
+Code
+
# import further modules
import numpy as np
from matplotlib import pyplot as plt
from collections import Counter
# count how many nodes have each degree value
dd = Counter(dict(fr_g.degree()).values())
# plot the degree distribution on log-log axes
fig = plt.figure(figsize=(4, 3))
ax = fig.add_subplot(111)
ax.scatter(list(dd.keys()), list(dd.values()), color="limegreen", alpha=0.15)
ax.set_yscale("log")
ax.set_xscale("log")
ax.set_xlabel("Degree (log scale)")
ax.set_ylabel("Number of nodes (log scale)")
ax.grid(True, ls="--")
plt.show()
+
+
+
+
+
+
+
+
The plot shows that the degree distribution of the friendship network for Croatia is right-skewed, which is a common feature of social networks. We can try to get a better understanding of the network, including the presence and location of ‘hub’ users, by visualizing it. Since the network is large, we may benefit from the visualization capabilities of graph-tool, a Python API that wraps C++ code and is a more efficient alternative to the pure-Python NetworkX:
+
+
+Code
+
# import further module
from graph_tool.all import *
# build the edge list from the Pandas DataFrame and create the graph from it
edges = [(str(u), str(v)) for u, v in fr[['node_1', 'node_2']].values]
fer_gt = Graph(edges, hashed=True, directed=False)
# plot the network (uncomment to regenerate fer_gt.png)
# graph_draw(fer_gt, output_size=(500, 500), output="fer_gt.png")
# load the previously saved image
from IPython.display import Image
Image(filename='fer_gt.png')
+
+
+
+
+
+
+
+
The friendship network shows a periphery of low-degree users and, plausibly, a core of high-degree users. However, the figure does not provide a clear picture of the core of the network, which deserves further investigation.
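
One possible way to look into a core-periphery pattern is a k-core decomposition. The sketch below is an assumption about how such an investigation could start, not part of the original analysis; it uses NetworkX's `core_number` and `k_core` functions on a hypothetical toy graph.

```python
import networkx as nx

# Hypothetical toy graph: a 4-clique (the 'core') plus three pendant nodes
toy = nx.complete_graph(4)  # nodes 0..3, all mutually connected
toy.add_edges_from([(0, 4), (1, 5), (2, 6)])

# Core number: the largest k such that the node belongs to the k-core
core = nx.core_number(toy)
print(core)

# Subgraph induced by the innermost core (k defaults to the largest core number)
inner = nx.k_core(toy)
print(sorted(inner.nodes))  # [0, 1, 2, 3]
```

On the Croatian network, the distribution of core numbers would indicate how many users sit in densely connected inner layers as opposed to the periphery.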
+
+
+
2.2 Music genre preferences
+
Building on the previous sub-section, we consider the genre preferences of users stored in HR_genres.json. The file is in JSON format and can be loaded with the json module:

Code

import json
with open('../data/deezer_clean_data/HR_genres.json', 'r') as f:
    pr_json = json.load(f)
pr_json["11542"]

At this stage, we have a dictionary in which each key is a user identifier and the corresponding value is the list of genres that the user likes. For example, the last line of the snippet returns the list of music genres that user 11542 likes. We can convert the dictionary into a Pandas DataFrame with Pandas’ json_normalize function:

Code

pr = pd.json_normalize(pr_json).T
pr.rename({0: 'genres'}, axis=1, inplace=True)
pr.head()

The data preview shows that the DataFrame pr has a single column, genres, which contains the lists of genres that users like. To make the data more amenable to analysis, we can expand the lists of genres into separate rows with Pandas’ explode function:

Code

pr = pr.explode('genres')
pr.reset_index(inplace=True)
pr.rename({'index': 'user_id'}, axis=1, inplace=True)
pr.head()

For illustrative purposes, we can consider the distribution of genres liked by users in the dataset:
+
+
+Code
+
# number of likers per genre (one value per genre)
genre_sizes = pr.groupby('genres').size()
fig = plt.figure(figsize=(6, 3))
ax = fig.add_subplot(111)
# histogram of the number of likers across genres
ax.hist(genre_sizes, color="magenta", alpha=0.5)
# show tick labels with thousands separators
ax.xaxis.set_major_formatter("{x:,.0f}")
ax.set_xlabel("Degree -- number of likers")
ax.set_ylabel("Counts of music genres")
ax.grid(True, ls="--")
plt.show()
+
+
+
+
+
+
+
+
The degree distribution of music genres liked by users is right-skewed, meaning there are few genres that are liked by many users and many genres that are liked by few users.
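
The long-format table of (user, genre) pairs can be turned into the two-mode matrix \(X\) described in the Overview. The snippet below is a minimal sketch on hypothetical toy data with the same column names as the exploded DataFrame; `pd.crosstab` is one of several ways to build the matrix, and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (user, genre) pair
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3"],
    "genres": ["Pop", "Rock", "Pop", "Rock", "Jazz"],
})

# Two-mode incidence matrix X: users in rows, genres in columns (0/1 entries)
X = pd.crosstab(toy["user_id"], toy["genres"]).clip(upper=1)

# Genre-genre projection Z = X^T X: number of users who like both genres
Z = X.T @ X
print(Z)
```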
+
+
+
+
3 Problem description
+
Your employer is a consultancy firm that has been hired by ‘Sonic,’ a major music label, to better understand the categories that make up the music market. The artists and repertoire (A&R) staff at Sonic have struggled to make sense of the associations among the ‘music genre’ tags (e.g., ‘International Pop’) used by Deezer and similar services. Some think that these tags are sometimes redundant; in other circumstances, the tags are unclear or meaningless. It is therefore hard for Sonic to see clear targets in the market and, consequently, to position new albums and musicians correctly against consumer preferences. Sonic wants a map of the categories that form the music market. This map must be based on digital traces (i.e., behavioral data), easy to interpret, and well-grounded in demonstrable data patterns.
+
To help Sonic address its business problem, you have been asked to analyze the ‘Croatia’ dataset, briefly described in the previous section. You are expected to use network analytic methods and tools to:
+

1. Assess the similarity between Deezer music genres
2. Identify homogeneous groups of Deezer music genres
3. Highlight how social ties among users influence the similarity between music genres

+
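
As a starting point for the first task, raw co-like counts can be normalized so that popular genres do not dominate. The sketch below shows one of several possible choices, the Jaccard similarity between the sets of likers of two genres, computed on a toy ‘like’ matrix. The choice of measure and the variable names are assumptions, not a prescribed solution.

```python
import numpy as np

# Toy two-mode 'like' matrix: 5 users (rows) x 3 genres (columns)
X = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
])

Z = X.T @ X          # co-like counts between genres
likers = np.diag(Z)  # number of likers of each genre

# Jaccard similarity: |likers(i) AND likers(j)| / |likers(i) OR likers(j)|
union = likers[:, None] + likers[None, :] - Z
J = Z / union
print(np.round(J, 2))
# Off-diagonal values: J[0, 1] = 0.0, J[0, 2] = 0.2, J[1, 2] = 0.5
```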
+
+
4 Submission package
+
The submission package consists of:
+
+
- A report that includes:
  - The description of the workflow you have followed to address the problem (300 words MAX)
  - The results of your analysis, comprising text (300 words MAX) and exhibits (five among figures and tables MAX)
  - The interpretation of the results in light of the business problem (300 words MAX)
- The computer code that allows me to fully reproduce your charts (in any language, e.g., R, Python, Julia, Rust, C++, or Java). The code should be well-commented and easy to read. Non-reproducible exhibits will not be graded.
+
+
+
+
+
+
+
Footnotes

1. Benedek Rozemberczki, Ryan Davies, Rik Sarkar, and Charles Sutton. 2018. GEMSEC: Graph Embedding with Self Clustering. arXiv preprint arXiv:1802.03997.
2. The researchers who collected the data say that they have “… reindexed the nodes in order to achieve a certain level of anonymity.”
+
+
+
+
Source Code
+
---
title: SMM638 Final Course Project Description
author: Simone Santoni
date: last-modified
abstract-title: Synopsis
abstract: This notebook illustrates the final course project for the SMM638 course. The project is based on the analysis of a dataset regarding the friendship network and genre preferences of Deezer users. The first section of the notebook provides an overview of the project, the second section describes the data, the third section outlines the problem, and the fourth section describes the submission package.
warning: false
fig-cap-location: top
format:
  html:
    code-fold: true
    code-tools: true
    toc: true
    toc-title: Table of Contents
    toc-depth: 2
    toc-location: right
    number-sections: true
    citations-hover: false
    footnotes-hover: false
    crossrefs-hover: false
    theme: journal
    fig-width: 9
    fig-height: 6
  ipynb: default
  docx: default
  typst:
    number-sections: true
    df-print: paged
  #pdf:
  #  documentclass: scrartcl
  #  papersize: letter
---
+
+# Overview
+
Like other streaming platforms, Deezer holds a wealth of digital traces that can be used to analyze user behavior and, in turn, to create or refine products and improve business model execution (e.g., by adopting a recommendation system that helps a platform business better engage with audiences).

Network analysis methods and tools play a key role in analyzing digital traces like the ones in the Deezer dataset. In particular, network analysis offers an effective framework for assessing the similarity between entities (users or the genres they favor) and, possibly, for clustering these entities into homogeneous groups (e.g., users who share similar music genres, or genres that are liked by the same users). Let us consider a two-mode (or bipartite) network $X$, where $N$ users are connected to $K$ genres via the 'like' relationship:

$$
X =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1K} \\
a_{21} & a_{22} & \cdots & a_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{NK}
\end{bmatrix}
$$

where $a_{ij}$ records the 'like' relationship between user $i$ and genre $j$ (1 if user $i$ likes genre $j$, 0 otherwise). The matrix $X$ can be used to create a user-user network $Y$ ($N \times N$) and a genre-genre network $Z$ ($K \times K$):
+
+$$
+Y = X \cdot X^T
+$$
+
+$$
+Z = X^T \cdot X
+$$
+
The user-user network $Y$ is a one-mode, undirected, weighted graph in which nodes are users and edge weights are mutual likes, i.e., the number of music genres that users $i$ and $j$ share. The genre-genre network $Z$ is a one-mode, undirected, weighted graph in which nodes are genres and edge weights are mutual likers, i.e., the number of users who like both genres $i$ and $j$. Consider the following example of a 'like' network with five users and three music genres:
+
+$$
+X =
+\begin{bmatrix}
+1 & 0 & 1 \\
+1 & 0 & 0 \\
+0 & 0 & 1 \\
+0 & 1 & 1 \\
+0 & 1 & 1 \\
+\end{bmatrix}
+$$
+
+The user-user network $Y$ ($X \cdot X^T$) is:
+
+$$
+Y =
+\begin{bmatrix}
+2 & 1 & 1 & 1 & 1\\
+1 & 1 & 0 & 0 & 0\\
+1 & 0 & 1 & 1 & 1\\
+1 & 0 & 1 & 2 & 2\\
+1 & 0 & 1 & 2 & 2\\
+\end{bmatrix}
+$$
+
whereas the genre-genre network $Z$ ($X^T \cdot X$) is:

$$
Z =
\begin{bmatrix}
2 & 0 & 1 \\
0 & 2 & 2 \\
1 & 2 & 4 \\
\end{bmatrix}
$$
+
Both $Y$ and $Z$ can be further analyzed using network analysis tools (e.g., block-modeling) or conventional statistical tools (e.g., cluster analysis) to identify homogeneous groups of entities (users for $Y$ and genres for $Z$).
+
+# Data
+
The data for the final course project are stored in the [`data/deezer_clean_data`](https://github.com/simoneSantoni/net-analysis-smm638/tree/master/data/deezer_clean_data) directory of the [GitHub repository of SMM638](https://github.com/simoneSantoni/net-analysis-smm638). The data, which were gathered for a network science project,[^1] are also available on the website of the [Stanford Network Analysis Project](https://snap.stanford.edu/data/gemsec-Deezer.html).

Below are some key aspects of the data:

+ The data were scraped from Deezer in November 2017
+ The `**_edges.csv` files represent the friendship networks of users from three European countries: Croatia, Hungary, and Romania. Nodes represent users and edges represent mutual friendships[^2]
+ The `**_genres.json` files contain the genre preferences of users: each key is a user identifier, and the genres that the user likes are given as a list. Genre notations are consistent across users. In each dataset, users could like 84 distinct genres. The lists of liked genres were compiled from the lists of liked songs
+
+[^1]: Benedek Rozemberczki, Ryan Davies, Rik Sarkar, and Charles Sutton. 2018. GEMSEC: Graph Embedding with Self Clustering. arXiv preprint arXiv:1802.03997.
+
+[^2]: The researchers who collected the data say that they have "*... reindexed the nodes in order to achieve a certain level of anonymity*."
+
+## Friendship networks
+
+For illustrative purposes, let us inspect the friendship network for the case of Croatia. First, we load `Pandas` and `NetworkX`, then we load the data:
+
+```{python}
+# load modules
+import pandas as pd
+import networkx as nx
+# load data
+fr = pd.read_csv('../data/deezer_clean_data/HR_edges.csv')
+# data preview
+fr.head()
+```
+
+The data preview shows that the friendship network for Croatia is a list of edges, where each edge is a pair of user identifiers. The data can be used to create a network object using `NetworkX`:
+
+```{python}
+fr_g = nx.from_pandas_edgelist(fr, source='node_1', target='node_2')
+fr_g?
+```
+
Using IPython introspection (`fr_g?`), we can see that `fr_g` is a NetworkX object of type `Graph` with 54,573 nodes and 498,202 edges. To get familiar with the data, we test whether `fr_g` is connected:
+
+```{python}
+nx.is_connected(fr_g)
+```
+
+Then, we consider the degree distribution of the network:
+
```{python}
#| fig-cap: Degree distribution of the friendship network for Croatia
#| fig-cap-location: margin
#| label: fig-degree-distribution
# import further modules
import numpy as np
from matplotlib import pyplot as plt
from collections import Counter
# count how many nodes have each degree value
dd = Counter(dict(fr_g.degree()).values())
# plot the degree distribution on log-log axes
fig = plt.figure(figsize=(4, 3))
ax = fig.add_subplot(111)
ax.scatter(list(dd.keys()), list(dd.values()), color="limegreen", alpha=0.15)
ax.set_yscale("log")
ax.set_xscale("log")
ax.set_xlabel("Degree (log scale)")
ax.set_ylabel("Number of nodes (log scale)")
ax.grid(True, ls="--")
plt.show()
```
+
The plot shows that the degree distribution of the friendship network for Croatia is right-skewed, which is a common feature of social networks. We can try to get a better understanding of the network, including the presence and location of 'hub' users, by visualizing it. Since the network is large, we may benefit from the visualization capabilities of [`graph-tool`](https://graph-tool.skewed.de/), a Python API that wraps C++ code and is a more efficient alternative to the pure-Python `NetworkX`:
+
```{python}
#| fig-cap: Friendship network for Croatia (N=54,573)
#| fig-cap-location: margin
#| label: fig-fr-gt
# import further module
from graph_tool.all import *
# build the edge list from the Pandas DataFrame and create the graph from it
edges = [(str(u), str(v)) for u, v in fr[['node_1', 'node_2']].values]
fer_gt = Graph(edges, hashed=True, directed=False)
# plot the network (uncomment to regenerate fer_gt.png)
# graph_draw(fer_gt, output_size=(500, 500), output="fer_gt.png")
# load the previously saved image
from IPython.display import Image
Image(filename='fer_gt.png')
```
+
The friendship network shows a periphery of low-degree users and, plausibly, a core of high-degree users. However, the figure does not provide a clear picture of the core of the network, which deserves further investigation.
+
+## Music genre preferences
+
Building on the previous sub-section, we consider the genre preferences of users stored in `HR_genres.json`. The file is in JSON format and can be loaded with the `json` module:

```{python}
import json
with open('../data/deezer_clean_data/HR_genres.json', 'r') as f:
    pr_json = json.load(f)
pr_json["11542"]
```
+
+At this stage, we have a dictionary where each key is a user identifier and the corresponding value is a list of genres that the user likes. For example, above is the list of music genres that user `11542` likes. We can convert the dictionary into a `Pandas` DataFrame drawing upon Pandas' `json_normalize` function:
+
+```{python}
+pr = pd.json_normalize(pr_json).T
+pr.rename({0: 'genres'}, axis=1, inplace=True)
+pr.head()
+```
+
+The data preview shows that the DataFrame `pr` has a single column, `genres`, which contains lists of genres that users like. To make the data more amenable to analysis, we can explode the lists of genres into separate rows drawing upon Pandas' `explode` function:
+
+```{python}
+pr = pr.explode('genres')
+pr.reset_index(inplace=True)
+pr.rename({'index': 'user_id'}, axis=1, inplace=True)
+pr.head()
+```
+
+For illustrative purposes, we can consider the distribution of genres liked by users in the dataset:
+
```{python}
#| fig-cap: Degree distribution for music genres in the 'Croatia' dataset
#| fig-cap-location: margin
#| label: fig-genres-distribution
# number of likers per genre (one value per genre)
genre_sizes = pr.groupby('genres').size()
fig = plt.figure(figsize=(6, 3))
ax = fig.add_subplot(111)
# histogram of the number of likers across genres
ax.hist(genre_sizes, color="magenta", alpha=0.5)
# show tick labels with thousands separators
ax.xaxis.set_major_formatter("{x:,.0f}")
ax.set_xlabel("Degree -- number of likers")
ax.set_ylabel("Counts of music genres")
ax.grid(True, ls="--")
plt.show()
```
+
+The degree distribution of music genres liked by users is right-skewed, meaning there are few genres that are liked by many users and many genres that are liked by few users.
+
+# Problem description
+
Your employer is a consultancy firm that has been hired by 'Sonic,' a major music label, to better understand the categories that make up the music market. The [artists and repertoire](https://en.wikipedia.org/wiki/Artists_and_repertoire) (A&R) staff at Sonic have struggled to make sense of the associations among the 'music genre' tags (e.g., 'International Pop') used by Deezer and similar services. Some think that these tags are sometimes redundant; in other circumstances, the tags are unclear or meaningless. It is therefore hard for Sonic to see clear targets in the market and, consequently, to position new albums and musicians correctly against consumer preferences. Sonic wants a map of the categories that form the music market. This map must be based on digital traces (i.e., behavioral data), easy to interpret, and well-grounded in demonstrable data patterns.
+
+To help Sonic address its business problem, you have been asked to analyze the 'Croatia' dataset, briefly described in the previous section. You are expected to use network analytic methods and tools to:
+
+1. Assess the similarity between Deezer music genres
2. Identify homogeneous groups of Deezer music genres
+3. Highlight how social ties among users influence the similarity between music genres
+
+# Submission package
+
+The submission package consists of:
+
++ A report that includes:
+ - The description of the workflow you have followed to address the problem (300 words MAX)
+ - The results of your analysis, comprising text (300 words MAX) and exhibits (five among figures and tables MAX)
+ - The interpretation of the results in light of the business problem (300 words MAX)
+ The computer code that allows me to fully reproduce your charts (in any language, e.g., R, Python, Julia, Rust, C++, or Java). The code should be well-commented and easy to read. Non-reproducible exhibits will not be graded.
+
This notebook shows how to detect communities in a network, that is, groups of nodes that are densely connected to each other and sparsely connected to nodes outside the group. Specifically, it focuses on two popular community detection algorithms: Girvan-Newman and Louvain.
1 Notebook setup
+
For this tutorial, we rely on the ‘usual suspects’ among Python packages: numpy, matplotlib, and networkx. The latter is the most popular Python package for the creation, manipulation, and study of the structure of small to moderately sized networks.
+
+
+Code
+
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
+
+
+
+
+
2 Load Karate Club network
+
The Karate Club dataset is a well-known social network dataset representing the friendships among the 34 members of a karate club at a US university in the 1970s (see footnote 1). The network consists of 34 nodes and 78 edges, where nodes represent members and edges represent friendships. The dataset is often used to test community detection algorithms because a conflict between the club’s instructor and its administrator split the club in two, which gives the network a natural two-community structure.
+
+
+Code
+
G = nx.karate_club_graph()
+
+
+
+
+
3 Visualize the network
+
The visual inspection of the network (see Figure 1) reveals two distinct groups of nodes that may correspond to two communities, i.e., groups of nodes that are more densely connected to each other than to nodes outside the group. Communities often represent functional units within the network, such as groups of friends in a social network, modules in a biological network, or clusters of related documents in an information network. However, we need to produce conclusive evidence that these groups are indeed communities.
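
The drawing code for Figure 1 is not reproduced here, and the later snippets rely on a `pos` dictionary of node positions. The following is a minimal sketch of how such a layout could be created; the spring layout and the seed value are assumptions, not necessarily the settings behind the original figures.

```python
import matplotlib.pyplot as plt
import networkx as nx

G = nx.karate_club_graph()

# Assumed layout: a spring layout with a fixed (arbitrary) seed, so that
# node positions stay the same across all figures
pos = nx.spring_layout(G, seed=42)

# Draw the network with node labels
fig, ax = plt.subplots(figsize=(6, 4))
nx.draw(
    G,
    pos,
    ax=ax,
    with_labels=True,
    node_color="lightgray",
    node_size=300,
    edge_color="gray",
)
plt.show()
```

Reusing the same `pos` in every call to `nx.draw` makes the community colorings in the later figures directly comparable.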
4 Community detection using Girvan-Newman’s algorithm
+
networkx provides an implementation of the Girvan-Newman algorithm (see footnote 2), a divisive hierarchical clustering method based on edge betweenness centrality. The algorithm iteratively removes the edge with the highest betweenness centrality, recalculates the centrality of the remaining edges, and identifies the connected components of the graph. The process continues until the desired number of communities is reached.
+
Let us consider the first iteration of the Girvan-Newman algorithm, which consists of computing edge betweenness centrality. In Figure 2, the edges are color-coded by their betweenness centrality values, with warmer colors indicating higher centrality.
The visual inspection of edge betweenness centrality suggests that the edge connecting nodes 0 and 31 has the highest centrality. We can check this by sorting the edges by centrality and examining the top five edges.
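
The check described above can be sketched as follows. This is a minimal, self-contained version that rebuilds the Karate Club graph and uses the unweighted edge betweenness centrality from networkx; the variable names are illustrative.

```python
import networkx as nx

G = nx.karate_club_graph()

# Edge betweenness centrality (unweighted shortest paths by default)
edge_betweenness = nx.edge_betweenness_centrality(G)

# Sort edges from the most to the least central and keep the top five
top5 = sorted(edge_betweenness.items(), key=lambda x: x[1], reverse=True)[:5]
for (u, v), score in top5:
    print(f"{u}-{v}: {score:.4f}")
```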
The second step consists of removing the edge 0-31 and recalculating the centrality of the remaining edges. As the check at the end of the snippet below confirms, G is still connected after this removal. In other words, removing this single edge does not split the network into two groups of nodes, so we have not yet identified any partitioning of the network, that is, any community structure. The process is repeated until the network breaks down into at least two connected components.
+
+
+Code
+
# remove edge 0-31
G.remove_edge(0, 31)
# recalculate edge betweenness centrality
edge_betweenness = nx.edge_betweenness_centrality(G)
# inspect the first 5 edges by centrality
edge_betweenness_sorted = sorted(edge_betweenness.items(), key=lambda x: x[1], reverse=True)
print(edge_betweenness_sorted[:5])
# double check that the graph is still connected
print(nx.is_connected(G))
Figure 3 visualizes the network after removing the edge 0-31. The two communities are clearly visible, with nodes 0 and 31 belonging to different groups.
The intuition behind the Girvan-Newman algorithm is that edges connecting different communities have higher betweenness centrality, as they are crucial for connecting the communities. By iteratively removing these edges, the algorithm effectively identifies the communities in the network. For example, Figure 3 shows that G is at risk of becoming disconnected if edges like 0-2, 0-8, and 19-33 are removed.
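
The remove-and-recompute loop described above can be written out by hand. The snippet below is a minimal sketch of the first level of the procedure on a fresh copy of the Karate Club graph; it is meant to illustrate the logic and is not a replacement for the library implementation.

```python
import networkx as nx

# Work on a fresh copy of the Karate Club graph
H = nx.karate_club_graph()

removed = []
# Keep removing the most central edge until the graph falls apart
while nx.is_connected(H):
    eb = nx.edge_betweenness_centrality(H)
    u, v = max(eb, key=eb.get)  # edge with the highest betweenness
    H.remove_edge(u, v)
    removed.append((u, v))

# The connected components are the first two communities
communities = [sorted(c) for c in nx.connected_components(H)]
print(len(removed), "edges removed")
print(communities)
```

The first removed edge is 0-31; several further removals are needed before the graph splits into two components.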
+
Luckily, networkx provides a convenient function community.girvan_newman to automate the process of community detection using the Girvan-Newman algorithm. The function returns an iterator over the discovered communities, allowing us to stop the algorithm at a specific number of communities. Let us apply the Girvan-Newman algorithm to the Karate Club network and visualize the communities.
+
+
+Code
+
# we must re-add the edge 0-31 to the graph
G.add_edge(0, 31)
# Girvan-Newman algorithm
fit = nx.community.girvan_newman(G)
# the first item of the iterator is the two-community partition
tuple(sorted(c) for c in next(fit))
The first partition returned by the iterator splits the graph into two communities, and each subsequent partition contains one more community than the previous one. To cap the number of communities, we can stop iterating once a partition exceeds the desired size. For example, with k = 4 we retain the partitions with two, three, and four communities.
+
+
+Code
+
import itertools
# maximum number of communities to retain
k = 4
fit = nx.community.girvan_newman(G)
# keep the partitions with at most k communities
limited = itertools.takewhile(lambda c: len(c) <= k, fit)
for communities in limited:
    print(tuple(sorted(c) for c in communities))
Visual inspection of the Girvan-Newman outcome is a plausible place to start when adjudicating between alternative community structures (see footnote 3). Let us start by visualizing the network with two communities (see Figure 4).
+
+
+Code
+
# fit the Girvan-Newman algorithm
fit = nx.community.girvan_newman(G)
# retain the first three partitions of the network (2, 3, and 4 communities)
k = 4
limited = itertools.takewhile(lambda c: len(c) <= k, fit)
# store the node membership of each partition
fits = {}
for i, communities in enumerate(limited):
    fits[i] = tuple(sorted(c) for c in communities)
# partition with two communities
two_communities = fits[0]
# color code the communities
colors = ["plum" if node in two_communities[0] else "lightgreen" for node in G.nodes]
# visualize the network
nx.draw(
    G,
    pos,
    with_labels=True,
    node_color=colors,
    node_size=300,
    edge_color="gray",
)
+
+
+
+
+
+
+
+
One may point out that the solution in Figure 4 presents a clear-cut division of the network into two communities. However, the division is not perfect, as some nodes are on the boundary between the two communities (see for example nodes 2 and 13). This is a common issue in community detection, as nodes can have multiple connections to different communities. The Girvan-Newman algorithm is a divisive method that partitions the network into communities by removing edges, which may lead to suboptimal results.
+
The presence of border nodes is not the most concerning issue in this case, though. The lower-left section of Figure 4 indicates the presence of a group of nodes that are densely connected to each other but are not clearly part of either of the two main communities. Let us visualize the network with three communities to investigate this further.
+
+
+Code
+
# partition with three communities
three_communities = fits[1]
# print(three_communities)
# color code the communities
colors = [
    (
        "plum" if node in three_communities[0]
        else "lightgreen" if node in three_communities[1]
        else "lightblue"
    )
    for node in G.nodes
]
# visualize the network
nx.draw(
    G,
    pos,
    with_labels=True,
    node_color=colors,
    node_size=300,
    edge_color="gray",
)
+
+
+
+
+
+
+
+
The three-community structure does not yield the expected representation of the network (in which nodes 4, 5, 6, 10, and 16 form their own community). Instead, it is node 9, a ‘border’ node, that gets assigned to the third community. In light of this unsatisfactory solution, one may want to render and visualize the four-community structure (see Figure 6).
+
+
+Code
+
# partition with four communities
four_communities = fits[2]
# print(four_communities)
# color code the communities
colors = [
    (
        "plum" if node in four_communities[0]
        else "lightgreen" if node in four_communities[1]
        else "orange" if node in four_communities[2]
        else "lightblue"
    )
    for node in G.nodes
]
# visualize the network
nx.draw(
    G,
    pos,
    with_labels=True,
    node_color=colors,
    node_size=300,
    edge_color="gray",
)
+
+
+
+
+
+
+
+
+
+
5 Community detection using the Louvain algorithm
+
The Louvain community detection algorithm (see footnote 4) is a popular method for identifying communities in large networks. It is an iterative, modularity-based algorithm that optimizes the modularity of a partition of the network (see footnote 5). Modularity is a measure of the density of links inside communities compared to links between communities.
+
The algorithm operates in two main phases that are repeated iteratively. At the start of the first phase, each node is assigned to its own community. Then, for each node, the algorithm considers moving it to the community of each of its neighbors and chooses the move that yields the largest increase in modularity; if no move increases modularity, the node stays in its current community. This process is repeated for all nodes until no further improvement can be achieved.
+
In the second phase, the algorithm aggregates nodes belonging to the same community into a single node, creating a new, smaller network. Edges between the new nodes are weighted by the sum of the weights of the edges between the original nodes in the corresponding communities. The first phase is then reapplied to this new network.
+
These two phases are repeated iteratively until the modularity no longer increases significantly. The result is a hierarchical decomposition of the network into communities, which can be represented at different levels of granularity. The Louvain algorithm is efficient and can handle large networks, making it widely used in various applications, including social network analysis, biology, and information retrieval.
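
To make the modularity criterion concrete, the sketch below compares two candidate partitions of a hypothetical toy graph using `nx.community.modularity`. The graph and the partitions are made up for illustration.

```python
import networkx as nx

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3)
toy = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

good = [{0, 1, 2}, {3, 4, 5}]  # one community per triangle
poor = [{0, 1, 3}, {2, 4, 5}]  # an arbitrary split that cuts both triangles

q_good = nx.community.modularity(toy, good)
q_poor = nx.community.modularity(toy, poor)
print(round(q_good, 4), round(q_poor, 4))  # 0.3571 -0.2143
```

The partition that follows the two triangles scores higher because most edges fall inside communities; the Louvain algorithm searches for partitions with high modularity of this kind.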
+
Let us consider an example of applying the Louvain algorithm to the Karate Club network. The community module in networkx provides an implementation of the Louvain algorithm, which we can use to detect communities in the network.
+
+
+Code
+
# Louvain algorithm fit (pass seed=<int> for a reproducible partition)
fit = nx.community.louvain_communities(G)
# retrieve the communities
communities = tuple(sorted(c) for c in fit)
print(communities)
In this run, the community structure that maximizes the modularity criterion comprises three communities: {0, 1, 2, 3, 7, 13, 17, 19, 21}, {4, 5, 6, 10, 16}, and {8, 9, 11, 12, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33}. Because the Louvain algorithm visits nodes in random order, the partition may differ across runs unless a seed is set. Figure 7 visualizes the network with the identified communities.
+
+
+Code
+
# color code the communities
colors = [
    (
        "plum" if node in communities[0]
        else "lightgreen" if node in communities[1]
        else "lightblue"
    )
    for node in G.nodes
]
# visualize the network
nx.draw(
    G,
    pos,
    with_labels=True,
    node_color=colors,
    node_size=300,
    edge_color="gray",
)
+
+
+
+
+
+
+
+
The Louvain algorithm is not only capable of identifying a plausible community structure in a network; it can also handle weighted networks. Let us consider the case of a weighted Karate Club network, where the edge weights represent the strength of the friendship between members (see Figure 8). The following code snippet shows how to create a weighted version of the Karate Club network, to which we then apply the Louvain algorithm.
+
+
+Code
+
# weighted Karate Club network
+G_weighted = nx.karate_club_graph()
+# assign random integer weights in [1, 10] to the edges
+# (np.random.random_integers is deprecated; randint excludes the upper bound)
+import numpy as np
+for u, v in G_weighted.edges:
+    G_weighted[u][v]["weight"] = np.random.randint(1, 11)
+# visualize the weighted network
+nx.draw(
+ G_weighted,
+ pos,
+ with_labels=True,
+ node_color="lightgray",
+ node_size=300,
+ edge_color=[G_weighted[u][v]["weight"] for u, v in G_weighted.edges],
+ edge_cmap=plt.cm.Greens,
+ edge_vmin=0,
+ edge_vmax=10,
+)
+
+
+
+
+
+
+
+
Then, we fit the Louvain algorithm to the weighted network and visualize the communities — see Figure 9.
+
+
+Code
+
# fit the Louvain algorithm to the weighted network
+fit = nx.community.louvain_communities(G_weighted, weight="weight")
+# retrieve the communities
+communities = tuple(sorted(c) for c in fit)
+# color code the nodes by community
+colors = [
+    (
+        "plum" if node in communities[0]
+        else "lightgreen" if node in communities[1]
+        else "lightblue"
+    )
+    for node in G_weighted.nodes
+]
+# visualize the network
+nx.draw(
+ G_weighted,
+ pos,
+ with_labels=True,
+ node_color=colors,
+ node_size=300,
+ edge_color=[G_weighted[u][v]["weight"] for u, v in G_weighted.edges],
+ edge_cmap=plt.cm.Greens,
+ edge_vmin=0,
+ edge_vmax=10,
+)
+
+
+
+
+
+
+
+
Considering the weighted network, the Louvain algorithm yields some notable results:
+
+
The strong ties between nodes 0, 4, and 10 make nodes 0 and 4 part of the same community, despite the redundant ties to nodes 5, 6 and 15 (compare Figure 7 and Figure 9).
+
Nodes located at the border of the communities are more likely to be assigned to the community with which they share the strongest ties. For example, node 9 is assigned to the same community as node 30; node 19 is assigned to the same community as node 3.
+
+
+
+
+
+
+
Footnotes
+
+
+
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452-473. doi:10.1086/jar.33.4.3629752
+
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821-7826. doi:10.1073/pnas.122653799
+
It is worth noting that the outcome of the Girvan-Newman algorithm may depend on the order in which edges are removed, for example when several edges tie for the highest betweenness centrality. Therefore, it is useful to compare alternative runs of the algorithm to identify robust communities.
+
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. doi:10.1088/1742-5468/2008/10/P10008
+
Nicolas Dugué, Anthony Perez. Directed Louvain: maximizing modularity in directed networks. [Research Report] Université d’Orléans. 2015. hal-01231784. https://hal.archives-ouvertes.fr/hal-01231784
+
+
+
+
Source Code
+
---
+title: Community Detection
+author: Simone Santoni
+date: last-modified
+abstract-title: Synopsis
+abstract: This notebook shows communities in a network --- that is, groups of nodes densely connected to each other and sparsely connected with outgroup nodes. Specifically, the attention revolves around two popular community detection algorithms, Girvan-Newman and Louvain.
+warning: false
+fig-cap-location: top
+format:
+ html:
+ code-fold: true
+ code-tools: true
+ toc: true
+ toc-title: Table of Contents
+ toc-depth: 2
+ toc-location: right
+ number-sections: true
+ citations-hover: false
+ footnotes-hover: false
+ crossrefs-hover: false
+ theme: journal
+ fig-width: 9
+ fig-height: 6
+ ipynb: default
+ docx: default
+ typst:
+ number-sections: true
+ df-print: paged
+ #pdf:
+ # documentclass: scrartcl
+ # papersize: letter
+ # papersize: letter
+---
+
+# Notebook setup
+
+For this tutorial, we rely on 'usual suspects' Python packages, like `numpy`, `matplotlib`, and `networkx`. The latter is the most popular Python package for the creation, manipulation, and study of the structure of small to moderately sized networks.
+
+```{python}
+import numpy as np
+import matplotlib.pyplot as plt
+import networkx as nx
+```
+
+# Load Karate Club network
+
+The Karate Club dataset is a well-known social network dataset representing the friendships between 34 members of a karate club at a US university in the 1970s[^0]. The network consists of 34 nodes and 78 edges, where nodes represent members and edges represent friendships. The dataset is often used for testing community detection algorithms, as it naturally splits into two communities due to a conflict between the club's instructor and the administrator, leading to the formation of two separate clubs.
+
+[^0]: Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452-473. doi:10.1086/jar.33.4.3629752
+
+```{python}
+G = nx.karate_club_graph()
+```
+
+# Visualize the network
+
+The visual inspection of the network (see @fig-karate-club) reveals two distinct groups of nodes that may correspond to two communities, i.e., groups of nodes that are more densely connected to each other than to nodes outside the group. Communities often represent functional units within the network, such as groups of friends in a social network, modules in a biological network, or clusters of related documents in an information network. However, we need to produce conclusive evidence that these groups are indeed communities.
+
+```{python}
+# | fig-cap: Visualization of the Karate Club network
+# | label: fig-karate-club
+# | fig-cap-location: margin
+# | fig-width: 500
+# fix node positions for better visualization
+pos = nx.spring_layout(G, seed=123)
+# draw the network
+nx.draw(
+ G, pos, with_labels=True, node_color="lightgray", node_size=300, edge_color="gray"
+)
+```
+
+# Community detection using Girvan-Newman's algorithm
+
+`networkx` provides an implementation of the Girvan-Newman[^1] algorithm, which is a hierarchical clustering method based on edge betweenness centrality. The algorithm iteratively removes the edge with the highest betweenness centrality, recalculates the centrality of the remaining edges, and identifies the connected components of the graph. The process continues until the desired number of communities is reached.
+
+[^1]: Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821-7826. doi:10.1073/pnas.122653799
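
The removal loop described above can be sketched by hand. The snippet below is a minimal illustration rather than the `networkx` implementation: the toy graph `H` and the helper `first_split` are hypothetical names introduced here, and the loop stops as soon as the graph falls apart for the first time.

```python
import networkx as nx

# toy graph (hypothetical): two triangles joined by a single bridge, 2-3
H = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])


def first_split(graph):
    """Remove highest-betweenness edges until the graph falls apart."""
    g = graph.copy()  # leave the input graph untouched
    removed = []
    n_start = nx.number_connected_components(g)
    while nx.number_connected_components(g) == n_start:
        # centrality is recomputed after every removal
        betweenness = nx.edge_betweenness_centrality(g)
        edge = max(betweenness, key=betweenness.get)
        g.remove_edge(*edge)
        removed.append(edge)
    components = sorted(sorted(c) for c in nx.connected_components(g))
    return removed, components


removed, components = first_split(H)
print(removed)
print(components)
```

In this toy graph every shortest path between the two triangles runs through the bridge, so it is the first and only edge removed before the graph splits.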
+
+Let us consider the first iteration of the Girvan-Newman algorithm, which consists of computing edge betweenness centrality. In @fig-karate-club-betweenness, the edges are color-coded by their betweenness centrality values, with darker shades of red indicating higher centrality.
+
+```{python}
+#| fig-cap: Edge betweenness centrality in the Karate Club network
+#| label: fig-karate-club-betweenness
+#| fig-cap-location: margin
+#| fig-width: 500
+# edge betweenness centrality
+edge_betweenness = nx.edge_betweenness_centrality(G)
+# network visualization
+nx.draw(
+ G,
+ pos,
+ with_labels=True,
+ node_color="lightgray",
+ node_size=300,
+ edgelist=edge_betweenness.keys(),
+ edge_color=list(edge_betweenness.values()),
+ edge_cmap=plt.cm.Reds,
+ edge_vmin=0,
+ edge_vmax=0.1,
+)
+```
+
+The visual inspection of edge betweenness centrality suggests that the edge connecting nodes `0` and `31` has the highest centrality. We can check this by sorting the edges by centrality and examining the top five edges.
+
+```{python}
+edge_betweenness_sorted = sorted(edge_betweenness.items(), key=lambda x: x[1], reverse=True)
+print(edge_betweenness_sorted[:5])
+```
+
+The second step consists of removing the edge `0-31` and recalculating the centrality of the remaining edges. Removing a single edge is not enough to disconnect `G`: the graph remains connected, so no partition of the network, that is, no community structure, has been identified yet. The process is repeated until the network breaks down into at least two connected components.
+
+```{python}
+# remove edge 0-31
+G.remove_edge(0, 31)
+# recalculate edge betweenness centrality
+edge_betweenness = nx.edge_betweenness_centrality(G)
+# inspect the first 5 edges by centrality
+edge_betweenness_sorted = sorted(edge_betweenness.items(), key=lambda x: x[1], reverse=True)
+print(edge_betweenness_sorted[:5])
+# double check that the graph is still connected
+print(nx.is_connected(G))
+```
+
+@fig-karate-club-removed-edge visualizes the network after removing the edge `0-31`. The two communities are clearly visible, with nodes `0` and `31` belonging to different groups.
+
+```{python}
+#| fig-cap: Visualization of the Karate Club network after removing the edge 0-31
+#| label: fig-karate-club-removed-edge
+#| fig-cap-location: margin
+nx.draw(
+ G,
+ pos,
+ with_labels=True,
+ node_color="lightgray",
+ node_size=300,
+ edgelist=edge_betweenness.keys(),
+ edge_color=list(edge_betweenness.values()),
+ edge_cmap=plt.cm.Reds,
+ edge_vmin=0,
+ edge_vmax=0.1,
+)
+```
+
+The intuition behind the Girvan-Newman algorithm is that edges connecting different communities have higher betweenness centrality, as they are crucial for connecting the communities. By iteratively removing these edges, the algorithm effectively identifies the communities in the network. For example, @fig-karate-club-removed-edge shows that `G` is at risk of becoming disconnected if edges like `0-2`, `0-8`, and `19-33` are removed.
+
+Luckily, `networkx` provides a convenient function `community.girvan_newman` to automate the process of community detection using the Girvan-Newman algorithm. The function returns an iterator over the discovered communities, allowing us to stop the algorithm at a specific number of communities. Let us apply the Girvan-Newman algorithm to the Karate Club network and visualize the communities.
+
+```{python}
+# we must re-add the edge 0-31 to the graph
+G.add_edge(0, 31)
+# Girvan-Newman algorithm
+fit = nx.community.girvan_newman(G)
+tuple(sorted(c) for c in next(fit))
+```
+
+The first partition returned by the iterator splits the graph into two communities. However, we can obtain finer partitions by advancing the iterator to a deeper level. For example, we can keep every partition with at most `k = 4` communities, which yields the two-, three-, and four-community solutions.
+
+```{python}
+import itertools
+# keep every partition with at most k communities
+k = 4
+fit = nx.community.girvan_newman(G)
+limited = itertools.takewhile(lambda c: len(c) <= k, fit)
+for communities in limited:
+    print(tuple(sorted(c) for c in communities))
+```
+
+The visual inspection of Girvan-Newman's algorithm outcome is a plausible place to start to adjudicate between alternative community structures.[^2] Let us start by visualizing the network with two communities (see @fig-karate-club-two-communities).
+
+[^2]: It is worth noting that the outcome of the Girvan-Newman algorithm may depend on the order in which edges are removed, for example when several edges tie for the highest betweenness centrality. Therefore, it is useful to compare alternative runs of the algorithm to identify robust communities.
+
+```{python}
+#| fig-cap: Visualization of the Karate Club network with two communities
+#| fig-cap-location: margin
+#| fig-width: 500
+#| label: fig-karate-club-two-communities
+# fit the Girvan-Newman algorithm
+fit = nx.community.girvan_newman(G)
+# we retain the first three partitions of the network (2, 3, and 4 communities)
+k = 4
+# keep the partitions with at most k communities
+limited = itertools.takewhile(lambda c: len(c) <= k, fit)
+fits = {}
+for level, communities in enumerate(limited):
+    fits[level] = tuple(sorted(c) for c in communities)
+# get the membership of the nodes into communities
+two_communities = fits[0]
+# color code the communities
+colors = ["plum" if node in two_communities[0] else "lightgreen" for node in G.nodes]
+# visualize the network
+nx.draw(
+ G,
+ pos,
+ with_labels=True,
+ node_color=colors,
+ node_size=300,
+ edge_color="gray",
+)
+```
+
+One may point out that the solution in @fig-karate-club-two-communities presents a clear-cut division of the network into two communities. However, the division is not perfect, as some nodes are on the boundary between the two communities (see for example nodes `2` and `13`). This is a common issue in community detection, as nodes can have multiple connections to different communities. The Girvan-Newman algorithm is a divisive method that partitions the network into communities by removing edges, which may lead to suboptimal results.
+
+The presence of border nodes is not the most concerning issue in this case, though. The lower-left section of @fig-karate-club-two-communities indicates the presence of a group of nodes that are densely connected to each other but are not clearly part of the two main communities. Let us visualize the network with three communities to investigate this further.
+
+```{python}
+# | fig-cap: Visualization of the Karate Club network with three communities
+# | fig-cap-location: margin
+# | fig-width: 500
+# | label: fig-karate-club-three-communities
+# color code the communities
+three_communities = fits[1]
+# print(three_communities)
+colors = [
+    (
+        "plum" if node in three_communities[0]
+        else "lightgreen" if node in three_communities[1]
+        else "lightblue"
+    )
+    for node in G.nodes
+]
+# visualize the network
+nx.draw(
+ G,
+ pos,
+ with_labels=True,
+ node_color=colors,
+ node_size=300,
+ edge_color="gray",
+)
+```
+
+The three-community structure does not yield the expected representation of the network (in which nodes `4`, `5`, `6`, `10`, and `16` form their own community). Instead, it is node `9`, a 'border' node, that gets assigned to the third community. In light of this unsatisfactory solution, one may want to render and visualize the four-community structure (see @fig-karate-club-four-communities).
+
+```{python}
+# | fig-cap: Visualization of the Karate Club network with four communities
+# | fig-cap-location: margin
+# | fig-width: 500
+# | label: fig-karate-club-four-communities
+# color code the communities
+four_communities = fits[2]
+# print(four_communities)
+colors = [
+    (
+        "plum" if node in four_communities[0]
+        else "lightgreen" if node in four_communities[1]
+        else "orange" if node in four_communities[2]
+        else "lightblue"
+    )
+    for node in G.nodes
+]
+# visualize the network
+nx.draw(
+ G,
+ pos,
+ with_labels=True,
+ node_color=colors,
+ node_size=300,
+ edge_color="gray",
+)
+```
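
Visual inspection can be complemented with a numerical score. As a sketch, the snippet below scores the two-, three-, and four-community Girvan-Newman partitions with modularity, using `nx.community.modularity`. The names `G_demo`, `levels`, and `scores` are introduced here for illustration, and `weight=None` is an assumption made so that every tie counts equally, in line with the unweighted betweenness used above.

```python
import itertools

import networkx as nx

G_demo = nx.karate_club_graph()  # fresh copy, so G is left untouched

# partitions with 2, 3, and 4 communities, as in the figures above
levels = itertools.takewhile(
    lambda c: len(c) <= 4, nx.community.girvan_newman(G_demo)
)

scores = {}
for partition in levels:
    # weight=None ignores any edge weights and treats every tie equally
    q = nx.community.modularity(G_demo, partition, weight=None)
    scores[len(partition)] = q
    print(len(partition), round(q, 3))

# number of communities with the highest modularity among those inspected
best_k = max(scores, key=scores.get)
print("best number of communities:", best_k)
```

A higher score indicates denser within-community ties relative to chance, which offers one criterion, among others, for choosing between the candidate partitions.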
+
+# Community detection using the Louvain algorithm
+
+The Louvain community detection algorithm[^3] is a popular method for identifying communities in large networks. It is an iterative, modularity-based algorithm that optimizes the modularity of a partition of the network[^4]. Modularity is a measure of the density of links inside communities compared to links between communities.
+
+[^3]: Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. doi:10.1088/1742-5468/2008/10/P10008
+
+[^4]: Nicolas Dugué, Anthony Perez. Directed Louvain : maximizing modularity in directed networks. [Research Report] Université d’Orléans. 2015. hal-01231784. https://hal.archives-ouvertes.fr/hal-01231784
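
To make the notion of modularity concrete, here is a minimal pure-Python sketch for an unweighted, undirected graph. It assumes the standard definition, in which each community contributes its share of internal edges minus the share expected by chance given its total degree. The toy edge list and the helper `modularity` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# toy graph (hypothetical): two triangles joined by the edge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = [{0, 1, 2}, {3, 4, 5}]


def modularity(edges, partition):
    """Q = sum over communities of L_c / m - (d_c / (2 * m)) ** 2."""
    m = len(edges)  # total number of edges
    label = {node: i for i, group in enumerate(partition) for node in group}
    internal = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)  # d_c: sum of the degrees of the nodes in c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in range(len(partition))
    )


print(round(modularity(edges, partition), 4))  # 0.3571
print(round(modularity(edges, [{0, 1, 2, 3, 4, 5}]), 4))  # 0.0
```

Splitting the graph along the bridge scores higher than lumping all nodes together, which is the kind of comparison the Louvain algorithm exploits.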
+
+The algorithm operates in two main phases that are repeated iteratively. In the first phase, each node is initially assigned to its own community. Then, for each node, the algorithm considers moving it to the community of each of its neighbors and chooses the move that yields the largest increase in modularity; if no move increases modularity, the node stays where it is. This process is repeated for all nodes until no further improvement can be achieved.
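
The first phase can be sketched on a toy graph. This is a deliberately naive illustration, not the `networkx` implementation: it recomputes modularity from scratch for every candidate move, whereas the original algorithm relies on an incremental gain formula, and it visits nodes in a fixed order. The toy graph and all names are assumptions introduced here.

```python
# toy graph (hypothetical): two triangles joined by the edge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
nodes = sorted({n for e in edges for n in e})
neighbors = {n: set() for n in nodes}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def modularity(label):
    """Modularity of the partition encoded as {node: community label}."""
    m = len(edges)
    q = 0.0
    for c in set(label.values()):
        internal = sum(1 for u, v in edges if label[u] == c and label[v] == c)
        degree = sum(len(neighbors[n]) for n in nodes if label[n] == c)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


label = {n: n for n in nodes}  # start: every node in its own community
improved = True
while improved:
    improved = False
    for n in nodes:
        best_label, best_q = label[n], modularity(label)
        # try the community of each neighbor of n
        for target in sorted({label[v] for v in neighbors[n]}):
            trial = {**label, n: target}
            q = modularity(trial)
            if q > best_q + 1e-12:  # accept strict improvements only
                best_label, best_q = target, q
        if best_label != label[n]:
            label[n] = best_label
            improved = True

communities = {}
for n, c in label.items():
    communities.setdefault(c, []).append(n)
print(sorted(communities.values()))  # [[0, 1, 2], [3, 4, 5]]
```

On this toy graph the local moves converge to the two triangles after a few passes over the nodes.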
+
+In the second phase, the algorithm aggregates nodes belonging to the same community into a single node, creating a new, smaller network. Edges between the new nodes are weighted by the sum of the weights of the edges between the original nodes in the corresponding communities. The first phase is then reapplied to this new network.
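
The aggregation step can be sketched as follows. The community labels are assumed to come from the first phase, and the helper `aggregate` is a hypothetical name. As in the original formulation of the algorithm, edges that fall inside a community are kept as a self-loop on the corresponding super-node.

```python
from collections import defaultdict

# toy weighted graph (hypothetical): two triangles joined by the edge 2-3
weighted_edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
# community labels found by the local moving phase (assumed here)
label = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}


def aggregate(weighted_edges, label):
    """Collapse each community into one super-node and sum edge weights."""
    super_edges = defaultdict(float)
    for u, v, w in weighted_edges:
        # sort the pair so that (A, B) and (B, A) share one entry
        key = tuple(sorted((label[u], label[v])))
        super_edges[key] += w
    return dict(super_edges)


print(aggregate(weighted_edges, label))
# {('A', 'A'): 3.0, ('B', 'B'): 3.0, ('A', 'B'): 1.0}
```

The resulting two-node network is what the first phase would be reapplied to in the next iteration.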
+
+These two phases are repeated iteratively until the modularity no longer increases significantly. The result is a hierarchical decomposition of the network into communities, which can be represented at different levels of granularity. The Louvain algorithm is efficient and can handle large networks, making it widely used in various applications, including social network analysis, biology, and information retrieval.
+
+Let us consider an example of applying the Louvain algorithm to the Karate Club network. The `community` module in `networkx` provides an implementation of the Louvain algorithm, which we can use to detect communities in the network.
+
+```{python}
+# Louvain algorithm fit
+fit = nx.community.louvain_communities(G)
+# retrieve the communities
+communities = tuple(sorted(c) for c in fit)
+print(communities)
+```
+
+The community structure solution that maximizes the modularity criterion comprises the following communities: `0, 1, 2, 3, 7, 13, 17, 19, 21` and `4, 5, 6, 10, 16` and `8, 9, 11, 12, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33`. @fig-karate-club-louvain-communities visualizes the network with the identified communities.
+
+```{python}
+# | fig-cap: Visualization of the Karate Club network with Louvain communities
+# | fig-cap-location: margin
+# | fig-width: 500
+# | label: fig-karate-club-louvain-communities
+colors = [
+    (
+        "plum" if node in communities[0]
+        else "lightgreen" if node in communities[1]
+        else "lightblue"
+    )
+    for node in G.nodes
+]
+# visualize the network
+nx.draw(
+ G,
+ pos,
+ with_labels=True,
+ node_color=colors,
+ node_size=300,
+ edge_color="gray",
+)
+```
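
The Louvain implementation in `networkx` visits nodes in random order, so the partition may differ from one run to the next. A minimal sketch of how to make a run repeatable is to pass the `seed` argument of `louvain_communities`. The helper `louvain_sorted`, the graph copy `G_demo`, and the seed values are illustrative choices, and no particular seed is claimed to reproduce the partition reported above.

```python
import networkx as nx

G_demo = nx.karate_club_graph()  # fresh copy, so G is left untouched


def louvain_sorted(graph, seed):
    """Run Louvain once and return the communities in a canonical order."""
    fit = nx.community.louvain_communities(graph, seed=seed)
    return sorted(sorted(c) for c in fit)


run_a = louvain_sorted(G_demo, seed=42)
run_b = louvain_sorted(G_demo, seed=42)
print(run_a == run_b)  # same seed, same partition

# distinct partitions found under a handful of different seeds
distinct = {tuple(map(tuple, louvain_sorted(G_demo, seed=s))) for s in range(10)}
for partition in distinct:
    q = nx.community.modularity(G_demo, [set(c) for c in partition])
    print(len(partition), round(q, 3))
```

Comparing the modularity of the partitions obtained under different seeds gives a quick sense of how stable the solution is.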
+
+The Louvain algorithm is not only capable of isolating the most plausible community structure in a network; it can also handle weighted networks. Let us consider the case of a weighted Karate Club network, where the edge weights represent the strength of the friendship between members (see @fig-karate-club-weighted). The following code snippet shows how to create a weighted version of the Karate Club network; the Louvain algorithm is then applied to it to detect communities.
+
+```{python}
+#| fig-cap: Visualization of the weighted Karate Club network
+#| label: fig-karate-club-weighted
+#| fig-cap-location: margin
+#| fig-width: 500
+# weighted Karate Club network
+G_weighted = nx.karate_club_graph()
+# assign random integer weights in [1, 10] to the edges
+# (np.random.random_integers is deprecated; randint excludes the upper bound)
+import numpy as np
+for u, v in G_weighted.edges:
+    G_weighted[u][v]["weight"] = np.random.randint(1, 11)
+# visualize the weighted network
+nx.draw(
+ G_weighted,
+ pos,
+ with_labels=True,
+ node_color="lightgray",
+ node_size=300,
+ edge_color=[G_weighted[u][v]["weight"] for u, v in G_weighted.edges],
+ edge_cmap=plt.cm.Greens,
+ edge_vmin=0,
+ edge_vmax=10,
+)
+```
+
+Then, we fit the Louvain algorithm to the weighted network and visualize the communities --- see @fig-karate-club-louvain-communities-weighted.
+
+```{python}
+#| fig-cap: Visualization of the Karate Club network with Louvain communities in the weighted network
+#| fig-cap-location: margin
+#| fig-width: 500
+#| label: fig-karate-club-louvain-communities-weighted
+# fit the Louvain algorithm to the weighted network
+fit = nx.community.louvain_communities(G_weighted, weight="weight")
+# retrieve the communities
+communities = tuple(sorted(c) for c in fit)
+# color code the nodes by community
+colors = [
+    (
+        "plum" if node in communities[0]
+        else "lightgreen" if node in communities[1]
+        else "lightblue"
+    )
+    for node in G_weighted.nodes
+]
+# visualize the network
+nx.draw(
+ G_weighted,
+ pos,
+ with_labels=True,
+ node_color=colors,
+ node_size=300,
+ edge_color=[G_weighted[u][v]["weight"] for u, v in G_weighted.edges],
+ edge_cmap=plt.cm.Greens,
+ edge_vmin=0,
+ edge_vmax=10,
+)
+```
+
+Considering the weighted network, the Louvain algorithm yields some notable results:
+
++ The strong ties between nodes `0`, `4`, and `10` make nodes `0` and `4` part of the same community, despite the redundant ties to nodes `5`, `6` and `15` (compare @fig-karate-club-louvain-communities and @fig-karate-club-louvain-communities-weighted).
++ Nodes located at the border of the communities are more likely to be assigned to the community with which they share the strongest ties. For example, node `9` is assigned to the same community as node `30`; node `19` is assigned to the same community as node `3`.
+