Introductory course on network analysis and visualization with Gephi.
By José Luis Losada
☞ Course outline
- Showcase
- Networks
- Formalization and file formats
- Metrics
- Tools
- Data
- Hands-on
- Tutorials, manuals, references
- Characters Networks:
- Co-occurrence in Drama: Dracor
- Co-occurrence in Narrative: Les Miserables (graph); Les Miserables (matrix)
- Dynamic: Visualising the dynamics of character networks
- Paratexts: Project Bieses, corpus CECLE & CICLE
- Textual Networks:
- Historical Networks:
- Spatial Networks:
- Bibliographic networks:
- Citation: Vosviewer
- Content similarity: Connected Papers, inciteful
- Cultural networks:
- Semantic networks:
- Author (ego) networks:
network | nodes | edges |
---|---|---|
Theater Plays | character | co-appearance on the scene |
Stylometry | plays | stylistic similarity |
Scientific collaboration | authors | co-authoring |
... | ... | ... |
- Method of representing connection or interaction patterns between parts of a system.
- The concept of a network presupposes a relational structure that can be studied (1) in a logical and mathematical way: graph theory (discipline). History: Euler and the seven bridges of Königsberg.
- (2) Exploration through visualization.
“Networks are extraordinary calculating devices, but they are also maps, instruments of navigation and representation” (Jacomy 2017: 155).
- Network: points joined by lines.
- points: nodes or vertices.
- lines: edges or links.
- Attributes: extra information about nodes or edges
- Types of networks:
- Defined by the nodes: bipartite, simple, disconnected, ...
- Defined by the edges: multiple, directed, ...
Edgelists, matrices, adjacency lists
Edgelist: a table of structured data with at least two columns: one with the nodes where a connection starts (source) and one with the nodes where it ends (target). Any remaining columns hold attributes.
source | target | weight | lang | type |
---|---|---|---|---|
Juan | Elena | 4 | esp | undirected |
Juan | Hans | 2 | de | undirected |
Juan | Marta | 1 | eng | undirected |
Juan | Marek | 1 | de | undirected |
... | ... | ... | ... | ... |
Adjacency matrix: a square matrix (equal number of columns and rows) in which each cell records whether the row node and the column node are connected (1) or not (0).
node | Juan | Hans | Elena | Marta | Marek |
---|---|---|---|---|---|
Juan | 0 | 1 | 1 | 1 | 1 |
Hans | 1 | 0 | 0 | 1 | 1 |
Elena | 1 | 0 | 0 | 0 | 0 |
Marta | 1 | 1 | 0 | 0 | 0 |
Marek | 1 | 1 | 0 | 0 | 0 |
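As a minimal sketch of how the two representations relate, the following Python snippet rebuilds the matrix above from an undirected edge list. The function name `adjacency_matrix` is an illustrative choice, and the node order is simply the one used in the table.

```python
# Undirected edges of the small example network used in this section.
edges = [
    ("Juan", "Elena"), ("Juan", "Hans"), ("Juan", "Marta"),
    ("Juan", "Marek"), ("Hans", "Marta"), ("Hans", "Marek"),
]
# Row and column order, as in the table above.
nodes = ["Juan", "Hans", "Elena", "Marta", "Marek"]


def adjacency_matrix(nodes, edges):
    """Return a square 0/1 matrix (list of lists) for an undirected graph."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        i, j = index[source], index[target]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: the matrix is symmetric
    return matrix


matrix = adjacency_matrix(nodes, edges)
for name, row in zip(nodes, matrix):
    print(f"{name:<6}", *row)
```

For a directed network the line that mirrors the entry would be dropped, and the matrix would no longer be symmetric.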
CSV. Edgelist in CSV:
source,target,language,weight
Juan,Elena,esp,4
Juan,Hans,de,2
Juan,Marta,eng,1
Juan,Marek,de,1
Juan,Marek,esp,1
Juan,Marek,pol,5
Hans,Marta,eng,1
Hans,Marek,de,1
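The CSV above contains three rows for the pair Juan and Marek (multiple edges). A minimal Python sketch of one possible way to merge them, assuming that parallel edges are combined by summing their weights (other strategies exist), could look like this. The function name `merge_parallel_edges` is illustrative.

```python
import csv
import io
from collections import defaultdict

CSV_TEXT = """source,target,language,weight
Juan,Elena,esp,4
Juan,Hans,de,2
Juan,Marta,eng,1
Juan,Marek,de,1
Juan,Marek,esp,1
Juan,Marek,pol,5
Hans,Marta,eng,1
Hans,Marek,de,1
"""


def merge_parallel_edges(csv_text):
    """Sum the weights of rows that connect the same pair of nodes.

    Assumption: the network is undirected, so (a, b) and (b, a) are
    treated as the same edge by sorting the two names.
    """
    merged = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(sorted((row["source"], row["target"])))
        merged[key] += int(row["weight"])
    return dict(merged)


merged = merge_parallel_edges(CSV_TEXT)
for (a, b), weight in sorted(merged.items()):
    print(a, b, weight)
```

With this strategy the three Juan and Marek rows collapse into a single edge of weight 7, which is the weight that edge carries in the GEXF excerpt further down.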
CSV. Edgelist + nodes list in CSV (two tables: first the edges, then the nodes):
source,target
1,4
1,2
1,3
id,Label
1,Juan
2,Hans
3,Marta
4,Elena
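When edges refer to numeric ids, the node table is what turns them back into readable names. A small Python sketch of that lookup, using the two tables above (the function name `labelled_edges` is illustrative):

```python
import csv
import io

EDGES_CSV = """source,target
1,4
1,2
1,3
"""

NODES_CSV = """id,Label
1,Juan
2,Hans
3,Marta
4,Elena
"""


def labelled_edges(edges_csv, nodes_csv):
    """Replace the ids in the edge table with the labels of the node table."""
    labels = {
        row["id"]: row["Label"]
        for row in csv.DictReader(io.StringIO(nodes_csv))
    }
    return [
        (labels[row["source"]], labels[row["target"]])
        for row in csv.DictReader(io.StringIO(edges_csv))
    ]


pairs = labelled_edges(EDGES_CSV, NODES_CSV)
print(pairs)
```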
It is recommended to save structured data in CSV, although Gephi accepts tables in Excel.
GEXF (XML)
[...]
<node id="Marek" label="Marek">
<attvalues>
<attvalue for="att1" value="2.0"/>
</attvalues>
<viz:size value="4.0"/>
<viz:position x="-22.013721" y="26.080078"/>
<viz:color r="255" g="99" b="71"/>
</node>
</nodes>
<edges>
<edge id="0" source="Juan" target="Hans" weight="2.0"/>
<edge id="1" source="Juan" target="Elena" weight="4.0"/>
<edge id="2" source="Juan" target="Marta"/>
<edge id="3" source="Juan" target="Marek" weight="7.0"/>
<edge id="4" source="Hans" target="Marta"/>
<edge id="5" source="Hans" target="Marek"/>
</edges>
</graph>
</gexf>
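To see how such a file is put together, here is a minimal Python sketch that writes the nodes and edges of the excerpt with the standard library. It is a simplification under stated assumptions: it uses the GEXF 1.2 namespace, declares the graph as undirected and static, and leaves out attributes and `viz` elements (size, position, color). The function name `build_gexf` is illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: GEXF 1.2 namespace; other GEXF versions use a different URI.
GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(nodes, edges):
    """Return a minimal GEXF document as a string.

    edges is a list of (source, target, weight); weight may be None,
    in which case the attribute is omitted, as in the excerpt above.
    """
    ET.register_namespace("", GEXF_NS)

    def tag(name):
        return f"{{{GEXF_NS}}}{name}"

    gexf = ET.Element(tag("gexf"), {"version": "1.2"})
    graph = ET.SubElement(
        gexf, tag("graph"), {"defaultedgetype": "undirected", "mode": "static"}
    )
    nodes_el = ET.SubElement(graph, tag("nodes"))
    for name in nodes:
        ET.SubElement(nodes_el, tag("node"), {"id": name, "label": name})
    edges_el = ET.SubElement(graph, tag("edges"))
    for i, (source, target, weight) in enumerate(edges):
        attrs = {"id": str(i), "source": source, "target": target}
        if weight is not None:
            attrs["weight"] = str(weight)
        ET.SubElement(edges_el, tag("edge"), attrs)
    return ET.tostring(gexf, encoding="unicode")


nodes = ["Juan", "Hans", "Elena", "Marta", "Marek"]
edges = [
    ("Juan", "Hans", 2.0), ("Juan", "Elena", 4.0), ("Juan", "Marta", None),
    ("Juan", "Marek", 7.0), ("Hans", "Marta", None), ("Hans", "Marek", None),
]
gexf_text = build_gexf(nodes, edges)
print(gexf_text)
```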
- More file formats (supported by Gephi)
Same graph, different layout.
Bipartite network
- Common Gephi Algorithms: Force Atlas, Fruchterman Reingold,...
- Degree centrality: number of connections.
- Betweenness centrality: bridge nodes.
- Eigenvector centrality: nodes connected to well-connected nodes.
- Modularity (Louvain, Leiden algorithms): clusters of nodes.
- ...
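The first two metrics can be sketched in plain Python on the small example network used earlier (Juan, Hans, Elena, Marta, Marek). This is an illustration, not Gephi's implementation: degree is a neighbour count, and betweenness follows the usual shortest-path definition (computed here with Brandes' algorithm, unweighted and unnormalized), so the values Gephi reports may be scaled differently.

```python
from collections import defaultdict, deque

edges = [
    ("Juan", "Elena"), ("Juan", "Hans"), ("Juan", "Marta"),
    ("Juan", "Marek"), ("Hans", "Marta"), ("Hans", "Marek"),
]

# Adjacency sets for an undirected graph.
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
adj = dict(adj)

# Degree centrality: number of connections of each node.
degree = {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends.
    return {v: value / 2 for v, value in score.items()}


between = betweenness(adj)
for node in sorted(adj, key=degree.get, reverse=True):
    print(node, degree[node], between[node])
```

In this toy network Juan has both the highest degree and the highest betweenness, because every path to Elena passes through him.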
Workflow: from data to visualization.
- Programming languages (full workflow): R, Python, JavaScript,...
- OpenRefine, Table2net,...
- Tableau, Nodegoat,...
- Gephi, Cytoscape, VOSviewer, Graphext, orange,...
Development of Gephi has resumed in recent years. It can be downloaded from https://gephi.org or directly from the GitHub repository (gephi/releases).
One advantage of the new versions (since 0.9.3) is that they come bundled with Java (the programming language and runtime environment that programs such as Gephi need). More about the installation at https://gephi.org/users/install/.
New in 2023! Gephi Lite
Plugins are located in Tools > Plugins. They add extra functionality to Gephi (metrics, import, export, spatializations, ...).
- Multimode networks transformation: it projects a bipartite network into a simple one.
- Sigma exporter: it exports the graph to visualize it dynamically using JavaScript and HTML.
- Leiden algorithm: modularity algorithm.
CSV and GEXF files are located in the folder /data in this repository.
Co-appearance character networks in theater. The source of the data is https://dracor.org, from where they can be downloaded; they are also included in /data as a backup copy.
calderon_VidaEsSueno_ezlinavis.csv
span000014-valle-luces.gexf
35 literary awards and 1325 award-winning authors: data obtained from Wikidata. CSV table with 3 variables: prizes, winners and gender (masc./fem.); bipartite network and simple networks in GEXF format.
authors_and_awards.csv
authors_and_awards.gexf
authors.gexf
awards.gexf
Dataset (+ node and edge lists) is available in editio/premios-literarios and Zenodo: José Luis Losada (2022)
Stylometry network of 17th-century Spanish plays. The nodes represent plays linked according to their stylistic similarity. Analysis performed using the consensus tree (2000-5000 MFW) and Delta distance with the R package stylo (Eder, Rybicki and Kestemont, 2016), on a corpus of circa 700 plays and 50 authors. Interactive visualization in: Stylometry on Drama
stylometry_theater.gexf
Co-authoring network of 3500 publications on Stylometry. The bibliography has been compiled by Christof Schöch, Bibliography on Stylometry, 2017, DOI: 10.5281/zenodo.835190.
biblio_stylo.gexf
Correspondence network of Alexander von Humboldt (sample of 105 letters). Data obtained from edition humboldt digital (CC BY-SA 4.0). Sender, receiver, and date sent were extracted from letters encoded in TEI.
humboldt_edgelist.csv
humboldt_network.gexf
☞ Practice the basics of an edgelist, how to load it into Gephi and perform the first steps of visualization and metrics.
- Dracor > tools > https://ezlinavis.dracor.org > Examples > Calderón de la Barca > download edge list.
- Gephi > File > Import spreadsheet (CSV) > next > finish.
- Layout: Fruchterman Reingold.
- Nodes size based on degree: Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 10 - max. 50].
- Nodes labels: "copy data to other column" (Data laboratory). Alternative: "select attributes to display as labels" (Overview).
- Centrality measures (Betweenness/Eigenvector): Segismundo vs Clarín (statistics > Network Diameter; Eigenvector Centrality).
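The size ranking in the steps above maps each node's degree into the chosen range (min. 10, max. 50). A minimal sketch of that mapping, assuming a plain linear interpolation (Gephi also offers other interpolation curves) and using hypothetical node names and degrees:

```python
def rank_sizes(degrees, min_size=10.0, max_size=50.0):
    """Map degrees linearly onto the interval [min_size, max_size]."""
    low, high = min(degrees.values()), max(degrees.values())
    if high == low:
        # All nodes have the same degree: give them all the minimum size.
        return {node: min_size for node in degrees}
    scale = (max_size - min_size) / (high - low)
    return {node: min_size + (d - low) * scale for node, d in degrees.items()}


# Hypothetical degrees, for illustration only.
degrees = {"node_a": 1, "node_b": 3, "node_c": 5}
sizes = rank_sizes(degrees)
print(sizes)
```

The node with the lowest degree gets the minimum size, the one with the highest degree gets the maximum, and the rest fall proportionally in between.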
☞ Get familiar with the GEXF file format, open it in Gephi, and explore a node attribute (male/female).
- Dracor > corpora > Spanish Drama Corpus > Valle Inclán, Luces de bohemia > Downloads > GEXF file.
- Gephi > open > [no changes] > ok.
- Data exploration: label, gender (Data laboratory).
- Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > gender
- Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 40] > run|stop.
☞ Transform structured data (CSV) into a network (GEXF)
/data > authors_and_awards.csv
- table2net (transformation in the browser).
- Load table > Type of Network > Nodes > Build the network > Download.
- 3.1 Network type: bipartite.
- 3.2 Nodes 1: authors | attribute: masc/fem.
- 3.3 Nodes 2: awards.
☞ Explore bipartite networks.
- Gephi > open authors_and_awards.gexf.
- Layout: Force Atlas 2 > run|stop; > Prevent overlap > run|stop; Zoom
- Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Type
- Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 10 - max. 50] (number of authors by award).
- Nodes Labels: Show node Labels; More settings > Labels > Hide non-selected.
- [reset colors] > Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > gender.
☞ Explore simple networks
Files are available in /data/awards.gexf and /data/authors.gexf. They can also be created from the structured data (CSV) with table2net or by transforming the bipartite network (☞ see below).
- Gephi > open awards.gexf
- Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 50]
- Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 5 - max. 30].
- Modularity: Community detection > Modularity > run.
- Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Modularity Class.
- Check centrality metrics.
- Gephi > open authors.gexf
- Layout: Fruchterman Reingold.
- Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > sexlabel.
- Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 5 - max. 30].
☞ Switching from one type of network to another (projection).
- Plugin: Multimode networks transformation.
- Bipartite Network.
- Load attributes > type:
- Award > Author / Author > Award (Simple network of awards)
- Author > Award / Award > Author (Simple network of authors)
- Remove nodes, edges.
- Run.
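What the projection in the steps above does can be sketched in a few lines of Python: two awards become linked when they share at least one author, and two authors become linked when they share at least one award. This is an illustration of the idea, not the plugin's own code. The author and award names are hypothetical, and the edge weight (number of shared neighbours) is an assumption made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bipartite edge list (author, award), for illustration only.
pairs = [
    ("Author A", "Award X"), ("Author A", "Award Y"), ("Author A", "Award Z"),
    ("Author B", "Award Y"), ("Author B", "Award Z"),
    ("Author C", "Award X"),
]


def project(pairs, keep="award"):
    """Project a bipartite (author, award) network onto one node type.

    keep="award": awards are linked through shared authors.
    keep="author": authors are linked through shared awards.
    The weight of an edge is the number of shared neighbours.
    """
    groups = defaultdict(set)
    for author, award in pairs:
        if keep == "award":
            groups[author].add(award)
        else:
            groups[award].add(author)
    weights = defaultdict(int)
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


award_network = project(pairs, keep="award")
author_network = project(pairs, keep="author")
print(award_network)
print(author_network)
```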
☞ Explore textual networks
- Gephi > open stylometry_theater.gexf.
- Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 200].
- Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Classes (authors) > Palette > Generate [Limit number of colors: unchecked] > generate.
- Appearance > nodes > size [icon circles ] > Unique > size = 20.
- Nodes Labels: Show node Labels; More settings > Labels > Hide non-selected.
Compare with modularity algorithms:
- Modularity: Community detection > Modularity > run.
- Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Modularity Class.
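To compare a given grouping (for example, plays grouped by author) with the communities found by an algorithm, it can help to know what the modularity score measures: the share of edges that fall inside groups, minus the share expected if edges were placed at random with the same degrees. The sketch below only scores a partition that is already given; it does not search for one, which is what Louvain or Leiden do. It assumes an unweighted, undirected graph and a resolution of 1, and the toy graph is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    community maps each node to a group label.
    Q = sum over groups of (internal edges / m) - (degree sum / 2m) ** 2
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the group
    degree_sum = defaultdict(int)  # sum of the degrees of the group's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_triangles = modularity(edges, by_triangle)
q_single = modularity(edges, one_group)
print(q_triangles, q_single)
```

Putting every node in a single group always scores 0, so a clearly positive value indicates that the grouping captures more internal links than chance would.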
☞ Explore disconnected networks
- Gephi > open biblio_stylo.gexf.
- Layout: Fruchterman Reingold (compare with Force Atlas 2).
- Compare with modularity algorithms.
☞ Explore directed networks, Gephi's limits with multiple edges, filters and timelines.
- Gephi > File > Import spreadsheet (CSV) > next > Time representation [Intervals] > Finish > Edges merge strategy [Don't merge]
- Layout: Fruchterman Reingold
- Nodes labels: "copy data to other column" (Data laboratory) to allow for searching (cmd/ctrl F); (Overview): labels "Hide non-selected"; (Overview): edges "Selection color checked" (in-out).
- (Data laboratory) multiple edges? Humboldt -> Ehrenberg
- Gephi > File > Import spreadsheet (CSV) [...] Finish > Edges merge strategy [merge] > New workspace.
- Filters (see Using filters in Gephi)
- Filters > Edges > Mutual Edges > Filter
- Filters > Topology > In Degree | Out Degree > Filter
- Timeline
- Use the network with multiple edges (be aware of the limitations also for the timeline)
- (Data laboratory) Merge columns > date_sent > columns to merge > merge strategy > Create time interval > Parse dates
- Enable timeline > Set time format (bottom left) [date format] > Set play settings (bottom left) [one bound].
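The ideas behind the steps above (multiple edges, merged edges, mutual edges, in and out degree) can be sketched on a directed edge list in plain Python. The sender and receiver pairs below are hypothetical and are not taken from humboldt_edgelist.csv. Degrees are counted on distinct edges, that is, after merging.

```python
from collections import Counter

# Hypothetical (sender, receiver) pairs, one per letter.
letters = [
    ("Humboldt", "Ehrenberg"),
    ("Humboldt", "Ehrenberg"),
    ("Ehrenberg", "Humboldt"),
    ("Humboldt", "Person A"),
    ("Person B", "Humboldt"),
]

# Multiple edges: how many letters share the same sender and receiver.
edge_counts = Counter(letters)

# In and out degree, counted on distinct directed edges.
out_degree = Counter(sender for sender, _ in edge_counts)
in_degree = Counter(receiver for _, receiver in edge_counts)

# Mutual edges: pairs that wrote to each other in both directions.
mutual = {
    tuple(sorted(edge))
    for edge in edge_counts
    if (edge[1], edge[0]) in edge_counts
}

print(edge_counts[("Humboldt", "Ehrenberg")])
print(dict(out_degree), dict(in_degree))
print(mutual)
```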
☞ Static and dynamic forms of graph representation outside Gephi
- Panel Overview: Screenshot (left), More settings (right)...
- Panel Preview: export SVG, PNG, PDF.
- Plugin: Sigma Exporter. It creates a folder with the required libraries, data and files to display the graph interactively in a browser. It is necessary to upload it to a web server, for example, using GitHub Pages. For testing purposes, it is possible to launch a local server: Instructions.
- Retina (Web app, beta): Visualization in the browser (offline / online) from a GEXF file.
- Cosmograph: Visualization in the browser from a .csv file, also timelines.
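For the local test server mentioned for the Sigma Exporter, one option is Python's built-in HTTP server. The sketch below is an assumption about how such a test setup could look, not part of the plugin. The folder name `network` and the port are hypothetical and should be replaced with the exported folder and any free port.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=8000):
    """Create a local HTTP server that serves the files in `directory`."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to localhost only, since this is meant for testing.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (runs until interrupted with Ctrl+C):
# server = make_server("network", 8000)  # "network": hypothetical export folder
# server.serve_forever()
# Then open http://127.0.0.1:8000 in a browser.
```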
- Albert-László Barabási, Network Science, 2016.
- Mathieu Bastian, Sebastien Heymann, Mathieu Jacomy, “Gephi: An Open Source Software for Exploring and Manipulating Networks”, International AAAI Conference on Weblogs and Social Media, 2009, pp. 361-362.
- Gephi, Learn how to use Gephi.
- Martin Grandjean, Gephi: Introduction to Network Analysis and Visualization, 14/10/2015.
- Mathieu Jacomy, “A standard for presenting network visualizations”, Reticular, 01/03/2019, https://reticular.hypotheses.org/834.
- Mathieu Jacomy, Tommaso Venturini, Liliana Bounegru, and Jonathan Gray (2017). “How to Tell Stories with Networks: Exploring the Narrative Affordances of Graphs with the Iliad”. In The Datafied Society, edited by Mirko Tobias Schäfer and Karin van Es, 155–170. Amsterdam. https://doi.org/10.1515/9789048531011-014
- Clément Levallois, Gephi tutorials, Last update: 2022.
- Mark Newman, Networks: An Introduction, Oxford University Press, 2010.
- Katherine Ognyanova, Static and dynamic network visualization with R, 2021.
- Katharina A. Zweig, Network Analysis Literacy: A Practical Approach to the Analysis of Networks, Springer, 2016.