RISE-UNIBAS/networks_gephi

Network analysis in the Humanities. Gephi

Introduction course to Network analysis and visualization with Gephi.

By José Luis Losada

☞ Course outline

Network analysis in the Humanities

Showcase

Networks

network | nodes | edges
Theater plays | characters | co-appearance on the scene
Stylometry | plays | stylistic similarity
Scientific collaboration | authors | co-authoring
... | ... | ...
  • A method of representing patterns of connection or interaction between the parts of a system.

  • The concept of a network presupposes a relational structure that can be studied (1) in a logical and mathematical way, through graph theory as a discipline (history: Euler and the seven bridges of Königsberg),

  • (2) through exploration by visualization.

“Networks are extraordinary calculating devices, but they are also maps, instruments of navigation and representation” (Jacomy 2017: 155).

Basic concepts. Nodes and edges

  • Network: points joined by lines.
  • Points: nodes or vertices.
  • Lines: edges or links.
  • Attributes: extra information about nodes or edges.
  • Types of networks:

Simple Network

Bipartite Network

Multiple Network

Multiple and Directed Network

Formalization and file formats

Formalization

Edgelists, matrices, adjacency lists

Edgelist: a set of structured data with at least two columns: one for the nodes where a connection starts (source) and one for the nodes where it ends (target). Any remaining columns hold attributes of the connection.

source | target | weight | lang | type
Juan   | Elena  | 4      | esp  | undirected
Juan   | Hans   | 2      | de   | undirected
Juan   | Marta  | 1      | eng  | undirected
Juan   | Marek  | 1      | de   | undirected
...    | ...    | ...    | ...  | ...

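Adjacency matrix: a square matrix (equal number of rows and columns) with one row and one column per node. A cell holds 1 if the row node and the column node are connected and 0 if they are not.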

      | Juan | Hans | Elena | Marta | Marek
Juan  | 0    | 1    | 1     | 1     | 1
Hans  | 1    | 0    | 0     | 1     | 1
Elena | 1    | 0    | 0     | 0     | 0
Marta | 1    | 1    | 0     | 0     | 0
Marek | 1    | 1    | 0     | 0     | 0
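
The relation between the two formalizations can be sketched in a few lines of Python. This is a minimal illustration, not part of the course materials: the edge list below contains the six undirected connections implied by the matrix above, and the function name `adjacency_matrix` is an assumption made for this example.

```python
# Undirected edges implied by the adjacency matrix above (weights ignored).
edges = [
    ("Juan", "Elena"), ("Juan", "Hans"), ("Juan", "Marta"),
    ("Juan", "Marek"), ("Hans", "Marta"), ("Hans", "Marek"),
]
nodes = ["Juan", "Hans", "Elena", "Marta", "Marek"]


def adjacency_matrix(nodes, edges):
    """Return a square 0/1 matrix; rows and columns follow the order of `nodes`."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        i, j = index[source], index[target]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the entry across the diagonal
    return matrix


matrix = adjacency_matrix(nodes, edges)
for name, row in zip(nodes, matrix):
    print(f"{name:<6}", *row)
```

For a directed network the mirrored assignment would be dropped, so the matrix would no longer need to be symmetric.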

File Formats

  • CSV. Edgelist in CSV:
source,target,language,weight
Juan,Elena,esp,4
Juan,Hans,de,2
Juan,Marta,eng,1
Juan,Marek,de,1
Juan,Marek,esp,1
Juan,Marek,pol,5
Hans,Marta,eng,1
Hans,Marek,de,1
  • CSV. Edgelist + Nodes in CSV:
source,target
1,4
1,2
1,3

id,Label
1,Juan
2,Hans
3,Marta
4,Elena

Saving structured data as CSV is recommended, although Gephi also accepts Excel tables.
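
The first CSV edge list above contains three rows for the pair Juan and Marek. A minimal sketch of how such parallel rows could be merged, assuming that weights are simply summed (one possible merge strategy, chosen here for illustration); the function name `merged_weights` is hypothetical:

```python
import csv
import io
from collections import defaultdict

# The CSV edge list from this section, embedded as a string for the example.
CSV_TEXT = """source,target,language,weight
Juan,Elena,esp,4
Juan,Hans,de,2
Juan,Marta,eng,1
Juan,Marek,de,1
Juan,Marek,esp,1
Juan,Marek,pol,5
Hans,Marta,eng,1
Hans,Marek,de,1
"""


def merged_weights(csv_text):
    """Sum the weights of all rows that link the same (source, target) pair."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        pair = (row["source"], row["target"])
        totals[pair] += float(row["weight"])
    return dict(totals)


weights = merged_weights(CSV_TEXT)
print(weights[("Juan", "Marek")])  # 1 + 1 + 5 = 7.0
```

Note that the sketch treats (source, target) as an ordered pair; for an undirected network where the same pair may appear in both orders, the two names would need to be sorted before being used as a key.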

  • gexf (XML)
[...]
      <node id="Marek" label="Marek">
        <attvalues>
          <attvalue for="att1" value="2.0"/>
        </attvalues>
        <viz:size value="4.0"/>
        <viz:position x="-22.013721" y="26.080078"/>
        <viz:color r="255" g="99" b="71"/>
      </node>
    </nodes>
    <edges>
      <edge id="0" source="Juan" target="Hans" weight="2.0"/>
      <edge id="1" source="Juan" target="Elena" weight="4.0"/>
      <edge id="2" source="Juan" target="Marta"/>
      <edge id="3" source="Juan" target="Marek" weight="7.0"/>
      <edge id="4" source="Hans" target="Marta"/>
      <edge id="5" source="Hans" target="Marek"/>
    </edges>
  </graph>
</gexf>
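
A GEXF file of this shape can also be written programmatically. The following is a rough sketch using only the Python standard library. It assumes the GEXF 1.2 draft namespace (check the version your tools expect), omits the `attvalues` and `viz:` elements seen in the excerpt, and uses a hypothetical helper name, `build_gexf`.

```python
import xml.etree.ElementTree as ET

# Assumption: the GEXF 1.2 draft namespace; other GEXF versions use other URIs.
GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(nodes, edges):
    """Return a GEXF document (as a string) for an undirected, weighted graph."""
    ET.register_namespace("", GEXF_NS)  # serialize GEXF as the default namespace
    root = ET.Element(f"{{{GEXF_NS}}}gexf", {"version": "1.2"})
    graph = ET.SubElement(
        root, f"{{{GEXF_NS}}}graph", {"defaultedgetype": "undirected"}
    )
    nodes_el = ET.SubElement(graph, f"{{{GEXF_NS}}}nodes")
    for name in nodes:
        ET.SubElement(nodes_el, f"{{{GEXF_NS}}}node", {"id": name, "label": name})
    edges_el = ET.SubElement(graph, f"{{{GEXF_NS}}}edges")
    for i, (source, target, weight) in enumerate(edges):
        ET.SubElement(
            edges_el,
            f"{{{GEXF_NS}}}edge",
            {"id": str(i), "source": source, "target": target, "weight": str(weight)},
        )
    return ET.tostring(root, encoding="unicode")


document = build_gexf(
    ["Juan", "Hans", "Elena"],
    [("Juan", "Hans", 2.0), ("Juan", "Elena", 4.0)],
)
print(document)
```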

Visualization (spatialization)

Same graph, different layout.

Bipartite network

Algorithms for drawing the graph

  • Common Gephi Algorithms: Force Atlas, Fruchterman Reingold,...

Metrics

  • Degree centrality: number of connections.
  • Betweenness centrality: bridge nodes.
  • Eigenvector centrality: nodes connected to well-connected nodes.
  • Modularity (Louvain, Leiden algorithms): clusters of nodes.
  • ...
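
As an illustration of the simplest of these metrics, the sketch below computes the degree, a normalised degree centrality and the degree distribution for the small example network from the formalization section. Gephi computes these values itself; the code only shows what the numbers mean, and the function name `degrees` is an assumption.

```python
from collections import Counter

# The small undirected example network used earlier in this outline.
edges = [
    ("Juan", "Elena"), ("Juan", "Hans"), ("Juan", "Marta"),
    ("Juan", "Marek"), ("Hans", "Marta"), ("Hans", "Marek"),
]


def degrees(edges):
    """Count how many edges touch each node (undirected, no self-loops assumed)."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts


degree = degrees(edges)

# Normalised degree centrality: degree divided by the maximum possible (n - 1).
n = len(degree)
centrality = {node: d / (n - 1) for node, d in degree.items()}

# Degree distribution: how many nodes have each degree value.
distribution = Counter(degree.values())

print(degree.most_common())
print(sorted(distribution.items()))
```

Juan is connected to every other node, so his normalised degree centrality is 1.0.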

[Figure: degree distribution]

Tools

Workflow: from data to visualization.


  • Programming languages (full workflow): R, Python, JavaScript,...
  • OpenRefine, Table2net,...
  • Tableau, Nodegoat,...
  • Gephi, Cytoscape, VOSviewer, Graphext, orange,...

Gephi. Open Graph Viz Platform

Development of Gephi has resumed in recent years. It can be downloaded from https://gephi.org or directly from the GitHub repository, gephi/releases.

One advantage of the new versions (since 0.9.3) is that they come bundled with Java (the programming language and runtime environment that programs such as Gephi need). More about the installation at https://gephi.org/users/install/.

New in 2023! Gephi Lite

Interface: Panel Overview

Plugins for Gephi:

They are found under Tools > Plugins and add extra functionality to Gephi (metrics, import, export, spatializations, ...).

  • Multimode networks transformation: projects a bipartite network into a simple one.

  • Sigma exporter: exports the graph so that it can be displayed dynamically with JavaScript and HTML.

  • Leiden algorithm: a modularity algorithm.

Data for this course

The CSV and GEXF files are located in the /data folder of this repository.

Theater

Character co-appearance networks in theater. The data come from https://dracor.org, where they can be downloaded; the copies in /data are only a backup.

  • calderon_VidaEsSueno_ezlinavis.csv
  • span000014-valle-luces.gexf

Literary awards

35 literary awards and 1325 award-winning authors: data obtained from Wikidata. CSV table with 3 variables: prizes, winners and gender (masc./fem.); bipartite network and simple networks in GEXF format.

  • authors_and_awards.csv
  • authors_and_awards.gexf
  • authors.gexf
  • awards.gexf

The dataset (+ node and edge lists) is available in editio/premios-literarios and on Zenodo: José Luis Losada (2022) DOI

Stylometry

Stylometric network of 17th-century Spanish plays. The nodes represent plays, linked according to their stylistic similarity. The analysis was performed with a consensus tree (2000-5000 MFW) and Delta distance, using the R package stylo (Eder, Rybicki and Kestemont, 2016), on a corpus of circa 700 plays and 50 authors. Interactive visualization in: Stylometry on Drama

  • stylometry_theater.gexf

Bibliography

Co-authoring network of 3500 publications on Stylometry. The bibliography has been compiled by Christof Schöch, Bibliography on Stylometry, 2017, DOI: 10.5281/zenodo.835190.

  • biblio_stylo.gexf

Correspondence

Correspondence network of Alexander von Humboldt (sample of 105 letters). Data obtained from edition humboldt digital (CC BY-SA 4.0). Sender, receiver, and date sent were extracted from letters encoded in TEI.

  • humboldt_edgelist.csv
  • humboldt_network.gexf

Step-by-step instructions

Character networks

☞ Practice the basics of an edgelist, how to load it into Gephi and perform the first steps of visualization and metrics.

  1. DraCor > tools > https://ezlinavis.dracor.org > Examples > Calderón de la Barca > download edge list.
  2. Gephi > File > Import spreadsheet (CSV) > next > finish.
  • Layout: Fruchterman Reingold.
  • Node size based on degree: Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 10 - max. 50].
  • Node labels: "copy data to other column" (Data laboratory). Alternative: "select attributes to display as labels" (Overview).
  • Centrality measures (Betweenness/Eigenvector): compare Segismundo and Clarín (Statistics > Network Diameter; Eigenvector Centrality).

☞ Familiarize yourself with the GEXF file format: open a file in Gephi and explore a node attribute (male/female).

  1. DraCor > corpora > Spanish Drama Corpus > Valle Inclán, Luces de bohemia > Downloads > GEXF file.
  2. Gephi > open > [no changes] > ok.
  • Data exploration: label, gender (Data laboratory).
  • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > gender.
  • Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 40] > run|stop.

From the data to the network: awards and winners

☞ Transform structured data (CSV) into a network file (GEXF)

  1. /data > authors_and_awards.csv
  2. table2net (transformation in the browser).
  3. Load table > Type of Network > Nodes > Build the network > Download.
  • 3.1 Network type: bipartite.
  • 3.2 Nodes 1: authors | attribute: masc/fem.
  • 3.3 Nodes 2: awards.
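
The idea behind this transformation can be sketched in Python: every row of the table becomes one edge between a node of the first type and a node of the second type. The column names (`author`, `award`, `gender`) and the three rows below are hypothetical placeholders; the real authors_and_awards.csv may use different headers. This is not how table2net is implemented, only an illustration of the result it produces.

```python
import csv
import io

# Hypothetical rows and column names; the real authors_and_awards.csv may differ.
TABLE = """author,award,gender
Author A,Award X,fem
Author A,Award Y,fem
Author B,Award X,masc
"""


def bipartite_from_table(csv_text, col_a="author", col_b="award", attr="gender"):
    """Return (nodes, edges): each node carries a 'type', each row becomes an edge."""
    nodes, edges = {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Nodes of the first type keep the extra attribute (here: gender).
        nodes[row[col_a]] = {"type": col_a, attr: row[attr]}
        # Nodes of the second type only need their type.
        nodes.setdefault(row[col_b], {"type": col_b})
        edges.append((row[col_a], row[col_b]))
    return nodes, edges


nodes, edges = bipartite_from_table(TABLE)
for name, data in nodes.items():
    print(name, data)
print(edges)
```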

Awards and winners network (1)

☞ Explore bipartite networks.

  1. Gephi > open authors_and_awards.gexf.
  • Layout: Force Atlas 2 > run|stop; > Prevent overlap > run|stop; Zoom
  • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Type
  • Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 10 - max. 50] (number of authors by award).
  • Nodes Labels: Show node Labels; More settings > Labels > Hide non-selected.
  • [reset colors] > Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > gender.

Awards and winners network (2)

☞ Explore simple networks

The files are available in /data/awards.gexf and /data/authors.gexf. They can also be created from the structured data (CSV) with table2net, or by transforming the bipartite network (☞ see below).

  1. Gephi > open awards.gexf
  • Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 50]

  • Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 5 - max. 30].

  • Modularity: Community detection > Modularity > run.

  • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Modularity Class.

  • Check centrality metrics:

    • Statistics > Eigenvector Centrality.
    • Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Eigenvector Centrality.
  2. Gephi > open authors.gexf
  • Layout: Fruchterman Reingold.
  • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > sexlabel.
  • Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 5 - max. 30].

☞ Switching from one type of network to another (projection).

  1. Plugin: Multimode networks transformation.
  • Bipartite Network.
  • Load attributes > type:
    • Award > Author / Author > Award (Simple network of awards)
    • Author > Award / Award > Author (Simple network of authors)
  • Remove nodes, edges.
  • Run.
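
What a projection does can be shown with a short sketch: two nodes of the kept type are linked whenever they share a neighbour of the other type, and the edge weight counts the shared neighbours. This is one common definition of a projection and not necessarily what the plugin does internally. The author and award names are placeholders, not course data, and `project` is a hypothetical function name.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder author-award pairs (not taken from the course data).
pairs = [
    ("Author A", "Award X"), ("Author A", "Award Y"),
    ("Author B", "Award X"), ("Author C", "Award Y"),
    ("Author C", "Award X"),
]


def project(pairs, keep="award"):
    """Project a bipartite edge list onto one node type.

    Two kept nodes are linked once for every neighbour they share.
    """
    neighbours = defaultdict(set)
    for author, award in pairs:
        if keep == "award":
            neighbours[author].add(award)   # awards grouped by shared author
        else:
            neighbours[award].add(author)   # authors grouped by shared award
    weights = defaultdict(int)
    for group in neighbours.values():
        for a, b in combinations(sorted(group), 2):
            weights[(a, b)] += 1
    return dict(weights)


awards_network = project(pairs, keep="award")
authors_network = project(pairs, keep="author")
print(awards_network)
print(authors_network)
```

In the placeholder data, Award X and Award Y share two winners (Author A and Author C), so the simple network of awards has a single edge of weight 2.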

Stylometry

☞ Explore textual networks

  1. Gephi > open stylometry_theater.gexf.
  • Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 200].
  • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Classes (authors) > Palette > Generate [Limit number of colors: unchecked] > generate.
  • Appearance > nodes > size [icon circles ] > Unique > size = 20.
  • Nodes Labels: Show node Labels; More settings > Labels > Hide non-selected.

Compare with modularity algorithms:

  • Modularity: Community detection > Modularity > run.
  • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Modularity Class.
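
When comparing a known grouping (such as authorship) with the classes found by a modularity algorithm, it can help to see what the modularity score itself measures. The sketch below scores a given partition of an undirected, unweighted graph with the standard formula Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees in c. It does not search for the best partition, as Louvain or Leiden do; the toy graph and the function name `modularity` are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a given partition (undirected, unweighted graph)."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge (c-d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the toy graph into its two triangles gives a positive score (5/14), while putting every node in a single community gives 0.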

Bibliography

☞ Explore disconnected networks

  1. Gephi > open biblio_stylo.gexf.
  • Layout: Fruchterman Reingold (compare with Force Atlas 2).
  • Compare with modularity algorithms.

Correspondence

☞ Explore directed networks, Gephi's limits with multiple edges, filters and timelines.

  1. Gephi > File > Import spreadsheet (CSV) > next > Time representation [Intervals] > Finish > Edges merge strategy [Don't merge]
  • Layout: Fruchterman Reingold
  • Node labels: "copy data to other column" (Data laboratory) to allow searching (cmd/ctrl F); (Overview): labels "Hide non-selected"; (Overview): edges "Selection color checked" (in-out).
  2. (Data laboratory) Are there multiple edges? Check Humboldt -> Ehrenberg.

  3. Gephi > File > Import spreadsheet (CSV) [...] Finish > Edges merge strategy [merge] > New workspace.

  4. Filters (see Using filters in Gephi)

  • Filters > Edges > Mutual Edges > Filter
  • Filters > Topology > In Degree | Out Degree > Filter
  5. Timeline
  • Use the network with multiple edges (be aware that its limitations also affect the timeline)

  • (Data laboratory) Merge columns > date_sent > columns to merge > merge strategy > Create time interval > Parse dates

  • Enable timeline > Set time format (bottom left) [date format] > Set play settings (bottom left) [one bound].
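
The notions used in this exercise (merged versus multiple edges, mutual edges, in-degree and out-degree) can be illustrated outside Gephi with a small sketch. The sender-receiver pairs below are invented for the example and are not taken from humboldt_edgelist.csv; the function names are hypothetical.

```python
from collections import Counter

# Invented (sender, receiver) pairs; not taken from humboldt_edgelist.csv.
letters = [
    ("Humboldt", "Ehrenberg"), ("Humboldt", "Ehrenberg"),
    ("Ehrenberg", "Humboldt"), ("Humboldt", "Person C"),
]


def merge_edges(letters):
    """Collapse repeated sender-receiver pairs into one weighted edge each."""
    return Counter(letters)


def mutual_edges(merged):
    """Keep only the edges whose reverse direction also exists."""
    return {edge for edge in merged if (edge[1], edge[0]) in merged}


merged = merge_edges(letters)
mutual = mutual_edges(merged)

# Degrees on the merged network: each distinct edge counts once.
out_degree = Counter(sender for sender, _ in merged)
in_degree = Counter(receiver for _, receiver in merged)

print(merged)
print(mutual)
print(out_degree, in_degree)
```

Merging keeps the number of letters as an edge weight but loses the individual dates, which is why the timeline step above works on the unmerged network.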

Out of Gephi: Publication possibilities

☞ Static and dynamic forms of graph representation outside Gephi

  1. Panel Overview: Screenshot (left), More settings (right)...
  2. Panel Preview: export SVG, PNG, PDF.
  3. Plugin: Sigma Exporter. It creates a folder with the libraries, data and files required to display the graph interactively in a browser. The folder must be uploaded to a web server, for example using GitHub Pages. For testing purposes, it is possible to launch a local server: Instructions.
  4. Retina (Web app, beta): Visualization in the browser (offline / online) from a GEXF file.
  5. Cosmograph: Visualization in the browser from a .csv file, also timelines.
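
For the local test server mentioned in step 3, one option (among others) is Python's built-in `http.server` module. The sketch below only builds the server object; the helper name `make_server` and the folder name `network` in the usage note are assumptions, not names produced by the exporter.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Create (but do not start) an HTTP server that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, since the server is meant for local testing.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("network").serve_forever()` would then serve the exported folder (here assumed to be called `network`) at http://127.0.0.1:8000 until the process is stopped with Ctrl+C.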

Tutorials, manuals, references
