Tool to analyze and visualize dependencies between cells in Excel spreadsheets in order to get an understanding of the complexity.
Will generate a graph of the dependencies between cells in an Excel spreadsheet. Data extracted with openpyxl
(https://foss.heptapod.net/openpyxl/openpyxl), the graph is generated with the networkx
library (https://networkx.org/) and is visualized using matplotlib
.
Single-cell references in a formula sitting in cell A3
like =A1+A2
is considered a dependency between the node A3
and the nodes A2
and A1
.
graph TD
A3 --> A1
A3 --> A2
A3["A3=A1+A2"]
A range defined in a formula like =SUM(B1:B3)
is kept as a single node in the graph, but all the containing cells are expanded as dependencies of the range node.
So when a cell, C1
contains =SUM(B1:B3)
the graph will look like this:
graph TD
R -->B1
R -->B2
R -->B3
R["B1:B3"]
C1 --> R
C1["C1=SUM(B1:B3)"]
PyPi project: graphedexcel
pip install graphedexcel
python -m venv venv
source venv/bin/activate
pip install -e .
python -m graphedexcel <path_to_excel_file>
usage: graphedexcel [-h] [--as-directed-graph] [--no-visualize]
[--layout {spring,circular,kamada_kawai,shell,spectral}]
[--config CONFIG] [--output-path OUTPUT_PATH]
[--open-image]
path_to_excel
Process an Excel file to build and visualize dependency graphs.
positional arguments:
path_to_excel Path to the Excel file to process.
options:
-h, --help show this help message and exit
--as-directed-graph, -d
Treat the dependency graph as directed.
--no-visualize, -n Skip the visualization of the dependency
graph.
--layout, -l {spring,circular,kamada_kawai,shell,spectral}
Layout algorithm for graph visualization
(default: spring).
--config, -c CONFIG Path to the configuration file for
visualization. See README for details.
--output-path, -o OUTPUT_PATH
Specify the output path for the generated
graph image.
--open-image Open the generated image after visualization.
The following is the output of running the script on the sample docs/Book1.xlsx
file.
=== Dependency Graph Summary ===
Cell/Node count 70
Dependency count 100
=== Most connected nodes ===
Range Madness!A2:A11 22
Range Madness!B2:B11 11
Range Madness!F1 10
Main Sheet!B5 4
Main Sheet!B22 4
Detached !A2:A4 4
Range Madness!B2 4
Range Madness!B3 4
Range Madness!B4 4
Range Madness!B5 4
=== Most used functions ===
SUM 4
POWER 1
Visualizing the graph of dependencies.
This might take a while...
Graph visualization saved to images/.\Book1.xlsx.png
More in docs/images
folder.
You can customize the graph visualization settings by passing a path to a JSON configuration file. This allows you to override the default settings with your own preferences.
Look at https://networkx.org/documentation/stable/reference/generated/networkx.drawing.nx_pylab.draw_networkx.html for the available settings.
The default settings for the graph visualization in the various sizes (from graph_visualizer.py
):
# Default settings for the graph visualization
base_graph_settings = {
"node_size": 50, # the size of the node
"width": 0.2, # the width of the edge between nodes
"edge_color": "black", # the color of the edge between nodes
"linewidths": 0, # the stroke width of the node border
"with_labels": False, # whether to show the node labels
"font_size": 10, # the size of the node labels
"cmap": "tab20b", # the color map to use for coloring nodes
"fig_size": (10, 10), # the size of the figure
}
# Sized-based settings for small, medium, and large graphs
small_graph_settings = {
"with_labels": False,
"alpha": 0.8}
medium_graph_settings = {
"node_size": 30,
"with_labels": False,
"alpha": 0.4,
"fig_size": (20, 20),
}
large_graph_settings = {
"node_size": 20,
"with_labels": False,
"alpha": 0.2,
"fig_size": (25, 25),
}
To override these settings, create a JSON file (e.g., graph_settings.json) with the desired settings. Here is an example of a JSON configuration file:
{
"node_size": 40,
"edge_color": "blue",
"with_labels": true,
"font_size": 12,
"alpha": 0.6
}
To use the custom configuration, pass the path to the JSON file as an argument to the script:
python -m graphedexcel myexcel.xlsx --config graph_settings.json
This will render the graph using the custom settings defined in the JSON file.
Just run pytest in the root folder.
pytest