Installation | Syntax | Citation guidelines | Examples | Feedback | Change log
(22 Sep 2024)
This package allows users to draw Sankey plots in Stata. It is based on the Sankey Guide published on the Stata Guide on Medium in October 2021.
The package can be installed via SSC or GitHub. The GitHub version might be more recent due to bug fixes and feature updates, and may contain syntax improvements and changes in default values. See version numbers below. The GitHub version is eventually published on SSC.
SSC (v1.81):
ssc install sankey, replace
GitHub (v1.81):
net install sankey, from("https://raw.githubusercontent.com/asjadnaqvi/stata-sankey/main/installation/") replace
The palettes package (together with its dependency colrspace) and the graphfunctions package are required to run this command:
ssc install palettes, replace
ssc install colrspace, replace
ssc install graphfunctions, replace
Even if you have these packages installed, please check for updates: ado update, update.
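As a quick sanity check after installing, the following sketch uses Stata's built-in which command to confirm that the commands can be found on the ado path. It assumes that colorpalette is the command provided by the palettes package; whether a version line is printed depends on the header comments in each ado file.

```stata
* Confirm that sankey is installed and show where it lives on the ado path
which sankey

* colorpalette ships with the palettes package (assumed to be the relevant command)
which colorpalette
```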
If you want to make a clean figure, then it is advisable to load a clean scheme. There are several available; I personally use the following:
ssc install schemepack, replace
set scheme white_tableau
You can also push the scheme directly into the graph using the scheme(schemename) option. See the help file for details or the example below.
I also prefer narrow fonts in figures with long labels. You can change this as follows:
graph set window fontface "Arial Narrow"
The syntax for the latest version is as follows:
sankey value [if] [in] [weight], from(var) to(var)
[ by(var) palette(str) colorby(layer|level) colorvar(var) stock stock2 colorvarmiss(str) colorboxmiss(str)
smooth(1-8) gap(num) recenter(mid|bot|top) ctitles(list) ctgap(num) ctsize(num) ctposition(bot|top)
ctcolor(str) labangle(str) labsize(str) labposition(str) labgap(str) showtotal labprop labscale(num)
valsize(str) valcondition(num) format(str) valgap(str) novalues valprop valscale(num)
novalright novalleft nolabels sort1(value|name[, reverse]) sort2(value|order[, reverse]) align fill
lwidth(str) lcolor(str) alpha(num) offset(num) boxwidth(str) percent wrap(num) * ]
See the help file (help sankey) for details.
The most basic use is as follows:
sankey value, from(var1) to(var2) [by(level)]
where var1 and var2 are the source and destination variables, respectively, against which the value variable is plotted. The by() variable defines the levels and is optional since v1.72.
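To make the basic call concrete, here is a minimal sketch on a made-up dataset. The variable names (var1, var2, value) and the numbers are purely illustrative and are not part of the package. Since by() is omitted, a single layer is assumed, and per the v1.72 notes a warning message is expected.

```stata
* Hypothetical toy data: four flows between two sources and two destinations
clear
input str10 var1 str10 var2 value
"A" "X" 10
"A" "Y" 5
"B" "X" 3
"B" "Y" 12
end

* One-layer Sankey: by() is left out, so a single layer is assumed
sankey value, from(var1) to(var2)
```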
Software packages take countless hours of programming, testing, and bug fixing. If you use this package, then a citation would be highly appreciated. Suggested citations:
in BibTeX
@software{sankey,
author = {Naqvi, Asjad},
title = {Stata package ``sankey''},
url = {https://github.com/asjadnaqvi/stata-sankey},
version = {1.81},
date = {2024-10-16}
}
or simple text
Naqvi, A. (2024). Stata package "sankey" version 1.81. Release date 16 October 2024. https://github.com/asjadnaqvi/stata-sankey.
or see SSC citation (updated once a new version is submitted)
Get the example data from GitHub:
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_example2.xlsx?raw=true", clear first
Let's test the sankey command:
sankey value, from(source) to(destination) by(layer)
sankey value, from(source) to(destination) by(layer) smooth(2)
sankey value, from(source) to(destination) by(layer) smooth(8)
sankey value, from(source) to(destination) by(layer) recenter(bot)
sankey value, from(source) to(destination) by(layer) recenter(top)
sankey value, from(source) to(destination) by(layer) gap(0)
sankey value, from(source) to(destination) by(layer) gap(20)
sankey value, from(source) to(destination) by(layer) noval showtot
sankey value, from(source) to(destination) by(layer) sort1(name)
sankey value, from(source) to(destination) by(layer) sort1(value)
sankey value, from(source) to(destination) by(layer) sort1(value) sort2(value)
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(value)
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(value, reverse)
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(order)
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(order, reverse)
Custom sorting on a value:
gen source2 = .
gen destination2 = .
foreach x in source destination {
    replace `x'2 = 1 if `x'=="Blog"
    replace `x'2 = 2 if `x'=="LinkedIn"
    replace `x'2 = 3 if `x'=="Twitter"
    replace `x'2 = 4 if `x'=="Direct"
    replace `x'2 = 5 if `x'=="App"
    replace `x'2 = 6 if `x'=="Medium"
    replace `x'2 = 7 if `x'=="Website"
    replace `x'2 = 8 if `x'=="Homepage"
    replace `x'2 = 9 if `x'=="Total"
    replace `x'2 = 10 if `x'=="Google"
    replace `x'2 = 11 if `x'=="Facebook"
}
lab de labels 1 "Blog" 2 "LinkedIn" 3 "Twitter" 4 "Direct" 5 "App" 6 "Medium" 7 "Website" 8 "Homepage" 9 "Total" 10 "Google" 11 "Facebook", replace
lab val source2 labels
lab val destination2 labels
sankey value, from(source2) to(destination2) by(layer)
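The same numeric coding can also be written more compactly by looping over the desired order. This is only a sketch of one possible approach, not the package's own method; the local macro order and the counter i are illustrative names, and the category names are taken from the example data above.

```stata
* Desired drawing order of the categories (same order as in the example above)
local order Blog LinkedIn Twitter Direct App Medium Website Homepage Total Google Facebook

cap label drop labels
foreach x in source destination {
    cap drop `x'2
    gen `x'2 = .
    local i = 0
    foreach name of local order {
        local ++i
        * Assign the numeric code and build the value label in the same pass
        replace `x'2 = `i' if `x' == "`name'"
        label define labels `i' "`name'", modify
    }
    label values `x'2 labels
}

sankey value, from(source2) to(destination2) by(layer)
```

Changing the order then only requires rearranging the names in the local macro.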
sankey value, from(source) to(destination) by(layer) boxwid(5)
sankey value, from(source) to(destination) by(layer) valcond(200)
sankey value, from(source) to(destination) by(layer) valcond(300)
sankey value, from(source) to(destination) by(layer) palette(CET C6)
sankey value, from(source) to(destination) by(layer) colorby(level)
gen trace1 = 1 if source=="App"
sankey value, from(source) to(destination) by(layer) colorvar(trace1)
cap drop trace2
gen trace2 = .
replace trace2 = 1 if source=="App" & destination=="App" & layer==0
replace trace2 = 2 if source=="App" & destination=="App" & layer==1
replace trace2 = 3 if source=="App" & destination=="App" & layer==2
replace trace2 = 4 if source=="App" & destination=="Total" & layer==3
sankey value, from(source) to(destination) by(layer) colorvar(trace2)
sankey value, from(source) to(destination) by(layer) colorvar(trace2) palette(Oranges)
sankey value, from(source) to(destination) by(layer) colorvar(trace2) palette(Blues) ///
colorvarmiss(gs13) colorboxmiss(gs13)
sankey value, from(source) to(destination) by(layer) colorvar(trace2) ///
palette(blue*0.1 blue*0.3 blue*0.5 blue*0.7) colorvarmiss(gs13) colorboxmiss(gs13)
sankey value, from(source) to(destination) by(layer) ctitles(Cat1 Cat2 Cat3 Cat4 Cat5)
sankey value, from(source) to(destination) by(layer) ctitles(Cat1 Cat2 Cat3 Cat4 Cat5) ctg(-100)
sankey value, from(source) to(destination) by(layer) ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctg(-100)
sankey value, from(source) to(destination) by(layer) ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctpos(top) ctg(100) recenter(top)
sankey value, from(source) to(destination) by(layer) noval showtot palette(CET C6) ///
laba(0) labpos(3) labg(-1) offset(10)
sankey value, from(source) to(destination) by(layer) novalleft
sankey value, from(source) to(destination) by(layer) novalright
sankey value, from(source) to(destination) by(layer) noval
sankey value, from(source) to(destination) by(layer) nolabels
sankey value, from(source) to(destination) by(layer) valprop vals(2)
sankey value, from(source) to(destination) by(layer) labprop labs(2)
sankey value, from(source) to(destination) by(layer) palette(CET C6) alpha(60) ///
labs(2.5) laba(0) labpos(3) labg(-1) offset(5) noval showtot ///
ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctg(-100) cts(3) ///
title("My sankey plot", size(6)) note("Made with the #sankey package.", size(2.2)) ///
xsize(2) ysize(1)
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_stocks.xlsx?raw=true", clear first
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal stock
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal stock2
Please open an issue to report errors, feature enhancements, and/or other requests.
v1.81 (16 Oct 2024)
- Weights are now allowed. It is still advisable to prepare the data beforehand.
- wrap() now requires graphfunctions for label wrapping that respects word boundaries.
- Option stock2 added that collapses stocks on the right (incoming) and removes own flows. In contrast, stock collapses stocks on the left (outgoing).
- Various code fixes should remove additional small bugs.
v1.8 (22 Sep 2024)
- Added option align to align flows. Works only if there is just one parent (still beta).
- Added option fill to extrapolate missing flows. Works only if there is just one parent (still beta).
- Added option n() to allow users to increase the number of points for generating the arcs. Default is 30.
- Quite a large code clean up so the command should run a bit faster.
v1.74 (11 Jun 2024)
- Added wrap() option for wrapping labels.
- Minor code cleanups.
v1.73 (16 Mar 2024)
- If the from() and to() variables have value labels, then the order of the value labels is respected. This allows the users to have full control of the order of the drawing of the layers through value labels (requested by Katie Naylor + others).
- The command now throws an error if from() and to() have different format types. Both have to be either string or numeric variables. This was necessary in order to implement the above change.
- Minor code cleanups.
v1.72 (12 Feb 2024)
- Fixed labprop calculating the label sizes incorrectly.
- valcond() now passes on to box labels. Was removed but has been put back in.
- by() changed to optional. Assumes one layer if not specified. This is mostly a quality of life improvement. A warning message is displayed to ensure that by() is not left out by mistake.
- ctsize() converted to string to allow size names.
- ctcolor() added.
- Help file improved.
- Minor code cleanups.
v1.71 (15 Jan 2024)
- Fixed a bug where numerical from() and to() variables with value labels were messing up the labels in the final figure (reported by Ian White).
v1.7 (06 Nov 2023)
- Fixed valcond() dropping bar values.
- Fixed ctitles() getting random colors. It now defaults to black.
- Added ctpos() option to change column title position.
- Added percent option, which is still beta. It converts flows to percent values.
v1.61 (22 Jul 2023)
- saving() option added (requested by Anirban Basu).
- Minor fixes.
v1.6 (11 Jun 2023)
- Complete rewrite of the base routines. The code is 30% smaller but several times faster.
- The option sortby() split into sort1() and sort2() for clarity.
- Added support for numerical variables with value labels.
- Option stock added to collapse own flows (source = destination) to box heights (requested by Oras Alabas).
- Several code optimizations and minor bug fixes.
v1.51 (25 May 2023)
- Added background checks for the from() and to() variables. This ensures that the code runs regardless of the variable types. Ideally both should be strings.
v1.5 (30 Apr 2023)
- Added labprop, titleprop, and labscale() for scaling values and labels.
- Added novalright, novalleft, nolabels options.
- Added sortby(., reverse) option.
- Help file improved in its layout.
v1.4 (23 Apr 2023)
- Fixed major bugs with unbalanced panels.
- Added column title options.
- Added option to draw colors by variables.
- Several bug fixes and improvements to the code.
v1.31 (04 Apr 2023)
- Fixed the color of categories. Previous version was resulting in wrong color assignments.
v1.3 (26 Feb 2023)
- Node bundling added, which aligns nodes in front of each other. This looks better especially if flows are passing through certain nodes.
- Option sortby() added that allows alphabetical sorting (sortby(name)) or numerical sorting (sortby(value)). Thanks to Fabian Unterlass for detailed feedback.
- Option boxwidth() added to allow adjusting the width of node boxes.
v1.21 (15 Feb 2023)
- valcond() fixed.
- Error in gaps fixed.
v1.2 (02 Feb 2023)
- Unbalanced Sankeys are now allowed. This means that incoming and outgoing layers do not necessarily have to be equal. Outgoing can be larger than incoming.
- A category can now also start in the middle.
- Various bug fixes.
v1.1 (13 Dec 2022)
- Option valformat() renamed to just format. This aligns it with standard Stata usage.
- A new option offset() added to displace the x-axis on the right-hand side. Offset is given in percentage share of x-axis range. This allows rotated labels to be displaced properly.
- Checks for missing bilateral flow combinations. Hitting a non-flow combo was causing the code to crash.
v1.0 (08 Dec 2022)
- Public release.