Sankey diagrams

Ever seen those cool-looking diagrams on the internet that show flows of energy, resources, or costs with thick, colorful arrows? They are known as Sankey diagrams.


They originated in the late 19th century and are named after the Irish engineer Captain Matthew Henry Phineas Riall Sankey, who first used one in 1898 to illustrate the energy efficiency of a steam engine. His diagram, shown below, depicted how energy was lost as heat, using arrows of proportional width to represent how the energy was distributed and lost.

Sankey's original diagram

This intuitive style caught on in engineering and later spread to other fields like economics and environmental science, where it's still used to visualize complex flows in a clear, impactful way.


There are several Python libraries for creating Sankey diagrams, but one of the most popular and visually appealing options is Plotly. It can create interactive plots that let users explore data dynamically, which makes them ideal for web-based applications. However, to save a static Plotly figure, it's recommended to install a compatible version of Kaleido and export the figure as an SVG file, a vector format that stays sharp at any size. In my case, I created a virtual environment and ran:

     
$ pip install plotly==5.24.1
$ pip install kaleido==0.2.1
    
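Since the SVG export depends on compatible versions, it can be worth checking what the virtual environment actually contains. The sketch below uses only the standard library (importlib.metadata) to compare the installed versions against the ones mentioned above; the EXPECTED dictionary and the check_versions helper are my own, hypothetical names, not part of Plotly or Kaleido.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions used in this article (hypothetical name for illustration)
EXPECTED = {"plotly": "5.24.1", "kaleido": "0.2.1"}


def check_versions(expected):
    """Map each package to (installed_version_or_None, matches_expected)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed in this environment
        report[package] = (installed, installed == wanted)
    return report


for package, (installed, ok) in check_versions(EXPECTED).items():
    status = "OK" if ok else "mismatch or missing"
    print(f"{package}: installed {installed}, expected {EXPECTED[package]} ({status})")
```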

Following the examples available in the official documentation, it is possible to build a first diagram to understand the basics of Plotly figures. Let's start with just a handful of nodes, as shown below.

Simple Sankey diagram

This Sankey diagram was built with the following code. Note that a couple of features were added compared with the Basic Sankey Diagram example in the official documentation: saving the figure as SVG required some adjustments, and colors were added to the nodes and links to improve readability.

     
import plotly.graph_objects as go
import plotly.io as pio
# Disable MathJax in Kaleido; otherwise a "Loading [MathJax]" box can appear in exported files
pio.kaleido.scope.mathjax = None

fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 15,
        thickness = 20,
        line = dict(color = "black", 
                    width = 0.5),
        label = ["A1", "A2", "B1", 
                 "B2", "C1", "C2"],
        color =  [
        "rgba(31, 119, 180, 0.8)",
        "rgba(255, 127, 14, 0.8)",
        "rgba(44, 160, 44, 0.8)",
        "rgba(214, 39, 40, 0.8)",
        "rgba(140, 86, 75, 0.8)",
        "rgba(148, 103, 189, 0.8)",]
    ),
    link = dict(
        source = [0, 1, 0, 2, 3, 3],
        target = [2, 3, 3, 4, 4, 5],
        value = [8, 4, 2, 8, 4, 2],
        color =  [
        "rgba(31, 119, 180, 0.2)",
        "rgba(255, 127, 14, 0.2)",
        "rgba(31, 119, 180, 0.2)",
        "rgba(44, 160, 44, 0.2)",
        "rgba(214, 39, 40, 0.2)",
        "rgba(214, 39, 40, 0.2)",]
  ))])

fig.update_layout(
    font_size=20,
    margin=dict(l=20, r=20, 
                t=20, b=20) 
)

fig.write_image("example_1.svg", 
                width=600, height=400)
    
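In the link dictionary, each position i describes one flow: from node source[i] to node target[i], with width value[i], where the indices refer to positions in the node label list. The short sketch below, which does not need Plotly, tallies the incoming and outgoing totals of each node for the data of this first example. As far as I know, d3-sankey, on which Plotly's Sankey trace is built, sizes each node by the larger of these two totals, so checking them is a quick way to spot a typo in the link lists. The helper name node_totals is my own and is not part of Plotly.

```python
from collections import defaultdict

# Same node labels and link lists as in the first example
labels = ["A1", "A2", "B1", "B2", "C1", "C2"]
source = [0, 1, 0, 2, 3, 3]
target = [2, 3, 3, 4, 4, 5]
value = [8, 4, 2, 8, 4, 2]


def node_totals(source, target, value):
    """Return (inflow, outflow) dicts mapping node index -> summed link value."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for s, t, v in zip(source, target, value):
        outflow[s] += v  # the flow leaves its source node...
        inflow[t] += v   # ...and arrives at its target node
    return inflow, outflow


inflow, outflow = node_totals(source, target, value)
for i, name in enumerate(labels):
    print(f"{name}: in={inflow[i]}, out={outflow[i]}")
```

For this data, the intermediate nodes balance (B1 receives and sends 8, B2 receives and sends 6), while C1 collects 12 units coming from B1 and B2.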

After this first example, I wanted to try Plotly on a more complex case. Again inspired by the official documentation, I downloaded the JSON file, available on GitHub, that contains the data used in the More complex Sankey diagram example. The structure of the dictionary stored in the JSON file can be consulted by following the link below.

Once the JSON file was loaded into the Python script, it was possible to build the following complex figure, involving close to fifty initial entries, four intermediate steps and more than a dozen outputs:

More complicated example of Sankey diagram

The following code was used to create this figure. Note that the hyperlink in the title only works when the figure is displayed directly with fig.show(), not when it is saved as an SVG file.

     
import plotly.graph_objects as go
import json

import plotly.io as pio
# Disable MathJax in Kaleido; otherwise a "Loading [MathJax]" box can appear in exported files
pio.kaleido.scope.mathjax = None

with open("sankey_energy.json") as json_file:
    data = json.load(json_file)

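opacity = 0.4
# Replace the named color "magenta" with an rgba string so that every node color has an alpha channel
data["data"][0]["node"]["color"] = ["rgba(255,0,255, 0.8)" 
                                    if color == "magenta" 
                                    else color 
                                    for color in data["data"][0]["node"]["color"]]
# Give each link the color of its source node, lowering the alpha from 0.8 to `opacity`
data["data"][0]["link"]["color"] = [data["data"][0]["node"]["color"][src].replace("0.8", str(opacity))
                                    for src in data["data"][0]["link"]["source"]]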

fig = go.Figure(data=[go.Sankey(
    valueformat = ".0f",
    valuesuffix = "TWh",
    # Define nodes
    node = dict(
        pad = 15,
        thickness = 15,
        line = dict(color = "black", width = 0.5),
        label =  data["data"][0]["node"]["label"],
        color =  data["data"][0]["node"]["color"]
    ),
    # Add links
    link = dict(
        source =  data["data"][0]["link"]["source"],
        target =  data["data"][0]["link"]["target"],
        value =  data["data"][0]["link"]["value"],
        label =  data["data"][0]["link"]["label"],
        color =  data["data"][0]["link"]["color"]
))])

fig.update_layout(
    title=dict(
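        # Adjacent string literals are concatenated; a plain "..." string cannot span several lines
        text=("Energy forecast for 2050<br>"
              "Source: Department of Energy & Climate Change, "
              "Tom Counsell via "
              "<a href='https://bost.ocks.org/mike/sankey/'>Mike Bostock</a>"),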
        font=dict(size=20),
        x=0.98, 
    ),
    font_size=15,
    margin=dict(l=20, r=20, 
                t=30, b=20) 
)

fig.write_image("example_2.svg", 
                width=1200, height=600)
    
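The link colors in this script are derived with str.replace("0.8", str(opacity)), which relies on every node color (once "magenta" has been converted) being an rgba string whose alpha is written as 0.8; a color in any other form would silently keep its original opacity. A slightly more defensive alternative is sketched below: it parses the rgba string with a regular expression and rewrites the alpha channel explicitly, raising an error for anything it does not recognize. The helpers with_alpha and link_colors are hypothetical names of mine, not Plotly functions.

```python
import re

# Matches strings such as "rgba(31, 119, 180, 0.8)" or "rgba(255,0,255, 0.8)"
RGBA_PATTERN = re.compile(
    r"rgba\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*[\d.]+\s*\)"
)


def with_alpha(color, alpha):
    """Return `color` (an rgba string) with its alpha channel set to `alpha`."""
    match = RGBA_PATTERN.fullmatch(color.strip())
    if match is None:
        raise ValueError(f"expected an 'rgba(r, g, b, a)' string, got {color!r}")
    r, g, b = match.groups()
    return f"rgba({r}, {g}, {b}, {alpha})"


def link_colors(node_colors, sources, alpha=0.4):
    """Color each link like its source node, with a lower opacity."""
    return [with_alpha(node_colors[s], alpha) for s in sources]


node_colors = ["rgba(31, 119, 180, 0.8)", "rgba(255,0,255, 0.8)"]
print(link_colors(node_colors, [0, 1, 0]))
```

In the script above, the second list comprehension could then be replaced by link_colors(data["data"][0]["node"]["color"], data["data"][0]["link"]["source"], opacity).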

Even though the diagram is much more complex, the code remains about the same length as in the previous example. This is because the implementation relies heavily on the structure of the JSON file: node labels and colors, as well as link sources, targets, values and labels, are read directly from the dictionary, which makes the whole process very intuitive.

Maybe even intuitive enough to reproduce Captain Matthew Henry Phineas Riall Sankey's original diagram.
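Because the second figure is driven entirely by the JSON dictionary, the same data can also be inspected before any plotting. The sketch below assumes the layout used in the script above (a trace dictionary like data["data"][0], with a node["label"] list and link["source"], link["target"] and link["value"] lists) and classifies nodes as entries (outgoing links only), intermediate nodes (both) and outputs (incoming links only), which is a quick way to check figures such as the number of entries and outputs of a diagram. The function summarize_sankey is a hypothetical helper, demonstrated here on the small dataset of the first example.

```python
def summarize_sankey(trace):
    """Classify the nodes of a Sankey trace dict and total the flows.

    Assumes the layout of data["data"][0] in the script above:
    {"node": {"label": [...]},
     "link": {"source": [...], "target": [...], "value": [...]}}
    Nodes without any link are ignored.
    """
    labels = trace["node"]["label"]
    sources = trace["link"]["source"]
    targets = trace["link"]["target"]
    values = trace["link"]["value"]

    has_out, has_in = set(sources), set(targets)
    nodes = range(len(labels))
    entries = [i for i in nodes if i in has_out and i not in has_in]
    middle = [i for i in nodes if i in has_out and i in has_in]
    outputs = [i for i in nodes if i in has_in and i not in has_out]

    entry_set, output_set = set(entries), set(outputs)
    return {
        "entries": [labels[i] for i in entries],
        "intermediate": [labels[i] for i in middle],
        "outputs": [labels[i] for i in outputs],
        # Total value leaving the entry nodes and reaching the output nodes
        "entry_flow": sum(v for s, v in zip(sources, values) if s in entry_set),
        "output_flow": sum(v for t, v in zip(targets, values) if t in output_set),
    }


# Demonstration on the six-node data of the first example;
# with the energy data, one would call summarize_sankey(data["data"][0]) instead.
example = {
    "node": {"label": ["A1", "A2", "B1", "B2", "C1", "C2"]},
    "link": {"source": [0, 1, 0, 2, 3, 3],
             "target": [2, 3, 3, 4, 4, 5],
             "value": [8, 4, 2, 8, 4, 2]},
}
summary = summarize_sankey(example)
print(summary)
```

For this example, A1 and A2 are the entries, B1 and B2 the intermediate nodes, C1 and C2 the outputs, and 14 units enter and leave the diagram; len(summary["entries"]) and len(summary["outputs"]) give the counts directly.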