How to format your data to build Sankeys and alluvial diagrams
A vital thing to consider when representing flow is data formatting. This is also one of the most common questions we get from users. Here's how you can do it:
How to format your data to build a Sankey diagram
To build a Sankey diagram, you need to wrangle your data into a long format, that is, one row per record. The data may look repetitive, but this is okay.
This is an example of what your data should look like.
Here, we are looking at the number of refugees resettled in 2020. Notice how we have multiple rows with the same Source ("Country of origin") and Target ("Country of asylum") values but with different counts for "Cases". This isn't an issue because the template will aggregate these rows and adjust the width of the link accordingly.
- Next, bind your Source and Target columns to the corresponding bindings. In our example, "Country of origin" is our Source column and "Country of asylum" is our Target column.
- Typically, you'll also want to bind a column to Values to size your links, though this isn't required. If you don't add a column of values, your Sankey will size links based on the number of rows in the data.
- The resulting Sankey would look like this:
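To make the aggregation described in the steps above more concrete, here is a minimal Python sketch of the idea. It is only an illustration of how repeated Source/Target rows could be combined, not the template's actual code. The function name `link_sizes`, the extra countries and all of the counts are made-up placeholders, not real resettlement figures.

```python
from collections import defaultdict

# Illustrative long-format data: one row per record.
# All counts below are placeholders, not real figures.
rows = [
    {"Country of origin": "Afghanistan", "Country of asylum": "United Kingdom", "Cases": 10},
    {"Country of origin": "Afghanistan", "Country of asylum": "United Kingdom", "Cases": 5},
    {"Country of origin": "Afghanistan", "Country of asylum": "Germany", "Cases": 7},
    {"Country of origin": "Syria", "Country of asylum": "Germany", "Cases": 12},
]


def link_sizes(rows, source, target, value=None):
    """Combine rows that share a (source, target) pair.

    With a value column, the values are summed per pair.
    Without one, each row counts as 1, so links are sized by row count.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[(row[source], row[target])] += row[value] if value else 1
    return dict(totals)


# Links sized by the "Cases" column (a Values binding is set).
by_value = link_sizes(rows, "Country of origin", "Country of asylum", "Cases")

# Links sized by the number of rows (no Values binding).
by_count = link_sizes(rows, "Country of origin", "Country of asylum")

print(by_value)
print(by_count)
```

The two Afghanistan to United Kingdom rows end up as a single link in both cases, which is why repeated rows in the data sheet are not a problem.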
How to format your data to build an alluvial diagram
Alluvial diagrams represent discrete flows between elements, meaning that the flow has ordered stages or steps. This makes wrangling the data slightly different from the process for a Sankey diagram.
- First, you'll need to make sure your dataset is in a long format (one row per record), just like with a Sankey diagram. The easiest way to do this is by identifying your Source and Target nodes, which determine where your data is coming from and where it is going. Even though we are working with steps, all records need to be added to these two columns.
In the example above, we first plot the flow from Afghanistan to Europe, and then the flow from Europe to the United Kingdom, all in the same two columns.
- Now you have to specify the steps. Unlike Sankeys, alluvial diagrams follow a specific order, and we need to tell the template what that order is. Steps are determined by numeric values: either years or simple numbers that specify the steps. Here we are just using 1, 2, and 3 because there are only three steps in this flow: from "Country of origin" (step 1), to "Region of asylum" (step 2) and, lastly, to "Country of asylum" (step 3).
In the Data tab, you'll need to bind your Source and Target columns, as well as your Step from and Step to columns.
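If your data starts out with one column per stage, the reshaping described above can be sketched in Python as follows. This is a rough sketch under stated assumptions: the helper `to_alluvial_rows`, the wide input layout and the "Cases" value are hypothetical and only there for illustration. The output column names mirror the Source, Target, Step from and Step to bindings.

```python
def to_alluvial_rows(records, stages, value="Cases"):
    """Turn one-column-per-stage records into long-format rows.

    Each pair of neighbouring stages becomes one Source/Target row,
    numbered with consecutive steps starting at 1.
    """
    rows = []
    for record in records:
        pairs = zip(stages, stages[1:])
        for step, (source_col, target_col) in enumerate(pairs, start=1):
            rows.append({
                "Source": record[source_col],
                "Target": record[target_col],
                "Step from": step,
                "Step to": step + 1,
                value: record[value],
            })
    return rows


# Hypothetical wide record; the count is a placeholder, not a real figure.
records = [
    {
        "Country of origin": "Afghanistan",
        "Region of asylum": "Europe",
        "Country of asylum": "United Kingdom",
        "Cases": 10,
    },
]
stages = ["Country of origin", "Region of asylum", "Country of asylum"]

alluvial_rows = to_alluvial_rows(records, stages)
for row in alluvial_rows:
    print(row)
```

Each wide record produces two long rows here: Afghanistan to Europe (step 1 to 2) and Europe to United Kingdom (step 2 to 3), both sitting in the same Source and Target columns.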
The resulting alluvial diagram would look like this: