Mix.install([
{:jason, "~> 1.4"},
{:kino, "~> 0.9", override: true},
{:youtube, github: "brooklinjazz/youtube"},
{:hidden_cell, github: "brooklinjazz/hidden_cell"}
])
Upon completing this lesson, a student should be able to answer the following questions.
- How are lists stored under the hood, and how does this impact their performance for reading vs updating?
- How are tuples stored under the hood and how does this impact their performance for reading vs writing?
- When should you use a list vs a tuple?
Lists and Tuples are intended for fundamentally different use cases.
Lists are dynamic-sized containers meant to be optimized for updating elements within the list, while Tuples are fixed-sized containers optimized for reading values.
In this lesson, we'll go over how Lists and Tuples are implemented in Elixir so that you can understand their strengths and weaknesses in order to use them more effectively.
We will use the following large list and large tuple to perform several benchmarks.
large_list = Enum.to_list(1..10_000_000)
large_tuple = List.to_tuple(large_list)
Lists are stored in memory as linked lists. That means that each element in the list is actually stored in pairs of two. The first element of the pair is the value, the second element of the pair is the location of the next element.
For example, let's take the list [2, 3]
.
Notice below, that each element in the list is stored as a pair in the heap. The first element of the pair is the value, and the second element of the pair is the location of the next element.
In fact, [2, 3]
can also be written as pairs like so [2 | [3 | []]]
, where each cell is written
as [head | tail]
. The head is the value, and the tail is the reference to the rest of the list.
You may sometimes hear these called cons
cells.
Tuples are stored contiguously in memory. Contiguously means that each element in the tuple shares a common border.
For example, the tuple {1, 2, 3}
could be stored like so.
Lists and tuples serve different purposes. Due to their structure they are more or less performant for certain operations.
We'll use the built-in Erlang library's :timer.tc/1
function to compare the amount of time in microseconds
that it takes to perform common operations on a large set of data.
:timer.tc(fn ->
_expensive_computation = 10000 ** 10000
"return value of the function"
end)
Simultaneous operations in your computer compete for resources, and our measurement tools are not perfect, so you may notice that execution time is not always consistent.
Tuples are fixed-size containers, so their length is known upfront. It's a constant
For lists, we only know the location of the first element. So, we need to traverse the entire
list to find its length.
That means it takes
Let's compare the time it takes to determine the length of an equal-sized list and tuple.
We can use tuple_size
to count the length of a tuple, and length
to determine the length of a list.
{tuple_time, _result} = :timer.tc(fn -> tuple_size(large_tuple) end)
{list_time, _result} = :timer.tc(fn -> length(large_list) end)
%{tuple: tuple_time, list: list_time}
The exact result will be different each time, however as expected, the tuple is nearly instant and the list takes far longer.
Prepending in a list is fast, because we only need to create a new pair of elements in memory
and point it to the head of the existing list. Let's prepend 1
to [2, 3]
as an example.
[1 | [2, 3]]
Therefore, prepending an element to a list is
Tuples however, are stored contiguously in memory, so in order to make any change the entire tuple must
be copied. Since we need to enumerate through the tuple and copy each value, prepending
and element to a tuple is
Let's prepend 0
to {1, 2, 3}
as an example. To prepend a value we can use either Tuple.insert_at/3.
Tuple.insert_at({1, 2, 3}, 0, 0)
Let's compare the time it takes to prepend an element to an equal sized list and tuple.
{tuple_time, _result} = :timer.tc(fn -> Tuple.insert_at(large_tuple, 0, 0) end)
{list_time, _result} = :timer.tc(fn -> List.insert_at(large_list, 0, 0) end)
%{tuple: tuple_time, list: list_time}
Once again, as expected prepending the list is nearly instant, where as prepending a tuple takes some time.
In the Elixir cell below, use the :timer.tc/1
function to time how long it takes to prepend a value to a list with 50000
elements.
Example Solution
We can prepend a list using [head | tail]
syntax
list = Enum.to_list(1..50000)
:timer.tc(fn -> ["my value" | list] end)
Alternatively we can use List.insert_at/3.
list = Enum.to_list(1..50000)
:timer.tc(fn -> List.insert_at(list, 0, "my value") end)
Generally, [head | tail]
is more idiomatic.
In the Elixir cell below, use the :timer.tc/1
function to time how long it takes to prepend a value to a tuple with 50000
elements.
Example Solution
tuple = Enum.to_list(1..50000) |> List.to_tuple()
:timer.tc(fn -> Tuple.insert_at(tuple, 0, 0) end)
Accessing an element in a tuple is constant
flowchart TB
subgraph Accessing Memory
lu["Lower Bound: O(1), Upper Bound: O(n)"]
direction LR
subgraph Memory
L -- location to --> I -- location to --> S -- location to --> T -- location to --> E[...]
end
subgraph Time Complexity
0["O(1)"] --> L
1["O(2)"] --> I
2["O(3)"] --> S
3["O(4)"] --> T
4["O(n)"] --> E
end
end
That means retrieving the head of any list regardless of its size is constant
Let's compare accessing the first element of a tuple and list of equal size.
{tuple_time, _result} = :timer.tc(fn -> elem(large_tuple, 0) end)
{list_time, _result} =
:timer.tc(fn ->
[head | _tail] = large_list
head
end)
%{tuple: tuple_time, list: list_time}
As expected, both are very fast. In fact the list is even faster than the tuple. However, this changes drastically when we try to access the last element in the list and tuple.
{tuple_time, _result} = :timer.tc(fn -> elem(large_tuple, 9_999_999) end)
{list_time, _result} = :timer.tc(fn -> Enum.at(large_list, 9_999_999) end)
%{tuple: tuple_time, list: list_time}
In the Elixir cell below, use the :timer.tc/1
function to time how long it takes
to access the first element, and last element of a list with 5000
elements.
Then do the same for tuples.
Concatenating a tuple requires copying both tuples in memory, therefore it's performance cost is
However
Why don't tuples support concatenation? Because they should be used for fixed-sized containers. Jose Valim explains.
You can't concatenate tuples. The only reason is that you are not supposed to use them as such. Most of tuple usage requires knowing their size and things get blurrier if you can concatenate them. Furthermore, concatenating tuples requires copying both tuples in memory, which is not efficient. In other words, if you want to concatenate tuples, you may have the wrong data structure. You have two options:
- Use lists
- Compose the tuples: instead of a ++ b, just write {a, b}
However for a list, only the first list must be copied, so it has
For example,
[1, 2, 3] ++ [4, 5]
That means that so long as your first list is small, it's a fairly performant operation, even if the second list is massive. However, it will be expensive if the first list is large.
{small_first_list, _result} = :timer.tc(fn -> [1] ++ large_list end)
{large_first_list, _result} = :timer.tc(fn -> large_list ++ [1] end)
%{small_first_list: small_first_list, large_first_list: large_first_list}
In the Elixir cell below, use the :timer.tc/1
function to time how long it takes
to concatenate a list with one element and a list with 5000 elements.
Then try to concatenate a list with 5000 elements to a list with 1 element. Notice how it takes longer.
In a functional programming language, we do not mutate values in a list or tuple. Instead, we copy values where necessary.
When updating a list, we copy the elements before the updated element, then reuse the ones after it.
list = [1, 2, 3, 4, 5]
List.replace_at(list, 2, 7)
Similarly to accessing an element,
that means it's constant time complexity
Tuples however, must be copied over completely when updated. We can use put_elem/3
to update
an element in a tuple.
tuple = {1, 2, 3, 4, 5}
put_elem(tuple, 2, 7)
Let's compare the time it takes to update the first element of a list and tuple of equal size.
{tuple_time, _result} = :timer.tc(fn -> put_elem(large_tuple, 0, 7) end)
{list_time, _result} = :timer.tc(fn -> List.replace_at(large_list, 0, 7) end)
%{tuple: tuple_time, list: list_time}
As expected, the list is fast and the tuple is slow. Now let's compare the time it takes to update the last element of a list and tuple of equal size.
{tuple_time, _result} = :timer.tc(fn -> put_elem(large_tuple, 9_999_999, 7) end)
{list_time, _result} = :timer.tc(fn -> List.replace_at(large_list, 9_999_999, 7) end)
%{tuple: tuple_time, list: list_time}
The list is far slower now that it needs to look through the entire list, while the tuple is approximately the same speed as updating the first element.
In the Elixir cell below, use :timer.tc/1
to compare updating the first element of a list
and tuple with 5000
elements.
In the Elixir cell below, use :timer.tc/1
to compare updating the last element of a list and tuple
with 5000
elements.
Inserting in a list follows the same pattern as updating a list. We can reuse elements after the insertion, but must copy elements before the insertion.
list = [1, 2, 3, 4, 5]
List.insert_at(list, 3, 7)
Inserting an element in a tuple requires copying the entire tuple.
tuple = {1, 2, 3, 4, 5}
Tuple.insert_at(tuple, 3, 7)
Based on this knowledge, we expect that inserting in a tuple is always
We've already proven earlier that prepending a list has
{tuple_time, _result} = :timer.tc(fn -> Tuple.insert_at(large_tuple, 10_000_000, 7) end)
{list_time, _result} = :timer.tc(fn -> List.insert_at(large_list, 10_000_000, 7) end)
%{tuple: tuple_time, list: list_time}
In the Elixir cell below, use :timer.tc
to insert to the end of a large 50000
element list and tuple.
Deleting an element in a list requires copying elements prior to the deletion, and reusing elements after the deletion.
list = [1, 2, 3, 4, 5]
List.delete_at(list, 3)
Deleting a tuple requires copying over every non-deleted element.
Let's prove that deleting any element in the tuple has a similar performance cost, and that deleting the first element in a list is less expensive than deleting the last.
{tuple_time, _result} = :timer.tc(fn -> Tuple.delete_at(large_tuple, 0) end)
{list_time, _result} = :timer.tc(fn -> List.delete_at(large_list, 0) end)
%{tuple: tuple_time, list: list_time}
{tuple_time, _result} = :timer.tc(fn -> Tuple.delete_at(large_tuple, 9_999_999) end)
{list_time, _result} = :timer.tc(fn -> List.delete_at(large_list, 9_999_999) end)
%{tuple: tuple_time, list: list_time}
When we modify a tuple, the new version of the tuple will contain an entire copy of the tuple differing only in the modified element.
When you modify the
flowchart
subgraph Operation on nth element in a list
n[copied elements] --> m[modified element] --> t[tail of previous list]
end
When we use the term copy, it's important to be clear that we mean shallow copy.
You likely won't need to concern yourself with this often, but it's useful to be aware of the difference between deep copying and shallow copy and the performance implications.
When we shallow copy, we copy the reference to the data, rather than the data itself.
A deep copy, copies the actual underlying data.
In a shallow copy, primitive values such as 1
will simply be copied as 1
. However data types which contain
references will be copied as references.
Let's take an example with tuples.
a = {1, 2, 3}
b = {4, 5, 6}
c = {7, 8, 9}
In the code above, a
, b
, and c
exist on the stack and all contain pointers to actual memory on the heap
flowchart LR
subgraph Stack
a
b
c
end
subgraph Heap
a --pointer--> a1["{1, 2, 3}"]
b --pointer--> b1["{4, 5, 6}"]
c --pointer--> c1["{7, 8, 9}"]
end
b2 = {11, 12, 13}
a_tuple = {a, b, c}
new_tuple = put_elem(a_tuple, 1, b2)
new_tuple
Under the hood, when the computer copies a_tuple
to create new_tuple
, it creates a reference
to the original variable on the stack, so they share memory on the heap.
flowchart LR
subgraph Stack
subgraph a_tuple
a1[a]
b1[b]
c1[c]
end
subgraph new_tuple
a2[a] --reference--> a1
c2[c] --reference--> c1
b2
end
end
subgraph Heap
a1 --pointer --> a3["{1, 2, 3}"]
b1 --pointer--> b3["{4, 5, 6}"]
c1 --pointer--> c4["{7, 8, 9}"]
b2 --pointer--> c5["{10, 11, 12}"]
end
In memory, that might look like the following.
Elixir handles garbage collection for us, so we often don't need to concern ourselves with this implementation detail.
Whenever a variable is no longer referenced, it is free to be garbage collected.
For example, if we have the variables a
, b
, and c
, if we rebind b
, we no longer have to store the underlying data and it will be garbage collected.
a = {1, 2, 3}
b = {4, 5, 6}
c = {7, 8, 9}
# {4, 5, 6} Is Now Free To Be Garbage Collected Because It Is No Longer Referenced By Any Variable.
b = {10, 11, 12}
This frees up the tuple {4, 5, 6}
for garbage collection, because it is no longer accessible.
Below we use b
and b2
to represent rebinding the b
variable.
Operations with tuples and lists will have the following Big
[
[operation: "length", tuple: "O(1)", list: "O(n)"],
[operation: "prepend", tuple: "O(n)", list: "O(1)"],
[operation: "insert", tuple: "O(n)", list: "O(n*)"],
[operation: "access", tuple: "O(1)", list: "O(n*)"],
[operation: "update/replace", tuple: "O(n)", list: "O(n*)"],
[operation: "delete", tuple: "O(n)", list: "O(n*)"],
[operation: "concatenation", tuple: "O(n1 + n2)", list: "O(n1)"]
]
|> Kino.DataTable.new()
n* will be the index of the operation, instead of the number of elements in the list.
- Tuples are
$O(1)$ for reading a value and the length, but require$O(n)$ for all other operations. - List operations are generally
$O(n)$ complexity where$n$ is the index of the operation, meaning operations early in a list can be very performant and are preferred when possible. - Tuples should be used for fixed-size containers.
- Lists should be used for dynamic-size containers.
- Functional languages do not allow mutation, they instead rely on shallow-copying elements to avoid unnecessary memory consumption.
Consider the following resource(s) to deepen your understanding of the topic.
DockYard Academy now recommends you use the latest Release rather than forking or cloning our repository.
Run git status
to ensure there are no undesirable changes.
Then run the following in your command line from the curriculum
folder to commit your progress.
$ git add .
$ git commit -m "finish Lists Vs Tuples reading"
$ git push
We're proud to offer our open-source curriculum free of charge for anyone to learn from at their own pace.
We also offer a paid course where you can learn from an instructor alongside a cohort of your peers. We will accept applications for the June-August 2023 cohort soon.