Lists Vs Tuples

Mix.install([
  {:jason, "~> 1.4"},
  {:kino, "~> 0.9", override: true},
  {:youtube, github: "brooklinjazz/youtube"},
  {:hidden_cell, github: "brooklinjazz/hidden_cell"}
])

Navigation

Review Questions

Upon completing this lesson, a student should be able to answer the following questions.

  • How are lists stored under the hood, and how does this impact their performance for reading vs updating?
  • How are tuples stored under the hood, and how does this impact their performance for reading vs writing?
  • When should you use a list vs a tuple?

Overview

Lists and Tuples are intended for fundamentally different use cases.

Lists are dynamically sized containers optimized for updating elements within the list, while Tuples are fixed-size containers optimized for reading values.

In this lesson, we'll go over how Lists and Tuples are implemented in Elixir so that you can understand their strengths and weaknesses in order to use them more effectively.

We will use the following large list and large tuple to perform several benchmarks.

large_list = Enum.to_list(1..10_000_000)
large_tuple = List.to_tuple(large_list)

Lists

Lists are stored in memory as linked lists. That means each element in the list is actually stored as a pair: the first element of the pair is the value, and the second is the location of the next element.

For example, take the list [2, 3]. It is stored as two pairs on the heap: the first pair holds the value 2 and the location of the next pair, and the second holds the value 3 and the end of the list.

In fact, [2, 3] can also be written as pairs like so [2 | [3 | []]], where each cell is written as [head | tail]. The head is the value, and the tail is the reference to the rest of the list. You may sometimes hear these called cons cells.
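We can see this structure directly by pattern matching a list into its head and tail, and by confirming that the cons-cell notation builds the same list:

[head | tail] = [2, 3]
%{head: head, tail: tail, cons_cells: [2 | [3 | []]] == [2, 3]}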

Tuples

Tuples are stored contiguously in memory. Contiguously means that each element sits directly next to the previous one, in a single block of memory.

For example, the tuple {1, 2, 3} is stored as three adjacent cells in memory.
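Because the elements sit side by side, the runtime can jump straight to any position. For example, elem/2 reads an element by its zero-based index:

tuple = {1, 2, 3}
elem(tuple, 2)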

Lists Vs Tuples

Lists and tuples serve different purposes. Due to their structure, they are more or less performant for certain operations.

We'll use the built-in Erlang library's :timer.tc/1 function to compare the amount of time in microseconds that it takes to perform common operations on a large set of data.

:timer.tc(fn ->
  _expensive_computation = 10000 ** 10000
  "return value of the function"
end)

Simultaneous operations in your computer compete for resources, and our measurement tools are not perfect, so you may notice that execution time is not always consistent.

Length

Tuples are fixed-size containers, so their length is known upfront. It takes constant $O(1)$ time to determine the size of a tuple, no matter how large it is.

For lists, we only know the location of the first element. So, we need to traverse the entire list to find its length. That means it takes $O(n)$ time where $n$ is the number of elements in the list.

Let's compare the time it takes to determine the length of an equal-sized list and tuple. We can use tuple_size/1 to determine the size of a tuple and length/1 to determine the length of a list.

{tuple_time, _result} = :timer.tc(fn -> tuple_size(large_tuple) end)
{list_time, _result} = :timer.tc(fn -> length(large_list) end)

%{tuple: tuple_time, list: list_time}

The exact result will be different each time; however, as expected, the tuple is nearly instant while the list takes far longer.

Prepending

Prepending to a list is fast, because we only need to create a new pair of elements in memory and point it to the head of the existing list. Let's prepend 1 to [2, 3] as an example.

[1 | [2, 3]]

Therefore, prepending an element to a list is $O(1)$ complexity. This holds true no matter the size of the list, because the work done remains the same.

Tuples, however, are stored contiguously in memory, so in order to make any change, the entire tuple must be copied. Since we need to enumerate through the tuple and copy each value, prepending an element to a tuple is $O(n)$ complexity.

Let's prepend 0 to {1, 2, 3} as an example. To prepend a value, we can use Tuple.insert_at/3.

Tuple.insert_at({1, 2, 3}, 0, 0)

Let's compare the time it takes to prepend an element to an equal-sized list and tuple.

{tuple_time, _result} = :timer.tc(fn -> Tuple.insert_at(large_tuple, 0, 0) end)
{list_time, _result} = :timer.tc(fn -> List.insert_at(large_list, 0, 0) end)

%{tuple: tuple_time, list: list_time}

Once again, as expected, prepending to the list is nearly instant, whereas prepending to the tuple takes some time.

Your Turn

In the Elixir cell below, use the :timer.tc/1 function to time how long it takes to prepend a value to a list with 50000 elements.

Example Solution

We can prepend to a list using the [head | tail] syntax.

list = Enum.to_list(1..50000)
:timer.tc(fn -> ["my value" | list] end)

Alternatively we can use List.insert_at/3.

list = Enum.to_list(1..50000)
:timer.tc(fn -> List.insert_at(list, 0, "my value") end)

Generally, [head | tail] is more idiomatic.

In the Elixir cell below, use the :timer.tc/1 function to time how long it takes to prepend a value to a tuple with 50000 elements.

Example Solution

tuple = Enum.to_list(1..50000) |> List.to_tuple()
:timer.tc(fn -> Tuple.insert_at(tuple, 0, 0) end)

Accessing

Accessing an element in a tuple is constant $O(1)$ complexity because tuples are stored contiguously, so the location of any element can be computed directly. Accessing an element in a list is $O(n)$ complexity, where $n$ is the index of the element we need to access.

flowchart TB
subgraph Accessing Memory
lu["Lower Bound: O(1), Upper Bound: O(n)"]
direction LR
  subgraph Memory
    L -- location to --> I -- location to --> S -- location to --> T -- location to --> E[...]
  end
  subgraph Time Complexity
  0["O(1)"] --> L
  1["O(2)"] --> I
  2["O(3)"] --> S
  3["O(4)"] --> T
  4["O(n)"] --> E
  end
end


That means retrieving the head of any list regardless of its size is constant $O(1)$ complexity, and retrieving the last element of the list is $O(n)$ complexity.

Let's compare accessing the first element of a tuple and list of equal size.

{tuple_time, _result} = :timer.tc(fn -> elem(large_tuple, 0) end)

{list_time, _result} =
  :timer.tc(fn ->
    [head | _tail] = large_list
    head
  end)

%{tuple: tuple_time, list: list_time}

As expected, both are very fast. In fact the list is even faster than the tuple. However, this changes drastically when we try to access the last element in the list and tuple.

{tuple_time, _result} = :timer.tc(fn -> elem(large_tuple, 9_999_999) end)

{list_time, _result} = :timer.tc(fn -> Enum.at(large_list, 9_999_999) end)

%{tuple: tuple_time, list: list_time}

Your Turn

In the Elixir cell below, use the :timer.tc/1 function to time how long it takes to access the first element and the last element of a list with 5000 elements.

Then do the same for tuples.
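Example Solution

One possible approach, timing Enum.at/2 on the list and elem/2 on the tuple (index 4999 is the last element of a 5000-element collection):

list = Enum.to_list(1..5000)
tuple = List.to_tuple(list)

{list_first, _} = :timer.tc(fn -> Enum.at(list, 0) end)
{list_last, _} = :timer.tc(fn -> Enum.at(list, 4999) end)
{tuple_first, _} = :timer.tc(fn -> elem(tuple, 0) end)
{tuple_last, _} = :timer.tc(fn -> elem(tuple, 4999) end)

%{list_first: list_first, list_last: list_last, tuple_first: tuple_first, tuple_last: tuple_last}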

Concatenation

Concatenating two tuples requires copying both tuples in memory, therefore its performance cost is $O(n_1 + n_2)$, where $n_1$ is the number of elements in the first tuple and $n_2$ is the number of elements in the second tuple.

However, $O(n_1 + n_2)$ is only the theoretical performance cost. In reality it's higher, because tuples do not support concatenation, and you would first need to convert them into lists.

Why don't tuples support concatenation? Because they are meant to be used as fixed-size containers. José Valim explains:

You can't concatenate tuples. The only reason is that you are not supposed to use them as such. Most of tuple usage requires knowing their size and things get blurrier if you can concatenate them. Furthermore, concatenating tuples requires copying both tuples in memory, which is not efficient. In other words, if you want to concatenate tuples, you may have the wrong data structure. You have two options:

  • Use lists
  • Compose the tuples: instead of a ++ b, just write {a, b}

However, for lists, only the first list must be copied, so concatenation has $O(n_1)$ cost, where $n_1$ is the number of elements in the first list. The copied first list can then simply point to the head of the second list in memory.

For example,

[1, 2, 3] ++ [4, 5]

That means that so long as your first list is small, it's a fairly performant operation, even if the second list is massive. However, it will be expensive if the first list is large.

{small_first_list, _result} = :timer.tc(fn -> [1] ++ large_list end)
{large_first_list, _result} = :timer.tc(fn -> large_list ++ [1] end)

%{small_first_list: small_first_list, large_first_list: large_first_list}

Your Turn

In the Elixir cell below, use the :timer.tc/1 function to time how long it takes to concatenate a list with one element and a list with 5000 elements.

Then try to concatenate a list with 5000 elements to a list with 1 element. Notice how it takes longer.
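Example Solution

One possible approach, concatenating in both orders so the difference is visible:

small = [0]
large = Enum.to_list(1..5000)

{small_first, _} = :timer.tc(fn -> small ++ large end)
{large_first, _} = :timer.tc(fn -> large ++ small end)

%{small_first: small_first, large_first: large_first}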

Updating (Replacing)

In a functional programming language, we do not mutate values in a list or tuple. Instead, we copy values where necessary.

When updating a list, we copy the elements before the updated element, then reuse the ones after it.

list = [1, 2, 3, 4, 5]
List.replace_at(list, 2, 7)

Similarly to accessing an element, that means it's constant $O(1)$ time complexity to update an element at the start of a list, and $O(n)$ complexity to update an element at the end.

Tuples, however, must be copied completely when updated. We can use put_elem/3 to update an element in a tuple.

tuple = {1, 2, 3, 4, 5}
put_elem(tuple, 2, 7)

Let's compare the time it takes to update the first element of a list and tuple of equal size.

{tuple_time, _result} = :timer.tc(fn -> put_elem(large_tuple, 0, 7) end)
{list_time, _result} = :timer.tc(fn -> List.replace_at(large_list, 0, 7) end)

%{tuple: tuple_time, list: list_time}

As expected, the list is fast and the tuple is slow. Now let's compare the time it takes to update the last element of a list and tuple of equal size.

{tuple_time, _result} = :timer.tc(fn -> put_elem(large_tuple, 9_999_999, 7) end)
{list_time, _result} = :timer.tc(fn -> List.replace_at(large_list, 9_999_999, 7) end)

%{tuple: tuple_time, list: list_time}

The list is far slower now that it needs to traverse and copy the entire list, while the tuple takes approximately the same time as updating the first element.

Your Turn

In the Elixir cell below, use :timer.tc/1 to compare updating the first element of a list and tuple with 5000 elements.

In the Elixir cell below, use :timer.tc/1 to compare updating the last element of a list and tuple with 5000 elements.
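Example Solution

One possible approach covering both exercises, using List.replace_at/3 for the list and put_elem/3 for the tuple:

list = Enum.to_list(1..5000)
tuple = List.to_tuple(list)

{list_first, _} = :timer.tc(fn -> List.replace_at(list, 0, 0) end)
{list_last, _} = :timer.tc(fn -> List.replace_at(list, 4999, 0) end)
{tuple_first, _} = :timer.tc(fn -> put_elem(tuple, 0, 0) end)
{tuple_last, _} = :timer.tc(fn -> put_elem(tuple, 4999, 0) end)

%{list_first: list_first, list_last: list_last, tuple_first: tuple_first, tuple_last: tuple_last}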

Inserting

Inserting in a list follows the same pattern as updating a list. We can reuse elements after the insertion, but must copy elements before the insertion.

list = [1, 2, 3, 4, 5]
List.insert_at(list, 3, 7)

Inserting an element in a tuple requires copying the entire tuple.

tuple = {1, 2, 3, 4, 5}
Tuple.insert_at(tuple, 3, 7)

Based on this knowledge, we expect that inserting into a tuple is always $O(n)$ complexity, whereas inserting into a list will be faster the earlier in the list you insert.

We've already demonstrated that prepending to a list has $O(1)$ time complexity. Now let's show that inserting at the end of both a list and a tuple has $O(n)$ time complexity.

{tuple_time, _result} = :timer.tc(fn -> Tuple.insert_at(large_tuple, 10_000_000, 7) end)
{list_time, _result} = :timer.tc(fn -> List.insert_at(large_list, 10_000_000, 7) end)

%{tuple: tuple_time, list: list_time}

Your Turn

In the Elixir cell below, use :timer.tc/1 to time inserting an element at the end of a 50000-element list and tuple.
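Example Solution

One possible approach, inserting at index 50_000, which is one position past the last element of each structure:

list = Enum.to_list(1..50_000)
tuple = List.to_tuple(list)

{list_time, _} = :timer.tc(fn -> List.insert_at(list, 50_000, 0) end)
{tuple_time, _} = :timer.tc(fn -> Tuple.insert_at(tuple, 50_000, 0) end)

%{list: list_time, tuple: tuple_time}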

Deleting

Deleting an element in a list requires copying elements prior to the deletion, and reusing elements after the deletion.

list = [1, 2, 3, 4, 5]
List.delete_at(list, 3)

Deleting an element from a tuple requires copying every remaining element.

Let's prove that deleting any element in the tuple has a similar performance cost, and that deleting the first element in a list is less expensive than deleting the last.

{tuple_time, _result} = :timer.tc(fn -> Tuple.delete_at(large_tuple, 0) end)
{list_time, _result} = :timer.tc(fn -> List.delete_at(large_list, 0) end)

%{tuple: tuple_time, list: list_time}

{tuple_time, _result} = :timer.tc(fn -> Tuple.delete_at(large_tuple, 9_999_999) end)
{list_time, _result} = :timer.tc(fn -> List.delete_at(large_list, 9_999_999) end)

%{tuple: tuple_time, list: list_time}

Copying

When we modify a tuple, the new version of the tuple will contain an entire copy of the tuple differing only in the modified element.

When you modify the $n$th element in a list, the new list will contain a copy of the first $n - 1$ elements, then the modified element, then the tail of the previous list.

flowchart
subgraph Operation on nth element in a list
n[copied elements] --> m[modified element] --> t[tail of previous list]
end

Shallow Copying

When we use the term copy, it's important to be clear that we mean shallow copy.

You likely won't need to concern yourself with this often, but it's useful to be aware of the difference between deep copying and shallow copying, and the performance implications.

When we shallow copy, we copy the reference to the data, rather than the data itself.

A deep copy copies the actual underlying data.

In a shallow copy, primitive values such as 1 will simply be copied as 1. However, data types which contain references will be copied as references.

Shallow Copying Tuples

Let's take an example with tuples.

a = {1, 2, 3}
b = {4, 5, 6}
c = {7, 8, 9}

In the code above, a, b, and c exist on the stack and all contain pointers to actual memory on the heap.

flowchart LR
subgraph Stack
a
b
c
end
subgraph Heap
a --pointer--> a1["{1, 2, 3}"]
b --pointer--> b1["{4, 5, 6}"]
c --pointer--> c1["{7, 8, 9}"]
end

b2 = {10, 11, 12}

a_tuple = {a, b, c}
new_tuple = put_elem(a_tuple, 1, b2)

new_tuple

Under the hood, when the computer copies a_tuple to create new_tuple, it creates a reference to the original variable on the stack, so they share memory on the heap.

flowchart LR
subgraph Stack
  subgraph a_tuple
    a1[a]
    b1[b]
    c1[c]
  end
  subgraph new_tuple
    a2[a] --reference--> a1
    c2[c] --reference--> c1
    b2
  end
end
subgraph Heap
  a1 --pointer --> a3["{1, 2, 3}"]
  b1 --pointer--> b3["{4, 5, 6}"]
  c1 --pointer--> c4["{7, 8, 9}"]
  b2 --pointer--> c5["{10, 11, 12}"]
end


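If you're curious, you can observe this sharing with :erts_debug.size/1 and :erts_debug.flat_size/1, which measure a term's heap size with and without accounting for shared subterms. These are undocumented, internal runtime helpers, so treat this as a rough sketch rather than something to rely on:

a = {1, 2, 3}
b = {4, 5, 6}
c = {7, 8, 9}

a_tuple = {a, b, c}
new_tuple = put_elem(a_tuple, 1, {10, 11, 12})

# size/1 counts shared subterms once, while flat_size/1 counts them as if fully copied.
# A smaller shared size shows that a_tuple and new_tuple share a and c on the heap.
both = [a_tuple, new_tuple]
%{shared: :erts_debug.size(both), flat: :erts_debug.flat_size(both)}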

Garbage Collection

Elixir handles garbage collection for us, so we often don't need to concern ourselves with this implementation detail.

Whenever a value is no longer referenced, it is free to be garbage collected. For example, if we have the variables a, b, and c and then rebind b, the runtime no longer needs to store the data b previously pointed to, and it will be garbage collected.

a = {1, 2, 3}
b = {4, 5, 6}
c = {7, 8, 9}

# {4, 5, 6} is now free to be garbage collected because it is no longer referenced by any variable.
b = {10, 11, 12}

This frees up the tuple {4, 5, 6} for garbage collection, because it is no longer accessible.

Conclusion

Operations on tuples and lists have the following Big $O$ complexity.

[
  [operation: "length", tuple: "O(1)", list: "O(n)"],
  [operation: "prepend", tuple: "O(n)", list: "O(1)"],
  [operation: "insert", tuple: "O(n)", list: "O(n*)"],
  [operation: "access", tuple: "O(1)", list: "O(n*)"],
  [operation: "update/replace", tuple: "O(n)", list: "O(n*)"],
  [operation: "delete", tuple: "O(n)", list: "O(n*)"],
  [operation: "concatenation", tuple: "O(n1 + n2)", list: "O(n1)"]
]
|> Kino.DataTable.new()

n* indicates that $n$ is the index of the operation, rather than the number of elements in the list.

Key Takeaways

  • Tuples are $O(1)$ for reading a value and the length, but require $O(n)$ for all other operations.
  • List operations are generally $O(n)$ complexity where $n$ is the index of the operation, meaning operations early in a list can be very performant and are preferred when possible.
  • Tuples should be used for fixed-size containers.
  • Lists should be used for dynamic-size containers.
  • Functional languages do not allow mutation; instead, they rely on shallow copying to avoid unnecessary memory consumption.


Commit Your Progress

DockYard Academy now recommends you use the latest Release rather than forking or cloning our repository.

Run git status to ensure there are no undesirable changes. Then run the following in your command line from the curriculum folder to commit your progress.

$ git add .
$ git commit -m "finish Lists Vs Tuples reading"
$ git push

We're proud to offer our open-source curriculum free of charge for anyone to learn from at their own pace.

We also offer a paid course where you can learn from an instructor alongside a cohort of your peers. We will accept applications for the June-August 2023 cohort soon.

Navigation