
Pagination

Gustavo De Micheli edited this page Aug 30, 2024 · 1 revision

Pagination in Cassandra works a bit differently than in relational databases, and any API that wants to provide pagination must account for this.

In a relational database a paginated query would use LIMIT and OFFSET to control which page we're fetching, something like this:

-- Get first 10 element page
SELECT * FROM USERS
LIMIT 10

-- Get second 10 element page
SELECT * FROM USERS
LIMIT 10 OFFSET 10

In Cassandra we can specify a limit, that is, how big the page is. We cannot specify an offset from where to continue our pagination, unfortunately.

Iterating Results

Whether we execute a PreparedStatement synchronously or asynchronously, Cassandra always returns results page by page, with each page fetched through a separate network request. This is more evident in its asynchronous API.

Please see [Iterating Results](Iterating Results)
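
The page-by-page fetching described in this section can be modelled with a small sketch. Everything in it is an assumption made for illustration: `Page`, `rowsAcrossPages`, and the integer marker are hypothetical names, and the `fetch` function merely stands in for a network round trip. It is not how the driver is implemented.

```scala
object LazyPagesSketch {

  // One fetched page: its rows, plus a marker for the next request
  // (None once the result set is exhausted). Hypothetical type.
  final case class Page[A](rows: Vector[A], next: Option[Int])

  // Exposes all rows as a single Iterator while calling `fetch` once per page.
  // The first page is fetched immediately; later pages are fetched only when
  // the consumer runs past the rows of the page before them.
  def rowsAcrossPages[A](fetch: Option[Int] => Page[A]): Iterator[A] =
    Iterator
      .iterate(Option(fetch(None))) { previous =>
        // Request another page only if the previous one left a marker.
        previous.flatMap(_.next).map(marker => fetch(Some(marker)))
      }
      .takeWhile(_.isDefined)
      .flatten
      .flatMap(_.rows)
}
```

The caller sees one continuous iterator, yet the number of `fetch` calls equals the number of pages, which mirrors why iterating a large result set can trigger several requests.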

Pager

If we want to paginate over the results of a query we can use a Pager, which can be constructed from a ScalaPreparedStatement:

val query = "SELECT * FROM hotels_by_country WHERE country = ?".toCQL.prepare[String].as[Hotel]
// query: ScalaPreparedStatement1[String, Hotel] = net.nmoncho.helenus.internal.cql.ScalaPreparedStatement1@cd5a5eb

val pager = query.pager("NL")
// pager: Pager[Hotel] = net.nmoncho.helenus.internal.cql.Pager@d6fc560

val (nextPager, firstPage) = pager.execute(pageSize = 2)
// nextPager: Pager[Hotel] = net.nmoncho.helenus.internal.cql.Pager@fd44505
// firstPage: Iterator[Hotel] = non-empty iterator

val (finalPager, secondPage) = nextPager.execute(pageSize = 10)
// finalPager: Pager[Hotel] = net.nmoncho.helenus.internal.cql.Pager@378d712d
// secondPage: Iterator[Hotel] = non-empty iterator
  • A Pager is created from a ScalaPreparedStatement by providing the query parameters we want that query to run with. The statement won't be executed yet, but we must provide the parameters up front (more on this below).
  • To obtain a page, we can call one of the execution methods on the Pager. These methods take the number of results we want for that page.
  • An execution returns the query results as an Iterator, together with the Pager we can use to get the next page.
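
The flow in the points above, where each execution hands back the pager for the next page, can be sketched with an in-memory stand-in. `SimplePager`, `hasMore`, and `allPages` are hypothetical names invented for this example. They only mirror the shape of the API and are not the Helenus Pager.

```scala
import scala.annotation.tailrec

object PagerSketch {

  // Hypothetical, immutable stand-in for a pager over rows held in memory.
  final case class SimplePager[A](remaining: Vector[A]) {
    // True while there are rows that have not been handed out yet.
    def hasMore: Boolean = remaining.nonEmpty

    // Returns the pager positioned after this page, plus the page itself.
    def execute(pageSize: Int): (SimplePager[A], Iterator[A]) = {
      require(pageSize > 0, "pageSize must be positive")
      val (page, rest) = remaining.splitAt(pageSize)
      (SimplePager(rest), page.iterator)
    }
  }

  // Walks every page by always continuing from the pager returned last.
  def allPages[A](start: SimplePager[A], pageSize: Int): Vector[Vector[A]] = {
    @tailrec
    def loop(pager: SimplePager[A], acc: Vector[Vector[A]]): Vector[Vector[A]] =
      if (!pager.hasMore) acc
      else {
        val (next, page) = pager.execute(pageSize)
        loop(next, acc :+ page.toVector)
      }
    loop(start, Vector.empty)
  }
}
```

Because the stand-in is immutable, executing the same pager twice yields the same page. Progress comes only from the pager returned by the previous execution.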

Continuing Paging Execution

Cassandra relies on a PagingState to resume execution at a later time. We can save a PagingState and later create a Pager from it to resume execution:

val Some(pagingState) = nextPager.pagingState
// pagingState: PagingState = 001e001000120010526f7474657264616d2048696c746f6ef07ffffffdf07ffffffd0bacc121391f44d2e77b5d0e6d99d4d60004

val Success(continuedPager) = query.pager(pagingState, "NL")
// continuedPager: Pager[Hotel] = net.nmoncho.helenus.internal.cql.Pager@6cf5564f

val (_, secondPageAgain) = continuedPager.execute(pageSize = 10)
// secondPageAgain: Iterator[Hotel] = non-empty iterator

Serializing PagingState

Imagine a user is paging over the Hotels available in Rotterdam. A Stateless Web App would require a way to send and receive a PagingState so paging can be resumed.

We can do this with a PagerSerializer which takes care of serializing and deserializing a PagingState:

import net.nmoncho.helenus.api.cql.PagerSerializer

implicit val serializer: PagerSerializer[String] = PagerSerializer.DefaultPagingStateSerializer
// serializer: PagerSerializer[String] = net.nmoncho.helenus.api.cql.PagerSerializer$DefaultPagingStateSerializer$@16957e69

val Some(encodedState) = nextPager.encodePagingState
// encodedState: String = "001e001000120010526f7474657264616d2048696c746f6ef07ffffffdf07ffffffd0bacc121391f44d2e77b5d0e6d99d4d60004"

val Success(anotherContinuedPager) = query.pager(encodedState, "NL")
// anotherContinuedPager: Pager[Hotel] = net.nmoncho.helenus.internal.cql.Pager@350a15e4

val (_, secondPageAnotherTime) = anotherContinuedPager.execute(pageSize = 10)
// secondPageAnotherTime: Iterator[Hotel] = non-empty iterator

We can plug in our own implementation if we need more sophisticated serialization.
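
As one idea of what a custom serialization might do, the sketch below turns a hex-encoded state (like the `encodedState` shown earlier) into a shorter, URL-safe token and back. `CompactState` and its methods are hypothetical helpers built only on `java.util.Base64`. They do not implement `PagerSerializer`, whose exact signatures should be checked in the Helenus sources before wiring anything like this in.

```scala
import java.util.Base64
import scala.util.Try

object CompactState {
  private val encoder = Base64.getUrlEncoder.withoutPadding()
  private val decoder = Base64.getUrlDecoder

  // Parses a hex string into bytes; fails on odd length or non-hex characters.
  def hexToBytes(hex: String): Try[Array[Byte]] = Try {
    require(hex.length % 2 == 0, "hex string must have an even length")
    hex.grouped(2).map { pair =>
      val hi = Character.digit(pair(0), 16)
      val lo = Character.digit(pair(1), 16)
      require(hi >= 0 && lo >= 0, s"not a hex pair: $pair")
      ((hi << 4) | lo).toByte
    }.toArray
  }

  // Renders bytes as lowercase hex, two characters per byte.
  def bytesToHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString

  // Hex state -> URL-safe Base64 token (roughly a third shorter than hex).
  def toToken(hex: String): Try[String] =
    hexToBytes(hex).map(bytes => encoder.encodeToString(bytes))

  // Token -> hex state; a tampered or truncated token can fail, hence the Try.
  def fromToken(token: String): Try[String] =
    Try(decoder.decode(token)).map(bytesToHex)
}
```

A web handler could send `toToken(encodedState)` to the client and, on the next request, pass the result of `fromToken` to the pager. The token is encoded, not encrypted or signed, so it should still be treated as untrusted input.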

Why do we have to provide query parameters at construction?

The Helenus API requires users to specify which query parameters will be used before the query is actually executed. While this seems like a limitation, it's actually a way to prevent errors.

By Cassandra's design:

The paging state can only be reused with the exact same statement (same query string, same parameters). It is an opaque value that is only meant to be collected, stored and re-used. If you try to modify its contents or reuse it with a different statement, the results are unpredictable.

This means that all three (query, parameters, and PagingState) are strongly tied together.

Ideally we would be able to provide a single Pager that allows querying different pages using different query parameters from a single query. Instead, we must define an API that doesn't let users alter the query parameters that were initially provided.
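
To make the design constraint concrete, here is a minimal sketch of an API where parameters are fixed at construction and a resume token is rejected when it does not belong to the same query and parameters. `BoundPager`, `Token`, and the `hashCode`-based fingerprint are assumptions made up for this illustration. They are not how Helenus or the driver validate a PagingState.

```scala
import scala.util.{Failure, Success, Try}

object BoundPagerSketch {

  // Hypothetical resume token: an opaque position plus a fingerprint of the
  // query and parameters it was produced for.
  final case class Token(fingerprint: Int, position: Int)

  // The constructor is private and there are no setters, so the parameters
  // chosen up front cannot change between pages.
  final class BoundPager[P] private (query: String, params: P, position: Int) {
    def token: Token = Token(BoundPager.fingerprint(query, params), position)

    // Advancing only moves the position; query and params are carried over.
    def advance(by: Int): BoundPager[P] =
      new BoundPager(query, params, position + by)
  }

  object BoundPager {
    private def fingerprint[P](query: String, params: P): Int =
      (query, params).hashCode

    def start[P](query: String, params: P): BoundPager[P] =
      new BoundPager(query, params, 0)

    // Resuming succeeds only when the token matches the query and parameters.
    def resume[P](query: String, params: P, token: Token): Try[BoundPager[P]] =
      if (token.fingerprint == fingerprint(query, params))
        Success(new BoundPager(query, params, token.position))
      else
        Failure(new IllegalArgumentException("token does not match query and parameters"))
  }
}
```

With this shape, the only way to page with different parameters is to start a new pager, which keeps query, parameters, and resume state consistent by construction.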