Connections & Pooling

bchavez edited this page Jun 10, 2019 · 86 revisions

The C# driver supports two types of connections.

  • A single connection to a RethinkDB server.
  • A set of pooled connections to a RethinkDB cluster.

The advantage of connection pooling is two-fold: redundancy and load balancing. When a connection pool is used, and a connection to a node is interrupted, subsequent queries get routed to different nodes in the cluster. Additionally, various pooling strategies can be used to distribute workloads among nodes in a cluster.

Failure Scenarios

The C# driver will not retry a failed query under any circumstance (single or pooled connections). A failed query can result from a network communication error when connectivity to a node is lost. The moment the underlying network connection fails, the following occurs:

  • If there are multiple threads waiting to write to the network stream of a down node, an exception is thrown on each of those threads.
  • If there are multiple threads awaiting responses from a down node, an exception (via faulted task) is thrown on those awaiting tasks. This applies to cursors and changefeeds.

In each failure case as described above, when a node goes down, it is the responsibility of the developer to retry a query if needed. Under a connection pool scenario, as a best practice, wait at least 1.5 seconds before retrying a failed query. Adding a delay ensures a higher probability of success on the next retry and allows time for the connection pool supervisor to mark other dead nodes. When a node is marked dead, the dead node is skipped from selection until the node's connection can be re-established by the supervisor. In the case when all nodes are down in a connection pool, all queries will fail.

Tip: Polly is a .NET library that can help write Circuit Breaker and Wait/Retry code.
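
For applications that do not use Polly, the wait-and-retry guidance above can be sketched by hand. The helper below is hypothetical and not part of the driver: the names QueryRetry and RunWithRetryAsync are made up for illustration, and catching every Exception is an assumption that real code would likely narrow to connection-related failures.

```csharp
using System;
using System.Threading.Tasks;

public static class QueryRetry
{
    // Hypothetical helper (not a driver API): runs `query` and, if it throws,
    // waits `delay` before trying again, for at most `maxAttempts` attempts.
    public static async Task<T> RunWithRetryAsync<T>(
        Func<Task<T>> query, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await query().ConfigureAwait(false);
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Pause so the pool supervisor has time to mark dead nodes.
                // On the final attempt the filter is false and the exception propagates.
                await Task.Delay(delay).ConfigureAwait(false);
            }
        }
    }
}
```

A caller would wrap the query in a lambda and pass TimeSpan.FromSeconds(1.5) as the delay, following the recommendation above. Since a retried write may be applied twice if the first attempt reached the server, a helper like this is safest for reads and idempotent writes.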

Async Connect

Each .Connect() method has an associated .ConnectAsync() for asynchronous establishment of a connection.

Connection Lifetime

Initializing a RethinkDB connection is expensive. A single Connection to a RethinkDB server or a ConnectionPool to a RethinkDB cluster should be created and connected once, at application start-up, and kept for the duration of the application's lifetime. A single Connection and a ConnectionPool are both designed to be used by multiple concurrent threads. Additionally, reusing a single Connection or ConnectionPool after a query logic error is safe as long as the network connection is still functional. Applications using Inversion of Control (IoC) containers should register a single Connection or a ConnectionPool with a singleton lifetime.

Encrypted Server Connections

The driver supports connecting to a RethinkDB server using SSL/TLS. More info about connecting to RethinkDB server using SSL/TLS and dual-licensing use can be found here.


Single Connection (No Pooling)

To create one connection to a RethinkDB server (without connection pooling):

var R = RethinkDb.Driver.RethinkDB.R;
var conn = R.Connection()
            .Hostname("192.168.0.100") // Hostnames and IP addresses work.
            .Port(28015) // .Port() is optional. Default driver port number.
            .Timeout(60)
            .Connect();
  
var result = R.Now().Run<DateTimeOffset>(conn);

In the example above, the connection object conn should be configured as a singleton in an IoC container. The RethinkDb.Driver.Net.IConnection interface is a convenient interface for registering the conn object instance with an IoC container.

Additionally, multiple threads can share the connection conn variable. The connection is ready to be used when the .Connect() method returns conn. Connection management is explicit: if the node fails, the failed connection must be re-established manually by calling .Reconnect().

Lastly, the root query object variable R, as shown in the example above, should not be configured with an IoC container. The variable R is in no way associated with the connection object conn. R is a free-standing variable (or static) that can be accessed anywhere by multiple threads in an application. In fact, the variable R can be used across logically separated database clusters to compose queries. Simply reference the static field RethinkDb.Driver.RethinkDB.R to begin writing a query. As standard practice, store a reference to RethinkDb.Driver.RethinkDB.R in a variable, field, or property named R as a shorter way to compose ReQL queries.


Connection Pooling

Note: Currently, the Java driver does not support connection pooling. The following APIs are subject to change.

Note: The ConnectionPool does not allow hostnames as seeds for several reasons outlined here. Applications that require DNS hostname seeds must resolve hostnames to IP addresses before seeding the driver's ConnectionPool. Applications may experience high CPU load in .NET Core / Linux environments when resolving hostnames with DNS; please read this GOTCHA! for more information.
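
As a rough illustration of resolving hostnames before seeding, the sketch below uses the standard System.Net.Dns class to turn hostnames into "ip:port" strings in the format the seed examples on this page use. SeedResolver and ResolveSeeds are hypothetical names, and keeping only IPv4 addresses is an assumption made to match the addresses shown on this page.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

public static class SeedResolver
{
    // Hypothetical helper (not a driver API): resolves each hostname and
    // formats the resulting addresses as "ip:port" seed strings.
    public static string[] ResolveSeeds(string[] hostnames, int port = 28015)
    {
        return hostnames
            .SelectMany(host => Dns.GetHostAddresses(host))
            // Assumption: keep IPv4 addresses only.
            .Where(ip => ip.AddressFamily == AddressFamily.InterNetwork)
            .Select(ip => $"{ip}:{port}")
            .Distinct()
            .ToArray();
    }
}
```

The returned array can be passed to .Seed(). Resolving once at start-up, rather than per query, keeps DNS lookups off the hot path.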

Round-Robin Strategy

To create a connection pool using a round-robin node selection strategy:

var R = RethinkDb.Driver.RethinkDB.R;
var conn = R.ConnectionPool()
            .Seed(new[] {"192.168.0.11:28015", "192.168.0.12:28015"})
            .PoolingStrategy(new RoundRobinHostPool())
            .Discover(true)
            .Connect();

var result = R.Now().Run<DateTimeOffset>(conn);

The connection can be used by multiple threads. The .Connect() method returns when at least one connection to a node has been successfully established.

In the example above, the connection object conn should be configured as a singleton in an IoC container. The RethinkDb.Driver.Net.IConnection interface is a convenient interface for registering the conn object instance with an IoC container.

The root query object variable R, as shown in the example above, should not be configured with an IoC container. The variable R is in no way associated with the connection object conn. R is a free-standing variable (or static) that can be accessed anywhere by multiple threads in an application. In fact, the variable R can be used across logically separated database clusters to compose queries. Simply reference the static field RethinkDb.Driver.RethinkDB.R to begin writing a query. As standard practice, store a reference to RethinkDb.Driver.RethinkDB.R in a variable, field, or property named R as a shorter way to compose ReQL queries.

The .Seed() method seeds the driver with IP addresses of well-known cluster nodes. With .Discover(true), the driver attempts to discover new nodes when nodes are added to the RethinkDB cluster. The driver does this by setting up a changefeed on a system table and listening for changes. Additionally, pre-existing nodes that were not part of the seed list are automatically discovered and connected to when .Discover(true) is set.

The .PoolingStrategy(new RoundRobinHostPool()) provides a round-robin node selection strategy.

There are two constructor arguments for RoundRobinHostPool that control reconnection intervals. When a host goes down, the pool supervisor waits for a time span of retryDelayInitial before reconnecting to the node. If a reconnection attempt fails, the retry delay doubles. Doubling stops once the doubled delay would exceed retryDelayMax; after that, every subsequent retry waits retryDelayMax.

  • retryDelayInitial - The initial retry delay when a host goes down, i.e. how long to wait before the first reconnection attempt. The default is 30 seconds.
  • retryDelayMax - The maximum retry delay. The default is 15 minutes.
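
The doubling rule described above can be modeled in a few lines. This is a sketch of the behavior as this page describes it, not the driver's internal code, and RetrySchedule and Delays are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RetrySchedule
{
    // Hypothetical model of the reconnection delays described above:
    // start at retryDelayInitial, double after each failed attempt,
    // and cap at retryDelayMax once doubling would exceed it.
    public static IEnumerable<TimeSpan> Delays(
        TimeSpan retryDelayInitial, TimeSpan retryDelayMax, int attempts)
    {
        var delay = retryDelayInitial;
        for (var i = 0; i < attempts; i++)
        {
            yield return delay;
            var doubled = TimeSpan.FromTicks(delay.Ticks * 2);
            delay = doubled > retryDelayMax ? retryDelayMax : doubled;
        }
    }
}
```

With the defaults of 30 seconds and 15 minutes, the first seven delays in this model are 30 s, 1 min, 2 min, 4 min, 8 min, 15 min, and 15 min.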

Recommended Use

The Round Robin selection strategy works well as a general algorithm for applications. Round Robin selection adds little overhead when selecting nodes to run queries. Developers should start with the Round Robin selection strategy until an alternative node selection strategy is needed.

Epsilon Greedy Strategy

Another connection pooling strategy is Epsilon Greedy. Epsilon Greedy is an algorithm that allows the connection pool to learn about "better" servers based on speed and how well they perform. This algorithm gives a weighted request rate to better-performing nodes in the cluster, while still distributing requests to all nodes (proportional to their performance).

A good overview of Epsilon Greedy is available here.

To set up an Epsilon Greedy host pool:

// Passing null selects the default decayDuration.
TimeSpan? decayDuration = null;
var conn = R.ConnectionPool()
       .Seed(new[] { "192.168.0.11:28015" })
       .PoolingStrategy(
           new EpsilonGreedyHostPool(decayDuration, EpsilonCalculator.Linear())
       )
       .Discover(true)
       .Connect();

  • decayDuration - The amount of time to cycle through all EpsilonBuckets (0...120). The decay duration is divided evenly among the EpsilonBuckets (default: 5 min / 120 buckets = 2.5 seconds per bucket); i.e., an average is taken every decayDuration/EpsilonBuckets seconds. To use the default decayDuration, pass null for this argument.
  • EpsilonCalculator - Given the weighted average among EpsilonBuckets slot measurements, calculate the node's EpsilonValue using EpsilonCalculator.Linear/Logarithmic/Polynomial(exponent).
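
The bucket arithmetic above can be checked with a small calculation. The sketch below only reproduces the division described on this page; EpsilonBucketMath and BucketInterval are hypothetical names, and the bucket count (120) and default duration (5 minutes) are taken from the text rather than read from the driver.

```csharp
using System;

public static class EpsilonBucketMath
{
    // Bucket count as stated in the text above (not read from the driver).
    public const int EpsilonBuckets = 120;

    // Hypothetical helper: the time span covered by one bucket.
    // A null decayDuration stands for the 5 minute default mentioned above.
    public static TimeSpan BucketInterval(TimeSpan? decayDuration)
    {
        var duration = decayDuration ?? TimeSpan.FromMinutes(5);
        return TimeSpan.FromTicks(duration.Ticks / EpsilonBuckets);
    }
}
```

Under these assumptions, a 5 minute decay duration gives 2.5 seconds per bucket, and doubling the decay duration to 10 minutes doubles the bucket interval to 5 seconds.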

Recommended Use

Consider the decayDuration value carefully before using the Epsilon Greedy selection strategy. For example, when the decayDuration is 5 minutes, the algorithm expects the average sustained request rate to the RethinkDB cluster to exceed 1 request per second per node. The rationale follows from the decayDuration itself: it decays (zeroes) averages over a period of time. If the average request rate per node is below one request per second, the application does not benefit from the Epsilon Greedy selection strategy; the algorithm then behaves much like purely random selection, with unnecessary computational overhead. The overhead comes from calculating the EpsilonValue for every node in the cluster.