
run_transaction backwards compatibility #43

Closed
xarg opened this issue Nov 10, 2017 · 17 comments

xarg commented Nov 10, 2017

I understand the motivation behind the conflict handling and the need to use a callback, but for existing projects this is going to take a lot of work before they can switch from PG to CRDB.

Are there any plans to add a backwards-compatibility mode where conflicts are not handled but a warning is raised instead? That way the migration from PG to CRDB could be done progressively.

xarg changed the title from "run_transaction temporary compatibilyt" to "run_transaction backwards compatibility" on Nov 10, 2017

tbg commented Nov 10, 2017

@xarg it would be nice to be able to do that, but not handling conflicts is not an option, as the resulting database would not behave in a well-defined way (by any definition). Likewise, just "not carrying out" the operation is problematic.

Please let us know if that's not what you had in mind. Note that as an (inferior) workaround, you can also just run the transaction invocation in a loop as long as it returns the respective failure. The latter is roughly what existing apps "should" already do if they use Postgres in SERIALIZABLE mode (I realize that conflicts are rarer there, so few real-world apps actually handle them).
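For illustration, a minimal sketch of that workaround (naive_retry is a hypothetical helper; the error check mirrors how psycopg2 reports SQLSTATE 40001):

import psycopg2
import psycopg2.errorcodes
import sqlalchemy.exc

def naive_retry(sessionmaker, work):
    # "Undirected" retry: every iteration starts a brand-new transaction.
    while True:
        session = sessionmaker()
        try:
            result = work(session)
            session.commit()
            return result
        except sqlalchemy.exc.DatabaseError as e:
            session.rollback()
            if (isinstance(e.orig, psycopg2.OperationalError) and
                    e.orig.pgcode == psycopg2.errorcodes.SERIALIZATION_FAILURE):
                continue  # retryable conflict; run the whole transaction again
            raise
        finally:
            session.close()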


xarg commented Nov 10, 2017

Thanks for clarifying, @tschottdorf.

I believe I understand a bit better why this is not a good idea for CRDB especially after reading this article: https://www.cockroachlabs.com/docs/stable/transactions.html#transaction-retries

Wouldn't running the transaction in a loop lead to similar behavior to the run_transaction implementation? In that case I'll just use run_transaction, since it's the canonical way to do it.

I was also wondering why run_transaction was not implemented as a context manager. In my opinion it's a much better fit (and nicer syntax) than using callbacks.

Instead of:

def callback(session):
    return session.query(M).all()
return run_transaction(sessionmaker, callback)

This:

with run_transaction(sessionmaker) as session:
    return session.query(M).all()

Is there a particular reason to use callbacks?


xarg commented Nov 10, 2017

Closing this for now. There is a better context manager discussion here: #25

xarg closed this as completed Nov 10, 2017

tbg commented Nov 10, 2017

@xarg #25 is indeed a better issue to discuss the session, thanks! The problem with

Wouldn't running the transaction in a loop lead to similar behavior to the run_transaction implementation? In that case I'll just use run_transaction, since it's the canonical way to do it.

is that the "undirected retry" (i.e. the loop) will create a new transaction in each iteration, which is less effective when conflicts do occur. For example, it can make it more likely that transactions are starved (i.e. have to restart repeatedly, without getting closer to committing).

bdarnell commented

#25 isn't really about using context managers, even though some of the discussion uses that syntax. It's about whether the commit() call should appear in the application code or not.

The reason it's not a context manager is that context managers cannot perform retries: they execute their bodies zero or one times. There's nothing in Python that combines reusable code objects with exception handling. You need either a def or a loop to have code that can be repeated, and then something else to handle exceptions. A non-function-based interface would have to look something like:

for attempt in run_transaction(sessionmaker):
    with attempt as session:
        return session.query(M).all()

There are some things about this syntax that I like (everything's "in order" and I don't need to name a function), and some things I don't (this is a very unusual pattern and there's no "function signature" to guide you into it, and there's an extra horizontal indent).
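For what it's worth, a rough sketch of how such a for-loop interface could be implemented (hypothetical names, not the actual cockroachdb package API; the loop body is expected to return or break after a successful attempt, as in the example above):

import psycopg2
import psycopg2.errorcodes
import sqlalchemy.exc

class _Attempt(object):
    """One attempt at the transaction; swallows retryable errors."""

    def __init__(self, session):
        self.session = session

    def __enter__(self):
        return self.session

    def __exit__(self, typ, value, tb):
        if typ is None:
            self.session.commit()
            return False
        self.session.rollback()
        # Returning True suppresses the exception, so the enclosing
        # for-loop simply moves on to the next attempt.
        return (isinstance(value, sqlalchemy.exc.DatabaseError) and
                isinstance(value.orig, psycopg2.OperationalError) and
                value.orig.pgcode ==
                psycopg2.errorcodes.SERIALIZATION_FAILURE)

def attempts(sessionmaker):
    # Unlike a context manager, a generator can yield any number of
    # times, so the loop body can be re-executed after a failure.
    session = sessionmaker()
    try:
        while True:
            yield _Attempt(session)
    finally:
        session.close()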


xarg commented Nov 13, 2017

The reason it's not a context manager is that context managers cannot perform retries: they execute their bodies zero or one times.

Since you can reuse the same session even if the transaction has failed, I came up with this code (look at the cockroachdb_transaction context manager). I'm not sure if it would work with SERIALIZATION_FAILURE; I'm trying to figure out how to test this.

from contextlib import contextmanager

import psycopg2
import psycopg2.errorcodes
import sqlalchemy.engine
import sqlalchemy.exc
import sqlalchemy.orm

from cockroachdb.sqlalchemy.dialect import savepoint_state

from xxx import cockroachdb_session


class _NestedTransaction(object):
    """Wraps begin_nested() to set the savepoint_state thread-local.

    This causes the savepoint statements that are a part of this retry
    loop to be rewritten by the dialect.
    """

    def __init__(self, session):
        self.session = session

    def __enter__(self):
        try:
            savepoint_state.cockroach_restart = True
            self.txn = self.session.begin_nested()
            if isinstance(self.session, sqlalchemy.orm.Session):
                # Sessions are lazy and don't execute the savepoint
                # query until you ask for the connection.
                self.session.connection()
        finally:
            savepoint_state.cockroach_restart = False
        return self

    def __exit__(self, typ, value, tb):
        try:
            savepoint_state.cockroach_restart = True
            self.txn.__exit__(typ, value, tb)
        finally:
            savepoint_state.cockroach_restart = False


@contextmanager
def cockroachdb_transaction():
    """A cockroach db transaction context manager"""

    with cockroachdb_session.begin():
        while True:
            try:
                with _NestedTransaction(cockroachdb_session):
                    yield cockroachdb_session
                    break
            except sqlalchemy.exc.DatabaseError as e:
                if isinstance(e.orig, psycopg2.OperationalError):
                    if e.orig.pgcode == psycopg2.errorcodes.SERIALIZATION_FAILURE:
                        continue
                raise

This is how I use it:

with cockroachdb_transaction() as session:
    session.query(Model).all()

bdarnell commented

This won't work: @contextmanager assumes that cockroachdb_transaction() will yield exactly once. We have a special function for testing this: SELECT crdb_internal.force_retry('100ms') will fail with a serialization error if the current time is within 100ms of the transaction start time.
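For example (a minimal sketch, assuming the cockroachdb dialect is installed and a local cluster is running; the connection URL is a placeholder):

import sqlalchemy

engine = sqlalchemy.create_engine("cockroachdb://root@localhost:26257/test")

with engine.connect() as conn:
    with conn.begin():
        # Fails with a serialization error whenever it executes within
        # 100ms of the transaction's start, exercising the retry path.
        conn.execute(sqlalchemy.text(
            "SELECT crdb_internal.force_retry('100ms')"))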


xarg commented Nov 13, 2017

Thanks @bdarnell, I see what you mean. Sorry if this is too basic a question, but why can't the retry be implemented at the session level? Was it tried and SQLAlchemy doesn't allow it, or has it just not been tried?

What I mean by session level is having something like:

s = Session()
try:
    x = s.query(M).first()
    x.attr = 1
    s.add(x)
    s.commit()  # retry here? the session object already has all the changes
except NotAConflictException:
    s.rollback()
finally:
    s.close()

bdarnell commented

On the second try, x = s.query(M).first() may return different results than it did the first time, and then s.add(x) would do something different. There's no way to know this without re-running your application's code.

Note that if you use the query builder interface (or raw SQL) instead of the ORM, you may be able to construct a single SQL statement that does what you want. If so, the cockroach server can take care of all the retries for you, and you shouldn't see serialization failures being raised to the application.
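For instance (a sketch using SQLAlchemy Core; the users table and connection URL are made up for illustration), the read-modify-write collapses into a single statement:

import sqlalchemy as sa

engine = sa.create_engine("cockroachdb://root@localhost:26257/test")

metadata = sa.MetaData()
users = sa.Table(
    "users", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String),
)

# The read and the write live in one UPDATE with no application logic
# in between, so the server can retry the whole thing transparently.
with engine.begin() as conn:
    conn.execute(
        users.update()
        .where(users.c.id == 1)
        .values(name=users.c.name + " (updated)"))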


xarg commented Nov 13, 2017

Thanks again for your response (and for your patience with me). I still don't understand it well, so let me try to illustrate this another way.

Suppose I have 2 async workers (workers A and B). They both try to update the same row (a user's full name).

s = Session()
try:
    u = s.query(User).get(user_id)
    u.name = 'Joe ' + worker_name
    s.add(u)
    s.commit()
except NotAConflictException:
    s.rollback()
finally:
    s.close()

Now A and B both get there and select the user row; A is first to update the name and do a RELEASE SAVEPOINT, while B is behind and gets the conflict error.

In PG, worker B will just wait until the lock on that row is released and then update it with the new value (overwriting the previous one), so in this scenario the name will be Joe B. I guess it's open for discussion whether this is 'correct' or not. In some cases this is fine; in others, if I wanted to make sure there is no accidental overwriting, I would use SELECT ... FOR UPDATE or some other synchronization method.

If I've understood our discussion correctly, in CRDB the above would raise a conflict and allow the app to retry the same transaction, (optionally) checking that it's not overwriting anything. I'm also guessing this is exacerbated in CRDB because it needs to sync between nodes, so it could be that it's not really a different client doing work on that row but a CRDB node, which makes the conflict more likely.

While I think that is the 'correct' way to do it (probably for any DB), it's verbose, and I wonder why not use automatic retries by default and also allow the developer to decide where to use the retry logic. Does that make sense?

bdarnell commented

I guess it's open for discussion whether this is 'correct' or not. In some cases this is fine; in others, if I wanted to make sure there is no accidental overwriting, I would use SELECT ... FOR UPDATE or some other synchronization method.

This is a question of transaction isolation levels. Postgres defaults to READ COMMITTED, a very weak level of isolation which permits this kind of overwriting. CockroachDB defaults to SERIALIZABLE, which does not. For more on why this is important, see this blog post.

The retry loop is not technically cockroach-specific: if you changed the isolation level to SERIALIZABLE in postgres, you would get exactly the same restart behavior and need to use a loop like this. However, it's a much bigger issue for cockroach than postgres because of our defaults (we don't even support READ COMMITTED, so you can't just reduce the isolation level to avoid a restart here).
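To reproduce that in Postgres (a minimal sketch; isolation_level is a standard create_engine parameter, and the URL is a placeholder):

import sqlalchemy

# The same workload against vanilla Postgres, but under SERIALIZABLE:
# conflicting concurrent transactions now fail with SQLSTATE 40001
# (serialization_failure), the same class of error CockroachDB surfaces.
pg_engine = sqlalchemy.create_engine(
    "postgresql://localhost/test",
    isolation_level="SERIALIZABLE",
)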

While I think that is the 'correct' way to do it (probably for any DB), it's verbose, and I wonder why not use automatic retries by default and also allow the developer to decide where to use the retry logic. Does that make sense?

We can only do automatic retries when we can see the entire transaction in the SQL (this generally requires constructing the queries by hand instead of using an ORM). We can't retry if the transaction is a combination of SQL and logic in your application code.

Note that you may not need to modify the way you run transactions if your application handles retries at another level. For example, in an offline-first mobile app you probably do all your writes asynchronously from the user's perspective and can tolerate network outages, so it doesn't matter if the server-side request returns an error when there is a transaction conflict.

alanhamlett commented

We can only do automatic retries when we can see the entire transaction in the SQL (this generally requires constructing the queries by hand instead of using an ORM). We can't retry if the transaction is a combination of SQL and logic in your application code.

How about detecting when someone uses nested transactions and disabling auto-retries? Or allowing the user to disable auto-retries and in that case supporting nested transactions?

bdarnell commented

How about detecting when someone uses nested transactions and disabling auto-retries?

Unfortunately it doesn't work in that order. The retry protocol requires a SAVEPOINT statement to be issued before we know what the application code is going to do.

Or allowing the user to disable auto-retries and in that case supporting nested transactions?

You don't have to use run_transaction; you can control the transaction yourself if you want (so you can't get nesting, but you have one level of transaction under your control). But if you do that you must be aware of the chance of retryable errors (which are common in 1.1, but less so in the upcoming 2.0) and retry as needed.

alanhamlett commented

Oh, so one level of begin_nested already works since I'm not importing and using cockroachdb.sqlalchemy.run_transaction?

bdarnell commented

No, begin_nested is not supported. You get one level of begin and commit if you avoid using cockroachdb.sqlalchemy.run_transaction.


alanhamlett commented Mar 20, 2018

Then Flask-SQLAlchemy is already using that one level in its request session, so I can't use transactions.

I'm on v2.0-beta.20180319 without using run_transaction and haven't noticed any failed statements. Are they more common under heavy load or with many nodes? I'm using only one node and haven't fully ramped up the production load yet.

bdarnell commented

Retryable errors mainly occur in the presence of concurrent transactions that interfere with each other.
