Skip to content
Tom Larkworthy edited this page Apr 11, 2014 · 12 revisions

Sends an item from user account to another, with the following features:

  • either user can abort
  • no deadlocks exists, even if one player goes offline
  • formally verified, items can never be created nor destroyed

source hsm, compiled rules

Design Sketch

the ordered stages of a successful transfer is:

  • IDLE: both players start in the idle state
  • TX: A states they want to send their item to B
  • RX: B states they want to receive the item from A
  • ACK_RX: A acknowledges B
  • ACK_TX: B acknowledges A
  • Boom, commit inventory updates

Send item interaction diagram

Prepare to send

For example, we want a player to be able to go from the IDLE state, to the TX state, if, and only if, they:

  • are authenticated
  • are in the IDLE state
  • are heading to the TX state
  • have an item to transfer
  • set a valid player target to transfer to, in tx_ptr
  • are not sending to themselves (not strictly necessary)
  • move their item from the item slot, to the tx_itm slot, by nulling item and setting tx_itm
  • don't mess around with rx_itm or rx_ptr

This can be expressed in a Firebase ".write" rule as follows:

(   //1: IDLE -> TX, self
    ($user == auth.username)                /*type  */
    && data.child('state').val() == 'IDLE'  /*from  */ 
    && newData.child('state').val() == 'TX' /*to    */ 
    && ( /*guards*/
    	data.child('item').val() != null &&
		root.child('users').child(newData.child('tx_ptr').val()).child('state').val()!=null &&
		newData.child('tx_ptr').val() != $user  		
	) && ( /*effect*/
		newData.child('tx_itm').val() == data.child('item').val() &&
		newData.child('tx_ptr').val() != null &&
		newData.child('item').val() == null
	)
	&& newData.child('rx_ptr').val() == data.child('rx_ptr').val() //lock for rx_ptr
	&& newData.child('rx_itm').val() == data.child('rx_itm').val() //lock for rx_itm
)

Prepare to receive

For a player to prepare to receive they must -be authenticated -in the IDLE state -heading to RX state -not have an item already -be receiving from a player that is sending to them -update the rx_itm record with the incoming item -not mess around with

This is another possible direction a players record could travel to, so the following boolean clause should be ORed with the ".write" rule above

( //2: IDLE -> RX, self
	($user == auth.username)                /*type  */
	&& data.child('state').val() == 'IDLE'  /*from  */
	&& newData.child('state').val() == 'RX' /*to    */
	&& ( /*guards*/
		data.child('item').val() == null &&
		$user == root.child('users').child(
            newData.child('rx_ptr').val()).child('tx_ptr').val()
	) && ( /*effect*/
		newData.child('rx_itm').val() == 
            root.child('users').child(newData.child('rx_ptr').val()).child('tx_itm').val() &&
		newData.child('rx_ptr').val() != null
	)
	&& newData.child('item').val() == data.child('item').val()     //lock for item
	&& newData.child('tx_ptr').val() == data.child('tx_ptr').val() //lock for tx_ptr
	&& newData.child('tx_itm').val() == data.child('tx_itm').val()
)

As you can see. Many of the clauses are boiler plate for expressing the starting and ending state, the authentication, and locking variables down that are not relevant. If you forget to do any of these things, the player will be able cheat. Firesafe lets you express the state machine logic for the boilerplate code, allowing you to concentrate on the tricky parts, the guard conditions (logical constraints that prevent a state transition) and effects (things that change the variables).

In Firesafe, we express the overall JSON structure of the Firebase as normal. Firesafe looks for special variables to expand into state machined, namely:

  • ".variables" -- which identifies which variables are to be locked
  • ".states" -- the possible states the machine can be in
  • ".transition_types" -- the different kinds of transitions that may exist, typically authentication rules
  • ".transitions" -- the transitions of the state machine

After the players have transitioned to RX and TX. The next stage is for the transmitter to ACK, then the receiver should ACK too. I felt the sender ACKing the receiver, is really a progression of the receivers state. Thus for our protocol, ACK_RX follows RX, but can only be performed by the sender.

We have three different transition types in the 2PC implementation, indicating whether the user is altering their own record, the person they are trading with, or either party.

".transition_types":{
	"self":"$user == auth.username",
	"other":"auth.username == data.child('rx_ptr').val() || 
             auth.username == data.child('tx_ptr').val()",
	"either":"$user == 
              auth.username || 
              auth.username == data.child('rx_ptr').val() || 
              auth.username == data.child('tx_ptr').val()"
}

Acknowledgment

So the implementation of the ACKing in Firesafe ".transitions" structure this looks like this

".transitions":{
	...
	"ACK_RX":{"from":"RX", "to":"ACK_RX", type:"other",
		"guard":"
			root.child('users').child(data.child('rx_ptr').val())
                .child('state').val() == 'TX' && 
			root.child('users').child(
                data.child('rx_ptr').val()).child('tx_ptr').val() == $user" //IMPORTANT
	},"ACK_TX":{"from":"TX", "to":"ACK_TX", type:"other",
		"guard":"
			root.child('users').child(
                newData.child('tx_ptr').val()).child('state').val() == 'ACK_RX'"
	} 
}

Quite a number of bugs were discovered during formal verification of this protocol around ACKing. In particular, without the proper guards following the rx_ptr, and tx_ptr pointers, it was possible for

  • Bill to send to John (Bill:TX, John:RX)
  • Bill rollback, then send to Fred (Bill: TX, John: RX, Fred: RX)
  • John ACK_RX, Fred ACK_RX and Bill ACK_TX).

John and Fred could then both commit, and both parties ended with Bill's item. A classic dupe exploit!

Committing

Actually committing the transaction is relatively straight forward. The only important thing is that after both parties ACK, either party is granted access to push the final state transitions through on each other. This avoids the problem of one player disconnecting and blocking the transaction. If that happens, the other player can update their record and does not get stranded in transaction limbo. Thus the transition type is set to either

"COMMIT_TX":{"from":"ACK_TX", "to":"IDLE", type:"either",
	"guard":"
		root.child('users').child(data.child('tx_ptr').val()).child('state').val() == 'ACK_RX'",
	"effect":"
		newData.child('item').val() == null &&
		newData.child('tx_ptr').val() == null &&
		newData.child('tx_itm').val() == null"
}

Rolling back

Before both parties ACK, one player might go offline. In this case the remaining player needs to be able to abort the transfer all the way back to their IDLE state:-

"CANCEL_ACK_RX":{"from":"ACK_RX", "to":"RX", type:"either",
	"guard":"
		root.child('users').child(data.child('rx_ptr').val()).child('state').val() == 'TX' &&
		root.child('users').child(data.child('rx_ptr').val()).child('tx_ptr').val() == $user"
}, 
"CANCEL_RX":{"from":"RX", "to":"IDLE", type:"either",
	"guard":"
		root.child('users').child(data.child('rx_ptr').val()).child('state').val() == 'TX'",
	"effect":"
		newData.child('rx_itm').val() == null &&
		newData.child('rx_ptr').val() == null &&
		newData.child('item').val()   == null"
}

The final stage of unwinding the transfer is undoing the initial IDLE->TX transition the sender made. The receive is not blocked however at this point, so it doesn't matter if only the sender can do this:-

"CANCEL_TX":{"from":"TX", "to":"IDLE", type:"self",
	"guard":"
		(root.child('users').child(data.child('tx_ptr').val()).child('state').val() != 'RX' &&
		root.child('users').child(data.child('tx_ptr').val()).child('state').val() != 'ACK_RX') ||(
			root.child('users').child(data.child('tx_ptr').val()).child('rx_ptr').val() != $user)",
	"effect":"
		newData.child('tx_itm').val() == null &&
		newData.child('tx_ptr').val() == null &&
		newData.child('item').val() == data.child('tx_itm').val()"
}

The guard conditions for this final abort was very tricky to get right. What could happen is the receiver aborts and goes to IDLE state, then initiates another transfer elsewhere. This then messes with what the TXer sees as the receivers state. So to cancel the TX, it is necessary to check the the receiver has canceled first (i.e. not in RX or ACK_RX) by inspecting their state, BUT ignoring that state information if the user is involved in a different transfer (check the receivers rx_ptr). Hence an OR clause makes its way into the guard!

Race Condition?

There is one final, totally mental situation that might occur. We stated that simultaneous ACK_TX and ACK_RX means the transaction is going to happen. However, what if the ACK_RX is canceled at the exact same time the ACT_TX occurs? I have no idea if this is possible on Firebase, but if it did happen the players would seize. So, just for safety, we have the ability to unwind the ACK_TX if the receiver is somehow in the RX state after they performed a CANCEL_RX

"CANCEL_ACK_TX":{"from":"ACK_TX", "to":"TX", type:"either", //rare/impossible event that ACK_RX->RX canceled simultaneously as ACK_TX
	"guard":"
		root.child('users').child(data.child('tx_ptr').val()).child('state').val() == 'RX' &&
		root.child('users').child(data.child('tx_ptr').val()).child('rx_ptr').val() == $user"
}

This situation was discovered in formal verification. I am not sure if its possible though. It depends on how Firebase deals with concurrent writes. I have erred on the side of caution by modeling writes as concurrent (if both parties pre-conditions check out, two writes can go through at exactly the same time).

However, we allow either user to rollback the transition before the ACK_TX.

Overview

The complete deadlock free with rollback:-

send_item.hsm

Client examples

Client side code for executing a transaction is in the unit tests