Skip to content

Commit

Permalink
Merge pull request #1 from dcousens/bip69fixes
Browse files Browse the repository at this point in the history
BIP69 edits by @dcousens
  • Loading branch information
Kristov Atlas committed Sep 11, 2015
2 parents 831a003 + 8368e75 commit fe9249a
Showing 1 changed file with 65 additions and 37 deletions.
102 changes: 65 additions & 37 deletions bip-0069.mediawiki
Original file line number Diff line number Diff line change
Expand Up @@ -2,79 +2,102 @@
BIP: BIP: 69
Title: Lexicographical Indexing of Transaction Inputs and Outputs
Authors: Kristov Atlas <kristov@openbitcoinprivacyproject.org>
Editors: Daniel Cousens <bips@dcousens.com>
Status: Draft
Type: Informational
Created: 2015-06-12
</pre>

==Abstract==

Currently there is no standard for bitcoin wallet clients when ordering transaction inputs and outputs. As a result, wallet clients often have a discernible blockchain fingerprint, and can leak private information about their users. By contrast, a standard for non-deterministic sorting could be difficult to audit. This document proposes deterministic lexicographical sorting, using hashes of previous transactions and output indices to sort transaction inputs, as well as values and scriptPubKeys to sort transaction outputs.
Currently there is no standard for bitcoin wallet clients when ordering transaction inputs and outputs.
As a result, wallet clients often have a discernible blockchain fingerprint, and can leak private information about their users.
By contrast, a standard for non-deterministic sorting could be difficult to audit.
This document proposes deterministic lexicographical sorting, using hashes of previous transactions and output indices to sort transaction inputs, as well as values and scriptPubKeys to sort transaction outputs.

==Copyright==

This BIP is in the public domain.

==Motivation==

Currently, there is no clear standard for how wallet clients ought to order transaction inputs and outputs. Since wallet clients are left to their own devices to determine this ordering, they often leak information about their users’ finances. For example, a wallet client might naively order inputs based on when addresses were added to a wallet by the user through importing or random generation. Many wallets will place spending outputs first and change outputs second, leaking information about both the sender and receiver’s finances to passive blockchain observers. Such information should remain private not only for the benefit of consumers, but in higher order financial systems must be kept secret to prevent fraud. A researcher recently demonstrated this principle when he detected that Bitstamp leaked information when creating exchange transactions, enabling potential espionage among traders. [1]

One way to address these privacy weaknesses is by randomly ordering inputs and outputs. [2] After all, the order of inputs and outputs does not impact the function of the transaction they belong to, making random sorting viable. Unfortunately, it can be difficult to prove that this sorting process is genuinely randomly sorted based on code or run-time analysis, especially if the software is closed source. A malicious software developer can abuse the ordering of inputs and outputs as a side channel of leaking information. For example, if an attacker can patch a victim’s HD wallet client to order inputs and outputs based on the bits of a master private key, then the attacker can eventually steal all of the victim’s funds by monitoring the blockchain. Non-deterministic methods of sorting are difficult to audit because they are not repeatable.

The lack of standardization between wallet clients when ordering inputs and outputs can yield predictable quirks that characterize particular wallet clients or services. Such quirks create unique fingerprints that a privacy attacker can employ through simple passive blockchain observation.

The solution is to create an algorithm for sorting transaction inputs and outputs that is deterministic. Since it is deterministic, it should also be unambiguous — that is, given a particular transaction, the proper order of inputs and outputs should be obvious. To make this standard as widely applicable as possible, it should rely on information that is downloaded by both full nodes (with or without typical efficiency techniques such as pruning) and SPV nodes. In order to ensure that it does not leak confidential data, it must rely on information that is publicly accessible through the blockchain. The use of public blockchain information also allows a transaction to be sorted even when it is a multi-party transaction, such as in the example of a CoinJoin.
Currently, there is no clear standard for how wallet clients ought to order transaction inputs and outputs.
Since wallet clients are left to their own devices to determine this ordering, they often leak information about their users’ finances.
For example, a wallet client might naively order inputs based on when addresses were added to a wallet by the user through importing or random generation.
Many wallets will place spending outputs first and change outputs second, leaking information about both the sender and receiver’s finances to passive blockchain observers.
Such information should remain private not only for the benefit of consumers, but in higher order financial systems must be kept secret to prevent fraud.
Currently, there is no clear standard for how wallet clients ought to order transaction inputs and outputs.
Since wallet clients are left to their own devices to determine this ordering, they often leak information about their users’ finances.
For example, a wallet client might naively order inputs based on when addresses were added to a wallet by the user through importing or random generation.
Many wallets will place spending outputs first and change outputs second, leaking information about both the sender and receiver’s finances to passive blockchain observers.
Such information should remain private not only for the benefit of consumers, but in higher order financial systems must be kept secret to prevent fraud.
A researcher recently demonstrated this principle when he detected that Bitstamp leaked information when creating exchange transactions, enabling potential espionage among traders. [1]

One way to address these privacy weaknesses is by randomly ordering inputs and outputs. [2]
After all, the order of inputs and outputs does not impact the function of the transaction they belong to, making random sorting viable.
Unfortunately, it can be difficult to prove that this sorting process is genuinely randomly sorted based on code or run-time analysis, especially if the software is closed source.
A malicious software developer can abuse the ordering of inputs and outputs as a side channel of leaking information.
For example, if an attacker can patch a victim’s HD wallet client to order inputs and outputs based on the bits of a master private key, then the attacker can eventually steal all of the victim’s funds by monitoring the blockchain.
Non-deterministic methods of sorting are difficult to audit because they are not repeatable.

The lack of standardization between wallet clients when ordering inputs and outputs can yield predictable quirks that characterize particular wallet clients or services.
Such quirks create unique fingerprints that a privacy attacker can employ through simple passive blockchain observation.

The solution is to create an algorithm for sorting transaction inputs and outputs that is deterministic.
Since it is deterministic, it should also be unambiguous — that is, given a particular transaction, the proper order of inputs and outputs should be obvious.
To make this standard as widely applicable as possible, it should rely on information that is downloaded by both full nodes (with or without typical efficiency techniques such as pruning) and SPV nodes.
In order to ensure that it does not leak confidential data, it must rely on information that is publicly accessible through the blockchain.
The use of public blockchain information also allows a transaction to be sorted even when it is a multi-party transaction, such as in the example of a CoinJoin.

==Specification==

===Applicability===

This BIP applies to any transaction for which the order of its inputs and outputs does not impact the transaction’s function. Currently, this refers to any transaction that employs the SIGHASH_ALL signature hash type, in which signatures commit to the exact order of inputs and outputs. Transactions that use SIGHASH_ANYONECANPAY and/or SIGHASH_NONE may include inputs and/or outputs that are not signed; however, compliant software should still emit transactions with lexicographically sorted inputs and outputs, even though they may later be modified by others.
This BIP applies to any transaction for which the order of its inputs and outputs does not impact the transaction’s function.
Currently, this refers to any transaction that employs the SIGHASH_ALL signature hash type, in which signatures commit to the exact order of inputs and outputs.
Transactions that use SIGHASH_ANYONECANPAY and/or SIGHASH_NONE may include inputs and/or outputs that are not signed; however, compliant software should still emit transactions with lexicographically sorted inputs and outputs, even though they may later be modified by others.

In the event that future protocol upgrades introduce new signature hash types, compliant software should apply the lexicographical ordering principle analogously.

While out of scope of this BIP, protocols that do require a specified order of inputs/outputs (e.g. due to use of SIGHASH_SINGLE) should consider the goals of this BIP and how best to adapt them to the specific needs of those protocols.

===Lexicographical Sorting===
===Lexicographical Ordering===

Ordering of inputs and outputs will rely on the output of sorting functions. These functions can be defined as taking two inputs or two outputs as parameters and returning their appropriate ordering with respect to each other.
Lexicographical ordering is an algorithm for comparison used to sort two sets based on their cartesian order within their common superset.
Lexicographic order is also often known as alphabetical order, or dictionary order.

Byte arrays must be sorted with an algorithm that produces the same output as the following comparison algorithm:
Common implementations include:

<nowiki>#returns -1 if barr1 is lesser, 1 if barr1 is greater, and 0 if equal
def bytearr_cmp(barr1, barr2):
pos = 0
while (pos < len(barr1) and pos < len(barr2)):
if (barr1[pos] < barr2[pos]):
return -1;
elif (barr1[pos] > barr2[pos]):
return 1;
pos = pos + 1
#the shorter array will be ordered first
if (len(barr1) < len(barr2)):
return -1
elif (len(barr1) > len(barr2)):
return 1
else:
return 0</nowiki>
* `std::lexicographical_compare` in C++ [5]
* `cmp` in Python 2.7
* `memcmp` in C [6]
* `Buffer.compare` in Node.js [7]
N.B. These comparisons do not need to operate in constant time since they are not processing secret information.
For more information, see the wikipedia entry on Lexicographical order. [8]

===Transaction Inputs===
N.B. All comparisons do not need to operate in constant time since they are not processing secret information.

104 Transaction inputs are defined by the hash of a previous transaction, the output index of of a UTXO from that previous transaction, the size of an unlocking script, the unlocking script, and a sequence number. [3] For sorting inputs, the hash of the previous transaction and the output index within that transaction are sufficient for sorting purposes; each transaction hash has an extremely high probability of being unique in the blockchain — this is enforced for coinbase transactions by BIP30 — and output indices within a transaction are unique. For the sake of efficiency, transaction hashes should be compared first before output indices, since output indices from different transactions are often equivalent, while all bytes of the transaction hash are effectively random variables.
===Transaction Inputs===

Hashes of previous transactions are considered for the purposes of this BIP in their little-endian, byte array format in order to match the traditional, human-readable string representation of the hashes. They must be sorted in accordance with the output of the bytearr_cmp() function above: the hash with the earliest lesser byte is ordered first, and shorter hashes are ordered before longer ones as a tie-breaker. In the event of two matching transaction hashes, output indices will be compared based on their integer value, with the smaller value ordered first. A further tie is extremely improbable for the aforementioned reasons.
Transaction inputs are defined by the hash of a previous transaction, the output index of of a UTXO from that previous transaction, the size of an unlocking script, the unlocking script, and a sequence number. [3]
For sorting inputs, the hash of the previous transaction and the output index within that transaction are sufficient for sorting purposes; each transaction hash has an extremely high probability of being unique in the blockchain — this is enforced for coinbase transactions by BIP30 — and output indices within a transaction are unique.
For the sake of efficiency, transaction hashes should be compared first before output indices, since output indices from different transactions are often equivalent, while all bytes of the transaction hash are effectively random variables.

Because the hash of previous transactions and output indices must be included in a signed transaction, wallet clients capable of signing transactions will necessarily have access to this data.
Previous transaction hashes (in reversed byte-order) are to be sorted in ascending order, lexicographically.
In the event of two matching transaction hashes, the respective previous output indices will be compared by their integer value, in ascending order.
If the previous output indices match, the inputs are considered equal.

Transaction malleability will not negatively impact the correctness of this process. Even if a wallet client follows this process using unconfirmed UTXOs as inputs and an attacker changes modifies the blockchain’s record of the hash of the previous transaction, the wallet client will include the invalidated previous transaction hash in its input data, and will still correctly sort with respect to that invalidated hash.
Transaction malleability will not negatively impact the correctness of this process.
Even if a wallet client follows this process using unconfirmed UTXOs as inputs and an attacker changes modifies the blockchain’s record of the hash of the previous transaction, the wallet client will include the invalidated previous transaction hash in its input data, and will still correctly sort with respect to that invalidated hash.

===Transaction Outputs===

A transaction output is defined by its scriptPubKey and amount. [3] For sorting purposes, we will consider a scriptPubKey in its byte array representation, and a bitcoin amount in terms of their integer number of satoshis (smallest amount ordered first).
A transaction output is defined by its scriptPubKey and amount. [3]
For the sake of efficiency, amounts should be compared first for sorting, since they contain fewer bytes of information (8 bytes) compared to a standard P2PKH scriptPubKey (25 bytes). [4]

For the sake of efficiency, amounts will be considered first for sorting, since they contain fewer bytes of information (8 bytes) compared to a standard P2PKH scriptPubKey (25 bytes). [4] When the values are tied, the scriptPubKey is then considered. In the event of a tie between scriptPubKeys, sorting is irrelevant since the outputs are exactly equivalent.
Transaction output amounts (as 64-bit unsigned integers) are to be sorted in ascending order.
In the event of two matching output amounts, the respective output scriptPubKeys (as a byte-array) will be compared lexicographically, in ascending order.
If the scriptPubKeys match, the outputs are considered equal.

===Examples===

Expand Down Expand Up @@ -128,6 +151,10 @@ Outputs:
* [[https://github.com/OpenBitcoinPrivacyProject/wallet-ratings/blob/master/2015-1/criteria.md|2: OBPP Random Indexing as Countermeasure]]
* [[https://github.com/aantonop/bitcoinbook/blob/develop/ch05.asciidoc|3: Mastering Bitcoin]]
* [[https://en.bitcoin.it/wiki/Script|4: Bitcoin Wiki on Script]]
* [[http://www.cplusplus.com/reference/algorithm/lexicographical_compare|5: std::lexicographical_compare]]
* [[http://www.cplusplus.com/reference/cstring/memcmp|6: memcmp]]
* [[https://nodejs.org/api/buffer.html#buffer_class_method_buffer_compare_buf1_buf2|7: Buffer.compare]]
* [[https://en.wikipedia.org/wiki/Lexicographical_order|8: Lexicographical order]]
==Implementations==

Expand All @@ -138,4 +165,5 @@ Outputs:
==Acknowledgements==

Danno Ferrin &lt;danno@numisight.com&gt;, Sergio Demian Lerner &lt;sergiolerner@certimix.com&gt;, Justus Ranvier &lt;justus@openbitcoinprivacyproject.org&gt;, and Peter Todd &lt;pete@petertodd.org&gt; contributed to the design and motivations for this BIP. A similar proposal was submitted to the Bitcoin-dev mailing list independently by Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Danno Ferrin &lt;danno@numisight.com&gt;, Sergio Demian Lerner &lt;sergiolerner@certimix.com&gt;, Justus Ranvier &lt;justus@openbitcoinprivacyproject.org&gt;, and Peter Todd &lt;pete@petertodd.org&gt; contributed to the design and motivations for this BIP.
A similar proposal was submitted to the Bitcoin-dev mailing list independently by Rusty Russell &lt;rusty@rustcorp.com.au&gt;

0 comments on commit fe9249a

Please sign in to comment.