- 2024-02-05: First version (@melekes)
- 2024-05-28: Complete refactor (@melekes)
- 2024-06-07: Limit the scope to transaction hashing (@melekes)
- 2024-06-19: Explain why we don't expose this functionality in the CLI (@melekes)
Proposed
Hashing in CometBFT is currently implemented using the crypto/tmhash package, which itself relies on sha256.
Among the things that are hashed are the block's header, evidence, consensus params, commit, partset header, and transactions.
The transaction hash is used by:
- the built-in transaction indexer;
- the /tx and /tx_search RPC endpoints, which allow users to search for a transaction using its hash;
- mempool to identify transactions.
The problem some application developers are facing is a mismatch between the internal/app representation of transactions and the one employed by CometBFT. For example, Evmos wants transactions to be hashed using their RLP (Recursive Length Prefix) encoding.
In order to be flexible, CometBFT needs to allow changing the transaction hashing algorithm if desired by the app developers.
The suggested solution could be used to change the hashing function for all structs, not just transactions. But the impact of such a change is significant: a chain using a different hashing scheme loses IBC compatibility. The IBC modules assume a fixed hashing scheme, and the destination chain needs to know the hashing function of the source chain in order to verify the validators hash. So, this remains future work for now.
- Add TxHashFunc (transaction hashing function) to NewNode as an option and pass this function down the stack => avoids global variables, but leads to a massive API breakage. The problem is we're not 100% sure this will be the final solution, so every time we decide to change it, we will be breaking a large part of the API. The suggested solution allows us to be more flexible.
- Allow changing the hashing function for all structs => breaks IBC compatibility (see 'General hashing' above).
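To make the first alternative concrete, here is a minimal sketch of passing the hash function down the stack as a constructor option. Every type and function in it (Node, Mempool, Option, WithTxHashFunc, shortTxHash, and this toy NewNode) is a hypothetical stand-in and not the CometBFT API; the point is only that each constructor below the node has to grow a parameter.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// TxHashFunc is a hypothetical type for a transaction hashing function.
type TxHashFunc func(tx []byte) []byte

// Mempool and Node are toy stand-ins, not CometBFT types.
type Mempool struct{ hash TxHashFunc }

type Node struct{ mempool *Mempool }

type nodeConfig struct{ txHash TxHashFunc }

// Option configures a Node at construction time.
type Option func(*nodeConfig)

// WithTxHashFunc overrides the default transaction hashing function.
func WithTxHashFunc(f TxHashFunc) Option {
	return func(c *nodeConfig) { c.txHash = f }
}

// defaultTxHash hashes the transaction with SHA-256.
func defaultTxHash(tx []byte) []byte {
	sum := sha256.Sum256(tx)
	return sum[:]
}

// shortTxHash is an illustrative custom function: the first 20 bytes of SHA-256.
func shortTxHash(tx []byte) []byte {
	return defaultTxHash(tx)[:20]
}

// NewMempool has to accept the function explicitly. Every component that
// hashes transactions needs the same signature change.
func NewMempool(hash TxHashFunc) *Mempool { return &Mempool{hash: hash} }

// NewNode applies the options and hands the function to its components.
func NewNode(opts ...Option) *Node {
	cfg := nodeConfig{txHash: defaultTxHash}
	for _, opt := range opts {
		opt(&cfg)
	}
	return &Node{mempool: NewMempool(cfg.txHash)}
}

func main() {
	tx := []byte("example-tx")
	fmt.Println(len(NewNode().mempool.hash(tx)))                            // 32
	fmt.Println(len(NewNode(WithTxHashFunc(shortTxHash)).mempool.hash(tx))) // 20
}
```

No global state is involved, but changing the shape of TxHashFunc later would again touch every constructor in the chain.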
Give app developers a way to provide their own hash function.
Use sha256 by default, but give developers a way to change the hashing function:
import (
	"crypto"
	"fmt"
	// Blank import registers SHA-256 so that crypto.SHA256.New() works.
	_ "crypto/sha256"
)
var (
	// txHash is the hash function used for transaction hashing.
	txHash = crypto.SHA256
	// fmtHash is a function that converts a byte slice to a string.
	fmtHash = func(bz []byte) string {
		return fmt.Sprintf("%X", bz)
	}
)
// SetTxHash sets the hash function used for transaction hashing.
//
// Call this function before starting the node. Changing the hashing function
// after the chain has started can ONLY be done with a hard fork.
func SetTxHash(h crypto.Hash) {
	txHash = h
}
// SetFmtHash sets the function used to convert a checksum to a string.
func SetFmtHash(f func([]byte) string) {
	fmtHash = f
}
// Bytes is a wrapper around a byte slice that implements fmt.Stringer.
type Bytes []byte
// String formats the bytes using the configured fmtHash function.
func (bz Bytes) String() string {
	return fmtHash(bz)
}
// Bytes returns the underlying byte slice.
func (bz Bytes) Bytes() []byte {
	return bz
}
// Sum returns the checksum of the data as Bytes.
func Sum(bz []byte) Bytes {
	h := txHash.New()
	h.Write(bz) // hash.Hash.Write never returns an error
	return Bytes(h.Sum(nil))
}
Let's break this down. By default, we use sha256 from the standard crypto library.
SetTxHash allows developers to swap the default hashing function
with the hashing function of their choice. It will be used in:
- mempool;
- transaction indexer;
- /tx and /tx_search RPC endpoints.
Note the Header's data_hash will be different if the default hashing function
is changed.
SetFmtHash allows developers to swap the default string function
(fmt.Sprintf("%X", bz)) with their own implementation.
The design in this ADR only aims to support custom hash functions; it does not support changing the hash function for an existing chain. If application developers decide to change the default hashing scheme, they can only do so once, before launching their app. If they attempt to change it afterwards without a hard fork, the resulting hashes won't match. A hard fork would work.
The majority of chains should still use the default hashing function. That's
why we don't expose this functionality in the CLI or anything like that
(TxHashFunc in NewNode). Even though the number of chains using a different
hashing function can be significant, it's not the use-case we're optimizing
for. It's good to support it, but it's not the primary goal. Similarly, it's
good to support different p2p protocols, but we're optimizing for the default
one.
- Modular transaction hashing.
- App developers need to take performance into account when choosing a custom hash function.
- Global variables.