-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: use wire-formatted byte arrays to represent names internally, no more strings #90
base: master
Are you sure you want to change the base?
Refactor: use wire-formatted byte arrays to represent names internally, no more strings #90
Conversation
93d7a08
to
db26f1a
Compare
c397d56
to
5c3d41f
Compare
750b316
to
1aa326d
Compare
@@ -3351,7 +3196,7 @@ hsk_dns_rrs_clean(hsk_dns_rrs_t *rrs, uint16_t type) { | |||
*/ | |||
|
|||
bool | |||
hsk_dns_is_subdomain(const char *parent, const char *child) { | |||
hsk_dns_is_subdomain(const uint8_t *parent, const uint8_t *child) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While playing with this method I wrote the tests for this and it fails when child == parent
.
Test case can be included in test/hnsd-test.c
. Third test will fail both on master and this branch.
So this bug is not related to the PR, but in general to this method.
void
test_hsk_dns_is_subdomain() {
printf(" test_hsk_dns_is_subdomain\n");
typedef struct {
uint8_t parent[255];
uint8_t child[255];
bool result;
} hsk_dns_is_subdomain_t;
const hsk_dns_is_subdomain_t tests[] = {
{
.parent = "\x09""handshake""\x03""org""\x00",
.child = "\x04""hnsd""\x09""handshake""\x03""org""\x00",
.result = true
},
{
.parent = "\x04""hnsd""\x09""handshake""\x03""org""\x00",
.child = "\x09""handshake""\x03""org""\x00",
.result = false
},
{
.parent = "\x09""handshake""\x03""org""\x00",
.child = "\x09""handshake""\x03""org""\x00",
.result = true
},
{
.parent = "\x09""handshake""\x03""org""\x00",
.child = "\x04""test""\x03""org""\x00",
.result = false
}
};
for (int i = 0; i < ARRAY_SIZE(tests); i++) {
const hsk_dns_is_subdomain_t test = tests[i];
bool result = hsk_dns_is_subdomain(test.parent, test.child);
assert(result == test.result);
}
}
Javascript version will return expected results.
https://github.com/chjj/bns/blob/1bc5d056f4372e62cefd43112ca8b1ebc3372f86/lib/util.js#L317-L319
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed that equal name should count here looking at knot resolver tests (and their knot_dname_in_bailiwick()
function)
https://gitlab.nic.cz/knot/knot-dns/blob/master/tests/libknot/test_dname.c#L522-529
fixed and added test in fe6e1d5
src/dns.h
Outdated
* cmp: (optional) map of labels and their position in the DNS | ||
* message for label compression | ||
*/ | ||
static bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function is static, it's private to that file. So it should not be part of header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 fixed in f5200ab
src/ns.c
Outdated
if (memcmp(req->tld, "\x03""bit", 4) == 0 // Namecoin | ||
|| memcmp(req->tld, "\x03""eth", 4) == 0 // ENS | ||
|| memcmp(req->tld, "\x03""exit", 5) == 0 // Tor | ||
|| memcmp(req->tld, "\x03""gnu", 4) == 0 // GNUnet (GNS) | ||
|| memcmp(req->tld, "\x03""i2p", 4) == 0 // Invisible Internet Project | ||
|| memcmp(req->tld, "\x03""onion", 6) == 0 // Tor | ||
|| memcmp(req->tld, "\x03""tor", 4) == 0 // OnioNS | ||
|| memcmp(req->tld, "\x03""zkey", 5) == 0) { // GNS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not this instead be:
if (memcmp(req->tld, "\x03""bit", 4) == 0 // Namecoin
|| memcmp(req->tld, "\x03""eth", 4) == 0 // ENS
|| memcmp(req->tld, "\x04""exit", 5) == 0 // Tor
|| memcmp(req->tld, "\x03""gnu", 4) == 0 // GNUnet (GNS)
|| memcmp(req->tld, "\x03""i2p", 4) == 0 // Invisible Internet Project
|| memcmp(req->tld, "\x05""onion", 6) == 0 // Tor
|| memcmp(req->tld, "\x03""tor", 4) == 0 // OnioNS
|| memcmp(req->tld, "\x04""zkey", 5) == 0) { // GNS
First byte size does not match the length of the label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or +1 to the third param to check for ending 0x00
byte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 fixed in f5200ab
src/req.h
Outdated
@@ -27,8 +27,11 @@ typedef struct { | |||
size_t max_size; | |||
bool dnssec; | |||
|
|||
// For Unbound | |||
char namestr[HSK_DNS_MAX_NAME_STRING]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding this here, I believe it's more suitable to have it allocated only in hsk_rs_worker_resolve
, where it's actually needed. There's no need to carry full string version just for unbound call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great call thanks, added in 1fffc2b
src/rs.c
Outdated
// Unbound's name resolution API expects a single null-terminated string. | ||
// Since 0x00 is a valid byte mid-label in wire format we need to | ||
// convert `req->name` to presentation format (i.e. "\000" for 0x00) | ||
if (!hsk_dns_name_to_string(req->name, req->namestr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding this here, I believe it's more suitable to have it allocated only in hsk_rs_worker_resolve, where it's actually needed. There's no need to carry full string version just for unbound call.
from req.h comment
Here we can also get rid of this part and only translate to string inside hsk_rs_worker_resolve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 1fffc2b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of char *
for the name we could use typedef
and give better name. It will make code much more readable whenever we are working with names. This could also apply to labels
, because labels are always in the form of size + label
, it will clarify what to expect (also MAX_SIZE 63 + 1)
* Out: ret: pointer to string to be filled with name from | ||
* specified label through end of name | ||
* Out: ret: pointer to byte array to be filled with name from | ||
* specified label through end of name including final 0x00 | ||
*/ | ||
int | ||
hsk_dns_label_from2( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think name of this method is misleading, it returns sub name - not a specific label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it returns a name "starting from a given label" i.e. "label from". I suppose we could rename it hsk_dns_name_from_label()
?
// HSK_DNS_MAX_NAME includes all length bytes and final 0x00 | ||
#define HSK_DNS_MAX_NAME 255 | ||
// Byte arrays that include a length byte and final 0x00 (".") | ||
// will need to be size HSK_DNS_MAX_LABEL + 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think final 0x00 is necessary when we are working with the single label. Unless we want to mimic the name, in that case label is just a name with just single label in it. But because currently we don't have a way to differentiate, maybe it's easier that everything is in the name format ? (even labels?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But because currently we don't have a way to differentiate, maybe it's easier that everything is in the name format
This is my thinking as well. So all single labels or TLDs etc work just like full names do, i.e. they are read one label at a time starting with length until a label with length 0 is reached (the final 0x00)
char *n = malloc(size + 1); | ||
|
||
if (!n) | ||
hsk_dns_name_verify(const uint8_t *name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name of this method in hnsd is misleading. It's hsk_dns_tld_verify
instead of name_verify
. In hsd it's easier to understand rules->nameVerify automatically means tld
verify. But not in hnsd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth to check if this needs to be returned back to the methods we removed this as well as check dirty names. Even though dirty names are not a thing, we still need to make sure TLDs are following Handshake rules on top of DNS ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth to check if this needs to be returned back to the methods we removed this as well as check dirty names. Even though dirty names are not a thing, we still need to make sure TLDs are following Handshake rules on top of DNS ones.
Again I think that since we are sanitizing all input with this in hsk_ns_onrecv()
I don't think we need to put it back anywhere else downstream ...?
assert(ck); | ||
|
||
if (!hsk_dns_name_verify(name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we still need to verify if tld
is correct ?
Otherwise this method will never return false.
I guess that could make this more precise and instead move tld check to the hsk_cache_insert_data
and hsk_cache_get_data
before the cache_key_set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method will become void
and instead those two methods will return early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I think we can remove the return value. We check hsk_dns_name_verify()
in hsk_ns_onrecv()
which is the entry point for the user's request and is checked before attempting name resolution and before caching. So I think throughout hnsd we can assume that req->tld
is already valid.
so I changed hsk_cache_key_set()
to void -- what did you mean by "two methods" ?
fixed in f5200ab
@@ -276,30 +272,6 @@ hsk_cache_key_set(hsk_cache_key_t *ck, const char *name, uint16_t type) { | |||
case 2: | |||
ref = !hsk_resource_is_ptr(name); | |||
break; | |||
case 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with @pinheadmz, this should have been removed even before this update. As root wont parse/recognize SRV/TLSA//SMIMEA and OPENPGPKEY records at all.
@@ -473,14 +466,22 @@ hsk_resource_has_ns(const hsk_resource_t *res) { | |||
return false; | |||
} | |||
|
|||
void | |||
hsk_resource_write_synth(char *b32, uint8_t *name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can just use memcpy
instead of sprintfs, sprintf does not make it more readable or easier to work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it's more readable and consistent with other methods, if the uint8_t *name
- where we write - is first and char *b32
- we copy from - second arg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the example of this method (not fully tested): https://github.com/nodech/hnsd-misc-c/tree/master/sprintf-v-memcpy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it's more readable and consistent with other methods, if the
uint8_t *name
- where we write - is first andchar *b32
- we copy from - second arg.
I didn't make this change yet because I think the hnsd style for these kinds of methods is more like _write(in, out)
, such as hsk_dns_msg_write(const hsk_dns_msg_t *msg, uint8_t **data)
. Even though I think you're right about standard methods like memcpy(dest, src)
?
I copied your memcpy implementation of this function from your benchmark test, thanks!
fixed in f5200ab
src/resource.c
Outdated
} else { | ||
// NS and GLUE records have the NS names ready to go. | ||
assert(hsk_dns_name_is_fqdn(c->name)); | ||
strcpy(nsname, c->name); | ||
memcpy(rd->ns, c->name, sizeof(c->name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sizeof will always be HSK_DNS_MAX_NAME. This could be also rewritten using name_cpy
alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 switched to the constant for now in f5200ab
(will write the new helpers in future commit)
return false; | ||
|
||
int j = hsk_base32_decode_hex(&label[1], ip, false); | ||
// We can cast label to a string for the base32 function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this specific to PTR types? Because I believe 0x00
can be part of the label in general. (one of the advantages of using byte array instead of cstrings.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so because it's base32 encoded(which should not contain 0x00).
src/req.c
Outdated
|
||
// Reference. | ||
req->ns = NULL; | ||
|
||
// DNS stuff. | ||
req->id = msg->id; | ||
req->labels = hsk_dns_label_count(qs->name); | ||
strcpy(req->name, qs->name); | ||
memcpy(req->name, qs->name, sizeof(qs->name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sizeof will always be HSK_DNS_MAX_NAME.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 fixed in f5200ab
@@ -541,21 +544,21 @@ hsk_pool_resolve( | |||
return HSK_SUCCESS; | |||
} | |||
|
|||
hsk_name_req_t *head = hsk_map_get(&peer->names, req->hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peer->names
is list of tlds
instead, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah the names we have requested proofs for from a given peer.
src/pool.c
Outdated
|
||
for (req = pool->pending; req; req = next) { | ||
next = req->next; | ||
|
||
req->callback( | ||
req->name, | ||
(uint8_t *)req->tld, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this cast is not necessary as req->tld is uint8_t anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks, fixed in f5200ab
but actually looks like hsk_pool_merge_reqs()
is never actually used either! (only reference is commented out)
src/pool.c
Outdated
@@ -693,7 +696,7 @@ hsk_peer_timeout_reqs(hsk_peer_t *peer) { | |||
next = req->next; | |||
|
|||
req->callback( | |||
req->name, | |||
(uint8_t *)req->tld, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in f5200ab
This function used to convert wire-formatted byte arrays to strings. We are no longer using strings for names internally so this function got much simpler. All it really needs to do now is un-compress a name from a DNS message packet. Because this function is the main entry point for all names in the program, it is also critical to verify label and name size rules. Once names are ingested by name_parse() and stored in structs, we can for the most part assume they are valid.
This function used to convert names from strings to wire-formatted byte arrays. We are no longer using strings for names internally so this function got much simpler. All it really needs to do now is compress a name into a DNS message by adding label pointers where necessary. Because this function is the main output point for names it is critical to verify label and name lengths when we write.
hsk_dns_name_serialize() takes an argument hsk_dns_cmp_t which includes a generic map object hsk_map_t. This map is used to track names and labels as a message is being written, so compression pointers can be applied. Since names are byte arrays now and not strings this map needs to be reimplemented with corrected hash and equality functions.
Internally, hnsd keeps all names in wire-formatted byte arrays. There are only a few places where the name must be converted to a printable string, which requires "presentation format" e.g. //DDD for unprintable characters, etc. To string: - Log messages that print names being resolved - libunbound's recursive resolver API From string: - Only needed for testing (hnsd doesn't accept strings as input nor use them internally)
In addition to procesisng byte arrays instead of strings, repurpose this function to check a single label for validity as a Handshake TLD. The most important time to use it is when receiving a user's request. TLDs need to be valid before hashing and requesting Urkel proofs from full nodes.
These label utilities used to process strings but now we represent all names as byte arrays. For consistency, these functions always output in DNS wire format, even single labels. That means output always begins with a length byte and ends with a 0x00 terminator (aka ".")
In addition to refactoring hsk_ns_onrecv() to deal with names as byte arrays instead of strings, this fixes a bug where the root server wasn't properly handling requests for _synth by itself. Since a request for `_synth` has only one label, we should not request label with index -2 using hsk_dns_label_from()
Update the name_hash function (which is used when requesting proofs from full nodes) since names are not strings any more. In addition, try to be consistent with variable naming. "name" could be multiple labels, "tld" is always one label and, in the context of pool, important for requesting and verifying Urkel proofs.
https://www.rfc-editor.org/rfc/rfc3597#section-4 > To avoid such corruption, servers MUST NOT compress domain names > embedded in the RDATA of types that are class-specific or not well- > known. This requirement was stated in [RFC1123] without defining > the term "well-known"; it is hereby specified that only the RR types > defined in [RFC1035] are to be considered "well-known"." This commit was checked against pdns: https://github.com/PowerDNS/pdns/blob/master/pdns/dnsrecords.cc
Before this commmit, if record data was invalid and ..._read() failed, we would `goto fail` which would call hsk_dns_rrs_uninit(). However, uninit() only frees rr objects if the rrset has a size > 0 and only frees that many rr objects. By calling hsk_dns_rrs_push() BEFORE ..._read() we increment that size value and that ensures that if there is a failure, the rr will be freed. The rr is already allocated and pushing its pointer into the rrset before we write its data doesn't affect anything else.
Also updates the hard-coded root zone, see handshake-org/hs-names#12
f3d1cb6
to
e307bb7
Compare
Farewell
char *name
!In general, names in hnsd are now always represented as byte arrays, serialized in DNS wire format. That means (example):
<label 1 length> <label 1> <label 2 length> <label 2> 0x00
Even if we are just handling a single label like a TLD, we still store it like this. This means that full names are always max 255 bytes and single labels are always max 65 bytes (63 max label + 1 for length and + 1 for
0x00
)We only convert back to string ("presentation format") to print names out the log, and to pass to libunbound recursive.
For convention, variables called
name
can be assumed to be full names or possibly greater than one label. Variables namedtld
should just be a single label, but still prefixed with length and trailed by0x00
TODO:
RP
record so it can be tested (record data should not be compressed)[ ? ] add in the hsd integration tests from #87 (or maybe that belongs in followup PR)