Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite IPAddress without union #225

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

safocl
Copy link

@safocl safocl commented Nov 8, 2023

Rewrite IPAddress without union

Using a union allowed undefined behavior for 8-bit integer INPUT and OUTPUT arguments through a 32-bit integer (and vice versa). Write through union::uint8_t[16] and read through union::uint32_t[4] is an undefined behavior.

C++ standard says (https://timsong-cpp.github.io/cppwp/n4659/class.union#1):

At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem].  — end note ]"

and (https://timsong-cpp.github.io/cppwp/n4659/class.mem#22):

Two standard-layout unions are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in any order) have layout-compatible types.

you can't hope that compilers don't seem to do undefined behavior at the moment

Also created espressif/arduino-esp32#8829 PR dependent on changing the current API.

Using a union allowed undefined behavior for 8-bit integer INPUT and OUTPUT arguments through a 32-bit integer (and vice versa).
Write throught union::uint8_t[16] and read thoutght union::uint32_t[4]
is a undefined behavior.

C++ standard says (https://timsong-cpp.github.io/cppwp/n4659/class.union#1):
"At most one of the non-static data members of an object of union type
can be active at any time, that is, the value of at most one of the
non-static data members can be stored in a union at any time. [ Note:
One special guarantee is made in order to simplify the use of unions:
If a standard-layout union contains several standard-layout structs
that share a common initial sequence, and if a non-static data member
of an object of this standard-layout union type is active and is one
of the standard-layout structs, it is permitted to inspect the common
initial sequence of any of the standard-layout struct members; see
[class.mem].  — end note ]"

and (https://timsong-cpp.github.io/cppwp/n4659/class.mem#22):
"Two standard-layout unions are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in any order) have layout-compatible types."

you can't hope that compilers don't seem to do undefined behavior at the moment

Also created espressif/arduino-esp32#8829 PR dependent on changing the current API.
@CLAassistant
Copy link

CLAassistant commented Nov 8, 2023

CLA assistant check
All committers have signed the CLA.

Begin index didn't be selected.
@codecov-commenter
Copy link

codecov-commenter commented Nov 22, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (82a5055) 95.53% compared to head (d85523f) 95.42%.

Files Patch % Lines
api/IPAddress.cpp 95.55% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #225      +/-   ##
==========================================
- Coverage   95.53%   95.42%   -0.11%     
==========================================
  Files          16       16              
  Lines        1075     1050      -25     
==========================================
- Hits         1027     1002      -25     
  Misses         48       48              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aentinger
Copy link
Contributor

I have problems with all those heavy changes in this PR, most of which seem to be motivated by using C++ standard library containers and patterns.

Don't you think the problem can be addressed by enforcing a packed union, i.e. something like this

    union {
        uint8_t bytes[16];
        uint32_t dword[4];
    } _address __attribute__((packed, aligned(4)));

Although I have not invented this particular instance I may add that I have used (and use) this pattern repeatedly over the years, most recently here and here. I have yet to experience undefined behaviour.

TL;DR: No to heavy changes including introduction of C++ STL types, lighter tough solutions are preferred.

@safocl
Copy link
Author

safocl commented Nov 28, 2023

I have problems with all those heavy changes in this PR, most of which seem to be motivated by using C++ standard library containers and patterns.

Don't you think the problem can be addressed by enforcing a packed union, i.e. something like this

    union {
        uint8_t bytes[16];
        uint32_t dword[4];
    } _address __attribute__((packed, aligned(4)));

Although I have not invented this particular instance I may add that I have used (and use) this pattern repeatedly over the years, most recently here and here. I have yet to experience undefined behaviour.

TL;DR: No to heavy changes including introduction of C++ STL types, lighter tough solutions are preferred.

This trouble with using of union is a undefined behavior when value was written as uint8_t[16] and read as uint32_t[4] -- it are not compatible layouts. But struct { int x, y; } and int[2] is layout-compatible -- it can already be used in conjunction with each other.

std::array<> used in this PR for STL-style only. it doesn't make any problems. Using of the c-style arrays are bad practice.

@safocl
Copy link
Author

safocl commented Nov 28, 2023

this functionality can be implemented through a C-style array

@sgryphon
Copy link
Contributor

Thanks for changing back to C-style array -- that is a separate change (from just removing the union).

I think removing the union makes a lot of sense, as it added very little benefit.

In practice it probably has the same result (even if accessing through alternate union members has unspecified behaviour, I think in most cases they would end up in the overlapping memory and just work). But then in practice, I don't know how often the 32-bit word variations are actually used.

An IPv6 would rarely be handled as 32-bit words... either bytes or 16-bit chunks (how it is displayed) makes sense.

For an IPv4 address there actually is a defined representation as a 32-bit integer, e.g. http://2398766798 is the same as http://142.250.70.206, and that may have been what the original intention was.

However on little endian systems the octets 0x83, 0xFA, 0x46, 0xCE, in that order, will form the integer (uint32_t) 3460758158, so while each 4-byte part of the array can be accessed as a 32-bit value, the result is platform byte-order dependent, and may not necessarily match with the usage of integers representations of IPv4 address.

It would be interesting to know what the usage scenario is for accessing as 32-bit words, and whether that whole section could be deprecated and eventually removed.

In the meanwhile, removing the union seems good.

@per1234 per1234 added the bug label Dec 17, 2023
@safocl
Copy link
Author

safocl commented Dec 17, 2023

In practice it probably has the same result (even if accessing through alternate union members has unspecified behavior, I think in most cases they would end up in the overlapping memory and just work).

It is undefined behavior, not unspecified. Any compiler can at any time assume that there is no undefined behavior and compile the program in such a way that a certain number of instructions (code) have been optimized to eliminate undefined behavior. That is, literally anything can happen - including all logic being cut out, since only in this case will it be possible to achieve the absence of undefined behavior - which the compiler will happily do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants