Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option<bool> doesn't (de)serialize properly with serde #497

Open
dralley opened this issue Oct 28, 2022 · 15 comments
Open

Option<bool> doesn't (de)serialize properly with serde #497

dralley opened this issue Oct 28, 2022 · 15 comments
Labels
bug serde Issues related to mapping from Rust types to XML

Comments

@dralley
Copy link
Collaborator

dralley commented Oct 28, 2022

Using the following code:

use serde;
use quick_xml;

#[derive(serde::Serialize, serde::Deserialize)]
pub struct Player {
    spawn_forced: Option<bool>
}

fn main() {
    let data = Player {
       spawn_forced: None,
    };

    let mut deserialize_buffer = String::new();
    quick_xml::se::to_writer(&mut deserialize_buffer, &data).unwrap();
    println!("{}", &deserialize_buffer);

    quick_xml::de::from_reader::<&[u8], Player>(&deserialize_buffer.as_bytes())
                    .unwrap();
}

The code crashes during the deserialize step

[dalley@thinkpad playground]$ cargo run
   Compiling playground v0.1.0 (/home/dalley/Devel/practice/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/playground`
<Player><spawn_forced/></Player>
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidBoolean("")', src/main.rs:98:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

If instead spawn_forced is set to Some(true) or Some(false), it works fine. Only None, which serializes to <spawn_forced/>, panics upon deserialization.

[dalley@thinkpad playground]$ cargo run
   Compiling playground v0.1.0 (/home/dalley/Devel/practice/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.44s
     Running `target/debug/playground`
<Player><spawn_forced>true</spawn_forced></Player>
[dalley@thinkpad playground]$ cargo run
   Compiling playground v0.1.0 (/home/dalley/Devel/practice/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/playground`
<Player><spawn_forced>false</spawn_forced></Player>
@dralley dralley added bug serde Issues related to mapping from Rust types to XML labels Oct 28, 2022
@dralley
Copy link
Collaborator Author

dralley commented Oct 28, 2022

Test case minimized from djkoloski/rust_serialization_benchmark#34

@dralley
Copy link
Collaborator Author

dralley commented Oct 29, 2022

This is using the current master branch ^^

@Mingun
Copy link
Collaborator

Mingun commented Oct 30, 2022

This bug is not specific to bool -- any Option<T> fields that deserializing from empty element, are affecting. Let's look at such XMLs:

<root/>                               <!-- (1) None -->
<root><element/></root>               <!-- (2) ? -->
<root><element>$DATA</element></root> <!-- (3) Some(...) -->

While result for (1) and (3) are pretty obvious, that is hard to say for (2). The JSON equivalent of that structs is

{}
{ "element": null }
{ "element": $DATA }

How we should map the case (2)? serde_json maps it to None always. That means, that, for example, Option<()> and any Option<UnitStruct> are always deserialized as None, while they are serialized into "value": null.

quick-xml currently does the opposite: it always deserializes to Some(...). It is a bit tricky to handle this similar to JSON, because our Deserializer::deserialize_option does not advance event pointer, so the type under Option has access to an Event::Start. We only lookahead to one event, but Deserializer::deserialize_option requires lookahead for 2 events: we need to look at Event::Text in order to see that it is empty and we need to return None. The Event::Text generated, because <element/> is represented as <element></element> for a deserializer and we want no difference in handling between that two representations.

Fixing that likely will have a consequence, that Option<&str> and Option<String> will never be deserialized to Some("").

@dralley
Copy link
Collaborator Author

dralley commented Oct 31, 2022

How we should map the case (2)? serde_json maps it to None always. That means, that, for example, Option<()> and any Option are always deserialized as None, while they are serialized into "value": null.

Would it be possible to use a special attribute value, e.g. <element none="true" />?

@Mingun
Copy link
Collaborator

Mingun commented Oct 31, 2022

I would try to avoid that and use #[serde(deserialize_with = "path")] instead, if that would be possible, but handling xsi:nil="true" in that sense could be useful for interop (#464).

@Mingun
Copy link
Collaborator

Mingun commented Nov 20, 2022

Actually, that case is much more complex then I thought initially. It is unclear, how to process cases like

<element attribute="..."/>

At time of deciding if we should return None or not, we don't known how the type under Option would be deserialized. If it is a

struct Struct {
  #[serde(rename = "@attribute")]
  attribute: (),
}

then this XML snippet are definitely not None.

Because of the nature of XML we like to keep ability to deserialize things like

<element attribute="...">
  content
</element>

into a String "content", i.e. ignore excess attributes. This is unique situation for XML, because in JSON that XML would look like

{
  "@attribute": "...",
  "$text": "content"
}

and could not be deserialized into a String. Unfortunately, the situation when additional attributes added to the element are quite usual, so I think we should support that case. That means, however, that deserializing

<element attribute="..."/>

into an Option<String> we should return the same result as for

<element/>

which suggested to be None.

@RoJac88
Copy link

RoJac88 commented Jul 28, 2023

Is there any update on this, or maybe some kind of workaraound?

Currently the only way I cound get the serialization of an optional value to work is when the target type is Option<String>, but even then the result is not correct, since empty strings serialize to Some("") and not None.

@LB767
Copy link

LB767 commented Aug 21, 2023

The simplest workaround I found if you control both the serializing and de-serializing parts is to use the following serde attributes:
#[serde(skip_serializing_if = "Option::is_none", default)]

This problem is still quite bothersome though...

@jonaspleyer
Copy link

jonaspleyer commented Jul 16, 2024

The simplest workaround I found if you control both the serializing and de-serializing parts is to use the following serde attributes: #[serde(skip_serializing_if = "Option::is_none", default)]

This problem is still quite bothersome though...

To me this is not a good solution because I rely on multiple storage solutions (eg. Json, Ron, Sled) with the same (de)-serialization. I am mostly concerned with round-trips Rust -> xml -> Rust and obtaining identical results. Editing the serialization options could have non-desired effects in my other storage solutions. Thus I will avoid this approach.

@Mingun
Copy link
Collaborator

Mingun commented Jul 16, 2024

You have problems in round-trips in JSON too, and I think in any other format, they just different. If take JSON as a base point, at least #[serde(skip_serializing_if = "Option::is_none")] is necessary.

@jonaspleyer
Copy link

Yes I am aware that perfect round-trips are problematic in json as well. However, this MWE does work:

use serde::{Deserialize, Serialize};

fn main() {
    #[derive(Clone, Debug, Eq, PartialEq, Deserialize, Serialize)]
    struct Identifier {
        parent: Option<u64>,
    }
    let ident = Identifier { parent: None, };
    let json_string = serde_json::to_string(&ident).unwrap();
    let ident2: Identifier = serde_json::from_str(&json_string).unwrap();
    assert_eq!(ident, ident2);
}

which is my main concern at this point in time.

@MasterTemple
Copy link

Hello, I had a similar issue and would like to share my solution if it is of help to anyone in the future.

I wanted an element like <some_boolean_flag/> to appear when true and not appear when false.
I assume something like this was the original poster was intent with using Option<bool>.

Example

Initial Content:

<MyStruct>
    <some_field>123</some_field>
    <my_flag/>
</MyStruct>

Deserialized Struct:

MyStruct {
    some_field: 123,
    my_flag: true,
}

Reserialized XML:

<MyStruct>
    <some_field>123</some_field>
    <my_flag/>
</MyStruct>

Initial Content:

<MyStruct>
    <some_field>123</some_field>
</MyStruct>

Deserialized Struct:

MyStruct {
    some_field: 123,
    my_flag: false,
}

Reserialized XML:

<MyStruct>
    <some_field>123</some_field>
</MyStruct>

Code

use serde::{Deserialize, Serialize};

/// Used to skip adding boolean flag, such as `<some_boolean_flag/>`, when `false`
/// ```no_run
/// #[serde(skip_serializing_if = "is_false")]
/// ```
fn is_false(value: &bool) -> bool {
    !value
}

/// When a flag, such as `<some_boolean_flag/>`, is present, return `true`, else, return `false`
/// ```no_run
/// #[serde(deserialize_with = "deserialize_flag")]
/// ```
fn deserialize_flag<'de, D>(deserializer: D) -> Result<bool, D::Error>
where
    D: serde::Deserializer<'de>,
{
    let opt: Option<()> = Option::deserialize(deserializer)?;
    Ok(opt.is_some())
}

/// Will output an XML element, such as `<some_boolean_flag/>`, when `true` and nothing when `false`
/// ```no_run
/// #[serde(serialize_with = "serialize_flag")]
/// ```
/// Note: Output may look different based upon XML parser
/// - serde_xml_rs: `<some_boolean_flag />`
/// - quick_xml: `<some_boolean_flag/>`
fn serialize_flag<S>(flag: &bool, serializer: S) -> Result<S::Ok, S::Error>
where
    S: serde::Serializer,
{
    if *flag {
        serializer.serialize_unit()
    } else {
        serializer.serialize_none()
    }
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct MyStruct {
    some_field: i32,
    #[serde(
        default, // set to `false` when Deserialize errors because flag is absent
        deserialize_with = "deserialize_flag",
        serialize_with = "serialize_flag",
        skip_serializing_if = "is_false"
    )]
    my_flag: bool,
}

@DavidEichmann
Copy link

Hummm I find it very disturbing that roundtrip is not ensured by default. +1 for fixing this.

@DavidEichmann
Copy link

So I may be a bit naive, but to summarize my thoughts: I think it's extremely important (a fundamental use case of any serialization library) to support roundtrips of the form rust -> xml -> rust i.e. this property:

assert_eq!(
    x,
    quick_xml::de::from_str(&quick_xml::se::to_string(&x).unwrap()).unwrap()
);

Importantly, we do NOT need to support roundtrips of the form xml -> rust -> xml. This is for a number of reasons. The input xml may be arbitrary: written by another serialization library or by a human. We don't attempt to preserve whitespace, comments, or node ordering, and we reserve the right to only deserialize a subset of the input xml (e.g. by omitting certain tags/attributes).

@jonaspleyer
Copy link

jonaspleyer commented Oct 5, 2024

I agree with the round-trips rust -> xml -> rust. This is also precisely the use-case of why I started to use quick-xml but also why I stopped when I found out that this behaviour is not supported by default. The solution to annotate with serde options was also not good for me since I want to support multiple serialization formats for my users and these annotations might interfere with other working approaches.

However this is just my feedback. My current solution has been to deprecate xml for my application but I would like to introduce it again later if possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

No branches or pull requests

7 participants