Unexpected Bang Since 0.23.1 #590

Closed
JojiiOfficial opened this issue Apr 11, 2023 · 3 comments · Fixed by #802
@JojiiOfficial

JojiiOfficial commented Apr 11, 2023

The following file can no longer be parsed since 0.23.1: parsing it with the example code below fails with UnexpectedBang(69). Simply ignoring the error causes the example code to parse nothing at all. Version 0.22.0 worked fine.
JMdict.zip
This is just a trimmed file of the JMdict file to demonstrate the issue. You can find the original file here.

Working v0.22.0 Code:

use quick_xml::{events::Event, Reader};
use std::{fs::File, io::BufReader};

fn main() {
    let mut reader = Reader::from_reader(BufReader::new(File::open("./JMdict_short").unwrap()));
    reader.trim_text(true);
    let mut buf = vec![];

    loop {
        match reader.read_event(&mut buf) {
            Ok(Event::Start(ref e)) => {
                if let b"entry" = e.name() {
                    println!("{}", std::str::from_utf8(e.name()).unwrap());
                }
            }

            Ok(Event::Eof) => break,

            Err(e) => {
                panic!("{e:?}");
            }

            _ => (),
        }
    }
}

Broken v0.28.1 Code (same code, just migrated):

use quick_xml::{events::Event, Reader};
use std::{fs::File, io::BufReader};

fn main() {
    let mut reader = Reader::from_reader(BufReader::new(File::open("./JMdict_short").unwrap()));
    reader.trim_text(true);
    let mut buf = vec![];

    loop {
        match reader.read_event_into(&mut buf) {
            Ok(Event::Start(ref e)) => {
                if let b"entry" = e.name().0 {
                    println!("{}", std::str::from_utf8(e.name().0).unwrap());
                }
            }

            Ok(Event::Eof) => break,

            Err(e) => {
                panic!("{e:?}");
            }

            _ => (),
        }
    }
}
@Mingun
Collaborator

Mingun commented Apr 11, 2023

It seems that this bug happens when the <!DOCTYPE > content contains at least two inner <!x> elements and the BufReader buffer is too small to hold them both. The minimal reproduction case that I've found is:

use quick_xml::{events::Event, Reader};
use std::io::BufReader;

/// Regression test for https://github.com/tafia/quick-xml/issues/590
#[test]
fn issue590() {
    let mut reader = Reader::from_reader(BufReader::with_capacity(
        16,
        &b"<!DOCTYPE t [<!1><!2>]>"[..],
        // 0      7       ^15    ^23
        //[                ] = capacity
    ));
    let mut buf = vec![];
    loop {
        match reader.read_event_into(&mut buf).unwrap() {
            Event::Eof => break,
            _ => (),
        }
    }
}
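
To make the buffer-size dependence concrete, here is a minimal std-only sketch (no quick-xml involved) that lists the chunks a BufReader with capacity 16 hands out for the input above. The helper name chunks is hypothetical. It only shows where the chunk boundary falls. That the parser trips on exactly this boundary is an assumption, not something confirmed in this thread.

```rust
use std::io::{BufRead, BufReader};

/// Hypothetical helper: collects the chunks that a `BufReader` with the
/// given capacity exposes through `fill_buf` until the input is exhausted.
fn chunks(input: &[u8], capacity: usize) -> Vec<Vec<u8>> {
    let mut reader = BufReader::with_capacity(capacity, input);
    let mut out = Vec::new();
    loop {
        // `fill_buf` returns the currently buffered bytes, refilling the
        // internal buffer from the underlying reader when it is empty.
        let chunk = reader
            .fill_buf()
            .expect("reading from a byte slice should not fail")
            .to_vec();
        if chunk.is_empty() {
            break; // end of input
        }
        // Mark the whole chunk as consumed so the next call refills.
        reader.consume(chunk.len());
        out.push(chunk);
    }
    out
}

fn main() {
    // Same input and capacity as in the reproduction case above.
    for chunk in chunks(b"<!DOCTYPE t [<!1><!2>]>", 16) {
        println!("{:?}", String::from_utf8_lossy(&chunk));
    }
}
```

With capacity 16 the 23-byte input arrives in two chunks, and the first one ends in the middle of the first inner element. With a capacity of 23 or more the whole DOCTYPE arrives in a single chunk, which would be consistent with a larger buffer hiding the problem.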

@JojiiOfficial
Author

Interesting, though, that this depends on the buffer size, which should be transparent to the XML reader. But increasing the buffer size does indeed fix the issue.

@Mingun
Collaborator

Mingun commented Sep 19, 2024

Duplicate of #533

Mingun marked this as a duplicate of #533 on Sep 19, 2024
Mingun closed this as not planned on Sep 19, 2024