Switch from YAML to JSON for build pipeline #4153

Closed
mbollmann opened this issue Dec 14, 2024 · 5 comments · Fixed by #4230

mbollmann (Member) commented Dec 14, 2024

YAML serialization (even with CDumper) is significantly slower than JSON serialization with msgspec in my testing (by a factor of at least 20); since Hugo also supports JSON for data files, we should probably switch the build pipeline to write JSON files instead.
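The JSON-writing step of the pipeline might look something like the minimal sketch below. It assumes Python, and `write_data_file` is a hypothetical helper rather than the project's actual code. It uses `msgspec.json.encode` when msgspec is installed and falls back to the standard-library `json` module otherwise. Output is compact because the files are meant for Hugo to consume, not for people to read.

```python
import json
from pathlib import Path
from typing import Any

try:
    import msgspec  # optional fast JSON encoder mentioned in the issue
except ImportError:  # fall back to the standard library if msgspec is absent
    msgspec = None


def write_data_file(data: Any, path: Path) -> None:
    """Hypothetical helper: serialize `data` as compact JSON to `path`.

    Hugo can read JSON data files, so the build pipeline could emit
    `.json` instead of `.yaml` files. No indentation is added because
    human readability is not a goal for these generated files.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    if msgspec is not None:
        # msgspec.json.encode returns UTF-8 encoded JSON as bytes
        path.write_bytes(msgspec.json.encode(data))
    else:
        # separators=(",", ":") drops json's default spaces for compact output
        path.write_text(
            json.dumps(data, ensure_ascii=False, separators=(",", ":")),
            encoding="utf-8",
        )
```

For example, `write_data_file({"title": "Example"}, Path("data/example.json"))` would create the file under Hugo's default `data/` directory. The path and keys are illustrative only.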

mbollmann changed the title to "Switch from YAML to JSON for build pipeline" Dec 14, 2024
mbollmann self-assigned this Dec 14, 2024
mjpost (Member) commented Dec 16, 2024

Interesting; why would this be the case? I have noticed that YAML support is often secondary whereas JSON is often native, and the YAML parsing library is more difficult to use, but I wouldn't have thought there would be speed differences of this magnitude.

mjpost added this to the 2025Q1 milestone Dec 16, 2024
mjpost (Member) commented Dec 16, 2024

I added this to a milestone just so that it doesn't disappear into our massive issues list...

mbollmann (Member, Author) commented

> Interesting; why would this be the case? I have noticed that YAML support is often secondary whereas JSON is often native, and the YAML parsing library is more difficult to use, but I wouldn't have thought there would be speed differences of this magnitude.

I'm not sure about the reasons; I think the YAML format has more features at least, but maybe also more ambiguity?
In any case, this is consistent with benchmark results I could find on a quick search; for example:

mjpost (Member) commented Dec 16, 2024

Interesting. I suppose JSON parsing has benefited from the ton of focus it has received, being at the heart of web browser data representations. Still, I would never have guessed this. But this is good news, in part because in #4146 I implemented a JSON representation for the issue and was going to switch to YAML, which would require loading an extra library and making a few code changes. I'll stick with JSON.

It would be fantastic to have the build speed brought down.

akoehn (Member) commented Dec 21, 2024

> I'm not sure about the reasons; I think the YAML format has more features at least, but maybe also more ambiguity?

YAML depends on indentation, so whitespace has to be counted to create the correct structure. JSON does not rely on that and therefore probably does not need as much context. Additionally, JSON is simply everywhere, especially in server-to-server communication, whereas I have never seen huge YAML structures in production or as a format for data transfer.
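The JSON half of this point can be illustrated with Python's standard `json` module. JSON marks nesting with explicit braces and brackets, so the same document parses to the same object no matter how it is indented. The YAML contrast appears only in a comment, because running it would require a YAML parser such as PyYAML. This sketch shows that indentation carries no meaning in JSON. It does not claim to explain the speed gap by itself.

```python
import json

# The same data in three whitespace layouts: compact, indented, and oddly indented.
compact = '{"paper":{"title":"Example","authors":["A","B"]}}'
pretty = json.dumps(json.loads(compact), indent=4)
odd = """{
"paper":   {
        "title": "Example",
  "authors": [ "A",
"B" ] } }"""

# Nesting comes from the delimiters, so all three parse to the same object.
parsed = [json.loads(s) for s in (compact, pretty, odd)]
assert parsed[0] == parsed[1] == parsed[2]

# In YAML, by contrast, indentation *is* structure:
#   "a:\n  b: 1"  parses as {"a": {"b": 1}}
#   "a:\nb: 1"    parses as {"a": None, "b": 1}
# so a YAML parser has to track indentation levels line by line.
```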

There is a nice video going into detail on what goes into fast JSON parsing (a few years old already).

Switching to JSON sounds like a no-brainer to me, as human readability is not that important in our use case.
