-
-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use YAML for saving and loading 1D flame simulations #1112
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1112 +/- ##
==========================================
+ Coverage 73.45% 73.49% +0.04%
==========================================
Files 365 365
Lines 47912 48187 +275
==========================================
+ Hits 35194 35417 +223
- Misses 12718 12770 +52
Continue to review full report at Codecov.
|
@speth ... thanks for taking this on. Without looking at details, I believe this is a very workable approach to moving away from XML. Do you anticipate creating an equivalent to |
While I'd love to have the ability to write HDF5 output files directly, my previous experience with using the HDF5 C++ library was that it was a beastly dependency to add to a project, especially when trying to build on Windows where there isn't (or at least wasn't) an easy way to install the library and headers. That was the main driver for choosing to base this on the already-available YAML input/output capabilities, even if HDF5 is better suited for this mostly numeric array-centric data. If you want to start an enhancement discussion to talk about a C++ equivalent to the Python |
Not sure that this is on my own short-term horizon, but I believe it may be a good new feature for 3.0. It's a substantial project. |
@speth ... I think it still makes sense to start the discussion now (Cantera/enhancements#119). To me, the ability to export to and import from YAML would be parallel to existing CSV and HDF5 (with the latter only supported for the Python API). In other words, it would be neat if we could import the YAML format from the Python API also (which presumably would be a simple pass-through to C++). You may already have that in the works ... |
Leaving a comment about HDF vs YAML before looking at details. The output produced by The same case output generated by
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@speth ... thank you for taking on the replacement of XML export. While this is not meant to be a full review, I commented on a couple of minor things I noticed. Beyond, there are a couple of Python examples left where XML is not replaced (and probably should be).
Regarding the PR itself, I don't have any concerns about the implementation per se, but I think it makes sense to discuss the long-term fate of storage structures for YAML. Cantera 2.5.1 introduced HDF output for Python, which is largely centered on SolutionArray
, and extends this philosophy to various 1D objects (the way it was conceived centers on the user-facing Python API). The YAML structure here is - as far as I am aware - mostly a translation of the legacy XML structure that focuses on the implementation, and not the user interface.
As mentioned earlier, I hope to see both SolutionArray
and (hopefully) HDF support pushed into the C++ core eventually (perhaps for Cantera 3.0?). That way, we can offer users a more convenient 'save' operation for reactor simulations, and have full array support in all languages. Specifically for 1D simulations, I believe more often than not users are interested in post-processing of results when loading an old solution. From that viewpoint, a direct access as SolutionArray
would be convenient. So designing the YAML structure with SolutionArray
import/export in mind would make some sense.
While I don't think that we necessarily have to have exact equivalence, here is what I noticed when comparing the two structures:
- the HDF format is a lot flatter, and uses some domain names directly rather than storing them as the
id
field. I believe that it would make sense to do the same for YAML: if the YAML file were to be imported directly using a generic YAML parser, you can access what belongs to an object by its name, rather than pick from a list after checking theid
field. - I do not think that the
domains
level is necessary. soret_enabled
,radiation_enabled
andemissivity_left/right
appear to be missing from YAML- The location of some parameters is consistent with C++ for YAML, whereas the HDF implementation follows the Python interface. I do not have an opinion here as you can argue both ways (C++ implementation vs user interface, and a revision of HDF could be done)
- Storing by species name makes sense for YAML as it is a human-readable format (HDF uses
X
for efficiency, and isn't meant to be read) - One thing I tried to ensure for HDF is that the heritage of the
Solution
is stored (seephase
) entry; the rationale here was that a solution can only be restored if you know what mechanism was used in the first place.
PS: out of curiosity ... what are the long-term plans to support import of xml
results after Cantera 2.6? (Drop support or use converter script?)
Good catch -- I had this on my to-do list for this PR, but it completely slipped my mind.
I agree that taking some cues from the Python HDF5 format for 1D flames is a good thought, though I think it's probably fine for these two formats to have some structural differences. I've updated the location of the domains so that they are stored in the top-level mapping, using the "id" as the key.
Eliminated, per above.
Good catch. These were never included in the XML format, so I missed them while adapting the existing serialization code.
Can you be more specific? I think the grouping of the grid-related properties ( One other difference is that YAML output puts the grid refinement criteria and the
Done.
The minimalist option is to say that Cantera 2.6 can be used convert existing XML solutions to YAML, as it has support for both. That will be viable as long as installing Cantera 2.6 in a conda environment works, which I think ought to be a fairly long time. Given that, I was leaning toward not creating a standalone converter script. |
Attaching the updated YAML output (generated by |
@speth ... thank you for taking the time to consider my observations and also to implement most of them. I really like the approach of letting You are correct that locating refinement criteria with the flow domain makes sense from the C++ perspective, even if the Python API currently goes a different route. We could address this with a revision of the HDF standard if/when we move it to the C++ core; same with the grouping of tolerances and refinement criteria (otherwise it's non-essential). Regarding tolerances, I have a slight preference for a flatter structure: it's never clear to me whether to group by absolute/relative tolerance or transient/steady solver, so I'd prefer to see something that is close to the Python/HDF API, i.e. tolerances:
steady_abstol: 1.0e-09
steady_reltol: 1.0e-04
transient_abstol: 9.999999999999999e-12 # <-- I know this formatting issue is a current limitation
transient_reltol: 1.0e-04 Using uppercase Is there a way to write all settings before the numeric blocks? Currently, tolerances are written before, whereas refinement criteria are written after. Finally, one observation is that currently some number blocks don't use horizontal space efficiently; this may be addressed by changing: velocity: [0.7112641405849059, 0.7112641405822596, 0.7112641404451637,
0.7112641377426332, 0.7112641104415781, 0.7112637339406135,
0.7112617435453226, 0.7112560064702751, 0.7112322153842345,
...
4.055246182368149, 4.055246182377726] to velocity: [
0.7112641405849059, 0.7112641405822596, 0.7112641404451637, 0.7112641377426332,
0.7112641104415781, 0.7112637339406135, 0.7112617435453226, 0.7112560064702751,
0.7112322153842345, 0.7111349509095231, 0.7109781680863129, 0.7106086071388092,
...
4.055246182368149, 4.055246182377726] (which in this case still fits 88 characters; I'd even consider going wider for this type of file). PS: this is what 102 characters would allow for (current output only has 2 columns) OH: [
-2.014801730705695e-17, -4.629141939774994e-19, -4.813801146169683e-20, 1.771106802152406e-20,
2.545810916390064e-20, 1.623461193735384e-18, 1.841329907298661e-17, 1.31188752889368e-16,
...
4.556290451629765e-04, 4.556290451629765e-04] PPS: the HDF writer suppresses |
Done, though I used names like
For the YAML input file format, the convention is to capitalize proper nouns, but little less, so I'd prefer to leave this as
Done.
You're right, getting only two values per line was a little silly, so I increased the nominal line length to 88 characters globally, though this is really just an estimate because the YAML emitter object doesn't provide enough information to know how many characters in you already are when you start serializing a particular sub-object. It also doesn't indent wrapped lines the way you or I might want, so the best we can do without a lot of extra effort is something like: H2O: [9.397257458934713e-10, 4.507432204414695e-08, 2.0583678723563e-06,
7.962288661066346e-05, 8.587434349611319e-04, 7.741273165985068e-03,
0.0196275630231541, 0.03800532491016567, 0.05808037087942033,
0.07665600042698147]
Resolved (and likewise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this implementation looks good to me. Using the AnyMap
emitters makes a lot of sense.
While I don't think that addressing some of the remaining YAML formatting issues needs to be taken on at this moment (these weren't introduced here after all), I hope to see them addressed before the release of 2.6 (I opened issue #1128).
If a single value applies for all components, just store the scalar value. If different values are needed for different components, store them as a map with the component names, so the tolerances can be restored even if the number of components changes.
This makes the saved YAML structure more similar to the HDF5 structure used by the Python module.
Rename 'timestamp' to 'date', add 'git-commit', and move all metadata fields ahead of the simulation state.
The new setting of 88 is sufficient to get three full-precision doubles on a single line, even when accounting for the space taken for the key in the first line (usually).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@speth ... thank you for addressing the last comments!
Changes proposed in this pull request
In order to move ahead with the deprecation and removal of the XML format, we need a replacement for saving and loading 1D simulation results for all language interfaces. While the HDF5-based format available in the Python module is clearly the best choice for that interface, using YAML is the simplest option for the other interfaces.
u
andv
(replaced byvelocity
andspread_rate
in Cantera 2.5.0).If applicable, provide an example illustrating new features this pull request is introducing
Checklist
scons build
&scons test
) and unit tests address code coverage