828 MCAP specification for osi tracefiles #833
doc/architecture/trace_file_binary_and_human_readable_formats.adoc
I would (rather quickly) need an example MCAP file for OSI v3.7.0 with (ideally all) objects filled according to the standard.
Here are three example files: Example_OSI_MCAP.zip
I quickly tried to incorporate most outcomes of the last meeting. The messages are basically empty, except that varying timestamps are set.
Signed-off-by: Timm Ruppert <timm.ruppert@persival.de>
89eaa8c to 38249ce
Thanks! Some ground truth data from esmini with objects etc. of as many types as possible would be great :))
I just updated esmini and it seems like there is an issue with the FMU. Anyway, for starters, here is one of our highway examples where I converted a native trace file SensorView to an MCAP file (the ground truth is therefore a submessage). I can provide a GT top-level message as well, but I am a bit tight on schedule for today and tomorrow.
No, thank you. It was just important for me to have a "third set of eyes" creating a file in order to debug something. Otherwise everyone is always blaming in circles :)
FYI: OSI Tracefile writer in openPASS
Signed-off-by: Timm Ruppert <timm.ruppert@persival.de>
I have updated the spec based on our last discussion. In the next meeting we need to address the following things:
Documenting today's meeting concerning the points mentioned above:
Consider the following case:
Signed-off-by: Timm Ruppert <timm.ruppert@persival.de>
List of potential optional metadata fields that mainly emerged from the Gaia-X project (focus on measurement data):
@TimmRuppert @ClemensLinnhoff @jdsika Feel free to add your opinion on which we should include. In case we keep a lot of the fields above, I would list the less important metadata definitions with less normative priority:
2: This is also something that MCAP natively supports, as far as I know. @TimmRuppert how does it handle varying frame rates? Does it take the mean?
In any case, everything needs a good key name and category name.
General:
Metadata:
I would like to suggest adding a comment on "Nested Top Level Messages" (e.g. the SensorView in SensorData) like the following: "When using ASAM OSI MCAP as a container for OSI traces, the user is allowed to remove nested OSI top-level messages and add them as separate channels to the MCAP container." This comment should be added above the top-level messages in the .proto files as well. What is your opinion about this? I think the nested messages were created because the collection of traces in one container was not defined at the time with .osi files?
I think the nesting is not only for trace files, but also for the messages between FMUs directly. I think it is just simpler to send one SensorData instead of a SensorData, a SensorView and a GroundTruth. So I would not add this to the proto files but just to the MCAP trace file documentation.
Besides the points @ClemensLinnhoff mentioned, this might result in a good portion of looking back and forth in the file in order to retrieve the corresponding messages. Especially as the Nonetheless, I understand and totally share your motivation. Maybe something for OSI4. It would just require a lot of breaking changes.
I don't think this is related to trace files at all. I expect a trace file to contain the messages as they are being sent/received. No one is forcing anyone to use the nested messages if they don't want to (I personally think they are usually a mistake in the case of SensorData->SensorView just for traceability purposes, but that's just me). But if they are used, then they should be stored as is. Now, if someone wants to be creative, they can do all kinds of things as they like. It's not the standard's job to say how to use it. So I think this needs no mention anywhere, since it is purely up to the user and use case.
Signed-off-by: Timm Ruppert <timm.ruppert@persival.de>
Signed-off-by: ClemensLinnhoff <clemens.linnhoff@partner.bmw.de>
Signed-off-by: Timm Ruppert <timm.ruppert@persival.de>
Signed-off-by: Timm Ruppert <timm.ruppert@persival.de>
- `publish_time` field:
** Must reflect the timestamp of the stored OSI top-level message
** Must be in nanoseconds
- `log_time` field: Must reflect the time when the message was enqueued for MCAP file addition
Shouldn't this be removed here?
Can you please elaborate further?
I see the issue. `log_time` is defined twice.
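For readers following the thread, here is a minimal sketch of how the two record fields from the quoted draft could be filled. It assumes the OSI timestamp is available as a `seconds`/`nanos` pair; the `Timestamp` dataclass below is only a stand-in for `osi3.Timestamp`, and `to_nanoseconds` and `record_times` are hypothetical helper names, not part of any OSI or MCAP library.

```python
from dataclasses import dataclass

NANOS_PER_SECOND = 1_000_000_000


@dataclass
class Timestamp:
    """Stand-in for osi3.Timestamp (assumption: a seconds + nanos pair)."""
    seconds: int
    nanos: int


def to_nanoseconds(ts: Timestamp) -> int:
    """Collapse seconds and nanos into one integer nanosecond count."""
    return ts.seconds * NANOS_PER_SECOND + ts.nanos


def record_times(ts: Timestamp, enqueue_time_ns: int) -> dict:
    """Hypothetical helper: derive both MCAP time fields per the quoted draft.

    publish_time follows the OSI message timestamp (in nanoseconds),
    log_time is the moment the message was enqueued for writing.
    """
    return {"publish_time": to_nanoseconds(ts), "log_time": enqueue_time_ns}


times = record_times(Timestamp(seconds=1, nanos=500), enqueue_time_ns=42)
print(times)
```

Integer arithmetic is used throughout so that no precision is lost at nanosecond resolution.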
** `zero_time`: ISO 8601 YYYYMMDDThhmmss.f formatted point in time representing the zero time of the scenario
** `timestamp`: ISO 8601 YYYYMMDDThhmmss.f formatted creation time of the file
Specification of timezone is missing.
Why not use nanoseconds (unix epoch)?
I would personally also favor the unix epoch in ms or nanoseconds. The proposal is using ISO 8601 as it seems like that was used for trace file names etc. before.
That would also make more sense to be compliant with https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1EnvironmentalConditions.html#a636bb78627046f34208f42f586ab2086?
Let's wait and see if anyone disagrees.
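To make the two options in this thread concrete, the following sketch converts between an ISO 8601 basic-format string with an explicit time zone designator and Unix epoch nanoseconds. The function names are hypothetical, and the sketch assumes a fractional part is always present; Python's `datetime` only resolves microseconds, so sub-microsecond digits are not preserved here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_basic_to_epoch_ns(text: str) -> int:
    """Parse e.g. '20210818T150542.5Z' into nanoseconds since the Unix epoch."""
    dt = datetime.strptime(text, "%Y%m%dT%H%M%S.%f%z")
    # Integer division of timedeltas avoids floating-point rounding.
    return ((dt - EPOCH) // timedelta(microseconds=1)) * 1000


def epoch_ns_to_iso_basic(ns: int) -> str:
    """Format epoch nanoseconds as a UTC ISO 8601 basic string (microsecond precision)."""
    dt = EPOCH + timedelta(microseconds=ns // 1000)
    return dt.strftime("%Y%m%dT%H%M%S.%f") + "Z"


print(iso_basic_to_epoch_ns("19700101T000001.5Z"))
print(epoch_ns_to_iso_basic(1_500_000_000))
```

Without the time zone designator the string cannot be mapped to a single epoch value, which is the gap the review comment points out.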
- An MCAP file is considered a single dataset

== Schema
- `name` field: Full message type name, including package (e.g., `osi3.SensorData`)
Is it defined somewhere what "full message type name" means exactly?
Does it make sense to specify that the channels must be named "osi3.MessageType"?
Especially, because you used "OSI3::SensorData" in line 60 (not in the same context but anyways).
Is it defined somewhere what "full message type name" means exactly?
My understanding of the protobuf documentation is that this is defined there. But to be more precise, I will change it to "fully-qualified name of the protobuf message type".
Does it make sense to specify that the channels must be named "osi3.MessageType"?
There is mixed usage of "must" and leaving it out. This should be changed.
Especially, because you used "OSI3::SensorData" in line 60 (not in the same context but anyways).
There I meant the osi3::SensorData struct. But it makes sense to simply write "A channel containing OSI SensorData messages" and circumvent this. I will change line 60.
- Must allow other non-OSI data to be present in the MCAP file
- Message records must be written into `chunk records` for indexed files
- Only OSI top-level messages containing a timestamp field are permitted to be directly stored in MCAP channels
- Must contain only a single scenario with a unique global time
I don't exactly understand what's meant by "unique global time". There could be multiple scenarios with the same time which means it can not be unique. Did you mean defined/specified global time or that all contained messages must be in the same time frame?
In our last meeting we agreed that an MCAP file must only contain one scenario (while technically it could contain multiple). We decided to ditch the technical possibility of storing multiple independent scenarios to avoid extreme confusion with interesting files and usage that is not intended: one could come up with the idea of storing all possible NCAP scenarios at once in a huge file.
I remember that we talked about a common/unified time frame but not about that an MCAP file should only be allowed to contain a single scenario.
Also, I thought that the term "unique global time" is maybe a bit confusing as well. I'd rather write something like common time frame.
- Must contain only a single scenario with a unique global time
- Must contain only a single scenario with a common time frame
Still, I don't agree that an mcap file must contain only a single scenario. I think the word "scenario" could be misleading. You can put an arbitrary number of scenarios in one trace file. IMO the only relevant information here is that all the messages in the file must have a common time frame.
I agree that the term scenario is a bit misleading. You could have multiple scenarios one after another in one simulation. The important thing is, that all channels have the same time line.
- Must contain only a single scenario with a unique global time
- Must be limited to a single, unified sequence of events within the same time frame.

== File-wide Metadata
- Must include metadata with the name `versions` containing at least the following key-value pair:
I think there should be some convention, or at least one sentence, about the separator character used for the prefix.
I see that you used "-" as a separator in the examples. I find this quite confusing (especially when reading this document) because it isn't clear to me what's part of the prefix name and what's the actual key when there are multiple "-" in the key or prefix (e.g. "GAIA-X4PLC-AAD-hdmap-actual-key"). Something like a "." would probably make it more obvious what the actual prefix is.
I do not have any opinion about this. @jdsika had the idea to add prefixes, so let's see what he suggests.
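The ambiguity described in this thread can be shown with a short sketch. `split_prefixed_key` is a hypothetical helper, and the dotted key is an assumed variant of the example from the comment, not a name taken from the spec.

```python
def split_prefixed_key(name: str, separator: str):
    """Split a metadata key into (prefix, key) at the first separator.

    Returns (None, name) when the separator does not occur at all.
    """
    prefix, found, key = name.partition(separator)
    if not found:
        return None, name
    return prefix, key


# With "-" as separator the prefix boundary cannot be recovered,
# because the prefix itself contains the separator character.
print(split_prefixed_key("GAIA-X4PLC-AAD-hdmap-actual-key", "-"))

# With "." (assumed alternative) the split is unambiguous as long as
# neither prefix nor key contains a dot.
print(split_prefixed_key("GAIA-X4PLC-AAD-hdmap.actual-key", "."))
```

Whichever separator is chosen, the spec would also need to forbid that character inside the prefix for the split to stay unambiguous.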
|An optional prefix which may be used to specify the type of scenario (e.g. `cut-in`) or uniqueness of the setup (e.g. `target-5m`). May not contain any `_` characters.

|opt. timestamp
|Defines the absolute start time for a scenario or recording. If following the recommended zero time for the timestamps of the top-level messages, this time must represent the zero time. The format must adhere to ISO 8601 [cite:iso8601].
The actual timestamp format (including timezone information) should be specified. Probably the same as in OSI file naming convention.
To my limited understanding of ISO 8601, the .osi and .txth spec does not state a format. It just provides the example `20210818T150542Z` and refers to ISO 8601. Something like `YYYYMMDDThhmmssZ` would be a valid format with respect to ISO 8601, but `YYYY-MM-DD-HH` as well, right?
I would like to add the format `YYYYMMDDThhmmssZ`, and the mention that it must be in UTC (not local, due to the Z), to the .osi/.txth trace file naming convention, but this assumes that generalizing the example is not considered a breaking change.
The naming convention even states that the recommended format is YYYYMMDDThhmmssZ (albeit with exemplary numbers). I would just add the format specification and maybe even consider decimal places. The recommended format remains the same as far as I'm concerned.
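As a rough illustration of the format discussed here, the sketch below formats and parses the UTC file name timestamp from the naming convention's example (`20210818T150542Z`). The helper names are hypothetical, and decimal places are deliberately left out because the thread has not settled on them.

```python
from datetime import datetime, timezone

# strftime/strptime pattern for the YYYYMMDDThhmmssZ layout (UTC only).
FILENAME_TIME_FORMAT = "%Y%m%dT%H%M%SZ"


def format_utc(dt: datetime) -> str:
    """Render an aware datetime as a UTC file name timestamp."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: time zone is required to convert to UTC")
    return dt.astimezone(timezone.utc).strftime(FILENAME_TIME_FORMAT)


def parse_utc(text: str) -> datetime:
    """Parse a file name timestamp; the trailing Z is interpreted as UTC."""
    return datetime.strptime(text, FILENAME_TIME_FORMAT).replace(tzinfo=timezone.utc)


stamp = parse_utc("20210818T150542Z")
print(stamp.isoformat())
print(format_utc(stamp))
```

Rejecting naive datetimes keeps local times from silently ending up in a file name that claims to be UTC.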
|opt. prefix
|An optional prefix which may be used to specify the type of scenario (e.g. `cut-in`) or uniqueness of the setup (e.g. `target-5m`). May not contain any `_` characters.

|opt. timestamp
Maybe we should state that the timestamp can only exceptionally be omitted when there really is no reference to a global time in the file?
I think if you have a real-world capture or any other trace file that has any meaningful relation to a global time frame, it should be visible in the filename.
In my opinion the meaningful relation to a global time is the exception here, since simulation should be the "normal" use-case rather than measurements.
Signed-off-by: TimmRuppert <158559152+TimmRuppert@users.noreply.github.com>
Signed-off-by: TimmRuppert <158559152+TimmRuppert@users.noreply.github.com>
Signed-off-by: TimmRuppert <158559152+TimmRuppert@users.noreply.github.com>
- Must include metadata with the name `asam_osi` containing at least the following key-value pairs:
** `zero_time`: ISO 8601 YYYYMMDDThhmmss.fTZD formatted point in time representing the zero time of the scenario
** `timestamp`: ISO 8601 YYYYMMDDThhmmss.fTZD formatted creation time of the file
- It is strongly recommended to include metadata with the name `asam_osi` containing the following key-value pairs:
Should there be two "asam_osi" metadata records (see two lines above) or should all the metadata fields be in one metadata record?
While I don't know if it is technically possible to have two metadata records with the same name (I assume it might be), it would make more sense to have all of that in one record. Considering your proposal to always speak of "a metadata record" in this context (see other comment) it should be clearer once this has been added.
Binary trace file.
Messages are separated by a length specification before each message.
The length is represented by a four-byte, little-endian, unsigned integer.
The length does not include the integer itself.
The information of this line got lost.
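For reference, the framing described in the quoted lines can be sketched as follows. `write_message` and `read_messages` are hypothetical helper names, and the payloads are treated as opaque bytes; in a real trace file they would be serialized OSI top-level messages.

```python
import io
import struct


def write_message(stream, payload: bytes) -> None:
    """Write one message preceded by its length.

    The length is a four-byte, little-endian, unsigned integer and
    counts only the payload, not the length field itself.
    """
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_messages(stream):
    """Yield the payload of each length-prefixed message until end of stream."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack("<I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated message payload")
        yield payload


buffer = io.BytesIO()
write_message(buffer, b"abc")
write_message(buffer, b"")
buffer.seek(0)
print(list(read_messages(buffer)))
```

The last rule matters for readers: if the length included its own four bytes, every payload would be read four bytes short.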
I have taken the liberty to morph the current state into something that tries to be a more precise and normative specification while trying to be more minimalist in what it touches, and leaves other topics (like changes in the naming convention, the specific mapping of non-OSI meta-data into meta-data records) for either separate PRs or other layered specifications: #841. It also tries to make the spec more robust, by specifying more explicitly how to use MCAP elements (e.g. placement of records, chunking, ...). The other major change is that it now recommends making People who want to replay while keeping jitter of their middleware (due to asynchronous communication) intact can still do so, but more sane use cases that abstract away middleware jitter (or are synchronous in nature) can still reap the benefits of the MCAP format index machinery (one might suggest to MCAP that they might like to add indexing on `publish_time` to enable both use cases at the same time, but that's a different story).
Reference to a related issue in the repository
Closes #828