828 MCAP specification for osi tracefiles #833

Closed
wants to merge 13 commits into from
29 changes: 0 additions & 29 deletions doc/architecture/formatting_script.adoc

This file was deleted.

@@ -2,11 +2,16 @@ ifndef::include-only-once[]
:root-path: ../
include::{root-path}_config.adoc[]
endif::[]
= OSI trace file naming conventions
= Native binary and Human-readable Formats

**Name format**
== Binary .osi Format
Messages are separated by a four-byte, little-endian, unsigned integer specifying the length of each message.
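
As an illustration of the framing described above, here is a minimal Python sketch; it is not part of the specification. The function names `iter_osi_messages` and `write_osi_message` are hypothetical, and the sketch assumes that the length prefix counts only the message bytes, not the four-byte integer itself. It handles raw serialized bytes only; decoding them with the matching protobuf class is left out.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_osi_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw bytes of each length-prefixed message in a binary stream."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        # "<I" = little-endian, unsigned, four bytes
        (length,) = struct.unpack("<I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated message payload")
        yield payload


def write_osi_message(stream: BinaryIO, payload: bytes) -> None:
    """Write one serialized message preceded by its four-byte length."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


if __name__ == "__main__":
    # Round trip with placeholder payloads instead of real OSI messages.
    buffer = io.BytesIO()
    write_osi_message(buffer, b"first")
    write_osi_message(buffer, b"second")
    buffer.seek(0)
    print(list(iter_osi_messages(buffer)))
```

A reader would typically pass each yielded payload to the `ParseFromString` method of the appropriate OSI message class.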

The names of OSI trace files should have the following format:
== Human-readable .txth Format
Messages are stored as plain text, separated by newlines.

== Naming Convention
Binary .osi and human-readable .txth files should follow this naming convention:

----
<timestamp>_<type>_<osi-version>_<protobuf-version>_<number-of-frames>_<custom-trace-name>.osi
@@ -76,3 +81,5 @@ The recommended file name is:
----
20210818T150542Z_sv_312_300_1523_highway.osi
----
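
The name format above can be split into its fields with a regular expression. The following Python sketch is only an illustration: the function name `parse_osi_trace_name` is hypothetical, and the field patterns (a `YYYYMMDDThhmmssZ` timestamp, a lower-case type, digit-only version and frame fields) are assumptions generalized from the recommended example, not rules stated by the specification.

```python
import re
from typing import Dict

# Assumed field patterns, generalized from the recommended example name.
_OSI_TRACE_NAME = re.compile(
    r"^(?P<timestamp>\d{8}T\d{6}Z)"
    r"_(?P<type>[a-z]+)"
    r"_(?P<osi_version>\d+)"
    r"_(?P<protobuf_version>\d+)"
    r"_(?P<number_of_frames>\d+)"
    r"_(?P<custom_trace_name>.+)"
    r"\.osi$"
)


def parse_osi_trace_name(file_name: str) -> Dict[str, str]:
    """Split an .osi trace file name into its named fields."""
    match = _OSI_TRACE_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a recognized .osi trace file name: {file_name}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_osi_trace_name("20210818T150542Z_sv_312_300_1523_highway.osi"))
```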

NOTE: This naming convention does not apply to .mcap files. They must follow the naming convention described in their section.
78 changes: 0 additions & 78 deletions doc/architecture/trace_file_example.adoc

This file was deleted.

121 changes: 121 additions & 0 deletions doc/architecture/trace_file_mcap_format.adoc
@@ -0,0 +1,121 @@
ifndef::include-only-once[]
:root-path: ../
include::{root-path}_config.adoc[]
endif::[]
= MCAP Format

== General Requirements
- Must comply with the https://mcap.dev/spec[MCAP format specification] version `0x30`
- Must allow other non-OSI data to be present in the MCAP file
- Message records must be written into `chunk records` for indexed files
- Only OSI top-level messages containing a timestamp field are permitted to be directly stored in MCAP channels
- Must contain only a single scenario with a unique global time
Contributor

I don't exactly understand what's meant by "unique global time". There could be multiple scenarios with the same time which means it can not be unique. Did you mean defined/specified global time or that all contained messages must be in the same time frame?

Author

In our last meeting we agreed that an mcap must only contain one scenario (while technically it could contain multiple). We decided to ditch the technical possibility to store multiple independent scenarios to avoid extreme confusion with interesting files and usage that is not intended: One could come up with the idea to store all possible NCAP scenarios at once in a huge file.

Contributor

I remember that we talked about a common/unified time frame but not about that an MCAP file should only be allowed to contain a single scenario.

Also, I thought that the term "unique global time" is maybe a bit confusing as well. I'd rather write something like common time frame.

Author

Suggested change
- Must contain only a single scenario with a unique global time
- Must contain only a single scenario with a common time frame

Contributor

Still, I don't agree that an mcap file must contain only a single scenario. I think the word "scenario" could be misleading. You can put an arbitrary number of scenarios in one trace file. IMO the only relevant information here is that all the messages in the file must have a common time frame.

Contributor

I agree that the term scenario is a bit misleading. You could have multiple scenarios one after another in one simulation. The important thing is, that all channels have the same time line.

Author

Suggested change
- Must contain only a single scenario with a unique global time
- Must be limited to a single, unified sequence of events within the same time frame.

- An MCAP file is considered a single dataset

== Schema
- `name` field: Full message type name, including package (e.g., `osi3.SensorData`)
Contributor

Is it defined somewhere what "full message type name" means exactly?
Does it make sense to specify that the channels must be named "osi3.MessageType"?
Especially, because you used "OSI3::SensorData" in line 60 (not in the same context but anyways).

Author

@TimmRuppert, Nov 13, 2024

Is it defined somewhere what "full message type name" means exactly?

I understand the protobuf documentation as if this is defined. But to be more precise I will change it to fully-qualified name of the protobuf message type

Does it make sense to specify that the channels must be named "osi3.MessageType"?

There is a mixed usage of "must" and leaving it out. This should be changed.

Especially, because you used "OSI3::SensorData" in line 60 (not in the same context but anyways).

There I meant the osi3::SensorData Struct. But it makes sense to simply write "A channel containing OSI SensorData messages" and circumvent this. I will change line 60

- `encoding` field: Must be `protobuf`
- `data` field: String-encoded `google::protobuf::FileDescriptorSet` for the OSI top-level message

== Channel
- `message_encoding` field: Must be `protobuf`
- `metadata` field:
** Must include an `osi_version` key, specifying the OSI SemVer version of the OSI top-level message contained within the channel
** Must include a `protobuf` key, specifying the protobuf SemVer version used to create the OSI top-level message contained within the channel
** Should include a `description` key, explaining the data's origin and purpose in natural language.
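
The channel rules above can be checked mechanically. The following Python sketch is a hedged illustration using plain dictionaries rather than any MCAP library; the function name `check_channel` is hypothetical, and the key names follow the draft text above, which may still change during review.

```python
from typing import Dict, List

# Key names as written in the draft above.
REQUIRED_CHANNEL_METADATA_KEYS = ("osi_version", "protobuf")
RECOMMENDED_CHANNEL_METADATA_KEYS = ("description",)


def check_channel(message_encoding: str, metadata: Dict[str, str]) -> List[str]:
    """Return a list of violations of the channel requirements (empty if none)."""
    problems = []
    if message_encoding != "protobuf":
        problems.append(f"message_encoding must be 'protobuf', got '{message_encoding}'")
    for key in REQUIRED_CHANNEL_METADATA_KEYS:
        if key not in metadata:
            problems.append(f"missing required metadata key: {key}")
    return problems


def missing_recommended_keys(metadata: Dict[str, str]) -> List[str]:
    """Return the recommended ("should") keys that are absent."""
    return [key for key in RECOMMENDED_CHANNEL_METADATA_KEYS if key not in metadata]


if __name__ == "__main__":
    # Version strings are placeholder values for illustration only.
    print(check_channel("protobuf", {"osi_version": "3.7.0", "protobuf": "3.20.1"}))
```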


== Message
- `publish_time` field:
** Must reflect the timestamp of the stored OSI top-level message
** Must be in nanoseconds
- `log_time` field: Must reflect the time when the message was enqueued for MCAP file addition
Contributor

Shouldn't this be removed here?

Author

@TimmRuppert, Nov 13, 2024

Can you please elaborate further?

I see the issue. log_time is defined twice

** Must reflect the timestamp of the stored OSI top-level message
** Must be in nanoseconds
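
To show what "timestamp in nanoseconds" could mean in practice, the sketch below converts an OSI timestamp, which carries separate `seconds` and `nanos` fields, into a single integer suitable for an MCAP time field. The function name `osi_timestamp_to_nanoseconds` is hypothetical, and the choice of which MCAP field receives the value is left to the specification text above.

```python
NANOS_PER_SECOND = 1_000_000_000


def osi_timestamp_to_nanoseconds(seconds: int, nanos: int) -> int:
    """Combine the seconds and nanos parts of an OSI timestamp into nanoseconds."""
    if not 0 <= nanos < NANOS_PER_SECOND:
        raise ValueError("nanos must be in the range 0..999999999")
    return seconds * NANOS_PER_SECOND + nanos


if __name__ == "__main__":
    # 1.5 seconds after the zero time of the trace.
    print(osi_timestamp_to_nanoseconds(1, 500_000_000))
```

Using integer arithmetic throughout avoids the rounding errors that a floating-point seconds value would introduce for long recordings.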


== File-wide Metadata
- Must include metadata with the name `versions` containing at least the following key-value pair:
Contributor

I think there should be some convention or at least one sentence talking about the used separator character for the prefix.

I see that you used "-" as a separator in the examples. I feel like this is quite confusing (especially when reading this document) because to me it isn't exactly clear what's part of the prefix name and what's the actual key when there are multiple "-" in the key (e.g. "GAIA-X4PLC-AAD-hdmap-actual-key"). Something like a "." would probably make it more obvious what the actual prefix is.

Author

I do not have any opinion about this. @jdsika had the idea to add prefixes, so let's see what he suggests.

Contributor

I would propose using unambiguous words for mcap-related stuff: Use "metadata record" instead of "category" or "metadata" or "file metadata" when referring to a metadata record; use "metadata record name" instead of "metadata name" or "category name"; use "channel metadata" for channel metadata.

** `osi`: SemVer version of the minimum required OSI version
- Must include metadata with the name `asam_osi` containing at least the following key-value pairs:
** `zero_time`: ISO 8601 YYYYMMDDThhmmss.f formatted point in time representing the zero time of the scenario
** `timestamp`: ISO 8601 YYYYMMDDThhmmss.f formatted creation time of the file
Contributor

Specification of timezone is missing.

Why not use nanoseconds (unix epoch)?

Author

I would personally also favor the unix epoch in ms or nanoseconds. The proposal is using ISO 8601 as it seems like that was used for trace file names etc. before.

That would also make more sense to be compliant with https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1EnvironmentalConditions.html#a636bb78627046f34208f42f586ab2086?

Let's wait and see if anyone disagrees.

- It is strongly recommended to include metadata with the name `asam_osi` containing the following key-value pairs:
Contributor

Should there be two "asam_osi" metadata records (see two lines above) or should all the metadata fields be in one metadata record?

Author

While I don't know if it is technically possible to have two metadata records with the same name (I assume it might be), it would make more sense to have all of that in one record. Considering your proposal to always speak of "a metadata record" in this context (see other comment) it should be clearer once this has been added.

** `description`: Short human-readable scenario description
** `creator`: CSV of persons or companies (not tools) creating the file
** `license`: CSV of SPDX identifiers
** `data_sources`: CSV of models, scenario players, etc.
- Additional custom metadata may be added. In that case, it is recommended to add a category with the name `context` in which each key represents a prefix and each value points to the specification of the metadata. Other metadata (including channel metadata) can then be added with the stated prefix, which makes clear what a metadata entry is about and where it is specified. The following examples are given:
** GAIA-X4PLC-AAD SHACL Shape
*** Assume you want to embed the hdmap of a scenario in the MCAP file.
*** The `context` category contains the key `GAIA-X4PLC-AAD-hdmap` with the value `https://github.com/GAIA-X4PLC-AAD/ontology-management-base/blob/main/hdmap/hdmap_shacl.ttl`
*** A channel metadata contains the key `GAIA-X4PLC-AAD-hdmap` with the value of the hdmap data in the given SHACL shape.
** openDrive Reference
*** Assume you want to express that oncoming traffic passes on the right side of the road.
*** The `context` category contains the key `openDrive` with the value `https://publications.pages.asam.net/standards/ASAM_OpenDRIVE/ASAM_OpenDRIVE_Specification/1.8.1/specification/index.html`
*** A file metadata in a new metadata category with the arbitrary name `specification` contains the key `openDrive-road-rule` with the value `RHT`
** Cycle time variation of a sensor
*** Assume you want to express the interface cycle time variation of a sensor.
*** The `context` category contains the key `iso_23150` with the value `ISO 23150:2011`
*** A channel containing `OSI3::SensorData` messages has metadata with the key `iso_23150-cycle-time-variation` and the value `80`
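
The file-wide metadata rules above can be pictured as named records holding key-value pairs. The Python sketch below models them as nested dictionaries and checks the mandatory keys; it is an illustration only. The function name `missing_file_metadata` is hypothetical, the version and time values are placeholders, and the record and key names follow the draft text, which is still under discussion in this review.

```python
from typing import Dict, List

# Mandatory keys per metadata record name, as listed in the draft above.
REQUIRED_RECORD_KEYS = {
    "versions": ("osi",),
    "asam_osi": ("zero_time", "timestamp"),
}


def missing_file_metadata(records: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the mandatory entries that are absent, as 'record.key' strings."""
    missing = []
    for record_name, keys in REQUIRED_RECORD_KEYS.items():
        record = records.get(record_name, {})
        for key in keys:
            if key not in record:
                missing.append(f"{record_name}.{key}")
    return missing


# Placeholder values; the openDrive entries mirror the second example above.
example_records = {
    "versions": {"osi": "3.7.0"},
    "asam_osi": {
        "zero_time": "20210818T150542.0",
        "timestamp": "20210818T150542.0",
    },
    "context": {
        "openDrive": "https://publications.pages.asam.net/standards/ASAM_OpenDRIVE/ASAM_OpenDRIVE_Specification/1.8.1/specification/index.html",
    },
    "specification": {"openDrive-road-rule": "RHT"},
}

if __name__ == "__main__":
    print(missing_file_metadata(example_records))
```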

== Compression
- OSI-compliant tooling must support compression types: `none`, `lz4`, and `zstd`


== Naming Convention
.mcap files must follow this naming convention:


----
<opt. prefix>_<opt. timestamp>_<type>_<opt. suffix>.mcap
----

When not using an optional field, the corresponding underscore delimiter must be omitted so that no double underscore is present.

[#tab-MCAP-file-naming-convention]
.MCAP file naming convention
[cols="1,1"]
|===
|Field |Explanation

|opt. prefix
|An optional prefix which may be used to specify the type of scenario (e.g. `cut-in`) or uniqueness of the setup (e.g. `target-5m`). May not contain any `_` characters.

|opt. timestamp
Contributor

Maybe we should state that the timestamp can only exceptionally be omitted when there really is no reference to a global time in the file?

I think if you have a real-world capture or any other trace file that has any meaningful relation to a global time frame, it should be visible in the filename.

Contributor

In my opinion the meaningful relation to a global time is the exception here, since simulation should be the "normal" use-case rather than measurements.

|Defines the absolute start time for a scenario or recording. If following the recommended zero time for the timestamps of the top-level messages, this time must represent the zero time. The format must adhere to ISO 8601 [cite:iso8601].
Contributor

The actual timestamp format (including timezone information) should be specified. Probably the same as in OSI file naming convention.

Author

@TimmRuppert, Nov 13, 2024

To my limited understanding of ISO 8601, the .osi and .txth spec does not state a format. It just provides the example 20210818T150542Z and refers to ISO 8601.

Something like YYYYMMDDThhmmssZ would be a valid format with respect to ISO 8601, but YYYY-MM-DD-HH as well, right?

I would like to add the format YYYYMMDDThhmmssZ and the mention that it must be in UTC (not local, due to the Z) to the .osi/.txth trace file naming convention but this assumes that generalizing the example is not considered a breaking change.

Contributor

The naming convention even states that the recommended format is YYYYMMDDThhmmssZ (even though with exemplary numbers). I would just add the format specification and maybe even consider decimal places. The recommended format remains the same as far as I'm concerned.



|type
| Specifies the type of the contained top-level message(s) and must be one of the following values:

`sv` file contains only `SensorView` messages. +
`gt` file contains only `GroundTruth` messages. +
`hvd` file contains only `HostVehicleData` messages. +
`sd` file contains only `SensorData` messages. +
`tc` file contains only `TrafficCommand` messages. +
`tcu` file contains only `TrafficCommandUpdate` messages. +
`tu` file contains only `TrafficUpdate` messages. +
`mr` file contains only `MotionRequest` messages. +
`su` file contains only `StreamingUpdate` messages. +
`multi` file contains multiple, different types of top-level messages (not including different channels of the same type).

|opt. suffix
|An optional suffix which may be used the same way as the optional prefix or be used to specify further details like the minimum required OSI version. May not contain any `_` characters.


|===


**Examples**

The following list shows examples of valid OSI MCAP file names:

- `20210818T150542Z_highway_sv.mcap`
- `20210818T150542Z_highway_sv_run-1.mcap`
- `20210818T150542Z_highway_gt_OSI-3-7.mcap`
- `Highway_sd_version-1.mcap`
- `Highway-cut-in-no-collision_sd.mcap`
- `Target-5m_sd_resimulated-measurement.mcap`

NOTE: This naming convention does not apply to .osi and .txth files. They should follow the naming convention described in their section.
@@ -3,22 +3,25 @@ ifndef::include-only-once[]
include::{root-path}_config.adoc[]
endif::[]
[#top-osi_trace_file_formats]
= OSI trace file formats
= Overview file formats

There are two formats for storing multiple serialized OSI messages in one trace file.
There are three formats for storing OSI messages in trace files:

*.osi::
Binary trace file.
Messages are separated by a length specification before each message.
The length is represented by a four-byte, little-endian, unsigned integer.
The length does not include the integer itself.
Contributor

The information of this line got lost.

Native binary trace file.

*.txth::
Human-readable plain-text trace file.
Messages are separated by newlines.

*.mcap::
Binary trace file supporting more advanced features like indexed data, additional metadata and more.

NOTE: Previous releases of OSI also supported a so-called plain-text trace file format, with file extension `.txt`.
This legacy format did not contain plain-text, but rather binary protobuf messages separated by a special separator.
For obvious reasons the format was deprecated and fully replaced with the `.osi` binary file format.
This release no longer contains any support for the legacy `.txt` file format.
These files may be used for manual checks.


TIP: For efficient handling of trace files, you can utilize the specification-compliant tools and libraries provided in the companion https://github.com/OpenSimulationInterface/osi-utilities[osi-utilities] repository. For example, convenient writer and reader classes are provided handling OSI messages for the different file formats.

14 changes: 3 additions & 11 deletions doc/open-simulation-interface_user_guide.adoc
@@ -57,19 +57,11 @@ include::./architecture/packaging_layer.adoc[leveloffset=+3]

=== OSI trace files

include::./architecture/trace_file_formats.adoc[leveloffset=+3]
include::./architecture/trace_file_overview_file_formats.adoc[leveloffset=+3]

include::./architecture/trace_file_naming.adoc[leveloffset=+3]
include::./architecture/trace_file_binary_and_human_readable_formats.adoc[leveloffset=+3]

// === Files and scripts

// include::./architecture/proto-files.adoc[leveloffset=+3]

// include::./architecture/test_scripts.adoc[leveloffset=+3]

include::./architecture/trace_file_example.adoc[leveloffset=+3]

include::./architecture/formatting_script.adoc[leveloffset=+3]
include::./architecture/trace_file_mcap_format.adoc[leveloffset=+3]


// Setting up OSI