Proposed Schema Changes #156

Open
danielballan opened this issue Mar 4, 2020 · 4 comments

@danielballan
Member

danielballan commented Mar 4, 2020

This is a long-term proposal, unrelated to databroker 1.0 or any of the upcoming releases.

The following changes have been previously proposed and discussed at various times. Many are mutually un-coupled and could be considered separately. At some point we should decide which ones we want to do and execute them all in one step, tagging event-model 2.0.0.

Datum

  1. Add a time key.
  2. Add an index key with a unique monotonically increasing integer.
  3. If (2) is accepted, remove datum_id, which would no longer be needed because (resource_uid, index) would be a unique key. Event documents would still refer to a Datum via a construction like the current datum_id, i.e. a string like {resource_uid}/{index}.
  4. Remove some generality in favor of simplicity and efficiency: assume Datums are 1D slices. (All the ones we currently have are, and it's hard to imagine a case that wouldn't be. Even if the external asset is "paragraphs in a Word document", you can slice on that.) Drop datum_kwargs and replace them with slice fields: start, stop, and step. All Datum documents would now have the same fields, and handlers could be simplified.

Change (4) might justify creating a new document (called "Partition", in view of its role as a 1-D slice?) and deprecating Datum rather than making major breaking modifications to Datum.
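
To make (2) through (4) concrete, here is a rough sketch of what such a document could look like. This is only an illustration under stated assumptions: the `Partition` class, its `ref` property, and the `parse_ref` helper are hypothetical names, not part of event-model, and the field names simply follow the wording of the proposal above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Partition:
    """Hypothetical 1-D slice document; not an existing event-model schema."""

    resource: str          # uid of the Resource this slice belongs to
    index: int             # unique, monotonically increasing within a Resource
    time: float            # when the document was generated
    start: Optional[int]   # slice fields replacing datum_kwargs
    stop: Optional[int]
    step: Optional[int] = None

    @property
    def ref(self) -> str:
        # The string an Event would hold, shaped like {resource_uid}/{index}.
        return f"{self.resource}/{self.index}"

    def as_slice(self) -> slice:
        return slice(self.start, self.stop, self.step)


def parse_ref(ref: str) -> Tuple[str, int]:
    # Split on the last "/" so a resource uid containing "/" still parses.
    resource_uid, _, index = ref.rpartition("/")
    return resource_uid, int(index)


# A plain list stands in for whatever the external asset really is.
frames = list(range(10))
p = Partition(resource="abc123", index=2, time=0.0, start=4, stop=8, step=2)
selected = frames[p.as_slice()]
```

With every document carrying the same three slice fields, a handler would only need to apply `as_slice()` to the opened asset instead of interpreting arbitrary keyword arguments.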

Resource

  5. Add version, referring to the version of the spec, with an associated schema maintained with the handler. This will get a lot of use if (4) is accepted because all the handlers will be simplified.

Event

  6. Similar to (2), add an index key with a unique monotonically increasing integer.
  7. Similar to (3), if (6) is accepted, remove uid, which would no longer be needed because (descriptor, index) would be a unique key.
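
As a sketch of the Event index idea, a composite key could be handed out per descriptor as below. The `EventIndexer` name is hypothetical, and starting the count at 0 is an arbitrary assumption; the proposal does not say where the index should start.

```python
from itertools import count
from typing import Dict, Iterator, Tuple


class EventIndexer:
    """Hypothetical helper that issues (descriptor, index) keys."""

    def __init__(self) -> None:
        self._counters: Dict[str, Iterator[int]] = {}

    def next_key(self, descriptor_uid: str) -> Tuple[str, int]:
        # One independent, monotonically increasing counter per descriptor.
        if descriptor_uid not in self._counters:
            self._counters[descriptor_uid] = count()
        return descriptor_uid, next(self._counters[descriptor_uid])


indexer = EventIndexer()
keys = [
    indexer.next_key("desc-a"),
    indexer.next_key("desc-a"),
    indexer.next_key("desc-b"),
]
```

The index alone repeats across descriptors, but the pair stays unique, which is what would allow the Event uid to be dropped.
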
@ambarb

ambarb commented Mar 11, 2020

Is the Datum time key a timestamp per point (read)? Or is it just the recorded time of a uid?

If per point (or eventually maybe even a list/array of timestamps corresponding to a stack of images for a single point), then it would be good to have another key describing how the timestamp was generated (by default, the CPU time recorded into the database). Timestamps can also be generated by the firmware or IOC of the device, by a TTL pulse that doesn't come from the same IOC, or by something else. For clarity and future-proofing, it might be good to include a timestamp origin/provenance key, time.origin.

@danielballan
Member Author

The proposed Datum time key is just when the document was generated (i.e. "the recorded time of a uid"), useful as internal metadata for grabbing batches of documents efficiently from the database or "replaying" document streams of old data with realistic timing.
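
For the "replaying" use case, a minimal sketch could look like the following. It assumes each document is a dict with a float `time` key in seconds; `replay_delays` is a hypothetical helper, not an existing databroker or event-model function.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Document = Dict[str, Any]


def replay_delays(documents: Iterable[Document]) -> Iterator[Tuple[float, Document]]:
    """Yield (seconds to wait, document) pairs in order of the 'time' key."""
    previous = None
    for doc in sorted(documents, key=lambda d: d["time"]):
        # The first document is emitted immediately; later ones wait for
        # the gap that separated them when they were originally generated.
        delay = 0.0 if previous is None else doc["time"] - previous
        previous = doc["time"]
        yield delay, doc


docs = [
    {"time": 12.0, "index": 2},
    {"time": 10.0, "index": 0},
    {"time": 10.5, "index": 1},
]
schedule = list(replay_delays(docs))
delays = [delay for delay, _ in schedule]
```

A consumer would sleep for each delay before passing the document along, reproducing the original spacing.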

@tacaswell
Contributor

I am 50/50 on adding the simpler spelling, but I am not convinced that we want to deprecate Datum as-is.

@danielballan
Member Author

I have come to the same feeling, reflecting on this in the weeks since I last updated the description. We might as well keep the flexibility of Datum around and simply add Partition as a simpler, more locked-down spec.
