Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group yaml files by root namespace instead of signal #1345

Merged
merged 14 commits into from
Sep 17, 2024

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Aug 18, 2024

Proposal to change code organization for yaml files.

Now:

├── model
│   ├── traces
│   │   ├── database.yaml
│   ├── metrics
│   │   ├── database.yaml
│   ├── registry
│   │   ├── database.yaml
│   │   ├── server.yaml
│   ├── resources
│   │   ├── database.yaml # if ever necessary
├─ database-common.yaml

proposal

├── model
│   ├── database
│   │   ├── database-common.yaml
│   │   ├── database-events.yaml       # if ever necessary
│   │   ├── database-metrics.yaml
│   │   ├── database-registry.yaml
│   │   ├── database-resources.yaml    # if ever necessary
│   │   ├── database-spans.yaml
│   ├── server
│   │   ├── server-registry.yaml

Pros:

  • easier to navigate (e.g. open file by name), easier to update semconv when changing multiple signals
  • matches MD structure

TODO:

  • update CODEOWNERS

Merge requirement checklist

@MSNev
Copy link
Contributor

MSNev commented Aug 19, 2024

This seems to be somewhat related to this discussion (which is limited to events) #925

@AlexanderWert
Copy link
Member

+1 on the above proposal

-1 on the extended idea from the SIG meeting today on also including registry files into the domain directories. IMHO, that would cause confusion again about mixing definition with usage (i.e. referencing) of attributes.

@lmolkova lmolkova force-pushed the refactor-yaml-code-organization branch 3 times, most recently from 19ed86e to 6e4fb0a Compare August 30, 2024 23:11
@lmolkova
Copy link
Contributor Author

lmolkova commented Aug 30, 2024

-1 on the extended idea from the SIG meeting today on also including registry files into the domain directories. IMHO, that would cause confusion again about mixing definition with usage (i.e. referencing) of attributes.

@AlexanderWert, I'd like to understand the concern better.
I feel having registry yamls in the corresponding directory is still not strong enough and we should make it more formal - #1269.

I don't see how this PR changes the status quo wrt registry:

  1. we don't change markdown structure. It clearly documents what's in the registry and what's not
  2. we have checks that ensure how attributes are defined (we name corresponding group as registry.*, it must be an attribute group). We don't have any checks for the location of the registry files.
  3. If/once we have registry for other signals (metrics and events are good candidates - Should we have metrics registry at some point? #592), we'd have a problem with having everything in one folder
  4. In my mind yaml is still an implementation detail. We could move to XML and it won't change anything for the consumers of semantic conventions, so the location of the file is even smaller impl details.

Reasons why I want to have registry attributes in the domain folder:

  • having all yamls for given domain in one folder makes it MUCH easier to work on semantic conventions.

WDYT?

@lmolkova lmolkova force-pushed the refactor-yaml-code-organization branch from cf8e687 to 1cf7120 Compare August 31, 2024 03:38
@AlexanderWert
Copy link
Member

AlexanderWert commented Sep 2, 2024

@AlexanderWert, I'd like to understand the concern better.
I feel having registry yamls in the corresponding directory is still not strong enough and we should make it more formal - #1269.

I think the main question is: Is domain the same as namespace? I would say: NO!
And I agree with you that it doesn't matter much for the end-users / consumers of the SemConv as docs do not change. BUT, for the contributors (especially the ones not contributing on a regular basis to SemConv) it can become very confusing if we start mixing attribute definition (by namespace) and attribute usage (by domain), again. That's basically what we had before we introduced the registry.

Let me try to illustrate that by a concrete example:
Let's take the domain HTTP. We would create a new directory ./model/http which would contain the following current files:

  • ./model/http-common.yaml
  • ./model/trace/http.yaml
  • ./model/metrics/http.yaml

So far so good. And like that proposal, so far.
But now, we are proposing that we would also add the attribute definition files in there.
The HTTP domain covers attributes from many different namespaces:

  • http.*
  • network.*
  • url.*
  • user_agent.*
  • server.*
  • error.*

The proposal of including the registry files into the domains raises following confusion and inconsistencies:

  • So, if the http.yaml file from the registry would live under ./model/http/, where would all the definitions for the other attributes (network.*, ...) live?
  • Obviously, definitions of netwrok.*, url.*, etc. attributes should not live in ./model/http/! They are just reused here.
  • Would that mean that the registry files for url.*, network.*, etc. are defined in a domain url, network, etc. ? --> so we would have as many domains as we have namespaces today?
  • If yes to the above: Is url really a domain (I would say: NO, it's just a namespace). Domains should be of a broader semantical context and not too fine-grained as we don't want approver groups, WGs, etc. for each and every namespace.

So, my main concern is about the confusion of mixing namespaces (grouping for definition, easy to find and navigate) with domains (grouping by semantical usage and context --> higher level context that puts attributes into the actual semantical context).

By keeping the attributes registry (also in the yaml) structure, we clearly do a separation of concerns (definition vs. usage of attributes) and avoid the confusion about namespaces and semantical domains.

@lmolkova
Copy link
Contributor Author

lmolkova commented Sep 3, 2024

@AlexanderWert there is a general folder where the common attributes are defined - they can't live in any domain - they are cross-domain by design. This PR does not change that those are grouped together. Some of them have attribute groups defined in addition to the attributes, they also live in the same folder.

I don't see how this PR changes anything around it - if there is a confusion between domain and namespace, it already exists.

By keeping the attributes registry (also in the yaml) structure, we clearly do a separation of concerns (definition vs. usage of attributes) and avoid the confusion about namespaces and semantical domains.

We do keep the separation of concerns with this PR, I'd like to bring your attention back to the fact that separate folder is a very weak separation, and we have better means to enforce it.

Given that co-locating strongly related files significantly improves semconv development ergonomics and folder structure is internal implementation detail, I would still prefer to keep attribute definitions along with the span/metric/event definitions.

@AlexanderWert
Copy link
Member

AlexanderWert commented Sep 3, 2024

@AlexanderWert there is a general folder where the common attributes are defined ...

What general folder do you refer to? As of today ALL the attributes are defined in the registry folder which are grouped by the namespaces. The general yaml file is just a grouping of referenced attributes for convenience of rendering the docs.

I don't see how this PR changes anything around it - if there is a confusion between domain and namespace, it already exists.

I disagree with that! As explained above, right now, the only place where attributes are defined is the registry folder. Everywhere else attributes are only referenced (i.e. used in a concrete context). So, right now it's very clearly separated and those two concepts of domains and namespaces are not mixed at all.

As of today, the explicit registry allows for low barrier for introducing new (experimental) attributes. This was requested in this OTEP and that OTEP was closed because of the registry existing as an explicit thing. If there is no explicit registry folder anymore that allows to decouple definition of attributes from their usage, we raise the need of that OTEP again. Because then one couldn't easily just define attributes, but would always need to create a domain first. This loose coupling of attribute definition and attribute usage in context is an important tool for easier achievement of the ECS <--> SemConv merger , as well.

We do keep the separation of concerns with this PR, I'd like to bring your attention back to the fact that separate folder is a very weak separation, and we have better means to enforce it.

Yes, I'm aware that we have the tools that would allow to reconstruct an explicit, centralized registry in the docs from a less explicit, decentralized definition of the registry. BUT, again, we would couple attribute definition to domain again, which defeats one of the original purposes of the registry.

So, IMHO by doing that we would do a step back and sort of (not entirely) land where we were before.

Given that co-locating strongly related files significantly improves semconv development ergonomics and folder structure ...

I don't see why it's so important to have the attributes defined in the domain folders as it wouldn't make a huge difference in development ergonomics as long as we do the rest of the domain grouping nicely. Can't we just keep the registry separate, centralized, grouped by namespaces and explicit as it is now and just do the domain grouping for the actual definition of the semantics around domains?

I'd love to hear what others think.

@lmolkova
Copy link
Contributor Author

lmolkova commented Sep 3, 2024

@AlexanderWert is there a solution that would work for you that does not depend on the folder structure?

We have ~100 files in the registry folder now and will have much more over time.
We'll have metrics/event registry in the future.

It creates some problems:

  • when you modify messaging or databases, you're usually modifying attributes along with spans/metrics definitions. Having them in the same folder makes navigation easy. Having them in different folders makes it hard. I can show you a demo. Same is true if you just read yamls.
  • sometimes we forget to reference attribute but add it to the registry - e.g. [Database] db.operation.batch.size records number of queries  #1353
  • CODEOWNERS file is a good example that shows how ownership is spread across the repo

Could you please suggest some ways that would preserve the registry status AND allow to organize yaml folders in this repo in convenient way?

If the folder structure is the only way, then can you explain how it helps or makes anything different?

We have semconv tooling meeting on Wed at 7am PST (tomorrow), would you be able to make it? I'd be great to discuss over the call and there will be other people there.

@lmolkova lmolkova force-pushed the refactor-yaml-code-organization branch from 60924e1 to edbecf3 Compare September 15, 2024 19:24
@lmolkova lmolkova added tooling Regarding build, workflows, build-tools, ... Skip Changelog Label to skip the changelog check labels Sep 15, 2024
@lmolkova lmolkova marked this pull request as ready for review September 15, 2024 20:09
@lmolkova lmolkova requested review from a team September 15, 2024 20:09
@lmolkova lmolkova requested a review from a team September 15, 2024 20:09
Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the direction here, and appreciate the cleanup I'm seeing.

Approving the direction, have a few nits and of course, we may need to pause other merges for you to be able to get this in. Let's discuss in today's meeting!

CONTRIBUTING.md Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
.github/CODEOWNERS Outdated Show resolved Hide resolved
.github/CODEOWNERS Show resolved Hide resolved
.github/CODEOWNERS Show resolved Hide resolved
@lmolkova lmolkova force-pushed the refactor-yaml-code-organization branch 2 times, most recently from 3310b74 to 12d3674 Compare September 16, 2024 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Skip Changelog Label to skip the changelog check tooling Regarding build, workflows, build-tools, ...
Projects
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants