Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eagerly initialize meta in connect.Error #585

Closed
wants to merge 4 commits into from

Conversation

pkwarren
Copy link
Contributor

Make the connect.Error type safer across goroutines by eagerly initializing meta (so calling Error.Meta() doesn't lazily initialize headers).

Make the connect.Error type safer across goroutines by eagerly
initializing meta (so calling Error.Meta() doesn't lazily initialize
headers).
@mattrobenolt
Copy link
Contributor

I don't think this is safe to do. I'm AFK for a bit, but I'll articulate when I'm back. Just want to make sure this isn't merged without my consideration.

@pkwarren
Copy link
Contributor Author

pkwarren commented Sep 12, 2023

I don't think this is safe to do. I'm AFK for a bit, but I'll articulate when I'm back. Just want to make sure this isn't merged without my consideration.

More context might be needed here - we had a few places where we were initializing commonly used errors as globals (i.e. var unauthorizedErr = connect.NewError(...)). These were getting flagged by the race detector even if no writes were made to metadata (simply calling Meta() in an interceptor was enough).

This still doesn't make concurrent writes to Meta() safe (that requires creating unique errors), but this does avoid the race on reads.

error.go Show resolved Hide resolved
@akshayjshah
Copy link
Member

@mattrobenolt We'll wait for your full thoughts :)

@mattrobenolt
Copy link
Contributor

mattrobenolt commented Sep 12, 2023

Is it expected for a caller or consumer to mutate the return of Meta()? That seems odd to me, but there are tests that reflect this happening. I'm not sure if this is a side effect of just tests, or if this is expected to be used this way in practice. I was under the impression that this should be somewhat immutable and not have an expectation that a caller can mutate/add new headers. It's a mutable structure, so we can't strictly prohibit it.

My initial gut reaction was the empty value connect.Error{} will potentially cause a panic when using .Meta() then. Even though we're not constructing an Error struct internally without going through NewError(), there's nothing that'd prohibit someone from doing so since Error is in the public API, especially since we are open to being interacted with errors.As(..) I'm not sure it's safe to assume 100% of the time, Error{} will go through the constructor.

Now, with that said, my original question is relevant since why is it not applicable to just return a nil map? All code that reads or interacts with the map will be fine, except mutating, which... isn't something we're documented or I'm sure we're expected to work. I would expect something like this to be a copy of the headers, and not the actual underlying headers from the request/response by reference.

One option to still be safe would be to not reassign back to e.meta, and just return an empty http.Header{}. But this goes back to expectations.

I personally think it's a bit silly to allocate in the constructor like this when the use case of these headers is very uncommon. (I wasn't even very aware this existed or why I'd want to use it.)

I'd personally prefer committing to make this not expect a mutation. And potentially even do a full copy within NewNotModifiedError (the only place I see this even being set).

I think treating this like Details() and expecting it to return the nil map is ideal, and if we need mutation, expose methods like SetMeta, AddMeta, etc, to explicitly do it, just like AddDetail.

Edit: yeah, looking a bit closer at how this is being used: https://github.com/connectrpc/connect-go/blob/main/error_not_modified_example_test.go#L49-L51

I would strongly not expect a caller to ever do something like:

err := connect.NewError(...)
err.Meta().Set("foo", "bar")

I get that you can, but I don't think I'd expect for this to work personally.

I'd maybe expect something like:

err.SetMeta(http.Header{"foo": []string{"bar"})

and set it myself. And anywhere internally that we consume .Meta() should be fine or can be made fine to accept the nil map since we're just merging these in. That'd make this more efficient on the hot path anyways, since this NotModifiedError, or Errors with metadata attached in general is going to be very uncommon relatively speaking.

@pkwarren
Copy link
Contributor Author

pkwarren commented Sep 12, 2023

The main concern with changing connect.Meta() to return a nil map is that it will technically be a breaking change (users may have code written already which mutates it to add additional headers). I do agree that having more APIs in place to make it easier to keep connect.Error immutable would be good - we considered that and settled on this to start to remove the race condition (which is likely to show up in other codebases as well - we already found it in 2 private repos).

You are right on Error{} however it isn't very useful in its default form. All of the struct fields (including items like the Code) aren't accessible, so any callers wanting to return a valid error need to use the constructor today.

@mattrobenolt
Copy link
Contributor

I'm not sure what constitutes an API change here, but both of these options, to me, would be considered an API change. Both options are breaking in undefined ways.

It doesn't seem to be documented or expected that you can mutate what's returned from Meta() to add additional fields. So to me, since there is no expectation set, any change here should be made explicit from an API surface and to go along with that, if we expect mutations to work, to document that use case and ensure we maintain that behavior.

Also, breaking connect.Error{} in a different way is not something that is documented either, and it's hard to predict exactly what people are doing, but technically today initializing an empty struct fully functions afaik. So similarly, I consider this undefined behavior since we don't explicitly say to not use it without going through NewError.

So breaking is a bit subjective here, and we have to pick one to break. This is all very very minor in the scheme of things, and I think a minor change in behavior towards our expectations along with documenting that change is likely fine.

Arguably, if you're doing either of these things, it's behaving or working in a way that isn't explicitly intended to do so today, so relying on that is a bit sketchy.

But ultimately, I'd favor saving the allocation on every Error object, and change the behavior to return a nil map, along with documenting that the return from Meta() is not intended to be mutated.

This path entirely exists on the extreme edge case of intending to return a 304 Not Modified. We create lots of errors in our typical hot path for many reasons. I doubt an additional allocation of an empty map is going to kill us, but it's definitely not needed for a vast majority of cases.

Another option might be to introduce another Error struct, one with meta and one without meta, but this sorta would also be a different API breaking change.

Don't let any of this block your decisions.

@pkwarren
Copy link
Contributor Author

Gonna go ahead and close this PR - thanks @mattrobenolt for the feedback. I agree with the majority of what you've said but I was worried more two separate race detector issues in this area that it felt like this would be a better solve than the current behavior.

Breaking connect.Error{} in a different way is not something that is documented either

One note - this isn't expected to work and any code using the struct initializer is technically broken today. The error code will be set to the zero value, which unlike the gRPC protocol doesn't equate with success. There is no connect.CodeOK.

That being said, we can re-introduce the behavior of lazily initializing the meta struct key for this case while still fixing the NewError constructor if needed.

This path entirely exists on the extreme edge case of intending to return a 304 Not Modified.

This is not completely true - there are a few places that are using Meta() today and expect support for writing headers. It might grow over time as other interceptors are written that require more capabilities. I think this requires more thought into how to safely evolve the API to support this.

@pkwarren pkwarren closed this Sep 13, 2023
@pkwarren pkwarren deleted the pkw/eager-init-meta branch September 13, 2023 23:11
@mattrobenolt
Copy link
Contributor

I was worried more two separate race detector issues in this area that it felt like this would be a better solve than the current behavior.

Yeah, I agree that this is a race condition and without a lock, we need to change some behavior here. I think it's also fair to define and say "this is not thread safe". The entire http.Header object isn't thread safe either. So it's probably fair also to just explicitly say this isn't thread safe.

One note - this isn't expected to work and any code using the struct initializer is technically broken today. The error code will be set to the zero value, which unlike the gRPC protocol doesn't equate with success.

This is correct, but I didn't spend the time investigating how this can be broken to make it work. Users do weird things, was my point. I don't actually know what an Error{code: 0} would do and I don't think it's worth investigating. Was just pointing it out as a contrary point. API surfaces whether they're intentional or not, become a part of the API surface. Changes, even strictly bug fixes, tend to "break" when people rely on something you didn't anticipate or expect.

This is not completely true - there are a few places that are using Meta() today and expect support for writing headers. It might grow over time as other interceptors are written that require more capabilities. I think this requires more thought into how to safely evolve the API to support this.

That's fair, and that's context you have that I don't. I would still agree and I think we're on the same page, introducing more explicit mutation APIs to do this similar to AddDetail might be better.

@jhump
Copy link
Member

jhump commented Sep 14, 2023

It doesn't seem to be documented or expected that you can mutate what's returned from Meta() to add additional fields.

@mattrobenolt, that's actually the only way to set headers from a handler with a unary response (so unary and client-stream RPCs) when there's an error. You create an error, call Meta(), add headers, and then return the error. That's actually why it currently allocates a map in the accessor, instead of returning nil. When one of these handlers returns an error, any *connect.Response that is returned (usually nil) is ignored, which is why the metadata must be put into the error instead.

both of these options, to me, would be considered an API change.

A change that is definitely backwards-compatible, even if the user did use zero values of connect.Error would be to initialize the map in the constructor and also leave it being lazily initialized in the accessor. That way there is no user-visible semantic change, regardless of whether the user used the factory function or not. It will also fix the race detector issue, because the map will already be initialized, avoiding the racy write in Meta(). (Though I must admit that code not using the factory function seems exceedingly unlikely since, as mentioned before, the zero value for this type is not useful.)

@jhump
Copy link
Member

jhump commented Sep 14, 2023

introducing more explicit mutation APIs to do this similar to AddDetail might be better.

Unfortunately, that is definitely a breaking change if you also want Meta() to return nil if no metadata was ever added. Like mentioned in previous comment, calling Meta() and mutating the result is the only way to set headers from a unary or client-streaming RPC handler and also return an error. And the accessor does the lazy allocation there specifically to support this case. So changing that is definitely not backwards compatible and, IMO, though unfortunate/annoying, is not worth a breaking change/new major version.

@mattrobenolt
Copy link
Contributor

Yeah, I get it. This is trying to mirror how an http.ResponseWriter works with Header().

I don't have an issue with that, I just also don't have an issue with this simply not being thread safe. http.ResponseWriter.Header() doesn't seem to suggest it's safe for concurrent use, and in fact, many of the implementations I'm looking at are not and do similar pattern as you're doing here.

In http2: https://cs.opensource.google/go/go/+/master:src/net/http/h2_bundle.go;l=6631-6640;drc=399b2a4b1b7857444c38305025cc793c9377e415;bpv=1;bpt=1

So while I'm arguing API changes and whatnot, I am just adding considerations.

I don't personally think anything needs to change and we can just document this isn't thread safe.

I still think there's a worthwhile consideration of APIs such as NewErrorWithMeta(...) as a proper way to add response headers, but I'm clearly missing context on why the current API is essential and I've never had a use case for it, not to say there isn't any, I'm just unaware.

I don't like the forced allocation, but to be fair, since internally we call .Meta() anyways, if we don't choose to return nil here, we're likely allocating an empty map always just to merge the headers, so ultimately I don't think it matters too much unless we can avoid the allocation entirely.

@jhump
Copy link
Member

jhump commented Sep 14, 2023

I don't have an issue with that, I just also don't have an issue with this simply not being thread safe. http.ResponseWriter.Header() doesn't seem to suggest it's safe for concurrent use, and in fact, many of the implementations I'm looking at are not and do similar pattern as you're doing here.

I think the main difference is that, intuitively, *connect.Error is essentially a value; http.ResponseWriter, on the other hand is not, and instead abstracts over behavior. Documenting that it's not thread-safe does not imply that multiple threads can't even safely use an accessor function. Usually with value types that are not thread-safe, it's mutations that are not safe, and if a value is treated as read-only, it is fine. But that is not the case when the accessor silently performs a mutation.

So, even if documented as not-thread-safe, this still feels like surprising behavior and a possible foot-gun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants