Push rescuer down into internal/maintenance + JobRow to rivertype #36
Conversation
Force-pushed from 92a6bf2 to 190dba3
```go
// there's no way to put a helper in a shared package that can produce one,
// which is why we have this copy/pasta. There are some potential alternatives
// to this, but none of them are great.
func jobRowFromInternal(internal *dbsqlc.RiverJob) *river.JobRow {
```
Note we get to kill this duplicated convert function from rivertest, which is nice.
```diff
 )

 type CustomErrorHandler struct{}

-func (*CustomErrorHandler) HandleError(ctx context.Context, job *river.JobRow, err error) *river.ErrorHandlerResult {
+func (*CustomErrorHandler) HandleError(ctx context.Context, job *rivertype.JobRow, err error) *river.ErrorHandlerResult {
```
Despite the move, this is the only example that changes, thanks to `JobRow` being embedded on `Job`.
I wanted to throw this one up as a possible alternative to a public-facing `rivertype` module in #36, demonstrating an alternate strategy using struct conversions. The problems it aims to solve are the same:

* A way to access core job types from within subpackages, allowing more modularity (e.g. `Rescuer` could live with other maintenance services).
* Long term, a way of giving drivers access to job primitives without having to reference the top-level `river`.

The advantage of this approach is that there are fewer user-facing changes, and the user never has to interact with an alternate package. It works by doing a struct conversion as necessary when moving from an internal package to one that bubbles up to `river`. Go allows pure struct conversions as long as the fields are the same, but there are some limitations:

* Substructs like `AttemptError` can't be converted, even if identical.
* Custom types like `JobState` can't be converted, even if identical.

We work around these issues by:

* Keeping errors as a `[][]byte` instead of having them unmarshaled by pgx. A helper called `ErrorsUnmarshaled()` is added to the top-level `JobRow` to give easy access to unmarshaled errors.
* Changing `State` to a basic string, akin to functions in `http`. This is technically a little less type-safe, but practically speaking it would make very little difference because all these string-based types can be converted to and from each other with no checks anyway. Also, users aren't expected to access job state all that often. When they do, they can still use `job.State == river.JobStateAvailable` because the constants have been made available as strings.

I think this approach also works reasonably well, although I worry a little that there may be some sharp edge in the future that ends up limiting us in some unexpected way. Also, having to use helpers like `ErrorsUnmarshaled()` is definitely a little bit worse.

That said, there are ~no user-facing API changes, and it has some marginal performance benefits:

* Converting structs between each other should be ~free.
* Errors probably aren't used that much, so by not unmarshaling them most of the time, we save a lot of unnecessary JSON parsing.

Unlike #36, I didn't push the rescuer down (just in case this is a total dead end), but this approach should make that possible.
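A minimal runnable sketch of the conversion strategy described above. The types here are illustrative stand-ins, not River's real definitions, and the exact shape of `ErrorsUnmarshaled()` is an assumption:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-in for River's AttemptError.
type AttemptError struct {
	Error string `json:"error"`
}

// Internal and public rows kept structurally identical: errors stay as raw
// JSON ([][]byte) and state is a plain string, so a direct conversion works.
type internalJobRow struct {
	ID     int64
	Errors [][]byte
	State  string
}

type JobRow struct {
	ID     int64
	Errors [][]byte
	State  string
}

// Sketch of the ErrorsUnmarshaled() helper: errors are only JSON-parsed on
// demand, which keeps the structs convertible and skips the parse cost in
// the common case where errors aren't inspected.
func (r *JobRow) ErrorsUnmarshaled() ([]AttemptError, error) {
	out := make([]AttemptError, len(r.Errors))
	for i, raw := range r.Errors {
		if err := json.Unmarshal(raw, &out[i]); err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	in := internalJobRow{
		ID:     1,
		Errors: [][]byte{[]byte(`{"error": "boom"}`)},
		State:  "available",
	}

	// Legal because the field names and types match exactly. Adding a
	// substruct field (e.g. Errors []AttemptError) or a custom type
	// (State JobState) to either side would make this a compile error.
	row := JobRow(in)

	errs, _ := row.ErrorsUnmarshaled()
	fmt.Println(row.State, errs[0].Error) // available boom
}
```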
This is probably the right approach now that we've started down the road of micro-packages, though it does make me a bit sad to have to interface with multiple river packages for some of the built-in features 😕 Anyway, I think we should ship this.
```go
	// within any given one. (i.e. Different Go processes have different IDs,
	// but IDs are shared within any given process.) A process generates a new
	// ULID (an ordered UUID) worker ID when it starts up.
	AttemptedBy []string
```
Post launch we are probably going to want to think more about the worker ID thing and maybe offer a way for users to set this per client to match an internal identifier they already have for the node.
Yeah +1. I was also thinking the other day that we might want to try and default to something a little more meaningful — even a hostname or something would be more human-friendly, but yeah, it should be configurable.
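As a rough sketch of what a configurable, more human-friendly default could look like — note the `Config.WorkerID` field and `workerID` helper here are hypothetical illustrations, not part of River's actual API:

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical client config option for the idea discussed above: letting
// users set a worker ID per client to match an identifier they already have
// for the node.
type Config struct {
	// WorkerID identifies this client in a job's AttemptedBy list. Empty
	// means "use the default".
	WorkerID string
}

// workerID prefers a user-supplied identifier, then falls back to the
// hostname as a more human-friendly default than a random ULID.
func workerID(cfg *Config) string {
	if cfg != nil && cfg.WorkerID != "" {
		return cfg.WorkerID
	}
	if host, err := os.Hostname(); err == nil && host != "" {
		return host
	}
	return "worker-unknown" // last-resort placeholder
}

func main() {
	fmt.Println(workerID(&Config{WorkerID: "node-7"})) // node-7
	fmt.Println(workerID(nil))                         // this machine's hostname
}
```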
Force-pushed from 190dba3 to f531209
Backstory: We've made some attempt to modularize River so that its entire implementation doesn't sit in one top-level package. We had some success with this approach (see `internal/`), but after making some progress we eventually hit a brick wall with most of the core types, and even regressed as one of the queue maintainers (the rescuer) had to live in the top-level package so it could access worker-related information.

Modularity aside: I realized the other day that we also have another problem on the horizon. We have a new driver system, but current drivers like `riverpgxv5` cheat by implementing a trivial wrapper around pgx. Long term, these drivers are going to have to look more like how `DBAdapter` looks today. They'll have functions like `JobInsert` and `JobComplete` specific to the particular driver implementation, and those functions will need to take and return types representing things like insert parameters and job return values. It might be possible to use types from `river` for this, but we'd have to invert the dependency requirement from `river` -> `riverpgxv5` and change a lot of tests. Being able to depend on a shared submodule (which `rivertype` could become) would be a lot easier.

## Approach

I wanted to try to fix that and pursue further modularization, but it wasn't easy. I tried every trick in the book to cast types from something internal to public-facing, etc., but as implemented, it's either not possible, or only possible with considerable downsides (see bottom).

However, there is a change we can make that unlocks everything: `JobRow` is the core type that needs to be shared amongst everything. If it could live in a shared package, internal packages get the ability to reference it and the whole puzzle unjumbles into crystal clarity.

Previously, I'd convinced you that we could put core types into a separate `rivertype` package, which we did, and then later undid after finding a few techniques to share some internally referenced types.

Here, we bring that idea back to some degree, but only for one type: `JobRow` (and okay, `AttemptError` too). This doesn't produce anywhere near the same degree of required API adjustments because `JobRow` is generally something that's returned from functions, and therefore `rivertype` never needs to be referenced. It's also embedded on `Job`, so its properties are available there again without importing `rivertype`. `Job` stays in the top-level package so worker `Work` signatures are not affected.

Signatures of the error handler and a few other types are affected, but it's certainly the less common case.

With that move in place we go on to demonstrate that further encapsulation becomes possible. We move the new work unit primitives into a shared internal package, and then make use of them in the rescuer so that it can decouple itself from the concept of a top-level worker. With that done, we can push it into `internal/maintenance` where it was supposed to be originally.

Another benefit: when introducing `rivertest`, we had to duplicate the `jobRowFromInternal` function because of the same type problems mentioned above. With `JobRow` now in a shared package, we can push this helper into `dbsqlc` and have it shared amongst `river` and `rivertest`, eliminating the necessity for this extra function.

And from here all the other pieces can fall too: job executor, producer, etc. could all be encapsulated separately if we'd like them to be.

## Alternatives considered

### Struct conversion by removing `AttemptError`

Go allows structs to easily be converted from one to another, so you could potentially have `river.JobRow(internalJobRow)`, but the limitation is that you can't do it as soon as substructs (e.g. `AttemptError`) are introduced, even if the substructs are also identical to each other.

One thing that we could potentially do here is change `AttemptError` back to a plain set of `[]byte`s and add an unmarshaling helper like `UnmarshalAttemptErrors() []AttemptError`. I think it'd work, but it also has downsides of its own:

* Prevents us from adding other substructs to the job row in the future.
* Caching unmarshaled attempt errors is difficult because if either struct picked up an unexported member, it could no longer be converted to the other.
* The conversions and transform wrappers that'd need to happen to pass back and forth with interfaces like `ErrorHandler` would be tricky and probably add cruft.

I took a stab at implementing this in #37.

### Use of unsafe

Another trick you can do similar to the above: if you're sure two structs really are identical, you can convert one to the other with `unsafe.Pointer`, even if they include substructs like `AttemptError`. I made a prototype of this and it definitely works, so I really considered this as a good possible way forward.

However, use of the `unsafe` package is frowned upon, and it's also banned by some hosted providers like Google App Engine, which seems bad for our use case.
Force-pushed from f531209 to 88d79e2
Thanks! Going to get this shipped for tomorrow.
(And added changelog.)