Push rescuer down into `internal/maintenance` + `JobRow` to `rivertype` #36

brandur · 2023-11-18T19:36:16Z

Backstory: We've made some attempt to modularize River so that it's
entire implementation doesn't sit in one top-level package. We had some
success with this approach (see internal/), but after making some
progress we eventually hit a brick wall with most of the core types, and
even regressed as one of the queue maintainers (rescuer) had to live in
the top-level package so it could access worker-related information.

Modularity aside: I realized the other day that we also have another
problem on the horizon. We have a new driver system, but current drivers
like riverpgxv5 cheat by implementing a trivial wrapper around pgx.
Long term, these drives are going to have to look more like how
DBAdapter looks today. They'll have functions like JobInsert and
JobComplete specific to the particular driver implementation, and
those functions will need to take and return types representing things
like insert parameters and job return values. It might be possible to
use types from river for this, but we'd have to invert the dependency
requirement from river -> riverpgxv5 and change a lot of tests.
Being able to depend on a shared submodule (which rivertype could
become) would be a lot easier.

Approach

I wanted to try and fix that and pursue further modularization, but it
wasn't easy. I tried every trick in the book to cast types from
something internal to public-facing etc., but as implemented, it's
either not possible, or only possible with considerable downsides (see
bottom).

However, there is a change we can make that unlocks everything: JobRow
is the core type that needs to be shared amongst everything. If it could
live in a shared package, internal packages get the ability to reference
it and whole puzzle unjumbles into crystal clarity.

Previously, I'd convinced you that we could put core types into a
separate rivertype package, which we did, and then later undid after
finding a few techniques to share some internally referenced types.

Here, we bring that idea back to some degree, but only for one type:
JobRow (and okay, AttemptError too). This doesn't produce anywhere
near the same degree of required API adjustments because JobRow is
generally something that's returned from functions, and therefore
rivertype never needs to be referenced. It's also embedded on Job,
so its properties are available there again without importing
rivertype. Job stays in the top-level package so worker Work
signatures are not affected.

Signatures of the error handler and a few other types are affected, but
it's certainly the less common case.

With that move in place we go on to demonstrate that further
encapsulation becomes possible. We move the new work unit primitives
into a shared internal package, and then make use of them in the rescuer
so that it can decouple itself from the concept of a top-level worker.
With that done, we can push it into internal/maintenance where it was
supposed to be originally.

Another benefit: when introducing rivertest, we had to duplicate the
jobRowFromInternal function because of the same type problems
mentioned above. With JobRow not in a shared package, we can push this
helper into dbsqlc and have it shared amongst river and rivertest,
eliminating the necessity for this extra function.

And from here all the other pieces can fall too: job executor, producer,
etc. could all be encapsulated separately if we'd like them to be.

Alternatives considered

Struct conversion by removing `AttemptError`

Go allows struct to easily be converted from one to another so you could
potentially have river.JobRow(internalJobRow), but the limitation is
that you can't do it as soon as substructs (e.g. AttempError) are
introduced, even if the substructs are also identical to each other.

One thing that we could potentially do here is change AttemptError
back to a plain set of []bytes and add an unmarshaling helper like
UnmarshalAttemptErrors() []AttemptError. I think it'd work, but also
has downsides of its own:

Prevents us from having adding other substructs to the job row in the
future.
Caching unmarshaled attempt errors is difficult because if either
struct picked up an unexported member, it can no longer be converted
to the other.
The conversions and transform wrappers that'd need to happen to pass
back and forth with interfaces like ErrorHandler would be tricky and
add probably cruft.

I took a stab at implementing this in #37.

Use of unsafe

Another trick you can do similar to the above: if you're sure two
structs really are identical, you can convert one to the other with
unsafe.Pointer, even if they include substructs like AttemptError.

I made a prototype of this and it definitely works, so I really
considered this as a good possible way forward.

However, use of the unsafe package is frowned upon, and it's also
banned from some hosted providers like Google App Engine, which seems
bad for our use case.

brandur · 2023-11-18T19:39:07Z

rivertest/rivertest.go

-// there's no way to put a helper in a shared package that can produce one,
-// which is why we have this copy/pasta. There are some potential alternatives
-// to this, but none of them are great.
-func jobRowFromInternal(internal *dbsqlc.RiverJob) *river.JobRow {


Note we get to kill this duplicated convert function from rivertest which is nice.

brandur · 2023-11-18T19:40:06Z

example_error_handler_test.go

 )

 type CustomErrorHandler struct{}

-func (*CustomErrorHandler) HandleError(ctx context.Context, job *river.JobRow, err error) *river.ErrorHandlerResult {
+func (*CustomErrorHandler) HandleError(ctx context.Context, job *rivertype.JobRow, err error) *river.ErrorHandlerResult {


Despite the move, this is the only example that changes thanks to JobRow being embedded on Job.

I wanted to throw this one up as a possible alternative to a public facing `rivertype` module in #36, in which we demonstrate an alternate strategy using struct conversions. The problems it aims to solve are the same: * A way to access core job types from within subpackages, allowing more modularity (e.g. `Rescuer` could live with other maintenance services). * Long term, a way of giving drivers access to job primitives without having to reference the top-level `river`. The advantage of this approach is there's fewer user-facing changes, and the user never has to interact with an alternate package. It works by doing a struct conversion as necessary when moving from an internal package to one that bubbles up to `river`. Go allows pure struct conversions as long as their fields are the same, but there are some limitations: * Substructs like `AttemptError` can't be converted, even if identical. * Custom types like `JobState` can't be converted, even if identical. We work around these issues by: * Keeping errors as a `[][]byte` instead of it being unmarshaled by pgx. A helper called `ErrorsUnmarshaled()` is added to the top-level `JobRow` to give easy access to unmarshaled errors. * `State` is changed to a basic string akin to functions in `http`. This is technically a little less type-safe, but practically speaking would make very little difference because all these string-based types can be converted to and from each other with no checks anyway. Also, users aren't expected to access job state all that often. When they do, they can still use `job.State == river.JobStateAvailable` because the constants have been made available as strings. I think this approach does also work reasonably well, although I worry a little bit that there may be some sharp edge in the future that ends up limiting us in some unexpected way. Also, having to use helpers like `ErrorsUnmarshaled()` is definitely a little bit worse. That said, ~no user-facing API changes, and has some marginal performance benefits: * Converting structs between each other should be ~free. * Errors probably aren't used that much, so by not unmarshaling them most of the time, we save on a lot of unnecessary JSON parsing.

I wanted to throw this one up as a possible alternative to a public facing `rivertype` module in #36, in which we demonstrate an alternate strategy using struct conversions. The problems it aims to solve are the same: * A way to access core job types from within subpackages, allowing more modularity (e.g. `Rescuer` could live with other maintenance services). * Long term, a way of giving drivers access to job primitives without having to reference the top-level `river`. The advantage of this approach is there's fewer user-facing changes, and the user never has to interact with an alternate package. It works by doing a struct conversion as necessary when moving from an internal package to one that bubbles up to `river`. Go allows pure struct conversions as long as their fields are the same, but there are some limitations: * Substructs like `AttemptError` can't be converted, even if identical. * Custom types like `JobState` can't be converted, even if identical. We work around these issues by: * Keeping errors as a `[][]byte` instead of it being unmarshaled by pgx. A helper called `ErrorsUnmarshaled()` is added to the top-level `JobRow` to give easy access to unmarshaled errors. * `State` is changed to a basic string akin to functions in `http`. This is technically a little less type-safe, but practically speaking would make very little difference because all these string-based types can be converted to and from each other with no checks anyway. Also, users aren't expected to access job state all that often. When they do, they can still use `job.State == river.JobStateAvailable` because the constants have been made available as strings. I think this approach does also work reasonably well, although I worry a little bit that there may be some sharp edge in the future that ends up limiting us in some unexpected way. Also, having to use helpers like `ErrorsUnmarshaled()` is definitely a little bit worse. That said, ~no user-facing API changes, and has some marginal performance benefits: * Converting structs between each other should be ~free. * Errors probably aren't used that much, so by not unmarshaling them most of the time, we save on a lot of unnecessary JSON parsing. Unlike #36, I didn't push the rescuer down (just in case this is a total dead end), but this approach should make that possible.

bgentry

This is probably the right approach once we started down the road of micro packages, though it does make me a bit sad to have to interface with multiple river packages for some of the built-in features 😕 Anyway I think we should ship this.

bgentry · 2023-11-18T23:22:53Z

rivertype/job_row.go

+	// within any given one. (i.e. Different Go processes have different IDs,
+	// but IDs are shared within any given process.) A process generates a new
+	// ULID (an ordered UUID) worker ID when it starts up.
+	AttemptedBy []string


Post launch we are probably going to want to think more about the worker ID thing and maybe offer a way for users to set this per client to match an internal identifier they already have for the node.

Yeah +1. I was also thinking the other day that we might want to try and default to something a little more meaningful — even a hostname or something would be more human-friendly, but yeah, it should be configurable.

Backstory: We've made some attempt to modularize River so that it's entire implementation doesn't sit in one top-level package. We had some success with this approach (see `internal/`), but after making some progress we eventually hit a brick wall with most of the core types, and even regressed as one of the queue maintainers (rescuer) had to live in the top-level package so it could access worker-related information. Modularity aside: I realized the other day that we also have another problem on the horizon. We have a new driver system, but current drivers like `riverpgxv5` cheat by implementing a trivial wrapper around pgx. Long term, these drives are going to have to look more like how `DBAdapter` looks today. They'll have functions like `JobInsert` and `JobComplete` specific to the particular driver implementation, and those functions will need to take and return types representing things like insert parameters and job return values. It might be possible to use types from `river` for this, but we'd have to invert the dependency requirement from `river` -> `riverpgxv5` and change a lot of tests. Being able to depend on a shared submodule (which `rivertype` could become) would be a lot easier. ## Approach I wanted to try and fix that and pursue further modularization, but it wasn't easy. I tried every trick in the book to cast types from something internal to public-facing etc., but as implemented, it's either not possible, or only possible with considerable downsides (see bottom). However, there is a change we can make that unlocks everything: `JobRow` is the core type that needs to be shared amongst everything. If it could live in a shared package, internal packages get the ability to reference it and whole puzzle unjumbles into crystal clarity. Previously, I'd convinced you that we could put core types into a separate `rivertype` package, which we did, and then later undid after finding a few techniques to share some internally referenced types. Here, we bring that idea back to some degree, but only for one type: `JobRow` (and okay, `AttemptError` too). This doesn't produce anywhere near the same degree of required API adjustments because `JobRow` is generally something that's returned from functions, and therefore `rivertype` never needs to be referenced. It's also embedded on `Job`, so its properties are available there again without importing `rivertype`. `Job` stays in the top-level package so worker `Work` signatures are not affected. Signatures of the error handler and a few other types are affected, but it's certainly the less common case. With that move in place we go on to demonstrate that further encapsulation becomes possible. We move the new work unit primitives into a shared internal package, and then make use of them in the rescuer so that it can decouple itself from the concept of a top-level worker. With that done, we can push it into `internal/maintenance` where it was supposed to be originally. Another benefit: when introducing `rivertest`, we had to duplicate the `jobRowFromInternal` function because of the same type problems mentioned above. With `JobRow` not in a shared package, we can push this helper into `dbsqlc` and have it shared amongst `river` and `rivertest`, eliminating the necessity for this extra function. And from here all the other pieces can fall too: job executor, producer, etc. could all be encapsulated separately if we'd like them to be. ## Alternatives considered ### Struct conversion by removing `AttemptError` Go allows struct to easily be converted from one to another so you could potentially have `river.JobRow(internalJobRow)`, but the limitation is that you can't do it as soon as substructs (e.g. `AttempError`) are introduced, even if the substructs are also identical to each other. One thing that we could potentially do here is change `AttemptError` back to a plain set of `[]byte`s and add an unmarshaling helper like `UnmarshalAttemptErrors() []AttemptError`. I think it'd work, but also has downsides of its own: * Prevents us from having adding other substructs to the job row in the future. * Caching unmarshaled attempt errors is difficult because if either struct picked up an unexported member, it can no longer be converted to the other. * The conversions and transform wrappers that'd need to happen to pass back and forth with interfaces like `ErrorHandler` would be tricky and add probably cruft. ### Use of unsafe Another trick you can do similar to the above: if you're sure two structs really are identical, you can convert one to the other with `unsafe.Pointer`, even if they include substructs like `AttemptError`. I made a prototype of this and it definitely works, so I really considered this as a good possible way forward. However, use of the `unsafe` package is frowned upon, and it's also banned from some hosted providers like Google App Engine, which seems bad for our use case.

brandur · 2023-11-20T06:30:55Z

Thanks! Going to get this shipped for tomorrow.

brandur · 2023-11-20T06:31:14Z

(And added changelog.)

brandur force-pushed the brandur-push-down-job-row branch 2 times, most recently from 92a6bf2 to 190dba3 Compare November 18, 2023 19:38

brandur commented Nov 18, 2023

View reviewed changes

brandur mentioned this pull request Nov 18, 2023

Demonstrate internal job row type using struct conversion #37

Closed

brandur requested a review from bgentry November 18, 2023 20:40

bgentry approved these changes Nov 18, 2023

View reviewed changes

brandur force-pushed the brandur-push-down-job-row branch from 190dba3 to f531209 Compare November 20, 2023 06:08

brandur force-pushed the brandur-push-down-job-row branch from f531209 to 88d79e2 Compare November 20, 2023 06:18

brandur merged commit 62b5ae3 into master Nov 20, 2023
5 checks passed

brandur deleted the brandur-push-down-job-row branch November 20, 2023 06:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Push rescuer down into `internal/maintenance` + `JobRow` to `rivertype` #36

Push rescuer down into `internal/maintenance` + `JobRow` to `rivertype` #36

brandur commented Nov 18, 2023 •

edited

Loading

brandur Nov 18, 2023

brandur Nov 18, 2023

bgentry left a comment

bgentry Nov 18, 2023

brandur Nov 20, 2023

brandur commented Nov 20, 2023

brandur commented Nov 20, 2023

Push rescuer down into internal/maintenance + JobRow to rivertype #36

Push rescuer down into internal/maintenance + JobRow to rivertype #36

Conversation

brandur commented Nov 18, 2023 • edited Loading

Approach

Alternatives considered

Struct conversion by removing AttemptError

Use of unsafe

brandur Nov 18, 2023

Choose a reason for hiding this comment

brandur Nov 18, 2023

Choose a reason for hiding this comment

bgentry left a comment

Choose a reason for hiding this comment

bgentry Nov 18, 2023

Choose a reason for hiding this comment

brandur Nov 20, 2023

Choose a reason for hiding this comment

brandur commented Nov 20, 2023

brandur commented Nov 20, 2023

Push rescuer down into `internal/maintenance` + `JobRow` to `rivertype` #36

Push rescuer down into `internal/maintenance` + `JobRow` to `rivertype` #36

brandur commented Nov 18, 2023 •

edited

Loading

Struct conversion by removing `AttemptError`