Guidance about what is properly part of a resource #56

Open
tomrogers42 opened this issue Dec 11, 2014 · 7 comments

Comments

@tomrogers42

I perennially run into difficulty trying to draw a line around which properties it's reasonable for a resource to return. I'd love it if you could address that in your docs, or talk about how you settled these issues when building your UI on top of the REST API.

Consider a system with users and orgs. I'd assert that it's reasonable for the UI to present a table of all the orgs, with some vital stats about each. If the table must show the user count for each org, there are two obvious solutions for acquiring the user count:

  • add userCount to the org resource
curl https://service.com/orgs

[
  {
    "id": "01234567-89ab-cdef-0123-456789abcdef",
    "name": "SPECTRE",
    "userCount": 21
  },
  // ... additional org descriptors
]
  • force clients to fetch the list of users for each org and count them
# repeat for each record returned by the org resource
curl https://service.com/orgs/{org_id}/users

[
  {
    "id": "7e876304-bcd9-4ce6-9119-126be52f4486",
    "username": "eblofeld",
    "created_at": "1908-05-25T12:00:00Z",
    "updated_at": "1981-06-24T13:00:00Z"
  },
  // ... plus 20 more user descriptors
]

Approach 1 can lead to death by a thousand cuts: the server code backing each resource grows with every new property the UI requires, increasing CPU cost in exchange for less API traffic. And under your versioning strategy, the only way to withdraw support for any such property is to cut a new version (while maintaining the old one).

Approach 2 has both client and server doing lots of work that neither is really interested in; in this case, the content of each user descriptor is discarded, because the only data needed is the length of the array. For collections with many members, or whose members have many properties, payloads get heavier too. It also quickly produces a huge number of requests: 1 + n * c, where n is the number of rows returned and c is the number of "vital statistics" that aren't part of the base response.
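To make the 1 + n * c arithmetic concrete, here is a minimal Python sketch. Everything in it is hypothetical: an in-memory FAKE_API dict stands in for https://service.com, and CountingClient only counts simulated requests rather than doing real HTTP. With n = 3 orgs and c = 1 statistic ("users"), the fan-out approach costs 1 + 3 * 1 = 4 requests, against a single request when userCount is embedded.

```python
# Sketch: how many requests each approach costs a client.
# Hypothetical: FAKE_API stands in for https://service.com; no real HTTP is done.
from typing import Any, Dict, List

FAKE_API: Dict[str, Any] = {
    "/orgs": [
        {"id": "org-1", "name": "SPECTRE", "userCount": 21},
        {"id": "org-2", "name": "THRUSH", "userCount": 2},
        {"id": "org-3", "name": "KAOS", "userCount": 0},
    ],
}
# Give each org a users collection whose length matches its userCount.
for org in FAKE_API["/orgs"]:
    FAKE_API[f"/orgs/{org['id']}/users"] = [{"id": f"user-{i}"} for i in range(org["userCount"])]


class CountingClient:
    """Serves paths from FAKE_API and counts every simulated request."""

    def __init__(self) -> None:
        self.requests = 0

    def get(self, path: str) -> Any:
        self.requests += 1
        return FAKE_API[path]


def table_embedded(client: CountingClient) -> Dict[str, int]:
    """Approach 1: userCount is part of each org, so one request suffices."""
    return {o["name"]: o["userCount"] for o in client.get("/orgs")}


def table_fan_out(client: CountingClient, stats: List[str]) -> Dict[str, Dict[str, int]]:
    """Approach 2: ignore userCount and fetch every collection just to measure it."""
    table: Dict[str, Dict[str, int]] = {}
    for o in client.get("/orgs"):  # 1 request for the list
        # n orgs * c statistics more requests; each payload is discarded after len().
        table[o["name"]] = {s: len(client.get(f"/orgs/{o['id']}/{s}")) for s in stats}
    return table


c1, c2 = CountingClient(), CountingClient()
print(table_embedded(c1), c1.requests)
print(table_fan_out(c2, ["users"]), c2.requests)
```

Adding a second statistic (c = 2) would raise the fan-out cost to 1 + 3 * 2 = 7 requests, while the embedded version stays at 1, at the price of a growing org resource.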

I'm reluctant to draw the line based on the underlying database schema, since that feels like an implementation detail that ought to be abstracted at or by this layer.

Your comments are very much appreciated.

@geemus
Member

geemus commented Dec 12, 2014

Great question, and as you hint, not one that lends itself easily to a universal answer. So, in short, "it depends". I'll add some brief thoughts though.

First off, I think it is generally best to start with a really generic version of things, i.e., in this case, orgs and users as distinct resources (so you can pull back the list and count it yourself). This is not what either the UI or the server wants to be doing, per se, but it supports both, and most anything else a different UI might want. I think having that foundation is really valuable, especially since UI needs often change quickly, so it can be dangerous to chase them too closely.

That said, when patterns have settled down, I think it can be quite valuable to add things to help in this case. In your example you mention counts as being an important thing. I suspect I would add that value in a more accessible way, albeit one a bit different from yours.

Note that I haven't followed this approach in practice, so there may be hidden gotchas, but my inclination would be toward something like:

curl https://service.com/orgs/{org_id}/stats/{resource_id}

Initially it could support only the users resource_id, and the serialization might include only count, but it seems like a generic-enough pattern to accommodate many similar features as needed, while avoiding changes to existing production endpoints.
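A minimal sketch of how such a stats endpoint might be wired up on the server, in Python, assuming a hypothetical in-memory USERS_BY_ORG store and a hand-rolled regex router (no particular web framework). The point it illustrates is the extension mechanism: each supported resource_id is one entry in the STATS table, so adding a statistic does not touch the existing /orgs or /orgs/{org_id}/users endpoints.

```python
# Hypothetical server-side sketch of GET /orgs/{org_id}/stats/{resource_id}.
import json
import re
from typing import Callable, Dict, List, Tuple

ORG_ID = "01234567-89ab-cdef-0123-456789abcdef"

# Hypothetical storage: org id -> that org's user records.
USERS_BY_ORG: Dict[str, List[dict]] = {
    ORG_ID: [{"username": f"user{i}"} for i in range(21)],
}

# Each supported stats resource maps to a function computing its figures.
# Starts with only "users"; adding an entry later leaves other endpoints alone.
STATS: Dict[str, Callable[[str], dict]] = {
    "users": lambda org_id: {"count": len(USERS_BY_ORG[org_id])},
}

ROUTE = re.compile(r"^/orgs/(?P<org_id>[^/]+)/stats/(?P<resource_id>[^/]+)$")


def handle_get(path: str) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for a GET on a stats path."""
    match = ROUTE.match(path)
    if match is None or match["resource_id"] not in STATS:
        return 404, json.dumps({"id": "not_found", "message": "Unknown stats resource."})
    if match["org_id"] not in USERS_BY_ORG:
        return 404, json.dumps({"id": "not_found", "message": "Unknown org."})
    return 200, json.dumps(STATS[match["resource_id"]](match["org_id"]))


print(handle_get(f"/orgs/{ORG_ID}/stats/users"))
```

A request for an unsupported resource_id (say, apps) simply returns 404 until someone adds an entry for it.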

Hope that sheds some light; I'd certainly be happy to discuss more (I realize my response is pretty hyper-focused on the particulars of this example, after all). When you control both a client and an API, I think there will always be a temptation to mesh them tightly, but the threat of overfitting is very real and problematic (especially if you have, or hope to have, additional clients or usage outside your control).

Thanks!

@neonstalwart
Contributor

I'm faced with the same problem.

I've seen a very elegant approach for determining counts: request a range of 0 (or 1) items and inspect the response to determine the count.

EDIT: for context, my pagination scheme is based on https://github.com/persvr/pintura#pagingrange-requests

request:

Range: items=0-0

response:

Content-Range: items 0-0/80

Based on the Content-Range header in this response, I can see that 80 items match my request.
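The exchange above can be sketched in Python as two small functions: one server-side, producing a Content-Range header for a Range: items=start-end request, and one client-side, reading the total back out. This illustrates the header shapes shown in this comment, not Pintura's actual implementation; in particular, the "items */0" form for an empty page is an assumption of this sketch.

```python
# Sketch of the Range / Content-Range exchange, using the "items" unit shown above.
import re
from typing import Dict, List, Optional, Tuple

RANGE = re.compile(r"^items=(\d+)-(\d+)$")                   # request:  Range: items=0-0
CONTENT_RANGE = re.compile(r"^items (?:\d+-\d+|\*)/(\d+)$")  # response: Content-Range: items 0-0/80


def paged_response(items: List[int], range_header: Optional[str]) -> Tuple[int, Dict[str, str], List[int]]:
    """Server side: return (status, headers, page) for an optional Range header."""
    match = RANGE.match(range_header or "")
    if match is None:
        return 200, {}, items  # no usable Range header: send the whole collection
    start, end = int(match[1]), int(match[2])  # bounds are inclusive
    page = items[start:end + 1]
    # Assumption: an empty page still reports the total, as "items */<total>".
    span = f"{start}-{start + len(page) - 1}" if page else "*"
    status = 206 if page else 200
    return status, {"Content-Range": f"items {span}/{len(items)}"}, page


def total_from_content_range(header: str) -> Optional[int]:
    """Client side: read the total after the slash, or None if unparseable."""
    match = CONTENT_RANGE.match(header)
    return int(match[1]) if match else None


status, headers, page = paged_response(list(range(80)), "items=0-0")
print(status, headers["Content-Range"], total_from_content_range(headers["Content-Range"]))
```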

Then, to reduce the number of HTTP requests from the client, provide a server endpoint that handles batched requests, something along the lines of https://github.com/persvr/pintura#bulk-updates-and-comet. To implement this on the client, I use a utility for making requests to the server; it lets me collate the requests and then make one request to the server at the end of an event turn.
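The client-side collation can be sketched with Python's asyncio: the first request in a turn schedules a flush with loop.call_soon, every request made before that flush joins the same batch, and each caller gets a future resolved from the bulk response. The bulk payload format (a list of {"method", "path"} dicts) and send_bulk are hypothetical stand-ins; the real wire format would be whatever the server's bulk endpoint expects.

```python
# Sketch: collate requests made during one event-loop turn into one bulk request.
# Hypothetical: send_bulk and its payload shape stand in for a real bulk endpoint.
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Set, Tuple

BulkSender = Callable[[List[Dict[str, str]]], Awaitable[List[Any]]]
Pending = Tuple[Dict[str, str], "asyncio.Future[Any]"]


class Batcher:
    """Queues requests and sends everything queued in one turn as a single call."""

    def __init__(self, send_bulk: BulkSender) -> None:
        self._send_bulk = send_bulk
        self._pending: List[Pending] = []
        self._tasks: Set["asyncio.Task[None]"] = set()  # keep running dispatches referenced

    def request(self, method: str, path: str) -> "asyncio.Future[Any]":
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        if not self._pending:
            # First request this turn: flush after the currently ready callbacks run.
            loop.call_soon(self._flush)
        self._pending.append(({"method": method, "path": path}, future))
        return future

    def _flush(self) -> None:
        batch, self._pending = self._pending, []
        task = asyncio.get_running_loop().create_task(self._dispatch(batch))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _dispatch(self, batch: List[Pending]) -> None:
        try:
            results = await self._send_bulk([req for req, _ in batch])
            if len(results) != len(batch):
                raise RuntimeError("bulk response does not match request count")
        except Exception as exc:  # one failed bulk call fails every request in it
            for _, future in batch:
                if not future.done():
                    future.set_exception(exc)
            return
        for (_, future), result in zip(batch, results):
            if not future.done():
                future.set_result(result)


async def demo() -> Tuple[List[List[Dict[str, str]]], List[Any]]:
    sent: List[List[Dict[str, str]]] = []

    async def fake_bulk(requests: List[Dict[str, str]]) -> List[Any]:
        sent.append(requests)  # one entry per bulk HTTP call
        return [{"path": r["path"], "status": 200} for r in requests]

    batcher = Batcher(fake_bulk)
    results = await asyncio.gather(
        batcher.request("GET", "/orgs/a/users"),
        batcher.request("GET", "/orgs/b/users"),
    )
    return sent, results


sent, results = asyncio.run(demo())
print(len(sent), results)
```

If the bulk call fails, or returns a different number of results than it was sent, every request in that batch fails together; whether that is acceptable depends on how independent the batched requests are.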

Maybe this won't suit everyone, but I think it's a reasonable solution to this problem and may give some ideas about how to approach it in a way you're comfortable with.

@geemus
Member

geemus commented Dec 12, 2014

@neonstalwart nice. Yeah, for counts in particular that definitely seems like an elegant option.

@camcam5313

CH counts?

@geemus
Member

geemus commented Dec 15, 2014

@camcam5313 CH? Not sure I follow, could you elaborate? Thanks.

@tomrogers42
Author

Thanks, all. I think the particulars of my example are obscuring the larger question, which is simply: what rules or criteria do you use to determine which properties are included in a resource? Given that any resource satisfies an infinite number of predicates, which ones should the server include in responses?

I'm currently trying to attack this from opposite ends, in the hopes that they'll meet in the middle. First, how do you establish a baseline set of properties? Do you just recapitulate your storage schema? And second, are there any rules-of-thumb for readily determining that a given datapoint should not be exposed on that resource?

To give you an example of what I think I'm looking for, here's something I'm considering:

  • any property that can be modified via a resource ought to be readable via that same resource

I'm trying to articulate similar guidelines that might be used to sort UI-driven changes between base resources and other API paths.
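Tom's proposed rule can be made mechanical. A minimal Python sketch, assuming a hypothetical ResourceSpec that lists each resource's readable and writable property names (not a real library type):

```python
# Sketch: the "modifiable implies readable" rule expressed as a check.
# Hypothetical: ResourceSpec and the org fields are illustrative only.
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ResourceSpec:
    name: str
    readable: FrozenSet[str]  # properties returned by GET
    writable: FrozenSet[str]  # properties accepted by POST/PATCH


def unreadable_writes(spec: ResourceSpec) -> Set[str]:
    """Properties a client can set through the resource but never read back from it."""
    return set(spec.writable - spec.readable)


org = ResourceSpec(
    name="org",
    readable=frozenset({"id", "name", "created_at", "updated_at"}),
    writable=frozenset({"name", "billing_email"}),
)
print(unreadable_writes(org))  # billing_email violates the rule
```

The check is deliberately one-directional: read-only properties such as id and created_at are fine; only properties that can be written but never read back are flagged.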

@geemus
Member

geemus commented Dec 17, 2014

@tomrogers42 yeah, as always, it can be tricky to generalize effectively.

I think it can be good to start with some foundation/baseline. For instance, we just asserted that everything should have, at minimum, a UUID and created_at and updated_at timestamps. Occasionally not all are needed (things that can be created but never changed, for instance, don't REALLY need updated_at). That said, always having them doesn't really hurt anything, and the consistency has some value of its own.
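A minimal Python sketch of that baseline, assuming hypothetical BaseResource and Org classes: every resource inherits a UUID plus created_at/updated_at timestamps, serialized in the same UTC format as the example user earlier in the thread.

```python
# Sketch: a shared base carrying the baseline fields every resource gets.
# Hypothetical class names; the timestamp format mirrors "1981-06-24T13:00:00Z".
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def utc_now() -> datetime:
    # Whole seconds, matching the second-resolution wire format.
    return datetime.now(timezone.utc).replace(microsecond=0)


@dataclass
class BaseResource:
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: datetime = field(default_factory=utc_now)
    updated_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # A brand-new record has not been updated yet, so both timestamps start equal.
        if self.updated_at is None:
            self.updated_at = self.created_at

    def touch(self) -> None:
        """Call on every modification, even for resources that rarely change."""
        self.updated_at = utc_now()

    def base_json(self) -> Dict[str, Any]:
        updated = self.updated_at or self.created_at
        return {
            "id": str(self.id),
            "created_at": self.created_at.strftime(TIME_FORMAT),
            "updated_at": updated.strftime(TIME_FORMAT),
        }


@dataclass
class Org(BaseResource):
    name: str = ""

    def to_json(self) -> Dict[str, Any]:
        return {**self.base_json(), "name": self.name}


print(Org(name="SPECTRE").to_json())
```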

I think "anything writable should be readable" is a great additional guideline. I think a lot about input/output parity when I'm working on these sorts of things. So in addition to saying that they should be similar, I try (where it makes sense) to make the JSON format of the data similar too. The more similar they are, the fewer new or surprising things you have to figure out along the way.
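One way to get that input/output parity is to drive both directions from a single field table, so the names and formats a client reads are exactly the ones it writes. A minimal Python sketch; the user fields (including expires_at) and the Field helper are hypothetical:

```python
# Sketch: one field table drives both output (GET) and input (PATCH).
# Hypothetical: the user fields, including expires_at, are illustrative.
from datetime import datetime, timezone
from typing import Any, Callable, Dict, NamedTuple

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def dump_time(t: datetime) -> str:
    return t.astimezone(timezone.utc).strftime(TIME_FORMAT)


def load_time(s: str) -> datetime:
    return datetime.strptime(s, TIME_FORMAT).replace(tzinfo=timezone.utc)


class Field(NamedTuple):
    dump: Callable[[Any], Any]  # internal value -> JSON value (output)
    load: Callable[[Any], Any]  # JSON value -> internal value (input)
    writable: bool


USER_FIELDS: Dict[str, Field] = {
    "id": Field(str, str, writable=False),
    "username": Field(str, str, writable=True),
    "expires_at": Field(dump_time, load_time, writable=True),
    "created_at": Field(dump_time, load_time, writable=False),
}


def render(user: Dict[str, Any]) -> Dict[str, Any]:
    """Output: internal record -> JSON-ready dict."""
    return {name: f.dump(user[name]) for name, f in USER_FIELDS.items()}


def parse_update(body: Dict[str, Any]) -> Dict[str, Any]:
    """Input: JSON body -> internal values, using render()'s names and formats."""
    rejected = sorted(k for k in body if k not in USER_FIELDS or not USER_FIELDS[k].writable)
    if rejected:
        raise ValueError(f"not writable: {rejected}")
    return {k: USER_FIELDS[k].load(v) for k, v in body.items()}


user = {
    "id": "7e876304-bcd9-4ce6-9119-126be52f4486",
    "username": "eblofeld",
    "expires_at": datetime(2015, 1, 1, 12, tzinfo=timezone.utc),
    "created_at": datetime(2014, 12, 11, 9, 30, tzinfo=timezone.utc),
}
shown = render(user)
# A client can send back the writable part of what it read, in the same format.
update = parse_update({k: shown[k] for k in ("username", "expires_at")})
print(update == {k: user[k] for k in ("username", "expires_at")})
```

Because render() and parse_update() share USER_FIELDS, a client can GET a user, change a writable field, and send that part back without any format translation; read-only fields like created_at are rejected on input.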

I think you should be wary of recapitulating the storage schema. In some cases that is just fine, but in others it can be a symptom of a leaky abstraction. Often end users can be given better naming or a more useful arrangement than what the DB holds explicitly (this seems more likely the more legacy the datastore is). With greenfield projects, however, the datastore and API data will likely track pretty closely, at least initially.
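A minimal Python sketch of keeping that translation explicit rather than exposing storage directly. The legacy column names, the Y/N flag encoding, and the assumption that timestamps are stored in UTC are all hypothetical:

```python
# Sketch: translate a storage row into the API representation instead of
# exposing the schema as-is. Hypothetical columns and encodings throughout.
from typing import Any, Dict

legacy_row: Dict[str, Any] = {
    "ORG_ID": "01234567-89ab-cdef-0123-456789abcdef",
    "ORG_NM": "SPECTRE   ",           # fixed-width column, space-padded
    "ACTV_FLG": "Y",                  # Y/N flag
    "CRT_TS": "2014-12-11 09:30:00",  # assumed UTC, no zone marker
}


def org_from_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one storage row to the org resource clients see."""
    return {
        "id": row["ORG_ID"],
        "name": row["ORG_NM"].rstrip(),                         # drop padding
        "active": row["ACTV_FLG"] == "Y",                       # flag -> JSON boolean
        "created_at": row["CRT_TS"].replace(" ", "T") + "Z",    # ISO 8601, UTC
    }


print(org_from_row(legacy_row))
```

If the storage changes later, only org_from_row needs to change; the representation clients depend on stays put.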

Ideally I would provide some additional guidelines, but I'm not sure what else generalizes easily. Hopefully these further thoughts on your questions and points are still helpful, even if we haven't arrived at a clear checklist quite yet. I think (and hope) there is already some material here we could extract to improve the guide.


4 participants