Add limited UPDATE/DELETE (tables only) #2195
Conversation
Force-pushed from ce01ee7 to 6dbd614.
Force-pushed from 6dbd614 to 1d3db35.
Docs for this one already at PostgREST/postgrest-docs#519

Will look at this later today.
The docs say:

Yet, I can't see a test-case with a view that exposes the `ctid` column. Did I just miss it? I would really like to see an example.
I don't think it's possible to support this feature for all views from which we can infer PKs, but only for auto-updateable views. Those views are required to have only one table in the FROM clause. But as soon as you have multiple tables in the FROM clause (or, as below, a set-returning function), the mapping between view rows and table rows is no longer one-to-one. Example:

```sql
create table t (p int primary key);

create view v as
  select p, generate_series(1, 2) from t;

-- then create some basic INSTEAD OF triggers for DELETE
```

When you run a request on this view to delete a limited number of rows, fewer rows can end up being deleted than were requested. It's possible to construct other examples that do the opposite: deleting more rows than were requested.
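For concreteness, a minimal sketch of such a trigger (hypothetical, not from the comment; note that each row of `t` appears twice in `v`, so deleting a single view row removes two view rows at once):

```sql
-- Hypothetical INSTEAD OF trigger making v deletable: each view-row delete
-- removes the underlying table row, and with it *both* view rows derived
-- from that table row.
create function v_delete() returns trigger language plpgsql as $$
begin
  delete from t where p = old.p;
  return old;
end $$;

create trigger v_delete
  instead of delete on v
  for each row execute function v_delete();
```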
All true with the above, nice catch! ⚾
We can determine that by checking that all of the relevant flags are true:

src/PostgREST/DbStructure/Table.hs, lines 19 to 28 (at 1d3db35)
src/PostgREST/DbStructure.hs, lines 336 to 351 (at 1d3db35)
Or will those somehow also be true if there are INSTEAD OF triggers defined for the view? In which case we would also need to check that there are no INSTEAD OF triggers defined?
Those are currently determined by `pg_relation_is_updatable`. The second argument is a boolean that decides whether INSTEAD OF triggers should be taken into account.
And once you have a version of that which is only true for "auto", then it would be enough to only check the update case for PATCH and the delete case for DELETE. You could have an auto-updateable view that still defines an INSTEAD OF trigger - but maybe only for one of the two. Although I don't know what gets reported in that case.
Hm, so if we are using the variant that includes triggers, then aren't these test cases wrong?

test/spec/Feature/OptionsSpec.hs, lines 39 to 50 (at 1d3db35)

Because it's not testing for the auto-updateable case specifically.
Ok, so what I'm thinking is we should have an additional attribute in our `Table` type.
Those test-cases are fine, I think. They do what they say: they test whether an auto-updateable view has the right OPTIONS response. We do return that same response for "just-updateable" (not auto, but trigger-based) views, too - and rightly so - but that doesn't make the first test wrong. This is not about "auto-and-only-auto-updateable" views.
Maybe we can just switch the existing fields from a boolean to something richer.
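Perhaps something like this hypothetical sketch (the type and constructor names are illustrative, not from the PR):

```haskell
-- Hypothetical: replace the boolean updatability flags with a three-valued
-- type, so "auto" and "trigger-based" updatability can be told apart.
data Updatability
  = NotUpdatable      -- neither auto-updateable nor any INSTEAD OF trigger/rule
  | TriggerUpdatable  -- only updatable through an INSTEAD OF trigger/rule
  | AutoUpdatable     -- auto-updateable per the PostgreSQL view rules
  deriving (Eq, Show)
```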
According to @laurenceisla the check can't reliably tell those cases apart. I'll just disable this feature for views now and leave it for a later enhancement.
Force-pushed from 97788c8 to a3b0908.
The 501 Not Implemented status is good for missing implementations; whenever PATCH/DELETE is tried on views, it will respond with that now. If the view is not present in the internal cache, it will also reject the request. I've also left the
Just adding more details to the explanation: we can't know for sure that a VIEW is auto-updatable using `pg_relation_is_updatable`. Maybe one way is to ignore, for this feature, all the VIEWs that have all of the INSTEAD OF RULEs present, because they cannot be differentiated from auto-updatable views in that case.
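An illustration of the ambiguity (a sketch, assuming `pg_relation_is_updatable` is the check in question; the view name is hypothetical):

```sql
-- pg_relation_is_updatable(rel, include_triggers) returns a bitmask of the
-- operations the relation supports. The second argument only toggles whether
-- INSTEAD OF *triggers* are counted; a view carrying unconditional DO INSTEAD
-- *rules* reports as updatable either way, so it can't be told apart from a
-- genuinely auto-updatable view.
SELECT pg_relation_is_updatable('some_view'::regclass, true)  AS with_triggers,
       pg_relation_is_updatable('some_view'::regclass, false) AS without_triggers;
```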
None of the test-cases tests the interaction between `order=...` and `limit=...` - and I think this is still broken.
```haskell
-- the "ctid" system column is always available to tables
mutRangeF :: QualifiedIdentifier -> [FieldName] -> (SqlFragment, SqlFragment)
mutRangeF mainQi rangeId = (
    BS.intercalate " AND " $
      (\col -> pgFmtColumn mainQi col <> " = " <> pgFmtColumn (QualifiedIdentifier mempty "pgrst_affected_rows") col) <$>
      (if null rangeId then ["ctid"] else rangeId)
  , if null rangeId then pgFmtColumn mainQi "ctid" else BS.intercalate ", " (pgFmtColumn mainQi <$> rangeId)
  )
```
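For intuition, assuming `mainQi` points at a `test.projects` table, the two fragments would come out roughly as follows (illustrative, not verified output):

```haskell
-- mutRangeF (QualifiedIdentifier "test" "projects") ["id"]
--   fst: "test"."projects"."id" = "pgrst_affected_rows"."id"      -- join condition
--   snd: "test"."projects"."id"                                   -- select/order list
--
-- mutRangeF (QualifiedIdentifier "test" "projects") []            -- no PK inferred
--   fst: "test"."projects"."ctid" = "pgrst_affected_rows"."ctid"
--   snd: "test"."projects"."ctid"
```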
With the current implementation, you don't need PK columns at all. You can just do it with the `ctid` column only. This will make it less complex for now.
Given what I mentioned below, it does make a lot of sense to order by `ctid` only and not by PK, because doing this:

```sql
INSERT INTO projects VALUES (1, 'Windows 7', 1);
INSERT INTO projects VALUES (2, 'Windows 10', 1);
INSERT INTO projects VALUES (5, 'Orphan', NULL);
INSERT INTO projects VALUES (4, 'OSX', 2);
INSERT INTO projects VALUES (3, 'IOS', 2);
```

gives, through HTTP, for `GET /projects`:

```json
[{"id":1,"name":"Windows 7","client_id":1},
 {"id":2,"name":"Windows 10","client_id":1},
 {"id":5,"name":"Orphan","client_id":null},
 {"id":4,"name":"OSX","client_id":2},
 {"id":3,"name":"IOS","client_id":2}]
```

which is the same as doing `GET /projects?order=ctid` but not the same as doing `GET /projects?order=id`.
So basically with `ctid` we can ensure that `DELETE /projects?limit=2` deletes the same rows as the ones obtained through `GET /projects?limit=2`.
The only thing bad about preferring `ctid` is that you have to give explicit SELECT privilege to the column for it to work (or a table-wide SELECT), and that is weird - giving the PK columns a SELECT privilege is a more usual operation.
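To make the comparison concrete (a sketch; the `web_anon` role name is an assumption, not from the thread):

```sql
-- For the ctid-based approach, the API role needs the system column
-- granted explicitly (or a full table-wide SELECT):
GRANT SELECT (ctid) ON projects TO web_anon;

-- Whereas granting the PK column is the more usual operation:
GRANT SELECT (id) ON projects TO web_anon;
```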
So because of the above - the need to do a special `GRANT SELECT(ctid)` for `limit` to work - I think in fact `ctid` should be left out and `limit` should depend on PKs. Later on, it would also be weird to tell users to expose `ctid` in their VIEWs for `limit` to work.
> When a user chooses to order on non-pk columns, you can't show this error, because you'd need to detect the uniqueness of those columns again - which is exactly what we have trouble with for views

Forgot about that, so I can't suggest ordering by a PK, because a view might have more rows than PK values, as in the example you showed here. Got it.

> Is this what you understood, too?

Ah, yes, I understand the mechanism. However, the "implicit" part got me thinking about a server-side default for `row_count`. That would basically solve the "PostgREST lets you delete all your rows by default" problem - sometimes users make the mistake of sending deletes without filters, and this won't stop even with `limit`, I believe. So I think enforcing a `row_count` server-side that is overridable on the client would solve that.
I see. A server-side default, which is overridable from the client-side, would be completely unrelated to this issue here, though. The implicit row-count here would just be to avoid mistakes in the `order` clause - while the default you are proposing would be a protection against unlimited delete.
> However, if we instead run this query implicitly as the following:
>
> `DELETE /v!row_count=lte.2?limit=2&order=static`

@wolfgangwalther Now that I've realized it, this would mean that `limit` needs `row_count` and that, for this feature, both might as well be the same.

So how about this: I'll go ahead and apply the `row_count` logic for the `limit` but will not expose the actual `row_count` special filter. We can add that one later on when we settle on a syntax.

(I do see `row_count` being useful in other ways, but for this feature we can apply it implicitly, as you mentioned.)
And now that I think about it: if we enforce `row_count` implicitly on `limit`, then would it be safe to assume the PK cols for the `order`/join on views as well? So the above problem would be no more?

I mean, because we'd already be fulfilling our part of the contract - the affected rows would not exceed the `limit`. And if the VIEW query reduces the amount of rows mapped to the PKs, this would not be an issue.
> So how about this: I'll go ahead and apply the `row_count` logic for the `limit` but will not expose the actual `row_count` special filter.

Ok.

> I mean, because we'd already be fulfilling our part of the contract - the affected rows would not exceed the `limit`. And if the VIEW query reduces the amount of rows mapped to the PKs, this would not be an issue.

I think it would still have very odd interactions with `offset`. For me, it'd be much simpler to implement and explain to always require and use the columns in `order` - and hint in the docs and error messages at the requirements for that. We'd need that for views without inferred PK columns anyway - and this way we don't have a complicated mix of what to do in which situation.
```haskell
handleDelete identifier context@(RequestContext _ ctxDbStructure ApiRequest{..} _) = do
  when (iTopLevelRange /= RangeQuery.allRange && findIfView identifier (dbTables ctxDbStructure)) $
    throwError $ Error.NotImplemented "limit/offset is not implemented for views"
```
Having this in `App.hs` really seems the wrong place. This should go into `ApiRequest.hs`, I think - because this module is concerned with parsing the user's request and matching it up with the schema cache.
> This should go into ApiRequest.hs, I think - because this module is concerned with parsing the user's request and matching it up with the schema cache.

Hmm, I never thought of ApiRequest like that. The only reason the schema cache is used there is function overloading, which needs the `Proc` info - and that is something I thought would be better moved to a different module.

If you check `handleSingleUpsert`, it has similar logic:

src/PostgREST/App.hs, lines 373 to 376 (at a3b0908):

```haskell
handleSingleUpsert :: QualifiedIdentifier -> RequestContext -> DbHandler Wai.Response
handleSingleUpsert identifier context@(RequestContext _ _ ApiRequest{..} _) = do
  when (iTopLevelRange /= RangeQuery.allRange) $
    throwError Error.PutRangeNotAllowedError
```

So to be consistent I thought `App.hs` was the better place for this logic.
Interesting. The `handleSingleUpsert` doesn't even depend on the schema cache, so it's even more out of place. It's an invalid request, so that should really go into `ApiRequest`.

```haskell
"pgrst_affected_rows AS (" <>
  "SELECT " <> SQL.sql rangeIdF <> " FROM " <> mainTbl <>
  whereLogic <> " " <>
  "ORDER BY " <> SQL.sql rangeIdF <> " " <> limitOffsetF range <>
```
Do we need to order this by `rangeIdF` here? Shouldn't we order this by whatever was put in the query string as `?order=...`? Otherwise the result of the limit will be unpredictable.
> Do we need to order this by rangeIdF here?

Yes, as mentioned in the Crunchy Data blog post:

> A word of warning: Operating (particularly for a destructive operation) on some poorly-defined criteria is purposefully difficult. In general, the ORDER BY clause is strongly encouraged for any operation using LIMIT for a mutable query.

> Shouldn't we order this by whatever was put in the query string as ?order=...?

If the client doesn't specify an `order`, then that would be bad because of the above. I guess we could replace the implicit `ORDER BY PK` with the explicit `?order`, though.
I'd say we just use the `?order` and add the `ctid` column at the end of that. If there is no `?order`, it's just the `ctid` column.
...
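A sketch of the query that suggestion could generate (hypothetical, assuming a request like `DELETE /projects?order=name&limit=2`; not the PR's actual output):

```sql
-- Hypothetical shape: the user's ?order columns come first, with ctid
-- appended as a final tie-breaker so the limit stays deterministic.
WITH pgrst_affected_rows AS (
  SELECT ctid FROM projects
  ORDER BY name, ctid
  LIMIT 2
)
DELETE FROM projects
USING pgrst_affected_rows
WHERE projects.ctid = pgrst_affected_rows.ctid;
```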
Really? I feel pressured to give quick reviews, because you seem to be in a hurry. I do have plenty of other things on my plate right now. I don't think this was ready to merge yet. And I really don't think we should merge a feature as quickly as this and then immediately create a release. We should have time to review and test stuff like this before immediately bumping major... I don't like this!
More specifically, I would expect a request like the following:

`DELETE /items?order=non_pk&limit=2`

on a table like this:

to delete the 3/4 rows. However, I think right now the 1/2 rows are deleted. This is what I read from the code. I wasn't able to test this myself, because I fear you're going to release this before I will be able to do so.
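To make the expectation concrete, a hypothetical table of that shape (illustrative data, not the comment's original table):

```sql
-- Illustrative: ordering by non_pk picks ids 4 and 3 ("the 3/4 rows"),
-- while ordering by the primary key picks ids 1 and 2.
CREATE TABLE items (id int PRIMARY KEY, non_pk int);
INSERT INTO items VALUES (1, 4), (2, 3), (3, 2), (4, 1);

-- SELECT id FROM items ORDER BY non_pk LIMIT 2;  -- ids 4, 3 (expected)
-- SELECT id FROM items ORDER BY id     LIMIT 2;  -- ids 1, 2 (what the code does)
```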
Yeah, really sorry about this @wolfgangwalther. I also decided against doing a major release, so no worries. I'll only make a pre-release for this one.
So basically the following:

```haskell
"UPDATE " <> mainTbl <> " SET " <> SQL.sql nonRangeCols <> " " <>
  "FROM (SELECT * FROM json_populate_recordset (null::" <> mainTbl <> " , " <> SQL.sql selectBody <> " )) _ " <>
  whereLogic <> " " <>
```
Looks like this change always adds a WHERE clause to the UPDATE query - even if the request is completely unconditional.

This breaks using pg-safeupdate. The request now goes through instead of failing because of a previously unconditional UPDATE query.
Note that `pg-safeupdate` is not required anymore, because we're disallowing unfiltered PATCH / unfiltered DELETE unless a `limit` is applied.

> Looks like this change always adds a WHERE clause to the UPDATE query - even if the request is completely unconditional.

Yeah, because of the above: if no filter is added, then `WHERE false` is done.

One thing better with the `pg-safeupdate` safeguard is that it fails with an error, while the `WHERE false` will not do anything. That's only the case for unfiltered UPDATE, though; for unfiltered DELETE, we're going to fail with an error. (I need to go back to this one, because it will be inconsistent otherwise.)

Would we need to keep compatibility with `pg-safeupdate` now that we have this? Why?

Not sure, but maybe we could do `LIMIT 0` instead of `WHERE false`.
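For contrast, the three behaviors under discussion (illustrative SQL; `UPDATE` has no native `LIMIT` in PostgreSQL, so a `LIMIT 0` would have to be applied through the selecting subquery or CTE):

```sql
-- pg-safeupdate: an unconditional UPDATE fails outright with an error.
UPDATE projects SET name = 'x';             -- ERROR under pg-safeupdate

-- WHERE false: the statement succeeds but affects no rows.
UPDATE projects SET name = 'x' WHERE false;

-- LIMIT 0 applied via a subquery: also succeeds and affects no rows.
UPDATE projects SET name = 'x'
WHERE ctid IN (SELECT ctid FROM projects LIMIT 0);
```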
> Yeah, because of the above: if no filter is added, then WHERE false is done.

Oh, hell. I thought this was a review on #2311.
Addressing this in #2405
> Looks like this change always adds a WHERE clause to the UPDATE query - even if the request is completely unconditional.

I think the change in this PR still allows the full table update without conditionals. The one that disallowed this was #2311, if I'm not mistaken.
Compare #2418
This feature only works for tables, not views.

As discussed in #2156, this applies the technique from the Crunchy Data blog post to limit the rows affected by UPDATE/DELETE.

Sample generated query:
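A hedged reconstruction of the generated query's shape, pieced together from the `mutRangeF` and `pgrst_affected_rows` fragments quoted above (not the PR's verbatim sample), for a hypothetical `DELETE /projects?limit=2`:

```sql
-- Select the rows to affect in a CTE, ordered for determinism and limited,
-- then join the DELETE back on ctid (or on the PK columns when available).
WITH pgrst_affected_rows AS (
  SELECT ctid FROM projects
  ORDER BY ctid
  LIMIT 2 OFFSET 0
)
DELETE FROM projects
USING pgrst_affected_rows
WHERE projects.ctid = pgrst_affected_rows.ctid;
```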
Tasks:
- max-rows affects bulk insert #2155