Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add limited UPDATE/DELETE(tables only) #2195

Merged
merged 5 commits into from
Mar 26, 2022
Merged

Add limited UPDATE/DELETE(tables only) #2195

merged 5 commits into from
Mar 26, 2022

Conversation

steve-chavez
Copy link
Member

@steve-chavez steve-chavez commented Mar 21, 2022

This feature only works for tables, not views

As discussed in #2156.

Applies this technique to limit rows affected by UPDATE/DELETEs.

Sample generated query:

PATCH /projects?limit=1

{ "name" : "BeOS"}
WITH
  pgrst_payload AS ( SELECT '{ "name" : "BeOS"}'::json AS json_data),
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload),
  pgrst_update_body AS (SELECT * FROM json_populate_recordset (null::"test"."projects" , (SELECT val FROM pgrst_body) ) LIMIT 1),
  pgrst_affected_rows AS (SELECT "test"."projects"."id" FROM "test"."projects" ORDER BY "test"."projects"."id" LIMIT 1 OFFSET 0)
UPDATE "test"."projects"
SET "name" = (SELECT "name" FROM pgrst_update_body)
FROM pgrst_affected_rows
WHERE "test"."projects"."id" = "pgrst_affected_rows"."id"
RETURNING 1;

Tasks

@steve-chavez steve-chavez force-pushed the limited-up-del branch 2 times, most recently from ce01ee7 to 6dbd614 Compare March 22, 2022 18:58
@steve-chavez steve-chavez marked this pull request as ready for review March 22, 2022 19:04
@steve-chavez
Copy link
Member Author

Docs for this one already at PostgREST/postgrest-docs#519

@wolfgangwalther
Copy link
Member

Will look at this later today.

@wolfgangwalther
Copy link
Member

The docs say:

It will work on a view if the PK or ctid <https://www.postgresql.org/docs/current/ddl-system-columns.html>_ is included as one of its columns.

Yet, I can't see a test-case with a view that exposes the ctid column. Did I just miss it? I would really like to see an example.

@wolfgangwalther
Copy link
Member

wolfgangwalther commented Mar 25, 2022

I don't think it's possible to support this feature for all views from which we can infer PKs, but only for auto-updateable views. Those views are required to have only one table in the FROM, among others, by PostgreSQL. In those cases, there is a strict 1-to-1 relationship between rows in the view and in the underlying table, and this feature can work. That's why PostgreSQL makes those views auto-updateable by default.

But as soon as you have multiple tables in the FROM or have set-returning functions in the SELECT etc. - this is not guaranteed to work anymore: The columns in the view might be based on the tables PK columns - but they are not neccessarily unique anymore.

Example:

create table t (p int primary key);
create view v as
select p, generate_series(1, 2) from t;
-- then create some basic instead of triggers for DELETE

When you run a request on this view to delete limit=2 rows, it might actually delete 1 rows, or 2 rows. Depending on whether the same p is returned from the CTE twice or not.

It's possible to construct other examples that do the opposite: Deleting more rows than were requested.

@steve-chavez
Copy link
Member Author

All true with the above, nice catch! ⚾

but only for auto-updateable views.

We can determine that by checking that all of the tableInsertable/tableUpdatable/tableDeletable are true right?

data Table = Table
{ tableSchema :: Schema
, tableName :: TableName
, tableDescription :: Maybe Text
-- The following fields identify what can be done on the table/view, they're not related to the privileges granted to it
, tableInsertable :: Bool
, tableUpdatable :: Bool
, tableDeletable :: Bool
}
deriving (Show, Ord, Generic, JSON.ToJSON)

(
c.relkind IN ('r','p')
OR (
c.relkind IN ('v','f')
-- CMD_INSERT - see allTables query below for explanation
AND (pg_relation_is_updatable(c.oid::regclass, TRUE) & 8) = 8
)
) AS insertable,
(
c.relkind IN ('r','p')
OR (
c.relkind IN ('v','f')
-- CMD_UPDATE
AND (pg_relation_is_updatable(c.oid::regclass, TRUE) & 4) = 4
)
) as updatable,

Or those somehow will also be true if there are INSTEAD OF triggers defined for the view? In which case we also need to check that there are no INSTEAD OF triggers defined?

@wolfgangwalther
Copy link
Member

wolfgangwalther commented Mar 25, 2022

Or those somehow will also be true if there are INSTEAD OF triggers defined for the view? In which case we also need to check that there are no INSTEAD OF triggers defined?

Those are currently determined by pg_relation_is_updatable, which is a pg-internal function as it's not documented. However:
https://github.com/postgres/postgres/blob/f6f0db4d62400ff88f523dcc4d7e25f9506bc0d8/src/backend/utils/adt/misc.c#L611-L625

The second argument is a boolean include_triggers. We call it with TRUE, so the fields you mentioned do include INSTEAD OF triggers. You'd need the result of the call with FALSE to tell whether it's auto-updateable or not, I think.

@wolfgangwalther
Copy link
Member

We can determine that by checking that all of the tableInsertable/tableUpdatable/tableDeletable are true right?

And once you have a version of that, that is only true for "auto", then it would be enough to only check for the update case for UPDATE and the delete case for DELETE.

You could have an auto-updateable view that still defines an instead of trigger - but maybe only for one of the two. Although I don't know what pg_relation_is_updateable returns in that case anyway.

@steve-chavez
Copy link
Member Author

Hm, so if we are using the pg_relation_is_updatable with TRUE as mentioned above then the following tests are not true?

context "auto updatable" $ do
it "includes read/write verbs for auto updatable views with pk" $ do
r <- request methodOptions "/projects_auto_updatable_view_with_pk" [] ""
liftIO $
simpleHeaders r `shouldSatisfy`
matchHeader "Allow" "OPTIONS,GET,HEAD,POST,PUT,PATCH,DELETE"
it "includes read/write verbs for auto updatable views without pk" $ do
r <- request methodOptions "/projects_auto_updatable_view_without_pk" [] ""
liftIO $
simpleHeaders r `shouldSatisfy`
matchHeader "Allow" "OPTIONS,GET,HEAD,POST,PATCH,DELETE"

Because it's not testing for the auto on "auto-updatable"? They're just "updatable" thanks to the INSTEAD OF triggers?

@steve-chavez
Copy link
Member Author

Ok, so what I'm thinking is we should have an additional atribute in our Table structure for isAutoUpdatableView that will use the pg_relation_is_updateable with FALSE on the 3 operations(maybe only one is necessary though).

@wolfgangwalther
Copy link
Member

Hm, so if we are using the pg_relation_is_updatable with TRUE as mentioned above then the following tests are not true?

context "auto updatable" $ do
it "includes read/write verbs for auto updatable views with pk" $ do
r <- request methodOptions "/projects_auto_updatable_view_with_pk" [] ""
liftIO $
simpleHeaders r `shouldSatisfy`
matchHeader "Allow" "OPTIONS,GET,HEAD,POST,PUT,PATCH,DELETE"
it "includes read/write verbs for auto updatable views without pk" $ do
r <- request methodOptions "/projects_auto_updatable_view_without_pk" [] ""
liftIO $
simpleHeaders r `shouldSatisfy`
matchHeader "Allow" "OPTIONS,GET,HEAD,POST,PATCH,DELETE"

Because it's not testing for the auto on "auto-updatable"? They're just "updatable" thanks to the INSTEAD OF triggers?

Those test-cases are fine, I think. They do what they say, they test whether an auto-updateable view has the right options response. We do return that same response for "just-updateable" (not auto, but trigger-based), too (and rightly so), but that doesn't make the first test wrong or so. This is not about "auto-and-only-auto-updateable" views.

@wolfgangwalther
Copy link
Member

Ok, so what I'm thinking is we should have an additional atribute in our Table structure for isAutoUpdatableView that will use the pg_relation_is_updateable with FALSE on the 3 operations(maybe only one is necessary though).

Maybe we can just switch the existing fields from a boolean to something like: Manually | Automatic | No. "Manually" would be the case when triggers were created - but also with CREATE RULE, that's why I didn't use "trigger" in it.

@steve-chavez
Copy link
Member Author

You could have an auto-updateable view that still defines an instead of trigger - but maybe only for one of the two. Although I don't know what pg_relation_is_updateable returns in that case anyway.

According to @laurenceisla the pg_relation_is_updateable is not accurate in the presence of RULEs so detecting only auto-updatable views is going to take more work.

I'll just disable this feature for views now and leave it for a later enhancement.

@steve-chavez
Copy link
Member Author

The 501 Not Implemented status is good for missing implementations, whenever PATCH/DELETE is tried on views it will respond with that now. If the view is not present in the internal cache, if will also reject the request.

I've also left the ctid fallback since it will work now that we're dealing only with tables.

@laurenceisla
Copy link
Member

According to @laurenceisla the pg_relation_is_updateable is not accurate in the presence of RULEs so detecting only auto-updatable views is going to take more work.

Just adding more details to the explanation: we can't know for sure that a VIEW is auto updatable using pg_relation_is_updatable with FALSE as a parameter, if we create a non auto updatable VIEW and add INSTEAD OF RULEs for all the updatable events to that VIEW, it will return 28 which means it's insertable, deletable and updatable (the same result if a VIEW is auto updatable).

Maybe one way is to ignore all the VIEWs that have all of the INSTEAD OF RULEs present for this feature, because they cannot be differentiated if that's the case.

@steve-chavez steve-chavez merged commit f3f9030 into main Mar 26, 2022
@steve-chavez steve-chavez changed the title Add limited UPDATE/DELETE Add limited UPDATE/DELETE(tables only) Mar 26, 2022
Copy link
Member

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the test-cases tests the interaction between order=.. and limit=... - and I think this is still broken.

Comment on lines +338 to +346

-- the "ctid" system column is always available to tables
mutRangeF :: QualifiedIdentifier -> [FieldName] -> (SqlFragment, SqlFragment)
mutRangeF mainQi rangeId = (
BS.intercalate " AND " $
(\col -> pgFmtColumn mainQi col <> " = " <> pgFmtColumn (QualifiedIdentifier mempty "pgrst_affected_rows") col) <$>
(if null rangeId then ["ctid"] else rangeId)
, if null rangeId then pgFmtColumn mainQi "ctid" else BS.intercalate ", " (pgFmtColumn mainQi <$> rangeId)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current implementation, you don't need pk columns at all. You can just do it with the ctid column only. This will make it less complex for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what I mentioned below, it does make a lot of sense to order by ctid only and not by PK, because doing this:

INSERT INTO projects VALUES (1, 'Windows 7', 1);
INSERT INTO projects VALUES (2, 'Windows 10', 1);
INSERT INTO projects VALUES (5, 'Orphan', NULL);
INSERT INTO projects VALUES (4, 'OSX', 2);
INSERT INTO projects VALUES (3, 'IOS', 2);

Gives through http:

GET /projects

[{"id":1,"name":"Windows 7","client_id":1},
 {"id":2,"name":"Windows 10","client_id":1},
 {"id":5,"name":"Orphan","client_id":null},
 {"id":4,"name":"OSX","client_id":2},
 {"id":3,"name":"IOS","client_id":2}]

Which is the same as doing GET /projects?order=ctid but not the same as doing GET /projects?order=id.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically with ctid we can ensure that DELETE /projects?limit=2 deletes the same rows as the ones obtained through GET /projects?limit=2.

Copy link
Member Author

@steve-chavez steve-chavez Mar 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing bad about preferring ctid is that you have to give explicit SELECT privilege to the column for it to work(or a full wide SELECT) and that is weird - giving the PK columns a SELECT privilege is a more usual operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So because of the above - the need to do an special GRANT SELECT(ctid) for limit to work - I think in fact ctid should be left out and limit should depend on PKs.

Later on, it would also be weird to tell users to expose ctid in ther VIEWs for limit to work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a user chooses to order on non-pk columns, you can't show this error, because you'd need to detect the uniqueness of those columns again - which is exactly what we have trouble with for views

Forgot about that, so I can't suggest ordering by a pk because a view might have more rows than pk values as in the example you showed here. Got it.

Is this what you understood, too?

Ah, yes I understand the mechanism however the "implicit" part got me thinking into a server side default for row_count.

That would basically solve the "postgrest let's you delete all your rows by default" problem - sometimes some users make the mistake to send deletes without filters, this won't stop even with limit I believe.

So I think enforcing a row_count server side that is overridable on the client would solve that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. A server-side default, which is overridable from the client-side, would be completely unrelated to this issue here, though. The implicit row-count here would just be to avoid mistakes in the order clause - while the default you are proposing would be a protection against unlimited delete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if we instead run this query implicitly as the following:
DELETE /v!row_count=lte.2?limit=2&order=static

@wolfgangwalther Now that I've realized, this would mean that limit needs row_count and that for this feature both might as well be the same.

So how about this. I'll go ahead and apply the row_count logic for the limit but will not expose the actual row_count special filter. We can add that one later on when we settle on a syntax.

(I do see row_count being useful in other ways but for this feature we can apply it implicitly as you mentioned)

Copy link
Member Author

@steve-chavez steve-chavez Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And now that I think about it. If we enforce row_count implicitly on limit, then it would be safe to assume the PK cols for the order/join on views as well?

So the above problem would be no more?

I mean because we'd be already fulfilling our part of the contract - the affected rows would not exceed the limit. And if the VIEW query reduces the amount of rows mapped to the PKs this would not be an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how about this. I'll go ahead and apply the row_count logic for the limit but will not expose the actual row_count special filter.

Ok.

I mean because we'd be already fulfilling our part of the contract - the affected rows would not exceed the limit. And if the VIEW query reduces the amount of rows mapped to the PKs this would not be an issue.

I think it would still have very odd interactions with offset. For me, it'd be much simpler to implement and explain to always require and use the columns in order - and hint in the docs and error messages at the requirements for that. We'd need that for views without inferred PK columns anyway - and this way we don't have a complicated mix of what to do in which situation.

Comment on lines +398 to +401
handleDelete identifier context@(RequestContext _ ctxDbStructure ApiRequest{..} _) = do
when (iTopLevelRange /= RangeQuery.allRange && findIfView identifier (dbTables ctxDbStructure)) $
throwError $ Error.NotImplemented "limit/offset is not implemented for views"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this in App.hs really seems the wrong place. This should go into ApiRequest.hs, I think - because this module is concerned with parsing the user's request and matching it up with the schema cache.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go into ApiRequest.hs, I think - because this module is concerned with parsing the user's request and matching it up with the schema cache.

Hmm, I never thought of ApiRequest like that, I mean the only reason why the schema cache is used there is because of the function overloading which needs the Proc info and that is something that I thought be better moved to a different module.

If you check handleSingleUpsert it has a similar logic:

handleSingleUpsert :: QualifiedIdentifier -> RequestContext-> DbHandler Wai.Response
handleSingleUpsert identifier context@(RequestContext _ _ ApiRequest{..} _) = do
when (iTopLevelRange /= RangeQuery.allRange) $
throwError Error.PutRangeNotAllowedError

So to be consistent I thought App.hs was the better place for this logic.

Copy link
Member

@wolfgangwalther wolfgangwalther Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. The handleSingleUpsert doesn't even depend on the schema cache, so it's even more out of place. It's an invalid request, so that should really go into ApiRequest.

"pgrst_affected_rows AS (" <>
"SELECT " <> SQL.sql rangeIdF <> " FROM " <> mainTbl <>
whereLogic <> " " <>
"ORDER BY " <> SQL.sql rangeIdF <> " " <> limitOffsetF range <>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to order this by rangeIdF here? Shouldn't we order this by whatever was put in the query string as ?order=...? Otherwise the result of limit will be unpredictable.

Copy link
Member Author

@steve-chavez steve-chavez Mar 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to order this by rangeIdF here?

Yes, as mentioned in the crunchy data blog post:

A word of warning:
Operating (particularly for a destructive operation) on some poorly-defined criteria is purposefully difficult. In general, the ORDER BY clause is strongly encouraged for any operation using LIMIT for a mutable query.

Shouldn't we order this by whatever was put in the query string as ?order=...?

If the client doesn't specify an order then that would be bad because of the above.

I guess we could replace the implicit ORDER BY PK by the explicit ?order though..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we just use the ?order and add the ctid column at the end of that. If there is no ?order, it's just the ctid column.

@wolfgangwalther
Copy link
Member

@steve-chavez steve-chavez requested a review from wolfgangwalther 7 hours ago

...

@steve-chavez steve-chavez merged commit f3f9030 into main 2 hours ago

Really?

I feel pressurized to give quick reviews, because you seem to be in a hurry. I do have plenty of other things on my plate right now. I don't think this was ready to merge, yet.

And I really don't think we should merge a feature as quick as this, and then immediately create a release. We should have time to review and test stuff like this, before immediately bumping major...

I don't like this!

@wolfgangwalther
Copy link
Member

None of the test-cases tests the interaction between order=.. and limit=... - and I think this is still broken.

More specifically, I would expect a request like the following:

DELETE /items?order=non_pk&limit=2

On a table like this:

pk non_pk
1 d
2 c
3 b
4 a

To delete the 3/4 rows. However, I think right now the 1/2 rows are deleted.

This is what I read from the code. I wasn't able to test this myself, because I fear you're going to release this before I will be able to do so.

@steve-chavez
Copy link
Member Author

We should have time to review and test stuff like this, before immediately bumping major...
I don't like this!

Yeah, really sorry about this @wolfgangwalther. I also decided against doing a major release, so no worries. I'll only make a pre-release for this one.

@steve-chavez
Copy link
Member Author

More specifically, I would expect a request like the following:
DELETE /items?order=non_pk&limit=2

So basically the DELETE /items?order=non_pk&limit=2 should delete the same items obtained through GET /items?order=non_pk&limit=2. True, that's not working right now, because the order is only applied to the returned rows and not the affected ones.

@steve-chavez steve-chavez mentioned this pull request Mar 30, 2022
3 tasks
@laurenceisla laurenceisla deleted the limited-up-del branch May 19, 2022 18:45
Comment on lines +119 to +121
"UPDATE " <> mainTbl <> " SET " <> SQL.sql nonRangeCols <> " " <>
"FROM (SELECT * FROM json_populate_recordset (null::" <> mainTbl <> " , " <> SQL.sql selectBody <> " )) _ " <>
whereLogic <> " " <>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this change always adds a WHERE clause to the UPDATE query - even if the request is completely unconditional.

This breaks using pg-safeupdate. The request now goes through instead of failing because of a previously unconditional UPDATE query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that pg-safeupdate is not required anymore because we're disallowing unfiltered patch/unfiltered delete unless a limit is applied.

Looks like this change always adds a WHERE clause to the UPDATE query - even if the request is completely unconditional.

Yeah, because of the above, if no filter is added then WHERE false is done.

One thing better with the pg-safeupdate safeguard is that it fails with an error while the WHERE false will not do anything.
That's only the case of unfiltered UPDATE though. For unfiltered DELETE, we're going to fail with an error.

(I need to go back to this one because it will be inconsistent otherwise)

Would we need to keep compatibility with pg-safeupdate now that we have this? Why?


Not sure but maybe we could do LIMIT 0 instead of WHERE false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, because of the above, if no filter is added then WHERE false is done.

Oh, hell. I thought this was a review on #2311.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing this in #2405

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this change always adds a WHERE clause to the UPDATE query - even if the request is completely unconditional.

I think the change in this PR still allows the full table update without conditionals. The one that disallowed this was #2311 if I'm not mistaken.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare #2418

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants