763: Wrap column default expressions in ParenExpr to match MySQL's requirements
PostgreSQL's syntax does not require column default expressions to be wrapped in parens, but MySQL's does, so when we translate the column default value expressions to the vitess AST, we need to wrap them in parens so that they execute in GMS without triggering an error.
Fixes #751
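For illustration, here's a minimal, self-contained Go sketch of the translation idea. The node types below are toys, not the actual vitess or Doltgres AST types:

```go
package main

import "fmt"

// Toy nodes for illustration only; the real translation operates on the vitess AST.
type Expr interface{ String() string }

// RawExpr stands in for an arbitrary Postgres default expression.
type RawExpr struct{ SQL string }

func (r RawExpr) String() string { return r.SQL }

// ParenExpr wraps a default expression so the MySQL-dialect DEFAULT (expr) form is produced.
type ParenExpr struct{ Expr Expr }

func (p ParenExpr) String() string { return "(" + p.Expr.String() + ")" }

// translateDefault mirrors the fix: wrap every translated default expression in parens.
func translateDefault(e Expr) Expr { return ParenExpr{Expr: e} }

func main() {
	def := RawExpr{SQL: "length('abc')"}
	// Postgres accepts DEFAULT length('abc'); MySQL requires DEFAULT (length('abc')).
	fmt.Println("DEFAULT " + translateDefault(def).String())
}
```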
762: Add support for ALTER TABLE ADD FOREIGN KEY
Helps with data imports, since it's common to add FKs at the end of an import script.
The regression below seems to come from the FK being added, but we don't yet support removing an FK, so the drop table call now fails.
Related to #724
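As a hedged illustration of the import pattern this enables (the table names and connection string below are made up), a Go import script can now defer the FK until after the bulk load:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Hypothetical connection string for a local Doltgres server.
	db, err := sql.Open("postgres", "host=127.0.0.1 port=5432 user=doltgres dbname=mydb sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	stmts := []string{
		`CREATE TABLE parent (id INT PRIMARY KEY)`,
		`CREATE TABLE child (id INT PRIMARY KEY, parent_id INT)`,
		`INSERT INTO parent VALUES (1), (2)`,
		`INSERT INTO child VALUES (10, 1), (20, 2)`,
		// The FK is added at the end of the import, the pattern this PR supports.
		`ALTER TABLE child ADD FOREIGN KEY (parent_id) REFERENCES parent (id)`,
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			log.Fatalf("%s: %v", s, err)
		}
	}
}
```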
#730
For some index joins, the analyzer will create a specific type of plan that creates MySQL ranges rather than Doltgres ranges. It appears that there may be divergent branches for the join logic, so I attempted to find the source of the divergence, but I came up short.
For now, rather than chasing this down and delaying a PR (since Tim needs this fixed ASAP), we can pass the lookup to the internal Dolt table. This will return incorrect results in some situations, but it won't panic, so I'll follow up later with a more thorough search through GMS to merge the index join paths.
741: Support automatic transaction management with COPY FROM STDIN
736: Expanding tests for the steps in the Doltgres Getting Started Guide
Added more test coverage for the steps in the Doltgres Getting Started Guide and pulled those tests out into their own file.
733: Initial Authentication Protocol
This implements the initial portion of the authentication protocol.
Postgres Reference Documentation:
https://www.postgresql.org/docs/15/sasl-authentication.html
Primarily, this implements SASL SCRAM-SHA-256, which appears to be the primary form of authentication used in modern Postgres. It has been built by following the RFC specification:
https://datatracker.ietf.org/doc/html/rfc5802
There are no tests since the implementation is incomplete. It cannot truly be tested until we have passwords and such that it can verify against (the results must be sent back to the client for verification, so they can't be faked). However, I have tested it up through what has been written, and what exists works as it should.
Surprisingly, there aren't any libraries that we could really leverage for this. Most SASL libraries don't implement SCRAM. The closest was the following:
https://codeberg.org/mellium/sasl
However, I couldn't really find a way to integrate it with raw messages and the eventual Doltgres user backend, so this is all custom-written using the RFC as a guideline (along with packets captured using the regression capture tool to confirm that Postgres follows the RFC's implementation). For now, the logic is hidden behind a bogus parameter check so that the static analyzer is happy. The next step is to make a mock in-memory database of users and passwords so I can fully test the entire workflow.
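To make the verification step concrete, here's a minimal Go sketch (not the Doltgres code; the password, salt, and iteration count are made-up example values) of the RFC 5802 key derivation that SCRAM-SHA-256 is built on. The server stores StoredKey and ServerKey, and the client proves knowledge of the password without ever sending it:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"

	"golang.org/x/crypto/pbkdf2"
)

func hmacSHA256(key, msg []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return mac.Sum(nil)
}

func main() {
	password := []byte("secret")   // normally SASLprep'd first
	salt := []byte("example-salt") // the server sends a random salt
	iterations := 4096             // Postgres defaults to 4096

	// RFC 5802: SaltedPassword := Hi(password, salt, i), where Hi is PBKDF2 over SHA-256.
	saltedPassword := pbkdf2.Key(password, salt, iterations, sha256.Size, sha256.New)

	clientKey := hmacSHA256(saltedPassword, []byte("Client Key"))
	storedKey := sha256.Sum256(clientKey) // what the server keeps to verify the client proof
	serverKey := hmacSHA256(saltedPassword, []byte("Server Key"))

	fmt.Printf("StoredKey: %x\nServerKey: %x\n", storedKey, serverKey)
}
```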
732: Initial support for ALTER TABLE, starting with adding a primary key
Adding initial support for converting ALTER TABLE statements. This first iteration only supports ALTER TABLE t ADD PRIMARY KEY (...);.
Related to #724
727: Enabled use mydb/main without quoting and implemented the IF function
Most of this PR is changes to the doltgres engine testing harness to make it pass more tests.
- Correctly validates most DML statements
- Matches commit hashes more reliably
- Handles more types
It also includes parser support for unquoted database identifiers containing a / in a USE statement, and implements the IF function (apparently a CockroachDB extension).
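For a quick illustration of both user-facing pieces (connection details are made up, and this assumes a running Doltgres server):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "host=127.0.0.1 port=5432 user=doltgres dbname=mydb sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A revision database can now be selected without quoting the identifier.
	if _, err := db.Exec(`USE mydb/main`); err != nil {
		log.Fatal(err)
	}

	// IF(cond, then, else) is now available as a function.
	var label string
	if err := db.QueryRow(`SELECT IF(1 < 2, 'yes', 'no')`).Scan(&label); err != nil {
		log.Fatal(err)
	}
	fmt.Println(label) // yes
}
```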
722: Feature: COPY support for HEADER option
Adds support for using the HEADER option in COPY statements.
In this first iteration, we only support specifying HEADER or HEADER true. This form causes the tabular data loader and the CSV data loader to skip over the initial header line in the import data. In addition to this form, COPY also supports a HEADER MATCH option, where the server asserts that the columns in the import data exactly match the names and order of the columns in the destination table. (Note: this PR is based off of #700 to help split up the changes and make them easier to review)
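As a client-side sketch (connection string and table are made up, and the people table is assumed to already exist), the HEADER option can be exercised from Go with pgx's low-level CopyFrom call, which drives the same COPY ... FROM STDIN flow:

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, "postgres://doltgres@127.0.0.1:5432/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// CSV payload with a header line that the loader should skip.
	csv := "id,name\n1,ada\n2,grace\n"

	// HEADER true tells the CSV data loader to ignore the first line.
	_, err = conn.PgConn().CopyFrom(ctx, strings.NewReader(csv),
		"COPY people FROM STDIN WITH (FORMAT csv, HEADER true)")
	if err != nil {
		log.Fatal(err)
	}
}
```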
717: Moved enginetests to their own testing CI workflow, don't use -race
This works around the problem described here: #718
707: use pgproto3 for handling server connection messages
705: Workbench tests for log, merge, table details
700: Feature: COPY FROM STDIN support for CSV files
Support for loading data via COPY FROM STDIN using CSV data.
699: Unskip some engine tests
dolt_reset now works correctly
694: Tests for version control operations on new schemas / tables in new schemas
Code changes are all in Dolt: dolthub/dolt#8343
691: Convert ANALYZE statements
Adds support for converting Postgres' ANALYZE statement for a single table and running it through the GMS SQL engine. There are still lots of unsupported options in Postgres' ANALYZE statement, but this change allows Doltgres to process the simplest form – where a single table is being analyzed.
Since it's common to run ANALYZE at the end of data load scripts (example), this change is intended to make it easier to load dumps into Doltgres.
689: COPY FROM STDIN support
Adds support for COPY ... FROM STDIN. When copying from STDIN, the COPY FROM statement starts a multi-message flow between the server and client – the client will send COPY DATA messages until all the data has been sent, and then send a COPY DONE (or COPY FAIL) message to finalize the transfer and let the server process more queries.
This PR adds a new TabularDataLoader type, with the idea that we can create a DataLoader interface for that when we extend this to add support for loading data from CSV data files, too.
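A rough sketch of what such an interface might look like (the method names here are hypothetical, not the shipped Doltgres API):

```go
// Hypothetical sketch only; the PR ships a concrete TabularDataLoader, and the
// interface below is just one way a future CSV loader could slot in.
package dataload

import (
	"bufio"

	"github.com/dolthub/go-mysql-server/sql"
)

// DataLoader consumes the payload of CopyData messages for a single COPY FROM
// STDIN operation and writes the decoded rows to the destination table.
type DataLoader interface {
	// LoadChunk processes one CopyData payload. Partial lines are buffered
	// until the next chunk arrives.
	LoadChunk(ctx *sql.Context, data *bufio.Reader) error

	// Abort discards any buffered rows after a CopyFail message.
	Abort(ctx *sql.Context) error

	// Finish flushes remaining rows after CopyDone and reports how many rows
	// were loaded.
	Finish(ctx *sql.Context) (rowCount int64, err error)
}
```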
This PR also depends on a GMS change to allow us to create a new sql.Context instance: dolthub/go-mysql-server#2652
#672
Separating the COPY FROM portion from the regression tests.
681: use ParameterObjectIDs in Parse message for binding variable types
Note: the prepared tests don't utilize this field in the Parse message, so the binding value types still need to be extracted from the analyzed plan of the query.
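For reference, a client that does populate the field would send a Parse message along these lines (a hedged sketch using jackc's pgproto3, where the protocol's parameter object IDs surface as the ParameterOIDs field; OID 23 is int4):

```go
package main

import (
	"fmt"

	"github.com/jackc/pgx/v5/pgproto3"
)

func main() {
	// A Parse message that declares the type of $1 up front, so the server
	// doesn't have to infer it from the analyzed plan.
	parse := &pgproto3.Parse{
		Name:          "get_user",
		Query:         "SELECT * FROM users WHERE id = $1",
		ParameterOIDs: []uint32{23}, // 23 = int4
	}
	fmt.Printf("%+v\n", parse)
}
```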
680: Prevent premature decay for IN tuple
This stops IN from decaying prematurely, since the decay is only necessary for index filters. To complement this, I've implemented SplitConjunction and SplitDisjunction so that they're aware of Doltgres expression types. The GMS versions will see *pgexprs.GMSCast and do nothing, since we actually care about the child, but GMS is unaware of that.
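Conceptually, the Doltgres-aware split just looks through the cast wrapper before checking for AND nodes. Here's a toy, self-contained Go sketch of that idea (not the real GMS or Doltgres types; it also drops the wrapper for brevity, which the real code presumably does not):

```go
package main

import "fmt"

// Toy expression tree standing in for sql.Expression.
type Expr interface{ String() string }

type And struct{ Left, Right Expr }

func (a And) String() string { return a.Left.String() + " AND " + a.Right.String() }

// Cast stands in for *pgexprs.GMSCast: a wrapper GMS doesn't know to look through.
type Cast struct{ Child Expr }

func (c Cast) String() string { return c.Child.String() }

type Pred struct{ Text string }

func (p Pred) String() string { return p.Text }

// splitConjunction recurses through ANDs, looking through cast wrappers first,
// which the GMS version can't do because it doesn't know the Doltgres cast type.
func splitConjunction(e Expr) []Expr {
	if c, ok := e.(Cast); ok {
		return splitConjunction(c.Child)
	}
	if a, ok := e.(And); ok {
		return append(splitConjunction(a.Left), splitConjunction(a.Right)...)
	}
	return []Expr{e}
}

func main() {
	expr := Cast{Child: And{Left: Pred{"a = 1"}, Right: Cast{Child: Pred{"b IN (2, 3)"}}}}
	fmt.Println(splitConjunction(expr)) // [a = 1 b IN (2, 3)]
}
```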
675: Replace AllDatabasesWithNames so that revision dbs show up in information_schema tables
GMS PR: dolthub/go-mysql-server#2645
672: Added regression tests
This adds a ton of tests, taken from the Postgres Regression Test suite and made into Go tests using the new tool in testing/go/regression/reader. There are errors in some of the tests, but it at least gives a very rough idea of what works and what doesn't. The plan is to put this in a nightly job that displays results alongside the correctness tests, and it will also be able to tell us which previously passing tests now fail.
In addition, this also implements COPY FROM, since it's used by the regression tests for loading in the test data.
644: Add tests for branch interactions with schema
641: Release v0.11.1
Created by the Release workflow to update DoltgreSQL's version
634: support schema qualified create table with serial type
561: Add support for Postgres-compatible indexes
This implements a proof-of-concept for true Postgres indexes.
Current Implementation
The current index implementation relies on the methods used in GMS and Dolt, which are inherently MySQL-based. There are a lot of layers to make indexes as efficient as possible (while also allowing for different integrators to use their own implementations), but I'm going to focus on the inner "core" logic. There are two parts to the core: storage and iteration.
Storage
At a high level, we've hardcoded how values should be stored in an index by their observed behavior. NULL is treated as being smaller than the smallest possible value, and is always stored first. Integers are ordered from negative to positive. Strings are in the order defined by their collation, which is usually in alphabetical order with some casing differences. In Dolt the ordering is concrete, and in GMS this order is assumed for all indexable types.
Iteration
In GMS, we take an expression (filter or join) and create a range (or multiple ranges) that expresses the values that the index should return. That range contains the lower and upper bounds, and those are based on the value given. For example, the filter column >= 6 (where column is an integer) uses the value of 6 to construct a range of [6, ∞). This range is then passed to Dolt, which uses the inclusive 6 as its starting point, and knows to iterate over the remaining index data until the end. If given some value that uses a different type than the indexed column's type, then that value is automatically cast to the index's type.
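A toy Go sketch of that range-building step (illustrative types only, not the GMS range machinery):

```go
package main

import "fmt"

// Bound is one side of an index range; Unbounded stands in for ±infinity.
type Bound struct {
	Unbounded bool
	Value     int
	Inclusive bool
}

// Range is the [lower, upper] pair handed to the storage layer.
type Range struct{ Lower, Upper Bound }

// rangeForGTE turns a filter like `column >= v` into the range [v, ∞).
func rangeForGTE(v int) Range {
	return Range{
		Lower: Bound{Value: v, Inclusive: true},
		Upper: Bound{Unbounded: true},
	}
}

// Contains reports whether an indexed value falls inside the range, which is
// effectively the check the iterator performs as it walks the index.
func (r Range) Contains(v int) bool {
	if !r.Lower.Unbounded {
		if v < r.Lower.Value || (!r.Lower.Inclusive && v == r.Lower.Value) {
			return false
		}
	}
	if !r.Upper.Unbounded {
		if v > r.Upper.Value || (!r.Upper.Inclusive && v == r.Upper.Value) {
			return false
		}
	}
	return true
}

func main() {
	r := rangeForGTE(6)
	fmt.Println(r.Contains(5), r.Contains(6), r.Contains(100)) // false true true
}
```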
Postgres vs. MySQL
With the storage and iteration in place for how MySQL (GMS and Dolt) work, let's now look at some key differences with indexes in Postgres.
Storage Control: The layout of values is defined by a function. For the default types, Postgres ships with the function pre-defined. It is assumed that this function lays out values such that distinct (not equal using the = operator) values are not equivalent.
Iteration via Operators: Specifically for b-trees, iterating over the storage is handled by the comparison operators. As long as the 5 main comparison operators exist (=, >, >=, <, <=), then any value may be used to iterate over storage. It is assumed that these operators map to some logical form of continuity, but that is not strictly required (the Postgres analyzer can actually catch some forms of discontinuity and apply additional filters, which is pretty cool). For example, it is possible that < and > could return true for the same input, but again it is assumed that this is not the case.
Null Ordering: Nulls can be viewed as either the smallest possible value or the largest possible value, changing where they're positioned in the index.
Null Distinction: For UNIQUE indexes, this controls whether we permit multiple NULL values. If NULLs are distinct, then multiple rows may use NULL. In MySQL, NULLs are always considered distinct.
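For reference, the Postgres-side syntax behind the last two points looks like this (a hedged Go snippet issuing standard Postgres DDL against a Postgres 15+ server; this is not a claim that Doltgres supports these clauses yet):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "host=127.0.0.1 port=5432 user=postgres dbname=mydb sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	stmts := []string{
		`CREATE TABLE t (a INT, b INT)`,
		// Null ordering: place NULLs at the start of the index instead of the end.
		`CREATE INDEX t_a_idx ON t (a ASC NULLS FIRST)`,
		// Null distinction: treat NULLs as equal, so only one NULL row is allowed.
		`CREATE UNIQUE INDEX t_b_idx ON t (b) NULLS NOT DISTINCT`,
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			log.Fatal(err)
		}
	}
}
```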
Indexed Joins
This is originally what kickstarted this small project. Now that I've covered how the current implementation works, and a few ways that Postgres differs, it should be much easier to show why proper indexed joins would not work in the current implementation. At its simplest, an index join has a form like SELECT * FROM t1 JOIN t2 ON t1.col = t2.col;. It may not be obvious, but this touches at least 2 of the 4 differences that I mentioned in the previous section.
Iteration via Operators: Postgres doesn't need to be able to convert between two types, it just needs the comparison operators. For some types, this would actually lead to different results. For example, SELECT 0.99999994::float4 = 0.99999997::float8; returns false, as there is a defined = operator for the two types. SELECT 0.99999994::float4 = 0.99999997::float8::float4; returns true, as casting from float8 to float4 loses some precision. In this exact case, as long as we keep the filter expression it's okay, but it can lead to data corruption otherwise. There are many more examples than this, so don't take this as the only case, but it's an easier one to understand (compared to a more "realistic" example using reg... types). If our index framework is built on casting (like the current GMS implementation), then we will always have cases where we are tracking down bugs due to invalid behavior.
Null Ordering: It's obvious that differences in NULL ordering aren't handled at all. If NULL values are sorted differently between indexes, then that must be taken into account by some analyzer step. The current implementation does not do this, as it does not need to.
Null Distinction: I'm actually not sure how this is supposed to be handled in Postgres in this case (haven't done too much research on it), so this could be something else that the analyzer needs to take into account, but it hasn't been verified.
The Simplest Solution
Right now on main, we are implementing indexes by casting everything to the column type for filters. This "works" in that we are able to get some tests and performance metrics working, but only for items that have a cast. As mentioned earlier, this casting logic is not correct, but our limited testing at least works with it. Once we leave that small bubble, we start to see all of the changes that would have to be made in order to get Postgres indexes "working", such as special-casing types and expressions to work under the assumptions made in GMS and Dolt, and even then it would be incorrect. Some of the special casing would even be hard to figure out in the first place, like how the reg... types that were mentioned earlier should interact with other types.
I propose, with this PR (and the attached Dolt PR), that the simplest solution is to just do what Postgres does. Postgres defines functions that control the layout of values, so we can implement those functions and simply pass them down to Dolt's storage layer, which will use them for ordering rather than the hardcoded versions. This PR doesn't yet implement that part, but it is what we are already doing with the introduction of sql.ExtendedType, which uses the comparisons defined on the type to control the layout. We just have to change which function is being used, which is relatively simple. This PR, instead, focuses on the filter and retrieval part (since it's the more involved portion).
Postgres simply passes the relevant operator functions down to its storage layer, and runs a tree search (using those operators on its internal b-tree) to find where the storage iterator should start. It then iterates until those operators are no longer fulfilled. This completely sidesteps the casting, and focuses strictly on the comparison operators, which is exactly what we want. And that's all this PR (and the Dolt one) does, in essence. It's a bit more complicated since I'm still trying to take advantage of as much infrastructure as possible, but at its core it's passing the operators down to Dolt's storage layer to find a start (and stop) point. By passing down functions, this not only gives us full support for everything, but it even allows us to handle things like custom types without any additional code.
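A compact Go sketch of that search-then-iterate pattern over a sorted store, driven purely by comparison operators (toy code, not Dolt's storage layer):

```go
package main

import (
	"fmt"
	"sort"
)

// Op is a comparison operator supplied by the type, standing in for Postgres's >= or <.
type Op func(stored, arg int) bool

var (
	gte = func(stored, arg int) bool { return stored >= arg }
	lt  = func(stored, arg int) bool { return stored < arg }
)

// scan finds the first entry satisfying the lower-bound operator via binary
// search, then iterates until the upper-bound operator stops holding. No
// casting is involved; the operators alone drive the iteration.
func scan(index []int, lower Op, lowerArg int, upper Op, upperArg int) []int {
	start := sort.Search(len(index), func(i int) bool { return lower(index[i], lowerArg) })
	var out []int
	for i := start; i < len(index) && upper(index[i], upperArg); i++ {
		out = append(out, index[i])
	}
	return out
}

func main() {
	index := []int{1, 3, 6, 7, 9, 12} // sorted index storage
	// WHERE col >= 6 AND col < 10  ->  start at 6, stop before 12
	fmt.Println(scan(index, gte, 6, lt, 10)) // [6 7 9]
}
```

Swap the int operators for a type's own comparison functions and the same loop covers custom types with no extra code, which is the point of passing functions down.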
There are still additional things that need to be done, such as covering vs. non-covering indexes, composite indexes, etc., but those are mostly Doltgres-side changes to match how Postgres behaves. Also, the code layout is not final (everything is in one file), comments are missing, names are bad, things are stuffed into special structs rather than new fields, there are no GMS changes yet, etc. Look not at the code, but at the intention of the code, as none of this is final or production-quality.