DDL recovery for high-level indexes (CREATE, DROP and RENAME) #3520
base: bb-11.6-MDEV-32887-vector
Conversation
as it can never be null (only "" or "disabled")
Bounded_queue<> pretended to be a typesafe C++ wrapper on top of the pure C queues.h. But it wasn't: it was tightly bound to filesort and only useful there.
* implement Queue<> - a typesafe C++ wrapper on top of QUEUE
* move Bounded_queue to filesort.cc, remove pointless "generalizations", change it to use Queue
* remove bounded_queue.h
* change subselect_rowid_merge_engine to use Queue, not QUEUE
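The typed-wrapper idea can be sketched roughly as below. Everything here besides the Queue<> name is invented for illustration: the real class wraps the server's pure C QUEUE from queues.h, not this simplified c_queue stand-in.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the untyped C-style priority queue (the real one is QUEUE).
struct c_queue
{
  std::vector<void*> elems;
  int (*cmp)(void *arg, void *a, void *b);
  void *cmp_arg;
};

static void c_queue_insert(c_queue *q, void *e)
{
  q->elems.push_back(e);
  // keep the smallest element at the front (linear insertion for brevity)
  for (size_t i= q->elems.size() - 1;
       i > 0 && q->cmp(q->cmp_arg, q->elems[i], q->elems[i - 1]) < 0; i--)
    std::swap(q->elems[i], q->elems[i - 1]);
}

// Typesafe facade: call sites push and pop T*, with no manual casts.
template <typename T>
class Queue
{
  c_queue q;
  static int trampoline(void *arg, void *a, void *b)
  {
    auto typed_cmp= reinterpret_cast<int (*)(const T*, const T*)>(arg);
    return typed_cmp(static_cast<T*>(a), static_cast<T*>(b));
  }
public:
  explicit Queue(int (*cmp)(const T*, const T*))
  {
    q.cmp= trampoline;
    q.cmp_arg= reinterpret_cast<void*>(cmp);
  }
  void push(T *e)       { c_queue_insert(&q, e); }
  T *top() const        { return static_cast<T*>(q.elems.front()); }
  size_t size() const   { return q.elems.size(); }
};

static int cmp_int(const int *a, const int *b) { return *a - *b; }
```

The untyped comparator plumbing stays in one place (the trampoline), which is the point of the wrapper: the rest of the code never sees void*.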
The information about the index algorithm was stored in two places, inconsistently split between both:
* a BTREE index could have key->algorithm == HA_KEY_ALG_BTREE, if the user explicitly specified USING BTREE, or HA_KEY_ALG_UNDEF, if not
* an RTREE index had key->algorithm == HA_KEY_ALG_RTREE and always had key->flags & HA_SPATIAL
* a FULLTEXT index had key->algorithm == HA_KEY_ALG_FULLTEXT and always had key->flags & HA_FULLTEXT
* a HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF
* a long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit: all indexes except BTREE and HASH always have key->algorithm set; the HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except in storage, to keep frms backward compatible). As a side effect, ALTER TABLE now detects FULLTEXT index renames correctly.
needed to get partitioning and information about secondary objects
…EM VERSIONING" This partially reverts 43623f0. Engines have to set ::position() after ::write_row(), otherwise the server won't be able to refer to the row just inserted. This is important for high-level indexes. The heap part isn't reverted, so heap doesn't support high-level indexes; to fix this, it'll need info->lastpos in addition to info->current_ptr.
create templates thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n), and the same for thd->calloc(). By default the type is char, so old usage of thd->alloc(size) works too.
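A minimal sketch of what such helpers can look like. The THD here is a trivial stand-in bump allocator, not the real server class; only the template shape (typed overload with a char default, so old untyped call sites still compile) mirrors the commit.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in THD with a bump allocator, for illustration only.
struct THD
{
  char buf[4096];
  size_t used= 0;

  void *alloc(size_t size)            // old untyped interface, still works
  {
    void *p= buf + used;
    used+= (size + 7) & ~size_t(7);   // keep subsequent blocks aligned
    return p;
  }

  template <typename T= char>         // new typed interface; default T=char
  T *alloc(size_t n)                  // keeps old thd->alloc(size) call
  { return static_cast<T*>(alloc(sizeof(T) * n)); }  // sites unchanged

  template <typename T= char>
  T *calloc(size_t n)                 // typed zero-initializing variant
  {
    T *p= alloc<T>(n);
    memset(p, 0, sizeof(T) * n);
    return p;
  }
};
```

Overload resolution does the work: a plain `thd->alloc(size)` picks the non-template exact match, while `thd->alloc<X>(n)` picks the template, so no existing caller needs touching.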
let the caller tell init_tmp_table_share() whether the table should be thread_specific or not. In particular, internal tmp tables created in the slave thread are perfectly thread specific
MDEV-33407 Parser support for vector indexes

The syntax is: create table t1 (... vector index (v) ...);

Limitations:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

Added support for so-called "high-level indexes": they are not visible to the storage engine and are implemented on the SQL level. For every such index in a table, say, t1, the server implicitly creates a second table named, like, t1#i#05 (where "05" is the index number in t1). This table has a fixed structure, no frm, is not accessible directly, doesn't go into the table cache, and needs no MDLs.

MDEV-33406 basic optimizer support for k-NN searches

For a query like SELECT ... ORDER BY func() the optimizer will use item_func->part_of_sortkey() to decide which keys can be used to resolve the ORDER BY.
also add const to methods in List<> and Hash_set<> while we're at it
This commit includes the work done in collaboration with Hugo Wen from Amazon:

MDEV-33408 Alter HNSW graph storage and fix memory leak

This commit changes the way HNSW graph information is stored in the second table. Instead of storing connections as separate records, it now stores neighbors for each node, leading to significant performance improvements and storage savings. Compared with the previous approach, the insert speed is 5 times faster, search speed improves by 23%, and storage usage is reduced by 73%, based on ann-benchmark tests with the random-xs-20-euclidean and random-s-100-euclidean datasets.

Additionally, in the previous code, vector objects were not released after use, resulting in excessive memory consumption (over 20GB for building the index with 90,000 records), preventing tests with large datasets. Now vectors are released appropriately during the insert and search functions. Note there are still some vectors that need to be cleaned up after search query completion; this needs to be addressed in a future commit.

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

As well as the commit: Introduce session variables to manage HNSW index parameters

Three variables:
* hnsw_max_connection_per_layer
* hnsw_ef_constructor
* hnsw_ef_search

The ann-benchmark tool is also updated to support these variables in commit HugoWenTD/ann-benchmarks@e09784e for branch https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

Co-authored-by: Hugo Wen <wenhug@amazon.com>
* sysvars should be REQUIRED_ARG
* fix a mix of US and UK spelling (use US)
* use consistent naming
* work if VEC_DISTANCE arguments are in the swapped order (const, col)
* work if a VEC_DISTANCE argument is NULL/invalid or of the wrong length
* abort INSERT if the value is invalid or of the wrong length
* store the "number of neighbors" in a blob in an endianness-independent way
* use field->store(longlong, bool), not field->store(double)
* a lot more error checking everywhere
* cleanup after errors
* simplify calling conventions, remove reinterpret_cast's
* todo/XXX comments
* whitespaces
* use float consistently

Memory management is still totally PoC quality.

Initial HNSW implementation
instead of pointers to FVectorRef's (which are stored elsewhere) let's return one big array of all refs. Freeing this array will free the complete result set.
move everything into a query-local memroot which is freed at the end
This patch fixes only the TRUNCATE-by-recreate variant; there seems to be no reasonable engine that uses the TRUNCATE-by-handler method for testing. Reset index_cinfo so that mi_create is not confused by garbage passed via index_file_name and sets the MY_DELETE_OLD flag. Review question: can we add a test case to make sure the VECTOR index is indeed empty?
Rename high-level indexes along with a table.
1. Randomize all vectors via multiplication by a random orthogonal matrix
   * to generate the matrix, fill a square matrix with normally distributed random values and create an orthogonal matrix with the QR decomposition
   * the rnd generator is seeded with the number of dimensions, so the matrix will always be the same for a given table
   * multiplication by an orthogonal matrix is a "rotation", so it does not change distances or angles
2. When calculating the distance, first calculate a "subdistance": the distance between projections to the first subdist_part coordinates (=192, best by test; if it's larger it's less efficient, if it's smaller the error rate is too high)
3. Calculate the full distance only if the "subdistance" isn't confidently higher (above subdist_margin) than the distance we're comparing with
   * it might look like it would make sense to do a second projection at, say, subdist_part*2, and so on - but in practice one check is enough: the projected distance converges quickly, and if it isn't confidently higher at subdist_part, it won't be later either

This optimization introduces a constant overhead per insert/search operation - an input/query vector has to be multiplied by the matrix - and saves on every distance calculation. Thus it is only beneficial when the number of distance calculations (which grows with M and with the table size) is high enough to outweigh the constant overhead. Let's use the MIN_ROWS table option to estimate the number of rows in the table. use_subdist_heuristic() is optimal for mnist and fashion-mnist (784 dimensions, 60k rows) and variations of gist (960 dimensions, 200k, 400k, 600k, 800k, 1000k rows)
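The subdistance check in steps 2 and 3 can be sketched like this. Constants and names are illustrative (the commit's values are tuned by benchmark), and the orthogonal-matrix multiplication is assumed to have already happened, as described above.

```cpp
#include <cstddef>

// Illustrative constants, not the commit's tuned values.
static const size_t subdist_part=   192;
static const float  subdist_margin= 1.05f;

static float distance_squared(const float *a, const float *b,
                              size_t from, size_t to)
{
  float d= 0;
  for (size_t i= from; i < to; i++)
  {
    float t= a[i] - b[i];
    d+= t * t;
  }
  return d;
}

// Vectors are assumed already multiplied by the random orthogonal matrix,
// so the first subdist_part coordinates are a fair sample of the whole.
// Returns true and stores the full squared distance in *dist, or returns
// false when the scaled projection is confidently above `limit`.
static bool distance_with_subdist(const float *a, const float *b,
                                  size_t dims, float limit, float *dist)
{
  size_t part= dims < subdist_part ? dims : subdist_part;
  float sub= distance_squared(a, b, 0, part);
  float estimate= sub * float(dims) / float(part); // scale projection up
  if (estimate > limit * subdist_margin)  // confidently worse: skip the rest
    return false;
  *dist= sub + distance_squared(a, b, part, dims);
  return true;
}
```

The saving comes from the early `return false`: for most rejected candidates only the first `part` coordinates are ever touched.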
remove unused methods, reorder methods, add comments
into a separate transaction_participant structure. handlerton inherits it, so handlerton itself doesn't change. But entities that only need to participate in a transaction, like the binlog or the online alter log, use a transaction_participant and no longer need to pretend to be a full-blown but invisible storage engine that doesn't support create table.
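The split can be pictured with a sketch like this. The member names are invented; the real structures carry many more hooks. The point is only the shape: a bare participant for binlog-like entities, with handlerton inheriting the same hooks and adding engine-only operations.

```cpp
// Invented hook names, for illustration of the inheritance split only.
struct transaction_participant
{
  int (*commit)(void *thd, bool all);
  int (*rollback)(void *thd, bool all);
};

struct handlerton : transaction_participant
{
  void *(*create)(void *hton);   // engine-only: create table, etc.
};

static int noop_commit(void *, bool)   { return 0; }
static int noop_rollback(void *, bool) { return 0; }
```

A binlog-like entity instantiates transaction_participant directly, while an engine's handlerton still participates in transactions through the inherited base.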
and create a parent Item_func_vec_distance_common class
it's measurably faster even in items
Fixes for ALTER TABLE ... ADD/DROP COLUMN, ALGORITHM=COPY. Let quick_rm_table() remove high-level indexes along with the original table. Avoid locking uninitialized LOCK_share for INTERNAL_TMP_TABLEs. Don't enable bulk insert when altering a table containing a vector index: InnoDB can't handle the situation when bulk insert is enabled for one table but disabled for another, and we can't do bulk insert on a vector index as it currently does table updates.
Disable non-copy ALTER algorithms when VECTOR index is affected. Engines are not supposed to handle high-level indexes anyway. Also fixed misbehaving IF [NOT] EXISTS variants.
quick_rm_table() expects .frm to exist when it removes high-level indexes. For cases like ALTER TABLE t1 RENAME TO t2, ENGINE=other_engine the .frm was removed earlier. Another option would be removing high-level indexes explicitly before the first quick_rm_table() and skipping high-level indexes for subsequent quick_rm_table(NO_FRM_RENAME). But the suggested order may also help with ddl log recovery: if we crash before high-level indexes are removed, the .frm is going to exist.
Replaced the obscure FRM_ONLY, NO_FRM_RENAME, NO_HA_TABLE, NO_PAR_TABLE with straightforward explicit flags:
* QRMT_FRM - [re]moves .frm
* QRMT_PAR - [re]moves .par
* QRMT_HANDLER - calls ha_delete_table()/ha_rename_table() and [re]moves high-level indexes
* QRMT_DEFAULT - same as QRMT_FRM | QRMT_HANDLER, which is a regular table drop/rename
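A sketch of the flag set (the bit values are illustrative; only the names and their meanings come from the commit message above):

```cpp
// Illustrative bit values; the semantics follow the commit message.
enum quick_rm_table_flags
{
  QRMT_FRM=     1U << 0,                // [re]moves .frm
  QRMT_PAR=     1U << 1,                // [re]moves .par
  QRMT_HANDLER= 1U << 2,                // ha_delete_table()/ha_rename_table()
                                        // and high-level indexes
  QRMT_DEFAULT= QRMT_FRM | QRMT_HANDLER // regular table drop/rename
};
```

Callers then say exactly what they want removed, instead of decoding the old negative flags (NO_FRM_RENAME, NO_HA_TABLE, ...) at each call site.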
also: renames, s/const/constexpr/ for consistency
Covers CREATE, DROP and RENAME.
... until a few bugs that cause server crash are fixed.
file->ha_rename_table(idx_from, idx_to);
  }
}
I'm not sure this is a good approach. It feels like you put too much inside one `DDL_RENAME_PHASE_TABLE` entry. There can be crashes between renames of individual vector indexes (presuming there will be many some day).

Perhaps `DDL_RENAME_PHASE_TABLE` should be one `ha_rename_table` only? That is, a table with hlindexes would generate one `DDL_RENAME_PHASE_TABLE` entry per index (and one for the main table, of course)?
Same for DROP, etc.
> There can be crashes between renames of individual vector indexes (presuming there will be many some day).

Yes, sure. And my code handles it in line with what the original code has been doing, that is, it ignores `non_existing_table_errors()`, e.g. if handler files were moved and the .frm was not. Or, as you suggested above, a crash between renames of different high-level indexes.

> ... a table with hlindexes would generate one DDL_RENAME_PHASE_TABLE entry per index (and one for the main table, of course)?

A `PHASE` is just an enum member of an `ACTION`; it can't have stuff like a table name or whatever attached. An `ACTION` is the more logical (or high-level) entity, which may do many things like binlogging, handling stat tables, triggers, handler files and of course the .frm.

Here are the options:

- We could add special phases to handle high-level indexes, but there's not going to be much difference from my solution, and it would introduce compatibility issues. Though it is very true that older versions won't be able to handle high-level indexes properly even with this patch.
- We can't re-use existing actions like `DDL_LOG_RENAME_TABLE_ACTION`; as I mentioned before, they're way too sophisticated for high-level indexes.
- In theory we could try re-using special actions like `DDL_LOG_RENAME_ACTION` for high-level indexes. That is, issue one `DDL_LOG_RENAME_TABLE_ACTION` and then a `DDL_LOG_RENAME_ACTION` for each high-level index. It may work, and it has no compatibility concerns, but it is a question whether we'd be able to integrate it with e.g. `ALTER`.
Your thoughts?
Description
Various fixes to make DDL recovery work with high-level indexes.
Release Notes
None.
How can this PR be tested?
mtr
Basing the PR against the correct MariaDB version
main branch.
PR quality check