
[Enhancement] support lake table cache select in physical way #55328

Status: Open. Wants to merge 1 commit into base: branch-3.3.

Conversation

@starrocks-xupeng (Contributor) commented Jan 22, 2025

Why I'm doing:

What I'm doing:

Part of the cache select process is CPU heavy; this work is unnecessary for warming the cache and can be removed.

100 GB SSB, everything already cached, run `cache select * from lineorder;`.

| | Previous Implementation | New Implementation |
| --- | --- | --- |
| Total | 2s373ms | 418ms |
| IOTime (IO heavy) | 272.922ms | 281.019ms |
| Decompress Page + Checksum Check + Form Chunk (CPU heavy) | 1.336s | 0s |

100 GB TPCH, everything already cached, run `cache select * from lineitem;`.

| | Previous Implementation | New Implementation |
| --- | --- | --- |
| Total | 9s254ms | 1s190ms |
| IOTime (IO heavy) | 348.076ms | 274.867ms |
| Decompress Page + Checksum Check + Form Chunk (CPU heavy) | 5.402s | 0s |

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

Signed-off-by: starrocks-xupeng <xupeng@starrocks.com>
@ctbrennan (Contributor) commented Jan 22, 2025

Hi Xupeng, thank you very much for this change.

I have a question to clarify my own understanding of the time spent performing file I/O. Since we're talking about data that's already been cached, I assume the file I/O is streaming the data from local disk into memory (and not downloading the file from S3 to local disk). Please correct me if this understanding of the I/O is wrong.

Assuming my understanding above is correct, could we, in the future, add new code to StarOs and the backend client so that we can verify the file exists on local disk without streaming it into memory? This might be a relevant performance improvement when our tables are 1 or 10 TB in size.

This PR is already a great improvement, just asking about future plans. Thanks again!

@starrocks-xupeng (Contributor, Author)
In this PR, the file will only be read from local disk into memory if it is already cached.

Yes, you are right: that is the best-performing solution for your case, but it needs some more work to be done.

  1. In your case you only run `cache select * from xxx`, so the whole file is needed. But if you run `cache select A,B from xxx` on a table with 4 columns A, B, C, D, part of the file is not needed;
  2. We use a block cache instead of a file cache, so a 100 MB file will be divided into 100 blocks of 1 MB each. You would need to write code like the following, which will be a bit tricky; it also requires modifying both the starlet code and the StarRocks BE/CN code:
```cpp
for (int i = 0; i < 100; ++i) {
    if (!file->exist_block(i)) {  // starlet needs to provide this API
        file->load_block(i);      // can be done with the current file->read API
    }
}
```
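As a sketch of how a partial cache select (point 1 above) could limit that loop to only the blocks a column actually occupies, here is a hypothetical helper that maps a byte range within the file to 1 MB block indices. `blocks_for_range` and `kBlockSize` are illustrative names, not starlet or StarRocks APIs:

```cpp
#include <cstdint>
#include <vector>

// 1 MB block size, matching the 100 MB file / 100 blocks example above.
constexpr uint64_t kBlockSize = 1024 * 1024;

// Hypothetical helper: given a column's byte range [offset, offset + length)
// within the file, return the indices of the cache blocks it touches.
std::vector<uint64_t> blocks_for_range(uint64_t offset, uint64_t length) {
    std::vector<uint64_t> blocks;
    if (length == 0) return blocks;
    uint64_t first = offset / kBlockSize;
    uint64_t last = (offset + length - 1) / kBlockSize;
    for (uint64_t b = first; b <= last; ++b) blocks.push_back(b);
    return blocks;
}
```

With such a helper, `cache select A,B` would only need the exist/load check for the blocks covering the byte ranges of columns A and B, rather than every block in the file.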

Actually, in your case there is another possible solution: multi-replica.
Warehouse A does all the loading/compaction and, in the background, sends all writes to warehouse B. We are currently developing multi-replica within one warehouse, and some time in the future we will implement cross-warehouse multi-replica. All of these features are in the enterprise version.
