Adjust Explorer Query and Error Handling #742

Ayiga · 2024-11-27T14:57:40Z

This PR:

Adjusts some error handling to have the intended error encoding representation.

Adjust the way Explorer queries are handled. Due to the reference of a local variable within the query strings, they are unable to be static. The usage of a lazy static library allows for the query strings to be statically set. They were being constructed in a manner that may have allowed for dependency injection if care was not taken. They are now set separately, and their lifetimes are external to the queries themselves, allowing for more consistent query handling.

Adjusts the histogram data fetched, so that we don't miss a histogram data point due to missing the prior data point in the results.

The Explorer API's underlying queries are a bit cluttered around. Additionally they are made utilizing the `format!` macro, which could lend itself to easy string injection. Rust's lifetimes with it's inability to use `format` with static strings and still be a constant expression makes this somewhat difficult. The `lazy_static` crate has been employed to make these query strings static and available without having to worry about lifetime issues.

The transaction size currently returns the size of the overall block instead of the individual transaction. To address this effectively, the `ExplorerTransaction` `trait` has been updated to require the ability to specify the payload size of the individual transaction.

The first block is always missing time at the moment, as the query can only compare the time to the previous row. Since we'll always be missing the previous row on the first entry, we end up missing the `time` for the first entry as well. The fix is simply to pull one more row than we want, and to remove the first entry when the time comes. As a result the `Vec` has been replaced with a `VecDeque`. Additionally the hard-coded values have been replaced with constants with comments.

imabdulbasit · 2024-12-17T20:54:30Z

src/data_source/storage/sql/queries/explorer.rs

@@ -105,6 +105,164 @@ where
    }
 }

+lazy_static::lazy_static! {
+    static ref GET_BLOCK_SUMMARIES_QUERY_FOR_LATEST: String = {


Are these static variables defined just for readability? I see these queries being used only once.

and do we have tests for all these queries?

They are primarily being defined as a static reference for a couple of reasons and advantages.

To ensure that they are indeed a static, unchanging string.

Query strings should be largely unchanging. Otherwise you open yourself up to the potential for SQL injection attacks. Consider what was being done before:

BlockIdentifier::Hash(hash) => format!( "SELECT {BLOCK_COLUMNS} FROM header AS h JOIN payload AS p ON h.height = p.height WHERE h.hash = {} ORDER BY h.height DESC LIMIT 1", query.bind(hash.to_string())?, )

At first glance this seems to be roughly the same thing. But, taking a closer look, the entire query string is in a format macro with a variable replacement / injection being provided by the result of the query.bind. I don't believe this is the intended or correct behavior. Additionally, attempts to split this up resulted in a difficult to resolve lifetime issue.

This allows the query strings to be defined once, and then referenced multiple times.

This cleans up a lot of lifetime issues, as the strings are static, and outlive their async blocks.

It prevents the string from having to be re-computed / formatted for every invocation of the query, saving memory.

The static unchanging string also allows for sqlx's query caching to be guaranteed to take advantage of its internal prepared statements for re-use.

During the refactor of this commit: 8f64921, the separation of the query string definition, and the query parameters were ultimately modified, leading to the potential for an SQL injection, as we are directly inserting a string coming from the end-user.

These queries do get tested indirectly at this symbol:
explorer::test::validate.

Sorry if I am misunderstanding but this

BlockIdentifier::Hash(hash) => format!( "SELECT {BLOCK_COLUMNS} FROM header AS h JOIN payload AS p ON h.height = p.height WHERE h.hash = {} ORDER BY h.height DESC LIMIT 1", query.bind(hash.to_string())?, )

would return string as

BlockIdentifier::Hash(hash) => format!( "SELECT {BLOCK_COLUMNS} FROM header AS h JOIN payload AS p ON h.height = p.height WHERE h.hash = $1 ORDER BY h.height DESC LIMIT 1", )

the argument is pushed to the QueryBuilder.args field. When you build the query using .query() on the builder then it returns Query<> by passing those arguments and the String. So, I dont understand how this is leading to sql injection?

The bind() method of the QueryBuilder returns self.arguments.len() so basically, it provides a placeholder for the query and stores the arguments in its struct.

Oh I see. I made an assumption as to what this was returning because I couldn't think of any other reason why the result of bind would return a Result<String>. So this currently returns a placeholder representation to put into query strings.

I do think that actually utilizing the bind call in this way with the format! macro is best avoided.

The result of the bind function could potentially be modified in the future without changing the API to return a different String value in the future which would not cause a breaking change in terms of the API. I do think that this is unlikely, but the possibility for it to happen is non-zero.

Additionally, since there is a format! macro, if in the future another parameter is added to these queries for whatever reason, we leave the possibility of a developer just adding it to the query string via a format parameter instead of through the QueryBuilder itself. While we could argue that this would surely be caught in review, I think it's probably best to avoid that possibility from the get go.

I agree that this bind() is confusing and should be changed. I am thinking we store the string with placeholders inside the QueryBuilder and bind() can return the QueryBuilder or it might just take &mut self

imabdulbasit · 2024-12-17T21:04:47Z

src/data_source/storage/sql/queries/explorer.rs

+                FROM header AS h
+                JOIN payload AS p ON h.height = p.height
+                WHERE h.height = $1
+                ORDER BY h.height DESC


what is the purpose of order by here? This should return the same block regardless of order by I think

These were just for consistency of the various queries based on what they were querying. To be clear, these were already here, and were not added with this change, simply relocated. But you are correct in that they are unnecessary.

I remember we used LIMIT 1 intentionally to force the query planner to use an index, and I am wondering if that is also the case for ORDER BY. I would suggest keeping the ORDER BY for now and testing the query planner on a local database.

imabdulbasit · 2024-12-17T21:07:48Z

src/data_source/storage/sql/queries/explorer.rs

+                FROM header AS h
+                JOIN payload AS p ON h.height = p.height
+                WHERE h.hash = $1
+                ORDER BY h.height DESC


same here.
The block hash is unique so I don't understand the purpose of ORDER BY here. Is this to force the query planner to use index search?

Ayiga added 4 commits November 26, 2024 10:57

Fix misspelling of histogram

c58c61a

Ayiga changed the title ~~Various Fixes for Explorer Endpoints~~ Adjust Explorer Query and Error Handling Dec 17, 2024

Ayiga marked this pull request as ready for review December 17, 2024 19:00

Ayiga requested review from sveitser, jbearer and imabdulbasit as code owners December 17, 2024 19:00

imabdulbasit reviewed Dec 17, 2024

View reviewed changes

imabdulbasit mentioned this pull request Dec 19, 2024

make QueryBuilder::bind() function less confusing #763

Open

imabdulbasit approved these changes Dec 20, 2024

View reviewed changes

Ayiga merged commit d369416 into main Jan 2, 2025
11 checks passed

Ayiga deleted the ts/enh/revisit-explorer-endpoints branch January 2, 2025 17:55

Ayiga mentioned this pull request Jan 3, 2025

Correct the transaction size that is reported on the Explorer EspressoSystems/espresso-sequencer#2432

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjust Explorer Query and Error Handling #742

Adjust Explorer Query and Error Handling #742

Ayiga commented Nov 27, 2024 •

edited

Loading

imabdulbasit Dec 17, 2024

imabdulbasit Dec 17, 2024

Ayiga Dec 19, 2024

Ayiga Dec 19, 2024

Ayiga Dec 19, 2024

imabdulbasit Dec 19, 2024

imabdulbasit Dec 19, 2024

Ayiga Dec 19, 2024

imabdulbasit Dec 19, 2024

imabdulbasit Dec 19, 2024

imabdulbasit Dec 17, 2024

Ayiga Dec 19, 2024

imabdulbasit Dec 19, 2024

imabdulbasit Dec 17, 2024 •

edited

Loading

Adjust Explorer Query and Error Handling #742

Adjust Explorer Query and Error Handling #742

Conversation

Ayiga commented Nov 27, 2024 • edited Loading

This PR:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

imabdulbasit Dec 17, 2024 • edited Loading

Choose a reason for hiding this comment

Ayiga commented Nov 27, 2024 •

edited

Loading

imabdulbasit Dec 17, 2024 •

edited

Loading