Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust Explorer Query and Error Handling #742

Merged
merged 4 commits into from
Jan 2, 2025

Conversation

Ayiga
Copy link
Member

@Ayiga Ayiga commented Nov 27, 2024

This PR:

Adjusts some error handling to have the intended error encoding representation.

Adjust the way Explorer queries are handled. Due to the reference of a local variable within the query strings, they are unable to be static. The usage of a lazy static library allows for the query strings to be statically set. They were being constructed in a manner that may have allowed for dependency injection if care was not taken. They are now set separately, and their lifetimes are external to the queries themselves, allowing for more consistent query handling.

Adjusts the histogram data fetched, so that we don't miss a histogram data point due to missing the prior data point in the results.

The Explorer API's underlying queries are a bit cluttered around.  Additionally
they are made utilizing the `format!` macro, which could lend itself to easy
string injection. Rust's lifetimes with it's inability to use `format` with
static strings and still be a constant expression makes this somewhat difficult.
The `lazy_static` crate has been employed to make these query strings static and
available without having to worry about lifetime issues.
The transaction size currently returns the size of the overall block
instead of the individual transaction.  To address this effectively,
the `ExplorerTransaction` `trait` has been updated to require the
ability to specify the payload size of the individual transaction.
The first block is always missing time at the moment, as the query can only
compare the time to the previous row.  Since we'll always be missing the
previous row on the first entry, we end up missing the `time` for the first
entry as well.  The fix is simply to pull one more row than we want, and to
remove the first entry when the time comes.  As a result the `Vec` has
been replaced with a `VecDeque`.

Additionally the hard-coded values have been replaced with constants with
comments.
@Ayiga Ayiga changed the title Various Fixes for Explorer Endpoints Adjust Explorer Query and Error Handling Dec 17, 2024
@Ayiga Ayiga marked this pull request as ready for review December 17, 2024 19:00
@@ -105,6 +105,164 @@ where
}
}

lazy_static::lazy_static! {
static ref GET_BLOCK_SUMMARIES_QUERY_FOR_LATEST: String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these static variables defined just for readability? I see these queries being used only once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and do we have tests for all these queries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are primarily being defined as a static reference for a couple of reasons and advantages.

  • To ensure that they are indeed a static, unchanging string.

Query strings should be largely unchanging. Otherwise you open yourself up to the potential for SQL injection attacks. Consider what was being done before:

  BlockIdentifier::Hash(hash) => format!(
    "SELECT {BLOCK_COLUMNS}
        FROM header AS h
        JOIN payload AS p ON h.height = p.height
        WHERE h.hash = {}
        ORDER BY h.height DESC 
        LIMIT 1",
    query.bind(hash.to_string())?,
  )

At first glance this seems to be roughly the same thing. But, taking a closer look, the entire query string is in a format macro with a variable replacement / injection being provided by the result of the query.bind. I don't believe this is the intended or correct behavior. Additionally, attempts to split this up resulted in a difficult to resolve lifetime issue.

  • This allows the query strings to be defined once, and then referenced multiple times.

This cleans up a lot of lifetime issues, as the strings are static, and outlive their async blocks.

  • It prevents the string from having to be re-computed / formatted for every invocation of the query, saving memory.

  • The static unchanging string also allows for sqlx's query caching to be guaranteed to take advantage of its internal prepared statements for re-use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the refactor of this commit: 8f64921, the separation of the query string definition, and the query parameters were ultimately modified, leading to the potential for an SQL injection, as we are directly inserting a string coming from the end-user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These queries do get tested indirectly at this symbol:
explorer::test::validate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I am misunderstanding but this

 BlockIdentifier::Hash(hash) => format!(
    "SELECT {BLOCK_COLUMNS}
        FROM header AS h
        JOIN payload AS p ON h.height = p.height
        WHERE h.hash = {}
        ORDER BY h.height DESC 
        LIMIT 1",
    query.bind(hash.to_string())?,
  )

would return string as

 BlockIdentifier::Hash(hash) => format!(
    "SELECT {BLOCK_COLUMNS}
        FROM header AS h
        JOIN payload AS p ON h.height = p.height
        WHERE h.hash = $1
        ORDER BY h.height DESC 
        LIMIT 1",
  )

the argument is pushed to the QueryBuilder.args field. When you build the query using .query() on the builder then it returns Query<> by passing those arguments and the String. So, I dont understand how this is leading to sql injection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bind() method of the QueryBuilder returns self.arguments.len() so basically, it provides a placeholder for the query and stores the arguments in its struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. I made an assumption as to what this was returning because I couldn't think of any other reason why the result of bind would return a Result<String>. So this currently returns a placeholder representation to put into query strings.

I do think that actually utilizing the bind call in this way with the format! macro is best avoided.

The result of the bind function could potentially be modified in the future without changing the API to return a different String value in the future which would not cause a breaking change in terms of the API. I do think that this is unlikely, but the possibility for it to happen is non-zero.

Additionally, since there is a format! macro, if in the future another parameter is added to these queries for whatever reason, we leave the possibility of a developer just adding it to the query string via a format parameter instead of through the QueryBuilder itself. While we could argue that this would surely be caught in review, I think it's probably best to avoid that possibility from the get go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this bind() is confusing and should be changed. I am thinking we store the string with placeholders inside the QueryBuilder and bind() can return the QueryBuilder or it might just take &mut self

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FROM header AS h
JOIN payload AS p ON h.height = p.height
WHERE h.height = $1
ORDER BY h.height DESC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of order by here? This should return the same block regardless of order by I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were just for consistency of the various queries based on what they were querying. To be clear, these were already here, and were not added with this change, simply relocated. But you are correct in that they are unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we used LIMIT 1 intentionally to force the query planner to use an index, and I am wondering if that is also the case for ORDER BY. I would suggest keeping the ORDER BY for now and testing the query planner on a local database.

FROM header AS h
JOIN payload AS p ON h.height = p.height
WHERE h.hash = $1
ORDER BY h.height DESC
Copy link
Contributor

@imabdulbasit imabdulbasit Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.
The block hash is unique so I don't understand the purpose of ORDER BY here. Is this to force the query planner to use index search?

@Ayiga Ayiga merged commit d369416 into main Jan 2, 2025
11 checks passed
@Ayiga Ayiga deleted the ts/enh/revisit-explorer-endpoints branch January 2, 2025 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants