-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust Explorer Query and Error Handling #742
Conversation
The Explorer API's underlying queries are a bit cluttered around. Additionally they are made utilizing the `format!` macro, which could lend itself to easy string injection. Rust's lifetimes with it's inability to use `format` with static strings and still be a constant expression makes this somewhat difficult. The `lazy_static` crate has been employed to make these query strings static and available without having to worry about lifetime issues.
The transaction size currently returns the size of the overall block instead of the individual transaction. To address this effectively, the `ExplorerTransaction` `trait` has been updated to require the ability to specify the payload size of the individual transaction.
The first block is always missing time at the moment, as the query can only compare the time to the previous row. Since we'll always be missing the previous row on the first entry, we end up missing the `time` for the first entry as well. The fix is simply to pull one more row than we want, and to remove the first entry when the time comes. As a result the `Vec` has been replaced with a `VecDeque`. Additionally the hard-coded values have been replaced with constants with comments.
@@ -105,6 +105,164 @@ where | |||
} | |||
} | |||
|
|||
lazy_static::lazy_static! { | |||
static ref GET_BLOCK_SUMMARIES_QUERY_FOR_LATEST: String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these static variables defined just for readability? I see these queries being used only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and do we have tests for all these queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are primarily being defined as a static reference for a couple of reasons and advantages.
- To ensure that they are indeed a static, unchanging string.
Query strings should be largely unchanging. Otherwise you open yourself up to the potential for SQL injection attacks. Consider what was being done before:
BlockIdentifier::Hash(hash) => format!(
"SELECT {BLOCK_COLUMNS}
FROM header AS h
JOIN payload AS p ON h.height = p.height
WHERE h.hash = {}
ORDER BY h.height DESC
LIMIT 1",
query.bind(hash.to_string())?,
)
At first glance this seems to be roughly the same thing. But, taking a closer look, the entire query string is in a format
macro with a variable replacement / injection being provided by the result of the query.bind
. I don't believe this is the intended or correct behavior. Additionally, attempts to split this up resulted in a difficult to resolve lifetime issue.
- This allows the query strings to be defined once, and then referenced multiple times.
This cleans up a lot of lifetime issues, as the strings are static, and outlive their async blocks.
-
It prevents the string from having to be re-computed / formatted for every invocation of the query, saving memory.
-
The static unchanging string also allows for
sqlx
's query caching to be guaranteed to take advantage of its internal prepared statements for re-use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the refactor of this commit: 8f64921, the separation of the query string definition, and the query parameters were ultimately modified, leading to the potential for an SQL injection, as we are directly inserting a string coming from the end-user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These queries do get tested indirectly at this symbol:
explorer::test::validate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I am misunderstanding but this
BlockIdentifier::Hash(hash) => format!(
"SELECT {BLOCK_COLUMNS}
FROM header AS h
JOIN payload AS p ON h.height = p.height
WHERE h.hash = {}
ORDER BY h.height DESC
LIMIT 1",
query.bind(hash.to_string())?,
)
would return string as
BlockIdentifier::Hash(hash) => format!(
"SELECT {BLOCK_COLUMNS}
FROM header AS h
JOIN payload AS p ON h.height = p.height
WHERE h.hash = $1
ORDER BY h.height DESC
LIMIT 1",
)
the argument is pushed to the QueryBuilder.args field. When you build the query using .query() on the builder then it returns Query<> by passing those arguments and the String. So, I dont understand how this is leading to sql injection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bind() method of the QueryBuilder returns self.arguments.len()
so basically, it provides a placeholder for the query and stores the arguments in its struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. I made an assumption as to what this was returning because I couldn't think of any other reason why the result of bind
would return a Result<String>
. So this currently returns a placeholder representation to put into query strings.
I do think that actually utilizing the bind
call in this way with the format!
macro is best avoided.
The result of the bind
function could potentially be modified in the future without changing the API to return a different String
value in the future which would not cause a breaking change in terms of the API. I do think that this is unlikely, but the possibility for it to happen is non-zero.
Additionally, since there is a format!
macro, if in the future another parameter is added to these queries for whatever reason, we leave the possibility of a developer just adding it to the query string via a format
parameter instead of through the QueryBuilder
itself. While we could argue that this would surely be caught in review, I think it's probably best to avoid that possibility from the get go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this bind() is confusing and should be changed. I am thinking we store the string with placeholders inside the QueryBuilder and bind() can return the QueryBuilder or it might just take &mut self
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FROM header AS h | ||
JOIN payload AS p ON h.height = p.height | ||
WHERE h.height = $1 | ||
ORDER BY h.height DESC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of order by here? This should return the same block regardless of order by I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were just for consistency of the various queries based on what they were querying. To be clear, these were already here, and were not added with this change, simply relocated. But you are correct in that they are unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember we used LIMIT 1 intentionally to force the query planner to use an index, and I am wondering if that is also the case for ORDER BY. I would suggest keeping the ORDER BY for now and testing the query planner on a local database.
FROM header AS h | ||
JOIN payload AS p ON h.height = p.height | ||
WHERE h.hash = $1 | ||
ORDER BY h.height DESC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
The block hash is unique so I don't understand the purpose of ORDER BY here. Is this to force the query planner to use index search?
This PR:
Adjusts some error handling to have the intended error encoding representation.
Adjust the way Explorer queries are handled. Due to the reference of a local variable within the query strings, they are unable to be static. The usage of a lazy static library allows for the query strings to be statically set. They were being constructed in a manner that may have allowed for dependency injection if care was not taken. They are now set separately, and their lifetimes are external to the queries themselves, allowing for more consistent query handling.
Adjusts the histogram data fetched, so that we don't miss a histogram data point due to missing the prior data point in the results.