Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] fix: Use CTE in Enrollment analytics queries [DHIS-16705] #19519

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

luciano-fiandesio
Copy link
Contributor

@luciano-fiandesio luciano-fiandesio commented Dec 18, 2024

WIP

Changes

This PR addresses the generation of SQL queries for Enrollment analytics

Problem

Currently, the Enrollment queries are structured so that sub-queries are used to fetch events values from the analytics_event_* tables.
For instance:

select
    enrollment,
    ...
    ax."ou",
    (select
         "fyjPqlHE7Dn"
     from
	 analytics_event_M3xtLkYBlKI
     where
	 analytics_event_M3xtLkYBlKI.eventstatus != 'SCHEDULE'
	 and analytics_event_M3xtLkYBlKI.enrollment = ax.enrollment
	 and ps = 'CWaAcQYKVpq'
limit 1 ) as "CWaAcQYKVpq[-1].fyjPqlHE7Dn",
from
     analytics_enrollment_m3xtlkyblki as ax
...

The above query works in Postgres but does not work in Doris, because correlation with outer layers of the parent query is not supported.

Mitigation

The current approach is trying to refactor the Enrollment query so that the subqueries (both as select and as where conditions) are "moved" into CTE (Common Table Expressions).
Common Table Expressions are temporary result sets in SQL, defined within a WITH clause, that simplify complex queries by improving readability and modularizing logic. They are particularly useful for recursive queries and can be referenced multiple times within the main query.

The above query can be rewritten like so:

with ps_cwaacqykvpq_fyjpqlhe7dn as (
select
	distinct on
	(enrollment) enrollment,
	"fyjPqlHE7Dn" as value
from
	analytics_event_M3xtLkYBlKI
where
	eventstatus != 'SCHEDULE'
	and ps = 'CWaAcQYKVpq'
	
order by
	enrollment,
	occurreddate desc,
	created desc)
select
	ax.enrollment,
	...
	ps_cwaacqykvpq_fyjpqlhe7dn.value as "CWaAcQYKVpq.fyjPqlHE7Dn"
from
	analytics_enrollment_m3xtlkyblki as ax
left join ps_cwaacqykvpq_fyjpqlhe7dn on
	ax.enrollment = ps_cwaacqykvpq_fyjpqlhe7dn.enrollment
where
	(((enrollmentdate >= '2021-01-01'
	and enrollmentdate < '2022-01-01')))
	and (ax."uidlevel1" = 'ImspTQPwCqd' )
...

The above query structure is compatible with Doris and it makes the execution of the query faster.
As a comparison:

Original query with sub-select

image

Refactored query with CTEs

image

Testing strategy

I am using the e2e project and aim at having 100% green tests on both Postgres and Doris.

@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from 8c33952 to 3e070ce Compare December 18, 2024 16:56
@larshelge larshelge changed the title DHIS-16705 Use CTE in Enrollment analytics queries [DRAFT] Use CTE in Enrollment analytics queries [DHIS-16705] Dec 19, 2024
@larshelge larshelge changed the title [DRAFT] Use CTE in Enrollment analytics queries [DHIS-16705] [DRAFT] fix: Use CTE in Enrollment analytics queries [DHIS-16705] Dec 19, 2024
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from 5170115 to d8119c0 Compare December 20, 2024 15:19
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from 5c4dba5 to ef43c5b Compare January 3, 2025 16:19
this.cteDefinition = cteDefinition;
this.offsets.add(offset);
// one alias per offset
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
this.programIndicatorUid = programIndicatorUid;
this.programStageUid = null;
// ignore offset
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
this.programIndicatorUid = null;
this.programStageUid = programStageUid;
// ignore offset
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
}

if (offset < 0) {
return (-1 * offset);

Check failure

Code scanning / CodeQL

User-controlled data in arithmetic expression High

This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.

Copilot Autofix AI 1 day ago

To fix the problem, we need to validate the offset parameter before performing any arithmetic operations. Specifically, we should ensure that the offset value is within a safe range to prevent overflow or underflow. We can achieve this by adding boundary checks for the offset value.

  • Add a method to validate the offset value, ensuring it falls within a safe range.
  • Modify the computeRowNumberOffset method to use this validation method before performing the arithmetic operation.
  • Update the relevant files and lines to include the validation logic.
Suggested changeset 1
dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java
--- a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java
+++ b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java
@@ -1320,2 +1320,4 @@
   private int computeRowNumberOffset(int offset) {
+    validateOffset(offset);
+
     if (offset == 0) {
@@ -1329,2 +1331,8 @@
     }
+  }
+
+  private void validateOffset(int offset) {
+    if (offset < Integer.MIN_VALUE || offset > Integer.MAX_VALUE) {
+      throw new IllegalArgumentException("Offset value is out of range: " + offset);
+    }
   }
EOF
@@ -1320,2 +1320,4 @@
private int computeRowNumberOffset(int offset) {
validateOffset(offset);

if (offset == 0) {
@@ -1329,2 +1331,8 @@
}
}

private void validateOffset(int offset) {
if (offset < Integer.MIN_VALUE || offset > Integer.MAX_VALUE) {
throw new IllegalArgumentException("Offset value is out of range: " + offset);
}
}
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from aa533b4 to 5f05bb5 Compare January 6, 2025 13:50
return "";
}

StringBuilder sb = new StringBuilder("WITH ");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use lower-case for SQL keywords in Java, i.e. with.

@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from 4e085d3 to 62c2237 Compare January 7, 2025 09:09
void contributeCTE(
ProgramIndicator programIndicator,
RelationshipType relationshipType,
AnalyticsType outerSqlEntity,

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'outerSqlEntity' is never used.
/**
* Returns a select SQL clause for the given query.
*
* @param params the {@link EventQueryParams}.
*/
protected abstract String getSelectClause(EventQueryParams params);

protected abstract String getColumnWithCte(QueryItem item, String suffix, CteContext cteContext);

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'suffix' is never used.
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from 79f7563 to 94ab16e Compare January 7, 2025 21:28
sql = buildEnrollmentQueryWithCte(params);
}

System.out.println("SQL: " + sql); // FIXME: Remove debug line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove println.

// Using a text block with .formatted() for clarity:
String join =
"""
LEFT JOIN %s %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use lower-case keywords for SQL in Java.

import java.util.function.Predicate;
import org.hisp.dhis.common.QueryFilter;

/** Mimics the logic of @{@link org.hisp.dhis.common.InQueryFilter} to be used in CTEs */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove outer @.

params.getLatestEndDate(),
cteContext);
} else {
programIndicatorSubqueryBuilder.contributeCTE(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to contributeCte.

@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from bc234ee to a4ff98c Compare January 10, 2025 08:07
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from a4ff98c to e283b86 Compare January 14, 2025 16:54
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from d2b4821 to 233220f Compare January 16, 2025 13:50
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from 3e25880 to f093346 Compare January 17, 2025 07:27
Copy link

codecov bot commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 26.93603% with 651 lines in your changes missing coverage. Please review.

Project coverage is 67.11%. Comparing base (19ffa00) to head (f093346).

Files with missing lines Patch % Lines
...ics/event/data/JdbcEnrollmentAnalyticsManager.java 1.01% 288 Missing and 4 partials ⚠️
.../org/hisp/dhis/analytics/common/CteDefinition.java 0.00% 50 Missing ⚠️
...rg/hisp/dhis/analytics/util/sql/SelectBuilder.java 76.81% 35 Missing and 13 partials ⚠️
...org/hisp/dhis/analytics/util/sql/SqlFormatter.java 20.40% 39 Missing ⚠️
...ava/org/hisp/dhis/analytics/common/CteContext.java 0.00% 38 Missing ⚠️
...icator/DefaultProgramIndicatorSubqueryBuilder.java 3.44% 25 Missing and 3 partials ⚠️
.../event/data/AbstractJdbcEventAnalyticsManager.java 15.62% 26 Missing and 1 partial ⚠️
...is/analytics/util/sql/SqlWhereClauseExtractor.java 0.00% 25 Missing ⚠️
...g/hisp/dhis/analytics/common/InQueryCteFilter.java 0.00% 21 Missing ⚠️
...sp/dhis/analytics/util/sql/SqlConditionJoiner.java 0.00% 17 Missing ⚠️
... and 8 more
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #19519       +/-   ##
=============================================
+ Coverage     34.28%   67.11%   +32.82%     
- Complexity      150      640      +490     
=============================================
  Files          3481     3494       +13     
  Lines        126253   127133      +880     
  Branches      14160    14286      +126     
=============================================
+ Hits          43284    85321    +42037     
+ Misses        78658    35331    -43327     
- Partials       4311     6481     +2170     
Flag Coverage Δ
integration 48.68% <0.89%> (?)
integration-h2 34.04% <0.89%> (-0.24%) ⬇️
unit 31.49% <26.93%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ain/java/org/hisp/dhis/setting/SystemSettings.java 99.31% <100.00%> (+0.68%) ⬆️
...nalytics/event/data/JdbcEventAnalyticsManager.java 66.55% <0.00%> (+65.19%) ⬆️
...org/hisp/dhis/analytics/util/sql/NotCondition.java 0.00% <0.00%> (ø)
.../hisp/dhis/analytics/util/sql/SimpleCondition.java 0.00% <0.00%> (ø)
.../java/org/hisp/dhis/analytics/common/CteUtils.java 0.00% <0.00%> (ø)
.../hisp/dhis/analytics/util/sql/SqlColumnParser.java 47.05% <47.05%> (ø)
...hisp/dhis/analytics/util/sql/SqlAliasReplacer.java 75.00% <75.00%> (ø)
...is/analytics/event/data/EnrollmentQueryHelper.java 65.30% <0.00%> (+65.30%) ⬆️
...va/org/hisp/dhis/analytics/util/sql/Condition.java 40.74% <40.74%> (ø)
...sp/dhis/analytics/util/sql/SqlConditionJoiner.java 0.00% <0.00%> (ø)
... and 9 more

... and 1892 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19ffa00...f093346. Read the comment docs.

*
* @param cteContext the {@link CteContext} to which the new CTE definition(s) will be added
* @param item the {@link QueryItem} containing program-stage details
* @param params the {@link EventQueryParams}, used for checking row-context eligibility, offsets,

Check notice

Code scanning / CodeQL

Spurious Javadoc @param tags Note

@param tag "params" does not match any actual parameter of method "handleAggregatedEnrollments()".
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from f093346 to 3916833 Compare January 17, 2025 09:19
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from 3916833 to c1a8dc6 Compare January 17, 2025 09:21
Copy link

sonarqubecloud bot commented Jan 17, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
39 New issues
39 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

}

private static void extractColumnsFromExpression(Expression expression, Set<String> columns) {
if (expression instanceof net.sf.jsqlparser.schema.Column column) {

Check notice

Code scanning / CodeQL

Chain of 'instanceof' tests Note

This if block performs a chain of 7 type tests - consider alternatives, e.g. polymorphism or the visitor pattern.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants