Additional Steps for BQ_Source #1472

Open: wants to merge 11 commits into base: develop
Changes from 4 commits
2 changes: 1 addition & 1 deletion pom.xml
@@ -86,7 +86,7 @@
<google.cloud.storage.version>2.3.0</google.cloud.storage.version>
<google.cloud.datastore.version>1.105.1</google.cloud.datastore.version>
<google.protobuf.java.version>3.19.4</google.protobuf.java.version>
- <google.tink.version>1.3.0-rc3</google.tink.version>
+ <google.tink.version>1.3.0</google.tink.version>
Member: This is still not reverted to the older version.

Author: It is reverted now; please review.

<guava.version>27.0.1-jre</guava.version>
<hadoop.version>3.3.6</hadoop.version>
<hbase-shaded-client.version>1.4.13</hbase-shaded-client.version>
21 changes: 21 additions & 0 deletions src/e2e-test/features/bigquery/source/BigQuerySourceError.feature
@@ -55,3 +55,24 @@ Feature: BigQuery source - Validate BigQuery source plugin error scenarios
Then Enter BigQuery source property table name
Then Enter BigQuery property temporary bucket name "bqInvalidTemporaryBucket"
Then Verify the BigQuery validation error message for invalid property "bucket"

@BQ_SOURCE_TEST
Scenario Outline:To verify error message when unsupported format is provided in Partition Start date and Partition end Date
Given Open Datafusion Project to configure pipeline
When Expand Plugin group in the LHS plugins list: "Source"
When Select plugin: "BigQuery" from the plugins list as: "Source"
Then Navigate to the properties page of plugin: "BigQuery"
Then Replace input plugin property: "project" with value: "projectId"
Then Replace input plugin property: "dataset" with value: "dataset"
Then Replace input plugin property: "table" with value: "bqSourceTable"
Then Click on the Get Schema button
Then Enter BigQuery source properties partitionFrom and partitionTo
Then Validate BigQuery source incorrect property error for Partition Start date "<property>" value "<value>"
Contributor (@itsmekumari, Dec 16, 2024): @Praveena2607 I don't think the internal review comments have been addressed. Please also address all the internal review comments provided earlier in this PR. Ref Praveena2607#1

Contributor: As discussed on the call, please resolve all the points discussed and those mentioned in the internal PR: use the latest framework steps in all scenarios for entering property values, for validating inline error messages, and for selecting the output schema macro.

Then Validate BigQuery source incorrect property error for Partition End date "<property>" value "<value>"
And Enter input plugin property: "referenceName" with value: "bqIncorrectReferenceName"
Then Validate BigQuery source incorrect property error for reference name"<property>" value "<value>"
Examples:
| property | value |
| partitionFrom | bqIncorrectFormatStartDate |
| partitionTo | bqIncorrectFormatEndDate |
| referenceName | bqIncorrectReferenceName |
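
For context on what the `partitionFrom` and `partitionTo` rows above exercise, here is a minimal sketch of a date-format check. It assumes the plugin expects partition dates in `yyyy-MM-dd` form; that format is an assumption and is not stated in this PR. The class and method names (`PartitionDateCheck`, `isValidPartitionDate`) are hypothetical and belong to neither the plugin nor the e2e framework, and the sample inputs are illustrative, not the values behind `bqIncorrectFormatStartDate` or `bqIncorrectFormatEndDate`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration only; not part of the plugin or the e2e framework.
public class PartitionDateCheck {

  // Assumption: partition dates are expected as ISO local dates, e.g. 2024-12-16.
  private static final DateTimeFormatter FORMAT = DateTimeFormatter.ISO_LOCAL_DATE;

  // Returns true only when the value parses as an ISO local date.
  public static boolean isValidPartitionDate(String value) {
    if (value == null) {
      return false;
    }
    try {
      LocalDate.parse(value, FORMAT);
      return true;
    } catch (DateTimeParseException e) {
      // Wrong field order, wrong separators, or an impossible date such as Feb 30.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValidPartitionDate("2024-12-16")); // prints true
    System.out.println(isValidPartitionDate("16-12-2024")); // prints false
  }
}
```

A check of this shape rejects an unsupported format before any query is built, which is the kind of behavior the inline-error validation steps in the scenario look for.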
31 changes: 31 additions & 0 deletions src/e2e-test/features/bigquery/source/BigQueryToBigQuery.feature
@@ -354,3 +354,34 @@ Feature: BigQuery source - Verification of BigQuery to BigQuery successful data
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Validate the values of records transferred to BQ sink is equal to the values from source BigQuery table

@BQ_SOURCE_TEST @BQ_SINK_TEST
Scenario:Validate that pipeline run gets failed when incorrect filter values and verify the log error message
Given Open Datafusion Project to configure pipeline
When Source is BigQuery
When Sink is BigQuery
Then Open BigQuery source properties
Then Enter BigQuery property reference name
Then Enter BigQuery property projectId "projectId"
Then Enter BigQuery property datasetProjectId "projectId"
Then Override Service account details if set in environment variables
Then Enter BigQuery property dataset "dataset"
Then Enter BigQuery source property table name
Then Enter input plugin property: "filter" with value: "incorrectFilter"
Then Validate output schema with expectedSchema "bqSourceSchema"
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open BigQuery sink properties
Then Override Service account details if set in environment variables
Then Enter the BigQuery sink mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Connect source as "BigQuery" and sink as "BigQuery" to establish connection
Then Save the pipeline
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Wait till pipeline is in running state
Then Verify the pipeline status is "Failed"
Then Open Pipeline logs and verify Log entries having below listed Level and Message:
| Level | Message |
| ERROR | errorLogsMessageInvalidFilter |
219 changes: 219 additions & 0 deletions src/e2e-test/features/bigquery/source/BigQueryToGCS_WithMacro.feature
@@ -69,3 +69,222 @@ Feature: BigQuery source - Verification of BigQuery to GCS successful data trans
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Then Validate the cmek key "cmekGCS" of target GCS bucket if cmek is enabled

@CMEK @BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario:Validate successful records transfer from BigQuery to GCS with macro arguments for partition start date and partition end date
Given Open Datafusion Project to configure pipeline
When Source is BigQuery
When Sink is GCS
Then Open BigQuery source properties
Then Enter BigQuery property reference name
Then Enter BigQuery property "projectId" as macro argument "bqProjectId"
Then Enter BigQuery property "datasetProjectId" as macro argument "bqDatasetProjectId"
Then Enter BigQuery property "partitionFrom" as macro argument "bqStartDate"
Then Enter BigQuery property "partitionTo" as macro argument "bqEndDate"
Then Enter BigQuery property "serviceAccountType" as macro argument "serviceAccountType"
Then Enter BigQuery property "serviceAccountFilePath" as macro argument "serviceAccount"
Then Enter BigQuery property "serviceAccountJSON" as macro argument "serviceAccount"
Then Enter BigQuery property "dataset" as macro argument "bqDataset"
Then Enter BigQuery property "table" as macro argument "bqSourceTable"
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property reference name
Then Enter GCS property "projectId" as macro argument "gcsProjectId"
Then Enter GCS property "serviceAccountType" as macro argument "serviceAccountType"
Then Enter GCS property "serviceAccountFilePath" as macro argument "serviceAccount"
Then Enter GCS property "serviceAccountJSON" as macro argument "serviceAccount"
Then Enter GCS property "path" as macro argument "gcsSinkPath"
Then Enter GCS sink property "pathSuffix" as macro argument "gcsPathSuffix"
Then Enter GCS property "format" as macro argument "gcsFormat"
Then Enter GCS sink cmek property "encryptionKeyName" as macro argument "cmekGCS" if cmek is enabled
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "projectId" for key "bqProjectId"
Then Enter runtime argument value "projectId" for key "bqDatasetProjectId"
Then Enter runtime argument value "partitionFrom" for key "bqStartDate"
Then Enter runtime argument value "partitionTo" for key "bqEndDate"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value "dataset" for key "bqDataset"
Then Enter runtime argument value for BigQuery source table name key "bqSourceTable"
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "cmekGCS" for GCS cmek property key "cmekGCS" if GCS cmek is enabled
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Click on preview data for GCS sink
Then Close the preview data
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "projectId" for key "bqProjectId"
Then Enter runtime argument value "projectId" for key "bqDatasetProjectId"
Then Enter runtime argument value "partitionFrom" for key "bqStartDate"
Then Enter runtime argument value "partitionTo" for key "bqEndDate"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value "dataset" for key "bqDataset"
Then Enter runtime argument value for BigQuery source table name key "bqSourceTable"
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "cmekGCS" for GCS cmek property key "cmekGCS" if GCS cmek is enabled
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Then Validate the cmek key "cmekGCS" of target GCS bucket if cmek is enabled

@CMEK @BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario:Validate successful records transfer from BigQuery to GCS with macro arguments for filter and outputschema
Contributor: The title mentions the output schema macro, but this scenario has no output schema macro step. If the scenario does not use it, remove it from the title, and mention it only in the scenario where the step is used.

Given Open Datafusion Project to configure pipeline
When Source is BigQuery
When Sink is GCS
Contributor: Use the latest steps from the framework.

Then Open BigQuery source properties
Then Enter BigQuery property reference name
Then Enter BigQuery property "projectId" as macro argument "bqProjectId"
Then Enter BigQuery property "datasetProjectId" as macro argument "bqDatasetProjectId"
Then Enter BigQuery property "filter" as macro argument "bqFilter"
Then Enter BigQuery property "serviceAccountType" as macro argument "serviceAccountType"
Then Enter BigQuery property "serviceAccountFilePath" as macro argument "serviceAccount"
Then Enter BigQuery property "serviceAccountJSON" as macro argument "serviceAccount"
Then Enter BigQuery property "dataset" as macro argument "bqDataset"
Then Enter BigQuery property "table" as macro argument "bqSourceTable"
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property reference name
Then Enter GCS property "projectId" as macro argument "gcsProjectId"
Then Enter GCS property "serviceAccountType" as macro argument "serviceAccountType"
Then Enter GCS property "serviceAccountFilePath" as macro argument "serviceAccount"
Then Enter GCS property "serviceAccountJSON" as macro argument "serviceAccount"
Then Enter GCS property "path" as macro argument "gcsSinkPath"
Then Enter GCS sink property "pathSuffix" as macro argument "gcsPathSuffix"
Then Enter GCS property "format" as macro argument "gcsFormat"
Then Enter GCS sink cmek property "encryptionKeyName" as macro argument "cmekGCS" if cmek is enabled
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "projectId" for key "bqProjectId"
Then Enter runtime argument value "projectId" for key "bqDatasetProjectId"
Then Enter runtime argument value "filter" for key "bqFilter"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value "dataset" for key "bqDataset"
Then Enter runtime argument value for BigQuery source table name key "bqSourceTable"
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "cmekGCS" for GCS cmek property key "cmekGCS" if GCS cmek is enabled
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Click on preview data for GCS sink
Then Close the preview data
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "projectId" for key "bqProjectId"
Then Enter runtime argument value "projectId" for key "bqDatasetProjectId"
Then Enter runtime argument value "filter" for key "bqFilter"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value "dataset" for key "bqDataset"
Then Enter runtime argument value for BigQuery source table name key "bqSourceTable"
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "cmekGCS" for GCS cmek property key "cmekGCS" if GCS cmek is enabled
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Then Validate the cmek key "cmekGCS" of target GCS bucket if cmek is enabled

@CMEK @BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario:Validate successful records transfer from BigQuery to GCS with macro arguments for output schema
Given Open Datafusion Project to configure pipeline
When Source is BigQuery
When Sink is GCS
Then Open BigQuery source properties
Then Enter BigQuery property reference name
Then Enter BigQuery property "projectId" as macro argument "bqProjectId"
Then Enter BigQuery property "datasetProjectId" as macro argument "bqDatasetProjectId"
Then Enter BigQuery property "serviceAccountType" as macro argument "serviceAccountType"
Then Enter BigQuery property "serviceAccountFilePath" as macro argument "serviceAccount"
Then Enter BigQuery property "serviceAccountJSON" as macro argument "serviceAccount"
Then Enter BigQuery property "dataset" as macro argument "bqDataset"
Then Enter BigQuery property "table" as macro argument "bqSourceTable"
Then Enter BigQuery source property output schema "outputSchema" as macro argument "bqOutputSchema"
Contributor (@itsmekumari, Dec 19, 2024): Use the existing step discussed earlier for selecting the macro action for the output schema.

Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property reference name
Then Enter GCS property "projectId" as macro argument "gcsProjectId"
Then Enter GCS property "serviceAccountType" as macro argument "serviceAccountType"
Then Enter GCS property "serviceAccountFilePath" as macro argument "serviceAccount"
Then Enter GCS property "serviceAccountJSON" as macro argument "serviceAccount"
Then Enter GCS property "path" as macro argument "gcsSinkPath"
Then Enter GCS sink property "pathSuffix" as macro argument "gcsPathSuffix"
Then Enter GCS property "format" as macro argument "gcsFormat"
Then Enter GCS sink cmek property "encryptionKeyName" as macro argument "cmekGCS" if cmek is enabled
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "projectId" for key "bqProjectId"
Then Enter runtime argument value "projectId" for key "bqDatasetProjectId"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value "dataset" for key "bqDataset"
Then Enter runtime argument value for BigQuery source table name key "bqSourceTable"
Then Enter runtime argument value "OutputSchema" for key "bqOutputSchema"
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "cmekGCS" for GCS cmek property key "cmekGCS" if GCS cmek is enabled
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Click on preview data for GCS sink
Then Close the preview data
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "projectId" for key "bqProjectId"
Then Enter runtime argument value "projectId" for key "bqDatasetProjectId"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value "dataset" for key "bqDataset"
Then Enter runtime argument value for BigQuery source table name key "bqSourceTable"
Then Enter runtime argument value "OutputSchema" for key "bqOutputSchema"
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "cmekGCS" for GCS cmek property key "cmekGCS" if GCS cmek is enabled
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Then Validate the cmek key "cmekGCS" of target GCS bucket if cmek is enabled