E2E GCS Sink additional test scenarios. #1478

Open: wants to merge 5 commits into base: develop

71 changes: 71 additions & 0 deletions src/e2e-test/features/gcs/sink/BigQueryToGCSSink_WithMacro.feature
@@ -0,0 +1,71 @@
@GCS_Sink
Feature: GCS sink - Verification of GCS Sink plugin macro scenarios

@CMEK @BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: Validate successful records transfer from BigQuery to GCS with macros enabled at sink
Given Open Datafusion Project to configure pipeline
Then Select plugin: "BigQuery" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "GCS" from the plugins list as: "Sink"
Then Open BigQuery source properties
Then Override Service account details if set in environment variables
Then Enter the BigQuery source mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property reference name
Then Enter GCS property "projectId" as macro argument "gcsProjectId"
Then Enter GCS property "serviceAccountType" as macro argument "serviceAccountType"
Then Enter GCS property "serviceAccountFilePath" as macro argument "serviceAccount"
Then Enter GCS property "path" as macro argument "gcsSinkPath"
Then Enter GCS sink property "pathSuffix" as macro argument "gcsPathSuffix"
Then Enter GCS property "format" as macro argument "gcsFormat"
Then Click on the Macro button of Property: "writeHeader" and set the value to: "WriteHeader"
Then Click on the Macro button of Property: "location" and set the value to: "gcsSinkLocation"
Then Click on the Macro button of Property: "contentType" and set the value to: "gcsContentType"
Then Click on the Macro button of Property: "outputFileNameBase" and set the value to: "OutFileNameBase"
Then Click on the Macro button of Property: "fileSystemProperties" and set the value to: "FileSystemPr"
Then Enter GCS sink cmek property "encryptionKeyName" as macro argument "cmekGCS" if cmek is enabled
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "writeHeader" for key "WriteHeader"
Then Enter runtime argument value "contentType" for key "gcsContentType"
Then Enter runtime argument value "gcsSinkBucketLocation" for key "gcsSinkLocation"
Then Enter runtime argument value "outputFileNameBase" for key "OutFileNameBase"
Then Enter runtime argument value "gcsCSVFileSysProperty" for key "FileSystemPr"
Then Enter runtime argument value "cmekGCS" for GCS cmek property key "cmekGCS" if GCS cmek is enabled
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "writeHeader" for key "WriteHeader"
Then Enter runtime argument value "contentType" for key "gcsContentType"
Then Enter runtime argument value "gcsSinkBucketLocation" for key "gcsSinkLocation"
Then Enter runtime argument value "outputFileNameBase" for key "OutFileNameBase"
Then Enter runtime argument value "gcsCSVFileSysProperty" for key "FileSystemPr"
Then Enter runtime argument value "cmekGCS" for GCS cmek property key "cmekGCS" if GCS cmek is enabled
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Then Validate the cmek key "cmekGCS" of target GCS bucket if cmek is enabled
Review comment (Contributor): Add the validation step for validating the values.

Reply (Author): In progress.
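The macro scenario above stores `${...}` placeholders in the sink configuration and supplies their values as runtime arguments. Each runtime-argument step appears to take a key from `pluginParameters.properties` (for example `writeHeader=true`) and bind its value to a macro name (for example `WriteHeader`). The Java sketch below models that two-step lookup. `MacroSketch`, `bindRuntimeArgs` and `resolve` are hypothetical names, and the plain `${name}` substitution is a simplification, not CDAP's own macro evaluator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of macro resolution from runtime arguments; not CDAP's macro evaluator.
class MacroSketch {
  // Matches a plain ${name} placeholder (no nesting or macro functions).
  private static final Pattern MACRO = Pattern.compile("\\$\\{([^}]+)\\}");

  // Builds runtime arguments (macro name -> value) from entries of a parameter file,
  // e.g. macro "WriteHeader" bound to parameter "writeHeader".
  static Map<String, String> bindRuntimeArgs(Properties params, Map<String, String> macroToParam) {
    Map<String, String> args = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : macroToParam.entrySet()) {
      String value = params.getProperty(e.getValue());
      if (value == null) {
        throw new IllegalArgumentException("No parameter named " + e.getValue());
      }
      args.put(e.getKey(), value);
    }
    return args;
  }

  // Replaces every ${name} in a configured property with its runtime argument.
  static String resolve(String configured, Map<String, String> runtimeArgs) {
    Matcher m = MACRO.matcher(configured);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String value = runtimeArgs.get(m.group(1));
      if (value == null) {
        throw new IllegalStateException("Unresolved macro: " + m.group(1));
      }
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

For example, binding `WriteHeader` to the `writeHeader` entry and resolving `${WriteHeader}` returns `true`; an unbound macro raises an error instead of leaving the property empty.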

121 changes: 120 additions & 1 deletion src/e2e-test/features/gcs/sink/GCSSink.feature
@@ -95,7 +95,7 @@ Feature: GCS sink - Verification of GCS Sink plugin
| parquet | application/octet-stream |
| orc | application/octet-stream |

- @GCS_SINK_TEST @BQ_SOURCE_TEST
+ @BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario Outline: To verify data is getting transferred successfully from BigQuery to GCS with combinations of contenttype
Given Open Datafusion Project to configure pipeline
When Source is BigQuery
@@ -265,3 +265,122 @@ Feature: GCS sink - Verification of GCS Sink plugin
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket

@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario Outline: To verify data is getting transferred successfully from BigQuery to GCS with content type selection
Review comment (Contributor): Add the validation for file format as well.

Given Open Datafusion Project to configure pipeline
When Select plugin: "BigQuery" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "GCS" from the plugins list as: "Sink"
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Open BigQuery source properties
Then Enter the BigQuery source mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property projectId and reference name
Then Enter GCS sink property path
Then Select GCS property format "<FileFormat>"
Then Select GCS sink property contentType "<contentType>"
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Save and Deploy Pipeline
Then Run the Pipeline in Runtime
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Examples:
| FileFormat | contentType |
| csv | text/csv |
| tsv | text/plain |
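The reviewer asked for a file format check in this outline. As one possible piece of that, the sketch below checks the `contentType` column: given a listing of the objects the sink wrote (object name to Content-Type metadata, assumed to come from a GCS client call that is not shown), it reports data objects whose content type differs from the expected value. `ContentTypeCheck` is hypothetical, and treating names that start with `_` or `.` as marker files (such as a `_SUCCESS` file) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: checks the Content-Type metadata of objects written by the GCS sink.
// The listing (object name -> content type) is assumed to come from a GCS client, not shown.
class ContentTypeCheck {
  // Assumption: base names starting with "_" or "." are marker/hidden files, not data files.
  static boolean isDataObject(String objectName) {
    String base = objectName.substring(objectName.lastIndexOf('/') + 1);
    return !base.isEmpty() && !base.startsWith("_") && !base.startsWith(".");
  }

  // Returns the data objects whose content type differs from the expected one, sorted by name.
  static List<String> mismatches(Map<String, String> contentTypes, String expected) {
    List<String> bad = new ArrayList<>();
    for (Map.Entry<String, String> e : new TreeMap<>(contentTypes).entrySet()) {
      if (isDataObject(e.getKey()) && !expected.equals(e.getValue())) {
        bad.add(e.getKey());
      }
    }
    return bad;
  }
}
```

An empty result for the `csv` row would mean every data object carries `text/csv`; checking the file format itself would still require reading the object contents.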

@BQ_SOURCE_DATATYPE_TEST @GCS_SINK_TEST
Scenario: Validate successful records transfer from BigQuery to GCS with the advanced file system properties field
Given Open Datafusion Project to configure pipeline
Then Select plugin: "BigQuery" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "GCS" from the plugins list as: "Sink"
Then Open BigQuery source properties
Then Enter BigQuery property reference name
Then Enter BigQuery property projectId "projectId"
Then Enter BigQuery property datasetProjectId "projectId"
Then Override Service account details if set in environment variables
Then Enter BigQuery property dataset "dataset"
Then Enter BigQuery source property table name
Then Validate output schema with expectedSchema "bqSourceSchemaDatatype"
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Override Service account details if set in environment variables
Then Enter the GCS sink mandatory properties
Then Enter GCS File system properties field "gcsCSVFileSysProperty"
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Save the pipeline
Then Preview and run the pipeline
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Click on preview data for GCS sink
Then Verify preview output schema matches the outputSchema captured in properties
Then Close the preview data
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Then Validate the values of records transferred to GCS bucket is equal to the values from source BigQuery table
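The last step compares the records in the GCS bucket with the source BigQuery table. A sink can split its output across several part files, so a comparison that does not depend on row order is the safer choice. The sketch below compares rows as multisets, counting duplicates. `RecordComparison` is a hypothetical helper, and it assumes both sides have already been converted to lists of string column values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares source and sink rows as multisets, ignoring row order.
// Assumes both sides were already converted to lists of string column values.
class RecordComparison {
  static boolean sameRecords(List<List<String>> source, List<List<String>> sink) {
    if (source.size() != sink.size()) {
      return false;
    }
    // Count each distinct source row, then consume those counts with the sink rows.
    Map<List<String>, Integer> counts = new HashMap<>();
    for (List<String> row : source) {
      counts.merge(row, 1, Integer::sum);
    }
    for (List<String> row : sink) {
      Integer n = counts.get(row);
      if (n == null) {
        return false; // sink row not present (or present too often) in the source
      }
      if (n == 1) {
        counts.remove(row);
      } else {
        counts.put(row, n - 1);
      }
    }
    return counts.isEmpty();
  }
}
```

Counting rather than using a set keeps duplicate rows significant, so a sink that drops one of two identical source rows is still reported.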

@GCS_CSV @GCS_SINK_TEST @GCS_Source_Required
Scenario Outline: To verify data is getting transferred from GCS Source to GCS Sink with write header true at Sink
Given Open Datafusion Project to configure pipeline
When Select plugin: "GCS" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "GCS" from the plugins list as: "Sink"
Then Connect plugins: "GCS" and "GCS2" to establish connection
Then Navigate to the properties page of plugin: "GCS"
Then Select dropdown plugin property: "select-schema-actions-dropdown" with option value: "clear"
Review comment (Contributor): Why are we using this step?

Reply (Author): This step clears the output schema.

Then Replace input plugin property: "project" with value: "projectId"
Then Override Service account details if set in environment variables
Then Enter input plugin property: "referenceName" with value: "sourceRef"
Then Enter GCS source property path "gcsCsvDataFile"
Then Select GCS property format "delimited"
Then Enter input plugin property: "delimiter" with value: "delimiterValue"
Then Toggle GCS source property skip header to true
Then Validate "GCS" plugin properties
Then Verify the Output Schema matches the Expected Schema: "gcsSingleFileDataSchema"
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "GCS2"
Then Enter GCS property projectId and reference name
Then Enter GCS sink property path
Then Select GCS property format "<FileFormat>"
Then Click on the Macro button of Property: "writeHeader" and set the value to: "WriteHeader"
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "writeHeader" for key "WriteHeader"
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "writeHeader" for key "WriteHeader"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Review comment (Contributor): Add the validation steps in all the scenarios.

Then Validate the data from GCS Source to GCS Sink with expected csv file and target data in GCS bucket
Examples:
| FileFormat |
| csv |
# | tsv |
# | delimited |
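In this outline the source skips the header of the input file, while the sink has `writeHeader` resolved to `true` at runtime, so each CSV part file the sink writes should start with a header row followed by data rows. The sketch below checks one part file read as lines. `HeaderCheck` is hypothetical; it assumes a comma delimiter and unquoted field names taken from the output schema, and it compares data rows without regard to order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for the writeHeader=true case: checks one CSV part file read as lines.
// Assumes a comma delimiter and unquoted field names from the output schema.
class HeaderCheck {
  static String expectedHeader(List<String> fieldNames) {
    return String.join(",", fieldNames);
  }

  // True when the first line is the header and the remaining lines equal the expected data,
  // compared as sorted copies so row order does not matter.
  static boolean hasHeaderThenData(List<String> fileLines, List<String> fieldNames,
                                   List<String> expectedData) {
    if (fileLines.isEmpty() || !fileLines.get(0).equals(expectedHeader(fieldNames))) {
      return false;
    }
    List<String> actual = new ArrayList<>(fileLines.subList(1, fileLines.size()));
    List<String> expected = new ArrayList<>(expectedData);
    Collections.sort(actual);
    Collections.sort(expected);
    return actual.equals(expected);
  }
}
```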
36 changes: 36 additions & 0 deletions src/e2e-test/features/gcs/sink/GCSSinkError.feature
@@ -65,3 +65,39 @@ Feature: GCS sink - Verify GCS Sink plugin error scenarios
Then Select GCS property format "csv"
Then Click on the Validate button
Then Verify that the Plugin Property: "format" is displaying an in-line error message: "errorMessageInvalidFormat"

@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: To verify the error message in pipeline logs after deploying with an invalid bucket path
Given Open Datafusion Project to configure pipeline
When Select plugin: "BigQuery" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "GCS" from the plugins list as: "Sink"
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Open BigQuery source properties
Then Enter the BigQuery source mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property projectId and reference name
Then Enter GCS property "path" as macro argument "gcsSinkPath"
Then Select GCS property format "csv"
Then Click on the Validate button
Then Close the GCS properties
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "gcsInvalidBucketNameSink" for key "gcsSinkPath"
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "gcsInvalidBucketNameSink" for key "gcsSinkPath"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Verify the pipeline status is "Failed"
Review comment (Contributor): Add an "Open and capture logs" step.

Then Open Pipeline logs and verify Log entries having below listed Level and Message:
| Level | Message |
| ERROR | errorMessageInvalidBucketNameSink |
Then Close the pipeline logs
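The error scenario expects an `ERROR` log entry matching `errorMessageInvalidBucketNameSink` ("Unable to read or access GCS bucket."). The runtime value `ggg` satisfies GCS bucket naming rules, so the failure presumably comes from the bucket not being readable or accessible rather than from the syntax check behind the existing `Invalid bucket name in path` messages. The sketch below shows one way a log check could match a level and a message. `LogCheck` is hypothetical, and it assumes one entry per line with the level written as a separate word.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: looks for a captured log line at the given level containing the message.
// Assumes one entry per line and the level written as a separate word (e.g. "ERROR").
class LogCheck {
  static boolean hasEntry(List<String> logLines, String level, String expectedMessage) {
    // Word boundaries keep "ERROR" from matching inside longer tokens such as "ERRORS".
    Pattern levelWord = Pattern.compile("\\b" + Pattern.quote(level) + "\\b");
    for (String line : logLines) {
      if (levelWord.matcher(line).find() && line.contains(expectedMessage)) {
        return true;
      }
    }
    return false;
  }
}
```

Matching with `contains` rather than equality fits messages stored as prefixes, like the truncated `Bucket name should` entries in `errorMessage.properties`.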
2 changes: 1 addition & 1 deletion src/e2e-test/resources/errorMessage.properties
@@ -33,4 +33,4 @@ errorMessageMultipleFileWithoutClearDefaultSchema=Found a row with 4 fields when
errorMessageInvalidSourcePath=Invalid bucket name in path 'abc@'. Bucket name should
errorMessageInvalidDestPath=Invalid bucket name in path 'abc@'. Bucket name should
errorMessageInvalidEncryptionKey=CryptoKeyName.parse: formattedString not in valid format: Parameter "abc@" must be

errorMessageInvalidBucketNameSink=Unable to read or access GCS bucket.
6 changes: 6 additions & 0 deletions src/e2e-test/resources/pluginParameters.properties
@@ -159,6 +159,12 @@ gcsParquetFileSchema=[{"key":"workforce","value":"string"},{"key":"report_year",
{"key":"race_black","value":"long"},{"key":"race_hispanic_latinx","value":"long"},\
{"key":"race_native_american","value":"long"},{"key":"race_white","value":"long"},\
{"key":"tablename","value":"string"}]
gcsInvalidBucketNameSink=ggg
writeHeader=true
gcsSinkBucketLocation=US
contentType=application/octet-stream
outputFileNameBase=part
gcsCSVFileSysProperty={"csvinputformat.record.csv": "1"}
## GCS-PLUGIN-PROPERTIES-END

## BIGQUERY-PLUGIN-PROPERTIES-START
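The new entries in `pluginParameters.properties` follow Java properties-file syntax: `##` lines are comments, and a trailing backslash continues a value on the next line, as in the existing `gcsParquetFileSchema` entry. Assuming the test framework reads the file with `java.util.Properties` (this diff does not show the loader), the sketch below illustrates that `gcsCSVFileSysProperty` comes back as the raw JSON string and that continuation lines join into one value. `ParamsFileSketch` and its sample text (a shortened `gcsParquetFileSchema`) are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative sketch: parses text shaped like pluginParameters.properties with java.util.Properties.
class ParamsFileSketch {
  // Shortened sample modeled on this diff: a continued schema value, a JSON value, a comment line.
  static final String SAMPLE =
      "gcsParquetFileSchema=[{\"key\":\"race_white\",\"value\":\"long\"},\\\n"
          + "  {\"key\":\"tablename\",\"value\":\"string\"}]\n"
          + "gcsCSVFileSysProperty={\"csvinputformat.record.csv\": \"1\"}\n"
          + "## GCS-PLUGIN-PROPERTIES-END\n";

  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      // load() skips comment lines, joins backslash continuations and strips their leading spaces.
      props.load(new StringReader(text));
    } catch (IOException e) {
      // StringReader performs no real I/O; wrap to keep the method signature unchecked.
      throw new UncheckedIOException(e);
    }
    return props;
  }
}
```

With this sample, `parse(SAMPLE).getProperty("gcsCSVFileSysProperty")` returns `{"csvinputformat.record.csv": "1"}`, and the two schema lines come back as a single JSON array string.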