E2E GCS Sink additional test scenarios. #1478

Open · wants to merge 5 commits into base: develop
Changes from 2 commits
src/e2e-test/features/gcs/sink/GCSSink.feature (+172 −0)
@@ -265,3 +265,175 @@ Feature: GCS sink - Verification of GCS Sink plugin
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket

#Added new scenarios for GCS Sink - Bijay
[Contributor] Remove the commented line here.
[Author] Done.

@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: Validate successful records transfer from BigQuery to GCS with macro enabled at sink
[Contributor] Add the macro scenario in a separate feature file with "macro" in its name; refer to other plugins' feature files for the naming convention.
[Author] Done.

Given Open Datafusion Project to configure pipeline
When Source is BigQuery
When Sink is GCS
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Open BigQuery source properties
Then Override Service account details if set in environment variables
Then Enter the BigQuery source mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property reference name
Then Enter GCS property "projectId" as macro argument "gcsProjectId"
Then Enter GCS property "serviceAccountType" as macro argument "serviceAccountType"
Then Enter GCS property "serviceAccountFilePath" as macro argument "serviceAccount"
Then Enter GCS property "serviceAccountJSON" as macro argument "serviceAccount"
Then Enter GCS property "path" as macro argument "gcsSinkPath"
Then Enter GCS sink property "pathSuffix" as macro argument "gcsPathSuffix"
Then Enter GCS property "format" as macro argument "gcsFormat"
Then Click on the Macro button of Property: "writeHeader" and set the value to: "WriteHeader"
Then Click on the Macro button of Property: "location" and set the value to: "gcsSinkLocation"
Then Click on the Macro button of Property: "contentType" and set the value to: "gcsContentType"
Then Click on the Macro button of Property: "outputFileNameBase" and set the value to: "OutFileNameBase"
Then Click on the Macro button of Property: "fileSystemProperties" and set the value to: "FileSystemPr"
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "writeHeader" for key "WriteHeader"
Then Enter runtime argument value "contentType" for key "gcsContentType"
Then Enter runtime argument value "gcsSinkBucketLocation" for key "gcsSinkLocation"
Then Enter runtime argument value "outputFileNameBase" for key "OutFileNameBase"
Then Enter runtime argument value "gcsFileSysProperty" for key "FileSystemPr"
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "projectId" for key "gcsProjectId"
[Contributor] Remove the properties from the macro section which are already covered in other scenarios; e.g. projectId is already covered.

Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccountType" for key "serviceAccountType"
Then Enter runtime argument value "serviceAccount" for key "serviceAccount"
Then Enter runtime argument value for GCS sink property path key "gcsSinkPath"
Then Enter runtime argument value "gcsPathDateSuffix" for key "gcsPathSuffix"
Then Enter runtime argument value "csvFormat" for key "gcsFormat"
Then Enter runtime argument value "writeHeader" for key "WriteHeader"
Then Enter runtime argument value "contentType" for key "gcsContentType"
Then Enter runtime argument value "gcsSinkBucketLocation" for key "gcsSinkLocation"
Then Enter runtime argument value "outputFileNameBase" for key "OutFileNameBase"
Then Enter runtime argument value "gcsFileSysProperty" for key "FileSystemPr"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket

@GCS_SINK_TEST @BQ_SOURCE_TEST
Scenario Outline: To verify data is getting transferred successfully from BigQuery to GCS with content type selection
[Contributor] Add the validation for the file format as well.
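A minimal sketch of what such a validation could look like, assuming a format-aware verification step is added to the framework — the step text below is hypothetical and not part of this diff:

```gherkin
# Hypothetical step — checks that the objects written to the sink bucket
# actually use the selected format, not just that data arrived.
Then Verify data is transferred to target GCS bucket with file format "<FileFormat>"
```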

Given Open Datafusion Project to configure pipeline
When Source is BigQuery
When Sink is GCS
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Open BigQuery source properties
Then Enter the BigQuery source mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property projectId and reference name
Then Enter GCS sink property path
Then Select GCS property format "<FileFormat>"
Then Select GCS sink property contentType "<contentType>"
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Save and Deploy Pipeline
Then Run the Pipeline in Runtime
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Examples:
| FileFormat | contentType |
| csv | text/csv |
| tsv | text/plain |

[Contributor] Remove the extra blank line here.
[Author] Done.

@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: To verify data is getting transferred successfully from BigQuery to GCS using advanced file system properties field
[Contributor] Why are we adding a macro here again? It is already covered in the macro-enabled scenario; this one should be without macros.
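A hedged sketch of the non-macro variant the reviewer is asking for; the direct-entry step below is hypothetical, since this diff only shows the macro-based step for fileSystemProperties:

```gherkin
# Hypothetical step — enters the file system properties JSON directly
# instead of deferring it to a runtime macro argument.
Then Enter GCS sink property "fileSystemProperties" with value "gcsFileSysProperty"
```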

Given Open Datafusion Project to configure pipeline
When Source is BigQuery
[Contributor] Use the latest existing steps. This is a common review comment across all scenarios.

When Sink is GCS
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Open BigQuery source properties
Then Override Service account details if set in environment variables
Then Enter the BigQuery source mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property projectId and reference name
Then Override Service account details if set in environment variables
Then Enter GCS sink property path
Then Select GCS property format "csv"
Then Click on the Macro button of Property: "fileSystemProperties" and set the value to: "FileSystemPr"
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "gcsFileSysProperty" for key "FileSystemPr"
[Contributor] I don't see any value added in the parameter file for the file system property.
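A minimal sketch of the missing pluginParameters.properties entry; the key matches the runtime argument used above, while the JSON value is a hypothetical example of a Hadoop GCS connector setting:

```properties
# Hypothetical value — the plugin's file system properties are a JSON map of
# Hadoop configuration key/value pairs; substitute whatever the test requires.
gcsFileSysProperty={"fs.gs.block.size":"134217728"}
```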

Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "gcsFileSysProperty" for key "FileSystemPr"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket

@GCS_SINK_TEST @BQ_SOURCE_TEST @GCS_Sink_Required
Scenario Outline: To verify successful data transfer from BigQuery to GCS for different formats with write header true
[Contributor] Should this scenario be from GCS source to GCS sink? Re-check and change accordingly. Also, why are we making it a macro scenario when that is already covered in the macro-enabled scenario?

Given Open Datafusion Project to configure pipeline
When Source is BigQuery
[Contributor] Use the latest existing steps from the framework. Change in all the scenarios.
[Author] Done.

When Sink is GCS
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Open BigQuery source properties
Then Enter the BigQuery source mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property projectId and reference name
Then Enter GCS sink property path
Then Select GCS property format "<FileFormat>"
Then Click on the Macro button of Property: "writeHeader" and set the value to: "WriteHeader"
Then Validate "GCS" plugin properties
Then Close the GCS properties
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "writeHeader" for key "WriteHeader"
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "writeHeader" for key "WriteHeader"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
[Contributor] Add the validation steps in all the scenarios.
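A hedged sketch of the kind of validation step the reviewer is asking for; a record-level comparison step like the one below is hypothetical and would need a matching step definition in the framework:

```gherkin
# Hypothetical step — compares the records written to the GCS bucket against
# the source BigQuery table, instead of only checking that data arrived.
Then Validate the values of records transferred to GCS bucket are equal to the values from source BigQuery table
```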

Examples:
| FileFormat |
| csv |
| tsv |
| delimited |
src/e2e-test/features/gcs/sink/GCSSinkError.feature (+34 −0)
@@ -65,3 +65,37 @@ Feature: GCS sink - Verify GCS Sink plugin error scenarios
Then Select GCS property format "csv"
Then Click on the Validate button
Then Verify that the Plugin Property: "format" is displaying an in-line error message: "errorMessageInvalidFormat"

@GCS_SINK_TEST @BQ_SOURCE_TEST
[Contributor] Change the tag order, for ease of understanding.
[Author] Done.

Scenario: To verify and validate the error message in pipeline logs after deploying with an invalid bucket path
Given Open Datafusion Project to configure pipeline
When Source is BigQuery
When Sink is GCS
Then Connect source as "BigQuery" and sink as "GCS" to establish connection
Then Open BigQuery source properties
Then Enter the BigQuery source mandatory properties
Then Validate "BigQuery" plugin properties
Then Close the BigQuery properties
Then Open GCS sink properties
Then Enter GCS property projectId and reference name
Then Enter GCS property "path" as macro argument "gcsSinkPath"
Then Select GCS property format "csv"
Then Click on the Validate button
Then Close the GCS properties
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "gcsInvalidBucketNameSink" for key "gcsSinkPath"
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "gcsInvalidBucketNameSink" for key "gcsSinkPath"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Verify the pipeline status is "Failed"
[Contributor] Add the "Open and capture logs" step.
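A minimal sketch of the fix, reusing the "Open and capture logs" step that the passing scenarios in this file already use, placed between the running-state wait and the failure check:

```gherkin
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Failed"
```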

Then Open Pipeline logs and verify Log entries having below listed Level and Message:
| Level | Message |
| ERROR | errorMessageInvalidBucketNameSink |
src/e2e-test/resources/errorMessage.properties (+1 −0)
@@ -33,4 +33,5 @@ errorMessageMultipleFileWithoutClearDefaultSchema=Found a row with 4 fields when
errorMessageInvalidSourcePath=Invalid bucket name in path 'abc@'. Bucket name should
errorMessageInvalidDestPath=Invalid bucket name in path 'abc@'. Bucket name should
errorMessageInvalidEncryptionKey=CryptoKeyName.parse: formattedString not in valid format: Parameter "abc@" must be
errorMessageInvalidBucketNameSink=Spark program 'phase-1' failed with error: Errors were encountered during validation. Error code: 400, Unable to read or access GCS bucket. Bucket names must be at least 3 characters in length, got 2: 'gg'. Please check the system logs for more details.
[Contributor] Add only the relevant error message.
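A hedged sketch of the trimmed-down value the reviewer is asking for, keeping only a stable substring of the validation error from the line above — which exact substring to keep is a judgment call:

```properties
# Hypothetical trimmed value — match only the stable part of the error,
# per the review comment, instead of the full Spark failure string.
errorMessageInvalidBucketNameSink=Unable to read or access GCS bucket. Bucket names must be at least 3 characters in length
```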


src/e2e-test/resources/pluginParameters.properties (+5 −0)
@@ -159,6 +159,11 @@ gcsParquetFileSchema=[{"key":"workforce","value":"string"},{"key":"report_year",
{"key":"race_black","value":"long"},{"key":"race_hispanic_latinx","value":"long"},\
{"key":"race_native_american","value":"long"},{"key":"race_white","value":"long"},\
{"key":"tablename","value":"string"}]
gcsInvalidBucketNameSink=gg
writeHeader=true
gcsSinkBucketLocation=US
contentType=application/octet-stream
outputFileNameBase=part
## GCS-PLUGIN-PROPERTIES-END

## BIGQUERY-PLUGIN-PROPERTIES-START