BigQueryWriteClient/JsonStreamWriter: missing values for duplicated, nested, proto-incompatible fields #2575

Open
pondzix opened this issue Jul 23, 2024 · 0 comments
Labels
api: bigquerystorage Issues related to the googleapis/java-bigquerystorage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.


pondzix commented Jul 23, 2024

Environment details

  • OS type and version: macOS Ventura 13.6
  • Java version: 11
  • version(s): 3.7.0, 2.47.0

Steps to reproduce

  1. Create a table in BigQuery:
CREATE TABLE test_dataset.test_table ( 
  parent1 STRUCT< `1nested` STRING >,
  parent2 STRUCT< `1nested` STRING >
)

This gives two parent STRUCT fields, each with a single field named 1nested. The leading 1 is important here: it makes the field name proto-incompatible, based on this class.

  2. Write data to the table using JsonStreamWriter. Here is the gist with a sample scala-cli script.

The problem: the input contains values for both parent1.1nested and parent2.1nested, but in BQ only the first one has the correct value. The second one is null, so it appears to be lost somewhere along the way:

[Screenshot 2024-07-23 at 09 32 49]
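
To make the "proto-incompatible" point from step 1 concrete, here is a minimal sketch of a name check. It assumes a field name counts as proto-compatible when it looks like a protobuf identifier (a letter or underscore first, then letters, digits, or underscores). The object name, method name, and regex are illustrative assumptions and are not copied from the library class linked above, whose exact rule may differ.

```scala
// Hypothetical helper, not part of google-cloud-bigquerystorage.
object ProtoNameCheck {
  // Assumed rule: identifier-like names only (approximation of protobuf field naming).
  private val ProtoIdentifier = "[A-Za-z_][A-Za-z0-9_]*".r

  // True when the whole field name matches the assumed identifier pattern.
  def isProtoCompatible(fieldName: String): Boolean =
    ProtoIdentifier.pattern.matcher(fieldName).matches()
}
```

Under this assumption, ProtoNameCheck.isProtoCompatible("1nested") is false because of the leading digit, while ProtoNameCheck.isProtoCompatible("parent1") is true.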

Code example

From the gist ☝️:

//> using dep com.google.cloud:google-cloud-bigquerystorage:3.7.0
//> using dep com.google.cloud:google-cloud-bigquery:2.41.0

import com.google.cloud.bigquery.TableId
import com.google.cloud.bigquery.storage.v1.{BigQueryWriteClient, JsonStreamWriter}
import org.json.JSONArray
import scala.jdk.CollectionConverters._

/**
CREATE TABLE
  test_dataset.test_table ( 
    parent1 STRUCT< `1nested` STRING >,
    parent2 STRUCT< `1nested` STRING >
  )
*/ 

val input = Map(
  "parent1" -> Map("1nested" -> "value").asJava, // "value" in BQ
  "parent2" -> Map("1nested" -> "value").asJava, // null in BQ
)

write(input)

def write(input: Map[String, AnyRef]): Unit = {
  val client = BigQueryWriteClient.create()
  val streamId = TableId.of("...project..", "...dataset..", "..table..").getIAMResourceName + "/streams/_default"
  val writer = JsonStreamWriter
    .newBuilder(streamId, client)
    .build

  writer.append(new JSONArray(List(input.asJava).asJava)).get
  writer.close()
  client.close()
  ()
}

Any subsequent attempt to write data to parent1.1nested succeeds, and the data is not null. Any subsequent attempt to write data to parent2.1nested always results in null in BQ.

In general, the first proto-incompatible field "wins", and any other proto-incompatible field with the same name that is nested somewhere else "loses".
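
To check whether a given table is exposed to this behaviour, something like the sketch below could list proto-incompatible field names that appear under more than one parent. Everything in it is hypothetical: the Field case class is a simplified stand-in for a table schema (not the BigQuery client's schema type), and the name regex is the same assumed identifier rule, not the library's actual check.

```scala
// Hypothetical helper, not part of google-cloud-bigquerystorage.
object DuplicateNestedNames {
  // Simplified schema model: a field name plus optional nested fields.
  final case class Field(name: String, children: List[Field] = Nil)

  // Assumed rule for proto-compatible names (identifier-like).
  private def isProtoCompatible(name: String): Boolean =
    name.matches("[A-Za-z_][A-Za-z0-9_]*")

  // Collect the full path of every proto-incompatible field, depth first.
  private def incompatiblePaths(fields: List[Field], prefix: List[String]): List[List[String]] =
    fields.flatMap { f =>
      val path = prefix :+ f.name
      val here = if (isProtoCompatible(f.name)) Nil else List(path)
      here ++ incompatiblePaths(f.children, path)
    }

  // Group by leaf name and keep only names that occur at more than one path.
  def collisions(schema: List[Field]): Map[String, List[String]] =
    incompatiblePaths(schema, Nil)
      .groupBy(_.last)
      .collect { case (name, paths) if paths.size > 1 => name -> paths.map(_.mkString(".")) }
}
```

For the table in this report, the schema would be modelled as List(Field("parent1", List(Field("1nested"))), Field("parent2", List(Field("1nested")))), and collisions reports the name 1nested at both parent1.1nested and parent2.1nested.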

@product-auto-label product-auto-label bot added the api: bigquerystorage Issues related to the googleapis/java-bigquerystorage API. label Jul 23, 2024
@leahecole leahecole added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Oct 23, 2024