Tests fail due to apparent duplicate rows in apple_store__subscription_report #15
Comments
Hi @casparwylie, thank you for raising this issue. Did Fivetran support say why these duplicates are not a mistake? I was also looking at a past issue, #12, and saw that another customer experienced a similar error due to the territory_name being slightly different. Would you be able to share a few of these duplicate records so we can see whether they have differences that we could correct within the package? If they do not differ, I would encourage reaching out to Apple to understand why there are duplicates in the source data. I do not believe there should be duplicate subscription report entries in the raw data. That seems like a data integrity issue that this failed test is appropriately flagging. |
To clarify, the rows are still unique by the fivetran/meta fields, just often not by We are seeing plenty of duplicate rows where every column except the meta columns (e.g @fivetran-markgaughran am I right in saying from your end, the duplicates are expected? |
Hi @fivetran-joemarkiewicz @casparwylie , duplicates do not exist in the |
Thanks for adding context @casparwylie and @fivetran-markgaughran. @casparwylie would you be able to share an example of a duplicate in the |
I'm not sure why, but the tests are now passing, likely due to a new historic sync. I now can't find examples other than what I described above! I'm closing the issue. Thank you both. |
Apologies, but the issue has now resurfaced. Here are 2 fresh examples in JSON result format given the query
(in total there are
Let me know your thoughts. Thank you. |
@fivetran-joemarkiewicz Hey - just wondering if any updates on this! Thanks. |
Hi @casparwylie I am sorry to see that the issue has resurfaced. Would you be able to share the |
I've included the query that fetched all the rows (and hidden some more sensitive properties) in the previous comment. Are there any columns in particular you'd be keen to see? |
Yeah I am wondering if there are any columns where you saw the rows were not unique? If they are sensitive no need to share, but I am curious if rows were duplicates across every single field? Additionally, it would be worthwhile to check the source again and make sure these duplicates don't exist there. |
Yea so we are seeing plenty of duplicate rows where every column except the meta columns (e.g _index) are the same in |
@casparwylie thank you for sharing! The insight into the However, I am still struggling with the duplicates in the source that are only not duplicates due to the Fivetran metadata columns. Would you be interested in meeting sometime this week for my team and I to review these live with you and determine the best approach forward? |
My team aren't going to be looking at this anymore so probably not necessary - the first issue mentioned is probably the main one though in case you're keen to look into it further! Thanks anyway. |
Is there an existing issue for this?
Describe the issue
The test case `unique_combination_of_columns` in the `apple_store__subscription_report` table fails because the data does seem to have duplicates (where multiple rows have the same `date_day`, `account_id`, `app_id`, `subscription_name`, `territory_long`, `state`).
Looking into it further, the duplicates seem to exist in the transformed data from `app_store.sales_subscription_event_summary` (though the Fivetran assigned Primary keys are still all distinct).
I raised a ticket with Fivetran support who have said the duplicates are not a mistake, meaning I believe the test in the DBT package should change.
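For anyone wanting to reproduce the check outside dbt, here is a minimal sketch of the kind of query that surfaces the offending rows: group by the six columns listed above and keep the groups with more than one row. It runs against an in-memory SQLite table purely so it is self-contained; the sample rows and the `some_metric` column are made up for illustration and are not taken from the real report, and on BigQuery you would run the equivalent `GROUP BY ... HAVING` against your own dataset.

```python
import sqlite3

# Columns the failing test treats as a unique key (as listed in the issue).
KEY_COLUMNS = [
    "date_day",
    "account_id",
    "app_id",
    "subscription_name",
    "territory_long",
    "state",
]


def find_duplicate_keys(conn, table="apple_store__subscription_report"):
    """Return one row per key combination that appears more than once.

    Each result row holds the six key values followed by the row count.
    """
    cols = ", ".join(KEY_COLUMNS)
    query = (
        f"SELECT {cols}, COUNT(*) AS row_count "
        f"FROM {table} "
        f"GROUP BY {cols} "
        f"HAVING COUNT(*) > 1 "
        f"ORDER BY row_count DESC"
    )
    return conn.execute(query).fetchall()


# Hypothetical sample data: two rows share the same key, one does not.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE apple_store__subscription_report "
    f"({', '.join(KEY_COLUMNS)}, some_metric)"  # some_metric is a placeholder
)
sample_rows = [
    ("2023-05-01", 1, 100, "Premium Monthly", "United Kingdom", "Standard Price", 5),
    ("2023-05-01", 1, 100, "Premium Monthly", "United Kingdom", "Standard Price", 5),
    ("2023-05-01", 1, 100, "Premium Monthly", "France", "Standard Price", 2),
]
conn.executemany(
    "INSERT INTO apple_store__subscription_report VALUES (?, ?, ?, ?, ?, ?, ?)",
    sample_rows,
)

duplicates = find_duplicate_keys(conn)
for row in duplicates:
    print(row)
```

Running the same grouping against `app_store.sales_subscription_event_summary` (with that table's own column names, which may differ from the ones above) should show whether the duplicates already exist upstream of the package.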
Relevant error log or model output
No response
Expected behavior
Tests should pass regardless of duplicates for the listed fields.
dbt Project configurations
N/A
Package versions
0.3.1
What database are you using dbt with?
bigquery
dbt Version
1.5.0
Additional Context
No response
Are you willing to open a PR to help address this issue?