Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potential fix for boolean elem alteration #929

Closed
wants to merge 5 commits into from

Conversation

jlloyd-widen
Copy link
Contributor

I have run into an error/bug where an element from a row is "false" and the schema for the field is ['null', 'string', 'boolean']. When this happens the is_boolean_type logic will trigger from line 219 of _typing.py but the if/then logic ends up executing lines 225 and 226 which changes the value from the original "false" to True. Obviously no bueno!

I'm not necessarily dead set on this being the solution. It's just the easiest one to implement for my particular case. Happy to take feedback or suggestions.

@codecov
Copy link

codecov bot commented Aug 26, 2022

Codecov Report

Merging #929 (beac08d) into main (0b2fe9e) will decrease coverage by 6.50%.
The diff coverage is 0.00%.

❗ Current head beac08d differs from pull request most recent head a040acb. Consider uploading reports for the commit a040acb to get more accurate results

@@            Coverage Diff             @@
##             main     #929      +/-   ##
==========================================
- Coverage   86.97%   80.48%   -6.50%     
==========================================
  Files          59       35      -24     
  Lines        5091     3484    -1607     
  Branches      825      696     -129     
==========================================
- Hits         4428     2804    -1624     
- Misses        466      507      +41     
+ Partials      197      173      -24     
Files Changed Coverage Δ
singer_sdk/helpers/_typing.py 45.25% <0.00%> (-25.23%) ⬇️

... and 61 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -220,7 +220,7 @@ def conform_record_data_types( # noqa: C901
boolean_representation: Optional[bool]
if elem is None:
boolean_representation = None
elif elem == 0:
elif elem in [0, 'false']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have support for alternate spellings / casing?

Suggested change
elif elem in [0, 'false']:
elif elem in [0, 'false', 'False', 'FALSE']:

Copy link
Contributor

@aaronsteers aaronsteers Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut says stringification of arbitrary values could have a minor perf impact. What about putting the string check behind a type check?

Suggested change
elif elem in [0, 'false']:
elif elem == 0:
boolean_representation = False
elif isinstance(elem, str) and elem == 'false':
boolean_representation = False

@edgarrmondragon - thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronsteers 👍 and I like @jlloyd-widen's iteration over this to make it case insensitive

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I just did some quick benchmarks and isinstance seems to cause execution of this block to more than double (129.7707ns vs 289.6716ns on my machine).

Copy link
Contributor

@aaronsteers aaronsteers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my side. I asked about alternate spellings, but nothing too much at issue there.

@@ -220,7 +220,7 @@ def conform_record_data_types( # noqa: C901
boolean_representation: Optional[bool]
if elem is None:
boolean_representation = None
elif elem == 0:
elif elem in [0, 'false']:
Copy link
Contributor

@aaronsteers aaronsteers Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut says stringification of arbitrary values could have a minor perf impact. What about putting the string check behind a type check?

Suggested change
elif elem in [0, 'false']:
elif elem == 0:
boolean_representation = False
elif isinstance(elem, str) and elem == 'false':
boolean_representation = False

@edgarrmondragon - thoughts?

@jlloyd-widen
Copy link
Contributor Author

@aaronsteers Good ideas. I just pushed a new commit that's a bit more comprehensive. Let me know if you or @edgarrmondragon have more thoughts

@@ -222,6 +222,9 @@ def conform_record_data_types( # noqa: C901
boolean_representation = None
elif elem == 0:
boolean_representation = False
elif isinstance(elem, str):
if elem.lower() in ['false', 'f']:
boolean_representation = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlloyd-widen what happens if the value is a string but not one of the checked options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it would force it to true ... so that's not good either

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlloyd-widen even if True was an acceptable value, it seems to me that boolean_representation wouldn't have a value because there's no else here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgarrmondragon just added a new commit. I tried to make the logic inclusive of true scenarios. Where the input element isn't obviously true or false, I pass it along as the final property value unaltered.

@tayloramurphy
Copy link
Collaborator

@jlloyd-widen this is a pretty old PR at this point. Do you still want to try and get this merged? cc @edgarrmondragon

@jlloyd-widen
Copy link
Contributor Author

I just synced the most recent code with this branch and found that the strategy now being used for this logic is significantly different than when I originally submitted the PR. So I'm going to assume this is no longer needed and close this PR.

@jlloyd-widen jlloyd-widen deleted the fix-boolean-typing-error branch September 5, 2023 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants