-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
potential fix for boolean elem alteration #929
Conversation
Codecov Report
@@ Coverage Diff @@
## main #929 +/- ##
==========================================
- Coverage 86.97% 80.48% -6.50%
==========================================
Files 59 35 -24
Lines 5091 3484 -1607
Branches 825 696 -129
==========================================
- Hits 4428 2804 -1624
- Misses 466 507 +41
+ Partials 197 173 -24
... and 61 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
singer_sdk/helpers/_typing.py
Outdated
@@ -220,7 +220,7 @@ def conform_record_data_types( # noqa: C901 | |||
boolean_representation: Optional[bool] | |||
if elem is None: | |||
boolean_representation = None | |||
elif elem == 0: | |||
elif elem in [0, 'false']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to have support for alternate spellings / casing?
elif elem in [0, 'false']: | |
elif elem in [0, 'false', 'False', 'FALSE']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut says stringification of arbitrary values could have a minor perf impact. What about putting the string check behind a type check?
elif elem in [0, 'false']: | |
elif elem == 0: | |
boolean_representation = False | |
elif isinstance(elem, str) and elem == 'false': | |
boolean_representation = False |
@edgarrmondragon - thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaronsteers 👍 and I like @jlloyd-widen's iteration over this to make it case insensitive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I just did some quick benchmarks and isinstance
seems to cause execution of this block to more than double (129.7707ns
vs 289.6716ns
on my machine).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from my side. I asked about alternate spellings, but nothing too much at issue there.
singer_sdk/helpers/_typing.py
Outdated
@@ -220,7 +220,7 @@ def conform_record_data_types( # noqa: C901 | |||
boolean_representation: Optional[bool] | |||
if elem is None: | |||
boolean_representation = None | |||
elif elem == 0: | |||
elif elem in [0, 'false']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut says stringification of arbitrary values could have a minor perf impact. What about putting the string check behind a type check?
elif elem in [0, 'false']: | |
elif elem == 0: | |
boolean_representation = False | |
elif isinstance(elem, str) and elem == 'false': | |
boolean_representation = False |
@edgarrmondragon - thoughts?
@aaronsteers Good ideas. I just pushed a new commit that's a bit more comprehensive. Let me know if you or @edgarrmondragon have more thoughts |
singer_sdk/helpers/_typing.py
Outdated
@@ -222,6 +222,9 @@ def conform_record_data_types( # noqa: C901 | |||
boolean_representation = None | |||
elif elem == 0: | |||
boolean_representation = False | |||
elif isinstance(elem, str): | |||
if elem.lower() in ['false', 'f']: | |||
boolean_representation = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlloyd-widen what happens if the value is a string but not one of the checked options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it would force it to true ... so that's not good either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlloyd-widen even if True
was an acceptable value, it seems to me that boolean_representation
wouldn't have a value because there's no else
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgarrmondragon just added a new commit. I tried to make the logic inclusive of true
scenarios. Where the input element isn't obviously true or false, I pass it along as the final property value unaltered.
@jlloyd-widen this is a pretty old PR at this point. Do you still want to try and get this merged? cc @edgarrmondragon |
I just synced the most recent code with this branch and found that the strategy now being used for this logic is significantly different than when I originally submitted the PR. So I'm going to assume this is no longer needed and close this PR. |
I have run into an error/bug where an element from a row is
"false"
and the schema for the field is['null', 'string', 'boolean']
. When this happens theis_boolean_type
logic will trigger from line 219 of_typing.py
but the if/then logic ends up executing lines 225 and 226 which changes the value from the original"false"
toTrue
. Obviously no bueno!I'm not necessarily dead set on this being the solution. It's just the easiest one to implement for my particular case. Happy to take feedback or suggestions.