Add logic to normalize comma-delimited decimals #69
run test
A few minutes ago, I realized the way I was trying to distinguish comma-delimited decimals from comma-delimited thousands wasn't gonna work. So I came here to mark this WIP, and I now see that commit didn't make it to the branch in any case 😳
create function for both extract_number and extract_numbers to call
Alternate decimal points now specified with function parameter
It occurs to me that this argument will almost always be used to parse Should this PR address that? If so, should it be an additional keyword, Although the function signatures may become bloated, I'm partial toward two keywords. Standards exist other than full stops or commas for decimal separators. Indeed, it might be worth handling spaces, if specifically indicated in the function call, but that's another layer of complexity, probably involving a while loop.
In a computer, a decimal number is always represented with a . regardless of language. I don't think we will ever get any STT transcription where this isn't the case. It's still good to think about this; chat usage is also important!
Exactly. This PR is so that people can parse data written the opposite way. @TheLastProject asked and shall receive.
Alright, here we go, now that I'm free and at a desktop. Say you wanted TTS to read lines from a document, containing yearly something, and you knew the file's formatting: With this PR, you could do:

>>> foo = parse.extract_numbers("1942: 4,5".replace(":",""), decimal=",")
>>> foo
[1942.0, 4.5]
>>> d = format.nice_year(datetime(year=int(foo[0]), month=1, day=1))
>>> d
'nineteen forty two'
>>> bar = d + ", " + format.pronounce_number(foo[1])
>>> bar
'nineteen forty two, four point five'

Pass that along to TTS and you've got some nice behavior.
rebase of MycroftAI#69
rebase of MycroftAI#69 Co-authored-by: jarbasal <jarbasai@mailfence.com>
rebase of MycroftAI#69 Co-authored-by: jarbasal <jarbasai@mailfence.com>
port lingua_nostra/pull/20 - support decimal markers rebase of MycroftAI#69 Co-authored-by: jarbasal <jarbasai@mailfence.com>
feat/normalize_decimals port lingua_nostra/pull/20 - support decimal markers rebase of MycroftAI#69 Co-authored-by: jarbasal <jarbasai@mailfence.com>
Closes #65
All languages, including English, will now normalize 34,5 to 34.5 before beginning to extract numbers.
Decimal markers can now be specified through extract_number() and extract_numbers() function calls, using a new parameter decimal='.'
Note that these functions will only normalize decimals if they are called as such. Individual parsers, such as extractnumber_es(), have not been modified in any way, but will produce correct output when called via extract_number(lang='es', decimal=',')
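To illustrate the "normalize first, then hand off to an unmodified parser" design described above, here is a minimal sketch. It is not the code from this PR: `extract_numbers_sketch` and `_parse_dot_only` are hypothetical names, and the stub parser is only a stand-in for a per-language parser that understands full-stop decimals.

```python
import re


def _parse_dot_only(text):
    """Hypothetical stand-in for an unmodified parser: only '.' is a decimal point."""
    return [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", text)]


def extract_numbers_sketch(text, decimal="."):
    """Normalize the caller-specified decimal marker, then delegate to the parser."""
    if decimal != ".":
        # Rewrite digit<marker>digit as digit.digit; everything else is left alone.
        marker = re.escape(decimal)
        text = re.sub(r"\b(\d+)" + marker + r"(\d+)\b", r"\1.\2", text)
    return _parse_dot_only(text)


print(extract_numbers_sketch("1942 4,5", decimal=","))  # [1942.0, 4.5]
print(extract_numbers_sketch("1942 4,5"))               # [1942.0, 4.0, 5.0]
```

Without the `decimal` argument the comma is treated as a separator, which is why the parser itself never needs to change.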
For those who don't speak regex, though I encourage you to run the regex through a regex tester, it means this:
\b\d+,{1}\d+\b
\b a word boundary
\d+ one or more digits, followed by
,{1} exactly one comma, followed by
\d+ one or more digits
\b another word boundary
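For anyone who would rather read the pattern in Python, the same regex can be written in verbose mode with one token per line. This is a sketch only: the capture groups and the `normalize_comma_decimals` helper are additions for illustration and are not taken from the PR.

```python
import re

COMMA_DECIMAL = re.compile(
    r"""
    \b        # a word boundary
    (\d+)     # one or more digits (captured for the replacement)
    ,{1}      # exactly one comma
    (\d+)     # one or more digits (captured for the replacement)
    \b        # another word boundary
    """,
    re.VERBOSE,
)


def normalize_comma_decimals(text):
    """Hypothetical helper: swap the comma for a full stop in each match."""
    return COMMA_DECIMAL.sub(r"\1.\2", text)


print(normalize_comma_decimals("34,5"))   # 34.5
print(normalize_comma_decimals("3, 4"))   # 3, 4  (the space prevents a match)
print(normalize_comma_decimals("1,000"))  # 1.000
```

The last call shows the ambiguity raised earlier in the conversation: the pattern alone cannot tell a comma-delimited decimal from comma-delimited thousands, so the normalization should only run when the caller asks for it.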