Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingredient amounts given as intervals are not treated properly #2234

Open
mkaut opened this issue Apr 7, 2024 · 8 comments
Open

ingredient amounts given as intervals are not treated properly #2234

mkaut opened this issue Apr 7, 2024 · 8 comments
Labels
bug Something isn't working Frontend Issue or PR related to the frontend code

Comments

@mkaut
Copy link

mkaut commented Apr 7, 2024

In many recipes, amounts of some ingredients are specified as an interval, such as "4-5 eggs".
If I do this in the Cookbook, the upper bound is completely ignored, i.e., it shows as "4 eggs".
I tested this with both "-" and "–", the result is the same.

Even worse, if I use spaces around the dash ("4 - 5 eggs"), the "- 5" part is interpreted as part of the name of the ingredients, so if I ask for twice the number of portions, it would say "8 - 5 eggs".

Is this a bug, or is there some different way to specify imprecise amounts that I did not think of?

@mkaut mkaut added the bug Something isn't working label Apr 7, 2024
@stefan123t
Copy link

stefan123t commented Nov 26, 2024

I still have this same issue with the following ingredient given with ranges:

1-2 EL Zitronen- oder Rote-Beete-Saft

Error message is:

(i) The ingredient cannot be recalculated due to incorrect syntax.
Please ensure the syntax follows this format:
amount unit ingredient
and that a specific number of portions is set for this function to work correctly.
Examples: 200 g carrots or 1 pinch of salt.

Similar to common n/m fractions #2004
and unicode ½ fractions #2498
Nextcloud Cookbook should also support 1-2 ranges
and uncounted ingredients / without an amount #2233.

This can be eventually solved together with the alternative ingredients 1 fresh or {3} dried leaves #2581

@christianlupus
Copy link
Collaborator

@j0hannesr0th here is some problem with the indredient recalculator. Can you give your advice here?

@christianlupus christianlupus added the Frontend Issue or PR related to the frontend code label Dec 2, 2024
@j0hannesr0th
Copy link
Contributor

Hi @christianlupus

Good question! I don't have a solution for this, but here are some thoughts:

  • If you've made a recipe and know whether it's better with 1 or 2 EL Zitronen- oder Rote-Beete-Saft, just remove the other number, and you're good to go.
  • No matter how many edge cases we resolve, users will always find more:
    • 6 servings = ⅞-1 pt 3 Mile Island Iced Tea or ½ fl oz 7 and 7
    • 10 servings = 1.46-1.67 pt 3 Mile Island Iced Tea or 0.83 fl oz 7 and 7
    • 12 servings = 1.75-2 pt 3 Mile Island Iced Tea or 1 fl oz 7 and 7
    • I can calculate it by hand, but the regex for this would be really ugly.
  • Where do you want to draw the line on what’s supported and what’s not?

@stefan123t
Copy link

stefan123t commented Dec 3, 2024

I can calculate it by hand, but the regex for this would be really ugly.

@j0hannesr0th your comment made me smile and thus got me interested into the challenge 😺

I have been checking the current regex expressions and tried to yield something more general:

/*
The ingredientFractionRegExp is used to identify fractions in the string.
This is used to exclude strings that contain fractions from being valid.
*/
const fractionRegExp = /^((\d+\s+)?(?:\p{No}|(\d+)\s*\/\s*(\d+))).*/u;
function isValidIngredientSyntax(ingredient) {
/*
The ingredientSyntaxRegExp checks whether the ingredient string starts with a fraction,
or a whole part and fraction, or a decimal.
It may optionally have a unit but must be proceeded by a single whitespace and further text.
*/
const ingredientSyntaxRegExp =
/^(?:(?:\d+\s)?(?:\d+\/\d+|\p{No})|\d+(?:\.\d+)?)[a-zA-z]*\s.*$/;
/*
The ingredientMultipleSeperatorsRegExp is used to check whether the string contains
more than one separator (.,) after a number. This is used to exclude strings that
contain more than one separator from being valid.
*/
const ingredientMultipleSeparatorsRegExp = /^-?\d+(?:[.,]\d+){2,}.*/;
/*
startsWithDoubleHashRegExp checks if the ingredient string begins with "## " followed by any characters.
*/
const startsWithDoubleHashRegExp = /^## .+$/;
return (
startsWithDoubleHashRegExp.test(ingredient) ||
(ingredientSyntaxRegExp.test(ingredient) &&
!ingredientMultipleSeparatorsRegExp.test(ingredient))
);
}

I noticed that A-z in [a-zA-z] matches some overlapping character ranges/classes (including [\]^_ and backtick `),
it might therefor be better to use \p{L} instead:

  • a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
  • A-z matches a single character in the range between A (index 65) and z (index 122) (case sensitive)

Actually I am unsure whether it will work with Latin-1 characters like in french é, etc. ?

I could come up with the following regex that should match a basic grammar for <amount> ( fractions, numerator(s) / dividend(s) and/or range(s) ) with <unit>, <ingredient> and <optional> / <alternative> suffixes as suggested in #2581:

^(?<amount>(?<range_from>(?<numerator_from>\d+(?:\.(?:\d+))?|\p{N}+)(?:\/(?<dividend_from>\d+(?:\.(?:\d+))?|\p{N}+))?)(?:(?:-)(?<range_to>(?<numerator_to>\d+(?:\.(?:\d+))?|\p{N}+)(?:\/(?<dividend_to>\d+(?:\.(?:\d+))?|\p{N}+))?))?)(?:\s)(?<unit>\p{L}*)\s(?<ingredient>.*?)(?:\s(?<option>\(optional\)))?(?:\s(?<alternative>(?:or|oder|ou)))?$

You can test it on the below regex101 instance against the following set of examples:
https://regex101.com/r/2dAXUG/4

1 pt 3 Mile Island Iced Tea
1 fl oz 7 and 7 or
½ fl oz 7 and 7
1.46 pt 3 Mile Island Iced Tea
0.83 fl oz 7 and 7
⅞ pt 3 Mile Island Iced Tea or
1.3/2.4 things to do
1.3-2.4 things to do
⁹⁹/₁₀₀ pt Fractions
⅞-1 pt 3 Mile Island Iced Tea or
1.46-1.67 pt 3 Mile Island Iced Tea or
1.75-2 pt 3 Mile Island Iced Tea or
²/3 pt 3 Mile Island Iced Tea
3/₄ pt 3 Mile Island Iced Tea
²/3-3/₄ pt 3 Mile Island Iced Tea
1 ingredient fresh (optional) or
3-7 ingredient dried
5 other things or
3 other stuff or
1 other jiffy
7/8 Prisen Salz oder
⁷⁷⁄₇₈ Prisen Salz oder
1 Prise Salz
1 ingredient fresh or
3-5 ingredient dried
5 other things or
3 other stuff or
1 other jiffy
1 ingredient fresh (optional) or
3 ingredient dried (optional)
5 other things (optional) or
3 other stuff (optional)

@j0hannesr0th
Copy link
Contributor

@stefan123t as I said: not impossible but kind of ugly but not that easy to understand.

And a few weeks later some other edge case, which we haven't considered, comes up.

For me it's fine if you want to commit your code.

@christianlupus
Copy link
Collaborator

Yeah, no one will be able to read this as suggested nor maintain in case of bugs/problems.

Can't we split up the RegEx into building blocks and define something like with BNF typically done? I am thinking along this lines:

const reInteger = '([0-9]+)'
const reFloat = '([0-9]*\\.[0-9]+)'
const reNumber = `(${reInteger}|${reFloat})`
// ...
const re = new Regex(...)

The problem with is kind of parsing is that there are many groups and you have to be super careful to not get caught in assembling the actual numeric value.
Also, I am thinking if it makes sense to define various functions that try to parse a certain structure (e.g. only decimal numbers) and will throw a defined exception or return a defined value to indicate that parsing failed. Then, one could just iterate a list of functions, try each entry and return the first matching result. If no result was found, parsing actually failed. This would allow to give human readable names on the functions and also keep the complexity at bay. This can be combined with the RegEx building blocks as defined above.

Do you think this was resulting in a cleaner structure and more maintainability?

@stefan123t
Copy link

@christianlupus splitting the rather unwieldy RegEx I construed into BNF form would be a really good idea
as it also allows to reuse and eventually even optimize some of the building blocks!

Do you think the named capture groups (?<name>...) I provided are generally useful,
or would you rather do away with them and use the anonymous capture groups (?:...)
which are assigned to speaking / named variables here anyway ?

/*
The ingredientFractionRegExp is used to identify fractions in the string.
This is used to exclude strings that contain fractions from being valid.
*/
const fractionRegExp = /^((\d+\s+)?(?:\p{No}|(\d+)\s*\/\s*(\d+))).*/u;

const matches = ingredient.match(fractionRegExp);
// Fraction
if (matches) {
const [
,
fractionMatch,
wholeNumberPartRaw,
numeratorRaw,
denominatorRaw,
] = matches;

Note: I have not been able to include the or \x{2044} for #2498 as in ⁷⁷⁄₇₈ Prisen Salz or ⁷⁷⁄₇₈ Prisen Salz into the RegEx yet.

@Olen
Copy link

Olen commented Jan 9, 2025

Could it make sense to split the ingredient-fied into two parts - amount/volume/whatever and ingredient?
This would allow for simpler parsing of the amount-field, and you would avoid all the warnings in the recipes when there are lines with e.g. "Salt to taste" or "Basil for garnish" etc.

Simple javascript could use a similar regexp as today when you read the recipe from JSON or paste data into the ingredient field, to auto-fill the "amount" field - which can then be manually modified if the parser is incorrect somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Frontend Issue or PR related to the frontend code
Projects
None yet
Development

No branches or pull requests

5 participants