Washington Post #74

Open
infinitebuffalo opened this issue Nov 10, 2021 · 1 comment

Comments

@infinitebuffalo

The Washington Post often posts recipes in two different formats. It appears that org-chef chokes on one, but almost-but-not-quite manages the other.

First, the food section: URLs from this section seem to fail with the error message "org-capture: Capture template ‘c’: Template is not a valid Org entry or tree" (examples: Pecan Tassies, Misir Wot).

(Side note: Is it possible to get a better error message that makes it clearer that the page couldn't be parsed? I've recently been mucking about in my .emacs and thought I'd somehow messed up the template or something there...)

Second, the recipes section: URLs from this section (the same examples: Pecan Tassies, Misir Wot) seem to work... mostly. However, there are two small problems with the parsing:

  • It splits every sentence into its own step, dropping the period along the way.
  • It doubles the step number for each step.

[image: screenshot attached to the original issue]

(Just to confirm this isn't an issue across org-chef generally, I tried a recipe from AllRecipes and it was as expected, with multiple-sentence steps and only one numeral per step.)

@egh
Contributor

egh commented Jan 15, 2022

For the first two examples (the "food" section URLs), the problem is that the pages contain incomplete recipe JSON-LD. The error message could be better, but this can't really be fixed without a custom parser for the Washington Post recipe site.
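
As a rough illustration of what a clearer error might look like, here is a minimal sketch in Python (not org-chef's actual Emacs Lisp code). It assumes the page's JSON-LD is a single Recipe object; the `REQUIRED_FIELDS` list, the `missing_recipe_fields` helper, and the sample data are all hypothetical and only stand in for whatever the capture template really needs.

```python
import json

# Hypothetical list of fields a capture template would need.
# This is an assumption for illustration, not org-chef's real requirement.
REQUIRED_FIELDS = ("name", "recipeIngredient", "recipeInstructions")


def missing_recipe_fields(json_ld_text):
    """Return the required fields that are absent or empty in a Recipe JSON-LD object."""
    data = json.loads(json_ld_text)
    return [field for field in REQUIRED_FIELDS if not data.get(field)]


# Made-up sample of an incomplete recipe, not copied from the Washington Post.
incomplete = '{"@type": "Recipe", "name": "Pecan Tassies"}'

missing = missing_recipe_fields(incomplete)
if missing:
    # Report the parse failure directly instead of failing later in the template.
    print("Could not parse recipe: JSON-LD is missing " + ", ".join(missing))
```

Checking for missing fields before filling in the template would let the failure be reported as a parsing problem rather than as an invalid Org entry.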

For the second problem, the issue is with the source material: the Washington Post site itself formats the JSON-LD like that.

[image: screenshot attached to the original comment]

The initial number could be removed pretty easily, I suppose, by stripping out initial numbers, but there's not much to do about breaking the recipe into too many steps.
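
A minimal sketch of the number-stripping idea, written in Python for illustration rather than in org-chef's Emacs Lisp. The exact shape of the leading numbers in the Washington Post data is not shown in this thread, so the pattern below (digits, an optional `.` or `)`, then whitespace) is an assumption, and `strip_leading_number` is a hypothetical helper name.

```python
import re

# Assumed shape of the unwanted prefix: digits, optional "." or ")", then whitespace.
LEADING_NUMBER = re.compile(r"^\s*\d+[.)]?\s+")


def strip_leading_number(step):
    """Remove one leading step number from an instruction string, if present."""
    return LEADING_NUMBER.sub("", step, count=1)


# Made-up sample steps, not taken from a real recipe page.
steps = ["1. Heat the oven", "2 Mix the filling", "Stir for 2 minutes"]
cleaned = [strip_leading_number(step) for step in steps]
print(cleaned)
```

One caveat with this approach: a step that legitimately begins with a quantity (for example "2 cups flour ...") would lose its first number too, so the stripping would probably need to be limited to the Washington Post parser.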
