Support metaformats fallback option #224

Closed
aciccarello opened this issue May 25, 2023 · 8 comments · Fixed by #229
Labels
enhancement New feature or request

Comments

@aciccarello
Contributor

Describe the feature

An experimental option should be available to use meta tags as a fallback for sites that don't natively implement microformats. The Metaformats spec defines changes to the microformats parsing algorithm to support this.

Adding an opt-in feature would allow using microformats as the vocabulary for understanding other sites while not interfering with pure microformats parsing.

Example of input

GitHub Issue: #39

Example of output

Here's the output of opengraph-mf2 which implements this general concept (though limited to OpenGraph).

{
    "type": [
        "h-entry"
    ],
    "properties": {
        "name": [
            "Provide JavaScript ES module export in distribution · Issue #39 · microformats/microformats-parser"
        ],
        "summary": [
            "Hi Aimee! Lovely to see a clean and modern implementation of a microformats parser; just what I need for my Micropub project! What type of feature is it? An additional option for importing this mod..."
        ],
        "featured": [
            "https://opengraph.githubassets.com/78fd25660d09eb046441b40894ba7dcc6668105efa381fdd8dd1cb3b2f5ba61e/microformats/microformats-parser/issues/39"
        ],
        "url": [
            "https://github.com/microformats/microformats-parser/issues/39"
        ]
    }
}
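A minimal sketch of how OpenGraph tags could be turned into an item of this shape. The `MetaTag` and `Mf2Item` types, the `openGraphToEntry` function, and the tag-to-property mapping are all assumptions inferred from the example output above; none of them are taken from opengraph-mf2 or microformats-parser.

```typescript
// Hypothetical shapes for illustration only.
interface MetaTag {
  property: string; // value of the tag's `property` attribute, e.g. "og:title"
  content: string; // value of the tag's `content` attribute
}

interface Mf2Item {
  type: string[];
  properties: Record<string, string[]>;
}

// Assumed mapping, inferred only from the example output above.
const OG_TO_MF2 = new Map<string, string>([
  ["og:title", "name"],
  ["og:description", "summary"],
  ["og:image", "featured"],
  ["og:url", "url"],
]);

// Build a single h-entry from the OpenGraph tags found in a document head.
function openGraphToEntry(tags: MetaTag[]): Mf2Item {
  const properties: Record<string, string[]> = {};
  for (const tag of tags) {
    const key = OG_TO_MF2.get(tag.property);
    if (key === undefined) continue; // ignore tags outside the assumed mapping
    if (properties[key] === undefined) properties[key] = [];
    properties[key].push(tag.content);
  }
  return { type: ["h-entry"], properties };
}
```

Every property value is an array, matching the mf2 JSON shape above, so a repeated tag simply appends another value.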

Additional context

This was prompted by getindiekit/mf2tojf2#20

More details found on the Indieweb Wiki.

@aciccarello aciccarello added the enhancement New feature or request label May 25, 2023
@aimee-gm
Member

aimee-gm commented Jul 9, 2023

Hi @aciccarello, thanks for the heads-up for this one.

I've started an implementation in #226. I've not made any real progress except to introduce the experimental option and a single test, which is failing because the feature isn't implemented yet.

It will take me quite a while to get my head back into this and to figure out how to include it, but at least it's on my radar.

Any help here would be appreciated, even if it's just more test cases 🙏

@aciccarello
Contributor Author

@aimee-gm I'm glad to hear you're open to this feature 🎉. As far as I know there haven't been many implementations of metaformats, so a few questions could come up during implementation. I took a look at the microformats-parser codebase earlier (which was impressively straightforward 👏) and think I've got an idea of where to go with this. I'll see if I can put some more code together.

@aciccarello
Contributor Author

I looked into this more today and there are a lot of potential edge cases. I've tried to add test cases on my fork for as many as I could.

To name a few:

  • h-* class on <html> avoids parsing metaformats
  • h-* class on <head> avoids parsing metaformats
  • Documents with both mf2 and metaformats
  • mf2 property classes on meta tags read from the content attribute
  • og overriding twitter classes
  • handling multiples of og: properties
  • handling different og:type values
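Two of these cases (og overriding twitter, and multiples of og: properties) could be handled roughly as sketched below. The `MetaTag` shape and the `valuesFor` helper are hypothetical names for illustration; this is not the logic used in the parser or required by the spec.

```typescript
// Hypothetical shape: `key` holds the tag's `property` or `name` attribute value.
interface MetaTag {
  key: string;
  content: string;
}

// Return every og: value for a field, in document order.
// Fall back to the twitter: equivalent only when no og: value exists.
function valuesFor(tags: MetaTag[], field: string): string[] {
  const og = tags.filter((t) => t.key === `og:${field}`).map((t) => t.content);
  if (og.length > 0) return og;
  return tags.filter((t) => t.key === `twitter:${field}`).map((t) => t.content);
}
```

Because the lookup filters the whole tag list rather than reading tags one at a time, the result does not depend on whether the og: or twitter: tags appear first in the document.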

When I looked at some actual pages with meta tags and at how to implement the spec, I had a lot of new questions about adjustments the spec may need:

  • Why consider twitter tags when most sites also have og values?
  • Why have an explicit meta[name=twitter:card] check if all meta[property^=og:] or meta[name^=twitter:] imply h-entry?
  • Should it support og:video:url in addition to og:video?
  • Should the parser fix usage of name="og:x" rather than property="og:x"?
  • Should article:tag, profile:first-name, and profile:last-name be part of the spec?
  • Why not <title> or <meta name="description"> tags?
  • Is including og:image as u-photo going to imply photo post for things like blog posts (if using post type discovery)?

I'll try to work out the spec questions in the indieweb chat to get clarification. I'll also continue looking at a possible implementation. My initial attempt ran into problems with needing to keep track of whether metaformats were enabled and how to combine potentially conflicting meta tags.

@aimee-gm
Member

The main thing to remember is that this is experimental - so for an initial pass, we can implement as much, or as little, as we'd like.

The main hurdles as I see it:

  • Getting <meta> tags to parse to something recognisable - as there is no "parent" that indicates microformats are hidden within
  • Once a h-* class is found on the <html> or <head> elements, meta tag parsing is not performed.

As for which I would attack first, I would consider just handling the og: use cases as a proof of concept.

Beyond that, it should just be implementation and interpretation of the spec.

As for an approach, there is a step where the document goes through a "setup" phase (picking out elements with IDs, handling templates, etc.), and then the microformats are parsed. I was wondering if we could add a step between these that checks whether metaformats need parsing and, if so, modifies the relevant elements to carry the necessary classes, fooling the parser into parsing them correctly.
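A rough sketch of what such an in-between step might look like, using a simplified element tree in place of the parser's real DOM nodes. The `El` type, the `applyMetaformatClasses` function, and the tag-to-class mapping are assumptions for illustration and do not reflect the code that was eventually merged.

```typescript
// Simplified stand-in for a DOM element; the real parser uses its own node types.
interface El {
  tag: string;
  attrs: { [name: string]: string | undefined };
  children: El[];
}

// Assumed mapping from OpenGraph properties to mf2 property classes.
const META_CLASSES = new Map<string, string>([
  ["og:title", "p-name"],
  ["og:description", "p-summary"],
  ["og:image", "u-photo"],
  ["og:url", "u-url"],
]);

// True when the element carries an h-* root class.
function hasRootClass(el: El): boolean {
  return (el.attrs["class"] ?? "").split(/\s+/).some((c) => /^h-[a-z]/.test(c));
}

function addClass(el: El, cls: string): void {
  const existing = el.attrs["class"];
  el.attrs["class"] = existing ? `${existing} ${cls}` : cls;
}

// Runs between setup and parsing: tags <meta> elements with mf2 classes and
// marks <head> as the h-entry root. Skipped when <html> or <head> already
// has an h-* class. Returns whether any element was modified.
function applyMetaformatClasses(html: El): boolean {
  const head = html.children.find((c) => c.tag === "head");
  if (!head || hasRootClass(html) || hasRootClass(head)) return false;
  let tagged = false;
  for (const child of head.children) {
    if (child.tag !== "meta") continue;
    const cls = META_CLASSES.get(child.attrs["property"] ?? "");
    if (cls !== undefined) {
      addClass(child, cls);
      tagged = true;
    }
  }
  if (tagged) addClass(head, "h-entry");
  return tagged;
}
```

Mutating the tree in place keeps the existing parsing code unchanged, which is the appeal of this approach; the cost is that the output no longer distinguishes native mf2 from classes added by this pass.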

@aciccarello
Contributor Author

That's a good point, we have some flexibility here. I like the idea of applying classes during a pre-parsing pass. My latest proof-of-concept was able to use that approach fairly well.

There are still some things to sort out. I think it would be good to indicate in the mf2 response that an item was parsed using metaformats. While parsing, it could be challenging to figure out the fallback values when tags are in a different order. Dependencies between tags could also make things more difficult.

@aciccarello
Contributor Author

aciccarello commented Jul 17, 2023

Had some conversations in the indieweb chat (starting on 2023-07-15). It sounds like there is consensus that metaformats should only be returned if there is no mf2 on the page. There are some concerns about badly formatted h-entry markup from WordPress templates causing problems, so some validation of content may be needed.
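That fallback rule could be expressed along these lines. The `chooseItems` function and the `experimental.metaformats` option name are assumptions made for this sketch, and it leaves out the content validation mentioned above.

```typescript
// Hypothetical shapes for illustration only.
interface Mf2Item {
  type: string[];
  properties: Record<string, unknown[]>;
}

interface ParseOptions {
  experimental?: { metaformats?: boolean }; // assumed option name
}

// Prefer native mf2 items. Return metaformats items only when the option is
// enabled and the page produced no mf2 items at all.
function chooseItems(
  mf2Items: Mf2Item[],
  metaItems: Mf2Item[],
  options: ParseOptions = {}
): Mf2Item[] {
  if (mf2Items.length > 0) return mf2Items;
  return options.experimental?.metaformats === true ? metaItems : [];
}
```

With the option left off, the result is always the plain mf2 items, so existing callers would see no change in behaviour.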

aciccarello added a commit to aciccarello/microformats-parser that referenced this issue Aug 4, 2023
aimee-gm added a commit that referenced this issue Sep 4, 2023
* feat(Experimental): add support for metaformats

* implement metaformats parsing

Closes #224

* chore(deps): update micoformats/test (#1)

should fix test ordering issue

---------

Co-authored-by: aimee-gm <12508200+aimee-gm@users.noreply.github.com>
@aimee-gm
Member

aimee-gm commented Sep 4, 2023

@aciccarello this is now merged and released as v1.5.0 🎉

@aciccarello
Contributor Author

Great! Thanks for taking a look!

2 participants