feat(experimental): authorship #292

paulrobertlloyd · 2023-11-12T22:10:13Z

Checklist

Added validation to any changes in the parser API.
Added tests covering the parsing behaviour changes.

Changes to parsing behaviour

The Authorship specification is an algorithm for discovering and determining the author of a post. This algorithm is especially useful when implementing Webmention and wanting to infer the authorship of individual posts.

This PR is the start of a discussion about adding this algorithm, either to this project, or somewhere else in the Microformats + Node/JavaScript ecosystem.

Authorship discovery is currently an open feature request on mf2tojf2 (see getindiekit/mf2tojf2#22). It would be useful when looking to develop downstream libraries such as webmention-handler (a project which currently implements it’s own authorship normalisation).

There are a few options where authorship discovery and parsing could live:

Within this package, enabled via an experimental option (this PR currently assumes this option),
Within this package, provided as a separate and discrete function,
Within or as another package (mf2utiljs covers this use case, but so far there has been no response to a request for community involvement),
Within mf2tojf2 and other downstream JF2 packages,
Within webmention-handler and other downstream packages that need authorship discovery.

Example input covered by new behaviour

Tests have been added with example HTML, largely adapted from the validation examples provided on authorship.rocks.

Example output from new behaviour

Tests have been added with expected mf2 output. Currently these tests fail, but expectedly as the required discovery and parsing is not performed, even with the experimental authorship flag enabled.

Other changes

Besides tests, and adding an authorship option to the experimental flags, I’ve added the src/helpers/authorship.ts helper, whose findAuthor¹ function is exported and used in src/microformats/parse.ts, where I think it might be called. Again, this is assuming this functionality is added as an experimental option.

I’ve added comments that reflect my best understanding of the algorithm, and where code might live to implement it.

One reason why such parsing might want to be provided as a discrete function, is that later stages of the discovery algorithm require fetching remote content and parsing that to find a representative h-card. That would require making the entire function asynchronous.

Anything else

‘I don't know TypeScript or my tests won't pass’ is a pretty good summary of my weekend.

I’ll try to refrain from adding a long and perhaps unhelpful rant about TypeScript, except to say that I’ve found it immensely difficult and frustrating trying to put together a useful enough PR, and have instead spent most of my time trying to understand various TypeScript errors and behaviours.

Perhaps a discussion for a separate PR, but I’d love to see this project move to model in which JSDoc type annotations are used (and exported as type definitions). Not only would this make it easier for people without TypeScript knowledge to contribute, it would remove the need for a build step and make the codebase more future proof and portable.</rant>

This function is partly based on find_author provided by mf2utiljs ↩

aimee-gm · 2023-11-14T08:37:53Z

src/helpers/authorship.ts

+      result.url = urls[0];
+    }
+  } else if (hCard) {
+    if (URL.canParse(String(hCard))) {


I'm not sure we can use URL.canParse() - see https://developer.mozilla.org/en-US/docs/Web/API/URL/canParse_static#browser_compatibility

Ah, think I confused myself with another Node-only project supporting Node v20. I’ll create a utility function that achieves similar.

aimee-gm · 2023-11-14T09:22:20Z

@paulrobertlloyd thanks for taking the time to put this proposal together. As an experimental option, provided the implementation is faily lightweight (e.g. no additional dependencies), I'm happy to include it with this package.

I really appreciate the tests, this is the most important thing to get in so we can work towards the feature working. It's been a great help in sorting out the issues you've experienced.

I've managed to get the tests working, and did a little refactoring. These are all available in this commit: 50aa892. I've had to comment out a step or two as they were not needed to get to this point.

If you're happy to continue, it should be in a state where you can work on the last 3 tests that are failing.

As for TypeScript - it's far from uncommon for open source javascript packages to be written in TypeScript - take parse5, which is the only dependency this package uses, and rollup, which we use to bundle this, are themselves built with TypeScript.

I do appreciate that it may make it easier for some people to contribute if I was to drop TypeScript, but on the other hand it would also make it significantly harder for me to maintain. I am actively maintaining this, so I am around to help out if anyone needs any help contributing.

paulrobertlloyd · 2023-11-14T13:49:42Z

Thanks for reviewing and refactoring this PR. Do you know how I can add your commit to my branch? I’m guessing using git cherry-pick, but I’ve never done that across forks before.

On getting those last 3 tests working, as mentioned, these will require an asynchronous fetch, which would in-turn require the entire function to be awaited, for example:

const parsed = await mf2('<a class="h-card" href="https://another.example" rel="me">Jimmy</a>', {
  baseUrl: "http://example.com/",
  experimental: {
    authorhship: true
  }
});

I’m not sure you could create a function that is only asynchronous if certain options are enabled. What do you believe is the right way to include this optional fetch functionality? Are we happy for the API to be changed in this way (which would be a breaking change)?

Also, with regards to fetch, are you happy to only support native fetch (which would require the minimum Node version being bumped to v18+, and this experimental feature not working in legacy browsers)?

Zegnat · 2023-11-19T18:21:14Z

I have been thinking about this one for the last couple days since I saw this PR. The following are some random thoughts in no particular order.

I like the idea of being able to offer authorship at the same time as mf2 parsing. The Authorship discovery algorithm requires more context than just the parsed mf2 of a page, so correct authorship discovery can otherwise only be done by reparsing the entire HTML. Getting this in a single step is good. This is also the main argument for doing it here and not downstream, if you ask me.
I dislike the idea of introducing more implied properties to mf2. By putting the result of the Authorship discovery algorithm straight into an author property when no such property class is in the HTML seems weird to me. (Example: authorship-h-card-with-rel-author-to-rel-me.html from this PR.)
Could we bring this up a level to address the above? Put it on the microformat, next to properties instead of inside it? This is the solution that has been used for id as well as the experimental lang (and some other metadata that is being experimented with).
Do we need a new type for Author? The algorithm always results in either the native author property value or an h-card. Is there a specific use-case for changing the output from a microformat into something else? I feel like converting to jf2 or some other format is something that can be left up to the downstream consumer, but I might be missing the reasoning.
I am a big fan of using native fetch. When it comes to this being an experimental feature, how much would we need to care about backwards compatibility? Do we take that into consideration for experiments?

I’m not sure you could create a function that is only asynchronous if certain options are enabled.

You can. You could make mf2 return both ParsedDocument and Promise<ParsedDocument>, based on the experimental feature being turned on. Here is a quick POC in TypeScript Playground that will swap the return type based on a boolean flag. Though in this case the entire function just wraps whatever you pass into it for illustrative purposes:

type Wrapped<T> = { wrapped: T };
type Options = { baseUrl?: string, experimental?: { authorship?: boolean } };

function wrap<T>(input: T, options?: Options & { experimental?: { authorship?: false }}): Wrapped<T>;
function wrap<T>(input: T, options?: Options & { experimental?: { authorship: true }}): Promise<Wrapped<T>>;
function wrap<T>(input: T, options?: Options) {
    const output = { wrapped: input };
    return options?.experimental?.authorship ? Promise.resolve(output) : output;
}

const justAString = wrap("hi");
//    ^?
const justAPromiseString = wrap("hi", { experimental: { authorship: true } } );
//    ^?

The ^? syntax is a special comment that makes the playground write out the inferred type values:

With that said I am not sure how good of an idea that is in reality.

aimee-gm · 2023-11-20T09:37:31Z

I will comment about the thought I had when writing this parser. It was intentionally syncronous, and didn't provide any fetch abilities, such as passing in a URL and returning a promise that resolves to the parsed microformats.

Using a TypeScript overload is possible, as pointed out, but I think this would be confusing. If you're not using TypeScript, you can get a promise, or not, returned based on an experimental feature toggle.

If this library supports it, I would look to perhaps have it in it's own exported function, that returns a promise and that's explicitly set up for this.

As for if authorship belongs in the same method as the parsing, I can't comment - with this I am very happy following spec or what is the community consensus for what it should be.

paulrobertlloyd added 5 commits November 12, 2023 16:59

docs: fix typo in parse.ts comment

e635f4f

feat(experimental): add support for authorship (option flag)

bc9d8d3

feat(experimental): add support for authorship (demo option)

d87355c

feat(experimental): add support for authorship (test suite)

c01370d

feat(experimental): add support for authorship (wip)

d190378

paulrobertlloyd mentioned this pull request Nov 12, 2023

Implement authorship spec getindiekit/mf2tojf2#22

Open

aimee-gm reviewed Nov 14, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(experimental): authorship #292

feat(experimental): authorship #292

paulrobertlloyd commented Nov 12, 2023 •

edited

Loading

aimee-gm Nov 14, 2023

paulrobertlloyd Nov 14, 2023

aimee-gm commented Nov 14, 2023

paulrobertlloyd commented Nov 14, 2023

Zegnat commented Nov 19, 2023

aimee-gm commented Nov 20, 2023 •

edited

Loading

feat(experimental): authorship #292

Are you sure you want to change the base?

feat(experimental): authorship #292

Conversation

paulrobertlloyd commented Nov 12, 2023 • edited Loading

Footnotes

aimee-gm Nov 14, 2023

Choose a reason for hiding this comment

paulrobertlloyd Nov 14, 2023

Choose a reason for hiding this comment

aimee-gm commented Nov 14, 2023

paulrobertlloyd commented Nov 14, 2023

Zegnat commented Nov 19, 2023

aimee-gm commented Nov 20, 2023 • edited Loading

paulrobertlloyd commented Nov 12, 2023 •

edited

Loading

aimee-gm commented Nov 20, 2023 •

edited

Loading