Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release types #50

Merged
merged 19 commits into from
Jul 8, 2024
Merged

Release types #50

merged 19 commits into from
Jul 8, 2024

Conversation

phw
Copy link
Collaborator

@phw phw commented Jul 1, 2024

This adds support for release group types, see #15. I'm not fully sure this is ready to go, there are some loose ends, so I open this as a draft.

Some notes:

  • For iTunes it does detect the - Single and - EP suffixes used there and also removed them (this closes Remove - Single & - EP from release names automagically #9)
  • For Deezer and Tidal it does support the single and EP type
  • Spotify does provide a single and compilation type
  • Deezer, Spotify and Tidal provide an "album" type, but I think this is too generic to prefill it as type album
  • For Bandcamp it tries to guess the type from the titles, as "EP" suffixes are frequently used there
  • There are generic functions to guess type from release title and guessing live releases from release and track titles

For discussion:

  • Merging currently takes a union of types across all providers. It probably should be more clever. Especially for the primary type, where there should be only a single type, there should maybe be some logic to select only one.
  • The generic guessing functions could be applied for all providers. Should this be done per provider, or should we have this automatically run for any provider after convertRawRelease?
  • On Support release group types #15 there was some discussions about additional hints that could be used at least for validating if types are plausible. I personally think they involve a bit much guesswork, but maybe this could be used to trigger warnings if a type seems unlikely.

Copy link
Owner

@kellnerd kellnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging currently takes a union of types across all providers. It probably should be more clever. Especially for the primary type, where there should be only a single type, there should maybe be some logic to select only one.

The merging logic should probably based on rules. Some initial ideas:

  • Album is always discarded in favour of another primary type (EP, Single, Broadcast)
  • All secondary types will be preserved (unless they conflict, like Audiobook and Audio drama would, but this is not relevant yet)
  • If both EP and Single are specified, discard both as the type is ambiguous? (depends on when the automatic guessing happens, see the next question)

The generic guessing functions could be applied for all providers. Should this be done per provider, or should we have this automatically run for any provider after convertRawRelease?

I agree that the guessing functions should not have to be called in each provider implementation.
What I had in mind is to do only the required, provider-specific release type processing steps inside the provider and handle everything else later on.
There would be two possibilities for the second processing step:

  1. Apply the guessing functions once after merging. Unless there are any issues with this approach, it would be my favourite because it is more efficient and we could display the "original source values" (without guessed values) as alternative values on the website.
  2. Apply the guessing functions for each individual before merging. Might yield better results if the titles from a secondary metadata source contain more useful information?

On #15 there was some discussions about additional hints that could be used at least for validating if types are plausible. I personally think they involve a bit much guesswork, but maybe this could be used to trigger warnings if a type seems unlikely.

Yes, I would probably keep these track count and duration metrics for a separate step.

harmonizer/merge.ts Outdated Show resolved Hide resolved
harmonizer/types.ts Outdated Show resolved Hide resolved
musicbrainz/seeding.ts Outdated Show resolved Hide resolved
harmonizer/types.ts Outdated Show resolved Hide resolved
server/components/Release.tsx Outdated Show resolved Hide resolved
Comment on lines 128 to 130
// Type can also be "album", but this might be too generic
// to be reliably set as a MusicBrainz type.
// 'album': 'Album',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can set the type to album as soon as we have a clever merge algorithm which prefers other primary types over Album. Although most providers likely declare everything an Album which is not a Single or an EP, it would feel strange to ignore the most common case completely.
There might be some other cases for which we should also ignore an Album type (like non-music releases), but we can have this logic at a higher level and keep the providers as dumb as possible.

Copy link
Collaborator Author

@phw phw Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced by this. All of Spotify, Deezer and Tidal use album for everything they don't consider a Single or EP (or Compilation for Spotify).

I think this is similar to e.g. the uncertainty around release countries, where in the discussion around atisket the consensus seems to be "if in doubt leave it empty". I think we should apply the same careful approach here and leave it to the user if we can't have a certain confidence in a type.

But I'll try it once the merge algorithm is in place.

utils/release.ts Outdated Show resolved Hide resolved
utils/release.ts Outdated Show resolved Hide resolved
utils/release.ts Outdated Show resolved Hide resolved
@kellnerd kellnerd added feature New feature or request harmonizer Harmonized data representation and processing labels Jul 1, 2024
@phw phw force-pushed the release-types branch from 5f18b59 to d91fbe4 Compare July 2, 2024 15:33
phw added 3 commits July 3, 2024 12:48
@phw phw force-pushed the release-types branch from 58b222f to 70c9e8e Compare July 3, 2024 11:50
@phw phw marked this pull request as ready for review July 3, 2024 11:53
@phw phw force-pushed the release-types branch from 70c9e8e to 5a7cd1b Compare July 3, 2024 12:30
@phw
Copy link
Collaborator Author

phw commented Jul 3, 2024

@kellnerd I think I have now addressed most issues and this is usable. This has become much more complex than I had originally anticipated.

I have implemented sorting the types with primary type first in 0f08a80 and a merging algorithm in d9db1f1 . The merging takes the secondary types as given, but for the primary types keeps only one. It prefers more specific types over "Other" or "Album", and it prefers "EP" over "Single" (as e.g. Spotify has not EP, just Single, and EP is usually the more specific classification). This overly seems to work quite well and I have even allowed the "Album" type from Deezer, Spotify and Tidal again, as you suggested.

  1. Apply the guessing functions once after merging. Unless there are any issues with this approach, it would be my favourite because it is more efficient and we could display the "original source values" (without guessed values) as alternative values on the website.

The final release merging now runs guessTypesForRelease always after type merging, see 5a7cd1b . This is very helpful especially when detecting live releases, but also helps with EPs.

I'm unsure how to go about the "original source values" , but maybe we can move this to a separate PR. This one is already big enough.

Also I'm unsure whether we need to do something about preferSameProvider in merge.ts (https://github.com/kellnerd/harmony/blob/main/harmonizer/merge.ts#L146-L148). In the current implementation the types get merged across all releases regardless. This is essentially the same as regions are handled as well. But maybe this is unexpected if preferSameProvider is true. I'm not sure about the wanted behavior here.

Copy link
Owner

@kellnerd kellnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has become much more complex than I had originally anticipated.

Indeed, I also thought this would be more straightforward. Thank you for carefully thinking this through with me!

I'm unsure how to go about the "original source values" , but maybe we can move this to a separate PR.

Oh, that part is a lot simpler than it sounds because the necessary tools are already there.
I have left a suggestion for it, it is basically a one-liner without the formatting.

Also I'm unsure whether we need to do something about preferSameProvider in merge.ts (https://github.com/kellnerd/harmony/blob/main/harmonizer/merge.ts#L146-L148). In the current implementation the types get merged across all releases regardless. This is essentially the same as regions are handled as well. But maybe this is unexpected if preferSameProvider is true. I'm not sure about the wanted behavior here.

Interesting point, I had not thought about that. The concept of the provider preferences only applies to "immutable" properties so far (those for which we only use one source) but not to "combinable" properties:

export const combinableReleaseProperties: Array<keyof HarmonyRelease> = [
'externalLinks',
'availableIn',
'excludedFrom',
'info',
];

Note that this constant is not actually used, it just exists for book-keeping purposes and to accompany the "immutable" constants. These are used in the merge algorithm and to restrict the valid keys of ProviderPreferences.
Strictly speaking 'types' should also be part of combinableReleaseProperties since it is combined from multiple sources. On the other hand, the primary type is immutable, which makes 'types' partially immutable 🤔

I would say we leave 'types' as a combinable property for now and focus on improving the merging algorithm rather than allowing the user to select a preferred provider for them.

harmonizer/merge.ts Outdated Show resolved Hide resolved
harmonizer/release_types.test.ts Outdated Show resolved Hide resolved
harmonizer/release_types.ts Outdated Show resolved Hide resolved
harmonizer/release_types.ts Outdated Show resolved Hide resolved
harmonizer/release_types.ts Outdated Show resolved Hide resolved
harmonizer/release_types.ts Outdated Show resolved Hide resolved
server/components/Release.tsx Outdated Show resolved Hide resolved
@phw
Copy link
Collaborator Author

phw commented Jul 7, 2024

@kellnerd I think I addressed all your latest comments.

@phw phw requested a review from kellnerd July 7, 2024 16:00
Avoids merging and sorting the types multiple times.
@kellnerd kellnerd merged commit db4e0ef into kellnerd:main Jul 8, 2024
2 checks passed
@phw phw deleted the release-types branch July 9, 2024 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request harmonizer Harmonized data representation and processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove - Single & - EP from release names automagically
2 participants