Proposal: Structured Previews #182
Hmm, can you speak more about the additional information and what this might look like? And what safety concerns are you thinking of? Media is a very wide net, and on the Web, there's been much evolution and will continue to be so. As evidence, in Schema.org we've already experienced several iterations of https://schema.org/MediaObject |
By "safety" I do not mean "securely" here, but rather whether you can determine enough about the resource to render it correctly. From a purely web perspective, one should be able to determine which element the browser should use to display the resource. I imagine this could be done by simply providing both the resource URL and a MIME type for that resource. |
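As a rough illustration of the URL-plus-MIME-type idea, a client could pick the element from the top-level MIME type. This is only a sketch: the function names `element_for` and `render_media` are made up for this example, and nothing here is part of the proposal or the specification.

```python
from html import escape

# Top-level MIME types mapped to the HTML element that can display them.
ELEMENTS = {"image": "img", "video": "video", "audio": "audio"}


def element_for(mime_type: str) -> str:
    """Return an element name for a MIME type, falling back to a plain link."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    top_level = mime_type.split(";")[0].strip().lower().split("/")[0]
    return ELEMENTS.get(top_level, "a")


def render_media(url: str, mime_type: str) -> str:
    """Build minimal markup for a media resource (hypothetical helper)."""
    tag = element_for(mime_type)
    safe_url = escape(url, quote=True)
    if tag == "img":
        return f'<img src="{safe_url}" alt="">'
    if tag in ("video", "audio"):
        return f'<{tag} src="{safe_url}" controls></{tag}>'
    # Unknown media types are linked rather than embedded.
    return f'<a href="{safe_url}">{safe_url}</a>'
```

With only a URL, the client would have to guess from the file extension or fetch the resource first; the MIME type lets it decide up front.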
I think it makes sense that structured previews would use existing structure standards of the Linked Data web. That way, structured data to power previews could actually come from many systems around the world simply through... a link. Linked Data allows an app to start at one piece of Linked Data and follow and retrieve additional data through embedded links and JSON-LD is a lightweight syntax that allows existing JSON to be interpreted as Linked Data with minimal changes. We also automatically get strings annotated with their language (we don't have to do backflips). It's also easy enough to generate expanded, compacted, table, etc. forms through existing JSON-LD client libraries. The world of Linked Data then automatically helps with preview generation through a simple...link, not even a query itself has to be formed in many cases. But there are SPARQL endpoints around the world that even do that...generate a JSON-LD serialization for their results. A Few Examples:
Your example could thus look like this (where I've added context and "sameAs" also as an expanded example). JSON-LD (with a Schema.org context):
|
I couldn't agree more @thadguidry, and JSON-LD based previews would actually make some of our services compatible simply by pointing to URIs of our various data services (example: https://fornpunkt.se/lamning/bjyWKNz.jsonld). Not to mention that the reconciliation specification would benefit from RDF's native extension support, alignment with RDF-based service/data discovery vocabularies, native multilingual support, and much more. However, I wouldn't want to make this proposal dependent on such a larger effort, as I see this proposal as beneficial on its own and easily merged with a JSON-LD effort later, as illustrated by your example. |
Thanks for this proposal! Previews are one of the main things I would prioritize improving if I wanted to continue developing my Wikibase reconciliation service, as people have been asking for adding much more data to them. I would essentially add all statements of the Wikibase item, displayed in a more compact way (perhaps with a way to prioritize certain properties if there is a sensible way to do so). For all the corresponding data to fit in a response like the one you're proposing, it would essentially amount to returning all entity data in the response. So, I see the interest in returning structured data instead of HTML from an API purity point of view, but I wonder what makes those structured previews different from fetching the data of the entity, for instance via content negotiation on the URI or via the data extension API. To me, it's not clear that there is sufficient difference between the two use cases to really justify having a separate API endpoint in the specs for that. I would also be reluctant to deprecate HTML previews, because of the freedom they give to the service to present its data in a suitable way. The data model of the reconciliation API is really basic, so in a lot of cases services will need to coerce their data into it at a loss. I think those previews are a welcome opportunity for the service to render its data in its "native" form, with the appropriate context and rendering conventions. |
I see three challenges using content negotiation on the URI (which to some extent also apply to the data extension API):
I completely agree and I argue for them not to be deprecated :-) |
We discussed this in the last meeting and I realized I have some reservations with regard to standardizing structured previews:
Given these three statements, it is probably neither possible nor sensible to standardize identifying properties for previews at the protocol level. |
@acka47 exactly, don't standardize at the protocol level, but instead around the basic structure of a preview, as @Abbe98 and I are encouraging. That way, any service can put into the structured preview as many "disambiguating properties" as it wants, to help clients disambiguate between entities. |
I think that the off-topic discussion about JSON-LD might have led to some confusion, because this is exactly what this proposal addresses (and I probably contributed to that confusion). A preview "description", "title", etc. is not conceptually the same as that of an entity. Consider a location as an entity: the entity would have properties about the name of the location, a higher-level administrative area, etc. The preview service would pull several layers of administrative areas from various entities into the description to help with identification; it might also provide a dynamically generated map thumbnail. Given the negative summary from the meeting, I wonder whether the actual use cases this proposal addresses were discussed, and whether the group considers any alternative approaches to solve them? |
I think the basic idea in the meeting was that such structured previews would be a kind of entity data access, for which we don't have an API yet, but which might make sense to add. The discussion started like this: "we have no access to the content of an entity, like data extension, but without having to ask for specific properties", the action item is "comment about possible entity API". (The discussion moved into ways of configuring such an API, which perhaps moved the original "without having to ask for specific properties" part into the background.) So from my understanding, a possible solution would be to add a new kind of entity API, which in the simple, no-config case would return data suitable for previews?
Since it's been a bit hidden in the discussion here (it's in the diff only), this is the example from the original proposal (by @Abbe98):
{
"id": "http://www.wikidata.org/entity/Q2",
"name": "Earth",
"description": "third planet from the Sun in the Solar System",
"url": "https://www.wikidata.org/wiki/Q2",
"image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
"tags": [
["terrestrial planet", "http://www.wikidata.org/entity/Q128207"],
["inner planet of the Solar System", "http://www.wikidata.org/entity/Q3504248"]
]
}
If we remove the
{
"id": "http://www.wikidata.org/entity/Q2",
"name": "Earth",
"description": "third planet from the Sun in the Solar System",
"image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
"type": [
{"name": "terrestrial planet", "id": "http://www.wikidata.org/entity/Q128207"},
{"name": "inner planet of the Solar System", "id": "http://www.wikidata.org/entity/Q3504248"}
]
}
This could then be returned e.g. for a simple (no-config) GET
When requesting specific properties, these could be added in an additional
{
"id": "http://www.wikidata.org/entity/Q2",
"name": "Earth",
"description": "third planet from the Sun in the Solar System",
"image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
"type": [
{"name": "terrestrial planet", "id": "http://www.wikidata.org/entity/Q128207"},
{"name": "inner planet of the Solar System", "id": "http://www.wikidata.org/entity/Q3504248"}
],
"properties": [{
"id": "P138",
"name": "named after",
"values": [
{"name": "soil", "id": "Q36133"},
{"name": "land", "id": "Q11081619"},
{"name": "ball", "id": "Q838611"}
]
}]
} |
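To make the difference between the first two examples above concrete, here is a small sketch that converts the original proposal's shape into the entity-style shape. It is inferred only from comparing the two JSON examples ("url" dropped, "tags" pairs turned into "type" objects); the function name `to_entity_shape` is hypothetical and the sample is shortened (no image).

```python
def to_entity_shape(preview: dict) -> dict:
    """Convert the original preview shape into the entity-style shape."""
    # Copy everything except the fields that differ between the two examples.
    converted = {k: v for k, v in preview.items() if k not in ("url", "tags")}
    # Each tag is a [label, identifier] pair; turn it into a name/id object.
    converted["type"] = [
        {"name": name, "id": identifier}
        for name, identifier in preview.get("tags", [])
    ]
    return converted


original = {
    "id": "http://www.wikidata.org/entity/Q2",
    "name": "Earth",
    "description": "third planet from the Sun in the Solar System",
    "url": "https://www.wikidata.org/wiki/Q2",
    "tags": [
        ["terrestrial planet", "http://www.wikidata.org/entity/Q128207"],
        ["inner planet of the Solar System", "http://www.wikidata.org/entity/Q3504248"],
    ],
}
revised = to_entity_shape(original)
```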
Thank you @fsteeg for the background. This proposal is only about presenting an entity for identification just like with the existing preview feature, with the difference being the format and a few standardized properties meant to move control of display and accessibility from the reconciliation endpoint to the client.
In hindsight, I probably shouldn't have chosen such similar property names, nor used Wikidata content as the example. The properties are not intended to represent entity data 1:1 but information that can be used for previewing and identification. To exemplify using the descriptions of
I would like to emphasize that my proposal and initial PR do not mention data access, nor are they in conflict with potential efforts around entity data access or JSON-LD. The proposal is solely about ensuring clients can control the display and accessibility of previews, and about making it possible to use previews in non-web environments. |
In my current service prototype, I have disambiguating properties (a few properties used for identification, to clearly expose differences between entities). Previews have a strong need to showcase disambiguating properties, not all properties of an entity. (Wikidata has P1963, but it is not purely for a limited set of disambiguating properties, but instead all common properties for a type.) My need is to describe my limited set of properties directly within a structured preview. The further problem I have is when I need to give more information about those disambiguating properties themselves. For each of a preview's disambiguating properties, I was hoping to avoid data duplication and instead use a graph (preferably JSON-LD) so that I could describe the disambiguating properties once at the beginning of the preview.
Ex. 1: Disambiguating property "affiliation" without using JSON-LD syntax or context, but just descriptors for universal understanding:
{
"affiliation": {
"alias" : [
"ally",
"companion",
"cohort"
],
"description": "a loose alliance with another organization",
"about": {
"type" : "Relationship",
"alias" : "Connection"
},
"sameAs" : "http://www.w3.org/ns/org#memberOf"
}
}
My service would deal mostly with organizations and have disambiguating properties such as "leader", "affiliation", "location", "industry". Preview:
{
"id" : "1234",
"name" : "Affiliated Metals",
"affiliation" : "Reliance Steel & Aluminum Co.",
"location" : "Salt Lake City, Utah",
"industry" : "423510"
} |
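One way a client might combine the two pieces above is to look up each preview key in a table of descriptors that is sent only once. This is a sketch under my own assumptions: the `descriptors` table is trimmed to two fields from Ex. 1, and `describe` is a hypothetical helper, not something any service currently exposes.

```python
# Property descriptors, stated once (trimmed from Ex. 1 above).
descriptors = {
    "affiliation": {
        "description": "a loose alliance with another organization",
        "sameAs": "http://www.w3.org/ns/org#memberOf",
    },
}

preview = {
    "id": "1234",
    "name": "Affiliated Metals",
    "affiliation": "Reliance Steel & Aluminum Co.",
    "location": "Salt Lake City, Utah",
    "industry": "423510",
}


def describe(preview: dict, descriptors: dict) -> list:
    """Pair each disambiguating property with its description, if known."""
    rows = []
    for key, value in preview.items():
        if key in ("id", "name"):
            continue  # identity fields, not disambiguating properties
        info = descriptors.get(key, {})
        rows.append((key, value, info.get("description")))
    return rows


rows = describe(preview, descriptors)
```

Properties without a descriptor (here "location" and "industry") simply come back with no description, so the client can still show the raw key and value.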
I noticed that the proposed contents of such structured previews is quite close to the output of the Suggest Entities endpoint: I wonder if your use case could be satisfied by adding any missing fields to that response (for instance, it could be nice to have images in those auto-complete widgets). You could then call this endpoint with the full entity id as To me, it would have the advantage of consolidating an existing endpoint, meaning less implementation effort for service authors. |
@wetneb I like that idea a lot, the suggest service has in my opinion a similar intent to that of a preview (compared to that of, for example, data fetching). I also like the idea of just reusing the One issue that comes to mind is that an identifier might also be a legitimate search, so although you lose some backwards compatibility, it might be worth introducing a new parameter. |
@Abbe98 can you give an example of a "legitimate search"? |
The case I imagine as most common would be a search for a number in a dataset using numeric identifiers, but as a real-world case consider a Wikidata search for Q1. In the case of Wikidata one could of course move the identifiers to use the HTTP URIs, but that is a luxury few reconciliation services have. |
To use it for previews though, we would need to be able to pick a specific result, right? So maybe it would have to be not just in the list, but the first element? But would we then still need a way for clients to know if the first element is actually a preview for the given identifier (to decide if it wants to show the preview or not)? |
Well, the identifier of the entity is included in the response, so I think it shouldn't be too hard for the client to filter the list of candidates, checking if their identifier is identical to the one they requested, no? |
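The client-side filtering described here could look roughly like the following. It is a sketch only: `pick_preview` is a hypothetical helper, the candidate list stands in for whatever the suggest endpoint returns, and the names in the sample data are made up.

```python
from typing import Optional


def pick_preview(candidates: list, requested_id: str) -> Optional[dict]:
    """Return the candidate whose id equals the requested one, if any."""
    for candidate in candidates:
        if candidate.get("id") == requested_id:
            return candidate
    # No exact match: the client shows no preview.
    return None


# Made-up suggest results for the query "Q1" (illustrative data only).
results = [
    {"id": "Q10", "name": "first made-up candidate"},
    {"id": "Q1", "name": "second made-up candidate"},
    {"id": "Q100", "name": "third made-up candidate"},
]

match = pick_preview(results, "Q1")
missing = pick_preview(results, "Q999")
```

An exact comparison matters here: a prefix or substring match would also accept "Q10" and "Q100" for the query "Q1".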
Yes, right. So the Wikidata search for Q1 example would still be doable that way: when using it for previews, the client would pick the exact Q1 match, for an actual suggest search we'd still get / use the full result list. Here's the original example as a suggest response result (with added
{
"id": "http://www.wikidata.org/entity/Q2",
"name": "Earth",
"description": "third planet from the Sun in the Solar System",
"image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
"notable": [
{"name": "terrestrial planet", "id": "http://www.wikidata.org/entity/Q128207"},
{"name": "inner planet of the Solar System", "id": "http://www.wikidata.org/entity/Q3504248"}
]
}
So indeed the only missing field would be |
I don't think we should push such complexity onto the client for things like previews, not to mention that the requested entity might not be on the first page of suggested entities. I think it would be beneficial to know for sure whether the value is an identifier, as it would allow service implementations to perform quick identifier lookups rather than searches. |
Abstract
This proposal introduces a JSON-based alternative to the existing HTML previews with the intent to allow clients to control the user experience and presentation of said previews. A secondary intent is to make it possible to utilize previews in environments where HTML rendering might not be convenient (terminals, etc).
Status of this Proposal
This proposal reflects our experimental usage rather than the expected end result. For concerns we already have in mind, see the open questions below. If the group considers this a worthy proposal we intend to keep our experiments and this proposal in sync.
Background and Motivation
We use a few hundred reconciliation services, and even though we run all of them ourselves using a few shared frameworks, the look and feel of the HTML previews has diverged over the years.
In addition, the client has no control (in a sane and safe way) over the look and functionality of the embedded HTML preview, causing user settings such as keyboard shortcuts, dark mode, and font size to differ between the client and the preview.
We can avoid some of these issues because we run all services ourselves and have our own clients; however, new problems arise with third-party clients when we do so. For example, many of our HTML previews now support dark mode natively but OpenRefine does not, causing previews to use a different theme. With structured previews, we want to give the power of controlling theming and much more to clients like OpenRefine.
Another lesser use case is our wish to use the information provided by preview endpoints in our CLI, where HTML rendering is not an option.
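As an illustration of the CLI use case, a structured preview could be turned into plain text along these lines. This is a minimal sketch, not our actual CLI code: `render_text` is a hypothetical name, and the field names are assumed from the example preview in this pull request.

```python
import textwrap


def render_text(preview: dict, width: int = 60) -> str:
    """Format a structured preview as plain text for a terminal."""
    # Fall back to the identifier when the preview has no name.
    lines = [preview.get("name", preview["id"])]
    if preview.get("description"):
        lines.extend(textwrap.wrap(preview["description"], width=width))
    # Tags are [label, URI] pairs; only the labels are shown.
    labels = [label for label, _uri in preview.get("tags", [])]
    if labels:
        lines.append("Tags: " + ", ".join(labels))
    if preview.get("url"):
        lines.append(preview["url"])
    return "\n".join(lines)


earth = {
    "id": "http://www.wikidata.org/entity/Q2",
    "name": "Earth",
    "description": "third planet from the Sun in the Solar System",
    "url": "https://www.wikidata.org/wiki/Q2",
    "tags": [
        ["terrestrial planet", "http://www.wikidata.org/entity/Q128207"],
        ["inner planet of the Solar System", "http://www.wikidata.org/entity/Q3504248"],
    ],
}
text = render_text(earth)
```

The same data could feed a themed widget in a graphical client, which is the point of leaving presentation to the client.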
Upstreaming this extension would allow our public reconciliation services to remain compatible with OpenRefine and other clients even as we deprecate HTML previews.
Open Questions and Known Issues
The "image" property should possibly be "media" and contain additional information needed to render it safely.