
Reference terms within the specs

Philipp Zumstein edited this page Oct 20, 2016 · 2 revisions

We're using Bikeshed to build the HTML version of the spec. Bikeshed is used by both the WHATWG and the W3C for various specs, such as CSS and HTML, so it comes with many shortcuts for defining and linking to different types of terms, values, interfaces, etc.

To reference terms within the specs, we use the following scheme:

| hOCR terminology | CSS terminology | Definition | Link |
| --- | --- | --- | --- |
| class | element | <dfn element>ocr_page</dfn> | <{ocr_page}> |
| title property | property | <dfn property>bbox</dfn> | 'bbox' |
| metadata | property | <dfn property>ocr-system</dfn> | 'ocr-system' |
| capability | property value | <dfn for="ocr-capabilities">ocrp_lang</dfn> (*) | ''ocr-capabilities/ocrp_lang'' |

(*) If the value's definition is nested inside the definition of its property, the for attribute is not needed.
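To make the scheme concrete, here is a minimal Python sketch that produces the definition and link markup for each kind of hOCR term, following the table row by row. The function name `bikeshed_markup` and the `kind` labels ("class", "title", "metadata", "capability") are hypothetical, chosen for illustration only; they are not part of Bikeshed or of the spec's tooling, and the output is not validated against Bikeshed.

```python
from typing import Optional, Tuple


def bikeshed_markup(kind: str, name: str, for_: Optional[str] = None) -> Tuple[str, str]:
    """Return (definition, link) markup for an hOCR term, mirroring the table.

    `kind` is one of the hypothetical labels "class", "title", "metadata",
    "capability". `for_` names the property a capability value belongs to;
    it defaults to "ocr-capabilities".
    """
    if kind == "class":
        # hOCR classes are treated as elements: <dfn element> / <{name}>
        return f"<dfn element>{name}</dfn>", f"<{{{name}}}>"
    if kind in ("title", "metadata"):
        # Title properties and metadata are treated as CSS-like properties.
        return f"<dfn property>{name}</dfn>", f"'{name}'"
    if kind == "capability":
        # ocrp_* capabilities are values of the ocr-capabilities property.
        prop = for_ or "ocr-capabilities"
        return f'<dfn for="{prop}">{name}</dfn>', f"''{prop}/{name}''"
    raise ValueError(f"unknown term kind: {kind!r}")


if __name__ == "__main__":
    for kind, name in [("class", "ocr_page"), ("title", "bbox"),
                       ("metadata", "ocr-system"), ("capability", "ocrp_lang")]:
        dfn, link = bikeshed_markup(kind, name)
        print(f"{dfn:50} {link}")
```

The sketch always emits the for attribute for capability values; per the footnote, it could be dropped when the value is defined nested inside its property.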

This is based on the following mapping of terminologies:

  • Classes define the single hOCR type of an element, so for all intents and purposes they are equivalent to HTML elements. From the hOCR perspective, it doesn't matter whether an element's tag name is div, span, or p; what matters is that it has exactly one class that starts with ocr.
  • Title properties describe the layout/typography/recognition-related attributes of an element in key-value pairs. Most of those can be mapped to CSS properties, so treating them like property/value terms seems the best option.
  • Metadata are properties of the whole document, again as key-value pairs, so property/value fits here too.
  • Capabilities can be classes or predefined values starting with ocrp. The latter are specific values for the metadata property ocr-capabilities.
  • Profiles: These are so underspecified at the moment that I skipped them for now.
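The mapping above can be illustrated from the hOCR side with a small Python sketch: one function derives an element's hOCR type from its class attribute (ignoring the tag name), one splits a title attribute into property/value pairs, and one separates the ocr-capabilities metadata into classes and ocrp values. All function names are hypothetical, and the title parser is a deliberate simplification that assumes no semicolons appear inside quoted values.

```python
from typing import Dict, List, Tuple


def hocr_type(class_attr: str) -> str:
    """Return the hOCR type (the one class starting with "ocr") of an element.

    The tag name (div, span, p, ...) plays no role; exactly one class
    must start with "ocr", otherwise a ValueError is raised.
    """
    types = [c for c in class_attr.split() if c.startswith("ocr")]
    if len(types) != 1:
        raise ValueError(f"expected exactly one ocr* class, found {types}")
    return types[0]


def parse_title(title: str) -> Dict[str, List[str]]:
    """Split e.g. "bbox 0 0 100 50; x_wconf 93" into {property: [values]}.

    Simplification (assumption): semicolons only separate properties and
    never occur inside quoted values.
    """
    props: Dict[str, List[str]] = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            # First token is the property name, the rest are its values.
            props[tokens[0]] = tokens[1:]
    return props


def split_capabilities(content: str) -> Tuple[List[str], List[str]]:
    """Split an ocr-capabilities value into (classes, ocrp_* values)."""
    caps = content.split()
    values = [c for c in caps if c.startswith("ocrp")]
    classes = [c for c in caps if not c.startswith("ocrp")]
    return classes, values
```

For example, `parse_title("bbox 0 0 100 50; x_wconf 93")` yields a bbox property with four values and an x_wconf property with one, which is what makes the property/value framing a natural fit.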

See https://tabatkins.github.io/bikeshed/ for more information.
