
3. Reading and writing Gellish expressions

Andries van Renssen edited this page Nov 12, 2018 · 5 revisions

Reading files with Gellish expressions is implemented in the Expression_list.py module.

Gellish Expressions

In accordance with the 'Gellish Syntax' specification, each Gellish file in Gellish Expression Format consists of three header lines, followed by a body with an unlimited number of table rows (lines/expressions). The first header line specifies, among other things, the language in which the expressions are written. The second header line defines a number of columns for the table in the body, and the third header line contains free-text names of the columns. Each row in the table body forms a Gellish expression.
The table columns are identified by integer IDs. Columns with blank column IDs are allowed, but their content is for human interpretation only and should be ignored by software. The columns in actual use in a table depend on the application, so the number of columns may vary per file. The columns may be arranged in any sequence, because their IDs determine their interpretation.
For the Communicator a full set of columns is implemented. The Expr_Table_Def.py module defines a reference table for that set, in which each column is given a column name for use within the Communicator software.
For example, the two expressions

  • the Eiffel tower 'is located in' Paris
  • the temperature in Paris 'has on scale a value greater than' 20 degC

are stored in a table with 4 naming columns and 5 UID columns. The names of those four naming columns according to the reference table are:

  • lh_name_col, for the name of the left hand object,
  • rel_type_name_col, for the name of the kind of relation,
  • rh_name_col, for the name of the right hand object,
  • uom_name_col, for the name (symbol) of the unit of measure.

The 5 UIDs are: one UID for the expressed idea and 4 UIDs for the concepts denoted by the names.
Note 1: Specifying the period during which a statement is valid requires two additional date-time columns, for example to specify the period during which a temperature has, or is above, a particular value.
Note 2: The kinds of relations and the kinds of things (such as degC) are standard concepts in Gellish, so that the software can build on that, e.g. by providing unit of measure conversion, intelligent searching, reasoning, etc.
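As a rough illustration of the layout described above, the two example expressions could be held in memory as records with four name fields and five UID fields. This is only a sketch: the four name fields reuse the column names from the reference table, but the UID field names and all UID values (such as "idea-1" or "obj-1") are hypothetical placeholders, not real Gellish UIDs.

```python
from dataclasses import dataclass, fields


@dataclass
class Expression:
    """One table row: 5 UIDs plus 4 names (UID field names are assumed)."""
    idea_uid: str            # UID of the expressed idea
    lh_uid: str              # UID of the left hand object
    lh_name_col: str         # name of the left hand object
    rel_type_uid: str        # UID of the kind of relation
    rel_type_name_col: str   # name of the kind of relation
    rh_uid: str              # UID of the right hand object
    rh_name_col: str         # name of the right hand object
    uom_uid: str = ""        # UID of the unit of measure (optional)
    uom_name_col: str = ""   # name (symbol) of the unit of measure


# Placeholder UIDs, chosen only to make the example readable.
expressions = [
    Expression("idea-1", "obj-1", "the Eiffel tower",
               "rel-1", "is located in", "obj-2", "Paris"),
    Expression("idea-2", "obj-3", "the temperature in Paris",
               "rel-2", "has on scale a value greater than",
               "obj-4", "20", "obj-5", "degC"),
]
```

The unit of measure fields are left empty for the first expression, because a location relation has no unit of measure.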

Table rearrangement according to the reference table definition

Because columns may appear in any order in a table of Gellish expressions, expressions must be interpreted according to the IDs of the columns in the table. The Communicator software therefore first rearranges each input table to match the reference table. This is done in the method Import_expressions_from_Gellish_file. After that, several defaults are applied and checks are performed, as implemented in the Verify_row method.
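The rearrangement step can be sketched as follows. This is not the code of Import_expressions_from_Gellish_file; the function name rearrange_rows is hypothetical, and the column IDs 101 to 104 in REFERENCE_COLUMNS are illustrative placeholders rather than the IDs defined in Expr_Table_Def.py. The sketch takes the column IDs from the second header line, skips columns with a blank ID, and emits every row in the reference order.

```python
# Illustrative reference layout: column ID -> internal column name.
# The IDs are placeholders, not the real Gellish column IDs.
REFERENCE_COLUMNS = {
    101: "lh_name_col",
    102: "rel_type_name_col",
    103: "rh_name_col",
    104: "uom_name_col",
}


def rearrange_rows(column_ids, rows, reference=REFERENCE_COLUMNS):
    """Return the rows with their columns in the order of the reference table."""
    # Record where each known column ID sits in the input table.
    positions = {}
    for index, raw_id in enumerate(column_ids):
        raw_id = raw_id.strip()
        if not raw_id:
            continue  # blank ID: column is for human readers only
        col_id = int(raw_id)
        if col_id in reference:
            positions[col_id] = index

    rearranged = []
    for row in rows:
        new_row = []
        for col_id in reference:
            index = positions.get(col_id)
            # Columns absent from the input (or short rows) become empty.
            if index is not None and index < len(row):
                new_row.append(row[index])
            else:
                new_row.append("")
        rearranged.append(new_row)
    return rearranged


header_ids = ["102", "", "103", "101"]
body = [["is located in", "remark for humans", "Paris", "the Eiffel tower"]]
result = rearrange_rows(header_ids, body)
```

Columns that the input file does not contain are filled with empty strings, so that a later step can apply defaults and checks on rows of a fixed width.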

Bootstrapping

For the interpretation of the base ontology section of the language definition (the Base Formal Language defining section) there is not yet a language definition available. Therefore, the prime semantics is provided in the definition of the kinds of relations that are specified in the Bootstrapping.py module. The seven bootstrapping kinds of relations are themselves defined in the base ontology section in the same way as the other kinds of relations. The Bootstrapping.py module also includes identifiers of concepts that are defined in the base ontology section.

Building a semantic network database

First the base ontology is interpreted, then the scales, followed by other domain ontologies. Each rearranged expression is interpreted, primarily in a way that is determined by the kind of (binary) relation that classifies the relation. If the UID of the left hand object and/or the right hand object is unknown, an object is created and its name is recorded. If the expression is a defining relation (classification, specialization or alias relation), the various names in their language and language community contexts and the textual definitions are collected. In each case the two related objects are related to each other. The names in contexts are added to the dictionary of the network.
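A minimal sketch of this interpretation step is given below. It is not the Communicator implementation: the classes Thing and Network, the dictionary key layout, the row keys and the placeholder UIDs are all assumptions made for illustration. Only the phrase 'is classified as a' comes from the Gellish dictionary as cited on this page; treating 'is a kind of' as the specialization phrase is likewise an assumption.

```python
# Phrases treated as defining relations in this sketch (partly assumed).
DEFINING_REL_TYPES = {"is classified as a", "is a kind of"}


class Thing:
    def __init__(self, uid, name):
        self.uid = uid
        self.name = name
        self.relations = []  # list of (rel_type_name, other Thing)


class Network:
    def __init__(self):
        self.things = {}      # uid -> Thing
        self.dictionary = {}  # (language, community, name) -> uid

    def _get_or_create(self, uid, name):
        # Create the object only when its UID is not yet known.
        thing = self.things.get(uid)
        if thing is None:
            thing = Thing(uid, name)
            self.things[uid] = thing
        return thing

    def add_expression(self, expr, language="English", community=""):
        lh = self._get_or_create(expr["lh_uid"], expr["lh_name"])
        rh = self._get_or_create(expr["rh_uid"], expr["rh_name"])
        # Relate each object to the other one.
        lh.relations.append((expr["rel_type_name"], rh))
        rh.relations.append((expr["rel_type_name"], lh))
        # Defining relations contribute a name in context to the dictionary.
        if expr["rel_type_name"] in DEFINING_REL_TYPES:
            key = (language, community, expr["lh_name"])
            self.dictionary[key] = expr["lh_uid"]


network = Network()
network.add_expression({"lh_uid": "obj-2", "lh_name": "Paris",
                        "rel_type_name": "is classified as a",
                        "rh_uid": "obj-9", "rh_name": "city"})
network.add_expression({"lh_uid": "obj-1", "lh_name": "the Eiffel tower",
                        "rel_type_name": "is located in",
                        "rh_uid": "obj-2", "rh_name": "Paris"})
```

In this sketch only the classification of Paris adds a dictionary entry; the location relation reuses the existing Paris object and merely links the two objects.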

Writing new expressions

New Unique Identifiers (UIDs)

New expressions require that each concept (kind or individual) is represented by a unique identifier (UID). To facilitate the generation of new UIDs, the first header line provides the opportunity to specify the lower and upper boundaries of a range of UIDs for the objects and of a range of UIDs for the ideas (relations). Software should first determine the first free UID within those ranges and can then allocate new UIDs for new concepts. (In queries the unknowns should get a UID in the range 1-99.) Gellish dictionary UIDs are numeric strings; other UIDs may be alphanumeric and may start with a prefix. For example, rdf concepts have 'rdf:' as a standard prefix. Thus the rdf concept 'type' can be represented in Gellish by its own UID as 'rdf:type'. However, when concepts are harmonized, they may turn out to be identical to an existing concept in the Gellish Taxonomic Dictionary. As rdf:type denotes a classification relation, it is identical to the Gellish classification relation concept (UID=1225), which has as base phrase 'is classified as a'. Therefore UID 1225 should be used in Gellish expressions, while the name rdf:type can still be used, with rdf as the language community context.
In other words, the expressions {Paris 'is classified as a' city} and {Paris 'rdf:type' city} are both valid Gellish expressions and the components of the two expressions have identical UIDs and identical meanings.
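The two ideas above, allocating a free UID within a range and replacing a prefixed UID by its harmonized Gellish UID, can be sketched as follows. The function names and the HARMONIZED table are hypothetical. 'First free UID' is read here as the lowest unused number in the range, which is an assumption; the only mapping taken from this page is rdf:type to 1225.

```python
# Harmonization table: foreign UID -> Gellish dictionary UID.
# Only the rdf:type entry is taken from the text above.
HARMONIZED = {"rdf:type": "1225"}


def first_free_uid(used_uids, lower, upper):
    """Return the lowest numeric UID in [lower, upper] that is not in use."""
    # Prefixed or alphanumeric UIDs cannot collide with numeric ones.
    used = {int(uid) for uid in used_uids if str(uid).isdigit()}
    for candidate in range(lower, upper + 1):
        if candidate not in used:
            return candidate
    raise ValueError("no free UID left in the specified range")


def harmonize(uid):
    """Replace a UID by its harmonized Gellish UID when one is known."""
    return HARMONIZED.get(uid, uid)


new_uid = first_free_uid(["100", "101", "rdf:type"], 100, 199)
rel_uid = harmonize("rdf:type")
```

With this table both {Paris 'is classified as a' city} and {Paris 'rdf:type' city} end up with relation type UID 1225, while the name rdf:type remains available as a name in the rdf language community.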

Creating expressions

The Semantic Modeling Methodology explains how to express information in the form of collections of Gellish expressions. That methodology is described in the wiki on the Gellish website and extensively described in the book 'Semantic Information Modeling Methodology'.

Each idea in a table with Gellish expressions is expressed in the language that is specified on the first line of the header of the table. Nevertheless, to enable multi-lingual expressions, each line in the table body provides the option to specify the language of the name of the left hand object. Furthermore, to enable the use of synonyms and homonyms, there is an optional column for specifying a 'language community' context, which specifies where the name of the left hand object has its base. Typically each expression needs an intention, such as being a 'statement' (which is the default), and it has an approval status, such as 'accepted' or 'proposed', to differentiate between expressions that are approved and those that still need to be approved. Other possible 'contextual facts' are the begin and end of the period of validity of an expression, which are typically used to indicate the validity period of measured property values. Each created expression will have a number of such 'contextual facts' that support its proper interpretation. The generation of Gellish expressions is implemented in the Create_output_file module of the Communicator reference application.
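How such contextual facts might be filled in for a new expression is sketched below. The function name, the dictionary keys and the choice of 'proposed' as the approval status of a new expression are assumptions for illustration; the text above only establishes that the header language applies unless a row overrides it and that 'statement' is the default intention.

```python
def with_contextual_facts(expr, header_language, approval_status="proposed"):
    """Return a copy of expr with missing contextual facts filled in."""
    result = dict(expr)
    # The header language applies unless the row names its own language.
    result.setdefault("language", header_language)
    # 'statement' is the default intention.
    result.setdefault("intention", "statement")
    # Assumed: new expressions start as proposals until they are approved.
    result.setdefault("approval_status", approval_status)
    return result


draft = {"lh_name": "the Eiffel tower",
         "rel_type_name": "is located in",
         "rh_name": "Paris"}
complete = with_contextual_facts(draft, header_language="English")
dutch = with_contextual_facts({"lh_name": "Parijs", "language": "Nederlands"},
                              header_language="English")
```

The second call shows a row that keeps its own language even though the header specifies English.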

The Gellish website

Information about the definition and application of the Gellish family of formalized languages is available via the Gellish website.

Semantic Modeling Methodology

The book Semantic Information Modeling Methodology describes the application of Gellish formalized languages.

Language defining ontology

The book Formalized languages for Semantic Information Modeling describes the base ontology that defines Gellish formalized languages.
