diff --git a/docs/about.md b/docs/about.md index 67debd7..591e095 100644 --- a/docs/about.md +++ b/docs/about.md @@ -3,8 +3,8 @@ # The Dead Sea Scrolls (DSS) -> The Dead Sea Scrolls (also Qumran Caves Scrolls) are ancient Jewish religious manuscripts -found in the Qumran Caves in the Judaean Desert, +> The Dead Sea Scrolls (also Qumran Caves Scrolls) are ancient Jewish religious +manuscripts found in the Qumran Caves in the Judaean Desert, near Ein Feshkha on the northern shore of the Dead Sea. > [Wikipedia article on DSS](https://en.m.wikipedia.org/wiki/Dead_Sea_Scrolls). @@ -21,23 +21,28 @@ a project that broke the lengthy publication monopoly held on the scrolls. The contents of this repo were created during the [*Creating Annotated Corpora of Classical Hebrew Texts (CACCHT) project*]() -carried out by Jarod Jacobs, Martijn Naaijer, Dirk Roorda, Robert Rezetko, Oliver Glanz, and Wido van Peursen. +carried out by Jarod Jacobs, Martijn Naaijer, Dirk Roorda, Robert Rezetko, +Oliver Glanz, and Willem van Peursen. -The DSS texts and morphological data connected with them were generously provided by Martin Abegg. +The DSS texts and morphological data connected with them were generously +provided by Martin Abegg. They consist of two foundational sets of data: transcriptions and morphological tagging. The transcriptions come from various sources, -but primarily reflect what is found in the Discoveries in the Judean Desert series (Oxford:Clarendon Press, 1955-). +but primarily reflect what is found in the Discoveries in the Judean Desert +series (Oxford:Clarendon Press, 1955-). For full details see: [DSSB-Read me first](assets/readme-dssb.pdf) and [QUMRAN - Read me first](assets/readme-qumran.pdf).
-In addition to what is derived from the Abegg sources, Martijn Naaijer has provided several extras: +In addition to what is derived from the Abegg sources, Martijn Naaijer has +provided several extras: * ETCBC morphological feature data * clause and phrase boundaries -Both kinds of data are the result of creating models (*machine learning models*) from the BHSA and +Both kinds of data are the result of creating models (*machine learning +models*) from the BHSA and applying them to the DSS. This is experimental. ## Abegg sources @@ -45,19 +50,22 @@ applying them to the DSS. This is experimental. Abegg started morphologically tagging the Qumran texts in the mid-90s with the assistance of several people that he mentions in the above read me first files. Over the following decades, Abegg completed full morphological tagging -of nearly every Hebrew and Aramaic scroll found in the Judaean Desert between 1947 and today. +of nearly every Hebrew and Aramaic scroll found in the Judaean Desert between +1947 and today. For more information about the development and particularities of Abegg’s data, we will once again point you to the DSSB and QUMRAN read me first files. The tagging scheme itself is also [documented](assets/morph.pdf). -After conversion to Text-Fabric, the these tags have been normalized into seperate features, -such as *sp (part-of-speech)*, *ps (person)*, *nu (number)*, *gn (gender)*, etc. +After conversion to Text-Fabric, these tags have been normalized into +separate features, such as +*`sp` (part-of-speech)*, *`ps` (person)*, *`nu` (number)*, *`gn` (gender)*, etc. -See [morhpological features in TF](transcription.md#morphological-features). +See [morphological features in TF](transcription.md#morphological-features).
-Upon learning of the current project, Martin Abegg graciously gave permission to Jarod Jacobs to use his data and +Upon learning of the current project, Martin Abegg graciously gave permission +to Jarod Jacobs to use his data and to distribute the results under a CC-BY-NC license. The corpus consists of two files, one for the non-biblical scrolls and one for the @@ -69,7 +77,8 @@ who subsequently converted the source data files to Text-Fabric format by means of a special purpose Python program [tfFromAbegg.py](../programs/tfFromAbegg.py). -This program performs numerous checks, and as a result several corrections have been made. +This program performs numerous checks, and as a result several corrections have +been made. The conversion logs have been [preserved](https://github.com/ETCBC/dss/tree/master/log). @@ -82,16 +91,17 @@ They are plain text files that roughly correspond to the columns in the data fil A single `.tf` file is called a feature. It maps nodes to values. However, we have separated out all text-critical and morphological information into -additional features, thereby greatly uncluttering the wealth of information in these files. +additional features, thereby greatly uncluttering the wealth of information in +these files. ## Naaijer extras -As of data version 0.7, additional features have been added in, mostly adaptions of existing -features to the ETCBC format, prepared by Martijn Naaijer. +As of data version 0.7, additional features have been added in, mostly +adaptations of existing features to the ETCBC format, prepared by Martijn Naaijer. Version 0.9 contains clause and phrase boundaries. This version is available on GitHub but is still a work in progress, so it is not yet -an offical release. You can work with it by means of +an official release.
You can work with it by means of ``` sh text-fabric ETCBC/dss:hot --version=0.9 diff --git a/docs/transcription.md b/docs/transcription.md index 1ef95ba..c614850 100644 --- a/docs/transcription.md +++ b/docs/transcription.md @@ -16,8 +16,8 @@ See also The corpus consists of two files, one for the non-biblical scrolls and one for the biblical scrolls. -In both files, the material is subdivided into *scroll*, *fragment*, *line*. -In the biblical file, references to *book*, *chapter* and *verse* are marked +In both files, the material is subdivided into `scroll`, `fragment`, `line`. +In the biblical file, references to `book`, `chapter` and `verse` are marked at the word level. Some scrolls contain biblical as well as non-biblical materials. @@ -40,50 +40,50 @@ Every line in both files has fields for and some bits of extra information. The Text-Fabric model views the text as a series of atomic units, called -*slots*. In this corpus [*signs*](#sign) are the slots. +*slots*. In this corpus [`signs`](#sign) are the slots. On top of that, more complex textual objects can be represented as *nodes*. In this corpus we have node types for: -[*sign*](#sign), -[*word*](#word), -[*lex*](#lex), -[*cluster*](#cluster), -[*line*](#line), -[*fragment*](#fragment), -[*scroll*](#scroll), +[`sign`](#sign), +[`word`](#word), +[`lex`](#lex), +[`cluster`](#cluster), +[`line`](#line), +[`fragment`](#fragment), +[`scroll`](#scroll), The type of every node is given by the feature -[**otype**](https://annotation.github.io/text-fabric/tf/cheatsheet.html#f-node-features). +[*`otype`*](https://annotation.github.io/text-fabric/tf/cheatsheet.html#f-node-features). Every node is linked to a subset of slots by -[**oslots**](https://annotation.github.io/text-fabric/tf/cheatsheet.html#special-edge-feature-oslots). +[*`oslots`*](https://annotation.github.io/text-fabric/tf/cheatsheet.html#special-edge-feature-oslots). Nodes can be annotated with features. 
Relations between nodes can be annotated with edge features. See the table below. -Text-Fabric supports up to three customizable section levels. +Text-Fabric supports up to three customisable section levels. In this corpus we use: -[*scroll*](#scroll) and [*fragment*](#fragment) and [*line*](#line). +[`scroll`](#scroll) and [`fragment`](#fragment) and [`line`](#line). ## Transcription -We map the transcriptions and lexemes to Hebrew unicode. +We map the transcriptions and lexemes to Hebrew UNICODE. The transcriptions are consonant only, the lexemes are pointed. The vowels we encounter in those lexemes have been transcribed by one or more special characters, probably in order to fine-tune the position of those points with respect to their consonants. -We reduce them to single Hebrew unicodes per vowel. +We reduce them to single Hebrew UNICODEs per vowel. There are bracketing constructs in the transcription, such as `<< >>`, `« »`, `[ ]`. It turns out that in the files as we see them, they are consistently written as if in the right to left writing direction. So they appear as `>> <<`, `» «`, `] [`. -When we reproduce the orginal transcription, we put them all back into the left-to-right orientation, +When we reproduce the original transcription, we put them all back into the left-to-right orientation, because this is the intended direction. The cause for encountering them in the opposite orientation might be that -we have stripped all unicode orientation characters (202A-202E) -in our sanitizing preprocessing step. +we have stripped all UNICODE orientation characters (202A-202E) +in our sanitizing pre-processing step. We also supply the ETCBC transcription for Hebrew material. For the full details see the extensive @@ -94,11 +94,11 @@ For the full details see the extensive *(Keep this under your pillow)* Some features come in three variants, a main variant -and two variants with the letter *e* of *o* after the feature name. 
+and two variants with the letter `e` or `o` after the feature name. -* *main variant* the unicode value -* **e** the ETCBC transliteration, or something that extends it -* **o** the original transcription (as in the source files) +* *main variant* the UNICODE value +* *`e`* the ETCBC transliteration, or something that extends it +* *`o`* the original transcription (as in the source files) ## *absent* @@ -118,14 +118,14 @@ See also [search templates](https://annotation.github.io/text-fabric/tf/about/searchusage.html) under **Value specifications**. -## Node type [*sign*](#sign) +## Node type [`sign`](#sign) Basic unit containing a single symbol, mostly a consonant, but it can also be punctuation, or a text-critical sign. The type of sign is stored in the feature `type`. -type | source | etcbc | unicode | description +type | source | ETCBC | UNICODE | description ------- | ------ | ------ | --- | --- `cons` | `m` `M` | `M` `m`| `מ` `ם` | normal consonantal letter `vwl` | `I` | `I`| ` ִ ` | vowel point @@ -141,32 +141,32 @@ type | source | etcbc | unicode | description `add` | `+` | ` + ` | `+` | representation of an addition between numerals `term` | `/` | `╱` | `╱` | representation of an end of line -feature | values | source | ETCBC | Unicode | description +feature | values | source | ETCBC | UNICODE | description ------- | ------ | ------ | ----------- | --- | --- -**after** | ` ` | | | whether there is a space after the last sign of a word and before the next word -**alt** | `1` | `lwz/)h(` | `LWZ61(H)` | | indicates an alternative material, marked by being within brackets `( )` -**cor** | `1` | `yqw>mw)n` | | material is corrected by a modern editor, marked by being within single angle brackets `< >` -**cor** | `2` | `>>zwnh«<<` | `(<< ZWNH# >>)` | | material is corrected by an ancient editor, marked by being within double angle brackets `<< >>` -**cor** | `3` | `^dbr/y^` | `(^ DBR ? 
J ^)` | | material is corrected by an ancient editor, supralinear, marked by being within carets `^ ^` -**glyph[eo]** | | `m` | `M` | `מ` | transliteration of an individual sign -**lang** | `a` `g` | | | language, `a` is Aramaic, `g` is Greek, absent means Hebrew -**rec** | `1` | `]p[n»y` | `[P]N#?Y` | | material is reconstructed by a modern editor, marked by being within square brackets `[ ]` -**rem** | `1` | `}m«x«r«yØM«{` | `{M#Y#R#J?m#}` | | material is removed by a modern editor, marked by being within single braces `{ }` -**rem** | `2` | `twlo}}t{{` | `TWL<{{t}}` | | material is removed by an ancient editor, marked by being within double braces `{{ }}` -**type** | | | | | type of sign, see table above -**unc** | `1` | `b«NØ` | `B#n?` | | indicates *uncertainty of degree=1* by flag `|` -**unc** | `2` | `at«` `aj«y»/K` | `>T#` `>X#J#?) ? k` | | indicates *uncertainty of degree=2* by flag `«` or brackets `« »`, in this example the `« »` are not brackets but individual tokens -**unc** | `3` | `]p[n»y` | `[P]N#?Y` | | indicates *uncertainty of degree=3* by flag `»` -**unc** | `4` | `a\|hrwN` | `>#?HRWn` | | indicates *uncertainty of degree=4* by flag `\|` -**vac** | `1` | `≥ ≤` | `(- -)` | | indicates an empty, unwritten space by brackets `≤ ≥` +*`after`* | ` ` | | | whether there is a space after the last sign of a word and before the next word +*`alt`* | `1` | `lwz/)h(` | `LWZ61(H)` | | indicates an alternative material, marked by being within brackets `( )` +*`cor`* | `1` | `yqw>mw)n` | | material is corrected by a modern editor, marked by being within single angle brackets `< >` +*`cor`* | `2` | `>>zwnh«<<` | `(<< ZWNH# >>)` | | material is corrected by an ancient editor, marked by being within double angle brackets `<< >>` +*`cor`* | `3` | `^dbr/y^` | `(^ DBR ? 
J ^)` | | material is corrected by an ancient editor, supralinear, marked by being within carets `^ ^` +*`glyph[eo]`* | | `m` | `M` | `מ` | transliteration of an individual sign +*`lang`* | `a` `g` | | | language, `a` is Aramaic, `g` is Greek, absent means Hebrew +*`rec`* | `1` | `]p[n»y` | `[P]N#?Y` | | material is reconstructed by a modern editor, marked by being within square brackets `[ ]` +*`rem`* | `1` | `}m«x«r«yØM«{` | `{M#Y#R#J?m#}` | | material is removed by a modern editor, marked by being within single braces `{ }` +*`rem`* | `2` | `twlo}}t{{` | `TWL<{{t}}` | | material is removed by an ancient editor, marked by being within double braces `{{ }}` +*`type`* | | | | | type of sign, see table above +*`unc`* | `1` | `b«NØ` | `B#n?` | | indicates *uncertainty of degree=1* by flag `|` +*`unc`* | `2` | `at«` `aj«y»/K` | `>T#` `>X#J#?) ? k` | | indicates *uncertainty of degree=2* by flag `«` or brackets `« »`, in this example the `« »` are not brackets but individual tokens +*`unc`* | `3` | `]p[n»y` | `[P]N#?Y` | | indicates *uncertainty of degree=3* by flag `»` +*`unc`* | `4` | `a\|hrwN` | `>#?HRWn` | | indicates *uncertainty of degree=4* by flag `\|` +*`vac`* | `1` | `≥ ≤` | `(- -)` | | indicates an empty, unwritten space by brackets `≤ ≥` ## Biblical or not biblical -The feature `biblical` is defined for *scrolls*, *fragments*, *lines*, *clusters*, and *words*. +The feature `biblical` is defined for `scrolls`, `fragments`, `lines`, `clusters`, and `words`. 
value | node type | description --- | --- | --- -*absent* | `scroll` `fragment` `line` `word` `cluster` | material is completely non-biblical +`absent` | `scroll` `fragment` `line` `word` `cluster` | material is completely non-biblical `1` | `scroll` `fragment` `line` `word` `cluster` | material is completely biblical `2` | `scroll` `fragment` | material is partly biblical, partly non-biblical `2` | `line` | material is biblical, but the line also occurs in the non-biblical file, see remark below @@ -175,17 +175,17 @@ value | node type | description **Remark** For lines with `biblical=2` we have included the material according to the biblical source file -and we have discarded the material according to the nonbiblical source file. +and we have discarded the material according to the non-biblical source file. There are only 14 such lines; 6 of them are identical in both source files, and the rest have a reconstruction in the biblical source file (marked as such by `[ ]` brackets) and hardly any definite material in the non-biblical source file. -## Node type [*word*](#word) +## Node type [`word`](#word) Sequence of signs corresponding to a single line in the source files. Whether a word is adjacent to the next word can be gleaned from the numbering of the word in the source file. -If so, we leave the *after* feature without value. +If so, we leave the `after` feature without value. There are several types of things that can occupy a word: a string of consonants, a numeral, punctuation, nothing, ... @@ -202,25 +202,25 @@ type | description If a transcription field is empty, but there is lexeme information, we insert a word node with type `glyph` -and all of its textual features (*full[eo], glyph[eo], punc[eo]*) absent. +and all of its textual features (`full[eo], glyph[eo], punc[eo]`) absent. We add a slot of type `empty` to this word.
-feature | source | ETCBC | Unicode | description +feature | source | ETCBC | UNICODE | description ------- | ------ | ------ | --- | -------- -**after** | ` ` | | | whether there is a space after a word and before the next word -**full[eo]** | `mm/nw[` | `MM61NW]` | `ממ׳נו]` | full transcription of a word, including flags and clustering characters -**g_cons[eo]** | `mmnw` | `MMNW]` | `ממנו` | consonantal letters of a word in ETCBC encoding excluding flags and brackets -**glex[eo]** | `mIN` | `MIn` | `מִן` | lexeme of a word, without non-textual characters -**glyph[eo]** | `mmnw` | `MMNW]` | `ממנו` | letters of a word excluding flags and brackets -**intl** | `1` `2` | | | if the physical word is on an interlinear line, this is `1`, if there are two interlinear lines at that point, the words on the first line get `1` and words on the second line gets `2` -**lang** | `a` `g` | | | language, `a` is Aramaic, `g` is Greek, absent means Hebrew -**lex_etcbc** | `mIN` | `MIn` | `מִן` | consonantal lexeme of a word in ETCBC encoding -**lex[eo]** | `mIN` | `MIn` | `מִן` | lexeme of a word -**punc[eo]** | `.` | `00` | `׃` | punctuation at the end of a word -**morpho** | `vHi1cpX3mp` | | | original morphological tag for this word; all information in this has been decomposed into the morphological features below -**script** | `paleohebrew` `greekcapital` | | | indicates the script in which the word is written -**srcLn** | `424242` | | | line number of this word in its source data file; use `biblical` to find out whether it is the bib or the nonbib file -**type** | | | | type of word, see table above +*`after`* | ` ` | | | whether there is a space after a word and before the next word +*`full[eo]`* | `mm/nw[` | `MM61NW]` | `ממ׳נו]` | full transcription of a word, including flags and clustering characters +*`g_cons[eo]`* | `mmnw` | `MMNW]` | `ממנו` | consonantal letters of a word in ETCBC encoding excluding flags and brackets +*`glex[eo]`* | `mIN` | `MIn` | `מִן` | lexeme of a 
word, without non-textual characters +*`glyph[eo]`* | `mmnw` | `MMNW]` | `ממנו` | letters of a word excluding flags and brackets +*`intl`* | `1` `2` | | | if the physical word is on an interlinear line, this is `1`; if there are two interlinear lines at that point, the words on the first line get `1` and words on the second line get `2` +*`lang`* | `a` `g` | | | language, `a` is Aramaic, `g` is Greek, absent means Hebrew +*`lex_etcbc`* | `mIN` | `MIn` | `מִן` | consonantal lexeme of a word in ETCBC encoding +*`lex[eo]`* | `mIN` | `MIn` | `מִן` | lexeme of a word +*`punc[eo]`* | `.` | `00` | `׃` | punctuation at the end of a word +*`morpho`* | `vHi1cpX3mp` | | | original morphological tag for this word; all information in this has been decomposed into the morphological features below +*`script`* | `paleohebrew` `greekcapital` | | | indicates the script in which the word is written +*`srcLn`* | `424242` | | | line number of this word in its source data file; use `biblical` to find out whether it is the biblical or the non-biblical file +*`type`* | | | | type of word, see table above ### Biblical reference @@ -228,11 +228,11 @@ Words coming from the biblical source file have references to a passage in the B feature | examples | description --- | --- | --- -**biblical** | `1` `2` | 1 or 2 if this word is biblical material, otherwise absent, see section on biblical -**book** | `Gen` `1Q1` | the book of the corresponding passage -**chapter** | `3` `f6` | the chapter of the corresponding passage -**verse** | `1` `2` | the verse of the corresponding passage -**halfverse** | `a` `b` (the only values)| the halfverse of the corresponding passage +*`biblical`* | `1` `2` | 1 or 2 if this word is biblical material, otherwise absent, see section on biblical +*`book`* | `Gen` `1Q1` | the book of the corresponding passage +*`chapter`* | `3` `f6` | the chapter of the corresponding passage +*`verse`* | `1` `2` | the verse of the corresponding passage +*`halfverse`* | `a` `b` (the 
only values)| the half-verse of the corresponding passage **N.B.** Many times chapters are not really chapter numbers of books, but fragments of scrolls. @@ -260,14 +260,14 @@ There you see also the connection with the original Abegg encoding of morphologi We have switched to slightly more verbose feature values, and to feature names that are in line with those of the [BHSA](https://etcbc.github.io/bhsa/). -The original tag as a whole is also available in the feature **morpho**. +The original tag as a whole is also available in the feature *`morpho`*. We only describe the plain features here, but keep in mind that they may be accompanied by their numbered brothers. All these features may contain the value `unknown`. -The *xxx*`_etcbc` features below are part of the extra features by Martijn Naaijer, +The `xxx_etcbc` features below are part of the extra features by Martijn Naaijer, which have been produced in a different way, not based on the Abegg sources. They are the product of a model trained on BHSA data which has been subsequently applied to the DSS. We mark them as *derived from BHSA* in the table below. @@ -276,42 +276,42 @@ See [ETCBC/DSS2ETCBC](https://github.com/ETCBC/DSS2ETCBC). feature | examples | description ------- | ------ | ------ -**sp** | `subs` `verb` `numr` `ptcl` | part-of-speech -**sp_etcbc** | `subs` `verb` `numr` `ptcl` | idem, but derived from BHSA -**cl** | `card` `prp` `prep` | class, i.e. 
a sub category within its part-of-speech -**ps** | `1` `2` `3` | person -**ps_etcbc** | `p1` `p2` `p3` `NA` | idem, but derived from BHSA -**gn** | `m` `f` `c` `b` | gender, also with *common* and *both* -**gn_etcbc** | `m` `f` `NA` `unknown` | idem, but derived from BHSA -**nu** | `s` `p` `d` | number, also with *dual* -**nu_etcbc** | `sg` `pl` `du` `NA` | idem, but derived from BHSA -**st** | `a` `c` `d` | state, also with *determined* -**cs** | `nom` `acc` `gen` | case -**vs** | `qal` `passive` `piel` `hifil` `hithpolel` | verbal stem, also with *passive*, some are Hebrew, some are Aramaic -**vs_etcbc** | `qal` `passive` `piel` `hif` `htpo` | idem, but derived from BHSA -**vt** | `perf` `impf` `wayy` `impv` `infc` `infa` `ptca` `ptcp` | verbal tense or aspect, also with *wayyiqtol* -**vt_etcbc** | `perf` `impf` `wayq` `impv` `infc` `infa` `ptca` `ptcp` `NA` | idem, but derived from BHSA -**md** | `juss` `coho` `cons` | mood +*`sp`* | `subs` `verb` `numr` `ptcl` | part-of-speech +*`sp_etcbc`* | `subs` `verb` `numr` `ptcl` | idem, but derived from BHSA +*`cl`* | `card` `prp` `prep` | class, i.e. 
a sub category within its part-of-speech +*`ps`* | `1` `2` `3` | person +*`ps_etcbc`* | `p1` `p2` `p3` `NA` | idem, but derived from BHSA +*`gn`* | `m` `f` `c` `b` | gender, also with `common` and `both` +*`gn_etcbc`* | `m` `f` `NA` `unknown` | idem, but derived from BHSA +*`nu`* | `s` `p` `d` | number, also with `dual` +*`nu_etcbc`* | `sg` `pl` `du` `NA` | idem, but derived from BHSA +*`st`* | `a` `c` `d` | state, also with `determined` +*`cs`* | `nom` `acc` `gen` | case +*`vs`* | `qal` `passive` `piel` `hifil` `hithpolel` | verbal stem, also with `passive`, some are Hebrew, some are Aramaic +*`vs_etcbc`* | `qal` `passive` `piel` `hif` `htpo` | idem, but derived from BHSA +*`vt`* | `perf` `impf` `wayy` `impv` `infc` `infa` `ptca` `ptcp` | verbal tense or aspect, also with `wayyiqtol` +*`vt_etcbc`* | `perf` `impf` `wayq` `impv` `infc` `infa` `ptca` `ptcp` `NA` | idem, but derived from BHSA +*`md`* | `juss` `coho` `cons` | mood If the parsing of the morphology tag has been inconclusive, there will be an error feature present on that word: feature | examples | description ------- | ------ | ------ -**merr** | `vnPfpa` `@0` | the characters are those that are not recognized by the parser at that point +*`merr`* | `vnPfpa` `@0` | the characters are those that are not recognized by the parser at that point -## Node type [*lex*](#lex) +## Node type [`lex`](#lex) The type of lexemes, as found in the lexeme field of the source data files. -feature | source | ETCBC | Unicode | description +feature | source | ETCBC | UNICODE | description ------- | ------ | ------ | --- | -------- -**lex[eo]** | `mIN` | `MIn` | `מִן` | lexeme of a word -**complete** | 1 | | | 1 if the lexeme is complete, i.e. without uncertain characters +*`lex[eo]`* | `mIN` | `MIn` | `מִן` | lexeme of a word +*`complete`* | 1 | | | 1 if the lexeme is complete, i.e. without uncertain characters **N.B.** Lexemes may contain characters with an uncertainty level, such as `#` and `?`. 
-See the under [*sign*](#node-type-sign) above. +See under [`sign`](#node-type-sign) above. Lexemes are connected to their occurrence words by means of an edge feature: @@ -326,9 +326,9 @@ words = E.occ.f(lex) lex = E.occ.t(word)[0] ``` -## Node type [*cluster*](#cluster) +## Node type [`cluster`](#cluster) -Grouped sequence of [*signs*](#sign). There are different +Grouped sequence of [`signs`](#sign). There are different types of these bracketings. Clusters of the same type are not nested. Clusters of different types need not be nested properly with respect to each other. @@ -351,65 +351,65 @@ type | value | examples | description Each cluster induces a sign feature with the same name as the type of the cluster, which gets a numeric value, as indicated in the table. -Note the *vac* cluster: by definition, it contains no signs. +Note the `vac` cluster: by definition, it contains no signs. In order to anchor it into the text sequence, we have generated an empty slot in each vacat cluster. We have done the same for other clusters that happened to be without other slots. -**N.B.**: Note that such clusters do not have *words* inside them, only an empty *sign*. +**N.B.**: Note that such clusters do not have `words` inside them, only an empty `sign`. These are cases of signs that do not belong to words! Other features: feature | examples | description ------- | ------ | ------ -**biblical** | `1` `2` | 1 or 2 if this cluster is biblical material, otherwise absent, see section on biblical +*`biblical`* | `1` `2` | 1 or 2 if this cluster is biblical material, otherwise absent, see section on biblical -## Node type [*line*](#line) +## Node type [`line`](#line) Section level 3. -Subdivision of a containing [*fragment*](#fragment). -Corresponds to a set of source data lines with the same value in the *line* column. +Subdivision of a containing [`fragment`](#fragment). +Corresponds to a set of source data lines with the same value in the `line` column.
feature | values | description ------- | ------ | ------ -**biblical** | `1` `2` | 1 or 2 if this line is biblical material, otherwise absent, see section on biblical -**line** | `3` | number of a line of a fragment (not necessarily integer valued) -**fragment** | `f3` | label of a fragment or column of a scroll -**scroll** | `1Q1` | short name of a scroll +*`biblical`* | `1` `2` | 1 or 2 if this line is biblical material, otherwise absent, see section on biblical +*`line`* | `3` | number of a line of a fragment (not necessarily integer valued) +*`fragment`* | `f3` | label of a fragment or column of a scroll +*`scroll`* | `1Q1` | short name of a scroll There are lines in the source data with number `0` and with a subdivision by means of another number. We have converted this situation to a sequence of lines numbered as `0.1`, `0.2`, etc. Hence the number of a line is not always an integer. So we store the number in a feature named `label`, instead of number. -## Node type [*fragment*](#fragment) +## Node type [`fragment`](#fragment) Section level 2. -Subdivision of a containing [*scroll*](#scroll). -Corresponds to a set of source data lines with the same value in the *fragment* column. +Subdivision of a containing [`scroll`](#scroll). +Corresponds to a set of source data lines with the same value in the `fragment` column. -For non-biblical scrolls, the fragment is usually called *column*. +For non-biblical scrolls, the fragment is usually called `column`.
feature | values | description ------- | ------ | ------ -**biblical** | `1` `2` | 1 or 2 if this fragment contains biblical material, otherwise absent, see section on biblical -**fragment** | `f3` | label of a fragment or column of a scroll -**scroll** | `1Q1` | short name of a scroll +*`biblical`* | `1` `2` | 1 or 2 if this fragment contains biblical material, otherwise absent, see section on biblical +*`fragment`* | `f3` | label of a fragment or column of a scroll +*`scroll`* | `1Q1` | short name of a scroll -## Node type [*scroll*](#scroll) +## Node type [`scroll`](#scroll) Section level 1. -Corresponds to a set of source data lines with the same value in the *scroll* column. +Corresponds to a set of source data lines with the same value in the `scroll` column. feature | values | description ------- | ------ | ------ -**biblical** | `1` `2` | 1 or 2 if this scroll contains biblical material, otherwise absent, see section on biblical -**scroll** | `1Q1` | short name of a scroll +*`biblical`* | `1` `2` | 1 or 2 if this scroll contains biblical material, otherwise absent, see section on biblical +*`scroll`* | `1Q1` | short name of a scroll # More about the node types @@ -420,18 +420,19 @@ a textual object. Some node types will be marked as a section level. This is the basic unit of writing. -**The node type [*sign*](#sign) is our slot type in the Text-Fabric representation of this corpus.** +**The node type [`sign`](#sign) is our slot type in the Text-Fabric representation of this corpus.** Slots are the textual positions. They are occupied by individual glyphs (consonants, "digits", punctuation, miscellaneous glyphs). -All signs have the features **type** and **glyph[eo]**. +All signs have the features *`type`* and *`glyph[eo]`*. ### Glyphs -The *type* stores the kind of glyph, such as `cons`. -The *glyph glyphe glypho* features store the transcription of the glyph, without any flags -and brackets. 
They store it in Unicode, ETCBC transcription, and source transcription. +The `type` stores the kind of glyph, such as `cons`. +The `glyph glyphe glypho` features store the transcription of the glyph, +without any flags and brackets. They store it in UNICODE, ETCBC transcription, +and source transcription. These features do not suffice to reconstruct the original source transcription, because the flags and brackets are not part of them. @@ -439,25 +440,25 @@ and brackets are not part of them. #### Punctuation Punctuation is either a mark or a white space, or a boundary. -All punctuation characters have Unicode representations. +All punctuation characters have UNICODE representations. For some we have *borrowed* a Hebrew character that has a different meaning in the Masoretic text, but that does not occur otherwise in the Dead Sea Scrolls. The reason is that we can represent Hebrew consonants plus punctuation in a smooth, right-to-left way. -source | etcbc | unicode | description +source | ETCBC | UNICODE | description --- | --- | --- | --- ` ` | `_` | ` ` | non-breaking intra-word space `-` | `&` | `־` | maqaf `.` | `00` | `׃` | sof pasuq -`±` | `0000` | `׃׃` | double sof pasuq (mis)used as paleo divider -`/` | `61` | `׳` | geresh (punctuation, not accent) (mis)used as morpheme break +`±` | `0000` | `׃׃` | double sof pasuq, questionably used as paleo divider +`/` | `61` | `׳` | geresh (punctuation, not accent), questionably used as morpheme break #### Numerals Numerals are ancient signs for denoting quantities. -source | etcbc | unicode | value +source | ETCBC | UNICODE | value --- | --- | --- | --- `A` | `>'` | `א֜` | 1 `å` | `>52` | `אׄ` | 1 @@ -470,11 +471,11 @@ source | etcbc | unicode | value #### Miscellaneous Several characters have to do with uncertainty and illegibility. -They have an improvised Unicode representations. +They have an improvised UNICODE representations. We propose an transcription that works inside the ETCBC transcription. 
Note that these have spaces around them. -source | etcbc | unicode | description +source | ETCBC | UNICODE | description --- | --- | --- | --- `--` | ` 0 ` | `ε` | missing sign `?` | ` ? ` | ` ? ` | uncertain sign, degree 1 @@ -486,11 +487,11 @@ source | etcbc | unicode | description Signs also have features corresponding to flags and brackets, that store under which flag or inside which brackets the sign occurs: -**unc** **cor** **rem** **vac** **alt** **rec**. +*`unc`* *`cor`* *`rem`* *`vac`* *`alt`* *`rec`*. #### Flags -*Signs* may have *flags*. +`Signs` may have *flags*. In transcription they show up as a special trailing character. Flags code for signs that are damaged, questionable (in their reading), in short: uncertain. They apply to the preceding character. @@ -498,9 +499,9 @@ They apply to the preceding character. We propose an transcription that works inside the ETCBC transcription. Note that these have *no* spaces around them. -We use this for the Unicode represenatation as well. +We use this for the UNICODE representation as well. -source | etcbc / unicode | description +source | ETCBC / UNICODE | description --- | --- | --- `Ø` | `?` | uncertain, degree 1 `«` | `#` | uncertain, degree 2 @@ -511,8 +512,8 @@ Note that there is also a bracket pair for uncertainty level 2. #### Brackets -We discuss the brackets under the node type [*cluster*](#cluster). -Each type of bracket corresponds to a feature of the same name at the *sign* level. +We discuss the brackets under the node type [`cluster`](#cluster). +Each type of bracket corresponds to a feature of the same name at the `sign` level. With some difficulty, you can reconstruct the source data from this, modulo the order of flags and brackets. @@ -522,24 +523,26 @@ word level. ## Cluster -One or more [*signs*](#sign) may be bracketed by certain delimiters. -Together they form a *cluster*. +One or more [`signs`](#sign) may be bracketed by certain delimiters. +Together they form a `cluster`. 
Each pair of boundary signs marks a cluster of a certain type. -This type is stored in the feature **type**. +This type is stored in the feature *`type`*. Clusters are not be nested in clusters of the same type. Clusters of one type in general do not respect the boundaries of clusters of other types. -Clusters may contain just one [*sign*](#sign). +Clusters may contain just one [`sign`](#sign). Cluster boundaries are usually within words. In Text-Fabric, cluster nodes are linked to the signs it contains. So, if `c` is a cluster, you can get its signs by - L.d(c, otype='sign') +``` python +L.d(c, otype='sign') +``` More over, every type of cluster corresponds to a numerical feature on signs with the same name as that type. @@ -547,15 +550,15 @@ as that type. We propose an transcription that works inside the ETCBC transcription. Note that these have *sometimes* a space at the inner side. -We use the original brackets for the Unicode representation as well. +We use the original brackets for the UNICODE representation as well. But note that in the original the direction of the brackets is inverted, due to the conversion process that has stripped RTL and LTR triggering characters. -In the Unicode representation we restore the proper direction. +In the UNICODE representation we restore the proper direction. In the table below, the *value* is the value that the associated feature has for signs within that type of brackets under the given description. -source / unicode | etcbc | value | type | description +source / UNICODE | ETCBC | value | type | description --- | --- | --- | --- | --- `^ ^` | `(^ ^)` | 3 | `cor3` | correction by ancient editor, supralinear `<< >>` | `(<< >>)` | 2 | `cor2` | correction by ancient editor @@ -573,15 +576,15 @@ Words are the contents of the transcription fields of the source data lines. Words will be separated by spaces or by nothing, in case the connection field in the same source data line has a `B`. 
-They have features **glyph[eo] full[eo] punc[eo] after**. +They have features *`glyph[eo] full[eo] punc[eo] after`*. -* **full[eo]** full value of the word: letters, symbols, punctuation, flags, brackets; - **fullo** is the original content of the *trans* field in the source data file -* **glyph[eo]** letter value of the word: consonants, vowels, digits, numerals; +* *`full[eo]`* full value of the word: letters, symbols, punctuation, flags, brackets; + *`fullo`* is the original content of the `trans` field in the source data file +* *`glyph[eo]`* letter value of the word: consonants, vowels, digits, numerals; no punctuation, flags, or brackets; -* **punc[eo]** the punctuation of a word, if any; -* **after** a space when the word should be followed by a space, - i.e. when the *connection* field does not have a `B`. +* *`punc[eo]`* the punctuation of a word, if any; +* *`after`* a space when the word should be followed by a space, + i.e. when the `connection` field does not have a `B`. The source transcription can be reconstructed by walking over all words and printing @@ -599,7 +602,7 @@ glypho + punco + after for each word. 
-Or, in ETCBC transcription / Unicode: +Or, in ETCBC transcription / UNICODE: ``` glyphe + punce + after @@ -614,14 +617,14 @@ The following text formats are defined (you can also list them with `T.formats`) format | kind | description --- | --- | --- -`text-orig-full` | plain | the source text, glyphs only, no flags / brackets, in unicode -`text-trans-full` | plain | the source text, glyphs only, no flags / brackets, in etcbc transcription +`text-orig-full` | plain | the source text, glyphs only, no flags / brackets, in UNICODE +`text-trans-full` | plain | the source text, glyphs only, no flags / brackets, in ETCBC transcription `text-source-full` | plain | the source text, glyphs only, no flags / brackets, in source transcription -`text-orig-extra` | plain | the source text with flags and brackets, in unicode -`text-trans-extra` | plain | the source text with flags and brackets, in etcbc transcription +`text-orig-extra` | plain | the source text with flags and brackets, in UNICODE +`text-trans-extra` | plain | the source text with flags and brackets, in ETCBC transcription `text-source-extra` | plain | the source text with flags and brackets, in source transcription -`lex-orig-full` | plain | lexeme of a word in unicode -`lex-trans-full` | plain | lexeme of a word in etcbc transcription +`lex-orig-full` | plain | lexeme of a word in UNICODE +`lex-trans-full` | plain | lexeme of a word in ETCBC transcription `lex-source-full` | plain | lexeme of a word in source transcription `layout-orig-full` | layout | as `text-orig-full` but the flag and cluster information is visible in layout `layout-trans-full` | layout | as `text-trans-full` but the flag and cluster information is visible in layout @@ -629,7 +632,7 @@ format | kind | description The formats with `text` result in strings that are plain text, without additional formatting. 
-The formats with `layout` result in pieces html with css-styles; the richness of layout enables us to code more information +The formats with `layout` result in pieces HTML with CSS-styles; the richness of layout enables us to code more information in the plain representation, e.g. blurry characters when signs are damaged or uncertain. See also the diff --git a/programs/addBoundariesFromNaaijer.py b/programs/addBoundariesFromNaaijer.py index 931716d..850bb11 100644 --- a/programs/addBoundariesFromNaaijer.py +++ b/programs/addBoundariesFromNaaijer.py @@ -80,7 +80,7 @@ def readBoundaries(): When we add clauses and phrases, we must map them to signs. That is why we make use of a mapping of words to the signs\they contain. - After reading the file, we compose a datastructure that can be fed into the + After reading the file, we compose a data structure that can be fed into the `modify` function of Text-Fabric. See https://annotation.github.io/text-fabric/compose/modify.html """ diff --git a/programs/addDataFromNaaijer.py b/programs/addDataFromNaaijer.py index e1ad052..f5841bc 100644 --- a/programs/addDataFromNaaijer.py +++ b/programs/addDataFromNaaijer.py @@ -73,7 +73,7 @@ def readFile(fileNum, data): The values are the columns to extract. The result is delivered as a dict whose keys are the column names - of the columns extracted, and whose values are dicts of TF nodes as keys + of the columns extracted, and whose values are dictionaries of TF nodes as keys and column values as values. 
""" diff --git a/programs/boundariesFromNaaijer.ipynb b/programs/boundariesFromNaaijer.ipynb index 2b53732..022b7ca 100644 --- a/programs/boundariesFromNaaijer.ipynb +++ b/programs/boundariesFromNaaijer.ipynb @@ -6,7 +6,7 @@ "source": [ "# Add phrase and clause nodes\n", "\n", - "The data for clauses and phrases comes from csv files prepared by Martijn Naaijer.\n", + "The data for clauses and phrases comes from CSV files prepared by Martijn Naaijer.\n", "\n", "We compile interpret the CSV and compile it into input data for the\n", "[modify](https://annotation.github.io/text-fabric/compose/modify.html)\n", @@ -963,7 +963,7 @@ "\n", "The script specifies the source version and the destination version for the new TF dataset.\n", "\n", - "We can run it on the commandline, or right here, in the notebook.\n" + "We can run it on the command line, or right here, in the notebook.\n" ] }, { diff --git a/programs/checks.ipynb b/programs/checks.ipynb index f4ca966..88686f8 100644 --- a/programs/checks.ipynb +++ b/programs/checks.ipynb @@ -14,7 +14,7 @@ "* language/lexeme `lang` and `lexo`\n", "* morphology `morpho`\n", "\n", - "and we'll keep track of the source location: biblical or nonbiblical file, line number in the file.\n", + "and we'll keep track of the source location: biblical or non-biblical file, line number in the file.\n", "\n", "We show that all this material has been transferred to TF completely and faithfully." ] @@ -458,15 +458,15 @@ "source": [ "# Overview\n", "\n", - "We compare the material in the source files with the o-style features of the TF dataset.\n", - "The o-style features `fullo`, `lexo`, `morpho` contain the unmodified strings corresponding to\n", + "We compare the material in the source files with the `o`-style features of the TF dataset.\n", + "The `o`-style features `fullo`, `lexo`, `morpho` contain the unmodified strings corresponding to\n", "fields in the lines of the source files. 
we add the `lang` feature to the mix.\n", "\n", "We'll compile two lists of this material, one based directly on the source files, and one based on the TF\n", "features.\n", "\n", "Both lists consist of tuples, one for each word, and inside each tuple we also\n", - "store whether the word comes from the biblical or nonbiblical file and what the line number is.\n", + "store whether the word comes from the biblical or non-biblical file and what the line number is.\n", "\n", "Then we'll compare the tuples of both lists one by one." ] @@ -1009,7 +1009,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The function showTf looks up a line number in TF." + "The function `showTf` looks up a line number in TF." ] }, { @@ -1037,7 +1037,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "And showDiff combines firstDiff and showSrc and showTf to get a meaningful display of the first difference,\n", + "And `showDiff` combines `firstDiff` and `showSrc` and `showTf` to get a meaningful display of the first difference,\n", "as we'll see later." 
] }, diff --git a/programs/fromNaaijer.ipynb b/programs/fromNaaijer.ipynb index 70f0881..9577297 100644 --- a/programs/fromNaaijer.ipynb +++ b/programs/fromNaaijer.ipynb @@ -2114,7 +2114,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Check the lex_etcbc feature" + "## Check the `lex_etcbc` feature" ] }, { @@ -2158,7 +2158,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Check the g_cons feature" + "## Check the `g_cons` feature" ] }, { diff --git a/programs/parallels.ipynb b/programs/parallels.ipynb index aa7b03c..9a09a19 100644 --- a/programs/parallels.ipynb +++ b/programs/parallels.ipynb @@ -209,7 +209,7 @@ "source": [ "# Compute all similarities\n", "\n", - "We are going to perform more than half a billion of comparisons, each of which is more than an elemetary operation.\n", + "We are going to perform more than half a billion of comparisons, each of which is more than an elementary operation.\n", "\n", "Let's measure time." ] @@ -1545,7 +1545,7 @@ "For parallels, we link each line to each of its parallel lines and we annotate that link with the similarity between\n", "the two lines. The similarity is a percentage, and we round it to integer values.\n", "\n", - "If *n1* is similar to *n2*, then *n2* is similar to *n1*.\n", + "If `n1` is similar to `n2`, then `n2` is similar to `n1`.\n", "In order to save space, we only add such links once.\n", "\n", "We can then use\n", @@ -1686,7 +1686,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "I have added this file to a new release of the DSS Github repo." + "I have added this file to a new release of the DSS GitHub repo." 
] }, { diff --git a/tutorial/display.ipynb b/tutorial/display.ipynb index 1c0ed05..869b4d3 100644 --- a/tutorial/display.ipynb +++ b/tutorial/display.ipynb @@ -49,14 +49,14 @@ "metadata": {}, "source": [ "If you want to use a version of the DSS ahead of a release, use the incantation with `hot` in it.\n", - "That will take time, not only for the download itself, but also for the one-time preprocessing of the data.\n", + "That will take time, not only for the download itself, but also for the one-time pre-processing of the data.\n", "\n", "If you are content with the latest stable release, use the line without the `hot`.\n", "\n", - "I have the data locally in my github clone of the DSS, so I use the variant with `clone`.\n", + "I have the data locally in my GitHub clone of the DSS, so I use the variant with `clone`.\n", "\n", "If you do a `git clone https://github.com/ETCBC/dss` from your directory\n", - "`~/github/etcbc` you can use this as well." + "`~/github/ETCBC` you can use this as well." ] }, { @@ -1589,7 +1589,7 @@ "Moreover, for this corpus a TF app has been build that defines additional text-formats.\n", "\n", "Whereas the formats defined in `otext` are strictly plain text formats, the formats\n", - "defined in the app are able to use typographic styles to shape the text, such as bold, italic, colors, etc.\n", + "defined in the app are able to use typographic styles to shape the text, such as bold, italic, colour, etc.\n", "\n", "Here is the list of all formats." 
] @@ -2356,12 +2356,12 @@ "* alternate text `( )` is overlined\n", "* vacats `(- -)` have a red border\n", "* reconstructions `[ ]` are in color teal and in a smaller font\n", - "* removed text is striked through, if by an ancient editor `{{ }}` it is set in color maroon\n", + "* removed text is strike-through, if by an ancient editor `{{ }}` it is set in color maroon\n", " if by a modern editor, the color is red\n", "* corrected text is overlined, if by an ancient editor `<< >>` the color is navy,\n", " if supralinear (also ancient editor) `^ ^` the color is also navy\n", - " and the text is superscripted, if by a modern editor `< >` the color is dodgerblue\n", - "* interlinear text is subscripted or extra subscripted (depending on the interlinear value 1 or 2)\n", + " and the text is in superscript, if by a modern editor `< >` the color is `dodgerblue`\n", + "* interlinear text is in subscript or extra subscript (depending on the interlinear value 1 or 2)\n", "* non-Hebrew text (Aramaic or Greek) is underlined\n", "* text that is marked by script (paleo Hebrew or Greek Capital) gets a straight border around it" ] diff --git a/tutorial/exportExcel.ipynb b/tutorial/exportExcel.ipynb index 0d24db9..053f693 100644 --- a/tutorial/exportExcel.ipynb +++ b/tutorial/exportExcel.ipynb @@ -4948,11 +4948,11 @@ "source": [ "You see the following columns:\n", "\n", - "* **R** the sequence number of the result tuple in the result list\n", - "* **S1 S2 S3** the section as scroll name, fragment, and line number, in separate columns\n", - "* **NODEi TYPEi** the node and its type, for each node **i** in the result tuple\n", - "* **TEXTi** the full text of node **i**, if the node type admits a concise text representation\n", - "* **vs2** **unc3** the value of feature **vs** on the word and **unc** on the sign,\n", + "* *`R`* the sequence number of the result tuple in the result list\n", + "* *`S1 S2 S3`* the section as scroll name, fragment, and line number, in separate columns\n", 
+ "* *`NODEi TYPEi`* the node and its type, for each node **i** in the result tuple\n", + "* *`TEXTi`* the full text of node *`i`*, if the node type admits a concise text representation\n", + "* *`vs2`* *`unc3`* the value of feature *`vs`* on the word and *`unc`* on the sign,\n", "since our query mentions them on those nodes." ] }, @@ -4969,7 +4969,7 @@ "\n", "In this corpus, the default condense type is line. Node types bigger than lines will not get text.\n", "\n", - "Now, if we change the condenseType to something smaller than line, e.g. `word`, the line text will be suppressed." + "Now, if we change the `condenseType` to something smaller than line, e.g. `word`, the line text will be suppressed." ] }, { @@ -5118,7 +5118,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As you see, you have an extra columns **lexo2**, **morpho2** and **rec3**.\n", + "As you see, you have an extra columns *`lexo2`*, *`morpho2`* and *`rec3`*.\n", "\n", "This gives you a lot of control over the generation of spreadsheets." 
] @@ -5488,7 +5488,7 @@ "* **[start](start.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[search](search.ipynb)** turbo charge your hand-coding with search templates\n", - "* **exportExcel** make tailor-made spreadsheets out of your results\n", + "* **export Excel** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **[similarLines](similarLines.ipynb)** spot the similarities between lines\n", "\n", diff --git a/tutorial/search.ipynb b/tutorial/search.ipynb index 70b3c79..bae4337 100644 --- a/tutorial/search.ipynb +++ b/tutorial/search.ipynb @@ -5160,7 +5160,7 @@ "You loose the information of what parts belong to what result.\n", "\n", "As an example of the difference, we look for all proper nouns, but only in lines where there is also\n", - "a word marked with paleohebrew script." + "a word marked with `paleohebrew` script." ] }, { @@ -5278,7 +5278,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note in passing that *numerals* are also marked in as paleohebrew." + "Note in passing that *numerals* are also marked in as `paleohebrew`." 
] }, { @@ -5584,16 +5584,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can apply different highlight colors to different parts of the result.\n", + "We can apply different highlight colours to different parts of the result.\n", "\n", "The line is member 1.\n", "the words are members 2 and 3,\n", "and the sign is member 4.\n", "\n", - "We do not give a colour to the line, the verb will have thedefault color,\n", + "We do not give a colour to the line, the verb will have the default color,\n", "the proper name cyan, and the sign magenta.\n", "\n", - "**NB:** You can choose your colors from the\n", + "**NB:** You can choose your colours from the\n", "[CSS specification](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value)." ] }, @@ -6433,7 +6433,7 @@ "source": [ "# Show your own tuples\n", "\n", - "So far we have `show()`n the results of searches.\n", + "So far we have shown the results of searches.\n", "But you can also construct your own tuples and show them.\n", "\n", "Whereas you can use search to get a pretty good approximation of what you want, most of the times\n", @@ -6709,7 +6709,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We are going to make a dictionary of highligts: one color for the hypothetical signs and one for the certain." + "We are going to make a dictionary of highlights: one color for the hypothetical signs and one for the certain." ] }, { diff --git a/tutorial/share.ipynb b/tutorial/share.ipynb index 4e4e47c..1538d29 100644 --- a/tutorial/share.ipynb +++ b/tutorial/share.ipynb @@ -4935,7 +4935,7 @@ "\n", "We choose a location where to save it, the `exercises` folder in the `dss` repository in the `dss` organization.\n", "\n", - "In order to do this, we restart the TF api, but now with the desired output location in the `locations` parameter." + "In order to do this, we restart the TF API, but now with the desired output location in the `locations` parameter." 
] }, { @@ -11073,7 +11073,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Hover over the features to see where they come from, and you'll see they come from your local github repo." + "Hover over the features to see where they come from, and you'll see they come from your local GitHub repo." ] }, { diff --git a/tutorial/similarLines.ipynb b/tutorial/similarLines.ipynb index 0bca29d..25e07a8 100644 --- a/tutorial/similarLines.ipynb +++ b/tutorial/similarLines.ipynb @@ -5136,7 +5136,7 @@ "source": [ "And how many lines have just one correspondence?\n", "\n", - "We look at the tail of rankedParallels." + "We look at the tail of `rankedParallels`." ] }, { @@ -5818,7 +5818,7 @@ "* **[search](search.ipynb)** turbo charge your hand-coding with search templates\n", "* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", - "* **similarLines** spot the similarities between lines\n", + "* **similar Lines** spot the similarities between lines\n", "\n", "---\n", "\n", diff --git a/tutorial/start.ipynb b/tutorial/start.ipynb index a1dbfef..fcee5bc 100644 --- a/tutorial/start.ipynb +++ b/tutorial/start.ipynb @@ -62,7 +62,7 @@ "source": [ "## Data\n", "\n", - "Text-Fabric will fetch the data set for you from github, and check for updates.\n", + "Text-Fabric will fetch the data set for you from GitHub, and check for updates.\n", "\n", "The data will be stored in the `text-fabric-data` in your home directory." ] @@ -6034,7 +6034,7 @@ "\n", "`L.p(node)` **Previous** are the previous *adjacent* nodes, i.e. 
nodes whose last slot comes immediately before the first slot of `node`.\n", "\n", - "All these functions yield nodes of all possible otypes.\n", + "All these functions yield nodes of all possible node types.\n", "By passing an optional parameter, you can restrict the results to nodes of that type.\n", "\n", "The result are ordered according to the order of things in the text.\n", @@ -6588,7 +6588,7 @@ "source": [ "### The `-extra` formats\n", "\n", - "In order to use non-default formats, we have to specify them in the *fmt* parameter." + "In order to use non-default formats, we have to specify them in the `fmt` parameter." ] }, { @@ -6893,7 +6893,7 @@ "metadata": {}, "source": [ "Look at the last case, the lexeme node: obviously, the text-format that has been invoked provides\n", - "the *language* (`h`) of the lexeme, plus its representations in unicode, etcbc, and Abegg transcription.\n", + "the *language* (`h`) of the lexeme, plus its representations in UNICODE, ETCBC, and Abegg transcription.\n", "\n", "But what format exactly has been invoked?\n", "Let's ask."