Aggregating crk dictionary sources #131

Open
aarppe opened this issue Jul 26, 2024 · 0 comments
Labels
aggregation Changes to the aggregation algorithm meta Issues for tracking issues

aarppe commented Jul 26, 2024

This issue collects some general notes and observations on aggregating the three (or more) Plains Cree dictionary resources.

A. For creating LEXC source, the dictionaries minimally need the following:

  1. Entry head that has been standardized according to SRO.
  2. Stem that has been standardized according to SRO.
  3. Lexical (inflectional) category following the classification scheme in CW.

B. For mapping the dictionary sources against each other, we minimally need the following:

  1. Entry head that has been standardized according to SRO.
  2. Lexical category following the classification scheme in CW.
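
As a rough sketch of point B, entries could be grouped on the pair (SRO head, lexical category) to see which sources share an entry. The record layout, the field names (`source`, `head`, `category`), and the sample records are assumptions made up for illustration, not the actual data format of any of the dictionaries.

```python
from collections import defaultdict


def group_by_head_and_category(entries):
    """Group entries by (head, category), then by source.

    Each entry is assumed to be a dict with hypothetical keys
    "source", "head" (already SRO-standardized) and "category"
    (already mapped to the CW classification scheme).
    """
    groups = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        key = (entry["head"], entry["category"])
        groups[key][entry["source"]].append(entry)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {key: dict(by_source) for key, by_source in groups.items()}


# Made-up sample records, for illustration only.
entries = [
    {"source": "CW", "head": "atim", "category": "NA"},
    {"source": "MD", "head": "atim", "category": "NA"},
    {"source": "AECD", "head": "wâpamêw", "category": "VTA"},
]

groups = group_by_head_and_category(entries)
for (head, category), by_source in groups.items():
    print(head, category, sorted(by_source))
```

Grouping on both fields rather than on the head alone keeps homographs with different lexical categories apart, which is why both are listed as minimal requirements above.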

A-B. Additional headaches in regularizing the sources:

  1. CW uses the accented y in entry heads (and stems), whereas MD and AECD do not. For comparison purposes, the accented y can be treated as a regular y. Nevertheless, we would want to revise the pertinent fields (head and stem) in MD and AECD to use the accented y where appropriate.
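
One possible way to treat the accented y as a regular y for comparison only is to derive a separate comparison key and leave the stored head untouched. This sketch assumes that the accented y is encoded as U+00FD (ý), or as y followed by a combining acute accent; the function name `comparison_key` is hypothetical.

```python
import unicodedata

# Assumption: the accented y is U+00FD / U+00DD once the string is in NFC.
FOLD_ACCENTED_Y = str.maketrans({"\u00fd": "y", "\u00dd": "Y"})


def comparison_key(form):
    """Return a key for cross-source matching that ignores the accent on y.

    NFC normalization first merges "y" + combining acute into a single
    code point, so both encodings are folded the same way. Other
    diacritics (e.g. circumflexes marking vowel length) are kept.
    """
    composed = unicodedata.normalize("NFC", form)
    return composed.translate(FOLD_ACCENTED_Y)


print(comparison_key("nêhiýawêwin") == comparison_key("nêhiyawêwin"))
```

Because only the key is folded, the original CW spelling remains available when the MD and AECD head and stem fields are later revised.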

C. Additional headaches concerning the English definitions that we would probably want to regularize:

  1. The sources use different conventions for indicating subject (CW: s/he vs. AECD: S/he vs. MD: He) and object (CW: s.o., s.t. vs. AECD: him/her, it vs. MD: him, it).
  2. The sources use different conventions for separating senses (CW: semicolon, AECD: semicolon, MD: numbering).
  3. The definitions contain Cree words and passages that should be marked as such.
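
Points C.1 and C.2 could be approached roughly as follows, taking the CW conventions (s/he, s.o., s.t., one sense per item) as the target. This is only a sketch: the function names and the sample definitions are made up, and the blanket replacement of "it" is an assumption that would mangle definitions where "it" is not an object (e.g. weather verbs), so real data would need more careful rules.

```python
import re


def split_senses(definition, source):
    """Split a definition into senses using the source's convention."""
    if source == "MD":
        # MD numbers its senses: "1. ... 2. ..."
        parts = re.split(r"\s*\b\d+\.\s+", definition)
    else:
        # CW and AECD separate senses with semicolons.
        parts = re.split(r"\s*;\s*", definition)
    return [part.strip() for part in parts if part.strip()]


def normalize_sense(sense):
    """Rewrite subject and object markers to the CW style."""
    # Subject at the start of the sense: "S/he" (AECD) or "He" (MD).
    sense = re.sub(r"^(?:S/he|He)\b", "s/he", sense)
    # Objects; "him/her" must be handled before plain "him".
    # A directly following period is absorbed so "him." becomes "s.o.".
    sense = re.sub(r"\bhim/her\b\.?", "s.o.", sense)
    sense = re.sub(r"\bhim\b\.?", "s.o.", sense)
    sense = re.sub(r"\bit\b\.?", "s.t.", sense)  # naive, see caveat above
    return sense


# Made-up definitions in the three styles, for illustration only.
md = [normalize_sense(s) for s in split_senses("1. He sees him. 2. He watches it.", "MD")]
aecd = [normalize_sense(s) for s in split_senses("S/he sees him/her; S/he watches it", "AECD")]
cw = [normalize_sense(s) for s in split_senses("s/he sees s.o.; s/he watches s.t.", "CW")]
print(md == aecd == cw)
```

Marking embedded Cree words and passages (point C.3) is not covered here, since it would require a way of recognizing Cree text inside the English definitions.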

D. Varying sets of fields under entries that we may want to fill in:

  1. Besides stem and lexical category, CW has a number of fields that AECD and MD do not, e.g. the morphological decomposition.
  2. AECD has variants and alternatives listed with a different convention than CW, while MD has practically none.