Refactor: Make process of collection independent of the choice of main dictionary #124

fbanados · 2024-07-11T22:59:35Z

Currently, the process of generating an importjson file strongly depends on the contents of the CW dictionary. We intend to make the following changes:

Isolate the code that processes complete dictionaries in one place; it is currently repeated for each dictionary.
Encapsulate source-specific changes in each source's class.
Move the special initialization currently done while converting CW into a general class, so that any source can be "the first to appear".
Turn aggregation from a global operation into a one-to-one process so that source priority can be changed easily.
Ensure entries that have no mapping to previously aggregated entries are still included (depends on the previous items).
Refactor the instructions to use the _altlab versions of the alternative dictionaries instead of the main (mostly immutable) sources.
Update the documentation and dependencies on the use of FSTs: currently we only use the relaxed analyzer as a way to account for spelling differences between dictionaries.
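
The intended shape of the refactor can be sketched as follows. This is only an illustration under assumptions, not the repository's code: `Entry`, `Source`, `merge`, `aggregate` and `fold` are hypothetical names, the rows are toy data, and `fold` (diacritic stripping) is a simple stand-in for the relaxed analyzer, not the FST itself. It shows per-source cleanup living in each source's class, a one-to-one merge step that keeps unmatched entries, and priority being nothing more than list order.

```python
import unicodedata
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Iterable, List, Tuple


@dataclass
class Entry:
    lemma: str
    senses: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)


class Source:
    """Hypothetical base class: shared processing lives here."""
    name = "generic"

    def __init__(self, rows: Iterable[Tuple[str, str]]):
        self.rows = list(rows)

    def clean(self, lemma: str) -> str:
        # Default cleanup; subclasses override for source-specific quirks.
        return lemma.strip()

    def entries(self) -> List[Entry]:
        return [Entry(self.clean(l), [s], [self.name]) for l, s in self.rows]


class CW(Source):
    name = "CW"


class MD(Source):
    name = "MD"

    def clean(self, lemma: str) -> str:
        # Hypothetical source-specific rule, for illustration only.
        return lemma.strip().lower()


def fold(lemma: str) -> str:
    """Stand-in for relaxed matching: ignore case and diacritics."""
    decomposed = unicodedata.normalize("NFD", lemma.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def merge(base: List[Entry], incoming: List[Entry],
          key: Callable[[str], str] = fold) -> List[Entry]:
    """One-to-one step: fold `incoming` into `base`, keeping unmatched entries."""
    index = {key(e.lemma): e for e in base}
    result = list(base)
    for entry in incoming:
        match = index.get(key(entry.lemma))
        if match is None:
            # No mapping onto earlier entries: include it anyway.
            result.append(entry)
            index[key(entry.lemma)] = entry
        else:
            match.senses.extend(s for s in entry.senses if s not in match.senses)
            match.sources.extend(s for s in entry.sources if s not in match.sources)
    return result


def aggregate(sources: List[Source]) -> List[Entry]:
    # Priority is just list order; no source is special-cased as "first".
    return reduce(lambda acc, src: merge(acc, src.entries()), sources, [])


cw = CW([("atim", "dog"), ("mîciwin", "food")])
md = MD([("miciwin", "food, meal"), ("maskwa", "bear")])
merged = aggregate([cw, md])
swapped = aggregate([md, cw])
```

With this structure, changing which dictionary is "main" amounts to reordering the list passed to `aggregate`; the first source in the list supplies the lemma spelling for matched entries.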

We expect that completing this refactor will expand the crkeng_dictionary.importjson file by around 8k senses that the matching process currently discards.


aarppe · 2024-07-12T01:31:01Z

The fields in CW and MD relevant to dictionary comparison from the LEXC perspective are discussed in #125.

fbanados self-assigned this Jul 11, 2024
