Support grouping of sounds #271

FredericBlum · 2023-08-14T16:22:07Z

As referenced in an example case, we should modify pylexibank in a way that allows us to support grouped sounds (e.g. a.ʔ) instead of reporting a transcription error for those cases.

LinguList · 2023-08-15T12:01:56Z

We'd need this ideally for Lexibank 2.0.

FredericBlum · 2024-02-23T11:49:58Z

One option would be to add the ungroup() function that we have used so far to pylexibank. Would that be a reasonable step? Or should we modify the check so that grouped sounds do not raise an error as long as all their individual segments are valid? I could try to implement either solution.
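To make the second option concrete, here is a minimal sketch of such a check. It assumes "." as the group separator and uses a made-up set of valid sounds; VALID_SOUNDS, is_valid_segment and invalid_segments are hypothetical names, not part of pylexibank, and a real check would look the sounds up in CLTS.

```python
# Hypothetical inventory of valid sounds (assumption for illustration only);
# a real check would consult CLTS instead of a hard-coded set.
VALID_SOUNDS = {"a", "ʔ", "t", "k"}


def is_valid_segment(segment, valid_sounds=VALID_SOUNDS, separator="."):
    """Accept a plain or grouped segment if every part is a known sound."""
    parts = segment.split(separator)
    return all(part in valid_sounds for part in parts)


def invalid_segments(segments):
    """Return the segments that would still be reported as errors."""
    return [s for s in segments if not is_valid_segment(s)]


print(invalid_segments(["t", "a.ʔ", "k"]))  # grouped sound with valid parts
print(invalid_segments(["t", "a.x", "k"]))  # grouped sound with an unknown part
```

With this logic, a group such as a.ʔ passes because both parts are known, while a.x is still reported.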

FredericBlum · 2024-04-30T09:08:45Z

@LinguList Tagging you on this again to see how we proceed with this.

LinguList · 2024-04-30T10:36:51Z

The current workaround that would also guarantee backwards compat is to have a specific Lexeme class.

import attr
from pylexibank import Lexeme

@attr.s
class CustomLexeme(Lexeme):
    # Extra column that keeps the grouped segmentation next to the regular Segments.
    Grouped_Segments = attr.ib(default=None, metadata={"datatype": "string", "separator": " "})

And then you add a function ungroup to your dataset (if you have a default profile that groups):

def ungroup(segments):
    # Split each grouped segment on "." and flatten the parts into one list.
    return [segment for group in segments for segment in group.split(".")]

Then, in the call to args.writer.add_form, you pass Segments=ungroup(segments) and Grouped_Segments=segments.
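As a rough, self-contained illustration of what the two values look like, the sketch below runs ungroup on a made-up form. The example segments and the form_kwargs dictionary are assumptions for illustration; the other fields that add_form needs are left out, and no pylexibank call is made here.

```python
def ungroup(segments):
    # Split each grouped segment on "." and flatten the parts into one list.
    return [segment for group in segments for segment in group.split(".")]


# Made-up grouped segmentation, as a grouping profile might return it.
grouped = ["t", "a.ʔ", "k"]

# The two keyword arguments described above; all other form fields are omitted.
form_kwargs = {
    "Segments": ungroup(grouped),
    "Grouped_Segments": grouped,
}

print(form_kwargs["Segments"])          # ['t', 'a', 'ʔ', 'k']
print(form_kwargs["Grouped_Segments"])  # ['t', 'a.ʔ', 'k']
```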

FredericBlum · 2024-04-30T10:47:14Z

Yes, that's what I am doing right now in my repositories. I just thought we could add this function to pylexibank and modify the Lexeme class to make this workaround unnecessary.

LinguList · 2024-04-30T13:42:13Z

I think this would be too much by now, since it is not part of the CLDF specification.

LinguList · 2024-04-30T13:43:31Z

I also wonder how we want to handle this in the future. If we say that Segments can potentially be grouped, we may end up with clashes. My idea would therefore be to propose Grouped_Segments to CLDF as another representation of segments, but I suggest we wait for the reviews of the grouping sounds paper to see how we react here. With the paper, we have the reference to add this to lexibank.

xrotwang · 2024-04-30T13:49:33Z

Yes, I think both are needed - a reference and more experience, including actively searching for cases with conflicting options to group. I think in the worst case, grouping of segments would introduce a degree of freedom which invites abuse, where fine-tuned grouping together with fine-tuned analysis algorithms creates opaque results.

I could also imagine that grouping and trimming used together could have funny effects.

LinguList · 2024-05-01T13:12:41Z

We already have examples of this kind. The freedom that this introduces may at times be so great that one can get two different outcomes of the same analysis due to grouping alone. The current solution leaves everything open but makes clear that this is a candidate currently being tested for inclusion into CLDF in a later version. We can already discuss now -- also with respect to the modification of the paper -- whether we want to propose a little plugin that could be used to make the conversion easier in lexibank (but would need to be installed on top of it and could be added there later).

FredericBlum · 2024-05-02T04:49:36Z

I agree with waiting for the reviews to see how we do this. With respect to the degrees of freedom: We would not touch the Segments of a Lexeme, but add a new category. Maybe it is unproblematic that we have freedom there? Through ungrouping and checking the transcription in Segments, we will still have the verification that the individual segments conform to CLTS.
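One way to picture this safeguard is a small consistency check between the two columns. This is only a sketch: is_consistent is a hypothetical helper, not an existing pylexibank function, and "." is assumed as the group separator.

```python
def ungroup(grouped_segments, separator="."):
    # Flatten grouped segments such as "a.ʔ" into their individual sounds.
    return [s for group in grouped_segments for s in group.split(separator)]


def is_consistent(segments, grouped_segments):
    """Check that Grouped_Segments is just a regrouping of Segments."""
    return ungroup(grouped_segments) == list(segments)


# Grouping only changes the boundaries, so the check passes.
print(is_consistent(["t", "a", "ʔ", "k"], ["t", "a.ʔ", "k"]))
# A grouped form that adds or changes a sound fails the check.
print(is_consistent(["t", "a", "k"], ["t", "a.ʔ", "k"]))
```

If the check holds, validating Segments against CLTS also covers every sound that appears in Grouped_Segments.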

LinguList · 2024-05-02T04:54:49Z

That is also an open question for now: do we add Grouped_Segments or do we not add it? The good thing is: the current solution is like an independent library in Python: if you conform to it and also make sure to define the metadata properly, you can already work with the feature in lexibank / cldf datasets, but you must make sure to select the datasets by hand. So the difference with respect to being officially mentioned in cldf and not being mentioned is not that big, right?
