Releases: KathyReid/cvaccents
0.3 - Updated with Kiswahili example for EAAMO
0.2 - Updated with v13 Mozilla Common Voice data
The key changes in this version are:
- The number of categories identified in the data has increased from 16 in the first version to 20 in this one. The four additional categories are:
  - Linguistic heritage of speaker: indicating the speaker's language acquisition or immersion heritage, such as time spent in a location, or being born or raised in a location.
  - Socio-economic marker: indicating a speaker's association with a socio-economic group or class, such as Middle Class.
  - Hybrid dialect: indicating that the speaker uses a dialect that arose where two languages came into contact, such as Denglish (German, or Deutsch, and English) and Hinglish (Hindi and English, spoken in India).
  - Generational marker: indicating the speaker's association with a generation, which suggests their age range, such as Gen Z.
- The number of individual accents identified has increased from 164 in the first version to 235 in this one.
- The number of relationships between individual accents, which indicate a co-occurrence of speaker-described accents such as "German" and "England English", has increased from 297 in the first version to 515 in this one.
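The relationship count above can be pictured with a minimal sketch. It assumes, hypothetically, that each speaker's self-described accents are available as a list of labels, and that a relationship is an unordered pair of accents named together by the same speaker. The function name `accent_cooccurrences` and the toy input are illustrative only; they are not the repository's code or data.

```python
from collections import Counter
from itertools import combinations


def accent_cooccurrences(speaker_accents):
    """Count unordered pairs of accents that a single speaker lists together.

    speaker_accents: iterable of lists of accent labels, one list per speaker.
    This input shape is a hypothetical assumption; the repository's actual
    data format may differ.
    """
    pairs = Counter()
    for accents in speaker_accents:
        # De-duplicate and sort so ("A", "B") and ("B", "A") are the same relationship.
        unique = sorted(set(accents))
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Toy, made-up speaker descriptions (not Common Voice data).
speakers = [
    ["German", "England English"],
    ["England English"],
    ["German", "England English", "Denglish"],
]
relationships = accent_cooccurrences(speakers)
print(len(relationships))  # number of distinct relationships → 3
```

Counting distinct pairs (`len(relationships)`) corresponds to a number of relationships like the one reported above, while the `Counter` values record how often each pair co-occurs, which could serve as edge weights if the accents are treated as a graph.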
0.1 - Initial release - FAccT 2023
This release captures the repository in the state that was submitted to the FAccT 2023 conference; please see this preprint.
- Analyses only English (en) language data
- Uses v11 of the Common Voice dataset
Full Changelog: https://github.com/KathyReid/cvaccents/commits/0.1