Commit

Fix the orthography profile to get zero error.
gederajeg committed Jul 6, 2024
1 parent e8a22fa commit 098e42f
Showing 8 changed files with 47 additions and 95 deletions.
14 changes: 5 additions & 9 deletions README.md
@@ -1,9 +1,5 @@
# CLDF dataset derived from von Rosenberg's "De Mentawei-Eilanden en Hunne Bewoners" from 1853

<!-- badges: start -->
[![CLDF validation](https://github.com/complexico/mentawai-word-list-1853/workflows/CLDF-validation/badge.svg)](https://github.com/complexico/mentawai-word-list-1853/actions?query=workflow%3ACLDF-validation)
<!-- badges: end -->

## How to cite

If you use these data please cite
@@ -33,18 +29,18 @@ As a long-time R user, the motivation to produce this repository is as a practic
![Glottolog: 100%](https://img.shields.io/badge/Glottolog-100%25-brightgreen.svg "Glottolog: 100%")
![Concepticon: 98%](https://img.shields.io/badge/Concepticon-98%25-green.svg "Concepticon: 98%")
![Source: 100%](https://img.shields.io/badge/Source-100%25-brightgreen.svg "Source: 100%")
![BIPA: 97%](https://img.shields.io/badge/BIPA-97%25-green.svg "BIPA: 97%")
![CLTS SoundClass: 97%](https://img.shields.io/badge/CLTS%20SoundClass-97%25-green.svg "CLTS SoundClass: 97%")
![BIPA: 100%](https://img.shields.io/badge/BIPA-100%25-brightgreen.svg "BIPA: 100%")
![CLTS SoundClass: 100%](https://img.shields.io/badge/CLTS%20SoundClass-100%25-brightgreen.svg "CLTS SoundClass: 100%")

- **Varieties:** 1 (linked to 1 different Glottocodes)
- **Concepts:** 267 (linked to 255 different Concepticon concept sets)
- **Lexemes:** 271
- **Sources:** 1
- **Synonymy:** 1.01
- **Invalid lexemes:** 0
- **Tokens:** 1,578
- **Segments:** 32 (1 BIPA errors, 1 CLTS sound class errors, 31 CLTS modified)
- **Inventory size (avg):** 32.00
- **Tokens:** 1,575
- **Segments:** 31 (0 BIPA errors, 0 CLTS sound class errors, 31 CLTS modified)
- **Inventory size (avg):** 31.00

# Contributors

19 changes: 7 additions & 12 deletions TRANSCRIPTION.md
@@ -21,24 +21,23 @@
| p | 40 |||
| s | 39 |||
| ŋ | 30 |||
| h | 29 |||
| ʔ | 22 |||
| h | 27 |||
| ʔ | 24 |||
| ei̯ | 20 |||
| d | 13 |||
| j | 13 |||
| c | 12 |||
| c | 10 |||
| w | 7 |||
| d͡ʒ | 5 |||
| <<->> | 4 | ? | ? |
| + | 4 | | |
| oːi̯ | 3 |||
| ʃ | 3 |||
|| 2 |||
| ui̯ | 2 |||
| + | 1 |||
| aːi̯ | 1 |||
| f | 1 |||

(32 rows)
(31 rows)



@@ -54,12 +53,8 @@
## Words with invalid segments (up to 100 only)

| ID | LANGUAGE | CONCEPT | FORM | SEGMENTS |
|:------------------------|:-----------|:-------------|:------------|:-------------------------------|
| Mentawai-16_laugh-1 | Mentawai | 16_laugh | gah-gah | ɡ a h <s> <<->> </s> ɡ a ʔ |
| Mentawai-249_scissors-1 | Mentawai | 249_scissors | nab-nab | n a b <s> <<->> </s> n a b |
| Mentawai-45_regards-1 | Mentawai | 45_regards | moele-moele | m u l e <s> <<->> </s> m u l e |
| Mentawai-7_open-2 | Mentawai | 7_open | -boekakai | <s> <<->> </s> b u k a k a i |
|------|------------|-----------|--------|------------|

(4 rows)
(0 rows)


73 changes: 17 additions & 56 deletions cldf/.transcription-report.json
@@ -1,10 +1,8 @@
{
"by_language": {
"Mentawai": {
"bipa_errors": [
"<<->>"
],
"general_errors": 4,
"bipa_errors": [],
"general_errors": 0,
"replacements": {
"+": [
"+"
@@ -100,22 +98,19 @@
"\u0294"
]
},
"sclass_errors": [
"<<->>"
],
"sclass_errors": [],
"segments": {
"+": 1,
"<<->>": 4,
"+": 4,
"a": 227,
"a\u02d0i\u032f": 1,
"b": 94,
"c": 12,
"c": 10,
"d": 13,
"d\u0361\u0292": 5,
"e": 158,
"ei\u032f": 20,
"f": 1,
"h": 29,
"h": 27,
"i": 100,
"j": 13,
"k": 97,
@@ -135,49 +130,18 @@
"\u014b": 30,
"\u0261": 42,
"\u0283": 3,
"\u0294": 22
"\u0294": 24
}
}
},
"stats": {
"bad_words": [
[
"Mentawai-16_laugh-1",
"Mentawai",
"16_laugh",
"gah-gah",
"\u0261 a h <s> <<->> </s> \u0261 a \u0294"
],
[
"Mentawai-249_scissors-1",
"Mentawai",
"249_scissors",
"nab-nab",
"n a b <s> <<->> </s> n a b"
],
[
"Mentawai-45_regards-1",
"Mentawai",
"45_regards",
"moele-moele",
"m u l e <s> <<->> </s> m u l e"
],
[
"Mentawai-7_open-2",
"Mentawai",
"7_open",
"-boekakai",
"<s> <<->> </s> b u k a k a i"
]
],
"bad_words_count": 4,
"bipa_errors": [
"<<->>"
],
"general_errors": 4,
"bad_words": [],
"bad_words_count": 0,
"bipa_errors": [],
"general_errors": 0,
"invalid_words": [],
"invalid_words_count": 0,
"inventory_size": 32.0,
"inventory_size": 31.0,
"replacements": {
"+": [
"+"
@@ -273,22 +237,19 @@
"\u0294"
]
},
"sclass_errors": [
"<<->>"
],
"sclass_errors": [],
"segments": {
"+": 1,
"<<->>": 4,
"+": 4,
"a": 227,
"a\u02d0i\u032f": 1,
"b": 94,
"c": 12,
"c": 10,
"d": 13,
"d\u0361\u0292": 5,
"e": 158,
"ei\u032f": 20,
"f": 1,
"h": 29,
"h": 27,
"i": 100,
"j": 13,
"k": 97,
@@ -308,7 +269,7 @@
"\u014b": 30,
"\u0261": 42,
"\u0283": 3,
"\u0294": 22
"\u0294": 24
}
}
}
2 changes: 1 addition & 1 deletion cldf/README.md
@@ -13,7 +13,7 @@ property | value
[dc:identifier](http://purl.org/dc/terms/identifier) | https://www.digitale-sammlungen.de/en/view/bsb10433845?page=450,451
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by-nc-sa/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | git@github.com:complexico/mentawai-word-list-1853
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="git@github.com:complexico/mentawai-word-list-1853/tree/099fc35">git@github.com:complexico/mentawai-word-list-1853 099fc35</a></li><li><a href="glottolog-glottolog-d9da5e2">Glottolog glottolog-glottolog-d9da5e2</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/7c0b6ae3">Concepticon v3.1.0-19-g7c0b6ae3</a></li><li><a href="cldf-clts-clts-6dc73af">CLTS cldf-clts-clts-6dc73af</a></li></ol>
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="git@github.com:complexico/mentawai-word-list-1853/tree/e8a22fa">git@github.com:complexico/mentawai-word-list-1853 e8a22fa</a></li><li><a href="glottolog-glottolog-d9da5e2">Glottolog glottolog-glottolog-d9da5e2</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/7c0b6ae3">Concepticon v3.1.0-19-g7c0b6ae3</a></li><li><a href="cldf-clts-clts-6dc73af">CLTS cldf-clts-clts-6dc73af</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>lingpy-rcParams</strong>: <a href="./lingpy-rcParams.json">lingpy-rcParams.json</a></li><li><strong>python</strong>: 3.9.6</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | barrier-islands-mentawai-wlist1853
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
2 changes: 1 addition & 1 deletion cldf/cldf-metadata.json
@@ -14,7 +14,7 @@
{
"rdf:about": "git@github.com:complexico/mentawai-word-list-1853",
"rdf:type": "prov:Entity",
"dc:created": "099fc35",
"dc:created": "e8a22fa",
"dc:title": "Repository"
},
{
14 changes: 7 additions & 7 deletions cldf/forms.csv
@@ -6,7 +6,7 @@ Mentawai-4_sit-1,4,Mentawai,4_sit,koeddoe,koeddoe,k u d d u,,VonRosenberg1853,,,
Mentawai-5_come-1,5,Mentawai,5_come,kepanoebara,kepanoebara,k e p a n u b a r a,,VonRosenberg1853,,,^ k e p a n oe b a r a $,default,kepanubara
Mentawai-6_stay-1,6,Mentawai,6_stay,gala,gala,ɡ a l a,,VonRosenberg1853,,,^ g a l a $,default,gala
Mentawai-7_open-1,7,Mentawai,7_open,"roekoetnake , -boekakai",roekoetnake,r u k u t n a k e,,VonRosenberg1853,,,^ r oe k oe t n a k e $,default,"rukutnake , -bukakai"
Mentawai-7_open-2,7,Mentawai,7_open,"roekoetnake , -boekakai",-boekakai,<<->> b u k a k a i,,VonRosenberg1853,,,^ <-> b oe k a k a i $,default,"rukutnake , -bukakai"
Mentawai-7_open-2,7,Mentawai,7_open,"roekoetnake , -boekakai",-boekakai,b u k a k a i,,VonRosenberg1853,,,^- b oe k a k a i $,default,"rukutnake , -bukakai"
Mentawai-8_give-1,8,Mentawai,8_give,"kau , aketkako",kau,k a u,,VonRosenberg1853,,,^ k a u $,default,"kau , aketkako"
Mentawai-8_give-2,8,Mentawai,8_give,"kau , aketkako",aketkako,a k e t k a k o,,VonRosenberg1853,,,^ a k e t k a k o $,default,"kau , aketkako"
Mentawai-271_gift-1,9,Mentawai,271_gift,poengroeat,poengroeat,p u ŋ r u a t,,VonRosenberg1853,,,^ p oe ng r oe a t $,default,pungruat
@@ -17,7 +17,7 @@ Mentawai-12_see-1,13,Mentawai,12_see,djo,djo,d͡ʒ o,,VonRosenberg1853,,,^ dj o
Mentawai-13_make-1,14,Mentawai,13_make,galeij,galeij,ɡ a l ei̯,,VonRosenberg1853,,,^ g a l eij $,default,gale:i
Mentawai-14_search-1,15,Mentawai,14_search,gaba,gaba,ɡ a b a,,VonRosenberg1853,,,^ g a b a $,default,gaba
Mentawai-15_speak-1,16,Mentawai,15_speak,tibboi,tibboi,t i b b o i,,VonRosenberg1853,,,^ t i b b o i $,default,tibboi
Mentawai-16_laugh-1,17,Mentawai,16_laugh,gah-gah,gah-gah,ɡ a h <<->> ɡ a ʔ,,VonRosenberg1853,,,^ g a h <-> g a h$,default,ga'-ga'
Mentawai-16_laugh-1,17,Mentawai,16_laugh,gah-gah,gah-gah,ɡ a h + ɡ a ʔ,,VonRosenberg1853,,,^ g a h - g a h$,default,ga'-ga'
Mentawai-17_cry-1,18,Mentawai,17_cry,mso,mso,m s o,,VonRosenberg1853,,,^ m s o $,default,mso
Mentawai-18_food-1,19,Mentawai,18_food,mokkom,mokkom,m o k k o m,,VonRosenberg1853,,,^ m o k k o m $,default,mokkom
Mentawai-19_drink-1,20,Mentawai,19_drink,lo,lo,l o,,VonRosenberg1853,,,^ l o $,default,lo
@@ -46,7 +46,7 @@ Mentawai-41_thinking-1,42,Mentawai,41_thinking,goeaba,goeaba,ɡ u a b a,,VonRose
Mentawai-42_smell-1,43,Mentawai,42_smell,idoe,idoe,i d u,,VonRosenberg1853,,,^ i d oe $,default,idu
Mentawai-43_bind-1,44,Mentawai,43_bind,abara,abara,a b a r a,,VonRosenberg1853,,,^ a b a r a $,default,abara
Mentawai-44_marrying-1,45,Mentawai,44_marrying,melija,melija,m e l i j a,,VonRosenberg1853,,,^ m e l i j a $,default,meliya
Mentawai-45_regards-1,46,Mentawai,45_regards,moele-moele,moele-moele,m u l e <<->> m u l e,,VonRosenberg1853,,,^ m oe l e <-> m oe l e $,default,mule-mule
Mentawai-45_regards-1,46,Mentawai,45_regards,moele-moele,moele-moele,m u l e + m u l e,,VonRosenberg1853,,,^ m oe l e - m oe l e $,default,mule-mule
Mentawai-46_swimming-1,47,Mentawai,46_swimming,melala,melala,m e l a l a,,VonRosenberg1853,,,^ m e l a l a $,default,melala
Mentawai-47_archery-1,48,Mentawai,47_archery,tjilagoeij,tjilagoeij,c i l a ɡ ui̯,,VonRosenberg1853,,,^ tj i l a g oeij $,default,cilagu:i
Mentawai-48_dance-1,49,Mentawai,48_dance,simaliet,simaliet,s i m a l i t,,VonRosenberg1853,,,^ s i m a l ie t $,default,simalit
@@ -73,7 +73,7 @@ Mentawai-68_lying-1,69,Mentawai,68_lying,menangka,menangka,m e n a ŋ k a,,VonRo
Mentawai-69_wide-1,70,Mentawai,69_wide,alia,alia,a l i a,,VonRosenberg1853,,,^ a l i a $,default,alia
Mentawai-70_high-1,71,Mentawai,70_high,naboeak,naboeak,n a b u a k,,VonRosenberg1853,,,^ n a b oe a k $,default,nabuak
Mentawai-71_low-1,72,Mentawai,71_low,metaleb,metaleb,m e t a l e b,,VonRosenberg1853,,,^ m e t a l e b $,default,metaleb
Mentawai-72_deep-1,73,Mentawai,72_deep,keëroe,keëroe,k e e r u,,VonRosenberg1853,,,^ k e ë r oe $,default,ke'eru
Mentawai-72_deep-1,73,Mentawai,72_deep,keëroe,keëroe,k e ʔ e r u,,VonRosenberg1853,,,^ k r oe $,default,ke'eru
Mentawai-73_fine-1,74,Mentawai,73_fine,meninim,meninim,m e n i n i m,,VonRosenberg1853,,,^ m e n i n i m $,default,meninim
Mentawai-74_coarse-1,75,Mentawai,74_coarse,mkabejoean,mkabejoean,m k a b e j u a n,,VonRosenberg1853,,,^ m k a b e j oe a n $,default,mkabeyuan
Mentawai-75_clean-1,76,Mentawai,75_clean,mehroe,mehroe,m e h r u,,VonRosenberg1853,,,^ m e h r oe $,default,mehru
@@ -201,7 +201,7 @@ Mentawai-196_forehead-1,197,Mentawai,196_forehead,boekkoe,boekkoe,b u k k u,,Von
Mentawai-197_eyes-1,198,Mentawai,197_eyes,mata,mata,m a t a,,VonRosenberg1853,,,^ m a t a $,default,mata
Mentawai-198_mouth-1,199,Mentawai,198_mouth,ngoengoe,ngoengoe,ŋ u ŋ u,,VonRosenberg1853,,,^ ng oe ng oe $,default,ngungu
Mentawai-199_ear-1,200,Mentawai,199_ear,gigie,gigie,ɡ i ɡ i,,VonRosenberg1853,,,^ g i g ie $,default,gigi
Mentawai-200_tooth-1,201,Mentawai,200_tooth,tschon,tschon,tʃ c h o n,,VonRosenberg1853,,,^ts c h o n $,default,tson
Mentawai-200_tooth-1,201,Mentawai,200_tooth,tschon,tschon,tʃ o n,,VonRosenberg1853,,,^tsch o n $,default,tson
Mentawai-201_tongue-1,202,Mentawai,201_tongue,lilah,lilah,l i l a ʔ,,VonRosenberg1853,,,^ l i l a h$,default,lila'
Mentawai-202_neck-1,203,Mentawai,202_neck,ellokot,ellokot,e l l o k o t,,VonRosenberg1853,,,^ e l l o k o t $,default,ellokot
Mentawai-203_chest-1,204,Mentawai,203_chest,topot,topot,t o p o t,,VonRosenberg1853,,,^ t o p o t $,default,topot
@@ -241,7 +241,7 @@ Mentawai-236_village-1,237,Mentawai,236_village,lakkei,lakkei,l a k k e i,,VonRo
Mentawai-237_roof-1,238,Mentawai,237_roof,tobat,tobat,t o b a t,,VonRosenberg1853,,,^ t o b a t $,default,tobat
Mentawai-238_devil-1,239,Mentawai,238_devil,sinetoe,sinetoe,s i n e t u,,VonRosenberg1853,,,^ s i n e t oe $,default,sinetu
Mentawai-239_birdglue-1,240,Mentawai,239_birdglue,eket,eket,e k e t,,VonRosenberg1853,,,^ e k e t $,default,eket
Mentawai-240_treeoil-1,241,Mentawai,240_treeoil,tschoiëlakat,tschoiëlakat,tʃ c h o i e l a k a t,,VonRosenberg1853,,,^ts c h o i ë l a k a t $,default,tsoi'elakat
Mentawai-240_treeoil-1,241,Mentawai,240_treeoil,tschoiëlakat,tschoiëlakat,tʃ o i ʔ e l a k a t,,VonRosenberg1853,,,^tsch o ië l a k a t $,default,tsoi'elakat
Mentawai-241_arrowpoison-1,242,Mentawai,241_arrowpoison,ipoe,ipoe,i p u,,VonRosenberg1853,,,^ i p oe $,default,ipu
Mentawai-242_copperwire-1,243,Mentawai,242_copperwire,dakdjok,dakdjok,d a k d͡ʒ o k,,VonRosenberg1853,,,^ d a k dj o k $,default,dakjok
Mentawai-243_tin-1,244,Mentawai,243_tin,boelan,boelan,b u l a n,,VonRosenberg1853,,,^ b oe l a n $,default,bulan
@@ -250,7 +250,7 @@ Mentawai-245_tobacco-1,246,Mentawai,245_tobacco,obeh,obeh,o b e ʔ,,VonRosenberg
Mentawai-246_mirror-1,247,Mentawai,246_mirror,blikobat,blikobat,b l i k o b a t,,VonRosenberg1853,,,^ b l i k o b a t $,default,blikobat
Mentawai-247_cotton-1,248,Mentawai,247_cotton,komang,komang,k o m a ŋ,,VonRosenberg1853,,,^ k o m a ng $,default,komang
Mentawai-248_glassbeads-1,249,Mentawai,248_glassbeads,inoe,inoe,i n u,,VonRosenberg1853,,,^ i n oe $,default,inu
Mentawai-249_scissors-1,250,Mentawai,249_scissors,nab-nab,nab-nab,n a b <<->> n a b,,VonRosenberg1853,,,^ n a b <-> n a b $,default,nab-nab
Mentawai-249_scissors-1,250,Mentawai,249_scissors,nab-nab,nab-nab,n a b + n a b,,VonRosenberg1853,,,^ n a b - n a b $,default,nab-nab
Mentawai-250_needle-1,251,Mentawai,250_needle,pingolab,pingolab,p i ŋ o l a b,,VonRosenberg1853,,,^ p i ng o l a b $,default,pingolab
Mentawai-203_chest-2,252,Mentawai,203_chest,petie,petie,p e t i,,VonRosenberg1853,,,^ p e t ie $,default,peti
Mentawai-251_axe-1,253,Mentawai,251_axe,bliok,bliok,b l i o k,,VonRosenberg1853,,,^ b l i o k $,default,bliok
2 changes: 1 addition & 1 deletion cldf/lingpy-rcParams.json
@@ -123,7 +123,7 @@
"scorer": {},
"sonar": true,
"stress": "\u02c8\u02cc'",
"timestamp": "2024-07-06 19:57",
"timestamp": "2024-07-06 20:49",
"tones": "\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u2070\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u20800123456789\u02e5\u02e6\u02e7\u02e8\u02e9\u02ea\u02eb-\ua708-\ua709-\ua70a-\ua70b-\ua70c-\ua70d-\ua70e-\ua70f-\ua710-\ua711-\ua712-\ua713-\ua714-\ua715-\ua716-\ua717-\ua718-\ua719-\ua71a-\ua700-\ua701-\ua702-\ua703-\ua704-\ua705-\ua706-\ua707",
"tree_calc": "neighbor",
"unique_sequences": true,
16 changes: 8 additions & 8 deletions etc/orthography.tsv
@@ -2,28 +2,28 @@ Grapheme IPA Comment
oeij ui̯
aij aːi̯
sch ʃ
^tsch
oij$ oːi̯
eij ei̯ attempting to adjust with CLTS Bipa
oe u
^ts
tj c
dj d͡ʒ
ie i
ng ŋ

\- -
- + Based on here: https://github.com/lexibank/bantubvd/blob/v4.0/etc/orthography.tsv
^- NULL Based on here: https://github.com/lexibank/bantubvd/blob/v4.0/etc/orthography.tsv
\, ,
a a
b b
c c attempting to adjust with CLTS Bipa
d d
e e attempting to adjust with CLTS Bipa
i(?=ë) attempting to adjust with CLTS Bipa
a(?=ë) attempting to adjust with CLTS Bipa
u(?=ë) attempting to adjust with CLTS Bipa
e(?=ë) attempting to adjust with CLTS Bipa
o(?=ë) attempting to adjust with CLTS Bipa
e attempting to adjust with CLTS Bipa
ië i ʔ e attempting to adjust with CLTS Bipa: inspired by how the string sequence is handled here: https://github.com/lexibank/bantubvd/blob/v4.0/etc/orthography.tsv
aë a ʔ e attempting to adjust with CLTS Bipa
uë u ʔ e attempting to adjust with CLTS Bipa
eë e ʔ e attempting to adjust with CLTS Bipa
oë o ʔ e attempting to adjust with CLTS Bipa
f f
g ɡ
h(?=[tnjgb-]) ʔ

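Note on the profile change: the sketch below illustrates, under simplifying assumptions, why the new `-`, `^-` and `eë` rows remove the `<<->>` errors. It is not the tokenizer the dataset actually uses; `PROFILE` and `segment` are hypothetical names, the mapping is a small hand-picked subset of `etc/orthography.tsv`, and the matching strategy (greedy longest match over a form wrapped in `^` and `$`) is assumed from the profile columns shown in `cldf/forms.csv`.

```python
# Hypothetical subset of etc/orthography.tsv; None stands in for a dropped
# grapheme (the NULL entry in the profile).
PROFILE = {
    "^-": None,                  # word-initial hyphen: dropped
    "^": None,                   # word-start marker
    "$": None,                   # word-end marker
    "-": "+",                    # word-internal hyphen: morpheme boundary
    "oe": "u",
    "e\u00eb": "e \u0294 e",     # "eë": glottal stop between the two vowels
    "a": "a", "b": "b", "e": "e", "i": "i", "k": "k",
    "l": "l", "m": "m", "n": "n", "o": "o", "r": "r",
}


def segment(form, profile=PROFILE):
    """Greedy longest-match segmentation (illustrative sketch only)."""
    text = "^" + form + "$"
    longest = max(len(grapheme) for grapheme in profile)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate grapheme first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in profile:
                if profile[chunk] is not None:
                    out.append(profile[chunk])
                i += size
                break
        else:
            # No profile entry matched: emit a replacement character.
            out.append("\ufffd")
            i += 1
    return " ".join(out)


print(segment("nab-nab"))
print(segment("-boekakai"))
print(segment("ke\u00ebroe"))
```

With this subset, the three forms come out with the same segments as the updated rows in `cldf/forms.csv` for `nab-nab`, `-boekakai` and `keëroe`.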