Prepare release #10

Merged: 1 commit, merged on Aug 2, 2024.

2 changes: 1 addition & 1 deletion .github/workflows/cldf-validation.yml
@@ -12,7 +12,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-python-version: [3.6]
+python-version: [3.12]

steps:
- uses: actions/checkout@v2
6 changes: 5 additions & 1 deletion .zenodo.json
@@ -16,7 +16,11 @@
"contributors": [
{
"name": "Christoph Rzymski",
-"type": "Other"
+"type": "Editor"
+},
+{
+"name": "Johann-Mattis List",
+"type": "Editor"
}
],
"communities": [
4 changes: 3 additions & 1 deletion CONTRIBUTORS.md
@@ -4,4 +4,6 @@ Name | GitHub user | Description | Role
--- | --- | --- | ---
Robinson, Laura C. | | data collector | Author
Holton, Gary | | data collector | Author
-Christoph Rzymski | @chrzyki | maintainer, patron | Other
+Christoph Rzymski | @chrzyki | maintainer, patron | Editor
+Johann-Mattis List | @LinguList | maintainer, profile | Editor
+
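After this change, `CONTRIBUTORS.md` and the `contributors` array in `.zenodo.json` agree: both rows with the role `Editor` appear in `.zenodo.json` with `"type": "Editor"`. A minimal sketch of that correspondence, assuming that every non-`Author` row is copied into `contributors` (authors presumably go to Zenodo's `creators` list, which this diff does not touch). The helpers `parse_table` and `zenodo_contributors` are hypothetical and are not the lexibank tooling's own code.

```python
import json

# The contributors table as it reads after this PR.
CONTRIBUTORS_MD = """\
Name | GitHub user | Description | Role
--- | --- | --- | ---
Robinson, Laura C. | | data collector | Author
Holton, Gary | | data collector | Author
Christoph Rzymski | @chrzyki | maintainer, patron | Editor
Johann-Mattis List | @LinguList | maintainer, profile | Editor
"""


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited Markdown table into a list of row dicts."""
    lines = [line for line in markdown.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the '--- | ---' separator row
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def zenodo_contributors(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Hypothetical mapping: every non-Author row becomes a Zenodo contributor."""
    return [
        {"name": row["Name"], "type": row["Role"]}
        for row in rows
        if row["Role"] != "Author"
    ]


contributors = zenodo_contributors(parse_table(CONTRIBUTORS_MD))
print(json.dumps({"contributors": contributors}, indent=2))
```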
12 changes: 7 additions & 5 deletions README.md
@@ -29,21 +29,21 @@ Any dataset specific notes on lexibank decisions/mapping choices etc go in here.

[![CLDF validation](https://github.com/lexibank/robinsonap/workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/robinsonap/actions?query=workflow%3ACLDF-validation)
![Glottolog: 100%](https://img.shields.io/badge/Glottolog-100%25-brightgreen.svg "Glottolog: 100%")
-![Concepticon: 98%](https://img.shields.io/badge/Concepticon-98%25-green.svg "Concepticon: 98%")
+![Concepticon: 99%](https://img.shields.io/badge/Concepticon-99%25-green.svg "Concepticon: 99%")
![Source: 100%](https://img.shields.io/badge/Source-100%25-brightgreen.svg "Source: 100%")
![BIPA: 100%](https://img.shields.io/badge/BIPA-100%25-brightgreen.svg "BIPA: 100%")
![CLTS SoundClass: 100%](https://img.shields.io/badge/CLTS%20SoundClass-100%25-brightgreen.svg "CLTS SoundClass: 100%")

-- **Varieties:** 13
-- **Concepts:** 398
+- **Varieties:** 13 (linked to 13 different Glottocodes)
+- **Concepts:** 398 (linked to 392 different Concepticon concept sets)
- **Lexemes:** 4,841
- **Sources:** 1
- **Synonymy:** 1.06
- **Cognacy:** 3,902 cognates in 2,371 cognate sets (1,835 singletons)
- **Cognate Diversity:** 0.44
- **Invalid lexemes:** 0
- **Tokens:** 25,844
-- **Segments:** 49 (0 BIPA errors, 0 CTLS sound class errors, 49 CLTS modified)
+- **Segments:** 49 (0 BIPA errors, 0 CLTS sound class errors, 49 CLTS modified)
- **Inventory size (avg):** 27.31

# Contributors
@@ -52,7 +52,9 @@ Name | GitHub user | Description | Role
--- | --- | --- | ---
Robinson, Laura C. | | data collector | Author
Holton, Gary | | data collector | Author
-Christoph Rzymski | @chrzyki | maintainer, patron | Other
+Christoph Rzymski | @chrzyki | maintainer, patron | Editor
+Johann-Mattis List | @LinguList | maintainer, profile | Editor




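The summary statistics above can be cross-checked by hand. The formula for cognate diversity is not stated in this PR; as an assumption, (cognate sets - concepts) / (lexemes - concepts) reproduces the reported value: (2,371 - 398) / (4,841 - 398) = 1,973 / 4,443 ≈ 0.444, which rounds to 0.44. The sketch below only redoes that arithmetic and is not taken from the tool that generated the README.

```python
# Figures copied from the README statistics in this PR.
CONCEPTS = 398
LEXEMES = 4_841
COGNATE_SETS = 2_371
SINGLETONS = 1_835  # cognate sets with a single member


def cognate_diversity(cognate_sets: int, concepts: int, lexemes: int) -> float:
    """Assumed formula: 0 if each concept has exactly one cognate set,
    1 if every lexeme forms its own cognate set."""
    return (cognate_sets - concepts) / (lexemes - concepts)


diversity = cognate_diversity(COGNATE_SETS, CONCEPTS, LEXEMES)
print(f"cognate diversity: {diversity:.2f}")  # cognate diversity: 0.44
print(f"singleton share: {SINGLETONS / COGNATE_SETS:.1%}")
```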
120 changes: 120 additions & 0 deletions cldf/README.md
@@ -0,0 +1,120 @@
<a name="ds-cldfmetadatajson"> </a>

# Wordlist CLDF dataset derived from Robinson and Holton's "Internal Classification of the Alor-Pantar Language Family" from 2012

**CLDF Metadata**: [cldf-metadata.json](./cldf-metadata.json)

**Sources**: [sources.bib](./sources.bib)

property | value
--- | ---
[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Robinson, Laura C. and Holton, Gary (2012): Internal Classification of the Alor-Pantar Language Family Using Computational Methods Applied to the Lexicon. Language Dynamics and Change 2.2. 123-149.
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF Wordlist](http://cldf.clld.org/v1.0/terms.rdf#Wordlist)
[dc:format](http://purl.org/dc/terms/format) | <ol><li>http://concepticon.clld.org/contributions/Robinson-2012-398</li></ol>
[dc:identifier](http://purl.org/dc/terms/identifier) | https://doi.org/10.1163/22105832-20120201
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/lexibank/robinsonap
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/robinsonap/tree/v4.0">lexibank/robinsonap v4.0</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v5.0">Glottolog v5.0</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/v3.2.0">Concepticon v3.2.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>lingpy-rcParams</strong>: <a href="./lingpy-rcParams.json">lingpy-rcParams.json</a></li><li><strong>python</strong>: 3.12.4</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | robinsonap
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution


## <a name="table-formscsv"></a>Table [forms.csv](./forms.csv)


Raw lexical data item as it can be pulled out of the original datasets.

This is the basis for creating rows in CLDF representations of the data by
- splitting the lexical item into forms
- cleaning the forms
- potentially tokenizing the form


property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF FormTable](http://cldf.clld.org/v1.0/terms.rdf#FormTable)
[dc:extent](http://purl.org/dc/terms/extent) | 4841


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[Local_ID](http://purl.org/dc/terms/identifier) | `string` |
[Language_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | References [parameters.csv::ID](#table-parameterscsv)
[Value](http://cldf.clld.org/v1.0/terms.rdf#value) | `string` |
[Form](http://cldf.clld.org/v1.0/terms.rdf#form) | `string` |
[Segments](http://cldf.clld.org/v1.0/terms.rdf#segments) | list of `string` (separated by ` `) |
[Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |
[Source](http://cldf.clld.org/v1.0/terms.rdf#source) | list of `string` (separated by `;`) | References [sources.bib::BibTeX-key](./sources.bib)
`Cognacy` | `string` |
`Loan` | `boolean` |
`Graphemes` | `string` |
`Profile` | `string` |

## <a name="table-languagescsv"></a>Table [languages.csv](./languages.csv)

property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF LanguageTable](http://cldf.clld.org/v1.0/terms.rdf#LanguageTable)
[dc:extent](http://purl.org/dc/terms/extent) | 13


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Glottocode](http://cldf.clld.org/v1.0/terms.rdf#glottocode) | `string` |
`Glottolog_Name` | `string` |
[ISO639P3code](http://cldf.clld.org/v1.0/terms.rdf#iso639P3code) | `string` |
[Macroarea](http://cldf.clld.org/v1.0/terms.rdf#macroarea) | `string` |
[Latitude](http://cldf.clld.org/v1.0/terms.rdf#latitude) | `decimal`<br>&ge; -90<br>&le; 90 |
[Longitude](http://cldf.clld.org/v1.0/terms.rdf#longitude) | `decimal`<br>&ge; -180<br>&le; 180 |
`Family` | `string` |
`Token` | `string` |

## <a name="table-parameterscsv"></a>Table [parameters.csv](./parameters.csv)

property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF ParameterTable](http://cldf.clld.org/v1.0/terms.rdf#ParameterTable)
[dc:extent](http://purl.org/dc/terms/extent) | 398


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Concepticon_ID](http://cldf.clld.org/v1.0/terms.rdf#concepticonReference) | `string` |
`Concepticon_Gloss` | `string` |

## <a name="table-cognatescsv"></a>Table [cognates.csv](./cognates.csv)

property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF CognateTable](http://cldf.clld.org/v1.0/terms.rdf#CognateTable)
[dc:extent](http://purl.org/dc/terms/extent) | 3902


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[Form_ID](http://cldf.clld.org/v1.0/terms.rdf#formReference) | `string` | References [forms.csv::ID](#table-formscsv)
[Form](http://linguistics-ontology.org/gold/2010/FormUnit) | `string` |
[Cognateset_ID](http://cldf.clld.org/v1.0/terms.rdf#cognatesetReference) | `string` |
`Doubt` | `boolean` |
`Cognate_Detection_Method` | `string` |
[Source](http://cldf.clld.org/v1.0/terms.rdf#source) | list of `string` (separated by `;`) | References [sources.bib::BibTeX-key](./sources.bib)
[Alignment](http://cldf.clld.org/v1.0/terms.rdf#alignment) | list of `string` (separated by ` `) |
`Alignment_Method` | `string` |
`Alignment_Source` | `string` |

17 changes: 7 additions & 10 deletions cldf/cldf-metadata.json
@@ -17,25 +17,25 @@
{
"rdf:about": "https://github.com/lexibank/robinsonap",
"rdf:type": "prov:Entity",
-"dc:created": "v3.0-16-g5f5b9db",
+"dc:created": "v4.0",
"dc:title": "Repository"
},
{
"rdf:about": "https://github.com/glottolog/glottolog",
"rdf:type": "prov:Entity",
-"dc:created": "v4.4",
+"dc:created": "v5.0",
"dc:title": "Glottolog"
},
{
"rdf:about": "https://github.com/concepticon/concepticon-data",
"rdf:type": "prov:Entity",
-"dc:created": "v2.5.0",
+"dc:created": "v3.2.0",
"dc:title": "Concepticon"
},
{
"rdf:about": "https://github.com/cldf-clts/clts",
"rdf:type": "prov:Entity",
-"dc:created": "v2.1.0",
+"dc:created": "v2.3.0",
"dc:title": "CLTS"
}
],
@@ -46,7 +46,7 @@
},
{
"dc:title": "python",
-"dc:description": "3.8.10"
+"dc:description": "3.12.4"
},
{
"dc:title": "python-packages",
@@ -55,9 +55,6 @@
],
"rdf:ID": "robinsonap",
"rdf:type": "http://www.w3.org/ns/dcat#Distribution",
-"dialect": {
-"commentPrefix": null
-},
"tables": [
{
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#FormTable",
@@ -181,7 +178,7 @@
{
"datatype": "string",
"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#glottocode",
-"valueUrl": "http://glottolog.org/resource/languoid/id/{glottolog_id}",
+"valueUrl": "http://glottolog.org/resource/languoid/id/{Glottocode}",
"name": "Glottocode"
},
{
@@ -251,7 +248,7 @@
{
"datatype": "string",
"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#concepticonReference",
-"valueUrl": "http://concepticon.clld.org/parameters/{concepticon_id}",
+"valueUrl": "http://concepticon.clld.org/parameters/{Concepticon_ID}",
"name": "Concepticon_ID"
},
{
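The two `valueUrl` fixes matter because, in CSV on the Web (CSVW) metadata, the variables in a URI template are resolved against the `name` of each column in the current row, and RFC 6570 variable names are case-sensitive. The old templates referred to `{glottolog_id}` and `{concepticon_id}`, which match no column, so there was nothing to substitute. A simplified sketch, handling only plain `{name}` expressions rather than full RFC 6570 syntax and expanding an unknown variable to the empty string (as RFC 6570 does for undefined variables); `expand` is a hypothetical helper, not part of any CLDF library.

```python
import re

# Column values for the `nedebang` row of cldf/languages.csv in this PR.
ROW = {"ID": "nedebang", "Name": "Nedebang", "Glottocode": "nede1245"}

OLD_TEMPLATE = "http://glottolog.org/resource/languoid/id/{glottolog_id}"
NEW_TEMPLATE = "http://glottolog.org/resource/languoid/id/{Glottocode}"


def expand(template: str, row: dict[str, str]) -> str:
    """Replace each simple {variable} with the row's value for that column.

    Lookup is case-sensitive; undefined variables expand to an empty string.
    """
    return re.sub(
        r"\{([A-Za-z0-9_]+)\}",
        lambda match: row.get(match.group(1), ""),
        template,
    )


print(expand(OLD_TEMPLATE, ROW))  # http://glottolog.org/resource/languoid/id/
print(expand(NEW_TEMPLATE, ROW))  # http://glottolog.org/resource/languoid/id/nede1245
```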
2 changes: 1 addition & 1 deletion cldf/languages.csv
@@ -10,5 +10,5 @@ westernpantar,Western Pantar,lamm1241,Western Pantar,lev,Papunesia,-8.52787,124.
kui,Kui,kuii1254,Kui,kvd,Papunesia,,,Timor-Alor-Pantar,Ki
adang,Adang,adan1251,Adang,adn,Papunesia,-8.18958,124.448,Timor-Alor-Pantar,Ad
sawila,Sawila,sawi1256,Sawila,swt,Papunesia,-8.29105,125.078,Timor-Alor-Pantar,Sw
-nedebang,Nedebang,nede1245,Klamu,nec,Papunesia,-8.28776,124.192,Timor-Alor-Pantar,Nd
+nedebang,Nedebang,nede1245,Nedebang,nec,Papunesia,-8.28776,124.192,Timor-Alor-Pantar,Nd
klon,Klon,kelo1247,Klon,kyo,Papunesia,-8.40688,124.429,Timor-Alor-Pantar,Kl
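The only data change in `cldf/languages.csv` is the `Glottolog_Name` of `nedebang`, which changes from `Klamu` to `Nedebang`; it lands together with the Glottolog v4.4 to v5.0 update recorded in `cldf-metadata.json`. A small sketch that parses the old and new row using the column order listed for this table in `cldf/README.md`, reports which columns differ, and checks the declared Latitude (-90 to 90) and Longitude (-180 to 180) bounds. The helper names are hypothetical, and treating empty coordinates as missing values is an assumption based on the `kui` row.

```python
import csv
import io

# Column order as listed for languages.csv in cldf/README.md.
FIELDS = ["ID", "Name", "Glottocode", "Glottolog_Name", "ISO639P3code",
          "Macroarea", "Latitude", "Longitude", "Family", "Token"]

OLD = "nedebang,Nedebang,nede1245,Klamu,nec,Papunesia,-8.28776,124.192,Timor-Alor-Pantar,Nd"
NEW = "nedebang,Nedebang,nede1245,Nedebang,nec,Papunesia,-8.28776,124.192,Timor-Alor-Pantar,Nd"


def parse_row(line: str) -> dict[str, str]:
    """Parse one header-less CSV line into a dict keyed by column name."""
    return next(csv.DictReader(io.StringIO(line), fieldnames=FIELDS))


def changed_columns(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {column: (old value, new value)} for every column that differs."""
    return {k: (old[k], new[k]) for k in FIELDS if old[k] != new[k]}


def coordinates_ok(row: dict[str, str]) -> bool:
    """Check the Latitude/Longitude bounds; empty cells count as missing and pass."""
    for column, limit in (("Latitude", 90), ("Longitude", 180)):
        if row[column] and not -limit <= float(row[column]) <= limit:
            return False
    return True


old_row, new_row = parse_row(OLD), parse_row(NEW)
print(changed_columns(old_row, new_row))  # {'Glottolog_Name': ('Klamu', 'Nedebang')}
print(coordinates_ok(new_row))            # True
```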
4 changes: 2 additions & 2 deletions cldf/lingpy-rcParams.json
@@ -64,7 +64,7 @@
10,
10
],
-"filename": "lingpy-2021-07-22",
+"filename": "lingpy-2024-08-01",
"gap_symbol": "-",
"gap_weight": 0.5,
"gop": -2,
@@ -123,7 +123,7 @@
"scorer": {},
"sonar": true,
"stress": "\u02c8\u02cc'",
-"timestamp": "2021-07-22 11:19",
+"timestamp": "2024-08-01 10:28",
"tones": "\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u2070\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u20800123456789\u02e5\u02e6\u02e7\u02e8\u02e9\u02ea\u02eb-\ua708-\ua709-\ua70a-\ua70b-\ua70c-\ua70d-\ua70e-\ua70f-\ua710-\ua711-\ua712-\ua713-\ua714-\ua715-\ua716-\ua717-\ua718-\ua719-\ua71a-\ua700-\ua701-\ua702-\ua703-\ua704-\ua705-\ua706-\ua707",
"tree_calc": "neighbor",
"unique_sequences": true,
8 changes: 4 additions & 4 deletions cldf/parameters.csv
@@ -44,7 +44,7 @@ ID,Name,Concepticon_ID,Concepticon_Gloss
43_burnshine,burn/shine,2102,BURN
44_butterfly,butterfly,1791,BUTTERFLY
45_buy,buy,1869,BUY
-46_callout,call out,,
+46_callout,call out,715,SHOUT
47_canoe,canoe,1970,CANOE
48_cassava,cassava,925,CASSAVA
49_chaseawayexpel,chase away/expel,30,DISPEL
@@ -209,7 +209,7 @@ ID,Name,Concepticon_ID,Concepticon_Gloss
208_oldersibling,older sibling,405,OLDER SIBLING
209_one,one,1493,ONE
210_onehundred,one hundred,1634,HUNDRED
-211_onehundredthousand,one hundred thousand,,
+211_onehundredthousand,one hundred thousand,3532,ONE HUNDRED THOUSAND
212_oven,oven,1143,OVEN
213_papaya,papaya,2445,PAPAYA
214_penis,penis,1222,PENIS
@@ -240,7 +240,7 @@ ID,Name,Concepticon_ID,Concepticon_Gloss
239_salt,salt,1274,SALT
240_salty,salty,1091,SALTY
241_sand,sand,671,SAND
-242_scabies,scabies,2664,SCAB
+242_scabies,scabies,3172,SCABIES
243_scared,scared,3033,SCARED
244_scorpion,scorpion,1538,SCORPION
245_scratch,scratch,1436,SCRATCH
@@ -307,7 +307,7 @@ ID,Name,Concepticon_ID,Concepticon_Gloss
306_thousand,thousand,1843,THOUSAND
307_three,three,492,THREE
308_thunder,thunder,1150,THUNDER
-309_tinea,tinea,1189,ULCER
+309_tinea,tinea,3173,TINEA
310_tobark,to bark,1206,BARKING
311_tobathe,to bathe,138,BATHE
312_tobatheachild,to bathe a child,3170,BATHE (SOMEONE)
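The four `cldf/parameters.csv` hunks fall into two groups: `call out` and `one hundred thousand` had no Concepticon link before and now point to SHOUT (715) and ONE HUNDRED THOUSAND (3532), while `scabies` and `tinea` move from SCAB and ULCER to the dedicated concept sets SCABIES (3172) and TINEA (3173). A minimal sketch that classifies such mapping changes from the old and new rows; the `load` and `classify` helpers and their labels are illustrative, not Concepticon terminology.

```python
import csv
import io

# Old and new versions of the rows changed in cldf/parameters.csv by this PR.
OLD_ROWS = """\
ID,Name,Concepticon_ID,Concepticon_Gloss
46_callout,call out,,
211_onehundredthousand,one hundred thousand,,
242_scabies,scabies,2664,SCAB
309_tinea,tinea,1189,ULCER
"""

NEW_ROWS = """\
ID,Name,Concepticon_ID,Concepticon_Gloss
46_callout,call out,715,SHOUT
211_onehundredthousand,one hundred thousand,3532,ONE HUNDRED THOUSAND
242_scabies,scabies,3172,SCABIES
309_tinea,tinea,3173,TINEA
"""


def load(text: str) -> dict[str, dict[str, str]]:
    """Index parameter rows by their ID."""
    return {row["ID"]: row for row in csv.DictReader(io.StringIO(text))}


def classify(old: dict[str, str], new: dict[str, str]) -> str:
    """Label a mapping change as a new link, a remapping, or unchanged."""
    if old["Concepticon_ID"] == new["Concepticon_ID"]:
        return "unchanged"
    if not old["Concepticon_ID"]:
        return "newly linked"
    return "remapped"


old, new = load(OLD_ROWS), load(NEW_ROWS)
changes = {pid: classify(old[pid], new[pid]) for pid in new}
for pid, kind in changes.items():
    before = old[pid]["Concepticon_Gloss"] or "(none)"
    print(f"{pid}: {before} -> {new[pid]['Concepticon_Gloss']} [{kind}]")
```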