Commit

more analysis metadata
anne17 committed Nov 11, 2024
1 parent 4959865 commit 28ab09b
Showing 3 changed files with 331 additions and 2 deletions.
118 changes: 118 additions & 0 deletions sparv/modules/geo/metadata.yaml
@@ -0,0 +1,118 @@
id: geo-parent
abstract: true
task: geotagging
language_codes:
- swe
keywords:
- geotagging
standard_reference: ''
other_references: []
tool: ''
model: "[GeoNames](https://www.geonames.org/)"
trained_on: ''
tagset: ''
evaluation_results: ''
created: 2018-05-28
updated: 2022-05-18
---
id: swe-geotagcontext-sparv
parent: geo-parent
name:
  swe: Geotaggning av platsnamn från kontext
  eng: Geotagging of place names from context
short_description:
  swe: Annotering av texter med platsinformation, baserad på platser som finns i texten
  eng: Annotate text chunks with location data, based on locations contained within the text
annotations:
- <text>:geo.geo_context
- <paragraph>:geo.geo_context
- <sentence>:geo.geo_context
example_output: |-
  ```xml
  <text geo_context="|Göteborg;SE;57.70716;11.96679|Torslanda;SE;57.72432;11.77013|">
    <paragraph geo_context="|Torslanda;SE;57.72432;11.77013|Göteborg;SE;57.70716;11.96679|">
      <sentence geo_context="|Göteborg;SE;57.70716;11.96679|Torslanda;SE;57.72432;11.77013|">
        <token>Varje</token>
        <token>tisdag</token>
        <token>kommer</token>
        <token>en</token>
        <token>leverans</token>
        <token>av</token>
        <token>lådor</token>
        <token>med</token>
        <token>matsvinn</token>
        <token>från</token>
        <token>Ica</token>
        <token>Maxi</token>
        <token>i</token>
        <token>Torslanda</token>
        <token>till</token>
        <token>förskolan</token>
        <token>i</token>
        <token>Göteborg</token>
        <token>.</token>
      </sentence>
    </paragraph>
  </text>
  ```
description:
  swe: |-
    Texter berikas med platsnamn (och deras geografiska koordinater) som finns i dem. Detta är baserat på platsnamn som
    hittats genom namnigenkänning med [SweNer](https://spraakbanken.gu.se/analyser/swe-namedentity-swener). Geografiska
    koordinater letas upp i [GeoNames-databasen](https://www.geonames.org/). Denna annotation kan användas på valfria
    textspann såsom text, stycke, mening eller token.
  eng: |-
    Text chunks are enriched with place names (and their geographic coordinates) occurring within them. This is based on
    the place names found by the named entity tagger
    [SweNer](https://spraakbanken.gu.se/en/analyses/swe-namedentity-swener). Geographical coordinates are looked up in
    the [GeoNames database](https://www.geonames.org/). This annotation can be applied to any text chunk, e.g. texts,
    paragraphs, sentences or tokens.
---
id: swe-geotagmetadata-sparv
parent: geo-parent
name:
  swe: Geotaggning av platsnamn från metadata
  eng: Geotagging of place names from metadata
short_description:
  swe: Annotering av texter med platsinformation, baserad på metadata som innehåller platsnamn
  eng: Annotate text chunks with location data, based on metadata containing location names
annotations:
- <text>:geo.geo_metadata
example_output: |-
  ```xml
  <text author_location="Göteborg" geo_metadata="|Göteborg;SE;57.70716;11.96679|">
    <token>Det</token>
    <token>var</token>
    <token>då</token>
    <token>änna</token>
    <token>bösigt</token>
    <token>i</token>
    <token>bamban</token>
    <token>!</token>
  </text>
  ```
example_extra: |-
  In order to use this annotation, you need to tell Sparv where to look for the geographic metadata. If, for example,
  your corpus looks like this:
  ```xml
  <text author_location="Göteborg">Det var då änna bösigt i bamban!</text>
  ```
  and you would like to use `author_location` as input for your annotation, you need to add the following setting to
  your Sparv corpus configuration file:
  ```yaml
  geo:
    metadata_source: text:author_location
  ```
description:
  swe: |-
    Texter berikas med platsnamn (och deras geografiska koordinater) som finns i deras metadata. Platsnamnen hämtas från
    det metadataattribut som anges i korpuskonfigurationen. Geografiska koordinater letas upp i
    [GeoNames-databasen](https://www.geonames.org/). Denna annotation kan användas på valfria textspann och valfria
    attribut som innehåller platsnamn.
  eng: |-
    Text chunks are enriched with place names (and their geographic coordinates) found in their metadata. The place
    names are read from the metadata attribute specified in the corpus configuration. Geographical coordinates are
    looked up in the [GeoNames database](https://www.geonames.org/). This annotation can be applied to any text chunk
    and any attribute containing place names.
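The `geo_context` and `geo_metadata` values in the example outputs above are pipe-delimited sets in which each entry appears to hold four `;`-separated fields: place name, country code, latitude and longitude. The Python sketch below parses such a value. The field order is inferred from the examples only, and `Place` and `parse_geo_attribute` are hypothetical helpers for illustration, not part of Sparv.

```python
from __future__ import annotations

from typing import NamedTuple


class Place(NamedTuple):
    """One entry of a geo_context/geo_metadata value (field order inferred from the examples)."""

    name: str
    country: str
    latitude: float
    longitude: float


def parse_geo_attribute(value: str) -> list[Place]:
    """Parse a value such as '|Göteborg;SE;57.70716;11.96679|' into Place tuples."""
    places = []
    for entry in value.split("|"):
        if not entry:  # the leading and trailing '|' yield empty strings
            continue
        name, country, lat, lon = entry.split(";")
        places.append(Place(name, country, float(lat), float(lon)))
    return places


if __name__ == "__main__":
    attr = "|Göteborg;SE;57.70716;11.96679|Torslanda;SE;57.72432;11.77013|"
    for place in parse_geo_attribute(attr):
        print(f"{place.name} ({place.country}): {place.latitude}, {place.longitude}")
```

An empty set (a bare `|`) parses to an empty list, so chunks without any recognized place need no special casing.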
211 changes: 211 additions & 0 deletions sparv/modules/hunpos/metadata.yaml
@@ -0,0 +1,211 @@
id: hunpos-parent
abstract: true
language_codes:
- swe
standard_reference: ''
other_references:
- "Hunpos: https://code.google.com/archive/p/hunpos/"
tool: "Hunpos"
trained_on: "[SUC3](https://spraakbanken.gu.se/resurser/suc3)"
tagset: "[SUC3](https://spraakbanken.gu.se/korp/markup/msdtags.html)"
evaluation_results: ''
---
id: swe-pos-hunpos-suc3
parent: hunpos-parent
name:
  swe: SUC-ordklasstaggning med Hunpos
  eng: SUC part-of-speech tagging with Hunpos
short_description:
  swe: Annotering av SUC-ordklasser med Hunpos för svenska
  eng: Swedish part-of-speech annotation with SUC tags by Hunpos
task: part-of-speech tagging
keywords:
- pos-tagging
annotations:
- <token>:hunpos.pos
example_output: |-
  ```xml
  <token pos="PN">Det</token>
  <token pos="AB">här</token>
  <token pos="VB">är</token>
  <token pos="DT">en</token>
  <token pos="NN">korpus</token>
  <token pos="MAD">.</token>
  ```
model: "[suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)"
description:
  swe: |-
    Meningssegment analyseras och annoteras med ordklasstaggar. Ingår inte längre i
    Sparvs standardanalyser eftersom Stanzas ordklassannotering ger bättre resultat.
  eng: |-
    Sentence segments are analysed to enrich tokens with part-of-speech tags. No longer
    used by default in Sparv because Stanza's POS tagging yields better results.
created: 2010-12-15
updated: 2018-05-28
---
id: swe-msd-hunpos-suc3
parent: hunpos-parent
name:
  swe: Morfosyntaktisk SUC-taggning med Hunpos
  eng: Tagging of morphological features (SUC) by Hunpos
short_description:
  swe: Annotering av morfosyntaktiska deskriptorer (SUC) med Hunpos för svenska
  eng: Annotation of morphological features (SUC) by Hunpos for Swedish
task: morphosyntactic tagging
keywords:
- msd
annotations:
- <token>:hunpos.msd
example_output: |-
  ```xml
  <token msd="PN.NEU.SIN.DEF.SUB+OBJ">Det</token>
  <token msd="AB">här</token>
  <token msd="VB.PRS.AKT">är</token>
  <token msd="DT.UTR.SIN.IND">en</token>
  <token msd="NN.UTR.SIN.IND.NOM">korpus</token>
  <token msd="MAD">.</token>
  ```
model: "[suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)"
description:
  swe: |-
    Meningssegment analyseras och annoteras med ordklasstaggar och morfosyntaktisk information. Ingår inte längre i
    Sparvs standardanalyser eftersom Stanzas ordklassannotering ger bättre resultat.
  eng: |-
    Sentence segments are analysed to enrich tokens with part-of-speech tags and morphosyntactic information. No longer
    used by default in Sparv because Stanza's POS tagging yields better results.
created: 2010-12-15
updated: 2018-05-28
---
id: swe-pos-hunpos-suc3-1800
parent: hunpos-parent
name:
  swe: SUC-ordklasstaggning med Hunpos för 1800-talssvenska
  eng: SUC part-of-speech tagging with Hunpos for Swedish from the 1800s
short_description:
  swe: Annotering av SUC-ordklasser med Hunpos för 1800-talssvenska
  eng: Part-of-speech annotation with SUC tags by Hunpos for Swedish from the 1800s
task: part-of-speech tagging
keywords:
- pos-tagging
annotations:
- <token>:hunpos.pos
example_output: |-
  ```xml
  <token pos="NN">Lådan</token>
  <token pos="VB">var</token>
  <token pos="PC">upphängd</token>
  <token pos="PP">under</token>
  <token pos="DT">den</token>
  <token pos="NN">waggon</token>
  <token pos="HA">hvari</token>
  <token pos="DT">de</token>
  <token pos="JJ">andra</token>
  <token pos="NN">djuren</token>
  <token pos="VB">befunno</token>
  <token pos="PN">sig</token>
  <token pos="MAD">.</token>
  ```
model: |-
  - [suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)
  - a word list with the morphosyntactic information of each word, generated from the [Dalin
    morphology](https://spraakbanken.gu.se/resurser/dalinm) and the [Swedberg
    morphology](https://spraakbanken.gu.se/resurser/swedbergm)
description:
  swe: |-
    Meningssegment analyseras och annoteras med ordklasstaggar. Utöver ordklasstaggningsmodellen använder Hunpos listor
    med böjningsformer för att kunna generera bättre ordklasstaggar för 1800-talssvenska.
  eng: |-
    Sentence segments are analysed to enrich tokens with part-of-speech tags. In addition to the POS model, inflection
    lists are provided to Hunpos to make more accurate part-of-speech predictions for Swedish from the 1800s.
created: 2012-10-23
updated: 2015-09-11
---
id: swe-msd-hunpos-suc3-1800
parent: hunpos-parent
name:
  swe: Morfosyntaktisk SUC-taggning med Hunpos för 1800-talssvenska
  eng: Tagging of morphological features (SUC) by Hunpos for Swedish from the 1800s
short_description:
  swe: Annotering av morfosyntaktiska deskriptorer (SUC) med Hunpos för 1800-talssvenska
  eng: Annotation of morphological features (SUC) by Hunpos for Swedish from the 1800s
task: morphosyntactic tagging
keywords:
- msd
annotations:
- <token>:hunpos.msd
example_output: |-
  ```xml
  <token msd="NN.UTR.SIN.DEF.NOM">Lådan</token>
  <token msd="VB.PRT.AKT">var</token>
  <token msd="PC.PRF.UTR.SIN.IND.NOM">upphängd</token>
  <token msd="PP">under</token>
  <token msd="DT.UTR.SIN.DEF">den</token>
  <token msd="NN.UTR.SIN.IND.NOM">waggon</token>
  <token msd="HA">hvari</token>
  <token msd="DT.UTR+NEU.PLU.DEF">de</token>
  <token msd="JJ.POS.UTR+NEU.PLU.IND+DEF.NOM">andra</token>
  <token msd="NN.NEU.PLU.DEF.NOM">djuren</token>
  <token msd="VB.INF.AKT">befunno</token>
  <token msd="PN.UTR+NEU.SIN+PLU.DEF.OBJ">sig</token>
  <token msd="MAD">.</token>
  ```
model: |-
  - [suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)
  - a word list with the morphosyntactic information of each word, generated from the [Dalin
    morphology](https://spraakbanken.gu.se/resurser/dalinm) and the [Swedberg
    morphology](https://spraakbanken.gu.se/resurser/swedbergm)
description:
  swe: |-
    Meningssegment analyseras och annoteras med ordklasstaggar och morfosyntaktisk information. Utöver
    ordklasstaggningsmodellen använder Hunpos listor med böjningsformer för att kunna generera bättre ordklasstaggar för
    1800-talssvenska.
  eng: |-
    Sentence segments are analysed to enrich tokens with part-of-speech tags and morphosyntactic information. In
    addition to the POS model, inflection lists are provided to Hunpos to make more accurate part-of-speech predictions
    for Swedish from the 1800s.
created: 2012-10-23
updated: 2015-09-11
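In the Hunpos examples, each `pos` value matches the first dot-separated field of the corresponding `msd` value (for the token Det, `PN` and `PN.NEU.SIN.DEF.SUB+OBJ`). A minimal Python sketch of that relationship follows, assuming the SUC convention that `+` joins alternative (underspecified) values within one feature slot; `split_msd` is a hypothetical helper, not a Sparv function.

```python
from __future__ import annotations


def split_msd(msd: str) -> tuple[str, list[tuple[str, ...]]]:
    """Split a SUC MSD tag into its POS tag and a list of feature alternatives.

    'PN.NEU.SIN.DEF.SUB+OBJ' -> ('PN', [('NEU',), ('SIN',), ('DEF',), ('SUB', 'OBJ')])
    """
    pos, *features = msd.split(".")
    # Within one feature slot, '+' joins alternative (underspecified) values.
    return pos, [tuple(feature.split("+")) for feature in features]


if __name__ == "__main__":
    # pos/msd pairs copied from the 1800s example outputs above
    pairs = [
        ("NN", "NN.UTR.SIN.DEF.NOM"),
        ("VB", "VB.PRT.AKT"),
        ("PC", "PC.PRF.UTR.SIN.IND.NOM"),
        ("DT", "DT.UTR+NEU.PLU.DEF"),
        ("MAD", "MAD"),
    ]
    for pos, msd in pairs:
        assert split_msd(msd)[0] == pos
    print(split_msd("DT.UTR+NEU.PLU.DEF"))
```

Keeping the alternatives as tuples rather than joined strings makes it easy to test, for example, whether a token may be plural.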
4 changes: 2 additions & 2 deletions sparv/modules/readability/metadata.yaml
@@ -5,7 +5,7 @@ language_codes:
 - swe
 keywords:
 - readability measures
-other_references: ''
+other_references: []
 tool: ''
 model: ''
 trained_on: ''
@@ -138,7 +138,7 @@ description:
     dividing this by the number of verbs, adverbs and pronouns. A high nominal ratio suggests a high density of
     information, which can also mean that the text is difficult to read.
 ---
-id: swe-readability-sparv-nk
+id: swe-readability-sparv-ovix
 parent: readability-parent
 name:
   swe: Annotering av Ordvariationsindex (OVIX) för texter
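The readability hunk above touches the nominal ratio and OVIX entries. As an illustration only, here is a Python sketch of both measures. The nominal ratio numerator (nouns, prepositions and participles, i.e. SUC tags `NN`, `PP`, `PC`) is an assumption based on the common Swedish definition, since the hunk only shows the denominator (verbs, adverbs and pronouns). OVIX uses the usual formula log(N) / log(2 - log(V) / log(N)) with N tokens and V types; lowercasing, and leaving punctuation filtering to the caller, are further assumptions. The function and constant names are hypothetical and not taken from Sparv.

```python
import math
from collections.abc import Iterable

NOMINAL_TAGS = {"NN", "PP", "PC"}  # nouns, prepositions, participles (assumed numerator)
VERBAL_TAGS = {"PN", "AB", "VB"}  # pronouns, adverbs, verbs (denominator, as in the hunk)


def nominal_ratio(pos_tags: Iterable[str]) -> float:
    """Nominal ratio over SUC POS tags; returns inf when there are no verbal tags."""
    tags = list(pos_tags)
    nominal = sum(tag in NOMINAL_TAGS for tag in tags)
    verbal = sum(tag in VERBAL_TAGS for tag in tags)
    return nominal / verbal if verbal else math.inf


def ovix(words: Iterable[str]) -> float:
    """OVIX = log(N) / log(2 - log(V) / log(N)) for N tokens and V lowercased types."""
    tokens = [word.lower() for word in words]
    n, v = len(tokens), len(set(tokens))
    if n < 2:
        return math.nan  # log(N) is 0 or undefined, so the formula is undefined
    if v == n:
        return math.inf  # every token is unique: the denominator is log(1) = 0
    return math.log(n) / math.log(2 - math.log(v) / math.log(n))


if __name__ == "__main__":
    # POS tags from the "Det här är en korpus ." Hunpos example
    print(nominal_ratio(["PN", "AB", "VB", "DT", "NN", "MAD"]))
    # illustrative word list with one repeated type
    print(ovix(["en", "katt", "och", "en", "hund"]))
```

Returning `inf`/`nan` for the degenerate cases keeps the functions total; a real annotator might prefer to emit an empty value instead.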
