Showing 3 changed files with 331 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
id: geo-parent | ||
abstract: true | ||
task: geotagging | ||
language_codes: | ||
- swe | ||
keywords: | ||
- geotagging | ||
standard_reference: '' | ||
other_references: [] | ||
tool: '' | ||
model: "[GeoNames](https://www.geonames.org/)" | ||
trained_on: '' | ||
tagset: '' | ||
evaluation_results: '' | ||
created: 2018-05-28 | ||
updated: 2022-05-18 | ||
--- | ||
id: swe-geotagcontext-sparv | ||
parent: geo-parent | ||
name: | ||
swe: Geotaggning av platsnamn från kontext | ||
eng: Geotagging of place names from context | ||
short_description: | ||
swe: Annotering av texter med platsinformation, baserad på platser som finns i texten | ||
eng: Annotate text chunks with location data, based on locations contained within the text | ||
annotations: | ||
- <text>:geo.geo_context | ||
- <paragraph>:geo.geo_context | ||
- <sentence>:geo.geo_context | ||
example_output: |- | ||
```xml | ||
<text geo_context="|Göteborg;SE;57.70716;11.96679|Torslanda;SE;57.72432;11.77013|"> | ||
<paragraph geo_context="|Torslanda;SE;57.72432;11.77013|Göteborg;SE;57.70716;11.96679|"> | ||
<sentence geo_context="|Göteborg;SE;57.70716;11.96679|Torslanda;SE;57.72432;11.77013|"> | ||
<token>Varje</token> | ||
<token>tisdag</token> | ||
<token>kommer</token> | ||
<token>en</token> | ||
<token>leverans</token> | ||
<token>av</token> | ||
<token>lådor</token> | ||
<token>med</token> | ||
<token>matsvinn</token> | ||
<token>från</token> | ||
<token>Ica</token> | ||
<token>Maxi</token> | ||
<token>i</token> | ||
<token>Torslanda</token> | ||
<token>till</token> | ||
<token>förskolan</token> | ||
<token>i</token> | ||
<token>Göteborg</token> | ||
<token>.</token> | ||
</sentence> | ||
</paragraph> | ||
</text> | ||
``` | ||
description: | ||
swe: |- | ||
Texter berikas med platsnamn (och deras geografiska koordinater) som finns i dem. Detta är baserat på platsnamn som | ||
hittats genom namnigenkänning med [SweNer](https://spraakbanken.gu.se/analyser/swe-namedentity-swener). Geografiska | ||
koordinater letas upp i [GeoNames-databasen](https://www.geonames.org/). Denna annotation kan användas på valfria | ||
textspann såsom text, stycke, mening eller token. | ||
eng: |- | ||
Text chunks are enriched with place names (and their geographic coordinates) occurring within them. This is based on | ||
the place names found by the named entity tagger | ||
[SweNer](https://spraakbanken.gu.se/en/analyses/swe-namedentity-swener). Geographical coordinates are looked up in | ||
the [GeoNames database](https://www.geonames.org/). This annotation can be applied to any text chunk, e.g. texts, | ||
paragraphs, sentences or tokens. | ||
--- | ||
id: swe-geotagmetadata-sparv | ||
parent: geo-parent | ||
name: | ||
swe: Geotaggning av platsnamn från metadata | ||
eng: Geotagging of place names from metadata | ||
short_description: | ||
swe: Annotering av texter med platsinformation, baserad på metadata som innehåller platsnamn | ||
eng: Annotate text chunks with location data, based on metadata containing location names | ||
annotations: | ||
- <text>:geo.geo_metadata | ||
example_output: |- | ||
```xml | ||
<text author_location="Göteborg" geo_metadata="|Göteborg;SE;57.70716;11.96679|"> | ||
<token>Det</token> | ||
<token>var</token> | ||
<token>då</token> | ||
<token>änna</token> | ||
<token>bösigt</token> | ||
<token>i</token> | ||
<token>bamban</token> | ||
<token>!</token> | ||
</text> | ||
``` | ||
example_extra: |- | ||
In order to use this annotation you need to tell Sparv where to look for the geographic metadata. If, for example, | ||
your corpus looks like this: | ||
```xml | ||
<text author_location="Göteborg">Det var då änna bösigt i bamban!</text> | ||
``` | ||
and you would like to use `author_location` as input for your annotation, you need to add the following setting to your | ||
Sparv corpus configuration file: | ||
```yaml | ||
geo: | ||
metadata_source: text:author_location | ||
``` | ||
description: | ||
swe: |- | ||
Texter berikas med platsnamn (och deras geografiska koordinater) som finns i dess metadata. Detta är baserat på | ||
platsnamn som hittats genom namnigenkänning med | ||
[SweNer](https://spraakbanken.gu.se/analyser/swe-namedentity-swener). Geografiska koordinater letas upp i | ||
[GeoNames-databasen](https://www.geonames.org/). Denna annotation kan användas på valfria textspann och valfria | ||
attribut som innehåller platsnamn. | ||
eng: |- | ||
Text chunks are enriched with place names (and their geographic coordinates) occurring in their metadata. This is based on | ||
the place names found by the named entity tagger | ||
[SweNer](https://spraakbanken.gu.se/en/analyses/swe-namedentity-swener). Geographical coordinates are looked up in | ||
the [GeoNames database](https://www.geonames.org/). This annotation can be applied to any text chunk and any | ||
attribute containing place names. |
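The `geo_context` and `geo_metadata` values in the example output above look like pipe-delimited sets whose entries carry four semicolon-separated fields: place name, country code, latitude and longitude. The following is a minimal Python sketch for reading such a value, assuming that layout holds in general. `GeoEntry` and `parse_geo_attribute` are hypothetical names used for illustration only; they are not part of Sparv.

```python
from typing import NamedTuple


class GeoEntry(NamedTuple):
    """One place entry (hypothetical type, not a Sparv class)."""
    name: str
    country: str
    latitude: float
    longitude: float


def parse_geo_attribute(value: str) -> list[GeoEntry]:
    """Parse a value such as "|Göteborg;SE;57.70716;11.96679|" into entries.

    Assumes every entry has exactly four ';'-separated fields, as in the
    example output; an entry with a different shape raises ValueError.
    """
    entries = []
    for chunk in value.strip("|").split("|"):
        if not chunk:
            continue  # skip empty chunks, e.g. when the value is a bare "|"
        name, country, lat, lon = chunk.split(";")
        entries.append(GeoEntry(name, country, float(lat), float(lon)))
    return entries


if __name__ == "__main__":
    sample = "|Göteborg;SE;57.70716;11.96679|Torslanda;SE;57.72432;11.77013|"
    for entry in parse_geo_attribute(sample):
        print(entry.name, entry.country, entry.latitude, entry.longitude)
```

Raising on malformed entries rather than skipping them is a deliberate choice in this sketch: it surfaces any value that does not follow the assumed four-field layout.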
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,211 @@ | ||
id: hunpos-parent | ||
abstract: true | ||
language_codes: | ||
- swe | ||
standard_reference: '' | ||
other_references: | ||
- "Hunpos: https://code.google.com/archive/p/hunpos/" | ||
tool: "Hunpos" | ||
trained_on: "[SUC3](https://spraakbanken.gu.se/resurser/suc3)" | ||
tagset: "[SUC3](https://spraakbanken.gu.se/korp/markup/msdtags.html)" | ||
evaluation_results: '' | ||
--- | ||
id: swe-pos-hunpos-suc3 | ||
parent: hunpos-parent | ||
name: | ||
swe: SUC-ordklasstaggning med Hunpos | ||
eng: SUC part-of-speech tagging with Hunpos | ||
short_description: | ||
swe: Annotering av SUC-ordklasser med Hunpos för svenska | ||
eng: Swedish part-of-speech annotation with SUC tags by Hunpos | ||
task: part-of-speech tagging | ||
keywords: | ||
- pos-tagging | ||
annotations: | ||
- <token>:hunpos.pos | ||
example_output: |- | ||
```xml | ||
<token pos="PN">Det</token> | ||
<token pos="AB">här</token> | ||
<token pos="VB">är</token> | ||
<token pos="DT">en</token> | ||
<token pos="NN">korpus</token> | ||
<token pos="MAD">.</token> | ||
``` | ||
model: "[suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)" | ||
description: | ||
swe: |- | ||
Meningssegment analyseras och annoteras med ordklasstaggar. Ingår inte längre i | ||
Sparvs standardanalyser eftersom Stanzas ordklassannotering ger bättre resultat. | ||
eng: |- | ||
Sentence segments are analysed to enrich tokens with part-of-speech tags. No longer | ||
used by default by Sparv because Stanza's POS-tagging yields better results. | ||
created: 2010-12-15 | ||
updated: 2018-05-28 | ||
--- | ||
id: swe-msd-hunpos-suc3 | ||
parent: hunpos-parent | ||
name: | ||
swe: Morfosyntaktisk SUC-taggning med Hunpos | ||
eng: Tagging of morphological features (SUC) by Hunpos | ||
short_description: | ||
swe: Annotering av morfosyntaktiska deskriptorer (SUC) med Hunpos för svenska | ||
eng: Annotation of morphological features (SUC) by Hunpos for Swedish | ||
task: morphosyntactic tagging | ||
keywords: | ||
- msd | ||
annotations: | ||
- <token>:hunpos.msd | ||
example_output: |- | ||
```xml | ||
<token msd="PN.NEU.SIN.DEF.SUB+OBJ">Det</token> | ||
<token msd="AB">här</token> | ||
<token msd="VB.PRS.AKT">är</token> | ||
<token msd="DT.UTR.SIN.IND">en</token> | ||
<token msd="NN.UTR.SIN.IND.NOM">korpus</token> | ||
<token msd="MAD">.</token> | ||
``` | ||
model: "[suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)" | ||
description: | ||
swe: |- | ||
Meningssegment analyseras och annoteras med ordklasstaggar och morfosyntaktisk information. Ingår inte längre i | ||
Sparvs standardanalyser eftersom Stanzas ordklassannotering ger bättre resultat. | ||
eng: |- | ||
Sentence segments are analysed to enrich tokens with part-of-speech tags and morphosyntactic information. No longer | ||
used by default by Sparv because Stanza's POS-tagging yields better results. | ||
created: 2010-12-15 | ||
updated: 2018-05-28 | ||
--- | ||
id: swe-pos-hunpos-suc3-1800 | ||
parent: hunpos-parent | ||
name: | ||
swe: SUC-ordklasstaggning med Hunpos för 1800-talssvenska | ||
eng: SUC part-of-speech tagging with Hunpos for Swedish from the 1800's | ||
short_description: | ||
swe: Annotering av SUC-ordklasser med Hunpos för 1800-talssvenska | ||
eng: Part-of-speech annotation with SUC tags by Hunpos for Swedish from the 1800's | ||
task: part-of-speech tagging | ||
keywords: | ||
- pos-tagging | ||
annotations: | ||
- <token>:hunpos.pos | ||
example_output: |- | ||
```xml | ||
<token pos="NN">Lådan</token> | ||
<token pos="VB">var</token> | ||
<token pos="PC">upphängd</token> | ||
<token pos="PP">under</token> | ||
<token pos="DT">den</token> | ||
<token pos="NN">waggon</token> | ||
<token pos="HA">hvari</token> | ||
<token pos="DT">de</token> | ||
<token pos="JJ">andra</token> | ||
<token pos="NN">djuren</token> | ||
<token pos="VB">befunno</token> | ||
<token pos="PN">sig</token> | ||
<token pos="MAD">.</token> | ||
``` | ||
model: |- | ||
- [suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true) | ||
- a word list along with the words' morphosyntactic information generated from the [Dalin | ||
morphology](https://spraakbanken.gu.se/resurser/dalinm) and the [Swedberg | ||
morphology](https://spraakbanken.gu.se/resurser/swedbergm) | ||
description: | ||
swe: |- | ||
Meningssegment analyseras och annoteras med ordklasstaggar. Utöver ordklasstaggningsmodellen använder Hunpos listor | ||
med böjningsformer för att kunna generera bättre ordklasstaggar för 1800-talssvenska. | ||
eng: |- | ||
Sentence segments are analysed to enrich tokens with part-of-speech tags. In addition to the POS model, inflection | ||
lists are provided to Hunpos to make more accurate part-of-speech predictions for Swedish from the 1800's. | ||
created: 2012-10-23 | ||
updated: 2015-09-11 | ||
--- | ||
id: swe-msd-hunpos-suc3-1800 | ||
parent: hunpos-parent | ||
name: | ||
swe: Morfosyntaktisk SUC-taggning med Hunpos för 1800-talssvenska | ||
eng: Tagging of morphological features (SUC) by Hunpos for Swedish from the 1800's | ||
short_description: | ||
swe: Annotering av morfosyntaktiska deskriptorer (SUC) med Hunpos för 1800-talssvenska | ||
eng: Annotation of morphological features (SUC) by Hunpos for Swedish from the 1800's | ||
task: morphosyntactic tagging | ||
keywords: | ||
- msd | ||
annotations: | ||
- <token>:hunpos.msd | ||
example_output: |- | ||
```xml | ||
<token msd="NN.UTR.SIN.DEF.NOM">Lådan</token> | ||
<token msd="VB.PRT.AKT">var</token> | ||
<token msd="PC.PRF.UTR.SIN.IND.NOM">upphängd</token> | ||
<token msd="PP">under</token> | ||
<token msd="DT.UTR.SIN.DEF">den</token> | ||
<token msd="NN.UTR.SIN.IND.NOM">waggon</token> | ||
<token msd="HA">hvari</token> | ||
<token msd="DT.UTR+NEU.PLU.DEF">de</token> | ||
<token msd="JJ.POS.UTR+NEU.PLU.IND+DEF.NOM">andra</token> | ||
<token msd="NN.NEU.PLU.DEF.NOM">djuren</token> | ||
<token msd="VB.INF.AKT">befunno</token> | ||
<token msd="PN.UTR+NEU.SIN+PLU.DEF.OBJ">sig</token> | ||
<token msd="MAD">.</token> | ||
``` | ||
model: |- | ||
- [suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true) | ||
- a word list along with the words' morphosyntactic information generated from the [Dalin | ||
morphology](https://spraakbanken.gu.se/resurser/dalinm) and the [Swedberg | ||
morphology](https://spraakbanken.gu.se/resurser/swedbergm) | ||
description: | ||
swe: |- | ||
Meningssegment analyseras och annoteras med ordklasstaggar och morfosyntaktisk information. Utöver | ||
ordklasstaggningsmodellen använder Hunpos listor med böjningsformer för att kunna generera bättre ordklasstaggar för | ||
1800-talssvenska. | ||
eng: |- | ||
Sentence segments are analysed to enrich tokens with part-of-speech tags and morphosyntactic information. In | ||
addition to the POS model, inflection lists are provided to Hunpos to make more accurate part-of-speech predictions | ||
for Swedish from the 1800's. | ||
created: 2012-10-23 | ||
updated: 2015-09-11 |
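In the example outputs above, each `msd` value is dot-separated and its first field matches the `pos` value shown for the same token (for instance `NN.UTR.SIN.IND.NOM` and `NN` for "korpus"). The sketch below splits an MSD tag into its part of speech and its features under that assumption. `split_msd` is a hypothetical helper, not a Sparv or Hunpos function.

```python
def split_msd(msd: str) -> tuple[str, list[str]]:
    """Split an MSD tag into (pos, features).

    Assumes the dot-separated layout seen in the examples, where the first
    field is the part of speech. Combined values such as "SUB+OBJ" are kept
    as a single feature and not split further.
    """
    pos, *features = msd.split(".")
    return pos, features


if __name__ == "__main__":
    for tag in ("PN.NEU.SIN.DEF.SUB+OBJ", "VB.PRS.AKT", "MAD"):
        print(tag, "->", split_msd(tag))
```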