anne17 committed Nov 4, 2024
1 parent c5dcff0 commit 5df01ba
Showing 2 changed files with 133 additions and 0 deletions.
50 changes: 50 additions & 0 deletions sparv/modules/malt/metadata.yaml
@@ -0,0 +1,50 @@
id: swe-dependency-malt-treebank
name:
  swe: Dependensparsning med MaltParser
  eng: Dependency parsing with MaltParser
short_description:
  swe: Svensk dependensparsning tränad på Svensk trädbank baserad på MaltParser
  eng: Swedish dependency parsing with MaltParser, trained on the Swedish Treebank
task: dependency parsing
language_codes:
- swe
keywords:
- dependency parsing
annotations:
- <token>:malt.ref
- <token>:malt.dephead_ref
- <token>:malt.deprel
example_output: |-
  ```xml
  <token dephead_ref="4" deprel="SS" ref="1">Alfred</token>
  <token dephead_ref="1" deprel="HD" ref="2">Bernhard</token>
  <token dephead_ref="1" deprel="HD" ref="3">Nobel</token>
  <token deprel="ROOT" ref="4">var</token>
  <token dephead_ref="8" deprel="DT" ref="5">en</token>
  <token dephead_ref="8" deprel="AT" ref="6">svensk</token>
  <token dephead_ref="8" deprel="CJ" ref="7">kemist</token>
  <token dephead_ref="9" deprel="DT" ref="8">och</token>
  <token dephead_ref="4" deprel="SP" ref="9">stiftare</token>
  <token dephead_ref="9" deprel="ET" ref="10">av</token>
  <token dephead_ref="10" deprel="PA" ref="11">Nobelpriset</token>
  <token dephead_ref="4" deprel="IP" ref="12">.</token>
  ```
standard_reference: |-
  Joakim Nivre, Johan Hall, and Jens Nilsson. 2006. MaltParser: A Data-Driven Parser-Generator for Dependency Parsing.
  In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy.
  European Language Resources Association (ELRA).
other_references:
- "MaltParser: https://www.maltparser.org/download.html"
- 'https://aclanthology.org/2021.nodalida-main.20/'
tool: "MaltParser"
model: "[Swemalt](https://www.maltparser.org/mco/swedish_parser/swemalt.html)"
trained_on: "[Svensk trädbank (the TalbankenSTB part)](https://spraakbanken.gu.se/resurser/sv-treebank)"
tagset: "[MambaDep](https://svn.spraakdata.gu.se/sb-arkiv/pub/mamba.html)"
evaluation_results: Labelled Attachment Score 0.78 (using the TalbankenSBX train-dev-test split)
description:
  swe: |-
    Denna MaltParser-modell har konfigurerats för svenska och tränats på TalbankenSTB-korpusen.
  eng: |-
    This MaltParser model, configured for Swedish, has been trained on the TalbankenSTB corpus.
created: 2010-12-15
updated: 2021-06-01
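
The `example_output` above links each token to its head through `dephead_ref`, which points at another token's `ref`. As a rough illustration of how a consumer might read that output, the sketch below resolves each reference to the head word. It is not part of the commit: the `<sentence>` wrapper element and the function name `dependency_edges` are assumptions introduced here so that the fragment parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# First four tokens of the example output, wrapped in a hypothetical
# <sentence> root (an assumption) so the fragment is well-formed XML.
EXAMPLE = """<sentence>
<token dephead_ref="4" deprel="SS" ref="1">Alfred</token>
<token dephead_ref="1" deprel="HD" ref="2">Bernhard</token>
<token dephead_ref="1" deprel="HD" ref="3">Nobel</token>
<token deprel="ROOT" ref="4">var</token>
</sentence>"""


def dependency_edges(xml_text):
    """Return (word, deprel, head_word) triples; head_word is None for the root."""
    root = ET.fromstring(xml_text)
    tokens = list(root.iter("token"))
    # Map each token's ref to its surface form so heads can be looked up.
    word_by_ref = {tok.get("ref"): tok.text for tok in tokens}
    edges = []
    for tok in tokens:
        head_ref = tok.get("dephead_ref")  # absent on the ROOT token
        head_word = word_by_ref.get(head_ref) if head_ref is not None else None
        edges.append((tok.text, tok.get("deprel"), head_word))
    return edges


for word, deprel, head in dependency_edges(EXAMPLE):
    print(f"{word} --{deprel}--> {head}")
```

The token without a `dephead_ref` attribute is treated as the sentence root, matching the `deprel="ROOT"` token in the example.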
83 changes: 83 additions & 0 deletions sparv/modules/swener/metadata.yaml
@@ -0,0 +1,83 @@
id: swe-namedentity-swener
name:
  swe: Namnigenkänning med HFST-SweNER
  eng: Named entity recognition with HFST-SweNER
short_description:
  swe: Namnigenkänning känner igen och förser namn och namnliknande uttryck (s.k. entiteter) i löpande text med fördefinierade etiketter, som organisation, person eller plats.
  eng: Named entity recognition (NER) recognises named entities such as locations, persons and time expressions in text.
task: named entity recognition
language_codes:
- swe
keywords:
- ner
annotations:
- swener.ne
- swener.ne:swener.name
- swener.ne:swener.ex
- swener.ne:swener.type
- swener.ne:swener.subtype
example_output: |-
  ```xml
  <ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
    <token>Alfred</token>
    <token>Bernhard</token>
    <token>Nobel</token>
  </ne>
  <token>,</token>
  <token>född</token>
  <ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
    <token>21</token>
    <token>oktober</token>
    <token>1833</token>
  </ne>
  <token>i</token>
  <ne ex="ENAMEX" name="Stockholm" subtype="PPL" type="LOC">
    <token>Stockholm</token>
  </ne>
  <token>,</token>
  <ne ex="ENAMEX" name="Italien" subtype="PPL" type="LOC">
    <token>Italien</token>
  </ne>
  <token>,</token>
  <token>var</token>
  <token>en</token>
  <token>svensk</token>
  <token>kemist</token>
  <token>och</token>
  <token>stiftare</token>
  <token>av</token>
  <ne ex="ENAMEX" name="Nobelpriset" subtype="PRZ" type="OBJ">
    <token>Nobelpriset</token>
  </ne>
  ```
standard_reference: |-
  [Dimitrios Kokkinakis, Jyrki Niemi, Sam Hardwick, Krister Lindén, and Lars Borin. 2014. HFST-SweNER — A New NER
  Resource for Swedish. In Proceedings of the Ninth International Conference on Language Resources and Evaluation
  (LREC'14), pages 2537-2543, Reykjavik, Iceland. European Language Resources Association
  (ELRA).](http://www.lrec-conf.org/proceedings/lrec2014/pdf/391_Paper.pdf)
other_references:
- "[Dimitrios Kokkinakis. 2004. Reducing the effect of name explosion](https://demo.spraakbanken.gu.se/svedk/pbl/kokkinakisBNER.pdf)"
- "Download HFST-SweNER: https://www.kielipankki.fi/download/HFST-SweNER/"
tool: "HFST-SweNER"
model: "Included in the tool"
trained_on: ''
tagset: "[Named entity tags from hfst-SweNER](https://svn.spraakdata.gu.se/sb-arkiv/pub/swener-tags.html)"
evaluation_results: "F-score between 27.48% and 91.33%, depending on the named entity category"
description:
  swe: |-
    Namnigenkänning är en språkteknologisk teknik som automatiskt känner igen och förser namn och namnliknande uttryck
    (s.k. entiteter) i löpande text med fördefinierade etiketter, som t.ex. person eller organisation, men, beroende
    på tillämpningsområdet, även numeriska uttryck och tidsuttryck. HFST-SweNER bygger på konvertering, modellering och
    anpassning av ett tidigare svenskt NER-system till Helsinki Finite-State Transducer Technology (HFST)-plattformen.
    HFST-SweNER är en fullfjädrad implementering med öppen källkod som stöder en mängd olika generiska namngivna
    entitetstyper och består av flera lexikala resurslager såsom olika n-gram-baserade namngivna namnlistor (s.k.
    gazetteers).
  eng: |-
    Named entity recognition (NER) recognises textual mentions of named entities that belong to a predefined set of
    categories, such as locations, persons and time expressions. HFST-SweNER is based on the conversion, modelling and
    adaptation of a Swedish NER system from a hybrid environment to the Helsinki Finite-State Transducer Technology
    (HFST) platform. HFST-SweNER is a full-fledged open source implementation that supports a variety of generic named
    entity types and consists of multiple, reusable resource layers such as various n-gram-based named entity lists
    (gazetteers).
created: 2014-07-04
updated: 2020-05-13
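
The `example_output` above groups tokens under `<ne>` elements that carry the `ex`, `name`, `type` and `subtype` attributes listed under `annotations`. The following sketch shows one way such output could be flattened into a list of entities. It is only an illustration and not part of the commit: the `<sentence>` wrapper and the function name `named_entities` are assumptions made here so that the excerpt parses as a single XML document.

```python
import xml.etree.ElementTree as ET

# Excerpt of the example output, wrapped in a hypothetical <sentence>
# root (an assumption) so the fragment is well-formed XML.
EXAMPLE = """<sentence>
<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
  <token>Alfred</token>
  <token>Bernhard</token>
  <token>Nobel</token>
</ne>
<token>,</token>
<token>född</token>
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
  <token>21</token>
  <token>oktober</token>
  <token>1833</token>
</ne>
</sentence>"""


def named_entities(xml_text):
    """Collect one dict per <ne> element, including the tokens it spans."""
    root = ET.fromstring(xml_text)
    entities = []
    for ne in root.iter("ne"):
        entities.append({
            "name": ne.get("name"),
            "ex": ne.get("ex"),
            "type": ne.get("type"),
            "subtype": ne.get("subtype"),
            "tokens": [tok.text for tok in ne.iter("token")],
        })
    return entities


for entity in named_entities(EXAMPLE):
    print(entity["type"], entity["subtype"], entity["name"])
```

Tokens outside any `<ne>` element, such as the comma and `född`, are skipped because only `<ne>` elements are visited.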
