parsing German #221

ManfredBernhard · 2022-06-08T16:55:52Z

I seem to have installed the German language model for spacyr/spaCy properly, but the POS tagging is chaotic. See below: auxiliaries are tagged as NOUN, articles are tagged as NOUN, etc.

spacy_initialize(model="de_core_news_sm")
spaCy is already initialized
NULL
> text_german<-c("R ist eine freie Programmiersprache für statistische Berechnungen und Grafiken. Sie wurde von Statistikern für Anwender mit statistischen Aufgaben entwickelt.")
> results_german<-spacy_parse(text_german, dependency=F, lemma=F, tag=T)
> results_german
   doc_id sentence_id token_id              token   pos tag   entity
1   text1           1        1                  R  NOUN  NN         
2   text1           1        2                ist  NOUN  NN         
3   text1           1        3               eine  NOUN  NN         
4   text1           1        4              freie PROPN NNP         
5   text1           1        5 Programmiersprache PROPN NNP    ORG_B
6   text1           1        6                für  NOUN  NN         
7   text1           1        7       statistische  NOUN  NN         
8   text1           1        8       Berechnungen PROPN NNP    ORG_B
9   text1           1        9                und  NOUN  NN         
10  text1           1       10           Grafiken PROPN NNP PERSON_B
11  text1           1       11                  . PUNCT   .         
12  text1           2        1                Sie PROPN NNP         
13  text1           2        2              wurde PROPN NNP PERSON_B
14  text1           2        3                von PROPN NNP PERSON_I
15  text1           2        4       Statistikern PROPN NNP PERSON_I
16  text1           2        5                für  NOUN  NN         
17  text1           3        1           Anwender   ADJ  JJ         
18  text1           3        2                mit  NOUN  NN         
19  text1           3        3      statistischen   ADP  IN         
20  text1           3        4           Aufgaben PROPN NNP    ORG_B
21  text1           3        5         entwickelt  VERB VBD         
22  text1           3        6                  . PUNCT   .

Can somebody tell me what I am doing wrong?
Best,
Manfred

kbenoit · 2022-09-01T10:01:54Z

Does this look any better?

library("spacyr")

spacy_initialize(model = "de_core_news_lg")
#> Found 'spacy_condaenv'. spacyr will use this environment
#> successfully initialized (spaCy Version: 3.4.1, language model: de_core_news_lg)
#> (python options: type = "condaenv", value = "spacy_condaenv")
text_german<-c("R ist eine freie Programmiersprache für statistische Berechnungen und Grafiken. Sie wurde von Statistikern für Anwender mit statistischen Aufgaben entwickelt.")
results_german <- spacy_parse(text_german, dependency = FALSE, lemma = FALSE, tag = TRUE)
results_german
#>    doc_id sentence_id token_id              token   pos   tag entity
#> 1   text1           1        1                  R  NOUN    NN MISC_B
#> 2   text1           1        2                ist   AUX VAFIN       
#> 3   text1           1        3               eine   DET   ART       
#> 4   text1           1        4              freie   ADJ  ADJA       
#> 5   text1           1        5 Programmiersprache  NOUN    NN       
#> 6   text1           1        6                für   ADP  APPR       
#> 7   text1           1        7       statistische   ADJ  ADJA       
#> 8   text1           1        8       Berechnungen  NOUN    NN       
#> 9   text1           1        9                und CCONJ   KON       
#> 10  text1           1       10           Grafiken  NOUN    NN       
#> 11  text1           1       11                  . PUNCT    $.       
#> 12  text1           2        1                Sie  PRON  PPER       
#> 13  text1           2        2              wurde   AUX VAFIN       
#> 14  text1           2        3                von   ADP  APPR       
#> 15  text1           2        4       Statistikern  NOUN    NN       
#> 16  text1           2        5                für   ADP  APPR       
#> 17  text1           2        6           Anwender  NOUN    NN       
#> 18  text1           2        7                mit   ADP  APPR       
#> 19  text1           2        8      statistischen   ADJ  ADJA       
#> 20  text1           2        9           Aufgaben  NOUN    NN       
#> 21  text1           2       10         entwickelt  VERB  VVPP       
#> 22  text1           2       11                  . PUNCT    $.

Created on 2022-09-01 with reprex v2.0.2
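
For anyone landing here with the same symptom: the tags in the first output (NN, NNP, JJ, IN, VBD) are Penn Treebank tags, which spaCy's English models emit, whereas the German models emit STTS-style tags such as VAFIN and ART, as in the output above. Together with the message "spaCy is already initialized", this suggests the call with de_core_news_sm did nothing because a session with another model was still running. A minimal sketch of a workaround, assuming that is the cause and that de_core_news_sm is installed in the Python environment spacyr uses:

```r
library("spacyr")

# Assumption: an earlier spacy_initialize() call in this R session loaded
# a different (probably English) model. Close that session first.
spacy_finalize()

# Start a fresh session with the German model.
spacy_initialize(model = "de_core_news_sm")

text_german <- "R ist eine freie Programmiersprache für statistische Berechnungen und Grafiken. Sie wurde von Statistikern für Anwender mit statistischen Aufgaben entwickelt."

# Same call as in the original report, with the logical flags spelled out.
results_german <- spacy_parse(text_german, dependency = FALSE, lemma = FALSE, tag = TRUE)
results_german
```

If the tag column then shows values such as VAFIN and ART rather than NN and NNP, the German model is the one in use.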
