Each as PRON instead of DET #11

Open
AngledLuffa opened this issue Oct 27, 2023 · 9 comments

@AngledLuffa
Contributor

The standard in English treebanks is to label each as a DET:

# sent_id = weblog-blogspot.com_alaindewitt_20040929103700_ENG_20040929_103700-0079
# text = His column appears in The Hill each week.
1       His     his     PRON    PRP$    Case=Gen|Gender=Masc|Number=Sing|Person=3|Poss=Yes|PronType=Prs 2       nmod:poss       2:nmod:poss     _
2       column  column  NOUN    NN      Number=Sing     3       nsubj   3:nsubj _
3       appears appear  VERB    VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   0       root    0:root  _
4       in      in      ADP     IN      _       6       case    6:case  _
5       The     the     DET     DT      Definite=Def|PronType=Art       6       det     6:det   _
6       Hill    Hill    PROPN   NNP     Number=Sing     3       obl     3:obl:in        _
7       each    each    DET     DT      _       8       det     8:det   _
8       week    week    NOUN    NN      Number=Sing     3       obl:tmod        3:obl:tmod      SpaceAfter=No
9       .       .       PUNCT   .       _       3       punct   3:punct _

Any objections to updating the each UPOS from PRON to DET?

@rueter

rueter commented Oct 27, 2023

I trust this is on a by-context basis. I can also think of an ADV reading:

They gave us two apples each.

@AngledLuffa
Contributor Author

AngledLuffa commented Oct 27, 2023 via email

@nschneid

nschneid commented Oct 27, 2023

I think EWT probably has a consistent policy on "each". (Demonstratives not in det function become PRON, but most other DETs stay DET: https://universaldependencies.org/en/pos/DET.html)

@AngledLuffa
Contributor Author

@nschneid that's not what happens in practice in EWT. each other gets tagged DET with the feature ExtPos=PRON, and all of the others just get tagged DET. For example:

# the spaces around "each" are actually tabs
grep " each    " extern_data/ud2/git/UD_English-EWT/en_ewt-ud-train.conllu | cut -f 4 | sort | uniq
DET
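
For anyone without the repository checked out in the same layout, the same check can be sketched in Python. This is only an illustration, not a script used by any of the treebank maintainers: the function name `upos_counts` and the inline `sample` lines are assumptions made up for this example. Unlike the grep above, it compares only the FORM column and ignores case, so it also reports how often each tag occurs.

```python
from collections import Counter
from typing import Iterable


def upos_counts(lines: Iterable[str], form: str) -> Counter:
    """Count UPOS tags of tokens whose FORM equals `form`, ignoring case."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence separators and comment lines such as "# text = ..."
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A CoNLL-U token line has exactly 10 tab-separated columns
        if len(cols) != 10:
            continue
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1")
        if not cols[0].isdigit():
            continue
        if cols[1].lower() == form.lower():
            counts[cols[3]] += 1  # column 4 (index 3) is UPOS
    return counts


# Hypothetical in-memory sample taken from the sentence quoted in this issue
sample = [
    "# text = His column appears in The Hill each week.",
    "7\teach\teach\tDET\tDT\t_\t8\tdet\t8:det\t_",
    "8\tweek\tweek\tNOUN\tNN\tNumber=Sing\t3\tobl:tmod\t3:obl:tmod\tSpaceAfter=No",
]
print(upos_counts(sample, "each"))
```

To run it on a real treebank, pass an open file handle instead of `sample`, e.g. the `en_ewt-ud-train.conllu` file from the grep command, opened with `encoding="utf-8"`.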

@nschneid

Right, I'm saying only demonstratives and relativizers become PRON in EWT. "each" is neither so it stays DET.

@AngledLuffa
Contributor Author

AngledLuffa commented Oct 27, 2023 via email

@LarsAhrenberg
Contributor

In that case, unless there are objections from the stakeholders,

I do object to anyone making changes to UD_English-LinES without my consent. I welcome that errors are pointed out, where they exist, but I much prefer to correct them myself. So, please don't.

While I appreciate the efforts to improve consistency between treebanks I fear it sometimes is going too fast. Many decisions are now being made relating to EWT and GUM that revoke earlier decisions and where's the guarantee that they will not be changed in the future? And minor inconsistencies cannot be that harmful. Moreover, increasing consistency in one place may cause inconsistency in others, e.g. between languages. I have annotated data that I maintain to add to the treebank in the future and that will obviously be less consistent with the public UD data if anyone can just go in and make changes. Besides, I think the decision to declare each as always DET is misguided and one that may well be changed in the future. See the example above from @rueter.

@AngledLuffa
Contributor Author

AngledLuffa commented Oct 28, 2023 via email

@LarsAhrenberg
Contributor

To be consistent with myself I'll do a review of each before Nov. 1st and change back, where necessary.
