An appalling lemma discrepancy #480

Open
AngledLuffa opened this issue Nov 28, 2023 · 12 comments

@AngledLuffa
Contributor

# sent_id = reviews-071650-0012
# text = I was appalled.
1       I       I       PRON    PRP     Case=Nom|Number=Sing|Person=1|PronType=Prs      3       nsubj   3:nsubj _
2       was     be      AUX     VBD     Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin   3       cop     3:cop   _
3       appalled        appalled        ADJ     JJ      Degree=Pos      0       root    0:root  SpaceAfter=No
4       .       .       PUNCT   .       _       3       punct   3:punct _

vs

# sent_id = newsgroup-groups.google.com_INTPunderground_b2c62e87877e4a22_ENG_20050906_165900-0015
24      I       I       PRON    PRP     Case=Nom|Number=Sing|Person=1|PronType=Prs      26      nsubj:pass      26:nsubj:pass   _
25      am      be      AUX     VBP     Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin   26      aux:pass        26:aux:pass     _
26      appalled        appal   VERB    VBN     Tense=Past|VerbForm=Part|Voice=Pass     14      ccomp   14:ccomp        _

Should the lemma really be appal rather than appall, anyway? Similar question for enrol, I suppose. The single-l spelling doesn't look right to my American sensibilities.
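For anyone who wants to look for more cases like this, here is a minimal sketch that groups tokens by lowercased form and reports forms that received more than one (lemma, UPOS, XPOS) analysis. It assumes tab-separated 10-column CoNLL-U input; the function names `parse_conllu` and `lemma_discrepancies` are made up for this illustration and are not part of any existing tool.

```python
from collections import defaultdict


def parse_conllu(text):
    """Yield (sent_id, columns) for every plain token line in CoNLL-U text."""
    sent_id = None
    for line in text.splitlines():
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):]
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            # Keep ordinary tokens only; skip multiword ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                yield sent_id, cols


def lemma_discrepancies(text):
    """Return {form: {(lemma, upos, xpos): [sent_ids]}} for forms with 2+ analyses."""
    seen = defaultdict(lambda: defaultdict(list))
    for sent_id, cols in parse_conllu(text):
        form, lemma, upos, xpos = cols[1], cols[2], cols[3], cols[4]
        seen[form.lower()][(lemma, upos, xpos)].append(sent_id)
    return {form: dict(analyses) for form, analyses in seen.items() if len(analyses) > 1}
```

Run over a treebank file's contents, this would list "appalled" with both the appalled/ADJ/JJ and the appal/VERB/VBN analyses. Many of the forms it reports will be legitimately ambiguous, so the output is a starting point for manual review, not a list of errors.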

@AngledLuffa
Contributor Author

Thoughts on this? I'd like to switch it to the double-letter versions.

@nschneid
Contributor

nschneid commented Dec 5, 2023

Yeah definitely not "appal". I guess it should be a VERB, with lemma "appall".

@AngledLuffa
Contributor Author

That's not the only case of "I was [a-z]+ed" with ADJ/JJ as the tag:

# sent_id = newsgroup-groups.google.com_INTPunderground_b2c62e87877e4a22_ENG_20050906_165900-0048
# text = ... I was shocked at the lack of racial diversity.
30      shocked shocked ADJ     JJ      Degree=Pos      0       root    0:root  _

# sent_id = reviews-126171-0003
# text = ... but I was disappointed with their customer service.
16      disappointed    disappointed    ADJ     JJ      Degree=Pos      4       conj    4:conj:but      _


# sent_id = reviews-360937-0002
# text = I must say, I was impressed with the size ...
7       impressed       impressed       ADJ     JJ      Degree=Pos      3       ccomp   3:ccomp _


# sent_id = email-enronsent12_01-0069
# text = I didn't feel guilty about the garage sale, that's why I was annoyed - being notified at 10:00 at night GRRRRRRR.
16      annoyed annoyed ADJ     JJ      Degree=Pos      13      advcl:relcl     13:advcl:relcl  _

# sent_id = email-enronsent37_01-0105
# text = And I was relieved when Nicki called to let me know she was home safe.
4       relieved        relieved        ADJ     JJ      Degree=Pos      0       root    0:root  _


# sent_id = newsgroup-groups.google.com_herpesnation_c74170a0fcfdc880_ENG_20051125_075200-0011
# text = ... before I was finished being a teenager.
35      finished        finished        ADJ     JJ      Degree=Pos      28      advcl   28:advcl:before _

although not consistently so:

# sent_id = weblog-blogspot.com_rigorousintuition_20060511134300_ENG_20060511_134300-0249
# text = I was amazed at the spiel they delivered.
3       amazed  amaze   VERB    VBN     Tense=Past|VerbForm=Part|Voice=Pass     0       root    0:root  _

It looks like a state of mind is usually tagged as ADJ here.
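A rough sketch of how a search like the one above could be reproduced, assuming tab-separated 10-column CoNLL-U input. The helper names `sentences` and `i_was_ed` are hypothetical, and the pattern is only the "I was [a-z]+ed" trigram mentioned above, so it misses anything with intervening adverbs.

```python
import re

ED_FORM = re.compile(r"[a-z]+ed")


def sentences(text):
    """Yield (sent_id, tokens); each token is a list of the 10 CoNLL-U columns."""
    sent_id, tokens = None, []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last sentence
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):]
        elif line.startswith("#"):
            continue
        elif not line.strip():
            if tokens:
                yield sent_id, tokens
            sent_id, tokens = None, []
        else:
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                tokens.append(cols)


def i_was_ed(text):
    """Return (sent_id, form, upos, xpos, lemma) for each 'I was <...ed>' trigram."""
    hits = []
    for sent_id, toks in sentences(text):
        for first, second, third in zip(toks, toks[1:], toks[2:]):
            if first[1] == "I" and second[1].lower() == "was" and ED_FORM.fullmatch(third[1]):
                hits.append((sent_id, third[1], third[3], third[4], third[2]))
    return hits
```

Sorting the hits by UPOS makes it easy to see which participle-like forms are tagged ADJ and which are tagged VERB.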

@nschneid
Contributor

nschneid commented Dec 5, 2023

Deciding VERB (past participle) vs. ADJ is very nuanced. I defer to @amir-zeldes when it comes up. (One test for ADJ is un- negation, which works for some of these. Another is very modification. I'm not sure there's a good reason that shocked should be ADJ but amazed should be VERB.)

@amir-zeldes
Contributor

It is indeed very tricky, since the original PTB guidelines are internally inconsistent and offer contradictory tests without a ranking. I do rely on PTB/ON precedents in doubtful cases, but my basic methodology, and hopefully the one you find in practice in the GU corpora, is articulated here, and I'll repeat it for convenience. The priorities we have noticed in prior corpora, and which we enforce, are, in order:

  1. Negation, a.k.a. "no such verb" - if the item has negation that would preclude a verbal lemma, it must be an adjective (uncredited -> JJ). This outranks even a by-agent; exception: if the "un-" lemma exists as a verb (e.g. Santorini's example "untied" -> "untie"). Similar logic applies to the 'no VBG for no such verb' rule, e.g. "outgoing" cannot be VBG, because "outgo" is not a verb.
  2. By-agent - if it has an overt by-agent, then it is VBN. Santorini originally ranked this below intensifiers, but the corpora seem to behave the opposite way, so I have followed them and I think we've been pretty consistent about it. It's also in the spirit of prioritizing argument structure where possible.
  3. If a relative clause paraphrase is possible, prefer VBG/VBN, but if that changes the meaning, prefer JJ: "appetizing dish" -> *"dish which appetizes" -> JJ; "existing safeguards" -> "safeguards that exist" -> VBG
  4. If "get" is possible but "become" is bad, prefer VBN - Santorini's "I was/got/*became married"
  5. Prefer JJ for state if it has a different reading from the event (e.g. "I was mistaken/JJ" != "someone mistook me")

In OntoNotes, it seems "impressed" is about 50-50 when not used in a perfect construction or as a finite verb: it is tagged VBN whenever "by" appears (criterion 2), otherwise JJ. Criterion 4 would justify this behavior IMO, but we have criterion 3 ranked higher, which is why it is always VBN in GUM in this function. Personally I would opt for VBN here; it seems pretty transparent to me.
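To make the ranking concrete, the five criteria above can be sketched as a first-match-wins function. This is only an illustration: the `Judgments` class and `decide` function are hypothetical names, every answer still has to be supplied by a human annotator, and returning `(None, None)` when no test applies is an assumption of this sketch, since the list above does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Judgments:
    """Annotator-supplied answers; None means the test is inapplicable or unclear."""
    verbal_tag: str = "VBN"                          # "VBN" for -ed forms, "VBG" for -ing forms
    no_such_verb: Optional[bool] = None              # 1: "uncredited", "outgoing" (not "untied")
    has_by_agent: Optional[bool] = None              # 2: overt by-agent
    relative_paraphrase_ok: Optional[bool] = None    # 3: True if the paraphrase keeps the meaning
    get_ok_become_bad: Optional[bool] = None         # 4: "I was/got/*became married"
    state_differs_from_event: Optional[bool] = None  # 5: "I was mistaken"


def decide(j: Judgments) -> Tuple[Optional[str], Optional[int]]:
    """Return (tag, number of the criterion that decided), checking in ranked order."""
    if j.no_such_verb:
        return "JJ", 1
    if j.has_by_agent:
        return "VBN", 2
    if j.relative_paraphrase_ok is not None:
        return (j.verbal_tag if j.relative_paraphrase_ok else "JJ"), 3
    if j.get_ok_become_bad:
        return "VBN", 4
    if j.state_differs_from_event:
        return "JJ", 5
    return None, None  # assumption: no criterion applied, leave for manual review
```

For example, `decide(Judgments(no_such_verb=True, has_by_agent=True))` returns `("JJ", 1)`, reflecting that negation outranks even a by-agent, and `decide(Judgments(verbal_tag="VBG", relative_paraphrase_ok=True))` returns `("VBG", 3)`, as for "existing safeguards".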

@nschneid
Contributor

nschneid commented Dec 8, 2023

ON has both tags for lowercased "united", though JJ is the majority tag. For the criteria in the link, the relative paraphrase decides it for VBN ("united people" can be "people who are united by ...")

Originally posted by @amir-zeldes in UniversalDependencies/UD_English-GUM#78 (comment)

If test 3 permits adding a copula, then "united states" => "states that are united" clearly passes, but so would canonical stative adjectives ("tall people" => "people who are tall"). So I'm not sure how that would favor the verb analysis.

If it doesn't permit adding a copula, then "states that united" is not quite the same meaning, though it is related by the inchoative alternation.

@amir-zeldes
Contributor

but so would canonical stative adjectives

Yes, this is true, but it's more consistent with VBG (uniting factors -> factors which unite), which can always be construed that way, and I don't see why active participles get to stay verbs in this use when passives don't. For me the question is mainly one of the lexeme (lemma), and note that not all participle-like forms pass this test, for example "missing documents" are not "documents which miss".

@nschneid
Contributor

nschneid commented Dec 8, 2023

Well there's a semantic difference—"uniting" is dynamic, "united" is stative. Paraphrasing as a finite verb would usually favor the dynamic reading except with verbs that are inherently stative like "exist".

@rueter

rueter commented Dec 8, 2023

Hi, @nschneid !
Couldn't this also be understood as partly a matter of aspect?
I'm thinking of the -ing participle vs. the -ed participle: the -ing participle is imperfect and the -ed participle is perfect.
How does the word "unite" fit into this, you ask? Well, maybe "united" is simply the perfect aspect, such that the ongoing process is dynamic while the completed result is stative.
I would see both -ing and -ed participles as possibly verb-based.

@amir-zeldes
Contributor

Well there's a semantic difference—"uniting" is dynamic, "united" is stative. Paraphrasing as a finite verb would usually favor the dynamic reading except with verbs that are inherently stative like "exist".

Right, but even for "existing" if I search for DT followed by it, I get 36:3 VBG... So it seems at least for active participles, PTB recognizes the deverbal nature regardless of lexical stativeness
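A count of this kind could be approximated as follows. This is a sketch that assumes the data is available as tab-separated 10-column CoNLL-U (the search above was presumably run with a different tool over PTB), and `tags_after_determiner` is a hypothetical helper name.

```python
from collections import Counter


def tags_after_determiner(text, form):
    """Count the XPOS tags of `form` when the previous token in the sentence is tagged DT."""
    counts = Counter()
    prev_xpos = None  # XPOS of the previous token in the current sentence
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            if not line.strip():
                prev_xpos = None  # blank line marks a sentence boundary
            continue
        if prev_xpos == "DT" and cols[1].lower() == form:
            counts[cols[4]] += 1
        prev_xpos = cols[4]
    return counts
```

Calling `tags_after_determiner(text, "existing")` on a treebank file's contents gives the tag distribution for that context.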

@nschneid
Contributor

nschneid commented Dec 8, 2023

I'm willing to change EWT's treatment of "united" to VERB/VBN per GUM policies. But there are likely other lexical items in EWT that are not consistent with GUM.

@amir-zeldes
Contributor

OK thanks. I wouldn't be surprised if there are also internal inconsistencies within GUM and EWT; it's probably a longer-term to-do, but worth looking at at some point.

4 participants