Feature/UPOS combinations that should be ruled out across English treebanks #549

Open
4 of 5 tasks
nschneid opened this issue Sep 22, 2024 · 11 comments

@nschneid
Contributor

nschneid commented Sep 22, 2024

We should tighten up the allowed feature/UPOS combinations in the validator settings. Several combinations stand out as problematic, but ruling them out will require data fixes:

(Manually fixed some Definite, Tense, VerbForm issues in LinES and PUD)

@amir-zeldes
Contributor

Fixed the GUM PronType and Degree issues upstream, thanks.

nschneid added a commit to UniversalDependencies/UD_English-LinES that referenced this issue Sep 24, 2024
nschneid added a commit to UniversalDependencies/UD_English-PUD that referenced this issue Sep 24, 2024
@nschneid
Contributor Author

In the validator I've disabled PronType for NUM/SCONJ, and limited Case=Nom to PRON. I think all the corpora are up to date with these changes (GUM changes are upstream).
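The two restrictions above amount to a lookup from a feature (or a specific feature value) to the UPOS tags that may or may not carry it. A minimal Python sketch, assuming hypothetical in-memory rule tables (`ALLOWED_ONLY`, `FORBIDDEN`) and helper names of my own; this is not the data format or code of the actual UD validator:

```python
# Hypothetical rule tables, not the real validator's configuration format.
# A key is either a bare feature name ("PronType"), matching any value,
# or a "Feature=Value" pair ("Case=Nom"), matching only that value.
ALLOWED_ONLY = {
    "Case=Nom": {"PRON"},          # Case=Nom limited to PRON
}
FORBIDDEN = {
    "PronType": {"NUM", "SCONJ"},  # PronType disabled for NUM and SCONJ
}


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing'."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def violations(upos: str, feats: str) -> list[str]:
    """Return the rule keys that a token with this UPOS and FEATS breaks."""
    bad = []
    for name, value in parse_feats(feats).items():
        # Check both the bare feature name and the specific name=value pair.
        for key in (name, f"{name}={value}"):
            if key in ALLOWED_ONLY and upos not in ALLOWED_ONLY[key]:
                bad.append(key)
            if key in FORBIDDEN and upos in FORBIDDEN[key]:
                bad.append(key)
    return bad


print(violations("PRON", "Case=Nom|PronType=Prs"))     # []
print(violations("NUM", "NumType=Card|PronType=Dem"))  # ['PronType']
print(violations("PROPN", "Case=Nom|Number=Sing"))     # ['Case=Nom']
```

Keeping an allow-list and a deny-list separate mirrors the two kinds of change described: "limited to" is naturally an allow-list, while "disabled for" is naturally a deny-list.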

@LarsAhrenberg

The English validator does not allow the negative two-part conjunction neither ... nor to be annotated as negative, either with PronType or with Polarity, because these features are disallowed for UPOS CCONJ. What is the reasoning behind that decision?

@nschneid
Contributor Author

It's a good question. I don't know if it has been discussed. https://universaldependencies.org/en/feat/Polarity.html doesn't mention CCONJ one way or the other, and I don't see it on the universal documentation pages either. A CCONJ is a function word, but is it "pronoun-like"?

@nschneid
Contributor Author

Opened an issue for wider discussion: UniversalDependencies/docs#1056

nschneid added a commit to UniversalDependencies/UD_English-LinES that referenced this issue Sep 25, 2024
nschneid added a commit to UniversalDependencies/UD_English-PUD that referenced this issue Sep 25, 2024
@nschneid
Contributor Author

Degree is now limited to ADJ and ADV.
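Before tightening a rule like this, it helps to list the tokens in the data that it would flag. A minimal Python sketch, assuming plain CoNLL-U input (10 tab-separated columns, UPOS in column 4, FEATS in column 6); the function name `find_degree_violations` and the sample sentence are hypothetical, not taken from any treebank or from the validator:

```python
ALLOWED_DEGREE_UPOS = {"ADJ", "ADV"}  # Degree limited to ADJ and ADV


def find_degree_violations(conllu: str) -> list[tuple[str, str, str, str]]:
    """Return (ID, FORM, UPOS, FEATS) for tokens with Degree outside ADJ/ADV."""
    hits = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, form, _lemma, upos, _xpos, feats = cols[:6]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        names = set()
        if feats != "_":
            names = {f.split("=", 1)[0] for f in feats.split("|")}
        if "Degree" in names and upos not in ALLOWED_DEGREE_UPOS:
            hits.append((tok_id, form, upos, feats))
    return hits


# Hypothetical sample sentence, built with explicit tabs.
sample = "\n".join([
    "# text = more people came",
    "\t".join(["1", "more", "more", "DET", "JJR", "Degree=Cmp", "2", "det", "_", "_"]),
    "\t".join(["2", "people", "people", "NOUN", "NNS", "Number=Plur", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "came", "come", "VERB", "VBD", "Mood=Ind|Tense=Past|VerbForm=Fin", "0", "root", "_", "_"]),
])

print(find_degree_violations(sample))  # [('1', 'more', 'DET', 'Degree=Cmp')]
```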

@nschneid
Contributor Author

AFAICT the only remaining issues now are Gender and Number in ParTUT.

@nschneid
Contributor Author

nschneid commented Oct 1, 2024

Validator configuration now updated for all these features. There is a pending decision about negative CCONJ (UniversalDependencies/docs#1056). Beyond that, LinES still has some validation issues. Can't tell for sure about GUM* because changes are upstream.

@amir-zeldes
Contributor

OK, I pushed a preview version to UD dev (just GUM, no reddit/GENTLE) so you can take a look, I think everything we discussed is implemented.

@nschneid
Contributor Author

nschneid commented Oct 8, 2024

@amir-zeldes
Contributor

no reddit/GENTLE

^
