Inconsistent annotations for LS entries #74

Open
rhdunn opened this issue Oct 28, 2023 · 4 comments

rhdunn commented Oct 28, 2023

  1. List items are annotated with the X UPOS. EWT favours NUM for these, and https://universaldependencies.org/u/pos/X.html states that X should be used restrictively.
  2. These should have the NumType=Ord feature, as they specify an ordered list of items.
  3. The "1." etc. variants should have the NumForm=Digit feature.
  4. The "a)" etc. variants should have a NumForm feature, but no suitable value currently exists for these; maybe NumForm=Alpha (alphabetic; "Examples: a, b, c, α, β, γ").
  5. EWT tokenizes the ( and ) as separate tokens.
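
To make the proposal in points 1 to 4 concrete, here is a minimal Python sketch of how an X-tagged list marker could be relabelled. Everything in it is an assumption for illustration: the function name `propose_ls_annotation`, the two regular expressions, and above all the `NumForm=Alpha` value, which is only suggested in this issue and is not an existing UD feature value. It does not address point 5 (splitting parentheses into separate tokens), and it only touches tokens already tagged X, so an ordinary decimal number tagged NUM is left alone.

```python
import re

# Hypothetical patterns for list-marker forms; not an agreed UD or GUM standard.
DIGIT_MARKER = re.compile(r"^\(?\d+(?:\.\d+)*[.)]?$")   # 1  1.  2.1.  1)  24.2
ALPHA_MARKER = re.compile(r"^\(?[A-Za-zα-ω][.)]$")      # a)  (b)  α.


def propose_ls_annotation(form, upos, feats):
    """Return the (upos, feats) pair proposed in this issue for a list marker.

    Tokens that are not tagged X, or that do not look like list markers,
    are returned unchanged. `feats` is a CoNLL-U FEATS string ("_" if empty).
    """
    if upos != "X":
        return upos, feats
    if DIGIT_MARKER.match(form):
        num_form = "Digit"
    elif ALPHA_MARKER.match(form):
        num_form = "Alpha"  # proposed value only; not currently defined in UD
    else:
        return upos, feats

    merged = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
    merged.update({"NumForm": num_form, "NumType": "Ord"})
    # CoNLL-U expects features sorted alphabetically, ignoring case.
    ordered = sorted(merged.items(), key=lambda kv: kv[0].lower())
    return "NUM", "|".join(f"{k}={v}" for k, v in ordered)


if __name__ == "__main__":
    for form in ["1.", "2.1.", "(a)", "a)", "etc"]:
        print(form, propose_ls_annotation(form, "X", "_"))
```

Whether nested section numbers such as 2.1. should really count as NumType=Ord is left open here; the sketch simply treats every matched form the same way.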

Validation issues:

ERROR: Sentence GUM_textbook_governments-1 token 1 -- invalid X form '1.1'
ERROR: Sentence GUM_academic_eegimaa-1 token 1 -- invalid X form '2.'
ERROR: Sentence GUM_textbook_chemistry-1 token 1 -- invalid X form '2.1'
ERROR: Sentence GUM_textbook_chemistry-13 token 1 -- invalid X form '1.'
ERROR: Sentence GUM_textbook_chemistry-15 token 1 -- invalid X form '2.'
ERROR: Sentence GUM_textbook_chemistry-20 token 1 -- invalid X form '3.'
ERROR: Sentence GUM_textbook_chemistry-21 token 1 -- invalid X form '4.'
ERROR: Sentence GUM_textbook_chemistry-26 token 1 -- invalid X form '5.'
ERROR: Sentence GUM_academic_census-1 token 1 -- invalid X form '1'
ERROR: Sentence GUM_academic_economics-1 token 1 -- invalid X form '2.'
ERROR: Sentence GUM_academic_economics-2 token 1 -- invalid X form '2.1.'
ERROR: Sentence GUM_academic_economics-35 token 1 -- invalid X form '2.2.'
ERROR: Sentence GUM_academic_epistemic-23 token 30 -- invalid X form '8'
ERROR: Sentence GUM_academic_implicature-1 token 1 -- invalid X form '4.'
ERROR: Sentence GUM_academic_implicature-7 token 1 -- invalid X form '4.1.'
ERROR: Sentence GUM_academic_implicature-30 token 1 -- invalid X form '5.'
ERROR: Sentence GUM_academic_lighting-13 token 1 -- invalid X form '1.'
ERROR: Sentence GUM_academic_mutation-8 token 1 -- invalid X form '1.'
ERROR: Sentence GUM_academic_mutation-17 token 1 -- invalid X form '2.'
ERROR: Sentence GUM_academic_mutation-45 token 1 -- invalid X form '3.'
ERROR: Sentence GUM_academic_replication-12 token 14 -- invalid X form '(a)'
ERROR: Sentence GUM_academic_replication-12 token 25 -- invalid X form '(b)'
ERROR: Sentence GUM_academic_replication-12 token 36 -- invalid X form '(c)'
ERROR: Sentence GUM_academic_replication-20 token 11 -- invalid X form '(a)'
ERROR: Sentence GUM_academic_replication-20 token 18 -- invalid X form '(b)'
ERROR: Sentence GUM_academic_replication-20 token 25 -- invalid X form '(c)'
ERROR: Sentence GUM_academic_replication-20 token 35 -- invalid X form '(d)'
ERROR: Sentence GUM_academic_salinity-1 token 1 -- invalid X form '1.'
ERROR: Sentence GUM_bio_nida-33 token 1 -- invalid X form '1.'
ERROR: Sentence GUM_bio_nida-34 token 1 -- invalid X form '2.'
ERROR: Sentence GUM_bio_nida-35 token 1 -- invalid X form '3.'
ERROR: Sentence GUM_interview_herrick-48 token 3 -- invalid X form '1)'
ERROR: Sentence GUM_interview_herrick-48 token 20 -- invalid X form '2)'
ERROR: Sentence GUM_news_defector-35 token 19 -- invalid X form 'a)'
ERROR: Sentence GUM_news_defector-35 token 35 -- invalid X form 'b)'
ERROR: Sentence GUM_textbook_artwork-9 token 1 -- invalid X form '38.'
ERROR: Sentence GUM_textbook_artwork-27 token 1 -- invalid X form '39.'
ERROR: Sentence GUM_textbook_artwork-29 token 1 -- invalid X form '40.'
ERROR: Sentence GUM_textbook_artwork-31 token 1 -- invalid X form '41.'
ERROR: Sentence GUM_textbook_grit-1 token 1 -- invalid X form '2.2'
ERROR: Sentence GUM_textbook_history-1 token 1 -- invalid X form '1'
ERROR: Sentence GUM_textbook_history-2 token 1 -- invalid X form '1.1'
ERROR: Sentence GUM_textbook_history-73 token 1 -- invalid X form '1.'
ERROR: Sentence GUM_textbook_history-78 token 1 -- invalid X form '2.'
ERROR: Sentence GUM_textbook_spacetime-1 token 1 -- invalid X form '24.2'
ERROR: Sentence GUM_textbook_stats-1 token 1 -- invalid X form '2.3'
ERROR: Sentence GUM_voyage_isfahan-25 token 1 -- invalid X form '1'
ERROR: Sentence GUM_voyage_isfahan-31 token 1 -- invalid X form '2'
ERROR: Sentence GUM_voyage_isfahan-39 token 1 -- invalid X form '3'
ERROR: Sentence GUM_voyage_isfahan-43 token 1 -- invalid X form '4'
ERROR: Sentence GUM_voyage_isfahan-48 token 1 -- invalid X form '5'
ERROR: Sentence GUM_voyage_isfahan-50 token 1 -- invalid X form '6'
ERROR: Sentence GUM_voyage_isfahan-58 token 1 -- invalid X form '7'
ERROR: Sentence GUM_voyage_isfahan-64 token 1 -- invalid X form '8'
ERROR: Sentence GUM_voyage_isfahan-67 token 1 -- invalid X form '9'
ERROR: Sentence GUM_whow_basil-7 token 1 -- invalid X form '1'
ERROR: Sentence GUM_whow_basil-16 token 1 -- invalid X form '2'
ERROR: Sentence GUM_whow_basil-20 token 1 -- invalid X form '3'
ERROR: Sentence GUM_whow_basil-24 token 1 -- invalid X form '4'
ERROR: Sentence GUM_whow_basil-30 token 1 -- invalid X form '5'
ERROR: Sentence GUM_whow_basil-35 token 1 -- invalid X form '1'
ERROR: Sentence GUM_whow_basil-44 token 1 -- invalid X form '2'
ERROR: Sentence GUM_whow_basil-47 token 1 -- invalid X form '3'
ERROR: Sentence GUM_whow_basil-52 token 1 -- invalid X form '4'
ERROR: Sentence GUM_whow_basil-58 token 1 -- invalid X form '1'
ERROR: Sentence GUM_whow_basil-66 token 1 -- invalid X form '2'
ERROR: Sentence GUM_whow_basil-68 token 1 -- invalid X form '3'
ERROR: Sentence GUM_whow_basil-72 token 1 -- invalid X form '4'
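
The errors above fall into a handful of shapes (plain digits, digits with a trailing period, dotted section numbers, and letters with parentheses). A rough Python sketch for tallying them from validator output follows. The regular expression is written against the line format shown in this issue only; the helper names `shape` and `tally_shapes` are made up for this example and are not part of any validator.

```python
import re
from collections import Counter

# Matches the error lines exactly as they appear in the listing above.
ERROR_LINE = re.compile(
    r"^ERROR: Sentence (?P<sent>\S+) token (?P<tok>\d+) "
    r"-- invalid X form '(?P<form>[^']*)'$"
)


def shape(form):
    """Collapse each digit run to '9' and each letter to 'a', e.g. '2.1.' -> '9.9.'."""
    collapsed = re.sub(r"\d+", "9", form)
    return re.sub(r"[^\W\d_]", "a", collapsed)


def tally_shapes(log_text):
    """Count how often each marker shape occurs in the validator output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            counts[shape(match.group("form"))] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "ERROR: Sentence GUM_textbook_governments-1 token 1 -- invalid X form '1.1'",
        "ERROR: Sentence GUM_academic_eegimaa-1 token 1 -- invalid X form '2.'",
        "ERROR: Sentence GUM_academic_replication-12 token 14 -- invalid X form '(a)'",
        "ERROR: Sentence GUM_news_defector-35 token 19 -- invalid X form 'a)'",
    ])
    for marker_shape, count in tally_shapes(sample).most_common():
        print(marker_shape, count)
```

Grouping by shape makes it easier to see which variants a NumForm decision would need to cover.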
@amir-zeldes
Contributor

I could see using Ord for the numerical ones, but until we sort out what we're doing about LS I will leave this open. I anticipate this will stay as-is for v2.13.

@AngledLuffa

Ping regarding this ( and @nschneid) ... One of the more frequent errors caused by the CoreNLP constituency -> dependency converter arises because it wants to make the dependency "num" but the UPOS "X". If we come up with a standard and apply it to the EWT & GUM treebanks, I can implement that in the converter pretty easily.

@nschneid

Yeah, we need a standard. It's under discussion in the core group.
