Removed comments from XML #2763

Status: Open. 1 commit to be merged into base branch master.

@mjpost
Member

mjpost commented Sep 1, 2023

I removed comments from the XML and the Python code that permitted skipping them when parsing. In its place, I added a "note" attribute to <volume-id> to help sort workshops. Would this work or does it also complicate things?

@github-actions

github-actions bot commented Sep 1, 2023

Build successful. Some useful links:

This preview will be removed when the branch is merged.

@mbollmann
Member

These notes are intended to work more like groupings, right? Maybe we could do

<colocated group="unsorted">
  ...
</colocated>
<colocated group="acl-2023">
  ...
</colocated>

Alternatively, do you foresee this being used for anything other than ws volumes? Maybe this is also a good opportunity to get rid of ws altogether and refactor the way we classify something as a "workshop".

@mjpost
Member Author

mjpost commented Sep 2, 2023

This is very clearly a better way to do this, thanks!

It is only for workshops. I've had requests from senior people to maintain the workshop listing, since it's useful for people to browse. It's a bit of a pain to update this list, and to keep it sorted, which is why I've been pushing for this grouping idea. Definitely open to re-factoring; did you have something in mind?

@mbollmann
Member

Firstly, rather than creating a hypothetical event called "ws-<year>", shouldn't it be that the workshop volume gets <venue>ws</venue> in its <meta> block, in addition to its other venues? That's how it's been done in the past, and I think that is clearer than going the event route.

Secondly, if that's how it's represented, it could be refactored in a number of ways, e.g. (i) replacing <venue>ws</venue> with a <is-workshop/> tag, (ii) adding an attribute workshop="true" to the <meta> or the <volume> tag, or (iii) changing <volume type="proceedings"> to <volume type="workshop">.

In any case, adapting the build so that everything marked as a workshop is compiled on its own page (which being part of the "ws" venue does now) should be a simple change.

@mbollmann
Member

In #1117, we discussed adding a "workshop" flag to venues, but I think it's clearer to attach it to volumes, as this both mirrors how it currently works and avoids the issue of workshop venues turning into full conferences at some point.

(On this note, there are a couple of volumes in the current "ws-2023" list that do say "conference" in the proceedings title — is this intentional?)

@mjpost
Member Author

mjpost commented Sep 3, 2023

The distinction between workshop/conference can be fuzzy. I think, though, that *SEM, IWSLT, etc. should not be listed under the workshops event. I suspect they were just blindly copied over along with all other colocated events.

You're right in pointing out that we currently have redundant ways to add to the workshop "event". I'm not sure why I didn't see that and instead created the {yyyy}.ws.xml files. For the refactoring, I'm in favor of a static boolean tag, e.g., <is-workshop/> in the <meta> block (or maybe a variant like <add-to-workshops/> that more directly conveys the purpose). This might also be a step in the direction of no longer abusing the "event" idea to display amalgamated workshops, though I suspect we'll still have to create that HTML page or make it discoverable somehow. The downside to this is that there is no longer a single place to view all the workshops in a given year, prior to building the site out. Though I guess one could accomplish it by grepping through all files with such a tag in the current year.

So if we go through with this, I guess the proposal is to eliminate all files of the format {year}.ws.xml, moving that information to a tag?

@mbollmann
Member

> The distinction between workshop/conference can be fuzzy. I think, though, that *SEM, IWSLT, etc. should not be listed under the workshops event. I suspect they were just blindly copied over along with all other colocated events.

Noticing these types of issues might be easier when workshops are flagged within the volume itself, I think.

> This might also be a step in the direction of no longer abusing the "event" idea to display amalgamated workshops, though I suspect we'll still have to create that HTML page or make it discoverable somehow.

We can start by treating "workshop tags" the same as before during the build, i.e., creating the virtual "ws" venue and attaching these volumes to it. That should mean that on the front-end, everything stays the same. The question of how to do this better is then probably related to a redesign of the front page.

> The downside to this is that there is no longer a single place to view all the workshops in a given year, prior to building the site out. Though I guess one could accomplish it by grepping through all files with such a tag in the current year.

XPath expressions work:

~/r/acl-anthology/data/xml $ cat ?19.xml | xq -x '//meta[venue="ws"]/url'
D19-51
D19-54
D19-58
D19-59
D19-60
D19-61
D19-63
D19-64
D19-66
W19-03
W19-11
W19-56
W19-68
W19-70
W19-71
W19-72
W19-73
W19-85

And we could add functionality to the new library that'll make this easy too.
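As a rough illustration of what such a helper might look like, here is a minimal sketch using only Python's standard library. The function name `workshop_volume_urls` and the simplified sample document are assumptions made for this example; they are not part of the existing code or of the new library.

```python
import xml.etree.ElementTree as ET


def workshop_volume_urls(xml_text: str) -> list[str]:
    """Return the <url> text of every <meta> block that lists "ws" as a venue."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports the [child='text'] predicate,
    # which mirrors the //meta[venue="ws"]/url query shown above.
    return [url.text for url in root.findall(".//meta[venue='ws']/url")]


# Hypothetical, heavily simplified stand-in for a collection file.
SAMPLE = """
<collection id="D19">
  <volume id="51">
    <meta>
      <booktitle>Proceedings of a Hypothetical Workshop</booktitle>
      <url>D19-51</url>
      <venue>emnlp</venue>
      <venue>ws</venue>
    </meta>
  </volume>
  <volume id="1">
    <meta>
      <booktitle>Proceedings of a Hypothetical Conference</booktitle>
      <url>D19-1</url>
      <venue>emnlp</venue>
    </meta>
  </volume>
</collection>
"""

print(workshop_volume_urls(SAMPLE))  # ['D19-51']
```

The same function would apply to an `<is-workshop/>` tag by changing the predicate to `.//meta[is-workshop]/url`.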

So if we go through with this, I guess the proposal is to eliminate all files of the format {year}.ws.xml, moving that information to a tag?

Yes, and also:

  • Convert all instances of <venue>ws</venue> to <is-workshop/>
  • Add <is-workshop/> to all volumes where the "ws" venue is currently inferred implicitly (all W*.xml files, I believe)
  • Find all volumes with contains(booktitle, "Workshop") that did not already get the <is-workshop/> tag and determine if they should have it — because I suspect it was not always added in recent years
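The first and third steps could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names, the simplified `<meta>` layout, and the sample IDs are hypothetical, and a real migration would also need to preserve the repository's XML formatting and handle the implicitly inferred W*.xml volumes.

```python
import xml.etree.ElementTree as ET


def convert_ws_venue(root: ET.Element) -> int:
    """Replace <venue>ws</venue> with <is-workshop/>; return how many <meta> blocks changed."""
    converted = 0
    # Materialize the list first so the tree is not modified while iterating.
    for meta in list(root.iter("meta")):
        ws_venues = [v for v in meta.findall("venue") if (v.text or "").strip() == "ws"]
        if not ws_venues:
            continue
        for venue in ws_venues:
            meta.remove(venue)
        if meta.find("is-workshop") is None:
            ET.SubElement(meta, "is-workshop")
        converted += 1
    return converted


def untagged_workshop_candidates(root: ET.Element) -> list[str]:
    """Return URLs of volumes whose booktitle contains "Workshop" but lack <is-workshop/>."""
    candidates = []
    for meta in root.iter("meta"):
        booktitle = meta.find("booktitle")
        title = "".join(booktitle.itertext()) if booktitle is not None else ""
        if "Workshop" in title and meta.find("is-workshop") is None:
            candidates.append(meta.findtext("url", default=""))
    return candidates


# Hypothetical sample: volume 1 is tagged via the "ws" venue, volume 2 is not tagged at all.
SAMPLE = """
<collection id="2023.hypo">
  <volume id="1">
    <meta>
      <booktitle>Proceedings of the First Workshop on Examples</booktitle>
      <url>2023.hypo-1</url>
      <venue>acl</venue>
      <venue>ws</venue>
    </meta>
  </volume>
  <volume id="2">
    <meta>
      <booktitle>Proceedings of the Second Workshop on Examples</booktitle>
      <url>2023.hypo-2</url>
      <venue>acl</venue>
    </meta>
  </volume>
</collection>
"""

root = ET.fromstring(SAMPLE)
n_converted = convert_ws_venue(root)
needs_review = untagged_workshop_candidates(root)
print(n_converted, needs_review)  # 1 ['2023.hypo-2']
```

The candidate list is meant for manual review only, since a "Workshop" in the title does not by itself settle whether a volume belongs on the workshop listing.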

For the build, I'd just have the respective Python class give the volume the "ws" venue whenever <is-workshop/> is present; in the new library we can handle this a bit more sanely.
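A minimal sketch of that build-side mapping, assuming the venues are read from a parsed `<meta>` element. The function `venues_for_volume` is a hypothetical stand-in and does not reflect how the actual Python class is structured.

```python
import xml.etree.ElementTree as ET


def venues_for_volume(meta: ET.Element) -> list[str]:
    """Collect the explicit venues and append the virtual "ws" venue for flagged volumes."""
    venues = [v.text.strip() for v in meta.findall("venue") if v.text]
    # Assumption: <is-workshop/> is an empty marker tag inside <meta>.
    if meta.find("is-workshop") is not None and "ws" not in venues:
        venues.append("ws")
    return venues


meta = ET.fromstring(
    "<meta><url>2023.hypo-1</url><venue>acl</venue><is-workshop/></meta>"
)
print(venues_for_volume(meta))  # ['acl', 'ws']
```

With this mapping, the front end would keep seeing a "ws" venue exactly as before, so the workshop listing page would not need to change.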

mjpost changed the title from "Removed comments" to "Removed comments from XML" on Sep 13, 2023
mjpost mentioned this pull request on Sep 13, 2023