Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix seifart 410 to 408 #1381

Merged
merged 11 commits into from
Apr 16, 2024
Merged

fix seifart 410 to 408 #1381

merged 11 commits into from
Apr 16, 2024

Conversation

FredericBlum
Copy link
Collaborator

Pull request checklist

  • add new concept list
  • add new metadata
  • add new Concepticon concept sets
    • checked whether the new concept(s) can be applied to existing lists with
      concepticon notlinked --gloss "NEW_GLOSS"
  • add new Concepticon concept relations
  • refine existing Concepticon concept set mappings
  • refine Concepticon glosses
  • refine Concepticon concept relations
  • refine Concepticon concept definitions
  • retire data

Additional information

This PR fixes #1380 , two duplicates that we had previously missed. Are there precedents for renaming existing conceptlists? Since there is only one dataset where this is used (seifartecheverriboran) and this is only by ourselves, it will probably be not be a problem for other users.

@AnnikaTjuka
Copy link
Collaborator

@FredericBlum Thanks for the update! Did you use the rename command to change the name of the concept list? If I remember correctly, this will insert the information of the changed name into the retired.json file. @chrzyki am I remembering this correctly?

@chrzyki
Copy link
Contributor

chrzyki commented Apr 8, 2024

Sorry, missed that PR. Yes, you'd do a

$ concepticon rename Seifart-2015-410 Seifart-2015-408

and then fix the list/items accordingly. The only other thing that rename does in addition to the things you've done manually here (fix IDs etc.) is that it populates retired.json with mappings from old to new IDs (in order to keep references stable, since that list has already been published). This includes mapping the old to the new concept list ID in retired.json:

@@ -30091,6 +32141,11 @@
       "id": "Yakhontov-1991-35",
       "comment": "renaming",
       "replacement": "Yakhontov-1956-35"
+    },
+    {
+      "id": "Seifart-2015-410",
+      "comment": "renaming",
+      "replacement": "Seifart-2015-408"
     }
   ]
 }

And individual mappings for all entries in a concept list:

diff --git a/concepticondata/retired.json b/concepticondata/retired.json
index 3b40dc8e..899d13ea 100644
--- a/concepticondata/retired.json
+++ b/concepticondata/retired.json
@@ -29949,6 +29949,2056 @@
       "id": "Yakhontov-1991-35-35",
       "comment": "renaming",
       "replacement": "Yakhontov-1956-35-35"
+    },
+    {
+      "id": "Seifart-2015-410-1",
+      "comment": "renaming",
+      "replacement": "Seifart-2015-408-1"
+    },
+    {
+      "id": "Seifart-2015-410-2",
+      "comment": "renaming",
+      "replacement": "Seifart-2015-408-2"
+    },
...

@FredericBlum
Copy link
Collaborator Author

Thanks for the tip, I will re-do the changes then

@FredericBlum
Copy link
Collaborator Author

I have updated the PR using the `renaming' command. Following strictly the described workflow, the list gets updated, but the individual entries do not. This means that the last entry has the following ID:

Seifart-2015-408-410 410 conjure 2222 CONJURE 562

I assume that this is working as intended, so that the ID of the entry does not get changed for backwards compatibility?

@chrzyki
Copy link
Contributor

chrzyki commented Apr 11, 2024

I have updated the PR using the `renaming' command. Following strictly the described workflow, the list gets updated, but the individual entries do not. This means that the last entry has the following ID:

Seifart-2015-408-410 410 conjure 2222 CONJURE 562

I assume that this is working as intended, so that the ID of the entry does not get changed for backwards compatibility?

Yes, here the manual intervention mentioned above is required: Since the number of items in the list was reduced, you need to manually fix those two cases (and make sure that the retired.json mapping still make sense, given the change in order and depending where in the list that has happened).

@FredericBlum
Copy link
Collaborator Author

I am not sure I can follow you. From what I've read above, I should run the changes in the following order:

  1. Run rename' - the retired.json' file gets populated
  2. Fix the ID's

But then the retired.json-file does not have the updated ID's, and I'd have to make 200+ manual changes to retired.json. Is there another option to re-populate the retired.json?

If I first change the ID's and remove the duplicates, the remaining ID's are correctly displayed, but there is no information about the two deleted lines. What happens if any project had one of the duplicates as ID?

Example: I deleted Seifart-2015-205

But since I had to do the changes BEFORE running `rename', the entry in retired.json refers to the NEW 205, not the old one. So any reference to the old Seifart-2015-205 is now wrongly referred to as the new Seifart-2015.

Also, there is another problem: Running rename seems to overwrite some previous changes that have been made to retired.json, namely referring to a conceptlist from Yakhontov - how can I avoid this?

@chrzyki
Copy link
Contributor

chrzyki commented Apr 11, 2024

I am not sure I can follow you. From what I've read above, I should run the changes in the following order:

1. Run `rename' - the `retired.json' file gets populated

2. Fix the ID's

Sorry, my brief explanation in the previous comment wasn't very helpful. Item reductions are always a bit more troublesome than item additions at the end of a list. So we begin with:

$ concepticon rename Seifart-2015-410 Seifart-2015-408

Then we fix the number of items in the ITEMS column in conceptlist.tsv (rename only modifies the ID).

After that, in the newly created ('renamed') Seifart-2015-408.tsv we remove rows Seifart-2015-410-26 and Seifart-2015-410-232 and use whatever tool you prefer for regenerating the ID and NUMBER column (LibreOffice drag'n'drop works well; delete all IDs except the first one and drag down to the last entry, do the same of NUMBER).

But then the retired.json-file does not have the updated ID's, and I'd have to make 200+ manual changes to retired.json. Is there another option to re-populate the retired.json?

There's different ways of doing that and I'm not aware of any easy fully automatic way of handling that. I usually use VisiData cell replacements or you can do something like (modify as you see fit):

import json

with open("retired.json") as f:
    data = json.load(f)

for e in data["Concept"]:
    if e["id"].startswith("Seifart"):
        id_n = int(e["id"].split("-")[-1])
    
        if id_n >= 27 and id_n < 232:
            e.update({"replacement": f"Seifart-2015-408-{id_n - 1}"})
        elif id_n >= 232:
            e.update({"replacement": f"Seifart-2015-408-{id_n - 2}"})
    
        print(e)

If I first change the ID's and remove the duplicates, the remaining ID's are correctly displayed, but there is no information about the two deleted lines. What happens if any project had one of the duplicates as ID?
Example: I deleted Seifart-2015-205

But since I had to do the changes BEFORE running `rename', the entry in retired.json refers to the NEW 205, not the old one. So any reference to the old Seifart-2015-205 is now wrongly referred to as the new Seifart-2015.

Afterwards, we manually map the two 'surplus' lines (i.e. the double mapping of BLOOD and FULL) to their remaining counterparts in retired.json (that is: in the output from the previous code):

{'id': 'Seifart-2015-410-26', 'comment': 'renaming', 'replacement': 'Seifart-2015-408-204'}
...
{'id': 'Seifart-2015-410-232', 'comment': 'renaming', 'replacement': 'Seifart-2015-408-344'}

Also, there is another problem: Running rename seems to overwrite some previous changes that have been made to retired.json, namely referring to a conceptlist from Yakhontov - how can I avoid this?

Hm, I can't reproduce this locally. For me, after running the initial rename I get the appropriate diff:

@@ -29949,6 +29949,2056 @@
       "id": "Yakhontov-1991-35-35",
       "comment": "renaming",
       "replacement": "Yakhontov-1956-35-35"
+    },
+    {
+      "id": "Seifart-2015-410-1",
+      "comment": "renaming",
+      "replacement": "Seifart-2015-408-1"
+    },
+    {
+      "id": "Seifart-2015-410-2",
+      "comment": "renaming",
+      "replacement": "Seifart-2015-408-2"
+    },

Have there been previous modifications from past calls to rename?

Hope this helps. Please let me know if it doesn't! There might also be other/better ways of doing that and I hope that someone will intervene should I be writing nonsense here. :)

@FredericBlum
Copy link
Collaborator Author

@chrzyki Thanks for the detailed lead-through. I have now fixed the list according to your guidelines, and it seems like everything worked. Can you give it a final check?

{
"id": "Seifart-2015-410-27",
"comment": "renaming",
"replacement": "Seifart-2015-408-26"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remapping is unfortunately wrong here (did you generate IDs before removing the doubles?): this maps the old ID for HEART to the new ID for BLOOD and propagates the wrong ID throughout the following entries.

"replacement": "Seifart-2015-408-230"
},
{
"id": "Seifart-2015-410-233",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise further down below.

@FredericBlum
Copy link
Collaborator Author

Ah, I see the error now... I have removed the second occurrence of each concept, but your script assumes that I remove the first. I will change the code and update retired.json accordingly

@FredericBlum
Copy link
Collaborator Author

@chrzyki The latest commit should contain the correct replacements

Copy link
Contributor

@chrzyki chrzyki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you very much for taking care of this!

@FredericBlum FredericBlum merged commit 07b860f into master Apr 16, 2024
1 check passed
@FredericBlum FredericBlum deleted the fix-seifart branch April 16, 2024 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicates in Seifart-2015-410
3 participants