Potential mistake in Active Learning (Acquisition.py) #20

Open
ddnimara opened this issue May 9, 2022 · 6 comments

ddnimara commented May 9, 2022

I believe there is a bug in this line in acquisition.py (which is used to rank and fetch samples based on the confidence score of your model).

Let me explain:

  1. Line 41 generates the batches the model will iterate over. Within each batch, the data is reshuffled by length, putting longer sequences (which have fewer padded tokens) at the top (as shown here).
  2. In order to map scores back to samples, we need to extract the sorting info so we can undo it. This is done a few lines down via sort_info = data['sort_info'], which returns a tuple describing the reshuffling that happened in the step above. For example, a tuple of the form (3, 0, 2, 1) tells us that the first element in this batch was in fact the 4th one in the original dataset, the second one was the first, and so on (see the sketch after this list).
  3. Finally, because we want the scores to be in the same order as before, the line I mentioned at the beginning runs probscores.extend(list(norm_scores[np.array(sort_info)])). The goal is to reshuffle the probability/confidence scores back so that they respect the original ordering rather than the new, length-based ordering used within each batch.
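
To make the setup concrete, here is a minimal sketch of what the batching step does, assuming length-descending sorting as described above (the variable names mirror this issue, not the actual acquisition.py code; the sentences are the ones from the example further below):

```python
import numpy as np

# Three sentences with lengths 2, 5 and 3 tokens.
sentences = [["Hello", "World"],
             ["This", "is", "a", "big", "sentence"],
             ["Hello", "World", "."]]

lengths = np.array([len(s) for s in sentences])

# Sort longest-first; sort_info[i] is the ORIGINAL index of the i-th
# element in the length-sorted batch.
sort_info = np.argsort(-lengths)                      # array([1, 2, 0])
ordered_sentences = [sentences[i] for i in sort_info]
# [['This', 'is', 'a', 'big', 'sentence'], ['Hello', 'World', '.'], ['Hello', 'World']]
```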

The issue is that (if I am not missing something obvious) norm_scores[np.array(sort_info)] is not what we want. Let me explain with the example below:

  1. You ask the model to rank the sentences sentences=[["Hello", "World"], ["This","is","a","big","sentence"], ["Hello", "World", "."]].
  2. These get reshuffled (based on their size) like so: ordered_sentences=[["This","is","a","big","sentence"], ["Hello", "World", "."], ["Hello", "World"]], giving us sort_info = (1, 2, 0).
  3. The model then scores them. Let's say it gives us norm_scores = [0.1, 0.2, 0.3], meaning it gives a score of 0.1 to ["This","is","a","big","sentence"], 0.2 to ["Hello", "World", "."] and 0.3 to ["Hello", "World"].
  4. The current code, list(norm_scores[np.array(sort_info)]), reshuffles this to [0.2, 0.3, 0.1]. In the original dataset, this means we give a score of 0.2 to ["Hello", "World"], 0.3 to ["This","is","a","big","sentence"] and 0.1 to ["Hello", "World", "."], which is not the same as above.

The root of the problem is that sort_info contains the indices (via argsort) that produce the sorted array; it does not contain the indices required to unshuffle it. In essence, what we need is the inverse permutation. One proposed fix is to instead use inverse_sorting = [sort_info.index(i) for i in range(len(sort_info))], and then list(norm_scores[np.array(inverse_sorting)]). In the above example, inverse_sorting = [2, 0, 1], which in turn gives scores of [0.3, 0.1, 0.2], which is what we want in the original dataset (0.3 for ["Hello", "World"], 0.1 for ["This","is","a","big","sentence"] and 0.2 for ["Hello", "World", "."]).
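
Here is a minimal, self-contained sketch of the buggy indexing versus the proposed fix, using the numbers from the example above (np.argsort(sort_info) computes the same inverse permutation as the list comprehension, just vectorised):

```python
import numpy as np

sort_info = np.array([1, 2, 0])          # sorted[i] was originally at index sort_info[i]
norm_scores = np.array([0.1, 0.2, 0.3])  # scores in the length-sorted order

# Current (buggy) behaviour: indexing with sort_info sorts again
# instead of undoing the sort.
buggy = list(norm_scores[sort_info])        # [0.2, 0.3, 0.1]

# Proposed fix: invert the permutation first, then index.
inverse_sorting = np.argsort(sort_info)     # array([2, 0, 1])
fixed = list(norm_scores[inverse_sorting])  # [0.3, 0.1, 0.2]
```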

I stumbled on this error by noticing that sentences that were exactly the same would be given different confidence scores by the model (because of the mistake in undoing the reshuffle). Nevertheless, the example I gave above should suffice.


ddnimara commented May 9, 2022

Here are some screenshots from the code execution. I have 4 unlabeled examples, namely the ones seen below:
[Screenshot 2022-05-09 at 16 15 52]

Behind the scenes we see the following:
[Screenshot 2022-05-09 at 16 26 52]

Each line in the terminal shows the following:

  1. The first sample before the shuffling is "Hello world" (as seen in the first picture of this post).
  2. After the reshuffling, the first one is "This is a long sentence".
  3. The sort info is (1, 2, 0, 3) (based on sentence lengths).
  4. We then get the normalized scores. Notice how the last two scores are identical, because they belong to the exact same sentence ("Hello world").
  5. "before norm" is what we have before we attempt to undo the reshuffle.
  6. "after norm" is what we have after we attempt to undo the reshuffle.

Notice how this is not what we want: after attempting to undo the reshuffle, the first and last samples, which are both the same sentence "Hello world", end up with different scores (-1.04 and -1.28).
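
For what it's worth, the same check can be reproduced numerically. The sort_info of (1, 2, 0, 3) is taken from the run above, but the score values here are made up purely for illustration; the point is only that the duplicated sentence keeps equal scores with the inverse permutation and not with the current indexing:

```python
import numpy as np

sort_info = np.array([1, 2, 0, 3])
# Hypothetical scores in the length-sorted order; the last two belong to the
# duplicated "Hello world" sentence, so they are identical.
norm_scores = np.array([-1.10, -1.28, -1.04, -1.04])

buggy = norm_scores[sort_info]              # [-1.28, -1.04, -1.10, -1.04]
fixed = norm_scores[np.argsort(sort_info)]  # [-1.04, -1.10, -1.28, -1.04]

# Original positions 0 and 3 both hold "Hello world":
print(buggy[0] == buggy[3])  # False -> the mismatch visible in the UI
print(fixed[0] == fixed[3])  # True  -> what we expect
```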

Finally, you can see this in the UI as well:
[Screenshot 2022-05-09 at 16 14 18]

and
[Screenshot 2022-05-09 at 16 15 11]


rodrigomeireles commented Dec 22, 2022

@danny911kr @ddelange @yuchenlin @mshen2 @frankxu2004 it's been seven months and this seems like an application-breaking bug. Do you still care about this project?

@ddelange (Contributor)

Hi @rodrigomeireles 👋 I think this project is in the hands of the community. My contributions in this repo for instance were just enough to get our project rolling. I think PRs are welcome and will be reviewed by the repo maintainers 👍

@ddelange (Contributor)

ah I have to amend. my link above is to do active learning with Doccano itself, as this fork had diverged too much if I remember correctly

@rodrigomeireles
Copy link

Hey @ddelange, I think I fixed the bug in this issue (and some others related to your dockerfile) but the application always breaks when I try to turn on suggestions. The terminal spams:

I:VENTILATOR:[__i:_ru:188]:new encode request req id: 65 size: 47 client: b'f9f8c62b-dece-4d7b-b742-6267e752bc60'
I:VENTILATOR:[__i:_ru:203]:zmq.error.Again: resource not available temporally, please send again!
I:SINK:[__i:_ru:324]:job register size: 47 job id: b'f9f8c62b-dece-4d7b-b742-6267e752bc60#65'
I:SINK:[__i:_ru:331]:send error client b'f9f8c62b-dece-4d7b-b742-6267e752bc60' error

Now, I might be aiming a little too high here, but would it be possible for you to help me debug this? You seem very knowledgeable about the stack and... I'm not. There is a fork in my repo which, although it fixes this issue, I don't consider stable for the reason above, so I haven't submitted it yet.

Sorry if this sounds a little too cheeky, but I'm new to contributing to public repos and don't quite know the best way to reach out or ask for help in these cases.

@ddelange (Contributor)

fwiw you'll probably end up saving man hours by going with a more actively maintained or even managed/proprietary alternative. we spent some money with the spacy team some time ago for instance. their suite seemed nice :)
