meeteval ignores false alarms due to issue in md_eval_22 #97
Hmm. Good point. The idea was to have a thin wrapper for md_eval_22, as an alternative to calling that script directly. So when you call it with a UEM, you get the desired output.
Do you have an idea how to change the command line to make it obvious that it is different from calling the perl script directly? I think we should at least add a warning when someone calls md_eval_22 without a UEM.
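For reference, a UEM is a plain text file with one line per scored region, in the form `<recording-id> <channel> <onset> <offset>`. A hypothetical UEM covering the whole 20 s recording from the example in this issue would be:

```
recordingA 1 0.00 20.00
```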
I checked the dscore tool and, if I get it right, they call md-eval-22.pl to get the DER; the only modification is that they generate the UEM themselves. I added some code to calculate the DER in the same way as dscore in #98.
Yeah, cool, thanks for looking at this and implementing the dscore way of doing things @boeddeker. As you say, I think users should still be warned when they are calling md_eval_22 without a UEM, and I think this should happen at the API level. Because this is more an issue with the perl script, I wouldn't mind seeing a parameter for it. I understand you wanted to keep it just as a wrapper; I feel this is an important fix to keep, though.
The code now emits a warning when md_eval_22 is called without a UEM file.
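A minimal sketch of what such a guard can look like (hypothetical signature and wording, not meeteval's actual code):

```python
import warnings

def md_eval_22(reference, hypothesis, uem=None, collar=0.0):
    # Hypothetical wrapper signature, for illustration only.
    if uem is None:
        warnings.warn(
            'md_eval_22 was called without a UEM: md-eval-22.pl derives one from '
            'the reference, so false alarms outside the reference speech regions '
            'are not scored.'
        )
    ...  # hand reference, hypothesis, uem and collar over to md-eval-22.pl
```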
I am not a fan of changing the metric values of a reference implementation, even if there are good reasons. When a user calls md_eval_22, they should get the behaviour of the original script; the dscore-style computation is available as a separate function.
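As I understand the split, the two entry points coexist. A rough usage sketch, assuming md_eval_22 is exposed in the high-level API the same way dscore is shown later in this thread:

```python
import meeteval

ref = meeteval.io.STM.parse('recordingA 1 Alice 0 1 hello world\n')
hyp = meeteval.io.STM.parse('recordingA 1 spk-1 0 1 hello world\n')

# Unchanged md-eval-22.pl behaviour; warns when no UEM is given
# (assumed to be callable like this, mirroring dscore below).
ers_md = meeteval.der.md_eval_22(reference=ref, hypothesis=hyp, collar=0)

# dscore-style behaviour from #98: UEM built over the full recording.
ers_ds = meeteval.der.dscore(reference=ref, hypothesis=hyp, collar=0)
```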
@boeddeker I tried to use your updated code and noticed an issue. The CLI works fine, but when you try to call dscore using the high-level API it only runs the first time:

In [1]: import meeteval
In [2]: ers = meeteval.der.dscore(
...: reference=meeteval.io.STM.parse('''
...: recordingA 1 Alice 0 1 The quick brown fox jumps over the lazy dog
...: recordingB 1 Bob 0 1 The quick brown fox jumps over the lazy dog
...: '''),
...: hypothesis=meeteval.io.STM.parse('''
...: recordingA 1 spk-1 0 1 The kwick brown fox jump over lazy
...: recordingB 1 spk-1 0 1 The kwick brown fox jump over lazy
...: '''),
...: collar=0,
...: )
In [3]: ers = meeteval.der.dscore(
...: reference=meeteval.io.STM.parse('''
...: recordingA 1 Alice 0 1 The quick brown fox jumps over the lazy dog
...: recordingB 1 Bob 0 1 The quick brown fox jumps over the lazy dog
...: '''),
...: hypothesis=meeteval.io.STM.parse('''
...: recordingA 1 spk-1 0 1 The kwick brown fox jump over lazy
...: recordingB 1 spk-1 0 1 The kwick brown fox jump over lazy
...: '''),
...: collar=0,
...: )
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 ers = meeteval.der.dscore(
2 reference=meeteval.io.STM.parse('''
3 recordingA 1 Alice 0 1 The quick brown fox jumps over the lazy dog
4 recordingB 1 Bob 0 1 The quick brown fox jumps over the lazy dog
5 '''),
6 hypothesis=meeteval.io.STM.parse('''
7 recordingA 1 spk-1 0 1 The kwick brown fox jump over lazy
8 recordingB 1 spk-1 0 1 The kwick brown fox jump over lazy
9 '''),
10 collar=0,
11 )
TypeError: 'module' object is not callable

I believe this is due to the function and its file sharing the name dscore:

In [4]: meeteval.der.dscore.__file__
Out[4]: '/PATH/TO/YOUR/PYTHON/lib/python3.11/site-packages/meeteval/der/dscore.py'
Thanks for reporting this bug. I frequently forget that a function name and its file name should differ.
We have no established pattern for such name collisions. I changed the file name to nryant_dscore.py.
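For anyone hitting the same trap, here is a minimal, self-contained sketch of the shadowing effect with a hypothetical package `pkg` (this is not meeteval's actual layout, just the general pattern):

```python
# pkg/__init__.py
from .api import tool          # pkg.tool now refers to the function

# pkg/api.py
def tool(x):
    # The first call lazily loads the submodule pkg/tool.py. When a submodule
    # is loaded, the import machinery binds it as an attribute on the parent
    # package, so pkg.tool is rebound from this function to the module.
    from .tool import helper
    return helper(x)

# pkg/tool.py
def helper(x):
    return 2 * x
```

```python
import pkg
pkg.tool(1)   # 2 -- works, but as a side effect rebinds pkg.tool to the module
pkg.tool(1)   # TypeError: 'module' object is not callable
```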
rename dscore to nryant_dscore to avoid name collision (#97)
This issue is similar to (or the same as) the one reported here: nryant/dscore#9
There is an issue in the md_eval_22 perl script: when it is run without a UEM file, the script tries to generate one from the reference RTTM file only. This can lead to the evaluation not taking into consideration any false alarms that occurred before the first segment of labeled speech in the reference RTTM.
For example, take a reference whose speech spans [5.00, 20.00] seconds and a system output that additionally predicts speech in [0.00, 5.00] (a hypothetical pair of RTTM files with these timings is sketched below).
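Hypothetical RTTM files with those timings (columns: type, file, channel, onset, duration, then placeholder/speaker fields; file and speaker names are invented for illustration).

ref.rttm:
```
SPEAKER recordingA 1 5.00 15.00 <NA> <NA> Alice <NA> <NA>
```

sys.rttm:
```
SPEAKER recordingA 1 0.00 20.00 <NA> <NA> spk-1 <NA> <NA>
```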
I would expect a DER of 33.3%, because the total scored speech is 15 seconds and there are 5 seconds of false alarm predicted by the system at the beginning of the recording.
In reality, md_eval_22 outputs 0% DER because the UEM generated on the fly covers only the reference start and end of speech, [5.00, 20.00], which excludes the 5 seconds of false-alarm speech the system predicted from scoring.
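With the standard DER definition (false alarm, missed speech and speaker confusion time, divided by the total scored reference speech time), the two numbers work out as:

$$\mathrm{DER}_{\text{expected}} = \frac{\mathrm{FA}+\mathrm{Miss}+\mathrm{Conf}}{\text{scored speech}} = \frac{5+0+0}{15} \approx 33.3\,\%, \qquad \mathrm{DER}_{\text{auto UEM}} = \frac{0+0+0}{15} = 0\,\%$$

With the auto-generated UEM [5.00, 20.00], the 5 seconds of false alarm fall outside the scored region and are dropped before scoring.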
The dscore GitHub repo fixed this by generating a UEM file that spans the entire recording before passing the reference RTTM, system RTTM and UEM file to md_eval_22 for evaluation, meaning md_eval_22 does not need to generate a UEM on the fly.
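A rough sketch of that idea (not dscore's actual code; the function and argument names are made up):

```python
def write_full_recording_uem(ref_segments, sys_segments, uem_path):
    """Build a UEM that covers each recording from 0 to its last segment end.

    ref_segments / sys_segments: dicts mapping recording id to a list of
    (start, end) tuples taken from the reference / system RTTM files.
    """
    ends = {}
    for segments in (ref_segments, sys_segments):
        for rec, segs in segments.items():
            last = max(end for _, end in segs)
            ends[rec] = max(ends.get(rec, 0.0), last)
    with open(uem_path, 'w') as f:
        for rec, end in sorted(ends.items()):
            # UEM line format: <recording-id> <channel> <onset> <offset>
            f.write(f'{rec} 1 0.00 {end:.2f}\n')
```

md-eval-22.pl can then be pointed at this file via its UEM option, so false alarms before the first reference segment stay inside the scored region.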
meeteval-der md_eval_22 does not generate this UEM before the md_eval_22 evaluation, and so we get the incorrect behaviour of not scoring false alarms.

In the same example, with the stm style files which meeteval requires (a hypothetical pair with the same timings is sketched below):
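These stm files are made up for illustration, using the same `<file> <channel> <speaker> <start> <end> <transcript>` layout as the snippets earlier in this thread; the transcript text is only a placeholder, since it does not affect the DER.

ref.stm:
```
recordingA 1 Alice 5.00 20.00 some speech
```

sys.stm:
```
recordingA 1 spk-1 0.00 20.00 some speech
```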
If I run
meeteval-der md_eval_22 -r ref.stm -h sys.stm
I get the 0% DER output described above, instead of the expected 33.3%.