Is handling of singular / plural forms ('sentence' and 'sentences') correct / consistent? #242
-
In https://derwen.ai/docs/ptr/sample/ in the "Scrubber" section it says
To me this implies that "sentence" and "sentences" should be "grouped" (lemmatized), but in my experiments and in the output shown, the singular and plural forms are listed separately.
Is this correct or wrong behavior? If it is correct, maybe just the tutorial needs to make this clear? With the bugfix I propose in #232 and the token list I used for scrubbing I get the results
but I now assume that I would actually be getting only one line for both "sentence" and "sentences". Am I wrong?
Replies: 3 comments
-
Hi @0dB We can get the desired behaviour by changing the example code from
to
This will group all occurrences of sentence and sentences together. Please feel free to make this change in the example notebook in your existing PR #233.
-
Thanks, let me try that out and see what effect it has overall; I would then update the sample output, too. I can do this sometime soon.
Update: I am more pleased with the results. I am getting better summaries this way, since the singular and plural forms of a word are now "equal" to the algorithm and together carry more weight, instead of each carrying a separate, weaker weight. I will test some more and then propose a few updates to the sample page.
-
Many thanks @0dB and @Ankush-Chander! It would help to have @0dB, the changes in your PR #233 look good. We're having issues with our CI pipeline (see #235), and as soon as I get that cleared (hopefully tonight) I'll accept/merge the PR. I also noticed the typo
Hi @0dB
Thanks for bringing this to our attention.
The way occurrences of sentences are grouped is working as per the scrubber code. Since the scrubber function returns span.text in the example code, occurrences of sentences are grouped as one entry, while occurrences of sentence are grouped together as another.
We can get the desired behaviour by changing the example code from
to
This will group all occurrences of sentence and sentences together.
Please feel free to make this change in the example notebook in your existing PR #233.
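
The two code snippets in the reply above did not survive in this copy of the thread, so the following is only an illustrative sketch of the idea, not the actual notebook change. It assumes the fix amounts to having the scrubber return a lemmatized form (spaCy spans expose `lemma_`) instead of `span.text`. `FakeSpan`, `scrub_text`, `scrub_lemma` and `group_phrases` are hypothetical names made up for this example; `FakeSpan` stands in for a spaCy `Span` so that the sketch runs without a language model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span, with only the two attributes used here."""
    text: str    # surface form as it appears in the document
    lemma_: str  # lemmatized form (assumed to be what the changed scrubber returns)


def scrub_text(span):
    # Behaviour described in the reply: the scrubber returns span.text,
    # so "sentence" and "sentences" end up under different keys.
    return span.text


def scrub_lemma(span):
    # Assumed alternative: return the lemma, so both forms share one key.
    return span.lemma_


def group_phrases(spans, scrubber):
    # Count occurrences per scrubbed key, loosely mimicking how phrases
    # with the same scrubbed string are collected into one entry.
    groups = defaultdict(int)
    for span in spans:
        groups[scrubber(span)] += 1
    return dict(groups)


spans = [
    FakeSpan("sentence", "sentence"),
    FakeSpan("sentences", "sentence"),
    FakeSpan("sentences", "sentence"),
]

by_text = group_phrases(spans, scrub_text)
by_lemma = group_phrases(spans, scrub_lemma)
print(by_text)   # two separate entries, one per surface form
print(by_lemma)  # a single entry covering both forms
```

With the text-based scrubber the singular and plural forms are counted separately; with the lemma-based one they collapse into a single entry, which is the grouping asked about in the original question.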