Language Not Supported #1783

yusufsyaifudin · 2024-12-22T07:55:16Z

[x] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
I want to use RAGAS as my RAG evaluation framework, but I cannot find any supported languages other than those in RAGAS_SUPPORTED_LANGUAGE_CODES on this line: https://github.com/explodinggradients/ragas/blob/v0.2.8/src/ragas/metrics/base.py#L707

After tracing the code, I found that it comes from here:

The last commit to pySBD was 3 years ago, which also makes me wonder: why prefer that library?

My ultimate question is: how can I add support for a language that pySBD (and therefore RAGAS) does not support?
The list seems too limited; not a single language from Southeast Asia is supported.

Additional context
If I can extend RAGAS with languages it doesn't support "natively", where can I find an example of creating a language adapter?

Thank you!


jjmachan · 2025-01-07T10:13:20Z

Hey @yusufsyaifudin, thanks for sharing this. Which language are you planning to use? Other than pySBD, which other tools do you work with that support the languages you mentioned?

@shahules786 should be able to give you better information too.

yusufsyaifudin · 2025-01-07T11:19:32Z

Thanks @jjmachan for your reply

> Which language are you planning to use?

I work with Bahasa Indonesia and have tried running RAGAS with the default settings (which I assume are English) on three proprietary models: claude-3-haiku-20240307, claude-3-5-haiku-20241022 and claude-3-5-sonnet-20241022.

claude-3-haiku-20240307 always returns a faithfulness score of 1.0 (I only tested with two data points), which I can confirm should be near 0. The other two models return 0.0, so at this point I'm starting to think it may just be that the older Haiku version is "bad" at reasoning.

But I think that using the same language in the prompt as in the test data (Bahasa Indonesia in my case) would probably give better reasoning.

> Other than pySBD, which other tools do you work with that support the languages you mentioned?

Actually, I don't know of any alternative; maybe we are still at the stage where no single package supports all languages for sentence boundary detection.

But, imho, if we could create some "abstraction" around sentence segmentation and prompts, couldn't we achieve multi-language support easily? Probably using nltk or another package.
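To make the idea of a segmentation "abstraction" concrete, here is a minimal sketch in plain Python. It is not RAGAS code and does not reflect how RAGAS wires in pySBD: `SegmenterRegistry` and `regex_segmenter` are hypothetical names invented for illustration, and the regex fallback is deliberately naive (it will mis-split abbreviations). The point is only that a per-language callable can be swapped in without touching the caller.

```python
import re
from typing import Callable, Dict, List

# A segmenter is any callable that turns a text into a list of sentences.
Segmenter = Callable[[str], List[str]]


def regex_segmenter(text: str) -> List[str]:
    """Naive fallback: split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


class SegmenterRegistry:
    """Hypothetical registry mapping language codes to segmenters."""

    def __init__(self, fallback: Segmenter = regex_segmenter) -> None:
        self._segmenters: Dict[str, Segmenter] = {}
        self._fallback = fallback

    def register(self, language: str, segmenter: Segmenter) -> None:
        # Language codes are stored lowercased so "ID" and "id" match.
        self._segmenters[language.lower()] = segmenter

    def segment(self, text: str, language: str) -> List[str]:
        # Unknown languages use the fallback instead of raising an error.
        segmenter = self._segmenters.get(language.lower(), self._fallback)
        return segmenter(text)


registry = SegmenterRegistry()
# Assumption: a real Indonesian segmenter would be registered here instead.
registry.register("id", regex_segmenter)
sentences = registry.segment("Saya suka nasi goreng. Kamu suka apa?", "id")
print(sentences)
```

With a design like this, a language-specific package (for Indonesian or any other language) would only need to be wrapped in a function with the `Segmenter` signature and registered under its language code.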

For example, in my project I use https://github.com/yusufsyaifudin/id-sentence-segmenter, which is a fork of https://yudanta.github.io/posts/indonesian-simple-sentence-segmentation/ (which is part of his thesis work: https://etd.repository.ugm.ac.id/penelitian/detail/103174).

If I want to extend this or create an abstraction for it, which file and line of code should I read as a starting point? Maybe @shahules786 can help point me to it.

🙇

yusufsyaifudin added the "question" label (Further information is requested) · Dec 22, 2024
