[FR] Add additional knowledge bases #3
Comments
I like the idea of the project. Can anyone outside the organization start contributing by providing knowledge sources? If yes, do you expect contributions as a ChunkIndex (the output of build_index())?
It was originally earmarked for the GSoC: https://julialang.org/jsoc/gsoc/juliagenai/ But I don't think anything is stopping us from laying the foundations.

Yes, anyone can contribute, but I think the biggest help, especially in the beginning, would be building a powerful data harvesting and processing pipeline. The reason is that there are many choices to be made (how to chunk, chunk size, embedding dimensions, etc.) that heavily influence performance, and it will take a while to tune them. So we need to be able to refresh and reprocess our data sources easily.

To your specific question: we need to measure the impact on loading times, but it would be much more robust to save the knowledge via JLD2 and build the index upon downloading. That would allow us to support older Julia versions and stay resilient to any future ChunkIndex changes.

If you would like to chat about it, we can have a call. You can find me on Julia Slack under the handle svilup.

EDIT: I forgot to say - I'm in the process of slightly rewriting the RAG tooling in PromptingTools to be more hackable/modular, which should also help us here.
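To make the tuning point concrete, here is a minimal sketch of a chunker whose parameters can be swept when reprocessing a source. `chunk_text`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration; this is not the chunking used by AIHelpMe or PromptingTools, just plain character-based splitting with Base Julia.

```julia
# Hypothetical helper (NOT part of AIHelpMe or PromptingTools): split text into
# fixed-size character chunks, where neighbouring chunks share `overlap` characters.
function chunk_text(text::AbstractString; chunk_size::Int = 256, overlap::Int = 32)
    chunk_size > 0 || throw(ArgumentError("chunk_size must be positive"))
    0 <= overlap < chunk_size || throw(ArgumentError("overlap must be in [0, chunk_size)"))
    chars = collect(text)            # index by character, safe for non-ASCII text
    step = chunk_size - overlap      # how far the window advances each time
    chunks = String[]
    start = 1
    while start <= length(chars)
        stop = min(start + chunk_size - 1, length(chars))
        push!(chunks, String(chars[start:stop]))
        stop == length(chars) && break   # last window reached the end of the text
        start += step
    end
    return chunks
end

# Re-run the same source with different settings to compare the results.
doc = repeat("Julia is a high-level, high-performance language. ", 20)
for sz in (128, 256, 512)
    chunks = chunk_text(doc; chunk_size = sz, overlap = sz ÷ 8)
    println("chunk_size=", sz, " => ", length(chunks), " chunks")
end
```

Keeping the raw text around and treating chunking as a cheap, repeatable step is what makes this kind of sweep possible; embeddings and the index can then be rebuilt for whichever setting performs best.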
@splendidbug I've just updated the first post with more details on the associated websites and the data pipeline. Feel free to convert them into separate new issues if you start playing with it.
Hiya, tuning in because I would love to help expand the knowledge base and fine-tune the RAG pipelines.
Add the following knowledge sources:
Knowledge should contain both the documentation and the code snippets.
Each source should be added as a separate artifact that clearly labels the embedding model (and associated parameters like dimensions).
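One possible way to carry that label, sketched under assumptions: the struct, its field names, the naming scheme, and the model name `example-embedder` are all hypothetical placeholders, not an existing AIHelpMe convention.

```julia
# Hypothetical metadata describing how a knowledge pack was produced.
struct KnowledgePackLabel
    source::String               # e.g. which docs site or package was indexed
    embedding_model::String      # placeholder name; use the real model identifier
    embedding_dimensions::Int    # dimensionality of the stored embeddings
    chunk_size::Int              # chunking parameter used when processing
end

# Build a descriptive artifact name so packs made with different settings never collide.
function artifact_name(label::KnowledgePackLabel)
    parts = (label.source,
             label.embedding_model,
             string(label.embedding_dimensions, "d"),
             string("chunk", label.chunk_size))
    return join(parts, "__")
end

label = KnowledgePackLabel("julia-docs", "example-embedder", 1024, 256)
println(artifact_name(label))
```

Encoding the parameters in the name (or in metadata stored next to the data) lets a loader reject a pack whose embedding model or dimensions do not match the ones used at query time.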
In addition, we need to build:
This functionality should be well-documented and user-friendly, so that anyone can index their own favourite package (and, ideally, share it with others in the Julia community).
All this tooling should live in AIHelpMe as a separate module (initially) with its own separate dependencies (e.g., Gumbo).