MINeD Hackathon 2024 - Project Documentation

Project Overview

Project Definition

Project Name: ChatWithAnyScientificDocument

Project Goals Achieved:

Implemented a Versatile Document Parser: Developed a document parser that handles multiple document types, including PDF, TXT, HTML, DOCX, TEX, and PPT.
Developed a Chat Interface: Built a chat interface with HTML, CSS, JavaScript, and Flask that lets users upload a document and query its contents.
User Input Question Answering Functionality: Added question answering to the chat interface so users can ask questions about the parsed document content.
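
The multi-format parser described above can be pictured as a dispatch on file extension. The following is a minimal sketch under stated assumptions, not the project's actual code: the names `parse_txt`, `parse_html`, `PARSERS`, and `parse_document` are hypothetical, and only TXT and HTML are covered because they need nothing beyond the standard library. PDF, DOCX, TEX, and PPT would each need their own parser function registered in the same way.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects non-empty text nodes (a real parser would also skip script/style)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.parts.append(stripped)


def parse_txt(raw: bytes) -> str:
    """Decode plain text, replacing bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="replace")


def parse_html(raw: bytes) -> str:
    """Return the text nodes of an HTML document joined by single spaces."""
    extractor = _TextExtractor()
    extractor.feed(raw.decode("utf-8", errors="replace"))
    extractor.close()
    return " ".join(extractor.parts)


# Hypothetical registry mapping a lowercase file extension to its parser.
PARSERS = {".txt": parse_txt, ".html": parse_html, ".htm": parse_html}


def parse_document(filename: str, raw: bytes) -> str:
    """Pick a parser from the file extension and return the extracted text."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"Unsupported document type: {suffix or filename}")
    return PARSERS[suffix](raw)
```

Keeping the registry as a plain dictionary means a new format only needs one new function and one new entry, and an unsupported upload fails with a clear error instead of being parsed incorrectly.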

Installation and Setup

Installing Dependencies

To install the required dependencies, you can use the requirements.txt file provided. Run the following command in your terminal:

pip install -r requirements.txt

Downloading and Setting up the Model

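from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Hugging Face repository and quantized model file to download.
# Note: this is a GGML file. Recent llama-cpp-python releases expect GGUF,
# so an older release may be needed to load it.
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin"

# Download the file (or reuse the local cache) and get its local path.
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

# Load the model with llama.cpp.
lcpp_llm = Llama(
  model_path=model_path,
  n_threads=2,      # CPU threads
  n_batch=512,      # prompt tokens processed per batch
  n_gpu_layers=32,  # layers offloaded to the GPU
)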

Usage

Text Generation

prompt = "Summarize the 'Attention is all you need' research paper in 100 words."
prompt_template = f'''SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible.
USER: {prompt}
ASSISTANT:
'''

# echo=True returns the prompt followed by the completion.
response = lcpp_llm(
  prompt=prompt_template,
  max_tokens=256,
  temperature=0.5,
  top_p=0.95,
  repeat_penalty=1.2,
  top_k=150,
  echo=True,
)

print(response)  # full response dictionary
# Strip the echoed prompt so only the generated answer is printed.
print(response["choices"][0]["text"][len(prompt_template):])
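
The prompt formatting and echo stripping shown above can be wrapped in two small helpers so the chat interface does not repeat string handling for every question. This is a sketch only: `SYSTEM_PROMPT`, `build_prompt`, and `extract_answer` are hypothetical names, and the response shape is assumed to be the `choices[0]["text"]` dictionary used in the example above.

```python
SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible."
)


def build_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Format a question in the SYSTEM/USER/ASSISTANT layout."""
    return f"SYSTEM: {system_prompt}\nUSER: {user_message}\nASSISTANT:\n"


def extract_answer(response: dict, prompt: str) -> str:
    """Return the generated text, dropping the echoed prompt if it is present."""
    text = response["choices"][0]["text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()
```

Checking `startswith` before slicing means the helper gives the same result whether the model was called with `echo=True` or `echo=False`.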

Question Answering

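from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.question_answering import load_qa_chain

# Stream generated tokens to stdout as they are produced.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# model_path is the file downloaded in "Downloading and Setting up the Model".
llm = LlamaCpp(
  model_path=model_path,
  max_tokens=256,
  n_gpu_layers=25,
  n_batch=256,
  callback_manager=callback_manager,
  n_ctx=1024,
  verbose=True,
)

query = "What corpora are used for H-AES?"
# docsearch is not defined in this snippet; it is assumed to be a vector store
# already built from the parsed document chunks.
docs = docsearch.similarity_search(query)
# "stuff" places all retrieved chunks into a single prompt.
chain = load_qa_chain(llm, chain_type="stuff")
chain.run(input_documents=docs, question=query)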
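
The example above relies on a `docsearch` vector store that this README does not define. How the project builds it is not shown here, but a common first step is to split the parsed text into overlapping chunks before embedding them. The sketch below illustrates that step only. `chunk_text` and its default sizes are hypothetical, and sizes are counted in characters rather than tokens. Because the "stuff" chain places every retrieved chunk into one prompt and the model is loaded with `n_ctx=1024`, chunks need to be small enough for several of them to fit alongside the question.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The resulting list would then be embedded and indexed by whichever vector store backs `docsearch`.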

Team Members

Abhiraj Chaudhuri (@abhie7)
Jugal Gajjar
Sanjana Nathani

