-
Notifications
You must be signed in to change notification settings - Fork 0
Home
CognifyVault is a powerful knowledge management and search support tool designed to help users efficiently register, search, and extract information from their knowledge base. Utilizing advanced AI technologies such as OpenAI's API for text summarization and Weaviate for vector searches, CognifyVault provides a seamless experience for managing and retrieving knowledge.
We have expanded the system's capabilities by adding support for the following file formats:
-
Audio Files:
.mp3
.wav
.m4a
-
Video Files:
.mp4
.avi
.mov
.flv
.wmv
Key Features:
- Automatic Transcription: Uploaded video files are automatically converted to audio, and both audio and video files are transcribed to text using OpenAI's Whisper model.
- Enhanced Analysis: The transcribed text is processed and analyzed, enabling the system to generate summaries and perform vector searches based on the content of the audio and video files.
- Seamless Integration: These new capabilities are seamlessly integrated into the existing framework, allowing users to upload and analyze a broader range of media formats with the same ease as text and PDF files.
- Title Warning: When registering or editing a title, if the entered title matches an existing one, a red warning is displayed to alert the user of the duplication.
- Duplicate Title Check: Before saving a new title, the system checks for any existing titles with the same name and prompts the user to confirm if a duplicate is found.
- Duplicate File Check: When uploading a file, the system checks if the content matches an existing file and warns the user of the duplication before proceeding.
- Improved PDF Text Extraction: Enhanced the accuracy of text extraction from PDF files by removing unnecessary line breaks and spaces.
- Enhanced AI Prompts: Optimized the interaction with OpenAI API, leading to more accurate and relevant responses based on user queries.
- Optimized Reference Handling: Improved the consistency and accuracy of search results by preventing the referencing of duplicate files.
- Incorporation of Dates in Vector Search: The functionality to incorporate dates into vector search has been added, allowing the AI to reference materials filtered by date according to user instructions.
-
Knowledge Registration: Register knowledge by directly entering text or uploading files (supports
.txt
,.pdf
, and.md
formats). - Title Warning: When registering or editing a title, if the entered title matches an existing one, a red warning is displayed to alert the user of the duplication.
- Duplicate Title Check: Before saving a new title, the system checks for any existing titles with the same name and prompts the user to confirm if a duplicate is found.
- Knowledge Extraction: Ask questions and get responses based on the registered knowledge.
- Enhanced AI Prompts: Optimized the interaction with OpenAI API, leading to more accurate and relevant responses based on user queries.
- File Summarization: Automatically generate summaries for uploaded files using the OpenAI API.
- Duplicate File Check: When uploading a file, the system checks if the content matches an existing file and warns the user of the duplication before proceeding.
- Improved PDF Text Extraction: Enhanced the accuracy of text extraction from PDF files by removing unnecessary line breaks and spaces.
- Vector Search: Efficiently search through registered knowledge using vector search powered by Weaviate.
- Optimized Reference Handling: Improved the consistency and accuracy of search results by preventing the referencing of duplicate files.
This application offers an advanced report generation feature that creates detailed reports based on user requests. Unlike simple text generation, this feature intelligently analyzes the provided documents and articles to produce reports that align closely with the user's intent.
-
- Information Extraction Using Vector Search First, the application automatically extracts relevant information from the provided materials (such as articles or documents). This process utilizes vector search technology, which considers the semantic relationships between words and phrases, ensuring that the most relevant content is selected in response to the user's request.
-
- Understanding and Reflecting User Intent Next, the application interprets the user's request to understand their intent. This step goes beyond surface-level processing and delves into what the user is truly asking for, ensuring that the report is constructed in a way that accurately reflects the user's needs.
-
- Report Generation and Proofreading Based on the extracted information and the interpreted user intent, the application generates a report. The generated report is then further proofread to verify the accuracy of numbers, names, translation quality, and the appropriateness of the format. This process ensures that the final document is of high quality.
-
- Accuracy and Cost This approach involves multiple invocations of large language models (LLMs), which increases processing costs. However, the precision and quality of the resulting reports are significantly enhanced, meeting the user's expectations. While the cost is higher, the end result is a highly reliable document.
-
Clone the Repository Clone the CognifyVault repository to your local machine.
git clone https://github.com/yourusername/cognifyvault.git cd cognifyvault
-
Configure the OpenAI API Key Open the
Dockerfile
in a text editor and set your OpenAI API key in theENV OPENAI_API_KEY=
line.ENV OPENAI_API_KEY=your_openai_api_key_here
-
Build the Docker Image Build the Docker image using Docker Compose.
docker-compose build
-
Start the Application Start the application in detached mode.
docker-compose up -d
-
Access the Application Open your web browser and go to
http://127.0.0.1:5000
to access CognifyVault.
CognifyVault uses several environment variables to configure its behavior. These can be set in the Dockerfile
or through your environment management tool.
-
WEAVIATE_SERVER
: The URL for the Weaviate server. Default ishttp://weaviate:8080
. -
COGNIFY_VAULT_PORT
: The public port number for CognifyVault. Default is5000
. -
ARTICLE_NAMES
: The class name used in Weaviate for storing articles. Default isArticleV1_1,ArticleV1_2,ArticleV1_3
. -
OPENAI_API_KEY
: The API key for accessing OpenAI services. -
LLM_MODEL
: The model used for handling critical tasks. Default isgpt-4o-mini
. -
SUPPORT_LLM_MODEL
: The model used for support tasks. Default isgpt-4o-mini
. -
SPEECH_TO_TEXT_MODEL
: The model used for speech-to-text processing. Default iswhisper-1
. -
WEAVIATE_SEARCH_DISTANCE
: Determines the closeness of the match to the search keywords. Default is0.2
. -
WEAVIATE_SEARCH_LIMIT
: Limits the number of references returned in search results. Default is3
.
-
COGNIFY_VAULT_PORT
: The port on which the CognifyVault application will run. Defaults to5000
.
Once you've completed the installation and configuration steps, you can run the application using Docker.
-
Start the Application To start the application, navigate to your project directory and run:
docker-compose up -d
-
Stop the Application To stop the application, run:
docker-compose down
-
Accessing the Application Open a browser and navigate to
http://127.0.0.1:5000
. The CognifyVault interface should load, where you can start registering and querying knowledge. -
First Steps
-
Register Knowledge: Enter a title and content directly, or upload
.txt
or.pdf
files. When you upload a file, a summary will be displayed. If there are no issues with the summary, you can proceed to register it as is. This summary will be used for searching. - Extract Knowledge: When you ask a question, a search will be conducted based on the content of the question, and a response will be generated based on the registered knowledge.
-
Register Knowledge: Enter a title and content directly, or upload
CognifyVault allows users to register knowledge either by manually entering text or by uploading files in .txt
or .pdf
format.
- Navigate to the "Register your knowledge" section on the main page.
- Enter a title for your knowledge entry.
- Input the text content directly into the provided text area.
- Click "Register" to save your knowledge.
- Drag and drop the file into the designated area.
- The file name will appear once uploaded.
- If the file is in
.txt
or.pdf
format, the content will be automatically summarized and displayed in the text area. - Review the summary, make any necessary edits, and click "Register" to save. This information will be used during searches.
With CognifyVault, you can extract relevant information from your knowledge base by asking questions in natural language.
- Navigate to the "Extract knowledge" section.
- Type your question or request into the input box.
- Click "Request" to submit your query.
- CognifyVault uses Weaviate's vector search to search the registered knowledge and generate responses based on the most relevant information.
- "Overview of [topic]"
- "Summarize information related to [topic] from the meeting minutes"
- "Organize information about [topic]"
- Use specific keywords relevant to the content you've registered.
- Ensure that your queries are clear and concise to get the most accurate responses.
CognifyVault automatically generates summaries of uploaded files, optimizing them for effective knowledge search.
- Text Files (
.txt
,.md
) - PDF Files (
.pdf
) - Audio Files:
.mp3
.wav
.m4a
- Video Files:
.mp4
.avi
.mov
.flv
.wmv
The CognifyVault project is organized as follows:
- app.py: The main application file where the Flask app is defined and routes are set up.
- templates/: Contains the HTML templates rendered by Flask.
- static/: Holds static assets like CSS, JavaScript, and images.
- uploaded_files_Article/: The directory where uploaded files are stored.
- Dockerfile: The Dockerfile for containerizing the application.
- docker-compose.yml: Docker Compose configuration file for managing multi-container setups.
- Flask Routes: Define the endpoints for knowledge registration, extraction, and file management.
- Weaviate Integration: Handles the connection and interaction with the Weaviate server for vector searches.
- OpenAI Integration: Manages the API calls to OpenAI for text summarization and content generation.
CognifyVault integrates with two major APIs: OpenAI and Weaviate.
- Purpose: Used for generating text summaries and processing natural language requests.
- Endpoint Usage: The application sends text data to OpenAI's GPT models and receives summarized or generated content in response.
-
API Key Management: The API key must be set in the
Dockerfile
or as an environment variable (OPENAI_API_KEY
).
-
Purpose: Provides vector-based search capabilities for efficient retrieval of relevant knowledge.
-
Integration: The app interacts with Weaviate through its REST API to store and search for knowledge objects.
-
Configuration: Set the
WEAVIATE_SERVER
environment variable to connect to the appropriate Weaviate instance.