Skip to content
Katsumi Aradono edited this page Aug 23, 2024 · 8 revisions

CognifyVault Wiki

image

Introduction

CognifyVault is a powerful knowledge management and search support tool designed to help users efficiently register, search, and extract information from their knowledge base. Utilizing advanced AI technologies such as OpenAI's API for text summarization and Weaviate for vector searches, CognifyVault provides a seamless experience for managing and retrieving knowledge.

Newly Supported File Formats

We have expanded the system's capabilities by adding support for the following file formats:

  • Audio Files:

    • .mp3
    • .wav
    • .m4a
  • Video Files:

    • .mp4
    • .avi
    • .mov
    • .flv
    • .wmv

Key Features:

  • Automatic Transcription: Uploaded video files are automatically converted to audio, and both audio and video files are transcribed to text using OpenAI's Whisper model.
  • Enhanced Analysis: The transcribed text is processed and analyzed, enabling the system to generate summaries and perform vector searches based on the content of the audio and video files.
  • Seamless Integration: These new capabilities are seamlessly integrated into the existing framework, allowing users to upload and analyze a broader range of media formats with the same ease as text and PDF files.

Functionality Improvements

  • Title Warning: When registering or editing a title, if the entered title matches an existing one, a red warning is displayed to alert the user of the duplication.
  • Duplicate Title Check: Before saving a new title, the system checks for any existing titles with the same name and prompts the user to confirm if a duplicate is found.
  • Duplicate File Check: When uploading a file, the system checks if the content matches an existing file and warns the user of the duplication before proceeding.
  • Improved PDF Text Extraction: Enhanced the accuracy of text extraction from PDF files by removing unnecessary line breaks and spaces.
  • Enhanced AI Prompts: Optimized the interaction with OpenAI API, leading to more accurate and relevant responses based on user queries.
  • Optimized Reference Handling: Improved the consistency and accuracy of search results by preventing the referencing of duplicate files.
  • Incorporation of Dates in Vector Search: The functionality to incorporate dates into vector search has been added, allowing the AI to reference materials filtered by date according to user instructions.

Features

  • Knowledge Registration: Register knowledge by directly entering text or uploading files (supports .txt, .pdf, and .md formats).
  • Title Warning: When registering or editing a title, if the entered title matches an existing one, a red warning is displayed to alert the user of the duplication.
  • Duplicate Title Check: Before saving a new title, the system checks for any existing titles with the same name and prompts the user to confirm if a duplicate is found.
  • Knowledge Extraction: Ask questions and get responses based on the registered knowledge.
  • Enhanced AI Prompts: Optimized the interaction with OpenAI API, leading to more accurate and relevant responses based on user queries.
  • File Summarization: Automatically generate summaries for uploaded files using the OpenAI API.
  • Duplicate File Check: When uploading a file, the system checks if the content matches an existing file and warns the user of the duplication before proceeding.
  • Improved PDF Text Extraction: Enhanced the accuracy of text extraction from PDF files by removing unnecessary line breaks and spaces.
  • Vector Search: Efficiently search through registered knowledge using vector search powered by Weaviate.
  • Optimized Reference Handling: Improved the consistency and accuracy of search results by preventing the referencing of duplicate files.

Report Generation Feature

This application offers an advanced report generation feature that creates detailed reports based on user requests. Unlike simple text generation, this feature intelligently analyzes the provided documents and articles to produce reports that align closely with the user's intent.

    1. Information Extraction Using Vector Search First, the application automatically extracts relevant information from the provided materials (such as articles or documents). This process utilizes vector search technology, which considers the semantic relationships between words and phrases, ensuring that the most relevant content is selected in response to the user's request.
    1. Understanding and Reflecting User Intent Next, the application interprets the user's request to understand their intent. This step goes beyond surface-level processing and delves into what the user is truly asking for, ensuring that the report is constructed in a way that accurately reflects the user's needs.
    1. Report Generation and Proofreading Based on the extracted information and the interpreted user intent, the application generates a report. The generated report is then further proofread to verify the accuracy of numbers, names, translation quality, and the appropriateness of the format. This process ensures that the final document is of high quality.
    1. Accuracy and Cost This approach involves multiple invocations of large language models (LLMs), which increases processing costs. However, the precision and quality of the resulting reports are significantly enhanced, meeting the user's expectations. While the cost is higher, the end result is a highly reliable document.

Setup Steps

  1. Clone the Repository Clone the CognifyVault repository to your local machine.

    git clone https://github.com/yourusername/cognifyvault.git
    cd cognifyvault
  2. Configure the OpenAI API Key Open the Dockerfile in a text editor and set your OpenAI API key in the ENV OPENAI_API_KEY= line.

    ENV OPENAI_API_KEY=your_openai_api_key_here
  3. Build the Docker Image Build the Docker image using Docker Compose.

    docker-compose build
  4. Start the Application Start the application in detached mode.

    docker-compose up -d
  5. Access the Application Open your web browser and go to http://127.0.0.1:5000 to access CognifyVault.

Configuration

Environment Variables

CognifyVault uses several environment variables to configure its behavior. These can be set in the Dockerfile or through your environment management tool.

Environment Variables

  • WEAVIATE_SERVER: The URL for the Weaviate server. Default is http://weaviate:8080.
  • COGNIFY_VAULT_PORT: The public port number for CognifyVault. Default is 5000.
  • ARTICLE_NAMES: The class name used in Weaviate for storing articles. Default is ArticleV1_1,ArticleV1_2,ArticleV1_3.
  • OPENAI_API_KEY: The API key for accessing OpenAI services.
  • LLM_MODEL: The model used for handling critical tasks. Default is gpt-4o-mini.
  • SUPPORT_LLM_MODEL: The model used for support tasks. Default is gpt-4o-mini.
  • SPEECH_TO_TEXT_MODEL: The model used for speech-to-text processing. Default is whisper-1.
  • WEAVIATE_SEARCH_DISTANCE: Determines the closeness of the match to the search keywords. Default is 0.2.
  • WEAVIATE_SEARCH_LIMIT: Limits the number of references returned in search results. Default is 3.

Optional Variables

  • COGNIFY_VAULT_PORT: The port on which the CognifyVault application will run. Defaults to 5000.

Running the Application

Once you've completed the installation and configuration steps, you can run the application using Docker.

  1. Start the Application To start the application, navigate to your project directory and run:

    docker-compose up -d
  2. Stop the Application To stop the application, run:

    docker-compose down
  3. Accessing the Application Open a browser and navigate to http://127.0.0.1:5000. The CognifyVault interface should load, where you can start registering and querying knowledge.

  4. First Steps

    • Register Knowledge: Enter a title and content directly, or upload .txt or .pdf files. When you upload a file, a summary will be displayed. If there are no issues with the summary, you can proceed to register it as is. This summary will be used for searching.
    • Extract Knowledge: When you ask a question, a search will be conducted based on the content of the question, and a response will be generated based on the registered knowledge.

Features

Knowledge Registration

CognifyVault allows users to register knowledge either by manually entering text or by uploading files in .txt or .pdf format.

Manual Entry

  • Navigate to the "Register your knowledge" section on the main page.
  • Enter a title for your knowledge entry.
  • Input the text content directly into the provided text area.
  • Click "Register" to save your knowledge.

File Upload

  • Drag and drop the file into the designated area.
  • The file name will appear once uploaded.
  • If the file is in .txt or .pdf format, the content will be automatically summarized and displayed in the text area.
  • Review the summary, make any necessary edits, and click "Register" to save. This information will be used during searches.

Knowledge Extraction

With CognifyVault, you can extract relevant information from your knowledge base by asking questions in natural language.

How It Works

  • Navigate to the "Extract knowledge" section.
  • Type your question or request into the input box.
  • Click "Request" to submit your query.
  • CognifyVault uses Weaviate's vector search to search the registered knowledge and generate responses based on the most relevant information.

Example Queries

  • "Overview of [topic]"
  • "Summarize information related to [topic] from the meeting minutes"
  • "Organize information about [topic]"

Tips for Effective Searches

  • Use specific keywords relevant to the content you've registered.
  • Ensure that your queries are clear and concise to get the most accurate responses.

File Summarization

CognifyVault automatically generates summaries of uploaded files, optimizing them for effective knowledge search.

Supported File Formats

  • Text Files (.txt, .md)
  • PDF Files (.pdf)
  • Audio Files:
    • .mp3
    • .wav
    • .m4a
  • Video Files:
    • .mp4
    • .avi
    • .mov
    • .flv
    • .wmv

Development

Code Structure

The CognifyVault project is organized as follows:

  • app.py: The main application file where the Flask app is defined and routes are set up.
  • templates/: Contains the HTML templates rendered by Flask.
  • static/: Holds static assets like CSS, JavaScript, and images.
  • uploaded_files_Article/: The directory where uploaded files are stored.
  • Dockerfile: The Dockerfile for containerizing the application.
  • docker-compose.yml: Docker Compose configuration file for managing multi-container setups.

Key Components

  • Flask Routes: Define the endpoints for knowledge registration, extraction, and file management.
  • Weaviate Integration: Handles the connection and interaction with the Weaviate server for vector searches.
  • OpenAI Integration: Manages the API calls to OpenAI for text summarization and content generation.

Advanced Topics

API Integration

CognifyVault integrates with two major APIs: OpenAI and Weaviate.

OpenAI API

  • Purpose: Used for generating text summaries and processing natural language requests.
  • Endpoint Usage: The application sends text data to OpenAI's GPT models and receives summarized or generated content in response.
  • API Key Management: The API key must be set in the Dockerfile or as an environment variable (OPENAI_API_KEY).

Weaviate API

  • Purpose: Provides vector-based search capabilities for efficient retrieval of relevant knowledge.

  • Integration: The app interacts with Weaviate through its REST API to store and search for knowledge objects.

  • Configuration: Set the WEAVIATE_SERVER environment variable to connect to the appropriate Weaviate instance.