This repository has been archived by the owner on Sep 12, 2024. It is now read-only.

minor update
SeeknnDestroy committed Oct 16, 2023
1 parent 8991fc2 commit 279624e
Showing 3 changed files with 42 additions and 4 deletions.
39 changes: 39 additions & 0 deletions README.md
@@ -171,6 +171,39 @@ LLM Total Token Cost: $0.002317
"""
```

### Document Providers (Powerful GitHub and Local Solutions)

Unlock the potential of your content with AutoLLM's robust document providers. Seamlessly pull, process, and analyze documents from GitHub repositories or local directories.

#### GitHub Document Provider

Fetch up-to-date documents directly from your GitHub repositories—ideal for real-time data pipelines and collaborative projects.

```python
from pathlib import Path

from autollm.utils.document_providers import github_document_provider

git_repo_url = "https://github.com/safevideo.git"
local_repo_path = Path("/safevideo/")
# Specify where to find the documents in the repo
relative_docs_path = Path("docs/")

# Fetch and process documents
documents = github_document_provider(git_repo_url, local_repo_path, relative_docs_path)
```

#### Local Document Provider

Process documents from local directories—ideal for offline data pipelines and local development.

```python
from autollm.utils.document_providers import local_document_provider

input_dir = "/local/documents/path"

# Read files as documents from local directory
documents = local_document_provider(input_dir=input_dir)
```

______________________________________________________________________

## FAQ
@@ -205,6 +238,12 @@ Our roadmap outlines upcoming features and integrations aimed at making QuickLLM

- [ ] Add unit tests for online vectorDB integrations

- [ ] **Additional Document Providers**:

- [ ] Amazon S3-based document provider
- [ ] FTP-based document provider
- [ ] Google Drive-based document provider

______________________________________________________________________

## Contributing
File renamed without changes.
7 changes: 3 additions & 4 deletions autollm/utils/hash_utils.py
@@ -5,8 +5,6 @@

 from llama_index.schema import Document

-from autollm.vectorstores.base import BaseVS
-
 logger = logging.getLogger(__name__)


@@ -27,13 +25,14 @@ def get_md5(file_path: Path) -> str:
     return hasher.hexdigest()


-def check_for_changes(documents: Sequence[Document], vs: BaseVS) -> Tuple[Sequence[Document], List[str]]:
+# TODO: add vs type
+def check_for_changes(documents: Sequence[Document], vs) -> Tuple[Sequence[Document], List[str]]:
     """
     Check for file changes based on their hashes.
     Parameters:
         documents (Sequence[Document]): List of documents to check for changes.
-        vs (BaseVS): The vector store to check for changes in.
+        vs: The vector store to check for changes in.
     Returns:
         changed_documents (Sequence[Document]): List of documents that have changed.
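
The hunk above is truncated, so the body of `check_for_changes` is not visible here. As a rough illustration of hash-based change detection in general, the following standalone sketch compares each file's current MD5 digest with a previously stored one. It is not AutoLLM's implementation: `md5_of_file`, `find_changed`, and the `stored_hashes` dictionary are hypothetical names, and treating the second return value as a list of keys that no longer have a matching file is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def md5_of_file(file_path: Path, chunk_size: int = 8192) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_changed(
    paths: Iterable[Path], stored_hashes: Dict[str, str]
) -> Tuple[List[Path], List[str]]:
    """Hypothetical helper: compare current file hashes with stored ones.

    Returns the paths whose hash is new or different, plus the stored keys
    that were not among the given paths (assumed here to mean "removed").
    """
    changed: List[Path] = []
    seen = set()
    for path in paths:
        key = str(path)
        seen.add(key)
        # A missing entry (None) also counts as a change.
        if stored_hashes.get(key) != md5_of_file(path):
            changed.append(path)
    removed = [key for key in stored_hashes if key not in seen]
    return changed, removed
```

In a real vector-store setting the stored hashes would presumably come from document metadata in the store rather than from a plain dictionary.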