Commit
Merge pull request #84 from gomate-community/pipeline
Pipeline@Markdown Parse
Showing 22 changed files with 811 additions and 190 deletions.
@@ -48,3 +48,4 @@ class AppConfig:
    DEBUGGER: bool = True

    SHOW_DOCS: bool = True
@@ -0,0 +1,24 @@
# Example usage: parse common files

from trustrag.modules.document.common_parser import CommonParser
from trustrag.modules.document.chunk import TextChunker
if __name__ == '__main__':
    cp = CommonParser()
    tc = TextChunker()

    doc_paths = [
        "../../data/docs/基础知识.md",
        "../../data/docs/5G垂直行业基础知识介绍--口袋小册子.pdf",
        "../../data/docs/5G专网需求提问方式-广东.xlsx",
    ]
    for doc_path in doc_paths:
        # Parse the current document (Markdown, PDF, or Excel) into paragraphs,
        # then group the paragraphs' sentences into chunks (chunk_size=256).
        paragraphs = cp.parse(doc_path)
        chunks = tc.chunk_sentences(paragraphs, chunk_size=256)
        # print(chunks)
        print(len(chunks))

        for chunk in chunks:
            print(chunk)
            print("+++" * 100)
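The example above calls `tc.chunk_sentences(paragraphs, chunk_size=256)`, but the diff does not show how `TextChunker` implements it. As a rough illustration of the general technique only, the sketch below splits paragraphs into sentences and packs them greedily into chunks under a size budget. The names `split_sentences` and `sketch_chunk_sentences` are hypothetical, and measuring `chunk_size` in characters is an assumption; the real library may count tokens or behave differently.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text after Chinese/English sentence terminators (hypothetical helper).

    A period only ends a sentence when followed by whitespace, so "3.14" stays whole.
    """
    parts = re.split(r"(?<=[。！？!?])\s*|(?<=\.)\s+", text)
    return [part.strip() for part in parts if part.strip()]


def sketch_chunk_sentences(paragraphs: List[str], chunk_size: int = 256) -> List[str]:
    """Greedily pack sentences into chunks of at most chunk_size characters.

    Assumption: size is measured in characters. A single sentence longer than
    chunk_size is kept whole as its own oversized chunk rather than being cut.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in paragraphs:
        for sentence in split_sentences(paragraph):
            candidate = f"{current} {sentence}" if current else sentence
            if current and len(candidate) > chunk_size:
                # Adding this sentence would exceed the budget: flush and start over.
                chunks.append(current)
                current = sentence
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = ["5G专网是什么？它面向垂直行业。", "First sentence. Second sentence."]
    for chunk in sketch_chunk_sentences(demo, chunk_size=20):
        print(chunk)
```

Packing whole sentences keeps each chunk readable on its own, at the cost of uneven chunk lengths; joining with a single space is a simplification that suits English better than Chinese text.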