archives-handwriting-text-extraction

The objective of this project is to create versatile text extraction and cleaning tools that can be run either locally or through Amazon Textract. This flexibility lets the tools be aligned with the requirements of a specific repository or project, and it also supports local file processing and customization.

Both the local and the AWS versions of the code extract text from handwritten documents, perform text-cleaning operations, and save the extracted and cleaned text to the existing metadata templates used by the repository.
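The README does not show the scripts themselves, so the following is only a minimal sketch of the extract, clean, and save steps, assuming the AWS path uses Textract's DetectDocumentText API (whose response lists `LINE` blocks with a `Text` field). The cleaning rules, the helper names, and the worksheet columns `identifier` and `extracted_text` are hypothetical placeholders, not the repository's actual code or metadata template.

```python
import csv
import io
import re
from pathlib import Path


def detect_text_aws(image_path: str) -> dict:
    """Send one page image to Amazon Textract (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without boto3

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": Path(image_path).read_bytes()})


def lines_from_textract(response: dict) -> list[str]:
    """Collect line-level text from a DetectDocumentText response, skipping PAGE/WORD blocks."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def clean_text(lines: list[str]) -> str:
    """Illustrative cleaning: collapse whitespace, drop empty lines, rejoin hyphenated breaks."""
    cleaned = []
    for line in lines:
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            cleaned.append(line)
    text = " ".join(cleaned)
    # Rejoin a word split across lines, e.g. "hand- written" -> "handwritten".
    return re.sub(r"(\w)- (\w)", r"\1\2", text)


def write_metadata(out, records: list[dict]) -> None:
    """Write records to a CSV worksheet; the column names are hypothetical."""
    writer = csv.DictWriter(out, fieldnames=["identifier", "extracted_text"])
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    # A hand-built response shaped like Textract output, so no AWS call is needed.
    sample_response = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Dear  Mr. Albert,"},
            {"BlockType": "LINE", "Text": "thank you for your hand-"},
            {"BlockType": "LINE", "Text": "written letter."},
            {"BlockType": "WORD", "Text": "Dear"},
        ]
    }
    text = clean_text(lines_from_textract(sample_response))
    buf = io.StringIO()
    write_metadata(buf, [{"identifier": "example_page_1", "extracted_text": text}])
    print(buf.getvalue())
```

Keeping the Textract call separate from parsing and cleaning means the same `clean_text` and `write_metadata` steps could, in principle, be reused for text produced by a local extraction engine.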

Extracting text from handwritten documents and exporting it to metadata worksheets can significantly enhance the efficiency of processing archival collections. Here's how:

1. Time Efficiency:

Automated text extraction greatly reduces the need for manual transcription, saving a significant amount of time.

2. Bulk Processing:

Automation enables bulk processing, allowing text to be extracted from many documents in a single run.

3. Efficient Review:

Archivists can quickly scan the extracted text for keywords, names, or dates to determine the document's significance without reading every page.

4. Cross-Collection Analysis:

Extracted text can be used for cross-collection analysis: researchers can analyze trends, topics, and themes across different collections, leading to deeper insights.

By integrating text extraction and metadata creation, archival processing becomes more streamlined, accessible, and conducive to meaningful research. Automation empowers archivists to manage and leverage archival content more effectively, ultimately enhancing the value and impact of the collection.

student contributors (graduate and undergraduate)

See acknowledgements.md for more information.

communication

email: japryse@ou.edu or cacarchives@ou.edu
homepage: Carl Albert Center Archives
twitter: @CarlAlbertCtr
finding aid: https://arc.ou.edu/

license

See LICENSE for more information.

repository: prys0000/archives-handwriting-text-extract-project