Added requirements.txt,
Vastly improved installation guide for Poppler and Tesseract
Minor improvements for date result handling
ptmrio committed Apr 17, 2023
1 parent 3bb511b commit 7767f89
Showing 3 changed files with 51 additions and 19 deletions.
38 changes: 27 additions & 11 deletions README.md
@@ -1,20 +1,29 @@
# AIAutoRename
AIAutoRename
============

AIAutoRename is a Python script that automatically renames PDF files based on their content. It leverages the power of the [OpenAI GPT API](https://platform.openai.com/account/api-keys) to extract relevant information such as the document date, company name, and document type from the PDF's text. This tool is designed to simplify the organization and management of your PDF files by automating the renaming process.
AIAutoRename is a Python script that automatically renames PDF files based on their content. It leverages the power of the OpenAI GPT Chat API to extract relevant information, such as the document date, company name, and document type, from the PDF's text. This tool is designed to simplify the organization and management of your PDF files by automating the renaming process.

## Installation
Installation
------------

To use AIAutoRename, you'll need Python 3.6 or later. You can download it from the [official Python website](https://www.python.org/downloads/) or the Microsoft Store.

After installing Python, you can install the required packages by running the following command in your terminal:
1. Clone or download this repository and navigate to the root directory of the project in your terminal.

2. Install the required packages using the `requirements.txt` file:


```
pip install python-dotenv pdf2image pytesseract openai dateparser
pip install -r requirements.txt
```

Next, clone or download this repository and navigate to the root directory of the project in your terminal.
3. Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/) for Windows by following the installation instructions on their GitHub page. During the installation process, ensure that the "Add tesseract to PATH" option is checked. This will automatically add Tesseract to your PATH environment variable.

4. Download and install [poppler for Windows](https://github.com/oschwartz10612/poppler-windows). After installation, add the `bin` folder of the installed poppler directory to your PATH environment variable. Here's a [guide](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) on how to add directories to the PATH variable on Windows 10.
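A quick way to confirm that steps 3 and 4 worked is to check whether both tools are reachable from PATH. The sketch below assumes the executable names `tesseract` and `pdftoppm` (a Poppler utility used by pdf2image); the helper name `find_tools` is hypothetical and not part of this repository.

```python
import shutil

# Executable names are assumptions: `tesseract` for Tesseract OCR and
# `pdftoppm` for Poppler.
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        status = path if path else "NOT FOUND (check your PATH)"
        print(f"{name}: {status}")
```

If either tool is reported as not found, reopen the terminal after editing PATH, since a running shell keeps the old value.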


## Configuration
Configuration
-------------

AIAutoRename uses environment variables to configure the OpenAI API key and the name of your company. Before running the script, you'll need to create a file named `.env` in the root directory of the project and add the following lines:

@@ -26,7 +35,8 @@ MY_COMPANY_NAME=<your-company-name>

Replace `<your-api-key>` with your OpenAI API key, which can be obtained from the [OpenAI website](https://platform.openai.com/docs/developer-quickstart/your-api-keys). Set `<your-company-name>` to your company's name. This information will help the OpenAI API to better understand the context and decide whether to use the sender or recipient of the PDF document.
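For illustration, here is roughly how `KEY=value` lines in a `.env` file turn into configuration values. The script itself presumably relies on the `python-dotenv` package named in the installation command; the standard-library parser below is only a stand-in. The key name `OPENAI_API_KEY` is an assumption, since only `MY_COMPANY_NAME` is visible here, and `parse_env_text` is a hypothetical helper.

```python
def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# OPENAI_API_KEY is an assumed variable name; the value is a placeholder.
sample = """
OPENAI_API_KEY=your-api-key
MY_COMPANY_NAME=MegaCorp
"""
config = parse_env_text(sample)
print(config["MY_COMPANY_NAME"])
```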

## Usage
Usage
-----

### Renaming a single PDF file

@@ -54,8 +64,14 @@ Replace `path/to/folder` with the path to your folder (no trailing slash).

**Example:**

Suppose you have a folder named `invoices` on your desktop containing multiple PDF files. After running AIAutoRename on the folder, all PDF files within the folder and its subfolders will be renamed according to their content, such as document date, company name, and document type.
Suppose you have a folder named `invoices` on your desktop containing multiple PDF files. After running AIAutoRename on the folder, all PDF files within the folder and its subfolders will be renamed according to their content, such as document date, company name, and document type. For example, a file originally named `invoice123.pdf` might be renamed to `20220215 MegaCorp PO.pdf`, where `20220215` is the document date, `MegaCorp` is the company name, and `PO` is the document type (purchase order).
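The naming pattern in this example can be sketched as follows. This is a minimal illustration, not the script's own code: `build_new_name` is a hypothetical helper, and dropping the date part when it is unknown mirrors the `None` check in `rename_invoice`.

```python
from datetime import date


def build_new_name(document_date, company_name, document_type):
    """Compose '<YYYYMMDD> <company> <type>.pdf'; omit the date if unknown."""
    parts = [company_name, document_type]
    if document_date is not None:
        parts.insert(0, document_date.strftime("%Y%m%d"))
    return " ".join(parts) + ".pdf"


print(build_new_name(date(2022, 2, 15), "MegaCorp", "PO"))  # 20220215 MegaCorp PO.pdf
print(build_new_name(None, "MegaCorp", "PO"))               # MegaCorp PO.pdf
```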

## Contributing
Contributing
------------

We welcome contributions from anyone! If you find a bug or have a feature request, please open an issue on our [GitHub repository](https://github.com/example/AIAutoRename). If you'd like to contribute code, please open a pull request with your changes. We appreciate your support in making AIAutoRename even better!
We welcome contributions from everyone! If you find a bug or have a feature request, please open an issue on our [GitHub repository](https://github.com/example/AIAutoRename). If you'd like to contribute code, please open a pull request with your changes. We appreciate your support in making AIAutoRename even better!

Support
-------

If you encounter any issues or need assistance using AIAutoRename, please don't hesitate to reach out by opening an issue on our [GitHub repository](https://github.com/example/AIAutoRename). We'll do our best to help you as soon as possible.
32 changes: 24 additions & 8 deletions autorename.py
@@ -43,7 +43,7 @@ def get_openai_response(text):
print('---------------------------------')

print('PDF text (preview):')
print({text[:1000]})
print({text[:100]})
print('---------------------------------')

completion = openai.ChatCompletion.create(
@@ -64,6 +64,7 @@ def get_openai_response(text):
"Example incoming invoice: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"ER\"} " +
"Example outgoing invoice: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"AR\"} " +
"Example document: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"Angebot\"}"
"If date is unavailable: {\"company_name\": \"ACME\", \"document_date\": \"00.00.0000\", \"document_type\": \"Angebot\"}"
},
{"role": "user", "content": f"Extract the \"company_name\", \"document_date\", \"document_type\" from this PDF document and return a JSON object:\n\n{text}"},
]
@@ -82,10 +83,6 @@ def get_openai_response(text):
document_date = json_response['document_date']
document_type = json_response['document_type']

document_date = dateparser.parse(document_date, settings={
'DATE_ORDER': 'DMY'
})

if (is_valid_filename(company_name) and is_valid_filename(document_type) and document_date):
break

@@ -134,15 +131,34 @@ def harmonize_company_name(company_name):

def parse_openai_response(response):
company_name = response.get('company_name', 'Unknown')
document_date = dateparser.parse(response.get(
'document_date', '00000000'), settings={'DATE_ORDER': 'DMY'})

document_date = response.get('document_date', '00000000')
if document_date is None or document_date.strip() == '' or document_date.strip().lower() == 'unbekannt':
document_date = "00000000"

parsed_document_date = dateparser.parse(str(document_date), settings={
'DATE_ORDER': 'DMY'
})

if parsed_document_date is None:
document_date = dateparser.parse('00000000', settings={
'DATE_ORDER': 'DMY'
})
else:
document_date = parsed_document_date

document_type = response.get('document_type', 'Unknown')

return company_name, document_date, document_type



def rename_invoice(pdf_path, company_name, document_date, document_type):
base_name = f'{document_date.strftime("%Y%m%d")} {company_name} {document_type}'
if document_date is not None:
base_name = f'{document_date.strftime("%Y%m%d")} {company_name} {document_type}'
else:
base_name = f'{company_name} {document_type}'

counter = 0
new_name = base_name + '.pdf'
new_path = os.path.join(os.path.dirname(pdf_path), new_name)
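
The date handling in `parse_openai_response` above reduces to three steps: treat empty or placeholder values as unknown, try to parse the value day-first, and fall back to no date when parsing fails. A minimal sketch of that logic follows. It uses `datetime.strptime` from the standard library instead of `dateparser`, a deliberate simplification that accepts only the `DD.MM.YYYY` form from the prompt examples. `parse_document_date` and `PLACEHOLDERS` are hypothetical names.

```python
from datetime import datetime

# Values treated as "no date available" (assumed set, based on the diff above).
PLACEHOLDERS = {"", "unbekannt", "00000000", "00.00.0000"}


def parse_document_date(raw):
    """Return a datetime for 'DD.MM.YYYY' input, or None when the date is unknown."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text.lower() in PLACEHOLDERS:
        return None
    try:
        return datetime.strptime(text, "%d.%m.%Y")
    except ValueError:
        # Unparseable or impossible dates fall back to "unknown".
        return None


print(parse_document_date("15.02.2022"))
print(parse_document_date("unbekannt"))
```

Returning `None` for every unknown case keeps the caller simple: `rename_invoice` only has to check `document_date is not None` before formatting.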
Binary file added requirements.txt
Binary file not shown.
