Added requirements.txt, vastly improved installation guide for Poppler and Tesseract, minor improvements for date result handling
ptmrio · Apr 17, 2023 · 7767f89
1 parent 3bb511b
commit 7767f89
Showing 3 changed files with 51 additions and 19 deletions.
diff --git a/README.md b/README.md
@@ -1,20 +1,29 @@
-# AIAutoRename
+AIAutoRename
+============
 
-AIAutoRename is a Python script that automatically renames PDF files based on their content. It leverages the power of the [OpenAI GPT API](https://platform.openai.com/account/api-keys) to extract relevant information such as the document date, company name, and document type from the PDF's text. This tool is designed to simplify the organization and management of your PDF files by automating the renaming process.
+AIAutoRename is a Python script that automatically renames PDF files based on their content. It leverages the power of the OpenAI GPT Chat API to extract relevant information, such as the document date, company name, and document type, from the PDF's text. This tool is designed to simplify the organization and management of your PDF files by automating the renaming process.
 
-## Installation
+Installation
+------------
 
 To use AIAutoRename, you'll need Python 3.6 or later. You can download it from the [official Python website](https://www.python.org/downloads/) or the Microsoft Store.
 
-After installing Python, you can install the required packages by running the following command in your terminal:
+1.  Clone or download this repository and navigate to the root directory of the project in your terminal.
+
+2.  Install the required packages using the `requirements.txt` file:
+
 
 ```
-pip install python-dotenv pdf2image pytesseract openai dateparser
+pip install -r requirements.txt
 ```
 
-Next, clone or download this repository and navigate to the root directory of the project in your terminal.
+3.  Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/) for Windows by following the installation instructions on their GitHub page. During the installation process, ensure that the "Add tesseract to PATH" option is checked. This will automatically add Tesseract to your PATH environment variable.
+
+4.  Download and install [poppler for Windows](https://github.com/oschwartz10612/poppler-windows). After installation, add the `bin` folder of the installed poppler directory to your PATH environment variable. Here's a [guide](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) on how to add directories to the PATH variable on Windows 10.
+
 
-## Configuration
+Configuration
+-------------
 
 AIAutoRename uses environment variables to configure the OpenAI API key and the name of your company. Before running the script, you'll need to create a file named `.env` in the root directory of the project and add the following lines:
 
@@ -26,7 +35,8 @@ MY_COMPANY_NAME=<your-company-name>
 
 Replace `<your-api-key>` with your OpenAI API key, which can be obtained from the [OpenAI website](https://platform.openai.com/docs/developer-quickstart/your-api-keys). Set `<your-company-name>` to your company's name. This information will help the OpenAI API to better understand the context and decide whether to use the sender or recipient of the PDF document.
 
-## Usage
+Usage
+-----
 
 ### Renaming a single PDF file
 
@@ -54,8 +64,14 @@ Replace `path/to/folder` with the path to your folder (no trailing slash).
 
 **Example:**
 
-Suppose you have a folder named `invoices` on your desktop containing multiple PDF files. After running AIAutoRename on the folder, all PDF files within the folder and its subfolders will be renamed according to their content, such as document date, company name, and document type.
+Suppose you have a folder named `invoices` on your desktop containing multiple PDF files. After running AIAutoRename on the folder, all PDF files within the folder and its subfolders will be renamed according to their content, such as document date, company name, and document type. For example, a file originally named `invoice123.pdf` might be renamed to `20220215 MegaCorp PO.pdf`, where `20220215` is the document date, `MegaCorp` is the company name, and `PO` is the document type (purchase order).
 
-## Contributing
+Contributing
+------------
 
-We welcome contributions from anyone! If you find a bug or have a feature request, please open an issue on our [GitHub repository](https://github.com/example/AIAutoRename). If you'd like to contribute code, please open a pull request with your changes. We appreciate your support in making AIAutoRename even better!
+We welcome contributions from everyone! If you find a bug or have a feature request, please open an issue on our [GitHub repository](https://github.com/example/AIAutoRename). If you'd like to contribute code, please open a pull request with your changes. We appreciate your support in making AIAutoRename even better!
+
+Support
+-------
+
+If you encounter any issues or need assistance using AIAutoRename, please don't hesitate to reach out by opening an issue on our [GitHub repository](https://github.com/example/AIAutoRename). We'll do our best to help you as soon as possible.
diff --git a/autorename.py b/autorename.py
@@ -43,7 +43,7 @@ def get_openai_response(text):
         print('---------------------------------')
 
         print('PDF text (preview):')
-        print({text[:1000]})
+        print({text[:100]})
         print('---------------------------------')
 
         completion = openai.ChatCompletion.create(
@@ -64,6 +64,7 @@ def get_openai_response(text):
                         "Example incoming invoice: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"ER\"} " +
                         "Example outgoing invoice: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"AR\"} " +
                         "Example document: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"Angebot\"}"
+                        "If date is unavailable: {\"company_name\": \"ACME\", \"document_date\": \"00.00.0000\", \"document_type\": \"Angebot\"}"
                 },
                 {"role": "user", "content": f"Extract the \"company_name\", \"document_date\", \"document_type\" from this PDF document and return a JSON object:\n\n{text}"},
             ]
@@ -82,10 +83,6 @@ def get_openai_response(text):
                 document_date = json_response['document_date']
                 document_type = json_response['document_type']
 
-                document_date = dateparser.parse(document_date, settings={
-                    'DATE_ORDER': 'DMY'
-                })
-
                 if (is_valid_filename(company_name) and is_valid_filename(document_type) and document_date):
                     break
 
@@ -134,15 +131,34 @@ def harmonize_company_name(company_name):
 
 def parse_openai_response(response):
     company_name = response.get('company_name', 'Unknown')
-    document_date = dateparser.parse(response.get(
-        'document_date', '00000000'), settings={'DATE_ORDER': 'DMY'})
+
+    document_date = response.get('document_date', '00000000')
+    if document_date is None or document_date.strip() == '' or document_date.strip().lower() == 'unbekannt':
+        document_date = "00000000"
+
+    parsed_document_date = dateparser.parse(str(document_date), settings={
+        'DATE_ORDER': 'DMY'
+    })
+
+    if parsed_document_date is None:
+        document_date = dateparser.parse('00000000', settings={
+            'DATE_ORDER': 'DMY'
+        })
+    else:
+        document_date = parsed_document_date
+
     document_type = response.get('document_type', 'Unknown')
 
     return company_name, document_date, document_type
 
 
+
 def rename_invoice(pdf_path, company_name, document_date, document_type):
-    base_name = f'{document_date.strftime("%Y%m%d")} {company_name} {document_type}'
+    if document_date is not None:
+        base_name = f'{document_date.strftime("%Y%m%d")} {company_name} {document_type}'
+    else:
+        base_name = f'{company_name} {document_type}'
+
     counter = 0
     new_name = base_name + '.pdf'
     new_path = os.path.join(os.path.dirname(pdf_path), new_name)
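
The date handling changed in the hunks above can be summarised in a small standalone sketch. It is not the code from `autorename.py`: the helper names `normalize_document_date` and `build_base_name` are hypothetical, the placeholder set is an assumption drawn from the values visible in this diff, and `datetime.strptime` with a fixed `DD.MM.YYYY` format stands in for `dateparser.parse` so that the example runs with the standard library only.

```python
from datetime import datetime
from typing import Optional

# Values treated as "no date available". This set is an assumption based on
# the placeholders that appear in the diff ("00000000", "00.00.0000", "unbekannt").
PLACEHOLDERS = {"", "unbekannt", "00000000", "00.00.0000"}


def normalize_document_date(raw) -> Optional[datetime]:
    """Return a datetime for a DD.MM.YYYY string, or None if no usable date."""
    if raw is None:
        return None
    # Convert to str before stripping so non-string values do not raise.
    text = str(raw).strip()
    if text.lower() in PLACEHOLDERS:
        return None
    try:
        # Stand-in for dateparser.parse(..., settings={'DATE_ORDER': 'DMY'}).
        return datetime.strptime(text, "%d.%m.%Y")
    except ValueError:
        return None


def build_base_name(company_name: str,
                    document_date: Optional[datetime],
                    document_type: str) -> str:
    """Build the file name stem, omitting the date prefix when it is missing."""
    if document_date is not None:
        return f'{document_date.strftime("%Y%m%d")} {company_name} {document_type}'
    return f"{company_name} {document_type}"


if __name__ == "__main__":
    print(build_base_name("MegaCorp", normalize_document_date("15.02.2022"), "PO"))
    print(build_base_name("ACME", normalize_document_date("00.00.0000"), "Angebot"))
```

Returning `None` for every unusable value keeps the "no date" decision in one place, which is the case the new `if document_date is not None` branch in `rename_invoice` is written to handle.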

diff --git a/requirements.txt b/requirements.txt