fix: update the Apache Tika version in use. (#74)
Attempting to start the server with Apache Tika version 1.9.4, the
version currently used in the container, fails. The binary appears to
be corrupted. This commit therefore updates the Apache Tika version in
use to 2.9.1.

Fix #73
ogecece authored Sep 14, 2024
2 parents 4700489 + bbb7038 commit 983601e
Showing 2 changed files with 5 additions and 2 deletions.
5 changes: 4 additions & 1 deletion data_extraction/text_extraction.py
@@ -25,7 +25,10 @@ def _try_extract_text(self, filepath: str) -> str:
         if self.is_txt(filepath):
             return self._return_file_content(filepath)
         with open(filepath, "rb") as file:
-            headers = {"Content-Type": self._get_file_type(filepath)}
+            headers = {
+                "Content-Type": self._get_file_type(filepath),
+                "Accept": "text/plain",
+            }
             response = requests.put(f"{self._url}/tika", data=file, headers=headers)
             response.encoding = "UTF-8"
             return response.text
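
The hunk above adds an "Accept: text/plain" header, so the Tika server is asked explicitly for plain text. A minimal sketch of the same request shape follows, using only the standard library so the request can be built and inspected without a running server. The function name build_tika_request and the http://localhost:9998 base URL are illustrative assumptions, not part of the repository.

```python
import urllib.request


def build_tika_request(
    base_url: str, payload: bytes, content_type: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT request to the Tika server's /tika endpoint."""
    headers = {
        # Tells Tika what kind of document is in the request body.
        "Content-Type": content_type,
        # Asks Tika to answer with plain text.
        "Accept": "text/plain",
    }
    return urllib.request.Request(
        f"{base_url}/tika", data=payload, headers=headers, method="PUT"
    )


# Hypothetical usage: the base URL and the payload are placeholders.
request = build_tika_request(
    "http://localhost:9998", b"%PDF-1.4 ...", "application/pdf"
)
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Sending the request would be a matter of passing it to urllib.request.urlopen; the commit itself keeps using requests.put with the same two headers.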
2 changes: 1 addition & 1 deletion scripts/Dockerfile_apache_tika
@@ -6,7 +6,7 @@ RUN adduser --system gazette && \
     apt-get clean

 # install Apache Tika
-RUN curl -o /tika-server.jar http://archive.apache.org/dist/tika/tika-server-1.24.1.jar && \
+RUN curl -o /tika-server.jar https://dlcdn.apache.org/tika/2.9.2/tika-server-standard-2.9.2.jar && \
     chmod 755 /tika-server.jar

 USER gazette
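
The new download URL repeats the version number in both the directory and the jar name. A small sketch of that pattern is below, in case the version needs to be bumped again. The helper name tika_server_url is hypothetical, and the pattern is inferred only from the single URL in this diff, so other versions may not be available at that location.

```python
def tika_server_url(version: str) -> str:
    """Build a tika-server-standard download URL for the given version.

    Assumption: the layout matches the one URL used in the Dockerfile change.
    """
    return (
        f"https://dlcdn.apache.org/tika/{version}/"
        f"tika-server-standard-{version}.jar"
    )


# Reproduces the URL introduced by this commit.
print(tika_server_url("2.9.2"))
```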