Unexpected C++ exception in tesseract::addAvailableLanguages #4364

Open
jobermayr opened this issue Dec 2, 2024 · 6 comments

@jobermayr

Current Behavior

$ LANG=en ls /usr/share/tessdata
ls: cannot access '/usr/share/tessdata': No such file or directory

#8 0x00007ffff7aad57b in std::filesystem::__cxx11::recursive_directory_iterator::recursive_directory_iterator
(this=0x7fffffffc2b0, __p=filesystem::path "/usr/share/tessdata/" = {...}, __options=(std::filesystem::directory_options::follow_directory_symlink | std::filesystem::directory_options::skip_permission_denied))
at /usr/include/c++/14/bits/fs_dir.h:514
#9 tesseract::addAvailableLanguages (datadir="/usr/share/tessdata/", langs=langs@entry=0x7fffffffc620) at src/api/baseapi.cpp:152
#10 0x00007ffff7aae2d0 in tesseract::TessBaseAPI::GetAvailableLanguagesAsVector (this=this@entry=0x7fffdc0021d8, langs=langs@entry=0x7fffffffc620) at src/api/baseapi.cpp:399

Expected Behavior

It doesn't crash.

Suggested Fix

https://github.com/tesseract-ocr/tesseract/blob/main/src/api/baseapi.cpp#L149:
Add a check here for whether the path exists, and return early if it does not.

tesseract -v

No response

Operating System

No response

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

zdenop added a commit to zdenop/tesseract that referenced this issue Dec 2, 2024
@stweil
Member

stweil commented Dec 2, 2024

There is also an exception if /usr/share/tessdata is a file. Just try touch /usr/share/tessdata and run the test again.

@stweil
Member

stweil commented Dec 2, 2024

@jobermayr, do you really get a crash? Or is it a C++ exception (which is not a crash)?

@zdenop
Contributor

zdenop commented Dec 2, 2024

@stweil: I get a crash on Windows (built with VS2017)

@jobermayr
Author

$ crow
terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
  what():  filesystem error: recursive directory iterator cannot open directory: No such file or directory [/usr/share/tessdata/]
Aborted (core dumped)
(gdb) bt
#0  0x00007ffff5a9a25c in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007ffff5a414b6 in raise () at /lib64/libc.so.6
#2  0x00007ffff5a2891a in abort () at /lib64/libc.so.6
#3  0x00007ffff5eadc4d in ??? () at /lib64/libstdc++.so.6
#4  0x00007ffff5ebf28c in ??? () at /lib64/libstdc++.so.6
#5  0x00007ffff5ead7f5 in std::terminate() () at /lib64/libstdc++.so.6
#6  0x00007ffff5ebf518 in __cxa_throw () at /lib64/libstdc++.so.6
#7  0x00007ffff5eb4947 in ??? () at /lib64/libstdc++.so.6
#8  0x00007ffff7aad57b in std::filesystem::__cxx11::recursive_directory_iterator::recursive_directory_iterator
    (this=0x7fffffffc290, __p=filesystem::path "/usr/share/tessdata/" = {...}, __options=(std::filesystem::directory_options::follow_directory_symlink | std::filesystem::directory_options::skip_permission_denied))
    at /usr/include/c++/14/bits/fs_dir.h:514
#9  tesseract::addAvailableLanguages (datadir="/usr/share/tessdata/", langs=langs@entry=0x7fffffffc600) at src/api/baseapi.cpp:152
#10 0x00007ffff7aae2d0 in tesseract::TessBaseAPI::GetAvailableLanguagesAsVector (this=this@entry=0x7fffdc0020c8, langs=langs@entry=0x7fffffffc600) at src/api/baseapi.cpp:399
#11 0x00005555555ac9f0 in Ocr::availableLanguages (this=0x7fffdc0020a0) at /usr/src/debug/crow-translate-v3.1.0/src/ocr/ocr.cpp:42
#12 SettingsDialog::SettingsDialog (this=<optimized out>, parent=<optimized out>) at /usr/src/debug/crow-translate-v3.1.0/src/settings/settingsdialog.cpp:64
#13 MainWindow::openSettings (this=0x7fffffffd6c0) at /usr/src/debug/crow-translate-v3.1.0/src/mainwindow.cpp:293
#14 0x00007ffff6535672 in ??? () at /lib64/libQt5Core.so.5
#15 0x00007ffff7299b22 in QAbstractButton::clicked(bool) () at /lib64/libQt5Widgets.so.5
#16 0x00007ffff7299dea in ??? () at /lib64/libQt5Widgets.so.5
#17 0x00007ffff729b665 in ??? () at /lib64/libQt5Widgets.so.5
#18 0x00007ffff729b894 in QAbstractButton::mouseReleaseEvent(QMouseEvent*) () at /lib64/libQt5Widgets.so.5
#19 0x00007ffff7395cda in QToolButton::mouseReleaseEvent(QMouseEvent*) () at /lib64/libQt5Widgets.so.5
#20 0x00007ffff71e6b58 in QWidget::event(QEvent*) () at /lib64/libQt5Widgets.so.5
#21 0x00007ffff71a522e in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib64/libQt5Widgets.so.5
#22 0x00007ffff71ad4ba in QApplication::notify(QObject*, QEvent*) () at /lib64/libQt5Widgets.so.5
#23 0x00007ffff64fc188 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib64/libQt5Core.so.5
#24 0x00007ffff71ab50e in QApplicationPrivate::sendMouseEvent(QWidget*, QMouseEvent*, QWidget*, QWidget*, QWidget**, QPointer<QWidget>&, bool, bool) () at /lib64/libQt5Widgets.so.5
#25 0x00007ffff72003b6 in ??? () at /lib64/libQt5Widgets.so.5
#26 0x00007ffff7203a3f in ??? () at /lib64/libQt5Widgets.so.5
#27 0x00007ffff71a522e in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib64/libQt5Widgets.so.5
#28 0x00007ffff64fc188 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib64/libQt5Core.so.5
#29 0x00007ffff6982833 in QGuiApplicationPrivate::processMouseEvent(QWindowSystemInterfacePrivate::MouseEvent*) () at /lib64/libQt5Gui.so.5
#30 0x00007ffff69538cc in QWindowSystemInterface::sendWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Gui.so.5
#31 0x00007ffff555f010 in ??? () at /lib64/libQt5WaylandClient.so.5
#32 0x00007ffff4510eb8 in ??? () at /lib64/libglib-2.0.so.0
#33 0x00007ffff4512ca8 in ??? () at /lib64/libglib-2.0.so.0
#34 0x00007ffff45134bc in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#35 0x00007ffff6556f79 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Core.so.5
#36 0x00007ffff64fab82 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Core.so.5
#37 0x00007ffff65032be in QCoreApplication::exec() () at /lib64/libQt5Core.so.5
#38 0x00005555555eb33c in launchGui(int, char**) [clone .constprop.0] (argv=<optimized out>, argc=<optimized out>) at /usr/src/debug/crow-translate-v3.1.0/src/main.cpp:77
#39 0x00007ffff5a2a2ae in __libc_start_call_main () at /lib64/libc.so.6
#40 0x00007ffff5a2a379 in __libc_start_main_impl () at /lib64/libc.so.6
#41 0x00005555555926c5 in _start () at ../sysdeps/x86_64/start.S:115

Why should I touch a file/directory that is not installed?

$ zypper se -i tesseract
Loading repository data...
Reading installed packages...

S  | Name                              | Summary                                     | Type
---+-----------------------------------+---------------------------------------------+--------
i  | libtesseract5                     | Open Source OCR Engine                      | package
i  | libtesseract5-debuginfo           | Debug information for package libtesseract5 | package
i  | libtesseract5-x86-64-v3           | Open Source OCR Engine                      | package
i+ | libtesseract5-x86-64-v3-debuginfo | Debug information for package libtesseract5 | package
i  | tesseract-ocr-debugsource         | Debug sources for package tesseract-ocr     | package

$ rpm -ql libtesseract5
/usr/lib64/libtesseract.so.5
/usr/lib64/libtesseract.so.5.0.5
/usr/share/licenses/libtesseract5
/usr/share/licenses/libtesseract5/LICENSE

$ rpm -ql libtesseract5-x86-64-v3
/usr/lib64/glibc-hwcaps/x86-64-v3/libtesseract.so.5
/usr/lib64/glibc-hwcaps/x86-64-v3/libtesseract.so.5.0.5

There are also tesseract-data and some tesseract-ocr* packages, but thankfully they aren't hard dependencies.

Rather, libtesseract has to react correctly and must not crash if they aren't installed (correct error handling!).

@stweil
Member

stweil commented Dec 2, 2024

$ crow
terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
what(): filesystem error: recursive directory iterator cannot open directory: No such file or directory [/usr/share/tessdata/]
Aborted (core dumped)

So this is a C++ exception, not a crash. I'll update the issue title.

stweil changed the title from "Crash in tesseract::addAvailableLanguages" to "Unexpected C++ exception in tesseract::addAvailableLanguages" on Dec 2, 2024
@stweil
Member

stweil commented Dec 2, 2024

Why should I touch a file/directory that is not installed?

There are two different issues with similar results:

  1. There is an exception if there is neither a directory nor a file for tessdata.
  2. There is an exception if there is a file instead of a directory for tessdata.

The touch command is one way to test the 2nd case.

zdenop added a commit to zdenop/tesseract that referenced this issue Dec 18, 2024