The ability to save the scanned OCR book as a PDF or Word document, not just a text file #111

Open
DraganRatkovich opened this issue Feb 27, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@DraganRatkovich
Collaborator

DraganRatkovich commented Feb 27, 2022

Bookworm currently lets the user save a scanned book only as a plain text file. This is inconvenient in some cases, since Word documents and PDF files are widely used.

Describe alternatives you've considered

Allow the user to save the scanned book in either PDF or Microsoft Word format, which in turn gives more options for editing the resulting file in word processing programs.
This can be done in either of the following ways:

  • Add .pdf and .docx alongside .txt in the "Save As" dialog box, so the user can choose among the available file formats;
  • Create an "Export As" submenu in the File menu containing the three formats (.txt, .docx and .pdf), so the user can quickly pick a format, enter a file name, and save.
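
A rough sketch of what the .docx side of either option might involve, assuming the OCR result is available as a list of page strings (an assumption; Bookworm's actual internals may differ). The helper names `pages_to_document_xml` and `save_pages_as_docx` are hypothetical. The sketch uses only the Python standard library to write a minimal WordprocessingML package, with one paragraph per recognized line and a page break between pages:

```python
import zipfile
from xml.sax.saxutils import escape

# Fixed parts every minimal .docx package needs.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PAGE_BREAK = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'


def pages_to_document_xml(pages):
    """Build word/document.xml from a list of page strings (hypothetical input)."""
    body = []
    for index, page in enumerate(pages):
        if index > 0:
            # Start each page after the first on a new Word page.
            body.append(PAGE_BREAK)
        for line in page.splitlines():
            # One paragraph per OCR line; escape &, < and > for XML.
            body.append(
                '<w:p><w:r><w:t xml:space="preserve">'
                + escape(line)
                + '</w:t></w:r></w:p>'
            )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="' + W_NS + '"><w:body>'
        + "".join(body)
        + '</w:body></w:document>'
    )


def save_pages_as_docx(pages, path):
    """Write the pages to `path` as a minimal .docx (a zip of XML parts)."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", pages_to_document_xml(pages))
```

For example, `save_pages_as_docx(["Page one text", "Page two text"], "book.docx")` would produce a two-page document. No styles, headings, or formatting are written, since the OCR output carries none.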

@mush42 Let me know your thoughts on whether this is possible.

@mush42
Collaborator

mush42 commented Feb 27, 2022

@DraganRatkovich
It is possible, of course.
But I can't see any benefit of those two formats over plain text.
No structure information is extracted from the document except pages and lines: no headings, no paragraphs, and no formatting information.
You can copy the text from the text file and paste it into Word, and Word will restore paging and lines.
Best
Musharraf

@DraganRatkovich
Collaborator Author

DraganRatkovich commented Feb 27, 2022

@mush42 Of course, but the main advantage of saving directly as PDF or DOCX is time. Processing the contents of the extracted text file in Microsoft Word can take a long time, especially if the scanned book contains more than 300 pages.

DraganRatkovich added the enhancement (New feature or request) label Jun 14, 2022
@pauliyobo
Collaborator

Hello.
One year later: is this feature still desired? If yes, @DraganRatkovich, would you mind explaining why?
I did read the previous comment. However, note that even if we saved the text as a PDF, you would not retain any structure from the original image.
IIRC, what you get now in the scanned file's output is at most the page number. Is that correct?

pauliyobo closed this as not planned Dec 28, 2024
@DraganRatkovich
Collaborator Author

@pauliyobo This is the second time I've noted this: please don't close an issue that was raised due to community interest and on which there is still no progress.

@pauliyobo
Collaborator

@DraganRatkovich
Hello,
I had closed this for reasons similar to #151.
I will leave this open in case people are still interested in it, though it would be interesting to know the motivation behind this proposal.
Also, I think it's best if we all avoid closing and reopening issues without a compelling reason, me included. I hadn't actually commented with a closing reason on this one, so this one is on me.
Thanks.
