Ability to listen to audiobooks #98

Open
DraganRatkovich opened this issue Feb 13, 2022 · 16 comments
Labels
enhancement New feature or request

Comments

@DraganRatkovich
Collaborator

Is your feature request related to a problem? Please describe.

No

Describe the solution you'd like

I wonder what the community thinks about adding the ability to listen to audiobooks in this program. I know that any media player can play an audiobook, but listening with the ability to save bookmarks for quickly jumping to the right section, take notes, and so on would probably be amazing. Opinions from others are very welcome.

Describe alternatives you've considered

Additional context

It would be great to hear from the developers @mush42 @MichelSuch @cary-rowen @pauliyobo about this feature.

@mush42
Collaborator

mush42 commented Feb 13, 2022

Hello @DraganRatkovich

The first question that pops into mind is whether there is any standard format for audio books.
If so, we can contemplate adding support for it the same way we have support for other document types.

Technically speaking, it is very easy to support playing audio in Bookworm. But this is out of scope for an eBook reader.

Another related feature is support for EPUB 3 media overlays, but that is hampered by the lack of adoption of this feature by publishers.

Best

Musharraf

@DraganRatkovich
Collaborator Author

Good day @mush42
No, there are no special standards for audiobooks, unless the bookseller wants to encrypt them and convert them to another format, such as .lkf. Currently, the most popular audiobook formats are standard media files, .mp3 and .wav, and to my knowledge, Python can easily open these formats.

The biggest reason for requesting this feature was Bookworm's amazing abilities, like opening a wide range of formats such as .pdf, .doc, and many more. I was really shocked when Bookworm opened a 1000-page document in a matter of seconds. I'm still not sure what you guys did there, but you are amazing, and Adobe Acrobat has been removed from my system for this reason 🙂. My shock increased when I used your new OCR on a Georgian-language document made of images combined into a .pdf, and Bookworm rendered it amazingly. This program is the first PC software that can read scanned Georgian PDF files in an accessible way, so thanks for that.

Now, in terms of audiobooks: if this feature is added, Bookworm will become the number one program in the blind community, at least in Georgia. It could be imagined as a library in a computer that helps people read, scan, or listen to audiobooks in a convenient way.
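
On the point that Python can open these formats: a minimal sketch, assuming an uncompressed .wav file and using only the standard library's `wave` module. The helper name `wav_duration_seconds` is made up for illustration and is not part of Bookworm. The standard library has no MP3 decoder, so .mp3 files would need a third-party library.

```python
import io
import wave


def wav_duration_seconds(source):
    """Return the length of a WAV file in seconds.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with wave.open(source, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds, rate=8000):
    """Build a small in-memory mono WAV of silence, for trying the helper."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))
    buffer.seek(0)
    return buffer
```

Knowing the duration is the kind of information a player would need before it can offer seeking or position-based bookmarks.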

@TheQuinbox
Contributor

@mush42, we can support them the same way Voice Dream Reader does, I think. Just take in either an MP3/wav file, or a zip file of audio files that get turned into their own chapters.
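
A rough sketch of the zip idea, assuming that every .mp3 or .wav entry in the archive is one chapter and that file names sort into reading order (for example, zero-padded numbers). The function name `list_chapters` and the extension set are assumptions for illustration; this is not how Voice Dream Reader or Bookworm actually does it.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: only the formats named in this thread count as chapters.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def list_chapters(archive):
    """Return the audio entries of a zip archive, sorted by name.

    `archive` may be a path or a binary file-like object.
    Each returned name would become one chapter of the audiobook.
    """
    with zipfile.ZipFile(archive) as zf:
        names = [
            info.filename
            for info in zf.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() in AUDIO_EXTENSIONS
        ]
    # Plain lexicographic sort: "10.mp3" would sort before "2.mp3",
    # so this relies on zero-padded file names.
    return sorted(names)
```

Non-audio entries such as cover images are simply ignored here. A real implementation would also need a rule for archives whose file names do not sort cleanly.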

@pauliyobo
Collaborator

My only concern is how we would be able to keep this consistent.
If we allowed any MP3/WAV file to be opened within Bookworm, it would basically become a media player. Is that what @DraganRatkovich is essentially proposing?
Also, I am not sure how we would be able to easily represent the book. Using zip files could be an idea, but what if the entire book is in a single file? How do we make sure that what is being opened is an audiobook and not, say, a simple audio clip?
If there's anything I've missed, I would be more than happy to receive pointers to it.

@mush42
Collaborator

mush42 commented Feb 16, 2022

@TheQuinbox
Should we enable the same tools we have for textual documents?
I.e. bookmarking, note-taking, etc.

If yes, what would that look like?

@TheQuinbox
Contributor

@mush42, I'm not sure exactly how the bookmarking code works, but I'd assume that it stores an index in the text? We could store the position that the user was at in the audio file.
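
To make that concrete, a minimal sketch of what a position-based bookmark could look like, assuming a bookmark is just a chapter index plus a playback offset. The class and function names are hypothetical and are not Bookworm's actual bookmark model.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AudioBookmark:
    """Hypothetical audio counterpart of a text bookmark."""

    chapter: int       # index of the audio file within the book
    position_ms: int   # playback offset inside that file, in milliseconds
    title: str = ""    # optional label or note entered by the user


def dump_bookmarks(bookmarks):
    """Serialize bookmarks to a JSON string, e.g. for storing next to the book."""
    return json.dumps([asdict(bookmark) for bookmark in bookmarks])


def load_bookmarks(text):
    """Rebuild bookmarks from the JSON produced by dump_bookmarks."""
    return [AudioBookmark(**item) for item in json.loads(text)]
```

Where a text bookmark stores a character index, this stores a time offset. Jumping to a bookmark would then mean opening the given chapter and seeking to `position_ms`.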

@mush42
Collaborator

mush42 commented Feb 16, 2022

@TheQuinbox Yes, that's exactly how it is done.

Still, the issues raised by @pauliyobo need to be resolved before we embark on this.

Best

@DraganRatkovich
Collaborator Author

Good day to all,
Sorry for the late reply. I have seen all the comments left on this feature. To be honest, I don't know how Bookworm could be prevented from playing other media files so that it plays only audiobooks. As @pauliyobo said, music files could easily be opened with Bookworm, and I'm not sure whether there is any way to prevent this. On the other hand, there are special programs that allow you to listen to audiobooks but do not interfere with the playback of other media files. Simply put, it should be up to the user whether to listen to an audiobook or a music file, by opening it as an audiobook. If you have any ideas or know other ways, I'm very happy to discuss, as I'm also very interested.

@iuriguilherme

What about converting the audiobook to a text book first, then processing it as usual?

This isn't as simple as just sending audio to a speech-to-text engine, but I think it would be the most straightforward way to accomplish the given task using all the features readily available now.

This is theoretically similar to performing OCR on an image to turn it into text, but in practice it would depend on how the audiobook defines its sections, how it reads its titles, etc. Some of them have little soundtracks to mark the end of sections, others use the narrator's voice, etc.

@DraganRatkovich
Collaborator Author

Hello @iuriguilherme

Could you give more technical details on what exactly you mean by converting an audiobook to text book format? I can understand the DAISY format, which is basically .mp3 files with extra HTML and other data to handle sections, chapters, etc. properly, but I can't understand how it is possible to convert an mp3 file recorded in a studio to a text file. I know of several encryption methods for preserving the digital rights of audiobooks, but they are completely different things.

@iuriguilherme

iuriguilherme commented Apr 2, 2022

Could you give more technical details on what exactly do you mean by converting audiobook to text book format?

Through the use of a speech-to-text engine (STT).

EDIT: an example using google stt: https://github.com/googleapis/python-speech/blob/main/samples/snippets/quickstart.py

@DraganRatkovich
Collaborator Author

@iuriguilherme Then what's the point of integrating an audiobook listening feature into Bookworm if the audiobook needs to be converted into a text book?

Or do you mean text-to-speech technology instead of STT? Do you have examples of the types of books mentioned?

@iuriguilherme

No, I mean exactly converting speech to text, because that is what allows all the processing BEFORE you use text-to-speech (TTS) to interact with the user.

I use this approach with hearing/speaking robots. In fact, every talking robot you see out there is essentially a chatbot that converts what it hears to text, processes it using neural models, then converts the text reply to speech so the robot can answer.

@DraganRatkovich
Collaborator Author

@iuriguilherme Sounds really interesting. Perhaps the lead developer might consider this as I don't really have any knowledge of programming or other development.

@iuriguilherme

I don't have good knowledge of how to programmatically use Cortana (the Windows built-in STT engine); I only know the cloud API services. But the logic is what I described.

@Lucas18503

In terms of a standard audiobook format, M4B comes to mind. It is at least something that I would like to see supported if this feature gets added, given that the format has built-in chapter tags that most modern media players support.

@DraganRatkovich DraganRatkovich added the enhancement New feature or request label Jun 14, 2022

6 participants