Join our Discord server for any questions or discussions.
This project provides a command-line tool to convert EPUB ebooks into audiobooks. It supports the Microsoft Azure Text-to-Speech API (or, alternatively, Edge TTS) and the OpenAI Text-to-Speech API to generate the audio for each chapter in the ebook. The output audio files are optimized for use with Audiobookshelf.
This project is developed with the help of ChatGPT.
If you're interested in hearing a sample of the audiobook generated by this tool, check the links below.
- Azure TTS Sample
- OpenAI TTS Sample
- Edge TTS Sample: the voice is almost the same as Azure TTS
- Piper TTS
- Python 3.6+ or Docker
- For using Azure TTS, a Microsoft Azure account with access to the Microsoft Cognitive Services Speech Services is required.
- For using OpenAI TTS, an OpenAI API key is required.
- For using Edge TTS, no API key is required.
- For using Piper TTS, the Piper TTS executable and its model files are required.
The audiobooks generated by this project are optimized for use with Audiobookshelf. Each chapter in the EPUB file is converted into a separate MP3 file, with the chapter title extracted and included as metadata.
Parsing and extracting chapter titles from EPUB files can be challenging, as the format and structure may vary significantly between different ebooks. The script employs a simple but effective method for extracting chapter titles, which works for most EPUB files. The method involves parsing the EPUB file and looking for the title tag in the HTML content of each chapter. If the title tag is not present, a fallback title is generated using the first few words of the chapter text.
Please note that this approach may not work perfectly for all EPUB files, especially those with complex or unusual formatting. However, in most cases, it provides a reliable way to extract chapter titles for use in Audiobookshelf.
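The title lookup described above can be sketched as follows. This is a minimal illustration built on the standard library's html.parser, not the project's actual code; the names chapter_title and fallback_words are hypothetical, and the six-word fallback length is an assumption.

```python
from html.parser import HTMLParser


class _TitleAndText(HTMLParser):
    """Collects the <title> text and the remaining text of one chapter."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def chapter_title(html, fallback_words=6):
    """Return the <title> text, or the first few words of the chapter text.

    fallback_words is a hypothetical parameter chosen for this sketch.
    """
    parser = _TitleAndText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so an indented or multi-line title reads cleanly.
    title = " ".join("".join(parser.title_parts).split())
    if title:
        return title
    words = " ".join(parser.text_parts).split()
    return " ".join(words[:fallback_words])
```

A chapter whose title tag is missing or empty falls through to the second branch, which is where unusual EPUB formatting tends to produce less useful titles.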
When you import the generated MP3 files into Audiobookshelf, the chapter titles will be displayed, making it easy to navigate between chapters and enhancing your listening experience.
- Clone this repository:
git clone https://github.com/p0n1/epub_to_audiobook.git
cd epub_to_audiobook
- Create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate
- Install the required dependencies:
pip install -r requirements.txt
- Set the following environment variables with your Azure Text-to-Speech API credentials, or your OpenAI API key if you're using OpenAI TTS:
export MS_TTS_KEY=<your_subscription_key> # for Azure
export MS_TTS_REGION=<your_region> # for Azure
export OPENAI_API_KEY=<your_openai_api_key> # for OpenAI
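Before starting a long conversion, it can help to confirm that the variables for the chosen provider are actually set. The following is a small sketch, not part of the tool; the helper name missing_env_vars and the REQUIRED_ENV table are hypothetical, and the empty entry for Piper assumes it needs no credentials because it runs a local executable.

```python
import os

# Variables each provider needs, as listed in the setup step.
# Edge TTS needs no API key; the empty list for Piper is an assumption.
REQUIRED_ENV = {
    "azure": ["MS_TTS_KEY", "MS_TTS_REGION"],
    "openai": ["OPENAI_API_KEY"],
    "edge": [],
    "piper": [],
}


def missing_env_vars(provider, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        missing = missing_env_vars(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```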
To convert an EPUB ebook to an audiobook, run the following command, specifying the TTS provider of your choice with the --tts option:
python3 main.py <input_file> <output_folder> [options]
To check the latest option descriptions for this script, you can run the following command in the terminal:
python3 main.py -h
usage: main.py [-h] [--tts {azure,openai,edge,piper}]
[--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--preview]
[--no_prompt] [--language LANGUAGE]
[--newline_mode {single,double,none}]
[--title_mode {auto,tag_text,first_few}]
[--chapter_start CHAPTER_START] [--chapter_end CHAPTER_END]
[--output_text] [--remove_endnotes]
[--search_and_replace_file SEARCH_AND_REPLACE_FILE]
[--voice_name VOICE_NAME] [--output_format OUTPUT_FORMAT]
[--model_name MODEL_NAME] [--voice_rate VOICE_RATE]
[--voice_volume VOICE_VOLUME] [--voice_pitch VOICE_PITCH]
[--proxy PROXY] [--break_duration BREAK_DURATION]
[--piper_path PIPER_PATH] [--piper_speaker PIPER_SPEAKER]
[--piper_sentence_silence PIPER_SENTENCE_SILENCE]
[--piper_length_scale PIPER_LENGTH_SCALE]
input_file output_folder
Convert text book to audiobook
positional arguments:
input_file Path to the EPUB file
output_folder Path to the output folder
options:
-h, --help show this help message and exit
--tts {azure,openai,edge,piper}
Choose TTS provider (default: azure). azure: Azure
Cognitive Services, openai: OpenAI TTS API. When using
azure, environment variables MS_TTS_KEY and
MS_TTS_REGION must be set. When using openai,
environment variable OPENAI_API_KEY must be set.
--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Log level (default: INFO), can be DEBUG, INFO,
WARNING, ERROR, CRITICAL
--preview Enable preview mode. In preview mode, the script will
not convert the text to speech. Instead, it will print
the chapter index, titles, and character counts.
--no_prompt Don't ask the user if they wish to continue after
estimating the cloud cost for TTS. Useful for
scripting.
--language LANGUAGE Language for the text-to-speech service (default: en-
US). For Azure TTS (--tts=azure), check
https://learn.microsoft.com/en-us/azure/ai-
services/speech-service/language-
support?tabs=tts#text-to-speech for supported
languages. For OpenAI TTS (--tts=openai), their API
detects the language automatically. But setting this
will also help on splitting the text into chunks with
different strategies in this tool, especially for
Chinese characters. For Chinese books, use zh-CN, zh-
TW, or zh-HK.
--newline_mode {single,double,none}
Choose the mode of detecting new paragraphs: 'single',
'double', or 'none'. 'single' means a single newline
character, while 'double' means two consecutive
newline characters. 'none' means all newline
characters will be replace with blank so paragraphs
will not be detected. (default: double, works for most
ebooks but will detect less paragraphs for some
ebooks)
--title_mode {auto,tag_text,first_few}
Choose the parse mode for chapter title, 'tag_text'
search 'title','h1','h2','h3' tag for title,
'first_few' set first 60 characters as title, 'auto'
auto apply the best mode for current chapter.
--chapter_start CHAPTER_START
Chapter start index (default: 1, starting from 1)
--chapter_end CHAPTER_END
Chapter end index (default: -1, meaning to the last
chapter)
--output_text Enable Output Text. This will export a plain text file
for each chapter specified and write the files to the
output folder specified.
--remove_endnotes This will remove endnote numbers from the end or
middle of sentences. This is useful for academic
books.
--search_and_replace_file SEARCH_AND_REPLACE_FILE
Path to a file that contains 1 regex replace per line,
to help with fixing pronunciations, etc. The format
is: <search>==<replace> Note that you may have to
specify word boundaries, to avoid replacing parts of
words.
--voice_name VOICE_NAME
Various TTS providers has different voice names, look
up for your provider settings.
--output_format OUTPUT_FORMAT
Output format for the text-to-speech service.
Supported format depends on selected TTS provider
--model_name MODEL_NAME
Various TTS providers has different neural model names
edge specific:
--voice_rate VOICE_RATE
Speaking rate of the text. Valid relative values range
from -50%(--xxx='-50%') to +100%. For negative value
use format --arg=value,
--voice_volume VOICE_VOLUME
Volume level of the speaking voice. Valid relative
values floor to -100%. For negative value use format
--arg=value,
--voice_pitch VOICE_PITCH
Baseline pitch for the text.Valid relative values like
-80Hz,+50Hz, pitch changes should be within 0.5 to 1.5
times the original audio. For negative value use
format --arg=value,
--proxy PROXY Proxy server for the TTS provider. Format:
http://[username:password@]proxy.server:port
azure/edge specific:
--break_duration BREAK_DURATION
Break duration in milliseconds for the different
paragraphs or sections (default: 1250, means 1.25 s).
Valid values range from 0 to 5000 milliseconds for
Azure TTS.
piper specific:
--piper_path PIPER_PATH
Path to the Piper TTS executable
--piper_speaker PIPER_SPEAKER
Piper speaker id, used for multi-speaker models
--piper_sentence_silence PIPER_SENTENCE_SILENCE
Seconds of silence after each sentence
--piper_length_scale PIPER_LENGTH_SCALE
Phoneme length, a.k.a. speaking rate
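To make the --newline_mode choices in the help text more concrete, here is a rough sketch of how the three modes could split text into paragraphs. It follows the wording of the help text only; the function name split_paragraphs is hypothetical and the tool's real implementation may differ.

```python
import re


def split_paragraphs(text, newline_mode="double"):
    """Split text into paragraphs according to a newline mode.

    single: every newline starts a new paragraph.
    double: only two or more consecutive newlines start a new paragraph.
    none:   newlines are replaced with blanks, so there is one paragraph.
    """
    if newline_mode == "single":
        parts = re.split(r"\n+", text)
    elif newline_mode == "double":
        parts = re.split(r"\n{2,}", text)
    elif newline_mode == "none":
        parts = [text.replace("\n", " ")]
    else:
        raise ValueError(f"unknown newline mode: {newline_mode!r}")
    # Drop empty pieces left over from leading or trailing newlines.
    return [part.strip() for part in parts if part.strip()]
```

With the default double mode, a book that separates paragraphs with a single newline yields fewer, longer paragraphs, which matches the caveat in the help text.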
Example:
python3 main.py examples/The_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder
Executing the above command will generate a directory named output_folder and save the MP3 files for each chapter inside it, using the default TTS provider and voice. Once generated, you can import these audio files into Audiobookshelf or play them with any audio player of your choice.
Before converting your EPUB file to an audiobook, you can use the --preview option to get a summary of each chapter. This will provide you with the character count of each chapter and the total count, instead of converting the text to speech.
Example:
python3 main.py examples/The_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder --preview
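The kind of summary that preview mode produces can be approximated in a few lines. This is only an illustration of the idea (chapter index, title, character count, and a total); the function preview_summary and its output layout are hypothetical and do not reproduce the tool's actual output.

```python
def preview_summary(chapters):
    """Summarize chapters given as (title, text) pairs.

    Returns the per-chapter lines and the total character count.
    """
    lines = []
    total = 0
    for index, (title, text) in enumerate(chapters, start=1):
        count = len(text)
        total += count
        lines.append(f"{index:>3}  {title}  ({count} characters)")
    return lines, total


if __name__ == "__main__":
    demo = [("Chapter I", "I was born in the year 1632."), ("Chapter II", "A short one.")]
    lines, total = preview_summary(demo)
    print("\n".join(lines))
    print(f"Total: {total} characters")
```

Since cloud TTS providers typically bill by character, the total is the number to look at before committing to a full conversion.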
You may want to search and replace text, either to expand abbreviations, or to help with pronunciation. You can do this by specifying a search and replace file, which contains a single regex search and replace per line, separated by '==':
Example:
search.conf:
# this is the general structure
<search>==<replace>
# this is a comment
# fix cardinal direction abbreviations
N\.E\.==north east
# be careful with your regexes, as this would also match Sally N. Smith
N\.==north
# pronounce Barbadoes like the locals
Barbadoes==Barbayduss
python3 main.py examples/The_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder --search_and_replace_file search.conf
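A file in this format could be read and applied roughly as follows. This sketch assumes that blank lines and lines starting with # are skipped and that each rule is split at the first ==; the helper names load_rules and apply_rules are hypothetical, and the tool's own parsing may differ in details.

```python
import re


def load_rules(lines):
    """Parse '<search>==<replace>' lines into (compiled_regex, replacement) pairs."""
    rules = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        search, separator, replacement = line.partition("==")
        if not separator:
            continue  # not a rule line
        rules.append((re.compile(search), replacement))
    return rules


def apply_rules(text, rules):
    """Apply each rule in file order; earlier rules run first."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    rules = load_rules([
        "# fix cardinal direction abbreviations",
        r"N\.E\.==north east",
        r"\bBarbadoes\b==Barbayduss",
    ])
    print(apply_rules("We sailed N.E. to Barbadoes.", rules))
```

Because rules run in order, the more specific pattern (N\.E\.) should come before the more general one (N\.), as in the example file above.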
This tool is available as a Docker image, making it easy to run without needing to manage Python dependencies.
First, make sure you have Docker installed on your system.
You can pull the Docker image from the GitHub Container Registry:
docker pull ghcr.io/p0n1/epub_to_audiobook:latest
Then, you can run the tool with the following command:
docker run -i -t --rm -v ./:/app -e MS_TTS_KEY=$MS_TTS_KEY -e MS_TTS_REGION=$MS_TTS_REGION ghcr.io/p0n1/epub_to_audiobook your_book.epub audiobook_output --tts azure
For OpenAI, you can run:
docker run -i -t --rm -v ./:/app -e OPENAI_API_KEY=$OPENAI_API_KEY ghcr.io/p0n1/epub_to_audiobook your_book.epub audiobook_output --tts openai
Replace $MS_TTS_KEY and $MS_TTS_REGION with your Azure Text-to-Speech API credentials. Replace $OPENAI_API_KEY with your OpenAI API key. Replace your_book.epub with the name of the input EPUB file, and audiobook_output with the name of the directory where you want to save the output files.
The -v ./:/app option mounts the current directory (.) to the /app directory in the Docker container. This allows the tool to read the input file and write the output files to your local file system.
The -i and -t options are required to enable interactive mode and allocate a pseudo-TTY.
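If you launch the container from a script, the pieces of the command above can be assembled as an argument list. This is a sketch under a few assumptions: the helper docker_command is hypothetical, it passes variables with the "-e NAME" form (which forwards the value from the calling environment) instead of "-e NAME=value", and it mounts an absolute path so the result does not depend on how a given Docker version treats relative host paths. The command is only built here, not executed.

```python
import os


def docker_command(book, output_dir, tts, env_names, workdir="."):
    """Build the docker run argument list for one conversion."""
    # Mount the working directory at /app, as the -v option above does.
    mount = f"{os.path.abspath(workdir)}:/app"
    cmd = ["docker", "run", "-i", "-t", "--rm", "-v", mount]
    for name in env_names:
        # "-e NAME" without a value forwards NAME from the caller's environment.
        cmd += ["-e", name]
    cmd += ["ghcr.io/p0n1/epub_to_audiobook", book, output_dir, "--tts", tts]
    return cmd


if __name__ == "__main__":
    print(" ".join(docker_command(
        "your_book.epub", "audiobook_output", "azure",
        ["MS_TTS_KEY", "MS_TTS_REGION"],
    )))
```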
You can also check this example config file for Docker Compose usage.
For Windows users, especially if you're not very familiar with command-line tools, we've got you covered. We understand the challenges and have created a guide specifically tailored for you.
Check this step-by-step guide and leave a message if you encounter issues.
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Check https://platform.openai.com/docs/quickstart/account-setup. Make sure you check the price details before use.
Edge TTS and Azure TTS are almost the same. The difference is that Edge TTS doesn't require an API key, because it's based on the Edge read aloud functionality, and some parameters, such as custom SSML, are restricted.
Check https://gist.github.com/BettyJJ/17cbaa1de96235a7f5773b8690a20462 for supported voices.
If you want to try this project quickly, Edge TTS is highly recommended.
You can customize the voice and language used for the Text-to-Speech conversion by passing the --voice_name and --language options when running the script.
Microsoft Azure offers a range of voices and languages for the Text-to-Speech service. For a list of available options, consult the Microsoft Azure Text-to-Speech documentation.
You can also listen to samples of the available voices in the Azure TTS Voice Gallery to help you choose the best voice for your audiobook.
For example, if you want to use a British English female voice for the conversion, you can use the following command:
python3 main.py <input_file> <output_folder> --voice_name en-GB-LibbyNeural --language en-GB
For OpenAI TTS, you can specify the model, voice, and format options using --model_name, --voice_name, and --output_format, respectively.
Here are some examples that demonstrate various option combinations:
- Basic conversion using Azure with default settings
This command will convert an EPUB file to an audiobook using Azure's default TTS settings.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts azure
- Azure conversion with custom language, voice and logging level
Converts an EPUB file to an audiobook with a specified voice and a custom log level for debugging purposes.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts azure --language zh-CN --voice_name "zh-CN-YunyeNeural" --log DEBUG
- Azure conversion with chapter range and break duration
Converts a specified range of chapters from an EPUB file to an audiobook with custom break duration between paragraphs.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts azure --chapter_start 5 --chapter_end 10 --break_duration "1500"
- Basic conversion using OpenAI with default settings
This command will convert an EPUB file to an audiobook using OpenAI's default TTS settings.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts openai
- OpenAI conversion with HD model and specific voice
Converts an EPUB file to an audiobook using the high-definition OpenAI model and a specific voice choice.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts openai --model_name "tts-1-hd" --voice_name "fable"
- OpenAI conversion with preview and text output
Enables preview mode and text output, which will display the chapter index and titles instead of converting them and will also export the text.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts openai --preview --output_text
- Basic conversion using Edge with default settings
This command will convert an EPUB file to an audiobook using Edge's default TTS settings.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts edge
- Edge conversion with custom language, voice and logging level
Converts an EPUB file to an audiobook with a specified voice and a custom log level for debugging purposes.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts edge --language zh-CN --voice_name "zh-CN-YunxiNeural" --log DEBUG
- Edge conversion with chapter range and break duration
Converts a specified range of chapters from an EPUB file to an audiobook with custom break duration between paragraphs.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts edge --chapter_start 5 --chapter_end 10 --break_duration "1500"
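The chapter range and break duration options used in the examples above follow the conventions stated in the help text: chapter indexes start at 1, a chapter_end of -1 means "up to the last chapter", and Azure accepts break durations from 0 to 5000 milliseconds. The sketch below illustrates those conventions. The helper names are hypothetical, and treating chapter_end as inclusive is an assumption.

```python
def select_chapters(chapters, chapter_start=1, chapter_end=-1):
    """Return the chapters in the 1-based range [chapter_start, chapter_end].

    chapter_end=-1 selects through the last chapter. The end index is
    treated as inclusive here, which is an assumption of this sketch.
    """
    if chapter_end == -1:
        chapter_end = len(chapters)
    if not 1 <= chapter_start <= chapter_end <= len(chapters):
        raise ValueError(
            f"invalid chapter range {chapter_start}..{chapter_end} "
            f"for {len(chapters)} chapters"
        )
    return chapters[chapter_start - 1:chapter_end]


def check_break_duration(value, maximum=5000):
    """Validate a break duration in milliseconds (0 to 5000 for Azure TTS)."""
    milliseconds = int(value)
    if not 0 <= milliseconds <= maximum:
        raise ValueError(f"break duration must be between 0 and {maximum} ms")
    return milliseconds
```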
Make sure you have installed Piper TTS and have an onnx model file and corresponding config file. Check Piper TTS for more details. You can follow their instructions to install Piper TTS, download the models and config files, play with it and then come back to try the examples below.
This command will convert an EPUB file to an audiobook with Piper TTS using the bare minimum of parameters. You always need to specify an onnx model file, and the piper executable needs to be in the current $PATH.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts piper --model_name <path_to>/en_US-libritts_r-medium.onnx
You can specify a custom path to the piper executable by using the --piper_path parameter.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts piper --model_name <path_to>/en_US-libritts_r-medium.onnx --piper_path <path_to>/piper
Some models support multiple voices. You can select one by passing its speaker id with the --piper_speaker parameter.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts piper --model_name <path_to>/en_US-libritts_r-medium.onnx --piper_speaker 256
You can also specify the speaking rate (--piper_length_scale) and the pause after each sentence (--piper_sentence_silence).
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts piper --model_name <path_to>/en_US-libritts_r-medium.onnx --piper_speaker 256 --piper_length_scale 1.5 --piper_sentence_silence 0.5
Piper TTS outputs wav format files (or raw) by default. You should be able to specify any reasonable format via the --output_format parameter. opus and mp3 are good choices for size and compatibility.
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts piper --model_name <path_to>/en_US-libritts_r-medium.onnx --piper_speaker 256 --piper_length_scale 1.5 --piper_sentence_silence 0.5 --output_format opus
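For orientation, the Piper options above plausibly map onto arguments of the piper executable itself. The sketch below builds such an argument list without running it. The flag names (--model, --output_file, --speaker, --length_scale, --sentence_silence) reflect my understanding of Piper's command-line interface and should be checked against the Piper TTS documentation; the helper piper_command is hypothetical and is not how this tool is known to invoke Piper.

```python
def piper_command(model, output_file, piper_path="piper",
                  speaker=None, length_scale=None, sentence_silence=None):
    """Build an argument list for the piper executable (not executed here)."""
    cmd = [piper_path, "--model", model, "--output_file", output_file]
    if speaker is not None:
        cmd += ["--speaker", str(speaker)]  # speaker id for multi-speaker models
    if length_scale is not None:
        cmd += ["--length_scale", str(length_scale)]  # larger values speak slower
    if sentence_silence is not None:
        cmd += ["--sentence_silence", str(sentence_silence)]  # seconds of silence
    return cmd


if __name__ == "__main__":
    print(piper_command("en_US-libritts_r-medium.onnx", "chapter_1.wav",
                        speaker=256, length_scale=1.5, sentence_silence=0.5))
```

Piper reads the text to speak from standard input, so a caller would typically pass the chapter text through something like subprocess.run(cmd, input=text.encode("utf-8"), check=True).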
This may be because the Python version you are using is older than 3.8. You can try installing the missing package manually with pip3 install importlib-metadata, or use a newer Python version.
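The background is that importlib.metadata joined the standard library in Python 3.8, while older interpreters need the importlib-metadata backport. Code that supports both usually guards the import as in the sketch below. It illustrates the general pattern only and is not taken from this project or its dependencies.

```python
import sys

try:
    # Available in the standard library from Python 3.8 onwards.
    from importlib import metadata
except ImportError:
    # Older interpreters need the backport: pip3 install importlib-metadata
    import importlib_metadata as metadata

if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {version}, metadata module: {metadata.__name__}")
```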
Make sure the ffmpeg binary is accessible from your PATH. On macOS with Homebrew, you can run brew install ffmpeg. On Ubuntu, you can run sudo apt install ffmpeg.
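A quick way to confirm that ffmpeg is reachable from the same environment that runs the script is to look it up on the PATH with the standard library. The helper name find_executable is hypothetical.

```python
import shutil


def find_executable(name="ffmpeg"):
    """Return the full path of an executable on the PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("ffmpeg was not found on the PATH")
    else:
        print(f"ffmpeg found at {path}")
```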
For installation-related issues, please refer to the Piper TTS repository. It's important to note that if you're installing piper-tts via pip, only Python 3.10 is currently supported. Mac users may encounter additional challenges when using the downloaded binary. For more information on Mac-specific issues, please check this issue and this pull request.
Also check this if you're having trouble with Piper TTS.
- Epub to Audiobook (M4B): EPUB to M4B audiobook, with StyleTTS2 via HuggingFace Spaces API.
- Storyteller: A self-hosted platform for automatically syncing ebooks and audiobooks.
This project is licensed under the MIT License. See the LICENSE file for details.