API ‐ OpenAI V1 Speech Compatible Endpoint

AllTalk provides an endpoint compatible with the OpenAI Speech v1 API. This allows for easy integration with existing systems designed to work with OpenAI's text-to-speech service.

Endpoint Details

  • URL: http://{ipaddress}:{port}/v1/audio/speech
  • Method: POST
  • Content-Type: application/json

Request Format

The request body must be a JSON object with the following fields:

  • model (string): The TTS model to use. Currently ignored, but required in the request.
  • input (string): The text to generate audio for. Maximum length is 4096 characters.
  • voice (string): The voice to use when generating the audio.
  • response_format (string, optional): The output audio format; the audio will be transcoded to the requested format.
  • speed (float, optional): The speed of the generated audio, between 0.25 and 4.0. Default is 1.0.
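
Because the endpoint is OpenAI-compatible, you can also call it through the official openai Python package (v1 or later) by overriding the client's base URL. This is a minimal sketch; the api_key value is a placeholder, on the assumption that AllTalk does not validate it:

from openai import OpenAI

# Point the official OpenAI client at AllTalk's OpenAI-compatible base URL
client = OpenAI(
    base_url="http://127.0.0.1:7851/v1",
    api_key="not-needed",  # placeholder; assumed to be ignored by AllTalk
)

response = client.audio.speech.create(
    model="any_model_name",  # required by the API but ignored by AllTalk
    voice="nova",
    input="Hello, this is a test.",
    response_format="wav",
    speed=1.0,
)

# The response wraps the raw HTTP body; write the audio bytes to disk
with open("output.wav", "wb") as f:
    f.write(response.content)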

Supported Voices

The voice parameter supports the following values:

  • alloy
  • echo
  • fable
  • nova
  • onyx
  • shimmer

Each of these voice names is mapped one-to-one to an AllTalk voice, configured per TTS engine in the AllTalk Gradio interface (see Voice Remapping below).

Example cURL Request

curl -X POST "http://127.0.0.1:7851/v1/audio/speech" \
     -H "Content-Type: application/json" \
     -d '{
           "model": "any_model_name",
           "input": "Hello, this is a test.",
           "voice": "nova",
           "response_format": "wav",
           "speed": 1.0
         }'

Response

The endpoint returns the generated audio data directly in the response body.

Additional Notes

  • There is no capability within this API to specify a language. The response will be in whatever language the currently loaded TTS engine and model support.
  • If RVC is globally enabled in AllTalk's settings and a character voice other than "Disabled" is selected, the chosen RVC voice is applied after the TTS is generated and before the audio is transcoded and returned.
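
The 4096-character limit on input means longer texts must be split client-side and the audio joined afterwards. Below is a minimal sketch of one way to do this, assuming WAV output so the chunks can be concatenated with Python's standard wave module; the chunk_text helper is illustrative, not part of AllTalk:

import io
import wave
import requests

URL = "http://127.0.0.1:7851/v1/audio/speech"
MAX_CHARS = 4096

def chunk_text(text, limit=MAX_CHARS):
    # Naive splitter: accumulate sentences until the next one would
    # push the chunk past the character limit
    chunks, current = [], ""
    for sentence in text.split(". "):
        sentence = sentence.strip()
        if not sentence:
            continue
        if len(current) + len(sentence) + 2 > limit:
            chunks.append(current.strip())
            current = ""
        current += sentence + ". "
    if current.strip():
        chunks.append(current.strip())
    return chunks

def synthesize(text):
    response = requests.post(URL, json={
        "model": "any_model_name",
        "input": text,
        "voice": "nova",
        "response_format": "wav",
    })
    response.raise_for_status()
    return response.content

long_text = "This sentence stands in for a much longer document. " * 200

# Request each chunk, then splice the raw PCM frames into one WAV file
frames, params = [], None
for chunk in chunk_text(long_text):
    with wave.open(io.BytesIO(synthesize(chunk))) as w:
        params = w.getparams()
        frames.append(w.readframes(w.getnframes()))

with wave.open("long_output.wav", "wb") as out:
    out.setparams(params)
    for frame_data in frames:
        out.writeframes(frame_data)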

Voice Remapping

Voices can be re-mapped in the Gradio Interface > TTS Engine Settings > {Chosen TTS Engine} > OpenAI Voice Mappings.

You can also remap the 6 OpenAI voices to any voices supported by the currently loaded TTS engine using the following endpoint:

  • URL: http://{ipaddress}:{port}/api/openai-voicemap
  • Method: PUT
  • Content-Type: application/json

Example Voice Remapping Request

curl -X PUT "http://localhost:7851/api/openai-voicemap" \
     -H "Content-Type: application/json" \
     -d '{
           "alloy": "female_01.wav",
           "echo": "female_01.wav",
           "fable": "female_01.wav",
           "nova": "female_01.wav",
           "onyx": "male_01.wav",
           "shimmer": "male_02.wav"
         }'

Note: The Gradio interface will not reflect these changes until AllTalk is reloaded, as Gradio caches the list.
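
The same remapping can be done programmatically. A minimal Python sketch using requests; the .wav file names are examples, so substitute voices available to your currently loaded TTS engine:

import requests

# Example mapping; the file names must match voices your TTS engine provides
mapping = {
    "alloy": "female_01.wav",
    "echo": "female_01.wav",
    "fable": "female_01.wav",
    "nova": "female_01.wav",
    "onyx": "male_01.wav",
    "shimmer": "male_02.wav",
}

response = requests.put("http://localhost:7851/api/openai-voicemap", json=mapping)
print(response.status_code, response.text)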

Code Examples

Python Example

import requests

# Define the endpoint URL
url = "http://127.0.0.1:7851/v1/audio/speech"

# Define the request payload
payload = {
    "model": "any_model_name",
    "input": "Hello, this is a test.",
    "voice": "nova",
    "response_format": "wav",
    "speed": 1.0
}

# Send the POST request; requests serializes the payload to JSON and sets
# the Content-Type: application/json header automatically
response = requests.post(url, json=payload)

# Check the response
if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio file saved as output.wav")
else:
    print(f"Error: {response.status_code} - {response.text}")

JavaScript Example

// Define the endpoint URL
const url = "http://127.0.0.1:7851/v1/audio/speech";

// Define the request payload
const payload = {
    model: "any_model_name",
    input: "Hello, this is a test.",
    voice: "nova",
    response_format: "wav",
    speed: 1.0
};

// Set the headers
const headers = {
    "Content-Type": "application/json"
};

// Send the POST request
fetch(url, {
    method: "POST",
    headers: headers,
    body: JSON.stringify(payload)
})
.then(response => {
    if (response.ok) {
        return response.blob();
    } else {
        return response.text().then(text => { throw new Error(text); });
    }
})
.then(blob => {
    // Create a temporary link element pointing at the audio blob
    const blobUrl = window.URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.style.display = 'none';
    a.href = blobUrl;
    a.download = 'output.wav';

    // Append the link to the body
    document.body.appendChild(a);

    // Programmatically click the link to trigger the download
    a.click();

    // Release the blob URL and remove the link from the document
    window.URL.revokeObjectURL(blobUrl);
    document.body.removeChild(a);

    console.log("Audio file saved as output.wav");
})
.catch(error => {
    console.error("Error:", error);
});

These examples demonstrate how to use the OpenAI V1 API Compatible Endpoint with AllTalk in both Python and JavaScript environments.
