Speak text
🤖/text/speak synthesizes speech from the text in documents.
You can use the returned audio in your application, or pass it to other Robots, for example to add a voice track to a video.
Another common use case is making your product accessible to people with a reading disability.
Warning: Transloadit aims to be deterministic, but this Robot uses third-party AI services. The providers (AWS, GCP) will evolve their models over time, giving different responses for the same input media. Avoid relying on exact responses in your tests and application.
Supported languages and voices
AWS
Language | Voices |
---|---|
arb | female-1 |
cmn-CN | female-1 |
da-DK | female-1, male-1 |
nl-NL | female-1, male-1 |
en-AU | female-1, male-1 |
en-GB | female-1, male-1 |
en-IN | female-1 |
en-US | female-1, female-child-1, male-1, male-child-1 |
en-GB-WLS | male-1 |
fr-FR | female-1, male-1 |
fr-CA | female-1 |
de-DE | female-1, male-1 |
hi-IN | female-1 |
is-IS | female-1, male-1 |
it-IT | female-1, male-1 |
ja-JP | female-1, male-1 |
ko-KR | female-1 |
nb-NO | female-1 |
pl-PL | female-1, male-1 |
pt-BR | female-1, male-1 |
pt-PT | female-1, male-1 |
ro-RO | female-1 |
ru-RU | female-1, male-1 |
es-ES | female-1, male-1 |
es-MX | female-1 |
es-US | female-1, male-1 |
sv-SE | female-1 |
tr-TR | female-1 |
cy-GB | female-1 |
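As an illustration of how a row in the table above maps to Assembly Instructions, the following sketch selects one of the child voices listed for en-US on AWS. The Step name "synthesized" and the "use" value simply mirror the usage example on this page; treat this as a sketch rather than a tested configuration.

```json
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "use": ":original",
      "provider": "aws",
      "voice": "female-child-1",
      "target_language": "en-US"
    }
  }
}
```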
GCP
Language | Voices |
---|---|
ar-XA | female-1, male-1 |
bn-IN | female-1, male-1 |
yue-HK | female-1, male-1 |
cs-CZ | female-1 |
da-DK | female-1, male-1 |
nl-NL | female-1, male-1 |
en-AU | female-1, male-1 |
en-IN | female-1, male-1 |
en-GB | female-1, male-1 |
en-US | female-1, female-2, female-3, male-1 |
fil-PH | female-1, male-1 |
fi-FI | female-1 |
fr-CA | female-1, male-1 |
fr-FR | female-1, male-1 |
de-DE | female-1, male-1 |
el-GR | female-1 |
gu-IN | female-1, male-1 |
hi-IN | female-1, male-1 |
hu-HU | female-1 |
id-ID | female-1, male-1 |
it-IT | female-1, male-1 |
ja-JP | female-1, male-1 |
kn-IN | female-1, male-1 |
ko-KR | female-1, male-1 |
ml-IN | female-1, male-1 |
cmn-CN | female-1, male-1 |
cmn-TW | female-1, male-1 |
nb-NO | female-1, male-1 |
pl-PL | female-1, male-1 |
pt-BR | female-1 |
pt-PT | female-1, male-1 |
ro-RO | female-1 |
ru-RU | female-1, male-1 |
sk-SK | female-1 |
es-ES | female-1, male-1 |
sv-SE | female-1 |
ta-IN | female-1, male-1 |
te-IN | female-1, male-1 |
th-TH | female-1 |
tr-TR | female-1, male-1 |
uk-UA | female-1 |
vi-VN | female-1, male-1 |
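Some languages appear in only one of the two tables. Thai (th-TH), for instance, is listed under GCP but not under AWS, so a Step for Thai speech would presumably need to set the provider accordingly. A minimal sketch, reusing the hypothetical Step name "synthesized" from the usage example on this page:

```json
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "use": ":original",
      "provider": "gcp",
      "voice": "female-1",
      "target_language": "th-TH"
    }
  }
}
```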
Usage example
Synthesize speech from uploaded text documents, using a female voice in American English:
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "use": ":original",
      "provider": "aws",
      "voice": "female-1",
      "target_language": "en-US"
    }
  }
}
Parameters
- use
  String / Array of Strings / Object ⋅ required
  Specifies which Step(s) to use as input.
  - You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit).
  - You can provide several Steps as input with arrays:
    "use": [ ":original", "encoded", "resized" ]
  💡 That’s likely all you need to know about use, but you can view Advanced use cases.
- prompt
  String
  Which text to speak. You can also set this to null and supply an input text file.
- provider
  String ⋅ required
  Which AI provider to leverage. Valid values are "aws" and "gcp".
  Transloadit outsources this task and abstracts the interface, so you can expect the same data structures but different latencies and different information being returned. Different cloud vendors excel in different areas, and we recommend trying them out to see which yields the best results for your use case.
- target_language
  String ⋅ default: "en-US"
  The written language of the document. This will also be the language of the spoken text.
  The language should be specified in the BCP-47 format, such as "en-GB", "de-DE" or "fr-FR". Please consult the list of supported languages and voices.
- voice
  String ⋅ default: "female-1"
  The voice to be used for speech synthesis, such as "female-1" or "male-1". Please consult the list of supported languages and voices.
- ssml
  Boolean ⋅ default: false
  Supply Speech Synthesis Markup Language instead of raw text, in order to gain more control over how your text is voiced, including rests and pronunciations.
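To show how the prompt and ssml parameters might be combined, here is a hedged sketch that passes a short SSML string directly instead of relying on an uploaded document. It rests on assumptions this page does not confirm: that ssml also applies to text given via prompt, and that "use" can stay set to ":original" in that case. The <speak> and <break> elements come from the W3C SSML standard, and support for individual tags may differ between providers. Note that the double quotes inside the SSML must be escaped in JSON.

```json
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "use": ":original",
      "provider": "gcp",
      "target_language": "en-GB",
      "voice": "male-1",
      "prompt": "<speak>Welcome back.<break time=\"500ms\"/>Your order has shipped.</speak>",
      "ssml": true
    }
  }
}
```

The language and voice pair used here (en-GB with male-1) is taken from the GCP table of supported languages and voices.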
Related blog posts
- Building an AI-powered video dubber with Transloadit July 9, 2021
- Building a screen reader plugin with /text/speak Robot June 3, 2021
- Celebrating Transloadit’s 2021 milestones and progress January 31, 2022
- Building an alt-text to speech generator with Transloadit May 9, 2022