Transcribe speech in audio or video files
🤖/speech/transcribe transcribes speech in audio or video files.
You can use the returned text in your application, or pass it down to other Robots, for example to filter out audio or video files that do (or do not) contain certain content, or to burn the text into images or video.
Other common use cases are automatically subtitling videos and making audio searchable.
Warning: Transloadit aims to be deterministic, but this Robot uses third-party AI services. The providers (AWS, GCP) will evolve their models over time, giving different responses for the same input media. Avoid relying on exact responses in your tests and application.
Usage example
Transcribe speech in French from uploaded audio or video, and save it to a text file:
{
  "steps": {
    "transcribed": {
      "robot": "/speech/transcribe",
      "use": ":original",
      "provider": "aws",
      "source_language": "fr-FR",
      "format": "text"
    }
  }
}
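If you generate Assembly Instructions from application code instead of writing the JSON by hand, the same Step can be built as a plain dictionary and serialized. This is a minimal Python sketch that uses only the standard library. The function name `build_transcribe_instructions` is hypothetical, and submitting the result to Transloadit (for example, through one of its SDKs) is not shown.

```python
import json


def build_transcribe_instructions(provider: str, source_language: str, fmt: str) -> dict:
    """Return Assembly Instructions containing a single /speech/transcribe Step.

    Hypothetical helper: it only assembles the dictionary from the usage
    example above; it does not contact Transloadit.
    """
    return {
        "steps": {
            "transcribed": {
                "robot": "/speech/transcribe",
                "use": ":original",  # transcribe the files uploaded by the user
                "provider": provider,
                "source_language": source_language,
                "format": fmt,
            }
        }
    }


instructions = build_transcribe_instructions("aws", "fr-FR", "text")
print(json.dumps(instructions, indent=2))
```

Serializing with `json.dumps` should reproduce the usage example's structure, so the output can be pasted into a template or sent as-is.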
Parameters
use
String / Array of Strings / Object ⋅ required
Specifies which Step(s) to use as input.
- You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit).
- You can provide several Steps as input with arrays:
  "use": [ ":original", "encoded", "resized" ]
💡 That’s likely all you need to know about use, but you can view Advanced use cases.
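The two rules for `use` (":original" is reserved, and several Steps can be listed in an array) can be checked client-side before submitting instructions. This Python sketch is an assumption-laden illustration: `check_step_names` and `normalize_use` are hypothetical helpers, treating a single string as equivalent to a one-element array is this sketch's own convention, and the object form used in advanced cases is deliberately not handled.

```python
from typing import Union

RESERVED_STEP = ":original"  # reserved for user uploads handled by Transloadit


def check_step_names(steps: dict) -> None:
    """Raise if a user-defined Step is named with the reserved ":original"."""
    if RESERVED_STEP in steps:
        raise ValueError(f'"{RESERVED_STEP}" is reserved and cannot name a Step')


def normalize_use(use: Union[str, list]) -> list:
    """Return `use` as a list of Step names (string and array forms only).

    Assumption: a single string is treated like a one-element array.
    The object form from the advanced use cases is out of scope here.
    """
    if isinstance(use, str):
        return [use]
    if isinstance(use, list) and all(isinstance(name, str) for name in use):
        return list(use)
    raise TypeError("this sketch only accepts a string or a list of strings")


check_step_names({"encoded": {}, "resized": {}})  # passes: no reserved name
print(normalize_use([":original", "encoded", "resized"]))
```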
provider
String ⋅ required
Which AI provider to leverage: "aws" or "gcp".
Transloadit outsources this task and abstracts the interface, so you can expect the same data structures, but latencies and the information returned will differ between providers. Different cloud vendors excel in different areas, so we recommend trying each one and seeing which yields the best results for your use case.
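Following the recommendation to try both providers, one way to compare them is to run the same uploads through one Step per provider in a single Assembly. A sketch in Python, assuming the provider values "aws" and "gcp" (the two providers named on this page); the Step names `transcribed_aws` and `transcribed_gcp` and the helper `comparison_steps` are arbitrary, hypothetical choices.

```python
import json

PROVIDERS = ["aws", "gcp"]  # the two providers mentioned on this page


def comparison_steps(source_language: str) -> dict:
    """Build one /speech/transcribe Step per provider over the same uploads."""
    steps = {}
    for provider in PROVIDERS:
        steps[f"transcribed_{provider}"] = {
            "robot": "/speech/transcribe",
            "use": ":original",
            "provider": provider,
            "source_language": source_language,
            "format": "json",  # timestamped words make side-by-side checks easier
        }
    return {"steps": steps}


print(json.dumps(comparison_steps("en-US"), indent=2))
```

Because both Steps read from ":original", each upload should be transcribed once per provider, which makes differences in wording and timing easy to spot.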
granularity
String ⋅ default: "full"
Whether to return a full response ("full") or a flat list of descriptions ("list").
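A small client-side check can catch typos in `granularity` before instructions are sent and make the documented default explicit. This Python sketch is hypothetical (the function `granularity_param` is not part of any Transloadit SDK); it only encodes the two values and the default listed above.

```python
ALLOWED_GRANULARITY = {"full", "list"}


def granularity_param(value=None) -> str:
    """Return a valid `granularity` value, falling back to the documented default."""
    if value is None:
        return "full"  # documented default
    if value not in ALLOWED_GRANULARITY:
        raise ValueError(
            f"granularity must be one of {sorted(ALLOWED_GRANULARITY)}, got {value!r}"
        )
    return value


print(granularity_param())        # falls back to the default
print(granularity_param("list"))  # explicit flat list
```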
format
String ⋅ default: "json"
Output format for the transcription.
- "text" outputs a plain text file that you can store and process.
- "json" outputs a JSON file containing timestamped words.
- "srt" and "webvtt" output subtitle files of those respective types, which can be stored separately or used in other encoding Steps.
- "meta" does not return a file, but stores the data inside Transloadit's file object (under ${file.meta.transcription.text}) that's passed around between encoding Steps, so that you can use the values to burn the data into videos, filter on them, etc.
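The formats above fall into two groups: four that produce a stored file and "meta", which only attaches data to the file object. This Python sketch records that distinction and builds a Step for the subtitling use case. The names `FORMATS_PRODUCE_FILE`, `produces_file`, and `subtitle_step` are hypothetical, and the choice of "aws" as provider in `subtitle_step` is arbitrary.

```python
# Output formats documented for /speech/transcribe, and whether each one
# produces a stored file ("meta" only stores data on the file object).
FORMATS_PRODUCE_FILE = {
    "text": True,    # plain text file
    "json": True,    # JSON with timestamped words (the default)
    "srt": True,     # SubRip subtitle file
    "webvtt": True,  # WebVTT subtitle file
    "meta": False,   # stored under file.meta.transcription, no file returned
}


def produces_file(fmt: str = "json") -> bool:
    """Return True if the given format yields a file you can store."""
    if fmt not in FORMATS_PRODUCE_FILE:
        raise ValueError(f"unknown format {fmt!r}")
    return FORMATS_PRODUCE_FILE[fmt]


def subtitle_step(fmt: str = "srt", source_language: str = "en-US") -> dict:
    """Hypothetical helper: a Step that turns uploads into a subtitle file."""
    if fmt not in ("srt", "webvtt"):
        raise ValueError("subtitles need format 'srt' or 'webvtt'")
    return {
        "robot": "/speech/transcribe",
        "use": ":original",
        "provider": "aws",
        "source_language": source_language,
        "format": fmt,
    }


print(produces_file("meta"), subtitle_step("webvtt")["format"])
```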
source_language
String ⋅ default: "en-US"
The spoken language of the audio or video. This will also be the language of the transcribed text.
The language should be specified in the BCP-47 format, such as "en-GB", "de-DE" or "fr-FR". Please also consult the list of supported languages for the "gcp" provider and the "aws" provider.
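The examples above all follow the common BCP-47 "language-REGION" shape. This Python sketch checks only that shape with a simplified regular expression; it is not a full BCP-47 parser (script, variant, and extension subtags are rejected), it enforces the conventional lowercase-language and uppercase-region casing, and it does not tell you whether a provider actually supports the language. The name `looks_like_lang_region` is hypothetical.

```python
import re

# Simplified pattern for "language-REGION" tags such as "en-GB", "de-DE",
# "fr-FR": a 2-3 letter language subtag, then a 2-letter or 3-digit region.
LANG_REGION = re.compile(r"[a-z]{2,3}-(?:[A-Z]{2}|[0-9]{3})")


def looks_like_lang_region(tag: str) -> bool:
    """Return True if `tag` has the simplified language-REGION shape."""
    return LANG_REGION.fullmatch(tag) is not None


for tag in ("en-US", "fr-FR", "en_GB", "english"):
    print(tag, looks_like_lang_region(tag))
```

A tag that passes this check still needs to appear in the chosen provider's list of supported languages.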
Related blog posts
- Tech preview: new AI Robots for enhanced media processing February 17, 2020
- New feature: auto-transcribe videos with subtitles March 8, 2021
- Building a screen reader plugin with /text/speak Robot June 3, 2021
- Building an AI-powered video dubber with Transloadit July 9, 2021
- Celebrating transloadit’s 2021 milestones and progress January 31, 2022
- Build a Reddit video subtitling bot with Transloadit February 10, 2022
- Creating engaging audio visualizations with Transloadit April 2, 2023