Last updated: September 18, 2024

Tech preview: new AI Robots for enhanced media processing

Kevin van Zonneveld

Co-founder · Amsterdam, The Netherlands · Show bio ·

We're happy to share that we are launching a new line of AI bots in Tech Preview. We've had one AI Robot in production for some time: /image/facedetect. It's powered by internal software and we've been pleased with its performance. Today's announced bots, however, are powered by external cloud services.

It's hard to miss the AI advancements that the The Big Five are making. With access to virtually unlimited data, models can be trained to achieve unparalleled accuracy. We felt that offering these AI capabilites right inside our encoding pipelines could add tremendous value to customers seeking to further automate their media processing.

We've tested and mapped out AI offerings by the Google Cloud Platform (GCP) and Amazon Web Services (AWS), and started drawing Venn diagrams to pinpoint overlapping functionality.

A venn diagram, with AWS in the left circle, Google Cloud on the right, and Transloadit in the crossover between both.

Our idea was to offer an abstraction over the lowest common denominator. In other words: with a single API, our customers can plug in, say, image recognition of either provider, and get back uniform responses. It would be just a matter of specifying either provider: 'aws' or provider: 'gcp' to switch from one provider to the next.

Why let Transloadit wrap this?

It goes without saying that Transloadit cannot beat or even meet the pricing of the AI providers themselves, so if you need to process massive amounts of data, consider integrating with GCP/AWS directly.

However, if your use case doesn't revolve around squeezing every last penny out of every last byte, there are four reasons why our customers may want to use these AI services in conjunction with Transloadit:

1. LEGO-like composability

You can drop AI in existing encoding pipelines, mixing and matching with 83 features (Robots) to create workflows unique to your business. All of this without writing imperative boilerplate code to string it all together, which would result in more moving parts and points of failure.

Transloadit offers an integral solution that can be wielded with a single deterministic JSON recipe. With twenty lines of declarative instructions, you could order Transloadit to pass a video through these Robots:

/speech/transcribe: turn the video into human-readable text
/text/translate: translate the text into Japanese
/text/speak: synthesize the Japanese into spoken language
/video/merge: merge the new spoken Japanese as an audio track over the original video

Essentially, you have now made Transloadit translate a video automatically 😄 It's probably not ready for prime time, but this does illustrate how powerful our Assembly line can be. For code samples around our declarative composability, check further down.

2. Easily compare and switch between AI providers

The vendors use different notations for languages (when translating), they structure their responses differently and they have different docs, SDKs, formats, settings, etc.

Transloadit abstracts all of this and accepts uniform input, delivering uniform output — no matter the provider.

After having used Amazon, you can see how Google describes the same image without changing anything but the provider parameter. This way, you can easily compare results and latencies in your app, to see what bests suits your use case. And that could change of course. These AIs are constantly learning and improving for the majority of cases, but if one of your own customers has an unlucky minority case, you could offer to switch in a heartbeat.

3. Possibly cut down on vendors

If you are either:

already using Transloadit for your media processing
in the market for an AI feature but would also like to augment that with automated image optimization, encoding, or leverage any of our other 83 features

.. this saves you the hassle of integrating with yet another provider. We already indicated the engineering costs associated with many moving parts, but there is also different billing to consider, SLA agreements to monitor, and support desks to deal with.

4. Hassle-free 💆‍♀️

We automatically sanitize and cleanup inputs. For instance, while AWS will accept any audio file to transcribe, depending on settings, Google will want it in the PCM format with signed 16-bit, 1-channel, little-endian encoding. With Transloadit, you just throw any audio (or video!) at us, and we'll make sure it gets converted to whatever way the AI provider you picked, likes it.

Features we are launching today

Today, we are launching two Robots in Tech Preview:

Our /image/describe Robot. Input an image and get back a list of objects that were detected: Tree, Car, House, etc. We can return it as a text file, JSON file, or pass it to another Robot for processing. Common use cases include automatically flagging (in)appropriate content, providing alt captions for images, and/or making images searchable.

Our /speech/transcribe Robot. Input an audio or video recording and get back human-readable text. We can return it as a text file, JSON file, or pass it to another Robot for processing. Common use cases include automated subtitling, or making audio/video searchable.

We're launching them in conjunction with an upgrade to:

Our /file/filter Robot. Pass it a file and criteria, and this Robot acts as a gatekeeper, optionally passing files through to another Step, like exporting. We changed it so that it now also takes an includes operator.

With the newly added includes operator, you can now start automatically rejecting (or flagging) undesired content like so:

"described": {
  "use": ":original",
  "robot": "/image/describe",
  "provider": "aws",
  "format": "meta",
  "granularity": "list"
},
"filtered": {
  "use": "described",
  "robot": "/file/filter",
  "declines": [
    [ "${file.meta.descriptions}", "includes", [ "Naked", "Sex" ] ]
  ]
},
"exported": {
  "use": "filtered",
  "robot": "/s3/store",
  "credentials": "YOUR_AWS_CREDENTIALS"
}

Now, if I wanted to only allow pictures of cars for my used cars sales website, and I preferred Google's image recognition, I'd just change:

"declines" to "accepts"
[ "Naked", "Sex" ] to [ "Car", "Tires" ]
"provider": "aws" to "provider": "gcp"

And that's it! ✨

We also have a full code sample featuring our /image/describe Robot further down, as well as links to demos for our /speech/transcribe and /image/facedetect Robots.

What AI features are planned?

Besides the two Robots launched today in Tech Preview, our Venn diagrams have showed us we should also build the following:

/image/ocr: input an image, get back any human-readable text that it had on it, like name/traffic signs
/document/ocr: input a PDF, get back any human-readable text that it had on it, so that documents can be made searchable if they aren't already
/text/translate: input human-readable text and get it back in a different language
/text/speak: input human-readable text and get back an audio file with a recording of synthesized speech

Missing something on this list? We're happy to take suggestions for more!

What about pricing?

We track input and output bytes passing through these Robots, and subtract that from your regular Transloadit plan — no need for any extra subscriptions. We do charge a minimum fee of 1MB per transaction: if you submit a 100KB image, and a 2KB text file is returned, even though that adds up to 102KB, we still subtract 1MB from your plan. On our Startup Plan (10GB for $49/mo) that would have costed $0.0049, on our Medium Business Plan $0.00166. More info on our Pricing page.

What about other providers like Microsoft Azure?

We feel there's enough value here to start offering this as Tech Preview today. Since we abstract the providers, we're not dependent on a single offering. So should GCP be shut down in 2023 (just kidding! we think!) or AWS raises prices on us, there are options. Integrations remain the same when switching providers. In fact, we are looking into adding Microsoft Azure into the mix as well.

And, just like we are powering our /image/facedetect AI Robot ourselves, when it becomes feasible in the future to run high-quality transcription AI ourselves, you may find we add a provider: 'transloadit' to the /speech/transcribe Robot, offered at a lower price.

What does "tech preview" mean?

It means that you can start using this tech today! We might still make changes to the API and pricing (but we do not foresee them outside of adding more features).

How do I get started?

After signing up, pick a programming language of choice, and crank out an integration! Sounds hard? Let's look at a demo.

// implementation 'com.transloadit.sdk:transloadit:1.0.0'
import com.transloadit.sdk.Assembly;
import com.transloadit.sdk.Transloadit;
import com.transloadit.sdk.exceptions.LocalOperationException;
import com.transloadit.sdk.exceptions.RequestException;
import com.transloadit.sdk.response.AssemblyResponse;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
public class Main {
  public static void main(String[] args) {
    // Initialize the Transloadit client
    Transloadit transloadit = new Transloadit("YOUR_TRANSLOADIT_KEY", "YOUR_TRANSLOADIT_SECRET");
    
    Assembly assembly = transloadit.newAssembly();
    
    // Set Encoding Instructions
    Map<String, Object> _originalStepOptions = new HashMap();
    assembly.addStep(":original", "/upload/handle", _originalStepOptions);
    
    Map<String, Object> describedStepOptions = new HashMap();
    describedStepOptions.put("use", ":original");
    describedStepOptions.put("provider", "aws");
    describedStepOptions.put("format", "meta");
    describedStepOptions.put("granularity", "list");
    describedStepOptions.put("result", true);
    assembly.addStep("described", "/image/describe", describedStepOptions);
    
    Map<String, Object> filteredStepOptions = new HashMap();
    filteredStepOptions.put("result", true);
    filteredStepOptions.put("use", "described");
    filteredStepOptions.put("accepts", new String[] { new String[] { "${file.meta.descriptions}", "includes", "Building" } });
    assembly.addStep("filtered", "/file/filter", filteredStepOptions);
    
    Map<String, Object> exportedStepOptions = new HashMap();
    exportedStepOptions.put("use", "filtered");
    exportedStepOptions.put("credentials", "demo_s3_credentials");
    exportedStepOptions.put("url_prefix", "https://demos.transloadit.com/");
    assembly.addStep("exported", "/s3/store", exportedStepOptions);
    
    // Add files to upload
    assembly.addFile(new File("prinsengracht.jpg"));
    assembly.addFile(new File("chameleon.jpg"));
    
    // Start the Assembly
    try {
      AssemblyResponse response = assembly.save();
    
      // Wait for Assembly to finish executing
      while (!response.isFinished()) {
        response = transloadit.getAssemblyByUrl(response.getSslUrl());
      }
    
      System.out.println(response.getId());
      System.out.println(response.getUrl());
      System.out.println(response.json());
    } catch (RequestException | LocalOperationException e) {
      // Handle exception here
    }
  }
}

Read docs: Java SDK

// go get gopkg.in/transloadit/go-sdk.v1
package main
import (
  "context"
  "fmt"
  "github.com/transloadit/go-sdk"
)
func main() {
  // Create client
  options := transloadit.DefaultConfig
  options.AuthKey = "YOUR_TRANSLOADIT_KEY"
  options.AuthSecret = "YOUR_TRANSLOADIT_SECRET"
  client := transloadit.NewClient(options)
  
  // Initialize new Assembly
  assembly := transloadit.NewAssembly()
  
  // Set Encoding Instructions
  assembly.AddStep(":original", map[string]interface{}{
    "robot": "/upload/handle",
  })
  
  assembly.AddStep("described", map[string]interface{}{
    "use": ":original",
    "robot": "/image/describe",
    "provider": "aws",
    "format": "meta",
    "granularity": "list",
    "result": true,
  })
  
  assembly.AddStep("filtered", map[string]interface{}{
    "result": true,
    "use": "described",
    "robot": "/file/filter",
    "accepts": [["${file.meta.descriptions}", "includes", "Building"]],
  })
  
  assembly.AddStep("exported", map[string]interface{}{
    "use": "filtered",
    "robot": "/s3/store",
    "credentials": "demo_s3_credentials",
    "url_prefix": "https://demos.transloadit.com/",
  })
  
  // Add files to upload
  assembly.AddFile("prinsengracht.jpg"))
  assembly.AddFile("chameleon.jpg"))
  
  // Start the Assembly
  info, err := client.StartAssembly(context.Background(), assembly)
  if err != nil {
    panic(err)
  }
  
  // All files have now been uploaded and the Assembly has started but no
  // results are available yet since the conversion has not finished.
  // WaitForAssembly provides functionality for polling until the Assembly
  // has ended.
  info, err = client.WaitForAssembly(context.Background(), info)
  if err != nil {
    panic(err)
  }
  
  fmt.Printf("You can check some results at: ")
  fmt.Printf("  - %s\n", info.Results[":original"][0].SSLURL)
  fmt.Printf("  - %s\n", info.Results["described"][0].SSLURL)
  fmt.Printf("  - %s\n", info.Results["filtered"][0].SSLURL)
  fmt.Printf("  - %s\n", info.Results["exported"][0].SSLURL)
}

Read docs: Go SDK

# pip install pytransloadit
from transloadit import client
tl = client.Transloadit('YOUR_TRANSLOADIT_KEY', 'YOUR_TRANSLOADIT_SECRET')
assembly = tl.new_assembly()
# Set Encoding Instructions
assembly.add_step(":original", "/upload/handle", {})
assembly.add_step("described", "/image/describe", {
  'use': ':original',
  'provider': 'aws',
  'format': 'meta',
  'granularity': 'list',
  'result': True
})
assembly.add_step("filtered", "/file/filter", {
  'result': True,
  'use': 'described',
  'accepts': [['${file.meta.descriptions}', 'includes', 'Building']]
})
assembly.add_step("exported", "/s3/store", {
  'use': 'filtered',
  'credentials': 'demo_s3_credentials',
  'url_prefix': 'https://demos.transloadit.com/'
})
# Add files to upload
assembly.add_file(open('prinsengracht.jpg', 'rb'))
assembly.add_file(open('chameleon.jpg', 'rb'))
# Start the Assembly
assembly_response = assembly.create(retries=5, wait=True)
print(assembly_response.data.get('assembly_ssl_url'))
# or:
print(assembly_response.data['assembly_ssl_url'])

Read docs: Python SDK

# gem install transloadit
# $ irb -rubygems
# >> require 'transloadit'
# => true
transloadit = Transloadit.new([
  :key => "YOUR_TRANSLOADIT_KEY"
])
# Set Encoding Instructions
_original = transloadit.step(":original", "/upload/handle", {})
described = transloadit.step("described", "/image/describe", [
  :use => ":original",
  :provider => "aws",
  :format => "meta",
  :granularity => "list",
  :result => true
])
filtered = transloadit.step("filtered", "/file/filter", [
  :result => true,
  :use => "described",
  :accepts => [["${file.meta.descriptions}", "includes", "Building"]]
])
exported = transloadit.step("exported", "/s3/store", [
  :use => "filtered",
  :credentials => "demo_s3_credentials",
  :url_prefix => "https://demos.transloadit.com/"
])
transloadit.assembly([
  :steps => [_original, described, filtered, exported]
])
# Add files to upload
files = []
files.push("prinsengracht.jpg")
files.push("chameleon.jpg")
# Start the Assembly
response = assembly.create! *files
until response.finished?
  sleep 1; response.reload!
end
if !response.error?
  # handle success
end

Read docs: Ruby SDK

<?php
// composer require transloadit/php-sdk
use transloadit\Transloadit;
$transloadit = new Transloadit([
  "key" => "YOUR_TRANSLOADIT_KEY",
  "secret" => "YOUR_TRANSLOADIT_SECRET",
]);
// Start the Assembly
$response = $transloadit->createAssembly([
  "files" => ["prinsengracht.jpg", "chameleon.jpg"],
  "params" => [
    "steps" => [
      ":original" => [
        "robot" => "/upload/handle",
      ],
      "described" => [
        "use" => ":original",
        "robot" => "/image/describe",
        "provider" => "aws",
        "format" => "meta",
        "granularity" => "list",
        "result" => true,
      ],
      "filtered" => [
        "result" => true,
        "use" => "described",
        "robot" => "/file/filter",
        "accepts" => [["${file.meta.descriptions}", "includes", "Building"]],
      ],
      "exported" => [
        "use" => "filtered",
        "robot" => "/s3/store",
        "credentials" => "demo_s3_credentials",
        "url_prefix" => "https://demos.transloadit.com/",
      ],
    ],
  ],
]);
?>

Read docs: PHP SDK

# Auth export TRANSLOADIT_KEY="YOUR_TRANSLOADIT_KEY" export TRANSLOADIT_SECRET="YOUR_TRANSLOADIT_SECRET" # Save Encoding Instructions echo '{ "steps": { ":original": { "robot": "/upload/handle" }, "described": { "use": ":original", "robot": "/image/describe", "provider": "aws", "format": "meta", "granularity": "list", "result": true }, "filtered": { "result": true, "use": "described", "robot": "/file/filter", "accepts": [["${file.meta.descriptions}", "includes", "Building"]] }, "exported": { "use": "filtered", "robot": "/s3/store", "credentials": "demo_s3_credentials", "url_prefix": "https://demos.transloadit.com/" } } }' > ./steps.json

# Execute npx transloadit@latest --input "prinsengracht.jpg" --input "chameleon.jpg" --steps "./steps.json" --output "./output.example"

Read docs: Transloadify

// npm install transloadit
// Import
import { Transloadit } from 'transloadit'
const main = async () => {
  // Init
  const transloadit = new Transloadit({
    authKey: 'YOUR_TRANSLOADIT_KEY',
    authSecret: 'YOUR_TRANSLOADIT_SECRET',
  })
// Set Encoding Instructions
  const options = {
    files: {
      myfile_1: './prinsengracht.jpg',
      myfile_2: './chameleon.jpg',
    },
    params: {
      steps: {
        ':original': {
          robot: '/upload/handle',
        },
        described: {
          use: ':original',
          robot: '/image/describe',
          provider: 'aws',
          format: 'meta',
          granularity: 'list',
          result: true,
        },
        filtered: {
          result: true,
          use: 'described',
          robot: '/file/filter',
          accepts: [['${file.meta.descriptions}', 'includes', 'Building']],
        },
        exported: {
          use: 'filtered',
          robot: '/s3/store',
          credentials: 'demo_s3_credentials',
          url_prefix: 'https://demos.transloadit.com/',
        },
      },
    },
  }
// Execute
  const result = await transloadit.createAssembly(options)
// Show results
  console.log({ result })
}
main().catch(console.error)

Read docs: Node SDK

// npm i @transloadit/convex
import { makeTransloaditAPI } from '@transloadit/convex'
import { components } from './_generated/api'
import { action } from './_generated/server'
const { createAssemblyOptions } = makeTransloaditAPI(components.transloadit)
export const createUpload = action({
  args: {},
  handler: async (ctx) =>
    createAssemblyOptions(ctx, {
      steps: {
        ':original': {
          robot: '/upload/handle',
        },
        described: {
          use: ':original',
          robot: '/image/describe',
          provider: 'aws',
          format: 'meta',
          granularity: 'list',
          result: true,
        },
        filtered: {
          result: true,
          use: 'described',
          robot: '/file/filter',
          accepts: [['${file.meta.descriptions}', 'includes', 'Building']],
        },
        exported: {
          use: 'filtered',
          robot: '/s3/store',
          credentials: 'demo_s3_credentials',
          url_prefix: 'https://demos.transloadit.com/',
        },
      },
      additionalParams: {
        auth: {
          key: 'YOUR_TRANSLOADIT_KEY',
        },
      },
    }),
})

Read docs: Convex

<!-- This pulls Uppy from our CDN -->
<!-- For smaller self-hosted bundles, install Uppy and plugins manually: -->
<!-- npm i --save @uppy/core @uppy/dashboard @uppy/remote-sources @uppy/transloadit ... -->
<link
  href="https://releases.transloadit.com/uppy/v5.1.1/uppy.min.css"
  rel="stylesheet"
/>
<button id="browse">Select Files</button>
<script type="module">
  import {
    Uppy,
    Dashboard,
    ImageEditor,
    RemoteSources,
    Transloadit,
  } from 'https://releases.transloadit.com/uppy/v5.1.1/uppy.min.mjs'
  const uppy = new Uppy()
    .use(Transloadit, {
      waitForEncoding: true,
      alwaysRunAssembly: true,
      assemblyOptions: {
        params: {
          // To avoid tampering, use Signature Authentication:
          // https://transloadit.com/docs/api/authentication/
          auth: {
            key: 'YOUR_TRANSLOADIT_KEY',
          },
          // It's often better store encoding instructions in your account
          // and use a template_id instead of adding these steps inline
          steps: {
            ':original': {
              robot: '/upload/handle',
            },
            described: {
              use: ':original',
              robot: '/image/describe',
              provider: 'aws',
              format: 'meta',
              granularity: 'list',
              result: true,
            },
            filtered: {
              result: true,
              use: 'described',
              robot: '/file/filter',
              accepts: [['${file.meta.descriptions}', 'includes', 'Building']],
            },
            exported: {
              use: 'filtered',
              robot: '/s3/store',
              credentials: 'demo_s3_credentials',
              url_prefix: 'https://demos.transloadit.com/',
            },
          },
        },
      },
    })
    .use(Dashboard, { trigger: '#browse' })
    .use(ImageEditor, { target: Dashboard })
    .use(RemoteSources, {
      companionUrl: 'https://api2.transloadit.com/companion',
    })
    .on('complete', ({ transloadit }) => {
      // Due to waitForEncoding:true this is fired after encoding is done.
      // Alternatively, set waitForEncoding to false and provide a notify_url
      console.log(transloadit) // Array of Assembly Statuses
      for (const assembly of transloadit) {
        console.log(assembly.results) // Array of all encoding results
      }
    })
    .on('error', (error) => {
      console.error(error)
    })
</script>

Read docs: Uppy File Uploader

// Install via Swift Package Manager:
// dependencies: [
//   .package(url: "https://github.com/transloadit/TransloaditKit", .upToNextMajor(from: "3.0.0"))
// ]
// Or via CocoaPods:
// pod 'Transloadit', '~> 3.0.0'
// Auth
let credentials = Credentials(key: "YOUR_TRANSLOADIT_KEY", secret: "YOUR_TRANSLOADIT_SECRET")
// Init
let transloadit = Transloadit(credentials: credentials, session: "URLSession.shared")
// Add files to upload
let filesToUpload: [URL] = ...
// Execute
let assembly = transloadit.assembly(steps: [_originalStep, describedStep, filteredStep, exportedStep], andUpload: filesToUpload) { result in
  switch result {
  case .success(let assembly):
    print("Retrieved (assembly)")
  case .failure(let error):
    print("Assembly error (error)")
  }
}.pollAssemblyStatus { result in
  switch result {
  case .success(let assemblyStatus):
    print("Received assemblystatus (assemblyStatus)")
  case .failure(let error):
    print("Caught polling error (error)")
  }

Read docs: TransloaditKit

# Prerequisites: brew install curl jq || sudo apt install curl jq
# To avoid tampering, use Signature Authentication
echo '{
  "auth": {
    "key": "YOUR_TRANSLOADIT_KEY"
  },
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "described": {
      "use": ":original",
      "robot": "/image/describe",
      "provider": "aws",
      "format": "meta",
      "granularity": "list",
      "result": true
    },
    "filtered": {
      "result": true,
      "use": "described",
      "robot": "/file/filter",
      "accepts": [["${file.meta.descriptions}", "includes", "Building"]]
    },
    "exported": {
      "use": "filtered",
      "robot": "/s3/store",
      "credentials": "demo_s3_credentials",
      "url_prefix": "https://demos.transloadit.com/"
    }
  }
}' | curl 

    --request POST 

    --form 'params=<-' 

    --form myfile1=@./prinsengracht.jpg 

    --form myfile2=@./chameleon.jpg 

    https://api2.transloadit.com/assemblies | jq

Read docs: cURL

{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "described": {
      "use": ":original",
      "robot": "/image/describe",
      "provider": "aws",
      "format": "meta",
      "granularity": "list",
      "result": true
    },
    "filtered": {
      "result": true,
      "use": "described",
      "robot": "/file/filter",
      "accepts": [["${file.meta.descriptions}", "includes", "Building"]]
    },
    "exported": {
      "use": "filtered",
      "robot": "/s3/store",
      "credentials": "demo_s3_credentials",
      "url_prefix": "https://demos.transloadit.com/"
    }
  }
}

We're uploading two photos, one of which contains bridges:

A three-way venn diagram, with AWS, GCP and Azure AI in the three outer circles, and Transloadit in the middle of all three.

When I check the meta.descriptions of the results of the described Step, I'll see:

[ 'Water',
'Outdoors',
'Bridge',
'Building',
'Canal',
'Castle',
'Architecture',
'Fort' ]
https://demos.transloadit.com/53/fae6219071430cb7b794cf9f3513c2/prinsengracht.jpg

Only photos with 'Bridge' are allowed through, so the photo of our chameleon would not have been saved on S3 in the proper location either.

In such cases you could choose to:

gracefully ignore
error out hard
pipe unrecognized images to an export Step that uses a different directory, like ./flagged-for-review/

Here are more related AI demos with code samples for all major platforms:

Docs

Since these Robots remain in Tech Preview for now, we could still change the implementation, but we've already written preliminary documentation:

Have fun!

We're happy to expand this post, our docs, and how the bots work, based on your feedback. Just leave a comment below or on Twitter.

#artificial-intelligence-service #image-describe-robot #speech-transcribe-robot #image-facedetect-robot #tech-preview #file-filter-robot

Co-written by Marius Kleidl