Introducing the OCR Robot for easy text extraction
"A picture is worth a thousand words" is an adage in many languages – and for good reason. Our visual cortex is arguably the most powerful part of the brain, and leveraging that to convey meaning is equally powerful. With the introduction of our /image/describe Robot, we allow programmers to unlock the visual cortex of AI models trained by prominent cloud vendors. Today, we are taking this a step further by not only recognizing objects in images, but also reading any words present.
Let's say you have a picture of a traffic sign or a menu. Almost any human can make sense of these objects, but their meaning remained opaque to machines until the arrival of powerful new OCR models that can, in effect, read text from a collection of pixels.
Introducing the /image/ocr Robot, the newest member of our AI Robot family. It can use either AWS or GCP in the backend, and you can swap between them with the provider parameter. If one provider produces unfavorable results, you can switch to the other with no further configuration or pricing changes, so feel free to switch providers as you see fit. Furthermore, as each backend AI model advances, our service will automatically benefit from the improved recognition.
Extracting text from images
To demonstrate how simple it is to use this new Robot, we will walk through the Template below:
{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "image-ocr": {
      "use": ":original",
      "robot": "/image/ocr",
      "format": "text",
      "provider": "aws"
    },
    "exported": {
      "use": ["image-ocr", ":original"],
      "robot": "/s3/store",
      "credentials": "YOUR_AWS_CREDENTIALS"
    }
  }
}
There are many options for getting files to Transloadit, but we will be using the /upload/handle Robot in our first Step, :original.
Our text recognition occurs in the following Step, image-ocr. Here, we pass :original to our /image/ocr Robot and set its format parameter to text, so that the result is returned as a text file. If format is not set, the Robot outputs JSON by default. Additionally, we have specified aws as the backend AI provider. Writing gcp here would leverage the Google Cloud Platform instead, with no changes to the interface or structure of the data returned. Only the data itself (i.e., the text read) may vary, as each AI platform was trained independently and continues to evolve independently.
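To make the provider swap concrete, here is a minimal Python sketch that builds the same Template as a dictionary. The helper name build_ocr_steps and its arguments are hypothetical and not part of any Transloadit SDK; the sketch only assembles the JSON shown above and assumes aws and gcp are the two accepted provider values, as described in this post.

```python
import json

# Providers named in this post; anything else is rejected up front.
SUPPORTED_PROVIDERS = {"aws", "gcp"}


def build_ocr_steps(provider="aws", as_text=True, credentials="YOUR_AWS_CREDENTIALS"):
    """Return the Template from this post as a dict (hypothetical helper)."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")

    ocr_step = {"use": ":original", "robot": "/image/ocr", "provider": provider}
    if as_text:
        # Leaving "format" out falls back to the Robot's default JSON output.
        ocr_step["format"] = "text"

    return {
        "steps": {
            ":original": {"robot": "/upload/handle"},
            "image-ocr": ocr_step,
            "exported": {
                "use": ["image-ocr", ":original"],
                "robot": "/s3/store",
                "credentials": credentials,
            },
        }
    }


if __name__ == "__main__":
    # Same Template, but backed by Google Cloud Platform instead of AWS.
    print(json.dumps(build_ocr_steps(provider="gcp"), indent=2))
```

Only the provider value differs between the two variants, which mirrors the point above: the rest of the Template stays untouched when you switch backends.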
After the text has been extracted and returned, the exported Step stores both the extracted text and the original image in our S3 Bucket. YOUR_AWS_CREDENTIALS refers to the Template Credentials that can be set up in the Credentials tab of your Transloadit Console. If you have not already done so, you will need to set up your own Amazon S3 Bucket first.
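Because Steps refer to each other by name, a misspelled name in a use parameter is an easy mistake to make. The following sketch is one way to check a Template locally before using it. The function find_unknown_step_references is a hypothetical helper, not a Transloadit API, and it only handles the string and list forms of use that appear in this post.

```python
def find_unknown_step_references(template):
    """Return (step_name, missing_reference) pairs for every "use" entry
    that does not match a Step defined in the Template."""
    steps = template.get("steps", {})
    problems = []
    for name, step in steps.items():
        use = step.get("use", [])
        # "use" may be a single Step name or a list of Step names.
        references = [use] if isinstance(use, str) else list(use)
        for reference in references:
            if reference not in steps:
                problems.append((name, reference))
    return problems


if __name__ == "__main__":
    template = {
        "steps": {
            ":original": {"robot": "/upload/handle"},
            "image-ocr": {"use": ":original", "robot": "/image/ocr"},
            # Underscore instead of hyphen: this name matches no Step.
            "exported": {"use": ["image_ocr", ":original"], "robot": "/s3/store"},
        }
    }
    print(find_unknown_step_references(template))
```

An empty list means every use entry points at a Step that exists in the Template.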
Testing
Now, let's test our Template with the image below:
Once our Assembly has finished executing, we will be left with the following result:
Transloadit
Conclusion
As you can see, the Assembly was a success. Hopefully, in this short introduction, we have shown that extracting text from an image is not difficult or time-consuming when using our /image/ocr Robot.
What's more, thanks to Transloadit's composability, all of our 74 wildly different features can be strung together to create workflows unique to your use case. In other words, the OCR Step could be just one of the many cogs that make up your intricate machine. As shown, it only takes a declaratively written JSON recipe to set this up, making it a fool- and bullet-proof method of adding great value to your business.
Because this Robot is a member of our AI family, it is only available to our paying customers. If you are interested in using this feature, please consider upgrading to a premium plan, the first of which costs $49/mo and includes 10GB of encoding data.