GCP OCR & Java: invoice automation

Manual invoice processing is tedious, error-prone, and slow. Optical Character Recognition (OCR) automates text extraction from invoices, which reduces manual data entry and speeds up financial workflows.
Prerequisites
Before getting started with GCP OCR and Java, ensure you have:
- Java 11 or later installed
- Maven 3.8+ for dependency management
- A Google Cloud account with billing enabled
- Cloud Vision API enabled in your Google Cloud project
- Google Cloud CLI installed
Why OCR matters in invoice processing
OCR technology converts images of text into machine-readable data. Integrating OCR into invoice processing workflows can:
- Reduce manual data entry errors
- Accelerate invoice processing times
- Enable scalable and automated financial operations
Setting up the Google Cloud Vision API with Java
Authentication setup
Before using the Cloud Vision API, you need to set up authentication:
- Install the Google Cloud CLI if you haven't already.
- Initialize the CLI by running:
  gcloud init
- Set up application default credentials:
  gcloud auth application-default login
- Ensure the Cloud Vision API is enabled in your Google Cloud project:
  gcloud services enable vision.googleapis.com
Adding dependencies
Add the Cloud Vision dependency to your Java project using Maven. The recommended approach is to use the Google Cloud BOM (Bill of Materials):
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.56.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-vision</artifactId>
  </dependency>
</dependencies>
Extracting text from invoices
The following Java example extracts text from an invoice image and reports any error returned by the API:
import com.google.cloud.vision.v1.AnnotateImageRequest;
import com.google.cloud.vision.v1.AnnotateImageResponse;
import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
import com.google.cloud.vision.v1.Feature;
import com.google.cloud.vision.v1.Image;
import com.google.cloud.vision.v1.ImageAnnotatorClient;
import com.google.cloud.vision.v1.TextAnnotation;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class InvoiceOCR {
    public static void main(String[] args) {
        // The path to your invoice image
        String filePath = "invoice.jpg";
        try {
            // Load the image; readAllBytes opens and closes the file itself
            ByteString imgBytes = ByteString.copyFrom(Files.readAllBytes(Paths.get(filePath)));
            Image img = Image.newBuilder().setContent(imgBytes).build();

            // Create feature for text detection
            Feature feature = Feature.newBuilder()
                .setType(Feature.Type.DOCUMENT_TEXT_DETECTION)
                .build();

            // Build the request
            AnnotateImageRequest request = AnnotateImageRequest.newBuilder()
                .addFeatures(feature)
                .setImage(img)
                .build();
            List<AnnotateImageRequest> requests = List.of(request);

            // Process the request; try-with-resources closes the client
            try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
                BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
                List<AnnotateImageResponse> responses = response.getResponsesList();
                for (AnnotateImageResponse res : responses) {
                    if (res.hasError()) {
                        System.err.printf("Error: %s%nCode: %d%n",
                            res.getError().getMessage(),
                            res.getError().getCode());
                        return;
                    }
                    // Protobuf getters never return null, so check for presence instead
                    if (!res.hasFullTextAnnotation()) {
                        System.out.println("No text found in image");
                        return;
                    }
                    TextAnnotation annotation = res.getFullTextAnnotation();
                    System.out.println("Extracted Text:\n" + annotation.getText());
                }
            }
        } catch (IOException e) {
            // Thrown when the image cannot be read or the client cannot be created
            System.err.println("I/O error while reading the image or creating the client: " + e.getMessage());
        } catch (Exception e) {
            System.err.println("Error during processing: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
Document text detection vs. text detection
Google Cloud Vision offers two main OCR modes:
- DOCUMENT_TEXT_DETECTION: Optimized for dense text in structured documents like invoices. Its response organizes the text into pages, blocks, paragraphs, and words, which preserves the layout of the document.
- TEXT_DETECTION: Better for scene text or images with sparse text. It's less structured but works well for capturing text in natural scenes.
For invoice processing, DOCUMENT_TEXT_DETECTION is typically the better choice because it preserves the document's structure.
Parsing and structuring invoice data
After extracting raw text, you'll need to parse it into structured data. Regular expressions or NLP libraries can help identify key fields like invoice number, date, total amount, and vendor details.
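As a minimal sketch of the regular-expression approach, the snippet below pulls a single field, the invoice total, out of OCR text and converts it to a BigDecimal. The class name TotalExtractor, the label wording, and the US-style amount format ("1,234.56") are assumptions made for illustration; real invoices will need patterns tuned to their layouts.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class TotalExtractor {
    // Assumed labels: "Total", "Amount Due", or "Balance Due".
    // The word boundary keeps "Subtotal" from matching as "Total".
    // \u20AC and \u00A3 are the euro and pound signs.
    private static final Pattern TOTAL = Pattern.compile(
        "(?i)\\b(?:Total|Amount Due|Balance Due)\\s*:?\\s*[$\u20AC\u00A3]?\\s*"
            + "(?<amount>\\d[\\d,]*\\.\\d{2})");

    // Returns the first labeled total, assuming US-style formatting such as 1,234.56.
    static Optional<BigDecimal> extractTotal(String text) {
        Matcher m = TOTAL.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        // Drop thousands separators so BigDecimal can parse the value.
        return Optional.of(new BigDecimal(m.group("amount").replace(",", "")));
    }

    public static void main(String[] args) {
        String ocrText = "Subtotal: $1,100.00\nTax: $134.56\nTotal: $1,234.56\n";
        System.out.println(extractTotal(ocrText).orElse(null)); // prints 1234.56
    }
}
```

Returning an Optional rather than a bare string makes the "field not found" case explicit, and BigDecimal avoids the rounding problems that floating-point types introduce for currency.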
Here's an example that extracts several fields with regular expressions and validates the result:
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceParser {
    // Common invoice field patterns, compiled once.
    // The lookahead requires a digit so "Invoice Date" is not captured as an invoice number.
    private static final Pattern INVOICE_NUMBER = Pattern.compile(
        "(?i)Invoice\\s*(?:#|No\\.?|Number)?\\s*:?\\s*(?=[\\w/-]*\\d)(\\w+(?:[-/]\\w+)*)");
    private static final Pattern DATE = Pattern.compile(
        "(?i)(?:Invoice\\s*Date|Date)\\s*:?\\s*(\\d{1,2}[/-]\\d{1,2}[/-]\\d{2,4})");
    // The word boundary keeps "Subtotal" from matching; amounts may contain thousands separators.
    private static final Pattern TOTAL = Pattern.compile(
        "(?i)\\b(?:Total|Amount Due|Balance Due)\\s*:?\\s*[$€£]?\\s*(\\d+(?:[.,]\\d{3})*[.,]\\d{2})");
    // The vendor name is captured from a single line only.
    private static final Pattern VENDOR = Pattern.compile(
        "(?i)\\b(?:From|Vendor|Supplier|Company)[ \\t]*:?[ \\t]*([A-Za-z0-9 .,&'-]+)");

    public static Map<String, String> parseInvoiceData(String extractedText) {
        Map<String, String> invoiceData = new HashMap<>();
        putIfFound(invoiceData, "invoiceNumber", INVOICE_NUMBER, extractedText);
        putIfFound(invoiceData, "date", DATE, extractedText);
        putIfFound(invoiceData, "totalAmount", TOTAL, extractedText);
        putIfFound(invoiceData, "vendor", VENDOR, extractedText);
        // Validate extracted data
        validateInvoiceData(invoiceData);
        return invoiceData;
    }

    // Stores the first capture group under the given key if the pattern matches
    private static void putIfFound(Map<String, String> data, String key, Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        if (matcher.find()) {
            data.put(key, matcher.group(1).trim());
        }
    }

    private static void validateInvoiceData(Map<String, String> data) {
        // Validate invoice number
        if (!data.containsKey("invoiceNumber")) {
            System.out.println("Warning: Could not extract invoice number");
        }
        // Validate and format date if present
        if (data.containsKey("date")) {
            try {
                // The regex accepts "-" or "/" separators; normalize to "/" before parsing
                String dateStr = data.get("date").replace('-', '/');
                // This is simplified: it assumes month-first dates with a four-digit year.
                // A real implementation would handle various date formats.
                DateTimeFormatter inputFormatter = DateTimeFormatter.ofPattern("M/d/yyyy");
                DateTimeFormatter outputFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
                LocalDate date = LocalDate.parse(dateStr, inputFormatter);
                data.put("date", outputFormatter.format(date));
            } catch (DateTimeParseException e) {
                System.out.println("Warning: Date format could not be standardized");
            }
        } else {
            System.out.println("Warning: Could not extract date");
        }
        // Validate amount
        if (!data.containsKey("totalAmount")) {
            System.out.println("Warning: Could not extract total amount");
        }
    }
}
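The date handling in the parser is deliberately simplified. One way to accept several formats, sketched here under assumptions, is to try a list of candidate formatters in order and keep the first that parses. The class name InvoiceDateNormalizer and the particular formats are hypothetical choices for illustration. Month-first is tried before day-first, so an ambiguous value such as 03/04/2024 is read as March 4; reorder the list if your invoices are day-first. Two-digit years are not handled.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration; the format list is an assumption.
final class InvoiceDateNormalizer {
    // Candidate formats, tried in order. STRICT resolution rejects
    // impossible dates such as February 30 instead of adjusting them.
    private static final List<DateTimeFormatter> CANDIDATES = List.of(
        strict("M/d/uuuu"),
        strict("M-d-uuuu"),
        strict("d/M/uuuu"),
        DateTimeFormatter.ISO_LOCAL_DATE);

    private static DateTimeFormatter strict(String pattern) {
        return DateTimeFormatter.ofPattern(pattern).withResolverStyle(ResolverStyle.STRICT);
    }

    // Returns the date as yyyy-MM-dd, or empty if no candidate format matches.
    static Optional<String> toIsoDate(String raw) {
        String value = raw.trim();
        for (DateTimeFormatter formatter : CANDIDATES) {
            try {
                return Optional.of(LocalDate.parse(value, formatter).toString());
            } catch (DateTimeParseException e) {
                // Not this format; fall through to the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(toIsoDate("25/12/2024").orElse("unparsed")); // prints 2024-12-25
    }
}
```

Because 25 is not a valid month, the month-first formatter rejects 25/12/2024 and the day-first formatter picks it up. Truly ambiguous dates cannot be resolved this way, so knowing the vendor's locale remains the more reliable signal.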
Automating data entry into financial systems
Once structured, invoice data can be automatically entered into financial systems via APIs or database integrations. This automation reduces manual intervention and streamlines accounting processes.
Here's a conceptual example of how to integrate with a financial system:
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Map;
import java.util.stream.Collectors;

public class FinancialSystemIntegration {
    // Placeholder values; in practice, load the key from an environment variable or a secret manager
    private static final String API_ENDPOINT = "https://your-financial-system-api.com/invoices";
    private static final String API_KEY = "YOUR_API_KEY";

    public static void submitInvoiceData(Map<String, String> invoiceData) {
        try {
            // Convert invoice data to JSON
            String jsonData = convertToJson(invoiceData);

            // Create HTTP client
            HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();

            // Build request
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(API_ENDPOINT))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + API_KEY)
                .POST(HttpRequest.BodyPublishers.ofString(jsonData))
                .build();

            // Send request and get response
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // Handle response
            if (response.statusCode() >= 200 && response.statusCode() < 300) {
                System.out.println("Invoice successfully submitted to financial system");
            } else {
                System.err.println("Failed to submit invoice (HTTP " + response.statusCode() + "): " + response.body());
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see the interruption
            Thread.currentThread().interrupt();
            System.err.println("Interrupted while submitting invoice data");
        } catch (IOException e) {
            System.err.println("Error submitting invoice data: " + e.getMessage());
        }
    }

    private static String convertToJson(Map<String, String> data) {
        // Simple JSON conversion - in production, use a proper JSON library like Jackson or Gson
        // Collectors.joining also handles an empty map, producing an empty JSON object
        return data.entrySet().stream()
            .map(e -> "  \"" + escapeJson(e.getKey()) + "\": \"" + escapeJson(e.getValue()) + "\"")
            .collect(Collectors.joining(",\n", "{\n", "\n}"));
    }

    private static String escapeJson(String value) {
        // Escapes the characters most likely to appear in OCR output; not a complete JSON escaper
        return value.replace("\\", "\\\\")
            .replace("\"", "\\\"")
            .replace("\n", "\\n")
            .replace("\r", "\\r")
            .replace("\t", "\\t");
    }
}
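Because OCR output is not always complete, it can help to check the parsed map before submitting it, and to route incomplete invoices to manual review instead. The sketch below assumes the same map keys used by the parser earlier in this article (invoiceNumber, date, totalAmount); the class name SubmissionGate and the choice of required fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-submission check; the required keys are an assumption.
final class SubmissionGate {
    private static final List<String> REQUIRED =
        List.of("invoiceNumber", "date", "totalAmount");

    // Returns the required keys that are absent or blank, in a fixed order.
    static List<String> missingFields(Map<String, String> invoiceData) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = invoiceData.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new HashMap<>();
        parsed.put("invoiceNumber", "INV-1001");
        parsed.put("date", "2024-03-04");

        List<String> missing = missingFields(parsed);
        if (missing.isEmpty()) {
            System.out.println("Ready to submit");
        } else {
            System.out.println("Needs manual review, missing: " + missing);
        }
    }
}
```

A caller would invoke the submission method only when missingFields returns an empty list, which keeps partially extracted invoices out of the financial system.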