Introduction

What is Document AI?

Google Cloud Document AI uses pre-trained machine learning models to extract, classify, and structure information from unstructured documents such as invoices, receipts, contracts, and forms. For organizations handling an ever-growing volume of such documents, it turns them into structured data that supports automation and informed decision-making.

Key Features

Automated Information Extraction: Extract text, key-value pairs, entities, and tables from diverse document formats (PDF, scans, images).
Efficient Task Automation: Streamline workflows by automating tedious manual tasks like data entry and document classification.
Customizable Model Development: Train and deploy custom models tailored to specific document types and information needs.
Integrated Cloud Environment: Seamlessly connect Document AI with other Google Cloud services for holistic data processing and analysis.
Enhanced Accuracy and Scalability: Benefit from pre-trained models and fine-tuning capabilities for high-precision document understanding.

Scope of this blog

This blog is a step-by-step guide to calling the Document AI REST API with the curl command-line tool. It addresses the errors and difficulties you are likely to run into when you try it for the first time. I will cover other aspects of Document AI in future blogs. You can refer to the official Document AI documentation for further details:

Official Document AI GCP documentation

Prerequisites

curl installed
gcloud CLI installed (used in the authentication step)
Google Cloud Platform project with the Document AI API enabled
Service account with the Document AI User role

Step-by-Step Guide

Authentication

Initialize gcloud CLI:
```
 gcloud init
```
Obtain an access token:

gcloud auth application-default print-access-token

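Save this access token for later use. If the command reports that application default credentials are missing, run gcloud auth application-default login and try again.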

Constructing the command

The basic structure of the request is:

  curl -X POST -H "Authorization: Bearer <access_token>" \
    -H "Content-Type: application/json" \
    -d @<input_file.json> -o <output_file.json> \
    <prediction_endpoint>

Remember to replace placeholders with your specific details.

Creating a General Purpose Processor

In the Google Cloud Console, navigate to the Document AI section.
Click on the Explore processors button.
Click on Create Processor and select Form Parser as the processor type.
Give your processor a name and select a region.
Click Create to create the processor.
Once created, go to the Overview tab and copy the Prediction Endpoint for your new processor.

Obtaining the Prediction Endpoint

Open the Google Cloud Console and navigate to the Document AI section.
Select the Processors tab.
Click on the name of the General Purpose Processor you want to use.
In the Overview tab, copy the Prediction Endpoint listed under API details. This is what you need in your curl command.
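
For reference, the Prediction Endpoint shown in the console normally follows a regional URL pattern. The sketch below rebuilds it from its parts, which can help when spotting a mistyped endpoint. `build_endpoint` is a hypothetical helper, and the project, location, and processor ID values are placeholders. The value copied from the console remains the one to trust.

```shell
# Hypothetical helper: assemble a Document AI v1 "process" endpoint.
# Arguments: <project_id> <location> <processor_id>
build_endpoint() {
  printf 'https://%s-documentai.googleapis.com/v1/projects/%s/locations/%s/processors/%s:process\n' \
    "$2" "$1" "$2" "$3"
}

# Example with placeholder values (not a real processor):
build_endpoint "my-project" "us" "abc123"
# prints: https://us-documentai.googleapis.com/v1/projects/my-project/locations/us/processors/abc123:process
```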

Input File Structure

input_file.json has the following structure:

  {
    "inlineDocument": {
      "mimeType": "<mime_type>",
      "content": "<base64_encoded_content>"
    }
  }

Ensure the mimeType matches your document and is one of the supported formats (PDF, GIF, TIFF, JPEG, PNG, BMP, WEBP).
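
As an illustration, the mimeType string can be derived from the file extension. This is a minimal sketch: `mime_type_for` is a hypothetical helper, it trusts the extension rather than inspecting the file contents, and it covers only the formats listed above.

```shell
# Hypothetical helper: map a file extension to the mimeType string
# used in input_file.json. Prints nothing and returns 1 if unknown.
mime_type_for() {
  local ext
  # Lower-case the extension so "SCAN.PDF" and "scan.pdf" are treated alike.
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf)      echo "application/pdf" ;;
    gif)      echo "image/gif" ;;
    tif|tiff) echo "image/tiff" ;;
    jpg|jpeg) echo "image/jpeg" ;;
    png)      echo "image/png" ;;
    bmp)      echo "image/bmp" ;;
    webp)     echo "image/webp" ;;
    *)        return 1 ;;
  esac
}

mime_type_for "invoice.PDF"   # prints: application/pdf
```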

Encoding the content securely

Avoid online base64 converters. They expose your document to a third party and can produce output that the API fails to process.

Encode content locally using platform-specific methods:

Windows

Powershell

 $pdfBytes = [System.IO.File]::ReadAllBytes("<path_to_pdf>")

 $base64String = [System.Convert]::ToBase64String($pdfBytes)

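 # Write plain ASCII text; ">" in Windows PowerShell 5.1 writes UTF-16 instead
 Set-Content -Path test.txt -Value $base64String -Encoding ascii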

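Linux/macOS

Bash

 base64 < "<path_to_pdf>" | tr -d '\n' > test.txt

Copy the base64 string from test.txt into the content field of input_file.json. It must be a single line: tr -d '\n' removes the line breaks that some base64 implementations insert, which would otherwise make the JSON invalid.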
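
The encoding and copy steps above can also be scripted, which avoids copy-paste mistakes with long base64 strings. The following is a minimal sketch for a Unix-like shell. `make_input_json` is a hypothetical helper, and the MIME type is passed in explicitly rather than detected.

```shell
# Hypothetical helper: write a request body in the inlineDocument
# structure shown in this guide.
# Arguments: <document_path> <mime_type> <output_json_path>
make_input_json() {
  local doc="$1" mime="$2" out="$3" content
  [ -r "$doc" ] || { echo "cannot read: $doc" >&2; return 1; }
  # Read via stdin and strip line breaks so the value is one JSON-safe line.
  content="$(base64 < "$doc" | tr -d '\n')"
  printf '{\n  "inlineDocument": {\n    "mimeType": "%s",\n    "content": "%s"\n  }\n}\n' \
    "$mime" "$content" > "$out"
}

# Usage (file names are placeholders):
#   make_input_json sample.pdf application/pdf input_file.json
```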

Executing the command

Navigate to the directory containing input_file.json. curl will create output_file.json there.
Run the curl command with your specific details.

The extracted information will be saved in output_file.json.

Troubleshooting

401 Authentication Error

Check that the access token is still valid (tokens expire, so generate a fresh one if in doubt) and that the service account has the Document AI User role.

Mime Type Not Acceptable

Confirm your document type matches supported formats.

Invalid Content Input

Always encode the content locally using the methods described above rather than an online converter.
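
One way to rule out a damaged encoding before calling the API is to decode the base64 text and compare it with the source file. This is a minimal sketch, assuming `base64 --decode` and `cmp` are available on your system. `encoding_matches` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the base64 text in <encoded_file>
# decodes back to exactly the bytes of <original_file>.
# Arguments: <original_file> <encoded_file>
encoding_matches() {
  # cmp -s is silent; "-" makes it read the decoded bytes from stdin.
  base64 --decode < "$2" 2>/dev/null | cmp -s - "$1"
}

# Usage (file names are placeholders):
#   encoding_matches document.pdf test.txt && echo "encoding OK"
```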

Beyond Basics

Supported features

Explore the Document AI API documentation for details on:

Supported processors and their functionalities.
Extractable information types (e.g., text, entities, tables).
Language support for different processors.
Tailor your curl commands to extract specific information based on your processor's capabilities.

Environment Variables

Consider using environment variables for sensitive information like access tokens to enhance security and manage multiple accounts effectively.

Tools like dotenv can simplify environment variable usage in shell scripts.
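
As a sketch of that idea, the token can live in an environment variable that the script reads when it builds the request header. `DOCAI_ACCESS_TOKEN`, `load_env_file`, and `auth_header` are names invented for this example. The loader relies on the shell's own `set -a` rather than a separate dotenv tool.

```shell
# Hypothetical helper: export every VAR=value line of a .env-style file.
# The file is sourced as shell code, so only load files you trust.
load_env_file() {
  [ -r "$1" ] || { echo "env file not found: $1" >&2; return 1; }
  set -a   # auto-export variables assigned while sourcing
  . "$1"
  set +a
}

# Hypothetical helper: print the Authorization header, or fail if the
# token variable (a name chosen for this example) is empty or unset.
auth_header() {
  [ -n "${DOCAI_ACCESS_TOKEN:-}" ] || { echo "DOCAI_ACCESS_TOKEN is not set" >&2; return 1; }
  printf 'Authorization: Bearer %s\n' "$DOCAI_ACCESS_TOKEN"
}

# Usage:
#   export DOCAI_ACCESS_TOKEN="$(gcloud auth application-default print-access-token)"
#   curl -X POST -H "$(auth_header)" ...
```

Keeping the token out of the command itself also keeps it out of your shell history.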

Error Handling

Incorporate error-handling mechanisms in your curl commands to gracefully handle potential issues like network failures or API errors.
Utilize curl exit codes and conditionals to provide informative feedback to the user.
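
A minimal sketch of that approach follows. It separates transport failures (curl's exit code) from API failures (the HTTP status). The function names, the `DOCAI_ACCESS_TOKEN` variable, and the wording of the hints are assumptions made for this example. `-sS`, `-o`, `-w '%{http_code}'`, and `-d @file` are standard curl options.

```shell
# Hypothetical helper: turn an HTTP status code into a short hint.
explain_status() {
  case "$1" in
    200) echo "OK" ;;
    400) echo "Bad request: check mimeType and the base64 content" ;;
    401) echo "Unauthenticated: the access token is missing or expired" ;;
    403) echo "Permission denied: check the Document AI User role" ;;
    404) echo "Not found: check the prediction endpoint" ;;
    *)   echo "Unexpected HTTP status $1" ;;
  esac
}

# Hypothetical wrapper around the request from this guide.
# Arguments: <prediction_endpoint> <input_file.json> <output_file.json>
process_document() {
  local status
  # -sS hides the progress bar but still prints errors; -w prints only the
  # HTTP status to stdout while the response body goes to the -o file.
  status="$(curl -sS -X POST \
    -H "Authorization: Bearer ${DOCAI_ACCESS_TOKEN:-}" \
    -H "Content-Type: application/json" \
    -d @"$2" -o "$3" -w '%{http_code}' "$1")" || {
      echo "curl failed with exit code $? (network or TLS problem?)" >&2
      return 1
    }
  explain_status "$status" >&2
  [ "$status" = "200" ]
}

# Usage: process_document "<prediction_endpoint>" input_file.json output_file.json
```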

Refer to the official Document AI API documentation for a comprehensive understanding of advanced features and functionalities. This guide provides a solid foundation for using the Document AI REST API via curl, and further exploration of the API opens up wider possibilities for document processing automation.

Thanks for reading :)
