APEX: Integration Playground - Adobe PDF Services


Many of us are leveraging various APIs, REST and AI services to generate text, gain a better understanding of content, and have real, data-driven conversations with business data and content.
In this post, hopefully the first of many, I start experimenting with services in the area of understanding and sharing content. For all of the candidates I will be using the free (very important) developer editions of the service APIs, along with their limits.
Today’s test set of APIs is Adobe PDF Services, which will help me curate content for an application that I am working on. You can Get Started as an Adobe Developer and start testing today.
About this Post
My current projects and questions are mostly about REST, JSON, or AI-related topics, such as Vector Search, Document Understanding or Generative AI. I do not claim to be an expert in these areas, but they are part of my current research and day-to-day activities as I work with APEX.
Oracle Cloud Infrastructure (OCI) does provide an AI service for Document Understanding, but I am subject to tenancy and service limits just like any other average Joe Schmo. Working around these constraints, I figured why not give someone else a chance to shine.
Use Case
Transcribe, summarise and tag user-uploaded content, so that it can be used to support user enquiries via conversations.
For Document Types
IF (TYPE ≠ ‘Document’) THEN Convert to PDF
All Content should be stored in an OCI Object Storage Bucket
The Content’s Text should be Extracted
A Summary & Tags of the Content’s Text should be Generated
Embeddings for the Content’s Text should be Generated
A similar set of requirements applies to Audio and Video content, but that is a separate topic, touched on in my post APEX: OCI Speech Integration. Today I cover the PDF conversion and text extraction requirements.
Assumptions
You can get credentials on your own from the Adobe Developer portal, and the same goes for getting a token.
You’ve read the Introduction to the developer services
You’ve peeked at the PDF Services API documentation (the Overview section and perhaps even Getting Started)
The documentation is fairly straightforward and gives enough curl examples to make it simple to translate into APEX calls.
Adobe PDF Services
Adobe PDF Services offer a broad range of document capabilities:
Creating a PDF from multiple formats, including HTML, Microsoft Office documents, and text files
Exporting a PDF to other formats or an image
Combining entire PDFs or specified page ranges
Using OCR to make a PDF file searchable with a custom locale
Compressing PDFs with a compression level, and linearizing PDFs
Protecting PDFs with password(s), and removing password protection from PDFs
Common page operations, including inserting, replacing, deleting, reordering, and rotating
Splitting PDFs into multiple files
Extract PDF as JSON: the content, structure & renditions of table and figure elements along with Character Bounding Boxes
Get the properties of a PDF file like page count, PDF version, file size, compliance levels, font info, permissions and more
Improving the accessibility of PDFs (Available under Early Access Program)
You can explore more about these APIs with the sample Postman Collection (Zip download - link is subject to change and if you don’t trust me, get it from the API link provided).
The API round-up will include:
Getting an Access Token
Uploading an Asset
Creating a Job & Checking its Status
Extract PDF as JSON
Downloading an Asset
Getting an Access Token
Adobe has decided that, instead of accepting the client ID (username) and client secret (password) as a credential pair that can be stored as an APEX Web Credential, it expects them as parameters in the body of the request.
This is not ideal, and alarm bells may be ringing here, but the request is at least made over an SSL connection.
In the world of APEX we can use REST Data Sources with these values passed as bind variable parameters. I’ve done the same in PL/SQL using the p_parm_name and p_parm_value parameters of apex_web_service.make_rest_request.
declare
  l_client_id     VARCHAR2(1000) := 'Your Client ID';
  l_client_secret VARCHAR2(1000) := 'Your Client Secret';
  l_response_clob CLOB;
  l_rest_token    VARCHAR2(4000); -- access tokens can be long, so allow for it
  l_token_url     VARCHAR2(1000) := 'https://pdf-services-ue1.adobe.io/token';
  l_parm_names    apex_application_global.vc_arr2;
  l_parm_values   apex_application_global.vc_arr2;
begin
  -- Adobe PDF Services
  -- Set up the form parameters sent in the request body
  l_parm_names(1)  := 'client_id';
  l_parm_values(1) := l_client_id;
  l_parm_names(2)  := 'client_secret';
  l_parm_values(2) := l_client_secret;
  -- Get Token
  apex_web_service.g_request_headers.delete();
  apex_web_service.g_request_headers(1).name  := 'Content-Type';
  apex_web_service.g_request_headers(1).value := 'application/x-www-form-urlencoded';
  l_response_clob := apex_web_service.make_rest_request(
    p_url         => l_token_url,
    p_http_method => 'POST',
    p_parm_name   => l_parm_names,
    p_parm_value  => l_parm_values
  );
  -- Only read the token if the request succeeded
  IF apex_web_service.g_status_code = 200 THEN
    l_rest_token := JSON_VALUE(l_response_clob, '$.access_token');
    apex_debug.info('Adobe PDF Services Token -> %s', l_rest_token);
    --DBMS_OUTPUT.PUT_LINE('Adobe PDF Services Token -> ' || l_rest_token); -- Output for SQL clients
  ELSE
    apex_debug.error('Adobe PDF Services token request failed: HTTP %s', apex_web_service.g_status_code);
  END IF;
end;
The Access Token and Client ID are required for all subsequent requests, except those sent to pre-signed URIs.
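Every later step needs a fresh token, so the block above can be wrapped in a standalone function that also checks the HTTP status before trusting the response. This is a minimal sketch, not a definitive implementation. The function name adobe_get_token is hypothetical. It assumes the same ue1 token endpoint used above, and that you pass the credentials in from wherever you keep them securely.

```sql
-- Hypothetical helper: request an Adobe PDF Services access token.
-- Assumes the ue1 token endpoint used above; raises an error on any non-200 response.
CREATE OR REPLACE FUNCTION adobe_get_token (
    p_client_id     IN VARCHAR2,
    p_client_secret IN VARCHAR2,
    p_token_url     IN VARCHAR2 DEFAULT 'https://pdf-services-ue1.adobe.io/token'
) RETURN VARCHAR2
IS
    l_parm_names  apex_application_global.vc_arr2;
    l_parm_values apex_application_global.vc_arr2;
    l_response    CLOB;
BEGIN
    -- Credentials are sent as form parameters in the request body
    l_parm_names(1)  := 'client_id';
    l_parm_values(1) := p_client_id;
    l_parm_names(2)  := 'client_secret';
    l_parm_values(2) := p_client_secret;

    apex_web_service.g_request_headers.delete();
    apex_web_service.g_request_headers(1).name  := 'Content-Type';
    apex_web_service.g_request_headers(1).value := 'application/x-www-form-urlencoded';

    l_response := apex_web_service.make_rest_request(
        p_url         => p_token_url,
        p_http_method => 'POST',
        p_parm_name   => l_parm_names,
        p_parm_value  => l_parm_values
    );

    -- Fail loudly instead of silently returning a NULL token
    IF apex_web_service.g_status_code <> 200 THEN
        raise_application_error(-20001,
            'Adobe token request failed with HTTP ' || apex_web_service.g_status_code);
    END IF;

    RETURN JSON_VALUE(l_response, '$.access_token');
END adobe_get_token;
/
```

Usage would then be a single line, such as l_rest_token := adobe_get_token(l_client_id, l_client_secret);. Raising an error keeps a failed login from cascading into confusing 401 responses further down the chain.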
Uploading an Asset
Before we can upload an asset, we must acquire a pre-signed upload URI for it. We cannot simply upload the content as we would with OCI Object Storage, but if you wish to draw parallels, this could be considered a pre-authenticated request.
The Pre-Signed Upload URI is a unique time bound URI. It can be referenced as many times as required before it expires.
So, we will use a two-step process to upload the content as an asset.
Getting a Pre-Signed Upload URI
This request prepares a location to allow assets to be uploaded and will return references to the new Asset ID and the upload location.
Referencing the Access Token and Client ID, we can request the pre-signed Upload URI:
declare
l_client_id VARCHAR2(1000) := 'Your Client ID';
l_rest_token VARCHAR2(1000) := 'Your Access Token';
l_mime_type VARCHAR2(1000) := 'Your Assets Content Type';
l_request_url VARCHAR2(1000) := 'https://pdf-services-ue1.adobe.io/assets';
l_body JSON_OBJECT_T := new JSON_OBJECT_T;
l_response_clob CLOB;
l_presigned_url VARCHAR2(4000);
l_asset_id VARCHAR2(4000);
begin
-- Adobe PDF Services
-- Get Pre-signed Upload URL
l_body.put('mediaType', l_mime_type );
APEX_DEBUG.INFO('Adobe PDF Services Upload Body: %s', l_body.to_string);
-- Set additional API parameters
apex_web_service.g_request_headers.delete();
apex_web_service.g_request_headers(1).name := 'x-api-key';
apex_web_service.g_request_headers(1).value := l_client_id;
apex_web_service.g_request_headers(2).name := 'Authorization';
apex_web_service.g_request_headers(2).value := 'bearer ' || l_rest_token;
apex_web_service.g_request_headers(3).name := 'content-type';
apex_web_service.g_request_headers(3).value := 'application/json';
l_response_clob := apex_web_service.make_rest_request(
p_url => l_request_url,
p_http_method => 'POST',
p_body => l_body.to_string
);
APEX_DEBUG.INFO('Adobe PDF Services Asset: %s Upload URL: %s', JSON_VALUE(l_response_clob, '$.assetID') || CHR(10), JSON_VALUE(l_response_clob,'$.uploadUri'));
l_presigned_url := JSON_VALUE(l_response_clob,'$.uploadUri');
l_asset_id := JSON_VALUE(l_response_clob, '$.assetID');
END;
The Pre-Signed Upload URI and Asset ID are required for the Upload request.
Uploading an Asset with the Pre-Signed Upload URI
Uploading the asset to the pre-signed Upload URI no longer requires the Authorization bearer token or the Client ID as the API key. We simply need to get our BLOB and upload it with its correct MIME type.
I am storing my object locally in the database for easy retrieval, but this can be any storage location.
declare
  l_mime_type     VARCHAR2(1000);
  l_blob_content  BLOB;
  l_response_clob CLOB;
  l_presigned_url VARCHAR2(4000) := 'Your Pre-Signed Upload URI'; -- uploadUri from the previous step
begin
  -- Retrieve the Blob Content, sample from a local table but this could be on OCI Object Storage
  select mime_type, blob_contents
  into l_mime_type, l_blob_content
  from YOUR_TABLE_STORING_YOUR_BLOB_CONTENT
  where id = YOUR_CONTENTS_ID;
  -- Adobe PDF Services
  -- Only the content type is needed; the pre-signed URI itself carries the authorisation
  apex_web_service.g_request_headers.delete();
  apex_web_service.g_request_headers(1).name  := 'content-type';
  apex_web_service.g_request_headers(1).value := l_mime_type;
  -- PUT the BLOB to the pre-signed URI, not to the /assets endpoint
  l_response_clob := apex_web_service.make_rest_request(
    p_url         => l_presigned_url,
    p_http_method => 'PUT',
    p_body_blob   => l_blob_content
  );
END;
Funnily enough, there is no real documentation about using the pre-signed Upload URI outside of the Getting Started section and its curl example:
curl --location -g --request PUT 'https://dcplatformstorageservice-prod-us-east-1.s3-accelerate.amazonaws.com/b37fd583-1ab6-4f49-99ef-d716180b5de4?X-Amz-Security-Token={{Placeholder for X-Amz-Security-Token}}&X-Amz-Algorithm={{Placeholder for X-Amz-Algorithm}}&X-Amz-Date={{Placeholder for X-Amz-Date}}&X-Amz-SignedHeaders={{Placeholder for X-Amz-SignedHeaders}}&X-Amz-Expires={{Placeholder for X-Amz-Expires}}&X-Amz-Credential={{Placeholder for X-Amz-Credential}}&X-Amz-Signature={{Placeholder for X-Amz-Signature}}' \
--header 'Content-Type: application/pdf' \
--data-binary '@{{Placeholder for file path}}'
Normal checks should be added to the code to ensure an acceptable HTTP 200 code was received.
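One way to add those checks without repeating IF blocks everywhere is a small assertion procedure. You would call it right after each make_rest_request. This is a minimal sketch: the procedure name assert_http_status and the error number -20002 are hypothetical choices.

```sql
-- Hypothetical helper: raise an error when an HTTP status code is not the expected one.
-- Pass apex_web_service.g_status_code right after each make_rest_request call.
CREATE OR REPLACE PROCEDURE assert_http_status (
    p_status   IN NUMBER,
    p_expected IN NUMBER   DEFAULT 200,
    p_context  IN VARCHAR2 DEFAULT NULL
)
IS
BEGIN
    -- A NULL status means no response was recorded, which is also a failure
    IF p_status IS NULL OR p_status <> p_expected THEN
        raise_application_error(-20002,
            NVL(p_context, 'HTTP request') || ' returned HTTP ' ||
            NVL(TO_CHAR(p_status), 'no status') || ', expected ' || p_expected);
    END IF;
END assert_http_status;
/
```

For the upload above, that would be assert_http_status(apex_web_service.g_status_code, 200, 'Asset upload');. The job submissions later in this post expect 201 instead of 200.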
Creating a Job & Checking its Status
My next request will be to convert this asset if it is not already a PDF. I will submit a Create Job request and then loop/poll to check on its status.
As this is APEX and it has Workflow capability, I am actually orchestrating all of this as a workflow and leveraging the Wait activity. Feel free to add your own looping code with sleep methods.
Create PDF Job
The Create PDF API will generate a PDF document from Microsoft Office documents (Word, Excel and PowerPoint) and image file formats.
It has a few parameters but only the Access Token, Client ID and Asset ID are required.
Let’s submit the request with our current details:
DECLARE
l_client_id VARCHAR2(1000) := 'Your Client ID';
l_rest_token VARCHAR2(1000) := 'Your Access Token';
l_asset_id VARCHAR2(4000) := 'Your Asset ID'; -- assetID from the upload step
l_response_clob CLOB;
l_job_url VARCHAR2(1000) := 'https://pdf-services-ue1.adobe.io/operation/createpdf';
l_body JSON_OBJECT_T := new JSON_OBJECT_T;
l_pdf_job_url VARCHAR2(4000);
l_pdf_job_id VARCHAR2(4000);
BEGIN
BEGIN
-- Adobe PDF Services
-- Submit the Create PDF Job
l_body.put('assetID', l_asset_id);
--l_body.put('documentLanguage', l_lang_code);
-- Lang Code needs to be in ISO format if this is important to you
-- Set additional API parameters
apex_web_service.g_request_headers.delete();
apex_web_service.g_request_headers(1).name := 'x-api-key';
apex_web_service.g_request_headers(1).value := l_client_id;
apex_web_service.g_request_headers(2).name := 'Authorization';
apex_web_service.g_request_headers(2).value := 'bearer ' || l_rest_token;
apex_web_service.g_request_headers(3).name := 'content-type';
apex_web_service.g_request_headers(3).value := 'application/json';
l_response_clob := apex_web_service.make_rest_request(
p_url => l_job_url,
p_http_method => 'POST',
p_body => l_body.to_string
);
-- Check if everything went okay and get the request id and location to check the job's status.
IF (apex_web_service.g_status_code = 201) THEN
-- Check Status
APEX_DEBUG.INFO('Status Code %s job is created', apex_web_service.g_status_code);
-- Get Headers for Location and Request ID
FOR i in 1.. apex_web_service.g_headers.count
LOOP
-- HTTP header names are case-insensitive, so compare in lower case
IF LOWER(apex_web_service.g_headers(i).name) = 'location'
THEN l_pdf_job_url := apex_web_service.g_headers(i).value;
END IF;
IF LOWER(apex_web_service.g_headers(i).name) = 'x-request-id'
THEN l_pdf_job_id := apex_web_service.g_headers(i).value;
END IF;
END LOOP;
END IF;
APEX_DEBUG.INFO('PDF Job URL is: %s ', l_pdf_job_url);
APEX_DEBUG.INFO('PDF Job Request ID is: %s ', l_pdf_job_id);
EXCEPTION
WHEN OTHERS THEN
BEGIN
APEX_DEBUG.INFO('Exception while submitting Create PDF Job: %s ', apex_web_service.g_status_code);
APEX_DEBUG.INFO(CHR(10) || SQLCODE);
APEX_DEBUG.INFO(CHR(10) || SUBSTR(SQLERRM, 1, 64));
-- Get Headers for Location and Request ID
FOR i in 1.. apex_web_service.g_headers.count
LOOP
IF LOWER(apex_web_service.g_headers(i).name) = 'location'
THEN l_pdf_job_url := apex_web_service.g_headers(i).value;
END IF;
IF LOWER(apex_web_service.g_headers(i).name) = 'x-request-id'
THEN l_pdf_job_id := apex_web_service.g_headers(i).value;
END IF;
END LOOP;
APEX_DEBUG.INFO('PDF Job URL is: %s ', l_pdf_job_url);
APEX_DEBUG.INFO('PDF Job Request ID is: %s ', l_pdf_job_id);
END;
END;
END;
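The block above reads the location and x-request-id headers in two places, and the Extract job later does the same. A small lookup function removes that repetition and compares header names case-insensitively, since HTTP treats Location and location as the same header. This is a sketch only: the name get_response_header is hypothetical. It assumes, as the loops above do, that apex_web_service.g_headers holds the headers of the most recent call, indexed from 1.

```sql
-- Hypothetical helper: return a response header value from the last
-- apex_web_service call, matching the header name case-insensitively.
CREATE OR REPLACE FUNCTION get_response_header (
    p_name IN VARCHAR2
) RETURN VARCHAR2
IS
BEGIN
    FOR i IN 1 .. apex_web_service.g_headers.count LOOP
        IF LOWER(apex_web_service.g_headers(i).name) = LOWER(p_name) THEN
            RETURN apex_web_service.g_headers(i).value;
        END IF;
    END LOOP;
    RETURN NULL; -- header not present in the last response
END get_response_header;
/
```

With it, the header loop shrinks to l_pdf_job_url := get_response_header('location'); and l_pdf_job_id := get_response_header('x-request-id');.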
Checking the Create PDF Job Status
We need to loop/poll the job until it is done. As I mentioned, I am using APEX Workflow for my loop execution, so I have no looping code to show, but a simple sleep will do:
BEGIN
DBMS_SESSION.SLEEP(60);
END;
The API needs the Access Token, Client ID and Job Request ID.
declare
l_client_id VARCHAR2(1000) := 'Your Client ID';
l_rest_token VARCHAR2(1000) := 'Your Access Token';
l_pdf_job_url VARCHAR2(4000) := 'Your Job URL'; -- location header from the Create PDF Job step
l_response_clob CLOB;
l_pdf_job_status VARCHAR2(100);
l_pdf_asset_id VARCHAR2(4000);
l_pdf_downld_url VARCHAR2(4000);
begin
BEGIN
-- Adobe PDF Services
-- Check the Create PDF Job Status
-- Set additional API parameters
apex_web_service.g_request_headers.delete();
apex_web_service.g_request_headers(1).name := 'x-api-key';
apex_web_service.g_request_headers(1).value := l_client_id;
apex_web_service.g_request_headers(2).name := 'Authorization';
apex_web_service.g_request_headers(2).value := 'bearer ' || l_rest_token;
l_response_clob := apex_web_service.make_rest_request(
p_url => l_pdf_job_url,
p_http_method => 'GET'
);
l_pdf_job_status := JSON_VALUE(l_response_clob,'$.status');
l_pdf_asset_id := JSON_VALUE(l_response_clob,'$.asset.assetID');
l_pdf_downld_url := JSON_VALUE(l_response_clob,'$.asset.downloadUri');
APEX_DEBUG.INFO('Current Job Status is: %s ', l_pdf_job_status);
APEX_DEBUG.INFO('Download URL is: %s ', l_pdf_downld_url);
EXCEPTION
WHEN OTHERS THEN
APEX_DEBUG.INFO('Exception Checking Job Status: %s ', apex_web_service.g_status_code);
END;
END;
The Job Status has three states:
inprogress
done
failed
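Evaluate the status to decide whether to continue polling, download the PDF asset and extract its contents as JSON, retry the job, or raise an exception to be handled manually.
Handling the Job Status
CASE l_pdf_job_status
  WHEN 'inprogress' THEN NULL; -- wait, then check the status again
  WHEN 'done'       THEN NULL; -- download the PDF asset or extract its contents as JSON
  WHEN 'failed'     THEN raise_application_error(-20001, 'Create PDF job failed');
  -- any other value raises CASE_NOT_FOUND
END CASE;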
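If you are not using APEX Workflow, the status check, the sleep and the evaluation above can be combined into one polling function. This is a minimal sketch, not the approach used in this post (I use the Workflow Wait activity). The function name adobe_wait_for_job, the default of 10 attempts and the 5 second interval are hypothetical choices. It assumes the status endpoint returns HTTP 200 with a status field, as in the check above. It returns the last status response, so the caller can read both the status and the download URI.

```sql
-- Hypothetical sketch: poll an Adobe PDF Services job status URL until the job
-- is 'done' or 'failed', or until p_max_attempts checks have been made.
CREATE OR REPLACE FUNCTION adobe_wait_for_job (
    p_job_url      IN VARCHAR2,
    p_client_id    IN VARCHAR2,
    p_token        IN VARCHAR2,
    p_max_attempts IN PLS_INTEGER DEFAULT 10,
    p_wait_seconds IN NUMBER      DEFAULT 5
) RETURN CLOB
IS
    l_response CLOB;
    l_status   VARCHAR2(100);
BEGIN
    FOR l_attempt IN 1 .. p_max_attempts LOOP
        apex_web_service.g_request_headers.delete();
        apex_web_service.g_request_headers(1).name  := 'x-api-key';
        apex_web_service.g_request_headers(1).value := p_client_id;
        apex_web_service.g_request_headers(2).name  := 'Authorization';
        apex_web_service.g_request_headers(2).value := 'Bearer ' || p_token;

        l_response := apex_web_service.make_rest_request(
            p_url         => p_job_url,
            p_http_method => 'GET'
        );

        IF apex_web_service.g_status_code <> 200 THEN
            raise_application_error(-20003,
                'Job status check failed with HTTP ' || apex_web_service.g_status_code);
        END IF;

        l_status := JSON_VALUE(l_response, '$.status');
        apex_debug.info('Attempt %s: job status is %s', l_attempt, l_status);

        -- Stop as soon as the job has finished, successfully or not
        EXIT WHEN l_status IN ('done', 'failed');

        -- Only sleep if another check will follow
        IF l_attempt < p_max_attempts THEN
            DBMS_SESSION.SLEEP(p_wait_seconds);
        END IF;
    END LOOP;

    RETURN l_response; -- status is still 'inprogress' if the attempt limit was reached
END adobe_wait_for_job;
/
```

The caller reads JSON_VALUE(l_response, '$.status') and then either '$.asset.downloadUri' (Create PDF) or '$.content.downloadUri' (Extract). Keep in mind that DBMS_SESSION.SLEEP holds the database session for the whole wait. That is one reason a Workflow Wait activity is the friendlier option inside APEX.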
Downloading an Asset
The download works much like the upload to the pre-signed Upload URI: no Authorization or API key headers are needed. The Check Status response returns the asset’s download URI, which is a pre-signed Download URI.
DECLARE
l_pdf_downld_url VARCHAR2(4000) := 'Your Download URI'; -- asset.downloadUri from the job status
l_blob_content BLOB;
l_content_length NUMBER;
l_content_type VARCHAR2(4000);
BEGIN
BEGIN
-- Get PDF Download URL
APEX_DEBUG.INFO('PDF Download URL %s' , l_pdf_downld_url);
-- Set additional API parameters
apex_web_service.g_request_headers.delete();
-- Download PDF Adobe PDF Services
l_blob_content := apex_web_service.make_rest_request_b(
p_url => l_pdf_downld_url,
p_http_method => 'GET');
IF apex_web_service.g_status_code = 200 THEN
BEGIN
APEX_DEBUG.INFO('PDF Downloaded');
FOR i IN 1..apex_web_service.g_headers.count
LOOP
APEX_DEBUG.INFO(apex_web_service.g_headers(i).name || ': ' || apex_web_service.g_headers(i).value ||CHR(10));
IF LOWER(apex_web_service.g_headers(i).name) = 'content-length'
THEN
l_content_length := apex_web_service.g_headers(i).value;
END IF;
IF LOWER(apex_web_service.g_headers(i).name) = 'content-type'
THEN
l_content_type := apex_web_service.g_headers(i).value;
END IF;
END LOOP;
END;
END IF;
APEX_DEBUG.INFO('Object Content Length: %s ', l_content_length);
APEX_DEBUG.INFO('Object Content Type: %s ', l_content_type);
EXCEPTION
WHEN OTHERS THEN
APEX_DEBUG.INFO('Exception Downloading PDF: %s ', apex_web_service.g_status_code);
END;
END;
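The same pre-signed download pattern is used again for the extracted JSON. A small function that returns the BLOB and raises on a failed download can therefore serve both steps. This is a sketch only: the name adobe_download_asset and the error number are hypothetical.

```sql
-- Hypothetical helper: download an asset from an Adobe pre-signed download URI.
-- No Authorization or x-api-key headers are sent: the URI itself grants access.
CREATE OR REPLACE FUNCTION adobe_download_asset (
    p_download_uri IN VARCHAR2
) RETURN BLOB
IS
    l_blob BLOB;
BEGIN
    -- Clear any headers left over from earlier authenticated calls
    apex_web_service.g_request_headers.delete();

    l_blob := apex_web_service.make_rest_request_b(
        p_url         => p_download_uri,
        p_http_method => 'GET'
    );

    IF apex_web_service.g_status_code <> 200 THEN
        raise_application_error(-20004,
            'Asset download failed with HTTP ' || apex_web_service.g_status_code);
    END IF;

    RETURN l_blob;
END adobe_download_asset;
/
```

The result can then be saved wherever you keep your content, for example l_blob_content := adobe_download_asset(l_pdf_downld_url); followed by an UPDATE of your own media table.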
Extract PDF as JSON
My next step is to extract the content from the document asset. The Extract API extracts content from PDF documents and outputs it in a structured JSON format, along with tables and figures.
The process is similar to Create PDF: a job is submitted to perform the task, and its status then needs to be checked.
Create Extract Job
It has a few parameters but only the Access Token, Client ID and Asset ID are required.
DECLARE
l_client_id VARCHAR2(1000) := 'Your Client ID';
l_rest_token VARCHAR2(1000) := 'Your Access Token';
l_asset_id VARCHAR2(4000) := 'Your PDF Asset ID'; -- assetID of the PDF to extract
l_response_clob CLOB;
l_job_url VARCHAR2(1000) := 'https://pdf-services-ue1.adobe.io/operation/extractpdf';
l_body JSON_OBJECT_T := new JSON_OBJECT_T;
l_extr_job_url VARCHAR2(4000);
l_extr_job_id VARCHAR2(4000);
BEGIN
-- Adobe PDF Services
BEGIN
-- Submit the Extract PDF Job
l_body.put('assetID', l_asset_id);
--l_body.put('documentLanguage', l_lang_code);
-- Lang Code needs to be in ISO format
-- Set additional API parameters
apex_web_service.g_request_headers.delete();
apex_web_service.g_request_headers(1).name := 'x-api-key';
apex_web_service.g_request_headers(1).value := l_client_id;
apex_web_service.g_request_headers(2).name := 'Authorization';
apex_web_service.g_request_headers(2).value := 'bearer ' || l_rest_token;
apex_web_service.g_request_headers(3).name := 'content-type';
apex_web_service.g_request_headers(3).value := 'application/json';
l_response_clob := apex_web_service.make_rest_request(
p_url => l_job_url,
p_http_method => 'POST',
p_body => l_body.to_string
);
IF (apex_web_service.g_status_code = 201) THEN
-- Check Status
APEX_DEBUG.INFO('Status Code %s job is created', apex_web_service.g_status_code);
-- Get Headers for Location and Request ID
FOR i in 1.. apex_web_service.g_headers.count
LOOP
-- HTTP header names are case-insensitive, so compare in lower case
IF LOWER(apex_web_service.g_headers(i).name) = 'location'
THEN l_extr_job_url := apex_web_service.g_headers(i).value;
END IF;
IF LOWER(apex_web_service.g_headers(i).name) = 'x-request-id'
THEN l_extr_job_id := apex_web_service.g_headers(i).value;
END IF;
END LOOP;
END IF;
APEX_DEBUG.INFO('PDF Job URL is: %s ', l_extr_job_url);
APEX_DEBUG.INFO('PDF Job Request ID is: %s ', l_extr_job_id);
EXCEPTION
WHEN OTHERS THEN
BEGIN
APEX_DEBUG.INFO('Exception while submitting Extract PDF Job: %s ', apex_web_service.g_status_code);
APEX_DEBUG.INFO(CHR(10) || SQLCODE);
APEX_DEBUG.INFO(CHR(10) || SUBSTR(SQLERRM, 1, 64));
-- Get Headers for Location and Request ID
FOR i in 1.. apex_web_service.g_headers.count
LOOP
IF LOWER(apex_web_service.g_headers(i).name) = 'location'
THEN l_extr_job_url := apex_web_service.g_headers(i).value;
END IF;
IF LOWER(apex_web_service.g_headers(i).name) = 'x-request-id'
THEN l_extr_job_id := apex_web_service.g_headers(i).value;
END IF;
END LOOP;
APEX_DEBUG.INFO('PDF Job URL is: %s ', l_extr_job_url);
APEX_DEBUG.INFO('PDF Job Request ID is: %s ', l_extr_job_id);
END;
END;
END;
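Besides assetID, the Extract job accepts optional parameters that control what comes back. This includes the styling metadata mentioned in the conclusion. The function below builds the request body with JSON_OBJECT_T and JSON_ARRAY_T. It is a sketch under assumptions. The function name adobe_extract_body is hypothetical. The optional field names elementsToExtract (values 'text' and 'tables') and includeStyling are my reading of the Extract PDF API reference and should be verified against the current documentation before use.

```sql
-- Hypothetical helper: build an Extract PDF request body.
-- ASSUMPTION: elementsToExtract and includeStyling are the optional field names;
-- check them against the Extract PDF API reference.
CREATE OR REPLACE FUNCTION adobe_extract_body (
    p_asset_id       IN VARCHAR2,
    p_include_tables IN BOOLEAN DEFAULT FALSE,
    p_include_style  IN BOOLEAN DEFAULT FALSE
) RETURN CLOB
IS
    l_body     JSON_OBJECT_T := JSON_OBJECT_T();
    l_elements JSON_ARRAY_T  := JSON_ARRAY_T();
BEGIN
    l_body.put('assetID', p_asset_id);

    -- Always extract text; optionally add table structures as well
    l_elements.append('text');
    IF p_include_tables THEN
        l_elements.append('tables');
    END IF;
    l_body.put('elementsToExtract', l_elements);

    -- Only send includeStyling when it is wanted
    IF p_include_style THEN
        l_body.put('includeStyling', TRUE);
    END IF;

    RETURN l_body.to_clob();
END adobe_extract_body;
/
```

In the Extract job block, the body would then be p_body => adobe_extract_body(l_asset_id, p_include_tables => TRUE).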
Checking the Extract PDF Job Status
We need to loop/poll the job until it is done.
The API needs the Access Token, Client ID and Job Request ID.
declare
l_client_id VARCHAR2(1000) := 'Your Client ID';
l_rest_token VARCHAR2(1000) := 'Your Access Token';
l_extr_job_url VARCHAR2(4000) := 'Your Job URL'; -- location header from the Extract PDF Job step
l_response_clob CLOB;
l_extr_job_status VARCHAR2(100);
l_extr_downld_url VARCHAR2(4000);
begin
BEGIN
-- Adobe PDF Services
-- Check the Extract PDF Job Status
-- Set additional API parameters
apex_web_service.g_request_headers.delete();
apex_web_service.g_request_headers(1).name := 'x-api-key';
apex_web_service.g_request_headers(1).value := l_client_id;
apex_web_service.g_request_headers(2).name := 'Authorization';
apex_web_service.g_request_headers(2).value := 'bearer ' || l_rest_token;
l_response_clob := apex_web_service.make_rest_request(
p_url => l_extr_job_url,
p_http_method => 'GET'
);
l_extr_job_status := JSON_VALUE(l_response_clob,'$.status');
l_extr_downld_url := JSON_VALUE(l_response_clob,'$.content.downloadUri');
APEX_DEBUG.INFO('Current Job Status is: %s ', l_extr_job_status);
APEX_DEBUG.INFO('Download URL is: %s ', l_extr_downld_url);
EXCEPTION
WHEN OTHERS THEN
APEX_DEBUG.INFO('Exception: %s ', apex_web_service.g_status_code);
END;
END;
The Job Status has three states:
inprogress
done
failed
Evaluate the status to decide whether to continue polling, download the extracted contents as JSON, retry the job, or raise an exception to be handled manually.
Download the Extracted JSON
The last step for this article is downloading the Extracted JSON. We’ve seen how to download from the PDF services already and this is exactly the same.
Download an Asset with a Pre-Signed Download URI
DECLARE
l_extr_downld_url VARCHAR2(4000) := 'Your Download URI'; -- content.downloadUri from the job status
bl_text BLOB;
l_content_length number;
l_content_type varchar2(4000);
BEGIN
-- Set additional API parameters
apex_web_service.g_request_headers.delete();
-- Download the extracted JSON from Adobe PDF Services
bl_text := apex_web_service.make_rest_request_b(
p_url => l_extr_downld_url,
p_http_method => 'GET');
IF apex_web_service.g_status_code = 200 THEN
BEGIN
APEX_DEBUG.INFO('Text Downloaded');
FOR i IN 1..apex_web_service.g_headers.count
LOOP
APEX_DEBUG.INFO(apex_web_service.g_headers(i).name || ': ' || apex_web_service.g_headers(i).value ||CHR(10));
IF LOWER(apex_web_service.g_headers(i).name) = 'content-length'
THEN
l_content_length := apex_web_service.g_headers(i).value;
END IF;
IF LOWER(apex_web_service.g_headers(i).name) = 'content-type'
THEN
l_content_type := apex_web_service.g_headers(i).value;
END IF;
END LOOP;
-- Save the extracted JSON somewhere
END;
END IF;
END;
The downloaded file is structuredData.json, containing the extracted content and the PDF element structure. See the JSON schema for a description of the default output, and review the How-To Extract PDF for more details.
The Oracle Database provides many options to store JSON documents and to parse and extract their data attributes.
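The download arrives as a BLOB. A natural first step is to convert it to a CLOB and confirm it is well-formed JSON before storing or parsing it. This is a minimal sketch. The function name blob_to_json_clob is hypothetical, and it assumes the file is UTF-8 encoded.

```sql
-- Hypothetical helper: convert the downloaded structuredData.json BLOB into a CLOB
-- and confirm it is valid JSON. ASSUMPTION: the content is UTF-8 encoded.
CREATE OR REPLACE FUNCTION blob_to_json_clob (
    p_blob IN BLOB
) RETURN CLOB
IS
    l_clob         CLOB;
    l_dest_offset  INTEGER := 1;
    l_src_offset   INTEGER := 1;
    l_lang_context INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
    l_warning      INTEGER;
    l_csid         NUMBER;
    l_is_json      NUMBER;
BEGIN
    SELECT NLS_CHARSET_ID('AL32UTF8') INTO l_csid FROM dual;

    DBMS_LOB.CREATETEMPORARY(l_clob, TRUE);
    DBMS_LOB.CONVERTTOCLOB(
        dest_lob     => l_clob,
        src_blob     => p_blob,
        amount       => DBMS_LOB.LOBMAXSIZE,
        dest_offset  => l_dest_offset,
        src_offset   => l_src_offset,
        blob_csid    => l_csid,
        lang_context => l_lang_context,
        warning      => l_warning
    );

    -- Reject anything that is not well-formed JSON
    SELECT COUNT(*) INTO l_is_json FROM dual WHERE l_clob IS JSON;
    IF l_is_json = 0 THEN
        raise_application_error(-20005, 'Downloaded content is not valid JSON');
    END IF;

    RETURN l_clob;
END blob_to_json_clob;
/
```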
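As one example of those options, JSON_TABLE can flatten the extract into rows. The text can then be concatenated for the summary, tag and embedding steps of the use case. This is a sketch under assumptions. The function name extracted_text is hypothetical. It assumes the structuredData.json layout has an elements array whose text items carry a Text attribute, so check the field names against the Extract JSON schema. It also assumes no single element exceeds 4000 characters, because longer values would come back as NULL.

```sql
-- Hypothetical helper: concatenate the text elements of an Extract API result,
-- in document order, one element per line.
-- ASSUMPTION: items live in $.elements[*] and text is in the "Text" attribute.
CREATE OR REPLACE FUNCTION extracted_text (
    p_json IN CLOB
) RETURN CLOB
IS
    l_text CLOB;
BEGIN
    FOR r IN (
        SELECT jt.elem_text
          FROM JSON_TABLE(p_json, '$.elements[*]'
                 COLUMNS (
                     seq       FOR ORDINALITY,              -- preserves element order
                     elem_text VARCHAR2(4000) PATH '$.Text'
                 )) jt
         WHERE jt.elem_text IS NOT NULL                      -- skip figures and other non-text items
         ORDER BY jt.seq
    ) LOOP
        l_text := l_text || r.elem_text || CHR(10);
    END LOOP;
    RETURN l_text;
END extracted_text;
/
```

Adding columns such as Page or Path to the COLUMNS clause would let you keep page and structure context alongside the text.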
Conclusion
The Adobe PDF Services offer a complete and easy-to-use set of APIs to work with PDF documents, with strong AI capabilities to extract, insert and manipulate document contents.
For my use case, the generated PDF matched the PowerPoint test document, and the extract had more than enough detail about each page, including styling metadata.
All in all, I had a positive experience using these APIs, though I am not a big fan of passing credentials in the request body and hope this is not susceptible to attacks.
I hope you find this information useful and interesting.
Written by

Sydney Nurse
I work with software but it does not define me and my constant is change and I live a life of evolution. Learning, adapting, forgetting, re-learning, repeating I am not a Developer, I simply use software tools to solve interesting challenges and implement different use cases.