Whenever I’m researching something to solve a specific need, I usually explore Google and GitHub, downloading apps shared by other devs to learn new techniques. While looking for ways to implement drag and drop in an interactive grid, I stumbled upon an audio-to-text transcription test by Paulo Kunzel [at this link]. Since I already had the app installed, I decided to give it a try.

Here’s my take: this approach is fantastic. It’s incredibly fast and relies only on an API built into the browser, so there’s no need to sign up for an external service, call REST endpoints yourself, or use AI APIs to process the transcription. I asked ChatGPT to explain how it works, and here’s the answer:

“Speech recognition in the browser using the SpeechRecognition API works natively, without needing to manually send the audio to a server. However, the audio is processed in the cloud — usually by Google’s servers — automatically and transparently. The browser captures sound from the microphone, sends it to the speech recognition servers, and returns the converted text via JavaScript events, all without requiring manual API calls or external libraries.”
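Since the API is provided by the browser itself, availability depends on which browser the user has. Some expose the constructor under a webkit prefix and others may not ship it at all, so a quick check before using it can avoid a runtime error. A minimal sketch, assuming a helper name (getSpeechRecognition) that I made up for illustration:

```javascript
// Hypothetical helper (not part of the Web Speech API or APEX): returns the
// SpeechRecognition constructor, or null when the environment has none.
// `root` defaults to the global object (window in a browser).
function getSpeechRecognition(root = globalThis) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

const Recognition = getSpeechRecognition();
if (Recognition === null) {
  console.warn('Speech recognition is not available in this browser.');
}
```

When the helper returns null, the page could hide the dictation button instead of failing on load.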

After exploring the app a bit, I made a few small tweaks to handle longer audio and decided to share this simple technique here.

Create a button

Give it any label and a Static ID. In this case, I’ll label the button DITAR and set its Static ID to st_ditar. Set the button’s action to “Defined by Dynamic Action” so that pressing it does not submit the page.

Create a text field

This will display the transcribed text in real time. In this example, it’s P31_TEXTO_RAPIDO.

Add the following code

Insert it in the “Execute when Page Loads” section of your APEX page, adjusting the IDs to match your setup:

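// Chrome and Safari expose the API under the webkit prefix.
const SpeechRecognitionCtor = window.SpeechRecognition || window.webkitSpeechRecognition;
const SpeechGrammarListCtor = window.SpeechGrammarList || window.webkitSpeechGrammarList;

let recognition = new SpeechRecognitionCtor();

// The grammar list is optional; set it only where the browser provides it.
if (SpeechGrammarListCtor) { recognition.grammars = new SpeechGrammarListCtor(); }
recognition.continuous = true;      // keep listening across pauses
recognition.lang = 'pt-BR';
recognition.interimResults = true;  // deliver partial results while speaking
recognition.maxAlternatives = 1;

/* recognition.addEventListener('speechend', () => { recognition.stop(); }); */

// Hold the button to dictate, release it to stop.
document.querySelector('#st_ditar').addEventListener('mousedown', () => { recognition.start(); });
document.querySelector('#st_ditar').addEventListener('mouseup', () => { recognition.stop(); });

recognition.addEventListener('error', (event) => { console.error('Speech recognition error: ', event.error); });

recognition.onresult = function (event) {
  console.log('-----------------------------');

  // Show the most recent result (interim or final) in the page item.
  let last = event.results.length - 1;
  let texto = event.results[last][0].transcript;
  apex.item('P31_TEXTO_RAPIDO').setValue(texto);
};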

Quick Explanation

This script enables speech recognition in the browser using the SpeechRecognition API. When the user presses and holds the button with id="st_ditar", recognition starts (mousedown), and when they release (mouseup), it stops. The audio is sent to the browser’s speech recognition service (usually Google or Microsoft), which returns the text. That text is then assigned to the P31_TEXTO_RAPIDO item in Oracle APEX.
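One thing to keep in mind with continuous set to true: event.results typically holds every phrase recognized since start() was called, so reading only the last entry replaces earlier phrases in the item. If you would rather keep the whole dictation, a sketch like the one below could join them. The function name joinTranscript is my own invention for illustration, and the input is treated as a plain array-like list of results.

```javascript
// Hypothetical helper: concatenates the top alternative of every result
// collected so far. `results` is array-like (length plus numeric indexes),
// which matches the shape of event.results.
function joinTranscript(results) {
  const parts = [];
  for (let i = 0; i < results.length; i++) {
    parts.push(results[i][0].transcript.trim());
  }
  // Drop empty fragments, then separate phrases with a single space.
  return parts.filter((part) => part !== '').join(' ');
}

// Possible use inside the page (browser only):
// recognition.onresult = (event) => {
//   apex.item('P31_TEXTO_RAPIDO').setValue(joinTranscript(event.results));
// };
```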

The settings specify:

Language: Brazilian Portuguese (pt-BR). Change lang to recognize other languages.
continuous: true so recognition keeps listening across pauses instead of stopping after the first phrase
interimResults: true for real-time feedback
maxAlternatives: 1 to return just the best match

Errors are logged to the console using console.error.
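If you want to show the user something friendlier than a console entry, the error event carries a short code in event.error. Here is a rough sketch that maps a few of the codes defined by the Web Speech API to messages. The message wording and the helper name describeSpeechError are assumptions of mine, so adapt them to your app.

```javascript
// A few of the error codes the Web Speech API can report. The message text
// for each one is an assumption, not something the API provides.
const ERROR_MESSAGES = new Map([
  ['no-speech', 'No speech was detected. Try again.'],
  ['audio-capture', 'No microphone was found.'],
  ['not-allowed', 'Microphone access was blocked.'],
  ['network', 'The recognition service could not be reached.'],
  ['language-not-supported', 'The selected language is not supported.']
]);

// Hypothetical helper: turns an error code into a user-facing message,
// falling back to a generic message for codes not listed above.
function describeSpeechError(code) {
  return ERROR_MESSAGES.get(code) || 'Speech recognition failed: ' + code;
}

// Possible use inside the page (browser only):
// recognition.addEventListener('error', (event) => {
//   apex.message.alert(describeSpeechError(event.error));
// });
```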

About the Two Key Parameters:

interimResults

true: Returns partial results as you speak, ideal for real-time transcription.
false: Only returns the final result after a pause, with fewer event triggers but no live feedback.
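With interimResults enabled, each result carries an isFinal flag that tells you whether the text is settled or may still change. The sketch below separates the two, which could be handy if you only want to save confirmed text while still showing a live preview. The name splitByFinality is hypothetical.

```javascript
// Hypothetical helper: separates confirmed text from text that the
// recognizer may still revise. `results` is array-like, and each entry
// has an `isFinal` flag plus its alternatives at numeric indexes.
function splitByFinality(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const transcript = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText, interimText };
}
```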

maxAlternatives

1: Returns only the most confident interpretation.
>1 (e.g., 3): Returns multiple options ranked by confidence — useful if you want to let the user pick or apply fuzzy logic.
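When maxAlternatives is greater than 1, each result can hold several alternatives, each with a transcript and a confidence value. A small sketch of how they could be listed best-first, for example to offer the user a choice. The helper name rankAlternatives is made up for this example.

```javascript
// Hypothetical helper: lists the alternatives of one result, highest
// confidence first. `result` is array-like; each entry is expected to
// have `transcript` and `confidence` properties.
function rankAlternatives(result) {
  return Array.from(result, (alt) => ({
    transcript: alt.transcript,
    confidence: alt.confidence
  })).sort((a, b) => b.confidence - a.confidence);
}

// Possible use inside the page (browser only):
// recognition.onresult = (event) => {
//   const last = event.results[event.results.length - 1];
//   console.log(rankAlternatives(last));
// };
```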

My Example Screen Looked Like This:

You can replicate this in your app in under 5 minutes. Try it out and share your feedback in the comments!

Oracle APEX: Transcribing Audio to Text Without REST, Directly in the Browser in only 5 Minutes


Valter Zanchetti Filho
