How to Make Personalized Study Guide Audiobooks Using LLMs and Text-to-Speech

Background
I consider myself a constant learner. If a few consecutive days pass without my having learned something new or improved an existing skill, alarm bells start ringing in my head.
As such, I try to make learning as accessible as possible, at all times.
I often find myself driving for extended periods of time: a perfect opportunity to plug my Android phone into my car, open Audible or a similar app, and play an audiobook.
However, despite the plethora of available titles, sometimes you just can’t find a suitable book for the current topic or niche you are interested in. Concrete example: I was studying for the GCP Data Engineer exam, and I wanted an audio guide to accompany the written and visual study material I was using at the time. Since this wasn’t available, I made my own.
Sourcing the data
Nowadays, I rely heavily on LLMs such as GPT o3 and Claude Opus 4 to generate my study guides. I have gone through quite a bit of trial and error with the prompts, but I believe I have narrowed the ideal prompt down to something like:
I plan to take the Professional Data Engineer certification from Google: https://cloud.google.com/learn/certification/data-engineer/.
I already have the professional cloud architect and developer certifications and I also have AWS certifications.
I've used both GCP and AWS in practice, so i have a fairly good understanding of Cloud, but haven't gone in depth on the Data products yet.
Give me a detailed study guide, for the latest exam content (2025) (including all relevant objectives/topics), split by high-level objective that is tested.
I want to be able to just read what you provide and ace the exam.
Give me concrete study guide style content, not things like "you will be be tested on X and Y so study for them".
THIS will be the study. Make it as detailed as you need to
Don't skimp on details, I want what you provide to be as good as what I'd find in full blown courses online.
Take your time and deliver.
If you need me to answer any clarifying questions, let me know!
Obviously, mileage may vary depending on the model used, temperature settings, exact prompt, etc. However, I am consistently happy with the level of detail I get in the responses when using a prompt similar to the above.
The above prompt ended up generating about 45 pages of very useful, distilled information for the exam I was studying for.
Modest Beginnings
Google Play Books - Read Aloud
As with most studying I do nowadays, I started with the latest reasoning model available from one of the many LLM providers. For this particular case, I went with ChatGPT’s o3 model, with extended + web search, to generate a PDF study guide for the exam. It produced around 45 pages of distilled content, filled to the brim with useful information.
Then I tried the fastest approach available to me, based on the tech stack I had (macOS, Android phone, Android Auto-compatible car): upload the PDF to Google Play Books → tap Read Aloud. However, the Read Aloud function in Google Play Books was unavailable: Play Books will only speak files it can open in Flowing text mode, and user‑uploaded PDFs always open in the fixed‑layout Original pages view.
PDF to MP3 to Android Auto/VLC
I then turned my attention to converting the PDF to an audio file that I could simply copy to my Android phone and play with VLC in my car.
Install Poppler, which allows us to convert PDFs to text:
brew install poppler
Convert the PDF to text, copy to clipboard, dump to txt file:
pdftotext "Guide.pdf" - | pbcopy
pbpaste > /tmp/guide.txt
Convert the text file to speech using macOS’s built-in speech capabilities, then compress the resulting AIFF to AAC:
say -v "Samantha" \
  -f /tmp/guide.txt \
  -o ~/Music/Guide.aiff
afconvert ~/Music/Guide.aiff \
  -f mp4f -d aac -b 64000 \
  -o ~/Music/Guide.m4a
Using MacDroid, copy the generated m4a file to the storage/emulated/0/Music path.
That did the trick, but there were a couple of glaring issues:
The voice and inflexions were very hard to follow as an audiobook, especially when driving. I needed something a lot more human-sounding.
The actual content of the PDF/txt wasn’t written for an audiobook. I needed to regenerate it and clean it up in a way that is audiobook-friendly.
Aiming for Better Quality
Generating Audiobook Suitable Text
I added a few details to the prompt to o3:
VERY IMPORTANT:
I will convert the text you provide to me to an audio file that i want to play as an audiobook.
Make sure that the verbiage, phrasing, etc. that you output are suitable for playing in an audiobook.
I am not sure exactly what the differences would be between optimizing for reading vs optimizing for listening, but do some research before hand to make sure that you provide me the correct style.
And we were off to the races:
Excellent—thanks for confirming!
I will now create a full audiobook-optimized, deeply detailed study guide for the 2025 Google Cloud Professional Data Engineer exam.
It will follow a single continuous narration style, include all tested knowledge, and be structured around the official exam guide's five sections.
I’ll emphasize core services like BigQuery, Dataflow, and Pub/Sub with proportionate depth, and I’ll verbally call out where architectural diagrams should be reviewed.
This guide will require no additional reading and will fully prepare you to pass.
I’ll begin compiling everything and let you know once the full study guide is ready.
Partly because ChatGPT showed me this concerning message: “Your remaining queries are powered by a lighter version of deep research. Your full access resets on July 26. Upgrade to ChatGPT Pro to continue using deep research.”, and partly just to compare, I also gave the same information and prompt to Claude Opus 4, which was more than happy to oblige and start researching:
I understand exactly what you need! You want a comprehensive, audiobook-optimized study guide for the Google Cloud Professional Data Engineer certification that covers everything you need to ace the exam without consulting any other resources.
Let me dive into extensive research to create this comprehensive guide for you, ensuring it covers:
All 2025 exam objectives from the guide you uploaded
In-depth coverage of each GCP data service weighted by exam importance
Content optimized for listening (audiobook format)
50+ pages of detailed, practical content
I'll research the latest exam updates, practice questions, and real-world scenarios to make this the only resource you'll need!
GPT o3 ended up generating about 20 pages, with references. Time to convert them to a pleasing and soothing voice!
Claude fell over and timed out multiple times, so I decided to abandon it for now.
Converting to audio
First, we need to set up our CLI:
gcloud auth application-default login && gcloud auth application-default set-quota-project <your_project_id>
Then we make sure we have all the required software installed (I’m a Mac user, and am leveraging brew; jq is included because the script below uses it):
brew install poppler ffmpeg coreutils jq
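A quick preflight check can catch a missing tool before a long run fails halfway through. This is only a sketch: missing_tools is a made-up helper name, and the tool list in the usage comment is simply the set of commands that appear in this post.

```shell
# missing_tools NAME...
# Prints every command name that cannot be found on PATH, one per line.
# (Hypothetical helper, not part of any of the tools used in this post.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example usage (assumed tool list, taken from the commands in this post):
#   missing=$(missing_tools pdftotext ffmpeg jq curl gcloud)
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```

An empty result means everything was found, so the later steps can proceed.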
We convert the PDF to text:
pdftotext -layout "Google Cloud Professional Data Engineer – Comprehensive Study Guide (2025).pdf" guide.txt
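pdftotext separates pages with a form-feed character, and exported PDFs often carry bare page-number lines that sound odd when read aloud. A minimal cleanup sketch follows. It assumes that any line containing only digits is a page number, which is a heuristic and may drop a legitimate numeric line, and clean_for_tts is a hypothetical helper name:

```shell
# clean_for_tts IN OUT
# Hypothetical helper: light cleanup of pdftotext output before text-to-speech.
clean_for_tts() {
  local in="$1" out="$2"
  # 1. drop form-feed page separators
  # 2. drop lines that hold only digits (assumed to be page numbers)
  # 3. strip trailing whitespace left over from -layout
  # 4. squeeze runs of blank lines into a single blank line
  tr -d '\f' < "$in" \
    | sed -E -e '/^[[:space:]]*[0-9]+[[:space:]]*$/d' -e 's/[[:space:]]+$//' \
    | cat -s > "$out"
}

# Example usage (file names are placeholders):
#   clean_for_tts guide.txt guide_clean.txt
```

If the guide contains numeric-only lines that matter, the second step can simply be removed.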
Google’s synchronous endpoint accepts up to 5000 bytes of input per call and can return MP3.
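Given that limit, the number of requests a file needs is the ceiling of its size divided by the chunk size. Here is a small sketch of that arithmetic, where requests_needed is an illustrative helper name and 4900 is an assumed chunk size just under the limit:

```shell
# requests_needed BYTES CHUNK
# Prints ceil(BYTES / CHUNK) using integer arithmetic only.
# (Illustrative helper; the name is not from any tool used in this post.)
requests_needed() {
  local bytes="$1" chunk="$2"
  echo $(( (bytes + chunk - 1) / chunk ))
}

# Example usage (guide.txt is the file produced by pdftotext):
#   requests_needed "$(wc -c < guide.txt)" 4900
```

Knowing the count up front gives a rough idea of how long the run will take and how many characters will be billed.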
Split → loop → base64‑decode → stitch:
#!/usr/bin/env bash
set -euo pipefail                 # exit on any error, unset variable, or failed pipe stage

# ── USER CONFIG ────────────────────────────────────────────────────────────────
PROJECT_ID="<YOUR_PROJECT_ID>"
VOICE="en-US-Wavenet-D"
CHUNK_SIZE=4900                   # bytes (< 5000-byte API limit)
FILE="guide.txt"                  # text extracted with pdftotext
OUT_MP3="guide_full.mp3"
TMP_DIR="tts_tmp_$(date +%s)"     # scratch work-dir
# ───────────────────────────────────────────────────────────────────────────────

log() { printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"; }

mkdir -p "$TMP_DIR"
cd "$TMP_DIR"

# 1) Split the text file -------------------------------------------------------
# Note: -b cuts at exact byte offsets, so a chunk can end mid-word.
log "Splitting ${FILE} into ≤${CHUNK_SIZE}-byte chunks"
if command -v gsplit >/dev/null 2>&1; then
  gsplit -b "$CHUNK_SIZE" -a 3 "../$FILE" chunk_
else
  split -b "$CHUNK_SIZE" -a 3 "../$FILE" chunk_
fi
TOTAL=$(printf "%s\n" chunk_* | wc -l | tr -d " ")
log "Created $TOTAL chunk files"

# 2) Loop over chunks and call WaveNet ----------------------------------------
idx=0
for f in chunk_*; do
  # Zero-pad the index so part_*.mp3 sorts in playback order (part_0010 after part_0009).
  part=$(printf 'part_%04d.mp3' "$idx")

  log "Chunk $((idx+1))/$TOTAL → building JSON"
  jq -n --rawfile t "$f" --arg voice "$VOICE" \
    '{input: {text: $t},
      voice: {languageCode: "en-US", name: $voice},
      audioConfig: {audioEncoding: "MP3"}}' > req.json

  log "Posting to Google TTS"
  # jq -e exits non-zero when audioContent is missing (e.g. an API error response),
  # which stops the script instead of writing a broken MP3 part.
  curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "x-goog-user-project: ${PROJECT_ID}" \
    -H "Content-Type: application/json" \
    -d @req.json \
    "https://texttospeech.googleapis.com/v1/text:synthesize" \
    | jq -er .audioContent | base64 --decode > "$part"

  log "Saved $part ($(du -h "$part" | cut -f1))"
  idx=$((idx+1))
done

# 3) Merge MP3 parts without re-encoding --------------------------------------
log "Concatenating $idx pieces into ${OUT_MP3}"
printf "file '%s'\n" part_*.mp3 > list.txt   # one concat entry per part, in sorted glob order
ffmpeg -loglevel error -f concat -safe 0 -i list.txt -c copy "../${OUT_MP3}"
log "Final MP3 size: $(du -h "../${OUT_MP3}" | cut -f1)"

# 4) Clean up ------------------------------------------------------------------
cd ..
rm -r "$TMP_DIR"
log "✅ Completed – ${OUT_MP3} ready to copy to phone."
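The script above cuts chunks at fixed byte offsets with split -b, which can end a chunk in the middle of a word or sentence and produce an audible hiccup at the seam. One possible alternative is to pack whole lines into each chunk. The sketch below assumes that no single line exceeds the limit, which is likely for pdftotext output but not guaranteed. chunk_by_lines is a hypothetical helper, not something the original script uses:

```shell
# chunk_by_lines FILE LIMIT PREFIX
# Writes PREFIX000, PREFIX001, ... Each output file contains only whole lines
# and stays at or under LIMIT bytes, unless a single line is itself longer
# than LIMIT (that line then gets a file of its own and exceeds the limit).
chunk_by_lines() {
  local file="$1" limit="$2" prefix="$3"
  # LC_ALL=C makes awk's length() count bytes rather than characters.
  LC_ALL=C awk -v limit="$limit" -v prefix="$prefix" '
    {
      len = length($0) + 1                      # +1 for the newline
      if (size > 0 && size + len > limit) {     # current chunk is full
        close(f); n++; size = 0
      }
      f = sprintf("%s%03d", prefix, n)
      print > f
      size += len
    }
  ' "$file"
}
```

To try it, the split/gsplit call could be replaced with chunk_by_lines "../$FILE" "$CHUNK_SIZE" chunk_ and the rest of the loop would stay the same, since the output files still match chunk_*.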
It spit out a 35 MiB .mp3 file, whose voice quality was quite a bit better than that of the first attempt.
I copied it to my phone and got it to play in Android Auto, using VLC player.
Written by Radu Pana
