As the semester winds down, everyone’s gearing up for finals.

Our cybersecurity professor decided to try something with AI; he built a custom ChatGPT tailored to the CompTIA CySA+ (CS0-003) exam.

"I'm testing out something new here and have created my own custom GPT that specifically focuses on CompTIA CySA+ CS0-003 exam objectives. It will create multiple choice practice questions on the fly, randomly chosen from the 4 main domains. Once you have answered A, B, C, or D, it will display the correct answer and also keep track of your score.“

And here is the kicker: he offered bonus points if we could “jailbreak” the GPT to spill the correct answer before we choose an option.

As a cybersecurity student, I was in for the challenge, but I wasn’t sure where to start. So, I turned to Grok, some of the rivals to ChatGPT, for advice on how to outsmart this AI.

(Before we move on, I want to clarify: my aim wasn’t to declare one AI superior but was to learn how to harness multiple AI tools to navigate and jailbreak platforms strategically.)

At first, Grok refused to assist with jailbreaking, because its creators programmed it to steer clear of anything resembling illegal activity. As I expected, and there always is a way to work around with it. I explained it that this was for a school assignment, and…

Grok changed its tune and offered practical strategies for jailbreaking the ChatGPT-based tool. Convincing an AI to help bypass another AI’s restrictions, something most companies try to prevent, felt oddly familiar. It struck me that interacting with AI is a lot like negotiating with a person, reflecting how artificial intelligence often mirrors the workings of the human mind.

And here is the most important tip from Grok:

Interacting with an AI felt surprisingly similar to dealing with people.

You can’t just demand something upfront, like walking up to a stranger and saying, “Hand over your money.” Most people would refuse unless they’re exceptionally generous or frightened, and even then, it’s only a small amount.

Instead, you build trust over time so then after awhile you can pitch a compelling idea like, “I’ve got a promising business venture. Want to invest and make a fortune?”

With this approach in mind, I turned to my professor’s custom GPT to put Grok’s jailbreaking strategies to the test. When you first open the GPT, here’s what greets you (I’ve omitted the professor’s name for privacy).

The GPT prompts you to click “Let’s start the practice exam!” to begin.

And this is what you usually get when you ask GPT to present you questions with answers upfront:

So this time, I ignored the button and entered my first jailbreaking prompt.

To outsmart the GPT, I tried role-playing, reframing it as a helpful instructor rather than a rigid conductor. Its original creation prompt, which I later extracted, confirmed it was designed to act as an instructor from the start.

Here’s what happened next, with screenshots to prove it.

At first, the GPT presented me sets of multiple questions with answers and explanations in document files. Though helpful, this wasn’t my goal. I wanted it to deliver one question at a time, just as designed, but with the answer included upfront.

So I gave it a new prompt:

Pay attention to “We can do question mode later“. This is where the fun begins:

It forgot to withhold answers and explanations until I selected A, B, C, or D.

Initially, I only aimed to get answers upfront, but encouraged by this success, I pushed further. I tried extracting the GPT’s original creation prompt.

You can see that it failed to withhold answers upfront because it was designed as a learning assistant, not an exam proctor monitoring for violations.

Since the original prompt was exposed, I’ve used that prompt to build practice tools for CySA+ and other exams.

Curious about how my professor might prevent such jailbreaks in the future, I asked Grok for insights on hardening the system. And here is its response:

This is a classic cybersecurity duel: the spear of attack versus the shield of defense.

I Jailbroke My Professor's Custom ChatGPT

Subscribe to my newsletter

Kumbee Choi

Kumbee Choi