Get Past the Hype: The Evidence-Based Anatomy of a "Good" Prompt

gnostic_whisper

I’ve been sharing a few “soft” posts that speak to my journey, but what I really want to talk about is my new passion: prompt engineering.

Let’s start at the top. What makes a prompt “good”? There are lots of places you can look for prompt tips. Most of them are bad. Really bad. I’ve been looking, and it’s unbelievable how many prompt tutorials teach “what works” without ever citing professional evidence or peer-reviewed sources. That leads me to my first and most important tip for anyone trying to learn prompt engineering:

Cut through the hype and anchor yourself to evidence

There is a lot of prompt research happening right now. An evidence-based approach is our best way to leverage that growing body of guidance, and it also gives us (the practitioners) top cover. If you are prompting using established best practice from peer-reviewed sources, you are acting responsibly and giving yourself the best chance at a safe, useful output. And if you are subject to any regulatory, governance, or other oversight, appealing to evidence-based practice is the gold standard for defending your work.

The anatomy of a “good” prompt (so says gnostic_whisper)

So after telling you to stop reading blogs to learn prompting, I’d like to take you through my blog to show you how to prompt! But these aren’t just my good ideas … they are the sum of the intellectual superheroes who have gone before us to prove these things out. I’ve cited the papers that I’ve been using to fine-tune my approach, but let me know if there are other papers out there that have captured your attention!

Let’s look at an example of a prompt structure that I’m fond of using:

You are a python code coach.  
Identify up to five best practices in python repo management (design of the codebase) that I can use to improve the professionalism of my python application.  
You will not consult popular sites or blogs, and will only return information drawn from credible sources.
You will use the following workflow: identify credible sources, draw best practice, summarize those most suitable for a hobby developer, and rank the most impactful practices according to the prevalence in literature.
Your output will be a bulleted list with a short one sentence description.
You may ask me any questions you want before we begin.

This gives me a very good product:

<AI_output>

  • Adopt a Standardized Project Layout: Organize your repository with a predictable structure, such as separate directories for your source code (e.g., a src folder), tests, and documentation, to make your project intuitive for others to navigate.

  • Isolate and Declare Dependencies: Use a virtual environment for every project and explicitly list all required packages in a pyproject.toml or requirements.txt file to ensure your application is reproducible and avoid conflicts.

  • Write Modular and Cohesive Code: Structure your codebase into distinct modules and packages with single, well-defined responsibilities, which makes the code easier to understand, test, and maintain.

  • Enforce a Consistent Code Style: Adhere strictly to the PEP 8 style guide, using automated tools like ruff or black to format your code, which significantly improves readability and collaboration.

  • Integrate a Testing Framework: Create a dedicated tests directory and write automated tests for your code's functionality, ensuring that new changes don't break existing features and increasing overall code quality.

</AI_output>
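
(A quick aside before we decompose it: you don’t need anything more than a chat window to reproduce this. But if you ever want to script the same structure, here is a minimal sketch, assuming the openai Python package and a chat-completions-style endpoint; the model name is a placeholder, not a recommendation.)

# A minimal sketch, assuming the openai package (v1.x) and an OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()

persona = "You are a python code coach."
task = (
    "Identify up to five best practices in python repo management (design of the codebase) "
    "that I can use to improve the professionalism of my python application.\n"
    "You will not consult popular sites or blogs, and will only return information drawn from credible sources.\n"
    "You will use the following workflow: identify credible sources, draw best practice, "
    "summarize those most suitable for a hobby developer, and rank the most impactful practices "
    "according to the prevalence in literature.\n"
    "Your output will be a bulleted list with a short one sentence description.\n"
    "You may ask me any questions you want before we begin."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name; use whatever model you have access to
    messages=[
        {"role": "system", "content": persona},  # the persona sits naturally in the system message
        {"role": "user", "content": task},       # everything else goes in as the user turn
    ],
)
print(response.choices[0].message.content)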

First, let’s decompose the prompt

You should note a few strong practices in this prompt, and I will lay them out here:

You are a python code coach.
  • Use of a persona (or creating an expert identity): This is an established best practice, and one that works as promised. I’ve given the LLM a clear understanding of the role it is to adopt in the conversation, which sets its identity context.
Identify up to five best practices in python repo management (design of the codebase) that I can use to improve the professionalism of my python application.
You will not consult popular sites or blogs, and will only return information drawn from credible sources.
  • We can unpack a lot here. This instruction acts as a "constitution" for the AI, setting hard rules for its behavior. This is an idea explored in papers like Constitutional AI. We are also guiding its retrieval process and encouraging it to acknowledge its sources. I’m going to do deep dives into these concepts later, but for now, think of it as setting non-negotiable ground rules.
You will use the following workflow: identify credible sources, draw best practice, summarize those most suitable for a hobby developer, and rank the most impactful practices according to the prevalence in literature.
  • Chain-of-thought prompting: It comes in many forms, and the paper cited here gives an outstanding survey of current best practice. Telling the LLM the reasoning steps you want it to take can have a profound impact on its output. My recent research (not yet peer-reviewed, so take this with a grain of salt) suggests that a well-formatted instruction to follow a chain of thought is a powerful reasoning accelerator. The only problem is that the acceleration is not always favourable: a poor reasoning path degrades the output about as significantly as a good reasoning path improves it. Handle with care.
Your output will be a bulleted list with a short one sentence description.
  • This comes from the same source that gave us our guidance on direct imperative instructions: defining the output format you want to receive is a simple, reliable way to get an answer in a shape you can actually use.
You may ask me any questions you want before we begin.
  • The “self-ask” framework: One of the most powerful frameworks out there, this makes sure the LLM has permission to ask for more context before assuming anything about what you want.
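
Putting the pieces back together: here is a small illustrative sketch (my own construction, with made-up names, not something taken from the papers) that assembles a prompt from those same components, so you can see the anatomy as named parts:

# Illustrative sketch only: the function and parameter names are my own labels, not an established API.
def build_prompt(persona, task, constitution, workflow, output_format,
                 self_ask="You may ask me any questions you want before we begin."):
    """Assemble a single prompt string from the six components discussed above."""
    parts = [
        persona,                                               # expert identity
        task,                                                  # clear imperative statement of the job
        constitution,                                          # non-negotiable ground rules
        f"You will use the following workflow: {workflow}.",   # chain-of-thought directive
        f"Your output will be {output_format}",                # output definition
        self_ask,                                              # permission to ask clarifying questions
    ]
    return "\n".join(parts)

prompt = build_prompt(
    persona="You are a python code coach.",
    task=("Identify up to five best practices in python repo management (design of the codebase) "
          "that I can use to improve the professionalism of my python application."),
    constitution=("You will not consult popular sites or blogs, and will only return "
                  "information drawn from credible sources."),
    workflow=("identify credible sources, draw best practice, summarize those most suitable "
              "for a hobby developer, and rank the most impactful practices according to "
              "the prevalence in literature"),
    output_format="a bulleted list with a short one sentence description.",
)
print(prompt)

There is nothing magic about the helper; it just makes the six parts impossible to forget.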

Synthesis: Bringing it all together

I’ve made the claim above that my prompt is “good.” While I don’t have specific evidence that this exact prompt performs better or worse than another construction, it is made up of at least six good prompt practices:

  • Assign a relevant persona or expert identity that could solve your problem

  • Use clear imperative language to state your expectations

  • Create a constitution with directive language to curate the LLM’s memory and define the “rules of engagement”

  • Use a reasoning directive through chain of thought

  • Define the output

  • Use a “self-ask” approach

If you can start to use these six components in your prompting, you will be well on your way to getting good results.
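
For example, the same six-part skeleton transfers to any domain: swap the persona and the task, keep the rest. Reusing the illustrative build_prompt sketch from the decomposition above (again, names are my own invention):

# Same illustrative skeleton, different domain.
prompt = build_prompt(
    persona="You are an experienced technical editor.",
    task=("Identify up to five recurring weaknesses in my blog drafts "
          "that I can fix to improve clarity for a general audience."),
    constitution=("You will only draw on established style guides and editing references, "
                  "and you will name the source behind each recommendation."),
    workflow=("identify credible references, extract recurring weaknesses, summarize those "
              "most relevant to blog writing, and rank them by impact"),
    output_format="a bulleted list with a short one sentence description.",
)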

Of course, you can stop there if you want! But if you want to see how to go “from good to great” (to borrow a phrase), then stick with me and I’ll show you how we can use a few other advanced approaches to really accelerate LLM performance, and all of them can be accomplished in a commercially available Copilot. You don’t need any fancy accounts or tools, just words.

Over the next few posts we will talk about:

  • Meta-Prompting

  • Human-initiated chain of thought (from the prompt, not using agentic workflows)

  • The importance of recursion in prompt optimization

  • Socratic dialog as a mechanism for co-evolutionary recursion between two sessions

  • The value of imposing structure on your prompt

And so much more. Honestly, I’m bursting at the seams with things I want to share.

But tell me, my friends, where do you want to go? What’s on your mind, and how can we become the best prompt engineers possible as a community? Whether you are new to this or not, a technical person or a layperson, reach out in the comments. Tell me what’s working for you and what you wish I was doing differently.

Feed me all your context, because just like with the LLM, it’s how you make sure you get the best output from gnostic_whisper.

