#2 - Learn Prompt Engineering from an SDET

Introduction

After 12 years of writing test scripts in JavaScript, Java and Python, I thought I’d seen it all—until I joined a GenAI cohort. Turns out, crafting prompts for Large Language Models (LLMs) isn’t so different from writing test cases. Both demand precision, structure, and an obsession with avoiding GIGO (Garbage In, Garbage Out). Here’s how my QA mindset transformed my prompting skills.


Why GIGO Hits Home for QA

In software testing, a vague requirement (“Test the login flow”) guarantees bugs. The same applies to AI: a lazy prompt like “Write about cloud computing” invites chaos. My QA instincts kicked in—specificity is king.

Example:

Bad prompt example: “Explain APIs.”

Good prompt example: “Explain REST APIs in 3 bullet points for a junior front-end developer with 2 years of experience using JavaScript.”

Interestingly, this mirrors writing test steps like:

# Bad
def test_login():
    """Steps: 1. Call API to get token. 2. Call API with token. 3. Check answer."""

# Good
def test_oauth_login_with_invalid_token():
    """Steps: 1. Generate an invalid JWT token using X function. 2. Send the token in a POST request to /login. 3. Assert a 401 response."""

Key Takeaway:
Treat prompts like test cases. The more ambiguity you leave in, the more technical debt you take on.
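To make the takeaway concrete, here is a minimal sketch of a prompt "fixture" that forces me to fill in the same details the good prompt above spells out. The `PromptSpec` class and its field names are my own illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical container for the details a specific prompt should pin down."""

    topic: str
    output_format: str
    audience: str

    def render(self) -> str:
        # Reject blank fields: the prompt equivalent of a vague requirement.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"Prompt field '{name}' must not be empty")
        return f"Explain {self.topic} in {self.output_format} for {self.audience}."


good = PromptSpec(
    topic="REST APIs",
    output_format="3 bullet points",
    audience="a junior front-end developer with 2 years of experience using JavaScript",
)
print(good.render())
```

Leaving any field blank raises a `ValueError`, much like a test case that refuses to run without its preconditions.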


Model-Specific Prompts: The “Syntax” of AI

Just as Python and JavaScript have different syntax, LLMs like ChatGPT, LLaMA, and Alpaca each expect their own prompt structure. Here are some examples:

1. Alpaca (Stanford) prompt

Think of this as structured JSON for AI, perfect for QA engineers who love organised inputs:

### Instruction:
Validate this JSON payload against a schema:
### Input:
{ "user_id": 123, "name": "Alice" }
### Response:

Why It Works: Separates instructions, inputs, and outputs—just like test fixtures.
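Here is a small sketch of how I would assemble that prompt in Python. It uses the section headers from the Stanford Alpaca template (`### Instruction:`, `### Input:`, `### Response:`); the helper name `build_alpaca_prompt` is my own, and whether a given fine-tune expects exactly this preamble is an assumption worth checking against its model card.

```python
import json

# Template text as published with Stanford Alpaca (variant with an input field).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str) -> str:
    """Fill the template; the model is expected to continue after '### Response:'."""
    return ALPACA_TEMPLATE.format(
        instruction=instruction.strip(), input=input_text.strip()
    )


payload = json.dumps({"user_id": 123, "name": "Alice"})
prompt = build_alpaca_prompt("Validate this JSON payload against a schema:", payload)
print(prompt)
```

Keeping the template in one constant works like a shared fixture: every prompt sent to the model has the same shape, so differences in output come from the inputs and not from formatting drift.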

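2. LLaMA’s [INST] tag-based prompt

This reminded me of XML test reports, where tags define boundaries. Strictly speaking, the [INST] format belongs to LLaMA-2’s chat models; LLaMA-3 moved to special header tokens instead: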

<s>
[INST] 
Debug this Python code:  
    def sum(a, b):  
        return a - b  
[/INST]
</s>

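QA Insight: Think of <s> and </s> like <testcase> tags in JUnit: they tell the model where a sequence starts and ends. In real usage the tokenizer usually adds <s> for you, and </s> closes the turn after the model’s reply.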
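A minimal sketch of building this kind of prompt in Python, assuming the [INST] format used by Llama 2 chat models. The function name `build_inst_prompt` is hypothetical, and many tokenizers prepend `<s>` on their own, so treat the explicit `<s>` here as illustrative.

```python
def build_inst_prompt(user_message: str, system_prompt: str = "") -> str:
    """Wrap a single user turn in Llama-2-style [INST] tags."""
    body = user_message.strip()
    if system_prompt:
        # Llama 2 chat places an optional system prompt inside <<SYS>> markers.
        body = f"<<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n{body}"
    # No closing </s> here: that token follows the model's reply, not the prompt.
    return f"<s>[INST] {body} [/INST]"


buggy_code = "Debug this Python code:\n    def sum(a, b):\n        return a - b"
prompt = build_inst_prompt(buggy_code)
print(prompt)
```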

3. ChatGPT’s ‘ChatML’ prompt

Role-based prompts felt like mocking user personas in QA:

// Simulate a "system" role, like setting up a test user
const messages = [
  { role: "system", content: "You are a security auditor." },
  { role: "user", content: "Scan this JS code for SQL injection vulnerabilities." },
];
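The same role-based structure can be sketched in Python with a quick sanity check before anything is sent to a model. The helper names and the set of allowed roles below are my assumptions for illustration; check your provider's documentation for the roles it actually accepts.

```python
# Assumed role names; providers may support more or fewer.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages: list) -> None:
    """Fail fast on malformed messages, like asserting preconditions in a test."""
    for index, message in enumerate(messages):
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"Message {index} has an unknown role: {message.get('role')!r}")
        if not str(message.get("content", "")).strip():
            raise ValueError(f"Message {index} has empty content")


def build_messages(system: str, user: str) -> list:
    """Build a two-message conversation: a system persona plus one user request."""
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
    validate_messages(messages)
    return messages


messages = build_messages(
    "You are a security auditor.",
    "Scan this JS code for SQL injection vulnerabilities.",
)
print(messages)
```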

Prompting Techniques, QA-Style

Here’s how I mapped prompting strategies to QA workflows:

  1. Zero-Shot Prompting:

    Directly instructs the model to perform a task without providing examples, relying solely on its pre-trained knowledge. It is the most commonly used approach but not the preferred one, because it leaves the LLM to make a lot of assumptions.

    QA Similarity: Quick Smoke Test

     Check if this email string matches RFC 5322 standards: "lambo@yahoo.com"
    
  2. One-Shot Prompting:

    Provides a single example to demonstrate the desired output format or reasoning pattern for the model to replicate.

    QA Similarity: Documenting a bug reproduction step.

     How to replicate:  
     1. Click "Submit" without filling the form → Error 500.  
     Now, describe how to test a payment form timeout.
    
  3. Few-Shot Prompting:

    Supplies 2–5 examples to establish context or patterns, guiding the model to generalise solutions for similar tasks.

    QA Similarity: This feels similar to training new hires on defect triage.

     Defect 1: "App crashes on iOS 16" → Priority: High  
     Defect 2: "Typos in footer text" → Priority: Low  
     Defect 3: "API returns 404 for /users" → Priority:
    
  4. Chain-of-Thought (CoT) Prompting:

    Requests step-by-step reasoning to solve complex problems, mimicking human-like logical or mathematical breakdown.

    QA Similarity: Root cause analysis.

     A user reports slow dashboard loads. Investigate step by step:  
     1. Review server resource utilization.
     2. Check network latency.
     3. Review database query times.  
     4. Inspect frontend rendering.
    
  5. Self-Consistency Prompting:

    Generates multiple candidate answers to a problem and selects the most consistent result through majority voting or aggregation.

    QA Similarity: Flaky test resolution.

     Test `test_checkout_flow` fails randomly.  
     Generate 3 hypotheses for the cause and pick the most likely.
    
  6. Persona-Based Prompting:

    Assigns a specific expertise, trait, or perspective (e.g., “doctor” or “historian”) to align responses with a defined role.

    QA Similarity: Testing user roles (admin vs. guest).

     Act as a non-technical user. Describe how you’d file a bug report.
    
  7. Role-Play Prompting:

    Simulates interactive scenarios where the model adopts a character or conversational role (e.g., “travel agent” or “customer”).

    QA Similarity: User journey simulations.

     You’re a frustrated customer. Walk through password reset steps.
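Two of the techniques above, few-shot and self-consistency, translate most directly into code. The sketch below builds the defect-triage prompt from labelled examples and then takes a majority vote over several answers. The function names are my own, and the sampled answers are hard-coded placeholders standing in for repeated model calls, not real model output.

```python
from collections import Counter


def build_few_shot_prompt(examples: list, query: str) -> str:
    """Format labelled defects, then leave the last priority blank for the model."""
    lines = [
        f'Defect {number}: "{text}" → Priority: {label}'
        for number, (text, label) in enumerate(examples, start=1)
    ]
    lines.append(f'Defect {len(examples) + 1}: "{query}" → Priority:')
    return "\n".join(lines)


def majority_vote(answers: list) -> str:
    """Self-consistency in miniature: pick the most frequent normalised answer."""
    normalised = [answer.strip().lower() for answer in answers if answer.strip()]
    if not normalised:
        raise ValueError("No answers to vote on")
    winner, _count = Counter(normalised).most_common(1)[0]
    return winner


examples = [("App crashes on iOS 16", "High"), ("Typos in footer text", "Low")]
prompt = build_few_shot_prompt(examples, "API returns 404 for /users")
print(prompt)

# Placeholder answers, as if the same prompt had been sampled three times.
sampled_answers = ["High", "high ", "Medium"]
print(majority_vote(sampled_answers))
```

Normalising case and whitespace before voting matters for the same reason it does in test assertions: "High" and "high " should count as the same verdict.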
    

Some more prompting techniques

There are a few other prompting techniques, but they felt beyond the scope of this article, so I’ll summarise each in one line and leave the details for a later article:

  • Contextual Prompting: Requires real-time data (e.g., “Summarise today’s news”), which most LLMs lack.

  • Multimodal Prompting: Mixing text, images, or audio (e.g., “Describe this painting”) isn’t universally supported.


Final Thoughts: Precision Beats Luck

Prompting is equal parts art and engineering. Start with clarity, test iteratively, and always tailor prompts to your model’s architecture. As AI evolves, so will these techniques—but GIGO will forever remain the law of the land.

For me, these are the two key takeaways from researching prompting techniques as a QA:

  1. Iterate Like You’re Debugging: My first prompts failed spectacularly. I tweaked them the way I’d fix a flaky test, and they eventually yielded better, more accurate results.

  2. Validate Outputs Rigorously: Treat AI responses like test results—assert correctness, check edge cases.
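As a rough sketch of the second takeaway, this is how I might assert on a model response the way I would on an API response. It assumes the model was asked to reply with a JSON object containing `priority` and `reason` fields; that schema, the allowed values, and the function name are all hypothetical.

```python
import json

# Hypothetical schema: the prompt asked for {"priority": ..., "reason": ...}.
ALLOWED_PRIORITIES = {"High", "Medium", "Low"}


def validate_triage_response(raw: str) -> dict:
    """Parse a model reply and assert it matches the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise AssertionError(f"Response is not valid JSON: {exc}") from exc

    assert isinstance(data, dict), "Expected a JSON object"
    assert data.get("priority") in ALLOWED_PRIORITIES, (
        f"Unexpected priority: {data.get('priority')!r}"
    )
    reason = data.get("reason")
    assert isinstance(reason, str) and reason.strip(), "Missing or empty reason"
    return data


# Placeholder reply standing in for real model output.
reply = '{"priority": "High", "reason": "Core endpoint is unreachable."}'
print(validate_triage_response(reply))
```

Edge cases deserve the same treatment: feed the validator an empty string, plain prose, or an unknown priority and confirm it fails loudly.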


Reference

  1. GenAI Cohort by Hitesh Choudhary and Piyush Garg (use coupon code: MISHAL53139 for discount)

Written by

Mishal Alexander