Prompt Comparison app

Donald Tucker

One of my favorite ways to learn how large language models behave is to drop the same prompt into two different models and watch the discrepancies unfold. A few weeks ago I kept switching tabs between ChatGPT and local models in LM Studio, but the process felt clunky. So I built a micro-tool that does the comparison for me, and it runs entirely on Hugging Face.

It only compares open-source models, not the latest foundation models, but I still find it useful for seeing how the open-source models differ, at least in their lightweight versions.

Why I Built It

  • Rapid prompt engineering

    Seeing two answers side by side helps me decide which wording and styling works best.

  • Qualitative benchmarking

    Formal evaluation metrics are great, but a quick visual gut-check on tone and factuality often saves me time. I like to get a ‘feel’ for each model.

How It Works under the Hood

  1. Gradio front-end

    A single Textbox for the prompt and two response panes keep the UI friction-free.

  2. huggingface-hub’s InferenceClient

    Instead of loading giant weights locally, the app makes chat_completion calls to

    • mistralai/Mistral-7B-Instruct-v0.2

    • meta-llama/Llama-2-7b-chat-hf

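import os
from huggingface_hub import InferenceClient

HF_TOKEN = os.environ["HF_TOKEN"]  # access token, assumed to be set in the environment
prompt = "Explain the second law of thermodynamics in plain English."

client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.2", token=HF_TOKEN)
# Send the prompt as a single user message and keep only the reply text.
resp = client.chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=200,
).choices[0].message.content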
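
To show how the Gradio front-end and the InferenceClient calls could fit together, here is a minimal sketch. It is not the app's actual source: the function names (ask_model, compare, build_ui), the HF_TOKEN environment variable, and the choice to show errors inside the response pane are all assumptions made for illustration.

```python
from typing import Callable, Tuple

# Model IDs as named in the post.
MODELS = (
    "mistralai/Mistral-7B-Instruct-v0.2",
    "meta-llama/Llama-2-7b-chat-hf",
)


def ask_model(model_id: str, prompt: str, max_tokens: int = 200) -> str:
    """Send one user message to a hosted model and return the reply text."""
    import os
    from huggingface_hub import InferenceClient  # imported lazily: only needed for real calls

    # Assumption: the access token lives in the HF_TOKEN environment variable.
    client = InferenceClient(model_id, token=os.environ.get("HF_TOKEN"))
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return out.choices[0].message.content


def compare(prompt: str, ask: Callable[[str, str], str] = ask_model) -> Tuple[str, str]:
    """Return (first model's answer, second model's answer) for the same prompt."""
    answers = []
    for model_id in MODELS:
        try:
            answers.append(ask(model_id, prompt))
        except Exception as exc:
            # Show the failure in that model's pane instead of crashing the UI.
            answers.append(f"[{model_id} failed: {exc}]")
    return answers[0], answers[1]


def build_ui():
    """Assemble the interface: one prompt Textbox in, two response panes out."""
    import gradio as gr  # imported lazily so compare() works without the UI dependency

    return gr.Interface(
        fn=compare,
        inputs=gr.Textbox(label="Prompt", lines=4),
        outputs=[gr.Textbox(label=MODELS[0]), gr.Textbox(label=MODELS[1])],
        title="Prompt Comparison",
    )
```

Calling build_ui().launch() would start the interface. Passing the model-calling function into compare as a parameter is a design choice of this sketch: it lets the comparison logic be tried with a stand-in function, without a token or network access.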

Using the Tool

  1. Paste a prompt – e.g. “Explain the second law of thermodynamics in plain English.”

  2. Hit “Submit.”

  3. Compare the Mistral answer (usually concise and instructional) with LLaMA-2’s (often more conversational).


Observations

My personal observations from experimenting with the two models:

  • Stylistic flavor – Mistral tends to jump straight into bullet points; LLaMA-2 sprinkles more transition words.

  • Output – In my tests, Mistral tends to pack more information into an answer than LLaMA-2 does.
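
If you want a rough number to put next to impressions like these, a few crude counts can help. The sketch below is only an illustration and not part of the tool: the function name style_stats, the short TRANSITIONS list, and the idea of using word count, bullet lines, and transition words as proxies are all my assumptions, and they are no substitute for reading the answers.

```python
import re

# A small, hand-picked list of transition words; purely illustrative.
TRANSITIONS = (
    "however", "therefore", "moreover",
    "additionally", "furthermore", "in other words",
)


def style_stats(text: str) -> dict:
    """Return crude style counts for one model response."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    lowered = text.lower()
    return {
        # Whitespace-separated tokens, a rough proxy for how much was said.
        "words": len(text.split()),
        # Lines that start with a bullet character or a number like "1." or "1)".
        "bullet_lines": sum(
            1 for ln in lines if re.match(r"^([-*•]|\d+[.)])\s+", ln)
        ),
        # Whole-word occurrences of the transition phrases listed above.
        "transitions": sum(
            len(re.findall(r"\b" + re.escape(t) + r"\b", lowered))
            for t in TRANSITIONS
        ),
    }
```

Running style_stats on each of the two responses gives a pair of dictionaries you can compare side by side.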

Check it out

➡️ Prompt Comparison Tool on Hugging Face

Feel free to ping me with improvements or bug reports. Happy prompting!


Written by

Donald Tucker

I am an Industrial Engineer utilizing the power of Python to gain deeper insights into data. I am currently learning deep learning with TensorFlow.