Skip to main content

Command Palette

Search for a command to run...

Building Thinking Models: From Basic Prompts to AI Collaboration ๐Ÿง ๐Ÿ› ๏ธโžก๏ธ๐Ÿค–

Updated
โ€ข25 min read
Building Thinking Models: From Basic Prompts to AI Collaboration ๐Ÿง ๐Ÿ› ๏ธโžก๏ธ๐Ÿค–
S

๐Ÿ‘‹ Senior Software Engineer with 9+ years of expertise in building scalable backends with Node.js, AWS, Microservices, MongoDB, and Angular. I cut through the AI hype and show you how to practically integrate AI into your Node.js applications. But hereโ€™s what makes my content different: I specialise in AI storytelling โ€” turning complex concepts like transformers, vector embeddings, and LLMs into relatable stories and analogies (like explaining AI to my mom using her recipe box ๐Ÿ‘ฉ๐Ÿณ๐Ÿ“ฆ).

Ever asked an AI for help and gotten a response that was so missing the mark it was almost funny? ๐Ÿ˜… You meticulously describe the blog post you need, and it gives you a recipe for lasagna ๐Ÿ. The problem isnโ€™t the AI. The problem is how weโ€™re talking to it. ๐Ÿ—ฃ๏ธโžก๏ธ๐Ÿค–

Most of us are still shouting one-line commands into the void, hoping a super-intelligent mind will read our thoughts and deliver exactly what we imagine. But AI doesnโ€™t work that way. Itโ€™s not a mind-reader ๐Ÿ”ฎ; itโ€™s an instrument ๐ŸŽป.

In this blog, you'll learn:

  • Why your first words matter more than you think (The Foundation ๐Ÿงฑ)

  • How to use different types of prompts for various tasks (The Basic Tools ๐Ÿ› ๏ธ)

  • How to trigger actual reasoning (Building Thinking Models ๐Ÿง )

  • And finally, how to implement advanced techniques that refine and validate the AI's thinking process (Refining the Thinking โš™๏ธ)

Stop hoping for the best and start prompting with purpose. ๐Ÿš€ Letโ€™s build. ๐Ÿ—๏ธ

The Foundation ๐Ÿงฑ - Why Your First Words Matter

No system prompt

Before you ask your question, you must set the stage. ๐ŸŽญ This is the most crucial step that most people overlook.

What is a System Prompt? ๐Ÿค”

A system prompt is the initial, behind-the-scenes instruction that defines the AIโ€™s role, personality, and rules for the entire conversation. Itโ€™s the context you give before the conversation even starts.

The Power of Context: Preventing Chaos. ๐ŸŒช๏ธโžก๏ธโœจ

Without a system prompt, youโ€™re talking to a default, generic AI. Itโ€™s like shouting your question (eg, โ€œHow do I fix a leaky faucet?โ€ ๐Ÿšฐ) into a breakroom full of peopleโ€”you might get an answer from the intern ๐Ÿ˜…, the sarcastic accountant ๐Ÿ˜’, or someone whoโ€™s only half-listening ๐Ÿฅฑ. The results are unpredictable and often useless. ๐Ÿ™ˆ

A strong system prompt cuts through the noise. Itโ€™s like walking directly to the office expert and saying, โ€œFor this conversation, you are not just โ€˜some guyโ€™โ€”you are an expert plumber who gives detailed, safe advice.โ€ ๐Ÿ‘ทโ€โ™‚๏ธ๐Ÿ“‹

Example: The Breakroom vs. The Expert โš–๏ธ

โŒ No System Prompt:

You: โ€œHow do I fix a leaky faucet?โ€ AI's Generic Response: โ€œWater issues are the worst! ๐Ÿ’ฆ Have you tried turning it off at the valve? If that doesn't work, a classic lasagna always makes me feel better, hereโ€™s the recipe๐Ÿโ€

(We've all gotten AI answers that missed the mark. This lasagna recipe is just a humorous, exaggerated version of that common frustration.)

โœ… With a System Prompt:

You (First, setting the stage): โ€œYou are a master plumber with 30 years of experience. You are patient, love to teach, and provide clear, step-by-step guides for beginners.โ€
You (Then asking): โ€œHow do I fix a leaky faucet?โ€
AI's Expert Response: โ€œAh, a common issue! ๐Ÿ‘ First, letโ€™s make sure youโ€™ve turned off the water supply under the sink.Then disassemble the faucet to identify the type (cartridge, washer, etc.), replace the worn-out part with an exact match from a repair kit, and reassemble everything.โ€ ๐Ÿ› ๏ธ

The Basic Tools - Types of Prompts ๐Ÿ› ๏ธ๐Ÿ“

Few shot prompt

Once the stage is set ๐ŸŽญ, you need to know how to ask your question. ๐Ÿ—ฃ๏ธ Different tasks require different styles of prompting ๐Ÿ› ๏ธ. We will see the following basic types of prompting in this section.

  • The Zero-Shot Prompt

  • The Few-Shot Prompt

Example: Writing Social Media Posts in a Specific Style ๐Ÿ“ฑโœจ

The Scenario: You are a social media manager, and you want an AI to help you write catchy posts for a new coffee shop. โ˜• You don't just want any post; you need it to match the shop's unique, playful brand voice ๐ŸŽจ.

The Zero-Shot Prompt (Ineffective) โŒ

Prompt:
"Write a social media post about our new seasonal latte."

AI's Generic Output:
"We're excited to announce our new seasonal latte is now available! Come try it today." ๐Ÿ˜ด (This is a bland output and could be common for any coffee shop.)

The Few-Shot Prompt (Effective - Teaching by Example) โœ…๐ŸŽฏ

Here, you show the AI the exact style, structure, and tone you want by providing clear examples. ๐Ÿ‘‡

User Prompt:
Write social media posts in the following playful and emoji-heavy style for our coffee shop:

Example 1: "Is it just us, or does Monday need a double shot? โ˜•๏ธ๐Ÿ’ฅ Our new espresso blend is here to tackle your to-do list. #FuelYourDay"

Example 2: "Warning: this coffee is dangerously good. โšก๏ธ๐Ÿคฏ Have you tried our cold brew? It might just become your new obsession. #BrewedToPerfection"

Now, write a post about our new seasonal pumpkin spice latte:

AI's Output (Following the Pattern):"Autumn's favourite drink is back! ๐Ÿ‚๐ŸŽƒ Our iconic pumpkin spice latte is here to make your season extra cosy. Swipe right for fall vibes. #PumpkinSpiceSzn"

Pumpkin Spice Latte is neither a cold brew nor an espresso blend. It's its own distinct drink related to coffee, typically made with espresso, steamed milk, and pumpkin spice syrup

The key insight is that the AI is NOT copying the drink type from the examples. It's copying the marketing style and post structure.

Why This Works: ๐Ÿง ๐Ÿ’ก

  • You defined the "Coffee Voice" โ˜•๏ธ๐ŸŽค: You didn't just say "playful." You showed what a playful coffee brand sounds like by using energetic emojis (โšก๏ธ๐Ÿคฏ), relatable hooks ("Is it just us...?"), and a tone of confident excitement ("Warning: this coffee is dangerously good.").

  • You provided a Blueprint ๐Ÿ—บ๏ธ๐Ÿ“: The AI learned a repeatable formula from the examples: [Engaging Hook] ๐ŸŽฃ + [Relevant Emojis] ๐Ÿ˜Š + [Product Benefit] ๐Ÿ’ช + [Branded Hashtag] #๏ธโƒฃ. It then applied this proven coffee-marketing blueprint directly to the new product.

    It then applied this proven coffee-marketing blueprint directly to the new product:

    • [Engaging Hook]: "Autumn's favourite drink is back!"

    • [Relevant Emojis]: ๐Ÿ‚๐ŸŽƒ

    • [Product Benefit]: "make your season extra cosy"

    • [Branded Hashtag]: #PumpkinSpiceSzn

  • You removed all guesswork ๐ŸŽฏ๐Ÿงฉ: The AI didn't have to wonder, "Is this for a bakery or a cafe?" ๐Ÿง๐Ÿช By using only coffee examples, you gave it a crystal-clear style guide. It simply followed the examples, ensuring the new post perfectly matched the brand's established voice.

While few-shot prompting is great for teaching style ๐ŸŽจ, what happens when you need the AI to tackle problems that require deep logic and reasoning? ๐Ÿง ๐Ÿค” This is where we level up to Chain-of-Thought prompting! ๐Ÿš€ Let's explore in the next section. ๐Ÿ”

Building Thinking Models with Chain-of-Thought (CoT) Prompting ๐Ÿง ๐Ÿ”—๐Ÿ’ญ

Chain-of-Thought (CoT) is a prompting technique where you ask the AI to verbalise its reasoning process step-by-step before providing a final answer. Instead of jumping straight to an output, the AI is forced to simulate a logical thought process. ๐Ÿ”„

It's the difference between asking someone for a random recipe ๐Ÿ“– versus asking them to plan a meal based on your specific goals and constraints ๐Ÿฅ—โฑ๏ธ.

From Non-Thinking to Thinking: ๐Ÿšถโ€โ™‚๏ธโžก๏ธ๐Ÿƒโ€โ™‚๏ธ

  • A basic AI model provides answers. ๐ŸŽฏ

  • A model using CoT provides answers and a rationale. ๐ŸŽฏโž•๐Ÿ“

This allows us to see its "work," making its output more transparent, trustworthy, and accurate. We are essentially building a thinking model out of a non-thinking one through the way we prompt. ๐Ÿงฑโžก๏ธ๐Ÿ 

Example: The Personal Stylist ๐Ÿ‘”๐Ÿ‘—

Chain of thought

  • โŒ Zero-Shot (Non-Thinking):

    * You: "What should I wear today?" ๐Ÿค”

    * AI: "Clothes." ๐Ÿ‘š ... (Useless! ๐Ÿ™„)

  • โœ… With CoT (Thinking):

    The following is the user prompt

    • You: "What should I wear today? Let's think step by step. ๐Ÿง 

      • First, check the weather: it's 45ยฐF and raining. โ˜”๏ธ๐ŸŒก๏ธ

      • Second, my activities: a Zoom call, then walking the dog. ๐Ÿ’ป๐Ÿ•

      • Third, I should consider comfort and professionalism..."

    • AI (This is how AI thinks for COT prompting):

      "Okay, for the Zoom call, you need a presentable top. For cold rain, you need a warm jacket and waterproof shoes... Suggestion: Wear a nice sweater for your call, and have a waterproof jacket and boots ready for your dog walk." ๐Ÿงฅ๐Ÿ‘ข

Chain of Thought (COT) prompting can be primarily used in two ways:

  1. Basic: Thinking models with COT (without Example)

COT without Example

Chain-of-Thought without examples means you simply command the AI to "think step-by-step" before answering, without using any examples.

  • User Prompt: "Plan a quick weeknight dinner. Let's think step-by-step: ๐Ÿค”

    Identify the goal: What are my constraints? (e.g., time, budget) โฑ๏ธ๐Ÿ’ฐ

    Consider dietary preferences: Any allergies or dislikes? ๐Ÿšซ๐Ÿฅœ

    Select a cuisine and type of dish ๐Ÿ›

    Ensure nutritional balance ๐ŸŒฑ"

    AI Output:

    Goal: High-protein, under 30 mins, family-friendly ๐Ÿ—โฑ๏ธ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ

    Dietary: No restrictions, using chicken for quick cooking ๐Ÿšซ๐Ÿ›‘โžก๏ธ๐Ÿ—

    Cuisine: Asian-inspired stir-fry for flavor and speed ๐Ÿณ๐Ÿ”ฅ

    Nutrition: Balance lean protein (chicken) with veggies and carbs (rice) โš–๏ธ

    ๐Ÿฝ๏ธ Dinner Plan: "Speedy Chicken Stir-Fry ๐Ÿฅฌ

    Stir-fry 1 lb chicken breast strips in 1 tbsp oil (5-6 mins) ๐Ÿ—

    Add 2 cups frozen stir-fry veggies; cook 4 mins ๐Ÿฅฆ๐Ÿฅ•

    Pour in ยผ cup teriyaki sauce; simmer 2 mins ๐Ÿถ

    Serve over quick-cook rice! ๐Ÿš (Pro tip: Sprinkle with sesame seeds and green onions! ๐ŸŒฑ)"

  • Why This Works:

    The COT prompt encourages the AI to consider essential elements such as time, cost, and nutrition. Step-by-step reasoning helps create a logical framework.

  • โš ๏ธ The Limitation: The style is still somewhat genericโ€”it lacks personal flair or specific family preferences. ๐Ÿ˜ด also watch outโ€”the AI will happily suggest a chicken dinner even if you're vegetarian, because it fills in missing information with guesses rather than asking you what you actually want.

  1. Advanced: Thinking Models with COT (with Example) โœ…๐Ÿง ๐Ÿ“š

    COT with Example

    Chain-of-Thought with examples means you simply command the AI to "think step-by-step" before answering, by providing it with working examples.
    Explanation: "Show me how to think step by step, by showing me a worked example."

  • User Prompt: I plan quick VEGETARIAN weeknight meals ๐Ÿฅฆ. Here are examples of my style โœ๏ธ:

  • Example 1: Speedy Taco Skillet 1๏ธโƒฃ

    Goal: 20-minute meal โฑ๏ธ, one pan ๐Ÿณ, kid-friendly ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ

    Plan: "Speedy Taco Skillet: Brown 1 lb plant-based ground. 1 can of black beans (drained). Simmer 5 mins. Off heat, stir in 1/2 cup shredded cheddar. Serve over tortilla chips! ๐Ÿฅ‘ (Add avocado on top if you have it!)"

    Reasoning:

    • Use plant-based ground for speed ๐Ÿƒโ€โ™‚๏ธ

    • Add a veggie hiding in the sauce ๐Ÿฅท

    • A cheesy finish pleases kids ๐Ÿง€

  • Example 2: Lemon Caper Chickpea Pasta 2๏ธโƒฃ

    Goal: Pantry meal ๐Ÿฅซ, no fresh protein needed, 15 mins โฑ๏ธ

    Plan: "Lemon Caper Chickpea Pasta: Cook 8 oz spaghetti. In a bowl, mix 2 cans of chickpeas (drained), the juice of 1 lemon, and black pepper. Toss with hot pasta! ๐Ÿ‹ (Bonus: add parsley if you have it!)"

    Reasoning:

    • Canned chickpeas for protein ๐Ÿ’ช

    • Pasta is a pantry staple ๐Ÿ

    • Lemon and capers for bright flavour ๐ŸŒž

User asks AI now to create a meal plan based on the above prompt

Create a 25-minute vegetarian meal โฑ๏ธ๐Ÿฅฆ, high protein ๐Ÿ’ช, kid-friendly ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ

AI Output: ๐Ÿค–

  • Goal: 25-minute vegetarian meal, high protein, kid-friendly

  • Reasoning:

    Use a familiar base, such as pasta ๐Ÿ, for kid appeal

    Incorporate a hidden vegetable ๐Ÿฅ’ in the sauce for added nutrition

    Add a creamy, cheesy element ๐Ÿง€ to ensure it's pleasing

    Include high-protein ingredients like white beans ๐Ÿ’ช

  • Plan: "Cheesy Hidden Veggie Pasta Skillet ๐Ÿ

    Cook 8 oz pasta according to package directions. Drain and set aside.

  • In the same pan, heat 1 tbsp olive oil. Sautรฉ 1 finely grated zucchini (hidden veggie! ๐Ÿฅท) and 1 minced garlic clove . Add 1 can of white beans (drained) for protein. Stir in the cooked pasta and 1/2 cup shredded mozzarella cheese ๐Ÿง€ until melted and combined. Serve immediately! ๐Ÿ˜Š (Bonus: Let kids top with extra cheese for a fun touch! โœจ)"

Explanation of Why This Works Better for Kid-Friendly Meals: ๐Ÿง’๐Ÿด

  • Food Preference: AI made the entire diet plan based on a vegetarian meal preference, as mentioned by the user.

  • Style/Format: The AI copied the exact structure from your kid-friendly example - including the fun name format, emojis, and bonus tips in parentheses. ๐Ÿ‘ฏโ€โ™€๏ธ

  • Kid-Specific Strategy Transfer: Most importantly, the AI learned your specific kid-friendly approach from Example 1: ๐ŸŽฏ

  • Hidden veggies ๐Ÿฅ’ (grated zucchini instead of visible pieces)

    • Familiar bases ๐Ÿ (pasta instead of exotic grains)

    • Cheesy/creamy elements ๐Ÿง€ that kids love

    • Fun, interactive elements โœจ (toppings and "sprinkles")

  • Multi-Constraint Balancing: The AI successfully balanced all three requirements: 25-minute timing โฑ๏ธ, high protein ๐Ÿ’ช (via white beans), AND kid-friendly strategies learned from your examples. โš–๏ธ

  • Your Voice: The output maintains your friendly, practical tone ๐Ÿ—ฃ๏ธ, with specific calls to action tailored to families.

  • Key Insight: ๐Ÿ’ก This demonstrates how CoT+Examples allows the AI to understand nuanced combinations of requirements that would be impossible to convey through CoT alone. The examples taught what "kid-friendly" means to you specifically ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ, while the CoT structure ensured it also met the new high-protein and timing constraints.

Chain-of-Thought is powerful, but what if the AI's reasoning is flawed? These next techniquesโ€”Self-Consistency and LLM-as-a-Judgeโ€”act as quality control to ensure you get the best output.

The Advanced Techniques - Refining the Thinking ๐Ÿ”„๐Ÿค”

Self-Consistency Prompting ๐Ÿ”

Self-Consistency Prompting

Self-consistency is an advanced application of the Chain of Thought (COT) technique. It is most effective and was specifically designed to be used with COT, but it doesn't strictly require a pre-provided example. Self-consistency is a technique where you ask the same AI model the same question multiple times with a setting that allows for varied reasoning.

"For example, you ask this prompt 3 times to a single AI model:
1๏ธโƒฃ๐Ÿ“ 'Plan a family game night for kids (6-10) and adults.'
(We'll see the detailed CoT steps for this in the next section)"

AI generates 3 different outputs for the same prompt.
You then take the most common final answer from all the attempts, trusting that the majority vote is more reliable than any single attempt.

Why the Self-consistency prompt is powerful: It helps cancel out the "noise" or mistakes that can happen in any single, random reasoning path. If the model arrives at the same correct answer through three different logical routes, you can be much more confident in that answer.

What YOU Do: ๐Ÿ‘ˆ

  • You write one prompt (With COT, could be with or without examples as per the requirement)โœ๏ธ

  • You send that same prompt to the model multiple times ๐Ÿ“ค๐Ÿ“ค๐Ÿ“ค

  • You collect the different responses and choose the most common of them all ๐Ÿ“ฅ

What the MODEL Does: ๐Ÿค–

  • Generates different reasoning chains each time โ›“๏ธโžก๏ธโ›“๏ธโžก๏ธโ›“๏ธ

  • Approaches the problem from different angles ๐Ÿ“

  • Produces varied solutions to the same question ๐ŸŽฏ

Technical Example: Family Game Night ๐ŸŽฒ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ

Your Single Prompt (used 3 times): 1๏ธโƒฃ๐Ÿ“

  • "Plan a family game night for kids (6-10) and adults. Follow this exact format:

    Step 1 - Identify Key Needs: [List the core requirements for this group]
    Step 2 - Brainstorm Game Options: [List potential games that fit the needs]
    Step 3 - Select & Justify Final Choices: [Choose 2-3 games and explain why they are the best fit]
    Step 4 - Outline Engagement Strategy: [Detail how to keep everyone involved]

AI Thinking approach for all 3 times๐Ÿค–: (Please refer to the game glossary to understand this example)

Quick Game Glossary

  • Dixit: A creative guessing game using dreamlike art cards. Players give clues, and others try to match the clue to the correct card. It's about imagination, not right answers.

  • Jenga: The classic physical game of skill where players take turns removing blocks from a tower and placing them on top, trying not to be the one who makes it fall.

  • Mysterium: A cooperative game where one player (a ghost) gives visual clues to the others (psychics) to help them solve a murder. Everyone wins or loses together.

  • Pandemic: Hot Zone: A shorter, faster version of the popular game where all players work as a team to stop the spread of diseases around the world. It's cooperative and strategic.

  • Rory's Story Cubes: A creativity game with dice that have pictures instead of numbers. Players roll the dice and use the images to invent a story together. There are no rules to winโ€”just imagine!

  • Telestrations: A hilarious hybrid of Telephone and Pictionary. You draw a word, then someone else guesses what it is, then the next person draws that guess, and so on. The fun is in how the message gets distorted.

  • Uno: The famous, fast-paced card game where players match cards by colour or number. The goal is to be the first to get rid of all your cards by shouting "Uno!" when you have one left.

Letโ€™s check the self-consistency example now:

Run #1 AI Thinking Approach: ๐Ÿค– โ†’ Focus on Cooperation & Inclusivity

  • Step 1 - Identify Key Needs: "The core need is a huge age gap. I must avoid games that are too complex for kids or too simplistic for adults. The primary goal is shared fun, not intense competition."

  • Step 2 - Brainstorm Game Options: "Cooperative games are ideal. Brainstorm: Pandemic: Hot Zone, a collaborative puzzle, Mysterium."

  • Step 3 - Select & Justify Final Choices: "Select Mysterium (kids can be intuitive intuitives) and a collaborative puzzle. These remove the pressure of winning and focus on working together, which is perfect for this mix."

  • Step 4 - Outline Engagement Strategy: "Assign roles: adults can read rules, kids can be active participants. Emphasise that 'we win or lose together.'"

Run #2 AI Thinking Approach: ๐Ÿค– โ†’ Focus on Classic Competition & Laughter

  • Step 1 - Identify Key Needs: "The key is high energy and quick turns. Need games where rules are understood in seconds and the outcome is often silly, minimising frustration."

  • Step 2 - Brainstorm Game Options: "Fast-paced, lighthearted classics. Brainstorm: Uno, Jenga, Telestrations."

  • Step 3 - Select & Justify Final Choices: "Select Telestrations and Jenga. Telestrations guarantees laughter as drawings get distorted, and Jenga creates tangible, suspenseful fun for all ages."

  • Step 4 - Outline Engagement Strategy: "Keep the mood light. Play in teams (adult+kid) for Hedbanz. Celebrate funny mistakes, not just winning."

Run #3 AI Thinking Approach: ๐Ÿค– โ†’ Focus on Creativity & Storytelling

  • Step 1 - Identify Key Needs: "Engage different skills. Adults can strategise, kids can be imaginative. Need a game that acts as a creative outlet rather than a test of knowledge."

  • Step 2 - Brainstorm Game Options: "Story-building and imaginative games. Brainstorm: Rory's Story Cubes, Stuffed Fables."

  • Step 3 - Select & Justify Final Choices: "Select Dixit and Rory's Story Cubes. These games have no wrong answers, allowing kids to shine with their creativity and adults to enjoy the open-ended interpretation."

  • Step 4 - Outline Engagement Strategy: "Start with a story circle using the cubes to warm up. In Dixit, encourage everyone to explain their thought process after each round."

The Self-Consistency Vote: After these three runs, you'd see three strong but different plans (Cooperative, Competitive, Creative). The "best" final plan is chosen by seeing which approach's reasoning is most consistently sound or by picking the one that best fits your family's specific mood.

What Self-Consistency IS NOT: โŒ

  • You write: "Plan a cooperative game night" ๐Ÿค

  • You write: "Plan a competitive game night" ๐Ÿ†

  • You write: "Plan a creative game night" ๐ŸŽจ

What Self-Consistency ACTUALLY IS: โœ…

  • You wrote: "Plan a family game night" (once) ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ๐ŸŽฒ

  • Model generating cooperative approach (run 1) ๐Ÿค

  • Model generating competitive approach (run 2) ๐Ÿ†

  • Model generating creative approach (run 3) ๐ŸŽจ

Why This Distinction Matters ๐Ÿ’ก

  • Less work for you: You write one good prompt instead of multiple variations ๐Ÿ˜Œ

  • More authentic diversity: The model discovers natural variations in reasoning ๐ŸŒˆ

  • Better coverage: Explores the solution space more thoroughly ๐Ÿ—บ๏ธ

  • More reliable: Majority voting across independent reasoning paths โœ…

Letโ€™s clear some confusion around Self-consistency prompting

How do we get different answers for the same prompt if asked multiple times?

The AI generates different outputs from an identical prompt by using non-deterministic sampling techniques. The key settings that enable this are:

  • Temperature: To get "varied reasoning," you use a sampling technique with a temperature setting greater than 0 (e.g., Temperature = 0.7). This controls the randomness or creativity. A higher temperature makes the model more random but still correct. This means that at each step in its reasoning, the model might choose a slightly different phrasing and consider a different angle.

  • Top-p (Nucleus Sampling): With this sampling, instead of considering all possible words, the model only samples from the smallest set of words whose combined probability exceeds a threshold (e.g., top-p=0.9). This works with temperature to efficiently create diversity.

Batch generation vs Self-consistency prompting

  • This looks similar to self-consistency prompting, which uses a single prompt to request multiple outputs โ€œat onceโ€. Itโ€™s ideal for producing consistently formatted content efficiently.

  • For example, โ€œcreate 3 distinct family game night plans for kids (6-10) and adults. Focus on different approaches: cooperative, competitive, and creative games.โ€

  • Batch generation and self-consistency are often confused but serve opposite purposes. Batch generation utilises a single prompt to efficiently produce multiple outputs in a single run, prioritising volume and consistent formatting. Self-consistency uses various independent runs of the same prompt to validate a single answer, prioritising reliability and accuracy through majority voting. One is for scale, the other for certainty.

LLM as a Judge โš–๏ธ๐Ÿค–

In the LLM-as-a-Judge is an advanced technique, in which you use two separate AI models - one to generate content and another, typically a more advanced model, to evaluate that content against specific criteria.

In simple terms, it's like having a junior employee draft proposals, then having a senior expert review them and pick the best one. ๐Ÿ‘จโ€๐Ÿ’ผโžก๏ธ๐Ÿ‘จโ€๐Ÿ’ป

CoT can be used within LLM-as-a-Judge to make the evaluation more reliable; however, the role is different: the model acts as a judge rather than a solver.

There are two applications of the LLM-as-a-Judge Method ๐Ÿ”ฝ

1. Selection & Ranking ๐Ÿฅ‡๐Ÿฅˆ๐Ÿฅ‰

Choosing the best solution from multiple options.

  • Use this when you have several good alternatives and need an expert opinion to determine the most effective one.

2. Iterative Refinement & Critique โœจ

Improving a single piece of work through expert feedback.

  • Use this when: You want to transform a good draft into an excellent final version through structured feedback loops.

1. Selection & Ranking (Choosing the Best from Many) ๐Ÿฅ‡

LLM as judge-Selection & Ranking

This method generates multiple answers at once ๐Ÿ“๐Ÿ“ and uses a powerful AI judge to pick the single best one โœ…. It's for choosing a winner from many options.

What YOU Do: ๐Ÿ‘ค

  • You write one prompt โœ๏ธ instructing the AI to generate multiple distinct options or plans (Batch Generation prompt technique, not self-consistency prompting).

  • You write a second, separate prompt ๐Ÿ“ for a different, more powerful AI model, acting as a Judge โš–๏ธ. This prompt includes the generated options and your specific criteria for evaluation (e.g., "Which is most cost-effective?" ๐Ÿ’ฐ).

  • You collect the Judge's scored ranking and recommendation. ๐Ÿ“Šโœ…

What the MODEL Does: ๐Ÿค–

  • The Generator Model ๐Ÿญ creates a shortlist of different solutions (e.g., Plan A, Plan B, Plan C).

  • The Judge Model โš–๏ธ analyses each option against your criteria, scores them, and selects the most suitable one. It justifies its choice.

Analogy: You ask several architects ๐Ÿ—๏ธ for building designs, then hire a senior inspector ๐Ÿ” to evaluate them all and tell you which one is the most structurally sound.

Selection & Ranking Example: Family Game Night ๐ŸŽฒ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ

Step 1: Generator Model Produces Content ๐Ÿญ

  • Your Prompt to Generator Model (e.g., GPT-3.5): โœ๏ธ

    • "Create three distinct family game night plans for kids (6-10) and adults. Focus on different approaches: cooperative, competitive, and creative games."
  • Generator's Output: ๐Ÿ“ฅ

    • Plan A: Cooperative team games focusing on collaboration ๐Ÿค

    • Plan B: Classic competitive games with modified rules ๐Ÿ†

    • Plan C: Imagination-based creative activities ๐ŸŽจ

Step 2: Judge Model Evaluates the Content โš–๏ธ

  • Your Prompt to Judge Model (e.g., GPT-4): โœ๏ธ

    • "Act as a child development expert. ๐Ÿ‘จโ€๐Ÿซ Evaluate these three game night plans:
      PLANS TO EVALUATE:
      [Insert Plan A, B, and C here]

      CRITERIA (weighted):
      *- Family Harmony ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ (40%): Minimises arguments and frustration*
      *- Age Appropriateness 6๏ธโƒฃโžก๏ธ1๏ธโƒฃ0๏ธโƒฃ (30%): Engages both kids and adults*
      *- Practical Setup โฑ๏ธ (30%): Realistic for tired parents ๐Ÿ˜ด*

      Provide scores and recommendations for a family where parents work long hours ๐Ÿ’ผ and one child gets easily frustrated with losing." ๐Ÿ˜ 

What Happens Behind the Scenes: ๐ŸŽญ

  • Analyses each plan against criteria: "Plan A focuses on cooperation, which should reduce conflict..." ๐Ÿค

  • Applies weighted scoring: "Family harmony is 40%, so cooperative games score highest here..." ๐Ÿ“Š

  • Considers the specific context: "Given the tired parents and sensitive child, practicality and low conflict are prioritised..." ๐Ÿ˜ด๐Ÿ‘ถ

  • Provides expert justification: "Research shows cooperative games reduce sibling rivalry by 60%..." ๐Ÿ“ˆ

Judge's Final Output: ๐Ÿ“„โœ…

Judge Final output

EVALUATION RESULTS:

Plan A (Cooperative): 9.2/10
- Family Harmony: 10/10 - Eliminates win/lose dynamics
- Age Appropriateness: 9/10 - Works for mixed ages
- Practical Setup: 8/10 - Moderate preparation needed

Plan B (Competitive): 6.5/10  
- Family Harmony: 5/10 - High frustration risk for sensitive child
- Age Appropriateness: 8/10 - Good engagement potential
- Practical Setup: 9/10 - Very easy to implement

RECOMMENDATION: Plan A is optimal despite higher setup time, 
as it addresses the family's core constraint of minimizing conflict.

The Complete Selection & Ranking Workflow:

YOUR PROMPT โ†’ GENERATOR MODEL โ†’ CONTENT โ†’ JUDGE MODEL โ†’ EXPERT EVALUATION
    โ†“              โ†“               โ†“           โ†“             โ†“
"Create 3    โ†’   GPT-3.5    โ†’   Plans    โ†’   GPT-4    โ†’   Scored ranking +
 plans"                                          โ”‚         recommendations
                                                 โ”‚
                                                 โ”‚
                                                 โ†“
                                         "Plan A is best because..."

2. Iterative Refinement & Critique (Improving a Single Draft) ๐Ÿ”„โœจ๐Ÿ“

This method takes one draft and has an AI judge give specific feedback for improvement ๐Ÿ“ˆ. You then revise the draft based on that expert critique.

What YOU Do: ๐Ÿ‘ค

  1. You start with a single piece of content (a draft email โœ‰๏ธ, a code snippet ๐Ÿ’ป, a plan ๐Ÿ“‹).

  2. You send this single draft to the Judge Model โš–๏ธ with a prompt asking for specific feedback (e.g., "Critique this for clarity and persuasiveness" ๐Ÿ—ฃ๏ธ).

  3. You receive detailed feedback ๐Ÿ“, revise the draft based on the notes ๐Ÿ”ง, and can send it back to the judge for another round of review. ๐Ÿ”

What the MODEL Does: ๐Ÿค–

  • The Judge Model โš–๏ธ acts as an expert critic. It analyses the single input, identifies weaknesses based on your criteria ๐ŸŽฏ, and provides actionable suggestions for improvement ๐Ÿ’ก.

  • It does not choose from other options; it helps you make the one option you have much better. ๐Ÿ“ˆ

Analogy: You give a draft of your speech ๐ŸŽค to a speaking coach ๐Ÿ‘จโ€๐Ÿซ. They don't show you other speeches; they mark up your draft with notes like "Strengthen this argument" ๐Ÿ’ช or "Simplify this sentence." โœ‚๏ธ

Iterative Refinement: The Neighbour Favour Example ๐Ÿก๐Ÿ™

LLM as judge Iterative Refinement:

Step 1: You create the Initial Draft ๐Ÿ“

  • Your First Attempt:

    • "Can you take my package tomorrow?" ๐Ÿ“ฆ

Step 2: Judge Model Evaluates Your Draft โš–๏ธ

  • Your Prompt to Judge Model: โœ๏ธ

    • "Act as a communication expert. ๐Ÿ—ฃ๏ธ Evaluate this text message asking a neighbour for a favour:
      MESSAGE: [Insert your draft here]
      CRITERIA (weighted):
      *- Politeness ๐Ÿ™ (40%): Sounds respectful and considerate*
      *- Clarity โœ… (40%): Provides all necessary information*
      *- Likelihood of Success ๐ŸŽฏ (20%): How likely it is to get a positive response*
      Provide specific feedback for improvement for a busy neighbour who doesn't know you well." โฑ๏ธ

What Happens Behind the Scenes: ๐ŸŽญ

  • Analyses against criteria: "The message is direct but sounds demanding rather than requesting" ๐Ÿ‘Ž

  • Applies weighted scoring: "Politeness is 40% of the score, and this score is low due to a commanding tone" ๐Ÿ“Š

  • Considers the context: "Neighbours are more likely to help when they feel appreciated and know the specifics" ๐Ÿค”๐Ÿก

  • Provides specific improvements: "Add a greeting, specify timing, mention how you'll retrieve it, express gratitude" ๐Ÿ’ก

Judge's Output: ๐Ÿ“„

  • EVALUATION RESULTS:

    • Politeness: 3/10 ๐Ÿ™ - Sounds like a command rather than a request

    • Clarity: 5/10 โœ… - Missing key details (what time? how long will they need to hold it?)

    • Success Likelihood: 4/10 ๐ŸŽฏ - Low due to impersonal tone

  • SPECIFIC IMPROVEMENTS SUGGESTED:

    • Start with a friendly greeting ("Hi [Name]!") ๐Ÿ‘‹

    • Phrase as a question ("Would you be able to...?") โ“

    • Include specific details (delivery time โฐ, pickup plan ๐Ÿ“)

    • Express appreciation ๐Ÿ™

Step 3: Revised Message Based on Feedback ๐Ÿ”ง

  • Final Improved Version:

    • "Hi, Sarah! ๐Ÿ‘‹ I have a package arriving tomorrow between 1-3 PM โฐ, but I won't be home. Would you be able to accept it on my behalf? โ“ I can pick it up after 6 PM ๐Ÿ“. I'd really appreciate your help! ๐Ÿ™"

Why This Works Better: โœ…

  • Politeness: 9/10 ๐Ÿ™ - Friendly, respectful, and appreciative

  • Clarity: 10/10 โœ… - All necessary information provided

  • Success Likelihood: 9/10 ๐ŸŽฏ - Much higher chance of positive response ๐Ÿ‘

This demonstrates how iterative refinement transforms a basic, potentially ineffective message into one that's much more likely to achieve your goal while maintaining good relationships. ๐Ÿค

The Complete Iterative Refinement Workflow ๐Ÿ”„

YOUR DRAFT โ†’ JUDGE MODEL โ†’ EXPERT FEEDBACK โ†’ REVISED DRAFT โ†’ FINAL VERSION
    โ†“            โ†“               โ†“               โ†“              โ†“
"Can you   โ†’  GPT-4     โ†’  "Score: 4/10    โ†’  "Hi Sarah!  โ†’  Polished,
take my    โ†’  (as       โ†’  - Too vague     โ†’  I have a    โ†’  effective
package?"  โ†’  Editor)   โ†’  - Sounds demanding" โ†’ package...  โ†’  message
                         โ†’                  โ†’              โ†’ 
                         โ†“                                  โ†“
                 "Improve: Add details,            "Score: 9/10
                  be more polite, specify timing"  - Clear & polite"

Why This Two-Model Approach (LLM as Judge) Works โš–๏ธ๐Ÿค–โœจ

  • Specialisation ๐ŸŽฏ: Different models excel at different tasks. One is a great creator ๐Ÿญ, the other is a sharp critic ๐Ÿ”.

  • Objectivity ๐ŸงŠ: The Judge model hasn't seen the "reasoning behind" the generation. It's like a fresh pair of eyes ๐Ÿ‘€, evaluating the final product, not the process.

  • Quality control ๐Ÿ“ˆ: A more advanced model provides a higher-quality evaluation. It's the expert review ๐Ÿ‘จโ€๐Ÿซ on the work.

  • Bias reduction โš–๏ธ: Separate models reduce inherent biases in evaluation. It avoids the "I like my own idea" trap ๐Ÿ™ˆ.

Key Differences between LLM-as-Judge and Self-Consistency ๐Ÿ”„ vs. โš–๏ธ

Self-Consistency: ๐Ÿ”„

  • The same model was used multiple times. 1๏ธโƒฃโžก๏ธ2๏ธโƒฃโžก๏ธ3๏ธโƒฃ

  • Answers: "What are different ways to approach this?" ๐Ÿ›ฃ๏ธ๐Ÿ›ฃ๏ธ๐Ÿ›ฃ๏ธ

LLM-as-Judge: โš–๏ธ

  • Different models for different roles. ๐Ÿค (Generator + Judge)

  • Evaluates solution quality objectively. ๐Ÿ†

  • Answers: "Which approach is best for my specific needs?" โœ…๐ŸŽฏAnswers: "Which approach is best for my specific needs?"

Want to experiment with these techniques? Check out my GitHub repository with working implementations (Donโ€™t skip the README file) :

Available implementations:

  • Chain of Thought reasoning

  • Self-consistent prompting

  • Few-shot learning templates

  • LLM-as-judge evaluation

Explore the code here

Conclusion: Your "Right Prompt" Key to the Anywhere Door ๐Ÿ—๏ธ๐Ÿšช๐ŸŒŒ

Doremon Anywhere Door

In the world of Doraemon ๐Ÿฑ๐Ÿค–, the "Anywhere Door" is perhaps the most magical tool of allโ€”a portal that can take you anywhere you can imagine, but only if you speak your destination clearly and correctly. For years, Nobita would stammer vague directions like "somewhere fun! ๐ŸŽข" or "a place where I can be successful ๐Ÿ’ผ," finding himself in bizarre or disappointing locations. ๐Ÿ˜…

This is exactly how we've been using AIโ€”standing before an "Anywhere Door" of infinite potential, but mumbling incoherent directions. ๐Ÿค๐Ÿšช

The prompting techniques we've explored transform you from a stumbling Nobita ๐Ÿ˜ฅ into a confident navigator of AI's infinite possibilities: ๐Ÿงญโœจ

  • System Prompts ๐ŸŽฏ are learning to specify the exact address ๐Ÿ  rather than just the country ๐ŸŒ

  • Chain-of-Thought ๐Ÿง โ›“๏ธ is plotting the step-by-step journey ๐Ÿ—บ๏ธ instead of hoping for teleportation ๐ŸŒ€

  • Self-consistency ๐Ÿ”„ is checking multiple routes ๐Ÿ›ฃ๏ธ๐Ÿ›ฃ๏ธ๐Ÿ›ฃ๏ธ to ensure you reach the best destination ๐Ÿ†

  • LLM-as-Judge โš–๏ธ๐Ÿค– is having an expert travel guide ๐Ÿ‘จโ€๐Ÿซ refine your itinerary ๐Ÿ“‹

The magic was never brokenโ€”we just needed to learn how to give proper directions. ๐Ÿช„ Doraemon's Anywhere Door could always take you to Paris ๐Ÿ—ผ, the moon ๐ŸŒ™, or your grandmother's house ๐Ÿ‘ต, but only if you could articulate exactly where you wanted to go. ๐Ÿ—ฃ๏ธ

You no longer need to accept AI's random destinations. ๐ŸŽฏ You now possess the "Right Prompt" key ๐Ÿ—๏ธ that lets you step confidently through the Anywhere Door to precisely where you need to be. ๐Ÿ’ช

Stop wandering through AI's random corridors. ๐Ÿšถโ€โ™‚๏ธ๐ŸŒ€
Start commanding your own destiny with the key you now hold. ๐Ÿซด๐Ÿ—๏ธ๐ŸŒŸ

More from this blog

A

AI Explained Simply

5 posts