Building Thinking Models: From Basic Prompts to AI Collaboration

Senior Software Engineer with 9+ years of expertise in building scalable backends with Node.js, AWS, Microservices, MongoDB, and Angular. I cut through the AI hype and show you how to practically integrate AI into your Node.js applications. But here's what makes my content different: I specialise in AI storytelling, turning complex concepts like transformers, vector embeddings, and LLMs into relatable stories and analogies (like explaining AI to my mom using her recipe box).
Ever asked an AI for help and gotten a response so far off the mark it was almost funny? You meticulously describe the blog post you need, and it gives you a recipe for lasagna. The problem isn't the AI. The problem is how we're talking to it.
Most of us are still shouting one-line commands into the void, hoping a super-intelligent mind will read our thoughts and deliver exactly what we imagine. But AI doesn't work that way. It's not a mind-reader; it's an instrument.
In this blog, you'll learn:
Why your first words matter more than you think (The Foundation)
How to use different types of prompts for various tasks (The Basic Tools)
How to trigger actual reasoning (Building Thinking Models)
And finally, how to implement advanced techniques that refine and validate the AI's thinking process (Refining the Thinking)
Stop hoping for the best and start prompting with purpose. Let's build.
The Foundation - Why Your First Words Matter

Before you ask your question, you must set the stage. This is the most crucial step, and the one most people overlook.
What is a System Prompt?
A system prompt is the initial, behind-the-scenes instruction that defines the AI's role, personality, and rules for the entire conversation. It's the context you give before the conversation even starts.
The Power of Context: Preventing Chaos
Without a system prompt, you're talking to a default, generic AI. It's like shouting your question (e.g., "How do I fix a leaky faucet?") into a breakroom full of people: you might get an answer from the intern, the sarcastic accountant, or someone who's only half-listening. The results are unpredictable and often useless.
A strong system prompt cuts through the noise. It's like walking directly to the office expert and saying, "For this conversation, you are not just 'some guy'; you are an expert plumber who gives detailed, safe advice."
Example: The Breakroom vs. The Expert
No System Prompt:
You: "How do I fix a leaky faucet?"
AI's Generic Response: "Water issues are the worst! Have you tried turning it off at the valve? If that doesn't work, a classic lasagna always makes me feel better; here's the recipe."
(We've all gotten AI answers that missed the mark. The lasagna recipe is just a humorous, exaggerated version of that common frustration.)
With a System Prompt:
You (first, setting the stage): "You are a master plumber with 30 years of experience. You are patient, love to teach, and provide clear, step-by-step guides for beginners."
You (then asking): "How do I fix a leaky faucet?"
AI's Expert Response: "Ah, a common issue! First, let's make sure you've turned off the water supply under the sink. Then disassemble the faucet to identify the type (cartridge, washer, etc.), replace the worn-out part with an exact match from a repair kit, and reassemble everything."
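Under the hood, most chat LLM APIs represent this stage-setting as a separate "system" message that travels with every request. A minimal Node.js sketch, assuming the common OpenAI-style role/content message format (no real API call is made here; the helper name is illustrative):

```javascript
// Sketch: how the plumber system prompt maps onto the chat-message format
// used by most LLM APIs. The "system" message sets the stage; the "user"
// message carries the actual question.
function buildConversation(systemPrompt, userQuestion) {
  return [
    { role: "system", content: systemPrompt }, // behind-the-scenes instruction
    { role: "user", content: userQuestion },   // the question you actually ask
  ];
}

const messages = buildConversation(
  "You are a master plumber with 30 years of experience. " +
    "You are patient, love to teach, and provide clear, step-by-step guides for beginners.",
  "How do I fix a leaky faucet?"
);
```

You would then pass `messages` to your provider's chat endpoint; the key point is that the system prompt is sent once, up front, and applies to the whole conversation.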
The Basic Tools - Types of Prompts

Once the stage is set, you need to know how to ask your question. Different tasks require different styles of prompting. We will cover the following basic types in this section:
The Zero-Shot Prompt
The Few-Shot Prompt
Example: Writing Social Media Posts in a Specific Style
The Scenario: You are a social media manager, and you want an AI to help you write catchy posts for a new coffee shop. You don't just want any post; you need it to match the shop's unique, playful brand voice.
The Zero-Shot Prompt (Ineffective)
Prompt:
"Write a social media post about our new seasonal latte."
AI's Generic Output: "We're excited to announce our new seasonal latte is now available! Come try it today." (This is a bland output that could belong to any coffee shop.)
The Few-Shot Prompt (Effective - Teaching by Example)
Here, you show the AI the exact style, structure, and tone you want by providing clear examples.
User Prompt:
Write social media posts in the following playful and emoji-heavy style for our coffee shop:
Example 1: "Is it just us, or does Monday need a double shot? Our new espresso blend is here to tackle your to-do list. #FuelYourDay"
Example 2: "Warning: this coffee is dangerously good. Have you tried our cold brew? It might just become your new obsession. #BrewedToPerfection"
Now, write a post about our new seasonal pumpkin spice latte:
AI's Output (Following the Pattern): "Autumn's favourite drink is back! Our iconic pumpkin spice latte is here to make your season extra cosy. Swipe right for fall vibes. #PumpkinSpiceSzn"
Note that a pumpkin spice latte is neither a cold brew nor an espresso blend. It's a distinct drink of its own, typically made with espresso, steamed milk, and pumpkin spice syrup.
The key insight is that the AI is NOT copying the drink type from the examples. It's copying the marketing style and post structure.
Why This Works:
You defined the "Coffee Voice": You didn't just say "playful." You showed what a playful coffee brand sounds like by using energetic emojis, relatable hooks ("Is it just us...?"), and a tone of confident excitement ("Warning: this coffee is dangerously good.").
You provided a Blueprint: The AI learned a repeatable formula from the examples: [Engaging Hook] + [Relevant Emojis] + [Product Benefit] + [Branded Hashtag]. It then applied this proven coffee-marketing blueprint directly to the new product:
[Engaging Hook]: "Autumn's favourite drink is back!"
[Relevant Emojis]: the seasonal emojis in the post
[Product Benefit]: "make your season extra cosy"
[Branded Hashtag]: #PumpkinSpiceSzn
You removed all guesswork: The AI didn't have to wonder, "Is this for a bakery or a cafe?" By using only coffee examples, you gave it a crystal-clear style guide. It simply followed the examples, ensuring the new post perfectly matched the brand's established voice.
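The few-shot pattern described above is mechanical enough to sketch as a small prompt-builder: style examples first, the new task last. A minimal Node.js sketch; the function name and wording are illustrative assumptions, not a specific library's API:

```javascript
// Sketch of assembling a few-shot prompt: an instruction, the style
// examples, and then the new task for the model to complete.
function buildFewShotPrompt(instruction, examples, task) {
  const shots = examples
    .map((ex, i) => `Example ${i + 1}: "${ex}"`) // number each style example
    .join("\n");
  return `${instruction}\n${shots}\nNow, ${task}`;
}

const fewShotPrompt = buildFewShotPrompt(
  "Write social media posts in the following playful style for our coffee shop:",
  [
    "Is it just us, or does Monday need a double shot? #FuelYourDay",
    "Warning: this coffee is dangerously good. #BrewedToPerfection",
  ],
  "write a post about our new seasonal pumpkin spice latte:"
);
```

The resulting string is what you would send as the user message; the model infers the formula from the numbered examples.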
While few-shot prompting is great for teaching style, what happens when you need the AI to tackle problems that require deep logic and reasoning? This is where we level up to Chain-of-Thought prompting! Let's explore it in the next section.
Building Thinking Models with Chain-of-Thought (CoT) Prompting
Chain-of-Thought (CoT) is a prompting technique where you ask the AI to verbalise its reasoning process step-by-step before providing a final answer. Instead of jumping straight to an output, the AI is forced to simulate a logical thought process.
It's the difference between asking someone for a random recipe versus asking them to plan a meal based on your specific goals and constraints.
From Non-Thinking to Thinking:
A basic AI model provides answers.
A model using CoT provides answers and a rationale.
This allows us to see its "work," making its output more transparent, trustworthy, and accurate. We are essentially building a thinking model out of a non-thinking one through the way we prompt.
Example: The Personal Stylist

Zero-Shot (Non-Thinking):
* You: "What should I wear today?"
* AI: "Clothes." ... (Useless!)
With CoT (Thinking):
The following is the user prompt:
You: "What should I wear today? Let's think step by step.
First, check the weather: it's 45°F and raining.
Second, my activities: a Zoom call, then walking the dog.
Third, I should consider comfort and professionalism..."
AI (this is how the AI reasons with CoT prompting):
"Okay, for the Zoom call, you need a presentable top. For cold rain, you need a warm jacket and waterproof shoes... Suggestion: Wear a nice sweater for your call, and have a waterproof jacket and boots ready for your dog walk."
Chain-of-Thought (CoT) prompting is primarily used in two ways:
Basic: Thinking Models with CoT (without Examples)

Chain-of-Thought without examples means you simply command the AI to "think step-by-step" before answering, without providing any worked examples.
User Prompt: "Plan a quick weeknight dinner. Let's think step-by-step:
Identify the goal: What are my constraints? (e.g., time, budget)
Consider dietary preferences: Any allergies or dislikes?
Select a cuisine and type of dish
Ensure nutritional balance"
AI Output:
Goal: High-protein, under 30 mins, family-friendly
Dietary: No restrictions; using chicken for quick cooking
Cuisine: Asian-inspired stir-fry for flavour and speed
Nutrition: Balance lean protein (chicken) with veggies and carbs (rice)
Dinner Plan: "Speedy Chicken Stir-Fry
Stir-fry 1 lb chicken breast strips in 1 tbsp oil (5-6 mins)
Add 2 cups frozen stir-fry veggies; cook 4 mins
Pour in ¼ cup teriyaki sauce; simmer 2 mins
Serve over quick-cook rice! (Pro tip: sprinkle with sesame seeds and green onions!)"
Why This Works:
The CoT prompt encourages the AI to consider essential elements such as time, cost, and nutrition. Step-by-step reasoning helps create a logical framework.
The Limitation: The style is still somewhat generic; it lacks personal flair or specific family preferences. Also watch out: the AI will happily suggest a chicken dinner even if you're vegetarian, because it fills in missing information with guesses rather than asking you what you actually want.
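The basic (zero-shot) CoT pattern boils down to appending a "think step-by-step" scaffold to any request. A minimal Node.js sketch; the helper name is hypothetical:

```javascript
// Sketch: turning a plain request into a zero-shot Chain-of-Thought prompt
// by appending an explicit numbered reasoning scaffold.
function withChainOfThought(request, steps) {
  const numbered = steps.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return `${request} Let's think step-by-step:\n${numbered}`;
}

const cotPrompt = withChainOfThought("Plan a quick weeknight dinner.", [
  "Identify the goal: what are my constraints? (e.g., time, budget)",
  "Consider dietary preferences: any allergies or dislikes?",
  "Select a cuisine and type of dish",
  "Ensure nutritional balance",
]);
```

The point is not the string concatenation itself but the habit: every request can carry its own reasoning checklist.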
Advanced: Thinking Models with CoT (with Examples)

Chain-of-Thought with examples means you command the AI to "think step-by-step" before answering and also provide it with worked examples that demonstrate the reasoning.
In other words: "Show me how to think step by step, by showing me a worked example."
User Prompt: I plan quick VEGETARIAN weeknight meals. Here are examples of my style:
Example 1: Speedy Taco Skillet
Goal: 20-minute meal, one pan, kid-friendly
Plan: "Speedy Taco Skillet: Brown 1 lb plant-based ground. Add 1 can of black beans (drained). Simmer 5 mins. Off heat, stir in 1/2 cup shredded cheddar. Serve over tortilla chips! (Add avocado on top if you have it!)"
Reasoning:
Use plant-based ground for speed
Add a veggie hiding in the sauce
A cheesy finish pleases kids
Example 2: Lemon Caper Chickpea Pasta
Goal: Pantry meal, no fresh protein needed, 15 mins
Plan: "Lemon Caper Chickpea Pasta: Cook 8 oz spaghetti. In a bowl, mix 2 cans of chickpeas (drained), the juice of 1 lemon, and black pepper. Toss with hot pasta! (Bonus: add parsley if you have it!)"
Reasoning:
Canned chickpeas for protein
Pasta is a pantry staple
Lemon and capers for bright flavour
The user then asks the AI to create a meal plan based on the above prompt:
Create a 25-minute vegetarian meal, high protein, kid-friendly
AI Output:
Goal: 25-minute vegetarian meal, high protein, kid-friendly
Reasoning:
Use a familiar base, such as pasta, for kid appeal
Incorporate a hidden vegetable in the sauce for added nutrition
Add a creamy, cheesy element to ensure it's pleasing
Include high-protein ingredients like white beans
Plan: "Cheesy Hidden Veggie Pasta Skillet
Cook 8 oz pasta according to package directions. Drain and set aside.
In the same pan, heat 1 tbsp olive oil. Sauté 1 finely grated zucchini (hidden veggie!) and 1 minced garlic clove. Add 1 can of white beans (drained) for protein. Stir in the cooked pasta and 1/2 cup shredded mozzarella cheese until melted and combined. Serve immediately! (Bonus: let kids top with extra cheese for a fun touch!)"
Explanation of Why This Works Better for Kid-Friendly Meals:
Food Preference: The AI built the entire plan around the vegetarian preference stated by the user.
Style/Format: The AI copied the exact structure from your examples, including the fun name format and the bonus tips in parentheses.
Kid-Specific Strategy Transfer: Most importantly, the AI learned your specific kid-friendly approach from Example 1:
Hidden veggies (grated zucchini instead of visible pieces)
Familiar bases (pasta instead of exotic grains)
Cheesy/creamy elements that kids love
Fun, interactive elements (toppings and "sprinkles")
Multi-Constraint Balancing: The AI successfully balanced all three requirements: 25-minute timing, high protein (via white beans), AND the kid-friendly strategies learned from your examples.
Your Voice: The output maintains your friendly, practical tone, with specific calls to action tailored to families.
Key Insight: This demonstrates how CoT + examples lets the AI understand nuanced combinations of requirements that would be hard to convey through CoT alone. The examples taught it what "kid-friendly" means to you specifically, while the CoT structure ensured it also met the new high-protein and timing constraints.
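The CoT-with-examples prompt described above can be assembled programmatically: each example carries its goal, reasoning, and plan, so the model sees how to think, not just what to output. A minimal Node.js sketch under assumed, illustrative names (only one example is shown for brevity):

```javascript
// Sketch: combining few-shot examples with Chain-of-Thought. Each example
// bundles goal + reasoning + plan so the model can imitate the reasoning.
function buildCoTFewShot(context, examples, newTask) {
  const shots = examples
    .map((ex, i) =>
      [
        `Example ${i + 1}: ${ex.name}`,
        `Goal: ${ex.goal}`,
        `Reasoning: ${ex.reasoning.join("; ")}`,
        `Plan: ${ex.plan}`,
      ].join("\n")
    )
    .join("\n\n");
  return `${context}\n\n${shots}\n\nNow: ${newTask}`;
}

const cotFewShotPrompt = buildCoTFewShot(
  "I plan quick VEGETARIAN weeknight meals. Here are examples of my style:",
  [
    {
      name: "Speedy Taco Skillet",
      goal: "20-minute meal, one pan, kid-friendly",
      reasoning: ["plant-based ground for speed", "cheesy finish pleases kids"],
      plan: "Brown plant-based ground, add black beans, finish with cheddar.",
    },
  ],
  "create a 25-minute vegetarian meal, high protein, kid-friendly"
);
```

Structuring examples as data like this makes it easy to add, remove, or reorder them as your style guide evolves.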
Chain-of-Thought is powerful, but what if the AI's reasoning is flawed? The next techniques, Self-Consistency and LLM-as-a-Judge, act as quality control to ensure you get the best output.
The Advanced Techniques - Refining the Thinking
Self-Consistency Prompting

Self-consistency is an advanced application of the Chain-of-Thought (CoT) technique. It was specifically designed to be used with CoT and is most effective there, but it doesn't strictly require a pre-provided example. With self-consistency, you ask the same AI model the same question multiple times, with sampling settings that allow for varied reasoning.
For example, you send this prompt 3 times to a single AI model:
"Plan a family game night for kids (6-10) and adults."
(We'll see the detailed CoT steps for this in the next section.)
The AI generates 3 different outputs for the same prompt.
You then take the most common final answer from all the attempts, trusting that the majority vote is more reliable than any single attempt.
Why self-consistency is powerful: it helps cancel out the "noise" or mistakes that can happen in any single, random reasoning path. If the model arrives at the same correct answer through three different logical routes, you can be much more confident in that answer.
What YOU Do:
You write one prompt (with CoT, with or without examples as the task requires)
You send that same prompt to the model multiple times
You collect the different responses and choose the most common answer
What the MODEL Does:
Generates different reasoning chains each time
Approaches the problem from different angles
Produces varied solutions to the same question
Technical Example: Family Game Night
Your Single Prompt (used 3 times):
"Plan a family game night for kids (6-10) and adults. Follow this exact format:
Step 1 - Identify Key Needs: [List the core requirements for this group]
Step 2 - Brainstorm Game Options: [List potential games that fit the needs]
Step 3 - Select & Justify Final Choices: [Choose 2-3 games and explain why they are the best fit]
Step 4 - Outline Engagement Strategy: [Detail how to keep everyone involved]"
AI thinking approach for all 3 runs (please refer to the game glossary below to follow this example):
Quick Game Glossary
Dixit: A creative guessing game using dreamlike art cards. Players give clues, and others try to match the clue to the correct card. It's about imagination, not right answers.
Jenga: The classic physical game of skill where players take turns removing blocks from a tower and placing them on top, trying not to be the one who makes it fall.
Mysterium: A cooperative game where one player (a ghost) gives visual clues to the others (psychics) to help them solve a murder. Everyone wins or loses together.
Pandemic: Hot Zone: A shorter, faster version of the popular game where all players work as a team to stop the spread of diseases around the world. It's cooperative and strategic.
Rory's Story Cubes: A creativity game with dice that have pictures instead of numbers. Players roll the dice and use the images to invent a story together. There are no rules to winโjust imagine!
Telestrations: A hilarious hybrid of Telephone and Pictionary. You draw a word, then someone else guesses what it is, then the next person draws that guess, and so on. The fun is in how the message gets distorted.
Uno: The famous, fast-paced card game where players match cards by colour or number. The goal is to be the first to get rid of all your cards by shouting "Uno!" when you have one left.
Let's check the self-consistency example now:
Run #1 AI Thinking Approach: Focus on Cooperation & Inclusivity
Step 1 - Identify Key Needs: "The core need is a huge age gap. I must avoid games that are too complex for kids or too simplistic for adults. The primary goal is shared fun, not intense competition."
Step 2 - Brainstorm Game Options: "Cooperative games are ideal. Brainstorm: Pandemic: Hot Zone, a collaborative puzzle, Mysterium."
Step 3 - Select & Justify Final Choices: "Select Mysterium (kids make intuitive psychics) and a collaborative puzzle. These remove the pressure of winning and focus on working together, which is perfect for this mix."
Step 4 - Outline Engagement Strategy: "Assign roles: adults can read rules, kids can be active participants. Emphasise that 'we win or lose together.'"
Run #2 AI Thinking Approach: Focus on Classic Competition & Laughter
Step 1 - Identify Key Needs: "The key is high energy and quick turns. Need games where rules are understood in seconds and the outcome is often silly, minimising frustration."
Step 2 - Brainstorm Game Options: "Fast-paced, lighthearted classics. Brainstorm: Uno, Jenga, Telestrations."
Step 3 - Select & Justify Final Choices: "Select Telestrations and Jenga. Telestrations guarantees laughter as drawings get distorted, and Jenga creates tangible, suspenseful fun for all ages."
Step 4 - Outline Engagement Strategy: "Keep the mood light. Play in teams (adult + kid) for Telestrations. Celebrate funny mistakes, not just winning."
Run #3 AI Thinking Approach: Focus on Creativity & Storytelling
Step 1 - Identify Key Needs: "Engage different skills. Adults can strategise, kids can be imaginative. Need a game that acts as a creative outlet rather than a test of knowledge."
Step 2 - Brainstorm Game Options: "Story-building and imaginative games. Brainstorm: Dixit, Rory's Story Cubes, Stuffed Fables."
Step 3 - Select & Justify Final Choices: "Select Dixit and Rory's Story Cubes. These games have no wrong answers, allowing kids to shine with their creativity and adults to enjoy the open-ended interpretation."
Step 4 - Outline Engagement Strategy: "Start with a story circle using the cubes to warm up. In Dixit, encourage everyone to explain their thought process after each round."
The Self-Consistency Vote: After these three runs, you'd see three strong but different plans (Cooperative, Competitive, Creative). The "best" final plan is chosen by seeing which approach's reasoning is most consistently sound, or by picking the one that best fits your family's specific mood.
What Self-Consistency IS NOT:
You write: "Plan a cooperative game night"
You write: "Plan a competitive game night"
You write: "Plan a creative game night"
What Self-Consistency ACTUALLY IS:
You wrote: "Plan a family game night" (once)
Model generating a cooperative approach (run 1)
Model generating a competitive approach (run 2)
Model generating a creative approach (run 3)
Why This Distinction Matters
Less work for you: You write one good prompt instead of multiple variations
More authentic diversity: The model discovers natural variations in reasoning
Better coverage: Explores the solution space more thoroughly
More reliable: Majority voting across independent reasoning paths
Let's clear up some confusion around self-consistency prompting.
How do we get different answers for the same prompt if it is asked multiple times?
The AI generates different outputs from an identical prompt by using non-deterministic sampling techniques. The key settings that enable this are:
Temperature: To get varied reasoning, you use a sampling temperature greater than 0 (e.g., temperature = 0.7). This controls randomness or creativity: at each step in its reasoning, the model might choose a slightly different phrasing or consider a different angle. Note that higher temperatures increase variety, not correctness; push it too high and output quality degrades.
Top-p (Nucleus Sampling): With this sampling, instead of considering all possible words, the model only samples from the smallest set of words whose combined probability exceeds a threshold (e.g., top-p = 0.9). This works with temperature to efficiently create diversity.
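In code, these settings are just request parameters. A minimal sketch; `temperature` and `top_p` follow the naming convention used by most LLM APIs, but check your provider's documentation for the exact field names:

```javascript
// Illustrative sampling settings for self-consistency runs.
// These values are assumptions for demonstration, not recommendations.
const samplingConfig = {
  temperature: 0.7, // > 0 so each run can take a different reasoning path
  top_p: 0.9,       // nucleus sampling: only the most probable words are considered
};
```

You would attach this config to each of the repeated requests; with `temperature: 0` every run would produce (nearly) the same output, defeating the purpose of self-consistency.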
Batch Generation vs. Self-Consistency Prompting
Batch generation looks similar to self-consistency, but it uses a single prompt to request multiple outputs "at once". It's ideal for producing consistently formatted content efficiently.
For example: "Create 3 distinct family game night plans for kids (6-10) and adults. Focus on different approaches: cooperative, competitive, and creative games."
Batch generation and self-consistency are often confused but serve opposite purposes. Batch generation uses a single prompt to efficiently produce multiple outputs in a single run, prioritising volume and consistent formatting. Self-consistency uses multiple independent runs of the same prompt to validate a single answer, prioritising reliability and accuracy through majority voting. One is for scale, the other for certainty.
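The self-consistency loop itself is simple to sketch: run the same prompt several times, collect the final answers, and take the majority. In this Node.js sketch, `sampleModel` is a stand-in for a real temperature > 0 LLM call, mocked here so the logic is testable:

```javascript
// Count the final answers and return the most common one (majority vote).
function majorityVote(answers) {
  const counts = new Map();
  for (const a of answers) counts.set(a, (counts.get(a) || 0) + 1);
  let best = null;
  for (const [answer, count] of counts) {
    if (best === null || count > counts.get(best)) best = answer;
  }
  return best;
}

// Self-consistency: the SAME prompt, several independent runs, one vote.
function selfConsistency(sampleModel, prompt, runs = 3) {
  const answers = [];
  for (let i = 0; i < runs; i++) answers.push(sampleModel(prompt));
  return majorityVote(answers);
}

// Mock model: two reasoning paths land on "cooperative", one on "competitive".
const fakeRuns = ["cooperative", "competitive", "cooperative"];
let call = 0;
const winner = selfConsistency(() => fakeRuns[call++], "Plan a family game night");
// winner === "cooperative"
```

In practice the "final answer" must first be extracted from each run's full CoT output (e.g., a last line like "Answer: ..."); the vote is always over the answers, never over the reasoning text.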
LLM as a Judge
LLM-as-a-Judge is an advanced technique in which you use two separate AI models: one to generate content and another, typically a more capable model, to evaluate that content against specific criteria.
In simple terms, it's like having a junior employee draft proposals, then having a senior expert review them and pick the best one.
CoT can be used within LLM-as-a-Judge to make the evaluation more reliable; however, the role is different: the model acts as a judge rather than a solver.
There are two applications of the LLM-as-a-Judge method:
1. Selection & Ranking
Choosing the best solution from multiple options.
- Use this when: You have several good alternatives and need an expert opinion to determine the most effective one.
2. Iterative Refinement & Critique
Improving a single piece of work through expert feedback.
- Use this when: You want to transform a good draft into an excellent final version through structured feedback loops.
1. Selection & Ranking (Choosing the Best from Many)

This method generates multiple answers at once and uses a powerful AI judge to pick the single best one. It's for choosing a winner from many options.
What YOU Do:
You write one prompt instructing the AI to generate multiple distinct options or plans (this is the batch-generation technique, not self-consistency prompting).
You write a second, separate prompt for a different, more powerful AI model acting as a Judge. This prompt includes the generated options and your specific criteria for evaluation (e.g., "Which is most cost-effective?").
You collect the Judge's scored ranking and recommendation.
What the MODEL Does:
The Generator Model creates a shortlist of different solutions (e.g., Plan A, Plan B, Plan C).
The Judge Model analyses each option against your criteria, scores them, and selects the most suitable one. It justifies its choice.
Analogy: You ask several architects for building designs, then hire a senior inspector to evaluate them all and tell you which one is the most structurally sound.
Selection & Ranking Example: Family Game Night
Step 1: Generator Model Produces Content
Your Prompt to Generator Model (e.g., GPT-3.5):
- "Create three distinct family game night plans for kids (6-10) and adults. Focus on different approaches: cooperative, competitive, and creative games."
Generator's Output:
Plan A: Cooperative team games focusing on collaboration
Plan B: Classic competitive games with modified rules
Plan C: Imagination-based creative activities
Step 2: Judge Model Evaluates the Content
Your Prompt to Judge Model (e.g., GPT-4):
"Act as a child development expert. Evaluate these three game night plans:
PLANS TO EVALUATE:
[Insert Plan A, B, and C here]
CRITERIA (weighted):
- Family Harmony (40%): Minimises arguments and frustration
- Age Appropriateness (30%): Engages both kids and adults
- Practical Setup (30%): Realistic for tired parents
Provide scores and recommendations for a family where parents work long hours and one child gets easily frustrated with losing."
What Happens Behind the Scenes:
The Judge analyses each plan against the criteria: "Plan A focuses on cooperation, which should reduce conflict..."
Applies weighted scoring: "Family harmony is 40%, so cooperative games score highest here..."
Considers the specific context: "Given the tired parents and sensitive child, practicality and low conflict are prioritised..."
Provides expert justification: "Cooperative games tend to reduce sibling rivalry..."
Judge's Final Output:

EVALUATION RESULTS:
Plan A (Cooperative): 9.2/10
- Family Harmony: 10/10 - Eliminates win/lose dynamics
- Age Appropriateness: 9/10 - Works for mixed ages
- Practical Setup: 8/10 - Moderate preparation needed
Plan B (Competitive): 6.5/10
- Family Harmony: 5/10 - High frustration risk for sensitive child
- Age Appropriateness: 8/10 - Good engagement potential
- Practical Setup: 9/10 - Very easy to implement
RECOMMENDATION: Plan A is optimal despite higher setup time,
as it addresses the family's core constraint of minimising conflict.
The Complete Selection & Ranking Workflow:
YOUR PROMPT → GENERATOR MODEL → CONTENT → JUDGE MODEL → EXPERT EVALUATION
"Create 3 plans" → GPT-3.5 → Plans → GPT-4 → Scored ranking + recommendations → "Plan A is best because..."
2. Iterative Refinement & Critique (Improving a Single Draft)
This method takes one draft and has an AI judge give specific feedback for improvement. You then revise the draft based on that expert critique.
What YOU Do:
You start with a single piece of content (a draft email, a code snippet, a plan).
You send this single draft to the Judge Model with a prompt asking for specific feedback (e.g., "Critique this for clarity and persuasiveness").
You receive detailed feedback, revise the draft based on the notes, and can send it back to the judge for another round of review.
What the MODEL Does:
The Judge Model acts as an expert critic. It analyses the single input, identifies weaknesses based on your criteria, and provides actionable suggestions for improvement.
It does not choose from other options; it helps you make the one option you have much better.
Analogy: You give a draft of your speech to a speaking coach. They don't show you other speeches; they mark up your draft with notes like "Strengthen this argument" or "Simplify this sentence."
Iterative Refinement: The Neighbour Favour Example

Step 1: You Create the Initial Draft
Your First Attempt:
- "Can you take my package tomorrow?"
Step 2: Judge Model Evaluates Your Draft
Your Prompt to Judge Model:
- "Act as a communication expert. Evaluate this text message asking a neighbour for a favour:
MESSAGE: [Insert your draft here]
CRITERIA (weighted):
- Politeness (40%): Sounds respectful and considerate
- Clarity (40%): Provides all necessary information
- Likelihood of Success (20%): How likely it is to get a positive response
Provide specific feedback for improvement for a busy neighbour who doesn't know you well."
What Happens Behind the Scenes:
The Judge analyses against the criteria: "The message is direct but sounds demanding rather than requesting"
Applies weighted scoring: "Politeness is 40% of the score, and this score is low due to the commanding tone"
Considers the context: "Neighbours are more likely to help when they feel appreciated and know the specifics"
Provides specific improvements: "Add a greeting, specify timing, mention how you'll retrieve it, express gratitude"
Judge's Output:
EVALUATION RESULTS:
Politeness: 3/10 - Sounds like a command rather than a request
Clarity: 5/10 - Missing key details (what time? how long will they need to hold it?)
Success Likelihood: 4/10 - Low due to impersonal tone
SPECIFIC IMPROVEMENTS SUGGESTED:
Start with a friendly greeting ("Hi [Name]!")
Phrase it as a question ("Would you be able to...?")
Include specific details (delivery time, pickup plan)
Express appreciation
Step 3: Revised Message Based on Feedback
Final Improved Version:
- "Hi Sarah! I have a package arriving tomorrow between 1-3 PM, but I won't be home. Would you be able to accept it on my behalf? I can pick it up after 6 PM. I'd really appreciate your help!"
Why This Works Better:
Politeness: 9/10 - Friendly, respectful, and appreciative
Clarity: 10/10 - All necessary information provided
Success Likelihood: 9/10 - Much higher chance of a positive response
This demonstrates how iterative refinement transforms a basic, potentially ineffective message into one that's much more likely to achieve your goal while maintaining good relationships.
The Complete Iterative Refinement Workflow
YOUR DRAFT → JUDGE MODEL → EXPERT FEEDBACK → REVISED DRAFT → FINAL VERSION
"Can you take my package?" → GPT-4 (as editor) → "Score: 4/10 - too vague, sounds demanding; improve: add details, be more polite, specify timing" → "Hi Sarah! I have a package..." → "Score: 9/10 - clear & polite"
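The whole refinement loop reduces to: critique, revise, repeat until the judge's score clears a threshold. In this Node.js sketch, `judge` and `revise` are mock stand-ins for the two separate model calls, wired to mirror the package-favour example:

```javascript
// Sketch of the iterative-refinement loop: judge the draft, apply the
// feedback, and stop once the score clears the target (or rounds run out).
function refine(draft, judge, revise, targetScore = 8, maxRounds = 3) {
  let current = draft;
  for (let round = 0; round < maxRounds; round++) {
    const { score, feedback } = judge(current); // expert critique
    if (score >= targetScore) return current;   // good enough: stop
    current = revise(current, feedback);        // apply the notes
  }
  return current; // best effort after maxRounds
}

// Mock judge/reviser: a greeting-less draft scores low, a revised one passes.
const judge = (text) =>
  text.startsWith("Hi")
    ? { score: 9, feedback: "" }
    : { score: 4, feedback: "Add a greeting and details" };
const revise = (text, feedback) =>
  `Hi Sarah! ${text} (revised per: ${feedback})`;

const final = refine("Can you take my package tomorrow?", judge, revise);
// final starts with "Hi Sarah!"
```

The `maxRounds` cap matters in practice: without it, a judge that never awards the target score would loop (and bill you) forever.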
Why This Two-Model Approach (LLM as Judge) Works
Specialisation: Different models excel at different tasks. One is a great creator, the other a sharp critic.
Objectivity: The Judge model hasn't seen the reasoning behind the generation. It's a fresh pair of eyes, evaluating the final product, not the process.
Quality control: A more capable model provides a higher-quality evaluation; it's the expert review of the work.
Bias reduction: Separate models reduce inherent biases in evaluation. It avoids the "I like my own idea" trap.
Key Differences between LLM-as-Judge and Self-Consistency
Self-Consistency:
The same model is used multiple times.
Answers: "What are different ways to approach this?"
LLM-as-Judge:
Different models for different roles (Generator + Judge).
Evaluates solution quality objectively.
Answers: "Which approach is best for my specific needs?"
GitHub Code Links
Want to experiment with these techniques? Check out my GitHub repository with working implementations (don't skip the README file):
Available implementations:
Chain-of-Thought reasoning
Self-consistency prompting
Few-shot learning templates
LLM-as-judge evaluation
Conclusion: Your "Right Prompt" Key to the Anywhere Door

In the world of Doraemon, the "Anywhere Door" is perhaps the most magical tool of all: a portal that can take you anywhere you can imagine, but only if you speak your destination clearly and correctly. For years, Nobita would stammer vague directions like "somewhere fun!" or "a place where I can be successful," finding himself in bizarre or disappointing locations.
This is exactly how we've been using AI: standing before an "Anywhere Door" of infinite potential, but mumbling incoherent directions.
The prompting techniques we've explored transform you from a stumbling Nobita into a confident navigator of AI's infinite possibilities:
System Prompts are learning to specify the exact address rather than just the country
Chain-of-Thought is plotting the step-by-step journey instead of hoping for teleportation
Self-Consistency is checking multiple routes to ensure you reach the best destination
LLM-as-Judge is having an expert travel guide refine your itinerary
The magic was never broken; we just needed to learn how to give proper directions. Doraemon's Anywhere Door could always take you to Paris, the moon, or your grandmother's house, but only if you could articulate exactly where you wanted to go.
You no longer need to accept AI's random destinations. You now possess the "Right Prompt" key that lets you step confidently through the Anywhere Door to precisely where you need to be.
Stop wandering through AI's random corridors.
Start commanding your own destiny with the key you now hold.



