AI Collaboration: From Prompts to Thinking Models

Ever asked an AI for help and gotten a response that was so missing the mark it was almost funny? 😅 You meticulously describe the blog post you need, and it gives you a recipe for lasagna 🍝. The problem isn’t the AI. The problem is how we’re talking to it. 🗣️➡️🤖

Most of us are still shouting one-line commands into the void, hoping a super-intelligent mind will read our thoughts and deliver exactly what we imagine. But AI doesn’t work that way. It’s not a mind-reader 🔮; it’s an instrument 🎻.

In this blog, you'll learn:

Why your first words matter more than you think (The Foundation 🧱)
How to use different types of prompts for various tasks (The Basic Tools 🛠️)
How to trigger actual reasoning (Building Thinking Models 🧠)
And finally, how to implement advanced techniques that refine and validate the AI's thinking process (Refining the Thinking ⚙️)

Stop hoping for the best and start prompting with purpose. 🚀 Let’s build. 🏗️

The Foundation 🧱 - Why Your First Words Matter

No system prompt

Before you ask your question, you must set the stage. 🎭 This is the most crucial step that most people overlook.

What is a System Prompt? 🤔

A system prompt is the initial, behind-the-scenes instruction that defines the AI’s role, personality, and rules for the entire conversation. It’s the context you give before the conversation even starts.

The Power of Context: Preventing Chaos. 🌪️➡️✨

Without a system prompt, you’re talking to a default, generic AI. It’s like shouting your question (eg, “How do I fix a leaky faucet?” 🚰) into a breakroom full of people—you might get an answer from the intern 😅, the sarcastic accountant 😒, or someone who’s only half-listening 🥱. The results are unpredictable and often useless. 🙈

A strong system prompt cuts through the noise. It’s like walking directly to the office expert and saying, “For this conversation, you are not just ‘some guy’—you are an expert plumber who gives detailed, safe advice.” 👷‍♂️📋

Example: The Breakroom vs. The Expert ⚖️

❌ No System Prompt:

You: “How do I fix a leaky faucet?” AI's Generic Response: “Water issues are the worst! 💦 Have you tried turning it off at the valve? If that doesn't work, a classic lasagna always makes me feel better, here’s the recipe🍝”

(We've all gotten AI answers that missed the mark. This lasagna recipe is just a humorous, exaggerated version of that common frustration.)

✅ With a System Prompt:

You (First, setting the stage): “You are a master plumber with 30 years of experience. You are patient, love to teach, and provide clear, step-by-step guides for beginners.”
You (Then asking): “How do I fix a leaky faucet?”
AI's Expert Response: “Ah, a common issue! 👍 First, let’s make sure you’ve turned off the water supply under the sink.Then disassemble the faucet to identify the type (cartridge, washer, etc.), replace the worn-out part with an exact match from a repair kit, and reassemble everything.” 🛠️

The Basic Tools - Types of Prompts 🛠️📝

Few shot prompt

Once the stage is set 🎭, you need to know how to ask your question. 🗣️ Different tasks require different styles of prompting 🛠️. We will see the following basic types of prompting in this section.

The Zero-Shot Prompt
The Few-Shot Prompt

Example: Writing Social Media Posts in a Specific Style 📱✨

The Scenario: You are a social media manager, and you want an AI to help you write catchy posts for a new coffee shop. ☕ You don't just want any post; you need it to match the shop's unique, playful brand voice 🎨.

The Zero-Shot Prompt (Ineffective) ❌

Prompt:
"Write a social media post about our new seasonal latte."

AI's Generic Output:
"We're excited to announce our new seasonal latte is now available! Come try it today." 😴 (This is a bland output and could be common for any coffee shop.)

The Few-Shot Prompt (Effective - Teaching by Example) ✅🎯

Here, you show the AI the exact style, structure, and tone you want by providing clear examples. 👇

User Prompt:
Write social media posts in the following playful and emoji-heavy style for our coffee shop:

Example 1: "Is it just us, or does Monday need a double shot? ☕️💥 Our new espresso blend is here to tackle your to-do list. #FuelYourDay"

Example 2: "Warning: this coffee is dangerously good. ⚡️🤯 Have you tried our cold brew? It might just become your new obsession. #BrewedToPerfection"

Now, write a post about our new seasonal pumpkin spice latte:

AI's Output (Following the Pattern):"Autumn's favourite drink is back! 🍂🎃 Our iconic pumpkin spice latte is here to make your season extra cosy. Swipe right for fall vibes. #PumpkinSpiceSzn"

Pumpkin Spice Latte is neither a cold brew nor an espresso blend. It's its own distinct drink related to coffee, typically made with espresso, steamed milk, and pumpkin spice syrup

The key insight is that the AI is NOT copying the drink type from the examples. It's copying the marketing style and post structure.

Why This Works: 🧠💡

You defined the "Coffee Voice" ☕️🎤: You didn't just say "playful." You showed what a playful coffee brand sounds like by using energetic emojis (⚡️🤯), relatable hooks ("Is it just us...?"), and a tone of confident excitement ("Warning: this coffee is dangerously good.").
You provided a Blueprint 🗺️📐: The AI learned a repeatable formula from the examples: [Engaging Hook] 🎣 + [Relevant Emojis] 😊 + [Product Benefit] 💪 + [Branded Hashtag] #️⃣. It then applied this proven coffee-marketing blueprint directly to the new product.

It then applied this proven coffee-marketing blueprint directly to the new product:
- [Engaging Hook]: "Autumn's favourite drink is back!"
- [Relevant Emojis]: 🍂🎃
- [Product Benefit]: "make your season extra cosy"
- [Branded Hashtag]: #PumpkinSpiceSzn
You removed all guesswork 🎯🧩: The AI didn't have to wonder, "Is this for a bakery or a cafe?" 🧁🏪 By using only coffee examples, you gave it a crystal-clear style guide. It simply followed the examples, ensuring the new post perfectly matched the brand's established voice.

While few-shot prompting is great for teaching style 🎨, what happens when you need the AI to tackle problems that require deep logic and reasoning? 🧠🤔 This is where we level up to Chain-of-Thought prompting! 🚀 Let's explore in the next section. 🔍

Building Thinking Models with Chain-of-Thought (CoT) Prompting 🧠🔗💭

Chain-of-Thought (CoT) is a prompting technique where you ask the AI to verbalise its reasoning process step-by-step before providing a final answer. Instead of jumping straight to an output, the AI is forced to simulate a logical thought process. 🔄

It's the difference between asking someone for a random recipe 📖 versus asking them to plan a meal based on your specific goals and constraints 🥗⏱️.

From Non-Thinking to Thinking: 🚶‍♂️➡️🏃‍♂️

A basic AI model provides answers. 🎯
A model using CoT provides answers and a rationale. 🎯➕📝

This allows us to see its "work," making its output more transparent, trustworthy, and accurate. We are essentially building a thinking model out of a non-thinking one through the way we prompt. 🧱➡️🏠

Example: The Personal Stylist 👔👗

Chain of thought

❌ Zero-Shot (Non-Thinking):

* You: "What should I wear today?" 🤔

* AI: "Clothes." 👚 ... (Useless! 🙄)
✅ With CoT (Thinking):

The following is the user prompt
- You: "What should I wear today? Let's think step by step. 🧠
  - First, check the weather: it's 45°F and raining. ☔️🌡️
  - Second, my activities: a Zoom call, then walking the dog. 💻🐕
  - Third, I should consider comfort and professionalism..."
- AI (This is how AI thinks for COT prompting):
  
  "Okay, for the Zoom call, you need a presentable top. For cold rain, you need a warm jacket and waterproof shoes... Suggestion: Wear a nice sweater for your call, and have a waterproof jacket and boots ready for your dog walk." 🧥👢

Chain of Thought (COT) prompting can be primarily used in two ways:

Basic: Thinking models with COT (without Example)

COT without Example

Chain-of-Thought without examples means you simply command the AI to "think step-by-step" before answering, without using any examples.

User Prompt: "Plan a quick weeknight dinner. Let's think step-by-step: 🤔

Identify the goal: What are my constraints? (e.g., time, budget) ⏱️💰

Consider dietary preferences: Any allergies or dislikes? 🚫🥜

Select a cuisine and type of dish 🍛

Ensure nutritional balance 🌱"

AI Output:

Goal: High-protein, under 30 mins, family-friendly 🍗⏱️👨‍👩‍👧‍👦

Dietary: No restrictions, using chicken for quick cooking 🚫🛑➡️🍗

Cuisine: Asian-inspired stir-fry for flavor and speed 🍳🔥

Nutrition: Balance lean protein (chicken) with veggies and carbs (rice) ⚖️

🍽️ Dinner Plan: "Speedy Chicken Stir-Fry 🥬

Stir-fry 1 lb chicken breast strips in 1 tbsp oil (5-6 mins) 🍗

Add 2 cups frozen stir-fry veggies; cook 4 mins 🥦🥕

Pour in ¼ cup teriyaki sauce; simmer 2 mins 🍶

Serve over quick-cook rice! 🍚 (Pro tip: Sprinkle with sesame seeds and green onions! 🌱)"
Why This Works:

The COT prompt encourages the AI to consider essential elements such as time, cost, and nutrition. Step-by-step reasoning helps create a logical framework.
⚠️ The Limitation: The style is still somewhat generic—it lacks personal flair or specific family preferences. 😴 also watch out—the AI will happily suggest a chicken dinner even if you're vegetarian, because it fills in missing information with guesses rather than asking you what you actually want.

Advanced: Thinking Models with COT (with Example) ✅🧠📚

Chain-of-Thought with examples means you simply command the AI to "think step-by-step" before answering, by providing it with working examples.
Explanation: "Show me how to think step by step, by showing me a worked example."

User Prompt: I plan quick VEGETARIAN weeknight meals 🥦. Here are examples of my style ✍️:
Example 1: Speedy Taco Skillet 1️⃣

Goal: 20-minute meal ⏱️, one pan 🍳, kid-friendly 👨‍👩‍👧‍👦

Plan: "Speedy Taco Skillet: Brown 1 lb plant-based ground. 1 can of black beans (drained). Simmer 5 mins. Off heat, stir in 1/2 cup shredded cheddar. Serve over tortilla chips! 🥑 (Add avocado on top if you have it!)"

Reasoning:
- Use plant-based ground for speed 🏃‍♂️
- Add a veggie hiding in the sauce 🥷
- A cheesy finish pleases kids 🧀
Example 2: Lemon Caper Chickpea Pasta 2️⃣

Goal: Pantry meal 🥫, no fresh protein needed, 15 mins ⏱️

Plan: "Lemon Caper Chickpea Pasta: Cook 8 oz spaghetti. In a bowl, mix 2 cans of chickpeas (drained), the juice of 1 lemon, and black pepper. Toss with hot pasta! 🍋 (Bonus: add parsley if you have it!)"

Reasoning:
- Canned chickpeas for protein 💪
- Pasta is a pantry staple 🍝
- Lemon and capers for bright flavour 🌞

User asks AI now to create a meal plan based on the above prompt

Create a 25-minute vegetarian meal ⏱️🥦, high protein 💪, kid-friendly 👨‍👩‍👧‍👦

AI Output: 🤖

Goal: 25-minute vegetarian meal, high protein, kid-friendly
Reasoning:

Use a familiar base, such as pasta 🍝, for kid appeal

Incorporate a hidden vegetable 🥒 in the sauce for added nutrition

Add a creamy, cheesy element 🧀 to ensure it's pleasing

Include high-protein ingredients like white beans 💪
Plan: "Cheesy Hidden Veggie Pasta Skillet 🍝

Cook 8 oz pasta according to package directions. Drain and set aside.
In the same pan, heat 1 tbsp olive oil. Sauté 1 finely grated zucchini (hidden veggie! 🥷) and 1 minced garlic clove . Add 1 can of white beans (drained) for protein. Stir in the cooked pasta and 1/2 cup shredded mozzarella cheese 🧀 until melted and combined. Serve immediately! 😊 (Bonus: Let kids top with extra cheese for a fun touch! ✨)"

Explanation of Why This Works Better for Kid-Friendly Meals: 🧒🍴

Food Preference: AI made the entire diet plan based on a vegetarian meal preference, as mentioned by the user.
Style/Format: The AI copied the exact structure from your kid-friendly example - including the fun name format, emojis, and bonus tips in parentheses. 👯‍♀️
Kid-Specific Strategy Transfer: Most importantly, the AI learned your specific kid-friendly approach from Example 1: 🎯
Hidden veggies 🥒 (grated zucchini instead of visible pieces)
- Familiar bases 🍝 (pasta instead of exotic grains)
- Cheesy/creamy elements 🧀 that kids love
- Fun, interactive elements ✨ (toppings and "sprinkles")
Multi-Constraint Balancing: The AI successfully balanced all three requirements: 25-minute timing ⏱️, high protein 💪 (via white beans), AND kid-friendly strategies learned from your examples. ⚖️
Your Voice: The output maintains your friendly, practical tone 🗣️, with specific calls to action tailored to families.
Key Insight: 💡 This demonstrates how CoT+Examples allows the AI to understand nuanced combinations of requirements that would be impossible to convey through CoT alone. The examples taught what "kid-friendly" means to you specifically 👨‍👩‍👧‍👦, while the CoT structure ensured it also met the new high-protein and timing constraints.

Chain-of-Thought is powerful, but what if the AI's reasoning is flawed? These next techniques—Self-Consistency and LLM-as-a-Judge—act as quality control to ensure you get the best output.

The Advanced Techniques - Refining the Thinking 🔄🤔

Self-Consistency Prompting 🔁

Self-Consistency Prompting

Self-consistency is an advanced application of the Chain of Thought (COT) technique. It is most effective and was specifically designed to be used with COT, but it doesn't strictly require a pre-provided example. Self-consistency is a technique where you ask the same AI model the same question multiple times with a setting that allows for varied reasoning.

"For example, you ask this prompt 3 times to a single AI model:
1️⃣📝 'Plan a family game night for kids (6-10) and adults.'
(We'll see the detailed CoT steps for this in the next section)"

AI generates 3 different outputs for the same prompt.
You then take the most common final answer from all the attempts, trusting that the majority vote is more reliable than any single attempt.

Why the Self-consistency prompt is powerful: It helps cancel out the "noise" or mistakes that can happen in any single, random reasoning path. If the model arrives at the same correct answer through three different logical routes, you can be much more confident in that answer.

What YOU Do: 👈

You write one prompt (With COT, could be with or without examples as per the requirement)✍️
You send that same prompt to the model multiple times 📤📤📤
You collect the different responses and choose the most common of them all 📥

What the MODEL Does: 🤖

Generates different reasoning chains each time ⛓️➡️⛓️➡️⛓️
Approaches the problem from different angles 📐
Produces varied solutions to the same question 🎯

Technical Example: Family Game Night 🎲👨‍👩‍👧‍👦

Your Single Prompt (used 3 times): 1️⃣📝

"Plan a family game night for kids (6-10) and adults. Follow this exact format:

Step 1 - Identify Key Needs: [List the core requirements for this group]
Step 2 - Brainstorm Game Options: [List potential games that fit the needs]
Step 3 - Select & Justify Final Choices: [Choose 2-3 games and explain why they are the best fit]
Step 4 - Outline Engagement Strategy: [Detail how to keep everyone involved]

AI Thinking approach for all 3 times🤖: (Please refer to the game glossary to understand this example)

Quick Game Glossary

Dixit: A creative guessing game using dreamlike art cards. Players give clues, and others try to match the clue to the correct card. It's about imagination, not right answers.
Jenga: The classic physical game of skill where players take turns removing blocks from a tower and placing them on top, trying not to be the one who makes it fall.
Mysterium: A cooperative game where one player (a ghost) gives visual clues to the others (psychics) to help them solve a murder. Everyone wins or loses together.
Pandemic: Hot Zone: A shorter, faster version of the popular game where all players work as a team to stop the spread of diseases around the world. It's cooperative and strategic.
Rory's Story Cubes: A creativity game with dice that have pictures instead of numbers. Players roll the dice and use the images to invent a story together. There are no rules to win—just imagine!
Telestrations: A hilarious hybrid of Telephone and Pictionary. You draw a word, then someone else guesses what it is, then the next person draws that guess, and so on. The fun is in how the message gets distorted.
Uno: The famous, fast-paced card game where players match cards by colour or number. The goal is to be the first to get rid of all your cards by shouting "Uno!" when you have one left.

Let’s check the self-consistency example now:

Run #1 AI Thinking Approach: 🤖 → Focus on Cooperation & Inclusivity

Step 1 - Identify Key Needs: "The core need is a huge age gap. I must avoid games that are too complex for kids or too simplistic for adults. The primary goal is shared fun, not intense competition."
Step 2 - Brainstorm Game Options: "Cooperative games are ideal. Brainstorm: Pandemic: Hot Zone, a collaborative puzzle, Mysterium."
Step 3 - Select & Justify Final Choices: "Select Mysterium (kids can be intuitive intuitives) and a collaborative puzzle. These remove the pressure of winning and focus on working together, which is perfect for this mix."
Step 4 - Outline Engagement Strategy: "Assign roles: adults can read rules, kids can be active participants. Emphasise that 'we win or lose together.'"

Run #2 AI Thinking Approach: 🤖 → Focus on Classic Competition & Laughter

Step 1 - Identify Key Needs: "The key is high energy and quick turns. Need games where rules are understood in seconds and the outcome is often silly, minimising frustration."
Step 2 - Brainstorm Game Options: "Fast-paced, lighthearted classics. Brainstorm: Uno, Jenga, Telestrations."
Step 3 - Select & Justify Final Choices: "Select Telestrations and Jenga. Telestrations guarantees laughter as drawings get distorted, and Jenga creates tangible, suspenseful fun for all ages."
Step 4 - Outline Engagement Strategy: "Keep the mood light. Play in teams (adult+kid) for Hedbanz. Celebrate funny mistakes, not just winning."

Run #3 AI Thinking Approach: 🤖 → Focus on Creativity & Storytelling

Step 1 - Identify Key Needs: "Engage different skills. Adults can strategise, kids can be imaginative. Need a game that acts as a creative outlet rather than a test of knowledge."
Step 2 - Brainstorm Game Options: "Story-building and imaginative games. Brainstorm: Rory's Story Cubes, Stuffed Fables."
Step 3 - Select & Justify Final Choices: "Select Dixit and Rory's Story Cubes. These games have no wrong answers, allowing kids to shine with their creativity and adults to enjoy the open-ended interpretation."
Step 4 - Outline Engagement Strategy: "Start with a story circle using the cubes to warm up. In Dixit, encourage everyone to explain their thought process after each round."

The Self-Consistency Vote: After these three runs, you'd see three strong but different plans (Cooperative, Competitive, Creative). The "best" final plan is chosen by seeing which approach's reasoning is most consistently sound or by picking the one that best fits your family's specific mood.

What Self-Consistency IS NOT: ❌

You write: "Plan a cooperative game night" 🤝
You write: "Plan a competitive game night" 🏆
You write: "Plan a creative game night" 🎨

What Self-Consistency ACTUALLY IS: ✅

You wrote: "Plan a family game night" (once) 👨‍👩‍👧‍👦🎲
Model generating cooperative approach (run 1) 🤝
Model generating competitive approach (run 2) 🏆
Model generating creative approach (run 3) 🎨

Why This Distinction Matters 💡

Less work for you: You write one good prompt instead of multiple variations 😌
More authentic diversity: The model discovers natural variations in reasoning 🌈
Better coverage: Explores the solution space more thoroughly 🗺️
More reliable: Majority voting across independent reasoning paths ✅

Let’s clear some confusion around Self-consistency prompting

How do we get different answers for the same prompt if asked multiple times?

The AI generates different outputs from an identical prompt by using non-deterministic sampling techniques. The key settings that enable this are:

Temperature: To get "varied reasoning," you use a sampling technique with a temperature setting greater than 0 (e.g., Temperature = 0.7). This controls the randomness or creativity. A higher temperature makes the model more random but still correct. This means that at each step in its reasoning, the model might choose a slightly different phrasing and consider a different angle.
Top-p (Nucleus Sampling): With this sampling, instead of considering all possible words, the model only samples from the smallest set of words whose combined probability exceeds a threshold (e.g., top-p=0.9). This works with temperature to efficiently create diversity.

Batch generation vs Self-consistency prompting

This looks similar to self-consistency prompting, which uses a single prompt to request multiple outputs “at once”. It’s ideal for producing consistently formatted content efficiently.
For example, “create 3 distinct family game night plans for kids (6-10) and adults. Focus on different approaches: cooperative, competitive, and creative games.”
Batch generation and self-consistency are often confused but serve opposite purposes. Batch generation utilises a single prompt to efficiently produce multiple outputs in a single run, prioritising volume and consistent formatting. Self-consistency uses various independent runs of the same prompt to validate a single answer, prioritising reliability and accuracy through majority voting. One is for scale, the other for certainty.

LLM as a Judge ⚖️🤖

In the LLM-as-a-Judge is an advanced technique, in which you use two separate AI models - one to generate content and another, typically a more advanced model, to evaluate that content against specific criteria.

In simple terms, it's like having a junior employee draft proposals, then having a senior expert review them and pick the best one. 👨‍💼➡️👨‍💻

CoT can be used within LLM-as-a-Judge to make the evaluation more reliable; however, the role is different: the model acts as a judge rather than a solver.

There are two applications of the LLM-as-a-Judge Method 🔽

1. Selection & Ranking 🥇🥈🥉

Choosing the best solution from multiple options.

Use this when you have several good alternatives and need an expert opinion to determine the most effective one.

Improving a single piece of work through expert feedback.

Use this when: You want to transform a good draft into an excellent final version through structured feedback loops.

1. Selection & Ranking (Choosing the Best from Many) 🥇

LLM as judge-Selection & Ranking

This method generates multiple answers at once 📝📝 and uses a powerful AI judge to pick the single best one ✅. It's for choosing a winner from many options.

What YOU Do: 👤

You write one prompt ✍️ instructing the AI to generate multiple distinct options or plans (Batch Generation prompt technique, not self-consistency prompting).
You write a second, separate prompt 📝 for a different, more powerful AI model, acting as a Judge ⚖️. This prompt includes the generated options and your specific criteria for evaluation (e.g., "Which is most cost-effective?" 💰).
You collect the Judge's scored ranking and recommendation. 📊✅

What the MODEL Does: 🤖

The Generator Model 🏭 creates a shortlist of different solutions (e.g., Plan A, Plan B, Plan C).
The Judge Model ⚖️ analyses each option against your criteria, scores them, and selects the most suitable one. It justifies its choice.

Analogy: You ask several architects 🏗️ for building designs, then hire a senior inspector 🔍 to evaluate them all and tell you which one is the most structurally sound.

Selection & Ranking Example: Family Game Night 🎲👨‍👩‍👧‍👦

Step 1: Generator Model Produces Content 🏭

Your Prompt to Generator Model (e.g., GPT-3.5): ✍️
- "Create three distinct family game night plans for kids (6-10) and adults. Focus on different approaches: cooperative, competitive, and creative games."
Generator's Output: 📥
- Plan A: Cooperative team games focusing on collaboration 🤝
- Plan B: Classic competitive games with modified rules 🏆
- Plan C: Imagination-based creative activities 🎨

Step 2: Judge Model Evaluates the Content ⚖️

Your Prompt to Judge Model (e.g., GPT-4): ✍️
- "Act as a child development expert. 👨‍🏫 Evaluate these three game night plans:
  PLANS TO EVALUATE:
  [Insert Plan A, B, and C here]
  
  CRITERIA (weighted):
  *- Family Harmony 👨‍👩‍👧‍👦 (40%): Minimises arguments and frustration*
  *- Age Appropriateness 6️⃣➡️1️⃣0️⃣ (30%): Engages both kids and adults*
  *- Practical Setup ⏱️ (30%): Realistic for tired parents 😴*
  
  Provide scores and recommendations for a family where parents work long hours 💼 and one child gets easily frustrated with losing." 😠

What Happens Behind the Scenes: 🎭

Analyses each plan against criteria: "Plan A focuses on cooperation, which should reduce conflict..." 🤝
Applies weighted scoring: "Family harmony is 40%, so cooperative games score highest here..." 📊
Considers the specific context: "Given the tired parents and sensitive child, practicality and low conflict are prioritised..." 😴👶
Provides expert justification: "Research shows cooperative games reduce sibling rivalry by 60%..." 📈

Judge's Final Output: 📄✅

Judge Final output

EVALUATION RESULTS:

Plan A (Cooperative): 9.2/10
- Family Harmony: 10/10 - Eliminates win/lose dynamics
- Age Appropriateness: 9/10 - Works for mixed ages
- Practical Setup: 8/10 - Moderate preparation needed

Plan B (Competitive): 6.5/10  
- Family Harmony: 5/10 - High frustration risk for sensitive child
- Age Appropriateness: 8/10 - Good engagement potential
- Practical Setup: 9/10 - Very easy to implement

RECOMMENDATION: Plan A is optimal despite higher setup time, 
as it addresses the family's core constraint of minimizing conflict.

The Complete Selection & Ranking Workflow:

YOUR PROMPT → GENERATOR MODEL → CONTENT → JUDGE MODEL → EXPERT EVALUATION
    ↓              ↓               ↓           ↓             ↓
"Create 3    →   GPT-3.5    →   Plans    →   GPT-4    →   Scored ranking +
 plans"                                          │         recommendations
                                                 │
                                                 │
                                                 ↓
                                         "Plan A is best because..."

This method takes one draft and has an AI judge give specific feedback for improvement 📈. You then revise the draft based on that expert critique.

What YOU Do: 👤

You start with a single piece of content (a draft email ✉️, a code snippet 💻, a plan 📋).
You send this single draft to the Judge Model ⚖️ with a prompt asking for specific feedback (e.g., "Critique this for clarity and persuasiveness" 🗣️).
You receive detailed feedback 📝, revise the draft based on the notes 🔧, and can send it back to the judge for another round of review. 🔁

What the MODEL Does: 🤖

The Judge Model ⚖️ acts as an expert critic. It analyses the single input, identifies weaknesses based on your criteria 🎯, and provides actionable suggestions for improvement 💡.
It does not choose from other options; it helps you make the one option you have much better. 📈

Analogy: You give a draft of your speech 🎤 to a speaking coach 👨‍🏫. They don't show you other speeches; they mark up your draft with notes like "Strengthen this argument" 💪 or "Simplify this sentence." ✂️

Iterative Refinement: The Neighbour Favour Example 🏡🙏

LLM as judge Iterative Refinement:

Step 1: You create the Initial Draft 📝

Your First Attempt:
- "Can you take my package tomorrow?" 📦

Step 2: Judge Model Evaluates Your Draft ⚖️

Your Prompt to Judge Model: ✍️
- "Act as a communication expert. 🗣️ Evaluate this text message asking a neighbour for a favour:
  MESSAGE: [Insert your draft here]
  CRITERIA (weighted):
  *- Politeness 🙏 (40%): Sounds respectful and considerate*
  *- Clarity ✅ (40%): Provides all necessary information*
  *- Likelihood of Success 🎯 (20%): How likely it is to get a positive response*
  Provide specific feedback for improvement for a busy neighbour who doesn't know you well." ⏱️

What Happens Behind the Scenes: 🎭

Analyses against criteria: "The message is direct but sounds demanding rather than requesting" 👎
Applies weighted scoring: "Politeness is 40% of the score, and this score is low due to a commanding tone" 📊
Considers the context: "Neighbours are more likely to help when they feel appreciated and know the specifics" 🤔🏡
Provides specific improvements: "Add a greeting, specify timing, mention how you'll retrieve it, express gratitude" 💡

Judge's Output: 📄

EVALUATION RESULTS:
- Politeness: 3/10 🙏 - Sounds like a command rather than a request
- Clarity: 5/10 ✅ - Missing key details (what time? how long will they need to hold it?)
- Success Likelihood: 4/10 🎯 - Low due to impersonal tone
SPECIFIC IMPROVEMENTS SUGGESTED:
- Start with a friendly greeting ("Hi [Name]!") 👋
- Phrase as a question ("Would you be able to...?") ❓
- Include specific details (delivery time ⏰, pickup plan 📍)
- Express appreciation 🙏

Step 3: Revised Message Based on Feedback 🔧

Final Improved Version:
- "Hi, Sarah! 👋 I have a package arriving tomorrow between 1-3 PM ⏰, but I won't be home. Would you be able to accept it on my behalf? ❓ I can pick it up after 6 PM 📍. I'd really appreciate your help! 🙏"

Why This Works Better: ✅

Politeness: 9/10 🙏 - Friendly, respectful, and appreciative
Clarity: 10/10 ✅ - All necessary information provided
Success Likelihood: 9/10 🎯 - Much higher chance of positive response 👍

This demonstrates how iterative refinement transforms a basic, potentially ineffective message into one that's much more likely to achieve your goal while maintaining good relationships. 🤝

The Complete Iterative Refinement Workflow 🔄

YOUR DRAFT → JUDGE MODEL → EXPERT FEEDBACK → REVISED DRAFT → FINAL VERSION
    ↓            ↓               ↓               ↓              ↓
"Can you   →  GPT-4     →  "Score: 4/10    →  "Hi Sarah!  →  Polished,
take my    →  (as       →  - Too vague     →  I have a    →  effective
package?"  →  Editor)   →  - Sounds demanding" → package...  →  message
                         →                  →              → 
                         ↓                                  ↓
                 "Improve: Add details,            "Score: 9/10
                  be more polite, specify timing"  - Clear & polite"

Why This Two-Model Approach (LLM as Judge) Works ⚖️🤖✨

Specialisation 🎯: Different models excel at different tasks. One is a great creator 🏭, the other is a sharp critic 🔍.
Objectivity 🧊: The Judge model hasn't seen the "reasoning behind" the generation. It's like a fresh pair of eyes 👀, evaluating the final product, not the process.
Quality control 📈: A more advanced model provides a higher-quality evaluation. It's the expert review 👨‍🏫 on the work.
Bias reduction ⚖️: Separate models reduce inherent biases in evaluation. It avoids the "I like my own idea" trap 🙈.

Key Differences between LLM-as-Judge and Self-Consistency 🔄 vs. ⚖️

Self-Consistency: 🔄

The same model was used multiple times. 1️⃣➡️2️⃣➡️3️⃣
Answers: "What are different ways to approach this?" 🛣️🛣️🛣️

LLM-as-Judge: ⚖️

Different models for different roles. 🤝 (Generator + Judge)
Evaluates solution quality objectively. 🏆
Answers: "Which approach is best for my specific needs?" ✅🎯Answers: "Which approach is best for my specific needs?"

💻 Github Code Links

Want to experiment with these techniques? Check out my GitHub repository with working implementations (Don’t skip the README file) :

Available implementations:

Chain of Thought reasoning
Self-consistent prompting
Few-shot learning templates
LLM-as-judge evaluation

Explore the code here

Conclusion: Your "Right Prompt" Key to the Anywhere Door 🗝️🚪🌌

Doremon Anywhere Door

In the world of Doraemon 🐱🤖, the "Anywhere Door" is perhaps the most magical tool of all—a portal that can take you anywhere you can imagine, but only if you speak your destination clearly and correctly. For years, Nobita would stammer vague directions like "somewhere fun! 🎢" or "a place where I can be successful 💼," finding himself in bizarre or disappointing locations. 😅

This is exactly how we've been using AI—standing before an "Anywhere Door" of infinite potential, but mumbling incoherent directions. 🤐🚪

The prompting techniques we've explored transform you from a stumbling Nobita 😥 into a confident navigator of AI's infinite possibilities: 🧭✨

System Prompts 🎯 are learning to specify the exact address 🏠 rather than just the country 🌍
Chain-of-Thought 🧠⛓️ is plotting the step-by-step journey 🗺️ instead of hoping for teleportation 🌀
Self-consistency 🔄 is checking multiple routes 🛣️🛣️🛣️ to ensure you reach the best destination 🏆
LLM-as-Judge ⚖️🤖 is having an expert travel guide 👨‍🏫 refine your itinerary 📋

The magic was never broken—we just needed to learn how to give proper directions. 🪄 Doraemon's Anywhere Door could always take you to Paris 🗼, the moon 🌙, or your grandmother's house 👵, but only if you could articulate exactly where you wanted to go. 🗣️

You no longer need to accept AI's random destinations. 🎯 You now possess the "Right Prompt" key 🗝️ that lets you step confidently through the Anywhere Door to precisely where you need to be. 💪

Stop wandering through AI's random corridors. 🚶‍♂️🌀
Start commanding your own destiny with the key you now hold. 🫴🗝️🌟

Building Thinking Models: From Basic Prompts to AI Collaboration 🧠🛠️➡️🤖

The Foundation 🧱 - Why Your First Words Matter

What is a System Prompt? 🤔

The Power of Context: Preventing Chaos. 🌪️➡️✨

The Basic Tools - Types of Prompts 🛠️📝

The Zero-Shot Prompt (Ineffective) ❌

The Few-Shot Prompt (Effective - Teaching by Example) ✅🎯

Building Thinking Models with Chain-of-Thought (CoT) Prompting 🧠🔗💭

Chain of Thought (COT) prompting can be primarily used in two ways:

Basic: Thinking models with COT (without Example)

Advanced: Thinking Models with COT (with Example) ✅🧠📚

The Advanced Techniques - Refining the Thinking 🔄🤔

Self-Consistency Prompting 🔁

Let’s clear some confusion around Self-consistency prompting

LLM as a Judge ⚖️🤖

There are two applications of the LLM-as-a-Judge Method 🔽

1. Selection & Ranking 🥇🥈🥉

2. Iterative Refinement & Critique ✨

1. Selection & Ranking (Choosing the Best from Many) 🥇

2. Iterative Refinement & Critique (Improving a Single Draft) 🔄✨📝

Why This Two-Model Approach (LLM as Judge) Works ⚖️🤖✨

Key Differences between LLM-as-Judge and Self-Consistency 🔄 vs. ⚖️

💻 Github Code Links

Conclusion: Your "Right Prompt" Key to the Anywhere Door 🗝️🚪🌌

Comments

More from this blog

Chef Cupcake's Secret Recipe is a Transformer Model 👨‍🍳🧁 🤖

Explaining Vector Embeddings to My Mom 👩‍🍳. Just Recipes & a Smart Fridge 🤖

Explaining Tokenization to Freshers: From Pizza Slices 🍕 to Data 💻🧠✨

🤖 Explaining GPT to a 5-Year-Old: The 'Child Brain' Analogy for AI 👧🧠

Command Palette

The Foundation 🧱 - Why Your First Words Matter

What is a System Prompt? 🤔

The Power of Context: Preventing Chaos. 🌪️➡️✨

The Basic Tools - Types of Prompts 🛠️📝

The Zero-Shot Prompt (Ineffective) ❌

The Few-Shot Prompt (Effective - Teaching by Example) ✅🎯

Building Thinking Models with Chain-of-Thought (CoT) Prompting 🧠🔗💭

Chain of Thought (COT) prompting can be primarily used in two ways:

Basic: Thinking models with COT (without Example)

Advanced: Thinking Models with COT (with Example) ✅🧠📚

The Advanced Techniques - Refining the Thinking 🔄🤔

Self-Consistency Prompting 🔁

Let’s clear some confusion around Self-consistency prompting

LLM as a Judge ⚖️🤖

There are two applications of the LLM-as-a-Judge Method 🔽

1. Selection & Ranking 🥇🥈🥉

2. Iterative Refinement & Critique ✨

1. Selection & Ranking (Choosing the Best from Many) 🥇

2. Iterative Refinement & Critique (Improving a Single Draft) 🔄✨📝

Why This Two-Model Approach (LLM as Judge) Works ⚖️🤖✨

Key Differences between LLM-as-Judge and Self-Consistency 🔄 vs. ⚖️

💻 Github Code Links

Conclusion: Your "Right Prompt" Key to the Anywhere Door 🗝️🚪🌌

Comments

More from this blog