
What Happens When Three AI Judges (And One Human Judge) Judge the Same Thing?

That thing being the #CommsHero Award for Best Use of AI

Why "Horses for Courses" Isn’t Just a Saying — It’s a Strategy

By Dan Sodergren | AI Speaker | CommsHero Judge | www.dansodergren.com

This week, I had the joy of judging a new award at CommsHero.

An event where I'll once again be giving a keynote on AI and the future of work.

So, not surprisingly, they asked me to judge the Best Use of AI award. I loved doing it: spending a morning giving something back to the internal comms community and reading about some of the amazing things that REALLY ARE happening in internal comms and AI.


It was truly heartwarming to do, as a human being.

And it let me use my many years of experience judging competitions like this.

And my experience in AI in general, as that's what I train people in.

And, to some extent, in internal comms too.

BUT…

"What happens when three different AIs are then asked to judge the same entries? "

You might think you’d get the same scores — maybe even identical logic — because they’re all highly trained models using the same inputs.

But no. That’s not how it works.

I recently put three well-known large language models (LLMs) to the test — ChatGPT, Claude, and Gemini — by asking them to score the now-anonymised entries submitted for an AI award in internal communications.

Each one received exactly the same brief.

Same entries. Same judging criteria.

And yet, the way each model judged the work was strikingly different.

This wasn’t just a fun AI experiment. It became something deeper — a powerful reminder of why the phrase I often use in my work, "horses for courses," matters more than ever in the world of artificial intelligence.


Setting the Scene: Judging at CommsHero

This all began when I was invited to be a judge for the CommsHero 2025 Awards.

For those who don’t know, CommsHero is one of the most respected spaces in the UK for comms professionals. I had the honour of speaking at the 2024 event about AI and the fifth industrial revolution — and you can find my speaker profile here: commshero.com/people/dan-sodergren.

As someone who’s judged the MPA Awards five years in a row, I take these responsibilities seriously. And this year, the "Best Use of AI" award brought together a fascinating set of entries, full of creativity and innovation.

I judged them first the way a human should — by reading every word, applying industry experience, and making critical decisions. I already had a clear sense of the winner.

Then something hit me.

I thought: “What would AI make of this?”

And that sparked what might be one of the smartest ways to use AI right now: asking it to challenge your judgement, not replace it.

So I ran the same submissions through ChatGPT, Claude, and Gemini — and what happened next told me everything I needed to know about how these models truly differ.
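For the technically curious: an experiment like this can be wired up in a few lines of Python. Below is a minimal sketch, assuming you hold API keys for all three providers and using the official openai, anthropic and google-generativeai SDKs; the model names, the prompt and the placeholder entries are illustrative stand-ins, not the exact setup used for the awards.

```python
# A minimal sketch of a three-model judging panel, sending the same
# anonymised brief to ChatGPT, Claude, and Gemini.
# Assumption: the model names below are illustrative; swap in the
# versions your accounts have access to.
import os

import anthropic
import google.generativeai as genai
from openai import OpenAI

BRIEF = """You are judging an internal comms "Best Use of AI" award.
Score the following anonymised entry out of 10 against the judging
criteria, and briefly explain your reasoning.

ENTRY:
{entry}"""


def ask_chatgpt(entry: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": BRIEF.format(entry=entry)}],
    )
    return resp.choices[0].message.content


def ask_claude(entry: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        messages=[{"role": "user", "content": BRIEF.format(entry=entry)}],
    )
    return resp.content[0].text


def ask_gemini(entry: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")
    return model.generate_content(BRIEF.format(entry=entry)).text


# Same entries, same brief, three different judges.
entries = ["<anonymised entry one>", "<anonymised entry two>"]
for number, entry in enumerate(entries, start=1):
    for name, judge in [("ChatGPT", ask_chatgpt),
                        ("Claude", ask_claude),
                        ("Gemini", ask_gemini)]:
        print(f"Entry {number} / {name}:\n{judge(entry)}\n")
```

The key design point is that every model sees an identical prompt, so any difference in the scores comes from the model, not the framing.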

Gemini: The Big Picture Thinker

Google’s Gemini model was the most generous scorer.

It gave top marks for ambition, scale, and organisational transformation — even when the entries didn’t always provide detailed results or metrics. Gemini valued broad engagement, narrative framing, and cultural change.

Its underlying logic was clear: if your AI initiative reached many people and seemed forward-thinking, it was good work.

This aligns with Gemini’s roots in Google’s knowledge systems — it leans into breadth, interconnectedness, and momentum.

Claude: The Strategic Architect

Claude, from Anthropic, judged with consistency and caution.

It prioritised structure, ethical framing, and clarity of purpose. Entries that lacked specifics or felt exaggerated were marked down. Claude respected alignment — between a problem, its AI-powered solution, and the outcome.

Claude’s “personality” reflects Anthropic’s focus on trustworthiness and transparency. The model is trained using a “Constitutional AI” approach, which aligns its outputs with an explicit set of written principles and encourages it to be clear and honest rather than overconfident.

In this setting, Claude performed like a no-nonsense strategist: fair, but demanding.

ChatGPT: The Analytical Technician

OpenAI’s ChatGPT (GPT-4o) was the strictest of the three — but also the most insightful.

It gave only one perfect score, and that went to the entry with the clearest metrics, replicable workflows, and disciplined AI use.

ChatGPT clearly preferred precision, performance, and technical logic. Entries that saved measurable time, streamlined systems, or created scalable outputs got rewarded. Entries light on hard data? Not so much.

This tracks with how ChatGPT is used in real business cases — it’s a reasoning engine. Its lens is one of productivity and execution.

Fascinating Consensus — and Sharp Differences

Despite their unique “personalities,” all three LLMs agreed on two key points:

  • The same entry came out on top for all of them.
  • The same entry consistently landed last.

That’s a sign that quality, when clear and measurable, cuts through.

But their reasoning for those scores varied wildly:

  • ChatGPT loved structure and scale.
  • Claude respected purpose and alignment.
  • Gemini was energised by ambition and inclusion.

This confirmed something I’ve been saying for a while — and posted about in a LinkedIn update that generated some brilliant conversations:

AI models don’t just give you different answers — they come with different priorities.

Horses for Courses: What This Really Teaches Us

Too many businesses are still asking the wrong question:

“Which AI is the best?”

Here’s the better question:

“Which AI is best for this specific job?”

Need empathetic language analysis? Claude.
Want to optimise business processes? ChatGPT.
Looking to frame a change story? Gemini.

This is what “horses for courses” means in the age of AI.

It’s not about finding a super-tool. It’s about understanding the different judgement styles, biases, and strengths each model brings — and matching them to the job in front of you.

Where Human Judgement Still Wins

Now, let’s bring it full circle.

In the end, I was the real judge.

I’d already read all the submissions. I’d formed my conclusions based on a decade of comms strategy experience, five years of award judging, and a professional obsession with how AI reshapes the future of work.

The AI helped sharpen the analysis. But it didn’t replace my judgement. That’s the sweet spot. Dan Sodergren was the judge, after all.

AI is at its best when used like this: as a second opinion, a challenger, a logic-checker. Maybe even a data handler. Especially when it comes to big decisions involving human creativity and complexity. And if you’re wondering what AI can and can’t do well, this exercise was the perfect reminder.

Final Thought: Build Your AI Panel, Not Just an AI Tool

The future of intelligent decision-making isn’t about relying on one model. It’s about building a multi-model mindset.

  • ChatGPT for ROI and replication
  • Claude for ethical structure and strategy
  • Gemini for vision and cultural reach

Each AI is a different mind. The real win comes when you design your systems to let each model do what it does best.
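If you want to bake that mindset into a workflow, a router can start as something as simple as a lookup table. Here's a tiny hypothetical sketch in Python; the task categories and the task-to-model mapping simply encode the judging styles observed above, as my own assumptions rather than a fixed recommendation.

```python
# A hypothetical "horses for courses" router: pick the panel member
# whose observed style best fits the job. The mapping encodes the
# tendencies described in this article; it is an illustrative
# assumption, not a benchmark result.

TASK_TO_MODEL = {
    "empathetic_language_analysis": "Claude",   # purpose, ethics, alignment
    "process_optimisation": "ChatGPT",          # metrics, replicable workflows
    "change_story_framing": "Gemini",           # ambition, scale, narrative
}


def pick_model(task: str) -> str:
    """Route a task to its best-fit model; unknown tasks go to the full panel."""
    return TASK_TO_MODEL.get(task, "all three, as a panel")


print(pick_model("process_optimisation"))          # -> ChatGPT
print(pick_model("empathetic_language_analysis"))  # -> Claude
print(pick_model("something_brand_new"))           # -> all three, as a panel
```

Notice the default: when a task doesn't clearly match one model's strengths, the safest move is the one this article describes, asking the whole panel and comparing.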

That’s not AI replacing human expertise. That’s AI amplifying it.

🧭 Want to explore multi-model AI strategy with your team?

I run keynotes, training and advisory sessions on AI, marketing, and the future of intelligent work.

🔗 www.dansodergren.com

📚 More insights: futureofwork.gumroad.com

💬 Join the discussion on LinkedIn

🐦 Or follow me at @dansodergren

References for the piece

  • Dan Sodergren is a keynote speaker represented by Pomona Partners: www.dansodergren.com
  • The Fifth Industrial Revolution book: www.thefifthindustrialrevolution.co.uk
  • “The king of AI will be judging our Best Use of AI award! 👑”: https://www.linkedin.com/feed/update/urn:li:activity:7336334218072936449/
  • Dan Sodergren’s CommsHero speaker profile (keynote speaker, TEDx talker, serial tech startup founder, ex marketing agency owner): https://commshero.com/people/dan-sodergren/
  • My HeyGen AI clone announcing my judging… NOT the results: https://www.linkedin.com/feed/update/urn:li:activity:7336327930261741569/
