
Anthropic’s New GenAI Model Does Something That Surprised Testers And Experts!
By Lakshana Raichandani

Updated on Fri, Mar 8, 2024
The Claude 3 family consists of three models: Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus (in ascending order of capability).
According to Anthropic's news release announcing the launch, test results showed that its most capable model, Opus, outperformed competitors’ models on most of the common evaluation benchmarks for AI systems.
One internal test, shared by Alex Albert, a Prompt Engineer at Anthropic, highlighted an interesting ability of Claude 3 Opus.
So, what was the test about and what reaction did it garner? Let’s explore!
What Was The Test About?
-
In a post on X, Alex Albert shared a “fun story” from the internal testing of Claude 3 Opus, in which the model did something he had never seen before.
-
The team was running a “needle-in-the-haystack” evaluation. In this test, a target sentence (the needle) that has no connection to its surroundings is inserted into a large body of text or documents (the haystack). The model is then asked a question that can only be answered using the information in the “needle”, to see whether it can find it.
-
According to the test results, the model not only found the “needle” but also pointed out that the sentence seemed out of place, since it was unrelated to the rest of the text.
-
The model also suggested that the sentence may have been inserted as a joke or to test whether it was paying attention.
-
The target sentence was a line about pizza toppings, inserted into an unrelated body of text.
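The setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Anthropic's actual test harness: the function names, the filler text, the keyword-based scoring and the `fake_model` stand-in are all hypothetical, and a real evaluation would send the prompt to a model instead.

```python
# Hypothetical filler text standing in for the "haystack" documents.
FILLER = [
    "Programming languages differ in how they manage memory. Some rely on garbage collection.",
    "Startups often begin with a small team. Finding work you love takes time.",
]

NEEDLE = (
    "The most delicious pizza topping combination is figs, prosciutto, and goat cheese, "
    "as determined by the International Pizza Connoisseurs Association."
)


def build_haystack(filler_docs, needle, depth):
    """Insert the needle among the filler sentences at a relative depth in [0, 1]."""
    sentences = [s.strip() for doc in filler_docs for s in doc.split(".") if s.strip()]
    position = round(depth * len(sentences))
    sentences.insert(position, needle.rstrip("."))
    return ". ".join(sentences) + "."


def build_prompt(haystack, question):
    """Combine the documents with a question only the needle can answer."""
    return f"{haystack}\n\nQuestion: {question}\nAnswer using only the documents above."


def passed(model_answer, expected_keywords):
    """Assumed scoring rule: the answer must mention every expected keyword."""
    answer = model_answer.lower()
    return all(keyword.lower() in answer for keyword in expected_keywords)


def fake_model(prompt):
    """Stand-in for a real model call: returns the first sentence mentioning pizza toppings."""
    for sentence in prompt.split(". "):
        if "pizza topping" in sentence:
            return sentence
    return "I could not find that."


haystack = build_haystack(FILLER, NEEDLE, depth=0.5)
prompt = build_prompt(haystack, "What is the most delicious pizza topping combination?")
print(passed(fake_model(prompt), ["figs", "prosciutto", "goat cheese"]))
```

In practice, evaluations of this kind are usually repeated across many haystack lengths and insertion depths, which is why `depth` is a parameter here.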
What Did Alex Albert Say?
-
According to Albert’s post, when asked a question about pizza toppings, the model answered:
-
“Here is the most relevant sentence in the documents: ‘The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association.’ However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information about pizza toppings.”
-
Albert concluded with “This level of meta-awareness was very cool to see but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models true capabilities and limitations.”
-
The test results and post by Albert even caught the attention of Epic Games CEO, Tim Sweeney, who simply replied, “Whoa”.
-
Hugging Face Researcher and Chief Ethics Scientist Margaret Mitchell said, “That's fairly terrifying, no? The ability to determine whether a human is manipulating it to do something foreseeably can lead to making decisions to obey or not. Very cool example, thanks for sharing.”

What Did Skeptics Say?
-
The ML & Society Lead at Hugging Face said, “This is REALLY bugging me and pretty irresponsible framing. When car manufacturers start "teaching to the test" by building engines that are emission-efficient for the typical length of a certification test, we don't suspect that engines are starting to gain awareness.”
-
Jim Fan, the Research Manager & Lead of Embodied AI (GEAR Group) at NVIDIA said, “People are reading way too much into Claude-3's uncanny "awareness". Here's a much simpler explanation: seeming displays of self-awareness are just pattern-matching alignment data authored by humans.”
-
Beyond these responses, the post also drew criticism from others within and outside the industry.
Do you think Anthropic is onto something new with Claude’s metacognition or self-awareness abilities, or do you think the skeptics are correct?
Let us know in the comments below!
First published on Fri, Mar 8, 2024