
Emerging Technology
Cerebras’ New Inference Solution Is 20x Faster Than GPU-Based AI Hyperscale Clouds
Updated on Wed, Aug 28, 2024
Developers building AI-powered applications need solutions that cut the time, money and effort involved. This is where AI inference solutions come in.
AI inference refers to the process by which a trained machine learning (ML) model draws conclusions from brand-new data without human intervention. Essentially, the slower the AI inference solution, the longer a project takes to complete.
However, this might be set to change, as Cerebras Systems has introduced a new super-fast, cost-effective AI inference solution.
So, what capabilities does this new solution bring and how will it help developers and users? Let’s explore!
What Is Cerebras’ New Inference Tool About?
- Through a blog post published on its website, Cerebras Systems announced the launch of its new AI inference solution – Cerebras Inference.
- As per the release, the new tool is the fastest AI inference solution in the world, running 20x faster than NVIDIA GPU-based hyperscale clouds.
- The solution gives users near-instant responses for code generation, summarization, agentic workflows and other tasks.
- Cerebras Inference delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, while offering the industry’s best pricing.
- Furthermore, it offers 2.4x faster inference than Groq on Llama 3.1 8B.
- The tool costs 10 cents per million tokens for Llama 3.1 8B and 60 cents per million tokens for Llama 3.1 70B, roughly a fifth of the price of GPU-based solutions.
- As the announcement put it, “For Llama3.1-70B, Cerebras is the only platform to enable instant responses at a blistering 450 tokens/sec.”
- As per its dedicated webpage, “Cerebras Inference is built to scale. Powered by four data centers across the US, Cerebras Inference has capacity to serve hundreds of billions of tokens per day with leading accuracy and reliability.”
- The solution’s capabilities are powered by Cerebras’ third-generation Wafer Scale Engine, a chip that packs hundreds of thousands of AI cores onto a single wafer-sized die and is used to train some of the industry’s largest AI models.
- Cerebras Inference is available to developers via API access, letting them integrate the solution “by simply swapping out the API key” (see the sketch after this list).
- Built on the familiar OpenAI Chat Completions format, the solution is also accessible via a chat interface.
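For illustration, here is a minimal sketch of what that “swapping out the API key” integration could look like in Python. The base URL https://api.cerebras.ai/v1 and the model identifier llama3.1-8b are assumptions drawn from Cerebras’ developer documentation around launch, not details confirmed in this article, so verify them against the current docs before relying on them:

```python
# Minimal sketch: calling Cerebras Inference via the OpenAI-compatible
# Chat Completions format. The endpoint and model name below are
# assumptions based on Cerebras' docs at launch (verify before use).
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed Cerebras endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # "simply swapping out the API key"
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # assumed identifier for Llama 3.1 8B
    messages=[
        {"role": "user", "content": "Summarize wafer-scale AI chips in two sentences."}
    ],
)

print(response.choices[0].message.content)
```

To put the quoted numbers in perspective: at 1,800 tokens per second, a 500-token completion on Llama 3.1 8B would arrive in under a third of a second and, at 10 cents per million tokens, would cost about $0.00005.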
What Did Cerebras Executives Say?
- In an interview with Reuters, Andrew Feldman, CEO of Cerebras Systems, said, “We're delivering performance that cannot be achieved by a GPU. We're doing it at the highest accuracy, and we're offering it at the lowest price.”
- In the blog post, James Wang, Director of Product Marketing at Cerebras Systems, said, “With record-breaking performance, industry-leading pricing, and open API access, Cerebras Inference sets a new standard for open LLM development and deployment.”
- He added, “As the only solution capable of delivering both high-speed training and inference, Cerebras opens entirely new capabilities for AI. We can’t wait to see the new and exciting applications developers will build with Cerebras Inference.”
Do you think Cerebras will be able to challenge AI inference providers that rely on NVIDIA or other GPUs? Do you think Cerebras’ competitors need to make moves to retain their share of the AI inference market?
Let us know in the comments below!
First published on Wed, Aug 28, 2024