Software Development
Open-source AI Model DeepSeek V3 Challenges Meta, Anthropic And OpenAI
Updated on Fri, Dec 27, 2024
With hundreds of players in the AI arena, innovations and breakthroughs are coming in thick and fast. Yet, a recent report shows that a new wave of AI applications is upon us, one that is witnessing major funding and interest.
Enterprises are raising their generative AI budgets: Menlo Ventures' 2024: The State of Generative AI in the Enterprise report, which surveyed 600 US enterprise leaders, found that business spending on Gen AI has grown by 500%.
Another key report, titled Shaping the Future of Generative AI, found that most businesses preferred open-source AI models, with 82% believing open-source to be crucial in shaping Gen AI’s future. This has led to a spate of releases in the open-source AI model space – and the latest one is a game-changer!
Chinese AI research lab DeepSeek has unveiled its latest innovation, DeepSeek-V3, a groundbreaking open-source language model that is poised to challenge proprietary AI systems. Beating even the most refined models, DeepSeek’s AI model might just redefine the possibilities of open-source AI technology.
Well, let’s dive into the specifics!
What Is DeepSeek-V3?
DeepSeek-V3 is a language model built on a Mixture-of-Experts (MoE) framework with 671 billion total parameters. The architecture activates only 37 billion of those parameters per token, keeping inference efficient (reportedly about three times faster than V2) without compromising performance.
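The sparse activation described above can be sketched in a few lines. This is an illustrative toy, not DeepSeek's actual implementation: a router scores all experts, but only the top-k experts actually compute for each token, which is why a model with 671 billion parameters can activate just a fraction of them per token. All names and sizes here are made up for the example.

```python
import numpy as np

def moe_forward(token, experts, router_weights, k=2):
    """Sparsely route a single token through the top-k experts."""
    logits = router_weights @ token        # one routing score per expert
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                   # softmax over experts
    top_k = np.argsort(gates)[-k:]         # indices of the k best-scoring experts
    # Only the selected experts compute; outputs are gate-weighted
    out = sum(gates[i] * experts[i](token) for i in top_k)
    return out / gates[top_k].sum()        # renormalize over the active experts

# Toy usage: 8 tiny linear "experts" on 4-dimensional tokens, 2 active per token
rng = np.random.default_rng(0)
experts = [(lambda x, W=rng.standard_normal((4, 4)): W @ x) for _ in range(8)]
router_weights = rng.standard_normal((8, 4))
token = rng.standard_normal(4)
output = moe_forward(token, experts, router_weights)
print(output.shape)  # (4,)
```

However many experts exist, the per-token compute cost scales only with k, which is the core of the efficiency claim.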
Trained on a dataset of 14.8 trillion tokens, DeepSeek-V3 is equipped to handle diverse linguistic inputs, enhancing its versatility across tasks like coding, translation and creative writing. Even the model's training was remarkably cost-efficient, completed on a budget of $5.57 million. Not only was this a fraction of the cost of similar-scale models, but V3 is also fully open-source.
The model is available under DeepSeek’s licensing agreement, allowing developers to download, modify and utilize it for a range of commercial applications. It can be accessed through Hugging Face and tested on DeepSeek Chat, a ChatGPT-like interface built by DeepSeek.
So, what enhanced capabilities does DeepSeek-V3 boast?
What Are DeepSeek-V3’s Capabilities?
DeepSeek’s latest model is on par with proprietary AI models, thanks to its range of advanced features. Here’s a quick overview:
- Coding Competitions: DeepSeek-V3 outperformed industry-leading models, including Meta's Llama 3.1 405B, OpenAI's GPT-4o and Alibaba's Qwen 2.5 72B, in Codeforces programming contests.

- Mathematical Reasoning: The open model scored an impressive 90.2 on the MATH-500 test, the highest among its counterparts.

- General Knowledge And Reasoning: DeepSeek-V3 closely matched the performance of closed-source models like Anthropic's Claude 3.5 Sonnet, showcasing its prowess in reasoning and language tasks.
Despite being open-source, DeepSeek's latest offering is already topping the charts, outperforming leading open models and nearly matching closed ones. So, how does DeepSeek-V3 pull this off?
What Makes DeepSeek-V3 Unique?
Several features set DeepSeek-V3 apart from its competitors:

- Auxiliary-Loss-Free Load Balancing: This strategy keeps the internal experts evenly utilized by dynamically monitoring each expert's load and adjusting routing, avoiding the auxiliary loss terms that traditional balancing methods add at the cost of model performance.

- Multi-Token Prediction (MTP): MTP enables the model to predict multiple future tokens simultaneously, improving training efficiency and enabling a generation speed of 60 tokens per second, roughly three times faster than V2.

- Cost-Efficiency: Techniques such as FP8 mixed-precision training and the DualPipe algorithm for pipeline parallelism allowed DeepSeek-V3 to achieve its high-performance benchmarks economically.
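The idea behind auxiliary-loss-free balancing can be sketched as follows. This is a hypothetical toy model of the bias-based mechanism, with constants and names of our choosing, not DeepSeek's code: each expert carries a bias that influences only which experts get selected, and the bias is periodically nudged down for overloaded experts and up for underloaded ones, so loads even out without adding a balancing loss.

```python
import numpy as np

N_EXPERTS, TOP_K, GAMMA = 8, 2, 0.2  # GAMMA = bias update step (assumed value)

def simulate(use_bias, steps=3000, seed=0):
    """Route tokens to the top-k experts; optionally apply bias-based balancing."""
    rng = np.random.default_rng(seed)
    skew = np.linspace(0.0, 2.0, N_EXPERTS)  # router systematically prefers later experts
    bias = np.zeros(N_EXPERTS)
    window = np.zeros(N_EXPERTS)             # per-window load counter
    totals = np.zeros(N_EXPERTS)
    for step in range(steps):
        scores = rng.standard_normal(N_EXPERTS) + skew
        # Bias shifts *selection* only; gating weights would still use raw
        # scores, so no auxiliary loss term distorts the training objective.
        chosen = np.argsort(scores + bias)[-TOP_K:]
        window[chosen] += 1
        totals[chosen] += 1
        if use_bias and (step + 1) % 100 == 0:
            bias -= GAMMA * np.sign(window - window.mean())  # push loads toward the mean
            window[:] = 0
    return totals

unbalanced = simulate(use_bias=False)
balanced = simulate(use_bias=True)
print(unbalanced.std() > balanced.std())  # balancing spreads load across experts
```

Because the bias never enters the loss function, the routing objective stays untouched, which is the property the "auxiliary-loss-free" name refers to.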
These features set DeepSeek's open model apart from its far more expensive counterparts. So, did it make waves?
What Did AI Experts Say?
DeepSeek-V3’s unveiling has sparked significant reactions across the AI and technology community with its potential to disrupt the AI landscape.
Liang Wenfeng, founder of High-Flyer, the firm behind DeepSeek that builds server clusters for AI model training, said that closed-source AI such as OpenAI's was a "temporary" moat that "hasn't stopped others from catching up."
Even Andrej Karpathy, founder of Eureka Labs and formerly of Tesla and OpenAI, said, "DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M)."
He further stated, “For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute).”
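Karpathy's figures check out arithmetically, as a quick back-of-the-envelope calculation shows:

```python
# GPU-hour figures quoted above
llama3_405b_hours = 30.8e6
deepseek_v3_hours = 2.8e6
print(llama3_405b_hours / deepseek_v3_hours)  # 11.0, the "~11X less compute"

# Cross-check the "2048 GPUs for 2 months" claim against the 2.8M figure
approx_hours = 2048 * 24 * 61  # 2048 GPUs running around the clock for ~61 days
print(round(approx_hours / 1e6, 1))  # ~3.0 million GPU-hours, consistent
```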
On the other hand, some analysts have raised concerns over potential regulatory implications, particularly since the model's Chinese origin could require alignment with government content guidelines. Even so, DeepSeek-V3 has impressed AI leaders and promises to be a game-changer.
Conclusion
By narrowing the performance gap between open-source and closed-source models, DeepSeek-V3 promotes innovation and competition in the AI landscape. Enterprises and developers, especially those invested in open-source AI, now have a robust, cost-effective alternative to proprietary models. As open-source models gain ground, they will challenge the dominance of closed systems.
Do you think the emergence of models like DeepSeek-V3 will democratize AI development? Can DeepSeek and other open-source AI models diminish the reliance on proprietary systems?
Share your thoughts in the comments below!
First published on Fri, Dec 27, 2024