
Gretel Launches World’s Largest Synthetic Open-source Text-to-SQL Dataset To Boost AI For Businesses!
Updated on Tue, Apr 9, 2024
In order to develop capable tools, AI models must be trained with high-quality data.
Herein lies a challenge for businesses looking to develop their own tools, one that Gretel, a leader in the synthetic data industry that provides organizations with tools to generate high-quality synthetic data, is looking to solve with its latest announcement.
So, what is Gretel offering that’s set to benefit businesses? Let’s explore!
What Did Gretel Announce?
- As per a blog post published on Gretel’s website, the company announced the release of a purely synthetic Text-to-SQL dataset that’s set to accelerate AI model training for businesses and offer them new possibilities through AI tools.
- As per the post, the release comes as the company builds on its principles of “accelerating the transition to data-centric AI by allowing teams to produce data either from scratch or as an augmented version of existing data, all while preserving privacy and security.”
- The dataset, which is available on Hugging Face under the Apache 2.0 license, comprises high-quality synthetic Text-to-SQL samples.
- According to Gretel, the dataset (called gretelai/synthetic_text_to_sql) is the largest and most diverse synthetic Text-to-SQL dataset available to date.
- As of April 2024, the dataset includes 105,851 records, which are split into 100,000 train and 5,851 test records, covering 100 industry domains.
- It also covers a comprehensive array of SQL tasks, including data definition, retrieval, manipulation, and analytics and reporting, as well as a wide range of SQL complexity levels, including subqueries, single joins, multiple joins, aggregations, window functions and set operations.
- The dataset was designed and generated using Gretel Navigator, which is “Gretel's first generative AI system designed to create, edit and augment tabular data using natural language or SQL prompts.”
- Essentially, the move will help businesses and developers produce capable AI models that can understand natural language questions and generate the matching SQL queries, even against complex data sources.
- Models trained on the dataset could let businesses of various kinds get instant answers from their databases, and could help governments give citizens easy access to public records databases.
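To make the idea of a Text-to-SQL sample concrete, here is a minimal Python sketch of what one record might look like and how the generated SQL can be checked by running it. The record below is invented for illustration and is not copied from Gretel’s dataset; the field names (`prompt`, `context`, `sql` and so on) are assumptions and may differ from the dataset’s actual columns. It uses SQLite from the Python standard library as a stand-in database.

```python
import sqlite3

# Hypothetical record, NOT taken from the dataset; field names are illustrative.
sample = {
    "domain": "retail",
    "sql_complexity": "single join",
    "prompt": "What is the total order value for each customer?",
    # Schema and seed rows the question refers to.
    "context": """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.5), (3, 2, 12.0);
    """,
    # The SQL a Text-to-SQL model would be expected to produce for the prompt.
    "sql": """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name;
    """,
}


def run_sample(record):
    """Build the schema in an in-memory SQLite database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(record["context"])  # create tables, insert rows
        return conn.execute(record["sql"]).fetchall()
    finally:
        conn.close()


rows = run_sample(sample)
print(rows)
```

Executing a query against its own schema like this is one simple way to confirm that a sample is at least valid SQL. The real records can be fetched with the Hugging Face `datasets` library via `load_dataset("gretelai/synthetic_text_to_sql")`, which requires network access.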
What Did Gretel Say About The Release?
- Speaking in an interview, Yev Meyer, the Chief Scientist at Gretel, said, “Access to quality training data is one of the biggest obstacles to building with generative AI ... High-quality synthetic data can fill this gap. One of the most notable recent shifts in the world of Large Language Models (LLMs) and AI is the renewed focus on data quality.”
- Meyer continued, “Our open source Text-to-SQL dataset was generated by Gretel Navigator, our compound AI system that integrates agent-based execution, multiple proprietary models, including a custom tabular Large Language Model, and privacy-enhancing technologies to generate high quality synthetic data from scratch, on demand.”
- “Every dataset we generate is assessed for quality. Quality benchmarking is central to what we do.”
- “Gretel solutions are built with enterprise scale in mind so that customers can satisfy their data needs when creating data from scratch or editing and augmenting existing data.”
Do you think Gretel’s move will spark competitors to make similar moves? Do you think the future of AI should be open-source?
First published on Tue, Apr 9, 2024