
Software Development

Hackerrank Introduces New Benchmark To Assess Advanced AI Models

By GlobeNewswire

Industry Leader Known for Software Development Skills Expertise Introduces Real-World Benchmark of AI Software Development Capabilities

CUPERTINO, Calif., Feb. 11, 2025 (GLOBE NEWSWIRE) -- HackerRank, the Developer Skills Company, today introduced its new ASTRA Benchmark. ASTRA, which stands for Assessment of Software Tasks in Real-World Applications, is designed to evaluate the capabilities of advanced AI models, such as ChatGPT, Claude or Gemini, to perform tasks across the entire software development lifecycle.

The ASTRA Benchmark consists of multi-file, project-based problems designed to mimic real-world coding tasks. The intent of the HackerRank ASTRA Benchmark is to determine the correctness and consistency of an AI model’s coding ability in relation to practical applications.

“With the ASTRA Benchmark, we’re setting a new standard for evaluating AI models,” said Vivek Ravisankar, co-founder and CEO of HackerRank. “As software development becomes more human + AI, it’s important that we have a very good understanding of the combined abilities. Our experience pioneering the market in assessing software development skills makes us uniquely qualified to assess the abilities of AI models acting as agents for software developers.”

A key highlight from the benchmark showed that OpenAI's o1 was the top performer, while Claude 3.5 Sonnet produced more consistent results.

Key features of ASTRA Benchmark include:

  • Diverse skill domains: The current version includes 65 project-based coding questions, primarily focused on front-end development. These questions are categorized into 10 primary coding skill domains and 34 subcategories.
  • Multi-file project questions: To mimic real-world development, ASTRA’s dataset includes an average of 12 source code and configuration files per question as model inputs. This results in an average of 61 lines of solution code per question.
  • Model correctness and consistency evaluation: To provide a more precise assessment, ASTRA prioritizes comprehensive metrics such as average scores, average pass@1 and median standard deviation.
  • Wide test case coverage: ASTRA’s dataset contains an average of 6.7 test cases per question, designed to rigorously evaluate the correctness of implementations.
  • Benchmark Results: For a full report and analysis of the initial benchmark results, please visit hackerrank.com/ai/astra.
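The correctness and consistency metrics listed above (average score, average pass@1, and median standard deviation) can be illustrated with a short Python sketch. This is not HackerRank's implementation; the per-question scores below are invented for illustration, and pass@1 uses the standard unbiased estimator from the code-generation evaluation literature.

```python
from math import comb
from statistics import mean, median, stdev

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n attempts passes, given c passing attempts."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-question scores for one model across repeated runs
# (question id -> list of scores in [0, 1], one per attempt).
attempts = {
    "q1": [1.0, 1.0, 0.8],
    "q2": [0.5, 0.6, 0.5],
    "q3": [1.0, 1.0, 1.0],
}

# Average score: mean over questions of the mean score per question.
avg_score = mean(mean(scores) for scores in attempts.values())

# Average pass@1: a run "passes" only if it scores 1.0 on the question.
avg_pass1 = mean(
    pass_at_k(len(s), sum(x == 1.0 for x in s), 1) for s in attempts.values()
)

# Median standard deviation across questions, as a consistency signal:
# lower means the model's score varies less between repeated runs.
median_std = median(stdev(s) for s in attempts.values())
```

On this toy data, a model can rank highly on average score yet show a large median standard deviation, which is the distinction the benchmark draws between the top-scoring and most consistent models.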

Ravisankar added, “By open sourcing our ASTRA Benchmark, we’re offering the AI community the opportunity to run their models against a high-quality, independent benchmark. This supports the continued advancement of AI while fostering more collaboration and transparency in the AI community to ensure the integrity of new models.”

For more information about HackerRank’s ASTRA Benchmark, contact rafik@hackerrank.com.

About HackerRank
HackerRank, the Developer Skills Company, leads the market with over 2,500 customers and a community of over 25 million developers. Having pioneered this space, HackerRank is trusted by companies to help them set up a skills strategy, showcase their brand to developers, implement a skills-based hiring process, and ultimately upskill and certify employees, all driven by AI. Learn more at hackerrank.com.

CONTACT: Note to editors: Trademarks and registered trademarks referenced herein remain the property of their respective owners. Interview requests will be coordinated through the media contacts listed below.

Media Contact:

Kate Achille
The Devon Group for HackerRank
kate@devonpr.com

First published on Wed, Feb 12, 2025


