Stay Ahead with Heaptalk: Your Go-To Source for Business News

AI performance startup Arthur introduces an open-source tool for evaluating LLMs

by Sinta
October 9, 2023
in News, Technology

Illustration of evaluating AI performance. Image: Arthur


Arthur developed Bench to let organizations evaluate how different LLMs perform in real-world scenarios, helping them make informed, data-driven decisions.

Heaptalk, Jakarta - AI performance startup Arthur unveiled Bench, its new tool for evaluating large language models (LLMs), on August 17. Bench is an open-source evaluation tool for comparing LLMs, prompts, and hyperparameters for generative text models.

Bench allows organizations to assess how different LLMs perform in real-world scenarios. As a result, they can make informed, data-driven decisions when integrating the latest AI technologies into their operations.

“With Bench, we’ve created an open-source tool to help teams deeply understand the differences between LLM providers, different prompting and augmentation strategies, and custom training regimes,” said Adam Wenchel, co-founder and CEO of Arthur.

According to Wenchel, research from the Generative Assessment Project (GAP) shows that performance differences between LLMs can involve considerable nuance. GAP is a research initiative run by Arthur that assesses the strengths and weaknesses of language models from industry leaders such as OpenAI, Anthropic, and Meta.

Determining the most suitable LLM for corporate applications

Bench can help businesses in three ways: model selection and validation, budget and privacy optimization, and translation of academic benchmarks into real-world performance. For model selection and validation, the tool compares the available LLM options using consistent metrics, so that businesses can determine the most suitable LLM for their application.
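To illustrate what comparing models "using consistent metrics" can look like in practice, the following is a minimal Python sketch. It does not use Bench's API; the model names, outputs, and the exact-match metric are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple


def exact_match(candidate: str, reference: str) -> float:
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return float(candidate.strip().lower() == reference.strip().lower())


def mean_score(
    outputs: List[str],
    references: List[str],
    metric: Callable[[str, str], float],
) -> float:
    """Average one metric over paired model outputs and reference answers."""
    if not references or len(outputs) != len(references):
        raise ValueError("outputs and references must be non-empty and equal in length")
    return sum(metric(o, r) for o, r in zip(outputs, references)) / len(references)


def rank_models(
    outputs_by_model: Dict[str, List[str]],
    references: List[str],
    metric: Callable[[str, str], float] = exact_match,
) -> List[Tuple[str, float]]:
    """Score every model with the same metric and sort best-first."""
    scores = {
        name: mean_score(outputs, references, metric)
        for name, outputs in outputs_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical test set and model outputs (illustrative only).
references = ["Paris", "4", "blue"]
outputs_by_model = {
    "model_a": ["Paris", "4", "green"],
    "model_b": ["paris ", "5", "red"],
}

ranking = rank_models(outputs_by_model, references)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because every model is scored on the same inputs with the same metric, the resulting numbers are directly comparable; swapping in a different metric function changes the ranking criterion without changing the comparison procedure.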

Budget and privacy optimization helps businesses choose an affordable AI model that can still perform the required tasks. According to Arthur, a higher price does not always indicate the LLM that best fits a company’s needs, since not all applications require the most advanced or expensive model. In some cases, a less expensive model may perform the required task just as well.
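The budget trade-off described above can be sketched as a simple selection rule: among the models that meet a required quality bar, pick the cheapest. The Python example below is an assumption-laden illustration, not Arthur's method; the candidate names, scores, and prices are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float               # benchmark score in [0, 1], higher is better
    cost_per_1k_tokens: float  # hypothetical price in USD


def cheapest_adequate(
    candidates: List[Candidate], min_score: float
) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose score meets the threshold."""
    adequate = [c for c in candidates if c.score >= min_score]
    # default=None covers the case where no candidate is good enough.
    return min(adequate, key=lambda c: c.cost_per_1k_tokens, default=None)


# Invented figures for illustration only.
candidates = [
    Candidate("large_model", score=0.92, cost_per_1k_tokens=0.0300),
    Candidate("mid_model", score=0.90, cost_per_1k_tokens=0.0020),
    Candidate("small_model", score=0.71, cost_per_1k_tokens=0.0005),
]

choice = cheapest_adequate(candidates, min_score=0.85)
print(choice.name if choice else "no model meets the threshold")
```

With these made-up numbers, the mid-priced model clears the quality bar at a fraction of the largest model's cost, which is the kind of outcome the article describes.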

By translating academic benchmarks into real-world performance, companies can quantitatively test and compare diverse models and evaluate them accurately and consistently. Companies can also configure custom benchmarks that focus on what matters most to their specific business and customers.

Because Bench is open source, new metrics and other features will be added as the project and its community grow. Bench is available on GitHub and can be run locally or in the cloud.

Founded in 2019, Arthur has secured over $60M in funding from several firms, including Acrew, Greycroft, Index Ventures, BAM Elevate, Work-Bench, and Plexo Capital. In May 2023, the New York City-based startup launched Shield, a firewall tool that protects organizations against risks and security issues in deployed LLMs.

Tags: ai performance evaluator, arthur ai, arthur bench, arthur startup
