Saturday, June 7, 2025
Stay Ahead with Heaptalk: Your Go-To Source for Business News

AI performance startup Arthur introduces an open-source tool for evaluating LLMs

By Sinta
October 9, 2023
in News, Technology

Illustration of evaluating AI performance. Image: Arthur


Arthur developed Bench to let organizations evaluate the performance of different LLMs in real-world scenarios, helping them make informed, data-driven decisions.

Heaptalk, Jakarta - AI performance startup Arthur unveiled its latest tool for evaluating large language models (LLMs), called Bench (08/17). Bench is an open-source evaluation tool for comparing LLMs, prompts, and hyperparameters for generative text models.

Bench allows organizations to assess how different LLMs perform in real-world scenarios. As a result, they can make informed, data-driven decisions when integrating the latest AI technologies into their operations.


“With Bench, we’ve created an open-source tool to help teams deeply understand the differences between LLM providers, different prompting and augmentation strategies, and custom training regimes,” said Adam Wenchel, co-founder and CEO of Arthur.

According to Wenchel, the company's Generative Assessment Project (GAP) research shows that performance differences between LLMs can be highly nuanced. GAP is a research initiative conducted by Arthur that assesses the strengths and weaknesses of language model offerings from industry leaders such as OpenAI, Anthropic, and Meta.

Determining the most suitable LLM for corporate applications

Bench can help businesses in three ways: model selection & validation, budget & privacy optimization, and translation of academic benchmarks to real-world performance. For model selection & validation, the tool compares the available LLM options using consistent metrics, so businesses can determine the most suitable LLM for their application.
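
As a rough illustration of comparing models with a consistent metric, the sketch below scores pre-generated answers from two hypothetical models against the same reference answers using exact match. It does not use Bench's own API; the model names, the answers, and the `exact_match` and `compare_models` helpers are assumptions made for this example.

```python
from typing import Dict, List


def exact_match(candidate: str, reference: str) -> float:
    """Return 1.0 if the normalized candidate equals the reference, else 0.0."""
    return float(candidate.strip().lower() == reference.strip().lower())


def compare_models(outputs: Dict[str, List[str]], references: List[str]) -> Dict[str, float]:
    """Average the same metric over the same test cases for every model."""
    scores = {}
    for model, answers in outputs.items():
        if len(answers) != len(references):
            raise ValueError(f"{model} has {len(answers)} answers, expected {len(references)}")
        per_case = [exact_match(a, r) for a, r in zip(answers, references)]
        scores[model] = sum(per_case) / len(per_case)
    return scores


# Hypothetical reference answers and pre-generated model outputs.
references = ["Paris", "4", "blue"]
outputs = {
    "model_a": ["Paris", "4", "green"],
    "model_b": ["paris ", "5", "red"],
}

scores = compare_models(outputs, references)
best_model = max(scores, key=scores.get)
print(scores, best_model)
```

Because every model is scored on the same test cases with the same function, the resulting numbers are directly comparable. Exact match is only one possible metric and is used here because it is easy to verify by hand.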

Budget & privacy optimization helps businesses choose an affordable AI model that can still perform the required tasks. According to Arthur, the most expensive LLM is not always the best fit for a company’s needs, since not all applications require the most advanced or costly AI model. In some cases, a less expensive model can perform the required task just as well.
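
The cost trade-off can be sketched as a simple selection rule: among models whose benchmark score meets a required threshold, pick the cheapest. The model names, scores, prices, and the `cheapest_adequate_model` helper below are hypothetical and chosen only for illustration; they are not measurements of any real model or part of Bench.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    score: float               # benchmark score in [0, 1], higher is better
    cost_per_1k_tokens: float  # hypothetical price in USD


def cheapest_adequate_model(candidates: List[Candidate], min_score: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose score meets min_score, or None."""
    adequate = [c for c in candidates if c.score >= min_score]
    if not adequate:
        return None
    return min(adequate, key=lambda c: c.cost_per_1k_tokens)


# Hypothetical numbers for illustration only.
candidates = [
    Candidate("large_model", score=0.92, cost_per_1k_tokens=0.060),
    Candidate("medium_model", score=0.90, cost_per_1k_tokens=0.010),
    Candidate("small_model", score=0.71, cost_per_1k_tokens=0.002),
]

choice = cheapest_adequate_model(candidates, min_score=0.85)
print(choice.name if choice else "no model meets the threshold")
```

With these made-up numbers, the mid-priced model clears the threshold at a fraction of the largest model's cost, which is the situation the article describes.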

By translating academic benchmarks to real-world performance, companies can quantitatively test and compare diverse models to evaluate them accurately and consistently. Companies can also configure custom benchmarks that focus on what matters most to their specific business and customers.

Because Bench is open source, new metrics and other valuable features will be added as the project and community grow. Bench is available on GitHub and can be run locally or in the cloud.

Founded in 2019, Arthur has secured over $60M in funding from several firms, including Acrew, Greycroft, Index Ventures, BAM Elevate, Work-Bench, and Plexo Capital. In May 2023, the New York City-based startup launched Shield, a firewall tool that protects organizations against risks and security issues with applied LLMs.

Tags: ai performance evaluator, arthur ai, arthur bench, arthur startup
