Stay Ahead with Heaptalk: Your Go-To Source for Business News

Databricks revamps its open-source code with a new 15k dataset to train AI models for commercial use

by Sinta
October 9, 2023
in News, Technology

Illustration of Databricks' open source code to train AI chatbots. Photo: Chris Ried/ Unsplash


Databricks collected a dataset of 15,000 instruction-response pairs from more than 5,000 employees during March and April 2023 to replace its previous training data.

Heaptalk, Jakarta — Databricks, a startup providing an open, unified platform for data and AI, released Dolly 2.0 (04/12), an open-source, instruction-following large language model (LLM) that can be used for commercial purposes.

The latest version of Dolly is trained on 15,000 human-generated prompts intended to teach AI models to interact in a way similar to ChatGPT. According to the company's official statement, the dataset contains natural, expressive instruction-response pairs designed to represent a wide range of behaviors.


According to the company, these instruction-response pairs cover tasks such as brainstorming, content generation, information extraction, and summarization. Databricks collected the dataset from more than 5,000 employees in 40 countries, who filled out questionnaires during March and April 2023.
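To make "instruction-response pairs" more concrete, here is a minimal sketch of what such records might look like and how they could be turned into training text. It assumes each record is one JSON object per line with `instruction`, `context`, `response`, and `category` fields, which is believed to mirror the published dataset file but should be treated as an assumption. The sample contents are invented for illustration, and the helper names `load_records` and `to_prompt` and the prompt template are hypothetical, not Databricks' own training code.

```python
import json
from collections import Counter

# Invented sample records for illustration only. The field names
# (instruction, context, response, category) are an assumption about
# the dataset's layout; none of this text comes from the real dataset.
SAMPLE_JSONL = """\
{"instruction": "List three ideas for a team offsite.", "context": "", "response": "A hike, a cooking class, and a volunteer day.", "category": "brainstorming"}
{"instruction": "Summarize the paragraph in one sentence.", "context": "The library opens at nine and closes at six on weekdays.", "response": "The library is open from nine to six on weekdays.", "category": "summarization"}
{"instruction": "Extract the city name.", "context": "The conference will be held in Jakarta next month.", "response": "Jakarta", "category": "information_extraction"}
"""


def load_records(jsonl_text):
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def to_prompt(record):
    """Join one record into a single training string.

    Uses a generic instruction/context/response template; the context
    section is included only when the record actually has context.
    """
    parts = [f"### Instruction:\n{record['instruction']}"]
    if record.get("context"):
        parts.append(f"### Context:\n{record['context']}")
    parts.append(f"### Response:\n{record['response']}")
    return "\n\n".join(parts)


records = load_records(SAMPLE_JSONL)
# Count how many records fall into each task category.
print(Counter(r["category"] for r in records))
# Show the training text built from the summarization example.
print(to_prompt(records[1]))
```

Working with the real file would mean reading it from disk instead of the inline string; the template is just one common convention for instruction tuning.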

The new dataset was created to address the limitations of Dolly 1.0. Released in late March 2023, that initial version was trained on a dataset the Stanford Alpaca team had generated using the OpenAI API.

That dataset, however, is bound by OpenAI's terms of service, which prohibit using its output to build models that compete with OpenAI's ChatGPT. As a result, Dolly 1.0 could not be used in commercial products, so Databricks decided to create its own dataset for commercial use.

Users can verify the training data themselves

“We are open-sourcing the entirety of Dolly 2.0, including the training code, the dataset, and the model weights, all suitable for commercial use. This means that any organization can create, own, and customize powerful LLMs that can talk to people, without paying for API access or sharing data with third parties,” stated Databricks on its official blog.

Databricks CEO Ali Ghodsi told Reuters that the company is releasing the training data for free to help other companies build their own AI systems, possibly using Databricks.

Ali acknowledges that the dataset is still imperfect because it comes only from Databricks employees, most of whom are male. However, users can verify the training data themselves, which they cannot do with other models such as OpenAI's ChatGPT and Google's Bard.

“We are not claiming that this is an unbiased dataset. We are just trying to push the community to go in this direction of more transparency, and more of everyone owning their own models instead of just a few that we have to trust,” concluded Ali.


© 2024 Heaptalk.com