Heaptalk, Jakarta - Indonesia’s National Research and Innovation Agency (BRIN), the Collaboration for Research and Innovation in Artificial Intelligence (KORIKA), and two of GDP Venture’s portfolio companies, Glair.ai and Datasaur.ai, have made a strategic move to develop an open-source Large Language Model (LLM), enhanced by AI Singapore (AISG), that is accessible to a wide range of users.
In its development, AI Singapore will focus on advancing the LLM as the foundation model in the first layer. Meanwhile, Datasaur.ai will provide the tools needed for application development, and Glair.ai will build the applications using AISG’s foundation model and Datasaur.ai’s tools.
Esa Prakasa, Head of the Center for Data Science and Information Research under the Electronics and Informatics Research Organization at BRIN, said that adopting an Indonesia-based LLM can help BRIN enhance the quality and efficiency of research, improve accessibility for the public, support technological development, and strengthen human resources.
Amid the evolving artificial intelligence (AI) landscape, the latest Natural Language Processing (NLP) breakthrough, the Large Language Model, has become the most prominent technology. While notable examples such as OpenAI’s ChatGPT and Google’s Bard exemplify LLM adoption, most research in this domain remains centered on the English language.
Moreover, this trend has left a void in other language markets, consolidating technological dominance among English-speaking nations. Statista data from January 2023 show that English commands a substantial 58.8% share of web content, whereas Bahasa Indonesia accounts for only 0.6%. This gap underscores the need for broader research and development that caters to the unique linguistic nuances and demands of Bahasa Indonesia.
For this reason, these entities will work together to provide an LLM for Bahasa Indonesia through the SEA-LION platform developed by AI Singapore (AISG). The model is expected to offer an alternative for businesses in the archipelago, open up opportunities for users to gain comprehensive knowledge that reflects the country’s local culture, help the government improve the quality of its communication with the public, enhance public service provision, and drive research and development.
“ASEAN in the global economy has an important role, but still lacks representation. We perceive tremendous potential for the SEA-LION LLM to power products and solutions that provide significant benefits to Indonesia,” said Leslie Teo, Senior Director of AI Singapore (AISG).
The SEA-LION platform is built on the MosaicML Pretrained Transformer (MPT) architecture and has a vocabulary of 256,000 tokens. The model uses the SEABPETokenizer, a tokenizer designed specifically for Southeast Asian languages, including Indonesian.
On Lee, CTO at GDP Venture, who also serves as CEO and CTO of GDP Labs, said, “GDP Venture, through its AI technology solutions portfolio, GLAIR.ai and Datasaur.ai, is currently adapting the SEA-LION platform from AI Singapore to align with the Indonesian context.”
He added that this effort promises benefits such as lower operational costs, increased revenue and productivity, and effective collaboration between humans and AI, all contributing to economic growth and technological advancement in Indonesia and Southeast Asia.