IBM and AI Singapore to advance Southeast Asian LLM

Heaptalk, Jakarta — US-based technology giant IBM has signed a Memorandum of Understanding (MoU) with AI Singapore (AISG) to test AISG's Southeast Asian large language model (LLM), with the aim of helping developers build customized artificial intelligence (AI) applications.

Under the agreement, IBM will test Southeast Asian Languages in One Network (SEA-LION), AI Singapore's LLM, using Watsonx, IBM's AI and data platform. The arrangement is intended to help organizations choose AI models suited to their business requirements.

“The progress of Gen AI will bring more significant performance in smaller language models, with users allowed to personalize models based on their business and industry requirements. No one model is a one-size-fits-all for businesses, and organizations must be empowered to use their models based on their needs,” said Catherine Lian, General Manager and Technology Leader of IBM ASEAN.

SEA-LION is AISG's open-source LLM, designed to be simple, flexible, and faster than other LLMs. Its current iteration comes in two base models: a 3-billion-parameter model and a 7-billion-parameter model.

The LLM’s training data comprises around 981 billion language tokens, which AISG defines as fragments of words created by breaking down text during the tokenization process. These include 623 billion English tokens, 128 billion Southeast Asian tokens, and 91 billion Chinese tokens.
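
To put those figures in proportion, the short Python sketch below works out each reported category's share of the total. It uses only the token counts quoted above; the variable and function names are illustrative and not taken from AISG or IBM. It also shows that the three listed categories add up to less than the reported total, with the remainder not itemized in this article.

```python
# Token counts in billions, as reported in the article.
TOTAL_TOKENS_B = 981
reported_b = {
    "English": 623,
    "Southeast Asian": 128,
    "Chinese": 91,
}


def token_shares(total_b, counts_b):
    """Return each category's share of the total as a percentage (1 decimal)."""
    return {name: round(100 * count / total_b, 1) for name, count in counts_b.items()}


shares = token_shares(TOTAL_TOKENS_B, reported_b)

# Sum of the categories the article lists, and what is left over.
listed_b = sum(reported_b.values())
unlisted_b = TOTAL_TOKENS_B - listed_b

for name, pct in shares.items():
    print(f"{name}: {reported_b[name]}B tokens ({pct}% of total)")
print(f"Listed categories: {listed_b}B; not itemized here: {unlisted_b}B")
```

On these numbers, English accounts for roughly 64 percent of the training data and Southeast Asian languages for about 13 percent.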

As part of the collaboration, IBM also intends to incorporate the model into its AI use case library, Digital Self-Serve Co-Create Experience (DSCE), so that data scientists, developers, and engineers can explore localized Gen AI and speed up their work.

“The SEA-LION LLM is a big step forward in creating an open AI system and addressing the ASEAN language challenges that companies and governments face when working with AI,” Lian said.