AI2 introduces Dolma, a 3-trillion-token open dataset for training language models

Sinta — Tue, 22 Aug 2023 03:45:42 +0000

Short for ‘Data to Feed OLMo’s Appetite’, the Dolma dataset contains 3 trillion tokens drawn from web content, academic publications, code, books, and encyclopedic materials. Heaptalk, Jakarta: The Allen Institute for AI (AI2), a Seattle-based non-profit research institute, has introduced Dolma, a massive open dataset for training language models. The dataset is part of its open language […]

