BigScience Bloom

Left behind: why the Dutch language is absent from Europe's foremost open language model

Three volunteers. A couple of weeks of work. That’s what it took to add a language to BigScience BLOOM, the open multilingual language model with no fewer than 176 billion parameters that was released in mid-2022. It was meant as an open, multilingual alternative to GPT-3. In the end, 46 languages from all over the world made it into the dataset BLOOM was trained on. Even relatively small languages like Basque and Catalan were included....

18 September 2023 · 10 min · Edwin Rijgersberg