Interview on the Poki podcast: "The Dutch Language Model: GEITje ft. Edwin Rijgersberg"

This week I was honored to appear as a guest on Alexander Klöpping’s and Wietse Hage’s podcast Poki – de Podcast over Kunstmatige Intelligentie (the podcast about artificial intelligence). We had a good conversation about GEITje, about finetuning large language models in general, and about finetuning for Dutch in particular. We spoke for about half an hour, and the conversation made it into the podcast practically without edits. Including what has by now become a classic: the Bassietest....

17 January 2024 · 1 min · Edwin Rijgersberg

GEITje FAQs: Why the name "GEITje"?

The second in a series of posts about questions I get about GEITje. “Why the name GEITje?” Muppets, Cows, and Seals The name “GEITje” had actually been in the back of my mind for a long time as the name for a Dutch large language model. Naming in the world of language models follows interesting trends. In 2018, the Muppet generation of language models started with the Allen Institute for AI’s ELMo, followed by Google’s breakthrough BERT....

3 January 2024 · 3 min · Edwin Rijgersberg

GEITje FAQs: Why I trained GEITje

The first in a series of posts about questions I’ve gotten about GEITje. “Why did you create a language model?” I have received this question several times in recent weeks, usually immediately followed by a follow-up: “Doesn’t ChatGPT already exist?” Not a strange question, actually. Here are my three main reasons: 1. Because open models are needed. ChatGPT performs great in Dutch. If you have an application where you want to try an LLM, definitely go for ChatGPT or one of the OpenAI APIs....

2 January 2024 · 6 min · Edwin Rijgersberg

GEITje 7B: A Large Open Dutch Language Model

It has been more than two weeks since I open-sourced GEITje 7B. It was an exciting moment, especially since it is my first major open-source contribution, and I am very pleased with how enthusiastic the reactions have been! GEITje is a large open Dutch language model with 7 billion parameters, based on Mistral 7B. It has been further trained on 10 billion tokens of Dutch text. This has improved its Dutch language skills and increased its knowledge of Dutch topics....
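Since the model is open, it can be tried out directly with the Hugging Face transformers library. Below is a minimal sketch; the Hub id Rijgersberg/GEITje-7B, the example prompt, and the generation settings are my assumptions for illustration, not taken from the post itself.

```python
# Minimal sketch: loading and sampling from an open 7B model with transformers.
# The model id "Rijgersberg/GEITje-7B" is an assumption; check the Hugging Face
# Hub for the actual repository name and its chat-tuned variants.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rijgersberg/GEITje-7B"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so the 7B weights fit on one GPU
    device_map="auto",
)

# Dutch prompt: "The highest point of the Netherlands is"
prompt = "Het hoogste punt van Nederland is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```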

2 January 2024 · 2 min · Edwin Rijgersberg