It has been more than two weeks since I open-sourced GEITje 7B. It was an exciting moment, especially since this is my first major open-source contribution, and I am very pleased to see how enthusiastic the reactions have been!

GEITje is a large open Dutch language model with 7 billion parameters, based on Mistral 7B. It has been further trained on 10 billion tokens of Dutch text. This has improved its Dutch language skills and increased its knowledge of Dutch topics.

All kinds of people have already started using it in their applications, and hopefully we will see the first results soon. Bram VanRoy has added it to the Open Dutch LLM Evaluation Leaderboard and also included it in his latest paper, Language Resources for Dutch Large Language Modelling. Thanks for that!

The most important links at a glance:


A (still ongoing) series of blog posts answering frequently asked questions about: