It has been more than two weeks since I open-sourced GEITje 7B. It was an exciting moment, especially since it is my first major open-source contribution. And I am very pleased to see how enthusiastic all the reactions have been!
GEITje is a large open Dutch language model with 7 billion parameters, based on Mistral 7B. It has been further trained on 10 billion tokens of Dutch text. This has improved its Dutch language skills and increased its knowledge of Dutch topics.
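For those who want to try it out right away, here is a minimal sketch of loading the base model with the 🤗 transformers library. The Hub id Rijgersberg/GEITje-7B and the generation settings are my assumptions for illustration; the README linked below has the authoritative instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rijgersberg/GEITje-7B"  # assumed Hub id; see the README for the exact name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory use compared to float32
    device_map="auto",           # put the weights on a GPU if one is available
)

prompt = "Het hoogste gebouw van Nederland is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```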
All kinds of people have already started using it in their own applications; hopefully we will see the first results of that soon. Bram Vanroy has added it to the Open Dutch LLM Evaluation Leaderboard, and has also included it in his latest paper, Language Resources for Dutch Large Language Modelling. Thanks for that!
Links
The most important links at a glance:
- GEITje on GitHub: an extensive README about the model, and of course the source code.
- 🤗 Hugging Face Models for direct access to the models:
- Chat with GEITje 7B chat v2 in 🤗 Hugging Face Spaces (thanks to Hugging Face for the community GPU grant!); a sketch for calling the chat model from code follows this list.
- Overview on 🤗 Hugging Face Collections with all models, quantized variants, and the datasets.
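If you would rather talk to the chat model from code than through the Spaces demo, something along these lines should work. Again, the Hub id Rijgersberg/GEITje-7B-chat-v2 is an assumption on my part; the collection above lists the exact model names.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rijgersberg/GEITje-7B-chat-v2"  # assumed Hub id; see the collection

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Wat is de hoofdstad van Friesland?"}]
# apply_chat_template renders the conversation in the prompt format the model
# was fine-tuned on, so the format does not have to be hard-coded here.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, top_p=0.95)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```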
FAQs
An ongoing series of blog posts answering frequently asked questions about: