Universität Bonn

Institute of Computer Science

Shown here is the language distribution of Teuken-7B-v0.4. In addition to code, the pretraining data of Teuken-7B-v0.4 comprises approximately 50% non-English text from 23 European countries and around 40% English text.
