Hugging Face is taking its first step into machine translation this week with the release of more than 1,000 models. Researchers trained the models using unsupervised learning and the Open Parallel Corpus (OPUS). OPUS is a project undertaken by the University of Helsinki and global partners to gather and open-source a wide variety of language data sets, particularly for low-resource languages. Low-resource languages are those with less available training data than more commonly used languages like English.
Models trained with OPUS data now make up the majority of the models Hugging Face provides, making the University of Helsinki’s Language Technology and Research Group the largest contributing organization. Before this week, Hugging Face was best known for enabling easy access to state-of-the-art language models, like Google’s BERT, and language generation models that can predict the next characters, words, or sentences to appear in a text. The Hugging Face Transformers library for Python includes pretrained versions of state-of-the-art NLP models such as Google AI’s BERT and XLNet, Facebook AI’s RoBERTa, and OpenAI’s GPT-2.
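As a rough illustration of how one of the new translation models can be used through the Transformers library, the minimal sketch below loads an English-to-German OPUS-MT checkpoint; it assumes the Helsinki-NLP/opus-mt-en-de model ID on the Hugging Face model hub (the group's checkpoints follow a Helsinki-NLP/opus-mt-{source}-{target} naming pattern), and the specific example sentence is hypothetical.

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed model ID; other language pairs swap the source/target codes.
model_name = "Helsinki-NLP/opus-mt-en-de"

tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize a batch of source-language sentences.
batch = tokenizer(["Machine translation is becoming more accessible."],
                  return_tensors="pt", padding=True)

# Generate the translation and decode it back to plain text.
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```

The same pattern applies to any of the released language pairs: only the model ID changes, while the tokenizer and model classes stay the same.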