Introducing huBERT

Bibliographic details
Author:	Nemeskey Dávid Márk
Corporate author:	Magyar számítógépes nyelvészeti konferencia (17.) (2021) (Szeged)
Document type:	Book part
Published:	2021
Series:	Magyar Számítógépes Nyelvészeti Konferencia 17
Keywords:	Linguistics - computer applications
Subjects:	Natural sciences; Computer and information sciences; Humanities; Languages and literature
Online Access:	http://acta.bibl.u-szeged.hu/73353

Descriptive data
Abstract:	This paper introduces the huBERT family of models. The flagship is the eponymous BERT Base model trained on the new Hungarian Webcorpus 2.0, a 9-billion-token corpus of Web text collected from the Common Crawl. This model outperforms the multilingual BERT in masked language modeling by a huge margin, and achieves state-of-the-art performance in named entity recognition and NP chunking. The models are freely downloadable.
Extent/Physical description:	pp. 3-14
ISBN:	978-963-306-781-9
