emLam - a Hungarian Language Modeling baseline

Bibliographic Details
Main Author: Nemeskey Dávid Márk
Corporate Author: Magyar Számítógépes Nyelvészeti Konferencia (13.) (2017) (Szeged)
Format: Article
Published: 2017
Series: Magyar Számítógépes Nyelvészeti Konferencia 13
Keywords: Linguistics - computer applications
Online Access: http://acta.bibl.u-szeged.hu/59000
Description
Summary: This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.
Physical Description: 91-102
ISBN: 978-963-306-518-1