A corpus-based word frequency list of Turkish: Evidence from the subcorpora of Turkish National Corpus project

Bibliographic details
Authors:	Aksan Yeşim; Yaldır Yilmaz
Corporate author:	International Conference on Turkish Linguistics (15.) (2010) (Szeged)
Document type:	Book part
Published:	2012
Series:	Studia uralo-altaica 49: The Szeged Conference: proceedings of the 15th International Conference on Turkish Linguistics held on August 20-22, 2010 in Szeged
Keywords:	Turkic languages
Subjects:	Humanities; Languages and literature
Online Access:	http://acta.bibl.u-szeged.hu/16689

Descriptive data
Abstract:	Word frequency studies play a central role in disciplines such as linguistics, cognitive psychology, natural language processing, and computational linguistics. Developments in computer technology and information processing help researchers compile comprehensive word lists from digitally constructed language corpora. Since Kučera and Francis (1967) derived the first corpus-based word frequency lists from the Brown Corpus, a variety of studies have been conducted on general or specialized corpora to obtain the rank frequency order and distribution of words for different Indo-European languages (Johansson & Hofland 1989; Leech et al. 2001; Baroni et al. 2004; Ha et al. 2006; Davies & Gardner 2010). For Turkish, Göz's dictionary (2003), which is based on a 1-million-word general corpus, is the only work on word frequency. The lexical properties of Turkish in general, and of text collections representing different registers of Turkish in particular, need to be described via corpus-based word frequency lists. With this need in mind, this study has two aims: (1) to produce word frequency lists of Turkish on the basis of two subcorpora, namely the Corpus of Contemporary Turkish Fiction and the Corpus of Contemporary Turkish News Texts, preparing frequency lists of both root types and word classes; and (2) to compare these two corpora using frequency profiling information. The paper is organized as follows. First, we explain basic concepts and review the literature on word frequency studies. Then, we describe the construction of the two subcorpora used to derive the word lists and explain the steps followed in tokenization and the root type mapping scheme on which the token and root counts are based. Finally, we compare the rank frequency and word class lists of the Turkish Fiction and Turkish News Texts corpora.
Extent/Physical description:	47-57
ISSN:	0133-4239
