2004
Volume 38, Issue 3
  • ISSN: 1573-9775
  • E-ISSN: 2352-1236

Abstract

T-Scan is a tool for the automatic analysis of Dutch text. This paper presents the first large-scale corpus analysis with T-Scan, focusing on lexical complexity. A collection of nearly 1000 text specimens was assembled, covering ten genres: travel blogs, celebrity news features, novels, textbooks for vocational secondary schools, textbooks for general secondary schools, news reports, opinion pieces, political programs, medical advice texts and research articles. The lexical complexity features in the analysis include morphology, word frequency, various word concreteness indices, personal pronouns, names and verb tense. Systematic genre differences are found: a genre detection model comprising 18 T-Scan features correctly identifies the genre of 83 percent of the corpus texts. Most lexical features that differentiate genres intuitively relate to the complexity of the text topic. A closer analysis is offered of the contrast between the two textbook samples in the corpus, which differ only in the educational level they cater for. Here, too, topic variation seems a more important factor than stylistic variation. We demonstrate a new method to examine stylistic variation, which consists of within-genre comparisons based on the genre prediction: ‘deviant’ texts are compared to ‘typical’ members of their genre.
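The within-genre comparison described above can be illustrated with a minimal Python sketch. The abstract does not specify the classifier or the exact criterion for ‘typical’ and ‘deviant’, so this sketch assumes (hypothetically) that a text is typical when the genre model assigns it to its own genre and deviant when it is assigned elsewhere. The record layout, the `word_freq` feature name and all values are invented for illustration and are not T-Scan output.

```python
from statistics import mean


def accuracy(texts):
    """Share of texts whose predicted genre matches their actual genre."""
    return sum(t["predicted"] == t["genre"] for t in texts) / len(texts)


def split_by_prediction(texts, genre):
    """Hypothetical split within one genre: 'typical' texts are assigned to
    their own genre by the model, 'deviant' texts to some other genre."""
    members = [t for t in texts if t["genre"] == genre]
    typical = [t for t in members if t["predicted"] == genre]
    deviant = [t for t in members if t["predicted"] != genre]
    return typical, deviant


def compare_feature(texts, genre, feature):
    """Mean of one feature in deviant texts minus its mean in typical texts.
    Returns None when either group is empty."""
    typical, deviant = split_by_prediction(texts, genre)
    if not typical or not deviant:
        return None
    return mean(t[feature] for t in deviant) - mean(t[feature] for t in typical)


# Toy corpus: genres, predictions and feature values are invented.
corpus = [
    {"genre": "news", "predicted": "news", "word_freq": 4.0},
    {"genre": "news", "predicted": "news", "word_freq": 4.2},
    {"genre": "news", "predicted": "opinion", "word_freq": 3.5},
    {"genre": "opinion", "predicted": "opinion", "word_freq": 3.6},
]

print(accuracy(corpus))
print(compare_feature(corpus, "news", "word_freq"))
```

Comparing group means per feature is only one simple way to contrast deviant and typical texts; a real analysis would likely use the full feature set and an appropriate statistical test.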

DOI: 10.5117/TVT2016.3.PAND
2016-12-01
2021-11-28