Hennink, M.W. (2006) Improving Automatic Identification of Coordinated Ellipsis in Dutch. Master's Thesis / Essay, Artificial Intelligence.
Abstract
Ellipsis, the non-expression of sentence elements whose meaning can be retrieved by the hearer, is a common phenomenon in both spoken and written language. This research focuses on three types of ellipsis: conjunction reduction, gapping, and right node raising (examples a, b, and c below).
a) Jan koopt appels en (Jan) verkoopt peren. Jan buys apples and (Jan) sells pears.
b) Jan koopt appels en Piet (koopt) peren. Jan buys apples and Piet (buys) pears.
c) Jan koopt (appels) en Piet verkoopt appels. Jan buys (apples) and Piet sells apples.
Frequency data on ellipsis in Dutch was gathered from an 86,347-word selection of the spoken CGN corpus and a 192,219-word selection of the written Clef corpus, both automatically parsed by the Alpino parser. Initially, 250 conjoined sentences from each corpus were analysed manually. This provided initial frequency data and helped in developing search patterns. Automatic searching was successful for conjunction reduction (a), but right node raising (c) and gapping (b) were parsed incorrectly by Alpino, which made searching for them difficult. To expand the search for right node raising, the most difficult of the three, an alternative approach was used: searching for intransitive parses of typically transitive verbs. The data obtained suggests that the frequency of the different types of ellipsis in Dutch is similar to that in English (Meyer, 2002).
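The search for intransitive parses of typically transitive verbs can be illustrated with a minimal sketch. It does not reproduce the thesis's actual search patterns or Alpino's output format: the `Clause` record, the `TYPICALLY_TRANSITIVE` lexicon, and the function `is_rnr_candidate` are all hypothetical names, and each conjunct is assumed to have already been reduced to a subject, a verb lemma, and a direct object.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon of verbs that normally take a direct object.
TYPICALLY_TRANSITIVE = {"kopen", "verkopen"}


@dataclass
class Clause:
    """Simplified view of one conjunct (an assumption, not Alpino's format)."""
    subject: Optional[str]
    verb_lemma: Optional[str]
    direct_object: Optional[str]


def is_rnr_candidate(first: Clause, second: Clause) -> bool:
    """Flag a two-clause coordination as a possible right node raising case.

    The first conjunct must contain a typically transitive verb that was
    parsed without an object, while the second conjunct does have an object.
    """
    return (
        first.verb_lemma in TYPICALLY_TRANSITIVE
        and first.direct_object is None
        and second.direct_object is not None
    )


# Examples a, b, and c from the abstract, encoded by hand.
example_a = (Clause("Jan", "kopen", "appels"), Clause(None, "verkopen", "peren"))
example_b = (Clause("Jan", "kopen", "appels"), Clause("Piet", None, "peren"))
example_c = (Clause("Jan", "kopen", None), Clause("Piet", "verkopen", "appels"))
```

Under these assumptions only example c is flagged, because it is the only one whose first conjunct contains a transitive verb without an object. A heuristic of this kind would also flag genuinely intransitive uses of the same verbs, so any hits would still need manual checking.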
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Degree programme: | Artificial Intelligence |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 15 Feb 2018 07:30 |
Last Modified: | 15 Feb 2018 07:30 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/8983 |