Hierarchy-based Partition Models: Using Classification Hierarchies to


dc.contributor.author Buch-Kromann, Matthias
dc.contributor.author Haulrich, Martin
dc.date.accessioned 2010-11-30
dc.date.accessioned 2010-12-14T10:36:08Z
dc.date.available 2010-12-14T10:36:08Z
dc.date.issued 2010-12-14
dc.identifier.isbn x656703287
dc.identifier.uri http://hdl.handle.net/10398/8221
dc.description.abstract We propose a novel machine learning technique that can be used to estimate probability distributions for categorical random variables equipped with a natural set of classification hierarchies, such as words equipped with word class hierarchies, WordNet hierarchies, and suffix and affix hierarchies. We evaluate the estimator on bigram language modelling with a hierarchy based on word suffixes, using English, Danish, and Finnish data from the Europarl corpus with training sets of up to 1–1.5 million words. The results show that the proposed estimator outperforms modified Kneser-Ney smoothing in terms of perplexity on unseen data. This suggests that important information is hidden in the classification hierarchies that we routinely use in computational linguistics, but that we are unable to utilize this information fully because our current statistical techniques are either based on simple counting models or designed for sample spaces with a distance metric, rather than for sample spaces with a non-metric topology given by a classification hierarchy. Keywords: machine learning; categorical variables; classification hierarchies; language modelling; statistical estimation en_US
dc.format.extent 13 en_US
dc.language eng en_US
dc.title Hierarchy-based Partition Models: Using Classification Hierarchies to en_US
dc.type wp en_US
dc.accessionstatus modt10dec14 peed en_US
dc.contributor.corporation Copenhagen Business School. CBS en_US
dc.contributor.department Institut for Internationale Sprogstudier og Vidensteknologi en_US
dc.contributor.departmentshort ISV en_US
dc.contributor.departmentuk Department of International Language Studies and Computational Linguistics en_US
dc.contributor.departmentukshort ISV en_US
dc.idnumber x65660260 en_US
dc.publisher.city Frederiksberg en_US
dc.publisher.year 2010 en_US


This work is licensed under a Creative Commons License.

File: 2010-wp-buch-kromann-haulrich.pdf (211.5 KB, PDF, working paper)
