Methods for augmenting semantic models with structural information for text classification

Advances in Information Retrieval, 2008

Jonathan M. Fishbein, Chris Eliasmith

Abstract

Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words or "concepts". Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. Here, we investigate three methods to augment semantic modelling with syntactic structure, which encode the structure across all features of the document vector while preserving text semantics. We present classification results for these methods versus the Bag-of-Concepts semantic modelling representation to determine which method best improves classification scores.

Conference Proceedings

Publisher: Springer
DOI: 10.1007/978-3-540-78646-7_58
Journal: Advances in Information Retrieval
Pages: 575-579
