Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment
Mohaghegh, Mahsa; Sarrafzadeh, Hossein; Mohammadi, Mehdi
Date:
2014
Citation:
Mohaghegh, M., Sarrafzadeh, A., and Mohammadi, M. (2014). Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment. The 13th International Conference on Machine Learning and Applications (ICMLA'14), Detroit, Michigan, USA.
Permanent link to Research Bank record:
https://hdl.handle.net/10652/2969
Abstract:
Statistical word alignment models need large amounts of training data and perform poorly on small corpora. This paper proposes a new unsupervised hybrid word alignment technique based on ensemble learning. The algorithm runs three base alignment models over several rounds to generate alignments. The ensemble uses a weighted scheme to resample the training data and a voting score to aggregate the resulting alignments. The underlying alignment algorithms are IBM Models 1 and 2 and a heuristic method based on the Dice coefficient. Our experimental results show that, with this approach, the alignment error rate of the base alignment models could be improved by at least 15%.
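The abstract names a heuristic aligner based on the Dice coefficient but does not state its linking rule. The following is a minimal sketch, assuming sentence-level co-occurrence counts, Dice(e, f) = 2·C(e, f) / (C(e) + C(f)), and a hypothetical rule that links each source word to its best-scoring target word when the score clears a threshold. The function names and the `threshold` parameter are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import product

def dice_scores(corpus):
    """Dice(e, f) = 2 * C(e, f) / (C(e) + C(f)), counting sentence pairs that contain the words."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_types, tgt_types = set(src), set(tgt)  # count each word once per sentence
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))
    return {
        (e, f): 2 * c / (src_count[e] + tgt_count[f])
        for (e, f), c in pair_count.items()
    }

def dice_align(src, tgt, scores, threshold=0.5):
    """Hypothetical rule: link source position i to its best target position j if Dice >= threshold."""
    links = set()
    for i, e in enumerate(src):
        j = max(range(len(tgt)), key=lambda k: scores.get((e, tgt[k]), 0.0), default=None)
        if j is not None and scores.get((e, tgt[j]), 0.0) >= threshold:
            links.add((i, j))
    return links
```

On a toy corpus such as ("the house", "das haus"), ("the book", "das buch"), ("a book", "ein buch"), the pair (the, das) co-occurs in both sentences that contain either word and scores 1.0, so the heuristic links them.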
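IBM Model 1, one of the base models named in the abstract, estimates lexical translation probabilities t(f | e) with expectation maximization: each target word is assumed to be generated by one of the source words or by an empty (NULL) source token. The sketch below is a textbook-style illustration, not the paper's implementation; the function names, the `<NULL>` token, and the fixed iteration count are assumptions. IBM Model 2 additionally models position-dependent alignment probabilities and is omitted here.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical empty source token

def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1; returns t[(f, e)] = p(target word f | source word e)."""
    tgt_vocab = {f for _, tgt in corpus for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in corpus:
            src_with_null = [NULL] + src
            for f in tgt:
                # E-step: posterior over which source word generated f
                norm = sum(t[(f, e)] for e in src_with_null)
                for e in src_with_null:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into conditional probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def align_ibm1(src, tgt, t):
    """Viterbi links (source_index, target_index); target words aligned to NULL are left unlinked."""
    src_with_null = [NULL] + src
    links = set()
    for j, f in enumerate(tgt):
        best = max(range(len(src_with_null)), key=lambda i: t.get((f, src_with_null[i]), 0.0))
        if best > 0:  # index 0 is the NULL token
            links.add((best - 1, j))
    return links
```

On the toy corpus ("the house", "das haus"), ("the book", "das buch"), ("a book", "ein buch"), a few EM iterations push t(das | the) and t(haus | house) upward, so "das haus" is aligned monotonically to "the house".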
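The abstract describes the ensemble only at a high level: base aligners are trained over several rounds on weighted resamples of the training data, and a voting score aggregates their alignments. The sketch below illustrates that overall shape under explicit assumptions. The reweighting rule (upweighting sentence pairs whose collected alignments disagree), the majority threshold, and all function names are hypothetical placeholders, not the paper's actual scheme.

```python
import random
from collections import Counter

def vote(link_sets, threshold=0.5):
    """Keep the links proposed by at least a `threshold` fraction of the given alignments."""
    votes = Counter(link for links in link_sets for link in links)
    needed = threshold * len(link_sets)
    return {link for link, count in votes.items() if count >= needed}

def ensemble_align(corpus, trainers, rounds=3, threshold=0.5, seed=0):
    """Hypothetical resample-train-vote loop.

    corpus:   list of (source_tokens, target_tokens) pairs.
    trainers: callables mapping a training corpus to an aligner,
              where aligner(src, tgt) returns a set of (i, j) links.
    """
    rng = random.Random(seed)
    weights = [1.0] * len(corpus)
    collected = [[] for _ in corpus]  # per sentence pair: link sets from every model and round
    for _ in range(rounds):
        # Weighted resampling with replacement: heavier pairs are drawn more often.
        sample = rng.choices(corpus, weights=weights, k=len(corpus))
        for train in trainers:
            aligner = train(sample)
            for k, (src, tgt) in enumerate(corpus):
                collected[k].append(aligner(src, tgt))
        # Placeholder reweighting (assumption): the larger the share of links
        # that not every alignment agrees on, the higher the pair's weight.
        for k, link_sets in enumerate(collected):
            union = set().union(*link_sets)
            unanimous = vote(link_sets, threshold=1.0)
            weights[k] = 1.0 + (len(union) - len(unanimous)) / max(len(union), 1)
    # Final alignment per sentence pair: links backed by enough votes.
    return [vote(link_sets, threshold) for link_sets in collected]
```

Each trainer can wrap any base aligner, for example an IBM Model 1 or a Dice-based aligner retrained on the resampled corpus. Keeping the trainers as plain callables makes it easy to swap base models without touching the resampling and voting logic.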
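The reported gains are measured in alignment error rate (AER). Under the standard definition of Och and Ney, with sure gold links S, possible gold links P (where S ⊆ P), and predicted links A, AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|); lower is better. A small helper follows; the function name is hypothetical.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), where links are (i, j) tuples."""
    possible = possible | sure  # sure links also count as possible links
    if not predicted and not sure:
        return 0.0  # nothing predicted and nothing required
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))
```

For example, predicting {(0, 0), (1, 0)} against sure links {(0, 0), (1, 1)} and no extra possible links matches one sure link, giving 1 − 2/4 = 0.5.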
Keywords:
statistical machine translation (SMT), statistical word alignment, ensemble learning, heuristic word alignment
ANZSRC Field of Research:
200323 Translation and Interpretation Studies
Copyright Holder:
IEEE (Institute of Electrical and Electronics Engineers)
Copyright Notice:
All rights reserved
Available Online at:
http://www.icmla-conference.org/icmla14/
http://www.researchgate.net/publication/272353675_Ensemble_Statistical_and_Heuristic_Models_for_Unsupervised_Word_Alignment