Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment

Authors
Mohaghegh, Mahsa
Sarrafzadeh, Hossein
Mohammadi, Mehdi
Date
2014
Type
Conference Contribution - Paper in Published Proceedings
Keyword
statistical machine translation (SMT)
statistical word alignment
ensemble learning
heuristic word alignment
Citation
Mohaghegh, M., Sarrafzadeh, A., and Mohammadi, M. (2014). Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment. The 13th International Conference on Machine Learning and Applications (ICMLA'14), Detroit, Michigan, USA.
Abstract
Statistical word alignment models need large amounts of training data and perform poorly on small corpora. This paper proposes an unsupervised hybrid word alignment technique based on ensemble learning. The algorithm runs three base alignment models over several rounds to generate alignments, using a weighted scheme to resample the training data and a voting score to combine the aggregated alignments. The base alignment algorithms used in this study are IBM Model 1, IBM Model 2, and a heuristic method based on the Dice measure. Our experimental results show that this approach improves the alignment error rate of the base alignment models by at least 15%.
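To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of a Dice-based heuristic aligner and a weighted vote over several alignments. It is an illustration under stated assumptions, not the paper's implementation: the function names (`dice_scores`, `dice_align`, `vote`), the sentence-level co-occurrence counting, the one-link-per-source-word rule, the 0.5 voting threshold, and the toy English-German bitext are all hypothetical choices, and the resampling rounds and IBM Models 1 and 2 are left out.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice coefficient 2*C(s,t) / (C(s) + C(t)), counting sentence-level co-occurrence."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def dice_align(src, tgt, scores):
    """Assumed heuristic: link each source position to its best-scoring target position."""
    links = set()
    for i, s in enumerate(src):
        j = max(range(len(tgt)), key=lambda k: scores.get((s, tgt[k]), 0.0))
        links.add((i, j))
    return links


def vote(alignments, weights, threshold=0.5):
    """Assumed voting rule: keep a link if its weighted share of votes exceeds the threshold."""
    total = sum(weights)
    tally = Counter()
    for links, w in zip(alignments, weights):
        for link in links:
            tally[link] += w
    return {link for link, w in tally.items() if w / total > threshold}


# Toy bitext, invented for illustration only.
bitext = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
scores = dice_scores(bitext)
links = dice_align(["the", "house"], ["das", "haus"], scores)

# Three hypothetical aligner outputs for the same sentence pair, equally weighted.
combined = vote([links, {(0, 0), (1, 0)}, links], [1.0, 1.0, 1.0])
```

In this sketch each link is a `(source_position, target_position)` pair, so the outputs of different base aligners can be compared and tallied directly. A real ensemble would derive the weights from each round's performance rather than fixing them by hand.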
Publisher
IEEE (Institute of Electrical and Electronics Engineers)
DOI
10.1109/ICMLA.2014.15
Copyright holder
IEEE (Institute of Electrical and Electronics Engineers)
Copyright notice
All rights reserved