Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment

Authors

Mohaghegh, Mahsa
Sarrafzadeh, Hossein
Mohammadi, Mehdi

Date

2014

Type

Conference Contribution - Paper in Published Proceedings

Keyword

statistical machine translation (SMT)
statistical word alignment
ensemble learning
heuristic word alignment

Citation

Mohaghegh, M., Sarrafzadeh, A., and Mohammadi, M. (2014). Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment. The 13th International Conference on Machine Learning and Applications (ICMLA'14), Detroit, Michigan, USA.

Abstract

Statistical word alignment models need large amounts of training data and perform poorly on small corpora. This paper proposes an unsupervised hybrid word alignment technique based on ensemble learning. The algorithm runs three base alignment models over several rounds to generate alignments. The ensemble uses a weighted scheme for resampling the training data and a voting score to aggregate the resulting alignments. The base alignment models are IBM Model 1, IBM Model 2, and a heuristic method based on the Dice measure. Our experimental results show that this approach improves the alignment error rate of the base alignment models by at least 15%.
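
As a rough illustration of two ingredients named in the abstract, the sketch below computes Dice scores from sentence-level co-occurrence counts and combines several candidate alignments by weighted voting. It is not the authors' implementation: the function names (`dice_scores`, `align`, `vote`), the best-target linking rule, and the 0.5 vote threshold are assumptions made for this example, and the weighted resampling of training data is not shown.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice coefficient for every co-occurring (source, target) word pair.

    bitext is a list of (source_tokens, target_tokens) sentence pairs.
    Counts are per sentence, so a repeated word counts once per sentence.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in bitext:
        src_types, tgt_types = set(src_sent), set(tgt_sent)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def align(src_sent, tgt_sent, scores):
    """Link each source position to its best-scoring target position.

    This linking rule is an assumption; other heuristics are possible.
    """
    links = set()
    for i, s in enumerate(src_sent):
        j = max(
            range(len(tgt_sent)),
            key=lambda k: scores.get((s, tgt_sent[k]), 0.0),
        )
        links.add((i, j))
    return links


def vote(alignments, weights, threshold=0.5):
    """Keep links whose weighted share of votes reaches the threshold.

    alignments holds one set of (i, j) links per model or round;
    the threshold value is hypothetical.
    """
    total = sum(weights)
    tally = Counter()
    for links, w in zip(alignments, weights):
        for link in links:
            tally[link] += w
    return {link for link, w in tally.items() if w / total >= threshold}
```

In an ensemble of the kind the abstract describes, each base model (IBM Model 1, IBM Model 2, or the Dice heuristic) would contribute one set of links per round, and `vote` would aggregate them.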

Publisher

IEEE (Institute of Electrical and Electronics Engineers)

DOI

10.1109/ICMLA.2014.15

Copyright holder

IEEE (Institute of Electrical and Electronics Engineers)

Copyright notice

All rights reserved
