An Overview of the Challenges and Progress in PeEn-SMT: First Large Scale Persian-English SMT System

Authors

Mohaghegh, Mahsa
Sarrafzadeh, Hossein

Date

2011

Type

Conference Contribution - Paper in Published Proceedings

Keyword

Persian-English translation
machine translating
statistical machine translation (SMT)

Citation

Mohaghegh, M., and Sarrafzadeh, A. (2011). An overview of the challenges and progress in PeEn-SMT: First large scale Persian-English SMT system. Seventh International Conference on Innovations in Information Technology, Abu Dhabi, UAE.

Abstract

This paper documents recent work carried out for PeEn-SMT, our Statistical Machine Translation system for the English-Persian language pair. We give details of our previous SMT system and present our current development of significantly larger corpora. We explain how recent tests using much larger corpora helped to evaluate problems in parallel corpus alignment and corpus content, and how matching the domains of PeEn-SMT’s components affects translation outcome. We then focus on combining corpora and approaches to improve test data, showing details of the experimental setup, together with a number of experimental results and comparisons between them. We show how one combination of corpora gave us a metric score higher than that of Google Translate for English-to-Persian translation. Finally, we outline areas of our intended future work and how we plan to improve the performance of our system to achieve higher metric scores and, ultimately, to provide accurate, reliable language translation.

Copyright holder

Author

Copyright notice

All rights reserved
