We show that Bayes' rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit because parallel documents are not always available. In our formulation, the posterior probability of a candidate translation is the product of the unconditional (prior) probability of the candidate output document and the "reverse translation probability" of translating the candidate output back into the source language. Our proposed model uses a powerful autoregressive language model as the prior on target language documents, but it assumes that each sentence is translated independently from the target to the source language. Crucially, at test time, when a source document is observed, the document language model prior induces dependencies between the translations of the source sentences in the posterior. The model's independence assumption not only enables efficient use of available data, but it additionally admits a practical left-to-right beam-search algorithm for carrying out inference. Experiments show that our model benefits from using cross-sentence context in the language model, and it outperforms existing document translation approaches.

As we will discuss subsequently, although the problems of estimating p(y | x) and p(x | y) are formally similar, independence assumptions made in p(x | y) are less statistically costly than they might otherwise be since, at test time, we will be conditioning on x and reasoning about a posterior distribution over y, which will be jointly dependent on all (conditionally independent) parts of x. This statistical fact, which is the same trick that gives naïve Bayes classifiers their expressiveness and ease of estimation, permits us to assume independence between sentence translations in the reverse translation model, and therefore to use parallel sentences (rather than parallel documents) to train it. In the posterior, we thus have an implicit estimate of a document-level translation system, even though we made no use of parallel documents when estimating the prior or likelihood models. This is particularly useful because parallel sentences are much more readily available than parallel documents. A second benefit of our approach is that the unconditional language model can be estimated from nonparallel data, which exists in vast quantities.

Although the noisy channel model is ideal for exploiting the data resources that naturally exist in the world (large corpora of parallel but independent sentences, and large corpora of monolingual documents), we are faced with a much harder decoding problem (§3).
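To make the factorization concrete: assuming a one-to-one correspondence between the sentences of a source document x = (x_1, ..., x_n) and a candidate target document y = (y_1, ..., y_n), the posterior is proportional to p(y) multiplied by the product over i of p(x_i | y_i), where p(y) is the document-level language model prior and each p(x_i | y_i) is a sentence-level reverse translation probability. The Python sketch below illustrates how such a score could drive a left-to-right beam search over target sentences; the callables propose_translations, lm_logprob, and reverse_tm_logprob, and the interpolation weights, are hypothetical placeholders standing in for real models, not the paper's actual interface or exact decoding objective.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Hypothesis:
    """A partial document translation: one target sentence per source sentence so far."""
    sentences: Tuple[str, ...] = ()
    score: float = 0.0


def noisy_channel_score(
    prefix: Sequence[str],
    candidate: str,
    source_sentence: str,
    lm_logprob: Callable[[Sequence[str], str], float],
    reverse_tm_logprob: Callable[[str, str], float],
    lm_weight: float = 1.0,
    tm_weight: float = 1.0,
) -> float:
    """Score of extending a partial translation with one more target sentence.

    The document LM prior conditions on all previously decoded target sentences
    (this is where cross-sentence context enters), while the reverse translation
    model scores only the current (source, target) sentence pair, reflecting the
    sentence-level independence assumption.
    """
    prior = lm_weight * lm_logprob(prefix, candidate)
    channel = tm_weight * reverse_tm_logprob(candidate, source_sentence)
    return prior + channel


def beam_search_document(
    source_sentences: Sequence[str],
    propose_translations: Callable[[str], List[str]],
    lm_logprob: Callable[[Sequence[str], str], float],
    reverse_tm_logprob: Callable[[str, str], float],
    beam_size: int = 4,
) -> Hypothesis:
    """Left-to-right beam search over target sentences for a whole document."""
    beam = [Hypothesis()]
    for src in source_sentences:
        expanded: List[Hypothesis] = []
        for hyp in beam:
            # Candidate target sentences come from some proposal model,
            # e.g. a direct sentence-level translation system.
            for cand in propose_translations(src):
                delta = noisy_channel_score(
                    hyp.sentences, cand, src, lm_logprob, reverse_tm_logprob
                )
                expanded.append(
                    Hypothesis(hyp.sentences + (cand,), hyp.score + delta)
                )
        beam = sorted(expanded, key=lambda h: h.score, reverse=True)[:beam_size]
    return beam[0]
```

One common design choice for noisy-channel decoding of this kind, though not spelled out in the excerpt above, is to let a direct sentence-level translation model supply the candidate target sentences and to tune the weights on the prior and channel terms on held-out data.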