JERUSALEM, Nov. 27 (Xinhua) -- Israeli researchers have developed new technology to summarize multilingual texts automatically, Ben-Gurion University in southern Israel reported Wednesday.
The method, called MUSE (multilingual sentence extractor), has been tested in nine languages: Chinese, English, Hebrew, Arabic, Persian, Russian, German, French and Spanish.
The summaries produced by the new technology closely resemble those written by humans.
With the huge increase in online texts, there is a growing need for automated methods to summarize text files, such as articles or interviews, for further processing.
At the same time, readers have less and less time available to get through these vast amounts of text.
Most of the automated methods available today are language-dependent, and their underlying algorithms must first be trained on large amounts of text.
The new method produces summaries of texts in different languages, based on an algorithm that ranks the sentences in a document using their statistical characteristics.
This ranking can be performed on sentences in any language, and the highest-ranking sentences are then extracted into a synopsis.
Trials show that after initial training of the algorithms on an annotated corpus of summarized documents, the software does not need to be retrained on a summarization corpus in each new language, and the same sentence-ranking model can be used across several languages.
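The approach described above, scoring each sentence from language-independent statistics and extracting the top-ranked ones, can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the three features (position, length, word frequency), the fixed weights, and the function names `split_sentences`, `score_sentences` and `summarize` are all assumptions made for illustration. In the method as reported, the ranking model is trained on an annotated corpus rather than set by hand.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on Latin and CJK sentence-final punctuation (an assumption;
    # real systems use a proper sentence segmenter for each script).
    parts = re.split(r"(?<=[.!?。！？])\s*", text.strip())
    return [p for p in parts if p]


def score_sentences(sentences, weights=(0.3, 0.2, 0.5)):
    # Hypothetical hand-set weights for (position, length, frequency).
    w_pos, w_len, w_freq = weights
    # \w+ is Unicode-aware in Python 3, so no language-specific tokenizer is used.
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for t in tokens for w in t)
    max_freq = max(freq.values(), default=1)
    max_len = max((len(t) for t in tokens), default=1) or 1
    n = len(sentences)
    scores = []
    for i, t in enumerate(tokens):
        position = 1.0 - i / n          # earlier sentences score higher
        length = len(t) / max_len       # relative to the longest sentence
        # Mean document frequency of the sentence's words, normalized to [0, 1].
        frequency = sum(freq[w] for w in t) / (len(t) * max_freq) if t else 0.0
        scores.append(w_pos * position + w_len * length + w_freq * frequency)
    return scores


def summarize(text, k=2):
    # Keep the k highest-scoring sentences, returned in their original order.
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = ("Cats chase mice. Dogs chase cats and dogs chase mice in the yard. "
              "Rain fell.")
    for sentence in summarize(sample, k=2):
        print(sentence)
```

Because the features rely only on counts and positions, the same scoring function can be applied to text in another language without modification, which is the property the trials above are reported to confirm for the trained model.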
"This tool will be a valuable addition to the ability to benefit from the vast amounts of text available online," the researchers concluded.