WorldLingo Multilingual Archive

As you may already know, I'm the Director of IT at WorldLingo, one of the leaders in online translation and localization. In addition to working with great people, I also have the opportunity to work with cutting-edge technology on a daily basis.

One of our newest projects is the Multilingual Archive, a constantly growing repository of translations of some of the world's best freely available information sources. As a first step, we have translated approximately 2.8 million English Wikipedia articles into eight languages: Spanish, French, Portuguese, German, Dutch, Russian, Korean, and Japanese. Translation into Italian, Swedish, Arabic, Simplified Chinese, Traditional Chinese, and Greek will be completed in the near future, and additional information sources will be added on an ongoing basis.

To create the Multilingual Archive, we leveraged WorldLingo's existing translation technology infrastructure and implemented Hadoop/HBase for storing the articles. Check out Lars George's blog for more information about our use of Hadoop/HBase.
