BEES

Bilingual Expert for English to Sinhala

Sinhala is a language of the Indo-Aryan family and the spoken language of the majority of Sri Lankans. Most Sri Lankans use Sinhala as both a spoken and a written language, and their command of it is reasonably good. Their ability to understand and write English, however, is limited. This language barrier affects both the acquisition and the dissemination of knowledge. Machine Translation (MT) is the process of translating one natural language into another. MT is a complex but valuable task because it offers a way around the language barrier. We have therefore been developing an English to Sinhala machine translation system named BEES, an acronym for Bilingual Expert for English to Sinhala. It is powered by the theory of varanagema (conjugation) in the Sinhala language. The system has been implemented primarily with SWI-Prolog, Java and Prolog Server Pages (PSP).

  • Online test bed: English to Sinhala translation tool for evaluation. [click here to try]
  • BEES 2.0: online selected-text translator for Windows [download]

 

English Morphological analyzer

The English morphological analyzer reads a given English sentence word by word and identifies the morphological information for each word. Many morphological analyzers are available for English, so in this development we have customized an existing one. At this stage of the project, we assume that sentences input to the MT system contain no spelling or grammatical mistakes, which lets us use a simple morphological analyzer for English. The analyzer is linked to an English dictionary to obtain grammatical information about the words in the input sentence. It is implemented in SWI-Prolog. [more]
[Test sample]
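
The lookup-plus-suffix idea described above can be sketched in a few lines. This is only an illustration in Java (one of the languages the project names), not the customized Prolog analyzer itself; the class name `ToyEnglishMorph`, the five-word lexicon, the suffix rules and the `base/pos/feature` output format are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy morphological analyzer: dictionary lookup plus a few suffix rules. */
final class ToyEnglishMorph {

    // Hypothetical mini-lexicon: base word -> part of speech.
    private static final Map<String, String> LEXICON = new LinkedHashMap<>();
    static {
        LEXICON.put("the", "det");
        LEXICON.put("boy", "noun");
        LEXICON.put("book", "noun");
        LEXICON.put("read", "verb");
        LEXICON.put("walk", "verb");
    }

    // Hypothetical suffix rules: {suffix, required part of speech, feature}.
    private static final String[][] RULES = {
        {"ing", "verb", "present_participle"},
        {"ed", "verb", "past"},
        {"s", "noun", "plural"},
        {"s", "verb", "third_singular"},
    };

    /** Returns "base/pos/feature" for one word, or "word/unknown/-". */
    static String analyzeWord(String word) {
        String w = word.toLowerCase();
        String pos = LEXICON.get(w);
        if (pos != null) {
            return w + "/" + pos + "/base";
        }
        // Strip a suffix and accept the stem only if the lexicon agrees on the POS.
        for (String[] rule : RULES) {
            if (w.endsWith(rule[0]) && w.length() > rule[0].length()) {
                String stem = w.substring(0, w.length() - rule[0].length());
                if (rule[1].equals(LEXICON.get(stem))) {
                    return stem + "/" + rule[1] + "/" + rule[2];
                }
            }
        }
        return w + "/unknown/-";
    }

    /** Analyzes a whitespace-separated sentence word by word. */
    static List<String> analyzeSentence(String sentence) {
        List<String> out = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            out.add(analyzeWord(token));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyzeSentence("The boy reads books"));
    }
}
```

Because the input is assumed to be free of spelling mistakes, the sketch makes no attempt at error recovery: a word that is neither in the lexicon nor reachable by one suffix rule is simply tagged `unknown`.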

English Parser

The English parser receives the source English sentence and its tokens from the English morphological analyzer and works as a syntax analyzer. Since many English parsers exist, we have customized an existing parser for our purpose. The current version of the parser handles only simple sentences. The parser is also implemented in SWI-Prolog. [more]
[Test sample]
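
To make "simple sentences" concrete, here is a minimal recursive-descent sketch in Java. It assumes a toy grammar (S → NP VP, NP → [det] adj* noun, VP → verb [NP]) and tokens already tagged by an earlier stage; the grammar, the tag names and the class `ToySimpleSentenceParser` are hypothetical and are not taken from the parser used in BEES.

```java
import java.util.List;

/** Toy parser for simple sentences over (word, tag) pairs. */
final class ToySimpleSentenceParser {
    private final List<String[]> tokens; // each entry: {word, tag}
    private int index = 0;

    private ToySimpleSentenceParser(List<String[]> tokens) {
        this.tokens = tokens;
    }

    /** Convenience helper for building a tagged token. */
    static String[] tok(String word, String tag) {
        return new String[] {word, tag};
    }

    /** Returns a bracketed tree, or null if the input is not a simple sentence. */
    static String parse(List<String[]> tokens) {
        ToySimpleSentenceParser p = new ToySimpleSentenceParser(tokens);
        String np = p.nounPhrase();
        if (np == null) {
            return null;
        }
        String vp = p.verbPhrase();
        if (vp == null || p.index != tokens.size()) {
            return null; // missing verb, or unparsed words left over
        }
        return "s(" + np + "," + vp + ")";
    }

    private boolean peek(String tag) {
        return index < tokens.size() && tokens.get(index)[1].equals(tag);
    }

    // NP -> [det] adj* noun
    private String nounPhrase() {
        int start = index;
        StringBuilder sb = new StringBuilder("np(");
        if (peek("det")) {
            sb.append("det(").append(tokens.get(index++)[0]).append("),");
        }
        while (peek("adj")) {
            sb.append("adj(").append(tokens.get(index++)[0]).append("),");
        }
        if (!peek("noun")) {
            index = start; // backtrack: no noun phrase here
            return null;
        }
        sb.append("noun(").append(tokens.get(index++)[0]).append("))");
        return sb.toString();
    }

    // VP -> verb [NP]
    private String verbPhrase() {
        if (!peek("verb")) {
            return null;
        }
        String verb = "verb(" + tokens.get(index++)[0] + ")";
        String object = nounPhrase(); // the object is optional
        return object == null ? "vp(" + verb + ")" : "vp(" + verb + "," + object + ")";
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of(
            tok("the", "det"), tok("boy", "noun"), tok("read", "verb"),
            tok("a", "det"), tok("book", "noun"))));
    }
}
```

The bracketed output mimics the term notation a Prolog parser might produce, which is a presentational choice for this sketch only.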

Translator

The translator maps each English base word to a Sinhala base word with the help of the bilingual dictionary. It is a simple module and does not automatically handle the semantics of sentences. We argue that this stage can be supported by human intervention to choose the most appropriate translation for some words in a sentence. Handling semantics, pragmatics and multiword expressions therefore relies on human support, for which we introduce an intermediate editor. [more]
[Test sample]
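
One way to picture the human-assisted lookup is a dictionary that returns every candidate for a base word and lets an editor override the default pick. The Java sketch below assumes that shape; the class `ToyBilingualTranslator`, its methods and the romanized entries are illustrative placeholders, not the system's bilingual dictionary or its intermediate editor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy base-word translator with optional human choice among candidates. */
final class ToyBilingualTranslator {
    // English base word -> Sinhala candidates (romanized, illustrative only).
    private final Map<String, List<String>> dictionary = new HashMap<>();

    void addEntry(String english, String... sinhalaCandidates) {
        dictionary.put(english, List.of(sinhalaCandidates));
    }

    /** All candidates for a word, so an editor can show them to a human. */
    List<String> candidates(String englishBase) {
        return dictionary.getOrDefault(englishBase, List.of());
    }

    /**
     * Translates base words one by one. The choices map holds the candidate
     * index a human selected for a word; words without a choice take the
     * first candidate, and unknown words are passed through in angle brackets.
     */
    List<String> translate(List<String> englishBases, Map<String, Integer> choices) {
        List<String> out = new ArrayList<>();
        for (String word : englishBases) {
            List<String> options = candidates(word);
            if (options.isEmpty()) {
                out.add("<" + word + ">");
                continue;
            }
            int pick = choices.getOrDefault(word, 0);
            if (pick < 0 || pick >= options.size()) {
                pick = 0; // ignore an out-of-range selection
            }
            out.add(options.get(pick));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyBilingualTranslator t = new ToyBilingualTranslator();
        t.addEntry("water", "wathura", "jalaya");
        t.addEntry("book", "potha");
        System.out.println(t.translate(List.of("water", "book"), Map.of("water", 1)));
    }
}
```

Marking unknown words instead of dropping them leaves room for a later stage, such as transliteration, to deal with them.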

Sinhala Morphological analyzer

The Sinhala morphological analyzer [7] works as a morphological generator. It reads the words from the translator (as improved by a human when necessary) one by one. For each word, it generates the appropriate word form with full grammatical information, such as nama (noun), kriya (verb) and nipatha (preposition) in Sinhala. It works with the help of three dictionaries: the Sinhala rule dictionary, the Sinhala word dictionary and the Sinhala concept dictionary. All of these databases and the morphological analyzer are implemented in Prolog. [more]
[Test sample]

Sinhala parser

The Sinhala parser [6] works as a sentence composer. It receives tokenized words from the morphological analyzer and composes a grammatically correct Sinhala sentence. In general, a Sinhala sentence contains five components: Ukktha vishashana (adjunct of the subject), Ukkthya (subject), karma vishashanaya (attributive adjunct of the object), karmaya (object) and akkyanaya [16]. These five components are the building blocks for the design and implementation of the Sinhala parser. The parser is one of the key modules of this human-assisted English to Sinhala machine translation system, and it is also implemented in SWI-Prolog. [more]
[Test sample]
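
The composition step can be illustrated with five slots filled in any order and emitted in a fixed one. The Java sketch below assumes that the order in which the five components are listed above is also the surface order of the sentence; that assumption, the class `ToySinhalaComposer`, the slot names and the romanized example words are all illustrative and not taken from the actual parser.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Toy sentence composer: places components into a fixed slot order. */
final class ToySinhalaComposer {

    /** The five components, declared in the assumed output order. */
    enum Slot { SUBJECT_ADJUNCT, SUBJECT, OBJECT_ADJUNCT, OBJECT, VERB }

    /** Joins the supplied components in slot order, skipping empty slots. */
    static String compose(Map<Slot, String> parts) {
        // EnumMap iterates in enum declaration order regardless of insertion order.
        EnumMap<Slot, String> ordered = new EnumMap<>(Slot.class);
        ordered.putAll(parts);
        StringJoiner sentence = new StringJoiner(" ");
        for (String word : ordered.values()) {
            if (word != null && !word.isEmpty()) {
                sentence.add(word);
            }
        }
        return sentence.toString();
    }

    public static void main(String[] args) {
        Map<Slot, String> parts = new HashMap<>();
        parts.put(Slot.VERB, "kiyawanawa");     // romanized, illustrative
        parts.put(Slot.SUBJECT, "lamaya");
        parts.put(Slot.OBJECT, "pothak");
        parts.put(Slot.OBJECT_ADJUNCT, "loku");
        System.out.println(compose(parts));
    }
}
```

A real composer also has to enforce agreement between the components; this sketch only shows the ordering.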

Dictionaries

The translation system uses six dictionaries: the English word dictionary [18], the English concept dictionary, the English-Sinhala bilingual dictionary, the Sinhala word dictionary, the Sinhala rule dictionary and the Sinhala concept dictionary. The English word dictionary contains English words and their lexical information. The English concept dictionary contains synonyms, antonyms and general knowledge about English words. The English-Sinhala bilingual dictionary is used to identify the appropriate Sinhala base word for a given English word, and it contains the relations between English and Sinhala words. The Sinhala word dictionary stores regular Sinhala base words and their lexical information. Like its English counterpart, the Sinhala concept dictionary stores semantic information. The Sinhala rule dictionary stores the rules required to generate the various word forms, that is, the inflection rules for forming verbs and nouns from their base words. The rule dictionary also stores vowels, consonants, upasarga (prefix) and vibakthi (postfix). [more]

Transliteration module

An MT system needs to solve out-of-vocabulary problems and handle technical terms, and machine transliteration is a reasonable solution for both. Transliteration is the practice of transcribing a word or text written in one writing system into another writing system [13]. In other words, machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. At present we have developed two transliteration models [9]. One transliterates original English text into Sinhala; the other converts Sinhala words written in English letters into Sinhala script. Finite state transducers are used to develop these two modules. [more]
[Test sample]
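
As a rough picture of the second model (Sinhala written in English letters converted to Sinhala script), the Java sketch below implements a two-state transducer. It is a simplified illustration, not the project's transducers: the class `ToyTransliterationFst`, the state names and the romanization scheme are assumptions, and only six consonants and three short vowels are covered. Sinhala characters are written as Unicode escapes so the source stays ASCII.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy two-state transducer from romanized Sinhala to Sinhala script. */
final class ToyTransliterationFst {
    private enum State { START, AFTER_CONSONANT }

    private static final Map<Character, String> CONSONANTS = new HashMap<>();
    private static final Map<Character, String> INDEPENDENT_VOWELS = new HashMap<>();
    private static final Map<Character, String> VOWEL_SIGNS = new HashMap<>();
    private static final String HAL = "\u0DCA"; // hal kirima: suppresses the inherent vowel

    static {
        CONSONANTS.put('k', "\u0D9A");
        CONSONANTS.put('n', "\u0DB1");
        CONSONANTS.put('m', "\u0DB8");
        CONSONANTS.put('r', "\u0DBB");
        CONSONANTS.put('l', "\u0DBD");
        CONSONANTS.put('s', "\u0DC3");
        INDEPENDENT_VOWELS.put('a', "\u0D85");
        INDEPENDENT_VOWELS.put('i', "\u0D89");
        INDEPENDENT_VOWELS.put('u', "\u0D8B");
        VOWEL_SIGNS.put('a', "");       // inherent vowel: nothing to emit
        VOWEL_SIGNS.put('i', "\u0DD2");
        VOWEL_SIGNS.put('u', "\u0DD4");
    }

    /** Runs the transducer over the input, one Latin letter per transition. */
    static String transliterate(String roman) {
        StringBuilder out = new StringBuilder();
        State state = State.START;
        for (char c : roman.toLowerCase().toCharArray()) {
            if (CONSONANTS.containsKey(c)) {
                if (state == State.AFTER_CONSONANT) {
                    out.append(HAL); // previous consonant had no vowel
                }
                out.append(CONSONANTS.get(c));
                state = State.AFTER_CONSONANT;
            } else if (INDEPENDENT_VOWELS.containsKey(c)) {
                out.append(state == State.AFTER_CONSONANT
                        ? VOWEL_SIGNS.get(c)          // vowel attaches to the consonant
                        : INDEPENDENT_VOWELS.get(c)); // vowel stands alone
                state = State.START;
            } else {
                throw new IllegalArgumentException("No transition for '" + c + "'");
            }
        }
        if (state == State.AFTER_CONSONANT) {
            out.append(HAL); // word-final bare consonant
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("nama"));
    }
}
```

The output for each letter depends on the state left by the previous one, which is what makes this a transducer rather than a plain character map. Long vowels, digraphs such as "th" and the English-to-Sinhala direction would need more states and are left out here.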
