An Official Website on Manipuri Language,Culture and Language Technologies
 

Research and Development

                1. Electronic Corpus of Document Images for Manipuri Language

                2. Basic Parallel Corpus (Manipuri-Hindi)

                3. Electronic Dictionary

Electronic Corpus of Document Images for Manipuri Language:

                We developed an electronic corpus of document images for the Manipuri language. Books from various publishers were selected as sources for the corpus. Each book was scanned in grayscale at two resolutions, 300 dpi and 600 dpi, and a meta-information file is maintained for each scanned book. About 5,000 pages have been scanned so far. The corpus is intended as a benchmark for testing OCR and document analysis systems.

Basic Parallel (Manipuri-Hindi) Corpus:

                We developed a basic Manipuri-Hindi parallel corpus to support research involving the two languages, such as multilingual lexicography, contrastive linguistics and machine translation. The Manipuri text samples were collected from daily newspapers, journals, leaflets and books in the Generic, Tourism and Health domains. Manipuri is the source language and Hindi is the target language. The Manipuri text was translated into Hindi by translators well versed in both languages, with skill levels of novice, average and skilled. The translations were then reviewed by faculty members of the Department of Hindi, Manipur University. Each Manipuri sentence is translated into exactly one Hindi sentence, so the corpus is sentence aligned. Both Manipuri and Hindi are typed in Unicode fonts. We also developed a Corpus manager to manage and maintain the corpora. The statistics of the parallel corpus are given below:

Domain      No. of sentences
Generic     10,000
Tourism     8,500
Health      7,000
Total       25,500
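
The one-to-one sentence alignment and the per-domain counts described above can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the names `SentencePair`, `align` and `sentences_per_domain` are hypothetical, and the sentences are placeholders rather than real corpus text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned unit of the corpus (hypothetical structure)."""
    domain: str      # "Generic", "Tourism" or "Health"
    manipuri: str    # source sentence
    hindi: str       # target sentence


def align(domain, manipuri_sentences, hindi_sentences):
    """Pair sentences one-to-one; reject input that is not sentence aligned."""
    if len(manipuri_sentences) != len(hindi_sentences):
        raise ValueError(
            f"{domain}: {len(manipuri_sentences)} Manipuri sentences "
            f"but {len(hindi_sentences)} Hindi sentences"
        )
    return [SentencePair(domain, m, h)
            for m, h in zip(manipuri_sentences, hindi_sentences)]


def sentences_per_domain(pairs):
    """Count aligned sentence pairs per domain and add a grand total."""
    counts = Counter(pair.domain for pair in pairs)
    counts["Total"] = sum(counts.values())
    return dict(counts)


if __name__ == "__main__":
    # Placeholder strings stand in for real Manipuri and Hindi sentences.
    pairs = (align("Generic", ["<mni 1>", "<mni 2>"], ["<hin 1>", "<hin 2>"])
             + align("Health", ["<mni 3>"], ["<hin 3>"]))
    print(sentences_per_domain(pairs))
```

Checking the lengths before pairing makes a missing or extra translation fail loudly instead of silently shifting every later pair.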

Corpus Manager:

                We developed a Corpus manager, written in Java, to manage and maintain the Manipuri-Hindi parallel corpus. It provides statistical and analytical descriptions of the corpus. Among other things, the Corpus manager can perform the following tasks:
      i) Corpus files can be viewed by category.
      ii) Each corpus file can be viewed both in its original form and in each of its languages.
      iii) Statistical information can be obtained for each file or for the entire parallel corpus.
      iv) It includes a concordancer.
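
A concordancer lists every occurrence of a word together with its surrounding context. The page does not describe how the Corpus manager implements this (and the tool itself is written in Java), so the following Python sketch only illustrates the general keyword-in-context idea. The function name `concordance`, the `width` parameter and whitespace tokenisation are assumptions, and the English sentences are placeholders.

```python
def concordance(sentences, keyword, width=3):
    """Return keyword-in-context rows: (left context, keyword, right context)."""
    rows = []
    for sentence in sentences:
        tokens = sentence.split()  # whitespace tokenisation (assumption)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                rows.append((left, token, right))
    return rows


if __name__ == "__main__":
    sample = ["the corpus is sentence aligned", "a parallel corpus"]
    for left, key, right in concordance(sample, "corpus", width=2):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>12} [{key}] {right}")
```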

Electronic Dictionary:

                We have developed two electronic dictionaries, Manipuri-English and English-Manipuri. Each dictionary entry contains the following information:
      i) Name of the word
      ii) Suffix (if suffix is present in the word)
      iii) Phonetically similar words
      iv) Domain
      v) Transliteration in English
      vi) Part of speech
      vii) English / Manipuri equivalent meaning
      viii) Example in English
      ix) Example in Manipuri
These dictionaries are intended to support R&D activities in Natural Language Processing and Information Retrieval.
The statistics of the electronic dictionaries are given below:

Dictionary          No. of headwords
Manipuri-English    10,000
English-Manipuri    35,000
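
The nine entry fields listed in this section could be modelled as a simple record. This Python sketch is only an illustration, not the dictionaries' actual storage format: the class `DictionaryEntry`, its field names and the helper `build_index` are hypothetical, and the sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DictionaryEntry:
    """One headword record; fields mirror items i) to ix) of the list."""
    headword: str                      # i) name of the word
    transliteration: str               # v) transliteration in English
    part_of_speech: str                # vi)
    meaning: str                       # vii) English / Manipuri equivalent
    suffix: Optional[str] = None       # ii) only if the word has a suffix
    similar_sounding: List[str] = field(default_factory=list)  # iii)
    domain: str = ""                   # iv)
    example_english: str = ""          # viii)
    example_manipuri: str = ""         # ix)


def build_index(entries: List[DictionaryEntry]) -> Dict[str, List[DictionaryEntry]]:
    """Group entries by headword so that homographs share one lookup key."""
    index: Dict[str, List[DictionaryEntry]] = {}
    for entry in entries:
        index.setdefault(entry.headword, []).append(entry)
    return index


if __name__ == "__main__":
    # Placeholder values; no real dictionary data is reproduced here.
    entries = [
        DictionaryEntry("<headword>", "<transliteration>", "noun", "<meaning 1>"),
        DictionaryEntry("<headword>", "<transliteration>", "verb", "<meaning 2>"),
    ]
    index = build_index(entries)
    print(len(index["<headword>"]))
```

Making the suffix optional follows the list's own wording, which records a suffix only when the word has one.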


 
Department of Computer Science, Manipur University