We present a Wixarika-Spanish machine translator based on statistical machine translation (SMT) with complementary grammatical knowledge. The Wixarika language (also known as Huichol) is spoken in western Mexico by the Wixaritari people in the states of Jalisco, Nayarit, Zacatecas and Durango. Although they have a thriving culture, socioeconomic factors have prevented the creation of the large written resources that SMT typically requires.

The parallel corpus used to train the SMT system is scarce (800 paired phrases). We use this corpus to train an SMT system and produce automatic translations from Wixárika to Spanish and from Spanish to Wixárika. A cornerstone of our proposal is the automatic processing of Wixárika morphology, which exploits the polysynthetic features of the language and allows us to reach state-of-the-art results with this small corpus.
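As a rough illustration of why morphological processing helps a polysynthetic language, the sketch below shows one common way to decompose words into affix and stem tokens with boundary markers before SMT training, and to re-join the translated output afterwards. This is not the thesis implementation: the affix lists, the function names (`segment_word`, `segment_sentence`, `join_morphemes`) and the `+` marker convention are hypothetical, and the toy affixes are placeholders, not real Wixárika morphemes.

```python
# Hypothetical sketch: affix-based morphological decomposition with "+"
# boundary markers. The segmented text could be fed to an SMT toolkit, and
# translated output in segmented form can be re-joined into whole words.
# The affix lists are toy placeholders, NOT real Wixárika morphemes.

TOY_PREFIXES = ["ne", "ta"]  # placeholder prefixes
TOY_SUFFIXES = ["ka", "ni"]  # placeholder suffixes


def segment_word(word, prefixes=TOY_PREFIXES, suffixes=TOY_SUFFIXES, min_stem=2):
    """Greedily strip known prefixes, then suffixes, keeping a stem of at least min_stem chars."""
    pre, suf, stem = [], [], word
    changed = True
    while changed:  # repeatedly strip the longest matching prefix
        changed = False
        for p in sorted(prefixes, key=len, reverse=True):
            if stem.startswith(p) and len(stem) - len(p) >= min_stem:
                pre.append(p + "+")  # marker on the right: attaches to the next token
                stem = stem[len(p):]
                changed = True
                break
    changed = True
    while changed:  # repeatedly strip the longest matching suffix
        changed = False
        for s in sorted(suffixes, key=len, reverse=True):
            if stem.endswith(s) and len(stem) - len(s) >= min_stem:
                suf.insert(0, "+" + s)  # marker on the left: attaches to the previous token
                stem = stem[:-len(s)]
                changed = True
                break
    return pre + [stem] + suf


def segment_sentence(sentence):
    """Segment every whitespace-separated word and return a space-joined token string."""
    return " ".join(tok for w in sentence.split() for tok in segment_word(w))


def join_morphemes(segmented):
    """Undo segmentation by gluing marked tokens back onto their neighbours."""
    return segmented.replace("+ ", "").replace(" +", "")


if __name__ == "__main__":
    seg = segment_sentence("netamuka xu")
    print(seg)                  # ne+ ta+ mu +ka xu
    print(join_morphemes(seg))  # netamuka xu
```

Keeping the markers as part of the tokens lets the translation model learn alignments between individual morphemes and Spanish words, while the re-joining step recovers surface word forms when translating into Wixárika.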

This project is part of the master's thesis of Jesús Manuel Mager Hois. The thesis was advised by Carlos Barrón Romero, PhD (UAM-A), and Ivan Vladimir Meza Ruíz, PhD (UNAM-IIMAS).

Current State

Although the translator is functional, a much larger parallel corpus is needed to improve its performance. We invite all Wixaritari to help extend the current corpus.


State-of-the-art tools used:


You are welcome to use the code, under its license terms, for research or commercial purposes; however, please acknowledge its use with a citation: Mager Jesus, Barron Carlos and Meza Ivan. "Traductor estadístico wixarika - español usando descomposición morfológica" (Statistical Wixarika-Spanish translator using morphological decomposition), COMTEL, number 6, September 2016. Here is a BibTeX entry:
    author = "Mager Hois, Jesús Manuel and Barron Romero, Carlos and Meza Ruíz, Ivan Vladimir",
    journal = "COMTEL",
    number = "6",
    title = "Traductor estadístico wixarika - español usando descomposición morfológica",
    year = "2016",
    month = sep

About the author

We invite you to visit the homepage of Jesús Manuel Mager and his blog, eeNube.com.


American Indigenous Languages

If you want to know more about NLP applied to Indigenous languages, please visit the following link: Naki: List of publications and resources for Indigenous American Languages