4th Workshop on NLP for Indigenous Languages of the Americas
AmericasNLP 2024 will be held on June 21, 2024, co-located with NAACL 2024 in Mexico City, Mexico! You can find the full program here.
KEYNOTE SPEAKERS
Graham Neubig
Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University. His research focuses on natural language processing, with a particular interest in the fundamentals, applications, and understanding of large language models for tasks such as question answering, code generation, and multilingual applications. His ultimate goal is that every person in the world should be able to communicate with each other, and with computers, in their own language. He also contributes to making NLP research more accessible through open publishing of research papers, advanced NLP course materials and video lectures, and open-source software, all of which are available on his website.
Fidencio Briceño Chel
Fidencio Briceño Chel has a degree in Linguistics and Literature from the Autonomous University of Yucatán and has pursued doctoral studies in Anthropological Linguistics at UNAM. He held the Research Directorate of the National Institute of Indigenous Languages in 2006, and has been a professor-researcher at INAH since 1991. Since 2000 he has been coordinator of the Linguistics Section of INAH Yucatán. He is co-author of dictionaries of the Mayan language and author of numerous articles on the Mayan language and culture, as well as books for the dissemination and teaching of the Mayan language. He received the “Wigberto Jiménez Moreno National Award” for the best Master's thesis in the field of Linguistics (1998), and the National Award for journalism and information for his editorial and journalistic collaboration in the production of the Cultural Radio Magazine: “He speaks of the mayab.” IMER (2001). He received the 2016 Yuri Knorosov Medal at the International Festival of Mayan Culture and the 2019 Pánfilo Novelo medal for the defense and dissemination of the Mayan language. Since 2020, he has coordinated the Curricular Academic Group to create the University of Indigenous Languages of Mexico. Today he directs the State Center for Humanistic Training, Research and Dissemination of Yucatán.
List of Accepted Papers
The following papers have been accepted at the workshop:
Long Papers
- A Concise Survey of OCR for Low-Resource Languages. Milind Agarwal and Antonios Anastasopoulos
- LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages. Jared R. Coleman, Bhaskar Krishnamachari, Ruben Rosales and Khalil Iskarous
- Mapping 'when'-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology. Nilo Pedrazzini
- Morphological Tagging in Bribri Using Universal Dependency Features. Jessica Karson and Rolando Coto-Solano
- Translation systems for low-resource Colombian Indigenous languages, a first step towards cultural preservation. Juan Camilo Prieto, Cristian Adrian Martinez, Melissa Robles, Alberto Moreno, Sara Palacios and Rubén Manrique
- Word-level prediction in Plains Cree: First steps. Olga Kriukova and Antti Arppe
Short Papers
- A New Benchmark for Kalaallisut-Danish Neural Machine Translation. Ross Deans Kristensen-McLachlan and Johanne Sofie Krog Nedergård
- Advancing NMT for Indigenous Languages: A Case Study on Yucatec Mayan and Chol. Julio Cesar Rangel and Norio Kobayashi
- Analyzing Finetuned Vision Models for Mixtec Codex Interpretation. Alexander R. Webber, Zachary Sayers, Amy Wu, Elizabeth Thorner, Justin Witter, Gabriel Ayoubi and Christan Grant
- Awajun-OP: Multi-domain dataset for Spanish–Awajun Machine Translation. Oscar Moreno, Yanua Liseth Atamain and Arturo Oncevay
- Enenlhet as a case-study to investigate ASR model generalizability for language documentation. Éric Le Ferrand, Raina Heaton and Emily Prud'hommeaux
- From Field Linguistics to NLP: Creating a curated dataset in Amuzgo language. Antonio Reyes and Hamlet Antonio García
- Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages. David Ifeoluwa Adelani, A. Seza Doğruöz, André Coneglian and Atul Kr. Ojha
- NLP for Language Documentation: Two Reasons for the Gap between Theory and Practice. Luke Gessler and Katharina von der Wense
- Unlocking Knowledge with OCR-Driven Document Digitization for Peruvian Indigenous Languages. Shadya Sanchez Carrera, Roberto Zariquiey and Arturo Oncevay
- Wav2pos: Exploring syntactic analysis from audio for Highland Puebla Nahuatl. Robert Pugh, Varun Sreedhar and Francis Tyers
System Descriptions for Shared Task 1: Machine Translation Systems for Indigenous Languages
- BSC Submission to the AmericasNLP 2024 Shared Task. Javier Garcia Gilabert, Aleix Sant, Carlos Escolano, Francesca De Luca Fornaciari, Audrey Mash and Maite Melero
- Experiments in Mamba Sequence Modeling and NLLB-200 Fine-Tuning for Low Resource Multilingual Machine Translation. Dan DeGenaro and Tom Lupicki
- Exploring Very Low-Resource Translation with LLMs: The University of Edinburgh's Submission to AmericasNLP 2024 Translation Task. Vivek Iyer, Bhavitvya Malik, Wenhao Zhu, Pavel Stepachev, Pinzhen Chen, Barry Haddow and Alexandra Birch
- System Description of the NordicsAlps Submission to the AmericasNLP 2024 Machine Translation Shared Task. Joseph Attieh, Zachary William Hopton, Yves Scherrer and Tanja Samardžić
System Descriptions for Shared Task 2: Creation of Educational Materials for Indigenous Languages
- A Comparison of Fine-Tuning and In-Context Learning for Clause-Level Morphosyntactic Alternation. Jim Su, Justin Minh Ho, George Aaron Broadwell, Sarah Moeller and Bonnie J. Dorr
- Applying Linguistic Expertise to LLMs for Educational Material Development in Indigenous Languages. Justin Vasselli, Arturo Martínez Peguero, Junehwan Sung and Taro Watanabe
- JGU Mainz's Submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages. Minh Duc Bui and Katharina von der Wense
- On the Robustness of Neural Models for Full Sentence Transformation. Michael Ginn, Ali Marashian, Bhargav Shandilya, Claire Benet Post, Enora Rice, Juan Vásquez, Marie C. McGregor, Matthew J. Buchholz, Mans Hulden and Alexis Palmer
- The role of morphosyntactic similarity in generating related sentences. Michael Hammond
- The unreasonable effectiveness of large language models for low-resource clause-level morphology: In-context generalization or prior exposure?. Coleman Haley
Previously Published Work That Will Also Be Presented at the Workshop
- A Universal Dependencies Treebank for Highland Puebla Nahuatl. Robert Pugh and Francis Tyers
- Contextual Label Projection for Cross-Lingual Structured Prediction. Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang and Nanyun Peng
- Human Evaluation of the Usefulness of Fine-Tuned English Translators for the Guarani Mbya and Nheengatu Indigenous Languages. Claudio Santos Pinhanez, Paulo Cavalin and Julio Nogima
- "It's how you do things that matters": Attending to Process to Better Serve Indigenous Communities with Language Technologies. Ned Cooper, Courtney Heldreth and Ben Hutchinson
- Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information. Chihiro Taguchi, Jefferson Saransig, Dayana Velásquez and David Chiang
- Kreyòl-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages. Nathaniel Romney Robinson, Raj Dabre, Ammon Shurtz, Rasul Dent, Onenamiyi Onesi, Claire Bizon Monroc, Loïc Grobol, Hasan Muhammad, Ashi Garg, Naome A. Etori, Vijay Murari Tiyyala, Olanrewaju Samuel, Matthew Dean Stutzman, Bismarck Bamfo Odoom, Sanjeev Khudanpur, Stephen D. Richardson and Kenton Murray
Shared Task
This year the workshop presents a shared task with two tracks:
- Shared Task 1: A machine translation shared task on truly low-resource languages.
- Shared Task 2: A shared task on morphological adaptation to generate educational examples.
Important Dates
- Start of the anonymity period: February 17, 2024
- Submission deadline: March 22, 2024
- ARR commitment deadline (without modifications): March 22, 2024
- Notification of acceptance: April 19, 2024
- Camera ready papers due: April 26, 2024
- Workshop: June 21, 2024
Organizing Committee
- Manuel Mager, AWS AI Labs, pywirrarika@gmail.com
- Abteen Ebrahimi, University of Colorado Boulder, abteen.ebrahimi@colorado.edu
- Shruti Rijhwani, Google DeepMind, rijhwani@google.com
- Arturo Oncevay, JP Morgan AI Research, arturo.oncevay@jpmorgan.com
- Luis Chiruzzo, Universidad de la República, Uruguay, luischir@fing.edu.uy
- Robert Pugh, Indiana University, Bloomington, pughrob@iu.edu
- Katharina Kann, University of Colorado Boulder and Johannes Gutenberg University Mainz, katharina.kann@colorado.edu