News
- November 28, 2022 Winners of the AmericasNLP 2022 competition (finally) announced
- October 16, 2022 The MT test inputs are now online
- September 20, 2022 The test inputs are now online!
- September 16, 2022 The input data for the test set will be published on September 20.
- May 23, 2022 Pilot data for the AmericasNLP 2022 competition is online
- May 23, 2022 Information about the Second AmericasNLP Competition: Speech-to-Text Translation for Indigenous Languages of the Americas is online
Keynote speakers: Online NeurIPS 2022 Competition event
(Wed 7 Dec, 5 a.m. to 8 a.m. PST)
Hilaria Cruz, University of Louisville
Challenges in Achieving a Corpus Infrastructure to Advance Research in Computational Linguistics and Natural Language Processing in Native American Languages
Abstract: Natural Language Processing researchers and computational linguists frequently express disappointment and frustration over the lack of corpora in endangered languages that they can use to train and test their language models. This hindrance is caused in large part by a dwindling number of speakers and language keepers who can create new data such as stories, prayers, political speeches, and everyday conversation. Coupled with this is a severe lack of capacity among speakers of endangered languages to prepare corpora, including a shortage of transcribers, annotators, and translators. What can NLP researchers do to help create and facilitate corpora in these languages? A first step would be collaborating with communities to build the capacity to develop corpora together with community members. Furthermore, teaching basic programming courses in local high schools and colleges, working with legacy materials in language archives, and doing fieldwork to collect data alongside community members would greatly enhance the creation of endangered-language corpora for NLP.
Sebastian Ruder, Google
Challenges and Opportunities in NLP for Under-represented Languages
Abstract: Natural language processing (NLP) technology has seen tremendous improvements in recent years, but most of these successes have been concentrated in languages with large amounts of data. In this talk, I will discuss challenges and potential solutions on the way to scaling NLP to more of the world's 7,000 languages. In particular, I will highlight recent progress in NLP for African languages and present methods that are applicable to languages with limited data, such as employing alternative sources of data and multi-modal information.
AmericasNLP
AmericasNLP aims to...
- ...encourage researchers in NLP, computational linguistics, corpus linguistics, and speech around the globe to work on Native American languages.
- ...connect researchers and professionals from underrepresented communities and native speakers of endangered languages with the machine learning and natural language processing communities.
- ...promote research on both neural and non-neural machine learning approaches suitable for low-resource languages.