TY  - JOUR
T1  - Rank Dynamics of Word Usage at Multiple Scales
JF  - Frontiers in Physics
Y1  - 2018
A1  - Morales, José A.
A1  - Colman, Ewan
A1  - Sánchez, Sergio
A1  - Sánchez-Puig, Fernanda
A1  - Pineda, Carlos
A1  - Iñiguez, Gerardo
A1  - Cocho, Germinal
A1  - Flores, Jorge
A1  - Gershenson, Carlos
AB  - The recent dramatic increase in online data availability has allowed researchers to explore human culture with unprecedented detail, such as the growth and diversification of language. In particular, it provides statistical tools to explore whether word use is similar across languages, and if so, whether these generic features appear at different scales of language structure. Here we use the Google Books $N$-grams dataset to analyze the temporal evolution of word usage in several languages. We apply measures proposed recently to study rank dynamics, such as the diversity of $N$-grams in a given rank, the probability that an $N$-gram changes rank between successive time intervals, the rank entropy, and the rank complexity. Using different methods, results show that there are generic properties for different languages at different scales, such as a core of words necessary to minimally understand a language. We also propose a null model to explore the relevance of linguistic structure across multiple scales, concluding that $N$-gram statistics cannot be reduced to word statistics. We expect our results to be useful in improving text prediction algorithms, as well as in shedding light on the large-scale features of language use, beyond linguistic and cultural differences across human populations.
VL  - 6
UR  - https://www.frontiersin.org/article/10.3389/fphy.2018.00045
ER  - 

TY  - JOUR
T1  - Trajectory Stability in the Traveling Salesman Problem
JF  - Complexity
Y1  - 2018
A1  - Sánchez, Sergio
A1  - Cocho, Germinal
A1  - Flores, Jorge
A1  - Gershenson, Carlos
A1  - Iñiguez, Gerardo
A1  - Pineda, Carlos
AB  - Two generalizations of the traveling salesman problem in which sites change their position in time are presented. The way the rank of different trajectory lengths changes in time is studied using the rank diversity. We analyze the statistical properties of rank distributions and rank dynamics and give evidence that the shortest and longest trajectories are more predictable and robust to change, that is, more stable.
VL  - 2018
UR  - https://doi.org/10.1155/2018/2826082
ER  - 

TY  - JOUR
T1  - Generic temporal features of performance rankings in sports and games
JF  - EPJ Data Science
Y1  - 2016
A1  - Morales, José A.
A1  - Sánchez, Sergio
A1  - Flores, Jorge
A1  - Pineda, Carlos
A1  - Gershenson, Carlos
A1  - Cocho, Germinal
A1  - Zizumbo, Jerónimo
A1  - Rodríguez, Rosalío F.
A1  - Iñiguez, Gerardo
AB  - Many complex phenomena, from trait selection in biological systems to hierarchy formation in social and economic entities, show signs of competition and heterogeneous performance in the temporal evolution of their components, which may eventually lead to stratified structures such as the worldwide wealth distribution. However, it is still unclear whether the road to hierarchical complexity is determined by the particularities of each phenomena, or if there are generic mechanisms of stratification common to many systems. Human sports and games, with their (varied but simple) rules of competition and measures of performance, serve as an ideal test-bed to look for universal features of hierarchy formation.
With this goal in mind, we analyse here the behaviour of performance rankings over time of players and teams for several sports and games, and find statistical regularities in the dynamics of ranks. Specifically the rank diversity, a measure of the number of elements occupying a given rank over a length of time, has the same functional form in sports and games as in languages, another system where competition is determined by the use or disuse of grammatical structures. We use a Gaussian random walk model to reproduce the rank diversity of the studied sports and games. We also discuss the relation between rank diversity and the cumulative rank distribution. Our results support the notion that hierarchical phenomena may be driven by the same underlying mechanisms of rank formation, regardless of the nature of their components. Moreover, such regularities can in principle be used to predict lifetimes of rank occupancy, thus increasing our ability to forecast stratification in the presence of competition.
VL  - 5
UR  - http://dx.doi.org/10.1140/epjds/s13688-016-0096-y
ER  - 

TY  - JOUR
T1  - Rank Diversity of Languages: Generic Behavior in Computational Linguistics
JF  - PLoS ONE
Y1  - 2015
A1  - Cocho, Germinal
A1  - Flores, Jorge
A1  - Gershenson, Carlos
A1  - Pineda, Carlos
A1  - Sánchez, Sergio
AB  - Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: “heads” consist of words which almost do not change their rank in time, “bodies” are words of general use, while “tails” are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.
PB  - Public Library of Science
VL  - 10
UR  - http://dx.doi.org/10.1371/journal.pone.0121898
ER  - 