01720nas a2200241 4500008004100000022001400041245009300055210006900148300001100217490000800228520098300236653002901219653001401248653002001262653001801282100002001300700002901320700002101349700001801370700001901388700002301407856004801430 2019 eng d a0378-437100aRank-frequency distribution of natural languages: A difference of probabilities approach0 aRankfrequency distribution of natural languages A difference of a1217950 v5323 aIn this paper we investigate the time variation of the rank k of words for six Indo-European languages using the Google Books N-gram Dataset. Based on numerical evidence, we regard k as a random variable whose dynamics may be described by a Fokker–Planck equation which we solve analytically. For low ranks the distinct languages behave differently, maybe due to the syntax rules, whereas for k>50 the law of large numbers predominates. We analyze the frequency distribution of words using the data and their adjustment in terms of time-dependent probability density distributions. We find small differences between the data and the fits due to conflicting dynamic mechanisms, but the data show a consistent behavior with our general approach. For the lower ranks the behavior of the data changes among languages presumably, again, due to distinct dynamic mechanisms. We discuss a possible origin of these differences and assess the novel features and limitations of our work.10aFokker–Planck equation10aLanguages10aMaster equation10aRank dynamics1 aCocho, Germinal1 aRodríguez, Rosalío, F.1 aSánchez, Sergio1 aFlores, Jorge1 aPineda, Carlos1 aGershenson, Carlos uhttps://doi.org/10.1016/j.physa.2019.121795