A Possible Homeland of the Indo-European Languages

And Their Migrations According to the Extended Separation Level Recovery Method (Separation Level Recovery under Two Distributions, SLR2D)

By Hans J.J.G. Holm

0. Most educated people have at least a rough idea of what the 'Indo-European' (IE) languages are: the many languages spoken from the northwest of Europe to the east of the Indian subcontinent (historically even as far as Xinjiang in the northwest of China), which are united by a common inherited stock of lexemes (e.g. the numerals or the pronouns) as well as by their grammar. For basic information, see any recent encyclopedia; the pertinent Wikipedia pages are substandard. Pages by non-Indo-Europeanists are highly unreliable, since their authors are often unable to assess the special problems of historical linguistics, lexicostatistics, and prehistory, as addressed in Holm 2007b. Such authors can often be recognized by their reliance on a few secondary sources, or even by racial nonsense.

1. What is still under discussion, however, is the prehistoric development of these languages, i.e. the stages of their subgrouping. A main error in all these discussions was, and still is, the superficial view that a higher number of agreements automatically and proportionally means a closer relationship. This view overlooks that the number of agreements also depends on other quantities, e.g. on how much of the original inventory each language still retains, or on the number of replacements since the separation of any language (cf. Holm 2003). It should be easy to see that languages with heavy losses (e.g. Albanian or Armenian) share fewer agreements than so-called big-corpus languages like Greek or Indo-Aryan, even when closely related, simply because of their smaller database. Regrettably, all this is mostly overlooked or ignored.
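The dependence described above can be made concrete with a small calculation. The following is a minimal sketch, not the author's published procedure: it assumes the simplest hypergeometric picture, in which two languages each retain a random subset of the N features present at their separation, so that the expected number of agreements is k_a * k_b / N. The function names and all figures are hypothetical and chosen for illustration only.

```python
import random


def expected_agreements(n_original, k_a, k_b):
    """Mean of the hypergeometric distribution: agreements expected when
    languages A and B retain k_a and k_b of n_original features at random."""
    return k_a * k_b / n_original


def simulate_agreements(n_original, k_a, k_b, seed=0):
    """Draw one random outcome: count the retained features that coincide."""
    rng = random.Random(seed)
    features = range(n_original)
    retained_a = set(rng.sample(features, k_a))
    retained_b = set(rng.sample(features, k_b))
    return len(retained_a & retained_b)


# Hypothetical figures: both pairs separated at the SAME level (1000 features).
well_attested = expected_agreements(1000, 800, 800)  # two big-corpus languages
heavy_losses = expected_agreements(1000, 800, 300)   # one language with heavy losses

print(well_attested, heavy_losses)
print(simulate_agreements(1000, 800, 300))
```

Although both hypothetical pairs separated at the same level, the expected agreement counts (640 versus 240) differ by a factor of more than two and a half, which is exactly the kind of difference that must not be read as a difference in relationship.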

1.1. In fact, these parameters depend on each other hypergeometrically, in mathematical terms, and must be transformed accordingly. Only through this SLRD transformation do we recover the original state, the so-called 'separation level': the number of features that must have been held in common at the time of separation. These figures, for the 91 pairs among the 14 attested branches of IE, were published in Holm 2000.
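As a rough illustration of such a transformation, the sketch below inverts the hypergeometric relation: given the numbers of features retained in two languages and the number of agreements between them, it estimates how many features must have been present at separation. It assumes the textbook maximum-likelihood estimator of the population size in the hypergeometric distribution (the integer part of k_a * k_b / a). Whether the published SLRD procedure uses exactly this form is not stated here, and the function name and figures are hypothetical.

```python
from math import comb


def separation_level(k_a, k_b, agreements):
    """Estimate the number of features present when A and B separated.

    k_a, k_b   -- features retained (attested) in language A and in language B
    agreements -- features attested in both languages
    Uses the integer part of k_a * k_b / agreements, the textbook
    maximum-likelihood estimate of the hypergeometric population size.
    """
    if agreements <= 0:
        raise ValueError("at least one agreement is needed for an estimate")
    return (k_a * k_b) // agreements


# Hypothetical pairs with very different raw agreement counts.
print(separation_level(800, 800, 640))
print(separation_level(800, 300, 240))

# Number of unordered pairs among 14 branches, as stated in the text.
print(comb(14, 2))
```

Both hypothetical pairs yield the same estimate of 1000, although their raw agreement counts (640 versus 240) differ widely; the last line confirms that 14 branches give 91 pairs.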

1.2. As the number of original features can only decrease in the course of historical events, the separation levels yield an unambiguous sequence of separations (NOT 'glottochronology'), which can be visualized by a >family-tree, here with the representations of the IE words for 'hand'. This is, of course, a simplification. It can and should then be applied to the different hypotheses on the zone of origin ("staging area", "Urheimat") of the speakers of Proto-Indo-European, including their migrations and their final establishment in concrete geographical areas.
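How separation levels translate into a sequence can be sketched as follows. This is an illustration only, under the assumption stated above that the stock of original features can only shrink, so that a pair with a higher separation level must have separated earlier. The branch labels and figures are hypothetical placeholders, not values from Holm 2000, and turning the ordered pairs into a full tree requires further steps not shown here.

```python
def separation_sequence(levels):
    """Order language pairs from earliest to latest separation.

    levels -- dict mapping a (branch, branch) pair to its separation level.
    A higher level means more of the original features were still present
    at the split, hence an earlier separation (assumption: features are
    only ever lost, never regained).
    """
    return sorted(levels, key=lambda pair: levels[pair], reverse=True)


# Hypothetical separation levels for three unnamed branches.
levels = {
    ("A", "B"): 900,
    ("A", "C"): 1200,
    ("B", "C"): 1190,
}

for pair in separation_sequence(levels):
    print(pair, levels[pair])
```

In this made-up case, branch C parts from the other two first (levels near 1200), while A and B separate from each other later (level 900).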

1.3. Linguistically proven contacts between the earliest stages of Indo-European and Uralic strongly suggest a homeland in the forest steppes north of the Black Sea ('Pontus'; cf. e.g. Anthony 2007). See the map >IE diversification map. Note that the migration routes have not yet been convincingly proven.

2. Working backward from the safe ground of the Hittite historical data, it seems clear that the IE expansion roughly parallels the adoption of bronze metallurgy, of draught oxen and wagons, and of mounded graves (burial mounds). This does not mean (!) that speakers of Indo-European invented these techniques and customs, only that they made extensive use of them. As herding nomads with a high proportion of horses, they had to be good riders, which in turn gave them tactical advantages in raids and warfare. Attempts to prove or disprove horseback riding from tooth wear caused by bits overlook that there are dozens of bitless bridles. The migrations could just as well have taken considerably fewer centuries, or have happened somewhat earlier or later.

3. Another point of disagreement is the question of whether the so-called Anatolian languages, in particular Hittite,
- were full members of Proto-IE,
- or whether Proto-IE reached its complete development only after the separation of Hittite.
In general, it is a misunderstanding that methods from bioinformatics could speak in favor of one or the other hypothesis, for the reasons addressed in [4]. Moreover, cladistic researchers simply assume a priori that Hittite did not share in the final development of IE, and use this language as a so-called 'outgroup' to define the starting point of their originally unrooted (!) graph, not vice versa!

4. The currently fashionable phylogeny reconstructions, which mechanically misuse computer packages from the field of biological systematics, rest on at least one of two erroneous beliefs:
4.1. The primitive similarity principle, i.e. that languages are the more closely related the more cognates they share (erroneously regarded as 'evolutionary distance'). This completely ignores the interdependencies outlined in chapter [1] (the so-called 'Proportionality Trap'; cf. Holm 2003). Or even
4.2. That words in languages change like clocks, at fixed rates 'by' time. This is obviously wrong, a rehash of the obsolete glottochronology. Look up any word in an etymological dictionary and find the reason for its existence: it will never be 'time', but historical (e.g. cultural, technical, military) events, which nobody can ever foresee, let alone compute. English, for example, did not replace about 50 % of its originally Germanic vocabulary 'by time' but, as educated speakers of English know, through Norman dominance after the Battle of Hastings, besides a long-lasting educational background of Latin. That the amount of change is lower in degree in a so-called "basic vocabulary" does not at all alter the socio-historical reasons and causes, in particular their uncomputability. Even in the basic 100-word list of English, 6% are loans from Viking dialects (cf. Holm 2007c). Journalists cannot be blamed for not understanding what is really going on in these computations.

5. References:
- Holm, Hans J. (in progress): >Holm's personal maps of European Prehistory From the Biscaya to the Caspi - from the Ice Age to the Middle Ages, in 27 time slices - updated from time to time -
- Holm, Hans J. (2016, in progress): >Meine "Swadesh-Liste" indogermanisch [My Indo-European "Swadesh list"] (M. Swadesh's 1971 posthumous = final revision). With translations, as unmarked as possible, in 17 representative extinct and living Indo-European languages. - For statistical purposes only! -
- Holm, Hans J. (2017): Steppe Homeland of Indo–Europeans Favored by a Bayesian Approach with Revised Data and Processing. Glottometrics 37, 54-81. http://www.ram-verlag.eu/journals-e-journals/glottometrics/ - Updated Bayesian approach, with archeological and linguistic parallels -
[Abstract: Despite dozens of hypotheses, the origin and development of the Indo-European language family are still under debate. A well-known glottochronological approach to this problem using Bayesian computation of language divergence dates claims to have provided evidence for the period of Neolithic expansion known as the “Anatolian hypothesis.” The dates have met with considerable criticism from other disciplines. I decided to investigate the evidence for these dates by replicating and analyzing the approach. During this process, a further approach located a date of origin between 3950 and 4740 BC. One of the insights of that study was that previous results were significantly disrupted by poorly attested languages, which thus were consistently removed step by step. This paper supports this finding by confronting data from the previous approaches and my own updated dataset. The resulting date is around 4800 BC. However, the topology of the trees differed considerably over the course of several hundreds of tests. This problem was avoided in previous approaches by rigorous topological forcing. Here we apply a west–east dichotomy from a previous purely lexicostatistical (i.e. without times) approach based on the best available Indo-European dataset of approx. 1,100 verbal roots, which produces dates around 4100 BC. These dates reflect the most recent state of knowledge in linguistics, archeology and genetics in favor of the Steppe hypothesis. A new synopsis of the wheel problem, a primary argument for the divergence date, shows that not one but three different Indo-European denotations coincide in different areas with different types of wheel–axle constructions. Archeological cultures likely to have been affected by the migrations are presented visually at the end of this paper. Keywords: Indo-European, glottochronology, Urheimat, Bayes' reasoning, Swadesh list.]
- Holm, Hans J. (2016, in progress): >Indo-European Universal Concepts List (M. Swadesh's 1971 = final meanings). With “unmarked” translations in 17 representative extinct and modern IE languages. - For lexicostatistical purposes only! -
- Holm, Hans J. (2011b): "Swadesh lists" of Albanian Revisited and Consequences for Its Position in the Indo-European Languages. The Journal of Indo-European Studies 39-1&2. - English and updated version (note >Corrigenda) -
[Abstract: In the last decade, several scholars claimed to have finally solved the subgrouping of Indo-European by new lexicostatistical attempts. The public of course was not able to perceive the questionable outcomes, of which the different and idiosyncratic positions of Albanian are particularly conspicuous. One reason for this is the inadequate methods, simply copied from bioinformatics (cf. Holm, H. J. 2007). That defective data may contribute a great deal to these mistakes is demonstrated here for the first time by analysing the Albanian part of three representative lists frequently employed in these studies: Thirteen percent of the data on these lists contains errors, and this mixes inextricably with the overlooked stochastic dispersion. Seventeen new etymologies are proposed; however, about thirty percent of the list remains unsolved or questionable. Moreover, the high amount of differently changing replacements in Albanian is one more compelling argument against the rate assumption in glottochronology.]
- Holm, Hans J. (2011a): Archäoklimatologie des Holozäns: Ein durchgreifender Vergleich der "Wuchshomogenität" mit der Sonnenaktivität und anderen Klimaanzeigern ("Proxies") [Archaeoclimatology of the Holocene: a thorough comparison of "growth homogeneity" with solar activity and other climate indicators ("proxies")]. - Mid and late Holocene climate change in Greenland ice cores compared to Alpine tree lines - Archäologisches Korrespondenzblatt 41-1:119-132. For the pdf, please click >Archäoklimatologie
[Abstract: Recent claims about the validity of both the homogeneity of tree-ring widths of Central European oaks and two proxies for solar activity do not stand up to our thorough comparison. This holds in particular for their alleged climatic meaning, e.g. regarding precipitation. Better correspondences, on the other hand, seem to be recognizable for the last 9,000 years between the alpine tree lines and the temperature evidence of the NGRIP ice core.]
- Holm, Hans J. (2010): Review of: Frank Sirocko (ed.), "Wetter, Klima, Menschheitsentwicklung, Von der Eiszeit bis ins 21. Jahrhundert" ["Weather, climate, human development, from the Ice Age to the 21st century"]. Please click >False climate presentations for (pre-)history (In German)
- Holm, Hans J. (2009): Albanische Basiswortlisten und die Stellung des Albanischen in den indogermanischen Sprachen [Albanian basic word lists and the position of Albanian among the Indo-European languages]. In: Zeitschrift für Balkanologie, issue 45-2. (In German; for the updated English version see 2011b above)
- Holm, Hans J. (2008): The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages. In: Christine Preisach, Hans Burkhardt, Lars Schmidt-Thieme, Reinhold Decker (eds.): Data Analysis, Machine Learning, and Applications. Proc. of the 31st Annual Conference of the German Classification Society (GfKl), Univ. of Freiburg, March 7-9, 2007. Springer-Verlag, Heidelberg-Berlin: 629-636. For a glance at the raw MS click >SLRD.pdf - Solving distribution problems in corpora of natural languages -> improved IE "Family Tree" -
[Abstract: Linguists tend to assume that languages are the more closely related the more features, in particular common innovations, they share. Holm (2003) demonstrated that this assumption is erroneous, because these researchers miss the fact that the number of shared agreements depends stochastically on three further parameters. Only with the help of the maximum likelihood estimator of the hypergeometric distribution are we able to find the number of features that must have been present in both languages at the era of their separation. In this way we obtain a chain of separations for a family of languages for which the appropriate data are available. When the method was applied to the data of Pokorny's IEW, the resulting late separation of Hittite, Albanian and Armenian could well have been caused by their central position and therefore did not appear suspicious. Only when, in a further application to Mixe-Zoquean data, the same observation occurred, namely that poorly documented languages appeared to separate late, could a systematic bias be suspected. This work reveals the reason for this bias, which is peculiar to lists of natural languages, as opposed to stochastically normally distributed test cases like those presented in Holm 2007a. As a more modern and linguistically reliable database, the new "Lexikon der indogermanischen Verben", 2nd ed. (Rix et al. 2001), was the best choice. Indeed, the suspicion was confirmed, and it is shown how these biased data can be correctly projected to true separation amounts. The result is a partly new chain of separations for the main Indo-European branches, which fits well with the grammatical facts as well as with the geographical distribution of these branches. In particular, it clearly demonstrates that the Anatolian languages did not part first, and thereby refutes the Indo-Hittite hypothesis.]
- Holm, Hans J. (2007a): The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages. Slide show of the presentation at Holm_Indo-European_Subgrouping_by_SLRD_Freiburg_2007_21155000/. (In English)
- Holm, Hans J. (2007b): Indo-European Subgrouping lexicostatistical 2-Distributions deutsch. Slide show presented at the Linguistics Department of the University of Bonn, available via https://www.academia.edu/11292156/Indo-European_Subgrouping_lexicostatistical_2-Distributions_deutsch. (Partly in English)
- Holm, Hans J. (2007c): The new Arboretum of Indo-European "Trees" - Can new Algorithms Reveal the Phylogeny and even Prehistory of IE? In: Journal of Quantitative Linguistics 14-2:167-214. (For a glance at the MS, click >Arboretum IE trees.pdf) - Update to 2005; newer lexicostatistical attempts in language subgrouping -
[ABSTRACT: Specialization in the fields of linguistics vs. biological informatics leads to growing misunderstandings and false results caused by poor knowledge of the essential conditions of the applied respective methods and material. These are analyzed and the insights used to assess the recent glut of attempts in establishing new phylogenies of Indo-European languages.]
- Holm, Hans J. (2007d): Language Subgrouping. In: Grzybek, P. & R. Köhler (eds.), Exact Methods in the Study of Language and Text. Dedicated to Professor Gabriel Altmann on the occasion of his 75th birthday. [Quantitative Linguistics 62]. Berlin: de Gruyter: 225-235. - Handling scatter in multiple subgroupings -
[Abstract: After many years of testing, and facing many competing methods, the Separation Level Recovery method (Holm 2000, passim) has been refined in terms of its stochastic and linguistic data requirements. It has been tested on how stochastic scatter can be distinguished from bad data and how data should be improved.]
- Holm, Hans J. (2005): Genealogische Verwandtschaft [Genealogical relationship]. In 'Quantitative Linguistics; An international handbook' [HSK-Series, vol. 27, chapter 45]. Berlin: de Gruyter. - Lexicostatistical approaches to the subgrouping of languages in the 20th century; for an update see 2007c -
[Contents: 1. When are languages "related"? 2. Data assessment; 3. Measures of relationship; 3.1. Synchronic measures of relationship; 3.2. Diachronic measures of relationship; 4. Structuring genealogical dependencies.]
- Holm, Hans J. (2003): The proportionality trap, or: what is wrong with lexicostatistical subgrouping? In: Indogermanische Forschungen 108: 39-47. - The basics, employing only the hypergeometric distribution; also for non-mathematicians -
[ABSTRACT: With the help of an experiment it is shown that the raw amount of agreements (e.g. cognate numbers) between any two languages can never express their degree of genealogical relationship. It is then demonstrated, how, by taking into account all statistical determining parameters, the original level of any pair and further the correct subgroupings can be recovered].
- Holm, Hans J. & Embleton, Sheila (2001): Review of 'Mathematical foundations of Linguistics' (by Hubey, H.Mark, 1999, LINCOM handbooks in Linguistics 10, Muenchen: LINCOM). In: Journal of Quantitative Linguistics 8-2:149-62.
- Holm, Hans J. (2000): Genealogy of the Main Indo-European Branches Applying the Separation Base Method. In: Journal of Quantitative Linguistics 7-2:73-95. (In German) - Application to Pokorny's "Indogermanisches Etymologisches Wörterbuch"; for updates see 2007c,d -
[Abstract: In former quantitative analyses of genealogical relations between languages, the systematic bias caused by substitutions has not been adequately eliminated, which could only lead to false results. Only after the huge, and only thereby statistically significant, data material of J. Pokorny's "Indogermanisches Etymologisches Wörterbuch" (Bern: Francke, 1959) had been registered in N. Bird's "Distribution of Indo-European root morphemes" (Wiesbaden: Harrassowitz, 1982) did it become possible, in spite of its known shortcomings, to estimate with the help of a robust estimator the number of lexemes present at the era of separation for every pair of sister languages, and consequently to infer the chain of separations.]

