A Possible Homeland of the Indo-European Languages

And Their Migrations, According to the Extended Separation Level Recovery Method (Separation Level Recovery under Two Distributions, SLR2D)


By Hans J. J. G. Holm

0. Most educated people have at least a rough idea of what the 'Indo-European' (IE) languages are. They are the many languages spoken from northwestern Europe to the east of the Indian subcontinent (historically even as far as Xinjiang in northwestern China). They are united by a common inherited stock of lexemes (e.g. the numeral system and the pronouns) as well as by their grammar. For basic information, see any recent encyclopedia; the pertinent Wikipedia pages are substandard. Pages written by non-Indo-Europeanists are highly unreliable. Their authors are often unable to assess the special problems of historical linguistics, lexicostatistics, and prehistory addressed in Holm 2007b. Such authors can often be recognized by their citing only a few secondary sources, or even by racial nonsense.

1. What is still under discussion, however, is the prehistoric development of these languages, i.e. the stages of their subgrouping. A major error in all these discussions was, and still is, the superficial view that a higher number of agreements automatically and proportionally means a closer relationship. This view overlooks that the agreements depend on, e.g., the number of original features each language still retains, that is, on the number of replacements it underwent after its separation (cf. Holm 2003). It should be evident that languages with heavy losses (e.g. Albanian or Armenian) share fewer agreements than so-called big-corpus languages like Greek or Indo-Aryan, simply because of their smaller database, even when they are closely related. Regrettably, all this is mostly overlooked or ignored.
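The dependence described above can be illustrated with a minimal simulation. It rests on a simplifying assumption of our own, which is not necessarily Holm's exact model. Two sister languages start from a common pool of h features at separation. Each then keeps a random subset of that pool, a and b features respectively, independently of the other. Under this assumption the number of shared agreements k is hypergeometric with mean a·b/h. Two pairs with the same separation level but different corpus sizes therefore share very different raw counts. The function names and all figures are invented for illustration.

```python
import random


def expected_agreements(h: int, a: int, b: int) -> float:
    """Mean of the hypergeometric count of shared features (assumed model).

    h: features common to both languages at separation (the separation level)
    a, b: features each language still retains from that pool today
    """
    return a * b / h


def simulate_agreements(h: int, a: int, b: int, rng: random.Random) -> int:
    """One random outcome: each language keeps an independent random subset of the pool."""
    kept_a = set(rng.sample(range(h), a))
    kept_b = set(rng.sample(range(h), b))
    return len(kept_a & kept_b)


if __name__ == "__main__":
    h = 1000  # same (invented) separation level for both pairs
    print(expected_agreements(h, 800, 900))  # two "big-corpus" languages: 720.0
    print(expected_agreements(h, 200, 250))  # two heavily reduced languages: 50.0

    rng = random.Random(0)
    runs = [simulate_agreements(h, 200, 250, rng) for _ in range(2000)]
    print(sum(runs) / len(runs))  # close to 50
```

By construction both pairs are equally close (same h), yet their raw counts differ by a factor of more than fourteen. Reading relationship directly off raw agreements would wrongly rank the big-corpus pair as far closer.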

1.1. In mathematical terms, these parameters depend on each other hypergeometrically, so the raw counts must be transformed. Only this necessary SLRD transformation recovers the original state, i.e. the number of features that must have been shared at the time of separation: the so-called "separation level". These figures for the 91 pairs among 14 attested branches of IE were published in Holm 2000.
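One way to carry out such a transformation is sketched below, under assumptions of our own. The k agreements of a pair are treated as hypergeometric: there is a common pool of h features at separation, one language retains a of them, and the other independently retains b. The pool size h is then estimated by maximum likelihood. For this model the maximum-likelihood estimate is the familiar capture-recapture value floor(a·b/k), and the brute-force search confirms it on small numbers. The function names are hypothetical. This is not Holm's published SLRD code, which, according to the Holm 2008 abstract below, also corrects for a bias peculiar to natural-language lists.

```python
from fractions import Fraction
from math import comb


def separation_level(a: int, b: int, k: int) -> int:
    """Closed-form ML estimate of h from retained counts a, b and k agreements.

    When a*b/k is an integer, h and h - 1 have equal likelihood; this returns the larger.
    """
    if k <= 0:
        raise ValueError("with no shared agreements the estimate is unbounded")
    return (a * b) // k


def likelihood(h: int, a: int, b: int, k: int) -> Fraction:
    """Hypergeometric P(k | h, a, b) without the factor C(a, k), which does not depend on h."""
    return Fraction(comb(h - a, b - k), comb(h, b))


def separation_level_brute_force(a: int, b: int, k: int, upper: int) -> int:
    """Check the closed form by maximising the exact likelihood over h."""
    lower = a + b - k  # h cannot be smaller than the union of both retained sets
    return max(range(lower, upper + 1), key=lambda h: likelihood(h, a, b, k))


if __name__ == "__main__":
    print(separation_level(60, 50, 7))                     # 428
    print(separation_level_brute_force(60, 50, 7, 1500))  # 428
```

In this model a higher separation level means more material was still shared at the split, i.e. a later separation. Dividing by k removes exactly the dependence on corpus size from which raw agreement counts suffer.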

1.2. The number of original linguistic features can only decrease over time (e.g. through historical influences). There is therefore a clear sequence of separations (NOT 'glottochronology'), which can be visualized as a so-called "family tree". It is illustrated here by the oldest words for 'hand' in the 12 main Indo-European language groups. Such a tree is, of course, a simplification. It can and should be applied to the various hypotheses about the area of origin ("staging area", "Urheimat") of the speakers of Proto-Indo-European, including their migrations up to their final settlement in concrete geographical areas.
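Given separation levels for all pairs, a sequence of separations can be read off by first joining the pair with the highest level (the latest split) and then continuing downward. The sketch below is a hypothetical illustration. It compares groups by average linkage, which is our own choice and not necessarily the procedure behind Holm's published trees. The four placeholder branches A to D and their levels are invented toy values, not results.

```python
from itertools import combinations
from statistics import mean


def separation_chain(levels):
    """Join branches in order of decreasing separation level (latest split first).

    levels: dict mapping frozenset({x, y}) to the separation level of branches x and y.
    Returns the nested-tuple tree and the list of (merged group, level) steps.
    """
    leaves = sorted({name for pair in levels for name in pair})
    clusters = [(leaf, (leaf,)) for leaf in leaves]  # (subtree, member leaves)

    def group_level(c1, c2):
        # average linkage: mean separation level over all cross-group leaf pairs
        return mean(levels[frozenset((x, y))] for x in c1[1] for y in c2[1])

    steps = []
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: group_level(clusters[ij[0]], clusters[ij[1]]))
        c1, c2 = clusters[i], clusters[j]
        steps.append(((c1[0], c2[0]), group_level(c1, c2)))
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0], steps


if __name__ == "__main__":
    toy = {  # invented separation levels for four placeholder branches
        frozenset("AB"): 900, frozenset("AC"): 600, frozenset("BC"): 620,
        frozenset("AD"): 300, frozenset("BD"): 310, frozenset("CD"): 320,
    }
    tree, steps = separation_chain(toy)
    print(tree)                           # ('D', ('C', ('A', 'B')))
    print([level for _, level in steps])  # [900, 610, 310]
```

The levels fall from step to step, so the result reads as a relative order of separations without any calendar dates. That is the distinction from glottochronology which the text insists on.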

1.3. Linguistically proven contacts between early stages of Indo-European and Uralic strongly suggest a homeland in the forest steppes north of the Black Sea ('Pontus', cf. e.g. Anthony 2007). Of course, the branching shown in the family tree above must actually have taken place in real geography, for example as on the IE diversification map. Note, however, that the migration routes have not yet been convincingly proven.

2. Working backward from the secure ground of the Hittite historical data, it seems clear that the IE expansion roughly parallels the adoption of bronze metallurgy, draught oxen and wagons, and mounded graves (burial mounds). This does not mean (!) that the speakers of Indo-European invented these techniques and customs, only that they made extensive use of them. As herding nomads with a high proportion of horses, they must have been good riders, which in turn gave them tactical advantages in raids and warfare. Attempts to prove or disprove horseback riding from bit wear on teeth overlook the fact that dozens of bitless bridles exist. The migrations could also have taken considerably fewer centuries, or have happened somewhat earlier or later.

3. Another point of disagreement is whether the so-called Anatolian languages, in particular Hittite,
- were full members of Proto-IE,
- or whether Proto-IE achieved its complete development only after the separation of Hittite.
In general, it is a misunderstanding to believe that methods from bioinformatics could decide in favor of one or the other hypothesis, for the reasons addressed in [4]. Furthermore, cladistic researchers simply assume a priori that Hittite did not share the final development of IE. They use this language as a so-called 'outgroup' to fix the root of their originally unrooted (!) graph. The root is thus an input to their analysis, not a result of it!
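The role of the outgroup can be made concrete. In the hypothetical sketch below, an unrooted four-leaf graph is rooted on the branch leading to whichever leaf is declared the outgroup. The graph has placeholder leaves A to D and internal nodes x and y, all invented. The same unrooted graph yields a different "first separation" depending solely on that declaration. This is the point made above: the root is chosen before the analysis, not found by it.

```python
from collections import defaultdict


def root_at_outgroup(edges, outgroup):
    """Turn an unrooted tree, given as undirected edges, into nested tuples
    rooted on the edge that leads to the leaf `outgroup`."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    (anchor,) = adjacency[outgroup]  # a leaf has exactly one neighbour

    def build(node, parent):
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node  # reached a leaf
        return tuple(build(child, node) for child in children)

    return (outgroup, build(anchor, outgroup))


if __name__ == "__main__":
    unrooted = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
    print(root_at_outgroup(unrooted, "A"))  # ('A', ('B', ('C', 'D')))
    print(root_at_outgroup(unrooted, "C"))  # ('C', (('A', 'B'), 'D'))
```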

4. The currently fashionable phylogeny reconstructions, which mechanistically misuse computer packages from biological systematics, rest on at least one of two erroneous beliefs:
4.1. The primitive similarity principle: languages are supposedly more closely related the more cognates they share (erroneously regarded as 'evolutionary distance'). This completely ignores the interdependencies outlined in section [1] (the so-called 'Proportionality Trap', cf. Holm 2003). Or even:
4.2. The belief that words change at clock-like rates 'by' time. This is obviously wrong, and merely a rehash of the obsolete glottochronology. Look up any word in an etymological dictionary and find the reason for its existence. It will never be 'time', but historical (e.g. cultural, technical, military) events, which nobody can ever foresee, let alone compute. English, for example, did not replace about 50% of its originally Germanic vocabulary 'by time'. As educated speakers of English know, it did so through Norman dominance after the Battle of Hastings, together with a long-lasting educational tradition in Latin. The rate of change is lower in a so-called "basic vocabulary". That does not at all alter the socio-historical reasons and causes, in particular their uncomputability. Even in the basic 100-word list of English, 6% of the items are loans from Viking dialects (cf. Holm 2007c). Journalists cannot be blamed for not understanding what is really going on in these computations.
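The consequence of the clock assumption criticized in 4.2 can be shown with a little arithmetic. Classic glottochronology (Swadesh and Lees) dates a split from the shared fraction c of a basic list as t = ln c / (2 ln r). Here r is an assumed constant retention rate per millennium. The value r = 0.86, often quoted for the 100-item list, is used below purely as an assumption, and the function names are hypothetical. For a single language the same clock reads t = ln c / ln r. On that reading, the one-off 6% of Viking loans in the English 100-word list would be misread as roughly 400 years of "time".

```python
import math

ASSUMED_RETENTION = 0.86  # per millennium; classic value quoted for the 100-item list


def split_depth(shared_fraction: float, r: float = ASSUMED_RETENTION) -> float:
    """Swadesh-Lees divergence time in millennia between two languages."""
    return math.log(shared_fraction) / (2 * math.log(r))


def single_language_depth(retained_fraction: float, r: float = ASSUMED_RETENTION) -> float:
    """Time the same clock would read off for a single language, in millennia."""
    return math.log(retained_fraction) / math.log(r)


if __name__ == "__main__":
    # A one-off event replacing 6% of the list is converted into "elapsed time".
    print(round(single_language_depth(0.94) * 1000))  # 410 (years)
    # Sanity check: two languages each keeping 86% share 0.86**2, i.e. ~1.0 millennium.
    print(split_depth(0.86 ** 2))
```

The arithmetic is trivial. The point is that the formula has no means of distinguishing such a historical event from the mere passage of time.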

5. Publications (see also Google Scholar, Academia.edu, and ResearchGate.net):
ORCID iD: orcid.org/0000-0001-9527-0553
- Hans J.J.G. Holm's running map notes of (pre)history, from the Bay of Biscay to the Caspian Sea and from the Last Glacial Period to the Middle Ages. They come in 27 time slices, each with a running climate bar (based on Holm 2011a) and a running culture bar, mostly in the source languages. View as Holm's Historical Time Slices.pdf. I am trying to keep these maps updated.
- Holm, Hans J. J. G. (2019): The Earliest Wheel Finds, their Archeology and Indo-European Terminology in Time and Space, and Early Migrations around the Caucasus. Series Minor 43. Budapest: ARCHAEOLINGUA ALAPÍTVÁNY. ISBN 978-615-5766-30-5. With 306 references, six greyscale and colour images, and miniature images within the table of 130 representative wheel finds, including brand-new ones in Germany and Western China. New! Did the Proto-Indo-Europeans invent the wheel?
[Abstract: The role that the cartwheel played in the life of the Indo-Europeans has primarily been studied from the perspective of specialists, often without sufficient consideration of the other fields involved. We therefore compile an archaeological list of the oldest wheel finds (before ca. 2000 BCE) with regard to the most accurate dating, location, and construction type. It now contains 130 representative finds between the North Sea, Central Asia, and India. We then examine the five wheel designations in the Indo-European main families, especially in terms of onomasiological aspects. To relate both results to the development of the Indo-European languages, a chronological scaffolding is needed, for which we bring in a recent glottochronological calculation of the Indo-European subdivisions. This already leads to conclusions about the age of some designations, as well as to clear parallels with certain construction types. In addition, two often-discussed questions are addressed on this updated basis. With regard to the separation of the (Indo-European) Anatolians and Tocharians, there are many indications that it took place around the Caucasus, from the eastern primary branch. Finally, the hypotheses of an "invention" of the wheel are replaced by the far more realistic one of a long-lasting development in a wide communication area.] Correction: I apologize for the typo in footnote 5; please correct the name to S(usanne) Kuprella.
- Holm, Hans J. (2017): Steppe Homeland of Indo-Europeans Favored by a Bayesian Approach with Revised Data and Processing. Glottometrics 37: 54-81. Open access at http://www.ram-verlag.eu/journals-e-journals/glottometrics/ - Updated Bayesian approach, with archeological and linguistic parallels -
[Abstract: Despite dozens of hypotheses, the origin and development of the Indo-European language family are still under debate. A glottochronological approach to this problem using Bayesian computation of language divergence dates (published in Science 2012/2013) claimed to have provided evidence for the period of the Neolithic expansion, known as the "Anatolian hypothesis". These dates have met with considerable criticism from other disciplines. I decided to investigate the alleged evidence for these dates by replicating and analyzing the approach with my own updated dataset. This initially resulted in an origin around 4800 BC, although the structure of the trees varied considerably across several hundred tests. Previous approaches avoided this problem by rigorous topological forcing. Here we applied a west-east dichotomy from a previous, purely lexicostatistical approach (i.e. without time estimates). That approach was based on the best available Indo-European dataset of approx. 1,100 verbal roots, and with it the method produced dates around 4100 BC. During these tests, a further study (Language, 2015) located the date of origin between 4740 and 3950 BC. One of the insights of that study was that previous results were significantly distorted by poorly attested languages, which were therefore removed step by step. These dates reflect the most recent state of knowledge in linguistics, archeology and genetics, in favor of the Steppe hypothesis. A new archaeological-linguistic comparison of the wheel terminology, a primary argument for the divergence date, shows that different Indo-European designations coincide in different areas with different types of wheel-axle constructions. Finally, the cultures lying on the possible dispersal routes and times are superimposed as an overlay on the calculated phylogenetic tree, without, however, postulating their Indo-European character in every case.]
- Holm, Hans J.J.G. (2016): My Indo-European "Swadesh List", based on the latest edition (M. Swadesh 1971, published posthumously). Its so-called "unmarked" translations in 17 representative extinct and living Indo-European languages yielded 815 different word stems. About 150 references. - Developed mainly for statistical work on the 12 Indo-European main branches! -
- Holm, Hans J. (2011b): "Swadesh lists" of Albanian Revisited and Consequences for Its Position in the Indo-European Languages. The Journal of Indo-European Studies 39-1&2. - English and updated version (note the Corrigenda) -
[Abstract: In the last decade, several scholars claimed to have finally solved the subgrouping of Indo-European with new lexicostatistical attempts. The public, of course, was not able to recognize the questionable outcomes, among which the differing and idiosyncratic positions of Albanian are particularly conspicuous. One reason for this is the inadequate methods, simply copied from bioinformatics (cf. Holm, H. J. 2007). This paper demonstrates for the first time that defective data may contribute a great deal to these mistakes. It does so by analysing the Albanian part of three representative lists frequently employed in these studies. Thirteen percent of the data in these lists contain errors, and these mix inextricably with the overlooked stochastic dispersion. Seventeen new etymologies are proposed; however, about thirty percent of the list remains unsolved or questionable. Moreover, the high number of differently changing replacements in Albanian is one more compelling argument against the rate assumption in glottochronology.]
- Holm, Hans J. (2011a): Archäoklimatologie des Holozäns: Ein durchgreifender Vergleich der "Wuchshomogenität" mit der Sonnenaktivität und anderen Klimaanzeigern ("Proxies") [Archaeoclimatology of the Holocene: A thorough comparison of the "growth homogeneity" with solar activity and other climate indicators ("proxies")]. Archäologisches Korrespondenzblatt 41-1: 119-132. - Mid and late Holocene climate change in Greenland ice cores compared to Alpine tree lines - For the PDF, see Holm Archäoklimatologie.
[Abstract: Recent claims about the validity of the homogeneity of tree-ring widths of Central European oaks, and of two proxies for solar activity, do not withstand our thorough comparison. This holds in particular for their alleged climatic meaning, e.g. with regard to precipitation. Better correspondences, on the other hand, seem recognizable for the last 9,000 years between the Alpine tree lines and the temperature evidence of the NGRIP ice core.]
- Holm, Hans J. (2010): Review of Frank Sirocko (ed.), "Wetter, Klima, Menschheitsentwicklung. Von der Eiszeit bis ins 21. Jahrhundert" ["Weather, climate, human development: from the Ice Age to the 21st century"]. (In German) For the review, see False climate presentations for (pre-)history.
- Holm, Hans J. (2009): Albanische Basiswortlisten und die Stellung des Albanischen in den indogermanischen Sprachen [Albanian basic word lists and the position of Albanian in the Indo-European languages]. In: Zeitschrift für Balkanologie, issue 45-2. (In German; for a slightly updated English version see 2011b above.) (Remark: today I would replace the misleading term "Basiswortlisten" by "universal concept lists".)
- Holm, Hans J. (2008): The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages. In: Christine Preisach, Hans Burkhardt, Lars Schmidt-Thieme, Reinhold Decker (eds.): Data Analysis, Machine Learning, and Applications. Proc. of the 31st Annual Conference of the German Classification Society (GfKl), Univ. of Freiburg, March 7-9, 2007. Springer-Verlag, Heidelberg-Berlin: 629-636. - Solving distribution problems in corpora of natural languages -> improved IE "Family Tree" - For the manuscript, see Holm SLRD Freiburg.pdf.
[Abstract: Linguists tend to assume that languages are more closely related the more features, in particular common innovations, they share. Holm (2003) demonstrated that this assumption is erroneous, because it misses the fact that the number of shared agreements depends stochastically on three further parameters. Only with the help of the maximum likelihood estimator of the hypergeometric distribution can we find the number of features that must have been present in both languages at the time of their separation. In this way we obtain a chain of separation for a family of languages for which the appropriate data are available. Applied to data from Pokorny's IEW, the method showed a late separation of Hittite, Albanian and Armenian. This result could well have been caused by their central position and therefore did not appear suspicious. A systematic bias could be suspected only when a further application to Mixe-Zoquean data showed the same effect, namely that poorly documented languages appeared to separate late. This work reveals the reason for this bias, which is peculiar to lists of natural languages, as opposed to stochastically normally distributed test cases like those presented in Holm 2007a. As a more modern and linguistically reliable database, the new "Lexikon der indogermanischen Verben", 2nd ed. (Rix et al. 2001) was the best choice. Indeed, the suspicion was confirmed, and it is shown how these biased data can be correctly projected onto true separation amounts. The result is a partly new chain of separation for the main Indo-European branches, which fits well with the grammatical facts as well as with the geographical distribution of these branches. In particular, it clearly demonstrates that the Anatolian languages did not separate first, and thereby refutes the Indo-Hittite hypothesis.]
- Holm, Hans J. (2007d): Ausgliederungsreihenfolge der Indogermania auf Grundlage des LIV2 [The order of separation of the Indo-European languages on the basis of LIV2]. Lecture given at the Linguistics Department of the University of Bonn. - Audience: German linguists - For the slide presentation, see Holm Idg. Ausgliederung Bonn.
- Holm, Hans J. (2007c): The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages. Presentation at the 31st Annual Conference of the German Classification Society (GfKl), Univ. of Freiburg, March 7-9, 2007. - Audience: "quantitative linguists", statisticians - For the slide presentation, see Holm Distribution in word lists, Freiburg 2007.
- Holm, Hans J. (2007b): The new Arboretum of Indo-European "Trees" - Can new Algorithms Reveal the Phylogeny and even Prehistory of IE? In: Journal of Quantitative Linguistics 14-2: 167-214. (For the manuscript, see Arboretum IE trees.pdf.) - Update of Holm 2005: newer lexicostatistical attempts in language subgrouping -
[Abstract: Specialization in the fields of linguistics vs. biological informatics leads to growing misunderstandings and false results. These are caused by poor knowledge of the essential conditions of the respective methods and material applied. These conditions are analyzed, and the insights are used to assess the recent glut of attempts to establish new phylogenies of Indo-European languages.]
- Holm, Hans J. (2007a): Language Subgrouping. In: Grzybek, P. & R. Köhler (eds.), Exact Methods in the Study of Language and Text. Dedicated to Professor Gabriel Altmann on the occasion of his 75th birthday. [Quantitative Linguistics 62]. Berlin: de Gruyter: 225-235. - Handling scatter in multiple subgroupings -
[Abstract: After many years of testing, and facing many competing methods, the Separation Level Recovery method (Holm 2000, passim) has been refined in terms of its stochastic and linguistic data requirements. It has been tested to show how stochastic scatter can be distinguished from bad data and how the data should be improved.]
- Holm, Hans J. (2005): Genealogische Verwandtschaft [Genealogical relationship]. In: Quantitative Linguistics: An International Handbook [HSK series, vol. 27, chapter 45]. Berlin: de Gruyter. (In German) - Lexicostatistical approaches to the subgrouping of languages in the 20th century. Updated repeatedly, see above -
[Contents: 1. When are languages "related"? 2. Data evaluation; 3. Measures of relationship; 3.1. Synchronic measures of relationship; 3.2. Diachronic measures of relationship; 4. Structuring genealogical dependencies.]
- Holm, Hans J. (2003): The proportionality trap, or: what is wrong with lexicostatistical subgrouping? In: Indogermanische Forschungen 108: 39-47. - The basics, employing only the hypergeometric distribution; also for non-mathematicians -
[Abstract: With the help of an experiment it is shown that the raw number of agreements (e.g. cognate numbers) between any two languages can never express their degree of genealogical relationship. It is then demonstrated how, by taking into account all statistically determining parameters, the original level of any pair, and from it the correct subgrouping, can be recovered.]
- Holm, Hans J. & Embleton, Sheila (2001): Review of 'Mathematical Foundations of Linguistics' (by H. Mark Hubey, 1999, LINCOM Handbooks in Linguistics 10, München: LINCOM). In: Journal of Quantitative Linguistics 8-2: 149-162.
- Holm, Hans J. (2000): Genealogy of the Main Indo-European Branches Applying the Separation Base Method. In: Journal of Quantitative Linguistics 7-2: 73-95. (In German) Some figures are saved at Holm Sep Base Meth Examples. - Application to Pokorny's "Indogermanisches Etymologisches Wörterbuch"; for updates see 2007c,d -
[Abstract: In former quantitative analyses of genealogical relations between languages, the systematic bias caused by substitutions was not adequately eliminated, which could only lead to false results. The huge data material of J. Pokorny's "Indogermanisches Etymologisches Wörterbuch" (Bern: Francke, 1959) is, for that very reason, the only statistically significant one. Only after it had been recorded in N. Bird's "Distribution of Indo-European root morphemes" (Wiesbaden: Harrassowitz, 1982) did it become possible, despite its known shortcomings, to use a robust estimator for every pair of sister languages. That estimator gives the number of lexemes present at the time of their separation, and from these numbers the chain of separations can be inferred.]

Started 2010-05-27.