Monday, 10 March 2014

On the Indo-European Language “Family”

By Professor John Kontos

Let us start with the relevant Wikipedia entry as a source of information on the Indo-European language family, to which Greek is considered by some to belong. This Wiki entry provides references only up to 2008. I have picked some main points of the entry and comment on them below (in italics), taking into account newer scientific literature.
The Wikipedia entry starts as follows:

“The Indo-European languages are a family of several hundred related languages and dialects. There are about 439 languages and dialects, according to the 2009 Ethnologue estimate.”
The statement that “Greek is an Indo-European language” is obviously a tautology: Greek is a European language, and if you create an artificial family grouping European and “Indian” languages, then Greek belongs to this new family by simple set-theoretic reasoning. Hence the statement carries no new information. The entry goes on to list subgroups as branches of a tree that have been obtained by obscure statistical processing of word lists, as criticized in [1]. The Wiki entry continues as follows:

“The various subgroups of the Indo-European language family include ten major branches, given in the chronological order of their earliest surviving written attestations. Specialists have postulated the existence of such subfamilies (subgroups) as Italo-Celtic, Graeco-Armenian, Graeco-Aryan, and Germanic with Balto-Slavic. The vogue for such subgroups waxes and wanes.”

The language subfamilies are simply postulated and are the result of a vogue that is as scientifically interesting as the … vogue in garments.
In [1] the authors conclude:
“We think that lexicostatistics in its current form does not have a future, but we do not think that, because of the failure of one particular method, all quantitative approaches to genetic language classification should be given up at once. We especially hope that root based approaches which are closer to the traditional methodology of historical linguistics will produce datasets which are less prone to subjective judgments and individual errors. Datasets can be used for phylogenetic calculations, and we hope that they will provide a more objective basis for stochastic calculations on linguistic datasets and may reveal interesting aspects and new insights into the complexity of language history.”

The passage above clearly supports the view that Indo-Europeanism lacks an objective basis.
The own goal of the Wiki entry, to borrow a football term, is that its writers have to accept the demolition of one of the foundations of the Indo-European language myth, namely the Indo-Hittite hypothesis, as quoted below.

“The Indo-Hittite hypothesis proposes the Indo-European language family to consist of two main branches: one represented by the Anatolian languages and another branch encompassing all other Indo-European languages. However, in general this hypothesis is considered to attribute too much weight to the Anatolian evidence.
Hans J. Holm, based on lexical calculations, arrives at a picture roughly replicating the general scholarly opinion and refuting the Indo-Hittite hypothesis [2].”

My conclusion is that the subject is still evolving and no final results exist. New tools are being proposed, the field is scientifically still in flux [3-5], and it is too early for definite conclusions. Languages are treated by fans of Indo-Europeanism as rock formations that may be grouped using local, static geological information. The vastly complex phenomenon of language evolution, influenced by cultural interactions and human migration, is naively encapsulated in the statistics of mere word lists. These word lists are incapable of representing the huge body of information that the science of linguistics has collected over thousands of years, since the time of the Ancient Greeks. That body consists of knowledge of Etymology, Syntax, Semantics and Pragmatics recorded in many thousands (maybe millions) of books and papers.
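To make concrete what "the statistics of mere word lists" amounts to, here is a minimal sketch of the kind of score a lexicostatistic comparison might compute: the share of meanings on which two languages are judged to have cognate words. It is an illustration only, not the procedure of [1] or [2]. The function name, the language names and the cognate-class labels are all hypothetical, and the toy data stand for no real linguistic judgments.

```python
def shared_cognate_ratio(list_a, list_b):
    """Fraction of meanings present in both lists whose cognate-class labels agree."""
    common = [meaning for meaning in list_a if meaning in list_b]
    if not common:
        raise ValueError("the two word lists share no meanings")
    matches = sum(1 for meaning in common if list_a[meaning] == list_b[meaning])
    return matches / len(common)


# Hypothetical toy data: meaning -> cognate-class label assigned by a coder.
# Equal labels mean "judged cognate"; the labels are invented for illustration.
lang_x = {"water": 1, "fire": 1, "hand": 1, "stone": 1}
lang_y = {"water": 1, "fire": 2, "hand": 1, "stone": 2}

# The same language Y, with a single judgment ("fire") coded differently.
lang_y_recoded = {"water": 1, "fire": 1, "hand": 1, "stone": 2}

print(shared_cognate_ratio(lang_x, lang_y))          # 0.5
print(shared_cognate_ratio(lang_x, lang_y_recoded))  # 0.75
```

On a four-item list, one changed cognate judgment moves the score from 0.5 to 0.75. Real lists are longer, but the sketch suggests why results built on such scores may depend heavily on the subjective judgments that went into the lists.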

References
[1] Geisler, Hans and List, Johann-Mattis. “Beautiful Trees on Unstable Ground. Notes on the Data Problem in Lexicostatistics” (forthcoming, as cited in [3]).
[2] Holm, Hans J. (2008). "The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages". In Preisach, Christine; Burkhardt, Hans; Schmidt-Thieme, Lars et al. Data Analysis, Machine Learning, and Applications. Proc. 31st Annual Conference of the German Classification Society (GfKl).
[3] List, Johann-Mattis (2012). “LexStat: Automatic Detection of Cognates in Multilingual Wordlists”. Proc. of the EACL 2012 Joint Workshop of LINGVIS & UNCLH.
[4] List, Johann-Mattis (2014). “Investigating the impact of sample size on cognate detection”. Journal of Language Relationship 11, 91-101.
[5] Rama, Taraka et al. (2013). “Two methods for automatic identification of cognates”. In Proceedings of Quantitative Investigations in Theoretical Linguistics, Leuven, Belgium.
