

As of 01.11.2022

General information

Name of the project: Linguistic and ethnocultural diversity of Southern Siberia in synchrony

Goals and objectives

Research directions: Linguistics and literature

Project objective: To determine the laws governing the historical development and the modern state of the languages and cultures of Southern Siberia with respect to their mutual interaction, using linguistic, anthropological, and psycholinguistic research methods

The practical value of the study

Scientific results:

The Laboratory has significantly expanded the empirical database on the languages of Southern Siberia and continues to add to it. By analyzing previously collected and new data, we were able to arrive at several new conclusions regarding the synchronic and diachronic aspects of the idioms of the region and their interaction at the linguistic, psycholinguistic and cognitive levels. We have raised the field documentation of dialects of indigenous languages in Southern Siberia to a new level. Our researchers have completed a pilot «blanket survey» of the dialects of the Republic of Khakassia. Employees of the Laboratory have started and are now continuing a «blanket survey» of the Turkic dialects of the Altai Republic and of the dialects of the Siberian Tatars. We have collected, processed and, for the most part, published online materials from 300 settlements in Southern Siberia with indigenous populations. From these data, we compiled a map of various language phenomena. The cartographic visualization of phonetic transitions in the languages of Southern Siberia demonstrates significant areal diversity; the isoglosses cut across the boundaries of the genealogical groups. A microanalysis of the isoglosses showed that the rules determining the positions of phonetic transitions that look the same on the surface are extremely fragmented. In particular, the role of an isogloss is played by the morphological elimination of the effects of various phonetic rules, whereby heterogeneous phenomena converge as a result of the areal convergence of idioms. These similarities are evidently of late contact origin and apparently attest to contacts between the languages within the last 500 years.

We are continuing to incorporate materials from early sources on the languages of Southern Siberia into electronic corpora and databases. Firstly, these are the scientific notes taken by Daniel G. Messerschmidt, Gerhard Friedrich Müller, Jakob Lindenau, Peter S. Pállas, Matthias Alexander Castrén, Vasiliy V. Radlov, Nikolay F. Katanov and others in the 18th–19th centuries. Secondly, these are numerous texts written by missionaries in the languages of Southern Siberia in the 18th and early 19th centuries, which, as it turned out, largely reflect the dialect diversity of the region in that period. Comparing data on 18th–19th century dialects with modern field data and previously published information shows which linguistic processes (phonetic and morphological) and sociolinguistic processes (language shifts, that is, population movement or the transition of the existing population to a contact language or dialect; convergence of closely related dialects) unfolded in the studied territory over the last 300 years. The results of such a study will contribute significantly to the empirical basis of the theory of language change, making it possible to assess the relative rates of heterogeneous processes in closely related and territorially adjacent idioms. Another crucial type of early source that we study is the runic written monuments of the Mountainous Altai. Applying photogrammetry to Turkic runic monuments from the Mountainous Altai has made it possible, on the one hand, to refine the methods for analyzing these materials and, on the other hand, to make significant progress in understanding and interpreting the inscriptions themselves. We have now started comparing data on Altaic languages with data on the Yenisei inscriptions to determine the areas of distribution of individual graphic, orthographic, lexical and grammatical phenomena and their possible matches with the areas of distribution of modern Turkic languages and dialects. This makes it possible to extend our picture of the dialect division of the Turkic languages a further 800 years back in time.

The Laboratory succeeded in starting comprehensive instrumental studies of the articulatory features of dialects of various groups in Southern Siberia and neighboring regions. The Articulate Instruments Micro ultrasound device is as non-invasive as possible and makes it possible to record in the places where native speakers of indigenous languages live. We have already collected and are now processing a large volume of ultrasound and audiovisual data on the Turkic dialects of Southern Siberia. The use of this device brings work on the typology of articulatory phonetics to a new level. Previously, such work was limited to a large degree by the random selection of languages and informants and did not allow statistically significant observations for individual idioms. Meanwhile, the possibility of thoroughly investigating the balance of articulatory settings in closely related languages should contribute significantly to the typology of phonetic changes, which, as is known, are implemented through articulatory shifts. For instance, we obtained data from native speakers of several Turkic languages to study the articulation of front and back dorsal consonants ([k] and [q], [ɡ] and [ʁ]). Since the distribution of these consonants closely matches the distribution of front and back vowels, this raises the question of whether the consonants differ from each other phonologically or are allophones conditioned by the articulatory properties of neighboring vowels. Moreover, recently obtained data on a number of Turkic languages demonstrate that the position of the tongue root in the pharyngeal cavity is a distinguishing feature in the articulation of alternating front and back vowels. In particular, the retraction or advancement of the tongue root in the articulation of front and back vowels is relevant in some Turkic languages and irrelevant in others. In a pilot study of a number of Turkic languages (Khakas, Shor, Chulym and Teleut) we have reliably shown, for the first time, that this apparently also holds for the dorsal consonants of these languages. This problem has not previously been discussed in the academic literature. Continuing this research may substantiate the regularity of the direction of changes, which has not yet been proven empirically even though it is a postulate of comparative and historical linguistics (the research is being continued within a Russian Science Foundation grant won by the Laboratory).

An analysis of the trends of historical change in vowel and consonant phonemes in the Turkic languages of Southern Siberia made it possible to establish the chronological order in which these rules applied in various positions for individual idioms and groups of idioms, and thereby to build reliable genealogical trees describing the linguogenesis of the Altaic and Shor-Khakas groups. This is undoubtedly a new result, which, on the one hand, demonstrates the reliability of the comparative historical method even for language groups of relatively small time depth (in fact, dialectal depth) that are in close contact. On the other hand, the areal distribution of the rules demonstrates regroupings of idioms and changes of contact zones over the course of history.

The historical sequence of events extracted from the phonetic history is confirmed by lexico-semantic processes observed in changes to the hundred-word Swadesh list. We are compiling a sequence of maps showing the relations between areas over various time periods (http://turk.polycorpora.org/khall.php). Unfortunately, the absolute dating of these maps is unreliable; only their sequence is clear. We are currently extending the materials for an annotated lexicographic database featuring 200-word lists of basic lexicon for the Turkic, Manchu-Tungusic and Mongolic languages spoken in Siberia. Based on the collected data, we carried out a phylogenetic classification (accounting for the whole variety of dialects) using three-step data processing: (a) compiling a dataset of root cognates, (b) extracting derivational drift, (c) extracting homoplastic developments. Using numerical methods, we obtained a more precise genealogical tree with hypotheses on secondary contacts.
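As an illustration only, the idea of turning cognate-coded basic-vocabulary lists into numerical input for a genealogical tree can be sketched as follows. This is a minimal sketch and not the Laboratory's actual procedure: the idiom names, the concepts and the cognate-class labels are invented, and the distance used (the share of shared concepts whose cognate classes differ) is one common lexicostatistical choice assumed here for demonstration.

```python
from itertools import combinations

# Hypothetical toy data: each idiom maps a concept to a cognate-class label.
# The labels are invented for illustration and carry no linguistic claim.
COGNATE_CLASSES = {
    "idiom_A": {"hand": 1, "water": 1, "stone": 1, "fire": 1},
    "idiom_B": {"hand": 1, "water": 1, "stone": 2, "fire": 1},
    "idiom_C": {"hand": 2, "water": 1, "stone": 3, "fire": 2},
}


def cognate_distance(a, b):
    """Share of concepts present in both lists whose cognate classes differ."""
    shared = [concept for concept in a if concept in b]
    if not shared:
        raise ValueError("the two lists have no concepts in common")
    differing = sum(1 for concept in shared if a[concept] != b[concept])
    return differing / len(shared)


def distance_matrix(data):
    """Pairwise distances, keyed by alphabetically ordered idiom pairs."""
    return {
        (x, y): cognate_distance(data[x], data[y])
        for x, y in combinations(sorted(data), 2)
    }


def closest_pair(matrix):
    """The pair of idioms with the smallest distance (first merge in a tree)."""
    return min(matrix, key=matrix.get)


if __name__ == "__main__":
    matrix = distance_matrix(COGNATE_CLASSES)
    for pair, dist in matrix.items():
        print(pair, dist)
    print("closest:", closest_pair(matrix))
```

A matrix of this kind is the usual input to distance-based tree-building methods; detecting derivational drift or homoplastic developments, as described above, would require additional annotation that this sketch does not model.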

Materials on the existing ethnographic features have been digitized, entered into databases, and prepared for mapping. With these data, it is possible to supplement the electronic atlas of languages and cultures of Southern Siberia. This is the first time such work has been performed since the creation of the Historic and Ethnographic Atlas of Siberia (edited by Leonid P. Potapov, Moscow–Leningrad: 1961). George P. Murdock’s Ethnographic Atlas, as we know, does not include materials on the territories of the former USSR, so our work with new material will fill in the gaps.

As a result of field research in areas of compact residence of Tatars, Shors and Khakas, we collected an extensive database of the spoken language of Turkic-Russian bilinguals. It has become the basis of RuTuBiC, a deeply morphologically annotated bimodal corpus of the Russian speech of Turkic-Russian bilinguals (more than 500 hours of recordings), which includes a subcorpus with deep linguistic markup over 11 levels (40 hours of recordings). Based on the corpus data, we developed a model for analyzing interference phenomena that is parametrized (1) by type of interference: (a) intralingual vs interlingual, (b) by language level; and (2) by influencing factors: (a) linguistic (type of speech, type of discourse, type of genre, type of text position, type of sentential position), (b) sociolinguistic (age, education, sex, occupation), (c) psycholinguistic (type of language experience, the nature of its acquisition and variation). In the project we showed that these influences can be assessed as trends, but that in most cases their effect does not reach the level of statistical significance. We tracked the «intralingual (levels of the literary language and forms of the national language) vs interlingual (native vs acquired languages)» interaction in the Russian-language speech practices of Turkic-Russian bilinguals at the lexical and grammatical levels.

An analysis of texts marked up for types of deviations from the language standard (40 hours of recordings) revealed the following trends. The prevalent deviations from the language standard (DLS) are those related to the interfering influence of Turkic languages, supported by active tendencies of the Russian language itself, the type of communication, and genre features. These are DLS in the norms of case and prepositional-case government. Deviations from the norms of grammatical agreement are less frequent, and the density of their distribution is much lower. We noted considerable variation in the density of use of locally or functionally restricted lexicon and of code-switching. The cluster analysis did not reveal a statistically reliable influence of social factors on how actively deviations from the language standard are realized in speech; individual differences prevail over social and group ones.

In the project we created extensive data resources — psycholinguistic databases:

  • a) RuWordPerception: assessments of 600 Russian words (200 nouns, 200 verbs, 200 adjectives) by native Russian speakers and Turkic-Russian bilinguals (Tatar-Russian, Khakas-Russian, Shor-Russian) across 7 scales: vision, hearing, touch, taste, smell, subjective incidence, and age of acquisition. The database contains 645773 assessments of Russian words obtained from 251 native speakers of the Russian language;
  • b) TurkWordPerception: assessments of words of the Tatar and Khakas languages in terms of modalities of perception: (1) assessments of 600 words of the Tatar language (200 nouns, 200 adjectives, 200 verbs) across 5 modalities of perception obtained from 71 respondents; (2) assessments of 600 words of the Khakas language (200 nouns, 200 adjectives, 200 verbs) across 5 modalities of perception obtained from 80 respondents. The database contains 364545 assessments of words of the Russian language across 7 scales obtained from 151 respondents;
  • c) RuTurkPsychLing: assessments of words of the Russian, Tatar, and Khakas languages in terms of the parameters «familiarity», «temperature», «position in space», «size», «emotionality» and «manipulability». It includes 1 335 323 assessments of words, of which (1) 379 131 are assessments of Russian words in 7 psycholinguistic parameters, data from 136 native Russian speakers; (2) 62 629 are assessments of Russian and Tatar words in 7 psycholinguistic parameters, data from 17 native Tatar speakers; (3) 893 563 are assessments of Russian and Khakas words in 7 psycholinguistic parameters, data from 37 native Khakas speakers.

The comparative analysis of the three psycholinguistic databases created within the project makes it possible to evaluate the contribution of linguistic, social, and cognitive factors to the ratio of perceptual modalities in the formation of the semantics of words of the basic parts of speech (nouns, adjectives, verbs) in three languages: Russian, Khakas, and Tatar. Comparing the perceptual-modality assessments of Russian words by native Russian speakers and by Turkic-Russian bilinguals revealed a high degree of commonality between the two groups of respondents. The high level of intermodal correlations supports a preliminary conclusion that the perceptual experience objectified by means of a bilingual’s native language may affect the assessment of translation equivalents. The results of the interlingual comparison are fundamentally different: in bilinguals, translation equivalents in the native language receive a different system of assessments, which explains the absence of significant correlations and suggests that perceptual components occupy different positions in the structure of concepts in the bilingual mental lexicon, depending on whether they are linked to nominations from the native or the acquired language.
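The kind of group comparison described above can be illustrated with a small sketch. It assumes, purely for demonstration, that each database row reduces to a word and a list of numeric ratings on one scale, and that agreement between two groups of respondents is measured by the Pearson correlation of per-word mean ratings; the words, the ratings and the choice of coefficient are all hypothetical and are not taken from the project's databases.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_ratings(ratings):
    """Average the individual ratings collected for each word."""
    return {word: sum(values) / len(values) for word, values in ratings.items()}


def group_correlation(group_1, group_2):
    """Correlate per-word mean ratings over the words rated by both groups."""
    means_1, means_2 = mean_ratings(group_1), mean_ratings(group_2)
    words = sorted(set(means_1) & set(means_2))
    return pearson([means_1[w] for w in words], [means_2[w] for w in words])


# Hypothetical ratings of three placeholder words on a single scale.
MONOLINGUALS = {"word_1": [5, 4], "word_2": [2, 2], "word_3": [1, 3]}
BILINGUALS = {"word_1": [4, 4], "word_2": [2, 3], "word_3": [1, 2]}

if __name__ == "__main__":
    print(round(group_correlation(MONOLINGUALS, BILINGUALS), 3))
```

With real data, a rank-based coefficient and a significance test would normally accompany such a comparison; they are omitted here to keep the sketch short.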

Applying the methods of linguistic textology and experimental linguistics to the analysis of interlingual interference yielded a number of statistically supported conclusions about correlations between surface speech manifestations and deep cognitive manifestations of interference. We established the basic characteristics of the type of bilingualism under study (early natural bilingualism that is unbalanced in favor of the second, acquired language) and determined the manifestations of interlingual interference at both the surface speech level and the cognitive level. These databases will serve as a resource base for research on language contacts and for solving theoretical problems within the global concepts of embodied cognition, the hypothesis of linguistic relativity, the mental lexicon of bilinguals, etc. In addition, the created databases can be used: (1) to check existing hypotheses on new, objective material, (2) to generate new hypotheses and identify new trends, (3) to cross-validate the results of other studies, both psycholinguistic and psychological, as well as the results of corpus studies, (4) to create and check various cognitive models in the cognitive sciences, (5) to build computer models of languages and language use in computational linguistics, (6) to determine the social factors of language use (bilingualism, age differences, level of education, etc.), (7) as normative material in clinical practice. The texts of the compiled corpus of bilingual speech can be used in building models for the automatic generation of oral and written speech.

Thus, as a result of the project, we have presented a representative picture of the development and contacts of the languages and cultures of the peoples of Southern Siberia in historical and present-day perspective, from the beginning of the modern era to the present, on the basis of a complex methodology that combines field, corpus, linguistic, psycholinguistic, experimental and statistical methods with methods of automated data processing. The collected data are significant for comparative and historical linguistics, Turkology, Manchu-Tungusic studies, Uralic studies, language change theory, language contact theory, sociolinguistics, and psycho- and cognitive linguistics.

Implemented results of research:

  • A development stage of the LingvoDoc 3.0 system: developing automated search for dictionaries on the main page of the LingvoDoc website; sorting dictionaries by the main parameters (languages, dialects, language families, readiness level, grant support); adding an automated binding function; adding functions for the quantitative analysis of phonetic features and for building isoglosses. The linguistic platform is designed to compile, analyze and store dictionaries, corpora and concordances of various languages and dialects (http://lingvodoc.ispras.ru). The platform is currently used to document the languages of Russia and neighboring regions by a consortium of 6 Russian universities and a number of foreign researchers.

Education and career development:

Members of the academic team of the Laboratory have created courses on the subject of the project and launched them at Tomsk State University:

  • «Corpus linguistics: creating and using corpora» (Zoya I. Rezanova, core master’s degree program in the field of study «Fundamental and applied linguistics»),
  • «Comparative and historical linguistics» (Anna V. Dybo, postgraduate program for training research and teaching staff),
  • «Experimental methods of linguistic research» (Alina V. Vasiliyeva, master’s degree programs),
  • «Educational practice (practice to obtain primary professional skills). Training profile: Cognitive linguistics» (Zoya I. Rezanova, core master’s degree program in the field of study 45.04.03 «Fundamental and applied linguistics», master’s degree program «Computer and cognitive linguistics», module «Cognitive linguistics»),
  • «Project practice (practice to obtain primary professional skills). Training profile: Computer linguistics» (Zoya I. Rezanova, core master’s degree program in the field of study 45.04.03 «Fundamental and applied linguistics», master’s degree program «Computer and cognitive linguistics», module «Computer linguistics»),
  • «Research and project work» (Zoya I. Rezanova, master’s degree program),
  • «Research work in the professional field» (Alina V. Vasiliyeva, bachelor’s degree program).

We have published textbooks in our domain of studies: «Teleut folklore. Lyric poetry», «Teleut folklore. Epos».

Three Doctor of Sciences dissertations and six Candidate of Sciences dissertations have been prepared and defended.

We have organized advanced training and occupational retraining programs at the Laboratory in our research domain: «Uralic and Altaic languages: modern methods of work», «Phonetics and graphics of Altaic and Uralic languages: methods of work».

Undergraduate and postgraduate students and young researchers have completed internships at Károli Gáspár Református Egyetem (Hungary), the National University of Uzbekistan named after Mirzo Ulugbek (Uzbekistan), Goethe University Frankfurt (Germany), and the Linguistics Department of Swarthmore College (USA).

Members of the academic team of the Laboratory conducted the following events in our area of studies:

  • the all-Russian conference for undergraduate and postgraduate students and young researchers «Uralic and Altaic languages» (1–2 July 2017),
  • the scientific seminar «Current problems of the research of languages of Southern Siberia» (3–14 July 2017),
  • the international scientific conference «Experimental studies of language and speech: bilingualism and multilingualism» (9–10 October 2017, 1–5 October 2018),
  • the scientific seminar «Corpus data and audio dictionaries of the Selkup and Khanty languages» (7–8 October 2018),
  • the international seminar «Experimental studies of language and speech: using eye-tracking technologies in linguistic research» (1–5 April 2019),
  • the scientific school «Language contacts: linguistic, sociolinguistic, psycholinguistic aspects» (10–20 May 2019, 16–19 November 2020),
  • the international scientific conference «Slavic languages in the context of modern challenges» (13–14 May 2019, 18–19 October 2021, 16–17 May 2022),
  • the all-Russian conference with international participation «Current problems of Uralic and Altaic languages» (12–13 October 2019),
  • the international scientific seminar «Articulatory phonetics of peoples of Southern Siberia: modern experimental foundations» (28 September–1 November 2019),
  • the scientific and educational seminar «Social, anthropological and sociolinguistic aspects of the research of languages of Southern Siberia» (2–8 December 2019),
  • the scientific and educational seminar «Corpus technologies in the study of language contacts» (9–15 December 2019),
  • the international scientific school «Experimental research of language and speech» (7–17 December 2020),
  • the scientific and educational seminar «Current problems of the research for preserving and documenting languages and cultures of Southern Siberia» (7–19 December 2020),
  • the scientific and educational seminar «Current problems of the study and description of phonetics of languages of Southern Siberia» (16–19 December 2020),
  • the international scientific and practical seminar «Experimental studies of language and speech: experience and prospects» (3–4 December 2021),
  • the international scientific and practical seminar «Experimental research of language and speech» (17–20 October 2022).

Collaborations:


  • Ivannikov Institute for System Programming of the Russian Academy of Sciences, «Elekard–Med» LLC, «Intelligent Profit Solutions Tomsk» LLC (Russia): technological developments for automating scientific research at the stage of accumulating and processing language and speech material, and for conducting psycholinguistic and cognitive studies and processing their results.
  • McMaster University (Canada), University of Turku (Finland), Urgench State University (Uzbekistan): joint research projects.
  • University of Potsdam, University of Tübingen (Germany), University of Victoria, McMaster University (Canada), Technion – Israel Institute of Technology (Israel), University of Turku (Finland), New Bulgarian University (Bulgaria), Swarthmore College (USA), Corvinus University of Budapest (Hungary), Shukshin Altai State University for Humanities and Pedagogy, Siberian Federal University, Vinogradov Russian Language Institute of the Russian Academy of Sciences, Institute of World Culture of Moscow State University, Saint Petersburg State University, Irkutsk State University (Russia): conducting scientific schools, scientific and educational seminars, workshops.
  • HSE University — Saint Petersburg, Moscow Polytechnic University (Russia): teaching lectures and conducting workshops within an international seminar.
  • Institute of Linguistics of the Russian Academy of Sciences, Institute of Anthropology and Ethnography of the Russian Academy of Sciences, Institute of Philology of the Siberian Branch of the Russian Academy of Sciences (Russia): joint research, academic publications, scientific seminars in ethnolinguistics and folklore of peoples of Siberia.

Kondiyakov A.V., Lemskaya V.M.
Chulym language of the village of Pasechnoye, Tyukhtetsky district, Krasnoyarsk Territory: Vol. 1. Dictionary, word forms and grammatical examples; Vol. 2. Texts with translation and analysis. Tomsk: TSU Publishing House, 2021.
Funk D.A.
Shor epic tales in the notes of V.V. Radlov. Astana: “Gylym” Baspasy, 2018.
Dybo A.V., Maltseva V.S., Nikolaev S.L., Sultrekova E.V., Sheimovich A.V.
Areas and origin of various types of phonations in the Turkic languages of Southern Siberia // Bulletin of the Tomsk State University. Philology. 2020. No. 68. pp. 5-26.
Mallory J., Dybo A., Balanovsky O.
The Impact of Genetics Research on Archaeology and Linguistics in Eurasia // Russian Journal of Genetics. Maik Nauka/Interperiodica Publishing, 2019. Vol. 55, No. 12. pp. 1472-1487.
Dybo A.
Proto-Samoyedic and Proto-Manchu-Tungusic Dwelling Names: An Attempt at Semantic Reconstruction // Anthropos. Editions Saint-Paul Fribourg (Switzerland), 2022. Vol. 117, No. 1. pp. 43-72.
Nevskaya I.A., Vavulin M.V.
Kuttu I, a recently discovered Old Turkic Altai runiform inscription and its reading and interpretation // Turkic Languages. 2019 (23, 2).
Rezanova Z.I.
Fragment of markup in RUTUBIC linguistic corpus. Code switching or lexical borrowing? // Issues of lexicography. 2021. No. 20. pp. 91-104.
Dambueva P.P.
Functional-semantic category of modality in "Altan Tobchi" by Lubsan Danzan. Elista: KSU Publishing House, 2021.
Kamaletdinova Z.S.
Turkic landscape lexicon of the Lower Tom region. Tomsk: TSU Publishing House, 2019.
Rezanova Z.I., Artemenko E.D., Vasil'eva A.V.
The semantics of the Russian diminutive in interlingual correspondences and interactions: a corpus and experimental study. Tomsk: TSU Publishing House, 2019.