Syriac Lexicography between Information Science and Linguistics

Previous research using the ColibriCore algorithm has allowed us to trace translation patterns between the Hebrew Bible and the Peshitta by means of n-gram analysis. For every lexeme, we derived an index of the entire distribution of its collocations in both languages. In this paper, we investigate how these patterns, which are based merely on surface forms, reflect deeper syntactic differences. To do so, we identify the twenty largest translation divergences between lexical n-grams and investigate which syntactic patterns they reflect. This allows us to describe these lexical divergences in terms of syntactic features such as valency on the one hand, and information-theoretic notions such as entropy on the other, in order to classify and explain the structures with the highest divergence. After this concrete case study, we offer further insights into the possibilities and limits of comparing syntactic and information-theoretic metrics for the lexicology of Syriac in particular, and of ancient resources in general. Finally, we situate this discussion within the recent resurgence of information theory in the field of linguistics.
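To make the notion of entropy over a collocation distribution concrete, the following is a minimal illustrative sketch in Python. It is not the ColibriCore pipeline used in our research: the function names `collocate_counts` and `entropy`, the window size, and the placeholder tokens are all assumptions chosen for illustration, and the tokens are not drawn from the Hebrew Bible or the Peshitta.

```python
from collections import Counter
from math import log2


def collocate_counts(tokens, target, window=1):
    """Count the tokens found within `window` positions of each occurrence of `target`.

    Hypothetical helper for illustration; real collocation indices would be
    built from lemmatized corpus data rather than a flat token list.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


# Placeholder tokens only; "x" stands in for a lexeme of interest.
tokens = ["a", "x", "b", "a", "x", "b", "c", "x", "d"]
counts = collocate_counts(tokens, "x")
h = entropy(counts)
```

Under these assumptions, a lexeme whose collocates are spread evenly over many forms receives a higher entropy than one that occurs in a few fixed combinations, which is the kind of contrast that can then be set against syntactic features such as valency.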

Mathias Coeckelbergs (Bruxelles/Leuven)
Willem van Peursen (Amsterdam)