The new Tekstaro.com offers more of Zamenhof’s writings, new capabilities, and faster performance!

Tekstaro 3.1 brings users several improvements. Important works from several periods of Zamenhof’s life have been added to the body of Zamenhof’s writings already available. An updated interface now allows specifying several search terms simultaneously, which opens new possibilities to discover, compare, and contrast details of Esperanto usage. Everything works faster, and building complex search expressions is now clearer and easier.

Tekstaro.com contains the largest accessible language corpus of written Esperanto. ‘La Tekstaro’, as its regular users often call it, is among the most important projects ESF has commissioned. In 2002, the famous Swedish grammarian Bertilo Wennergren began to implement the project. Bertilo is the author of PMEG, which is arguably the most important book on Esperanto grammar and usage. La Ondo de Esperanto selected him as “Esperantist of the Year” in 2006, and UEA has honored him for his contributions.

For more than 15 years, La Tekstaro has helped innumerable researchers, Esperantists or not, study aspects of written Esperanto. Users can investigate the lexicon, from single morphemes to multi-word expressions and turns of phrase, and research the details of grammar at similar depth. It is possible to discover unique features of style and content in diverse works, to confirm good language usage, to distinguish Zamenhof’s usage from that of other authors, and to see the evolution of Esperanto over more than 120 years.

The first phase concentrated on collecting, integrating, and compiling texts that already existed in electronic form. The second phase focused on completing the texts (among which are articles from various Esperanto periodicals) and creating a user-friendly interface. In the first two phases, Bertilo sought advice from famous Esperanto specialists: Ilona Koutny, Jouko Lindstedt, Carlo Minnaja, Chris Gledhill, and Mauro La Torre.

The third phase, just completed, significantly enlarged the body of works by adding many scanned texts made available by the Austrian National Library, so that the total corpus now contains more than 4.8 million words. Further additions are the capability to run simultaneous searches, the ability to join searches, and a facility to select and limit search periods. And, of course, the site now works on a larger number of browsers and mobile devices.

The third phase is not the last in the development of La Tekstaro. A fourth phase aims to add important non-Zamenhof works (particularly those from the period after World War II) and texts from various contemporary periodicals, in addition to the existing works from Monato and La Ondo de Esperanto. Also planned for a further development phase is morphological markup: adding markers to the existing texts so that a search can distinguish, for example, ‘per/son/e’ from ‘person/e’ and ‘per/fid/e’ from ‘perfid/e’. Similar search tools for national-language corpora usually have such markers, and adding them to La Tekstaro will raise its usefulness to an even higher level.
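To illustrate why such markers help, here is a minimal sketch in Python. It is not La Tekstaro’s implementation or search syntax. The slash notation follows the examples above, while the token list and the function names `surface`, `search_plain`, and `search_by_morpheme` are hypothetical and exist only for this illustration.

```python
# Hypothetical mini-corpus: each token carries "/" at morpheme boundaries,
# following the notation used in the examples above.
MARKED_TOKENS = ["per/son/e", "person/e", "per/fid/e", "perfid/e", "person/o"]


def surface(token):
    """Return the plain spelling by dropping the boundary markers."""
    return token.replace("/", "")


def search_plain(spelling, tokens=MARKED_TOKENS):
    """Search on spelling alone: differently built words are mixed together."""
    return [t for t in tokens if surface(t) == spelling]


def search_by_morpheme(morpheme, tokens=MARKED_TOKENS):
    """Search on a whole morpheme: only tokens built from it are returned."""
    return [t for t in tokens if morpheme in t.split("/")]


if __name__ == "__main__":
    # Same spelling, two different analyses.
    print(search_plain("persone"))
    # Only words built on the root "person".
    print(search_by_morpheme("person"))
    # Only words built on the root "son".
    print(search_by_morpheme("son"))
```

Without markers, a search for the spelling "persone" cannot separate the two analyses. With markers, a search for the root "person" skips ‘per/son/e’, and a search for "son" skips ‘person/e’.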

 
