Towards an Interlingual Internet


The Internet began as a U.S.-based research project, and its earliest technical standards were designed for the use of monolingual English speakers. However, as the Internet spreads to become a world-wide communications tool, work is underway to make it suitable for all written languages, and to use it for language learning and teaching.

On the basis of its commitment to interlingualism, the Esperantic Studies Foundation has a strong interest in this process, and in its implications for national and international language policies. We welcome contacts with experts in this area, and suggestions for further additions to the following short list of sites which give some insight into the complexities of Net-based communication in a multilingual world.


Plurilingualism

Plurilingualism, in the framework of interlingualism, refers to a wide range of attempts to stabilize multilingual societies by fostering high levels of individual plurilingual competence. Language teaching and learning are thus central to this approach, as are all categories of linguistic rights, language legislation and language policy that affect the relationships between language communities and therefore influence people's desire or opportunity to learn and use other languages.

Language on the Web

The Internet is plurilingual, first of all, because it makes a wide range of language communities and language products much more readily accessible to its users. Linguist Geoff Nunberg pointed this out in a short and entertaining essay from 1996, E-Babel. Today there are many multilingual "jump sites" on the Web. Two of the most comprehensive are:
  • the Human-Languages Page (HLP), aimed primarily at academic users, which includes over 2000 up-to-date links to such items as on-line language lessons, translating dictionaries, native literature, translation services, software, language schools, and information pages;
  • The Language Resource Pages of Rivendell International, aimed at a more general audience, which provides searchable lists of online dictionaries and translators, tools for translators and language translation associations; voice recognition & synthesis, machine translation software and portable translation systems; software localization companies; online language courses, virtual language schools, and language learning software; language chat sites, language-related newsgroups, language magazines and newsletters; RealAudio language stations; and resources on such topics as ancient languages, etymology and word games.

    Impressive as such sources are, they are geared towards English speakers and therefore do not reflect the actual use of different languages on the Net. Very little methodologically adequate research has been carried out on this topic. Two of the few available reports are:


    The clear trend is towards greater and greater linguistic diversity on the Web as a higher proportion of the world's population gains affordable access to it. This trend is limited, however, by the uneven distribution of resources between language communities; by still-unresolved technical issues regarding non-Latin scripts (see below); and by the predominance of Languages of Wider Communication, above all English, in international contexts. Despite its plurilingual nature, the Internet is not likely to fully reflect the world's linguistic diversity for the foreseeable future.

    Global Initiatives

    Such limitations have been discussed for some years in UNESCO, but relatively little of this debate is visible in the organization's web pages. The UNESCO Observatory on the Information Society includes a set of pages on multilingualism under the three headings:


    Although it is not mentioned on this site, UNESCO is also becoming more active in the area of endangered languages. Its Red Book On Endangered Languages is compiled and maintained by the International Clearing House for Endangered Languages at the University of Tokyo, which also publishes an occasional newsletter and provides details of a grants program.

    Other noteworthy sites on endangered languages include:

    European Union

    The European Union has had to wrestle with multilingual issues throughout its existence, but surprisingly little information is available on the Net. A large but disorganized set of links is provided by Paul Treanor in Language Futures Europe (also available here). To judge from these pages, the EU is currently ruled by an overwhelmingly technologist approach to "promoting linguistic diversity", exemplified by the extensive web site of the MLIS (Multilingual Information Society), a program of the European Commission's DG XIII (see below).

    More plurilingual in nature is the EU's Lingua program (DG XXII), aimed at fostering competence in three languages among all of the EU's citizens. Under an initiative to foster innovative language learning, the Commission is funding the development of Lingu@netEuropa, a virtual resource centre for the teaching and learning of foreign languages. This site has yet to make its appearance on the Web.

    Another approach to plurilingualism can be found in the pages of the European Bureau for Lesser-Used Languages, which coordinates indigenous minority language initiatives throughout the European Union.

    United States

    The National Clearinghouse for Bilingual Education covers a wide range of language policy issues. The NCBE publishes a weekly on-line news bulletin, Newsline; sponsors a list server on bilingual education, the NCBE Roundtable; maintains an on-line database of articles and books; and provides extensive Internet resource lists.

    An excellent site on U.S. language policy is maintained by author and researcher James Crawford, including issues such as English-Only and English-Plus, bilingualism, and endangered languages.

    Important plurilingualist organizations in the U.S. (generally with informative web sites) include:

    Computer-Assisted and Web-Based Language Learning

    The Internet offers the promise of changing the way languages are learned, offering a great array of on-line courses and courseware and making it more feasible for novice and intermediate speakers to use their new skills in real communicative situations. Here are some of the more useful resources on the Net.

    The Human-Languages Page and the Language Resource Pages of Rivendell International, referred to above, are both excellent starting points for tracking down materials on specific languages. The Virtual CALL Library aims to provide a comprehensive guide to downloadable PC software for computer-assisted language learning. More general guides are Internet Resources for Language Teachers and Learners by the UK-based Computers in Teaching Initiative, and William Haworth's World Language Pages, which also offer sensible advice about how to integrate the Internet with other teaching approaches. More in-depth discussion of topics such as the use of e-mail in language learning can be found in the symposium report "Educational Technology in Language Learning" published by the Language Resource Centre of the Institut National des Sciences Appliquées de Lyon, France. Also useful, and a rare source of detailed and informed reviews of software and books, is the Berkeley-based site CALL @ Chorus.

    These issues are also explored in the on-line journals:

    and broader aspects of plurilingualism are discussed in the small but interesting journal Language Today.

    The International Association for Language Learning Technology is the largest group of organizations involved in computer-assisted language teaching. IALLT members include the Association for Educational Communications and Technology and the Computer Assisted Language Instruction Consortium. All of these organizations maintain websites that are worth consulting from time to time.


    Technologism

    Unlike plurilingualism, which requires personal effort by individuals, technologism aims to reduce such effort to the minimum. In the technologist world, anyone should be able to use any language to communicate and to access information on the Internet without disadvantage. Although this situation is still far from realization, some notable advances are being made.

    Multilingual coding standards

    Many of the technical obstacles to making the Net fully multilingual lie in working out unambiguous coding and transliteration conventions. Transliteration is necessary to allow information to be conveyed between writing systems. Coding is necessary to allow information to be transferred between data-processing systems. Given that most of the world's software was originally designed for Latin-based scripts, transliteration and coding solutions are often interrelated.
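
    The difference between transliteration and coding can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-letter Cyrillic-to-Latin table is hypothetical and not taken from any standard, and UTF-8 and KOI8-R are simply two encodings chosen to show that the same text can be coded as different byte sequences.

```python
# Hypothetical, deliberately incomplete Cyrillic-to-Latin table.
# It is an illustration only, not an ISO transliteration scheme.
TRANSLIT = {"м": "m", "и": "i", "р": "r"}


def transliterate(text: str) -> str:
    """Convert between writing systems; unknown characters pass through unchanged."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


def encode(text: str, codec: str) -> bytes:
    """Convert characters to bytes so they can move between data-processing systems."""
    return text.encode(codec)


word = "мир"

# Transliteration changes the script; the result is still text.
latin = transliterate(word)

# Coding keeps the script but fixes a byte representation.
# The two codecs below are assumed examples, not named in the text above.
utf8_bytes = encode(word, "utf-8")
koi8_bytes = encode(word, "koi8_r")

print(latin)
print(utf8_bytes.hex(), koi8_bytes.hex())
```

    The two byte strings differ even though they stand for the same word, so a receiving system must know which coding was used before it can recover the text.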

    The most promising solution to the coding problem is the Unicode standard, whose development has been described at some length by Janet Erickson. The International Unicode Conferences are a source of information on recent developments.

    Unicode has been accepted by the International Organization for Standardization as the "Basic Multilingual Plane" (BMP) of the international standard ISO 10646, which is under the responsibility of the subcommittee on Coded Character Sets (SC2) of the Joint Technical Committee on Information Technology (JTC1).
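
    As a rough illustration of what the Basic Multilingual Plane covers, the following Python sketch checks whether a character's code point falls inside it. It assumes the present-day definition of the BMP as the range U+0000 to U+FFFF; the function names and sample characters are hypothetical choices made for this example.

```python
# Assumption: the BMP is the code point range U+0000..U+FFFF.
BMP_LAST = 0xFFFF


def in_bmp(ch: str) -> bool:
    """Return True if the single character ch has a code point inside the BMP."""
    return ord(ch) <= BMP_LAST


def describe(text: str) -> list:
    """Pair each character's code point label with its BMP membership."""
    return [(f"U+{ord(ch):04X}", in_bmp(ch)) for ch in text]


# Sample characters: Latin A, Esperanto c-circumflex, a CJK ideograph,
# and a Gothic letter that lies outside the BMP.
samples = "A\u0109\u65e5\U00010348"

for label, inside in describe(samples):
    print(label, "BMP" if inside else "outside BMP")
```

    Most scripts in everyday use fall inside this range; characters beyond it need more than 16 bits per code point.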

    The transliteration problem remains a subject for much discussion (sometimes heated) in many languages; for some thoughtful short essays on the problems involved see the Catend Web site. The ISO body for this work is the subcommittee on Conversion of Written Languages (SC2) of the ISO Technical Committee on Information and Documentation (TC46). TC46/SC2 has an informative Home Page, along with a moderated list server and archive.

    The equivalent body to JTC1/SC2 and TC46/SC2 at the European level is Technical Committee 304 of the European Committee for Standardization (CEN). Much more information on the standards under development in CEN and ISO is provided in the Web site for Everson Gunn Teoranta, in which Michael Everson and Marion Gunn exhaustively document their work on developing software and coding standards in Irish and other lesser-used languages.

    An overview of progress towards multilingual HTML and related technologies is provided by the World Wide Web Consortium through its site on internationalization. Also worth consulting are WInter (Web Internationalization & Multilingualism) and the Babel site, in which Alis Technologies and the Internet Society provide an introduction to some of the technical issues involved in creating multilingual Web pages.

    Human Language Technology

    A good place to start is the thorough and thought-provoking Survey of the State of the Art in Human Language Technology (1996), by 97 different authors. The chapter on multilinguality covers the following fields:


    Flashier but less analytical is HLT Central, the Observatory of Human Language Technologies on the Web, which describes its field as "three intertwined areas centred around the human interaction with information, with information services and with each other":


    HLT Central hosts the on-line journal LeJournal, which claims to be "the journal of record for human language technology".

    The European Union's MLIS (Multilingual Information Society) (see above) supports research and development in machine translation ("language coverage is poor for some of the official languages"), translation memories, terminological data bases ("the Eurodicautom data base contains more than 1,200,000 multilingual concepts available in up to 11 languages"), electronic dictionaries, authoring aids, thesauri, glossaries, nomenclatures, classifications, and digitalised and tagged text corpora. MLIS on-line resources include:


    Other sites with extensive resource lists include the professional associations:


    and the commercial/public sites:



    Language Brokers, World English, Esperantism

    Surveys of these areas will be added in the future. Comments and updates on the above are welcome.

    The Challenge of Interlingualism

    Esperantic Studies Foundation


    Send questions or comments to Mark Fettes.