A threat to the future of the Norwegian language
Arild Hoksnes
Bill Gates is making ill-natured remarks in the corridors. The Microsoft boss says that language technology products are going to be the next big application for the computer industry: you will be able to talk to your computer, and the machine will be able to respond. At the same time, however, Gates does not care much about minor languages.
A nationwide effort is under way to ensure that the Norwegian language community does not disappear with this new technological wave. The first results of this initiative are computer linguistic dictionaries in Norwegian.
Professor Torbjørn Nordgård at NTNU's Department of Linguistics at Dragvoll is a leader in what could be described as "the battle for the Norwegian language". Nordgård chairs the co-ordination of national resources for computational linguistics, which has adopted the name NIFST. The name is a play on words: it stands for "The National Infrastructure for Language Technology", but the word in Norwegian literally means "frightening".
"In fact", according to Nordgård, "the future can be a bit frightening. The Norwegian Language Council, the government and the Norwegian Research Council are extremely concerned that we must protect the Norwegian language during the development that is currently taking place in the computer industry.
"There are already many trends in the direction of English becoming totally dominant in language technology, and we see that the international computer industry isn't particularly interested in troubling itself with the smaller speech communities.
"The main point now is to build up national language technology resources that describe our language in a linguistically relevant way for computers."
The industry can make use of such resources and, as a result, there is good reason to believe that language technology products will also be developed for Norwegian. Research communities in both the technology and the humanities faculties in Oslo, Trondheim and Bergen are now engaged in this work.
Commercialisation
Computational linguistics is a field which combines linguistics and computer technology by applying computational and mathematical methods to linguistic data. Applications of such techniques include:
Some interesting language technology products already exist, such as dictation machines that convert English speech to written text. The search engine AltaVista also offers automatic translation of documents on the Internet, but only between the major languages.
"There is a lot that could be said about the quality of these services for automatic translation. They are rather basic and are unable to consider the nuances of languages. How could such a machine cope with translating very local expressions or plays on words?" asks Nordgård. "I don't think we will ever reach a situation where computers will be able to translate fiction in a way that is totally satisfactory, but we can go quite a long way with machine translation of standardised language. The basis for such services is a combination of technological and linguistic knowledge."
National resources
Linguistic resources must be built up and made available if industry and research communities are to be interested in making products in Norwegian. According to Nordgård, such resources require:
Computer linguistic dictionaries
The first stage of building these national resources involves developing machine-readable linguistic dictionaries, based on the standard dictionaries that the University of Oslo holds the rights to. Nordgård explains that a computer linguistic dictionary is not just an ordinary dictionary in electronic form. "It contains different and far more nuanced information, for example detailed information about pronunciation. In addition, there are advanced descriptions of inflection systems in Norwegian and, moreover, the so-called argument structure, involving the syntactic and semantic properties of verbs.
"As far as the dictionaries are concerned, our industry partner is Telenor (the Norwegian national telephone company), which has financed a considerable part of the work on pronunciation. Among other things, Telenor wants to make products that are based on speech technology. Computer-generated speech from written text, for example having your e-mail read over the telephone, is one product that is already in use."
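To make the idea of a computer linguistic dictionary more concrete, the kind of entry Nordgård describes could be sketched roughly as follows. This is only an illustration, not the format used by NIFST or Telenor: the class names, field names, transcription notation and role labels are all assumptions made for the example, and the single entry for the verb "gi" ("give") is hand-written sample data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Argument:
    """One slot in a verb's argument structure (hypothetical layout)."""
    role: str               # semantic property, e.g. "agent"
    category: str           # syntactic property, e.g. "NP" for noun phrase
    optional: bool = False  # whether the argument may be left out


@dataclass
class LexiconEntry:
    """A single dictionary entry; all field names are illustrative."""
    lemma: str
    part_of_speech: str
    pronunciation: str  # rough phonetic transcription, notation assumed
    inflections: Dict[str, str] = field(default_factory=dict)
    arguments: List[Argument] = field(default_factory=list)

    def form(self, feature: str) -> str:
        """Return the inflected form, falling back to the lemma."""
        return self.inflections.get(feature, self.lemma)

    def required_roles(self) -> List[str]:
        """List the semantic roles the verb cannot do without."""
        return [a.role for a in self.arguments if not a.optional]


# Hand-written sample entry, for illustration only.
gi = LexiconEntry(
    lemma="gi",
    part_of_speech="verb",
    pronunciation="ji:",
    inflections={"present": "gir", "past": "ga", "perfect_participle": "gitt"},
    arguments=[
        Argument(role="agent", category="NP"),
        Argument(role="theme", category="NP"),
        Argument(role="recipient", category="NP", optional=True),
    ],
)

print(gi.form("past"))
print(gi.required_roles())
```

An ordinary dictionary in electronic form would stop at the lemma and a definition. Here the pronunciation field is what a speech product would draw on, while the inflection table and argument list are what a program analysing or generating sentences would need.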
Dialects are alive and well
"Telenor wants to describe many variants of spoken Norwegian. Currently, however, the most common dialects used in Eastern Norway are being described," Nordgård says, adding that Telenor is aware that people from other parts of Norway will not be prepared to imitate those dialects in order to use its services.
The research communities hope that within three years the relevant linguistic resources will be in place so that Bill Gates and others won't be justified in saying: "It is too expensive to include the Norwegian language in the rapid language technology development."
Contact at NTNU: Torbjørn Nordgård