19 - Search engines - part one

"We should then have to discover the laws by which organic stimuli are translated into dream-representations"1.

In the previous units we looked at electronic text corpora, how they work, and how they can be used in the translator's job. We also saw that the internet is full of other resources, often freely available to users and to citizens at large.

An aspect we have not yet considered is the use of the search engine. As we know, the information on the internet would in itself be of little use without these very powerful applications, which index the material on the Net and, on the basis of that indexing, can run searches across the whole of the indexed material.

Of course, no indexing system is complete or perfect: they can all be improved and extended. Still, developments in this field are dramatic, and thanks to search engines the whole World Wide Web is coming closer and closer to being a huge reference tool and, in some cases, to developing into a huge corpus that can be consulted in a variety of ways.

Usually those who are skilled at using the internet are also adept with a search engine: it is nearly impossible to find something useful in the former without being able to use the latter. However, technical knowledge of how to use a search engine is often very poor, sometimes insufficient, for several reasons: because it is considered a boring tool of little interest, whose technical details are not worth looking into; because one thinks that some details are intended solely for software engineers; because one does not realize how sophisticated a search can become if one knows well the tools that govern it.

The most popular search engine is Google. Let us say from the start that there are very many versions of Google, nearly one for each country in the world, and therefore far more than one for each language. To get a complete picture of this phenomenon, just connect to one of the Google sites, for example the main one, www.google.com, then click on "Language Tools", on the right side of the screen, in a rather small font. This takes you to the page http://www.google.com/language_tools?hl=en, which lists both the languages represented in Google and the Google versions in the domains of the different countries. For example, for the English language there are as many as twenty-seven different national versions, in addition to the main U.S. version (ending in .com):

  1. http://www.google.co.uk/ (United Kingdom)
  2. http://www.google.ca/ (Canada: there is also the French version)
  3. http://www.google.com.au/ (Australia)
  4. http://www.google.co.nz/ (New Zealand)
  5. http://www.google.co.in/ (India: there are also the versions in Hindi, Bengali, Telugu, Marathi and Tamil)
  6. http://www.google.ie/ (Ireland)
  7. http://www.google.com.my/ (Malaysia: there is also the version in Bahasa Melayu)
  8. http://www.google.com.sg/ (Singapore: there are also the versions in 中文(简体), Bahasa Melayu and தமிழ்)
  9. http://www.google.com.hk/intl/en/ (Hong Kong)
  10. http://www.google.com.pr/ (Puerto Rico: there is also the version in Spanish)
  11. http://www.google.com.mt/ (Malta: there is also the version in Maltese)
  12. http://www.google.com.fj/ (Fiji)
  13. http://www.google.com.na/ (Namibia: there is also the version in Afrikaans)
  14. http://www.google.vg/ (Virgin Islands)
  15. http://www.google.mu/ (Mauritius: there is also the French version)
  16. http://www.google.ws/ (Samoa)
  17. http://www.google.com.gi/ (Gibraltar: there are also the versions in español, Italiano and Português (Portugal))
  18. http://www.google.com.vc/ (Saint Vincent and the Grenadines)
  19. http://www.google.mw/ (Malawi)
  20. http://www.google.fm/ (Micronesia)
  21. http://www.google.ms/ (Montserrat)
  22. http://www.google.com.nf/ (Norfolk Island)
  23. http://www.google.sh/ (Saint Helena)
  24. http://www.google.gg/ (Guernsey: there is also the French version)
  25. http://www.google.co.je/ (Jersey: there is also the French version)
  26. http://www.google.gm/ (Gambia)
  27. http://www.google.pn/ (Pitcairn Islands)

These are not just new labels put on the same English-language search engine: they are tools localized and adapted to the specific needs of a given country. If, for example, I connect to the site http://www.google.ch/intl/en/, that is, the English-language version of the Swiss Google, under the search box I find the option «Search the web / Search pages from Switzerland». This obviously makes a huge difference if, for example, I am searching for the frequency of use of a given string in Italian, French, or German, and I want to understand whether that frequency differs between the web in general, Swiss sites only, and German or French sites only.

If, for example, I search for the string "emmental" in Swiss sites only, I find 75,900 sites that concern not the famous cheese but the Emmental, or Emme Valley, and are mostly of a tourist or geographic character. If I search for the same string in Italian sites only, through www.google.it with "only pages in Italian" specified, I get 2,560 references to sites concerning almost exclusively the cheese (which in its original language is instead called "Emmentaler"); if I enter it in the search engine www.google.fr and click on the option "Pages: France", I get 12,100 references to sites mostly about the French cheese that imitates the Swiss Emmental and is called "Emmental français".

If instead I enter "emmental käse" in the Swiss engine, specifying the option "Seiten auf Deutsch", I get only 21 results, mostly from non-Swiss sites, chiefly German ones (a Swiss would hardly write the name of the national cheese as "Emmental" rather than the correct "Emmentaler"). To specify that I want to search only Swiss sites and only in German, I open the options by clicking on "advanced search" (Erweiterte Suche), then set the language to German and the domain to ".ch", the suffix that ends the address of any Swiss site. The result is just one site!

The search box now shows "emmental käse" site:.ch, which is my request translated into the syntax that the search engine application understands. If I edit the string to read "emmentaler käse" site:.ch, I get 482 results, which immediately tells me what this cheese is called in Switzerland, in contrast to how it is known in Italy and France.
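The query strings in this example can also be assembled programmatically. The following is a minimal Python sketch: the helper names build_query and search_url are hypothetical, and the /search path with its q parameter is an assumption about how the engine accepts queries, not something stated in this unit.

```python
from urllib.parse import urlencode


def build_query(phrase, site=None):
    """Return an exact-phrase query, optionally limited to a domain suffix."""
    query = f'"{phrase}"'          # double quotes keep the words together and in order
    if site:
        query += f" site:{site}"   # e.g. site:.ch restricts results to Swiss domains
    return query


def search_url(query, host="www.google.ch"):
    """Build a search URL. The /search path and q parameter are assumptions."""
    return f"https://{host}/search?" + urlencode({"q": query})


for phrase in ("emmental käse", "emmentaler käse"):
    query = build_query(phrase, site=".ch")
    print(query)
    print(search_url(query))
```

Comparing the result counts for the two strings is still done by eye in the browser; the sketch only shows how the text in the search box is composed.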

The "Language Tools" page is one of the main reference points for translators' internet searches, and the example I have shown is just one of millions possible. The essential point is that the syntax needed to set up searches does not have to be learned: it is in fact simple, but anyone unwilling to learn the various formulas need only use the "advanced search" and "language tools" pages.

On the "advanced search" page we find the following options:

"results containing all the following words"

"results containing the following phrase" (different from the previous option because here the words must appear in the same order)

"results containing any of the following words"

"results not containing the following words"

To find out what happens, I ask for the word "emmental" in pages that do not contain the word "cheese", hoping to see whether the valley, a well-known tourist destination within Switzerland, is also known abroad. The result is disappointing: I find many references to sites that are about Emmentaler cheese but simply never mention the word "cheese".

If I read the contents of the search box, I see that it contains "emmental -cheese", the translation of my search criteria into the language of Google. I thus learn that, to indicate the words to be excluded without using the advanced search page, it is enough to put a minus sign before the words in question.
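The way the four advanced search fields are translated into the text of the search box can be sketched in Python. This is only an illustration: the function name advanced_query and its parameter names are hypothetical, and the use of an uppercase OR for the "any of the following words" field is an assumption, since this unit shows only the quotation-mark and minus-sign forms.

```python
def advanced_query(all_words=(), phrase="", any_words=(), none_words=()):
    """Combine the four advanced-search fields into one query string."""
    parts = list(all_words)                       # all the following words
    if phrase:
        parts.append(f'"{phrase}"')               # the exact phrase, in quotes
    if any_words:
        parts.append(" OR ".join(any_words))      # any of the words (assumed OR syntax)
    parts.extend(f"-{word}" for word in none_words)  # excluded words, minus sign
    return " ".join(parts)


print(advanced_query(all_words=["emmental"], none_words=["cheese"]))
print(advanced_query(phrase="emmentaler käse", none_words=["recipe", "recipes"]))
```

The first call reproduces the string seen in the search box; the second shows that the fields can be combined freely.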

Looking at the sites I am referred to, I notice that the syntax "-cheese" did not work as hoped because the word often appears in the plural; moreover, the words "recipe" and "recipes" occur frequently. I therefore try reformulating my search:

emmental -cheese -recipes -recipe -cheeses

And this time I'm satisfied: I find mostly sites devoted to the Swiss valley, not to its famous cheese. In the next units we will examine other aspects of the search engine.
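Why the first exclusion fell short can be illustrated with a toy model in Python. It is not how Google works internally: it simply assumes that a minus sign excludes only pages containing exactly that word, and it runs on four invented one-line "pages" that stand in for real sites.

```python
import re


def page_words(text):
    """Lower-cased set of the words found in a page."""
    return set(re.findall(r"\w+", text.lower()))


def matches(text, query):
    """True if the page has every plain term and none of the -excluded terms."""
    found = page_words(text)
    terms = query.lower().split()
    required = [t for t in terms if not t.startswith("-")]
    excluded = [t[1:] for t in terms if t.startswith("-")]
    return all(t in found for t in required) and not any(t in found for t in excluded)


# Invented sample pages, for illustration only.
pages = {
    "valley": "Hiking in the Emmental: trails along the river Emme",
    "plural": "Swiss cheeses: Emmental, Gruyère and more",
    "recipe": "Emmental fondue recipe for four",
    "cheese": "Emmental cheese, the one with the holes",
}

for query in ("emmental -cheese", "emmental -cheese -recipes -recipe -cheeses"):
    hits = [name for name, text in pages.items() if matches(text, query)]
    print(query, "->", hits)
```

In this model "-cheese" removes only the page that contains the singular word, so the plural and recipe pages still get through; the longer query leaves only the page about the valley.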


