Logos Multilingual Portal

12 - Other corpora

«My next concern is to show that the dream-work is exhausted by the co-operation of the three factors enumerated- and of a fourth which has still to be mentioned- that it does no more than translate the dream-thoughts».

Perhaps the most complete site for English texts is The Online Books Page, curated by John Mark Ockerbloom, at http://onlinebooks.library.upenn.edu/lists.html. The site contains over twenty thousand works that can be searched by author, title, or subject. The site is not structured like a corpus: you can download a work to your computer, but you cannot search the full text of the works online, nor access or create concordances through it.

The texts are not physically stored on the university's server (the site is part of the University of Pennsylvania portal), but are reached via links to other sites. The librarian Ockerbloom keeps the various links constantly updated, and does it so well that Online Books is one of the sites most visited by researchers seeking texts in electronic form.

One frequently finds valuable sites that are not equipped to let the user work on the materials they hold. In these cases, it is advisable to consider creating one's own concordances. At first it may seem an arduous task, but on closer examination it requires nothing like the enormous effort necessary just a few years ago. Today, in a handful of hours, using the internet, it is possible to equip oneself with the desired texts and the programs to work with them. Let's look at how it's done.

The first issue at hand is to determine the kind of corpus we need. Generally, the corpus is needed to check the frequency of use of given expressions or strings of text. The frequency of use of a word may be typical of an author, of a text, of a whole literary genre, or of a historical period. Consequently, its use (or lack thereof) becomes significant and at times may constitute an intertextual reference.

To make this type of check, you usually need two corpora: one containing the text you're working on (or the macrotext of the complete works of the same author) and a control corpus. The second is generally composed of texts of the same historical period, language and culture as the original and serves as a comparison.
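As a rough illustration of such a check, the following Python sketch compares how often a word occurs in a study corpus and in a control corpus. The function names (`tokenize`, `per_thousand`, `compare`) are hypothetical, and normalizing to occurrences per thousand words is an assumption made here so that corpora of different sizes can be compared; a real concordance program may count differently.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def per_thousand(word, tokens):
    """Occurrences of `word` per 1,000 tokens (0.0 for an empty corpus)."""
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word.lower()) / len(tokens)


def compare(word, study_text, control_text):
    """Return the relative frequency of `word` in both corpora."""
    study = per_thousand(word, tokenize(study_text))
    control = per_thousand(word, tokenize(control_text))
    return study, control


if __name__ == "__main__":
    # Tiny made-up strings stand in for the two corpora.
    study, control = compare(
        "dream",
        "the dream and the dream work",
        "a dream is a wish of the night",
    )
    print(f"study corpus:   {study:.1f} per 1,000 words")
    print(f"control corpus: {control:.1f} per 1,000 words")
```

A marked difference between the two figures suggests that the word is characteristic of the text under study rather than of the period as a whole.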

When dealing with texts from remote times, and therefore not covered by copyright, it is not difficult to obtain them: they can usually be found on the internet. When they are not available in electronic form, that form can be created.

All that is needed is a scanner and an optical character recognition (OCR) program. The scanner "reads" the necessary pages, and the OCR program transforms the image into text. Generally these programs produce the text in .txt format, that is, a plain text format that any word processor can read.

To create the corpus, you need only put all the works you want to incorporate into the same file and save it as a single .txt file. Some programs used to generate concordances require a specific .txt format called "text only with line breaks", a format option available in all word processing programs.
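For those who prefer not to copy and paste by hand, the merging step could be sketched in Python as follows. The function name `build_corpus` and the folder and file names in the usage example are hypothetical, and the sketch assumes that every source text is a .txt file saved in UTF-8; texts produced by an OCR program may use a different encoding.

```python
from pathlib import Path


def build_corpus(source_dir, output_file):
    """Concatenate every .txt file in source_dir into one corpus file.

    Returns the number of works that were merged.
    """
    source_dir = Path(source_dir)
    output_file = Path(output_file)
    parts = []
    for path in sorted(source_dir.glob("*.txt")):
        if path.resolve() == output_file.resolve():
            continue  # skip an earlier version of the corpus itself
        # Assumption: the source texts are encoded in UTF-8.
        parts.append(path.read_text(encoding="utf-8").strip())
    # Separate the works with a blank line.
    output_file.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return len(parts)


if __name__ == "__main__":
    # Hypothetical names: a folder "texts" and an output file "corpus.txt".
    if Path("texts").is_dir():
        count = build_corpus("texts", "corpus.txt")
        print(f"{count} works merged into corpus.txt")
```

Keeping each work in its own file and regenerating the corpus when needed makes it easy to add or remove a text later.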

Once the necessary texts have been acquired and the corpora have been created, you need to get a program for creating the concordances. The page entitled Gateway to corpus linguistics on the internet, at http://www.corpus-linguistics.de/software/software_concord.html, gives an overview of the software available. There are both free and paid programs.

One of the free programs, called KWiC Finder, is available at http://miniappolis.com/KWiCFinder/KWiCFinderDownload.html, but there are others as well: R.J.C. Watt's Concordance (Windows 95/98/ME/2000/XP), LEXA - Corpus Processing Software by Raymond Hickey (Windows 95/98/ME/2000/XP), Mike Scott's WordSmith 4.0 beta (Windows 98/NT/2000/XP), Microconcord offered by Athelstan (for DOS), Microconcord by Mike Scott, Monoconc PRO also from Athelstan (for Windows 95 or higher), Multiconcord by Progetto Lingua-Socrates, QWICK from the Corpus Research Group, University of Birmingham, UK, TATOE: Text Analysis Tool with Object Encoding (for Windows 95/98/NT), Text Analysis Computing Tools (TACT, only for Windows 3.11 and Windows 95/98), UltraFind 2.5.3 for Macintosh by Ultradesign.com, Peter Roe's WinATA (for Windows 95 or later), WinConcord from Zdenek Martinek (for Windows 95 and XP), Xlex/www tools by Arbeitsbereich Linguistik.

As you can see, there is a program for all needs, budgets and tastes.

Once the software is installed, querying the corpus is quite simple. Naturally it is possible to make complicated queries that require experience and a grasp of basic statistical principles. But for the use described here (comparisons of frequency of use, etc.) no extraordinary know-how is required.
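To give an idea of what a simple query returns, here is a minimal Python sketch of a keyword-in-context (KWIC) concordance. It is not how any of the programs listed above works internally; the function name `kwic` and the `width` parameter are hypothetical, and the sketch assumes the corpus fits in memory as a single string.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Each line shows up to `width` characters of context on either side,
    with the keyword in square brackets.
    """
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines


if __name__ == "__main__":
    sample = "The dream is short. A dream\nwork is long."
    for line in kwic(sample, "dream", width=10):
        print(line)
```

The number of lines returned is also the raw frequency of the keyword, which is the figure needed for the comparisons described here.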

In the next unit we will begin our examination of online dictionaries.


Bibliographical references

FREUD SIGMUND, L'interpretazione dei sogni, in Opere, vol. 3, edited by C. L. Musatti, Torino, Boringhieri, 1966.

FREUD SIGMUND, The Interpretation Of Dreams, translated by A. A. Brill, London, G. Allen & company, 1913.

Gateway to corpus linguistics on the internet, available on the World Wide Web at http://www.corpus-linguistics.de/software/software_concord.html, consulted on 7 April 2004.

KWiC Finder, available on the World Wide Web at http://miniappolis.com/KWiCFinder/KWiCFinderDownload.html, consulted on 7 April 2004.

OCKERBLOOM JOHN MARK, The Online Books Page, available on the World Wide Web at http://onlinebooks.library.upenn.edu/lists.html, consulted on 7 April 2004.

1 Freud 1900: 385.