If the concordancer is the 'machine', then in order for the machine
to function the operator needs some 'fuel': a body of text, or corpus.
Acquiring large amounts of text that can be 'read' by a computer is
not a problem. The internet and email ensure that there is now an
abundance of machine-readable text. It is now possible to subscribe
to email versions of newspapers, thus ensuring a weekly download of
text. There are also a number of web sites that allow individuals
to download entire books; links to some of these sites can be found
on the author's web site. The following should be considered the
main sources of text:
1. The World Wide Web
2. Email
3. Commercially sold corpora
4. Students' homework (for error analysis)
5. Scanning (beware of copyright infringement)
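A minimal sketch of what the 'machine' does with its 'fuel' may help here. Assuming the corpus is a plain text file, the Python below finds every whole-word occurrence of a keyword and returns it as a key-word-in-context (KWIC) line, with the keyword aligned in a central column. The function names `load_corpus` and `kwic`, the UTF-8 encoding and the fixed 30-character window are illustrative assumptions, not features of any particular concordancing program.

```python
import re


def load_corpus(path):
    """Read a plain text file (assumed to be UTF-8) into a single string."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def kwic(text, keyword, width=30):
    """Return one key-word-in-context line per whole-word, case-insensitive match."""
    # Collapse line breaks and repeated spaces so each context reads as running text.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


if __name__ == "__main__":
    sample = "The cat sat on the mat.\nLater, a dog chased the cat away."
    for line in kwic(sample, "cat", width=15):
        print(line)
```

With a real corpus the call would be `kwic(load_corpus("corpus.txt"), "cat")`, where `corpus.txt` is a placeholder file name. Matching whole words only means that a search for *cat* does not also return *category*.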
Choosing corpora
While acquiring 'some kind of electronic text' is straightforward,
acquiring an 'appropriate' body of text is the most problematic step
in the production of concordances. The first factor that dictates
the choice of text is the operator's objectives. A study of a certain
usage in spoken English requires a corpus of spoken English, while
the creation of language material to help students write scientific
papers would require an entirely different kind of corpus. The next
point to be aware of is a technical one concerning the form of the
text. For a concordancer to function, the text has to be
machine-readable; in other words, it has to be saved on a computer
disk. Furthermore, it should not be formatted in any way, and thus
it should be saved as a plain 'text file'. Some commercially sold
corpora contain tagged texts, which means that additional information
about the texts and their individual words (such as each word's part
of speech) has been saved with them. This information is not the same
as formatting, and it is invaluable if the operator wishes to search
for certain grammatical forms as opposed to specific words (see
figure one above).
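The difference between searching for a word and searching for a grammatical form can be sketched in a few lines. The Python below assumes a hypothetical tagging format in which each word carries an underscore and a part-of-speech tag (for example `walked_VBD`), using Penn Treebank-style tags for illustration; real commercial corpora use a variety of formats and tag sets, so the parsing step would need adapting. The function names are likewise hypothetical.

```python
def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs; untagged tokens get an empty tag."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if not sep:  # no underscore, so treat the whole token as an untagged word
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs


def search_by_word(pairs, target):
    """Positions of a specific word, ignoring case and tags."""
    return [i for i, (word, _) in enumerate(pairs) if word.lower() == target.lower()]


def search_by_tag(pairs, tag):
    """Positions of every word carrying a given grammatical tag."""
    return [i for i, (_, t) in enumerate(pairs) if t == tag]


def context(pairs, index, span=3):
    """Show the word at index in brackets, with up to span words either side (tags hidden)."""
    words = [w for w, _ in pairs]
    left = " ".join(words[max(0, index - span):index])
    right = " ".join(words[index + 1:index + 1 + span])
    return f"{left} [{words[index]}] {right}".strip()


if __name__ == "__main__":
    tagged = "She_PRP walked_VBD to_TO the_DT park_NN and_CC had_VBD eaten_VBN lunch_NN"
    pairs = parse_tagged(tagged)
    # A word search finds one specific form ...
    print([context(pairs, i) for i in search_by_word(pairs, "eaten")])
    # ... while a tag search finds every past-tense verb (VBD), whatever the word.
    print([context(pairs, i) for i in search_by_tag(pairs, "VBD")])
```

The design choice worth noting is that the tags are kept alongside the words rather than mixed into the displayed text, so the concordance lines a student sees stay readable while the search itself can target grammatical categories.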
A third variable is text level. For the language teacher this is
a particular concern in two areas: there is a distinct shortage both
of spoken text and of authentic text of any kind that can be easily
understood by lower-intermediate learners. Of these two shortages,
the more troublesome is the lack of authentic text that will allow
teachers to produce concordances in which the contexts are not too
far beyond their students' level of comprehension. As noted above,
the attraction of concordances is that they allow teachers to expose
students to a large number of authentic examples. However, the very
authenticity of this material renders much of it unusable for the
instruction of beginners. Thus, if there is a weakness in the use
of concordancing for teaching, it is that the very concordances that
are intended to enlighten learners may only serve to mystify and
frustrate them. Despite this weakness, concordancers still have a
role in the teaching of lower-level learners. At the very least, the
information gleaned from an analysis of concordances should help
teachers in the construction of language material. There is also
the possibility of using student-created text as a corpus (Mark and
Minagawa). This would be invaluable both for teachers analysing
students' errors and for students learning from their own mistakes.
The final factor to consider is the size of the text. The smaller
the text, the fewer the instances of any given word, and the less
able the user is to draw conclusions about usage in that particular
type of text. For a more in-depth account of the creation of a
corpus, the reader should consult chapter one of Sinclair. Suffice
it to say that the producer of concordances should consider the
following four points when selecting a corpus:
1. Your objectives dictate the type/genre of text
2. The form of the text: is it already machine-readable, and is it plain text or formatted?
3. The level of the text and of the students
4. Size of text: the bigger, the better
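The last point, size, can be checked before a corpus is adopted by counting how often the word under study actually occurs. The sketch below reports the raw number of instances and the rate per million words, a common way of comparing corpora of different sizes. The crude tokenizer (lower-case ASCII letters and apostrophes only), the function names and the default threshold of 20 instances are assumptions made for illustration; they are not recommendations from Sinclair or any other source.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased word tokens; a token is a run of ASCII letters or apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def instances_report(text, word, minimum=20):
    """Summarise how well a corpus covers one word (minimum is an arbitrary threshold)."""
    counts = word_counts(text)
    total = sum(counts.values())
    hits = counts[word.lower()]
    # Normalising per million words lets corpora of different sizes be compared.
    per_million = hits / total * 1_000_000 if total else 0.0
    return {
        "tokens": total,
        "hits": hits,
        "per_million": per_million,
        "enough": hits >= minimum,
    }
```

Running the same report on a small and a large corpus makes the fourth point concrete: a word that occurs only a handful of times in a short text gives the user very little to generalise from.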