An Evaluation Method for Stemming Algorithms

Chris D. Paice

Department of Computing, Lancaster University, Bailrigg, Lancaster LA1 4YR, U.K.

Abstract

The effectiveness of stemming algorithms has usually been measured in terms of their effect on retrieval performance with test collections. This however does not provide any insights which might help in stemmer optimisation. This paper describes a method in which stemming performance is assessed against predefined concept groups in samples of words. This enables various indices of stemming performance and weight to be computed. Results are reported for three stemming algorithms. The validity and usefulness of the approach, and the problems of conceptual grouping, are discussed, and directions for further research are identified.

Introduction

Stemming is a widely-used method of word standardisation, designed to allow the matching of morphologically related terms, such as "clusters" and "clustering". The idea is that, in a language like English, a typical word contains a stem which refers to some central idea or 'meaning', and that certain affixes have been added to modify the meaning and/or to fit the word for its syntactic role. The purpose of stemming is to strip away the affixes and thus reduce the word to its essence. In practice, some affixes may alter the meaning of a word so greatly that to remove them would be to discard vital information. In particular, deletion of prefixes is not generally felt to be helpful, except in certain domains such as medicine and chemistry. On the other hand, most suffixes in English are considered to be potentially removable. This paper is concerned with stemming in the restricted sense of suffix removal. A useful summary of various stemming and conflation algorithms is given by Frakes [2]; further algorithms have been described by Lennon et al. [1] and by Paice [3].

Stemming Errors

There are two particular problems in using stemming for word standardisation. In the first place, pairs of etymologically related words sometimes differ sharply in meaning - for example, consider "author" and "authoritarian". In the second place, the transformations involved in adding and removing suffixes involve numerous irregularities and special cases. Stemming errors are of two kinds: understemming errors, in which words which refer to the same concept are not reduced to the same stem, and overstemming errors, in which words are converted to the same stem even though they refer to distinct concepts. In designing a stemming algorithm there is a trade-off between these two kinds of error. A light stemmer plays safe in order to avoid overstemming errors, but consequently leaves many understemming errors. A heavy stemmer boldly removes all sorts of endings, some of which are decidedly unsafe, and therefore commits many overstemming errors.

There have been several investigations into the effects of stemming on retrieval performance in test collections [1,2,4,5]. In most cases, stemming was found to improve retrieval performance, but not by very much, and there were no consistent differences of performance between different stemmers. Harman was unable to show any consistent benefit over not using stemming at all [6].

We might expect a relationship between the weight of stemming used and the performance in particular types of search; for instance, it might be supposed that heavy stemming is appropriate when high recall is needed. Lennon et al. used the degree of dictionary compression as a measure of the weight, or strength, of each stemmer, but found that weight clearly did not relate in any consistent way to the performance of stemmers in precision-oriented and recall-oriented searches [1].

Although evaluating stemmers on the basis of their effects on retrieval performance may seem reasonable from an IR viewpoint, it is probably unhelpful in practice, because it gives no insight into the specific causes of errors, and hence no help in stemmer optimisation. Moreover, it ignores the fact that stemmers are not used only in IR systems - for example, they may be used in a natural language interface, or in a frame-instantiation program. IR-based evaluations are irrelevant to such applications.

This paper outlines an evaluation method which is based on detecting and counting the actual under- and overstemming errors committed during stemming of word samples derived from natural texts. This permits the computation, for each stemmer, of a 'stemming weight' index, as well as indices representing error rates and general accuracy. The method involves manually dividing a sample of words into conceptual groups, and basing the evaluation on the errors committed relative to these groups. It relies on the assumption that careful human judgment can produce semantic groupings of reasonable accuracy, and that these groups can be used for evaluation purposes. It may be objected that words do not fall entirely into clear-cut groups, and that human assignments cannot be regarded as fully objective; but even if a grouping is somewhat approximate, it can still serve as a basis against which stemmer performance may be assessed.

In using a sample of words taken from a natural source, we encounter the question of whether to base the evaluation on all the individual word tokens in the sample, or on only the distinct word types. In the first case, we would determine the frequencies of all the word types, and take these into account in computing the performance indices. This is perfectly straightforward, but it is found that the results then obtained tend to be dominated by the way the stemmer handles a quite small number of high frequency word groups - for example, common verbs such as "be"/"being"/"been", "do"/"doing"/"done", etc. This is unfortunate since in practice common and irregular forms are often handled by lexical lookup anyway. Stemmers are mainly required to deal with words of relatively rare and unpredictable occurrence. In view of this, our evaluation method uses word types rather than tokens, and ignores frequencies of occurrence. An incidental benefit is that there is no need to remove syntactic function words - nor to worry about what should be included in the stoplist - since it can be shown that inclusion or exclusion of these words has very little effect on the results.

Computation of Performance Indices

Suppose we have a sample of W different words which is partitioned into 'concept groups', each containing words which are both semantically and morphologically related to one another. For each concept group, a perfect stemmer should merge every member with every other member, and should not merge any member with any word which is not in the group. Thus, for every group two totals may be computed. Firstly, there is a 'desired merge total' DMT_g, giving the number of different pairs of words which a perfect stemmer should merge:

    DMT_g = 0.5 n_g (n_g - 1)

where n_g is the number of words in the group. Secondly, there is a 'desired non-merge total' DNT_g, giving the number of pairs formed between a member of the group and a word which is not in the group:

    DNT_g = 0.5 n_g (W - n_g)

the 0.5 factor being included to compensate for double counting of pairs during the summation. By summing these totals over all groups in the word sample, we obtain the global desired merge total GDMT and the global desired non-merge total GDNT respectively.
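As a concrete check of these two definitions, here is a tiny Python sketch; the three concept groups are invented for illustration, and this is not the paper's own software:

```python
# Toy illustration of the desired merge / non-merge totals.
# The concept groups below are invented examples, not the paper's data.
groups = [
    ["connect", "connected", "connection"],
    ["probe", "probes"],
    ["relate", "related", "relating"],
]

W = sum(len(g) for g in groups)  # total number of words in the sample

# DMT_g = 0.5 n_g (n_g - 1): pairs of words within the group.
# DNT_g = 0.5 n_g (W - n_g): pairs between a member and a non-member.
GDMT = sum(0.5 * len(g) * (len(g) - 1) for g in groups)
GDNT = sum(0.5 * len(g) * (W - len(g)) for g in groups)

print(W, GDMT, GDNT)  # 8 7.0 21.0
```

Note that GDMT + GDNT here equals 0.5 W(W - 1) = 28, the total number of word pairs in the sample, as the definitions require.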

After applying a stemmer to the sample, suppose that a concept group of size n_g contains s distinct stems after stemming, and suppose that the numbers of instances of these stems are u_1, u_2, ..., u_s. The number of understemming errors for the group (the 'unachieved merge total' UMT_g) is given by

    UMT_g = 0.5 * sum_{i=1..s} u_i (n_g - u_i)

Summing this quantity over all concept groups, we obtain the global unachieved merge total GUMT. The understemming index UI is now given by the ratio GUMT/GDMT.

After stemming, we are also likely to find cases where the same stem occurs in two or more concept groups. The procedure here is to gather all cases of a particular stem into a 'stem group'; now any stem group which contains items derived from two or more different concept groups represents overstemming errors which need to be counted. Consider a stem group containing n_s items whose members are derived from t different concept groups, with v_1, v_2, ..., v_t members derived from each of these concept groups. The number of overstemming errors for this stem group (the 'wrongly-merged total' WMT_s) is given by

    WMT_s = 0.5 * sum_{i=1..t} v_i (n_s - v_i)

Summing this quantity over all stem groups, we obtain the global wrongly-merged total GWMT. The overstemming index OI is now given by the ratio GWMT/GDNT.

It is clear that for a heavy stemmer the value of OI will be rather high and the value of UI low, whereas for a light stemmer the situation will be reversed. The ratio of the two quantities may therefore be taken as a measure of the weight of a stemmer; we refer to this as the stemming weight SW:

    SW = OI/UI

These indices give us a general framework in which to assess the relative accuracy of stemmers, but two questions arise: when is one stemmer better than another in terms of UI and/or OI, and when is one stemmer better than another overall? Regarding the first question, we may observe that, if one stemmer is better than another in terms of both UI and OI, and the difference is large, then it probably is better overall, at least for the word sample under consideration. The second question is quite another matter, since we lack any satisfactory theory for the relationship between understemming and overstemming tendencies; where two stemmers differ in weight, there is no precise sense in which we can talk about their relative accuracy. So is there any way to take the difference in weight into account - indeed, does the question even have meaning?

In order to obtain some kind of baseline against which the general accuracy of a stemmer may be judged, we refer to the process of length truncation - that is, reducing every word to just its first q letters (words shorter than q being left unchanged). Length truncation is the crudest method of stemming, and we would obviously expect any rule-based or table-based stemmer to do better. Note however that length truncation refers to not just one but a series of stemmers, each with a different value of q; for IR purposes truncation lengths of 5, 6 and 7 seem to be the most useful.

The idea here is that if we determine (UI, OI) values for a series of truncation lengths, the resulting coordinates define a truncation line against which any stemmer can be assessed. Any reasonable stemmer will give a (UI, OI) point between the truncation line and the origin; in general, the further away the point is from the truncation line, the better the stemmer can be said to be. Specifically, a performance measure, which we may call the error rate relative to truncation, or ERRT, can be obtained by extending a line from the origin O through the (UI, OI) point P until it intersects the truncation line at T, as illustrated in Figure 1. ERRT is then simply defined as

    ERRT = length(OP)/length(OT)
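The whole computation, from concept groups to UI, OI and SW, can be sketched in a few lines of Python. This is an illustrative reimplementation, not the paper's own program: the concept groups below are invented, and the 'stemmer' is simple truncation to four letters.

```python
from collections import Counter, defaultdict

# Invented concept groups (each inner list is one group).
groups = [
    ["connect", "connected", "connection"],
    ["index", "indexes", "indices"],
    ["general", "generally"],
    ["generation", "generations"],
]
stem = lambda w: w[:4]  # crude stand-in stemmer: truncation to length 4

W = sum(len(g) for g in groups)
GDMT = sum(0.5 * len(g) * (len(g) - 1) for g in groups)
GDNT = sum(0.5 * len(g) * (W - len(g)) for g in groups)

# Understemming: UMT_g = 0.5 * sum_i u_i (n_g - u_i), where u_i counts
# the instances of each distinct stem within concept group g.
GUMT = 0.0
for g in groups:
    counts = Counter(stem(w) for w in g)
    GUMT += sum(0.5 * u * (len(g) - u) for u in counts.values())

# Overstemming: gather all words sharing a stem into a 'stem group';
# WMT_s = 0.5 * sum_i v_i (n_s - v_i), where v_i counts the members
# drawn from each concept group.
stem_groups = defaultdict(Counter)
for gi, g in enumerate(groups):
    for w in g:
        stem_groups[stem(w)][gi] += 1
GWMT = 0.0
for counts in stem_groups.values():
    ns = sum(counts.values())
    GWMT += sum(0.5 * v * (ns - v) for v in counts.values())

UI = GUMT / GDMT   # understemming index
OI = GWMT / GDNT   # overstemming index
SW = OI / UI       # stemming weight
print(UI, OI, SW)
```

With this toy data, truncation fails to merge "indices" with "index"/"indexes" (two understemming error pairs) and wrongly merges the "general" and "generation" groups (four overstemming error pairs), giving UI = 0.25 and OI = 4/37.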

Figure 1: Computation of ERRT. The line from the origin O through the (UI, OI) point P is extended until it meets the truncation line at T.
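The geometric construction of ERRT is easy to automate. The sketch below is illustrative rather than the paper's actual code: it intersects the ray from the origin through a stemmer's (UI, OI) point with a piecewise-linear truncation line. The truncation and Lovins coordinates used in the demonstration are taken from Table 1 (CISI sample, tight grouping).

```python
def errt(point, trunc_line):
    """ERRT = length(OP)/length(OT): O is the origin, P the stemmer's
    (UI, OI) point, and T where the ray OP meets the truncation line."""
    ux, uy = point
    for (x1, y1), (x2, y2) in zip(trunc_line, trunc_line[1:]):
        dx, dy = x2 - x1, y2 - y1
        denom = ux * dy - uy * dx            # cross product; 0 => parallel
        if denom == 0:
            continue
        s = (uy * x1 - ux * y1) / denom      # position along the segment
        t = (x1 + s * dx) / ux               # scale factor: T = t * P
        if 0.0 <= s <= 1.0 and t > 0:
            return 1.0 / t                   # |OP| / |OT|
    raise ValueError("ray does not meet the truncation line")

# (UI, OI) points for trunc(4)..trunc(8), from Table 1 (tight grouping):
trunc = [(0.062, 0.000814), (0.176, 0.000262), (0.337, 0.000073),
         (0.527, 0.000028), (0.700, 0.000012)]
print(round(errt((0.326, 0.000063), trunc), 2))  # 0.92, as reported for Lovins
```

Applied to the Porter and Paice/Husk points from the same table, this construction reproduces the ERRT values of 0.76 and 0.55 respectively.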

Experimental Arrangements

A suite of computer programs was written to permit computation of the above indices. The processing falls broadly into four parts:

-- conversion of a source text sample into a grouped file;
-- application of one or more stemming algorithms to the words in the grouped file;
-- computation of the UI, OI and SW values for each stemmer, by comparing its output with the groups in the grouped file;
-- computation of (UI, OI) values for a series of truncation lengths, and hence of the truncation line and the ERRT value for each stemmer.

The only stage where substantial human effort is involved is in the construction of the grouped file. The grouping program presents to the user an alphabetic display of all the distinct words from the source; it makes all the obvious grouping decisions on the user's behalf, but for all uncertain cases it refers to the user for the decision. Adjacent groups are separated by inserting 'barriers' into the file. The grouped file thus produced is not likely to be correct at the first attempt, and a second and perhaps even a third scan of the file, using a standard editor, needs to be performed before the grouping can be considered satisfactory. For one thing, some of the 'obvious' decisions taken by the program may in fact be wrong. For another, alphabetic ordering may sometimes split up certain conceptual groups (consider "read", "readily", "reading", "readjust", "reads"). Thirdly, the user may require time to reconsider some of the more difficult groupings. The construction of the grouped file is thus a rather laborious process, though once it is finished the file represents a permanent resource for future use.

A question arises over what to do about irregular verbs. It seems natural and proper that "fly" and "flew" should be placed in the same group, but what about "is" and "were", and "go" and "went"? Stemming relies on morphological regularities and similarities, so it seems wrong to penalise it for not merging totally dissimilar forms. In the event, a rule-of-thumb was used that words should be grouped together only if at least the first two letters were the same. This was partly a matter of convenience, since it is awkward to bring together words which are far apart in the alphabetic list. The rule obviously leads to anomalies - e.g., "bring" and "brought" are grouped, but "buy" and "bought" are not - but because the indices are based on word types rather than tokens, it has only a tiny effect on the values of the indices.

During the grouping process, the user is presented with individual, isolated words; no context is given, and so there is no chance to allow for the different meanings of ambiguous words.

In making a grouping decision, the user is in effect deciding whether two words refer to the same underlying concept, given a knowledge of the general domain of the source material. This still leaves the question whether two words which typically refer to related but not quite identical concepts should be counted as equivalent. It may be that taking a 'strict' view of semantic equivalence will give materially different results than taking a 'loose' view. To investigate this point, groups were actually defined at two levels of 'tightness', using two kinds of inter-group barrier. First, the sample is divided into loose groups, each containing words which refer to more-or-less related concepts. Secondly, any loose group may be subdivided into two or more subgroups, each containing words which refer to identical concepts, and which may be only weakly related to the other subgroups. This means that for each stemmer which is evaluated against a given word sample, two separate sets of performance indices are generated. Some examples of the two-level grouping are shown in Figure 2. The words "abstract" and "abstracts" have been placed in individual subgroups because they are both ambiguous: one sense of "abstract" refers to abstractness, the other to the abstracting of documents; during the grouping, the domain of the source was known to be library science.

( abstract )
( abstraction, abstractly )
( abstracts, abstracting, abstracted, abstracters )
--------
( add, adds, adding, added, additive )
( addition, additional, additionally )
--------
( alter, alters, altered, alterations )
( alternate, alternately, alternating, alternations )
( alternative, alternatives, alternatively )
--------
( appropriate, appropriately )
( appropriations )
--------
( author, author's, authors, authorship )
( authoritative )
( authority, authorities )
( authoritarian )
( authorized, authorization )
--------
( cost, costing, costed, costs )
( costly )
--------
( devise, devising, devised )
( device, devices )
--------
( element, elements, elemental )
( elementary )
--------
( explicate, explicates, explicated, explication, explications )
( explicit, explicitly )
--------
( frame, frames, framing, framed )
( framework, frameworks )

Figure 2: Examples of two-level concept groups. Words enclosed within parentheses are grouped tightly together. A horizontal line is a major barrier between adjacent loose groups. (All examples are actual words taken from the CISI source.)

Performance evaluations were carried out for three stemmers whose details are fully described in the IR literature: the Lovins stemmer [7], the Porter stemmer [8] and the Paice/Husk stemmer (including specific rule tables) [3]. To provide a baseline for computing values of ERRT, UI and OI values were also obtained for simple truncation using truncation lengths of 4, 5, 6, 7 and 8.
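A two-level grouped file of the kind shown in Figure 2 is straightforward to represent in software. The sketch below is an illustration, not the original grouping program, and the file format is an assumption based on the figure: one tight subgroup per line, with a dashed line as the barrier between loose groups.

```python
def parse_groups(text):
    """Parse a two-level grouping: each non-blank line is one tight
    subgroup (comma-separated words); a line of dashes is a barrier.
    Returns a list of loose groups, each a list of tight subgroups."""
    loose, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) <= set("-_"):          # barrier between loose groups
            if current:
                loose.append(current)
            current = []
        else:
            current.append([w.strip() for w in line.split(",")])
    if current:
        loose.append(current)
    return loose

sample = """\
abstract
abstraction, abstractly
abstracts, abstracting, abstracted, abstracters
--------
frame, frames, framing, framed
framework, frameworks
"""
groups = parse_groups(sample)
print(len(groups))  # 2 loose groups
```

Evaluation at the tight level would treat each subgroup as a concept group; at the loose level, the subgroups of each loose group are merged into one.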

A sample of words was obtained by processing all of the titles and abstracts in the CISI test collection, which is concerned with Library and Information Science. This source contained a total of 184,659 words, reduced to 9,757 after deletion of duplicates. Runs were also carried out using two smaller word samples: 1,527 distinct words (derived from 8,947 source words) from a textbook excerpt concerned with computer storage devices, and 3,559 distinct words (from 32,098 source words) from the texts of 14 papers on agriculture.

In order to investigate the influence of sample size on the performance indices, four subsamples were prepared from the CISI text source, by taking every nth line from the complete collection, with n values of 2, 4, 8 and 16, and then preparing grouped files as usual. The resulting samples contained 7,304, 5,395, 3,804 and 2,654 word types respectively, compared with 9,757 for the full vocabulary.

Results

Although the values obtained at the tight and loose grouping levels were markedly different, the patterns of values for the different stemmers were much the same in all cases, so that similar conclusions could be drawn from the two levels. Table 1 shows the values of all four indices for the CISI sample. Comparing the tight and loose values, we find understandable systematic differences: understemming errors will obviously be greater, and overstemming errors less, for loose grouping, since loose groups call for the merging of words with weaker morphological and semantic similarities; this is reflected in the larger UI values and smaller OI and SW values obtained for loose grouping.

                 ------- tight grouping --------      ------- loose grouping --------
                 UI      OI        SW        ERRT     UI      OI        SW        ERRT
trunc(4)         0.062   0.000814  0.013127  --       0.099   0.000706  0.007155  --
trunc(5)         0.176   0.000262  0.001487  --       0.258   0.000183  0.000710  --
trunc(6)         0.337   0.000073  0.000218  --       0.442   0.000022  0.000050  --
trunc(7)         0.527   0.000028  0.000054  --       0.633   0.000002  0.000004  --
trunc(8)         0.700   0.000012  0.000017  --       0.780   0.000000  0.000000  --
Lovins           0.326   0.000063  0.000193  0.92     0.459   0.000020  0.000044  1.00
Paice/Husk       0.121   0.000118  0.000978  0.55     0.257   0.000051  0.000197  0.67
Porter           0.374   0.000028  0.000074  0.76     0.542   0.000004  0.000007  0.88

Table 1: Stemming performance indices for the CISI word sample.

If now we compare the pattern of values within the tight and loose sections of Table 1, we can find no marked differences, and this suggests that the properties and validity of the indices are not strongly affected by the level of grouping - provided presumably that a consistent grouping strategy is used.

Comparing the three stemmers with one another, the relative values of the four indices may be summarised as follows:

    UI(Porter) > UI(Lovins) > UI(Paice/Husk)
    OI(Paice/Husk) > OI(Lovins) > OI(Porter)
    SW(Paice/Husk) > SW(Lovins) > SW(Porter)
    ERRT(Lovins) > ERRT(Porter) > ERRT(Paice/Husk)

Although the magnitudes of the differences varied a good deal, and in a couple of cases were only marginal, the above inequalities actually held for all of the word samples tested. In terms of the UI index, Lovins was noticeably closer to Porter than to Paice/Husk.

If we take ERRT as a general indicator of performance accuracy, we would have to conclude that Paice/Husk is a better stemmer than Porter, which is in turn better than Lovins. However, the differences in stemming weight between Paice/Husk and Porter are so great that it is probably meaningless to compare their accuracy: Paice/Husk is a heavy stemmer and Porter a light stemmer, and presumably each is suited to a different task.

It is helpful to look at Figure 3, which plots OI against UI for the CISI sample at the tight grouping level. This highlights the great difference in weight between Paice/Husk and Porter, and also casts light on the performance of Lovins. Notice first that the truncation line is convex towards the origin; this is as expected, given the generally inverse relationship between UI and OI. However, a line joining the performance points of Paice/Husk, Lovins and Porter is clearly concave towards the origin, suggesting that Lovins is genuinely less accurate than either of the other stemmers. This relationship holds for both tight and loose levels of grouping for all the word samples tested.

Figure 3: UI x OI plot for the CISI sample (tight grouping).

Lennon et al. represented the weights of their stemmers by the dictionary compression each could achieve. Their results for Lovins and Porter are compared with ours in Table 2. Our results from the CISI sample are closest to theirs from the Brown linguistic corpus, which oddly appeared to be the least similar source to ours. However, our ratio for the compression by Lovins compared to Porter was 1.14, whilst theirs were all in the range 1.13 to 1.18. Our values confirm clearly that Porter is a lighter stemmer than Lovins, and also that Paice/Husk is much heavier.

                   --- this work (CISI) ---    ------ Lennon et al. 1981 ------
                   n        compression        Brown   Cranfield   NPL    Inspec
sample words       9,757    ---                ---     ---         ---    ---
tight groups       5,101    47.7               ---     ---         ---    ---
loose groups       4,350    55.4               ---     ---         ---    ---
Lovins stems       5,409    44.6               45.8    39.2        39.5   30.9
Paice/Husk stems   4,755    51.3               ---     ---         ---    ---
Porter stems       5,964    38.9               38.8    34.6        33.8   26.2

Table 2: Dictionary compression achieved by the three stemmers. n gives the number of distinct groups or stems; the other figures are percentage compression values, relative to the 9,757 words (100.0%) of the full CISI sample.

We now turn to the effect of sample size on the values of the indices. The value of UI, representing the internal structure of the concept groups, is fairly insensitive to sample size. OI (and consequently SW) however falls off sharply as sample size increases, though it shows signs of levelling off at around 5,000 to 10,000 words. This can be explained as follows. OI is defined by the ratio GWMT/GDNT; the global desired non-merge total GDNT depends on the square of the number of items, and therefore increases more than in proportion to the sample size W, whereas the global wrongly-merged total GWMT shows little or no tendency to increase with W. This is plausible, since it is likely that, as the sample grows, the fresh words introduced increasingly tend to be non-domain-related singleton words. ERRT shows a less consistent tendency, with modest fluctuations from sample to sample.

Findings

The general results of our experiments may be summarised as follows:

-- The specific values of the performance indices vary markedly depending on the source text.

-- For a particular word source, the values of UI are fairly insensitive to sample size, whereas values of OI and SW fall off sharply as sample size increases. ERRT values show modest fluctuations.

-- In terms of the stemming weight SW, Porter is a light stemmer and Paice/Husk a heavy stemmer.

-- The difference in the SW value between Porter and Paice/Husk is so great that it is probably meaningless to compare their performance.

-- The Lovins stemmer seems to be generally less accurate than either of the other two stemmers.

-- The choice of grouping level does not appear to be a critical matter, provided that a consistent strategy is used in each case.

The lightness of the Porter stemmer is in agreement with earlier findings that Porter is significantly lighter than Lovins and several other algorithms [1].


Final Comments

The author is well aware of various doubts and problems with the methods described in this paper. Further work is clearly needed to explore the validity of the approach and to make the programs more useful. One area of difficulty concerns the subjective and fuzzy nature of the grouping operations, and it would be valuable to have some objective evidence to assist in this activity. Rather than basing the grouping on a display of isolated words, grammatically tagged words could be presented instead. Part-of-speech tags would appear to be of some use, but semantic tags might be of greater value [9]. Use of tagged text would of course mean that different occurrences of a particular word type might be assigned to different groups on different occasions.

Use of these stemmer evaluation tools (whether in the existing or an enhanced form) is not limited to comparing the performance of existing off-the-shelf stemmers: they can also be used for optimizing the rule-tables of individual stemmers. The types of changes considered would of course depend on the nature of the stemmer in question. For example, we might try to make Porter's algorithm stem a little more heavily by adding additional rules or by modifying or relaxing some of the contextual constraints. With Paice/Husk, the emphasis would be on reducing the number of overstemming errors by retracting or modifying some of the more troublesome rules. In this case, it would be desirable to compute the under- and overstemming error indices for each different ending. This could be done by modifying the stemmer to keep a note of each removed ending (or the rule used to remove it) so that this information is available when the under- and overstemming errors are being counted. It should then be possible to discover which rules are a serious cause of overstemming errors.

Finally, it would be interesting to apply this evaluation method to dictionary-based conflation, in order to investigate the extent of the performance gap between a well-optimised stemmer and a full dictionary-based operation.
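The per-ending bookkeeping suggested above can be sketched as follows. This is a hypothetical illustration, not the paper's software: the suffix list, the wrapper and the word groups are all invented. The modified stemmer records which ending it removed from each word, and each wrongly-merged pair is then charged to the endings involved.

```python
from collections import Counter, defaultdict

# Toy rule table -- invented; not the Porter or Paice/Husk rule sets.
SUFFIXES = ["ation", "ing", "ed", "al", "s"]

def stem_with_note(word):
    """Stand-in for a stemmer modified to keep a note of the ending
    (or rule) it used, as suggested in the text."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[:-len(suf)], suf
    return word, ""

# Invented concept groups (tight grouping of a toy sample).
groups = [["general", "generals"], ["generation", "generating"]]

# Stem every word, remembering its concept group and removed ending.
by_stem = defaultdict(list)
for gi, g in enumerate(groups):
    for w in g:
        s, ending = stem_with_note(w)
        by_stem[s].append((gi, ending))

# Charge each wrongly-merged pair (same stem, different concept
# groups) to the endings that produced it.
blame = Counter()
for members in by_stem.values():
    for i in range(len(members)):
        for j in range(i + 1, len(members)):
            (g1, e1), (g2, e2) = members[i], members[j]
            if g1 != g2:
                blame[e1] += 1
                blame[e2] += 1

print(dict(blame))  # {'al': 1, 'ation': 1}
```

Here "general" and "generation" are both reduced to "gener", so the removals of "-al" and "-ation" are each charged with one overstemming error; tallies of this kind would point at the rules worth retracting or constraining.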

Acknowledgements. I should like to thank Gareth Husk, Chris Danson and Helen Simpson for their invaluable work in developing various parts of the software.

References

1. Lennon, M., Pierce, D.S., Tarry, B.D. and Willett, P. An evaluation of some conflation algorithms for information retrieval. Journal of Information Science 1981; 3, 177-183.

2. Frakes, W.B. Term Conflation for Information Retrieval. Ph.D. thesis, Syracuse University, NY, 1982.

3. Paice, C.D. Another stemmer. SIGIR Forum 1990; 24, 56-61.

4. Hafer, M.A. and Weiss, S.F. Word segmentation by letter successor varieties. Information Storage and Retrieval 1974; 10, 371-385.

5. Landauer, C. and Mah, C. Message extraction through estimation of relevance. In: Oddy, R.N. et al. (Eds.), Information Retrieval Research. London: Butterworths, 1981, pp. 117-138.

6. Harman, D. How effective is suffixing? Journal of the American Society for Information Science 1991; 42, 7-15.

7. Lovins, J.B. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics 1968; 11, 22-31.

8. Porter, M.F. An algorithm for suffix stripping. Program 1980; 14, 130-137.

9. Wilson, A. and Rayson, P. The automatic content analysis of spoken discourse: a report on work in progress. In: Souter, C. and Atwell, E. (Eds.), Corpus-based Computational Linguistics. Rodopi, Amsterdam & Atlanta, GA, 1993.