SUBMITTED TO: EIGHTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING

Database Recovery Using Redundant Disk Arrays*

Antoine N. Mourad†, W. Kent Fuchs, Daniel G. Saab

Center for Reliable and High-Performance Computing
Coordinated Science Laboratory
1101 W. Springfield Ave
University of Illinois
Urbana, Illinois 61801

June 6, 1991

Abstract

Redundant disk arrays provide a way for achieving rapid recovery from media failures with a relatively low storage cost for large scale database systems requiring high availability. In this paper we propose a method for using redundant disk arrays to support rapid recovery from system crashes and transaction aborts in addition to their role in providing media failure recovery. A twin page scheme is used to store the parity information in the array so that the time for transaction commit processing is not degraded. Using an analytical model, we show that the proposed method achieves a significant increase in the throughput of database systems using redundant disk arrays by reducing the number of recovery operations needed to maintain the consistency of the database.

*This research was supported in part by the National Aeronautics and Space Administration (NASA) under Contract NAG 1-613 and in part by the Department of the Navy and managed by the Office of the Chief of Naval Research under Grant N00014-91-J-1283.
†Ph. (217) 244-7180, Fax (217) 244-5686, email: mourad@crhc.uiuc.edu

1 Introduction

In a database state

after

which

be

the

state.

in main

memory

due

type

by periodically

all updates

For large

media

subsystem recovery

design

In this recovery

the

from

that

significant 50

than

25 days 1. Mirrored

disk

mirroring

still

mean

incurs

Array

However,

be necessary

the

and

even

when

to protect

the

the

last

copy

have

to

of the

or RDAs

rapid

is prohibitive an alternative

are used,

operator

high.

permanent

to provide

which

apply

In such

is quite

[2, 3] provide

against

a media

is used

employed

overhead

mirroring

database

When

recovery

is

database

generated.

(MTTF)

organizations

of failure

to the

file

was

performed

type

log file.

for

been

storage

this

log

database

crash.

updates

and

disks

maintained

to the

of the

with

overhead

to their

modifications

a redo

to failure

disk

the

into

time

a 100%

(R.DA)

to deal

a transaction

tables

made

time

aborts

transaction

system

by logging

the

When

all updates

copy

after

time

be less

storage.

disks,

Disk

and

committed down

way

last

case

to a consistent

transaction

by the

at the

copies

the

are

to REDO

database

archive

typical

modified

and

database

over

we present

transaction

by using old version

1Assumin$

of the

database

initiated.

In this

common

the

errors

archiving or system

errors.

paper,

from

is achieved

may

One

between

Redundant

reliable

logging

copies

crash.

in the

failure.

user

pages

occurred

with

However,

applications.

redo

software

can [1].

for maintaining

keep

e.g.,

be

most

has to UNDO

crash

is reconstructed

causes

or can

is a system

reflected

by transactions

failure

systems,

for many

and

database

The

all database

the

transactions

performed

a media

not yet

for restoring

occur.

mechanism

when

archive

by committed the

recovery

is media

generating

occurs

storage

and

can

deadlocks,

of failure

active

of failure

of failures

errors,

type

were

be necessary

has to restore

The

transactions

Another

a case,

second

that

by complete

failure

manager

are lost.

may

types

to program

The

by transactions

recovery

Several

recovery

previous

performed

rapid

a failure.

can

aborts,

system,

¹Assuming an MTTF of 30,000 hours for each disk.

In this paper, we present a technique that exploits the redundancy in disk arrays to support rapid recovery from transaction aborts and system failures in addition to providing fast media recovery. This is achieved by using a twin page scheme for storing the parity information, making it possible to keep the old version of the parity along with the new version. The old version of the parity is used to undo updates performed by aborted transactions or by transactions interrupted by a system failure.

In Sections 2 and 3 we briefly review several techniques for transaction recovery in database systems and discuss two RDA organizations. In Section 4, we present our database recovery scheme. The results of our performance analysis are detailed in Section 5. Section 6 presents some conclusions.

2 Recovery Techniques

Recovery algorithms typically use some form of logging or shadowing. In the logging approach [4], before a new version (after-image) of a record or page is written to the database, a copy of the old version (before-image) is placed into a sequential log file. If a transaction aborts or the system crashes, the log file is analyzed and the state of the database is restored. In the shadowing approach the update of a page is placed into a new physical page on disk [5, 6]. The physical pages containing the old versions are released after all updates of the committing transaction have been written to disk. One problem with the shadowing approach is the disk scrambling effect which decreases the sequentiality of disk accesses. Another problem is dynamic mapping since it requires maintaining a very large page table which leads to high I/O overhead during normal processing.

In describing and in analyzing our method, we will use the following taxonomy of database recovery algorithms introduced by Haerder and Reuter [7]. They classify recovery algorithms with respect to the following four concepts:

Propagation² of updates. The propagation strategy can be ATOMIC, in which case any set of updated pages can be propagated to the database in one atomic action. In the ¬ATOMIC case, propagation can be interrupted by a system crash and pages are updated-in-place.

Page replacement. Two policies can be used: the STEAL policy allows pages modified by uncommitted transactions to be propagated to the database before end-of-transaction (EOT); the opposite policy is referred to as ¬STEAL. No UNDO recovery is necessary with a ¬STEAL policy.

EOT processing. Two categories exist: the FORCE discipline requires all pages modified by a transaction to be propagated before EOT; the opposite discipline is called ¬FORCE.

Checkpointing Schemes. Checkpointing is used to propagate updates to the database in order to minimize the number of REDO actions to be performed after a crash. In the Transaction Oriented Checkpointing (TOC) scheme, a checkpoint is generated at the end of each transaction. This is equivalent to using the FORCE discipline in EOT-processing. Two other types of checkpoints can be used: Transaction Consistent Checkpoints (TCC) are generated during quiescent periods where no transactions are being processed; Action Consistent Checkpoints (ACC) are less restrictive and require that no update statements are processed during checkpoint generation.

²Propagation to the database means that the new version is visible to higher level software. Updates can be written to disk without being propagated (e.g., shadowing).

3 Redundant Disk Arrays

3.1 Data Striping

Striped disk arrays have been proposed and implemented for increasing the transfer bandwidth in high performance I/O subsystems [8, 9, 10]. In order to allow the use of a large number of disks in such arrays without compromising the reliability of the I/O subsystem, redundancy is sometimes included in the form of parity information [3, 10]. Patterson et al. [3] have presented several possible organizations for Redundant Arrays of Inexpensive Disks (RAID). One interesting organization is RAID with rotated parity, in which blocks of data are interleaved across N disks while the parity of the N blocks is written on the (N+1)st disk. The parity is rotated over the set of disks in order to avoid contention on the parity disk. Figure 1 shows the array organization with four disks. The organization allows both large (full stripe) concurrent accesses and small (individual disk) accesses. In this paper, we concentrate on small read/write accesses. For a small write access, the data block is read from the relevant disk and modified. To compute the new parity, the old parity has to be read, XORed with the new data and XORed with the old data. Then the new data and new parity can be written back to the corresponding disks. Stonebraker et al. [11] have advocated the use of

Figure 1: RAID with rotated parity
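The small-write parity update described in Section 3.1 (new parity = old parity XOR old data XOR new data) can be illustrated with a short sketch. The helper names `xor_blocks` and `small_write_parity` and the two-byte blocks are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after a small write: old parity XOR old data XOR new data."""
    return xor_blocks(old_parity, old_data, new_data)


# Hypothetical stripe of N = 3 data blocks plus one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = xor_blocks(*stripe)

# Small write to block 1: only the old data and the old parity are read.
new_block = b"\xff\x00"
new_parity = small_write_parity(parity, stripe[1], new_block)
stripe[1] = new_block
```

The point of the update rule is that the other data blocks of the stripe never have to be read; the result matches a full recomputation of the parity over the updated stripe.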

D00

D20 D21

D10 Dll

Figure

a RAID

3.2

Parity

Gray

to provide

systems.

number

Hence,

of disk arrays.

in database

ways of using an architecture

They found

of small

that because

accesses,

the organization

systems.

shown

in Figure

an area

without

interleaving.

For a group

on each disk is reserved disks are grouped

for parity

2 was proposed. on each disk and

for parity and in a parity

in on-line

requests

to have several

of N + 1 disks,

together

such as RAID

of the nature of I/O

it is not convenient

of reserving

different

striping

high availability

consists

areas

2: Parity

Striping

et al. I2] studied

(OLTP) large

organization

on four disks.

in OLTP

disks servicing

It is referred writing

data

each disk is divided

the other

areas

contsdn

group and their

transaction

parity

processing

systems,

namely

a

the same request.

to as parity

striping.

It

sequentially

on each disk

into N + 1 areas

one of these

data.

N data areas

from

N

is written

on the parity

area

that

is a collection

of redundant

disk

striping

or data

of the N + 1_t disk.

4

KDA-Based

In the remainder arrays.

The

rotated

parity).

Recovery of this paper,

organization

we consider

of the

In the case of data

arrays

an I/O subsystem

being

striping

either

we assume

parity that

a large

striping

striping

(RAID

with

unit is used in order

to

ensure that I/O assumptions:

requests

Communication

fixed size pages; A STEAL

4.1

will typically

Database

policy

ttDA-based

between

allowing

Description

recovery makes

performed

by aborted

performed

by an aborted

modified

before

the transaction

has been stolen from the buffer). page per parity

group

UNDO

If additional

logging.

back to the database to the clean the state parity

state

transition

groups

that

when

the number

of the parity

a page

page number updated

When

page

it commits Otherwise

a transaction

version

(using

within

that

are dealt

has been written of Haerder

is referred caused parity.

page

page

A parity

group

modified

and Reuter,

the page

transactions

data

without

and need to be written

A dirty dirty

back to the database

parity

group

commits. contains

Figure

goes back 3 shows

the numbers

to as the Dirty_Set.

of all

It also contains

the group to be in the dirty

state

and

Only log N bits need to be used to store

number.

can be written that

first.

it to become

that

the updated

and one bit for the parity

have been

A table in main memory

This table

unless

Only one modified

by uncommitted

be logged

caused

the group

a page,

In the following,

a page parity group.

the notation

must

group.

transaction

page.

the parity group is called clean.

before-images

of a parity

updates

using the parity

group is dirty when one of its data pages has

back to the database

page holding

by an active

EOT.

by itself to undo all updates

be undone

the same parity

A parity

and the modified

are in the dirty state.

of the data

is _A TOMIC;

recovery schemes.

the transaction

diagram

propagation

using

present in the disk arrays to undo updates

that cannot

pages in the parity group

then their

is performed

before

the parity is not sumcient

Updates

can be written

the number

the data

However,

clean or dirty.

modifying

that

to be propagated

we will use the term parity group to denote

by a transaction

subsystem

Approach

transaction.

can be in one of two states: been

pages

group is the set of pages that share

there is ambiguity,

and the I/O

use of the parity information

transactions.

disk. We also make the following

in place which implies

modified

of the

with using one of the traditional A page parity

by a single data

main memory

pages are updated

is used thus

General

be serviced

The table is used to check whether a page can be written back to the database without UNDO logging.

Figure 3: State transition diagram of a page parity group. A clean group becomes dirty when transaction T modifies page D_i and D_i is written back to the database before EOT. A dirty group remains dirty when T rereferences D_i, modifies it, and D_i is written back to the database before EOT. A dirty group returns to the clean state when transaction T commits.

A modified page can be written back to disk without UNDO logging if its parity group is clean, or if its parity group is dirty and the update is for the same page that caused the group to move into the dirty state, i.e., the same page has been updated, stolen from the buffer, then rereferenced by the same transaction, updated and stolen again from the buffer before EOT³. Note that this does not affect the degree of concurrency or interfere with the locking policy used in the system. We do not specify when a transaction can or cannot modify a page. We only specify when a modified page can be written back to disk without UNDO logging.
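The write-back rule can be sketched as a lookup in the in-memory table of dirty parity groups (the Dirty_Set). This is only an illustrative sketch: the dictionary layout and the function names `needs_undo_log`, `write_back_before_eot` and `commit` are assumptions made here, not the paper's implementation.

```python
from typing import Dict, Tuple

# Hypothetical in-memory Dirty_Set: parity group number ->
# (id of the transaction that dirtied the group, number of the stolen page).
DirtySet = Dict[int, Tuple[int, int]]


def needs_undo_log(dirty_set: DirtySet, group: int, txn: int, page: int) -> bool:
    """True if the before-image must be logged before the page is written back."""
    entry = dirty_set.get(group)
    # No logging for a clean group, or for the same page stolen again
    # by the same transaction that made the group dirty.
    return not (entry is None or entry == (txn, page))


def write_back_before_eot(dirty_set: DirtySet, group: int, txn: int, page: int) -> bool:
    """Record a steal; return True if UNDO logging was required."""
    if needs_undo_log(dirty_set, group, txn, page):
        return True
    dirty_set[group] = (txn, page)  # the group becomes (or stays) dirty
    return False


def commit(dirty_set: DirtySet, txn: int) -> None:
    """Groups dirtied by the committing transaction go back to the clean state."""
    for group in [g for g, (t, _) in dirty_set.items() if t == txn]:
        del dirty_set[group]
```

For example, after transaction 1 steals page 2 of group 7, a second steal of the same page by transaction 1 needs no UNDO logging, while any other page of group 7 does until transaction 1 commits.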

If a single parity page is used, then when a group becomes dirty the old parity information has to be kept in the parity page to be able to recover in case of a transaction failure. That would mean that when the transaction commits, the new parity has to be recomputed in order to update the parity page. That would require reading all the data pages in the group in order to compute the new parity. To avoid that problem, a twin page scheme is used for the parity pages. The basic mechanism of the twin page scheme is as follows: one of the parity pages always contains the valid parity of the group while the other page contains obsolete parity information. When a data page is modified in a parity group, the obsolete parity page (P for example) is updated with the new parity of the array. If the transaction performing the update commits, then the modified parity page (P) becomes the valid parity page; otherwise the other parity page (P') remains the valid parity page and its contents are used to recover the data page that was modified by the failed transaction. Figures 4 and 5 show the data striping organization and the parity striping organization when the twin page scheme is used for the parity.

³Normally such an event should not occur often since buffer management algorithms are not supposed to replace a page that will be referenced again in the near future.
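The twin page mechanism can be sketched in a few lines. The class below is a simplified in-memory model written for illustration: the name `TwinParityGroup`, its fields, and the restriction to one uncommitted page per group are assumptions of this sketch, and timestamps, disk I/O and crash handling are left out. On abort, the old data page is recovered by XORing both parity pages with the new data page.

```python
from functools import reduce
from typing import List, Optional


def xor_pages(*pages: bytes) -> bytes:
    """Bytewise XOR of equally sized pages."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*pages))


class TwinParityGroup:
    def __init__(self, data_pages: List[bytes]) -> None:
        self.data = list(data_pages)
        parity = xor_pages(*self.data)
        self.twins = [parity, parity]  # P and P'
        self.valid = 0                 # index of the twin holding the committed parity
        self.dirty_page: Optional[int] = None

    def valid_parity(self) -> bytes:
        return self.twins[self.valid]

    def write_uncommitted(self, i: int, new_page: bytes) -> None:
        working = 1 - self.valid
        if self.dirty_page is None:
            base = self.twins[self.valid]  # first steal: start from the valid parity
        elif self.dirty_page == i:
            base = self.twins[working]     # same page stolen again
        else:
            raise RuntimeError("group is dirty: UNDO logging needed for another page")
        # Only the obsolete twin is overwritten; the valid twin keeps the old parity.
        self.twins[working] = xor_pages(base, self.data[i], new_page)
        self.data[i] = new_page
        self.dirty_page = i

    def commit(self) -> None:
        if self.dirty_page is not None:
            self.valid = 1 - self.valid    # the updated twin becomes the valid parity
            self.dirty_page = None

    def abort(self) -> None:
        if self.dirty_page is not None:
            i = self.dirty_page
            # D_old = (P xor P') xor D_new
            self.data[i] = xor_pages(self.twins[0], self.twins[1], self.data[i])
            self.dirty_page = None
```

Commit only flips which twin is considered valid, so no data page has to be reread at EOT, which is the motivation given above for keeping two parity pages.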

DO D3 D6 iii!!!iiiiiiiiiiiiiiiiii i iiiiii ii iii i i i iiii! i Figure

4: Data striping

organization

DO0

DIO

DO1

Dll

[°5 t

D7

Dll

D10 with the twin page

scheme

for the parity.

D20 D21

P30 1"31

Figure

twin page striping shows

scheme

organization

is used for the parity.

the contents

with the twin page scheme

Twin

of a parity

group including

of a data page after a transaction and

the new data page:

one of its data

pages

to disk,

UNDO

updated

since when

the

striping

parity

pages

case and Pzy and Pzy t, with z -- (x q- 1)mod(N

version pages

5: Parity

actual

uncommitted

parity data

must

stolen

the group is dirty

of the data page

to disk the corresponding

the twin parity it is sufficient

for Dj 4 then

it is necessary

on disk and an "old"

Di in case of a transaction parity

4The before-image of the page must be written to a log file.

page(s)

pages.

When

must

in the case of page

abort.

be updated

logging


both

page

striping

case. Figure

In order to recover

a parity

Ds needs

pages

that

would

In all cases,

parity because

to be written

P and P_ need

a current

parity

page

be used

when

writing

to be

reflecting

to recover a data

the page

first.

or of the modified

record(s)

6

the old

of both

group is dirty

page

parity

to maintain parity

Px t in the data

to XOR the contents

from the buffer and another

be performed

Px and

q- 2), in the parity

$D_{old} = (P \oplus P') \oplus D_{new}$.

Di has been

logging

abort

are denoted

for the parity.

in the case of record

logging

Do

D1

DN-1

P

P_

oooooo.°.o_oo°

Figure 6: The contents

4.2

Twin

Page

The twin parity transaction contains

recovery

following

the highest

timestalnp

transaction

or system

and

a bit map

contains

the valid parity the timestamp 7 selects

parity page is written

can be maintMned groups

following

parity

algorithm

that destroys

parity

obsolete been

the current

back

page, parity

of the

idle periods twin parity

[12]. A parity

updated

it has aborted.

by an active Figure

pages

transaction,

8 shows the state

When

may

both parity

not survive

when it contains parity information.

is computed parity

diagram

pages,

page for each crash.

will have to be used to

states:

parity

a system

crash

to reconstruct

page

P is

is not available

a background

process

the bit map.

committed,

the last committed

obsolete, parity

working update.

It is in the working state

and it is in the invalid state if the last transaction transition

Algorithm

a system

parity page or the information

can be initiated

after a

In this case, two bits would have

to code the three possible

Following

with

page is updated

Then the parity

Current._Parity

the bit map.

The page

to 0.

a data

reading

pages

is undone

is reset

which is the current

can be in one of four states:

old committed

page

In order to avoid

algorithm

has to be used. of the system

an update

page.

such a bit map

page pi is the current

page is committed

when it contains

group

When parity

parity

indicating

the map,

which of the twin parity

for modification.

to disk.

However

in order to be able to perform

in the page header.

information.

page and to reconstruct

Current_Parity

that runs during

invalid

is stored

of the current

in main memory

in the database.

a crash

the current

the current

Each

In order to identify

a timestamap

to be used in the bit map for each parity

and

This is necessary

pages are read and one of them is selected

of the parity

identify

information,

failure,

disks.

a disk failure.

shown in Figure

the modified

Hence

are stored on different

the valid parity

both parity

group.

Management

pages

Current__Parity

of a page parity

of the twin parity

pages.

or It is

when it has updating

Current_Parity(pg)
begin
    Read twin parity pages in parity group pg;
    if Timestamp(P) > Timestamp(P') then
        Current_Parity ← P;
    else
        Current_Parity ← P';
end

Figure 7: Algorithm Current_Parity
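A direct transcription of the Current_Parity algorithm is given below as a sketch. The `ParityPage` record and its field names are hypothetical; the only behaviour taken from the figure is the timestamp comparison that selects which twin is current.

```python
from dataclasses import dataclass


@dataclass
class ParityPage:
    timestamp: int   # timestamp stored in the page header (assumed to be an integer)
    contents: bytes


def current_parity(p: ParityPage, p_prime: ParityPage) -> ParityPage:
    """Return the twin with the higher timestamp, as in Figure 7."""
    if p.timestamp > p_prime.timestamp:
        return p
    return p_prime
```

As in the figure, a tie falls through to the else branch and selects P'.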

C: committed; Figure

4.3

Recovery

Following pages

from

a system

have

been

to disk and

Modified

database

their before-images performed

System

crash we need modified

needs to be written pages

8: State

O: obsolete;

transition

to identify

on disk by those

parity

page.

W: working

of the twin parity

which transactions transactions.

be written

pages for which UNDO

can be recovered

I: invalid;

diagram

the transaction

record must

from the log.

the current

pages.

Failure

to a log file after an EOT

determines

Modified

begins

have to be backed

A Begin-Of-Transaction and before

it writes

out and which (BOT)

back any modified

to the log file when the transaction

logging

has been performed,

database

using the parity pages.

pages

can be recovered

for which UNDO

However information

record

logging

commits. by reading

has not been

on which pages

have been

written this

to the

problem,

a twin the

same

scheme

modified

database

by the

has

EOT

to

solely

with

for the

using

RDA

redundant recovery.

We

scheme RDA

arrays

we examine

restrict both

".ATOMIC,

the

arrays

be

storage.

logging

headers

modified

pages

log chain.

is performed

that

solution

link

together

written

The

head

and

The

back

to the

of the

chain

to maintain

the

system

performance.

significantly

solve

In TWIST,

is encountered.

operations

To

employed.

page

of the

I/O

the

which

the extra

the amount

ourselves FORCE

record

FORCE,

the

RDAs

log

chain

cost

in using

involved

of the initial

propagation

analysis

".FORCE only

logging

of such strategies

a TOC

data and and


information

We

at

case

recovery

storage

of

recovery. and

with two

recovery

is that

benefit

algorithms

we examine RDA

evaluate

of RDAs

the

in combination

by adding RDA

use

media

recovery

algorithms in each

look of rapid

RDA

different

to them.

of the

twin

As page

cost.

hence

is appropriate

a STEAL

policy Within

for EOT-processing.

checkpointing

parity the

traditional

algorithms.

to

advocate

for the purpose

and

model

of maintaining

recovery.

using

achieved

analytical

we do not

recovery

logging

an

cost

crash

improvement

-"ATOMIC

and

same

of UNDO

to the

and

needs

the

develop

high,

of systems

is (100/N)%

implies

STEAL,

already

and

evaluate

Since

is relatively

throughput

page

we

transaction

both

reduces which

We therefore

only

do not affect

algorithms.

with

and

parity

update-in-place

in the

part id.

R.DA-recovery,

to systems

consider

recovery

and

that

the

is concerned,

for the

requests

of supporting in a system

algorithms

far as storage

transaction

of

disk

comparing

disk

a crash

In our case,

the

for different

purpose

by

benefit

redundant

recovery

do this

recovery

the

throughput

in a system

I/O

after

stored

will be

[13] can

no before-image

to undo

logging

in permanent

Analysis

evaluate

transaction

type

regular

in TWIST

pages,

transaction.

with

to be saved

of pointers

UNDO

along

Performance

In order

active

used

pages

consists

without

behind

one

has

all database

which

which

to be logged

be hidden

to store

same

logging

to the

of identifying

before

though

5

is used

problem

UNDO

similar

use of a log chain

pages

can

without

a technique

page

makes

We

database

policy

makes

for systems

for page this

class

replacement. of algorithms

For algorithms sense.

using

of the

For algorithms

of the type

-_ATOMIC,

algorithms

using

STEAL,

ACC

-,FORCE,

checkpointing

Hence we only look at the former We use the mance

same basic

of several

therefore

all cleanup

that

they

Transactions

required

accesses

transaction

is pu. To characterize

denotes

buffer.

The

the

large

cost of recovery between

cost of executing

period I/O

so that

between

5Also TCC

is used

be used however

the

the behavior a page

TCC type 5 [14].

process

requested

has referenced

by the transaction

memory

crash

during

crashes.

executing

concurrently

of update pages that buffer,

by B.

that

instead

is f_.

are modified

transaction

by an update

is present

the page

Each

communality

It is assumed

required

an availability

interval.

then the length

We

in the system.

transactions

by ca and is measured

transaction

all cost measures

the availability

operation.

periods.

we use the

a page,

and

This implies

shutdown

The

Since

bound

that

C

in the

the

buffer

will remain

in the

6.

is denoted

by ct.

a given

or during

and the disk subsystem

is denoted

is I/O

shutdown.

by an incoming

is denoted

of the perfor-

for in the cost calculations

of the database

in the buffer

the system

to perform

are accounted

The fraction

once a transaction

that

using

with no periodic

of accessed

processed

we assume

If checkpointing

algorithm

after a system

two system

operations,

continuously

pages.

needed

could

in his evaluation

that

required

The fraction

a transaction

of transactions

requests

of a set of P transactions

frames

main

by Reuter

update or retrieval.

that

of page

it is no longer

transfers

number

probability

number

is sufficiently until

s database

those

[14]. We assume

of I/O

consists

are of two types:

which

to outperform

by some background

transaction

page

techniques

by the

considered

TCC checkpoints

one introduced

is running

are performed

The workload

The

as the

recovery

the system

activities

of assuming

buffer

were shown

we look only at the number

also assume

ACCor

type of checkpointing.

model

database

both

interval

checkpointing contradicts our assumption

to perform

throughput

are evaluated

interval

in terms

in units

interval

recovery.

rt is defined

An availability

is measured

of a checkpointing

by the number

of page

is denoted

of The

as the T is the

of number transfers

of T.

by I and is also

of a continuously running system since it requires the establishment of a quiescent point where no update transactions are present in the system.
⁶The page could still be replaced before the transaction commits if a STEAL policy is used; however, if it is replaced it will not be rereferenced by the transaction.
⁷Mathematically, T can be defined as follows: $T = \dfrac{\text{length of availability interval in seconds}}{\text{time to transfer a page to/from disk in seconds}}$.

measured in units of page transfers. The cost of generating a checkpoint is denoted by $c_c$. Assuming that the crash occurs in the middle of a checkpointing interval, the number of page transfers available for processing transactions in an availability interval is $T - c_s - c_c((T - c_s - I/2)/I)$. Hence the throughput is given by:

$$r_t = \frac{1}{c_t}\left[T - c_s - c_c\,\frac{T - c_s - I/2}{I}\right].$$

We assume that $c_c$ is independent of $I$. Hence the optimal checkpointing interval can be easily derived from the following equation [14]:

$$\frac{dr_t}{dI} = \frac{1}{c_t}\left[-\frac{dc_s}{dI}\left(1 - \frac{c_c}{I}\right) + (T - c_s)\frac{c_c}{I^2}\right] = 0. \tag{1}$$

Let $c_r$ denote the cost of a retrieval transaction and $c_u$ that of an update transaction. Then $c_t$ can be expressed as follows:

$$c_t = (1 - f_u)\,c_r + f_u\,c_u.$$

$c_r$ itself includes two components: the cost of reading pages that are not found in the database buffer and the cost of writing back the replaced pages if they have been modified. Hence:

$$c_r = s(1 - C) + a\,s(1 - C)\,p_m, \tag{2}$$

where $p_m$ denotes the probability that the replaced page was modified and $a$ denotes the number of page transfers necessary to perform one write to the disk array. $a$ is equal to 3 or 4 depending on whether or not the old data page is in the buffer at the time of writing the new data. For $c_u$, we have two additional components which represent the cost of logging the transaction ($c_l$) and the cost of backing out the transaction ($c_b$) in the case where an abort occurs. Hence:

$$c_u = s(1 - C) + a\,s(1 - C)\,p_m + c_l + p_b\,c_b, \tag{3}$$

where $p_b$ denotes the probability of an abort.

5.1 Evaluation of the Probability of Logging

We consider a set of K pages that have been modified by active transactions and we compute the expected value of the size of the subset of pages that can be written back to the database without UNDO logging. N is the number of data pages in a parity group and S is the total number of data pages in the database. We assume that the K pages are randomly chosen from the S pages in the database. Note that by using data striping (RAID) with a large striping unit or parity striping, the sequentiality in database accesses will act in favor of our scheme by distributing the pages accessed over distinct parity groups.

The parity groups are numbered from 1 to S/N. Let $X_i$, $1 \le i \le S/N$, be the random variable whose value is 1 if one of the K pages is a member of parity group i, and 0 otherwise. Let X be the random variable denoting the number of parity groups that contain any of the K pages. X is also the number of pages that can be directly written back to the database, since one page per parity group can be written back. We have:

$$X = \sum_{i=1}^{S/N} X_i.$$

Since the K pages are assumed to be randomly chosen, each parity group has the same probability of being accessed by those K page references. Hence the $X_i$'s are identically distributed. Therefore, the expected value of X is

$$E[X] = \sum_{i=1}^{S/N} E[X_i] = \frac{S}{N}\,E[X_1].$$

Since $X_1$ is a Bernoulli random variable, $E[X_1] = \Pr(X_1 = 1)$ and $E[X]$ can be written:

$$E[X] = \frac{S}{N}\,\bigl(1 - \Pr(X_1 = 0)\bigr),$$

where $\Pr(X_1 = 0) = \binom{S-N}{K} / \binom{S}{K}$ is the probability that none of the K pages falls in the first parity group. Hence if K "uncommitted" modified pages are to be written to the database, the probability of having to log one of those pages is given by:

$$p_l = 1 - \frac{E[X]}{K} = 1 - \frac{S}{KN}\left[1 - \frac{\binom{S-N}{K}}{\binom{S}{K}}\right]. \tag{4}$$

5.2 Page Logging

5.2.1 Algorithm of the Type ¬ATOMIC, STEAL, FORCE, TOC

With the FORCE discipline, the checkpoint is taken at the end of each transaction. The cost of checkpointing is therefore accounted for in the cost of logging. In the model, we set $c_c = 0$. Given our assumption that pages are not rereferenced by the calling transaction after they have been replaced in the buffer, the cost of writing and logging a page will be the same whether the page is stolen from the buffer before transaction commit or whether it stays in the buffer until EOT and is then logged and written to the database. Hence we will account for all the costs involved in logging the pages and writing them back to the database as part of the cost of logging. This allows us to set $p_m = 0$ in the expressions for $c_r$ and $c_u$. The expression for $c_l$ is:

$$c_l = 3 \times s\,p_u + 4 \times (2\,s\,p_u) + 4 \times 4.$$

The first term is the cost of writing

I/O
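The cost model of this section can be evaluated numerically with a small sketch. The function names below are hypothetical; the formulas follow the definitions in the text: the probability of logging from Equation 4 (with the hypergeometric probability that a parity group receives none of the K pages), the transaction costs from Equations 2 and 3, the weighted cost c_t, and the throughput as the number of page transfers available in an availability interval divided by c_t.

```python
from math import comb


def p_logging(S: int, N: int, K: int) -> float:
    """Equation 4: probability that a written-back page needs UNDO logging."""
    expected_groups = (S / N) * (1 - comb(S - N, K) / comb(S, K))
    return 1 - expected_groups / K


def retrieval_cost(s: float, C: float, a: float, p_m: float) -> float:
    """Equation 2: buffer misses plus write-back of replaced modified pages."""
    return s * (1 - C) + a * s * (1 - C) * p_m


def update_cost(s: float, C: float, a: float, p_m: float,
                c_l: float, p_b: float, c_b: float) -> float:
    """Equation 3: Equation 2 plus logging cost and expected backout cost."""
    return retrieval_cost(s, C, a, p_m) + c_l + p_b * c_b


def transaction_cost(f_u: float, c_r: float, c_u: float) -> float:
    """c_t = (1 - f_u) c_r + f_u c_u."""
    return (1 - f_u) * c_r + f_u * c_u


def throughput(T: float, c_s: float, c_c: float, I: float, c_t: float) -> float:
    """Transactions per availability interval; with c_c = 0 this is (T - c_s) / c_t."""
    return (T - c_s - c_c * ((T - c_s - I / 2) / I)) / c_t
```

Two sanity checks follow from Equation 4: with K = 1 the single page always finds a group to itself, so the probability of logging is 0, and with K = S every group is touched, so only one page in N avoids logging and the probability is 1 - 1/N. All numeric inputs used with these functions are illustrative values, not results from the paper.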

costs

three

until

EOT for the purpose of UNDO

and REDO

operations

log files.

a system

software

separately

which

makes

writes

logging.

information

reading

of having

more

is needed than

back a page to the database

through

writing

modified

by concurrent

in which

K is replaced

by incomplete before

their own modified update

aborted

with RDA

pages.

Therefore

transactions.

with s Psfup_,/2.

is kept

where

transactions

EOT

records

recovery

the other

The

buffer

to the UNDO error

or

log files are stored

less costly.

The

last term

to each of the log files. on the number

We assume

that

concurrent

transactions

of logging

are halfway

number

is given

the formula

K of

when a transaction

K is equal to half the total

RDA recovery,

in the

an operator

is dependent

Hence the probability With

case

disk array.

transactions.

committing,

to the disk array

term is the cost of writing

only in the

and

Each write

the old data

one disk in the

BOT

to log a page

database.

discipline,

The second

the log to backout

back to the database

to the

the FORCE

of cl is the cost of writing

probability

pages written

REDO

with

error damages

in the expression The

since,

the pages back

of pages

by Equation

for the cost of logging

becomes: c_ = (3 + 2pl)sp,_ + 4(spu + spupt + 4) + 4(pt - p_m,) The

major

group

difference

is dirty,

when writing expression written

i.e., with

s Page logging are disjoint.

cz is that

probability

to a dirty parity

of c[ denotes

along

9We assume

with

with implies that

UNDO Pt. The

group

both

the cost of writing

the BOT

record

and data

parity

to be performed

pages need to be updated

are not

page

to the log.

except

when

the sets of pages

modified

mixed


in the

only

to 3 to accounts

the log chain header

and hence

pages

has

term 2pt is added

in the same

the use of page locking

log file pages

logging

same

9. The

page

by concurrent groups.

the parity

for the

fact that

last term

The header

the first

parity

when

is normally

written update

in the

by the

transactions

4

transaction

to the

database

has

to be logged

and

not

all pages

updated

by the transaction

have

to

be logged. To evaluate that

the

UNDO

cb we assume

other log

concurrent

that

a transaction

update

has to be read

aborts

transactions

up to the

BOT

have

record

first

is the

is the

number

to and the

term

the

term

records

database

account

to undo

for the

that

writing

logged

The

modifications

their

its

modified

pages

and

pages.

The

second

term

transaction.

+a to be read

third

from

the log.

The

is the

number

of page

by the

aborting

term

performed

of a rollback

of processing

half

aborting

have

to be read. the

middle

+ PA +

of before-images

of BOT/EOT

from

last

number

also

of the

Cb= (P.42)(PA) The

in the

record.

With

RDA

transfers

transaction

recovery

the

above

and formula

becomes:

In

the

second

$c_b' = (p_u p_l s/2)Pf_u + (p_l - p_l^{sp_u})Pf_u + Pf_u + (p_u s/2)(6p_l + 5(1 - p_l)) + 4$

first

term

to

term

is the expected

difference

is in the

logged,

number

read

hand, both

database

page

by resetting

page

the

timestamp

of log chain

might been

in its

old

before-images

It is due

has

pages

with

the

term.

operations

if the parity

of logged number

fourth

up to six I/O

the other to

the

data

to the

fact

parity

group

and

modify

in its header.

since

to the

the

Hence

read

is now

to be read

that,

be necessary written

and

headers

be

state

the

when

recovering

its parity

group

database the

from

without

multiplied log.

may

being page

and

of the

parity

page

from

will

still

logged,

data

operations

The

other

a page

"new"

five I/O

by Pt.

that

The major

has

be dirty

been

1°. On

it is necessary

then

overwrite

working

be necessary

to

the invalid

in the latter

case.

After

a system

contains

the cost

at the time

1°In recovery

this

crash, of reading

of the crash

instance

in order

to

only

and

and

in other

keep

things

UNDO

recovery

UNDO

log file up to the

the then

overwriting

instances simple.

in the This

will

needs

to be BOT

the modifications.

evaluation, lead

to

we use

a conservative


an

performed. record The

upper estimate

Hence

the

of the oldest work

bound of

of the

for the

the benefit

formula

for

transaction oldest

costs of

alive

transaction

involved our

cs

method.

in

RDA

[Figure 9 plot: panels "High update frequency" and "High retrieval frequency"; vertical axis Throughput, horizontal axis Communality, C; curves for RDA and ¬RDA.]

Communality, Figure

alive overlapped

91800

I

0.0

1.0

for _ATOMIC,

of some committed

transactions

is an upper

STEAL,

0.2

0.4

FORCE,

transactions

need to be read.

c, = Pfu(spu

S/N

I

Hence

I

0.6

I

0.8

1.0

C

TOC

therefore

the log records

the expressions

for c_ and

for half

c_ are:

+ 2) + 4(Pfup_,s/2)

+ 2(pt - p_P") + 2) + Pfu(p_,s/2)(4pl bound

I

Communality,

9: Results

c_, = ef_,(spupl

I

C

with the work

the work of about 2Pfu

The term

168600-

1"t

for the cost of reconstructing

+ 5(1 - Pt)) + S/N the bit map

for the current

parity

page. We evaluate

the algorithms

transactions.

Figure

high update

frequency

in throughput

9 shows

the throughput

and in a system

using RDA

ment.

For the latter

values

for the different

are:

in two different

recovery

environment

is much

more

of the model,

frequency.

significant

= 0.8 and

p,

except

---- 0.01

= 0.9 while

s = 40, f_, = 0.1 and p_, = 0.3.


on the frequency

of the communality As expected

in throughput

for N, were taken

and T for the

=

5.10 6.

with

the improvement frequency

is about

environ-

42%.

from [14]. These

For the high update

high retrieval

of update

C in a system

in the high update

and for C = 0.9 the increase

parameters

s = 10, f,

depending

as a function

with high retrieval

B = 300, S = 5000, N = 10, P = 6, Pb

environment,

environments

frequency

All the values

frequency

environment,

5.2.2

Algorithm

In this case,

of the

at EOT,

modified

pages

replaced.

REDO

to reduce

the amount

First

are

before-

not written

recovery

references

to a page

referenced

when

distribution

during

during

it is in the buffer

buffer

it is fup,,,

after

ACC

pages

They

a system

are

written

remain crash

in the

to the log but buffer

until

the

they

and ACC checkpointing

are

is used

crash recovery.

transactions

to compute database

we can

and with probability

of references

with parameter

references

database.

its life in the

by successive

",FORCE,

of modified

Pr_. To do so, we need

to the page

C which

while it is in the buffer is 1/(1 -C). that

to the

has to be performed

a page

Hence the number

STEAL,

after-images

back

of REDO

reference

-_ATOMIC,

and

we need to evaluate

successively

buffer.

Type

implies

that

buffer.

see that

of transactions

If we look

with

during the

of a replaced

number

page

being

follows

modified

modified

of

page

is

it is not in the a geometric

of references

of a page being

that

stream

C the

when

its life in the buffer

average

at the

probability

1 - C it is referenced

Since the probability

the probability

the number

to the

page

by a transaction

during

its life in the

is (see footnote 11): $p_m = 1 - (1 - f_u p_u)^{1/(1-C)}$
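The expression for p_m can be checked numerically. The following is a minimal sketch, assuming the formula p_m = 1 - (1 - f_u p_u)^(1/(1-C)), where 1/(1-C) is the average number of references to a page while it is in the buffer. The function name is hypothetical, and the sample values f_u = 0.8 and p_u = 0.9 are assumed to be the high update frequency parameters.

```python
def prob_replaced_page_modified(f_u: float, p_u: float, C: float) -> float:
    """Probability p_m that a replaced page was modified while in the buffer.

    f_u * p_u is taken as the probability that a single reference modifies
    the page; 1 / (1 - C) is the mean number of references to the page
    during its life in the buffer (geometric distribution with parameter C).
    """
    if not 0.0 <= C < 1.0:
        raise ValueError("communality C must satisfy 0 <= C < 1")
    mean_refs = 1.0 / (1.0 - C)
    return 1.0 - (1.0 - f_u * p_u) ** mean_refs


if __name__ == "__main__":
    # Assumed high update frequency values: f_u = 0.8, p_u = 0.9.
    for C in (0.0, 0.5, 0.9):
        print(C, prob_replaced_page_modified(0.8, 0.9, C))
```

With C = 0 a page is referenced once on average, so p_m reduces to f_u p_u; as C approaches 1 the page stays in the buffer longer and p_m approaches 1.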

The cost of logging the BOT/EOT

is simply

records

the cost of writing

before-

their The

RDA

recovery,

before-images number

probability

pages

Hence the formula

that

logged.

of references that

any

of modified

pages

and

to the log: ct -- 4(2spu

With

and after-images

have

Therefore that

could

one of those

been stolen

÷ 2).

from the buffer

before

EOT

do not have

we need to evaluate

the probability

cause

to be stolen

is (1 - C)s(P

replacement

of the

references

a given

page

causes

the

Ps for a page being

for Ps is:

$p_s = 1 - \left(1 - \frac{1}{B - Cs}\right)^{(1-C)s(P-1)}$
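A short numerical sketch of the stealing probability may help here. It assumes the form p_s = 1 - (1 - 1/(B - Cs))^((1-C)s(P-1)), in which (1-C)s(P-1) is the number of references that could cause the page to be stolen and 1/(B - Cs) is the probability that any one of them does. The function name is hypothetical; B = 300, P = 6 and s = 10 are the values quoted for the model, and C = 0.5 is an arbitrary sample point.

```python
def prob_page_stolen(B: int, C: float, s: int, P: int) -> float:
    """Probability p_s that a modified page is stolen from the buffer before EOT.

    B: buffer size in pages, C: communality, s: pages referenced per
    transaction, P: number of concurrent transactions.
    """
    if B - C * s <= 1.0:
        raise ValueError("buffer too small for the given C and s")
    # References by the other P - 1 transactions that miss the buffer.
    refs_that_may_steal = (1.0 - C) * s * (P - 1)
    # Chance that one such reference replaces this particular page.
    per_reference = 1.0 / (B - C * s)
    return 1.0 - (1.0 - per_reference) ** refs_that_may_steal


if __name__ == "__main__":
    print(prob_page_stolen(B=300, C=0.5, s=10, P=6))
```

At C = 1 every reference hits the buffer, the exponent is zero, and p_s is zero; lower communality raises p_s.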

11 The same equation for p_m was derived in [14] using a slightly different argument.


page

to have stolen.

- 1) and is 1/(B

-

the Cs).

In the formulafor p_, be logged

with

the value of K is Psf_p,,ps/2.

probability

p,(1

-Pz).

file contains transaction

both

before-

is found.

are still in the buffer.

and after-images

one difference

RDA

recovery,

difference

backout

and the expression

a checkpoint

for -_RDA and for RDA

after

All transactions of transactions

-1

executed

is given

by:

+ 2), + 2).

we assume

that

a crash

since the last checkpoint during

a checkpoint

+ 4spu) + P fu(ct/4 record

which

with the RDA recovery

+ 4sp,,) + PA(c[/4

value of the optimal

Equation

to the EOT

from a crash

c; = (r'd2)A(c_/4 The

- C) + 4

occurs have

interval,

in the middle

to be redone.

rc is given

of a

Let rc

by rc = I/ct

for cs is:

term corresponds

cost of recovery

a crash,

executed

Cs = (rc/2)ft,(ct/4 The

pages to be undone

spup'] )+pu(s/2)((4+2pt)(1-C)(1-ps)+6pspt+5ps(1-p/))+4

the cost of recovery

number

the

is that the log

becomes:

c'c = (4 + 2pt)(Bpm

denote

scheme

C the modified

+ Pf_, + 4p,,(s/2)(1

cc = 4(npm

interval.

with the FORCE

is that with probability

the cost of transaction

The cost of performing

checkpoint

is:

Hence:

eL = 2x (pt, s/2)(Pfu)+Pfu+Pfu(pt-p[

To evaluate

with RDA recovery

page will not

which will be read until the BOT record of the aborting

cb = 2 X (p,,s/2)(Pf,,)

With

of a modified

- p.(1 - pO)+ 2)+ 4(p -

out a transaction

Another

before-image

Hence the cost of logging

= 4( p. + For the cost of backing

The

checkpointing

+ (s/2)p_,(4(1 interval

+ 4(s/2)pu

is accounted technique

-

1)

for in ct/4

but

- p,) + 4pspt + 5ps(1 - pt)) -

I is obtained

- Pf_,(ct

+ 4(s/2)p,,) 19

- PA)/(f_(ct

The

is:

by plugging

1) + S/N.

the expression

1. This yields:

I = (2ctcc(T

is not read.

+ 48p_))) 1/2.

for c_ in

[Figure 10 plot: panels "High update frequency" and "High retrieval frequency"; vertical axis Throughput, horizontal axis Communality, C; curves for RDA and ¬RDA.]

Figure The formula

0.0

Figure

takes

10 shows

10: Results

place

the

for -ATOMIC,

STEAL,

recovery is derived

significant

in this case.

the

-,FORCE,

ACC

the situation

the old version

results

not

type

I

I

I

0.2

0.4

0.6

for both

However

algorithm

is reversed

1

in a similar

of the data

environments.

the

interesting

outperforms

and

the

_FORCE,

because

algorithms

in which

fashion.

The

value of a in the

with the -,FORCE

discipline,

It can be seen

that

FORCE,

algorithm

ACC

any more in the

is that

while

the

buffer.

improvement

without

TOC scheme,

outperforms

1.0

C

is not available

result

the latter

I

0.8

Communality,

of c_ and c_ is 4 for -_RDA and 4 % 2pl for RDA

replacement

is used,

91000

I 1.0

C

for I in the case of RDA

expressions

_RDA

I 0.8

Communality,

when

152740

rt

/

RDA

when

the former

RDA

is

recovery, recovery

by a significant

margin.

5.3 Record Logging

In this section we look at recovery when logging of records is performed. The unit of transfer between main memory and secondary storage is still a page; however, only modified records are logged. The logged records are encapsulated into pages and then written to the log file. Some additional parameters of the system need to be introduced for the analysis of record logging: d denotes the number of update statements per transaction; r denotes the average length (in bytes) of a long log entry such as a data record; e denotes the average length of a short log entry such as a table entry; l_bc denotes the length of the BOT and EOT records; l_p denotes the length of a physical page; l_h denotes the length of a log chain header.

The values for the first five parameters are taken from [14]. These values are: d = 3 for high update frequency environments and d = 8 for low update frequency environments, r = 100, e = 10, l_bc = 16 and l_p = 2020. The value for l_h was set to 4. Assuming that each update statement causes one long log entry and that s > d, the average length of a log entry can be derived [14]:

$L = (dr + (s - d)e)/s$
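As a worked example of the average log entry length, the sketch below evaluates L = (dr + (s - d)e)/s. The function name is hypothetical. The values r = 100 and e = 10 are from the text; pairing d = 3 with s = 10 and d = 8 with s = 40 is an assumption based on the transaction sizes quoted for the two environments.

```python
def average_log_entry_length(d: int, r: float, e: float, s: int) -> float:
    """Average log entry length L in bytes.

    d update statements each cause one long entry of length r; the
    remaining s - d entries are short entries of length e.
    """
    if s <= d:
        raise ValueError("the derivation assumes s > d")
    return (d * r + (s - d) * e) / s


if __name__ == "__main__":
    # Assumed pairing of d with the transaction size s of each environment.
    print(average_log_entry_length(d=3, r=100, e=10, s=10))  # high update frequency
    print(average_log_entry_length(d=8, r=100, e=10, s=40))  # low update frequency
```

Under these assumed pairings L is 37 bytes in the first case and 28 bytes in the second, both far smaller than the 2020-byte physical page, which is why many log records fit in one log page.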

5.3.1 Algorithm of the Type ¬ATOMIC, STEAL, FORCE, TOC

With record logging, the locking granule can be less than a page. We assume that record locking is used in order to enhance concurrency. This implies that the total number of pages modified by a given set of P concurrent transactions is not the same as for the above algorithms, for which page locking was assumed. We will denote this number by $s_u$. An expression for $s_u$ is derived in the Appendix. We assume that group commit is used so that log records from different transactions can be grouped in the same page and written to the log. The value of K in the expression of $p_l$ is $s_u/2$. The derivations of the cost equations are similar to those in Section 5.2.1. We simply list the equations without detailed explanation.

$c_l = 3sp_u + 4 \times 2(2l_{bc} + sp_u(l_{bc} + L))/l_p$

$c_l' = (3 + 2p_l)sp_u + 4(2l_{bc} + sp_u(l_{bc} + L))/l_p + 4(2l_{bc} + sp_u(l_{bc} + L)p_l + (l_{bc} + l_h)(p_l - p_l^{sp_u}))/l_p$

$c_b = Pf_u(l_{bc} + sp_u(l_{bc} + L)/2)/l_p + 4(p_u s/2) + 4$

$c_b' = Pf_u(l_{bc} + sp_u(l_{bc} + L)p_l/2 + (l_{bc} + l_h)(p_l - p_l^{sp_u}))/l_p + (p_u s/2)(6p_l + 5(1 - p_l)) + 4$

$c_s = Pf_u(2l_{bc} + sp_u(l_{bc} + L))/l_p + 4Pf_u(p_u s/2)$

$c_s' = Pf_u(2l_{bc} + sp_u(l_{bc} + L)p_l + 2(l_{bc} + l_h)(p_l - p_l^{sp_u}))/l_p + (Pf_u p_u s/2)(4p_l + 5(1 - p_l))$

Figure 11 shows the throughput as a function of the communality in the buffer for the FORCE, TOC type of algorithms with and without RDA recovery for the case of record logging.

Figure 11: Results for ¬ATOMIC, STEAL, FORCE, TOC, in the case of record logging. [Plot: panels "High update frequency" and "High retrieval frequency"; vertical axis Throughput, horizontal axis Communality, C; curves for RDA and ¬RDA.]

5.3.2 Algorithm of the Type ¬ATOMIC, STEAL, ¬FORCE, ACC

The cost equations for this case can be derived using the results of Sections 5.2.2 and 5.3.1. The value of K in the expression for $p_l$ is $s_u p_s/2$.

$c_l = 4(2l_{bc} + sp_u(l_{bc} + 2L))/l_p$

$c_l' = 4(2l_{bc} + sp_u(l_{bc} + L(2 - p_s(1 - p_l))) + (l_{bc} + l_h)(p_l - p_l^{sp_u p_s}))/l_p$

cb = Pf.(ci/8) + 4p.(s/2)(1 - c) + 4 ctb =

Pfu(c_/8)

+ pu(s/2)((4

= c'.

+

=

(rd2)f.(c_/4

The equations be modified the buffer the page

before

can be replaced. We have

by the concurrently replacing

EOT.

+ 4spu) + Pf.(c_/4

for the extra The

executing

transactions expression

for c,_ and c u are obtained

of a stolen

where

5.2.2.

in logging

the proportion

- Cs),

I

as in Section

record

Let Pl denote Pi = s_/(B

+ p_,(s/2)(5ps(1

cost involved

modified

P with P - 1 in the

the equations

- C)(1 - p,) + 6pspl + 5p.(1

- Pt)) + 4

+ Pf.(c /4 +

for cc and c' are the same

to account

transactions.

+ 2pt)(1

- PI) + 4(1 - p,(1

The equations modified

page needs of replaced

records

of pages

fashion:

=

s(1 - C) + 4s(1 - C)(p,n

c'r

=

S(1--C)+4S(1--C)(prn+2pipt)


+ 2pi)

c_ need stolen

to

from

to the log before by uncommitted

in the

transaction,

for s_. This gives the following

c_

in pages

pages modified

as seen by an incoming

in a similar

for c_ and

to be written

s_ is the number

- pt))))

buffer

modified

s_ is obtained

equations

for cr and

by I

c_,

[Figure 12 plot: panels "High update frequency" and "High retrieval frequency"; vertical axis Throughput, horizontal axis Communality, C.]

Communality, Figure

Figure RDA

12: Results

12 shows recovery

Unlike

the

TOC

scheme

",FORCE, than

page

logging

for the

ACC

cost of logging

is about

range

Figure of the

environment

with

",FORCE,

ACC

in the scheme

in throughput

reduces

that

page

of RDA

I

i

0.4

0.6

I

accessed

buffer

of algorithms for both much

in typical

applications

better

by using

increases

the need

RDA

by each

transaction

and

cost

than

the

[15].

Also, for the

recovery

of logging in most

by RDA

(s) for the high

FORCE,

is higher

logging, non

the

stolen

cases.

For

in throughput

of work performed

achieved

without

environments.

with record

for logging

with the amount in throughput


with

and for C = 0.9, the increase

increase

C = 0.9.

to the

1.0

C

evaluation

performs

is high relatively

i

0.8

case of record logging.

This is the case because,

environment

recovery

the percent

of pages

I

0.2

in the

type

achieved

cost by eliminating

frequency

ACC,

ACC

of C encountered

of a stolen

13 shows

number

--,FORCE,

with page logging.

updates

14%. The benefit

-',FORCE,

communality

the increase

algorithm

recovery

STEAL,

of values

for the high update

transaction. function

the

and RDA

example,

case,

0.0

Communality,

for the

of the

algorithm,

for the same

pages

throughput

as a function

113100

C

for -',ATOMIC,

the

_A

385600 -

7"t

by each

recovery

update

as a

frequency

¬FORCE, ACC,

record logging

Figure 13: Benefit of RDA recovery as a function of the number of pages referenced by a transaction. [Plot: vertical axis % increase, horizontal axis Number of pages accessed, s.]

6 Conclusions

In this paper, we have presented a scheme that uses redundant disk arrays to achieve rapid recovery from media failures in database systems and simultaneously provide support for recovery from transaction aborts and system crashes. The redundancy present in the array is exploited to allow a large fraction of pages modified by active transactions to be written to disk and updated in place without the need for undo logging, thus reducing the number of recovery actions performed by the recovery component. The method uses a twin page scheme to store the parity information so that it can be efficiently used in transaction undo recovery. The extra storage used is about (100/N)% of the size of the database, N being the number of disks in the array.

We used a detailed analytical model to evaluate the benefit of our scheme in a system equipped with redundant disk arrays. We found that, in the case of page logging, a FORCE, TOC algorithm combined with RDA recovery significantly outperforms a FORCE, TOC algorithm without RDA recovery as well as ¬FORCE, ACC type of algorithms. In the case of record logging, we found that a ¬FORCE, ACC algorithm performs best and that the addition of RDA recovery to it improves its performance significantly, especially for transactions with a large number of updated pages.

Appendix: Derivation of the Formula for s_u

$s_u$ is the number of pages in the buffer updated by a set of P concurrent transactions. Let $S^{(k)}$ denote the number of pages in the buffer updated by k update transactions. Since there are $Pf_u$ update transactions concurrently executing in the system, we have $s_u = S^{(Pf_u)}$. If we number the update transactions executing in the system from 1 to $Pf_u$ in the order of their entry in the system, then when the kth update transaction enters the system, it will find $Csp_u$ of the $sp_u$ pages it needs to modify already in the buffer. We make the assumption that out of those $Csp_u$ pages, $Csp_u \times S^{(k-1)}/B$ belong to the k - 1 update transactions already in the system (see footnote 12). Hence, we have the following recurrence equation:

$S^{(k)} - S^{(k-1)} = sp_u(1 - CS^{(k-1)}/B)$

Using $S^{(1)} = sp_u$, we obtain

$s_u = S^{(Pf_u)} = (B/C)(1 - (1 - Csp_u/B)^{Pf_u})$.
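The recurrence can be checked by direct iteration. Written as S(k) = q S(k-1) + s p_u with q = 1 - C s p_u / B, and started from S(1) = s p_u, it unrolls to a geometric sum whose closed form carries a prefactor B/C. The sketch below compares the two; it is a minimal illustration, not the authors' code. The function names are hypothetical, and the number of update transactions is rounded to an integer (P f_u = 4.8 for P = 6 and f_u = 0.8, taken as 5 here) as an assumption.

```python
def pages_updated_iterative(B: float, C: float, s: float, p_u: float, n_update: int) -> float:
    """Iterate S(k) - S(k-1) = s*p_u*(1 - C*S(k-1)/B) starting from S(1) = s*p_u."""
    if n_update < 1:
        raise ValueError("need at least one update transaction")
    S = s * p_u  # S(1): the first update transaction shares pages with nobody
    for _ in range(2, n_update + 1):
        S += s * p_u * (1.0 - C * S / B)
    return S


def pages_updated_closed_form(B: float, C: float, s: float, p_u: float, n_update: int) -> float:
    """Closed form of the same recurrence; requires C > 0."""
    if C <= 0.0:
        raise ValueError("closed form requires C > 0")
    q = 1.0 - C * s * p_u / B
    return (B / C) * (1.0 - q ** n_update)


if __name__ == "__main__":
    # B = 300 and the high update values s = 10, p_u = 0.9 are as quoted for
    # the model; n_update = 5 is an assumed rounding of P * f_u = 4.8.
    args = dict(B=300, C=0.5, s=10, p_u=0.9, n_update=5)
    print(pages_updated_iterative(**args), pages_updated_closed_form(**args))
```

The result is always at most n_update * s * p_u, the page count that would apply if update transactions never shared pages.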

12 Update transactions can share pages because record logging is used instead of page logging.

References

[1] D. Bitton and J. Gray, "Disk shadowing," in Proceedings of the 14th International Conference on Very Large Data Bases, pp. 331-338, Sept. 1988.

[2] J. Gray, B. Horst, and M. Walker, "Parity striping of disk arrays: Low-cost reliable storage with acceptable throughput," in Proceedings of the 16th International Conference on Very Large Data Bases, pp. 148-161, Aug. 1990.

[3] D. Patterson, G. Gibson, and R. Katz, "A case for redundant arrays of inexpensive disks (RAID)," in Proceedings of the ACM SIGMOD Conference, pp. 109-116, June 1988.

[4] J. Gray, P. McJones, M. Blasgen, B. Lindsay, R. Lorie, T. Price, F. Putzolu, and I. Traiger, "The recovery manager of the System R database manager," ACM Computing Surveys, vol. 13, no. 2, pp. 223-242, 1981.

[5] J. Kent and H. Garcia-Molina, "Optimizing shadow recovery algorithms," IEEE Trans. Software Engineering, vol. 14, pp. 155-168, Feb. 1988.

[6] R. A. Lorie, "Physical integrity in a large segmented database," ACM Trans. Database Systems, vol. 2, pp. 91-104, Mar. 1977.

[7] T. Haerder and A. Reuter, "Principles of transaction-oriented database recovery," ACM Computing Surveys, vol. 15, pp. 287-317, Dec. 1983.

[8] M. Y. Kim, "Synchronized disk interleaving," IEEE Trans. Computers, vol. C-35, pp. 978-988, Nov. 1986.

[9] M. Livny, S. Khoshafian, and H. Boral, "Multi-disk management algorithms," in Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, pp. 69-77, May 1987.

[10] K. Salem and H. Garcia-Molina, "Disk striping," in Proceedings of the IEEE International Conference on Data Engineering, pp. 336-342, Feb. 1986.

[11] M. Stonebraker, R. Katz, D. Patterson, and J. Ousterhout, "The design of XPRS," in Proceedings of the 14th International Conference on Very Large Data Bases, pp. 318-330, Sept. 1988.

[12] K.-L. Wu and W. K. Fuchs, "Rapid transaction-undo recovery using twin-page storage management," in Proceedings of IEEE Compsac, pp. 295-300, Nov. 1990.

[13] A. Reuter, "A fast transaction-oriented logging scheme for UNDO recovery," IEEE Trans. Software Engineering, vol. SE-6, pp. 348-356, July 1980.

[14] A. Reuter, "Performance analysis of recovery techniques," ACM Transactions on Database Systems, vol. 9, pp. 526-559, Dec. 1984.

[15] W. Effelsberg and T. Haerder, "Principles of database buffer management," ACM Transactions on Database Systems, vol. 9, pp. 560-595, Dec. 1984.