SUBMITTED TO: EIGHTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING

Database Recovery Using Redundant Disk Arrays*

Antoine N. Mourad†, W. Kent Fuchs, Daniel G. Saab

Center for Reliable and High-Performance Computing
Coordinated Science Laboratory
University of Illinois
1101 W. Springfield Ave
Urbana, Illinois 61801

June 6, 1991
Abstract

Redundant disk arrays provide a way for achieving rapid recovery from media failures with a relatively low storage cost for large scale database systems requiring high availability. In this paper we propose a method for using redundant disk arrays to support rapid recovery from system crashes and transaction aborts in addition to their role in providing media failure recovery. A twin page scheme is used to store the parity information in the array so that the time for transaction commit processing is not degraded. Using an analytical model, we show that the proposed method achieves a significant increase in the throughput of database systems using redundant disk arrays by reducing the number of recovery operations needed to maintain the consistency of the database.
*This research was supported in part by the National Aeronautics and Space Administration (NASA) under Contract NAG 1-613 and in part by the Department of the Navy and managed by the Office of the Chief of Naval Research under Grant N00014-91-J-1283.
†Ph. (217) 244-7180, Fax (217) 244-5686, email: mourad@crhc.uiuc.edu
1 Introduction
In a database state
after
which
be
the
state.
in main
memory
due
type
by periodically
all updates
For large
media
subsystem recovery
design
In this recovery
the
from
that
significant 50
than
25 days 1. Mirrored
disk
mirroring
still
mean
incurs
Array
However,
be necessary
the
and
even
when
to protect
the
the
last
copy
have
to
of the
or RDAs
rapid
is prohibitive an alternative
are used,
operator
high.
permanent
to provide
which
apply
In such
is quite
[2, 3] provide
against
a media
is used
employed
overhead
mirroring
database
When
recovery
is
database
generated.
(MTTF)
organizations
of failure
to the
file
was
performed
type
log file.
for
been
storage
this
log
database
crash.
updates
and
disks
maintained
to the
of the
with
overhead
to their
modifications
a redo
to failure
disk
the
into
time
a 100%
(R.DA)
to deal
a transaction
tables
made
time
aborts
transaction
system
by logging
the
When
all updates
copy
after
time
be less
storage.
disks,
Disk
and
committed down
way
last
case
to a consistent
transaction
by the
at the
copies
the
are
to REDO
database
archive
typical
modified
and
database
over
we present
transaction
by using old version
1Assumin$
of the
database
initiated.
In this
common
the
errors
archiving or system
errors.
paper,
from
is achieved
may
One
between
Redundant
reliable
logging
copies
crash.
in the
failure.
user
pages
occurred
with
However,
applications.
redo
software
can [1].
for maintaining
keep
e.g.,
be
most
has to UNDO
crash
is reconstructed
causes
or can
is a system
reflected
by transactions
failure
systems,
for many
and
database
The
all database
the
transactions
performed
a media
not yet
for restoring
occur.
mechanism
when
archive
by committed the
recovery
is media
generating
occurs
storage
and
can
deadlocks,
of failure
active
of failure
of failures
errors,
type
were
be necessary
has to restore
The
transactions
Another
a case,
second
that
by complete
failure
manager
are lost.
may
types
to program
The
by transactions
recovery
Several
recovery
previous
performed
rapid
a failure.
can
aborts,
system,
a twin of the
a technique
and
system
page parity
that failures
scheme along
exploits in addition
for storing with
the
the new
an MTTF of 30,000 hours for each disk.
the
redundancy
in disk
to providing parity
version.
fast
information The
old
version
arrays
media making of the
to
support
recovery.
This
it possible parity
to
is used to undo updates performed by aborted transactions or by transactions interrupted by a system failure.

In Sections 2 and 3 we briefly review several techniques for transaction recovery in database systems and discuss two RDA organizations. In Section 4, we present our database recovery scheme. The results of our performance analysis are detailed in Section 5. Section 6 presents some conclusions.
2 Recovery Techniques

Recovery algorithms typically use some form of logging or shadowing. In the logging approach [4], before a new version (after-image) of a record or page is written to the database, a copy of the old version (before-image) is placed into a sequential log file. If a transaction aborts or the system crashes, the log file is analyzed and the state of the database is restored. In the shadowing approach the update of a page is placed into a new physical page on disk [5, 6]. The physical pages containing the old versions are released after all updates of the committing transaction have been written to disk. One problem with the shadowing approach is the disk scrambling effect which decreases the sequentiality of disk accesses. Another problem is dynamic mapping since it requires maintaining a very large page table which leads to high I/O overhead during normal processing.

In describing and in analyzing our method, we will use the following taxonomy of database recovery algorithms introduced by Haerder and Reuter [7]. They classify recovery algorithms with respect to the following four concepts:

Propagation² of updates. The propagation strategy can be ATOMIC, in which case any set of updated pages can be propagated to the database in one atomic action. In the ¬ATOMIC case, the propagation of updates to the database can be interrupted by a system crash and database pages are updated in place.

Page replacement. Two policies can be used: the STEAL policy allows pages modified by uncommitted transactions to be propagated to the database before end-of-transaction (EOT); the opposite policy is referred to as ¬STEAL. No UNDO recovery is necessary with a ¬STEAL policy.

EOT processing. Two categories exist: the FORCE discipline requires all pages modified by a transaction to be propagated before EOT; the opposite discipline is called ¬FORCE.

Checkpointing Schemes. Checkpointing is used to propagate updates to the database in order to minimize the number of REDO actions to be performed after a crash. In the Transaction Oriented Checkpointing (TOC) scheme, a checkpoint is generated at the end of each transaction. This is equivalent to using the FORCE discipline in EOT-processing. Two other types of checkpoints can be used: Transaction Consistent Checkpoints (TCC) are generated during quiescent periods where no transactions are being processed, and Action Consistent Checkpoints (ACC) are less restrictive and require that no update statements are processed during checkpoint generation.

² Propagation to the database means that the new version is visible to higher level software. Updates can be written to disk without being propagated (e.g., shadowing).

3 Redundant Disk Arrays

3.1 Data Striping
Striped disk arrays have been proposed and implemented for increasing the transfer bandwidth in high performance I/O subsystems [8, 9, 10]. In order to allow the use of a large number of disks in such arrays without compromising the reliability of the I/O subsystem, redundancy is sometimes included in the form of parity information [3, 10]. Patterson et al. [3] have presented several possible organizations for Redundant Arrays of Inexpensive Disks (RAID). One interesting organization is RAID with rotated parity in which blocks of data are interleaved across N disks while the parity of the N blocks is written on the (N + 1)st disk. The parity is rotated over the set of disks in order to avoid contention on the parity disk. Figure 1 shows the array organization with four disks. The organization allows both large (full stripe) concurrent accesses or small (individual disk) accesses. In this paper, we concentrate on small read/write accesses. For a small write access, the data block is read from the relevant disk and modified. To compute the new parity, the old parity has to be read, XORed with the new data and XORed with the old data. Then the new data and new parity can be written back to the corresponding disks. Stonebraker et al. [11] have advocated the use of
a RAID organization to provide high availability in database systems.

Figure 1: RAID with rotated parity on four disks.

Figure 2: Parity striping of disk arrays.

3.2 Parity Striping

Gray et al. [2] studied ways of using an architecture such as RAID in on-line transaction processing (OLTP) systems. They found that because of the nature of I/O requests in OLTP systems, namely a large number of small accesses, it is not convenient to have several disks servicing the same request. Hence, the organization shown in Figure 2 was proposed. It is referred to as parity striping. It consists of reserving an area for parity on each disk and writing data sequentially on each disk without interleaving. For a group of N + 1 disks, each disk is divided into N + 1 areas; one of these areas on each disk is reserved for parity and the other areas contain data. N data areas from N different disks are grouped together in a parity group and their parity is written on the parity area
that
is a collection
of redundant
disk
striping
or data
of the N + 1_t disk.
4
KDA-Based
In the remainder arrays.
The
rotated
parity).
Recovery of this paper,
organization
we consider
of the
In the case of data
arrays
an I/O subsystem
being
striping
either
we assume 5
parity that
a large
striping
striping
(RAID
with
unit is used in order
to
ensure that I/O assumptions:
requests
Communication
fixed size pages; A STEAL
4.1
will typically
Database
policy
ttDA-based
between
allowing
Description
recovery makes
performed
by aborted
performed
by an aborted
modified
before
the transaction
has been stolen from the buffer). page per parity
group
UNDO
If additional
logging.
back to the database to the clean the state parity
state
transition
groups
that
when
the number
of the parity
a page
page number updated
When
page
it commits Otherwise
a transaction
version
(using
within
that
are dealt
has been written of Haerder
is referred caused parity.
page
page
A parity
group
modified
and Reuter,
the page
transactions
data
without
and need to be written
A dirty dirty
back to the database
parity
group
commits. contains
Figure
goes back 3 shows
the numbers
to as the Dirty_Set.
of all
It also contains
the group to be in the dirty
state
and
Only log N bits need to be used to store
number.
can be written that
first.
it to become
that
the updated
and one bit for the parity
have been
A table in main memory
This table
unless
Only one modified
by uncommitted
be logged
caused
the group
a page,
In the following,
a page parity group.
the notation
must
group.
transaction
page.
the parity group is called clean.
before-images
of a parity
updates
using the parity
group is dirty when one of its data pages has
back to the database
page holding
by an active
EOT.
by itself to undo all updates
be undone
the same parity
A parity
and the modified
are in the dirty state.
of the data
is _A TOMIC;
recovery schemes.
the transaction
diagram
propagation
using
present in the disk arrays to undo updates
that cannot
pages in the parity group
then their
is performed
before
the parity is not sumcient
Updates
can be written
the number
the data
However,
clean or dirty.
modifying
that
to be propagated
we will use the term parity group to denote
by a transaction
subsystem
Approach
transaction.
can be in one of two states: been
pages
group is the set of pages that share
there is ambiguity,
and the I/O
use of the parity information
transactions.
disk. We also make the following
in place which implies
modified
of the
with using one of the traditional A page parity
by a single data
main memory
pages are updated
is used thus
General
be serviced
The
table
is used to check
back to disk without
can be written
back
UNDO
to the
whether
logging.
database
without
UNDO logging if its parity group is clean or if its parity group is dirty and the update is for the same page that caused the group to move into the dirty state, i.e., the same page has been updated, stolen from the buffer then rereferenced by the same transaction, updated and stolen again from the buffer before EOT³. Note that this does not affect the degree of concurrency or interfere with the locking policy used in the system. We do not specify when a transaction can or cannot modify a page. We only specify when a modified page can be written back to disk without UNDO logging.

Figure 3: State transition diagram of a page parity group. Transition labels: "Transaction T modifies page D_i and D_i is written back to the database before EOT"; "T rereferences D_i, modifies it and D_i is written back to the database before EOT"; "Transaction T commits".
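The write-back rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `DirtySet`, its methods, and the choice to store a transaction identifier next to the page number are hypothetical and are not taken from the paper, which only says that a main-memory table records the dirty parity groups and the page that made each group dirty.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DirtySet:
    """Hypothetical in-memory table of dirty parity groups.

    Maps a parity group number to (page number, transaction id) for the one
    page whose uncommitted update was written back without UNDO logging.
    Storing the transaction id is an assumption made for this sketch.
    """
    dirty: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def can_skip_undo_log(self, group: int, page: int, txn: int) -> bool:
        entry = self.dirty.get(group)
        if entry is None:
            return True                 # clean group: parity alone can undo the update
        return entry == (page, txn)     # same page, rewritten by the same transaction

    def write_back(self, group: int, page: int, txn: int) -> bool:
        """Record a write-back before EOT; return True if UNDO logging is required."""
        if self.can_skip_undo_log(group, page, txn):
            self.dirty[group] = (page, txn)   # group is (or stays) dirty
            return False
        return True                     # another page already made the group dirty

    def end_of_transaction(self, txn: int) -> None:
        """After commit (or after an abort has been undone) the groups become clean."""
        self.dirty = {g: e for g, e in self.dirty.items() if e[1] != txn}
```

In this sketch a `True` result from `write_back` means the caller has to log the before-image first; the table itself never restricts which pages a transaction may modify, in line with the remark above.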
If a single parity page is used, then when a group becomes dirty the old parity information has to be kept in the parity page to be able to recover in case of a transaction failure. That would mean that when the transaction commits, the new parity has to be recomputed in order to update the parity page. That would require reading all the data pages in the group in order to compute the new parity. To avoid that problem a twin page scheme is used for the parity pages. The basic mechanism of the twin page scheme is as follows: one of the parity pages always contains the valid parity of the group while the other page contains obsolete parity information. When a data page is modified in a parity group, the obsolete parity page (P for example) is updated with the new parity of the array. If the transaction performing the update commits then the modified parity page (P) becomes the valid parity page; otherwise the other parity page (P') remains the valid parity page and its contents are used to recover the data page that was modified by the failed transaction. Figures 4 and 5 show the data striping organization and the parity striping organization when the twin page scheme is used for the parity.

³ Normally such an event should not occur often since buffer management algorithms are not supposed to replace a page that will be referenced again in the near future.
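As a rough illustration of the twin page mechanism, the following Python sketch models one parity group held in memory. The names `ParityGroup`, `write_uncommitted`, `commit`, `abort` and `new_group` are assumptions introduced here, pages are plain byte strings, and the undo step uses the relation D_old = (P ⊕ P') ⊕ D_new that the paper gives for recovering a data page. It covers only the case of a single uncommitted page per group.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length pages."""
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class ParityGroup:
    data: List[bytes]      # data pages D_0 .. D_{N-1}
    twins: List[bytes]     # twin parity pages [P, P']
    valid: int = 0         # index of the twin holding the committed parity
    dirty: bool = False    # True while an uncommitted page is on disk

    def write_uncommitted(self, i: int, new_page: bytes) -> None:
        """Small write of page i by an active transaction."""
        working = 1 - self.valid
        # Start from the parity that matches what is on disk right now.
        current = self.twins[working] if self.dirty else self.twins[self.valid]
        # New parity = current parity XOR old data XOR new data.
        self.twins[working] = xor(xor(current, self.data[i]), new_page)
        self.data[i] = new_page
        self.dirty = True

    def commit(self) -> None:
        """The working twin becomes the valid parity page."""
        if self.dirty:
            self.valid = 1 - self.valid
            self.dirty = False

    def abort(self, i: int) -> None:
        """Undo page i using D_old = (P xor P') xor D_new."""
        if self.dirty:
            self.data[i] = xor(xor(self.twins[0], self.twins[1]), self.data[i])
            self.dirty = False


def new_group(pages: List[bytes]) -> ParityGroup:
    """Build a clean group; both twins start out equal to the true parity."""
    parity = reduce(xor, pages)
    return ParityGroup(data=list(pages), twins=[parity, parity])
```

Note that commit only switches which twin is considered valid, so no data page has to be reread at commit time, which is the point of keeping two parity pages.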
DO D3 D6 iii!!!iiiiiiiiiiiiiiiiii i iiiiii ii iii i i i iiii! i Figure
4: Data striping
organization
DO0
DIO
DO1
Dll
[°5 t
D7
Dll
D10 with the twin page
scheme
for the parity.
D20 D21
P30 1"31
Figure
twin page striping shows
scheme
organization
is used for the parity.
the contents
with the twin page scheme
Twin
of a parity
group including
of a data page after a transaction and
the new data page:
one of its data
pages
to disk,
UNDO
updated
since when
the
striping
parity
pages
case and Pzy and Pzy t, with z -- (x q- 1)mod(N
version pages
5: Parity
actual
uncommitted
parity data
must
stolen
the group is dirty
of the data page
to disk the corresponding
the twin parity it is sufficient
for Dj 4 then
it is necessary
on disk and an "old"
Di in case of a transaction parity
4The before-image of the page must be written to a log file.
page(s)
pages.
When
must
in the case of page
abort.
be updated
logging
8
both
page
striping
case. Figure
In order to recover
a parity
Ds needs
pages
that
would
In all cases,
parity because
to be written
P and P_ need
a current
parity
page
be used
when
writing
to be
reflecting
to recover a data
the page
first.
or of the modified
record(s)
6
the old
of both
group is dirty
page
parity
to maintain parity
Px t in the data
to XOR the contents
from the buffer and another
be performed
Px and
q- 2), in the parity
Dol d -- (P (_ p0) @ Dnew.
Di has been
logging
abort
are denoted
for the parity.
in the case of record
logging
Do
D1
DN-1
P
P_
oooooo.°.o_oo°
Figure 6: The contents
4.2
Twin
Page
The twin parity transaction contains
recovery
following
the highest
timestalnp
transaction
or system
and
a bit map
contains
the valid parity the timestamp 7 selects
parity page is written
can be maintMned groups
following
parity
algorithm
that destroys
parity
obsolete been
the current
back
page, parity
of the
idle periods twin parity
[12]. A parity
updated
it has aborted.
by an active Figure
pages
transaction,
8 shows the state
When
may
both parity
not survive
when it contains parity information.
is computed parity
diagram
pages,
page for each crash.
will have to be used to
states:
parity
a system
crash
to reconstruct
page
P is
is not available
a background
process
the bit map.
committed,
the last committed
obsolete, parity
working update.
It is in the working state
and it is in the invalid state if the last transaction transition
Algorithm
a system
parity page or the information
can be initiated
after a
In this case, two bits would have
to code the three possible
Following
with
page is updated
Then the parity
Current._Parity
the bit map.
The page
to 0.
a data
reading
pages
is undone
is reset
which is the current
can be in one of four states:
old committed
page
In order to avoid
algorithm
has to be used. of the system
an update
page.
such a bit map
page pi is the current
page is committed
when it contains
group
When parity
parity
indicating
the map,
which of the twin parity
for modification.
to disk.
However
in order to be able to perform
in the page header.
information.
page and to reconstruct
Current_Parity
that runs during
invalid
is stored
of the current
in main memory
in the database.
a crash
the current
the current
Each
In order to identify
a timestamap
to be used in the bit map for each parity
and
This is necessary
pages are read and one of them is selected
of the parity
identify
information,
failure,
disks.
a disk failure.
shown in Figure
the modified
Hence
are stored on different
the valid parity
both parity
group.
Management
pages
Current__Parity
of a page parity
of the twin parity
pages.
or It is
when it has updating
Current_Parity(pg)
begin
    Read twin parity pages in parity group pg;
    if Timestamp(P) > Timestamp(P') then
        Current_Parity ← P;
    else
        Current_Parity ← P';
end

Figure 7: Algorithm Current_Parity
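A minimal Python rendering of the selection in Figure 7 might look as follows. The `ParityPage` type and the `rebuild_bitmap` helper are hypothetical names introduced for illustration; the helper assumes that the main-memory bit map mentioned in the text holds one entry per parity group (0 for P, 1 for P') and can be rebuilt by comparing the timestamps stored in the page headers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ParityPage:
    timestamp: int   # timestamp stored in the page header
    payload: bytes   # parity contents


def current_parity(p: ParityPage, p_prime: ParityPage) -> ParityPage:
    """Select the twin with the higher timestamp, as in Figure 7."""
    return p if p.timestamp > p_prime.timestamp else p_prime


def rebuild_bitmap(groups: List[Tuple[ParityPage, ParityPage]]) -> List[int]:
    """Hypothetical helper: one entry per group, 0 if P is current, 1 if P' is."""
    return [0 if current_parity(p, q) is p else 1 for p, q in groups]
```

As in the figure, a tie in timestamps resolves to P'.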
C: committed; Figure
4.3
Recovery
Following pages
from
a system
have
been
to disk and
Modified
database
their before-images performed
System
crash we need modified
needs to be written pages
8: State
O: obsolete;
transition
to identify
on disk by those
parity
page.
W: working
of the twin parity
which transactions transactions.
be written
pages for which UNDO
can be recovered
I: invalid;
diagram
the transaction
record must
from the log.
the current
pages.
Failure
to a log file after an EOT
determines
Modified
begins
have to be backed
A Begin-Of-Transaction and before
it writes
out and which (BOT)
back any modified
to the log file when the transaction
logging
has been performed,
database
using the parity pages. 10
pages
can be recovered
for which UNDO
However information
record
logging
commits. by reading
has not been
on which pages
have been
written this
to the
problem,
a twin the
same
scheme
modified
database
by the
has
EOT
to
solely
with
for the
using
RDA
redundant recovery.
We
scheme RDA
arrays
we examine
restrict both
".ATOMIC,
the
arrays
be
storage.
logging
headers
modified
pages
log chain.
is performed
that
solution
link
together
written
The
head
and
The
back
to the
of the
chain
to maintain
the
system
performance.
significantly
solve
In TWIST,
is encountered.
operations
To
employed.
page
of the
I/O
the
which
the extra
the amount
ourselves FORCE
record
FORCE,
the
RDAs
log
chain
cost
in using
involved
of the initial
propagation
analysis
".FORCE only
logging
of such strategies
a TOC
data and and
11
information
We
at
case
recovery
storage
of
recovery. and
with two
recovery
is that
benefit
algorithms
we examine RDA
evaluate
of RDAs
the
in combination
by adding RDA
use
media
recovery
algorithms in each
look of rapid
RDA
different
to them.
of the
twin
As page
cost.
hence
is appropriate
a STEAL
policy Within
for EOT-processing.
checkpointing
parity the
traditional
algorithms.
to
advocate
for the purpose
and
model
of maintaining
recovery.
using
achieved
analytical
we do not
recovery
logging
an
cost
crash
improvement
-"ATOMIC
and
same
of UNDO
to the
and
needs
the
develop
high,
of systems
is (100/N)%
implies
STEAL,
already
and
evaluate
Since
is relatively
throughput
page
we
transaction
both
reduces which
We therefore
only
do not affect
algorithms.
with
and
parity
update-in-place
in the
part id.
R.DA-recovery,
to systems
consider
recovery
and
that
the
is concerned,
for the
requests
of supporting in a system
algorithms
far as storage
transaction
of
disk
comparing
disk
a crash
In our case,
the
for different
purpose
by
benefit
redundant
recovery
do this
recovery
the
throughput
in a system
I/O
after
stored
will be
[13] can
no before-image
to undo
logging
in permanent
Analysis
evaluate
transaction
type
regular
in TWIST
pages,
transaction.
with
to be saved
of pointers
UNDO
along
Performance
In order
active
used
pages
consists
without
behind
one
has
all database
which
which
to be logged
be hidden
to store
same
logging
to the
of identifying
before
though
5
is used
problem
UNDO
similar
use of a log chain
pages
can
without
a technique
page
makes
We
database
policy
makes
for systems
for page this
class
replacement. of algorithms
For algorithms sense.
using
of the
For algorithms
of the type
-_ATOMIC,
algorithms
using
STEAL,
ACC
-,FORCE,
checkpointing
Hence we only look at the former We use the mance
same basic
of several
therefore
all cleanup
that
they
Transactions
required
accesses
transaction
is pu. To characterize
denotes
buffer.
The
the
large
cost of recovery between
cost of executing
period I/O
so that
between
5Also TCC
is used
be used however
the
the behavior a page
TCC type 5 [14].
process
requested
has referenced
by the transaction
memory
crash
during
crashes.
executing
concurrently
of update pages that buffer,
by B.
that
instead
is f_.
are modified
transaction
by an update
is present
the page
Each
communality
It is assumed
required
an availability
interval.
then the length
We
in the system.
transactions
by ca and is measured
transaction
all cost measures
the availability
operation.
periods.
we use the
a page,
and
This implies
shutdown
The
Since
bound
that
C
in the
the
buffer
will remain
in the
6.
is denoted
by ct.
a given
or during
and the disk subsystem
is denoted
is I/O
shutdown.
by an incoming
is denoted
of the perfor-
for in the cost calculations
of the database
in the buffer
the system
to perform
are accounted
The fraction
once a transaction
that
using
with no periodic
of accessed
processed
we assume
If checkpointing
algorithm
after a system
two system
operations,
continuously
pages.
needed
could
in his evaluation
that
required
The fraction
a transaction
of transactions
requests
of a set of P transactions
frames
main
by Reuter
update or retrieval.
that
of page
it is no longer
transfers
number
probability
number
is sufficiently until
s database
those
[14]. We assume
of I/O
consists
are of two types:
which
to outperform
by some background
transaction
page
techniques
by the
considered
TCC checkpoints
one introduced
is running
are performed
The workload
The
as the
recovery
the system
activities
of assuming
buffer
were shown
we look only at the number
also assume
ACCor
type of checkpointing.
model
database
both
interval
checkpointing contradicts our assumption
to perform
throughput
are evaluated
interval
in terms
in units
interval
recovery.
rt is defined
An availability
is measured
of a checkpointing
by the number
of page
is denoted
of The
as the T is the
of number transfers
of T.
by I and is also
of a continuously running system since it requires the establishment of a quiescent point where no update transactions are present in the system.
⁶ The page could still be replaced before the transaction commits if a STEAL policy is used; however, if it is replaced it will not be rereferenced by the transaction.
⁷ Mathematically, T can be defined as follows: T = (length of availability interval in seconds) / (time to transfer a page to/from disk in seconds).
measured in units of page transfers. The cost of generating a checkpoint is denoted by c_c. Assuming that the crash occurs in the middle of a checkpointing interval, the number of page transfers available for processing transactions in an availability interval is T - c_s - c_c((T - c_s - I/2)/I). Hence the throughput is given by:

$$ r_t = \frac{1}{c_t}\left[T - c_s - c_c\,\frac{T - c_s - I/2}{I}\right]. $$

We assume that c_c is independent of I. Hence the optimal checkpointing interval can be easily derived from the following equation [14]:

$$ \frac{dr_t}{dI} = \frac{1}{c_t}\left[\frac{dc_s}{dI}\left(\frac{c_c}{I} - 1\right) + (T - c_s)\,\frac{c_c}{I^2}\right] = 0. \qquad (1) $$

Let c_r denote the cost of executing a retrieval transaction and c_u that of an update transaction. Then c_t can be expressed as follows:

$$ c_t = (1 - f_u)\,c_r + f_u\,c_u. $$

c_r itself includes two components: the cost of reading pages that are not found in the database buffer and the cost of writing back the replaced pages if they have been modified. Hence:

$$ c_r = s(1 - C) + a\,s(1 - C)\,p_m, \qquad (2) $$

where p_m denotes the probability that the replaced page was modified and a denotes the number of page transfers necessary to perform one write to the disk array. a is equal to 3 or 4 depending on whether or not the old data page is in the buffer at the time of writing the new data. For c_u, we have two additional components which represent the cost of logging the transaction (c_l) and the cost of backing out the transaction (c_b) in the case where an abort occurs. Hence:

$$ c_u = s(1 - C) + a\,s(1 - C)\,p_m + c_l + p_b\,c_b, \qquad (3) $$

where p_b denotes the probability of an abort.

5.1 Evaluation of the Probability of Logging

We consider a set of K pages that have been modified by active transactions and we compute the expected value of the size of the subset of pages that can be written back to the database without UNDO logging. N is the number of data pages in a parity group and S is the total number of pages of data in the database. We assume that the K pages are randomly chosen from the S pages in the database. Note that, by using data striping (RAID) with a large striping unit or parity striping, any sequentiality in database accesses will act in favor of our scheme by distributing the pages accessed over distinct parity groups.

The parity groups are numbered from 1 to S/N. Let X_i, 1 ≤ i ≤ S/N, be the random variable whose value is 1 if one of the K pages is a member of parity group i, and 0 otherwise. Let X be the random variable denoting the number of parity groups that contain at least one of the K pages. X is also the number of pages that can be directly written back to the database since one page per parity group can be written back. We have:

$$ X = \sum_{i=1}^{S/N} X_i. $$

Since the K pages are assumed to be randomly chosen, each parity group has the same probability of being accessed by those K page references. Hence the X_i's are identically distributed. Therefore, the expected value of X is

$$ E[X] = \sum_{i=1}^{S/N} E[X_i] = \frac{S}{N}\,E[X_1]. $$

Since X_1 is a Bernoulli random variable, E[X_1] = Pr(X_1 = 1) and E[X] = (S/N)(1 - Pr(X_1 = 0)), which can be written:

$$ E[X] = \frac{S}{N}\left[1 - \frac{\binom{S-N}{K}}{\binom{S}{K}}\right]. $$

Hence if K modified, "uncommitted" pages are to be written to the database, the probability of having to log one of those pages is given by:

$$ p_l = 1 - \frac{E[X]}{K} = 1 - \frac{S}{KN}\left[1 - \frac{\binom{S-N}{K}}{\binom{S}{K}}\right]. \qquad (4) $$

5.2 Page Logging

5.2.1 Algorithm of the Type ¬ATOMIC, STEAL, FORCE, TOC

With the FORCE discipline, the checkpoint is taken at the end of each transaction. The cost of checkpointing is therefore accounted for in the cost of logging. In the model, we set c_c = 0. Given our assumption that pages are not rereferenced by the transaction after they have been replaced in the buffer, the cost of writing and logging a page will be the same whether the page is stolen from the buffer before transaction commit or whether it stays in the buffer until EOT and is then logged and written to the database. Hence we will account for all the costs involved in logging the pages and writing them back to the database as part of the cost of logging. This allows us to set p_m = 0 in the expressions for c_r and c_u. The expression for c_l is:

$$ c_l = 3\,s\,p_u + 4\,(2\,s\,p_u) + 4 \times 4. $$

The first term is the cost of writing I/O
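To make the logging probability of Section 5.1 concrete, the sketch below evaluates E[X] and p_l exactly with rational arithmetic. It assumes the hypergeometric form Pr(X_1 = 0) = C(S-N, K)/C(S, K), that is, K distinct pages drawn at random from S, and that N divides S. The function names are illustrative, and the values S = 5000, N = 10 in the usage note are the parameter values quoted in the evaluation; K is chosen arbitrarily.

```python
from fractions import Fraction
from math import comb


def expected_unlogged_pages(S: int, N: int, K: int) -> Fraction:
    """E[X]: expected number of parity groups touched by K random pages.

    Assumes N divides S and that the K pages are distinct and chosen
    uniformly at random, so Pr(X_1 = 0) = C(S-N, K) / C(S, K).
    """
    groups = Fraction(S, N)
    p_untouched = Fraction(comb(S - N, K), comb(S, K))
    return groups * (1 - p_untouched)


def logging_probability(S: int, N: int, K: int) -> Fraction:
    """p_l = 1 - E[X]/K, the probability that a written page needs UNDO logging."""
    return 1 - expected_unlogged_pages(S, N, K) / K
```

For example, `float(logging_probability(5000, 10, 50))` gives the fraction of 50 randomly chosen uncommitted pages that would still need a before-image in the log. With K = 1 the probability is 0, since a single page can always rely on the parity, and with K = S it reaches 1 - 1/N.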
costs
three
until
EOT for the purpose of UNDO
and REDO
operations
log files.
a system
software
separately
which
makes
writes
logging.
information
reading
of having
more
is needed than
back a page to the database
through
writing
modified
by concurrent
in which
K is replaced
by incomplete before
their own modified update
aborted
with RDA
pages.
Therefore
transactions.
with s Psfup_,/2.
is kept
where
transactions
EOT
records
recovery
the other
The
buffer
to the UNDO error
or
log files are stored
less costly.
The
last term
to each of the log files. on the number
We assume
that
concurrent
transactions
of logging
are halfway
number
is given
the formula
K of
when a transaction
K is equal to half the total
RDA recovery,
in the
an operator
is dependent
Hence the probability With
case
disk array.
transactions.
committing,
to the disk array
term is the cost of writing
only in the
and
Each write
the old data
one disk in the
BOT
to log a page
database.
discipline,
The second
the log to backout
back to the database
to the
the FORCE
of cl is the cost of writing
probability
pages written
REDO
with
error damages
in the expression The
since,
the pages back
of pages
by Equation
for the cost of logging
becomes: c_ = (3 + 2pl)sp,_ + 4(spu + spupt + 4) + 4(pt - p_m,) The
major
group
difference
is dirty,
when writing expression written
i.e., with
s Page logging are disjoint.
cz is that
probability
to a dirty parity
of c[ denotes
along
9We assume
with
with implies that
UNDO Pt. The
group
both
the cost of writing
the BOT
record
and data
parity
to be performed
pages need to be updated
are not
page
to the log.
except
when
the sets of pages
modified
mixed
15
in the
only
to 3 to accounts
the log chain header
and hence
pages
has
term 2pt is added
in the same
the use of page locking
log file pages
logging
same
9. The
page
by concurrent groups.
the parity
for the
fact that
last term
The header
the first
parity
when
is normally
written update
in the
by the
transactions
4
transaction
to the
database
has
to be logged
and
not
all pages
updated
by the transaction
have
to
be logged. To evaluate that
the
UNDO
cb we assume
other log
concurrent
that
a transaction
update
has to be read
aborts
transactions
up to the
BOT
have
record
first
is the
is the
number
to and the
term
the
term
records
database
account
to undo
for the
that
writing
logged
The
modifications
their
its
modified
pages
and
pages.
The
second
term
transaction.
+a to be read
third
from
the log.
The
is the
number
of page
by the
aborting
term
performed
of a rollback
of processing
half
aborting
have
to be read. the
middle
+ PA +
of before-images
of BOT/EOT
from
last
number
also
of the
Cb= (P.42)(PA) The
in the
record.
With
KDA
transfers
transaction
recovery
the
above
and formula
becomes:
In
the
second
c_ = (p_pzs/2)PA + (pl- p_P")PA + PA
+ (p.8/2)(6pl+ 5(I - pt))+ 4
first
term
to
term
is the expected
difference
is in the
logged,
number
read
hand, both
database
page
by resetting
page
the
timestamp
of log chain
might been
in its
old
before-images
It is due
has
pages
with
the
term.
operations
if the parity
of logged number
fourth
up to six I/O
the other to
the
data
to the
fact
parity
group
and
modify
in its header.
since
to the
the
Hence
read
is now
to be read
that,
be necessary written
and
headers
be
state
the
when
recovering
its parity
group
database the
from
without
multiplied log.
may
being page
and
of the
parity
page
from
will
still
logged,
data
operations
The
other
a page
"new"
five I/O
by Pt.
that
The major
has
be dirty
been
1°. On
it is necessary
then
overwrite
wor]dng
be necessary
to
the invalid
in the latter
case.
After
a system
contains
the cost
at the time
1°In recovery
this
crash, of reading
of the crash
instance
in order
to
only
and
and
in other
keep
things
UNDO
recovery
UNDO
log file up to the
the then
overwriting
instances simple.
in the This
will
needs
to be BOT
the modifications.
evaluation, lead
to
we use
a conservative
16
an
performed. record The
upper estimate
Hence
the
of the oldest work
bound of
of the
for the
the benefit
formula
for
transaction oldest
costs of
alive
transaction
involved our
cs
method.
in
RDA
[Figure 9 plots: two panels, "High update frequency" and "High retrieval frequency", each showing throughput r_t against communality C from 0.0 to 1.0; one curve is labeled RDA.]
Communality, Figure
alive overlapped
for _ATOMIC,
of some committed
transactions
is an upper
STEAL,
0.2
0.4
FORCE,
transactions
need to be read.
c, = Pfu(spu
S/N
I
Hence
TOC
therefore
the log records
the expressions
for c_ and
for half
c_ are:
+ 2) + 4(Pfup_,s/2)
+ 2(pt - p_P") + 2) + Pfu(p_,s/2)(4pl bound
I
Communality,
9: Results
c_, = ef_,(spupl
I
C
with the work
the work of about 2Pfu
The term
for the cost of reconstructing
+ 5(1 - Pt)) + S/N the bit map
for the current
parity
page. We evaluate
the algorithms
transactions.
Figure
high update
frequency
in throughput
9 shows
the throughput
and in a system
using RDA
ment.
For the latter
values
for the different
are:
in two different
recovery
environment
is much
more
of the model,
frequency.
significant
= 0.8 and
p,
except
---- 0.01
= 0.9 while
s = 40, f_, = 0.1 and p_, = 0.3.
on the frequency
of the communality As expected
in throughput
for N, were taken
and T for the
=
5.10 6.
with
the improvement frequency
is about
environ-
42%.
from [14]. These
For the high update
high retrieval
of update
C in a system
in the high update
and for C = 0.9 the increase
parameters
s = 10, f,
depending
as a function
with high retrieval
B = 300, S = 5000, N = 10, P = 6, Pb
environment,
environments
frequency
All the values
frequency
environment,
5.2.2
Algorithm
In this case,
of the
at EOT,
modified
pages
replaced.
REDO
to reduce
the amount
First
are
before-
not written
recovery
references
to a page
referenced
when
distribution
during
during
it is in the buffer
buffer
it is fup,,,
after
ACC
pages
They
a system
are
written
remain crash
in the
to the log but buffer
until
the
they
and ACCcheckpointing
are
is used
crash recovery.
transactions
to compute database
we can
and with probability
of references
with parameter
references
database.
its life in the
by successive
",FORCE,
of modified
Pr_. To do so, we need
to the page
C which
while it is in the buffer is 1/(1 -C). that
to the
has to be performed
a page
Hence the number
STEAL,
after-images
back
of REDO
reference
-_ATOMIC,
and
we need to evaluate
successively
buffer.
Type
implies
that
buffer.
see that
of transactions
If we look
with
during the
of a replaced
number
page
being
follows
modified
modified
of
page
is
it is not in the a geometric
of references
of a page being
that
stream
C the
when
its life in the buffer
average
at the
probability
1 - C it is referenced
Since the probability
the probability
the number
to the
page
by a transaction
during
its life in the
is$^{11}$:
$$p_m = 1 - (1 - f_u p_u)^{1/(1-C)}$$
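As a rough numerical illustration of the expression for $p_m$, the sketch below evaluates $1 - (1 - f_u p_u)^{1/(1-C)}$ for a few communality values. The function name is hypothetical, and the inputs $f_u = 0.8$, $p_u = 0.9$ are assumed to be the high update frequency values quoted in this section; treat them as illustrative rather than as the authors' exact configuration.

```python
def prob_replaced_page_modified(f_u, p_u, C):
    """Probability that a page is modified during its life in the buffer.

    Uses p_m = 1 - (1 - f_u * p_u) ** (1 / (1 - C)), where 1 / (1 - C) is the
    average number of references to a page while it stays in the buffer.
    """
    if not 0.0 <= C < 1.0:
        raise ValueError("communality C must satisfy 0 <= C < 1")
    refs_in_buffer = 1.0 / (1.0 - C)  # average references per buffer residency
    return 1.0 - (1.0 - f_u * p_u) ** refs_in_buffer


# Illustrative inputs (assumed): f_u = 0.8, p_u = 0.9.
for C in (0.0, 0.5, 0.9):
    print(C, prob_replaced_page_modified(0.8, 0.9, C))
```

With these inputs $p_m$ rises quickly towards 1 as $C$ grows, since a page that stays longer in the buffer is referenced more often before it is replaced.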
The cost of logging the BOT/EOT
is simply
records
the cost of writing
before-
their The
RDA
recovery,
before-images number
probability
pages
Hence the formula
that
logged.
of references that
any
of modified
pages
and
to the log: ct -- 4(2spu
With
and after-images
have
Therefore that
could
one of those
been stolen
÷ 2).
from the buffer
before
EOT
do not have
we need to evaluate
the probability
cause
to be stolen
is (1 - C)s(P
replacement
of the
references
a given
page
causes
the
Ps for a page being
for Ps is:
$$p_s = 1 - \left(1 - \frac{1}{B - Cs}\right)^{(1-C)s(P-1)}$$
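The expression for $p_s$ combines two quantities given in the text: the number of references that could cause a steal, $(1-C)s(P-1)$, and the probability that one such reference replaces a given page, $1/(B - Cs)$. The following sketch evaluates it; the function name is hypothetical, and $B = 300$, $P = 6$, $s = 10$ are assumed from the parameter values quoted in this section.

```python
def prob_page_stolen(B, C, s, P):
    """Probability that a modified page is stolen from the buffer before EOT.

    p_s = 1 - (1 - 1 / (B - C*s)) ** ((1 - C) * s * (P - 1))
    """
    free_frames = B - C * s
    if free_frames < 1.0:
        raise ValueError("B - C*s must be at least 1")
    stealing_refs = (1.0 - C) * s * (P - 1)   # references that may replace a page
    per_ref = 1.0 / free_frames               # chance one reference hits a given page
    return 1.0 - (1.0 - per_ref) ** stealing_refs


# Illustrative inputs (assumed): B = 300 buffer frames, P = 6, s = 10.
for C in (0.0, 0.5, 0.9):
    print(C, prob_page_stolen(300, C, 10, 6))
```

In this sketch $p_s$ shrinks as $C$ approaches 1, because fewer references miss the buffer and trigger a replacement.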
11 The same equation for $p_m$ was derived in [14] using a slightly different argument.
page
to have stolen.
- 1) and is 1/(B
-
the Cs).
In the formula for $p_l$, be logged
with
the value of K is Psf_p,,ps/2.
probability
p,(1
-Pz).
file contains transaction
both
before-
is found.
are still in the buffer.
and after-images
one difference
RDA
recovery,
difference
backout
and the expression
a checkpoint
for -_RDA and for RDA
after
All transactions of transactions
-1
executed
is given
by:
+ 2), + 2).
we assume
that
a crash
since the last checkpoint during
a checkpoint
+ 4spu) + P fu(ct/4 record
which
with the RDA recovery
+ 4sp,,) + PA(c[/4
value of the optimal
Equation
to the EOT
from a crash
c; = (r'd2)A(c_/4 The
- C) + 4
occurs have
interval,
in the middle
to be redone.
rc is given
of a
Let rc
by rc = I/ct
for cs is:
term corresponds
cost of recovery
a crash,
executed
Cs = (rc/2)ft,(ct/4 The
pages to be undone
spup'] )+pu(s/2)((4+2pt)(1-C)(1-ps)+6pspt+5ps(1-p/))+4
the cost of recovery
number
the
is that the log
becomes:
c'c = (4 + 2pt)(Bpm
denote
scheme
C the modified
+ Pf_, + 4p,,(s/2)(1
cc = 4(npm
interval.
with the FORCE
is that with probability
the cost of transaction
The cost of performing
checkpoint
is:
Hence:
eL = 2x (pt, s/2)(Pfu)+Pfu+Pfu(pt-p[
To evaluate
with RDA recovery
page will not
which will be read until the BOT record of the aborting
cb = 2 X (p,,s/2)(Pf,,)
With
of a modified
- p.(1 - pO)+ 2)+ 4(p -
out a transaction
Another
before-image
Hence the cost of logging
= 4( p. + For the cost of backing
The
checkpointing
+ (s/2)p_,(4(1 interval
+ 4(s/2)pu
is accounted technique
-
1)
for in ct/4
but
- p,) + 4pspt + 5ps(1 - pt)) -
I is obtained
- Pf_,(ct
+ 4(s/2)p,,) 19
- PA)/(f_(ct
The
is:
by plugging
1) + S/N.
the expression
1. This yields:
I = (2ctcc(T
is not read.
+ 48p_))) 1/2.
for c_ in
[Figure 10: Results for ¬ATOMIC, STEAL, ¬FORCE, ACC. Panels: high update frequency and high retrieval frequency. Axes: throughput versus communality, C. Curves: RDA and ¬RDA.]
Figure The formula
0.0
Figure
takes
10 shows
10: Results
place
the
for -ATOMIC,
STEAL,
recovery is derived
significant
in this case.
the
-,FORCE,
ACC
the situation
the old version
results
not
type
for both
However
algorithm
is reversed
1
in a similar
of the data
environments.
the
interesting
outperforms
and
the
_FORCE,
because
algorithms
in which
fashion.
The
value of a in the
with the -,FORCE
discipline,
It can be seen
that
FORCE,
algorithm
ACC
any more in the
is that
while
the
buffer.
improvement
without
TOC scheme,
outperforms
is not available
result
the latter
of c_ and c_ is 4 for -_RDA and 4 % 2pl for RDA
replacement
is used,
for I in the case of RDA
expressions
when
RDA
when
the former
RDA
is
recovery, recovery
by a significant
margin.
5.3 Record Logging

In this section we look at recovery when logging is performed at the record level, so that only modified records are logged. The unit of transfer between main memory and secondary storage is still a page; however, logged records are encapsulated into pages and then written to the log file. Some additional parameters of the system need to be introduced for the analysis of record logging: $d$ denotes the number of update statements per transaction; $r$ denotes the average length (in bytes) of a long log entry such as a data record; $e$ denotes the average length of a short log entry such as a table entry; $l_{bc}$ denotes the length of the BOT and EOT records; $l_p$ denotes the length of a physical page; $l_h$ denotes the length of a log chain header.

The values for the first five parameters are taken from [14]. These values are: $d = 3$ for low update frequency environments and $d = 8$ for high update frequency environments, $r = 100$, $e = 10$, $l_{bc} = 16$ and $l_p = 2020$. The value for $l_h$ was set to 4. Assuming that each update statement causes one long log entry and that $s > d$, the average length of a log entry can be derived [14]:

$$L = (dr + (s - d)e)/s.$$
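To make the formula for $L$ concrete, here is a small sketch that evaluates $L = (dr + (s-d)e)/s$. The function name is hypothetical. The inputs $r = 100$ and $e = 10$ are the values quoted above; pairing $d = 8$ with $s = 10$ and $d = 3$ with $s = 40$ is an assumption made only for illustration.

```python
def avg_log_entry_length(d, r, e, s):
    """Average log entry length L = (d*r + (s - d)*e) / s.

    d long entries of length r and (s - d) short entries of length e are
    averaged over the s pages accessed by a transaction; requires s > d.
    """
    if s <= d:
        raise ValueError("the formula assumes s > d")
    return (d * r + (s - d) * e) / s


# Illustrative pairings (assumed): (d=8, s=10) and (d=3, s=40).
print(avg_log_entry_length(8, 100, 10, 10))   # 82.0
print(avg_log_entry_length(3, 100, 10, 40))   # 16.75
```

The two results differ by almost a factor of five, which shows how strongly the share of long entries ($d/s$) drives the average record size written to the log.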
5.3.1 Algorithm of the Type ¬ATOMIC, STEAL, FORCE, TOC

With record logging, the locking granule can be less than a page. We assume that record locking is used in order to enhance concurrency. This implies that the total number of pages modified by a given set of P concurrent transactions is not the same as for the above algorithms, for which page locking was assumed. We will denote this number by $s_u$. An expression for $s_u$ is derived in the Appendix. We assume that group commit is used so that log records from different transactions can be grouped in the same page and written to the log. The value of K in the expression of $p_l$ is $s_u/2$. The derivations of the cost equations are similar to those in Section 5.2.1. We simply list the equations without detailed explanation.
el
-_
3sp_, + 4 x 2(21bc 4" 8pu(Ibc 4- L))/Ip
c_
=
(3 + 2pt)_p. + 4(21b_+ _p.(tb¢+ L))/l_ + 4(21b_+ _p.(t,_ + L)p_+ (Ib_+ lh)(p_-- p;"°))/l,
Cb
"-
P ft,(Ibc + 8pu(Ibc -4"L)/2)/lp
C'b = C$
--"
!
Figure recovery
5.3.2
The
Pfu(Ibc
+ spu(Ibc -4-L)pl/2
+ 4(p_,s/2)
+ 4
+ (Ibc + la)(Pt -- p_1'"))/Ij, + (p_,s/2)(6pt
+ 5(1 -- Pl)) + 4
P/.(21b_ + _p.(Ib_+ L))/l_ + 4P f.(p.,/2) Pf.(2tb¢ + 8p.(Ib_+ L)pt + 2(tbo+ lh)(Pt -- pF'))/Z, + (Pf.p.,/2)(4p_ + 5(I - p_)) 11 shows
the throughput
as a function
Algorithm
cost equations
for the FORCE,
of the communality
of the
Type
in the buffer
-.ATOMIC,
for this case can be derived
value of K in the expression
TOC type of algorithms
STEAL,
using
for $p_l$ is $s_u p_s/2$.
with
for the case of record
-"FORCE,
the results
and without
RDA
logging.
ACC
of Sections
5.2.2
and
5.3.1.
The
[Figure 11: Results for ¬ATOMIC, STEAL, FORCE, TOC, in the case of record logging. Panels: high retrieval frequency and high update frequency. Axes: throughput versus communality, C. Curves: RDA and ¬RDA.]
=
4(2/bc + spu(lbe + 2L))/lp
c_
=
4(2/bc + sp_,(lbc + L(2 - p.(1 - Pt))) + (lbe + lh)(pt -- p[SP"P']))/Iv
TOC, in the case of record
el
I
0.8
logging.
cb = Pf.(ci/8) + 4p.(s/2)(1 - c) + 4 ctb =
Pfu(c_/8)
+ pu(s/2)((4
= c'.
+
=
(rd2)f.(c_/4
The equations be modified the buffer the page
before
can be replaced. We have
by the concurrently replacing
EOT.
+ 4spu) + Pf.(c_/4
for the extra The
executing
transactions expression
for c,_ and c u are obtained
of a stolen
where
5.2.2.
in logging
the proportion
- Cs),
I
as in Section
record
Let Pl denote Pi = s_/(B
+ p_,(s/2)(5ps(1
cost involved
modified
P with P - 1 in the
the equations
- C)(1 - p,) + 6pspl + 5p.(1
- Pt)) + 4
+ Pf.(c /4 +
for cc and c' are the same
to account
transactions.
+ 2pt)(1
- PI) + 4(1 - p,(1
The equations modified
page needs of replaced
records
of pages
fashion:
=
s(1 - C) + 4s(1 - C)(p,n
c'r
=
S(1--C)+4S(1--C)(prn+2pipt)
+ 2pi)
c_ need stolen
to
from
to the log before by uncommitted
in the
transaction,
for s_. This gives the following
c_
in pages
pages modified
as seen by an incoming
in a similar
for c_ and
to be written
s_ is the number
- pt))))
buffer
modified
s_ is obtained
equations
for cr and
by I
c_,
[Figure 12: Results for ¬ATOMIC, STEAL, ¬FORCE, ACC, in the case of record logging. Panels: high update frequency and high retrieval frequency. Axes: throughput versus communality, C.]
Figure RDA
12: Results
12 shows recovery
Unlike
the
TOC
scheme
",FORCE, than
page
logging
for the
ACC
cost of logging
is about
range
Figure of the
environment
with
",FORCE,
ACC
in the scheme
in throughput
reduces
that
page
of RDA
I
i
0.4
0.6
I
accessed
buffer
of algorithms for both much
in typical
applications
better
by using
increases
the need
RDA
by each
transaction
and
cost
than
the
[15].
Also, for the
recovery
of logging in most
by RDA
(s) for the high
FORCE,
is higher
logging, non
the
stolen
cases.
For
in throughput
of work performed
achieved
without
environments.
with record
for logging
with the amount in throughput
with
and for C = 0.9, the increase
increase
C = 0.9.
to the
1.0
C
evaluation
performs
is high relatively
i
0.8
case of record logging.
This is the case because,
environment
recovery
the percent
of pages
I
0.2
in the
type
achieved
cost by eliminating
frequency
ACC,
ACC
of C encountered
of a stolen
13 shows
number
--,FORCE,
with page logging.
updates
14%. The benefit
-',FORCE,
communality
the increase
algorithm
recovery
STEAL,
of values
for the high update
transaction. function
the
and RDA
example,
case,
0.0
Communality,
for the
of the
algorithm,
for the same
pages
throughput
as a function
113100
C
for -',ATOMIC,
the
_A
385600 -
7"t
by each
recovery
update
as a
frequency
-_FORCE,A CC,
record logging
[Figure 13: Benefit of RDA recovery as a function of the number of pages referenced by a transaction. Axes: % increase versus number of pages accessed, s.]

6 Conclusions
In this paper, we have presented a scheme that uses redundant disk arrays to achieve rapid recovery from media failures in database systems and simultaneously provide support for recovery from transaction aborts and system crashes. The redundancy present in the array is exploited to allow a large fraction of pages modified by active transactions to be written to disk and updated in place without the need for undo logging, thus reducing the number of recovery actions performed by the recovery component. The method uses a twin page scheme to store the parity information so that it can be efficiently used in transaction undo recovery. The extra storage used is about (100/N)% of the size of the database, N being the number of disks in the array.

We used a detailed analytical model to evaluate the benefit of our scheme in a system equipped with redundant disk arrays. We found that, in the case of page logging, a FORCE, TOC algorithm combined with RDA recovery significantly outperforms a FORCE, TOC algorithm without RDA recovery as well as ¬FORCE, ACC type of algorithms. In the case of record logging, we found that a ¬FORCE, ACC algorithm performs best and that the addition of RDA recovery to it improves its performance significantly, especially for transactions with a large number of updated pages.
Appendix
Derivation of the Formula for $s_u$

$s_u$ is the number of pages in the buffer updated by a set of P concurrent transactions. Let $S^{(k)}$ denote the number of pages in the buffer updated by $k$ update transactions executing concurrently. Since there are $P f_u$ update transactions executing in the system, we have $s_u = S^{(P f_u)}$. If we number the update transactions in the system from 1 to $P f_u$ in the order of their entry in the system, then when the $k$th update transaction enters the system, it will find $C s p_u$ of the $s p_u$ pages it needs to modify already in the buffer. We make the assumption that out of those pages, $C s p_u \times S^{(k-1)}/B$ belong to the $k - 1$ update transactions already in the system.$^{12}$ Hence, we have the following recurrence equation:

$$S^{(k)} - S^{(k-1)} = s p_u \left(1 - C S^{(k-1)}/B\right).$$
Using $S^{(1)} = s p_u$, we obtain
$$s_u = S^{(P f_u)} = \frac{B}{C}\left(1 - \left(1 - C s p_u/B\right)^{P f_u}\right).$$
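As a sanity check on the derivation, the recurrence $S^{(k)} - S^{(k-1)} = s p_u (1 - C S^{(k-1)}/B)$ with $S^{(1)} = s p_u$ can be iterated directly and compared with its geometric-series solution $(B/C)(1 - (1 - C s p_u/B)^k)$. The sketch below does this. The function names are hypothetical, and $B = 300$, $s = 10$, $p_u = 0.9$, $C = 0.9$ are illustrative inputs assumed from the parameter values quoted in the paper.

```python
def updated_pages_recurrence(k, s, p_u, C, B):
    """Iterate S(k) = S(k-1) + s*p_u*(1 - C*S(k-1)/B), starting from S(1) = s*p_u."""
    S = s * p_u
    for _ in range(2, k + 1):
        S += s * p_u * (1.0 - C * S / B)
    return S


def updated_pages_closed_form(k, s, p_u, C, B):
    """Geometric-series solution of the recurrence; requires C > 0."""
    if C <= 0.0:
        raise ValueError("closed form requires C > 0")
    return (B / C) * (1.0 - (1.0 - C * s * p_u / B) ** k)


# Illustrative inputs (assumed): B = 300, s = 10, p_u = 0.9, C = 0.9.
for k in (1, 2, 5, 10):
    print(k,
          updated_pages_recurrence(k, 10, 0.9, 0.9, 300),
          updated_pages_closed_form(k, 10, 0.9, 0.9, 300))
```

The closed form also accepts a non-integer exponent, which is convenient because $P f_u$ need not be an integer; the recurrence itself is only defined for whole numbers of transactions.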
12 Update transactions can share pages because record logging is used instead of page logging.

References

[1] D. Bitton and J. Gray, "Disk shadowing," in Proceedings of the 14th International Conference on Very Large Data Bases, pp. 331-338, Sept. 1988.
[2] J. Gray, B. Horst, and M. Walker, "Parity striping of disk arrays: Low-cost reliable storage with acceptable throughput," in Proceedings of the 16th International Conference on Very Large Data Bases, pp. 148-161, Aug. 1990.
[3] D. Patterson, G. Gibson, and R. Katz, "A case for redundant arrays of inexpensive disks (RAID)," in Proceedings of the ACM SIGMOD Conference, pp. 109-116, June 1988.
[4] J. Gray, P. McJones, M. Blasgen, B. Lindsay, R. Lorie, T. Price, F. Putzolu, and I. Traiger, "The recovery manager of the System R database manager," ACM Computing Surveys, vol. 13, no. 2, pp. 223-242, 1981.
[5] J. Kent and H. Garcia-Molina, "Optimizing shadow recovery algorithms," IEEE Trans. Software Engineering, vol. 14, pp. 155-168, Feb. 1988.
[6] R. A. Lorie, "Physical integrity in a large segmented database," ACM Trans. Database Systems, vol. 2, pp. 91-104, Mar. 1977.
[7] T. Haerder and A. Reuter, "Principles of transaction-oriented database recovery," ACM Computing Surveys, vol. 15, pp. 287-317, Dec. 1983.
[8] M. Y. Kim, "Synchronized disk interleaving," IEEE Trans. Computers, vol. C-35, pp. 978-988, Nov. 1986.
[9] M. Livny, S. Khoshafian, and H. Boral, "Multi-disk management algorithms," in Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, pp. 69-77, May 1987.
[10] K. Salem and H. Garcia-Molina, "Disk striping," in Proceedings of the IEEE International Conference on Data Engineering, pp. 336-342, Feb. 1986.
[11] M. Stonebraker, R. Katz, D. Patterson, and J. Ousterhout, "The design of XPRS," in Proceedings of the 14th International Conference on Very Large Data Bases, pp. 318-330, Sept. 1988.
[12] K.-L. Wu and W. K. Fuchs, "Rapid transaction-undo recovery using twin-page storage management," in Proceedings of IEEE Compsac, pp. 295-300, Nov. 1990.
[13] A. Reuter, "A fast transaction-oriented logging scheme for UNDO recovery," IEEE Trans. Software Engineering, vol. SE-6, pp. 348-356, July 1980.
[14] A. Reuter, "Performance analysis of recovery techniques," ACM Transactions on Database Systems, vol. 9, pp. 526-559, Dec. 1984.
[15] W. Effelsberg and T. Haerder, "Principles of database buffer management," ACM Transactions on Database Systems, vol. 9, pp. 560-595, Dec. 1984.