RECOVERY PROCEDURES
COMMIT is the successful end-of-transaction operation.
Changes to data items are not made permanent until the
COMMIT issued by the TM is acknowledged by the DM.
.---------------------------.
|Transaction Manager(s) |
| |
`---------------------------'
^ :
: v
.---------------------------.
|SCHEDULER |
`---------------------------'
^ :
: v
.---------------------------.
|DATA MANAGER |
`---------------------------'
^ :
: v
.---------------------------.
|DATABASE ON DISK |
`---------------------------'
Following that ack, the DBMS must guarantee that the
updates will never be lost, no matter what happens!
(DB must be recoverable).
Techniques used by DBMS to guarantee recoverability to a recent
COMMITTED DB STATE
(all data items show the value written by a committed transaction and
the resulting state is consistent with the integrity constraints)
ROLLBACK (ABORT) is the unsuccessful end-of-transaction operation.
All changes are undone using the LOG (or JOURNAL).
- the on-line LOG holds all updates as they are made.
- when on-line log fills up,
written to off-line log (usually on tape)
- LOGs can grow to be as large as the database itself.
Write-Ahead Logging (WAL) Protocol:
requires that a log record is physically written
with the "last committed value" on it,
before that item is changed (overwritten).
WAL protocol facilitates
"UNDO by re-installing before-values" of all changes
(removes effects of aborted transaction)
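A minimal sketch of the WAL idea, assuming a toy in-memory "disk" and log. The names Database, write and rollback are hypothetical and do not come from any real DBMS. The before-value is appended to the log before the item is overwritten, and ROLLBACK re-installs before-values, newest change first.

```python
class Database:
    """Toy store illustrating write-ahead logging (not a real DBMS API)."""

    def __init__(self, items):
        self.items = dict(items)   # stands in for the database on disk
        self.log = []              # stands in for the on-line LOG

    def write(self, trans, key, new_value):
        # WAL rule: record the before-value first, then overwrite the item.
        self.log.append((trans, key, self.items.get(key)))
        self.items[key] = new_value

    def rollback(self, trans):
        # UNDO: re-install before-values, newest change first.
        for t, key, before in reversed(self.log):
            if t == trans:
                self.items[key] = before


db = Database({"A": 100, "B": 50})
db.write("T1", "A", 70)
db.write("T1", "B", 80)
db.rollback("T1")
print(db.items)
```

After the rollback both items hold their last committed values again.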
TYPES OF FAILURES
Transaction local (ABENDS, NSF check)
System failures (DBMS itself fails)
Media failure (disk crash)
TRANSACTION FAILURE (transactions themselves are responsible for action)
eg, Abnormal program ends (ABENDS),
Non-Sufficient Funds (NSF)
Transaction code can trap these and specify remedy (e.g., ROLLBACK).
However, in order to facilitate proper transaction actions,
system must hold all output messages until COMMIT.
Otherwise, this can happen:
__________
| A T M |
///// | | !@#$%!
| O ` | o o o | ....ROLLBACK!
| > | o o o | ._BANK.__
| `-| | o o o | | o o |
`----' | _______ | | _ |
| | / $ / | | ' ` | ___
|-------/___/ | `---' | |
| | | |----| NSF|
| | | | |____|
^ | | |_____ __
| | | | ----- | |
| | | | | | |
L L |__________| | L |
At an ATM cash machine, if the "message" (the cash)
is given to the user before commit,
it is impossible to ROLLBACK the transaction.
(Why not just ask the user to put the money back in the machine?)
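The "hold all output messages until COMMIT" rule can be sketched as follows. This is an illustration only; the Transaction class and its method names are assumptions, not part of any real system.

```python
class Transaction:
    """Toy transaction that buffers outgoing messages until COMMIT."""

    def __init__(self):
        self.pending = []    # messages held back while the transaction runs
        self.delivered = []  # messages actually released to the outside world

    def send(self, message):
        self.pending.append(message)

    def commit(self):
        # Only now is it safe to release the held messages.
        self.delivered.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        # Nothing has left the system, so discarding is enough.
        self.pending.clear()


withdrawal = Transaction()
withdrawal.send("dispense $100")
withdrawal.rollback()          # e.g., NSF detected
print(withdrawal.delivered)    # no cash was dispensed
```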
SYSTEM FAILURE
DBMS itself fails,
memory contents are lost (buffers)
data on disk is undamaged
The Data Manager is allowed to do its job any way it wants to
(to optimize its activity). That's the reason for the
component separation in the first place (instead of a monolithic
system).
DM can be implemented so that
Disk may contain some "uncommitted values" and/or
disk may not contain all committed values.
Disk may contain uncommitted values if a STEAL policy is used.
STEAL policy: Buffer Mgr can replace a page which still has uncommitted values
(write a page to disk that contains uncommitted values)
(actually "stealing" a page from one trans and giving it to another)
(Necessary for very long running trans, e.g., payroll processing)
Disk may not contain all committed values if a NO-FORCE policy is used.
NO-FORCE policy: Buffer Mgr may not write a page with newly committed values
until later. (e.g., In a Banking system, may not be able to afford
to force every write immediately)
BUFFER POLICIES:
  FORCE | STEAL
.--------------
|  YES  |  YES
|  YES  |  NO
|  NO   |  YES   <- the hardest to implement but the best!
|  NO   |  NO
Although there are systems that use either a NO-STEAL or a FORCE policy
(or both), we discuss only STEAL, NO-FORCE
(STEAL NO-FORCE requires the most demanding recovery system).
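As a rough sketch of why STEAL NO-FORCE is the most demanding combination: the hypothetical helper below (the name recovery_needs is an assumption) maps each buffer policy pair to whether recovery must be able to UNDO and/or REDO, following the two observations above about what the disk may or may not contain.

```python
def recovery_needs(steal, force):
    """Return (undo_needed, redo_needed) for a buffer policy combination."""
    undo_needed = steal        # STEAL: uncommitted values may be on disk
    redo_needed = not force    # NO-FORCE: committed values may be missing from disk
    return undo_needed, redo_needed


for force in (True, False):
    for steal in (True, False):
        print(f"FORCE={force!s:5} STEAL={steal!s:5} ->", recovery_needs(steal, force))
```

Only STEAL with NO-FORCE yields (True, True), i.e., both UNDO and REDO are required.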
In a STEAL NO-FORCE system:
All transactions active at fail-time
(BEGUN, not ENDed) must be UNDONE
(because some of the changes they made may
have been written to disk under the STEAL policy).
All transactions committed at fail-time
must be idempotently REDONE
(because the committed changes they made may
not have been written to disk under the NO-FORCE policy).
One way is to UNDO all active transactions and then idempotently
REDO all committed transactions.
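"Idempotently" means that repeating the REDO (for example, if the system crashes again during recovery) gives the same result as doing it once. A small sketch, assuming the log holds after-values for committed changes (an assumption of this example; the redo function and record layout are hypothetical):

```python
def redo(db, log_records):
    """Install after-values; safe to repeat because it sets, not increments."""
    for key, after_value in log_records:
        db[key] = after_value
    return db


disk = {"A": 100}
log_records = [("A", 70)]
once = redo(dict(disk), log_records)
twice = redo(redo(dict(disk), log_records), log_records)
print(once == twice)
```

Had the log recorded an operation such as "subtract 30" instead of the after-value 70, applying it twice would give a different result.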
Do we have to go all the way back to IPL
(Initial Program Load) and REDO all committed transactions?
Can that be avoided?
YES!
Through checkpointing!
System Periodically takes a CHECKPOINT
There are many, many checkpointing methods; here is one:
"Standard" CHECKPOINT:
Usually at a quiescent point in time (no activity going on),
but not necessarily (i.e., there are "on-the-fly" checkpointing
methods, but they are very complex).
1. forcewrites all buffers to disk immediately ("flushes buffers")
2. forcewrites a "checkpoint" record to log.
CHECKPOINT record must have an "active list":
(all currently active transactions)
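The two checkpoint steps can be sketched as below. This is a toy model under stated assumptions: buffers, disk and log are plain Python containers, and take_checkpoint is a hypothetical name.

```python
def take_checkpoint(buffers, disk, log, active_transactions):
    """Toy 'standard' checkpoint: flush buffers, then log the active list."""
    disk.update(buffers)      # 1. force-write all buffered pages to disk
    buffers.clear()
    # 2. force-write a checkpoint record carrying the active list
    log.append(("CHECKPOINT", sorted(active_transactions)))


buffers = {"A": 70}
disk = {"A": 100, "B": 50}
log = []
take_checkpoint(buffers, disk, log, {"T2", "T3", "T6"})
print(disk, log[-1])
```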
TRANS |-->|"ca-chunk" |
.-|"change record" |
: |"ca-chunk" |
log : |"COMMIT *-1st then|-.
record : | . | :
.---|"check-point" 1->| : ////
: : | .________|<'/|(- -)-
: : |------. | | / O `-' /
2 `>| log | |database|/ `._|_/
:.<-| buff| | buffer | |
:* |______|__|________| |
@@@ :: / ) ^
@ o > :: ( / | |
@`._- :: `----'| | |
_|__ :: \___/ L L
/ ( ) `.: |
| :: V
/^\ :: | | disk copy|
--- :`>tr-log | | database |
L L `>_chpt-rec| |__________|
With standard Checkpointing (described above), of the following
which must be undone and
which must be redone?
Active
v
where |----->|
^ ^
BEGIN COMMIT
CHECKPOINT CRASH
| |
v v
T1 |------->|
T2 |----------------------------->|
T3 |--------------------------------->
T4 |----->|
T5 |------------------>
T6 |-------------->|
After Crash, RECOVERY PROCESS would:
1. Start at most recent Checkpoint record in LOG
containing ACTIVE-list={T2,T3,T6}
UNDO-list = ACTIVE-list e.g., UNDO={T2,T3,T6}
REDO-list = empty
2. Scan forward in the LOG from CHECKPOINT record.
For each BEGIN encountered, put trans in UNDO-list (e.g., add T4, T5).
For each COMMIT encountered, move trans from UNDO to REDO
(e.g., move T6, T4, T2).
3. When LOG is exhausted, idempotently REDO
REDO-list in commit order (e.g., {T6, T4, T2}).
UNDO all trans in UNDO-list (e.g., {T3, T5}).
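The list-building part of the recovery process can be sketched as follows. The log layout (tuples of record kind and transaction name) and the function name build_recovery_lists are assumptions made for illustration. The exact interleaving of BEGIN and COMMIT records after the checkpoint is also assumed, chosen so that the commit order is T6, T4, T2.

```python
def build_recovery_lists(log):
    """Scan forward from the last checkpoint; return (redo_list, undo_list)."""
    # 1. Find the most recent checkpoint record.
    start = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    undo = list(log[start][1])   # UNDO-list starts as the ACTIVE-list
    redo = []                    # REDO-list starts empty
    # 2. Scan forward from the checkpoint.
    for kind, trans in log[start + 1:]:
        if kind == "BEGIN":
            undo.append(trans)
        elif kind == "COMMIT":
            undo.remove(trans)
            redo.append(trans)   # appended in commit order
    return redo, undo


log = [
    ("CHECKPOINT", ["T2", "T3", "T6"]),
    ("BEGIN", "T4"),
    ("COMMIT", "T6"),
    ("BEGIN", "T5"),
    ("COMMIT", "T4"),
    ("COMMIT", "T2"),
]
redo_list, undo_list = build_recovery_lists(log)
print("REDO:", redo_list)
print("UNDO:", undo_list)
```

Step 3 would then idempotently redo the transactions in redo_list in that order and undo those in undo_list.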
Note:
Since transactions are redone in commit-order = REDO-order,
it must be the case that the Serial Order to which execution
is equivalent is COMMIT order
That is, if another serial order is the order to which the
serializability is equivalent, the REDO must be done in that order.
In T2 and T4 above, messages may have gone back to the users
which were based on an execution order equivalent to SOME
serial order (values reported to users were generated by the
execution in that order). Thus, RECOVERY must regenerate
in the same order.
The only way that the RECOVERY process can know what serial order
the original execution was equivalent to is that the initial
execution be equivalent to some serial order identifiable from the LOG.
One order identifiable from the LOG is COMMIT order.
Therefore, it is common to demand that the order of execution be
equivalent to the serial COMMIT-order.
(S2PL does that. Is that why it is so popular?)
MEDIA FAILURE (from disk crash) RECOVERY
ARCHIVE: periodically dump the database (i.e., make an ARCHIVE copy on off-line tape):
1. Shut down the DBMS (e.g., late at night or during "quiescent" period)
2. Copy the entire database to off-line storage (tape)
3. Bring up the DBMS again
4. Erase the LOG and restart logging
__ | |
. . | disk copy|
| tape | <- - - - - -| of |
___.___ . | database |
|__________|
Following a media failure (disk crash)
1. RESTORE DB from archive,
__ | |
. . | disk copy|
| tape | - - - - - >| of |
___.___ . | database |
|__________|
2. REDO trans-log from archive-time to as near to crash-time as possible
(using both off-line & on-line log (on-line kept on separate disk)).
This is called ROLL-FORWARD
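A minimal sketch of RESTORE followed by ROLL-FORWARD, assuming a toy log of WRITE records (carrying after-values) and COMMIT records. The record layout and the name roll_forward are hypothetical.

```python
def roll_forward(archive, log):
    """Restore from the archive copy, then redo committed transactions."""
    db = dict(archive)                       # 1. RESTORE DB from archive
    committed = [rec[1] for rec in log if rec[0] == "COMMIT"]
    for trans in committed:                  # 2. REDO in commit order
        for kind, t, *change in log:
            if kind == "WRITE" and t == trans:
                key, after_value = change
                db[key] = after_value
    return db


archive = {"A": 100, "B": 50}
log = [
    ("WRITE", "T1", "A", 70),
    ("WRITE", "T2", "B", 10),   # T2 never committed, so it is not redone
    ("COMMIT", "T1"),
]
print(roll_forward(archive, log))
```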
________COMPUTER_______
LOG |- - - ->| "ca-chunk" |
_____| | "ca-chunk |
| "redo transaction" |
|---------. .---------|
| log | | database|
| buffer | | buffer |
|_________|___|_________|
There are many other methods.
DUPLEXING = make two copies of every data item
on separate disks (at least separate failure modes).
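Duplexing can be illustrated with a toy store that keeps two copies of every item, assuming two dictionaries stand in for the separate disks. The DuplexedStore class and its methods are hypothetical names.

```python
class DuplexedStore:
    """Toy duplexing: every item is kept on two independent 'disks'."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, key, value):
        for disk in self.disks:        # write every surviving copy
            if disk is not None:
                disk[key] = value

    def fail(self, index):
        self.disks[index] = None       # simulate a disk crash

    def read(self, key):
        for disk in self.disks:        # use whichever copy survives
            if disk is not None:
                return disk[key]
        raise IOError("both copies lost")


store = DuplexedStore()
store.write("A", 70)
store.fail(0)
print(store.read("A"))
```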
The amount of extra disk space used can be reduced by methods
such as Hamming coding to as low as 5% extra disk space;
however,
in this, the Age Of Infinite Storage, is it worth doing?
Hamming coding is used in some RAID systems
(Redundant Arrays of Independent Disks).
APPENDIX
--------
Storage past, present and future:
In 1956, IBM developed RAMAC, a refrig sized disk system with
50 2-ft diam platters. RAMAC had a capacity of 5 megabytes.
Since then:
- amt of data stored on given area has increased 1,000,000-fold
- the transfer speed has increased 3,000-fold
- the cost per bit has decreased 500,000-fold (comparable $s).
This has been achieved through breakthroughs in
- "areal density" (# bits/sq in)
- revolution speeds
- read-write head technologies
How much higher can disk capacity go?
So far predictions of "upper limits" have been made by
engineers and they have always been wrong (way wrong).
We are approaching a limit determined by fundamental physics,
not engineering ingenuity.
There comes a point beyond which random jiggle of electron spins
due to temperature is likely to cause the direction of a bit's
magnetization to spontaneously reverse within the expected
lifetime of the disk. This is called the SUPERPARAMAGNETIC LIMIT,
                                         =======================
which may limit the progress that can be achieved through
miniaturizing or the "scaling down" of existing technologies.
Where is the superparamagnetic limit?
Most agree it will be encountered at densities ~120 Gbits/sq_in.
At 6.5 sq_in per 3.5 inch surface, that gives ~ 800 Gb/surface.
or ~ 100 GB/surface
times 50 surfaces, we can conclude
that a 3.5 inch hard drive may go to 5000 GB/disk = 5 TB/disk
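The arithmetic behind this estimate, written out as a short check using only the figures quoted above (the variable names are illustrative):

```python
density_gbit_per_sq_in = 120     # estimated superparamagnetic limit
area_sq_in = 6.5                 # area of one 3.5 inch surface
surfaces = 50

gbit_per_surface = density_gbit_per_sq_in * area_sq_in   # 780 Gb, i.e. ~800
gbyte_per_surface = gbit_per_surface / 8                  # 97.5 GB, i.e. ~100
gbyte_per_disk = gbyte_per_surface * surfaces             # 4875 GB, i.e. ~5 TB
print(gbit_per_surface, gbyte_per_surface, gbyte_per_disk)
```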
************************************************************
Note that COMMODITY drives today have reached 500 GB/drive:
so another x10 and we're there with commodity drives!!!
*************************************************************
Indexing and providing reference paths and access paths to
data stores of this size is nearly impossible!
What are we going to do??
Holographic storage?
(From "holographic storage 1" and "holographic storage 2":)
"Storing data as holograms has intrigued scientists for decades.
In the early 1960s, former Almaden Research Center scientist
Glenn Sincerbox helped IBM develop the world's first working
holographic data storage system
- a write-once-read-many (worm) technology using
photographic film, for US Air Force.
Today, IBM participates in two industry/university/government
consortia that aim to demonstrate holographic storage
technologies by the turn of the century.
A traditional hologram is produced when a
beam of laser light, the reference beam, interferes with another
beam reflected from the object to be recorded.
The pattern of interference is captured by photographic film,
a light-sensitive crystal or some other optical material.
Illuminating this pattern by the reference beam reproduces
a three-dimensional image of the object.
(this technology is called "interferometry")
Each viewing angle gives you a different view of the same object
Holographic data storage works in exactly the same way.
But for every angle, instead of having another view of object,
we have a completely different page of information."
Up to 10,000 pages have been stored in a single cube of
recording material 1 cm on a side.
Each page contains one megabit of information,
which means that the cube can store ~10 gigabits.
Since there are approximately 16.4 cubic cms in a cubic inch
and there are approximately 43 cubic inches in a 3.5 inch cube
(3.5 inch diskettes piled 3.5 inches high) that means a
3.5 inch cube hologram would hold ~7 terabits of data.
                                  ===========
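A quick check of the cube-capacity estimate, as a sketch that assumes the exact conversion of 2.54 cm per inch together with the page figures quoted above. The result is sensitive to the conversion factors chosen.

```python
pages_per_cube = 10_000
bits_per_page = 1_000_000                      # one megabit per page
bits_per_cm3 = pages_per_cube * bits_per_page  # 10 gigabits per 1 cm cube

cm3_per_in3 = 2.54 ** 3                        # exact inch-to-cm conversion, cubed
in3_per_cube = 3.5 ** 3                        # volume of a 3.5 inch cube
total_bits = bits_per_cm3 * cm3_per_in3 * in3_per_cube
print(f"{total_bits / 1e12:.1f} terabits")
```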
Holographic recording has the advantage of being
inherently non-linear (parallel).
It reads and stores an entire page at a time.
The technology permits data rates of up to one gigabit
(or 125 megabytes) per second,
making it ideal for storing image data.
Another advantage of holographic storage, largely untapped,
lies in its use as associative memory. Just as illuminating
a hologram with a reference beam recovers the stored info,
illuminating it with a pattern of info will reproduce the
corresponding reference beam and angle, which immediately
identifies the page on which the information is stored.
In other words, holographic memories can be searched
extremely quickly for data patterns (associative memory).
This would allow database searches using physics rather
than software.
Note that holographic storage may make the current access
path technologies (indexes) obsolete.
Why would anyone use indexes, hash functions, SQL,
Relational Alg, Relational Calc....
when you can simply pattern match in a holo cube?!?!?!?!
It should be interesting.
Spintronics is another solution:
IBM and Stanford University are putting their heads together
on a new microelectronics technology dubbed "spintronics"
that promises breakthroughs in computer processors and other
electronics components while extending Moore's Law for chip design.
In setting up a spintronics lab, researchers at the two
organizations plan to control the spin, or magnetic orientation,
of electrons within nano-scale electronic structures comprising
super-thin layers to produce devices for low-power switching
and non-volatile information storage.
Magnetic Properties
Electron spin is a quantum property that has "up" or "down" states.
Aligning spins in a material creates magnetism, and magnetic fields
affect the passage of electrons differently. Understanding and
controlling this property is central to creating a whole new breed
of electronic applications.
Among the possibilities are reconfigurable logic devices,
room-temperature superconductors and quantum computers.
The first commercial products, ranging from digital cameras to
instant-on computers, will not be available for at least five years.
Current chip technology relies on the charges of electrons in
circuitry, explained Mike Ross, a spokesperson for IBM's
Almaden Research Center. Spintronics uses the quantum
"spin" property of electrons to create magnetism, just as
an electron's negative charge property creates electricity.
MRAM In the Works
By designing and making stacks of different materials -- some
with layers only two to three atoms thick -- researchers can
create devices that have novel properties. The spintronic GMR head,
for example, has boosted the disk-drive industry, Ross told NewsFactor.
"This sensitive magnetic sensor, introduced by IBM, has resulted in
a 40-fold increase in data storage in the past seven years," he said.
Magnetic RAM (MRAM) is the next spintronic device in the works.
It has the potential to be a non-volatile memory that runs circles
around non-volatile Flash memory typically used in cell phones,
memory cards and other products. Current fast memory (SRAM,
SDRAM, etc.) technology is volatile, meaning that its contents
are lost at power-off and devices must be booted up to reload data.
"We want to learn more about using this technology in the
sensor realm, and we see big benefits to logic and other
types of electronics circuits," said Ross.
The IBM-Stanford Spintronic Science and Applications Center
(SpinAps) will involve about a half-dozen Stanford professors
and a similar number of IBM scientists. Research projects are
funded by the two partners and agencies, including the Defense
Advanced Research Projects Agency, the U.S. Department of Energy,
and the National Science Foundation.
RAM Revolution
Spintronics "has quickly revolutionized magnetic recording technology
and is going to revolutionize random access memory (RAM),"
University of Utah physics researcher Jing Shi told NewsFactor.
Compared with electronic computers, computers with spintronic
memory should be able to store more data, process it faster,
and consume less power.
Spintronics also may yield "instant-on" computers.
Aligned spins stay aligned until a magnetic field changes
them -- even if a computer is shut off. Consequently,
spin-based instant-on computers do not require booting
to move data from the hard drive to the memory.
The data never left.