RECOVERY PROCEDURES


COMMIT is the successful end-of-transaction operation.


Changes to data items are not made permanent until the


     COMMIT issued by the TM is acknowledged by the DM.




  .---------------------------.
  |Transaction Manager(s)     |
  |                           |
  `---------------------------'   
  ^                           :              
  :                           v
  .---------------------------.
  |SCHEDULER                  |
  `---------------------------'
  ^                           :
  :                           v
  .---------------------------.
  |DATA MANAGER               |
  `---------------------------'
  ^                           :
  :                           v
  .---------------------------.
  |DATABASE ON DISK           |
  `---------------------------'


Following that ack, the DBMS must guarantee that the
 updates will never be lost, no matter what happens!
 (DB must be recoverable).




Techniques used by DBMS to guarantee recoverability to a recent

 COMMITTED DB STATE

 (all data items show the value written by a committed transaction and
 the resulting state is consistent with the integrity constraints)




ROLLBACK    (ABORT is the unsuccessful end-of-transaction operation).

     All changes are undone using the LOG (or JOURNAL).


     - the on-line LOG holds all updates as they are made.

     - when the on-line log fills up, it is
       written to the off-line log (usually on tape)

     - LOGs can grow to be as large as the database itself.




Write-Ahead Logging (WAL) Protocol:

 requires that a log record is physically written
 with the "last committed value" on it,
 before that item is changed (overwritten).




WAL protocol facilitates

 "UNDO by re-installing before-values" of all changes

 (removes the effects of an aborted transaction)
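
A minimal sketch of WAL with UNDO by before-values. The names `disk`, `log`,
`wal_write` and `undo` are hypothetical, and plain Python objects stand in
for the disk and the log; this is an illustration, not a real DBMS interface.

```python
# Hypothetical in-memory model: `disk` is the database, `log` is the write-ahead log.
disk = {"A": 100, "B": 50}
log = []  # list of (transaction id, item, before-value) records


def wal_write(txn, item, new_value):
    """Log the before-value first, and only then overwrite the item."""
    log.append((txn, item, disk[item]))   # 1. log record is written first
    disk[item] = new_value                # 2. now the item may be changed


def undo(txn):
    """Re-install the before-values of txn's changes, newest record first."""
    for t, item, before in reversed(log):
        if t == txn:
            disk[item] = before


wal_write("T1", "A", 90)
wal_write("T1", "B", 60)
wal_write("T1", "A", 80)
undo("T1")
print(disk)  # {'A': 100, 'B': 50}
```

 The log is scanned backwards so that the oldest before-value of each item
 is the one that ends up installed.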




TYPES OF FAILURES

  Transaction local (ABENDS, NSF check)

  System failures (DBMS itself fails)

  Media failure (disk crash)




TRANSACTION FAILURE (transactions themselves are responsible for action)

  e.g., Abnormal program ends (ABENDS),
        Non-Sufficient Funds (NSF)


  Transaction code can trap these and specify a remedy (e.g., ROLLBACK).


  However, in order to facilitate proper transaction actions,

    system must hold all output messages until COMMIT.
    Otherwise, this can happen:

         __________
        |  A T M   |
 /////  |          |         !@#$%!
|  O `  |  o o o   |     ....ROLLBACK!
|     > |  o o o   |    ._BANK.__
|  `-|  |  o o o   |    | o o |
`----'  | _______  |    |  _  |
  |     |  / $ /   |    | ' ` |  ___
  |-------/___/    |     `---'  |    |
  |     |          |       |----| NSF|
  |     |          |       |    |____|
  ^     |          |       |_____    __
 | |    |          |       ----- |  |
 | |    |          |         |   |  |
 L L    |__________|         |   L  |


At an ATM cash machine, if the "message" (the cash)
 is given to the user before commit,
 it is impossible to ROLLBACK the transaction.

 (Why not just ask the user to put the money back in the machine?)
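
 Holding output until COMMIT can be sketched as follows. The `Transaction`
 class and the `withdraw` function are hypothetical, made up for this
 illustration; the "message" is just a string standing in for the cash.

```python
class Transaction:
    """Hypothetical transaction that holds output messages until COMMIT."""

    def __init__(self):
        self.pending = []     # messages held back by the system
        self.delivered = []   # messages actually released to the user

    def send(self, message):
        self.pending.append(message)          # held, not yet visible

    def commit(self):
        self.delivered.extend(self.pending)   # released only at COMMIT
        self.pending.clear()

    def rollback(self):
        self.pending.clear()                  # nothing ever left the system


def withdraw(balance, amount):
    txn = Transaction()
    txn.send(f"dispense ${amount}")
    if amount > balance:       # NSF check fails
        txn.rollback()
    else:
        txn.commit()
    return txn.delivered


print(withdraw(100, 40))    # ['dispense $40']
print(withdraw(100, 400))   # []
```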





SYSTEM FAILURE

DBMS itself fails,
 memory contents are lost (buffers)
 data on disk is undamaged


The Data Manager is allowed to do its job any way it wants to
(to optimize its activity). That's the reason for the
component separation in the first place (instead of a monolithic
system).

DM can be implemented so that

disk may contain some "uncommitted values" and/or
disk may not contain all committed values.





Disk may contain uncommitted values if a STEAL policy is used.

STEAL policy: Buffer Mgr can replace a page which still has uncommitted values
     (i.e., write a page to disk that contains uncommitted values)
     (actually "stealing" a page from one trans and giving it to another)
     (Necessary for very long-running trans, e.g., payroll processing)





Disk may not contain all committed values if a NO-FORCE policy is used.

NO-FORCE policy: Buffer Mgr may delay writing a page with newly committed
         values until later (the page is not forced to disk at COMMIT).
         (e.g., In a Banking system, may not be able to afford
                to force every write immediately)




BUFFER POLICIES:
                \ FORCE  | STEAL
                 .--------------
                 | YES   | YES
                 | YES   | NO
                 | NO    | YES  < - the hardest to implement but the best!
                 | NO    | NO



Although there are systems that use either a NO-STEAL or a FORCE policy
 (or both), we discuss only STEAL, NO-FORCE

 (STEAL NO-FORCE requires the most demanding recovery system).
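
 Why STEAL NO-FORCE is the most demanding case can be sketched by listing
 what recovery work each policy combination needs after a system failure.
 The function `recovery_needs` is hypothetical, written only for this
 illustration.

```python
def recovery_needs(force, steal):
    """Recovery actions a (FORCE, STEAL) combination needs after a system failure."""
    needs = []
    if steal:
        needs.append("UNDO")   # uncommitted values may have reached disk
    if not force:
        needs.append("REDO")   # committed values may be missing from disk
    return needs


for force in (True, False):
    for steal in (True, False):
        print(f"FORCE={force!s:5} STEAL={steal!s:5} -> {recovery_needs(force, steal)}")
```

 Only the NO-FORCE, STEAL row needs both UNDO and REDO.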




In a STEAL NO-FORCE system:


All transactions active at fail-time
    (BEGUN, not ENDed) must be UNDONE.

    (because some of the changes they made may
     have been written to disk under the STEAL policy).



All transactions committed at fail-time
     must be idempotently REDONE

    (because the committed changes they made may
     not have been written to disk under the NO-FORCE policy).



One way is to UNDO all active transactions and then idempotently
              REDO all committed transactions.
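
"Idempotently" means REDO can be repeated (e.g., after a second crash
 during recovery) without changing the result. A minimal sketch, assuming
 the log also records after-values (an assumption made for this example;
 the `redo` function and the sample records are hypothetical):

```python
def redo(db, log_records):
    """Install after-values; repeating this leaves the database unchanged."""
    for item, after_value in log_records:
        db[item] = after_value      # "set to", never "add to"
    return db


committed_log = [("A", 90), ("B", 60), ("A", 80)]   # hypothetical after-values

db = {"A": 100, "B": 50}
once = dict(redo(db, committed_log))
twice = dict(redo(db, committed_log))   # REDO applied a second time
print(once == twice, once)              # True {'A': 80, 'B': 60}
```

 A log record such as "add 10 to A" would not be idempotent; "set A to 80" is.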


Do we have to go all the way back to IPL
 (Initial Program Load) and REDO all committed transactions?

              Can that be avoided?




YES!

 Through checkpointing!

 System Periodically takes a CHECKPOINT




There are many, many checkpointing methods; here is one:




"Standard" CHECKPOINT:

Usually at a quiescent point in time (no activity going on),

 but not necessarily (i.e., there are "on-the-fly" checkpointing
     methods, but they are very complex).


1.  forcewrites all buffers to disk immediately ("flushes buffers")

2.  forcewrites a "checkpoint" record to log.

       CHECKPOINT record must have an "active list":
                  (all currently active transactions)
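
The two checkpoint steps can be sketched on in-memory structures. The
 function `take_checkpoint` and the record layout are hypothetical; a real
 system would force-write to stable storage rather than update dicts.

```python
def take_checkpoint(buffers, disk, log, active):
    """Sketch of the two checkpoint steps on hypothetical in-memory structures."""
    disk.update(buffers)                         # 1. flush all buffers to disk
    buffers.clear()
    log.append(("CHECKPOINT", sorted(active)))   # 2. checkpoint record + active list


buffers = {"A": 80}
disk = {"A": 100, "B": 50}
log = [("BEGIN", "T2"), ("BEGIN", "T3")]
take_checkpoint(buffers, disk, log, active={"T2", "T3"})
print(disk, log[-1])  # {'A': 80, 'B': 50} ('CHECKPOINT', ['T2', 'T3'])
```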


TRANS  |-->|"ca-chunk"        |
         .-|"change record"   |
         : |"ca-chunk"        |
  log    : |"COMMIT *-1st then|-.
  record : |    .             | :
       .---|"check-point"  1->| :   ////
       : : |         .________|<'/|(- -)-
       : : |------.  |        | / O `-' /
       2 `>|  log |  |database|/  `._|_/
       :.<-|  buff|  | buffer |     |
       :*  |______|__|________|     |
  @@@  ::             /      )      ^
 @ o > ::            (      /      | |
@`._-  ::             `----'|      | |
  _|__ ::              \___/       L L
/ ( ) `.:                |
   |   ::                V
  /^\  ::         |   | disk copy|
  ---  :`>tr-log  |   | database |
  L L  `>_chpt-rec|   |__________|



With standard Checkpointing (described above), of the following

     which must be undone and
     which must be redone?




            Active
              v
     where |----->|
           ^      ^
       BEGIN      COMMIT




                    CHECKPOINT             CRASH
                       |                     |
                       v                     v
 T1      |------->|
 T2    |----------------------------->|
 T3        |--------------------------------->
 T4                       |----->|
 T5                       |------------------>
 T6           |-------------->|

After Crash, RECOVERY PROCESS would:

1. Start at most recent Checkpoint record in LOG
                 containing ACTIVE-list={T2,T3,T6}

   UNDO-list = ACTIVE-list   e.g., UNDO={T2,T3,T6}  
   REDO-list = empty

2. Scan forward in the LOG from CHECKPOINT record.
   For each BEGIN encountered, put trans in UNDO-list  (e.g., add T4, T5)
   For each COMMIT encountered, move trans from UNDO to REDO.
                                          (e.g., move T6, T4, T2)

3. When LOG is exhausted, Idempotently REDO
      REDO-list in commit order.          (e.g., {T6, T4, T2} )
      UNDO all trans in UNDO-list         (e.g., {T3, T5} )
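
A minimal sketch of steps 1 and 2, building the two lists for the T1..T6
timeline. The log layout (tuples of record kind and transaction) and the
function `recover_lists` are assumptions made for this example.

```python
def recover_lists(log):
    """Return (REDO-list in commit order, UNDO-set) from the last checkpoint on."""
    # 1. Find the most recent CHECKPOINT record.
    start = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    undo = set(log[start][1])     # UNDO-list = ACTIVE-list
    redo = []                     # REDO-list = empty

    # 2. Scan forward from the checkpoint record.
    for kind, txn in log[start + 1:]:
        if kind == "BEGIN":
            undo.add(txn)
        elif kind == "COMMIT":
            undo.discard(txn)     # move trans from UNDO ...
            redo.append(txn)      # ... to REDO, in commit order
    return redo, undo


# Hypothetical log matching the T1..T6 timeline.
log = [
    ("BEGIN", "T2"), ("BEGIN", "T1"), ("BEGIN", "T3"), ("BEGIN", "T6"),
    ("COMMIT", "T1"),
    ("CHECKPOINT", ["T2", "T3", "T6"]),
    ("BEGIN", "T4"), ("BEGIN", "T5"),
    ("COMMIT", "T6"), ("COMMIT", "T4"), ("COMMIT", "T2"),
]
redo, undo = recover_lists(log)
print(redo, sorted(undo))  # ['T6', 'T4', 'T2'] ['T3', 'T5']
```

T1 never appears in either list: it committed before the checkpoint, so its
changes were flushed to disk by the checkpoint itself.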


Note:

Since transactions are redone in commit order (REDO-order = COMMIT-order),
 it must be the case that the serial order to which the execution
 is equivalent is COMMIT order.
 
 That is, if the execution is equivalent to some other serial order,
 the REDO must be done in that order.



In T2 and T4 above, messages may have gone back to the users
 which were based on an execution order equivalent to SOME
 serial order (values reported to users were generated by the
 execution in that order).  Thus, RECOVERY must regenerate
 the values in the same order.

The only way that the RECOVERY process can know what serial order
 the original execution was equivalent to is that the initial
 execution be equivalent to some serial order identifiable from the LOG.

One order identifiable from the LOG is COMMIT order.
 Therefore, it is common to demand that the order of execution be
 equivalent to the serial COMMIT-order.

 (S2PL does that.   Is that why it is so popular?)
        





MEDIA FAILURE (from disk crash) RECOVERY


ARCHIVE: periodically dump database (i.e., make an ARCHIVE copy on off-line storage such as tape):


1.  Shut down the DBMS  (e.g., late at night or during "quiescent" period)


2.  Copy the entire database to off-line storage (tape)


3.  Bring up the DBMS again


4.  Erase the LOG and restart logging





         __                |          |
       .    .              | disk copy|
      | tape | <- - - - - -|   of     |
    ___.___ .              | database |
                           |__________|






Following a media failure (disk crash)


 1. RESTORE DB from archive,

         __                |          |
       .    .              | disk copy|
      | tape |  - - - - - >|   of     |
    ___.___ .              | database |
                           |__________|



 2. REDO trans-log from archive-time to as near to crash-time as possible
         (using both off-line & on-line log (on-line kept on separate disk)).


    This is called ROLL-FORWARD
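
    A minimal roll-forward sketch. It assumes the logs hold only the
    after-values of committed updates, oldest first; the names `roll_forward`,
    `archive`, `offline_log` and `online_log` are hypothetical.

```python
def roll_forward(archive, offline_log, online_log):
    """Restore the archive copy, then REDO logged updates in order."""
    db = dict(archive)                          # 1. RESTORE DB from archive
    for item, after_value in offline_log + online_log:
        db[item] = after_value                  # 2. REDO, oldest record first
    return db


archive = {"A": 100, "B": 50}                   # state at archive-time
offline_log = [("A", 90)]                       # older records, moved off-line
online_log = [("B", 60), ("A", 80)]             # most recent records
print(roll_forward(archive, offline_log, online_log))  # {'A': 80, 'B': 60}
```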


               ________COMPUTER_______
LOG  |- - - ->|   "ca-chunk"          |
_____|        |   "ca-chunk           |
              |   "redo transaction"  |
              |---------.   .---------|
              | log     |   | database|
              | buffer  |   |  buffer |
              |_________|___|_________|







There are many other methods.





DUPLEXING = make two copies of every data item
            on separate disks (at least separate failure modes).



    The amount of extra disk space used can be reduced by methods
    such as Hamming coding to as low as 5% extra disk space,

    however,

    in this, the Age Of Infinite Storage, is it worth doing?



    Hamming coding is used in some RAID systems (RAID level 2).
    (Redundant Arrays of Independent Disks)













APPENDIX
--------

Storage past, present and future:


In 1956, IBM developed RAMAC, a refrig sized disk system with
    50 2-ft diam platters.  RAMAC had a capacity of 5 megabytes.


Since then:
 - amt of data stored on given area has increased 1,000,000-fold
 - the transfer speed has increased 3,000-fold
 - the cost per bit has decreased 500,000-fold (comparable $s).


This has been achieved through breakthroughs in
      - "areal density" (# bits/sq in)
      - revolution speeds
      - read-write head technologies



How much higher can disk capacity go?


So far predictions of "upper limits" have been made by
 engineers and they have always been wrong (way wrong).


We are approaching a limit determined by fundamental physics,
       not engineering ingenuity.


There comes a point beyond which random jiggle of electron spins
 due to temperature is likely to cause the direction of a bit's
 magnetization to spontaneously reverse within the expected
 lifetime of the disk.  This is called the SUPERPARAMAGNETIC LIMIT
 which may limit the progress that can be  =======================
 achieved through miniaturizing or the
 "scaling down" of existing technologies.


Where is the superparamagnetic limit?


Most agree it will be encountered at densities ~120 Gbits/sq_in.

At 6.5 sq_in per 3.5 inch surface, that gives ~ 800 Gb/surface.
                                           or ~ 100 GB/surface
times 50 surfaces, we can conclude
that a 3.5 inch hard drive may go to 5000 GB/disk = 5 TB/disk
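
The estimate can be checked with a few lines of arithmetic. The variable
names are made up for this example; the figures are the ones quoted above.

```python
density_gbit_per_sq_in = 120      # assumed superparamagnetic limit
area_sq_in = 6.5                  # usable area per 3.5 inch surface
surfaces = 50                     # surfaces per drive, as in the estimate

gbit_per_surface = density_gbit_per_sq_in * area_sq_in      # 780.0, i.e. ~800 Gb
gbyte_per_surface = gbit_per_surface / 8                    # 97.5, i.e. ~100 GB
tbyte_per_drive = gbyte_per_surface * surfaces / 1000       # 4.875, i.e. ~5 TB
print(gbit_per_surface, gbyte_per_surface, tbyte_per_drive)
```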


************************************************************
Note that COMMODITY drives today have reached 500 GB/drive:
so another x10 and we're there with commodity drives!!!
*************************************************************


Indexing and providing reference paths and access paths to
  data stores of this size is nearly impossible!

  What are we going to do??



Holographic storage?

(From: holographic storage 1, holographic storage 2)

"Storing data as holograms has intrigued scientists for decades.


In the early 1960s, former Almaden Research Center scientist
  Glenn Sincerbox helped IBM develop the world's first working
  holographic data storage system

  - a write-once-read-many (worm) technology using
    photographic film, for US Air Force.



Today, IBM participates in two industry/university/government
 consortia that aim to demonstrate holographic storage
 technologies by the turn of the century.



A traditional hologram is produced when a

beam of laser light, the reference beam, interferes with another
beam reflected from the object to be recorded.


The pattern of interference is captured by photographic film,
 a light-sensitive crystal or some other optical material.


Illuminating this pattern by the reference beam reproduces
 a three-dimensional image of the object.

 (this technology is called "interferometry")


Each viewing angle gives you a different view of the same object



Holographic data storage works in exactly the same way.

  But for every angle, instead of having another view of object,
  we have a completely different page of information."

  Up to 10,000 pages have been stored in a single cube of
     recording material 1 cm on a side.

  Each page contains one megabit of information,
     which means that the cube can store ~10 gigabits.

Since there are approximately 16.4 cubic cms in a cubic inch
  and there are approximately 43 cubic inches in a 3.5 inch cube
 (3.5 inch diskettes piled 3.5 inches high) that means a

  3.5 inch cube holograph would hold ~7 terabits of data.
                                     ===========
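
The cube estimate can be checked with exact unit conversions
(1 inch = 2.54 cm). Under those figures a 3.5 inch cube comes out at
roughly 7 terabits. The variable names are made up for this example.

```python
pages_per_cm3 = 10_000
bits_per_page = 1_000_000                   # one megabit per page
cm_per_inch = 2.54                          # exact conversion

bits_per_cm3 = pages_per_cm3 * bits_per_page          # 10 gigabits per cubic cm
side_cm = 3.5 * cm_per_inch                           # 8.89 cm
volume_cm3 = side_cm ** 3                             # about 702.6 cubic cm
terabits = bits_per_cm3 * volume_cm3 / 1e12
print(round(volume_cm3, 1), round(terabits, 1))       # 702.6 7.0
```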

Holographic recording has the advantage of being
  inherently non-linear (parallel).


It reads and stores an entire page at a time.
 The technology permits data rates of up to one gigabit
 (or 125 megabytes) per second,
  making it ideal for storing image data.


Another advantage of holographic storage, largely untapped,
 lies in its use as associative memory. Just as illuminating
 a hologram with a reference beam recovers the stored info,

 illuminating it with a pattern of info will reproduce the
 corresponding reference beam and angle, which immediately
 identifies the page on which the information is stored.


In other words, holographic memories can be searched
 extremely quickly for data patterns (associative memory).

This would allow database searches using physics rather
 than software.

Note that holographic storage may make the current access
 path technologies (indexes) obsolete.

Why would anyone use indexes, hash functions, SQL,
 Relational Alg, Relational Calc....
  when you can simply pattern match in a holo cube?!?!?!?!

It should be interesting.


Spintronics is another solution:



IBM and Stanford University are putting their heads together
 on a new microelectronics technology dubbed "spintronics"
 that promises breakthroughs in computer processors and other
 electronics components while extending Moore's Law for chip design.

In setting up a spintronics lab, researchers at the two
 organizations plan to control the spin, or magnetic orientation,
 of electrons within nano-scale electronic structures comprising
 super-thin layers to produce devices for low-power switching
 and non-volatile information storage.

Magnetic Properties

Electron spin is a quantum property that has "up" or "down" states.
 Aligning spins in a material creates magnetism, and magnetic fields
 affect the passage of electrons differently. Understanding and
 controlling this property is central to creating a whole new breed
 of electronic applications.

Among the possibilities are reconfigurable logic devices,
 room-temperature superconductors and quantum computers.
 The first commercial products, ranging from digital cameras to
 instant-on computers, will not be available for at least five years.


Current chip technology relies on the charges of electrons in
 circuitry, explained Mike Ross, a spokesperson for IBM's
 Almaden Research Center. Spintronics uses the quantum
 "spin" property of electrons to create magnetism, just as
 an electron's negative charge property creates electricity.

MRAM In the Works

By designing and making stacks of different materials -- some
 with layers only two to three atoms thick -- researchers can
 create devices that have novel properties. The spintronic GMR head,
 for example, has boosted the disk-drive industry, Ross told NewsFactor.

"This sensitive magnetic sensor, introduced by IBM, has resulted in
 a 40-fold increase in data storage in the past seven years," he said.

Magnetic RAM (MRAM) is the next spintronic device in the works.
 It has the potential to be a non-volatile memory that runs circles
 around non-volatile Flash memory typically used in cell phones,
 memory cards and other products. Current fast memory (SRAM,
 SDRAM, etc.) technology is volatile, meaning that devices
 must stay powered up to retain data.


"We want to learn more about using this technology in the
 sensor realm, and we see big benefits to logic and other
 types of electronics circuits," said Ross.

The IBM-Stanford Spintronic Science and Applications Center
 (SpinAps) will involve about a half-dozen Stanford professors
 and a similar number of IBM scientists. Research projects are
 funded by the two partners and agencies, including the Defense
 Advanced Research Projects Agency, the U.S. Department of Energy,
 and the National Science Foundation.

RAM Revolution

Spintronics "has quickly revolutionized magnetic recording technology
 and is going to revolutionize random access memory (RAM),"
 University of Utah physics researcher Jing Shi told NewsFactor.

Compared with electronic computers, computers with spintronic
 memory should be able to store more data, process it faster,
 and consume less power.

Spintronics also may yield "instant-on" computers.

Aligned spins stay aligned until a magnetic field changes
 them -- even if a computer is shut off. Consequently,
 spin-based instant-on computers do not require booting
 to move data from the hard drive to the memory.
 The data never left.