TRANSACTIONS


The terminology used in this section is that all users
    (online interactive users or batch programs) issue transactions to the DBMS.


A TRANSACTION is an atomic unit of database work specified to the DBMS.


A transaction is issued using constructs such as reserved words,

     BEGIN to initiate a transaction
           (most actual system supply the BEGIN if the user doesn't,
            e.g., whenever a new SQL statement is encountered
                  it is assumed to iniate a new transaction)

     END   to end a transaction (usually either COMMIT for successful END
                                            and ABORT  for unsuccessful END)
           (most actual system supply this element if the user doesn't, e.g.,
                  If   SQL statement execution is successful,
                  Then DBMS supplies COMMIT, else ABORT)

     READ   whenever any data is needed from the DB
            (e.g., in an SQL SELECT)

     WRITE  whenever any data needs to be written to the DB
            (e.g., in an SQL INSERT or UPDATE)




In this set of notes, all others aspects of language, coding, etc.
will be considered as un-intrepreted aspects.

For the purposes of transaction management (Concurrency Control and Recovery)
 we only need to consider this level of detail.




 ATOMIC unit of work  means that the DBMS guarantees that unit will be
 done to completion or not at all (in which case, the DB and User community
 will be left just as they were when the request came.  i.e., as if
 the unit of work never existed)





A Transaction is defined by all work specified between a
 BEGIN statement and the next encountered END (either COMMIT or ABORT)





When a transaction arrives at the DBMS, a  Transaction Manager (TM) 
 is assigned to it (code segment to act on its behalf).




The TM interfaces with other components, e.g., the   Scheduler (SCHED) 
for permission to access particular data items

    SCHED is like a policeman, giving permission
          to access the requested item(s).

    Its activity is called concurrency control.




Once permission is granted for TM to access data items Data Manager (DM) 
does the actual reads and writes.




There are several models for describing this interaction.  Here are two.


Model-1:

1. TM    makes requests to the SCHEDULER to read/write data item(s) or
         to commit/abort the transaction


2. SCHED decides if the request can be scheduled
         If yes it schedules request (passes it to DM (on TMs behalf)
         If no  it rejects it and informs TM.


3. DM    read/writes the data item or commits/aborts the transaction
            if possible, else returns reject to the SCHEDULER
                                        (which returns it to the TM).


4. DM    returns the value read (or returns an acknowledgement (ACK) of the
         write/commit request to the SCHEDULER


5. SCHED returns the same to the TM.




  .---------------------------.    There can be one TM multithreaded by
  |  Transaction Manager(s)   |    all transactions, or an individual TM
  |                           |    assigned to each individual transaction.
  `---------------------------'   
  ^                           :              
  : 2/3. reject               : 1. read,write,commit,abort
  : 5 value, write/commit_ack v
  .---------------------------.
  |     SCHEDULER             |
  `---------------------------'
  ^                           :
  : 3. reject                 : 2. read, write, commit, abort
  : 4 value, write/commit_ack v
  .---------------------------.
  |      DATA MANAGER         |
  `---------------------------'
  ^                           :
  :                           : 3. read, write
  :                           v
  .---------------------------.
  |   DATABASE ON DISK        |
  `---------------------------'




In this second model,

TM interfaces with SCHED for permissions and then after getting permissions,
TM interfaces with DM    for data access


SCHED interfaces with TM, not DM
      (so the scheduler is often named a LOCK MANAGER in locking system
       or just the ACCESS MANAGER)
          



  .-----------------------------.
  |  Transaction Manager(s)     |  
  `-----------------------------'
    ^value(read)  ^            |
    | ack/reject  |            |
   ||             |            | read, write, commit, abort
   ||             |ack/reject  v
   ||      .---------------------------.
   ||      |     SCHEDULER             |
   ||      `---------------------------'
   | 
   |read, write, commit, abort
   v 
  .---------------------------.
  |      DATA MANAGER         |
  `---------------------------'
    ^          
    |                       | read, write
    |value(read), ack       v
  .---------------------------.
  |   DATABASE ON DISK        |
  `---------------------------'





We assume the 2nd model most of the time in this section.

It just depends upon where the submodule that
does the actual read/write request to the DM resides (in SCHED or TM)






Background on Concurrency Control (the activity of the scheduler, SCHED):



We need "concurrency control" CC (AKA: mutual exclusion) whenever there are
 shared system resources that cannot be used concurrently

An illegal concurrent use of a shared resource is called a conflict


 e.g., a printer,
       a data item that one user wants to read and another wants to change.



IN DBMSs the shared resources we will call "data items".



"DATA ITEM GRANULARITY" is the level at which we treat concurrency control.


The possible Granularity levels are:

 field level    (logical level,  very fine granularity)
 record level   (logical level,  fine granularity)
 page level     (physical level, medium granularity)
 file level     (logical level,  coarse granularity)
 area level     (logical level,  quite coarse granularity)
 database level (logical level,  very coarse granularity)



We will assume, that a data item is a record
 (i.e., we assume logical, record-level granularity)



This means there are many more shared resources for DBMS to manage than there
are anywhere else, (e.g., printers for an O/S), and

CC is a harder problem to solve in a DBMS than anywhere else!


   A DBMS may have 1,000,000 records or more.

      An O/S manages ~ 50 printers.

      Ethernet Medium Access Protocol (unswitched) manages ONE bus







Transactions are often called QUERIES when they request only read access

       (i.e., QUERIES are READ-ONLY TRANSACTIONS)
 


Databases provide the user with the ability to define their
own "units of work" (i.e., the transaction) which will be executed
to completion or not at all (guarantee made by the DBMS).




In any resource management situation (OS, NOS, DBMS..)

 there are   "shared resources"   and
 there are   "users"              

              items and customers,  if you like.





SHARED RESOURCE MANAGEMENT


How can the system insure correct access to shared resources
 among concurrently executing transactions?


All answers seem to come from traffic managment!

  (traffic management intersections or construction zones or driveup windows)



WAITING POLICY: If a needed resource is unavailable,
                requester waits until it becomes available

                (e.g., intersection red light, Hardees drive up lane).



  This is how print jobs are managed by an OS


    Advantages: NO RESTARTING (no unnecessary loss of progress)
      e.g., At Hardees, they don't say "Go home! Come back later!



    Disadvantages:


    DEADLOCKS may happen unless they are managed.

        e.g., at a construction zone, if the two flag women
        don't coordinate, both traffic lines may start into
        the construction zones from opposite directions and
        DEADLOCK in the middle!)


    INCONSISTENT RESPONSE TIMES
 
        At the Hardees window, you may wait an hour or a minute.
        (Not so important at Hardees (well maybe it is? ;-),
         but it is very important at, e.g., at your local ER)




RESTART POLICY:
    If a needed resource is unavailable,
    Then requester terminates request and restarts request later.


        e.g., When someone goes before the parole board:
              They either get their request
              or they restart the process later  ( much later? ;-( 
       

        e.g., Ethernet (unswitched) CSMA/CD:
              If node A wants to send a message to node B:


       1. Carrier Sense (the "CS" part):

          check the wire for traffic.

          if busy (in use by another sender), wait
          (according to some "back-off algorithm")

          then check again, etc. until the bus is idle
          then SEND message



       2. Collision Detection (the "CD" part):

          listen to bus until you're certain that your message
          did not collide with another concurrently sent message

          (the required length of wait time is the traversal_time of wire,
           since there are terminators (absorbers) at each end)



    Advantages of restart policies: simple, no deadlock

    Disadvantages:  Lower throughput,
                    lost progress,
                    long delays?,
                    possible livelock






A Transaction = A computation or program taking the database from
 one consistent state to another (without necessarily preserving
 consistency at each step of the way)

    - The transaction is an atomic unit of database work,
      ie, DBMS executes transaction to completion or not at all, GUARANTEED.

      If only one transaction is allowed to execute at time
      and if the database starts in a consistent state (obeying all ICs)
      then it will always end up in a consistent state!

      The problem is, the above SERIAL EXECUTION is much too inefficient!



A DBMS (is supposed to) guarantee the
ACIDS PROPERTIES of transactions:


ATOMICITY:   A transaction is an all-or-nothing proposition.
                 Either a transaction is executed by the DBMS to
                 completion or all of its effects are erased completely.
                 (Transaction = atomic unit of database workload)

CONSISTENCY: Correct Transactions take the database from
                 one consistent state to another consistent state.

                 Consistency is defined in terms of consistency
                 constraints or "integrity constraints", e.g.,
                       entity      integrity,
                       referential integrity,
                       other       integrities.


ISOLATION:   Each user is given the illusion of being the sole user
                 of the system (by the concurrency control subsystem).


DURABILITY:  The effects of a transaction are never lost after it is
                 "committed" by the DBMS.  (ie, after a COMMIT request is
                 acknowledged by the DBMS) (requires RECOVERABILITY)




SERIAL EXECUTION  insures most of the ACID properties
        (Consistency and isolation for sure.
         It also helps in atomicity and durability).

   ie, queue all transactions as they come in (into a FIFO queue?)
   Let each transaction execute to completion before the next even starts.

   Serial execution may produce unacceptable execution delays
   (i.e., long response times) and low system utilization.




SERIALIZABLE EXECUTION  is much much better!


Concurrent execution of multiple transactions is called
  serializable if the effect of the execution of operations
  (reads and writes) within the transactions are sequenced
  in a way that is equivalent to some serial execution


 (i.e., is is as if it was done by a serial execution of transaction operations)


  Serializability facilitates ATOMICITY, CONSISTENCY and ISOLATION of
  concurrent, correct transactions, just as much as SERIAL does.





  A property called RECOVERABILITY facilitates
     DURABILITY (more on that later)


  An execution is RECOVERABLE if every transactions that commits,
  commits only after every other transaction it read-from is committed.




ISOLATION LEVELS 

SQL defines levels of isolation which are weaker
 than SERIALIZABILITY (they do not guarantee ACIDS properties
 entirely, but they are easier to achieve).



REPEATABLE READ  ensures that no value read or written by a transaction, T,
               is changed by any other transaction until T is complete;
               and that T can read only changes made by committed transactions.



READ COMMITTED  ensures that no value written by a transaction, T
               is changed by any other transaction until T is complete; and that
               T can read only changes made by committed transactions.


READ UNCOMMITTED  ensures nothing (T can read changes made to an
               item by an ongoing trans and the item can be further changed
               while T is in progress.




There will be further discussion on these later in these notes.
  For now, please note there are several suggested paper topics
  in the topics file concerning isolation levels.

  But also note that I think they are bunk!





   

CONCURRENT TRANSACTIONS = transactions whose executions
 overlaps in time (the individual operations (read/write
 of a particular data item) may be interleaved in time)


Again, the only operations we concern ourselves with are
 
  BEGIN, READ, WRITE, COMMIT, ABORT


READ and WRITE are the operations that apply to data items.

data item can be a field, record, file, or DB (logical granules)
or page (physical granule).  We assume record-level granularity.




A read(X) operation, reads current value of the data item, X,
  into a program variable (which we will also called X for simplicity).

Even though we will not concern our selves with these details in this section,
read(X) includes the following steps:
1. Find the address of the page containing X
2. Copy that page to a main memory buffer
     (unless it is already in memory)
3. Copy the value of the dataitem, X, from the buffer
      to the program variable, X





The write(X) operation, writes the value of the program variable,
    X, into the database item X.

It includes the following steps:
1. Find the address of the page containing X
2. Copy that page to a main memory buffer   (unless it is already in memory)
3. Copy the program variable, X, to buffer area for X
4. Write the buffer back to disk     (can be deferred and is governed by DM)




DBMSs should guarantee ACID properties
   (Atomicity, Consistency, Isolation, Durability).



This is typically done by guaranteeing the condition of
  SERIALIZABLILTY introduced above.

 - Database operations are scheduled so that changes
   to the database and output to users is equivalent
   to the changes and outputs of SOME serial execution.


If each transaction is correct by itself
 (takes a correct database state to another correct state),
 then a serial sequence of such trans will be correct also.




Thus, serializable executions or histories guarantee correctness,




Some important example of "incorrectness" problems,
 which can happen without proper concurrency control:


Allowing arbitrary interleaving of operations from concurrent transactions.


 (Note: We introduce Two Phase Locking concurrency control as solutions




LOST UPDATE: 

A powerpoint annimation of the example below.


Tammy deposits 500; Jimmy deposits 1000 in their joint account.

     @@@                         ///
   @ - - @                     | o o |
  @ ` ~ '                       ` - '
      |     ____       _____      |
    .( )---|$500|     |$1000|-----|-.
  .'  |    |____|     |_____|    ( ) `.
     / `.                         ^
    _____                        | |
     L L               JOINT     L L
                       ACCOUNT
T1 (dep $500)          BALANCE    T2 (dep $1000)
  workspace of T1      ON DISK    workspace of T2
0.                     $2000
1.          $2000 <- - $2000
2. add 500: $2500      $2000
3.          $2500      $2000 - -> $2000
4.          $2500      $2000      $3000 1000++
5.          $2500 - -> $2500      $3000
6.                     $3000 <- - $3000



SOLUTION?

LOCKING 
 = Each transaction must obtain a "lock" on an item
          (an access right obtained from a "lock manager"
           or "scheduler") before it can access that item.


NOTES:

 Concurrent read-read to the same data item can be done in either order
                      (no conflict exists).



     If T1: read1(x)  and   T2: read2(x) are concurrent,

     then in terms of changes to the database (none are made here)
       and messages to users (2 are made here), the same "effect" is
       produced regardless of order of execution of read operations.





 Concurrent read-write or write-write to the same data produce different
 results depending on the order

 (a conflict exists iff at least one operation is a write and the
  operations access the same item)


 Even if the operations themselves (the individual reads and writes)
 are made atomic by the Buffer Manager, there can still be "conflict"
 because different transaction results can occur.


    If T1: write1(x)  and   T2: read2(x) are concurrent,

    then in terms of changes to the database (one is made here)
       and messages to users (one is made here), different "effects"
       are produced by the 2 orders of execution of the operations.

       i.e., if write1(x) is done first,
             user-2 will get the value written to X by T1,

             while if write1(x) is done second,
             user-2 will get the initial value of X.
     


    If  T1: write1(x)  and   T2: write2(x)

    then in terms of changes to the database (two are made here),
      different "effects" are produced by the 2 orders of execution
      of the operations.

      i.e., if write1(x) is done last, the database will be
            left with the value written by T1,

            while if write2(x) is done last, the database will be
            left with the value written by T2,
     


Therefore, sometimes, to improve performance,
   we distinguish between locks for read-only access and
   locks for write-access by having two types of locks:



A lock for read-only access is read-lock or shared-lock (SLOCK).
   

A lock of write-access is write-lock or exclusive-lock (XLOCK).




SLOCKS are "compatible" or "non-conflicting":
                         if an SLOCK is held on a data item,
                         another trans can be granted a concurrent SLOCK


XLOCKS are "incompatible" or "conflicting":
                         if an XLOCK is held on a data item,
                         another trans cannot be granted a concurrent XLOCK
                                                     nor a concurrent SLOCK



Therefore the compatibility table is:


      \ Requester>           |
Holder \         >  SLOCK    |  XLOCK
vvvvvvvv\________>___________|___________
        |        |           |
        |        |           |
 SLOCK  |        |  yes      |   no
        |        |           |
________|________|___________|___________
        |        |           |
        |        |           |
 XLOCK  |        |  no       |   no
        |        |           |
        |        |           |



Or said another way, the conflict table is:


      \ Requester>           |
Holder \         >  SLOCK    |  XLOCK
vvvvvvvv\________>___________|___________
        |        |           |
        |        |           |
 SLOCK  |        |  no       |   yes
        |        |           |
________|________|___________|___________
        |        |           |
        |        |           |
 XLOCK  |        |  yes      |   yes
        |        |           |
        |        |           |


Both the Compatibility and Conflict tables give the very same info.

Sometimes you will see it given as a compatibility table and sometimes
as a conflict table.




Is locking with SLOCKS and XLOCKS enough Concurrency Control?

   NO!



Example:


INCONSISTENT ANALYSIS PROBLEM 


Tammy transfers $100 from savings to checking, concurrently the
 bank is running an audit on the 2 accounts (summing accounts)





A powerpoint annimation of the example below.





    @@@         ________         __/BANK)
  @ - - @      |ACCOUNTS|  _____  < $>  |
  @` - '       |--------| |AUDIT|  >___'
     |   ____  |CHECKING| |Ch__<==__.|
   .( )-|100 | |--------| |Sav__|----|
 .'  |  |____| |        | |Tot__|   (|
    / `.       |--------|            ^
  /_____`.     |SAVINGS |           | |
    L L        |________|           L L

              SAVINGS      CHECKING
T1 (trans)   BAL  LOCK    BAL   LOCK  T2 (audit)
 T1-wkspace                           T2-wkspace
0.           500          800 
1. 100 <- -  400  T-1     800
2. 100       400          800   T-2 - - >  800
3. 100       400  T-2  - - - - - - - - ->  400
4. 100       400          800           = 1200
5. 100 - - - - - - - - -> 900   T-1
6.           400          900           = 1200



SOLUTION? 


2-Phase Locking (commonly called "2PL") 



Locking as above, with the additional condition that each transaction
must acquire all its locks before releasing any of its locks.

 

Point in time at which a transaction releases it's first lock
 is called "lockpoint"


    In 2PL systems, the serial order to which
     the serializable order is equivalent, is lockpoint order.



Is Two-Phase Locking (2PL) enough concurrency control?

    NO!


             CHECKING   SAVINGS
T1 (TRANS)  BAL  LOCK  BAL  LOCK    T2 (AUD)
==========  ===  ====  ===  ====    ========

0.          500        800

1. 100 <- - 400  T-1   800

2. 100      400  T-1   800  T-2 - - -> 800

3. 100      400  T-1   800  T2 blocked

4. 100      400  T-1   800

5. 100      400  T-1   800  T1 blocked



In the above example, an impasse has been reached!  (called DEADLOCK)


Two-Phase Locking (2PL) is still not enough concurrency control.




DEADLOCK MANAGEMENT 


 We will look at many more deadlock management schemes later, but


The simplest deadlock management scheme, one that will always work, is
"TIMEOUT" which simply means that each transaction is given a set amount of
 time to complete.  If it fails to complete in that set amount of time
 (governed by a "timeout clock") it must abort and restart.



Is 2PL + deadlock mgmt enough CC?

  NO!


UNCOMMITTED DEPENDENCY PROBLEM (or cascading rollbacks) 

 Tammy deposits 500, then Jimmy deposits 1000,

 Tammy's transaction aborts after Jimmy's commits.




Note: In order it accomodate transaction "abort" or "rollback",

Write-Ahead Logging (WAL)  protocol is used:


    A changed database item cannot be written to the database disk until
    the "before value" (the value before the change took place) has been
    "logged" to a secure place (the system log - on a separate disk).






   @@@                             ///
 @ - - @                         | o o |
@ ` - '                           ` - '
    |     ____           _____      |
  .( )---|$500|         |$1000|-----|-.
 / /_`.                             ^
   L L                             L L

T1 (YOU Dep 500)   BALANCE  T2 (SIG Dep 1000)


0.                 $2000

1.       XLOCK

2.       $2000<- - $2000

3. 500++ $2500 - ->$2500 .  .  .   . WAL: before value, 2000, written to log

4.       UNLOCK

5.                          XLOCK

6.                 $2500- ->$2500

7.                 $3500<- -$3500 1000++

8.                          UNLOCK

9.                          COMMIT

10.     ROLLBACK (using logged "before-values")

11.                $2000




A Solution: Strict 2-Phase-Locking: 


            all locks must be held until COMMIT/ABORT





TRANS-1 (YOUR     BALANCE   T-2 SIG
 DEP OF $500)               DEP $1000
=============     =======   ========
0.                 $2000

1.       LOCK

2.       $2000<- - $2000

3. 500++ $2500 - ->$2500

4.       ABORT     $2000

5.                          LOCK

6.                 $2000->  $2000

7.                 $3000<-- $3000 $1000++

                            COMMIT




These, of course, are not the only solutions.  But they are solutions.




Many commercial system use Strict 2PL or something close to it.




To review:



LOCKING:

 TM must acquire a lock  (XLOCK to write / SLOCK to read if 2 MODES,
                          else, just a LOCK, if there is only 1 MODE)

 from the SCHEDULER (model-1) or LOCK MANAGER (model-2)

 before the operation request can be sent to the DATA MANAGER
 by the SCHEDULER (Model-1 or the LOCK MGR (Model-2)

 DATA MGR will return the value_read for a READ operation or
              an Acknowledgement (Ack) for WRITE/COMMIT/ABORT operations.

 TM request the RELEASE of all locks (to the SCHEDULER/LCOK_MGR RELEASE)
 on or before the transaction ENDs.




TWO PHASE LOCKING (2PL):  For a given transaction, all locks must be
                          acquired before any are releasing any.



STRICT TWO PHASE LOCKING (S2PL):  All locks are RELEASE request are made
                                  at transaction END (COMMIT/ABORT)



CONSERVATIVE TWO PHASE LOCKING (C2PL):  All locks are ACQUIRED before any
                                    operation request are sent to the DATA MGR.



i.e.,



lock acquisitions
             ^                .^.
             |              .     .
             |            .         .
             |          .             .
             |        .                 .
             |      .                     .
             |    .                         .
General 2PL  ._______________________________________ time
                 |             |             |
             BEGIN          LOCKPT           END





lock acquisitions
             ^   .. . . . . . . .
             |   .                .
             |   .                  .
             |   .                    .
             |   .                      .
             |   .                        .
             |   .                          .
       C2PL  ._______________________________________ time
                 |                           |
             BEGIN                           END
        and LOCKPT


  



lock acquisitions
             ^                . . . . . . . .. 
             |              .                . 
             |            .                  .
             |          .                    .
             |        .                      .
             |      .                        .
             |    .                          .
       S2PL  ._______________________________________ time
                 |             |             |
             BEGIN          LOCKPT           END

                                         



lock acquisitions
             ^   . . . . . . . . . . . . . . . 
             |   .                           . 
             |   .                           .
             |   .                           .
             |   .                           .
             |   .                           .
             |   .                           .
      CS2PL  ._______________________________________ time
                 |                           |
             BEGIN                           END
        and LOCKPT
                                         



All are 2PL of course.









DEADLOCK MANAGEMENT



Deadlocks can occur when a WAITING POLICY is used for CC.


How can deadlocks be

       PREVENTED (precluding the possibly of one ever happening),

       AVOIDED   (taking corrective action when one is imminent) or

       RESOLVED  (detecting existing deadlocks and resolving them (periodically)




First, a useful tool for studying deadlock mgmt is the WAIT-FOR-GRAPH or WFG which has
                 a node for each transaction that is involved in a wait an edge
                 from each waiting transaction (the holder)
                 to the transaction it is waiting for (requester).
T1 ------> T2
 ^         /      
  `-------'  
WFG in the 2PL example above




Formal Definition:  A DEADLOCK is a cycle in the Wait-For-Graph.



                                                               SAVINGS
It is sometimes useful to label edges with item involved  T1 -----------> T2
                                                           ^              /
This is called a binary cycle (2 transactions)              `------------'
                                                               CHECKING




Cycles can have lengths greater than 2, of course:
              T1 is waiting on T2 for a                   a     b     c
              T2 is waiting on T3 for b                T1--->T2--->T3--->T4
              T3 is waiting on T4 for c                 ^                /
              T4 is waiting on T1 for d                  `--------------' 
                 the cycle length is  4.                        d
      






Deadlock Prevention


Action is taken to prevent even the possibility of a deadlock.
                           E.g., flu shot is a preventative action
                           (you may not have gotten the flu anyway)




1.  Transaction Scheduling (all trans obtain access to all needed data items
                            before beginning execution.) prevents deadlocks.
                            (C2PL is a transaction scheduling mechanism)

Comes from construction zone management.
--------------.                   .----------
               `.               .'       ____
         ____    `-------------'  ____  <____|
        <____|    ____    ____   <____|  _ _
                 <____|  <____|
  ___  .--.   .--. _____________   .--.
 |___>(stop   |go >             \  |go >
-------`--'   `--'               `-`--' -----
          `.O.'                   O.'
            |                     |
            ^                     ^
Need GO permission from both flag persons before proceeding into the zone.





2.  SERIAL EXECUTION prevents deadlocks:




3. WOUND_WAIT &  WAIT-DIE deadlock prevention methods
                              are timestamp-based prevention methods to decide
                              who can wait whenever a conflict arizes.

   timestamp = unique ordinal transaction-id or "stamp"

                    (usually start-time i.e., DoB (Date of Birth)

                     so we can talk about one transaction being "older than"
                     another, meaning its timestamp is lower (born before the other) )



WOUND_WAIT:   When a requesting trans (the requester) finds that the
              requested data item is held by another trans (the holder):

         if   REQUESTER is OLDER (has lower timestamp),
         then REQUESTER WOUNDS HOLDER,
         else REQUESTER WAITS;

         where WOUND means holder is given a short time to finish with the item,
         otherwise it must restart (bleeds to death from the wound?)

NOTES:
WW a pre-emptive method.
The only waits allowed are YOUNGER REQUESTERS waiting for OLDER HOLDERS.
Assumes blocking waits only (e.g., if requester waits, it waits idly)
Never a cycle in WFG.  Why not? 

   M Luo, M.S. 87; M Radhakrishnan, M.S. 92; and T Wang, M.S. 96 advanced this
          protocol as their M.S. theses (and also published their results).






WAIT_DIE:     When a requesting trans (the requester) finds that the
              requested data item is held by another trans (the holder):

              if   REQUESTER is OLDER (has lower timestamp),
              then REQUESTER WAITS,
              else REQUESTER DIES;

NOTES:
WD is non-preemptive.

   W. Yao modified wound-wait and wait-die to allow forward and backward waiting
         by introducing an additional parameter assigned to each waiting trans,
         called "orientation".  (Info Sci Journal, V103:1-4, pp. 23-26, 1997.)



A powerpoint presentation of Wait-Die and Wound-Wait




deadlock AVOIDANCE   (avoiding deadlock when it's about to happen)


1.   Request Denial:   Deny any request that would result in deadlock
                       (must check WFG cycle every time a wait is requested)








DEADLOCK DETECTION and RESOLUTION TECHNIQUES


All Deadlock detection/resolution protocols revolve around Wait-For-Graph (WFG)

    put an edge in WFG representing each new wait,
    periodic analysis WFG for cycles and then
    select a victim trans to restart from each cycle.



     Victim selection criteria can vary.

          Some system use "youngest" others use "oldest",

          others use "been waiting the longest time"  and still

          others use "been waiting the shortest time".....







TIMEOUT


1. When a TRANSACTION BEGINs, a timeout clock is set.

     If   transaction is still active when the timeout clock runs down to zero,
     then transaction is aborted.




2. When a TRANSACTION has to WAIT, a timeout clock is set.

     If   transaction is still waiting when the timeout clock runs down to zero,
     then transaction is aborted.  (reduces timeout clock overhead)




     Potential improvements probably leap to mind for 2, e.g.,

     2.1 only set timeout clock if the item requested is already in LockTable
         (meaning that there is already a wait in progress for that item).





General Notes:



Deadlock management is still a very important area of research and there's still
much to be done, even though there are many methods described in the literature.

  One reason: Deadlocks which involves data distributed across a network
              are a much harder problem than centralized deadlocks.





Locking, as a concurrency control method,

  REQUIRES a CENTRALIZED lock-table object (logically at least)

  with a SINGLE THREADED lock manager (a monitor or critical section)





The Locking protocols presented above are called PESSIMISTIC





OPTIMISTIC Locking assumes there will be no conflict

   and then tests that assumption for validity at COMMIT time.

   If assumption proved false, entire (completed) transaction is aborted.







OTHER CONCURRENCY CONTROL METHODS
---------------------------------



BASIC TIMESTAMP ORDERING (BTO) is a RESTART POLICY (no waiting)
         Each transaction gets a unique timestamp (ts)    (usually arrival time)



Note that timestamps were introduced already in the context of deadlock
management schemes to accompany Locking Concurrency Control.  Now we are going
to use timestamps for concurrency control itself!  (not deadlock management
will be necessary here since the CC method is "restart" method, not "waiting"
method).





BTO SCHEDULING DECISION:

When Scheduler receives a READ  request, it rejects it iff a

                                YOUNGER trans has written that item.

When Scheduler receives a WRITE request, it rejects it iff a

                                YOUNGER trans has written or read that item.




NOTES on BTO:

Timestamp is usually "arrival time" but can be ANY linear ordering.

When the SCHEDULER rejects a request, the requesting trans restarts.

BTO must also schedule accepted operations to DM in ts order also.





In order to make the SCHEDULE decisions,

    scheduler must know ts of last transaction to write each item and
                        ts of last transaction to read  each item.




 Thus, the system must keep both of these "data-item timestamps"

       for EVERY data item, x, in the system,

       a data item read  timestamp, rts(x), and
       a data item write timestamp, wts(x).



       Usually kept right with the data item itself:

        .--------------------+--------+--------.
        |    data_item(x)    | rts(x) | wts(x) |
        `--------------------+--------+--------'



 That takes a lot of extra space

     e.g. 10 billion data items (records) in DB (not uncommon),

          data-item-timestamps may take up 160 GB,

          assuming 8 byte ts (note that 4 bytes won't do).




   BTO is a pure RESART policy (uses only restart conflict resolution.

   BTO CC is deadlock free (since waiting is not used).

   BTO, however, can experience livelocks (trans continuously
               restarting for the same reason over and over).

   BTO results in lower concurrency in central systems (studies have shown)





   BTO, works better in distributed systems.  Why?






     All the Scheduler has to have in order to make the scheduling decision
     when a transaction, t asks for a data item, x, is the


         transaction-timestamp,     ts(t),  and

         data-item-write-timestamp, wts(x)     (for a read request) and the

         data-item-read-timestamp,  tts(x)     (for a write request)






DISTRIBUTED BTO SCHEDULERS NEED NO INFORMATION FROM OTHER SITES


     ts(t) comes along with the transaction, t     (part of its identifier)

     wts(x) and rts(x) are stored with the data item, x, at that site.







                               SITE1    site where t(r[x],w[y]), arrives
                                 t       
                                / \                                           
                   r[x],ts(t)  /   \ w[x],ts(T)
                              /     \                                         
                             /       \                                        
   _________________        v         v       _________________      
  |x, rts(x), wts(x)|   SITE2         SITE3  |y, rts(y), wts(y)|
  `-----------------'                        `-----------------' 
                                                                            
SCHED2 at SITE2 schedules r[x]        SCHED3 at SITE3 schedules w[y] 
if   ts(t) > write_ts(x),             if   ts(t) > MAX{ wts(y), rts(y) }
else rejects r[x]                     else rejects w[y]






Again, one can see, that there is system overhead in BTO since

  EVERY DATA ITEM has to have a read_timestamp (rts) and a write_timestamp (wts)

  each could be 8 bytes, so additional 16 bytes of system data for each record




A large database can have billions and even trillions of DATA ITEMS (Records).





By contrast, a distributed 2PL scheduler must maintain Lock Table at some 1 site

   Then any request coming from any site for data at any other site would have
   to be sent across the network from the request-site to the LT-site and then
   the reply wold have to be sent from the LT site to the data site(s).

   However, LT is not nearly as large



One further downside to BTO:

 The BTO Scheduler must submit accepted conflicting operations to DM in ts-order



 BTO could issue them in a serial manner:

          Wait to issue next one until previous is ack'ed

          That's very inefficient! (serial execution is almost always inefficent

          Usually a complex "handshake" protocol is used to optimize this.





DO NOT CONFUSE BTO with Wound-Wait or Wait-Die Deadlock Management!


   Both are timestamp-based, but

   BTO is Concurrency Control Scheduler, while

   WW/WD are deadlock prevention methods (to go with a, e.g., 2PL scheduler)






OPTIMISTIC concurrency control assumes optimistically, no conflicts will occur.



Transactions access data without getting any apriori permissions.

But, a Transaction must be VALIDATED when it completes (just prior to COMMIT)

     To make sure its optimistism was correct.  If not, it must abort.





VALIDATION
Note: validation module must be single threaded (monitor or mutually excluding):



1.  A commiting transaction is "validated"
    if   it is in conflict with no active transaction
    else it is declared "invalid" and must restarted.

    So a transaction must list the data items it has accessed
    and the system must maintain an up-to-date list of
    "active transactions" with t(ts) and accessed data-item-ids?



Basically, an optimistic concurrency control can be thought of as being BTO,

  in which the "timestamping" is done at its commit time, not at start time,

  (transaction is validated iff it is not "too late" accessing any of its data)

  since active transactions are younger than the committing transaction.




        Validation must be an atomic, single threaded process 

        Therefore if any active trans has already read a item
        that the committing trans wants to write (all writes are
        delayed until validation) it's too late for committing
        trans to write it in ts order and thus, must be restarted.




Note that this is non-prememptive optimistic CC.

 There are other optimistic methods.  It is possible to devise a
 pre-emptive optimistic CC protocol in which the validating
 trans actually "kills" all active transactions which have
 read items it needs to write.






How would a pre-emptive OCC compare to the standard non-pre-emptive protocol?  
 Got an idea?  It could be a paper topic!




 CVS (Concurrent Versioning System) for software version control is
      an optimistic CC system!






Need to write a simple Concurrency Controller  (Scheduler) for your boss?


A very simple and effective SCHEDULER (no critical section coding required).



 Cooperating TMs do "self service" 2PL using the ethernet LAN  CSMA/CD protocol


             CSMA/CD = Carrier Sense Multiple Access with Collision Detect







CSMA/CD-Concurrency Control


When cooperation TM, t, seeks access to item, x, it will:


  1. Check availability of x (analogous with "carrier sensing")

       (Is another trans using it in a conflicting mode?).


  2. If x is available, set lock on x (TM does this itself! in a LockTable File)
     else try later (after some backoff random period).


  3. Check for collision (with other cooperating trans that might have been
     setting conflicting locks concurrently (analogous to "collision detecting")


  4. If collision, the TM removes all lock(s) it set
        and tries again later (after some backoff period).


  5. Release all locks after completion (COMMIT or ABORT) (Strict 2PL).


     (This is a S2PL protocol WITHOUT an active scheduler).





To make it even simpler, we can dispense with the carrier sense step:

CD-Only Method:

When a cooperation trans, t, seeks access to a data item, x, it must:
  2. Set lock.
  3. Check for collisions.
  4. If there is a collision, remove all locks and try later (after backoff).
  5. Release all locks after completion (COMMIT or ABORT).
     (This is also a S2PL protocol without an active scheduler).





In fact, one can write this code in SQL, something like:

Assume there is a file acting as the LockTable, called LT, such that




LT(TID, RID, MODE) where TID is column for the Trans' ID number,
                         RID is column for Record's ID number,
                    and MODE is either "shared" or "exclusive" (S or X).





Below shows some of the code for a CD-like CC Method

 (what additional code would be required for a CSMA/CD like method?).





If T7 (transaction with TID = 7) needs an XLOCK on the data
 item with RID = (53,28), the TM for T7 issues:

BEGIN
INSERT INTO  LT  VALUES ('7', '(53,28)', 'X');
V = SELECT COUNT(*) FROM LT WHERE RID='(53,28)';
IF V = 1, THEN COMMT
          ELSE ABORT (try again later),
DELETE FROM LT WHERE TID='7';









Another CC method that uses Cooperation Transaction Managers and no Scheduler:

(Note this technology - together with a later refinement called ROCC, is patent
pending concurrency control technology at NDSU.  In reverse time order, it
can be called ROCC and ROLL Concurrency Control)



ROLL:   Request Order Linked List Concurrency Control
(a further enhancement of this approach, ROCC and MVROCC
are patent pending technologies at this time by NDSU).


ROLL is a generalized model which includes aspects of locking
 and timestamp ordering as well as other methods.





ROLL is:

 1 non-blocking (no idle waiting)

 2 restart free and thus livelock free.

 3 deadlock free

 4 self-service for trans mgrs
   (no active singlethread scheduler other than an enqueue operation)

 5 very parallel (little critical sectioning)

 6 ROLL is easily distributed






Data items are requested by a transaction using a REQUEST VECTOR (RV) bit vector



Each data item is mapped to specific bit position using an assignment table
     (Domain Vector Table or DVT).





A 1-bit at a position indicates that that item is requested by the trans and a
  0-bit means it is not requested.

  - if read and write modes are to be distinguished, use 2 bits, a read-bit and
    a write-bit for each item.

  - could use a bit vector for the items to be read,    the ReadVector
    and another bit vector for the items to be written, the WriteVector 






ROLL can be thougth of as an object in which the data structure
 is a queue of Request Vectors, one for each transaction.

    .----------.
    |010010...0|    <- -Ti
    `----------'
             ^
             |
    .----------.
    |010010...0|    <- -Tj
    `----------'
             ^
             .
             .
    .----------.
    |010010...0|    <- -Tk
    `----------'
             ^
             |
        .------.
        | tail |
        `------'


with 3 basic methods:

 POST:       For establishing requests

 CHECK:      For checking availability

 RELEASE:    For releasing control





                    ROLL OPERATIONS
                    ---------------

POST (allows a transaction to specify its requests)

     POST is an atomic "enqueue" operation (the only atomicity required
                                            the only critical section)




CHECK (determines availability)

   CHECK returns the logical OR of all RVs ahead of requesters POSTED vector
           in the ROLL.

  - The vector resulting from this OR operation is (called the "Access Vector"
           or AV and represents a "lock table" for that transaction
           (specifies which items are available and which are not).

  - If we have a separate ReadROLL and WriteROLL, in order

    to determine what can be read,    a trans CHECKs the  WriteROLL only and 

    to determine what can be written, a trans CHECKS both WriteROLL and ReadROLL



  - reCHECKing can be done any time

     eg, when trans finishes data items found available on first CHECK,
         it would issue another CHECK expecting that some of the previously
         unavailable items have become available in the interim.)






RELEASE: (releases dataitems to the next requester)

   RELEASE set some or all of trans' 1-bits to 0-bits.





VALIDATE: (can be added for optimistic transactions)

    1. Optimistic transactions would read data without POSTing

    2. Optimistic transactions would buffer all writes until commit

    3. Upon reading x, optimistic transaction would record rts(x),
       by copying the current ROLL tail-pointer.

    4. before commit, Optimistic trans would have to VALIDATE


   VALIDATE:
     POST  its request vector
     CHECK the intevening ROLL interval from its vector to its reads
           If there are no intervening writes in conflict with its reads, the
              Transaction is valid and can be committed,
           else it must be restarted.


    A garbage collector can operate in the background to remove zeroed vectors.






PROBLEMS?

Two potential problems leap immediately to mind immediately.



1. Excessive Vector length for fine data item granularity.

   One-bits are most space efficient way to indicate a needed item.
   Zero-bits are unnecessary except to maintain positional matchup.


SOLUTIONS:

   Partitioning DB (eg, by files or even by ranges of records within files)

   Designate a separate ROLL for each partition



  .----------.    .----------.
T1|010010...0| -->|010010...0|-null
  `----------'    `----------'
           ^               ^
           |               |
  .----------.             |       .-
T2|010010...0| - - - - - - - -. - >|0
  `----------'             |       `-
           ^               |
           .               |
           .               |
  .----------.    .----------.
Tm|010010...0| -->|010010...0| - - . .
  `----------'    `----------'
           ^               ^
           |               |
      .------.        .------.
      |File-1|        |File-2|




   To ensure a serialization partial order (trans-1's vector is POSTed ahead of
   trans-2's in 1 ROLL, therefore it must be POST ahead in all ROLLs)

   i. to insure proper POSTing, entire series of POSTs for a given transaction
      can be made atomic.

   or

   ii. a POST-ROLL can govern POSTing.  A separate POST-ROLL is maintianed.

      -  Each bit position in the POST-ROLL represents a DB partition.

      -  Each new transaction POSTs a special vector (POST-RV) indicating the
         partitions to which it needs to POST (using 1 bits at those positions).

      -  Then CHECKing POST-ROLL determines which partitions can be POSTed to.
   
      -  Allows multithreading of POSTs.  Multi-level partitions are possible.








PROBLEM 2.  Multiple ORs in CHECK operation would be slow (if ROLL is long).

  A background process may Precompute and save all AVs in 1 pass down the ROLL.

  (the ROCC technology adds a major improvement to multiple CHECK operations)






PROBLEM 3.  Basic Roll does not accommodate dynamic transactions
            (the entire access set of data items is not known ahead of time
            (e.g., "give $1000 bonus to all red-headed professors").




SOLUTION: Allow rePOSTing

   But then the intervening interval between the rePOST and the original POST
   must be examined for conflicts.  If there are unresolvable conflicts,
   RESTART is necessary.

   (ROCC technology provides a sophisticated way of doing this)










DISTRIBUTED DATABASE MANAGEMENT SYSTEM, DDBMS consists of multiple DBMSs connected over a network



  - If all DBMSs are same (same vendor), it's called a homogeneous DDBMS.
    (and we will assume the vendor supplies the software to make the separate
     DBs operate as one).

    Else , it is call a heterogeneous or federated DDBMS




     - A heterogeneous DDBMS requires software on top of the of local DBMSs
       to intercept global transactions (those accessing data >= 2 sites)


       We assume that this Global harness is unable to change internal workings
       of local DBMSs (local DBMSs are binary licensed)

       But we do assume that the Local DBMSs are set to maintain local
       serializability






DDBMS TRANSACTION MANAGEMENT (some issues)



DISTRIBUTED DEADLOCKs are more difficult to manage than local deadlocks,
                     since it may be that no site can detect the
                     global deadlock by itself.



e.g., Assume an agent represents a trans at each site where it needs data


                 .- - -At A, a T1-agent is waiting on T2 for x
                 .
                 x
     SITE-A  T1 -- >T2
             ^      |     a T2-agent at A is waiting for a T2-agent at B 
             |      |     (e.g., to make an update at B based on a read at A)
             |      v     and a T1-agent at B is waiting for a T1-agent at A
     SITE-B  T1< -- T2
                 y
                 .
                 `- - -At B, a T2-agent is waiting on T1 for y



             Requires more communication, since no local WFG will have a cycle.

                  solutions are based on an expensive WFG forwarding process.







DISTRIBUTED COMMITMENT is more difficult to manage than local commitment.


     Recall, that before a DDBMS can ACKnowledge the COMMIT request of a trans,
        it must be sure that all agents can successfully COMMIT (for DURABILITY)

     If even 1 agent cannot guarantee DURABILITY, COMMIT must be rejected.




A solution,




TWO PHASE COMMIT (2PC)


Similar to Marriage Ceremony: unless all parties commit, there's no marriage
                              and there are 2 phases to the commitment, a

              VOTE     phase   and a

              DECISION phase.





In marriage, a COORDINATOR (minister,imam,rabbi,priest...) initiates a



VOTE_PHASE: all PARTICIPANTS vote YES or NO

     (implicit in this is DURABILITY, the assumption that a yes means you are
      ready to get married no matter what - can't change your mind!)
      Participants who vote YES must be in the READY state (either to get
      married or not depending upon the decision made by the coordinator))

  Coord: Tracy, do you take Pat to be your lawfully wedded spouse?
  Tracy: Yes!
  Coord: Pat, do you take Tracy to be your lawfully wedded spouse?
  Pat:   yes!
  Coord: Anyone here present know of any reason why these 2 should not be joined matrimony?
  All:   silence (means yes).




DECISION PHASE: the Coordinator counts votes and makes the DECISION.

  If all votes are YES, COORD decides COMMIT and broadcasts that to participants
  I pronounce you married!

  If even 1 vote is NO, coordinator decides ABORT and broadcasts that:
  Sorry folks.  Hope you saved your gift receipts.







In DDBMSs we use the same 2PC protocol to commit distributed transactions.


   Upon receiving a COMMIT request from a transaction:


   COORD asks agents to VOTE.

         Agent votes YES only if it can force READY log record
         (with enough detail to COMMIT or ABORT) else votes NO and aborts.




   COORD gets all YESs, then DECIDES COMMIT (forces COMMIT log record)
                        else DECIDES  ABORT (forces  ABORT log record)

         In either case, informs all agents to do the same.







Whenever the system fails,

If a transaction has a BEGIN in the log, but no END record,
    if a decision log record is found, RESTART PROCEDURE can finish 2PC process
    else it must ABORT the transaction and send that decision to all.







2PC STATE TRANSITION DIAGRAM:

C  = Coordinator
Si = Subordinate-i (agent-i)



0. SEND     5 ALL (NOT ALL) YES, FORCE COMMIT (ABORT) LOGREC, Move to COMMIT (ABORT) STATE
"PREPARE"   6 SEND COMMIT to all (ABORT to all NON-NO voters)
       
   C        C------------------->C             C gets ACKs
   v        ^                    v       ^  11 WRITE END RECORD LOCALLY
   |        |                    |       |  12 FORGETS TRANS
   |        |                    |       |
PREPARE   YES or NO          COMMIT or   |
            |                 ABORT     ACK
 /   \      |                  /   \     | 
S1    S2    |                S1    S2    |
 `----`---->'                 `----`---->'

  Si READY TO COMMIT (ABORT),         Si->COMMIT (ABORT) STATE
1 FORCE "READY"(ABORT) to LOG      7  FORCE "COMMIT" (ABORT) LOGREC
2 ENTER READY STATE                8  SEND ACK (NON-NO's SEND ACK)
3 SEND YES (NO)                    9  COMMIT(ABORT) LOCALLY
4 (ABORT LOCALLY)                  10 FORGET TRANS
5 (FORGET TRANS)                 









DISTRIBUTED CONCURRENCY CONTROL IS HARD




Indirect conflict is a problem unique to Heterogenous DDBMSs

Let T1 be a local trans (not even known to the heterogeneous DDBMS) executing
 concurrently with 2 global transactions, T2 and T4:



  The only ordering required is that
      T2 must r2(a) first, then w2(c) 
      T4 must r4(c) first, then w4(b)   (both are read-modify-write sequences)


                        . - > -.
                        :      :
        SITE2:     T2: r2(a), w2(c)
                       .        . 
                      .          .
                     .            .
             w1(a)  a              .
SITE1    T1:^                       c  SITE3
            `r1(b)  b              . 
                     .            .
                      .          .
                       .        .
        SITE4:     T4: w4(b), r4(c)
                        :      :
                        ` - < -'


DSCHED sees no T2, T4 conflict     (only knows r2(a) is executed before w2(c))

DBMS1 (SITE1) see no T2, T4 conflict, so it could interleave:
                                   w4(b)  1st 
                                   r1(b)  2nd 
                                   w1(a)  3rd
                                   r2(a)  4th   so  T4 < T1 < T2

DBMS3 (SITE3) could interleave:    w2(c)  1st 
                                   r4(c)  2nd   so            T2 < T4


Execution is NOT SERIALIZABLE

 but no DDBMS could detect it
 assuming each local system is an autonomous binary licensed COTS product).





TICKET SOLUTION force all pairs of distributed transactions
                    into conflict where ever they both access data by
                    forcing each to write its id to a bogus data item called
                    site ticket



  Since both are writing to a common data item,
  they are in conflict at that site and therefore
  HDDBMS can enforce a particular order.





  Let the site ticket at SITE-i be ti


         T2: w2(t1),w2(t3),r2(a),w2(c)    SITE2
                  \      \    ^      \
                   \      \  /        \
                    \      \/          \
                     \_____/\________   \
                          /\         \   \
            .--------- > a  v         v   v
T1: r1(b),w1(a)      SITE1  t1        t3  c   SITE3
       ^---------------  b   ^        ^   ^
                         ^  /        /   /
                          \/  ______/   /
                          /\ /         /
                         /  X         /
                    ____/  / \       /
                   /      /   \     /
          T4: w4(t1),w4(t3),w4(b),r4(c)   SITE4



HDDBMS now sees a conflict between T2 and T4 so it decides an ordering
       e.g.,   T2 < T4


      HDDBMS forces w2(t1) to be ACKed before w4(t1) is sent and the
      HDDBMS forces w2(t3) to be ACKed before w4(t3) is sent and then
       DBMS1 will have to order T2 before T4 at SITE1 and
       DBMS3 will have to order T2 before T4 at SITE3.

















































**************************************
APPENDIX: More on Concurrency Control.
-------------------------------------


HYDRO TRANSACTION MODEL:
 This global trans processing model is assumed.


     T1                   T2
     |                    | ...
     V                    V
s:  TM1                  TM2
   / | \                / | \
  /  |  \              /  |  \
 /   |   \            /   |   \
 1    2    n          1    2    n
T    T    T          T    T    T
 1    1    1          2    2    2
|    |    |          |    |    |
v    v    v          v    v    v
.-------------------------------
|          HYDRO MODULE         |
`-------------------------------'
|    |    |          |    |    |
v    v    v          v    v    v
 1    1    1          2    2    2
T    T    T          T    T    T
 1    2    n          1    2    n
|    |    |          |    |    |
v    v    v          v    v    v
LOCAL HYDRO          LOCAL HYDRO
  DRIVER1              DRIVER2
     |                    |
     V                    V
   DBMS1                DBMS2


Ti's are Global Trans Mgrs


 i
T  subtrans of Tj running at site i.
 j


         HYDRO protocol

At each site ther is a Local Trans ROLL (LROLL) and
                     a Global Trans ROLL (GROLL)

      i
Each T  is submitted to its site in
      j   a serializable order by HYDRO.

The LOCAL-HYDRO-DRIVER POSTs as below,
             then send an acknowledgement to HYDRO

(guaranteeing POST order and serialization
 p.o. compatible).


POSTing:

Local Trans POST to the LROLL only, but get a
 "position" in both ROLLs (read both tail-pointer values).

Global Trans POST to the GROLL only, but get a
 "position" in both ROLLs.


CHECKing:
--------
Local Trans CHECK (LCHECK) GROLL only (conflicts between
 pairs of local trans are the responsibility of the
 local DBMS) If there are no conflicts submit, else wait.

Global Trans CHECK (GCHECK) both LROLL and GROLL.
If there are no conflicts, submit else wait.

RELEASE: Same.
-------











HISTORIES:
----------

Example program:

    PROCEDURE P begin
      Start;
      temp:=Read(x);
      temp:=temp+1
      Write(x,temp);
      Commit
    END

    for the purposes of concurrency control theory this program
    can be represented as the "History":

H0       r1[x] -> w1[x] -> c1

where r1[x] stands for the read operation by transaction-1 on dataitem, x.
and   w1[x] stands for the write operation by transaction-1 on dataitem, x.
and   c1    stands for the commit operation of transaction-1.

This graph is called the Serialization Graph (SG).  The nodes of SG are operations
and each edge represents the execution order of two conflicting operations.

(Note we don't even care what value gets written.  The value written is an
"uninterpreted" feature or characteristic, as far as concurrency control
 is concerned)


More Examples:
--------------
Consider three transactions,

T1  r1[x]--> w1[x]--> c1   
         
T2  r3[x]--> w3[y]--> w3[x]--> c3 
   
T4  r4[y]--> w4[x]--> w4[y]--> w4[z]--> c4


An example of a complete history (execution) for { T1,T3,T4 } as a SG is:


           r3[x]-> w3[y]-> w3[x]-> c3 
            ^        ^    
            |        |    
H1 r4[y]-> w4[x]-> w4[y]-> w4[z]-> c4 
            ^    
            |  
   r1[x]-> w1[x]-> c1  


We use simple left-to-right ordering to indicate the Von Neumann 
order of execution of operations when we wish to do so
(assuming a Von Neumann machine - one which executes one operation
at a time):



Four possible Von Neumann histories over the transactions,

	T1 = w1[x] w1[y] w1[z] c1
	T2 = r2[u] w2[x] r2[y] w2[y c2    are:

H2   w1x w1y r2u w2x r2y w2y c2 w1z c1 
H3   w1x w1y r2u w2x r2y w2y w1z c1 c2 
H4   w1x w1y r2u w2x w1z c1 r2y w2y c2 
H5   w1x w1y r2u w1z c1 w2x r2y w2y c2 




APPENDIX 2:
----------

Another approach to ROLL:   ROLL is a queue object of RVs
Methods available to transactions:
1. QUEUE(RV)     {returns ROLL address of RV}
2. READ(a,b)    {returns RVs from address a to b}
   where a must be an address returned to the TM from a POST
     and b must be an address returned to the TM from a POST or "HEAD"
   ("address" could be simply "position number" and HEAD could be position 0)
3. SET(a,list)
     sets the bits in all positions in the list of the RV at address a to zero
     where a must be an address returned to the TM from a POST
     list can be any comma list of numbers and/or intervals

Then TMs are responsible to compute AV from the link list returned by READ(a,HEAD)
and  TMs are responsible to determine validity of a rePOST from the return of READ(a,b)

Inefficient due to massive return parameters from READ.
Background daemon does the garbage collection periodically.
Could have an AV-Daemon accepting FastREAD requests from transactions, maintaining AVs
and returning the appropriate AV immediately.  AV-Daemon has read access to all RVs.

May be a good model for DataCycleROLL.
How would that work in a Beowulf Cluster such as MiDAS?
  - one transaction manager per node and round robin assignment of arriving
    transactions to node TMs
  - stripe data per file in non-random fashion so that a ROLL can be
    partitioned hierarchically and all CHECKing done in parallel and targeted
    at only the stipes containing items requested


Another view of NON-PREDECLARATIVE TRANSACTIONS: (DYNAMIC TRANSACTIONS)


 THEOREM:  Serializability and restart-free requires predeclaration.
If no predeclaration requirement, then the following can happen:
1. Transaction, T, declares need for exclusive use of data item, D.
2. Transaction, S, declares need for exclusive use of data item, E.
3. Transaction, T, declares need for exclusive use of data item, E.
4. Transaction, S, declares need for exclusive use of data item, D.

We assume request 3 was made based on the results of request 1, request 4
was made based on the results of 2.

DEADLOCK results, requiring RESTART.  (Note: If 1 or 2 were delayed in
anticipation of 3 & 4, 3 & 4 would have had to been predeclared.)

Thus, in order to accommodate dynamic trans, restarts must be allowed.
We accommodate dynamic trans by allowing a trans to POST more than once.



  ASSUME:
 All write requests delayed until final POST.  Each additional POST by a trans
is a superset of its predecessor.  Maintain correctness: examine accesses
between POST1 & 2.  For T1, POST1 includes read of "a" T1(r,a), POST2
access to "b"

5 CASES: compare each case with 2PL (waiting policy) & BTO (restart policy)

CASE 1: No intervening writes to "a", no intervening conflict on "b".
CASE 2: Intervening T2(w,a), T1(r,b), but no intervening writes to "b".
CASE 3: Intervening T2(w,a), T1(r,b), intervening write, T2(w,b) on "b".
CASE 4: Intervening T2(w,a), T1(w,b), 1st intervening b-access is T2(w,b).
CASE 5: Intervening T2(w,a), T1(w,b), 1st intervening b-access is T2(r,b).

We will compare each case with 2PL (waiting policy) & BTO (restart policy)



                 DYNAMIC TRANSACTIONS:

CASE 1: No intervening writes to "a", no intervening conflict on "b".

        "a"  "b"
        .----------.
    T1->|100000...0|POST1
        `----------'
         rw  rw  ^
                 |
        .----------.
    T2->|000010...0|
        `----------'
         rw  rw  ^
                 |
                 .
        .----------.
    T1->|100010...0|POST2
        `----------'


                      "a"  "b"
                      .----------.
                  T1->|000000...0POST1
                      `----------'
Zero POST1.            rw  rw  ^
(rereading would               |
give same value)      .----------.
Serialization     T2->|000010...0|
order: T2, T1)        `----------'
                      rw  rw  ^
                              |
                              .
                     .----------.
                 T1->|100010...0|POST2
                     `----------'
 (2PL) requires no restart
(no deadlock possible).

 (BTO) no restart since neither
T2(r,a) nor T1(w,b) in conflict.


CASE 2: Intervening T2(w,a), T1(r,b),
but no intervening writes to "b".

        "a"  "b"
        .----------.
    T1->|100000...0|POST1
        `----------'
         rw  rw  ^
                 |
        .----------
    T2->|010010...0|
        `----------'
         rw  rw


        .----------.
    T1->|100010...0|POST2
        `----------'


                  "a"  "b"
                  .----------.
              T1->|100010...0|POST1
                  `----------'
                   rw  rw  ^
                           |
Move T1(r,b)      .----------.
POST2->POST1. T2->|010010...0|
Zero POST2.       `----------'
(Serialization     rw  rw  ^
order: T1, T2)             |
                           .
                  .----------.
              T1->|000000...0|POST2
                  `----------'

(2PL) requires no restart.  However, T1
cannot release T1(r,a) until after the
POST2 point, even if it is done reading
a (idle wait).

(BTO) requires T1 to restart if T2(w,a)
execution precedes T1(r,a)


CASE 3: Intervening T2(w,a), T1(r,b),
intervening T2(w,b) on "b".

        "a"  "b"
        .----------.
    T1->|100000...0|POST1
        `----------'
         rw  rw  ^
                 |
        .----------.
    T2->|010001...0|
        `----------'
         rw  rw  ^
                 |
                 .
        .----------.
    T1->|100010...0|POST2
        `----------'


                            "a"  "b"
                            .----------.
                        T1->|100010...0|
                            `----------'
                             rw  rw  ^
                                     |
Move T1(r,b) to POST1.      .----------.
(Read "b" before value) T2->|010001...0|
                            `----------'
Zero POST2.                  rw  rw  ^
Serialization order: T1,T2           |
                                     .
                            .----------.
                        T1->|000000...0|
                            `----------'

Need to read the before-value of the T2(w,b) (this possibility can be easily
provided with ROLL by delaying REMOVal of writes just one additional
generation.).

2PL: deadlock (T2(w,a) blocked by T1(r,a) & T1(r,b) blocked by T2(w,b))
Restarting required to break this deadlock. (If 2PL accompanied by
multiversioning, no restart is required, however, adding multiversion
support to 2PL much more expensive).

BTO restarts T1(r,b) since it is in conflict with T2(w,b).


CASE 4: Intervening T2(w,a), T1(w,b), 1st intervening b-access is T2(w,b).

        .----------.
    T1->|100000...0|POST1
        `----------'
         rw  rw  ^
                 |
        .----------.
    T2->|010001...0|
        `----------'
         rw  rw  ^
                 |
                 .
        .----------.
    T1->|100001...0|POST2
        `----------'


                            .----------.
                        T1->|100001...0|
                            `----------'
                             rw  rw  ^
                                     |
Move T1(w,b) to POST1.      .----------.
(T1(w,b) not written    T2->|010001...0|
since obsolete)             `----------'
                             rw  rw  ^
Zero POST2.                          |
alization order: T1,T2               .
                            .----------.
                        T1->|000000...0|
                            `----------'

T1(w,b) can be discarded (written to T1
workspace as if completed) since
T2(w,b) overwrites it and no trans was
supposed to have read T1(w,b).  T1(w,b)
should be written to T1 workspace for recov.

2PL deadlocks, since T2(w,a) blocked by
T1(r,a) & T1(w,b) by T3(w,b). Restart
required. (no mechanism in 2PL to note
T1(w,b) unneeded).

BTO requires restart, T1(w,b) would
conflict with T2(w,b).


CASE 5: Intervening T2(w,a), T1(w,b),
1st intervening b-access is T2(r,b).

        .----------.
    T1->|100000...0|POST1
        `----------'
         rw  rw  ^
                 |
        .----------.
    T2->|010010...0|
        `----------'
         rw  rw  ^
                 |
                 .
        .----------.
    T1->|100001...0|POST2
        `----------'


                            .----------.
                        T1->|000000...0|
                            `----------'
                             rw  rw  ^
                                     |
Must restart (at least      .----------.
move POST1 ahead to     T2->|010001...0|
POST2 and reread "a")       `----------'
                             rw  rw  ^
Zero POST1.                          |
alization order: T2,T1               .
                            .----------.
                        T1->|100001...0|
                            `----------'

T1(w,b) cannot be moved to POST1 since
T3(r,b) may already have read previous
value of "b".  Restart needed (at least
patial, redo T1(r,a))


2PL & BTO would also result in restart
          DYNAMIC TRANSACTIONS

                 ROLL    2PL        BTO
CASE_PROBLEM___________________________

     idle wait   NO      NO          NO
1
     restart     NO      NO          NO
_______________________________________

     idle wait   NO      YES         NO
2
     restart     NO      NO         YES
_______________________________________

     idle wait   NO      YES         NO
3
     restart     NO      YES(deadl) YES
_______________________________________

     idle wait   NO      YES         NO
4
     restart     NO      YES(deadl) YES
_______________________________________

     idle wait   NO      YES         NO
5
     restart     YES     YES(deadl) YES
_______________________________________





YET ANOTHER WAY TO LOOK AT DYNAMIC TRANSACTIONS

It is useful to think about what type of dynamic-ness there is,
and how frequent each type might be.


Define the baseset, readset and writeset of a transaction as the set
of all data items accessed in any manner, read, written, respectively.

Then transactions can be categorized as follows, with respect to dynamism:

SSS:  Static baseset, static readset,
      static writeset
      These are standard static trans

SSD: Static base, stat read,
   dyn write sets
     These are predominant
     dynamic trans??
     eg, Double sal readheaded emps.
     - baseset=all emp.records
     - readset=all emp.records
     - writeset=all emp.records
       where hair=red

Note that 2PL would have to writelock all emp.records and hold them until
lockpoint (at the very least - maybe until commit, if strictness is used).
ROLL would need to set writebit for all emp.records, but could release
non-redhead records immediately (even with strictness).


SDS: eg,
IF status=WAGE, AMT=HRS*5, ELSE AMT=300

- baseset=all emp.records + status
- readset=all emp.recs or empty + status
- writeset=all emp.records

2PL would have to writelock all emps, to lockpt & readlock status.
ROLL would have to writelock all emps, only until write completed and
readlock status.




SDD: If Status=bonus, write AMT=AMT+5, else PRINT all emps

ROLL would readlock status & writelock base (early release, if unneeded)

2PL would readlock status & read/write lock emps..


DSD: If Status=1_time, write AMT=500.

ROLL would readlock status and writelock base (early release, if unneeded)

2PL would readlock status and write/no lock emps.


DDS: Write to TOTAL, sum of first emp.AMTs such that sum <= 1000.

ROLL could writelock TOTAL and readlock
all emp (release rest as soon as total is reached).

2PL could readlock emps 1-at-a-time until total is reach, writelock total.

DDD: ?????




PARALLELISM:

ROLL operations can be done in parallel on the same ROLL data structure except
that all POSTs must be in partial order.







APPENDIX-2
----------
D-ROLL: An Extension of ROLL for Dynamic transactions

1. INTRODUCTION
     Database management systems (DBMS) are usually modeled into three components []: 
transaction manager, scheduler and data manager, from concurrency control point of view.  
Concurrency control methods are used to maximize the number of concurrent operation 
executions in the system while guaranteeing correctness of transactions.  When operations 
from different transactions are submitted to the scheduler, the scheduler determines the 
execution sequence of the operations using its embedded concurrency control methods.   So 
far, various methods are proposed, and may be classified into three categories: locking, 
timestamp ordering (TO) and serialization graph testing (SGT).  

     We proposed ROLL (Request Order Linked List), which eliminates the need of 
scheduler by introducing ROLL object and three simple operations (POST, CHECK and 
RELEASE) performed by transaction manager.  The ROLL method has the advantages of 
deadlock and livelock free.  It also minimizes waiting and restarting, compared to 2PL and 
TO.  Furthermore, experiments showed that ROLL has a significant performance 
improvement in terms of transaction response time.

     This paper discusses an extension of ROLL, D-ROLL.  The proposed extension 
removes the requirement of predeclared transactions, which is the major weakness of 
ROLL.  By adding RPOST and RCHECK operations to the ROLL method, D-ROLL may 
handle dynamic transactions while inheriting ROLL’s advantages.  Predeclared 
transactions predeclare their readsets and writesets before executions of all their read/write 
operations.  Dynamic transactions allow submitting their readsets and writesets after some 
of them be excuted.


     The paper is organized as follows: 

2. ROLL and its extension, D-ROLL
ROLL method used three basic data structures: request vector (RV), ROLL, and 
access vector (AV).  ROLL is a linked list of bit vectors.  Each bit vector is called a request 
vector, which represents the requested data item by a transaction.  The number of bits in the 
bit vector equals the number of data items in the database multiplied by the number of 
access mode.  The value of a bit is either one or zero.  One indicates the corresponding data 
item is requested by the transaction, and zero means the data item is not used by the 
transaction.  Each request vector has a vector pointer pointing to its precedent request 
vector.  The RV pointer of the first RV in a ROLL is called head pointer; its value is 
NULL.  A tail pointer points to the last request vector of the ROLL.  Another data 
structure, AV, is used to represent the data items accessible for a transaction at a certain 
time.  Like RV, it is also a bit vector and needs the same number of bits as a RV, but it 
does not contain a RV pointer.  One bit in the access vector indicates the corresponding 
data item is not accessible, and zero indicates accessible.  The AV is calculated by a 
CHECK operation.  For every CHECk operation, the AV corresponding to each RV will be 
calculated once.  Suppose we have a database with six data items.  Figure 1 shows the 
ROLL and its RVs.  In this example, each data item has two access modes: read and write. 
Two bits are assigned to each data item – the first bit corresponds to the read access mode, 
and the second bit corresponds to a write access mode.  There are six data items in each 
RV.  Four transactions are posted to the ROLL (RV1 corresponds to transaction T1, RV2 
corresponds to transaction T2, etc.).  RV1 indicates transaction T1 requests access three 
data items, the first, the fifth, and the sixth.


    .----------.
    |010010...0|    <- -Ti
    `----------'
             ^
             |
    .----------.
    |010010...0|    <- -Tj
    `----------'
             ^
             .
             .
    .----------.
    |010010...0|    <- -Tk
    `----------'
             ^
             |
        .------.
        | tail |
        `------'

Figure 1  An example of ROLL
     
     Three basic operations are devised for ROLL method: POST, CHECK, and RELEASE. 
POST is a critical operation establishing the serialization partial order of transactions.  It 
links the most recent received RV as the last vector in the list.  After posting a RV, the 
vector pointer of this RV points to its previous RV.  To calculate AV of a transaction, 
CHECK simply accumulates ORing (bitwise logic OR operation) results of the bits in the 
ROLL ahead of its RV.  Because CHECK only needs bitwise logic operation, the execution 
is fast.  Figure 2 demonstrates how to calculate AV4 (the AV of transaction T4) for the 
example shown in Figure 1.


    .----------.
    |010010...0|    <- -T0    \
    `----------'            
             ^                  \
             |
    .----------.                  \
    |010010...0|    <- -T1
    `----------'                    >  OR  yields  AV4
             ^
             .                    /
             .
             .                  /
    .----------.                
    |010010...0|    <- -T4    /
    `----------'           
             ^
             |
    .----------.
    |010010...0|    <- -T5
    `----------'
             ^
             |
        .------.
        | tail |
        `------'



Figure 2  CHECK for transaction T4: Calculate AV4

     Once the operation on a data item is finished, the transaction does not need this data 
item any more, and the corresponding bit in the RV should be changed to zero so that it can 
be accessed by other transactions.  Such a change is made by RELEASE operation.

     We can clearly see the increased concurrency advantage of ROLL method from the 
above example.   All the data items requested by transaction T4 are available and thus it 
can be committed at once.  Using 2PL, T4 has to wait because T4 conflicts with T1 and T2.  
T4 has to restart for the same reason if using TO method.

     The Problem in ROLL method is its requirement of transaction predeclaration.  For 
example, suppose T1 is not predeclared.  It submits another access request rpRV= after T4 
arrives.  Then we may run into trouble by using ROLL, i.e., the execution enforced by 
ROLL is no longer serializable.  Figure 3 shows an example of such problem.

 



Figure 3  ROLL is NOT serializable for dynamic transaction T1


Problems are evident in the ROLL concurrency control method. (1). Space problem: For huge 
centralized databases, vectors of billions of bits are required.  If the number of transactions 
is large too, then it needs a lot of storage space.  This is not a problem in heterogeneous 
distributed database systems because, in that environment, a number of smaller ROLLs are 
kept at different sites.  Less space is used at each site, and the processing of the ROLL at 
each site is faster, too.  Even for a huge, centralized database, different methods can be 
used to solve this problem.  One solution is to use partition.  The rational of this method is 
to partition the whole database (by files) into a number of smaller sections and build one 
ROLL for each section.  The individual ROLL size will be reduced dramatically.  In this 
case, a transaction only POSTs to the ROLLs where the needed data items are present.  A 
constraint needed here is that the ROLLs should be kept in a serializable order; i. e., the 
relative position of a vector in all of the ROLLs should be the same.  Using surrogate 
(hidden system-supplied permanent record identification number) or using function to 
assign vector positions can maintain a serializable order of the RVs in the partitioned 
ROLLs. (2). Dynamic transaction support problem: When the data items used are not 
known in advance, overdeclaration strategy is used.  It is to reserve all the possibly used 
data items for write.  Once it is determined that a data item is not used, the corresponding 
bit is released immediately.  In this case, most of the requested data items are not used, 
while other transactions which need these data items cannot access the data.  
Overdeclaration approach is a “demotion” approach. (3). The dynamic RV size problem: 
The size of the RV depends on the size of the database.  For a database of fixed size, there 
should be no problem using a static RV, but when database size tends to change, dynamic 
RV size should be supported.  If there is only one ROLL for the database, one solution is to 
set the vector length more than sufficient for the time being.  Based on the change in the 
database size, the length can be reset periodically.        

For ROLL and the three basic conservative concurrency control methods, 
predeclaration is required.  For some transactions, the data items used may not be known 
until some operations are processed first.  This type of transaction is called dynamic 
transaction.  To apply conservative concurrency control method to dynamic transactions,  
overdeclaration is often used.  Although the algorithm is easy to implement, it is not a good 
strategy because it lowers the concurrency.  Suppose there is a file with a million records; 
if we do not know which records are needed, overdeclaration may reserve the entire file 
instead.  Developing algorithms to handle dynamic transactions efficiently will be very 
useful to improve the efficiency of both ROLL and other conservative concurrent control 
methods. 
	The goal of this paper is to develop alternative algorithms to process dynamic 
transactions.  The algorithms should overcome the disadvantages of the overdeclaration 
method.

For example, there is a transaction to raise salary by five percent for an employee 
whose name will be entered at runtime.  To process this query, the name of the employee 
needs to be known first; then it can be decided whether a write is needed for a given record.  
This type of transaction is called dynamic transaction.  Two strategies can be used to 
process dynamic transactions.  One is overdeclaration.  It is also called a “demotion” 
approach.  To use this approach, a transaction needs to reserve all the possibly used data 
items.  For the above query, simply declare that all the records in the employee relation are 
needed for write.  If an employee name does not match the user-entered name, then release 
the bits corresponding to the data items of this record immediately.  This method is easy to 
implement, but it lowers the concurrency in data access.  Because of the overdeclaration of 
this transaction, no other transactions can access the whole table.  The second strategy is to 
use REPOST.  It is also called a “promotion” approach.  At the beginning, a RV 
representing the initial data requirement for a transaction is posted; as more data items are 
needed later, another RV will be posted for the same transaction.  For a request vector with 
position index i in the ROLL, the firstly posted RVi is called initial RVi, and the reposted 
RVi is called rpRVi.  Both initial RVi and rpRVi represent the same transaction.  The 
promotion approach allows concurrent execution of transactions, but it requires REPOST 
algorithms to reorder of the RVs between the initial RVi and the rpRVi and make sure that 
all the execution is serializable.  We define REPOST and rpRVi as the following.
1. REPOST: a REPOST is the operation to post rpRVi to the ROLL.
2. rpRVi: rpRVi is reposted RVi.  It represents some additional data items needed 
after the processing of the initial RVi.  Usually it requires more data items to 
write than the initial RVi.

 	  
Here are some observations about the relationship between the initial RVi and the rpRVi.
1.  The initial RV is read-intensive.
2.  Whether a REPOST is needed can be determined after a transaction is sent to the TM.
3.  Only after the initial RVi is processed, can rpRVi be created.
4.  Due to the short time interval, the number of RVs between RVi and rpRVi is small.
5.  Usually, there is only one rpRVi for each initial RVi. 
6. The transaction cannot commit until its initial RVi and all the rpRVi are executed. 

5.2. Goal
The goal of this paper is to develop some efficient REPOST algorithms.  The algorithm 
should allow concurrent execution whenever possible.  Other goals are to 
achieve (1). POST-REPOST compatibility, i. e., all the RVs representing 
static and dynamic transactions can be posted as usual, and (2). REPOST-
CHECK concurrency, i.e., CHECK can be performed when a REPOST is in 
progress or vice versa.

5.3.  Basic Algorithms
5.3.1. REPOST Serializability
Although the initial RVi and rpRVi are sent to the scheduler as two RVs, they represent the 
same transaction.  A correct REPOST algorithm should produce a 
serializable history.  If rpRVi is processed just after initial RVi, then these 
two RVs are equivalent to occupying the same RV position.  Therefore, all 
the RVs in the ROLL are processed in posting order and are guaranteed in a 
serializable manner.  Otherwise, we need to make sure that the RVs between 
the initial RVi and rpRVi are serializable in order to produce a serializable 
history.  If there are no RVs between the initial RVi and rpRVi, then the 
overall  ROLL will be REPOST serializable (R1, Figure 5.1).  Otherwise, 
the ROLL is REPOST serializable only if none of the RVs in between are in 
conflict with the initial RVi (R2, Figure 5.1).
If any of the RVs between the initial RVi and rpRVi are in conflict with the initial RVi, then 
there will be a cycle in the SSG.  In this case, REPOST algorithm should be 
used to force serializable execution.
Theorem: ROLL still produces serializable history if it is REPOST serial or REPOST serializable.
To keep initial RVi and rpRVi executed in RVi rpRVi order or a REPOST serializable 
order, evidently there are two ways to schedule the RVs between the initial 
RVi and rpRVi.  One way is to make the initial RV and rpRV meet by moving the




Figure 5.1. A REPOST serial  (R1) and a REPOST serializable (R2) ROLL.

rpRVi up or initial RV down.  The alternative is to keep the initial RVi and 
rpRVi where they are, but move the conflicting RVs between the initial RVi 
and the rpRVi just below the rpRVi or restarting those conflicting RVs.  In a 
regular ROLL, the REPOST status of a RV is not known until the 
corresponding rpRVi is posted.  To start checking conflict earlier and 
prevent an initial RV from committing too early, the REPOST status of a 
RV should be determined as early as possible.  The ideal situation is that the 
REPOST status is known at the time when a RV is posted.  In fact, a 
transaction’s REPOST status can be determined based on the semantics of 
the query.  For example, suppose we have a table called STUDENT which 
contains the general information for all the students at NDSU; a query to get 
all the names of the students who are currently enrolled at NDSU will not 
need a REPOST.  On the other hand, to increase the stipend by 10% for a 
student with a name matching a value of variable, where the value of the 
variable can only be determined at runtime, is a dynamic transaction.  The 
second query needs a REPOST because who is entitled to the increase 
cannot be determined until the value of variable is determined.  


5.3.2. Parser 
	A parser should be added to the transaction manager module.  The 
function of the parser is to parse the transaction after it arrives and to 
determine the REPOST status for the transaction.  As a matter of fact, all the 
ultimate conservative concurrency control methods need a preprocessor to 
parse the transaction in order to get the readset and writeset of a transaction.  
To get the REPOST status, adding a few more logic processing operations 
will be enough.  Transactions requiring REPOST can easily be distinguished 
by analyzing the user request.  Usually REPOST is needed for transactions 
with conditional write or conditional update where the condition is not 
known until runtime.
In addition, each transaction should be assigned a unique transaction ID by the transaction 
manager.  For a dynamic transaction, the initial RV and all its reposted RVs 
should use the same transaction ID.  In effect, all relational database systems 
do assign transaction ID number to transactions; therefore, no extra work is needed.  

5.3.3. Data Structure Modification
	The structure of the RV should be changed in order to represent the 
REPOST status in the REPOST algorithm.  An extra bit is needed at the 
beginning of the request vector to indicate whether the corresponding 
transaction needs a REPOST later.  The value of the REPOST bit is either 
one or zero.  One means that a REPOST is needed for this transaction, and 
zero means no REPOST is needed for this transaction.  Figure 5.2 is an 
example of the modified RV.  In the illustrated RV, there are five data 
items, each with two operation modes.  The first bit represents read 
operation and the second bit represents the write operation.



 Figure 5.2. A modified RV.

  In addition to the data structures used in ROLL concurrency control method, a new data 
structure called CLIST (conflict list) needs to be introduced.  It has the same 
structure as ROLL but usually with smaller size.  For each RVi which needs 
a REPOST, there is a CLISTi associated with it.  Each CLIST also has a 
head pointer and tail pointer.  The head pointer points to its initial RVi in the 
ROLL, and the tail pointer points to the last RV in the CLIST.  The 
operations associated with the CLIST are MOVE, PASTE, and CLEAR.
MOVE performs the procedures done by REMOVE and POST operations of the regular 
ROLL.  The operation disconnects a RV from a ROLL and posts it to a 
CLIST.  All it does is to disconnect and connect RV pointers.
PASTE is similar to POST.  The difference is that PASTE pastes a RV list to the ROLL 
instead of a single RV.  After pasting, the head pointer of CLISTi is 
disconnected from the initial RVi and connected to the rpRVi.   
CLEAR clears the CLIST after the CLIST has been pasted to the ROLL.  The function of 
the CLEAR is to get rid of the CLIST identifier from a list so it cannot be 
referred or manipulated later. 
The key of the algorithm is to order the conflicting RVs between initial RVi and rpRVi.  
Before describing the rules to arrange the RVs, we first define REPOST conflict.
REPOST conflict: For each RVj which is after initial RVi and before rpRVi, it is in 
REPOST conflict with initial RVi if it has read-write or write-write conflict 
with the initial RVi. 
For any RVj which is in REPOST conflict with an initial RVi (i ?j), there will be a 
cycle in the SSG because of the existence of the path RVi ?RVj ?rpRVi, which is 
equivalent to Ti ->Tj -> Ti, in the corresponding SSG.  Different algorithms can be used to 
keep the execution serializable.  One is a no-wait approach; i.e., the REPOST conflicting 
transactions are restarted before any processing has been done to these transactions.  
Another strategy is to force the transaction in REPOST conflict to wait until the initial RVi 
and rpRVi are processed.  We call this kind of algorithm a conservative algorithm.  It can 
avoid aborting and restarting transactions and is also consistent with the  goals of the basic 
conservative concurrency control methods.   Regarding the REPOST algorithms, firstly let 
us consider the simplest case in which there is only one RVi requiring a REPOST in a 
ROLL.  We will develop a basic algorithm to handle this situation. 

5.3.4. Conservative Algorithm
	The goal of the conservative algorithm is to avoid aborting or restarting 
transactions.  The extra cost is the pointer assignments needed for the conflicting RVs.
3. The pseudocode for conservative algorithm is as follows:
Begin
Let rvPtr points to the first RV in the ROLL 
while (rvPtr is not null and the pointed RV is not rpRV) {
       	  If (REPOST bit of the RV is zero) //Representing a static transaction
	    CHECK and process this RV
                rvPtr++ //Go to next RV
                Continue
	  Else  //Representing a dynamic transaction
	    If it is the first check on this RV
	      Make a copy of RV as cpRV
	    Check this RV and process it
	    For (each RV below the initial RV and RV ?rpRV and rvPtr is not null) {
	        If (RV is not in REPOST conflict with cpRV)
		Check and process this RV
	        Else
		Move this RV into a CLIST
                    rvPtr++
	     }
	     if (the next RV is rpRV)
  	       Check and Process rpRV
                   Change the repost bit of initial RV to 0
 	       Paste CLIST below rpRV
 	       Clear CLIST
              }
1.1.3.1.1.1           End
         
 	
	When the rpRV is being processed, other transactions can still be posted to the 
ROLL.  After CLISTi is pasted below the rpRVi, all the RVs after rpRVi, including the 
pasted RVs from CLIST and the newly posted RVs, cannot be checked and processed by 
the current CHECK.  The following example shows how the conservative REPOST algorithm works. 
Figure 5.3 shows all the RVs in a ROLL at certain time.  The RVs are posted in the order 
from top to bottom.  Based on the REPOST algorithm, before every CHECK, a REPOST conflict test 
should be conducted.  For RV0, its REPOST bit is set to 1, so we need a CLIST0 to hold the 
conflicting RVs.  A copy of the original RV0, cpRV0, is needed for checking.   RV1, RV2, 
and RV3 are posted after RV0 and should be checked against cpRV0 for conflict.  RV1 is 
not in REPOST conflict with cpRV0, so it can be checked and processed.  RV2 is in 



              Figure 5.3. Some RVs in a ROLL.

REPOST conflict with cpRV0, so it should be moved to CLIST0 before any checking and 
processing is done on it (Figure 5.4).  RV3 is not in conflict with cpRV0, so RV3 can be 
checked and processed.  The conflicting RVs is RV0-> RV2-> rpRV0.  After rpRV0 is 



               Figure 5.4. A ROLL and the CLIST after checking. 


detected, it will be checked and processed immediately.  After checking for rpRV0, the 
repost bit of RV0 is changed to 0, and the CLIST0 is pasted below rpRV0.  The order of the 
RVs after this pasting will be the rearranged order of these RVs.  It is shown in Figure 5.5.  
It is equivalent to T1 ->T3 ->T0 ->T2.

5.3.5.  No-wait Algorithm

	In this algorithm, we do not use a CLIST to hold the conflicting RVs; instead, we

              


              Figure 5.5. The rearranged order of the RVs.

restart the conflicting RVs to make the execution serializable.  For every RV between the 
initial RVi and rpRVi, if it is not in conflict with the initial RVi, then it can be checked and 
processed; otherwise, it will be restarted.  In this case, we have choices over which
transaction should be the victim.  The alternative is to restart the initial RVi and rpRVi. 
Restarting rpRVi alone will not solve the problem because it cannot get rid of the cycle in 
the SSG.  Because rpRVi is only part of the transaction, if it is not processed quickly, more 
conflicting transactions will be accumulated, which will result in more restarts.  Therefore, 
we should not restart RVi and rpRVi. 
 The pseudocode for no-wait algorithm is as follows:
 Begin
 Let rvPtr points to the first RV in the ROLL 
 while (rvPtr is not null and the pointed RV is not rpRV) {
       	  If (REPOST bit of the RV is zero) //Representing a static RV
	    CHECK and process this RV
                rvPtr++
                continue
	  Else  //Representing a dynamic transaction
	    If it is the first check on this RV 
	      Make a copy of RV as cpRV
	    Check initial RV and process it
	    For (each RV below the initial RV and RV ?rpRV and rvPtr is not null) {
	        If (RV is not in REPOST conflict with cpRV)
		Check and process this RV
	        Else
		Restart this RV
                    rvPtr++
	     }
	     if (the next RV is rpRV)
                   Check and process this RV
                   Change the repost bit of initial RV to 0
	    }
         End

5.3.6. Pulling Up/Pushing Down Algorithm
As the other two REPOST processing algorithms, the bottom line of this algorithm 
is to make sure that the rp-interval must not produce a SSG cycle.  It is equivalent to make
the initial RV and rpRV meet each other and finally be combined together into a single RV 
(Figure 5.6).  In this algorithm, we need to calculate the AV for the rpRV1.  If  rpRV1 can 
access all the data items it needs, then it can be ORed together with the initial RV1 directly. 


	  Figure 5.6.  Move up rpRV1 and combine it with RV1.


Otherwise, further analysis may reveal that the initial RV1 can be moved down.  In this 
case, we will move the initial RV1 down as far as possible.  After initial RV1 is moved 
down, if initial RV1 and rpRV1 do not meet, all the RVs in the rp-interval will be reordered 
(maintaining their relative order with each other).  Conflict test will be performed firstly 
against initial RV1 and then rpRV1.  Any RVs in conflict with initial RV will be moved into 
initial RV conflict list, and for the rest of the RVs, anyone in conflict with rpRV1 will be 
moved into rpRV1 list.  The final order of the RVs will be rpRV1 conflict list->  initial RV1 
-> rpRV1-> initial RV1 conflict list.  
	 	
 The pseudocode for Pulling Up/Pusing Down Algorithm is as follows:

For all the RVs between initial RVi and rpRVi
If  AVi for rpRVi does not block data access
  Let RVi = initial RVi OR rpRVi
  Change the repost bit of initial RVi to 0
  Remove rpRVi 
Else{
  Let j=i+1
  While (RVj is not rpRVi){
     If RVj not in conflict with initial RVi and repost bit of RVj=0
      Move the initial RVi down
      Let j= j+1
 }
 If the next RV is rpRVi
    Let initial RVi = rpRVi OR initial RVi
    Remove rpRVi
    Change the repost bit of initial RVi to 0
Else
  Sort the RVs between initial RVi and rpRVi
  Let RV_List1= the initial RVi conflict list
  Let RV_List2 = rpRVi conflict List
  Order the conflict interval as RV_List2->initial RVi->rpRVi->RV_List1
  Let RVi = rpRVi  OR initial RVi
  Change the repost bit of RVi to 0
  Remove rpRVi


For the original ROLL shown in Figure 5.7, the access vector for rpRV1 is 11 01 11 
11 01 01, which is blocking rpRV1 from accessing the third and fourth data item.  So rpRV1 


	      	     Original Roll			             Sorted ROLL
1.4 
1.5    Figure  5.7. The original and sorted ROLL.


cannot be moved up.  Further analysis reveals that RV1 can be moved one position lower. 
After RV1 is moved down, a sorting is needed for the RVs.  The conflict list for RV1 
includes RV3 and RV4, which should be moved directly below RV1.  The conflicting RVs
for rpRV1 includes only RV6, so RV6 and rpRV1 should be moved up above RV1; then 
rpRV1 is ORed together with RV1.  After that, rpRV1 is removed.  After the order of the 
RVs are set, rpRV1 and RV1 can be combined together and the REPOST bit of the 
combined RV should be changed to zero.  The final sorted ROLL is as Figure 5.8.


               Figure 5.8. A ROLL with reposted RV removed.


5.3.7.  Proof of Serializability
	For ROLL r1 with RV0…RVi-1 RVi  RVi+1…RVi+n  rpRVi, all the transactions 
represented by RVs from RV0 to RVi-1 are guaranteed to be serializable (theorem from 
ROLL concurrency control method), therefore proving the ROLL with rpRV produces 
serializable history boils down to proving the processing of those RVs from RVi to rpRVi 
is serializable.  There is only one possible way that can lead to a cycle in a SSG, when a 
path like RVi ? RVi+x ? rpRVi ,  where x>0, exists in the SSG.  Based on the REPOST 
algorithm, this kind of cycle cannot exist because whichever RVi+x is detected, it will be 
either moved to CLISTi when using conservative algorithm, restarted when using no-wait 
algorithm, or moved below initial RV or above rpRV using the pulling up/pushing down  
method, before it is processed.  From this proof, we conclude that ROLL still produces 
only serilizable execution after introducing REPOST algorithm. 
	The algorithms developed in this paper and the overdeclaration method favor the 
dynamic transactions.  When there are conflicts, other conflicting transactions will be 
delayed or restarted.  Another method is to set priorities for transactions requiring 
REPOST.  If high priority is set, then this transaction will be processed first, and all the 
REPOST conflicting transactions between initial RVi and  rpRVi will be queued or 
restarted as described in the above algorithms.  If the priority of the transaction is low, the 
initial RVi will be queued and processed later.  

5.4. Multiple REPOSTs from a Single Transaction

1.5.3 Although in most cases, only one rpRVi is needed for an initial RVi, there 
are some times when multiple rpRVi are needed.  Multiple REPOSTs can happen when a 
transaction contains nested conditional operations where each condition can only be 
determined at runtime.  The basic REPOST algorithm needs to be changed to handle 
mutiple REPOST situations.  For transaction Ti with initial RVi and rpRVi0 
rpRVi1…rpRVin, for j=0 to n-1, all the rpRVij should have the REPOST bit set to 1 so that 
they will not be allowed to commit until the processing of the rpRVin is finished.  A copy 
of RVi and each rpRVij is also needed for checking conflicts.  When checking REPOST 
conflicts, all the RVj (j>i) between the initial RVi and the final rpRVi should be checked.  
Each RVj posted after the initial RVi should be checked against cpRVi and all the cprpRVix 
which are located before RVj in the ROLL.  The performance of the algorithm will degrade 
greatly with an increase in the number of REPOSTs.  In this case, more RVs located 
between the initial RVi and the final rpRVi will be moved to the CLIST or restarted.



            Figure 5.9. A multiple REPOSTs example.

	In the above ROLL shown in Figure 5.9, RV0 has multiple REPOSTs.  RV1 does 
not need REPOST.  For RV1, because no other reposted RV0 has appeared yet, it only 
needs to be checked against cpRV0 for conflict.  It should be moved into CLIST0 since it is 
in conflict with cpRV0 when accessing the second data item.  The REPOST bit of rpRV01 is 
set to 1, so it is not the final REPSOT.  When checking REPOST conflicts for RV2, it is 
necessary to check conflict against both cpRV0 and cprpRV01.  Although RV2 does not 
conflict with cpRV0, it conflicts with cpRV01, so it should be moved to CLIST0.  The 
REPSOT bit of rpRV02 is still 1, so it is not the final REPOST for RV0.  All the RVs posted 
after rpRV02 should be checked against cpRV0, cprpRV01, and cprpRV02 for conflict.  After 
the final rpRV0 is processed, the CLIST0 should be pasted below the final rpRV0.  Note that 
in this case, no matter how many times a RVi is repsoted, there is only one conflict list for 
this RVi.  The execution order for RVs from RV0 to the final rpRV0 should be equivalent to
5. 	…RV0? rpRV01? rpRV02?… ?rpRV0n-1 ?rpRV0n… 
	When multiple REPOSTs exist in a ROLL, the changed algorithm still produces 
serializable history.  To prove the serializability, suppose we have the following 
transactions in the SSG:
	…RVi?….?RVi+1 ?rpRVi0? …?RVi+2…?rpRVin-1? RVi+x? rpRVin
where RVi+1, RVi+2, and RVi+x are the RVs which are in conflict with initial RVi or one of 
the rpRVij.  Based on the algorithm, any RVi+1, RVi+2, and RVi+x will be either moved to 
the CLISTi or restarted, so it is not possible to have such a SSG.
 


	           Figure 5.10 A ROLL with two conflicting initial RVs.


5.5. Multiple Dynamic Transactions
Transactions requiring REPOST may interleave in the ROLL.  For a ROLL with nested 
dynamic transactions, there are two cases.  In the case shown in Figure 5.10, although both 
RV0 and RV1 need REPOST, the rpRV can be processed just like there is only one RV 



          Figure 5.11. A ROLL with two non-conflicting initial RVs.


Requiring REPOST.  Because RV1 is in conflict with RV0, RV1 will be moved into CLIST0 
and processed later.  In this case, RV0 and RV1 will be processed sequentially.
In another case, as shown in Figure 5.11, both RV0 and RV1 need REPOST, and they are 
not in conflict.  If we allow RV0 and RV1 to be processed concurrently, we should set rules 
for checking the conflict.  If a RV is posted below RV1 and this RV is not in conflict with 
both of RV0 and RV1, then this RV can be checked and processed.  If it is in conflict with 
only one of the two initial RVs, then it should be put into the corresponding CLIST.  If the
final rpRV0 is posted first, the order of the execution is from RV0 to RV1; otherwise, the 
order is from RV1 to RV0.  In order to produce serializable execution, we need to make 
sure that the earlier posted rpRV and the CLIST will not introduce new conflicts with other 
dynamic transactions which have been partially processed when the CLIST is incorporated 
into the ROLL.  It is to say that supposing we want the execution in RV0 RV1 order, rpRV0 
and CLIST0 should not introduce new conflicts with initial RV1 and its rpRV1s after CLIST0 
is pasted to the ROLL.  If rpRV0 is in conflict with the initial RV1, then RV0 and all the 
rpRV0s, or RV1 and all the rpRV1s must be aborted.  If any RV in CLIST0 is in conflict with 
the initial RV1, then this RV should be put into CLIST1 instead of pasting after rpRV0.  
Therefore, a conflict should be checked for each RV in CLIST0 against cpRV1.  We can see 
from the above analysis that, if RV0 and RV1 are processed concurrently, aborting static 
and dynamic transactions is possible.  Checking for conflict for each RV in the CLIST is 
also needed before pasting a CLIST.  To make things worse, when there is a great number 
of dynamic transactions with multiple REPOSTs in the ROLL, the algorithm will be very 
expensive to perform.  Aborting dynamic transactions is much more expensive than static 
transactions.  To avoid the overhead, we choose to allow only one RV requiring the 
REPOST scheduled at a time; i.e., we keep dynamic transactions scheduled sequentially.  
For initial RV0 to initial RVn where each RV represents a dynamic transaction, we will put 
all these RVs except RV0 into a queue and process them one by one or force all the 
subsequent initial RVs into CLIST0.  Some change will make the basic REPOST algorithm 
work for processing the nested REPOSTs.  

5.6.  General Algorithms                                                                            
1.8 Put all the discussed scenarios together; we need algorithms to handle ROLL 
allowing multiple REPOSTs from a single dynamic transaction and multiple initial 
RVs representing different dynamic transactions.  This type of ROLL is processed by 
concurrent processing of non-conflicting static transactions with a dynamic 
transaction, but dynamic transactions are processed sequentially; i.e., in each 
CHECK operation, only one dynamic transaction is scheduled. 
 
5.6.1. Conservative Algorithm
	The pesudocode for general conservative algorithm is as follows:
Begin
Let rvPtr points to the first RV in a ROLL //rvPtr is a RV pointer
While (rvPtr is not null) { //not past the last RV
If the repost bit of the pointed RV =0  //static transaction 
    Check and process the RV
    rvPtr++ //Move to the next RV
    Continue  //go to the starting point of the loop
  Else  //dynamic transaction
    If this is the first check on the RV
      Copy initial RV to cpRV //make a copy
    Check and process the initial RV
    rvPtr++
    While (rvPtr is not null and not final rpRV){
       If the RV has same transaction id with initial RV and repost bit=1//not final rpRV
         If this is the first check on RV
           Make a copy of the current RV
         Process the current RV
       Else if RV has same transaction id and repost bit =0 //final repost
         Check and process it
         Past CLIST below this RV
         Change the repost bit of the initial RV and all the non-final reposts to 0
         Clear CLIST
         Return
         Else if repost bit =1 // a different dynamic transaction
         Move to CLIST
       Else if repost bit =0 and not in conflict with the initial RV and any cprpRVij above it 
         Process it
       Else move it to CLIST //there is a conflict
       rvPtr ++
       }
     }
End

5.6.2. No-wait Algorithm
         
	The pesudocode for general no-wait algorithm is as follows:

Begin
Let rvPtr points to the first RV in a ROLL //rvPtr is a RV pointer
While (rvPtr is not null) { //not past the last RV
1.8.1.1.1.4   If the repost bit of the pointed RV =0  //static transaction 
    Check and process the RV
    rvPtr++ //Move to the next RV
    Continue  //go to the starting point of the loop
  Else  //dynamic transaction
    If this is the first check on the RV
      Copy initial RV into cpRV //make a copy
    Check and process the initial RV
    rvPtr++
    While (rvPtr is not null and not final rpRV){
       If the RV has same transaction id with initial RV and repost bit=1
         If this is the first check on the RV
           Make a copy of the current RV
         Process the current RV
       Else if RV has same transaction id and repost bit =0 //final repost
         Check and process it
         Change the repost bit of the initial RV and all its non-final reposts to 0 
         Return
       Else if repost bit =1 // a different dynamic transaction
         Restart this RV
       Else if repost bit =0 and not in conflict with the initial RV or any cprpRVij above it,  
        process it
       Else  
       Restart this RV //there is a conflict
       rvPtr ++
       }
     }
     End



5.6.3. Pulling Up/Pushing Down Algorithm	
The pesudocode for general pulling up/pushing down algorithm is as follows:
For all the RVs between initial RVi and the final rpRVij
While (rpRVij not meeting the initial RVi){ 
  If  AVij for rpRVij does not block data access
    Let rpRVij-1 = rpRVij-1 OR rpRVij
    Remove rpRVij and change the repost bit of rpRVij-1 to 0
    Let j=j-1
}
 If j=0 //All the rpRVs have be ORed together, work done
     Return
 Else
   Let k=i+1
   Let m=0  //RVi0 is the initial RVi
While (RVk is not rpRVij){
     If RVk not in conflict with RVim and repost bit of RVk =0
      Move the RVim down
       If RVk is rpRVim+1
       Let RVim+1=rpRVim OR rpRVim+1
       Remove rpRVim
        Let m=m+1
        Let k= k+1
  }
  If RVk is final rpRVij 
    Let rpRVim = rpRVim OR rpRVij
    Remove rpRVij
     Change the repopst bit of rpRVim to 0
     Return  //all done
  Else
  Sort the RVs between RVim and final rpRVij
  Let RV_Listm= the RVim conflict list
  Let RV_Listm+1=rpRVim+1 conflict list
  ….
  Let RV_Listj = rpRVij conflict list
  Let rpRVij = rpRVij  OR rpRVij-1 OR … rpRVim 
  Remove all the rpRVi from rpRVim to rpRVij-1
  Arrange the conflict interval as rpRVij->RV_Listm->RV_Listm+1… ->RV_Listj
 
5.6.4. Serializability
The most complicated senario with this algrithm is when both multiple REPOSTs 
and nested REPOSTs exist in the ROLL:RV0 –> RV1–> rpRV11 ->  RV2->  rpRV12 –>RV3.     
Although both RV1 and RV2 represent dynamic transactions, the scenario is the same as a 
single dynamic transaction with multiple REPOSTs.  The same proof can be used to prove 
the serializability because only one dynamic transaction is processed at a certain time.  For 
all the additional initial RVs which require REPOSTs, they will be put into the conflict list 
and wait to be processed in the next cycle.

5.6.5. Performance
In comparison with 2PL, the basic ROLL has no deadlock, fewer restarts, and 
higher concurrency.  After REPOST is introduced to ROLL, the only extra cost is  
reordering the RVs in the rp-interval between an initial RV and its rpRV.  It still maintains 
the advantages of the basic ROLL. 
Higher concurrency and throughput is an evident advantage of the ROLL even with 
dynamic transactions allowed.  The best case occurs when there are no dynamic 
transactions in a ROLL.  In this case, it is processed just as a regular ROLL.  As usual, one 
check will get all the accessible data items, and those data items can be accessed 
concurrently.  The worse case occurs when every RV is from a dynamic transaction.  In 
this case, because only one dynamic transaction can be checked at each cycle, it is 
equivalent to overdeclaration in concurrency level.  The time used to process the query is 
much shorter because, when the final rpRV is submitted, the initial RV and previously 
submitted RVs have been completely or partially processed, so the throughput should be 
much higher.  The level of concurrency on average case, where low percentage of RVs are 
from dynamic transactions in a ROLL, would be much higher because it allows non-
conflicting static transactions to be processed concurrently with a dynamic transaction; 
therefore, the algorithms achieve higher concurrency most of the time and higher 
throughput all the time.
Another advantage is no deadlocks.  When there are a lot of dynamic transactions 
with multiple REPOSTs, the chance of deadlock is very high when using 2PL.  To break 
the deadlock, one of the involved transactions will be chosen as victim and restarted.  
Restarting partially processed transactions is very expensive.  It wastes the system 
resources and lowers the throughput.  In addition, 2PL needs a dedicated process to detect 
deadlocks.  This feature will add more cost to the scheduler.  The conservative and pulling 
up/pushing down algorithm are also restart-free.  In the aggressive algorithm, although 
restarts will happen, they are different from 2PL restarts in that not any processing is done 
on the victim transactions; therefore, they are more economic restarts.
The algorithms of this paper are less CPU-intensive.  The CPU usage in 2PL 
includes checking lock availability, allocating lock, updating lock table, and handling 
deadlocks.  Because deadlock is unavoidable in 2PL, detecting and breaking deadlocks is a 
routine task for the scheduler.  Choosing a victim and restarting it is the solution to break 
deadlock.  Because the deadlock detection is so CPU-intensive, it is a big boost to 
performance when avoiding it.  In ROLL, because there is economic restart and no 
deadlock, there is less wasting in computer resources.   The CPU usage will be mostly on 
pointer manipulation and bit AND/OR operations.  Bit operations are very fast. 
The disadvantage is that more memories may be needed for ROLL.  In 2PL, the 
memory used will be dependent on the number of data items, lock mode, and number of 
bytes used for each lock.  The number of data items is dependent on the granularity level 
and the number of data objects.  The granuality can be at table, record, or field level.  The 
higher the granularity, the smaller number of locks is needed, and thus lower concurrency 
is allowed.  For DBMS systems, the number of locks available is usually a configurable 
parameter.  In ROLL, the determining factors are number of data items, operation mode 
and active transactions.  The number of bytes used for 2PL and ROLL can be calculated 
using the following formula. 
B2pl =B * M * D
Broll=M*D*T/8

    where
  	  B2pl: Bytes needed for 2PL
              Broll: Bytes needed for ROLL
              M: number of operation mode
              D: number of data items
              T:  number of active transactions      

In SQL Server, each lock takes 32 bytes of memory.  In comparison with this 
implementation, for the same number of operation modes and data items, 
when the number of active transaction is 256, the memory usage is the same 
for the two methods.  Above this number, ROLL uses more memory than 
2PL, and below this number, 2PL uses more memory.  By setting total 
number of locks smaller than the total number of data items, 2PL may save 
memory significantly; the consequence is reduced concurrency.     

5.7. Discussion of Implementation Issues
	ROLL is an efficient concurrency control method.  Its advantages are evident and 
there is still a lot of room to make improvement.  This discussion will be on replacing 
pointers with sequence numbers, using block/extent to store ROLL segment, and the issue 
of strict levels.
Pointers are used extensively to connect RVs in the ROLL to maintain atomic posting of 
RVs.  When the number of pointers becomes great, they can be expensive to 
maintain.  To reduce the overhead, a sequence number can be assigned to 
each RV instead of using pointers.  In a centralized ROLL, adding an auto-
increasing identifier will do the job.  In a distributed environment, a 
sequence number server is needed to assign the number to each RV at each 
site.  The server will keep an array of sequence numbers which represent the 
current maximum number at each site and update the sequence number after a new RV is posted. 
As a ROLL becomes big, say millions of RVs, performance for checking can become time-
consuming.  A lot of work in AV calculation is repetitive and can be 
avoided. One solution is to decrease the number of RVs scanned when 
calculating an AV.  We can store ROLL segments in blocks or extents.  At 
the header of each block or extent, an AV is stored.  This AV will be 
updated only when the RVs above this block or extent have been changed.  
To calculate AV for the RVs in the block, only the RVs in the current block 
or extent need to be accessed; the disk I/O and the number of AND/OR 
operations will be dramatically reduced.  Blocks can be moved back and 
forth from the disk to memory based on the demand.
Basic ROLL can guarantee serializable execution of the transactions, but higher strictness 
can also be easily achieved.  The level of strictness is determined by when a 
bit can be changed from 1 to 0.  When calculating an AV for a RV, it should 
determine whether the conflicting RVs are committed, so it is necessary to 
add a commit bit to a RV.  To obtain strict history, the write bits in a RV 
should be changed after the corresponding transactions commit or abort.   

5.8. Optimum of the Algorithms
	The algorithms proposed in this paper are optimal for most cases and nearly optimal 
in the rest of the cases. To verify this claim, we may look at those cases.  When there are 
no dynamic transactions in a ROLL, all the RVs are processed as a regular ROLL.  In this 
case, all the available data items will be filtered out after one check, and all the available 
data items can be accessed at the same time.  Highest concurrency can be achieved, so it is optimal. 
When both dynamic transactions and static transactions are present, all the non-
conflict static transactions are treated as in regular ROLL.   For the rest of the RVs, they 
will be rearranged into a serializable order.  After that, all those RVs will be processed as 
those in a regular ROLL.  The question is whether the way to reorder the RVs in 
rp-interval is optimal.  When there is a single dynamic transaction, the REPOST status of 
the transaction is determined at the earliest possible time, i.e., when the transaction arrives 
to the transaction manager.  When the initial RV arrives, it will be scheduled at the earliest 
possible time.  When the final rpRV arrives, it will be processed in the shortest possible 
time since the initial RV and all the other rpRVs have already scheduled.  The ROLL with 
a rp-interval will be turned to a regular ROLL as soon as possible.   All the conflicting RVs 
in the interval are on hold until the final rpRV is scheduled. The holding is necessary 
because they will be restarted otherwise.  Therefore, the whole algorithm is optimal.   
When there are multiple dynamic transactions, all non-conflict transactions will be 
processed as a regular ROLL; the dynamic transactions will be scheduled in a serial order; 
i.e., one dynamic transaction will be checked in each check cycle; others will be put into a 
conflict list.  Whether the strategy to handle this case is optimal depends on the following factors:

1. Checking interval
Because checking can be done continuously, the checking interval is dependent on 
the number of transactions and the number of data items.  The number of active 
transactions is usually small.  When using block/extent storage structure and distributed 
databases or partitioned ROLLs, checking can be done very fast, and hundreds of checking 
cycles per second can be achieved.

2. Number of REPOSTs per transaction
The more REPOSTs per transaction, the longer it takes to process a single dynamic 
transaction; other dynamic transactions will wait longer in the queue.  The waiting is 
necessary because there is a great chance of conflict among the dynamic transactions with 
multiple REPOSTs and it is much more expensive to restart them when they are partially processed.  

3. The conflict rate among dynamic transactions
The higher the rate of conflicts among the dynamic transactions, the more dynamic 
transactions will be restarted to maintain serializable scheduling.  There is a tradeoff 
between waiting and wasteful restart.  The algorithms developed in this paper let 
conflicting transactions wait or quick restart.  When using ROLL partitions, even multiple 
dynamic transactions exist; they can still be checked concurrently if they only use data 
items in different ROLLs.  In this case, we can achieve concurrent checking with no restart 
or economic restart.   Therefore, using the algorithms along with ROLL partition and 
block/extent storage, the algorithms can achieve optimal performance.     

5.9. Conclusion
This paper proposed three algorithms for processing dynamic transactions.  They can 
be used as an alternative to the overdeclaration approach.  Each of these algorithms has 
some pros and cons.  The advantage of the conservative algorithm is that it avoids 
transaction restart.  The disadvantage is that the commit order of the transactions is not the 
same as the arrival order.  The conservative algorithm may be not fair in some cases.  
Another disadvantage is that extra cost is needed for moving the RVs between CLIST and 
the ROLL.  In a no-wait algorithm, if there is a conflict, then the transaction will be 
restarted.  This algorithm is simple, but many transactions will be restarted if there is high 
percentage of dynamic transactions.   The ROLL restart is more economic in comparison 
with 2PL restart because no processing is done on the restarted RVs.  The pulling 
up/pushing down method is more fair algorithm, but the sorting is more complicated.
In comparison with the overdeclaration method, the new algorithms allow higher levels 
of concurrency because it only reserves the data items it needs and allows non-conflicting 
RVs between the initial RV and its rpRV to be executed without waiting.  The conservative 
algorithm still maintains the properties of ultimate conservative algorithms, i.e., no abort 
and restart.  Although algorithms are illustrated using the ROLL concurrency control 
method, it can also be used for other types of concurrency control methods.  The basic idea 
is to know whether a transaction is a dynamic one just after a transaction arrives.  At one 
time, only a subset of the required dataset for a transaction is sent to the scheduler; later, 
another subset of the data items is sent to the scheduler.  The key of the algorithm is to 
reorder the conflicting transactions which are submitted between the initial and final 
submission of the dynamic transaction.  The algorithms are optimal for cases of no 
REPOSTs or single REPOST from one dynamic transaction.  Together with ROLL 
partition, the algorithms can achieve optimal performance when nested dynamic 
transactions with multiple REPOSTs exist. 

BIBLIOGRAPHY
[Ber87]	Bernstein, P. A., Hadzilacos, V., Goodman, N.  Concurrency Control and  Recovery in DBMS. Addison-Wesley, 1987.
[Bro97a] Brooks, P.  Data Mining Today.  DBMS Tools & Strategies for IS Professionals. 10(2):59-66, 1997.
[Bro97b] Brooks, P.  March of the Data Marts.  DBMS Tools and Strategies for IS Professionals. 10(3):55-62, 1997.  
[Cob97]	Cobb, E. E. The impact of Object Technology on Commecial Transaction Processing. The VLDB Journal. 6(3):173-190, 1997. 
[Elm94]	Elmasri, R., Navathe, S. B. Fundamentals of Database Systems.  Benjamin/Cummings Publishing Company, Inc., 1994.
[Gra93]	Gray, J., Reuter, A. Transaction Processing: Concept and Techniques. Morgan Kaufmann Publishers, 1993
[Hae84]	Haerder, T. Observations on Optimistic Concurrency Control Schemes.   Information Systems. 9(2):111-120, 1984.	
[Kor83]	Korth, H. F. Locking Primitives in a Database System. Journal of the ACM 30(1): 55-79, 1983.
[Lom97]	Lomet, D. Salzberg, B. Concurrency and Recovery for Index Trees.  VLDB Journal. 6(3):224-240, 1997. 
[Pan97]	Panagos, E., Biris, A.  Synchronization and Recovery in a Client-Server Storage System.  VLDB Journal. 6(3):209-223, 1997. 
[Per91]	Perrizo, W.  Request Order Linked List (ROLL): A CC Object, IEEE Conf. on Data Eng, Kobe, Japan, 1991.
[Per92]	Perrizo, W., Rajkumar, J., Ram, P.  HYDRO: A Heter DDBMS.  IEEE Intl Conf. On Data Engineering, February 1992.
[Yan84]	Yannakakis, M. Serilizability by Locking.  Journal of the ACM 31(2): 227-244, 1984.






Appendix 3
**************************************************************************
ISOLATION LEVELS (more detail)

SQL defines three levels of isolation which are weaker than SERIALIZABILITY
(they do not guarantee ACIDS properties entirely, but they are easier to achieve).


REPEATABLE READ ensures that a transaction, T, reads only the changes made by
committed transactions and that no value read or written by T is changed by
any other transaction until T is complete.  

repeatable read isolation allows the PHANTOM PHENOMENON: If T is reading "all
employee records in the sales department", another transaction might enter (insert)
a new employee record with department=sales, which could be missed by T's read.

Repeatable read isolation can be achieved by using the same locking protocol
as 2PL except that it doesn't use index locking (index locking will guarantee that no
other transaction can complete an insert of an employee record in the sales dept until
T has completed it's read - since such a read locks the the value, department=sales
of the department index and an insert involves a write to that same index record.)


READ COMMITTED ensures that T reads only changes made by committed transactions
and that no value written by T is changed by any other transaction until T is
complete.  However a value read by T may well be modified by another transaction
while T is still in progress.  T is exposed to the phantom problem also.

A read committed transaction obtains exclusive locks before writing items and
holds these locks until the end.  It also obtains shared locks before reading
but these locks are released immediately - their only effect is to guarantee that
th