TRANSACTIONS
The terminology used in this section is that all users
(online interactive users or batch programs) issue transactions to the DBMS.
A TRANSACTION is an atomic unit of database work specified to the DBMS.
A transaction is issued using constructs such as reserved words,
BEGIN to initiate a transaction
(most actual system supply the BEGIN if the user doesn't,
e.g., whenever a new SQL statement is encountered
it is assumed to iniate a new transaction)
END to end a transaction (usually either COMMIT for successful END
and ABORT for unsuccessful END)
(most actual system supply this element if the user doesn't, e.g.,
If SQL statement execution is successful,
Then DBMS supplies COMMIT, else ABORT)
READ whenever any data is needed from the DB
(e.g., in an SQL SELECT)
WRITE whenever any data needs to be written to the DB
(e.g., in an SQL INSERT or UPDATE)
In this set of notes, all others aspects of language, coding, etc.
will be considered as un-intrepreted aspects.
For the purposes of transaction management (Concurrency Control and Recovery)
we only need to consider this level of detail.
ATOMIC unit of work means that the DBMS guarantees that unit will be
done to completion or not at all (in which case, the DB and User community
will be left just as they were when the request came. i.e., as if
the unit of work never existed)
A Transaction is defined by all work specified between a
BEGIN statement and the next encountered END (either COMMIT or ABORT)
When a transaction arrives at the DBMS, a Transaction Manager (TM)
is assigned to it (code segment to act on its behalf).
The TM interfaces with other components, e.g., the Scheduler (SCHED)
for permission to access particular data items
SCHED is like a policeman, giving permission
to access the requested item(s).
Its activity is called concurrency control.
Once permission is granted for TM to access data items, Data Manager (DM)
does the actual reads and writes.
There are several models for describing this interaction. Here are two.
Model-1:
1. TM makes requests to the SCHEDULER to read/write data item(s) or
to commit/abort the transaction
2. SCHED decides if the request can be scheduled
If yes it schedules request (passes it to DM (on TMs behalf)
If no it rejects it and informs TM.
3. DM read/writes the data item or commits/aborts the transaction
if possible, else returns reject to the SCHEDULER
(which returns it to the TM).
4. DM returns the value read (or returns an acknowledgement (ACK) of the
write/commit request to the SCHEDULER
5. SCHED returns the same to the TM.
.---------------------------. There can be one TM multithreaded by
| Transaction Manager(s) | all transactions, or an individual TM
| | assigned to each individual transaction.
`---------------------------'
^ :
: 2/3. reject : 1. read,write,commit,abort
: 5 value, write/commit_ack v
.---------------------------.
| SCHEDULER |
`---------------------------'
^ :
: 3. reject : 2. read, write, commit, abort
: 4 value, write/commit_ack v
.---------------------------.
| DATA MANAGER |
`---------------------------'
^ :
: : 3. read, write
: v
.---------------------------.
| DATABASE ON DISK |
`---------------------------'
In this second model,
TM interfaces with SCHED for permissions and then after getting permissions,
TM interfaces with DM for data access
SCHED interfaces with TM, not DM
(so the scheduler is often named a LOCK MANAGER in locking system
or just the ACCESS MANAGER)
.-----------------------------.
| Transaction Manager(s) |
`-----------------------------'
^value(read) ^ |
| ack/reject | |
|| | | read, write, commit, abort
|| |ack/reject v
|| .---------------------------.
|| | SCHEDULER |
|| `---------------------------'
|
|read, write, commit, abort
v
.---------------------------.
| DATA MANAGER |
`---------------------------'
^
| | read, write
|value(read), ack v
.---------------------------.
| DATABASE ON DISK |
`---------------------------'
We assume the 2nd model most of the time in this section.
It just depends upon where the submodule that
does the actual read/write request to the DM resides (in SCHED or TM)
Background on Concurrency Control (the activity of the scheduler, SCHED):
We need "concurrency control" CC (AKA: mutual exclusion) whenever there are
shared system resources that cannot be used concurrently
An illegal concurrent use of a shared resource is called a conflict
e.g., a printer,
a data item that one user wants to read and another wants to change.
IN DBMSs the shared resources we will call "data items".
"DATA ITEM GRANULARITY" is the level at which we treat concurrency control.
The possible Granularity levels are:
field level (logical level, very fine granularity)
record level (logical level, fine granularity)
page level (physical level, medium granularity)
file level (logical level, coarse granularity)
area level (logical level, quite coarse granularity)
database level (logical level, very coarse granularity)
We will assume, that a data item is a record
(i.e., we assume logical, record-level granularity)
This means there are many more shared resources for DBMS to manage than there
are anywhere else, (e.g., printers for an O/S), and
CC is a harder problem to solve in a DBMS than anywhere else!
A DBMS may have 1,000,000 records or more.
An O/S manages ~ 50 printers.
Ethernet Medium Access Protocol (unswitched) manages ONE bus
Transactions are often called QUERIES when they request only read access
(i.e., QUERIES are READ-ONLY TRANSACTIONS)
Databases provide the user with the ability to define their
own "units of work" (i.e., the transaction) which will be executed
to completion or not at all (guarantee made by the DBMS).
In any resource management situation (OS, NOS, DBMS..)
there are "shared resources" and
there are "users"
items and customers, if you like.
SHARED RESOURCE MANAGEMENT
How can the system insure correct access to shared resources
among concurrently executing transactions?
All answers seem to come from traffic managment!
(traffic management intersections or construction zones or driveup windows)
WAITING POLICY: If a needed resource is unavailable,
requester waits until it becomes available
(e.g., intersection red light, Hardees drive up lane).
This is how print jobs are managed by an OS
Advantages: NO RESTARTING (no unnecessary loss of progress)
e.g., At Hardees, they don't say "Go home! Come back later!
Disadvantages:
DEADLOCKS may happen unless they are managed.
e.g., at a construction zone, if the two flag women
don't coordinate, both traffic lines may start into
the construction zones from opposite directions and
DEADLOCK in the middle!)
INCONSISTENT RESPONSE TIMES
At the Hardees window, you may wait an hour or a minute.
(Not so important at Hardees (well maybe it is? ;-),
but it is very important at, e.g., at your local ER)
RESTART POLICY:
If a needed resource is unavailable,
Then requester terminates request and restarts request later.
e.g., When someone goes before the parole board:
They either get their request
or they restart the process later ( much later? ;-(
e.g., Ethernet (unswitched) CSMA/CD:
If node A wants to send a message to node B:
1. Carrier Sense (the "CS" part):
check the wire for traffic.
if busy (in use by another sender), wait
(according to some "back-off algorithm")
then check again, etc. until the bus is idle
then SEND message
2. Collision Detection (the "CD" part):
listen to bus until you're certain that your message
did not collide with another concurrently sent message
(the required length of wait time is the traversal_time of wire,
since there are terminators (absorbers) at each end)
Advantages of restart policies: simple, no deadlock
Disadvantages: Lower throughput,
lost progress,
long delays?,
possible livelock
A Transaction = A computation or program taking the database from
one consistent state to another (without necessarily preserving
consistency at each step of the way)
- The transaction is an atomic unit of database work,
ie, DBMS executes transaction to completion or not at all, GUARANTEED.
If only one transaction is allowed to execute at time
and if the database starts in a consistent state (obeying all ICs)
then it will always end up in a consistent state!
The problem is, the above SERIAL EXECUTION is much too inefficient!
A DBMS (is supposed to) guarantee ACIDS PROPERTIES of transactions:
ATOMICITY: A transaction is an all-or-nothing proposition.
Either a transaction is executed by the DBMS to
completion or all of its effects are erased completely.
(Transaction = atomic unit of database workload)
CONSISTENCY: Correct Transactions take the database from
one consistent state to another consistent state.
Consistency is defined in terms of consistency
constraints or "integrity constraints", e.g.,
entity integrity,
referential integrity,
others
ISOLATION: Each user is given the illusion of being the sole user
of the system (by the concurrency control subsystem).
DURABILITY: The effects of a transaction are never lost after it is
"committed" by the DBMS. (ie, after a COMMIT request is
acknowledged by the DBMS) (requires RECOVERABILITY)
SERIAL EXECUTION insures most of the ACID properties
(Consistency and isolation for sure.
It also helps in atomicity and durability).
ie, queue all transactions as they come in (into a FIFO queue?)
Let each transaction execute to completion before the next even starts.
Serial execution may produce unacceptable execution delays
(i.e., long response times) and low system utilization.
SERIALIZABLE EXECUTION is much much better!
Concurrent execution of multiple transactions is called
serializable if the effect of the execution of operations
(reads and writes) within the transactions are sequenced
in a way that is equivalent to some serial execution
(i.e., is is as if it was done by a serial execution of transaction operations)
Serializability facilitates ATOMICITY, CONSISTENCY and ISOLATION of
concurrent, correct transactions, just as much as SERIAL does.
A property called RECOVERABILITY facilitates
DURABILITY (more on that later)
An execution is RECOVERABLE if every transactions that commits,
commits only after every other transaction it read-from is committed.
ISOLATION LEVELS
SQL defines levels of isolation which are weaker
than SERIALIZABILITY (they do not guarantee ACIDS properties
entirely, but they are easier to achieve).
REPEATABLE READ ensures that no value read or written by a transaction, T,
is changed by any other transaction until T is complete;
and that T can read only changes made by committed transactions.
READ COMMITTED ensures that no value written by a transaction, T
is changed by any other transaction until T is complete; and that
T can read only changes made by committed transactions.
READ UNCOMMITTED ensures nothing (T can read changes made to an
item by an ongoing trans and the item can be further changed
while T is in progress.
There will be further discussion on these later in these notes.
For now, please note there are several suggested paper topics
in the topics file concerning isolation levels.
But also note that I think they are bunk!
CONCURRENT TRANSACTIONS = transactions whose executions
overlaps in time (the individual operations (read/write
of a particular data item) may be interleaved in time)
Again, the only operations we concern ourselves with are
BEGIN, READ, WRITE, COMMIT, ABORT
READ and WRITE are the operations that apply to data items.
data item can be a field, record, file, or DB (logical granules)
or page (physical granule). We assume record-level granularity.
A read(X) operation, reads current value of the data item, X,
into a program variable (which we will also called X for simplicity).
Even though we will not concern our selves with these details in this section,
read(X) includes the following steps:
1. Find the address of the page containing X
2. Copy that page to a main memory buffer
(unless it is already in memory)
3. Copy the value of the dataitem, X, from the buffer
to the program variable, X
The write(X) operation, writes the value of the program variable,
X, into the database item X.
It includes the following steps:
1. Find the address of the page containing X
2. Copy that page to a main memory buffer (unless it is already in memory)
3. Copy the program variable, X, to buffer area for X
4. Write the buffer back to disk (can be deferred and is governed by DM)
DBMSs should guarantee ACID properties
(Atomicity, Consistency, Isolation, Durability).
This is typically done by guaranteeing the condition of
SERIALIZABLILTY introduced above.
- Database operations are scheduled so that changes
to the database and output to users is equivalent
to the changes and outputs of SOME serial execution.
If each transaction is correct by itself
(takes a correct database state to another correct state),
then a serial sequence of such trans will be correct also.
Thus, serializable executions or histories guarantee correctness,
Some important example of "incorrectness" problems,
which can happen without proper concurrency control:
Allowing arbitrary interleaving of operations from concurrent transactions.
(Note: We introduce Two Phase Locking concurrency control as solutions
LOST UPDATE:
tammy1
Tammy deposits 500; Jimmy deposits 1000 in their joint account.
@@@ ///
@ - - @ | o o |
@ ` ~ ' ` - '
| ____ _____ |
.( )---|$500| |$1000|-----|-.
.' | |____| |_____| ( ) `.
/ `. ^
_____ | |
L L JOINT L L
ACCOUNT
T1 (dep $500) BALANCE T2 (dep $1000)
workspace of T1 ON DISK workspace of T2
0. $2000
1. $2000 <- - $2000
2. add 500: $2500 $2000
3. $2500 $2000 - -> $2000
4. $2500 $2000 $3000 1000++
5. $2500 - -> $2500 $3000
6. $3000 <- - $3000
SOLUTION?
LOCKING = Each transaction must obtain a "lock" on an item
(an access right obtained from a "lock manager"
or "scheduler") before it can access that item.
NOTES:
Concurrent read-read to the same data item can be done in either order
(no conflict exists).
If T1: read1(x) and T2: read2(x) are concurrent,
then in terms of changes to the database (none are made here)
and messages to users (2 are made here), the same "effect" is
produced regardless of order of execution of read operations.
Concurrent read-write or write-write to the same data produce different
results depending on the order
(a conflict exists iff at least one operation is a write and the
operations access the same item)
Even if the operations themselves (the individual reads and writes)
are made atomic by the Buffer Manager, there can still be "conflict"
because different transaction results can occur.
If T1: write1(x) and T2: read2(x) are concurrent,
then in terms of changes to the database (one is made here)
and messages to users (one is made here), different "effects"
are produced by the 2 orders of execution of the operations.
i.e., if write1(x) is done first,
user-2 will get the value written to X by T1,
while if write1(x) is done second,
user-2 will get the initial value of X.
If T1: write1(x) and T2: write2(x)
then in terms of changes to the database (two are made here),
different "effects" are produced by the 2 orders of execution
of the operations.
i.e., if write1(x) is done last, the database will be
left with the value written by T1,
while if write2(x) is done last, the database will be
left with the value written by T2,
Therefore, sometimes, to improve performance,
we distinguish between locks for read-only access and
locks for write-access by having two types of locks:
A lock for read-only access is read-lock or shared-lock (SLOCK).
----------------
A lock of write-access is write-lock or exclusive-lock (XLOCK).
------------
SLOCKS are "compatible" or "non-conflicting":
if an SLOCK is held on a data item,
another trans can be granted a concurrent SLOCK
XLOCKS are "incompatible" or "conflicting":
if an XLOCK is held on a data item,
another trans cannot be granted a concurrent XLOCK
nor a concurrent SLOCK
Therefore the compatibility table is:
\ Requester> |
Holder \ > SLOCK | XLOCK
vvvvvvvv\________>___________|___________
| | |
| | |
SLOCK | | yes | no
| | |
________|________|___________|___________
| | |
| | |
XLOCK | | no | no
| | |
| | |
Or said another way, the conflict table is:
\ Requester> |
Holder \ > SLOCK | XLOCK
vvvvvvvv\________>___________|___________
| | |
| | |
SLOCK | | no | yes
| | |
________|________|___________|___________
| | |
| | |
XLOCK | | yes | yes
| | |
| | |
Both the Compatibility and Conflict tables give the very same info.
Sometimes you will see it given as a compatibility table and sometimes
as a conflict table.
Is locking with SLOCKS and XLOCKS enough Concurrency Control?
NO!
Example:
*INCONSISTENT ANALYSIS PROBLEM
Tammy transfers $100 from savings to checking, concurrently the
bank is running an audit on the 2 accounts (summing accounts)
tammy2
@@@ ________ __/BANK)
@ - - @ |ACCOUNTS| _____ < $> |
@` - ' |--------| |AUDIT| >___'
| ____ |CHECKING| |Ch__<==__.|
.( )-|100 | |--------| |Sav__|----|
.' | |____| | | |Tot__| (|
/ `. |--------| ^
/_____`. |SAVINGS | | |
L L |________| L L
SAVINGS CHECKING
T1 (trans) BAL LOCK BAL LOCK T2 (audit)
T1-wkspace T2-wkspace
0. 500 800
1. 100 <- - 400 T-1 800
2. 100 400 800 T-2 - - > 800
3. 100 400 T-2 - - - - - - - - -> 400
4. 100 400 800 = 1200
5. 100 - - - - - - - - -> 900 T-1
6. 400 900 = 1200
SOLUTION?
2-Phase Locking (commonly called "2PL")
Locking as above, with the additional condition that each transaction
must acquire all its locks before releasing any of its locks.
Point in time at which a transaction releases it's first lock
is called "lockpoint"
In 2PL systems, the serial order to which
the serializable order is equivalent, is lockpoint order.
Is Two-Phase Locking (2PL) enough concurrency control?
NO!
CHECKING SAVINGS
T1 (TRANS) BAL LOCK BAL LOCK T2 (AUD)
========== === ==== === ==== ========
0. 500 800
1. 100 <- - 400 T-1 800
2. 100 400 T-1 800 T-2 - - -> 800
3. 100 400 T-1 800 T2 blocked
4. 100 400 T-1 800
5. 100 400 T-1 800 T1 blocked
In the above example, an impasse has been reached! (called DEADLOCK)
Two-Phase Locking (2PL) is still not enough concurrency control.
NEED DEADLOCK MANAGEMENT
We will look at many more deadlock management schemes later, but
The simplest deadlock management scheme, one that will always work, is
"TIMEOUT" which simply means that each transaction is given a set amount of
time to complete. If it fails to complete in that set amount of time
(governed by a "timeout clock") it must abort and restart.
Is 2PL + deadlock mgmt enough CC?
NO!
* UNCOMMITTED DEPENDENCY PROBLEM (or cascading rollbacks)
Tammy deposits 500, then Jimmy deposits 1000,
Tammy's transaction aborts after Jimmy's commits.
Note: In order it accomodate transaction "abort" or "rollback",
Write-Ahead Logging (WAL) protocol is used:
-------------------
A changed database item cannot be written to the database disk until
the "before value" (the value before the change took place) has been
"logged" to a secure place (the system log - on a separate disk).
@@@ ///
@ - - @ | o o |
@ ` - ' ` - '
| ____ _____ |
.( )---|$500| |$1000|-----|-.
/ /_`. ^
L L L L
T1 (YOU Dep 500) BALANCE T2 (SIG Dep 1000)
0. $2000
1. XLOCK
2. $2000<- - $2000
3. 500++ $2500 - ->$2500 . . . . WAL: before value, 2000, written to log
4. UNLOCK
5. XLOCK
6. $2500- ->$2500
7. $3500<- -$3500 1000++
8. UNLOCK
9. COMMIT
10. ROLLBACK (using logged "before-values")
11. $2000
* Solution: Strict 2-Phase-Locking:
all locks must be held until COMMIT/ABORT
TRANS-1 (YOUR BALANCE T-2 SIG
DEP OF $500) DEP $1000
============= ======= ========
0. $2000
1. LOCK
2. $2000<- - $2000
3. 500++ $2500 - ->$2500
4. ABORT $2000
5. LOCK
6. $2000-> $2000
7. $3000<-- $3000 $1000++
COMMIT
These, of course, are not the only solutions. But they are solutions.
Many commercial system use Strict 2PL or something close to it.
To review:
LOCKING:
TM must acquire a lock (XLOCK to write / SLOCK to read if 2 MODES,
else, just a LOCK, if there is only 1 MODE)
from the SCHEDULER (model-1) or LOCK MANAGER (model-2)
before the operation request can be sent to the DATA MANAGER
by the SCHEDULER (Model-1 or the LOCK MGR (Model-2)
DATA MGR will return the value_read for a READ operation or
an Acknowledgement (Ack) for WRITE/COMMIT/ABORT operations.
TM request the RELEASE of all locks (to the SCHEDULER/LCOK_MGR RELEASE)
on or before the transaction ENDs.
TWO PHASE LOCKING (2PL): For a given transaction, all locks must be
acquired before any are releasing any.
STRICT TWO PHASE LOCKING (S2PL): All locks are RELEASE request are made
at transaction END (COMMIT/ABORT)
CONSERVATIVE TWO PHASE LOCKING (C2PL): All locks are ACQUIRED before any
operation request are sent to the DATA MGR.
i.e.,
lock acquisitions
^ .^.
| . .
| . .
| . .
| . .
| . .
| . .
General 2PL ._______________________________________ time
| | |
BEGIN LOCKPT END
lock acquisitions
^ .. . . . . . . .
| . .
| . .
| . .
| . .
| . .
| . .
C2PL ._______________________________________ time
| |
BEGIN END
and LOCKPT
lock acquisitions
^ . . . . . . . ..
| . .
| . .
| . .
| . .
| . .
| . .
S2PL ._______________________________________ time
| | |
BEGIN LOCKPT END
lock acquisitions
^ . . . . . . . . . . . . . . .
| . .
| . .
| . .
| . .
| . .
| . .
CS2PL ._______________________________________ time
| |
BEGIN END
and LOCKPT
All are 2PL of course.
DEADLOCK MANAGEMENT
Deadlocks can occur when a WAITING POLICY is used for CC.
How can deadlocks be
PREVENTED (precluding the possibly of one ever happening),
AVOIDED (taking corrective action when one is imminent) or
RESOLVED (detecting existing deadlocks and resolving them (periodically)
First, a useful tool for studying deadlock mgmt is the WAIT-FOR-GRAPH or WFG which has
a node for each transaction that is involved in a wait an edge
from each waiting transaction (the holder)
to the transaction it is waiting for (requester).
T1 ------> T2
^ /
`-------'
WFG in the 2PL example above
Formal Definition: A DEADLOCK is a cycle in the Wait-For-Graph.
SAVINGS
It is sometimes useful to label edges with item involved T1 -----------> T2
^ /
This is called a binary cycle (2 transactions) `------------'
CHECKING
Cycles can have lengths greater than 2, of course:
T1 is waiting on T2 for a a b c
T2 is waiting on T3 for b T1--->T2--->T3--->T4
T3 is waiting on T4 for c ^ /
T4 is waiting on T1 for d `--------------'
the cycle length is 4. d
Deadlock Prevention
Action is taken to prevent even the possibility of a deadlock.
E.g., flu shot is a preventative action
(you may not have gotten the flu anyway)
1. Transaction Scheduling (all trans obtain access to all needed data items
before beginning execution.) prevents deadlocks.
(C2PL is a transaction scheduling mechanism)
Comes from construction zone management.
--------------. .----------
`. .' ____
____ `-------------' ____ <____|
<____| ____ ____ <____| _ _
<____| <____|
___ .--. .--. _____________ .--.
|___>(stop |go > \ |go >
-------`--' `--' `-`--' -----
`.O.' O.'
| |
^ ^
Need GO permission from both flag persons before proceeding into the zone.
2. SERIAL EXECUTION prevents deadlocks:
3. WOUND_WAIT & WAIT-DIE deadlock prevention methods
are timestamp-based prevention methods to decide
who can wait whenever a conflict arizes.
timestamp = unique ordinal transaction-id or "stamp"
(usually start-time i.e., DoB (Date of Birth)
so we can talk about one transaction being "older than"
another, meaning its timestamp is lower (born before the other) )
WOUND_WAIT: When a requesting trans (the requester) finds that the
requested data item is held by another trans (the holder):
if REQUESTER is OLDER (has lower timestamp),
then REQUESTER WOUNDS HOLDER,
else REQUESTER WAITS;
where WOUND means holder is given a short time to finish with the item,
otherwise it must restart (bleeds to death from the wound?)
NOTES:
WW a pre-emptive method.
The only waits allowed are YOUNGER REQUESTERS waiting for OLDER HOLDERS.
Assumes blocking waits only (e.g., if requester waits, it waits idly)
Never a cycle in WFG. Why not?
M Luo, M.S. 87; M Radhakrishnan, M.S. 92; and T Wang, M.S. 96 advanced this
protocol as their M.S. theses (and also published their results).
WAIT_DIE: When a requesting trans (the requester) finds that the
requested data item is held by another trans (the holder):
if REQUESTER is OLDER (has lower timestamp),
then REQUESTER WAITS,
else REQUESTER DIES;
NOTES:
WD is non-preemptive.
W. Yao modified wound-wait and wait-die to allow forward and backward waiting
by introducing an additional parameter assigned to each waiting trans,
called "orientation". (Info Sci Journal, V103:1-4, pp. 23-26, 1997.)
Wait-Die and Wound-Wait
deadlock AVOIDANCE (avoiding deadlock when it's about to happen)
1. Request Denial: Deny any request that would result in deadlock
(must check WFG cycle every time a wait is requested)
DEADLOCK DETECTION and RESOLUTION TECHNIQUES
All Deadlock detection/resolution protocols revolve around Wait-For-Graph (WFG)
put an edge in WFG representing each new wait,
periodic analysis WFG for cycles and then
select a victim trans to restart from each cycle.
Victim selection criteria can vary.
Some system use "youngest" others use "oldest",
others use "been waiting the longest time" and still
others use "been waiting the shortest time".....
TIMEOUT
1. When a TRANSACTION BEGINs, a timeout clock is set.
If transaction is still active when the timeout clock runs down to zero,
then transaction is aborted.
2. When a TRANSACTION has to WAIT, a timeout clock is set.
If transaction is still waiting when the timeout clock runs down to zero,
then transaction is aborted. (reduces timeout clock overhead)
Potential improvements probably leap to mind for 2, e.g.,
2.1 only set timeout clock if the item requested is already in LockTable
(meaning that there is already a wait in progress for that item).
General Notes:
Deadlock management is still a very important area of research and there's still
much to be done, even though there are many methods described in the literature.
One reason: Deadlocks which involves data distributed across a network
are a much harder problem than centralized deadlocks.
Locking, as a concurrency control method,
REQUIRES a CENTRALIZED lock-table object (logically at least)
with a SINGLE THREADED lock manager (a monitor or critical section)
The Locking protocols presented above are called PESSIMISTIC
OPTIMISTIC Locking assumes there will be no conflict
and then tests that assumption for validity at COMMIT time.
If assumption proved false, entire (completed) transaction is aborted.
OTHER CONCURRENCY CONTROL METHODS
---------------------------------
BASIC TIMESTAMP ORDERING (BTO) is a RESTART POLICY (no waiting)
Each transaction gets a unique timestamp (ts) (usually arrival time)
Note that timestamps were introduced already in the context of deadlock
management schemes to accompany Locking Concurrency Control. Now we are going
to use timestamps for concurrency control itself! (not deadlock management
will be necessary here since the CC method is "restart" method, not "waiting"
method).
BTO SCHEDULING DECISION:
When Scheduler receives a READ request, it rejects it iff a
YOUNGER trans has written that item.
When Scheduler receives a WRITE request, it rejects it iff a
YOUNGER trans has written or read that item.
NOTES on BTO:
Timestamp is usually "arrival time" but can be ANY linear ordering.
When the SCHEDULER rejects a request, the requesting trans restarts.
BTO must also schedule accepted operations to DM in ts order also.
In order to make the SCHEDULE decisions,
scheduler must know ts of last transaction to write each item and
ts of last transaction to read each item.
Thus, the system must keep both of these "data-item timestamps"
for EVERY data item, x, in the system,
a data item read timestamp, rts(x), and
a data item write timestamp, wts(x).
Usually kept right with the data item itself:
.--------------------+--------+--------.
| data_item(x) | rts(x) | wts(x) |
`--------------------+--------+--------'
That takes a lot of extra space
e.g. 10 billion data items (records) in DB (not uncommon),
data-item-timestamps may take up 160 GB,
assuming 8 byte ts (note that 4 bytes won't do).
BTO is a pure RESART policy (uses only restart conflict resolution.
BTO CC is deadlock free (since waiting is not used).
BTO, however, can experience livelocks (trans continuously
restarting for the same reason over and over).
BTO results in lower concurrency in central systems (studies have shown)
BTO, works better in distributed systems. Why?
All the Scheduler has to have in order to make the scheduling decision
when a transaction, t asks for a data item, x, is the
transaction-timestamp, ts(t), and
data-item-write-timestamp, wts(x) (for a read request) and the
data-item-read-timestamp, tts(x) (for a write request)
DISTRIBUTED BTO SCHEDULERS NEED NO INFORMATION FROM OTHER SITES
ts(t) comes along with the transaction, t (part of its identifier)
wts(x) and rts(x) are stored with the data item, x, at that site.
SITE1 site where t(r[x],w[y]), arrives
t
/ \
r[x],ts(t) / \ w[x],ts(T)
/ \
/ \
_________________ v v _________________
|x, rts(x), wts(x)| SITE2 SITE3 |y, rts(y), wts(y)|
`-----------------' `-----------------'
SCHED2 at SITE2 schedules r[x] SCHED3 at SITE3 schedules w[y]
if ts(t) > write_ts(x), if ts(t) > MAX{ wts(y), rts(y) }
else rejects r[x] else rejects w[y]
Again, one can see, that there is system overhead in BTO since
EVERY DATA ITEM has to have a read_timestamp (rts) and a write_timestamp (wts)
each could be 8 bytes, so additional 16 bytes of system data for each record
A large database can have billions and even trillions of DATA ITEMS (Records).
By contrast, a distributed 2PL scheduler must maintain Lock Table at some 1 site
Then any request coming from any site for data at any other site would have
to be sent across the network from the request-site to the LT-site and then
the reply wold have to be sent from the LT site to the data site(s).
However, LT is not nearly as large
One further downside to BTO:
The BTO Scheduler must submit accepted conflicting operations to DM in ts-order
BTO could issue them in a serial manner:
Wait to issue next one until previous is ack'ed
That's very inefficient! (serial execution is almost always inefficent
Usually a complex "handshake" protocol is used to optimize this.
DO NOT CONFUSE BTO with Wound-Wait or Wait-Die Deadlock Management!
Both are timestamp-based, but
BTO is Concurrency Control Scheduler, while
WW/WD are deadlock prevention methods (to go with a, e.g., 2PL scheduler)
OPTIMISTIC concurrency control assumes optimistically, no conflicts will occur.
Transactions access data without getting any apriori permissions.
But, a Transaction must be VALIDATED when it completes (just prior to COMMIT)
To make sure its optimistism was correct. If not, it must abort.
VALIDATION
Note: validation module must be single threaded (monitor or mutually excluding):
1. A commiting transaction is "validated"
if it is in conflict with no active transaction
else it is declared "invalid" and must restarted.
So a transaction must list the data items it has accessed
and the system must maintain an up-to-date list of
"active transactions" with t(ts) and accessed data-item-ids?
Basically, an optimistic concurrency control can be thought of as being BTO,
in which the "timestamping" is done at its commit time, not at start time,
(transaction is validated iff it is not "too late" accessing any of its data)
since active transactions are younger than the committing transaction.
Validation must be an atomic, single threaded process
Therefore if any active trans has already read a item
that the committing trans wants to write (all writes are
delayed until validation) it's too late for committing
trans to write it in ts order and thus, must be restarted.
Note that this is non-prememptive optimistic CC.
There are other optimistic methods. It is possible to devise a
pre-emptive optimistic CC protocol in which the validating
trans actually "kills" all active transactions which have
read items it needs to write.
How would a pre-emptive OCC compare to the standard non-pre-emptive protocol?
Got an idea? It could be a paper topic!
CVS (Concurrent Versioning System) for software version control is
an optimistic CC system!
Need to write a simple Concurrency Controller (Scheduler) for your boss?
A very simple and effective SCHEDULER (no critical section coding required).
Cooperating TMs do "self service" 2PL using the ethernet LAN CSMA/CD protocol
CSMA/CD = Carrier Sense Multiple Access with Collision Detect
CSMA/CD-Concurrency Control
When cooperation TM, t, seeks access to item, x, it will:
1. Check availability of x (analogous with "carrier sensing")
(Is another trans using it in a conflicting mode?).
2. If x is available, set lock on x (TM does this itself! in a LockTable File)
else try later (after some backoff random period).
3. Check for collision (with other cooperating trans that might have been
setting conflicting locks concurrently (analogous to "collision detecting")
4. If collision, the TM removes all lock(s) it set
and tries again later (after some backoff period).
5. Release all locks after completion (COMMIT or ABORT) (Strict 2PL).
(This is a S2PL protocol WITHOUT an active scheduler).
To make it even simpler, we can dispense with the carrier sense step:
CD-Only Method:
When a cooperation trans, t, seeks access to a data item, x, it must:
2. Set lock.
3. Check for collisions.
4. If there is a collision, remove all locks and try later (after backoff).
5. Release all locks after completion (COMMIT or ABORT).
(This is also a S2PL protocol without an active scheduler).
In fact, one can write this code in SQL, something like:
Assume there is a file acting as the LockTable, called LT, such that
LT(TID, RID, MODE) where TID is column for the Trans' ID number,
RID is column for Record's ID number,
and MODE is either "shared" or "exclusive" (S or X).
Below shows some of the code for a CD-like CC Method
(what additional code would be required for a CSMA/CD like method?).
If T7 (transaction with TID = 7) needs an XLOCK on the data
item with RID = (53,28), the TM for T7 issues:
BEGIN
INSERT INTO LT VALUES ('7', '(53,28)', 'X');
V = SELECT COUNT(*) FROM LT WHERE RID='(53,28)';
IF V = 1, THEN COMMT
ELSE ABORT (try again later),
DELETE FROM LT WHERE TID='7';
Another CC method that uses Cooperation Transaction Managers and no Scheduler:
(Note this technology - together with a later refinement called ROCC, is patent
pending concurrency control technology at NDSU. In reverse time order, it
can be called ROCC and ROLL Concurrency Control)
ROLL: Request Order Linked List Concurrency Control
(a further enhancement of this approach, ROCC and MVROCC
are patent pending technologies at this time by NDSU).
ROLL is a generalized model which includes aspects of locking
and timestamp ordering as well as other methods.
ROLL is:
1 non-blocking (no idle waiting)
2 restart free and thus livelock free.
3 deadlock free
4 self-service for trans mgrs
(no active singlethread scheduler other than an enqueue operation)
5 very parallel (little critical sectioning)
6 ROLL is easily distributed
Data items are requested by a transaction using a REQUEST VECTOR (RV) bit vector
Each data item is mapped to specific bit position using an assignment table
(Domain Vector Table or DVT).
A 1-bit at a position indicates that that item is requested by the trans and a
0-bit means it is not requested.
- if read and write modes are to be distinguished, use 2 bits, a read-bit and
a write-bit for each item.
- could use a bit vector for the items to be read, the ReadVector
and another bit vector for the items to be written, the WriteVector
ROLL can be thougth of as an object in which the data structure
is a queue of Request Vectors, one for each transaction.
.----------.
|010010...0| <- -Ti
`----------'
^
|
.----------.
|010010...0| <- -Tj
`----------'
^
.
.
.----------.
|010010...0| <- -Tk
`----------'
^
|
.------.
| tail |
`------'
with 3 basic methods:
POST: For establishing requests
CHECK: For checking availability
RELEASE: For releasing control
ROLL OPERATIONS
---------------
POST (allows a transaction to specify its requests)
POST is an atomic "enqueue" operation (the only atomicity required
the only critical section)
CHECK (determines availability)
CHECK returns the logical OR of all RVs ahead of requesters POSTED vector
in the ROLL.
- The vector resulting from this OR operation is (called the "Access Vector"
or AV and represents a "lock table" for that transaction
(specifies which items are available and which are not).
- If we have a separate ReadROLL and WriteROLL, in order
to determine what can be read, a trans CHECKs the WriteROLL only and
to determine what can be written, a trans CHECKS both WriteROLL and ReadROLL
- reCHECKing can be done any time
eg, when trans finishes data items found available on first CHECK,
it would issue another CHECK expecting that some of the previously
unavailable items have become available in the interim.)
RELEASE: (releases dataitems to the next requester)
RELEASE set some or all of trans' 1-bits to 0-bits.
VALIDATE: (can be added for optimistic transactions)
1. Optimistic transactions would read data without POSTing
2. Optimistic transactions would buffer all writes until commit
3. Upon reading x, optimistic transaction would record rts(x),
by copying the current ROLL tail-pointer.
4. before commit, Optimistic trans would have to VALIDATE
VALIDATE:
POST its request vector
CHECK the intevening ROLL interval from its vector to its reads
If there are no intervening writes in conflict with its reads, the
Transaction is valid and can be committed,
else it must be restarted.
A garbage collector can operate in the background to remove zeroed vectors.
PROBLEMS?
Two potential problems leap immediately to mind immediately.
1. Excessive Vector length for fine data item granularity.
One-bits are most space efficient way to indicate a needed item.
Zero-bits are unnecessary except to maintain positional matchup.
SOLUTIONS:
Partitioning DB (eg, by files or even by ranges of records within files)
Designate a separate ROLL for each partition
.----------. .----------.
T1|010010...0| -->|010010...0|-null
`----------' `----------'
^ ^
| |
.----------. | .-
T2|010010...0| - - - - - - - -. - >|0
`----------' | `-
^ |
. |
. |
.----------. .----------.
Tm|010010...0| -->|010010...0| - - . .
`----------' `----------'
^ ^
| |
.------. .------.
|File-1| |File-2|
To ensure a serialization partial order (trans-1's vector is POSTed ahead of
trans-2's in 1 ROLL, therefore it must be POST ahead in all ROLLs)
i. to insure proper POSTing, entire series of POSTs for a given transaction
can be made atomic.
or
ii. a POST-ROLL can govern POSTing. A separate POST-ROLL is maintianed.
- Each bit position in the POST-ROLL represents a DB partition.
- Each new transaction POSTs a special vector (POST-RV) indicating the
partitions to which it needs to POST (using 1 bits at those positions).
- Then CHECKing POST-ROLL determines which partitions can be POSTed to.
- Allows multithreading of POSTs. Multi-level partitions are possible.
PROBLEM 2. Multiple ORs in CHECK operation would be slow (if ROLL is long).
A background process may Precompute and save all AVs in 1 pass down the ROLL.
(the ROCC technology adds a major improvement to multiple CHECK operations)
PROBLEM 3. Basic Roll does not accommodate dynamic transactions
(the entire access set of data items is not known ahead of time
(e.g., "give $1000 bonus to all red-headed professors").
SOLUTION: Allow rePOSTing
But then the intervening interval between the rePOST and the original POST
must be examined for conflicts. If there are unresolvable conflicts,
RESTART is necessary.
(ROCC technology provides a sophisticated way of doing this)
DISTRIBUTED DATABASE MANAGEMENT SYSTEM, DDBMS consists of multiple DBMSs connected over a network
- If all DBMSs are same (same vendor), it's called a homogeneous DDBMS.
(and we will assume the vendor supplies the software to make the separate
DBs operate as one).
Else , it is call a heterogeneous or federated DDBMS
- A heterogeneous DDBMS requires software on top of the of local DBMSs
to intercept global transactions (those accessing data >= 2 sites)
We assume that this Global harness is unable to change internal workings
of local DBMSs (local DBMSs are binary licensed)
But we do assume that the Local DBMSs are set to maintain local
serializability
DDBMS TRANSACTION MANAGEMENT (some issues)
DISTRIBUTED DEADLOCKs are more difficult to manage than local deadlocks,
since it may be that no site can detect the
global deadlock by itself.
e.g., Assume an agent represents a trans at each site where it needs data
.- - -At A, a T1-agent is waiting on T2 for x
.
x
SITE-A T1 -- >T2
^ | a T2-agent at A is waiting for a T2-agent at B
| | (e.g., to make an update at B based on a read at A)
| v and a T1-agent at B is waiting for a T1-agent at A
SITE-B T1< -- T2
y
.
`- - -At B, a T2-agent is waiting on T1 for y
Requires more communication, since no local WFG will have a cycle.
solutions are based on an expensive WFG forwarding process.
DISTRIBUTED COMMITMENT is more difficult to manage than local commitment.
Recall, that before a DDBMS can ACKnowledge the COMMIT request of a trans,
it must be sure that all agents can successfully COMMIT (for DURABILITY)
If even 1 agent cannot guarantee DURABILITY, COMMIT must be rejected.
A solution,
TWO PHASE COMMIT (2PC)
Similar to Marriage Ceremony: unless all parties commit, there's no marriage
and there are 2 phases to the commitment, a
VOTE phase and a
DECISION phase.
In marriage, a COORDINATOR (minister,imam,rabbi,priest...) initiates a
VOTE_PHASE: all PARTICIPANTS vote YES or NO
(implicit in this is DURABILITY, the assumption that a yes means you are
ready to get married no matter what - can't change your mind!)
Participants who vote YES must be in the READY state (either to get
married or not depending upon the decision made by the coordinator))
Coord: Tracy, do you take Pat to be your lawfully wedded spouse?
Tracy: Yes!
Coord: Pat, do you take Tracy to be your lawfully wedded spouse?
Pat: yes!
Coord: Anyone here present know of any reason why these 2 should not be joined matrimony?
All: silence (means yes).
DECISION PHASE: the Coordinator counts votes and makes the DECISION.
If all votes are YES, COORD decides COMMIT and broadcasts that to participants
I pronounce you married!
If even 1 vote is NO, coordinator decides ABORT and broadcasts that:
Sorry folks. Hope you saved your gift receipts.
In DDBMSs we use the same 2PC protocol to commit distributed transactions.
Upon receiving a COMMIT request from a transaction:
COORD asks agents to VOTE.
Agent votes YES only if it can force READY log record
(with enough detail to COMMIT or ABORT) else votes NO and aborts.
COORD gets all YESs, then DECIDES COMMIT (forces COMMIT log record)
else DECIDES ABORT (forces ABORT log record)
In either case, informs all agents to do the same.
Whenever the system fails,
If a transaction has a BEGIN in the log, but no END record,
if a decision log record is found, RESTART PROCEDURE can finish 2PC process
else it must ABORT the transaction and send that decision to all.
2PC STATE TRANSITION DIAGRAM:
C = Coordinator
Si = Subordinate-i (agent-i)
0. SEND 5 ALL (NOT ALL) YES, FORCE COMMIT (ABORT) LOGREC, Move to COMMIT (ABORT) STATE
"PREPARE" 6 SEND COMMIT to all (ABORT to all NON-NO voters)
C C------------------->C C gets ACKs
v ^ v ^ 11 WRITE END RECORD LOCALLY
| | | | 12 FORGETS TRANS
| | | |
PREPARE YES or NO COMMIT or |
| ABORT ACK
/ \ | / \ |
S1 S2 | S1 S2 |
`----`---->' `----`---->'
Si READY TO COMMIT (ABORT), Si->COMMIT (ABORT) STATE
1 FORCE "READY"(ABORT) to LOG 7 FORCE "COMMIT" (ABORT) LOGREC
2 ENTER READY STATE 8 SEND ACK (NON-NO's SEND ACK)
3 SEND YES (NO) 9 COMMIT(ABORT) LOCALLY
4 (ABORT LOCALLY) 10 FORGET TRANS
5 (FORGET TRANS)
DISTRIBUTED CONCURRENCY CONTROL IS HARD
Indirect conflict is a problem unique to Heterogenous DDBMSs
Let T1 be a local trans (not even known to the heterogeneous DDBMS) executing
concurrently with 2 global transactions, T2 and T4:
The only ordering required is that
T2 must r2(a) first, then w2(c)
T4 must r4(c) first, then w4(b) (both are read-modify-write sequences)
. - > -.
: :
SITE2: T2: r2(a), w2(c)
. .
. .
. .
w1(a) a .
SITE1 T1:^ c SITE3
`r1(b) b .
. .
. .
. .
SITE4: T4: w4(b), r4(c)
: :
` - < -'
DSCHED sees no T2, T4 conflict (only knows r2(a) is executed before w2(c))
DBMS1 (SITE1) see no T2, T4 conflict, so it could interleave:
w4(b) 1st
r1(b) 2nd
w1(a) 3rd
r2(a) 4th so T4 < T1 < T2
DBMS3 (SITE3) could interleave: w2(c) 1st
r4(c) 2nd so T2 < T4
Execution is NOT SERIALIZABLE
but no DDBMS could detect it
assuming each local system is an autonomous binary licensed COTS product).
TICKET SOLUTION force all pairs of distributed transactions
into conflict where ever they both access data by
forcing each to write its id to a bogus data item called
site ticket
Since both are writing to a common data item,
they are in conflict at that site and therefore
HDDBMS can enforce a particular order.
Let the site ticket at SITE-i be ti
T2: w2(t1),w2(t3),r2(a),w2(c) SITE2
\ \ ^ \
\ \ / \
\ \/ \
\_____/\________ \
/\ \ \
.--------- > a v v v
T1: r1(b),w1(a) SITE1 t1 t3 c SITE3
^--------------- b ^ ^ ^
^ / / /
\/ ______/ /
/\ / /
/ X /
____/ / \ /
/ / \ /
T4: w4(t1),w4(t3),w4(b),r4(c) SITE4
HDDBMS now sees a conflict between T2 and T4 so it decides an ordering
e.g., T2 < T4
HDDBMS forces w2(t1) to be ACKed before w4(t1) is sent and the
HDDBMS forces w2(t3) to be ACKed before w4(t3) is sent and then
DBMS1 will have to order T2 before T4 at SITE1 and
DBMS3 will have to order T2 before T4 at SITE3.
**************************************
APPENDIX: More on Concurrency Control.
-------------------------------------
HYDRO TRANSACTION MODEL:
This global trans processing model is assumed.
T1 T2
| | ...
V V
s: TM1 TM2
/ | \ / | \
/ | \ / | \
/ | \ / | \
1 2 n 1 2 n
T T T T T T
1 1 1 2 2 2
| | | | | |
v v v v v v
.-------------------------------
| HYDRO MODULE |
`-------------------------------'
| | | | | |
v v v v v v
1 1 1 2 2 2
T T T T T T
1 2 n 1 2 n
| | | | | |
v v v v v v
LOCAL HYDRO LOCAL HYDRO
DRIVER1 DRIVER2
| |
V V
DBMS1 DBMS2
Ti's are Global Trans Mgrs
i
T subtrans of Tj running at site i.
j
HYDRO protocol
At each site ther is a Local Trans ROLL (LROLL) and
a Global Trans ROLL (GROLL)
i
Each T is submitted to its site in
j a serializable order by HYDRO.
The LOCAL-HYDRO-DRIVER POSTs as below,
then send an acknowledgement to HYDRO
(guaranteeing POST order and serialization
p.o. compatible).
POSTing:
Local Trans POST to the LROLL only, but get a
"position" in both ROLLs (read both tail-pointer values).
Global Trans POST to the GROLL only, but get a
"position" in both ROLLs.
CHECKing:
--------
Local Trans CHECK (LCHECK) GROLL only (conflicts between
pairs of local trans are the responsibility of the
local DBMS) If there are no conflicts submit, else wait.
Global Trans CHECK (GCHECK) both LROLL and GROLL.
If there are no conflicts, submit else wait.
RELEASE: Same.
-------
HISTORIES:
----------
Example program:
PROCEDURE P begin
Start;
temp:=Read(x);
temp:=temp+1
Write(x,temp);
Commit
END
for the purposes of concurrency control theory this program
can be represented as the "History":
H0 r1[x] -> w1[x] -> c1
where r1[x] stands for the read operation by transaction-1 on dataitem, x.
and w1[x] stands for the write operation by transaction-1 on dataitem, x.
and c1 stands for the commit operation of transaction-1.
This graph is called the Serialization Graph (SG). The nodes of SG are operations
and each edge represents the execution order of two conflicting operations.
(Note we don't even care what value gets written. The value written is an
"uninterpreted" feature or characteristic, as far as concurrency control
is concerned)
More Examples:
--------------
Consider three transactions,
T1 r1[x]--> w1[x]--> c1
T2 r3[x]--> w3[y]--> w3[x]--> c3
T4 r4[y]--> w4[x]--> w4[y]--> w4[z]--> c4
An example of a complete history (execution) for { T1,T3,T4 } as a SG is:
r3[x]-> w3[y]-> w3[x]-> c3
^ ^
| |
H1 r4[y]-> w4[x]-> w4[y]-> w4[z]-> c4
^
|
r1[x]-> w1[x]-> c1
We use simple left-to-right ordering to indicate the Von Neumann
order of execution of operations when we wish to do so
(assuming a Von Neumann machine - one which executes one operation
at a time):
Four possible Von Neumann histories over the transactions,
T1 = w1[x] w1[y] w1[z] c1
T2 = r2[u] w2[x] r2[y] w2[y c2 are:
H2 w1x w1y r2u w2x r2y w2y c2 w1z c1
H3 w1x w1y r2u w2x r2y w2y w1z c1 c2
H4 w1x w1y r2u w2x w1z c1 r2y w2y c2
H5 w1x w1y r2u w1z c1 w2x r2y w2y c2
APPENDIX 2:
----------
Another approach to ROLL: ROLL is a queue object of RVs
Methods available to transactions:
1. QUEUE(RV) {returns ROLL address of RV}
2. READ(a,b) {returns RVs from address a to b}
where a must be an address returned to the TM from a POST
and b must be an address returned to the TM from a POST or "HEAD"
("address" could be simply "position number" and HEAD could be position 0)
3. SET(a,list)
sets the bits in all positions in the list of the RV at address a to zero
where a must be an address returned to the TM from a POST
list can be any comma list of numbers and/or intervals
Then TMs are responsible to compute AV from the link list returned by READ(a,HEAD)
and TMs are responsible to determine validity of a rePOST from the return of READ(a,b)
Inefficient due to massive return parameters from READ.
Background daemon does the garbage collection periodically.
Could have an AV-Daemon accepting FastREAD requests from transactions, maintaining AVs
and returning the appropriate AV immediately. AV-Daemon has read access to all RVs.
May be a good model for DataCycleROLL.
How would that work in a Beowulf Cluster such as MiDAS?
- one transaction manager per node and round robin assignment of arriving
transactions to node TMs
- stripe data per file in non-random fashion so that a ROLL can be
partitioned hierarchically and all CHECKing done in parallel and targeted
at only the stipes containing items requested
Another view of NON-PREDECLARATIVE TRANSACTIONS: (DYNAMIC TRANSACTIONS)
THEOREM: Serializability and restart-free requires predeclaration.
If no predeclaration requirement, then the following can happen:
1. Transaction, T, declares need for exclusive use of data item, D.
2. Transaction, S, declares need for exclusive use of data item, E.
3. Transaction, T, declares need for exclusive use of data item, E.
4. Transaction, S, declares need for exclusive use of data item, D.
We assume request 3 was made based on the results of request 1, request 4
was made based on the results of 2.
DEADLOCK results, requiring RESTART. (Note: If 1 or 2 were delayed in
anticipation of 3 & 4, 3 & 4 would have had to been predeclared.)
Thus, in order to accommodate dynamic trans, restarts must be allowed.
We accommodate dynamic trans by allowing a trans to POST more than once.
ASSUME:
All write requests delayed until final POST. Each additional POST by a trans
is a superset of its predecessor. Maintain correctness: examine accesses
between POST1 & 2. For T1, POST1 includes read of "a" T1(r,a), POST2
access to "b"
5 CASES: compare each case with 2PL (waiting policy) & BTO (restart policy)
CASE 1: No intervening writes to "a", no intervening conflict on "b".
CASE 2: Intervening T2(w,a), T1(r,b), but no intervening writes to "b".
CASE 3: Intervening T2(w,a), T1(r,b), intervening write, T2(w,b) on "b".
CASE 4: Intervening T2(w,a), T1(w,b), 1st intervening b-access is T2(w,b).
CASE 5: Intervening T2(w,a), T1(w,b), 1st intervening b-access is T2(r,b).
We will compare each case with 2PL (waiting policy) & BTO (restart policy)
DYNAMIC TRANSACTIONS:
CASE 1: No intervening writes to "a", no intervening conflict on "b".
"a" "b"
.----------.
T1->|100000...0|POST1
`----------'
rw rw ^
|
.----------.
T2->|000010...0|
`----------'
rw rw ^
|
.
.----------.
T1->|100010...0|POST2
`----------'
"a" "b"
.----------.
T1->|000000...0POST1
`----------'
Zero POST1. rw rw ^
(rereading would |
give same value) .----------.
Serialization T2->|000010...0|
order: T2, T1) `----------'
rw rw ^
|
.
.----------.
T1->|100010...0|POST2
`----------'
(2PL) requires no restart
(no deadlock possible).
(BTO) no restart since neither
T2(r,a) nor T1(w,b) in conflict.
CASE 2: Intervening T2(w,a), T1(r,b),
but no intervening writes to "b".
"a" "b"
.----------.
T1->|100000...0|POST1
`----------'
rw rw ^
|
.----------
T2->|010010...0|
`----------'
rw rw
.----------.
T1->|100010...0|POST2
`----------'
"a" "b"
.----------.
T1->|100010...0|POST1
`----------'
rw rw ^
|
Move T1(r,b) .----------.
POST2->POST1. T2->|010010...0|
Zero POST2. `----------'
(Serialization rw rw ^
order: T1, T2) |
.
.----------.
T1->|000000...0|POST2
`----------'
(2PL) requires no restart. However, T1
cannot release T1(r,a) until after the
POST2 point, even if it is done reading
a (idle wait).
(BTO) requires T1 to restart if T2(w,a)
execution precedes T1(r,a)
CASE 3: Intervening T2(w,a), T1(r,b),
intervening T2(w,b) on "b".
"a" "b"
.----------.
T1->|100000...0|POST1
`----------'
rw rw ^
|
.----------.
T2->|010001...0|
`----------'
rw rw ^
|
.
.----------.
T1->|100010...0|POST2
`----------'
"a" "b"
.----------.
T1->|100010...0|
`----------'
rw rw ^
|
Move T1(r,b) to POST1. .----------.
(Read "b" before value) T2->|010001...0|
`----------'
Zero POST2. rw rw ^
Serialization order: T1,T2 |
.
.----------.
T1->|000000...0|
`----------'
Need to read the before-value of the T2(w,b) (this possibility can be easily
provided with ROLL by delaying REMOVal of writes just one additional
generation.).
2PL: deadlock (T2(w,a) blocked by T1(r,a) & T1(r,b) blocked by T2(w,b))
Restarting required to break this deadlock. (If 2PL accompanied by
multiversioning, no restart is required, however, adding multiversion
support to 2PL much more expensive).
BTO restarts T1(r,b) since it is in conflict with T2(w,b).
CASE 4: Intervening T2(w,a), T1(w,b), 1st intervening b-access is T2(w,b).
.----------.
T1->|100000...0|POST1
`----------'
rw rw ^
|
.----------.
T2->|010001...0|
`----------'
rw rw ^
|
.
.----------.
T1->|100001...0|POST2
`----------'
.----------.
T1->|100001...0|
`----------'
rw rw ^
|
Move T1(w,b) to POST1. .----------.
(T1(w,b) not written T2->|010001...0|
since obsolete) `----------'
rw rw ^
Zero POST2. |
alization order: T1,T2 .
.----------.
T1->|000000...0|
`----------'
T1(w,b) can be discarded (written to T1
workspace as if completed) since
T2(w,b) overwrites it and no trans was
supposed to have read T1(w,b). T1(w,b)
should be written to T1 workspace for recov.
2PL deadlocks, since T2(w,a) blocked by
T1(r,a) & T1(w,b) by T3(w,b). Restart
required. (no mechanism in 2PL to note
T1(w,b) unneeded).
BTO requires restart, T1(w,b) would
conflict with T2(w,b).
CASE 5: Intervening T2(w,a), T1(w,b),
1st intervening b-access is T2(r,b).
.----------.
T1->|100000...0|POST1
`----------'
rw rw ^
|
.----------.
T2->|010010...0|
`----------'
rw rw ^
|
.
.----------.
T1->|100001...0|POST2
`----------'
.----------.
T1->|000000...0|
`----------'
rw rw ^
|
Must restart (at least .----------.
move POST1 ahead to T2->|010001...0|
POST2 and reread "a") `----------'
rw rw ^
Zero POST1. |
alization order: T2,T1 .
.----------.
T1->|100001...0|
`----------'
T1(w,b) cannot be moved to POST1 since
T3(r,b) may already have read previous
value of "b". Restart needed (at least
patial, redo T1(r,a))
2PL & BTO would also result in restart
DYNAMIC TRANSACTIONS
ROLL 2PL BTO
CASE_PROBLEM___________________________
idle wait NO NO NO
1
restart NO NO NO
_______________________________________
idle wait NO YES NO
2
restart NO NO YES
_______________________________________
idle wait NO YES NO
3
restart NO YES(deadl) YES
_______________________________________
idle wait NO YES NO
4
restart NO YES(deadl) YES
_______________________________________
idle wait NO YES NO
5
restart YES YES(deadl) YES
_______________________________________
YET ANOTHER WAY TO LOOK AT DYNAMIC TRANSACTIONS
It is useful to think about what type of dynamic-ness there is,
and how frequent each type might be.
Define the baseset, readset and writeset of a transaction as the set
of all data items accessed in any manner, read, written, respectively.
Then transactions can be categorized as follows, with respect to dynamism:
SSS: Static baseset, static readset,
static writeset
These are standard static trans
SSD: Static base, stat read,
dyn write sets
These are predominant
dynamic trans??
eg, Double sal readheaded emps.
- baseset=all emp.records
- readset=all emp.records
- writeset=all emp.records
where hair=red
Note that 2PL would have to writelock all emp.records and hold them until
lockpoint (at the very least - maybe until commit, if strictness is used).
ROLL would need to set writebit for all emp.records, but could release
non-redhead records immediately (even with strictness).
SDS: eg,
IF status=WAGE, AMT=HRS*5, ELSE AMT=300
- baseset=all emp.records + status
- readset=all emp.recs or empty + status
- writeset=all emp.records
2PL would have to writelock all emps, to lockpt & readlock status.
ROLL would have to writelock all emps, only until write completed and
readlock status.
SDD: If Status=bonus, write AMT=AMT+5, else PRINT all emps
ROLL would readlock status & writelock base (early release, if unneeded)
2PL would readlock status & read/write lock emps..
DSD: If Status=1_time, write AMT=500.
ROLL would readlock status and writelock base (early release, if unneeded)
2PL would readlock status and write/no lock emps.
DDS: Write to TOTAL, sum of first emp.AMTs such that sum <= 1000.
ROLL could writelock TOTAL and readlock
all emp (release rest as soon as total is reached).
2PL could readlock emps 1-at-a-time until total is reach, writelock total.
DDD: ?????
PARALLELISM:
ROLL operations can be done in parallel on the same ROLL data structure except
that all POSTs must be in partial order.
APPENDIX-2
----------
D-ROLL: An Extension of ROLL for Dynamic transactions
1. INTRODUCTION
Database management systems (DBMS) are usually modeled into three components []:
transaction manager, scheduler and data manager, from concurrency control point of view.
Concurrency control methods are used to maximize the number of concurrent operation
executions in the system while guaranteeing correctness of transactions. When operations
from different transactions are submitted to the scheduler, the scheduler determines the
execution sequence of the operations using its embedded concurrency control methods. So
far, various methods are proposed, and may be classified into three categories: locking,
timestamp ordering (TO) and serialization graph testing (SGT).
We proposed ROLL (Request Order Linked List), which eliminates the need of
scheduler by introducing ROLL object and three simple operations (POST, CHECK and
RELEASE) performed by transaction manager. The ROLL method has the advantages of
deadlock and livelock free. It also minimizes waiting and restarting, compared to 2PL and
TO. Furthermore, experiments showed that ROLL has a significant performance
improvement in terms of transaction response time.
This paper discusses an extension of ROLL, D-ROLL. The proposed extension
removes the requirement of predeclared transactions, which is the major weakness of
ROLL. By adding RPOST and RCHECK operations to the ROLL method, D-ROLL may
handle dynamic transactions while inheriting ROLL’s advantages. Predeclared
transactions predeclare their readsets and writesets before executions of all their read/write
operations. Dynamic transactions allow submitting their readsets and writesets after some
of them be excuted.
The paper is organized as follows:
2. ROLL and its extension, D-ROLL
ROLL method used three basic data structures: request vector (RV), ROLL, and
access vector (AV). ROLL is a linked list of bit vectors. Each bit vector is called a request
vector, which represents the requested data item by a transaction. The number of bits in the
bit vector equals the number of data items in the database multiplied by the number of
access mode. The value of a bit is either one or zero. One indicates the corresponding data
item is requested by the transaction, and zero means the data item is not used by the
transaction. Each request vector has a vector pointer pointing to its precedent request
vector. The RV pointer of the first RV in a ROLL is called head pointer; its value is
NULL. A tail pointer points to the last request vector of the ROLL. Another data
structure, AV, is used to represent the data items accessible for a transaction at a certain
time. Like RV, it is also a bit vector and needs the same number of bits as a RV, but it
does not contain a RV pointer. One bit in the access vector indicates the corresponding
data item is not accessible, and zero indicates accessible. The AV is calculated by a
CHECK operation. For every CHECk operation, the AV corresponding to each RV will be
calculated once. Suppose we have a database with six data items. Figure 1 shows the
ROLL and its RVs. In this example, each data item has two access modes: read and write.
Two bits are assigned to each data item – the first bit corresponds to the read access mode,
and the second bit corresponds to a write access mode. There are six data items in each
RV. Four transactions are posted to the ROLL (RV1 corresponds to transaction T1, RV2
corresponds to transaction T2, etc.). RV1 indicates transaction T1 requests access three
data items, the first, the fifth, and the sixth.
.----------.
|010010...0| <- -Ti
`----------'
^
|
.----------.
|010010...0| <- -Tj
`----------'
^
.
.
.----------.
|010010...0| <- -Tk
`----------'
^
|
.------.
| tail |
`------'
Figure 1 An example of ROLL
Three basic operations are devised for ROLL method: POST, CHECK, and RELEASE.
POST is a critical operation establishing the serialization partial order of transactions. It
links the most recent received RV as the last vector in the list. After posting a RV, the
vector pointer of this RV points to its previous RV. To calculate AV of a transaction,
CHECK simply accumulates ORing (bitwise logic OR operation) results of the bits in the
ROLL ahead of its RV. Because CHECK only needs bitwise logic operation, the execution
is fast. Figure 2 demonstrates how to calculate AV4 (the AV of transaction T4) for the
example shown in Figure 1.
.----------.
|010010...0| <- -T0 \
`----------'
^ \
|
.----------. \
|010010...0| <- -T1
`----------' > OR yields AV4
^
. /
.
. /
.----------.
|010010...0| <- -T4 /
`----------'
^
|
.----------.
|010010...0| <- -T5
`----------'
^
|
.------.
| tail |
`------'
Figure 2 CHECK for transaction T4: Calculate AV4
Once the operation on a data item is finished, the transaction does not need this data
item any more, and the corresponding bit in the RV should be changed to zero so that it can
be accessed by other transactions. Such a change is made by RELEASE operation.
We can clearly see the increased concurrency advantage of ROLL method from the
above example. All the data items requested by transaction T4 are available and thus it
can be committed at once. Using 2PL, T4 has to wait because T4 conflicts with T1 and T2.
T4 has to restart for the same reason if using TO method.
The Problem in ROLL method is its requirement of transaction predeclaration. For
example, suppose T1 is not predeclared. It submits another access request rpRV= after T4
arrives. Then we may run into trouble by using ROLL, i.e., the execution enforced by
ROLL is no longer serializable. Figure 3 shows an example of such problem.
Figure 3 ROLL is NOT serializable for dynamic transaction T1
Problems are evident in the ROLL concurrency control method. (1). Space problem: For huge
centralized databases, vectors of billions of bits are required. If the number of transactions
is large too, then it needs a lot of storage space. This is not a problem in heterogeneous
distributed database systems because, in that environment, a number of smaller ROLLs are
kept at different sites. Less space is used at each site, and the processing of the ROLL at
each site is faster, too. Even for a huge, centralized database, different methods can be
used to solve this problem. One solution is to use partition. The rational of this method is
to partition the whole database (by files) into a number of smaller sections and build one
ROLL for each section. The individual ROLL size will be reduced dramatically. In this
case, a transaction only POSTs to the ROLLs where the needed data items are present. A
constraint needed here is that the ROLLs should be kept in a serializable order; i. e., the
relative position of a vector in all of the ROLLs should be the same. Using surrogate
(hidden system-supplied permanent record identification number) or using function to
assign vector positions can maintain a serializable order of the RVs in the partitioned
ROLLs. (2). Dynamic transaction support problem: When the data items used are not
known in advance, overdeclaration strategy is used. It is to reserve all the possibly used
data items for write. Once it is determined that a data item is not used, the corresponding
bit is released immediately. In this case, most of the requested data items are not used,
while other transactions which need these data items cannot access the data.
Overdeclaration approach is a “demotion” approach. (3). The dynamic RV size problem:
The size of the RV depends on the size of the database. For a database of fixed size, there
should be no problem using a static RV, but when database size tends to change, dynamic
RV size should be supported. If there is only one ROLL for the database, one solution is to
set the vector length more than sufficient for the time being. Based on the change in the
database size, the length can be reset periodically.
For ROLL and the three basic conservative concurrency control methods,
predeclaration is required. For some transactions, the data items used may not be known
until some operations are processed first. This type of transaction is called dynamic
transaction. To apply conservative concurrency control method to dynamic transactions,
overdeclaration is often used. Although the algorithm is easy to implement, it is not a good
strategy because it lowers the concurrency. Suppose there is a file with a million records;
if we do not know which records are needed, overdeclaration may reserve the entire file
instead. Developing algorithms to handle dynamic transactions efficiently will be very
useful to improve the efficiency of both ROLL and other conservative concurrent control
methods.
The goal of this paper is to develop alternative algorithms to process dynamic
transactions. The algorithms should overcome the disadvantages of the overdeclaration
method.
For example, there is a transaction to raise salary by five percent for an employee
whose name will be entered at runtime. To process this query, the name of the employee
needs to be known first; then it can be decided whether a write is needed for a given record.
This type of transaction is called dynamic transaction. Two strategies can be used to
process dynamic transactions. One is overdeclaration. It is also called a “demotion”
approach. To use this approach, a transaction needs to reserve all the possibly used data
items. For the above query, simply declare that all the records in the employee relation are
needed for write. If an employee name does not match the user-entered name, then release
the bits corresponding to the data items of this record immediately. This method is easy to
implement, but it lowers the concurrency in data access. Because of the overdeclaration of
this transaction, no other transactions can access the whole table. The second strategy is to
use REPOST. It is also called a “promotion” approach. At the beginning, a RV
representing the initial data requirement for a transaction is posted; as more data items are
needed later, another RV will be posted for the same transaction. For a request vector with
position index i in the ROLL, the firstly posted RVi is called initial RVi, and the reposted
RVi is called rpRVi. Both initial RVi and rpRVi represent the same transaction. The
promotion approach allows concurrent execution of transactions, but it requires REPOST
algorithms to reorder of the RVs between the initial RVi and the rpRVi and make sure that
all the execution is serializable. We define REPOST and rpRVi as the following.
1. REPOST: a REPOST is the operation to post rpRVi to the ROLL.
2. rpRVi: rpRVi is reposted RVi. It represents some additional data items needed
after the processing of the initial RVi. Usually it requires more data items to
write than the initial RVi.
Here are some observations about the relationship between the initial RVi and the rpRVi.
1. The initial RV is read-intensive.
2. Whether a REPOST is needed can be determined after a transaction is sent to the TM.
3. Only after the initial RVi is processed, can rpRVi be created.
4. Due to the short time interval, the number of RVs between RVi and rpRVi is small.
5. Usually, there is only one rpRVi for each initial RVi.
6. The transaction cannot commit until its initial RVi and all the rpRVi are executed.
5.2. Goal
The goal of this paper is to develop some efficient REPOST algorithms. The algorithm
should allow concurrent execution whenever possible. Other goals are to
achieve (1). POST-REPOST compatibility, i. e., all the RVs representing
static and dynamic transactions can be posted as usual, and (2). REPOST-
CHECK concurrency, i.e., CHECK can be performed when a REPOST is in
progress or vice versa.
5.3. Basic Algorithms
5.3.1. REPOST Serializability
Although the initial RVi and rpRVi are sent to the scheduler as two RVs, they represent the
same transaction. A correct REPOST algorithm should produce a
serializable history. If rpRVi is processed just after initial RVi, then these
two RVs are equivalent to occupying the same RV position. Therefore, all
the RVs in the ROLL are processed in posting order and are guaranteed in a
serializable manner. Otherwise, we need to make sure that the RVs between
the initial RVi and rpRVi are serializable in order to produce a serializable
history. If there are no RVs between the initial RVi and rpRVi, then the
overall ROLL will be REPOST serializable (R1, Figure 5.1). Otherwise,
the ROLL is REPOST serializable only if none of the RVs in between are in
conflict with the initial RVi (R2, Figure 5.1).
If any of the RVs between the initial RVi and rpRVi are in conflict with the initial RVi, then
there will be a cycle in the SSG. In this case, REPOST algorithm should be
used to force serializable execution.
Theorem: ROLL still produces serializable history if it is REPOST serial or REPOST serializable.
To keep initial RVi and rpRVi executed in RVi rpRVi order or a REPOST serializable
order, evidently there are two ways to schedule the RVs between the initial
RVi and rpRVi. One way is to make the initial RV and rpRV meet by moving the
Figure 5.1. A REPOST serial (R1) and a REPOST serializable (R2) ROLL.
rpRVi up or initial RV down. The alternative is to keep the initial RVi and
rpRVi where they are, but move the conflicting RVs between the initial RVi
and the rpRVi just below the rpRVi or restarting those conflicting RVs. In a
regular ROLL, the REPOST status of a RV is not known until the
corresponding rpRVi is posted. To start checking conflict earlier and
prevent an initial RV from committing too early, the REPOST status of a
RV should be determined as early as possible. The ideal situation is that the
REPOST status is known at the time when a RV is posted. In fact, a
transaction’s REPOST status can be determined based on the semantics of
the query. For example, suppose we have a table called STUDENT which
contains the general information for all the students at NDSU; a query to get
all the names of the students who are currently enrolled at NDSU will not
need a REPOST. On the other hand, to increase the stipend by 10% for a
student with a name matching a value of variable, where the value of the
variable can only be determined at runtime, is a dynamic transaction. The
second query needs a REPOST because who is entitled to the increase
cannot be determined until the value of variable is determined.
5.3.2. Parser
A parser should be added to the transaction manager module. The
function of the parser is to parse the transaction after it arrives and to
determine the REPOST status for the transaction. As a matter of fact, all the
ultimate conservative concurrency control methods need a preprocessor to
parse the transaction in order to get the readset and writeset of a transaction.
To get the REPOST status, adding a few more logic processing operations
will be enough. Transactions requiring REPOST can easily be distinguished
by analyzing the user request. Usually REPOST is needed for transactions
with conditional write or conditional update where the condition is not
known until runtime.
In addition, each transaction should be assigned a unique transaction ID by the transaction
manager. For a dynamic transaction, the initial RV and all its reposted RVs
should use the same transaction ID. In effect, all relational database systems
do assign transaction ID number to transactions; therefore, no extra work is needed.
5.3.3. Data Structure Modification
The structure of the RV should be changed in order to represent the
REPOST status in the REPOST algorithm. An extra bit is needed at the
beginning of the request vector to indicate whether the corresponding
transaction needs a REPOST later. The value of the REPOST bit is either
one or zero. One means that a REPOST is needed for this transaction, and
zero means no REPOST is needed for this transaction. Figure 5.2 is an
example of the modified RV. In the illustrated RV, there are five data
items, each with two operation modes. The first bit represents read
operation and the second bit represents the write operation.
Figure 5.2. A modified RV.
In addition to the data structures used in ROLL concurrency control method, a new data
structure called CLIST (conflict list) needs to be introduced. It has the same
structure as ROLL but usually with smaller size. For each RVi which needs
a REPOST, there is a CLISTi associated with it. Each CLIST also has a
head pointer and tail pointer. The head pointer points to its initial RVi in the
ROLL, and the tail pointer points to the last RV in the CLIST. The
operations associated with the CLIST are MOVE, PASTE, and CLEAR.
MOVE performs the procedures done by REMOVE and POST operations of the regular
ROLL. The operation disconnects a RV from a ROLL and posts it to a
CLIST. All it does is to disconnect and connect RV pointers.
PASTE is similar to POST. The difference is that PASTE pastes a RV list to the ROLL
instead of a single RV. After pasting, the head pointer of CLISTi is
disconnected from the initial RVi and connected to the rpRVi.
CLEAR clears the CLIST after the CLIST has been pasted to the ROLL. The function of
the CLEAR is to get rid of the CLIST identifier from a list so it cannot be
referred or manipulated later.
The key of the algorithm is to order the conflicting RVs between initial RVi and rpRVi.
Before describing the rules to arrange the RVs, we first define REPOST conflict.
REPOST conflict: For each RVj which is after initial RVi and before rpRVi, it is in
REPOST conflict with initial RVi if it has read-write or write-write conflict
with the initial RVi.
For any RVj which is in REPOST conflict with an initial RVi (i ?j), there will be a
cycle in the SSG because of the existence of the path RVi ?RVj ?rpRVi, which is
equivalent to Ti ->Tj -> Ti, in the corresponding SSG. Different algorithms can be used to
keep the execution serializable. One is a no-wait approach; i.e., the REPOST conflicting
transactions are restarted before any processing has been done to these transactions.
Another strategy is to force the transaction in REPOST conflict to wait until the initial RVi
and rpRVi are processed. We call this kind of algorithm a conservative algorithm. It can
avoid aborting and restarting transactions and is also consistent with the goals of the basic
conservative concurrency control methods. Regarding the REPOST algorithms, firstly let
us consider the simplest case in which there is only one RVi requiring a REPOST in a
ROLL. We will develop a basic algorithm to handle this situation.
5.3.4. Conservative Algorithm
The goal of the conservative algorithm is to avoid aborting or restarting
transactions. The extra cost is the pointer assignments needed for the conflicting RVs.
3. The pseudocode for conservative algorithm is as follows:
Begin
Let rvPtr points to the first RV in the ROLL
while (rvPtr is not null and the pointed RV is not rpRV) {
If (REPOST bit of the RV is zero) //Representing a static transaction
CHECK and process this RV
rvPtr++ //Go to next RV
Continue
Else //Representing a dynamic transaction
If it is the first check on this RV
Make a copy of RV as cpRV
Check this RV and process it
For (each RV below the initial RV and RV ?rpRV and rvPtr is not null) {
If (RV is not in REPOST conflict with cpRV)
Check and process this RV
Else
Move this RV into a CLIST
rvPtr++
}
if (the next RV is rpRV)
Check and Process rpRV
Change the repost bit of initial RV to 0
Paste CLIST below rpRV
Clear CLIST
}
1.1.3.1.1.1 End
When the rpRV is being processed, other transactions can still be posted to the
ROLL. After CLISTi is pasted below the rpRVi, all the RVs after rpRVi, including the
pasted RVs from CLIST and the newly posted RVs, cannot be checked and processed by
the current CHECK. The following example shows how the conservative REPOST algorithm works.
Figure 5.3 shows all the RVs in a ROLL at certain time. The RVs are posted in the order
from top to bottom. Based on the REPOST algorithm, before every CHECK, a REPOST conflict test
should be conducted. For RV0, its REPOST bit is set to 1, so we need a CLIST0 to hold the
conflicting RVs. A copy of the original RV0, cpRV0, is needed for checking. RV1, RV2,
and RV3 are posted after RV0 and should be checked against cpRV0 for conflict. RV1 is
not in REPOST conflict with cpRV0, so it can be checked and processed. RV2 is in
Figure 5.3. Some RVs in a ROLL.
REPOST conflict with cpRV0, so it should be moved to CLIST0 before any checking and
processing is done on it (Figure 5.4). RV3 is not in conflict with cpRV0, so RV3 can be
checked and processed. The conflicting RVs is RV0-> RV2-> rpRV0. After rpRV0 is
Figure 5.4. A ROLL and the CLIST after checking.
detected, it will be checked and processed immediately. After checking for rpRV0, the
repost bit of RV0 is changed to 0, and the CLIST0 is pasted below rpRV0. The order of the
RVs after this pasting will be the rearranged order of these RVs. It is shown in Figure 5.5.
It is equivalent to T1 ->T3 ->T0 ->T2.
5.3.5. No-wait Algorithm
In this algorithm, we do not use a CLIST to hold the conflicting RVs; instead, we
Figure 5.5. The rearranged order of the RVs.
restart the conflicting RVs to make the execution serializable. For every RV between the
initial RVi and rpRVi, if it is not in conflict with the initial RVi, then it can be checked and
processed; otherwise, it will be restarted. In this case, we have choices over which
transaction should be the victim. The alternative is to restart the initial RVi and rpRVi.
Restarting rpRVi alone will not solve the problem because it cannot get rid of the cycle in
the SSG. Because rpRVi is only part of the transaction, if it is not processed quickly, more
conflicting transactions will be accumulated, which will result in more restarts. Therefore,
we should not restart RVi and rpRVi.
The pseudocode for no-wait algorithm is as follows:
Begin
Let rvPtr points to the first RV in the ROLL
while (rvPtr is not null and the pointed RV is not rpRV) {
If (REPOST bit of the RV is zero) //Representing a static RV
CHECK and process this RV
rvPtr++
continue
Else //Representing a dynamic transaction
If it is the first check on this RV
Make a copy of RV as cpRV
Check initial RV and process it
For (each RV below the initial RV and RV ?rpRV and rvPtr is not null) {
If (RV is not in REPOST conflict with cpRV)
Check and process this RV
Else
Restart this RV
rvPtr++
}
if (the next RV is rpRV)
Check and process this RV
Change the repost bit of initial RV to 0
}
End
5.3.6. Pulling Up/Pushing Down Algorithm
As the other two REPOST processing algorithms, the bottom line of this algorithm
is to make sure that the rp-interval must not produce a SSG cycle. It is equivalent to make
the initial RV and rpRV meet each other and finally be combined together into a single RV
(Figure 5.6). In this algorithm, we need to calculate the AV for the rpRV1. If rpRV1 can
access all the data items it needs, then it can be ORed together with the initial RV1 directly.
Figure 5.6. Move up rpRV1 and combine it with RV1.
Otherwise, further analysis may reveal that the initial RV1 can be moved down. In this
case, we will move the initial RV1 down as far as possible. After initial RV1 is moved
down, if initial RV1 and rpRV1 do not meet, all the RVs in the rp-interval will be reordered
(maintaining their relative order with each other). Conflict test will be performed firstly
against initial RV1 and then rpRV1. Any RVs in conflict with initial RV will be moved into
initial RV conflict list, and for the rest of the RVs, anyone in conflict with rpRV1 will be
moved into rpRV1 list. The final order of the RVs will be rpRV1 conflict list-> initial RV1
-> rpRV1-> initial RV1 conflict list.
The pseudocode for Pulling Up/Pusing Down Algorithm is as follows:
For all the RVs between initial RVi and rpRVi
If AVi for rpRVi does not block data access
Let RVi = initial RVi OR rpRVi
Change the repost bit of initial RVi to 0
Remove rpRVi
Else{
Let j=i+1
While (RVj is not rpRVi){
If RVj not in conflict with initial RVi and repost bit of RVj=0
Move the initial RVi down
Let j= j+1
}
If the next RV is rpRVi
Let initial RVi = rpRVi OR initial RVi
Remove rpRVi
Change the repost bit of initial RVi to 0
Else
Sort the RVs between initial RVi and rpRVi
Let RV_List1= the initial RVi conflict list
Let RV_List2 = rpRVi conflict List
Order the conflict interval as RV_List2->initial RVi->rpRVi->RV_List1
Let RVi = rpRVi OR initial RVi
Change the repost bit of RVi to 0
Remove rpRVi
For the original ROLL shown in Figure 5.7, the access vector for rpRV1 is 11 01 11
11 01 01, which is blocking rpRV1 from accessing the third and fourth data item. So rpRV1
Original Roll Sorted ROLL
1.4
1.5 Figure 5.7. The original and sorted ROLL.
cannot be moved up. Further analysis reveals that RV1 can be moved one position lower.
After RV1 is moved down, a sorting is needed for the RVs. The conflict list for RV1
includes RV3 and RV4, which should be moved directly below RV1. The conflicting RVs
for rpRV1 includes only RV6, so RV6 and rpRV1 should be moved up above RV1; then
rpRV1 is ORed together with RV1. After that, rpRV1 is removed. After the order of the
RVs are set, rpRV1 and RV1 can be combined together and the REPOST bit of the
combined RV should be changed to zero. The final sorted ROLL is as Figure 5.8.
Figure 5.8. A ROLL with reposted RV removed.
5.3.7. Proof of Serializability
For ROLL r1 with RV0…RVi-1 RVi RVi+1…RVi+n rpRVi, all the transactions
represented by RVs from RV0 to RVi-1 are guaranteed to be serializable (theorem from
ROLL concurrency control method), therefore proving the ROLL with rpRV produces
serializable history boils down to proving the processing of those RVs from RVi to rpRVi
is serializable. There is only one possible way that can lead to a cycle in a SSG, when a
path like RVi ? RVi+x ? rpRVi , where x>0, exists in the SSG. Based on the REPOST
algorithm, this kind of cycle cannot exist because whichever RVi+x is detected, it will be
either moved to CLISTi when using conservative algorithm, restarted when using no-wait
algorithm, or moved below initial RV or above rpRV using the pulling up/pushing down
method, before it is processed. From this proof, we conclude that ROLL still produces
only serilizable execution after introducing REPOST algorithm.
The algorithms developed in this paper and the overdeclaration method favor the
dynamic transactions. When there are conflicts, other conflicting transactions will be
delayed or restarted. Another method is to set priorities for transactions requiring
REPOST. If high priority is set, then this transaction will be processed first, and all the
REPOST conflicting transactions between initial RVi and rpRVi will be queued or
restarted as described in the above algorithms. If the priority of the transaction is low, the
initial RVi will be queued and processed later.
5.4. Multiple REPOSTs from a Single Transaction
1.5.3 Although in most cases, only one rpRVi is needed for an initial RVi, there
are some times when multiple rpRVi are needed. Multiple REPOSTs can happen when a
transaction contains nested conditional operations where each condition can only be
determined at runtime. The basic REPOST algorithm needs to be changed to handle
mutiple REPOST situations. For transaction Ti with initial RVi and rpRVi0
rpRVi1…rpRVin, for j=0 to n-1, all the rpRVij should have the REPOST bit set to 1 so that
they will not be allowed to commit until the processing of the rpRVin is finished. A copy
of RVi and each rpRVij is also needed for checking conflicts. When checking REPOST
conflicts, all the RVj (j>i) between the initial RVi and the final rpRVi should be checked.
Each RVj posted after the initial RVi should be checked against cpRVi and all the cprpRVix
which are located before RVj in the ROLL. The performance of the algorithm will degrade
greatly with an increase in the number of REPOSTs. In this case, more RVs located
between the initial RVi and the final rpRVi will be moved to the CLIST or restarted.
Figure 5.9. A multiple REPOSTs example.
In the above ROLL shown in Figure 5.9, RV0 has multiple REPOSTs. RV1 does
not need REPOST. For RV1, because no other reposted RV0 has appeared yet, it only
needs to be checked against cpRV0 for conflict. It should be moved into CLIST0 since it is
in conflict with cpRV0 when accessing the second data item. The REPOST bit of rpRV01 is
set to 1, so it is not the final REPSOT. When checking REPOST conflicts for RV2, it is
necessary to check conflict against both cpRV0 and cprpRV01. Although RV2 does not
conflict with cpRV0, it conflicts with cpRV01, so it should be moved to CLIST0. The
REPSOT bit of rpRV02 is still 1, so it is not the final REPOST for RV0. All the RVs posted
after rpRV02 should be checked against cpRV0, cprpRV01, and cprpRV02 for conflict. After
the final rpRV0 is processed, the CLIST0 should be pasted below the final rpRV0. Note that
in this case, no matter how many times a RVi is repsoted, there is only one conflict list for
this RVi. The execution order for RVs from RV0 to the final rpRV0 should be equivalent to
5. …RV0? rpRV01? rpRV02?… ?rpRV0n-1 ?rpRV0n…
When multiple REPOSTs exist in a ROLL, the changed algorithm still produces
serializable history. To prove the serializability, suppose we have the following
transactions in the SSG:
…RVi?….?RVi+1 ?rpRVi0? …?RVi+2…?rpRVin-1? RVi+x? rpRVin
where RVi+1, RVi+2, and RVi+x are the RVs which are in conflict with initial RVi or one of
the rpRVij. Based on the algorithm, any RVi+1, RVi+2, and RVi+x will be either moved to
the CLISTi or restarted, so it is not possible to have such a SSG.
Figure 5.10 A ROLL with two conflicting initial RVs.
5.5. Multiple Dynamic Transactions
Transactions requiring REPOST may interleave in the ROLL. For a ROLL with nested
dynamic transactions, there are two cases. In the case shown in Figure 5.10, although both
RV0 and RV1 need REPOST, the rpRV can be processed just like there is only one RV
Figure 5.11. A ROLL with two non-conflicting initial RVs.
Requiring REPOST. Because RV1 is in conflict with RV0, RV1 will be moved into CLIST0
and processed later. In this case, RV0 and RV1 will be processed sequentially.
In another case, as shown in Figure 5.11, both RV0 and RV1 need REPOST, and they are
not in conflict. If we allow RV0 and RV1 to be processed concurrently, we should set rules
for checking the conflict. If a RV is posted below RV1 and this RV is not in conflict with
both of RV0 and RV1, then this RV can be checked and processed. If it is in conflict with
only one of the two initial RVs, then it should be put into the corresponding CLIST. If the
final rpRV0 is posted first, the order of the execution is from RV0 to RV1; otherwise, the
order is from RV1 to RV0. In order to produce serializable execution, we need to make
sure that the earlier posted rpRV and the CLIST will not introduce new conflicts with other
dynamic transactions which have been partially processed when the CLIST is incorporated
into the ROLL. It is to say that supposing we want the execution in RV0 RV1 order, rpRV0
and CLIST0 should not introduce new conflicts with initial RV1 and its rpRV1s after CLIST0
is pasted to the ROLL. If rpRV0 is in conflict with the initial RV1, then RV0 and all the
rpRV0s, or RV1 and all the rpRV1s must be aborted. If any RV in CLIST0 is in conflict with
the initial RV1, then this RV should be put into CLIST1 instead of pasting after rpRV0.
Therefore, a conflict should be checked for each RV in CLIST0 against cpRV1. We can see
from the above analysis that, if RV0 and RV1 are processed concurrently, aborting static
and dynamic transactions is possible. Checking for conflict for each RV in the CLIST is
also needed before pasting a CLIST. To make things worse, when there is a great number
of dynamic transactions with multiple REPOSTs in the ROLL, the algorithm will be very
expensive to perform. Aborting dynamic transactions is much more expensive than static
transactions. To avoid the overhead, we choose to allow only one RV requiring the
REPOST scheduled at a time; i.e., we keep dynamic transactions scheduled sequentially.
For initial RV0 to initial RVn where each RV represents a dynamic transaction, we will put
all these RVs except RV0 into a queue and process them one by one or force all the
subsequent initial RVs into CLIST0. Some change will make the basic REPOST algorithm
work for processing the nested REPOSTs.
5.6. General Algorithms
1.8 Put all the discussed scenarios together; we need algorithms to handle ROLL
allowing multiple REPOSTs from a single dynamic transaction and multiple initial
RVs representing different dynamic transactions. This type of ROLL is processed by
concurrent processing of non-conflicting static transactions with a dynamic
transaction, but dynamic transactions are processed sequentially; i.e., in each
CHECK operation, only one dynamic transaction is scheduled.
5.6.1. Conservative Algorithm
The pesudocode for general conservative algorithm is as follows:
Begin
Let rvPtr points to the first RV in a ROLL //rvPtr is a RV pointer
While (rvPtr is not null) { //not past the last RV
If the repost bit of the pointed RV =0 //static transaction
Check and process the RV
rvPtr++ //Move to the next RV
Continue //go to the starting point of the loop
Else //dynamic transaction
If this is the first check on the RV
Copy initial RV to cpRV //make a copy
Check and process the initial RV
rvPtr++
While (rvPtr is not null and not final rpRV){
If the RV has same transaction id with initial RV and repost bit=1//not final rpRV
If this is the first check on RV
Make a copy of the current RV
Process the current RV
Else if RV has same transaction id and repost bit =0 //final repost
Check and process it
Past CLIST below this RV
Change the repost bit of the initial RV and all the non-final reposts to 0
Clear CLIST
Return
Else if repost bit =1 // a different dynamic transaction
Move to CLIST
Else if repost bit =0 and not in conflict with the initial RV and any cprpRVij above it
Process it
Else move it to CLIST //there is a conflict
rvPtr ++
}
}
End
5.6.2. No-wait Algorithm
The pesudocode for general no-wait algorithm is as follows:
Begin
Let rvPtr points to the first RV in a ROLL //rvPtr is a RV pointer
While (rvPtr is not null) { //not past the last RV
1.8.1.1.1.4 If the repost bit of the pointed RV =0 //static transaction
Check and process the RV
rvPtr++ //Move to the next RV
Continue //go to the starting point of the loop
Else //dynamic transaction
If this is the first check on the RV
Copy initial RV into cpRV //make a copy
Check and process the initial RV
rvPtr++
While (rvPtr is not null and not final rpRV){
If the RV has same transaction id with initial RV and repost bit=1
If this is the first check on the RV
Make a copy of the current RV
Process the current RV
Else if RV has same transaction id and repost bit =0 //final repost
Check and process it
Change the repost bit of the initial RV and all its non-final reposts to 0
Return
Else if repost bit =1 // a different dynamic transaction
Restart this RV
Else if repost bit =0 and not in conflict with the initial RV or any cprpRVij above it,
process it
Else
Restart this RV //there is a conflict
rvPtr ++
}
}
End
5.6.3. Pulling Up/Pushing Down Algorithm
The pesudocode for general pulling up/pushing down algorithm is as follows:
For all the RVs between initial RVi and the final rpRVij
While (rpRVij not meeting the initial RVi){
If AVij for rpRVij does not block data access
Let rpRVij-1 = rpRVij-1 OR rpRVij
Remove rpRVij and change the repost bit of rpRVij-1 to 0
Let j=j-1
}
If j=0 //All the rpRVs have be ORed together, work done
Return
Else
Let k=i+1
Let m=0 //RVi0 is the initial RVi
While (RVk is not rpRVij){
If RVk not in conflict with RVim and repost bit of RVk =0
Move the RVim down
If RVk is rpRVim+1
Let RVim+1=rpRVim OR rpRVim+1
Remove rpRVim
Let m=m+1
Let k= k+1
}
If RVk is final rpRVij
Let rpRVim = rpRVim OR rpRVij
Remove rpRVij
Change the repopst bit of rpRVim to 0
Return //all done
Else
Sort the RVs between RVim and final rpRVij
Let RV_Listm= the RVim conflict list
Let RV_Listm+1=rpRVim+1 conflict list
….
Let RV_Listj = rpRVij conflict list
Let rpRVij = rpRVij OR rpRVij-1 OR … rpRVim
Remove all the rpRVi from rpRVim to rpRVij-1
Arrange the conflict interval as rpRVij->RV_Listm->RV_Listm+1… ->RV_Listj
5.6.4. Serializability
The most complicated senario with this algrithm is when both multiple REPOSTs
and nested REPOSTs exist in the ROLL:RV0 –> RV1–> rpRV11 -> RV2-> rpRV12 –>RV3.
Although both RV1 and RV2 represent dynamic transactions, the scenario is the same as a
single dynamic transaction with multiple REPOSTs. The same proof can be used to prove
the serializability because only one dynamic transaction is processed at a certain time. For
all the additional initial RVs which require REPOSTs, they will be put into the conflict list
and wait to be processed in the next cycle.
5.6.5. Performance
In comparison with 2PL, the basic ROLL has no deadlock, fewer restarts, and
higher concurrency. After REPOST is introduced to ROLL, the only extra cost is
reordering the RVs in the rp-interval between an initial RV and its rpRV. It still maintains
the advantages of the basic ROLL.
Higher concurrency and throughput is an evident advantage of the ROLL even with
dynamic transactions allowed. The best case occurs when there are no dynamic
transactions in a ROLL. In this case, it is processed just as a regular ROLL. As usual, one
check will get all the accessible data items, and those data items can be accessed
concurrently. The worse case occurs when every RV is from a dynamic transaction. In
this case, because only one dynamic transaction can be checked at each cycle, it is
equivalent to overdeclaration in concurrency level. The time used to process the query is
much shorter because, when the final rpRV is submitted, the initial RV and previously
submitted RVs have been completely or partially processed, so the throughput should be
much higher. The level of concurrency on average case, where low percentage of RVs are
from dynamic transactions in a ROLL, would be much higher because it allows non-
conflicting static transactions to be processed concurrently with a dynamic transaction;
therefore, the algorithms achieve higher concurrency most of the time and higher
throughput all the time.
Another advantage is no deadlocks. When there are a lot of dynamic transactions
with multiple REPOSTs, the chance of deadlock is very high when using 2PL. To break
the deadlock, one of the involved transactions will be chosen as victim and restarted.
Restarting partially processed transactions is very expensive. It wastes the system
resources and lowers the throughput. In addition, 2PL needs a dedicated process to detect
deadlocks. This feature will add more cost to the scheduler. The conservative and pulling
up/pushing down algorithm are also restart-free. In the aggressive algorithm, although
restarts will happen, they are different from 2PL restarts in that not any processing is done
on the victim transactions; therefore, they are more economic restarts.
The algorithms of this paper are less CPU-intensive. The CPU usage in 2PL
includes checking lock availability, allocating lock, updating lock table, and handling
deadlocks. Because deadlock is unavoidable in 2PL, detecting and breaking deadlocks is a
routine task for the scheduler. Choosing a victim and restarting it is the solution to break
deadlock. Because the deadlock detection is so CPU-intensive, it is a big boost to
performance when avoiding it. In ROLL, because there is economic restart and no
deadlock, there is less wasting in computer resources. The CPU usage will be mostly on
pointer manipulation and bit AND/OR operations. Bit operations are very fast.
The disadvantage is that more memories may be needed for ROLL. In 2PL, the
memory used will be dependent on the number of data items, lock mode, and number of
bytes used for each lock. The number of data items is dependent on the granularity level
and the number of data objects. The granuality can be at table, record, or field level. The
higher the granularity, the smaller number of locks is needed, and thus lower concurrency
is allowed. For DBMS systems, the number of locks available is usually a configurable
parameter. In ROLL, the determining factors are number of data items, operation mode
and active transactions. The number of bytes used for 2PL and ROLL can be calculated
using the following formula.
B2pl =B * M * D
Broll=M*D*T/8
where
B2pl: Bytes needed for 2PL
Broll: Bytes needed for ROLL
M: number of operation mode
D: number of data items
T: number of active transactions
In SQL Server, each lock takes 32 bytes of memory. In comparison with this
implementation, for the same number of operation modes and data items,
when the number of active transaction is 256, the memory usage is the same
for the two methods. Above this number, ROLL uses more memory than
2PL, and below this number, 2PL uses more memory. By setting total
number of locks smaller than the total number of data items, 2PL may save
memory significantly; the consequence is reduced concurrency.
5.7. Discussion of Implementation Issues
ROLL is an efficient concurrency control method. Its advantages are evident and
there is still a lot of room to make improvement. This discussion will be on replacing
pointers with sequence numbers, using block/extent to store ROLL segment, and the issue
of strict levels.
Pointers are used extensively to connect RVs in the ROLL to maintain atomic posting of
RVs. When the number of pointers becomes great, they can be expensive to
maintain. To reduce the overhead, a sequence number can be assigned to
each RV instead of using pointers. In a centralized ROLL, adding an auto-
increasing identifier will do the job. In a distributed environment, a
sequence number server is needed to assign the number to each RV at each
site. The server will keep an array of sequence numbers which represent the
current maximum number at each site and update the sequence number after a new RV is posted.
As a ROLL becomes big, say millions of RVs, performance for checking can become time-
consuming. A lot of work in AV calculation is repetitive and can be
avoided. One solution is to decrease the number of RVs scanned when
calculating an AV. We can store ROLL segments in blocks or extents. At
the header of each block or extent, an AV is stored. This AV will be
updated only when the RVs above this block or extent have been changed.
To calculate AV for the RVs in the block, only the RVs in the current block
or extent need to be accessed; the disk I/O and the number of AND/OR
operations will be dramatically reduced. Blocks can be moved back and
forth from the disk to memory based on the demand.
Basic ROLL can guarantee serializable execution of the transactions, but higher strictness
can also be easily achieved. The level of strictness is determined by when a
bit can be changed from 1 to 0. When calculating an AV for a RV, it should
determine whether the conflicting RVs are committed, so it is necessary to
add a commit bit to a RV. To obtain strict history, the write bits in a RV
should be changed after the corresponding transactions commit or abort.
5.8. Optimum of the Algorithms
The algorithms proposed in this paper are optimal for most cases and nearly optimal
in the rest of the cases. To verify this claim, we may look at those cases. When there are
no dynamic transactions in a ROLL, all the RVs are processed as a regular ROLL. In this
case, all the available data items will be filtered out after one check, and all the available
data items can be accessed at the same time. Highest concurrency can be achieved, so it is optimal.
When both dynamic transactions and static transactions are present, all the non-
conflict static transactions are treated as in regular ROLL. For the rest of the RVs, they
will be rearranged into a serializable order. After that, all those RVs will be processed as
those in a regular ROLL. The question is whether the way to reorder the RVs in
rp-interval is optimal. When there is a single dynamic transaction, the REPOST status of
the transaction is determined at the earliest possible time, i.e., when the transaction arrives
to the transaction manager. When the initial RV arrives, it will be scheduled at the earliest
possible time. When the final rpRV arrives, it will be processed in the shortest possible
time since the initial RV and all the other rpRVs have already scheduled. The ROLL with
a rp-interval will be turned to a regular ROLL as soon as possible. All the conflicting RVs
in the interval are on hold until the final rpRV is scheduled. The holding is necessary
because they will be restarted otherwise. Therefore, the whole algorithm is optimal.
When there are multiple dynamic transactions, all non-conflict transactions will be
processed as a regular ROLL; the dynamic transactions will be scheduled in a serial order;
i.e., one dynamic transaction will be checked in each check cycle; others will be put into a
conflict list. Whether the strategy to handle this case is optimal depends on the following factors:
1. Checking interval
Because checking can be done continuously, the checking interval is dependent on
the number of transactions and the number of data items. The number of active
transactions is usually small. When using block/extent storage structure and distributed
databases or partitioned ROLLs, checking can be done very fast, and hundreds of checking
cycles per second can be achieved.
2. Number of REPOSTs per transaction
The more REPOSTs per transaction, the longer it takes to process a single dynamic
transaction; other dynamic transactions will wait longer in the queue. The waiting is
necessary because there is a great chance of conflict among the dynamic transactions with
multiple REPOSTs and it is much more expensive to restart them when they are partially processed.
3. The conflict rate among dynamic transactions
The higher the rate of conflicts among the dynamic transactions, the more dynamic
transactions will be restarted to maintain serializable scheduling. There is a tradeoff
between waiting and wasteful restart. The algorithms developed in this paper let
conflicting transactions wait or quick restart. When using ROLL partitions, even multiple
dynamic transactions exist; they can still be checked concurrently if they only use data
items in different ROLLs. In this case, we can achieve concurrent checking with no restart
or economic restart. Therefore, using the algorithms along with ROLL partition and
block/extent storage, the algorithms can achieve optimal performance.
5.9. Conclusion
This paper proposed three algorithms for processing dynamic transactions. They can
be used as an alternative to the overdeclaration approach. Each of these algorithms has
some pros and cons. The advantage of the conservative algorithm is that it avoids
transaction restart. The disadvantage is that the commit order of the transactions is not the
same as the arrival order. The conservative algorithm may be not fair in some cases.
Another disadvantage is that extra cost is needed for moving the RVs between CLIST and
the ROLL. In a no-wait algorithm, if there is a conflict, then the transaction will be
restarted. This algorithm is simple, but many transactions will be restarted if there is high
percentage of dynamic transactions. The ROLL restart is more economic in comparison
with 2PL restart because no processing is done on the restarted RVs. The pulling
up/pushing down method is more fair algorithm, but the sorting is more complicated.
In comparison with the overdeclaration method, the new algorithms allow higher levels
of concurrency because it only reserves the data items it needs and allows non-conflicting
RVs between the initial RV and its rpRV to be executed without waiting. The conservative
algorithm still maintains the properties of ultimate conservative algorithms, i.e., no abort
and restart. Although algorithms are illustrated using the ROLL concurrency control
method, it can also be used for other types of concurrency control methods. The basic idea
is to know whether a transaction is a dynamic one just after a transaction arrives. At one
time, only a subset of the required dataset for a transaction is sent to the scheduler; later,
another subset of the data items is sent to the scheduler. The key of the algorithm is to
reorder the conflicting transactions which are submitted between the initial and final
submission of the dynamic transaction. The algorithms are optimal for cases of no
REPOSTs or single REPOST from one dynamic transaction. Together with ROLL
partition, the algorithms can achieve optimal performance when nested dynamic
transactions with multiple REPOSTs exist.
BIBLIOGRAPHY
[Ber87] Bernstein, P. A., Hadzilacos, V., Goodman, N. Concurrency Control and Recovery in DBMS. Addison-Wesley, 1987.
[Bro97a] Brooks, P. Data Mining Today. DBMS Tools & Strategies for IS Professionals. 10(2):59-66, 1997.
[Bro97b] Brooks, P. March of the Data Marts. DBMS Tools and Strategies for IS Professionals. 10(3):55-62, 1997.
[Cob97] Cobb, E. E. The impact of Object Technology on Commecial Transaction Processing. The VLDB Journal. 6(3):173-190, 1997.
[Elm94] Elmasri, R., Navathe, S. B. Fundamentals of Database Systems. Benjamin/Cummings Publishing Company, Inc., 1994.
[Gra93] Gray, J., Reuter, A. Transaction Processing: Concept and Techniques. Morgan Kaufmann Publishers, 1993
[Hae84] Haerder, T. Observations on Optimistic Concurrency Control Schemes. Information Systems. 9(2):111-120, 1984.
[Kor83] Korth, H. F. Locking Primitives in a Database System. Journal of the ACM 30(1): 55-79, 1983.
[Lom97] Lomet, D. Salzberg, B. Concurrency and Recovery for Index Trees. VLDB Journal. 6(3):224-240, 1997.
[Pan97] Panagos, E., Biris, A. Synchronization and Recovery in a Client-Server Storage System. VLDB Journal. 6(3):209-223, 1997.
[Per91] Perrizo, W. Request Order Linked List (ROLL): A CC Object, IEEE Conf. on Data Eng, Kobe, Japan, 1991.
[Per92] Perrizo, W., Rajkumar, J., Ram, P. HYDRO: A Heter DDBMS. IEEE Intl Conf. On Data Engineering, February 1992.
[Yan84] Yannakakis, M. Serilizability by Locking. Journal of the ACM 31(2): 227-244, 1984.
Appendix 3
**************************************************************************
ISOLATION LEVELS (more detail)
SQL defines three levels of isolation which are weaker than SERIALIZABILITY
(they do not guarantee ACIDS properties entirely, but they are easier to achieve).
REPEATABLE READ ensures that a transaction, T, reads only the changes made by
committed transactions and that no value read or written by T is changed by
any other transaction until T is complete.
repeatable read isolation allows the PHANTOM PHENOMENON: If T is reading "all
employee records in the sales department", another transaction might enter (insert)
a new employee record with department=sales, which could be missed by T's read.
Repeatable read isolation can be achieved by using the same locking protocol
as 2PL except that it doesn't use index locking (index locking will guarantee that no
other transaction can complete an insert of an employee record in the sales dept until
T has completed it's read - since such a read locks the the value, department=sales
of the department index and an insert involves a write to that same index record.)
READ COMMITTED ensures that T reads only changes made by committed transactions
and that no value written by T is changed by any other transaction until T is
complete. However a value read by T may well be modified by another transaction
while T is still in progress. T is exposed to the phantom problem also.
A read committed transaction obtains exclusive locks before writing items and
holds these locks until the end. It also obtains shared locks before reading
but these locks are released immediately - their only effect is to guarantee that
the transaction that last modified the object has completed that task.
READ UNCOMMITTED ensures nothing (T can read changes made to an item by an ongoing
transaction and the item can be further changed while T is in progress. T is
exposed to the phantom problem.
a read uncommitted transaction does not obtain shared locks before reading items.
This mode represents the greatest exposure to uncommitted changes of other
transactions; so much so that SQL prohibits such a transaction from making any
changes itself - a read uncommitted transaction is required to have an access mode of
READ ONLY.
APPENDIX-4
----------
PARALLEL DATABASE MANAGEMENT
Introduction and Overview (scaleup/speedup; what are the
applications?; OLTP/OLCP)
Speedup: Faster response withmore hareware on the same workload
Scaleup: More work on a greater workload
Applications:
Decision Support (DM, DW...)
Batch Processing (loading, archiving, updating, adding deleting
OLTP
OLCP (OnLine Complex Processing)
Parallel Hardware and Software
- Parallel Hardware architectures
- Shared-memory (multiple nodes share same memory and disks)
easy load balancing among CPUs
easy to do system admin
limited to number of CPUs (no more than 32?)
cannot tolerate single-point failure
eg, SMP boxes
- Shared-disk (multiple nodes sharing same set of disk)
no automatic load balancing
more complex system admin
better scalability
tolerates single-point failures (higher availability
eg, Sun Sparc 1000 cluster
- Shared-nothing (no shared memory or disk)
each node can be an SMP box
no automatic load balancing
complex system admin
goo