SECURITY = protection of data against unauthorized disclosure, alternation or destruction.
TYPES of DB SECURITY:
Discretionary Security
(owner-creator of object grants and revokes rights at his/her discretion)
Mandatory Security
- Enterprise requires security matches (clearances) for access to certain
data by certain users.
- used in military database systems, industry where trade secrets are involved
Discretionary security mechanisms
---------------------------------
-user-views
-authentication and authorization
-dynamic modification of user requests
SQL Security
1 View mechanism (can hide sensitive data from unauthorized users)
2 authorization subsystem (user can grant & revoke rights to others)
- initally DBA has SYSADM authority (can do everything and noone else can do anything).
- DBA uses GRANT verb to authorize others.
Then those others can further authorize still others using GRANT.
- GRANT... TO PUBLIC means all users
- GRANT ALL ... means grant all rights
- GRANT WITH GRANT OPTION ...U1 gives user U1 authority and the right to
grant authorization to others.
- rights include SELECT, UPDATE, DELETE, INSERT, ALTER, INDEX.
- all authorizations can be revoked by granter using REVOKE ...
- Revocations cascade if applied at the top of a hierarchy.
Data encryption can provide security (e.g., password file, etc.)
INTEGRITY = accuracy or validity of data
* Usually implemented using predicates (WHERE clauses).
* Most DBMSs have minimal integrity system
"Mandatory" Security
In 1982 summer a project was initiated by National Research Council
and Air Force Studies Board to study different issues in Multilevel Data
Management Security, at Woods Hole Massachusets. In their report they
have considered the following four generic approaches to retrofit existing
DBMSs in order to make them secure.
1. Physical separation (e.g., different discs per data classification)
2. Temporal separation (e.g., periods processing)
3. Logical separation (e.g., security kernel mediates shared resources)
4. Cryptographic separation (e.g., different encryption keys for each data classification)
Based on these four approaches ten different architectures were
obtained and out of these ten, three were found to be of particular
interest. These alternatives are given below.
(Notations:- Users: U, Data: D, Reference Monitor Filter: F, and the
boxes bounded by "+" signify a trusted component.)
Case 1. Physical (Temporal) Separation
______ ______ ______ ______
| D Hi |----| DBMS |----| F Hi |----| U Hi | Hi
|______| |______| |______| |______|
-------------------------------------------------
______ ______ ______ ______
| D Lo |----| DBMS |----| F Lo |----| U Lo | Low
|______| |______| |______| |______|
Case 2. Logical Separation
______ ______ ______ ______
| D Hi |----| DBMS |----| F Hi |----| U Hi | Hi
| | |______| |______| |______|
|++++++| ,-------------------------------------
| | : ______ ______ ______
| D Lo |----| DBMS |----| F Lo |----| U Lo | Low
|______| : |______| |______| |______|
----------'
This case is same as case 1 except for the shared physical media
(e.g., disc) to store the date and it requires logical separation
by a trusted disc or file manager in a trusted OS kernel. The
kernel also encapsulates the other security objects (e.g., DBMS,
filter, user interface).
Case 3. Logical Separation Plus
______ ______ ++++++++ ______
| D Hi |----| DBMS |----+ F Hi +----| U Hi | Hi
| | |______| ++++++++ |______|
|++++++| ,----------. + + ,--------------
| | : ______ : ++++++++ : ______
| D Lo |----| DBMS |----+ F Lo +----| U Lo | Low
|______| : |______| : ++++++++ : |______|
----------' `-----------'
This case has the same configuration as Case 2, but with an
interconnection between the Hi and Lo filters. This allows the
Hi user to access the Lo data, but it complicates the design of
the filters (which now must be trusted).
Case 4. Logical Separation Plus
______ ______ ++++++++ ______
| D Hi |----| DBMS |----+ +----| U Hi | Hi
| | |______| + F Hi + |______|
|++++++| ,----------. + & + ,--------------
| | : ______ : + F Lo + : ______
| D Lo |----| DBMS |----+ +----| U Lo | Low
|______| : |______| : ++++++++ : |______|
----------' `-----------'
This configuration is the same as Case 3 except that we collapse
separate filters into a single trusted object to simplify its design.
Case 5. Physical Separation Plus
______ ______ ++++++++ ______
| D Hi |----| DBMS |----+ +----| U Hi | Hi
|______| |______| + F Hi + |______|
---------------------. + & + ,--------------
______ ______ : + F Lo + : ______
| D Lo |----| DBMS |----+ +----| U Lo | Low
|______| |______| : ++++++++ : |______|
`-----------'
Case 5 combines the benifits of Cases 1 and 4. It shows good
security because of physical separation, and it allows Hi user
access to both Hi and Lo data (data access performance is more
complex for the Hi user, who must integrate separate queries
against Hi and Lo databases).
Case 6. Physical Separation Plus
______ ______ ++++++++ ______
| D Hi |----| DBMS |----+ F Hi +----| U Hi | Hi
|______| |______| ++++++++ |______|
---------------------. + + ,--------------
______ ______ : ++++++++ : ______
| D Lo |----| DBMS |----+ F Lo +----| U Lo | Low
|______| |______| : ++++++++ : |______|
`-----------'
This case is a combination of features from Case 3 and 1, with benefits similar to Case 5.
Case 7. Fully Integrated
______ ++++++++ ______
| | + F +----| U Hi | Hi
| D | ______ + Hi + |______|
| Hi |----| DBMS |----+ & + ,--------------
| & | |______| + Lo + : ______
| Lo | + +----| U Lo | Low
|______| ++++++++ : |______|
---------------------------------'
Case 7 is a fully integrated database, with a fully integrated
tursted filter and an untrusted DBMS.
Case 8. Partially Integrated, Physically Separate
______
| D Hi | ______ ______ ______
| Lo |----| DBMS |----| F Hi |----| U Hi | Hi
|______| |______| |______| |______|
-------------------------------------------------
______ ______ ______ ______
| D Lo |----| DBMS |----| F Lo |----| U Lo | Low
|______| |______| |______| |______|
In this case the Hi user has a direct access to an integrated,
physically separate Hi and Lo database. But Lo data
synchronization in both the containers is a problem.
Case 9. Partially Integrated Plus
______
| D Hi | ______ ++++++++ ______
| Lo |----| DBMS |----+ F +----| U Hi | Hi
|______| |______| + Hi + |______|
---------------------. + & + ,--------------
______ ______ : + L + : ______
| D Lo |----| DBMS |----+ +----| U Lo | Low
|______| |______| : ++++++++ : |______|
`------------'
This configuration offers the same benefits as Case 8 and resolves
the data synchronization dilemma.
Case 10. Partially Integrated, Logically Separate.
______ ______ ++++++++ ______
| D Hi |----| DBMS |----+ F +----| U Hi | Hi
| & Lo | |______| + Hi + |______|
|++++++| ,----------. + & + ,--------------
| | : ______ : + Lo + : ______
| D Lo |----| DBMS |----+ +----| U Lo | Low
|______| : |______| : ++++++++ : |______|
----------' `-----------'
Case 10 is functionally the same as Case 9, but a security kernel
is used to supply logical separation of the databases. System
performance and security depend on kernel performance and security.
These ten models generated can be gouped into the following four generic
security architectures.
Generic Architecture Model Cases Key DBMS Technology
------------------------------------------------------------
Physical 1,5,6,8,9 Distributed Nets
Temporal 1 ----
Logical 2,3,4,10 OS Kernel
Cryptographic 1,7 Cryptographic Integrity Lock
Out of these ten models cases 5,9,7,4,and 10 are of particular interest.
These 5 are identified by the key technology involved in providing the
security. Physical protection can be achieved via a "Distributed DBMS"
in a local net. Logical protection is achieved by using trusted filters
in combination with trusted OS software based on secutity-kernel
techonology, i.e., a "Kernelized DBMS". Although Case 7 is the simplest
design and offers the best performance, it is very onsecure because the
untrusted DBMS can modify data and classification labels needed by the
trusted filter. The key problem is the integrity of the database, and
the key solution technology is cryptography i.e., "Cryptographic Integrity
Lock" otherwise known as "Spray Paint Architecture".
The advantages and disadvantages of the three main archtectures:
Database Problem Benefit Architecture
------------------------------------------------------------------------
Separated Complex Hi Query Database Isolation Kernelized
Hi, Lo parsing Trojan Horse Resistance
Performance Direct Lo Query
overhead
Separated Database Database Isolation Distributed
Hi-Lo,Lo Synchronization Trojan Horse Resistance
Direct Hi and Lo Query
Integrated Trojan Horse High Performance Spray Paint
Hi-Lo Vulnerability Least Complex (Integrity Lock)
Kernalized Architecture Examples
--------------------------------
- Research Prototype SEAVIEW (SRI) : element-level labelling
- Commercial Systems ORACLE : relation-level labelling
========================================================================
SECTION B
---------
In the previous section we have seen different retrofit architectures
for Multilevel Secure Databases. This section describes a Concurrency
Control technique and Query Processing in Kernelized architecture. As
we know in this kind of architecture all writes are local and all low
level data item reads are local but remote (with respect to container).
While discussing a concurrency control mechanism our goal is to reduce
transaction response time while increasing system through-put. Query
processing mechanism also focuses on reducing transaction response time.
CC in kernelized architecture has two basic problems: security and
data availability. When a high level process wants to read low level
data item, it can not set a read lock on the data item since this process
would establish a covert channel. This could be done by a trozan horse
trying to send bits of 1's and 0's by locking and then unlocking low
level data which is being watched by a low level subject. Trying to
eliminate this problem causes the hign user to wait for the low level
data which creates the problem of data availability. A better solution
is to use multiversioning and hence high users can read older versions
of low data although it would be stale. Some people suggest to restrict
the number versions to two, one for the subjects at the same level of
data item and one for subjects of higher levels.
Let us consider 3 classification levels U (Unclassified) < C (Confidential)
< S (Secret). We use the following notations: Ti(S) means a transaction
Ti which has entered at level S and x(U) means the data item x at level U.
In our method we suggest to use a POSTROLL in the TCB and a ROLL at each
level. The POSTROLL contains one bit for each container and a '1' at a
particular position of a vector shows that that particular transaction
needs to access atleast one data item from that container. When a
transaction enters the system it declares its data needs and POSTs a
Vector in POSTROLL which would eliminate indirect conflicts between
containers. Then it posts the RV at the same level of the transaction.
Since we are using restricted *-policy (i.e., users are not allowed to
wirte up) the transaction would not POST any RV on the ROLLs that are in
the higher contaiers. If a transaction, Tk(C) say, wants to read a lower
level data item, x(U) say, it keeps a pointer to the private workspace
of the transaction Ti(U) iff Ti(U) is the most recent transaction in the
ROLL at U level that wants to write x(U) and the serialization order of
Ti(U) is less than that of Tk(C). When Ti(U) finishes writing X(U) and
about to commit Tk(C) reads the value of X(U). If Ti(U) aborts then
Tk(C) moves its pointer to the private workspace of the transaction which
satisfy the above condition. If at a given time there is no such Ti(U)
which satisfies the condition, then Tk(C) reads the value of x(U) from
the database.
Next we discuss some issues in Query Processing in Kernelized
Architecture. Here we introduce a security policy (like simple
security policy and *-policy) which we call Primary Key-Foreign
Key policy.
P-F Key Policy: A subject should never be allowed to reference up.
Let us consider two relations R and S where relation R has the
primary key and relation S has the foreign key. If the two
relations are at two different levels then relation R must be at
a lower level than relation S, otherwise it would violate the P-F Key
policy. Now we consider inserts and deletes into both the relations.
Insert into relation S:
We must check relation R to make sure that the value at the
foreign key field of the tuple being inserted exists in R.
Since R is at a lower level there is no problem in doing so.
Delete from relation S:
No problem.
Insert into relarion R:
No problem.
Delte form relation R:
We can not check the relation S for referential integrity
constrains since S is at a higher level. Because in doing so
if we find out that it would violate the referential integrity
constraint and hence reject the deletion, it would confirm
the user at lower level that the value exists at the higher
level. So we would let the deletion occur. Then we check for
the matching foreign key values in relation S and if any found
we would delete those tuples from S.
SECTION 2:
A system which enforces Bell-La Padula security policies (i.e., simple
securiy policy and star policy) is called BL-Secure. This is the minimum
requirement for a secure system. In addition to this some systems are
also free from a certain class of covert channels, (i.e., those caused
by contention for data). These kind of systems are called DC-Secure. A
system which is free from all kind of covert channels, is called Covert
Channel Secure or CC-Secure. The following figure illustrates these
three classifications.
,-------------------------,
| BL-Secure |
| ,-------------------, |
| | DC-Secure | |
| | ,-------------, | |
| | | CC-Secure | | |
| | `-------------' | |
| `-------------------' |
`-------------------------'
There are two main kinds of covert channels: storage channels and timing
channels. A storage channel signals information by indicating the presence
of a stored object such as a file etc. where as a timing channel does so
by modulating an observable delay. Besides these there are other kinds of
channels such as inference channels, aggregations channels etc. In covert
channel the information flows from one process to another. Where as in
other kinds of channels only one subject gets the information to which
he does not have access to. An example of inference channel is given
below.
Consider an relation EMPLOYEE with an attribute NAME and another attribute
SALARY and both are leveled as unclassified. But retrieved together it
becomes secret so that no unclassified user can know the salary of any
particular employee. Now if an unclassified user knows an employee to
be the president of the company and the fact that the president gets the
highest salary he can find out the salary of the president by retrieving
both name and salary separately.
Woods hole report discusses the near term solutions and long term
requirements for multilevel secure databases. There are ten different
architectures out of which three architectures are most significant.
They are 1) replicated or distributed, 2) Kernelized, and 3) Integrity Lock
architectures. Further details of these archituctures have been discussed
in previous notes.
In kernelized architecture data at different classification levels are
logically separated from each other. The classification of data could
be at relation level (example: trusted ORACLE), or at attribute level
(example: ?) or at row level (example: ASD by TRW) or at element level
(example: SeaView by SRI).
One of the long term requirements for secure databases as given in
Woods hole report is the dynamic classification of data. As per a
paper by some database researchers at Honeywell, the classification
of data could be either simple (example: name attribute of EMPLOYEE
relation is unclassified), or content based (example: name of employee
is secret if the dept. he works in is D15), or context based (example:
name and salary taken together is secret where as when individually taken
both name and salary are unclassified).
In the above situation, the use of rules can control the access of
users to the database more easily. But as constraints start growing,
the number of rules grow exponentially. To make the point more clear
let us consider the above example. Now if an unclassified user has
already retrieved the name and social_security_number of employee then
attribute which might establish the link between the names and salaries)
should be rejected. To find this out the history of retrival and the
rule which must be triggerd according to the history have to be stored.
Again this exponential growth of rules makes the search time an
exponential function which ultimately increases the response time.
Fragmentation of the relation solves this problem. In the above
example salary could be taken off the original relation and kept
separately as a relation itself with the same classification level
(unclassified in this case). Then another relation with name
and salary together could be created which could be accessed by the
higher class users only.
Open Question: Is there any classification constraint which could
not be handled just by fragmentation (horizontal or vertical)
of data?
These decomposition and recovery algorithms have the following advantages
over both Lunt's and Jajodia's algorithms.
1) The tuples that belong to entirely one level (i.e., all the
attributes at the same level) are stored in a relation.
Hence we can just retrieve them without requiring any join.
2) Unlike Jajodia's algorithm the full primary key need not
be stored.
3) The recovery algorithm results in lossless join which is the
main problem in Lunt's algorithm as pointed out by Jajodia.
4) The use of DVs help in retrieving the tuples much faster.
Sometimes we can do a selection before doing a join in order
to delete tuples that should not be retrived for a given
class.
5) Also it serves all the purposes of decomposition as mentioned
in Lunt's paper.
The Decomposition Algorithm:
1. Group the Multilevel Relation (MLR) R by the primary key class and
name each group R'(x) where x is the class of the PK attribute A1.
2. For each R'(x), do the followings:
2.1 Create a Domain Vector Table DVT(x).A1 for the PK attribute A1.
2.2 Create the single level relation R(x) = {t | all the attributes
of t have the same classification level}.
2.3 Construct a Domain Vector DV_R(x).A1 for the attribute A1.
2.4 Create R''(x) = R'(x) - R(x).
R''(x) contains tuples that have PK attribute classification x
and atleast one of the other attribute classification is greater
than x.
3. For each R''(x), where R''(x) is not empty, create the following
relations. (Note that for system high R'' will be empty.)
3.1 Construct the primary key relation R1(x) by projecting the PK
attribute A1.
3.2 Construct a Domain Vector DV_R1(x).A1 for the attribute A1.
3.3 For each attribute Ai (i not equal to 1) present in R''(x), and
for y >= x where y is the classification of attribute Ai, construct
attribute relation Ri(x,y)(A1,Ai).
Destroy R'(x) and R''(x).
The Recovery Algorithm
Let y be the clearance of the subject that has submitted the query.
If y is not system high then do steps 1 to 4.
1) AND the domain vectors DV_R(y).A1 and DV_R1(y).A1.
(This helps in finding the polyinstantiated elements.)
2) XOR the result with DV_R1(y).A1.
(This gives the tuples that are not polyinstantiated but have
at least one attribute at higher level and hence are not in the
relation R(y). This drops polyinstantiated elements which the
low level user is not supposed to know.)
3) For each 1 bit present in the above resulted vector, using the
DV table, find the domain values of the PK attribute.
4) Then select those values, gotten from step 3, in relation R1(y)
and perform a left outer join among R1(y) R2(y,y), ...,Rn(y,y).
Let us denote this result relation as R'_y.
else denote R'_y = R(y).
5) Construct relation R_y = R'_y union R(y)
For each x