Chapter 1
> In the section 'Introduction', where you defined what a database is, you
> wrote that "REPOSITORY" implies "persistence". This is a little bit
> confusing to me.
> Could you kindly throw some light on this matter?
Persistence means that the data created or changed by a program survives
beyond the termination of that program. That's what databases are for:
they are repositories that keep this persistent data after the user query
or user program that uses the database has terminated.
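As a minimal illustration of persistence, the sketch below uses SQLite through Python's standard sqlite3 module as a stand-in for "a database". The file name demo.db, the table note, and both function names are hypothetical. The two functions only simulate two separate runs of a user program inside one process: the first creates data and terminates, the second opens the same file later and finds the data still there.

```python
import os
import sqlite3
import tempfile


def write_then_exit(db_path):
    """Simulated first program run: create some data, commit it, then 'exit'."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS note (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO note (body) VALUES (?)", ("survives the program",))
    conn.commit()  # without this commit, closing would discard the insert
    conn.close()   # the 'program' ends; its in-memory state is gone


def read_in_new_run(db_path):
    """Simulated later run: open the same database file and read the data back."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT body FROM note").fetchall()
    conn.close()
    return rows


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")  # hypothetical file name
write_then_exit(db_path)
print(read_in_new_run(db_path))  # [('survives the program',)]
```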
Chapter 2
> 1) When using the MOD function as a hashing function, you have to
> choose a prime number. Is it because one gets all the remainders,
> or is there some other reason?
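Any divisor will do; however, studies have shown that prime numbers give
a better distribution (a more even spread of values across buckets). One
common reason is that real keys often share factors with a composite divisor:
keys that are all multiples of 4, for example, land in only 4 of 16 buckets
under MOD 16, but can use all 17 buckets under MOD 17.
That's the only reason for choosing a prime. It is certainly not
mandatory, either: if there is a reason to prefer another divisor
(e.g., 16 pages have been allocated), then 16 would be a good divisor.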
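The effect of a shared factor can be checked with a few lines of code. This is only a sketch: the key set (multiples of 4) is hypothetical and chosen to expose the problem, and it illustrates one common reason primes help rather than proving that a prime is always better.

```python
def mod_hash(key, num_buckets):
    """MOD hash function: the bucket number is key MOD num_buckets."""
    return key % num_buckets


def bucket_counts(keys, num_buckets):
    """Count how many keys land in each bucket."""
    counts = [0] * num_buckets
    for k in keys:
        counts[mod_hash(k, num_buckets)] += 1
    return counts


# Hypothetical keys that share a factor of 4: 0, 4, 8, ..., 396 (100 keys).
keys = list(range(0, 400, 4))

for n in (16, 17):
    counts = bucket_counts(keys, n)
    used = sum(1 for c in counts if c > 0)
    print(f"MOD {n}: {used} of {n} buckets used, max load {max(counts)}")
```

With these keys, MOD 16 fills only buckets 0, 4, 8 and 12 (25 keys each), while MOD 17 uses all 17 buckets with at most 6 keys in any one.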
> 2) How do you choose the prime number?
The divisor is chosen based on the number of buckets: in the case of a MOD
hash function, it is always the number of buckets available, since
key MOD n produces exactly the bucket numbers 0 through n-1.
> 3) In the extendable hashing section, you mentioned
> local depths. This is a little bit vague to me.
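The local depth of a page is the number of hash-value bits used to get to
that page; every key stored on the page agrees on those bits. The global
depth is the number of bits used to index the directory (which has
2^(global depth) entries); it is, essentially, the maximum of the local
depths. A page whose local depth is smaller than the global depth is shared
by 2^(global depth - local depth) directory entries.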
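A minimal extendible-hashing sketch may make the two depths concrete. It assumes integer keys that serve as their own hash values, distinct keys, a directory indexed by the low-order global_depth bits, and a page capacity of 2; the notes may use high-order bits instead, and the class and method names are hypothetical. A full page whose local depth equals the global depth forces the directory to double; otherwise only the page splits.

```python
class Page:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # number of hash bits all keys here agree on
        self.capacity = capacity
        self.keys = []


class ExtendibleHash:
    """Sketch only: assumes distinct integer keys hashed to themselves."""

    def __init__(self, capacity=2):
        self.global_depth = 0                  # bits used to index the directory
        self.capacity = capacity
        self.directory = [Page(0, capacity)]   # always 2**global_depth entries

    def _index(self, key):
        # Directory slot = the low-order global_depth bits of the hash value.
        return key & ((1 << self.global_depth) - 1)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        page = self.directory[self._index(key)]
        if len(page.keys) < page.capacity:
            page.keys.append(key)
            return
        # Full page: if it already uses every directory bit, double the directory.
        # Slot i and slot i + 2**old_depth then point to the same page.
        if page.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the page on one more bit (bit number old local_depth).
        page.local_depth += 1
        new_page = Page(page.local_depth, self.capacity)
        bit = 1 << (page.local_depth - 1)
        for i, p in enumerate(self.directory):
            if p is page and i & bit:
                self.directory[i] = new_page   # slots with that bit set move over
        old_keys, page.keys = page.keys, []
        for k in old_keys:                     # redistribute between the two pages
            self.directory[self._index(k)].keys.append(k)
        # Retry; this may split again if every key went to the same side.
        self.insert(key)


table = ExtendibleHash(capacity=2)
for k in [0, 4, 8, 1]:
    table.insert(k)
print(table.global_depth)                                # 3
print(sorted({p.local_depth for p in table.directory}))  # [1, 2, 3]
```

Here 0, 4 and 8 all end in the bits 00, so their page has to split repeatedly and the global depth reaches 3, while the page holding 1 still has local depth 1 and is shared by 2^(3-1) = 4 directory entries.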
> 4) On the same page, you used the concept of page
> splitting. How did you choose page #17 and then page #32?
The idea is that a request is made to the OS for a new page, and a page
is allocated. It could have any page number whatsoever, so I
just picked the numbers arbitrarily.
> Some other faculty members in the
> department do not have a very high opinion of it as a database.
> Faults cited include a lack of current development and a lack of
> features. From my limited knowledge of it, I gather that these
> criticisms are unfounded.
The criticisms probably come mainly from those looking for a production DBMS to use,
rather than one to do research with. Postgres is not a commercial product; if
someone is interested in a full-featured, supported, commercial product, Informix is
the commercial product that came out of the Postgres research prototype. However,
for our purposes, Postgres is a good choice because we have the source code (and can change
things to do our research), whereas finished commercial products are never available
at the source level.
> The DBMS uniVerse (by Informix, I believe, but I have heard rumors of
> a merger, so I'm not sure of the vendor) uses dynamically
> allocated space for different numbers of attributes in a single table,
> so that a table could be defined as:
>
> ID  FIRST_NAME  LAST_NAME  ADDR  HOBBY
> 3   JACK        FROST      3rd   reading
>
> Then when a person is added with two hobbies the table would contain
> records of both the above type and this:
>
> ID  FIRST_NAME  LAST_NAME  ADDR  HOBBY    HOBBY1
> 3   JACK        FROST      3rd   reading
> 4   JOHN        DOE        4th   biking   singing
>
> Space is allocated for the additional hobby only in the record
> that contains two hobbies, so no space is wasted in the first record,
> which has only one hobby; and so on, ad infinitum, as new hobbies are added.
>
> This would usually be done as multiple tables, such as
>
> ID  FIRST_NAME  LAST_NAME  ADDR
> 3   JACK        FROST      3rd
> 4   JOHN        DOE        4th
>
> ID  HOBBY
> 3   READING
> 4   BIKING
> 4   SINGING
>
> I believe. Different, anyway.
>
> My guess is that the dynamically allocated space is actually
> stored in a separate table and that this is merely hidden from the
> programmer. It does seem to raise some rather annoying questions if it is not.
You may be right. Another possibility is that systems that show users a view
with the separate table (i.e., "normalized" in the sense we will discuss when we get
to normalization) may actually store the data by allocating space the way uniVerse
does. There is often a substantial difference between the way data is
stored and the way it is "presented" to users.
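To make the storage-versus-presentation point concrete, here is a small sketch that keeps the hobby data from the question in two flat, normalized tables and rebuilds the repeating-group view on demand. The field names come from the question; the functions normalize and present_nested are hypothetical, and nothing here is meant to describe how uniVerse actually stores its data.

```python
from collections import defaultdict

# The question's data as a user might see it: one record with a repeating group.
nested = [
    {"ID": 3, "FIRST_NAME": "JACK", "LAST_NAME": "FROST", "ADDR": "3rd",
     "HOBBIES": ["reading"]},
    {"ID": 4, "FIRST_NAME": "JOHN", "LAST_NAME": "DOE", "ADDR": "4th",
     "HOBBIES": ["biking", "singing"]},
]


def normalize(records):
    """Store as two flat tables: persons, and one (ID, HOBBY) row per hobby."""
    persons = [(r["ID"], r["FIRST_NAME"], r["LAST_NAME"], r["ADDR"]) for r in records]
    hobbies = [(r["ID"], h) for r in records for h in r["HOBBIES"]]
    return persons, hobbies


def present_nested(persons, hobbies):
    """Rebuild the repeating-group view by grouping hobby rows by ID."""
    by_id = defaultdict(list)
    for pid, hobby in hobbies:
        by_id[pid].append(hobby)
    return [
        {"ID": pid, "FIRST_NAME": first, "LAST_NAME": last, "ADDR": addr,
         "HOBBIES": by_id[pid]}
        for pid, first, last, addr in persons
    ]


persons, hobbies = normalize(nested)
print(hobbies)  # [(3, 'reading'), (4, 'biking'), (4, 'singing')]
```

Whichever form is stored, the other can be produced for presentation, which is why the view a user sees says little about the physical layout.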
> Any thoughts on this system? It just seemed like a rather curious way
> of looking at the system.
It would be quite interesting if uniVerse actually presents the data as
non-flat records (with repeating groups) and still claims to be relational.
That would be a pretty serious abuse of the term "relational", since a
relation requires a single, atomic value in each field (first normal form).
If they claim only to be "object-relational", then, as in most object-relational
systems, complex datatypes are accommodated through the "Large Object" construct
(which, basically, allows a field to hold a pointer to a complex object instead of
a single value).
B-trees:
> Consider the following B-tree:
>
>          32|50
>         /  |  \
>        /   |   \
>       /    |    \
>   15|17  35|40  52|60
>
> Each node can hold a maximum of 2 entries. In this case, how do I insert
> a value, say 55, into the tree?
As in the notes, when the appropriate node is full, split it into 2 nodes
and promote the middle value (of the node's keys plus the new one, in sorted
order) to the next higher level. If that node is also full, split it and
promote its middle value, and so on.
Thus, we split 52|60 and promote 55:
         32|50
        /  |         55
       /   |         ^
      /    |         :
  15|17  35|40  52|__  60|__
Now 32|50 is full, so adding 55 gives 32, 50, 55: we split it into 32|__ and 55|__ and promote 50, which becomes the new root:
            50|__
           /     \
          /       \
     32|__         55|__
      /  \          /  \
     /    \        /    \
  15|17  35|40  52|__  60|__
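The split-and-promote rule can be sketched as a small recursive insert. This assumes distinct keys and a maximum of 2 keys per node, as in the question; Node, MAX_KEYS and insert are hypothetical names, not taken from the notes. When a node overflows, it keeps the keys left of the middle, a new right node takes the keys after it, and the middle key is passed up; if the root itself splits, a new root is created, which is how the tree grows in height.

```python
import bisect

MAX_KEYS = 2  # each node holds at most 2 entries, as in the question


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys
        self.children = children or []  # empty list for a leaf


def _insert(node, key):
    """Insert key below node; return (middle_key, new_right_node) if node split."""
    if node.children:
        i = bisect.bisect_left(node.keys, key)  # child whose range covers key
        split = _insert(node.children[i], key)
        if split is None:
            return None
        middle, right = split
        node.keys.insert(i, middle)             # absorb the promoted key
        node.children.insert(i + 1, right)
    else:
        bisect.insort(node.keys, key)
    if len(node.keys) <= MAX_KEYS:
        return None
    # Overflow: split around the middle key and promote it to the parent.
    mid = len(node.keys) // 2
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    middle, right = split
    return Node([middle], [root, right])  # root split: tree grows one level


# The tree from the question: root 32|50 over leaves 15|17, 35|40, 52|60.
root = Node([32, 50], [Node([15, 17]), Node([35, 40]), Node([52, 60])])
root = insert(root, 55)
print(root.keys)                                # [50]
print([child.keys for child in root.children])  # [[32], [55]]
```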