>> Alright. So continuing with our study of ordered
associative arrays in the context of an application, which
we're calling Word Bench. Let me start out by reviewing from
your lecture notes the pertinent slides and scripts that define
ordered associative array. So the first will be
in your Chapter Ten. We're going to start with
Slide Six on associative array, and just sort of savor
that for a moment. So this is a special case of table. So, as in table, we're
storing key data pairs. What we have is three
operations - put, get, and erase. We have special associative
array bracket operator, which instead of taking an integer
variable takes a key-type variable as input and returns a reference to
a data type [inaudible] facility. Now, this is so-called
syntactic sugar in the sense that it doesn't add to
the functionality. It's essentially a different
name for the get function, and it is implemented here in the braces as returning get of K
where K is the key. So keep that in mind as we go. So you have a put, a get, and erase. Of course, you have clear,
and the usual things. If you like it [inaudible], but in
this case, in the light version, we're not going to have iterators. The big four are constructor,
destructor, copy constructor, and assignment operator. You have to decide whether
you want those, and if so, you have to create them,
otherwise, disable them. In our case, we're going
to actually create them. And so, anyway, this is the
API for associative array. So let's look at the next slide,
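
A minimal sketch of the API just described, assuming a hypothetical class name `AssociativeArray` and using `std::map` as the backing store purely to keep the example short (the course version sits on a left-leaning red-black tree, and its actual signatures may differ). It shows put defined through get, and the bracket operator as another name for get.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the associative array API on the slide.
// std::map is an assumption made for brevity, not the course implementation.
template <typename K, typename D>
class AssociativeArray
{
public:
  // Get returns a reference to the data slot for k, creating a
  // default-constructed slot if k is not yet present.
  D& Get(const K& k) { return table_[k]; }

  // Put stores d in the slot for k (an insert or an overwrite).
  void Put(const K& k, const D& d) { Get(k) = d; }

  // The bracket operator is syntactic sugar for Get.
  D& operator[](const K& k) { return Get(k); }

  void        Erase(const K& k) { table_.erase(k); }
  void        Clear()           { table_.clear(); }
  std::size_t Size() const      { return table_.size(); }

private:
  std::map<K, D> table_;
};
```

With this sketch, `aa.Put("aaa", 20)` and `aa["aaa"] = 20` do the same thing.
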
which shows, and, by the way, I reordered these slides
from a few days ago. So this used to be about two slides down
the list, but it's now the next slide. This is using the associative
array bracket operator. So let's look at this would be a
client program of associative array. So we're going to have
an associative array. It's going to have a string object
for its key, an integer for its data. We're going to call it double
a, and that's its name. That's the associative array. So we're going to have a
string object called a key, an integer called some data, we've got
a [inaudible], an input stream ifs. We're going to open that input stream. Now, notice this structure. We can do while ifs double arrow
a key, double arrow some data. Now assuming that file contains
first a string and then some data and then another string
and then some data, this will read a key data
pair out of that file. And if the, one of those two
operators, input operators fails, it'll terminate the while loop. So this is extremely convenient and
handy, and it follows the pattern for reading data out of files that
we've been trying to establish since the beginning of the previous
course in [inaudible] programming, but, anyway, this is a very cool way
to read that data out of the file. And then notice that there's exactly
one line of code in the while loop, and this is the line of code
that puts the data into a table. So notice that it's, the name
of the object is double a, and we are activating the
associative array bracket operator. We're putting the key we just read
in there as the so-called index. Of course, the index is
no longer an integer. It's a key type, and remember what that
returns is a reference to a data house, a data place in the table, and
so because that's a reference, we can assign the data we just
read into that data house. Now the subtle under the hood stuff
going on is that because, presumably, we start out with this
table empty, right, the first time we call double a bracket
a key for a certain key, it has, it plays the role of an
insert operation, and the, so that inserts a default data
location in that table associated with that particular key, and
brings you back a reference to it, and then because you
have a reference to it, you can write that data into that box. So this is a very subtle
thing that's going on. Now if you were to read a second copy of
that key, some, the same key somewhere in the file, this would
behave differently. This would retrieve the data box, the
data storage facility that's already in the table associated with that
key, and then you would overwrite it with the new data item that
you read out of the file. Anyway, so you, presumably, though,
the file contains [inaudible] keys because it's a file of
data that you maybe stored from some previous run of this program. So, anyway, that's the file, and
that's how you read the file. And, now, so, of course,
we close the stream. So that entire file has been
read and inserted into that table. Then you could go into a
query loop if, for example, and so that would be enter a key. The user would enter a key, which
in this case would be a string, and then this would output the data from
that key and tell you what the data is. So this would be an example of that
associative array operator behaving like a retrieval or get function. It's returning the data
associated with that key. Of course, if someone entered a
key here that's not in the table, then you would be inserting
that key into the table, and so what you would output here
would be the default value of data, whatever that happens to be. So that would have inadvertently
resulted in an insert operation. Now then, I want to go
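
To make the inadvertent insert concrete, here is a small sketch using `std::map`, whose bracket operator behaves the same way. The two function names are hypothetical, and the non-inserting lookup is `std::map`'s own `find`, which the lite associative array in this lecture may not offer.

```cpp
#include <map>
#include <string>

// Bracket lookup: inserts a default-valued entry when the key is absent.
int LookupWithBracket(std::map<std::string, int>& aa, const std::string& key)
{
  return aa[key];
}

// Read-only lookup: reports absence instead of inserting.
bool LookupWithoutInsert(const std::map<std::string, int>& aa,
                         const std::string& key, int& data)
{
  auto it = aa.find(key);
  if (it == aa.end()) return false;
  data = it->second;
  return true;
}
```
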
next to a demonstration of how these associative arrays work. And so I have conveniently already
established a log in to your homework, my homework for [inaudible]. So what I want to show is that
we have some sample tables here. My directories always get cluttered up. Sorry about that. So that's table dot 100, and, by the
way, that's table dot 100 dot com, which we'll look at in a
second, but it's table dot 1,000, table dot 1,000 dot com, and so on. By the way, here's a
table dot 100,000 dot com, and there's no table dot 100,000. I haven't gotten around
to creating that. So remember how our test harnesses work. F is for functionality tests,
and here's the menu for that. I'll clear the screen, and so the menu's
right now at the top of the screen. So you can load data from a file. One and two we're going to use for put and get pretty much the
rest of the semester. E is for erase. We can invoke the bracket
operator as an input or an output. So if we do bracket key equals
data, what that will do is insert that data [inaudible] set to
wherever the box is for that key, and that can be either an
insert or an overwrite depending on whether the key already
is in the table or not. And if I invoke it without an
equals, that's the retrieval phase of the bracket operator,
and we got clear. Now we've got three different
escalatingly complicated versions of dump, which I will show you later,
and then a size, a traversal, a rehash, a copy assign test, and a switch
from batch mode and quit the program. So this is pretty typical of our test
harnesses. This interface is very simplistic, and I guess some would also
say a little bit cryptic. So here, how does this work? Well, for example, we could
put, that's the number one. The key aaa with data 20, and
that would, we could now do a traversal, and you would see that
we've got the key aaa in the table associated
with the data 20. We could do 1 bbb 30, and another
traversal would show you that bbb is now in the table associated with 30. Let me show how the dumps work. D1 is the simple version. That's the one that was discussed
in the lecture notes already that just displays the
structure of this, and, remember, we're using left leaning red black tree
technology to implement these things, which we're going to
talk about later again, but that's why you're
seeing blue for black. Blue is the new black, and red is the
new red, and that's what that red black, left leaning tree looks
like with those two elements in it, which is not a huge deal. What I want to do, though, is load a
file, and let me load table dot 100, and you notice it says
table data read and stored. I'm going to now do a dump,
the first simple kind of dump, and that shows you that tree structure. These red black left leaning trees have
this peculiar property that they tend to get better with redundant inserts. They always satisfy all the nice theory that
the height is bounded, the height is O of log n, but they get a
little bit frazzled down at the bottom of the tree on the first
round of inserts. What I'm going to do is
call rehash, and if, again, if you recall from the lecture notes, what that does is essentially rebuild
the tree with a traversal of the tree, and that has the effect, in this case, of making the tree a
little more compact. So let's just look at it again, and you
see that it's, wow, it's pretty nice. So that's the new, restructured tree. So the next thing I'm going to do
is show you the level two dump, which shows you all of the keys in the
tree, we don't bother showing the data in these notes because the data
is really just along for the ride. We show it, of course, in the traversal, but in the dumps we're
typically interested just in the structure of the tree here. So, anyway, d2 shows you all the keys, but it doesn't show you the missing
nodes, and notice that it's fixed up so that it doesn't, the, there's just
enough space used to output these trees in some sort of comprehensible fashion. I'll go ahead and show you d3 while
I'm at it, and you see it just puts in the dashes where there are missing
nodes, and that really is showing, notice that this is a wrap,
you're wrapping around. Actually, you may not be able, [inaudible] make sure
that's all on the video. So let me do that again. In d3, the last two lines on the
screen are really one line in the tree. One, that's the bottom
layer of the tree. That's this bottom layer
of the tree right there. Recall all these star,
everything expands to, from one space to I guess five spaces. You need more than a screen to output
that last [inaudible] level in the tree. So, anyway, here's the d2. Now let me show you the menu here. We can erase stuff. So let's erase, of course, you can
try to erase anything, and it will or will not depending on if it's
there, but let's erase D-U-M-P. So we erase D-U-M-P. Let's erase G-Z-X-Z. Oh, let's do one more and erase X-W-U-Q,
and so now I can do this level dump. I must have tried to erase something. It wasn't in the, oh, no. Let's see here. [ Background Sounds ] That's your d2, and then so, I
guess I've erased two things. I must have tried to erase something. It's actually not in the tree. Mistyping. In any case. So those two things are
now the two tombstones
in this tree. Remember how, what erase does
is just mark things dead, but leave the tree structure unchanged,
and use those as kind of stepping stones for other live nodes down the tree. So we could erase, for example, the
root, which is P-X-L-B, and you see now that the root has been erased, and. [ Pause ] We, of course, need the root to get
to the rest of the elements of the tree, but, you know, so there you see it. The root and two other
things are tombstones, but still the root makes a
nice sort of stepping stone to the data that's still alive. So anyway, onward here. So we can then, we could then put
PX, well, let's put X-W-U-Q, and put, it's going to need the
key as well as some data. So let's put that in there
with the data, and then 100. You know what I did is
I use p for put and p. [ Background Sounds ] P isn't the symbol for put. It's a defined symbol. So let me go back then, and
here's that table dot 100 dot com. Notice what that does is call load table
dot 100, and it calls the size function. Then it calls rehash, then it
calls the size function again, and then it switches to keyboard input. Let me just run that, and. [ Background Sounds ] You know, that actually
saves a little bit of typing. So that's, so it preloads
the table, calls rehash, and so now we've got
a table we had before. So let's see. What we were doing is we were
erasing something H-F-W-A, and going to erase D-U-M-P, and
we've got a couple of tombstones. The tree structure is still the same, but we've got a couple
of dead nodes in there. So if you were to insert, I've got
to remember to use 1's and 2 here. So 1, I'm going to insert, let's
say, D-U-M-P, with data 100, and, sure enough, it put it in there. So let's see if we can get D-U-M-P,
and that'll retrieve the 100 that we just put in there,
but we can traverse. It's going to be, you know,
a hundred things going by, but D-U-M-P is in there somewhere, and it's the one that's had
its data upgraded to 100. Notice, by the way, that, well,
it's probably not noticeable to the naked eye, but that traversal
would not have included the tombstone still in the tree. So let me put this back to sleep
for a moment and go to our slides and review some of the slides in
our Chapter 13, which is the chapter on red black left leaning trees, and
we're going to start with, let's see. Start with Slide Seven. So we're just reminding you about
red black left leaning trees, and so it's basically a
binary search tree technology, but we manage the depth
of the nodes using colors, and there are these five rules. Every node is either red or black, and,
remember, in our visuals, blue is black. The root is black. The left leaning property
says that if a node is red, then it has to be a left child. It can't be a right child. The limiting property is
that, on any root-to-null path, you can't have more than two consecutive
red nodes, and the balance property is that every root-to-null path has
the same number of black nodes. And remember the theorems, the theory
is that the height is o of log n or you can say theta log n. Binary
search tree search, therefore, has run time, o of log n, and insert
and remove would also have run time o of log n, and there's some theory that
we went over that shows how to prove that the height of a left
leaning red black tree is less than 3 times the base 2
logarithm of n plus 1 where n is the number
of nodes in the tree. So that's all the theory we
went over, and what we want to do is just quickly skip
ahead to remind you of things, and then get to the practice
of implementing all this stuff. So [inaudible] examples of
red black left leaning trees. This is one that's OK. It's got no more than
two red nodes in a row, and the red nodes are
both left children. So that's OK. This is bad because here there is a
right red node, and that's not allowed. This is bad because we have
three red nodes in a row. And this is just kind of showing the
output from our [inaudible] programs, and, you know, that's really
indicative of a tree structure. You can make it look symmetric
if you want to with an editor by just putting spaces in appropriately
to make it look more like this, but this is the output
you're going to get from the actual demonstration
mode functionality test programs. So that's your dump output. And there are some demos, which
we just looked at one of them. The implementations of, require the
notion of a rotation, and the pictures of those rotations, again,
this is just review. We're going to have a, we got a node n and a node p. N has the [inaudible]
value 90, and what we're going to do is rotate about p,
if you like, to the left. So the two things you have to
know what is p, and the, well, I guess you need to know what n is. You can find p by saying p is
n's right child to start with. So if we do a left rotation about
n, what we want to end up with is p, but where n used to be, n
is kind of rotated around, and we have to redefine
exactly two links. The right child of p, I'm sorry, the
right child of n and the left child of p. So what we do is we redefine
the right child of n to be the former left child of p, and we redefine n to
be the left child of p, replacing that. And so just by a couple of changes of
node pointer values, namely those shown in this code right here, you
accomplish that rotation. Now it gets slightly more
complicated if you have to preserve all the red black
left leaning properties, but that is taken care of as well. So this is example of the right
rotation, the picture is an example. Here is the actual code
of a rotate right. Now, we get to the core of things. The insert is, the insert for
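
As a reminder of the pointer work in the rotations just reviewed, here is a minimal sketch on a bare node type. The `Node` struct and integer values are assumptions for illustration, and the color bookkeeping that a red-black rotation also needs is left out, so this is not the slide's code.

```cpp
// Bare node for illustration only (no flags, no key/data split).
struct Node
{
  int   value;
  Node* left  = nullptr;
  Node* right = nullptr;
  explicit Node(int v) : value(v) {}
};

// Left rotation about n: n's right child p moves up to n's position
// and n becomes p's left child. Exactly two links change.
// Precondition: n and n->right are non-null.
Node* RotateLeft(Node* n)
{
  Node* p  = n->right;
  n->right = p->left;  // n adopts p's former left child
  p->left  = n;        // n becomes p's left child
  return p;            // caller re-attaches p where n used to be
}

// Right rotation about n: the mirror image.
// Precondition: n and n->left are non-null.
Node* RotateRight(Node* n)
{
  Node* p  = n->left;
  n->left  = p->right;
  p->right = n;
  return p;
}
```

Returning the new subtree root lets the caller write `parent->left = RotateLeft(parent->left)`, which fits the recursive style used for insert.
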
these trees is accomplished with a recursive call
which is going to recurse on node pointers and
return a node pointer. So that, the insert looks like this,
or insert of the root comma the value, and it returns a pointer, which
we've redefined to be the root, and then we make sure the root
is black because that's one of the rules for these LLRB trees. What that does is postpone
all our hard work down to this recursive function here. And that recursive function,
we went over in some detail in the lecture notes on
balanced trees. So I'm just going to remind you of
it here that it's here and go on. This is repeating it
with different kinds of documentation, feeding the code. Then we get to some experiments
which we went over with the
lecture notes on balanced trees. So what I want to jump
to now is Slide 19. [ Background Sounds ] Actually I guess it's 17. [ Background Sounds ] Well, let's go with 19. It's what I have in my notes. So this is just reminding
why lazy removal works, and that's basically the idea
of setting the node dead, but not making any structural
change in the tree, and we call that erase
rather than remove. So to erase the node, what you
have to do is just find it first. So n is a
local pointer-to-node variable. Of course, it starts out at the root. Only place you can start out in a tree,
and then while this node is not 0, if
t is less than the value in the node, you go left. If t is greater than the
value in the node, you go right. If it's not greater than and not less
than, then it's equal; that's one
of our principal assumptions about our less-than operators. And so that means you've found
it, and so you set it dead, and just return at that point. Otherwise, [inaudible],
this thing just runs out. Falls out the bottom of the
tree and nothing gets set dead because you didn't find
what you were looking for. So you have to define
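
The lazy erase just walked through can be sketched like this. The `Node` struct with a plain `bool alive` field is an assumption for brevity (the course node keeps dead/alive in a flags byte), and the element type is fixed to `int`.

```cpp
// Simplified node: a bool stands in for the dead/alive flag bit.
struct Node
{
  int   value;
  bool  alive = true;
  Node* left  = nullptr;
  Node* right = nullptr;
  explicit Node(int v) : value(v) {}
};

// Lazy erase: ordinary iterative BST search; on a match the node is
// marked dead and the tree structure is left unchanged.
void Erase(Node* root, int t)
{
  Node* n = root;
  while (n != nullptr)
  {
    if (t < n->value)       n = n->left;
    else if (n->value < t)  n = n->right;
    else { n->alive = false; return; }  // neither less nor greater: found
  }
  // Fell out the bottom of the tree: t was not present, nothing set dead.
}
```
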
your size functions. Remember we did this back
for binary search trees. How do you define the size, and,
again, it's a recursive call to r size, but r size now has to
be slightly modified. The old r size we're now going to
call r num nodes, r number of nodes. R for recursive, number of nodes. This is what the old size looked like. If that's your pointer, if the
pointer is 0, you return 0. Otherwise we return 1
plus the number of nodes in the left sub tree plus the
nodes in the right sub tree, and the 1 is for n. The difference
here is that n may or may not be alive, and if it's not alive, you don't
want to count it in the actual size. You can count it in the number of nodes. That's sort of a structural question,
but you don't want to count it in the size, which is a set question. How big is the set. And so a simple way of accomplishing that is calling n arrow is alive,
which is Boolean valued. You can think of that as either 0 or 1,
and do a cast over to the counting domain, and that'll become a true 0 or a true
1 depending on if n is alive or not, and so you count it here as a
1 if it's alive, and as a 0, that is to say not counted
if it's not alive. So the notion of set retrieve, which
we went over before, is not going to be relevant to our associative array. So we're going to skip
that down to, well, let's remind ourselves
the engineering details. So we have flags. We're going to enum,
let's enumerate a type. Zero is just the name of the 0 flags. They're all going to be thought
of as unsigned eight-bit integers. Dead is the one bit, the first bit. Red is the, red black, dead alive is the
first bit, red black is the second bit, and there's a couple of more we're
going to talk about in a week or two. Left thread, right thread,
and then here we define, went ahead and defined a default. Red left thread, right thread. You don't really need to do that, but this slide illustrates
how you could do it. Anyway, so the node has a value
in it, and it has a left child and a right child pointer,
and it has been expanded to include one byte of flags. So we have eight flags available. In this course, we will only
use at most four of those flags. That leaves four other flags for,
that could be used for other things. For example, if you wanted
to do AVL trees, you could use some of
those flags for that. And these
functions defined for nodes are one liners, but
they're very cryptic one liners. So, for example, is red depends
on whether or not the red bit in the flags variable is 1, which
is the same as asking is it not 0. And so returning 0 not equal to red
[inaudible] flags, that's kind of hard to remember and easy to mess up. So we have a function is red. That's easy to remember
and hard to mess up. So that's why we have that function. We've got is red, is
black, is dead, is alive. We've got set red, set black, set
dead, set alive, and we need all eight of those now to manage the red/black
structure that we're going to implement. So some more engineering details
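
A sketch of the flags and the eight one-liners, under assumptions: the enumerator spellings and the bit positions of the two thread flags are guesses, the default is taken to be red and alive, and the node holds an `int` value rather than a key and data.

```cpp
#include <cstdint>

// Flag bits in one unsigned byte. DEAD is the first bit and RED the
// second, as described; the thread bit positions are assumed.
enum Flags : std::uint8_t
{
  ZERO         = 0x00,
  DEAD         = 0x01,
  RED          = 0x02,
  LEFT_THREAD  = 0x04,
  RIGHT_THREAD = 0x08,
  DEFAULT      = RED   // assumed default: red and alive
};

struct Node
{
  int          value = 0;
  Node*        left  = nullptr;
  Node*        right = nullptr;
  std::uint8_t flags = DEFAULT;

  // Named queries hide the cryptic bit tests.
  bool IsRed()   const { return 0 != (flags & RED); }
  bool IsBlack() const { return !IsRed(); }
  bool IsDead()  const { return 0 != (flags & DEAD); }
  bool IsAlive() const { return !IsDead(); }

  // Setters turn a single bit on or off and leave the others alone.
  void SetRed()   { flags = static_cast<std::uint8_t>(flags | RED); }
  void SetBlack() { flags = static_cast<std::uint8_t>(flags & ~RED); }
  void SetDead()  { flags = static_cast<std::uint8_t>(flags | DEAD); }
  void SetAlive() { flags = static_cast<std::uint8_t>(flags & ~DEAD); }
};
```
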
are that we have a color map which assists us in creating
these displays, and we have, because we're not going to have
iterators, we're going to need some way to make copies of trees, and so we do that with a function object
called copy node. This is a function object. Here's the overload of the operator, and
it takes, of the [inaudible] operator, it takes a pointer to a node. If that node is alive, it
inserts a new node at the new root, and then
sets the new root to black. The new root is just the thing you
pass in, and so we have a new root which is recorded as data in the
function object, and an associative array reference or pointer
that we've recorded in this. So, for example, here's
how rehash would work. You define new root to
be a node pointer. It can be initialized as 0. Define copy node to be a function
object based on new root and this tree, traverse this tree
with that copy node. OK, what that has done now is create a
new copy of the tree at new root. So we can now clear the old tree and let
the old root be equal to the new root. We've essentially built a new
tree, got rid of the old tree, and now the new
root is being copied to the old root. Now, only reason for doing that is
we copy nodes only if they're alive, and so you would use this when
the tree is full of dead nodes that you want to get rid of. So this is the table API, which
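
Here is a rough sketch of that rehash idea. It is not the distributed code: a plain, unbalanced BST insert (`RInsert`, hypothetical) stands in for the left-leaning red-black insert, colors are omitted, and the node is simplified to an `int` value with a `bool` alive field.

```cpp
#include <cstddef>

struct Node
{
  int   value;
  bool  alive = true;
  Node* left  = nullptr;
  Node* right = nullptr;
  explicit Node(int v) : value(v) {}
};

// Plain BST insert, an assumed stand-in for the balanced insert.
Node* RInsert(Node* n, int v)
{
  if (n == nullptr)       return new Node(v);
  if (v < n->value)       n->left  = RInsert(n->left, v);
  else if (n->value < v)  n->right = RInsert(n->right, v);
  else                    n->alive = true;
  return n;
}

// In-order traversal applying function object f to each node.
template <class F>
void RTraverse(Node* n, F& f)
{
  if (n == nullptr) return;
  RTraverse(n->left, f);
  f(n);
  RTraverse(n->right, f);
}

void RRelease(Node* n)
{
  if (n == nullptr) return;
  RRelease(n->left);
  RRelease(n->right);
  delete n;
}

std::size_t RNumNodes(const Node* n)
{
  return n == nullptr ? 0 : 1 + RNumNodes(n->left) + RNumNodes(n->right);
}

// Function object: copies each live node into the tree at newRoot.
struct CopyNode
{
  Node*& newRoot;
  explicit CopyNode(Node*& nr) : newRoot(nr) {}
  void operator()(const Node* n)
  {
    if (n->alive) newRoot = RInsert(newRoot, n->value);
  }
};

// Rebuild from live nodes only, then swap the new tree in for the old.
void Rehash(Node*& root)
{
  Node*    newRoot = nullptr;
  CopyNode copyNode(newRoot);
  RTraverse(root, copyNode);
  RRelease(root);
  root = newRoot;
}
```

With the plain insert used here, feeding keys in sorted order produces a degenerate chain; the compact result seen in the demo depends on the balanced insert that this sketch leaves out.
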
we'll come back to in a later day. The table node is similar to the tree
node except that we have key and data. It's the only difference. Table or tree, we'll come back
to later, but what I want to get to is associative arrays here. So we have put define in terms
of get, and get defined in terms of recursive get, and what's subtle
here is that recursive get has to take a pointer, a key you're
looking for, and a location variable that is going to be sort of the probe,
and it's going to be set to the location in the tree where k was
found, if it was found. And so that location gets set to be
pointing to where k is, and, of course, if k is not in the tree to begin with,
this would insert it into the tree. So it always ends up the location
pointing to where the key is. And then what we return is the
data associated with that location. So let's talk a little
bit more about that, and we do that with looking
at the narrative here. [ Background Sounds ] So this is your put, this is your get
that we just talked about in the slide, and what's in the narrative is the
recursive get that gets called. So this is recursive get, and, again,
notice that it takes a node pointer, it takes a key, and here is
its location being passed in. It's a pointer to type node
being passed by reference, which is a little unusual kind of
notation to see, but you're passing it by reference because you want that
location value to be modified. It's going to end up being a pointer,
which is the address of the node where k, the key is, k
[inaudible] is found. So all this work. Just like in the past with [inaudible], with red black left leaning
tree is an ordinary BST search. If the node pointer is 0, that means
you're at the bottom of the tree. So you let that location
just be a new node, and notice that new node calls the
default constructor for your data type. It puts the key in there with it,
and makes sure that the color is red, and then you return that location. OK. So that's what happens if
you get to the bottom of a tree. Otherwise, you keep going
just like before, and. So if the, you know, if the key is less
than the key in the node where you are, you make a call to r get with a pointer to the left child key
[inaudible] and location. Otherwise, to the right child. And, finally, you get the case where
they're equal, and if they're equal, that
means you've found the location. Now for other kinds of things such as
insert, you would copy the value passed in, some kind of value passed in, into
the value for the data associated with the node, but we don't do that. We just need to create the node and/or
find it and return a reference to it. So we found it structurally so
we make sure that it's alive, and then we set location
to be that pointer, and there's no recursive
[inaudible] here. So everything unwinds, and
now location has been set to the found place in the tree. And then we go through
the same thing as before. On the way back up the tree, we
ensure the left leaning property. Remember how this goes is
on the way down the tree, we ensure the node count properties, and on the way up we ensure
the left leaning properties. So the net effect of that
is to set location by, it's a pointer passed in by reference. So it gets set in there. In either case, you found
it or you didn't find it, and when you didn't find it,
it gets to be the new node. When you did find it, it gets to
be the place where you found it. In either case, location is set and gets
returned by reference, and, in fact, this returns a [inaudible]
node is important in the recursive calls structure of
the function, but what's important for get itself is the location,
OK, gets set by this call, and we set the root,
the new root to black. It may not be changed,
but we, there's no harm in just setting it to black [inaudible]. That's quicker than testing
and then setting, and, finally, you return the data associated with
the location that got dragged back from the recursive get,
which you can kind of think of as the deep-sea fishing expedition,
location being the result of the catch. So that brings us really
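
The shape of that get, with location passed as a pointer by reference, can be sketched on a plain BST. This deliberately omits the rotations and color handling that the project version needs, and the `Node` layout (string key, `int` data, `bool` alive) is an assumption for illustration.

```cpp
#include <string>

struct Node
{
  std::string key;
  int         data;   // value-initialized for a new node
  bool        alive;
  Node*       left;
  Node*       right;
  explicit Node(const std::string& k)
    : key(k), data(), alive(true), left(nullptr), right(nullptr) {}
};

// Returns the (possibly new) subtree root and sets location to the
// node holding k. Rebalancing on the way down and up is omitted here.
Node* RGet(Node* n, const std::string& k, Node*& location)
{
  if (n == nullptr)            // bottom of the tree: k was not present
  {
    location = new Node(k);
    return location;
  }
  if (k < n->key)       n->left  = RGet(n->left, k, location);
  else if (n->key < k)  n->right = RGet(n->right, k, location);
  else                         // found structurally
  {
    n->alive = true;           // make sure it is alive
    location = n;
  }
  return n;
}

// Get returns a reference to the data at the location RGet found.
int& Get(Node*& root, const std::string& k)
{
  Node* location = nullptr;
  root = RGet(root, k, location);
  return location->data;
}

// Put is defined in terms of Get.
void Put(Node*& root, const std::string& k, int d) { Get(root, k) = d; }
```
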
back to the project. So let's go back to the
homework for document, and this, so let's talk about what
you've got to deliver. Well, you have to deliver Word
Bench Two, which is just kind of a refactored Word Bench. That's not going to be
a problem, but the OAA, ordered associative
array, version lite. That's the one without iterators, with put, get, and the bracket operator. That's the tough one that
you've got to deliver, and that's what we've
been talking about here. Now it's a major project
to develop all that stuff. I'm talking about one that
would take maybe a semester. So, clearly, I'm not going
to ask you to do that. So what I'm doing is giving you a
lot of this stuff already laid out. So, in effect, you have this. It's already in a file distributed. This is defining a class OAA. Here's your key type, data
type, [inaudible] type defined. This is your constructors,
etc. Here's your bracket operator, put, get, erase, clear, rehash. Empty, size, number of nodes, height. We are going to use the
recursive traversal technology because when you do need to traverse
trees, and if you don't have iterators, you have to go to the
recursive traversal technique. As usual, we'll have something
called display which will be a way to define an output operator for a
tree, and we got our dump methods which are kind of the development
assistance to show you in various levels of detail what the tree structure is. Those come in real handy
for both demonstrating and also debugging your code. And I've color coded things here. The red here, just another color
at this point, but that's the, showing you the enumeration
of the flags. Now I've commented out the true
definition of default [inaudible]. There it is. So your default is red,
but it's also alive. And here's your color map. And, finally, here's
the definition of node. It's in black. So you got your key data,
left child pointer, right child pointer, and your flags. Here's your node constructor, and
we're going to make OAA a friend of this node class, and you've
got your [inaudible] and your sets for the various flags for a node. Changing colors again, we have the class
print node we were just talking about. I guess we talked about
copy node, not print node. So there's copy node, and we talked
about, completely implemented for you. Print node is also implemented, and all
it does is go through the tree and print out the data in the node where it is. So. [ Background Sounds ] So this is back. This is, well, we've talked about
two sub, three subclasses here. One is the node, one is the
copy node function object, and one is the print
node function object. Those are the red, the
green, and the blue. Those are three classes
that are inside of, defined inside of our
main class, which is OAA. So we're back out at the main class OAA. Its private variables are the
pointer to node called root, and a predicate object
called [inaudible]. These are your recursive, well,
these are static functions, most of which are recursive. New node just encapsulates
new node in such a way that you can supply error
messages in a uniform manner. Then we got recursive release,
recursive clone, recursive size, recursive [inaudible], and recursive
height, and those are all defined just like in the lecture notes,
and the code is given to you, rotate right and rotate left is given. Your r traverse is given. That's going to be a recursive function
that takes as a, takes a function object and a node pointer, and basically
recursively goes through the tree in an in-order traversal and applies
the function object f to each node as it gets to it. And we've got recursive left leaning
get and recursive left leaning insert. So the, you're going to be responsible
for implementing recursive get here and I think recursive insert. So you've got to, so that's where
the rubber meets the road in terms of defining this left leaning
red black tree structure, which is implementing your ordered
associative array is getting the insert or the get done correctly,
and I'm asking you to do both because the code is very, very similar
but subtly different in key places, and I would like for you to understand
that, and the best way to do that is to write the code yourself, and your
understanding kind of flows, you know, from your head through your
fingers to the keyboard and then back out so to speak. So that, in a nutshell, is the project,
and I know you're going to enjoy it. And good luck with it. So.