>> All right.
So today we want to talk about sets and
maps and, time permitting, homework 3.
So looking at the intro to sets,
[inaudible] notes, I just remind you
of what we mean by generic containers.
Some template class, that's
a proper type that is capable
of storing arbitrary
numbers of T objects.
And an associative container is one
that supports an associated
iterator class, a type Iterator,
and is organized and
accessible by value.
So, you may insert objects or
remove objects in the container.
But what you as a client may not do
is determine the storage position
in the container.
So your positional details are left to
the implementation and become irrelevant
to the client's needs
and the [inaudible].
Typically there will be iterators
associated with associative containers,
but they'll typically
be constant iterators.
In other words, they can allow you to
go through the container or traverse it,
pick things out of it, look
at -- inspect items in it,
but typically don't allow you to change
the value of an item in the container.
There's an exception to that.
Maps will allow that,
which we'll talk about later.
But typically because the set
uses the -- because the container,
the associative container uses the
values that are stored in it as part
of the [inaudible] self-organizing
system, if you went
and changed the value in place you
might destroy the organization system
of the container.
And so, generally that's -- you're
prevented from being able to do that.
So, if you want to, you could have
a certain, say, key plus data stored
in an associative container.
To change the key,
you typically have to remove that
and then reinsert it with the new
key rather than just change the key
as it sits in the container.
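A minimal sketch of the remove-then-reinsert idea, assuming std::set stands in for the course's set class. The names Record and rekey are made up for illustration and are not from the lecture.

```cpp
#include <set>
#include <string>
#include <utility>

// A (key, data) record. std::pair orders by key first, then by data.
typedef std::pair<int, std::string> Record;

// Change the key of a stored record by removing it and reinserting a copy
// under the new key. Returns false if the old record is not in the set.
bool rekey(std::set<Record>& s, const Record& old_record, int new_key)
{
  std::set<Record>::iterator i = s.find(old_record);
  if (i == s.end()) return false;
  Record updated(new_key, i->second);  // keep the data, attach the new key
  s.erase(i);                          // take the old record out
  s.insert(updated);                   // the container decides where it goes
  return true;
}
```

The element is never modified in place, so the container's ordering is never at risk.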
So, this can get quite confusing
because we still have iterators
and we still can iterate
through a container.
So I just want to talk for
a minute about the client.
This is a situation where you
have a client perspective and a sort
of internal perspective
or structural perspective.
So from the client perspective
you can do the following.
You can do the standard
traversal for loop,
where, for an object c of container
type C, that loop looks like this:
for C double colon iterator i.
That names the internal
type called iterator.
Then i equals c dot begin,
i not equal to c dot end,
plus plus i. We've seen
that loop a million times.
And you do whatever you need to inside.
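Written out, the standard traversal loop might look like the sketch below. It assumes std::set as the container type, with the standard library's lowercase begin and end. The function name traverse is hypothetical.

```cpp
#include <set>
#include <vector>

// Visit every element of the set from beginning to end and collect the
// values in the order the iterator delivers them.
std::vector<int> traverse(const std::set<int>& c)
{
  std::vector<int> seen;
  for (std::set<int>::const_iterator i = c.begin(); i != c.end(); ++i)
  {
    seen.push_back(*i);  // "whatever you need to do" with the element *i
  }
  return seen;
}
```

The loop reads elements but never assigns through the iterator.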
Now what that loop is doing
is going through the elements
of that container one at a
time from beginning to end.
And it gives an external
feel for some sort
of sequential structure
to the container.
But that sequential structure is really
only in the mind of the client program.
It does not have anything to do
with the actual structure
in the container itself.
And in fact, if we were implementing
something like a set, what you're going
to learn over the next several weeks is
that there are quite a few widely
differing ways to implement a set all
with the same external API, but
internals all being quite different.
And so how an iterator
manages to get from one item
to the next internally depends
on how that set is implemented.
And it varies very widely
from one implementation to another.
But remember, in an associative
container,
a client may not dictate
the position of an element,
really from either perspective.
So you can't decide in an ordered
set, let's say, of numbers,
you can't as a client decide
you want 3 to come before 2.
The set will say no, 3 comes after 2.
And you won't be able to
change the iteration order
because the set will insist on
storing them in a certain way.
So let's look at some examples of
-- sorry -- one more little detail.
There's now a notion of multimodal
versus unimodal associative containers.
So, in a multimodal associative container,
duplicate elements are allowed.
In set terminology, these
things are called multisets.
So you're allowed to have
a set with two copies of A,
three copies of B and C in it.
And that multiset would have
six elements, A, A, B, B, B,
and C. But normally in a set or
uniset, you would not allow that.
And that would be the case of a
unimodal associative container.
Well, in computer [inaudible],
a multimodal set is typically
called a multiset.
That's what it happens to be
called in the standard template
library of C plus plus.
It's also called a bag.
That term is widely used.
It's used, I believe, in your textbook
and quite a number of other places.
Sometimes you also see plain "set" used
for both uni- and multimodal cases,
relying on context to
distinguish [inaudible].
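As a quick check of the A, A, B, B, B, C example, here is a sketch using std::multiset and std::set from the standard library. The helper name size_after_inserts is hypothetical.

```cpp
#include <cstddef>
#include <set>

// Insert A, A, B, B, B, C into a container of the given type and report
// how many elements it ends up holding.
template <class Container>
std::size_t size_after_inserts()
{
  const char input[] = {'A', 'A', 'B', 'B', 'B', 'C'};
  Container c;
  for (std::size_t k = 0; k < sizeof(input); ++k)
  {
    c.insert(input[k]);
  }
  return c.size();
}
```

Called with std::multiset<char> this keeps all six elements. With std::set<char> the duplicates are not stored, so only three remain.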
Now, in a unimodal associative container,
duplicate elements are not allowed.
So this is a little more
complicated because that means
our insert operations kind of
have a dual personality.
Let's say you want to put A into
a set, S. And let's say that A --
it does not already belong to that set.
Then inserting A into a set,
S, will make the set, S,
one element bigger in size.
And A will now be an element of the set.
But what happens if you would insert A
into a set when A is already in the set?
Well, you might think well nothing.
But not really because it might
be that A that you want to insert
and the A that's in the set
look the same structurally
but they might have different
data associated with them.
For example, the object A might be
some sort of record, student record,
and determining whether the two
students are the same would be whether
or not they have the
same student number.
But, the student record A that you want
to insert might have the same
student number as the record A that's
in the set, but it might have updated
information in other respects.
So you might have updated
your own [inaudible].
You might have another set of
semester grades in the record.
And so when you insert A into
a set that already contains A,
rather than do nothing, what you do is
you copy the new A on top of the old A.
So it amounts to an update
of information.
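A small sketch of the student record idea. The struct, its field names, and both helper functions are hypothetical. The point is only that "same element" is decided by the student number while the rest of the record can differ.

```cpp
#include <string>

struct Student
{
  int         id;      // student number: decides identity and order
  std::string grades;  // other data that may change over time
};

// Order records by student number only.
bool operator<(const Student& a, const Student& b) { return a.id < b.id; }

// Two records count as the same element when neither is less than the other.
bool same_element(const Student& a, const Student& b)
{
  return !(a < b) && !(b < a);
}

// The "update" personality of insert: copy the new record on top of the
// stored one when they are the same element. Returns true if it updated.
bool update_if_same(Student& stored, const Student& incoming)
{
  if (!same_element(stored, incoming)) return false;
  stored = incoming;
  return true;
}
```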
This is very important to keep in mind.
You can't store.
Now, we're going to have
ordered associative containers.
We're actually going to
talk about them first.
Then we're going to have
unordered associative containers.
They will come towards
the end of the semester.
An ordered associative container
typically is structured using the order.
And that order can be either
a standard less than operator
or it can be a predicate
class which is passed
in when you create the set object.
And that predicate would then be used
for a structural setup for the set.
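To illustrate passing the predicate in at construction time, here is a sketch with std::set, which takes the predicate type as a template parameter and a predicate object in its constructor. The helper name ordered_by is hypothetical.

```cpp
#include <functional>
#include <set>
#include <vector>

// Build a set ordered by the given predicate object, then return its
// elements in iteration order.
template <class P>
std::vector<int> ordered_by(const std::vector<int>& input, const P& pred)
{
  std::set<int, P> s(pred);              // predicate passed in at creation
  s.insert(input.begin(), input.end());
  return std::vector<int>(s.begin(), s.end());
}
```

Passing std::less<int>() gives ascending order and std::greater<int>() gives descending order. The client picks the predicate but still never picks positions.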
So, let's take a look at the
associative container API.
Typically you're going to
have an insert operation.
And that's taking a type T element.
And I have it in there as a const
reference to T. Some people might put it
in as passing it in by value.
Functionally equivalent.
One's just a little more
efficient than the other one.
Remove an element T. And notice
when you insert T into a set
or an associative container
you're putting it in there.
And that may be multi-
or unimodal insert.
When you remove, if it's a multi
container, then there might be more
than one copy of T. And so how
do you know which copy to remove?
Well, we just remove them all and
return the number that you removed.
There's typically clear, which
makes the whole thing empty.
So you have insert and remove.
And then you have some ways of finding
stuff in the associative container.
And then an includes method that
is typically part of the API.
And what that does is return an iterator
pointing to the location in the set.
Now, it's pointing to
the element in the set.
I shouldn't really phrase it that way
because location only
makes sense to the set.
It doesn't make sense to
the user or the client.
But nevertheless, if you're a
client and you call includes,
you will either get the end iterator,
which will tell you that T is not
in the set or you will
get a non-end iterator.
And what that will do is when you
dereference it, it will give you
that element that's in the set.
Then we have the empty and size
operations and the usual support system
of equality and non-equality
operators, iterator support
for bidirectional iterators and
what your textbook calls the big 4:
constructor, destructor, copy
constructor and assignment operator.
So we're going to concentrate
now on the four operations here:
insert, remove, clear and includes.
So that's pretty much all you need
to do with an associative container.
Put things in, take things out, take
everything out and find an element
in there and retrieve
the data associated.
Notice that your includes
is a const method,
which means it doesn't change the set.
Of course, insert and remove pretty
much guarantees to change the object,
make them [inaudible] so
they're not [inaudible].
Clear, obviously, changes it.
Empty and size don't.
So your const methods would
be includes, empty and size.
And your working methods,
non-const methods,
would be insert, remove and clear.
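The split between const and non-const methods can be sketched as a thin wrapper over std::multiset. The class name Bag and the capitalized method names are assumptions for illustration, not the course's actual header.

```cpp
#include <cstddef>
#include <set>

// Hypothetical multimodal associative container built on std::multiset.
template <typename T>
class Bag
{
public:
  typedef typename std::multiset<T>::const_iterator ConstIterator;

  // working (non-const) methods: these change the container
  void        Insert(const T& t) { data_.insert(t); }
  std::size_t Remove(const T& t) { return data_.erase(t); }  // all copies; returns how many
  void        Clear()            { data_.clear(); }

  // const methods: these only inspect the container
  ConstIterator Includes(const T& t) const { return data_.find(t); }  // End() if absent
  bool          Empty() const { return data_.empty(); }
  std::size_t   Size()  const { return data_.size(); }
  ConstIterator Begin() const { return data_.begin(); }
  ConstIterator End()   const { return data_.end(); }

private:
  std::multiset<T> data_;
};
```

A client would compare the result of Includes against End() before dereferencing it.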
Now, if it's an ordered
associative container you're going
to have a few more operations
in the API.
And I've highlighted
them in garnet here.
Lower bound and upper bound.
And lower bound means the same thing
as before, reinterpreted as follows:
it's the first position, p, in the
container such that the value
at that position
is bigger than or equal to t.
And upper bound is the first position
such that the value at that position
is strictly bigger than t.
Of course, bear in mind, those would
not make sense unless you had an ordered
associative container.
So the ordered associative container
does give you these other operations.
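For concreteness, here is a sketch of the two definitions using std::lower_bound and std::upper_bound on a sorted std::vector. Returning an index instead of an iterator is a choice made for this illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Index of the first element that is >= t, or v.size() if there is none.
std::size_t lower_index(const std::vector<int>& v, int t)
{
  return static_cast<std::size_t>(std::lower_bound(v.begin(), v.end(), t) - v.begin());
}

// Index of the first element that is strictly > t, or v.size() if none.
std::size_t upper_index(const std::vector<int>& v, int t)
{
  return static_cast<std::size_t>(std::upper_bound(v.begin(), v.end(), t) - v.begin());
}
```

On the sorted values 1, 2, 2, 2, 5 the lower bound of 2 is index 1 and the upper bound of 2 is index 4, so the two together bracket the run of elements equal to t.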
Now, so let's start talking about an
ordered list and a multimodal list.
How would we create a list
that maintains the order
and therefore implements
the ordered set API?
Okay, let's call it MO list.
M is for multi, O is for ordered, list.
Notice that I'm going to have an
element type as a template parameter.
I'm also going to have this P, which is
a predicate, which tells us what order,
how to keep order in the list.
Notice that it has a default
value, which is ordinary less than.
So if you don't specify a predicate,
you'll get the less than operator.
If you do specify, then it
would be whatever you specify.
Now, underneath this we're going to have
a list that's an ordinary list, right?
Position oriented list like
the one you did for homework 1.
And notice that's [inaudible]
or protected.
And you're going to store the
predicate object called pred.
So we've got list underscore
and pred underscore.
Those are our two private or protected
class data members of the class MO list.
So you got your equality operator,
you've got some terminology support.
T is the value type we are
storing, P is the predicate type,
Iterator is an ordered list iterator.
And by the way, a const iterator
is also an ordered list iterator.
So if somebody says I want an
iterator, they're really going
to get a const iterator but they'd
be allowed to call it an iterator.
You cannot make them
call it a const iterator.
But in either case, [inaudible]
const behavior.
Now, so we've got insert and
we've got a secondary insert.
So what I've highlighted here
in green are the standard associative
container operations: insert, remove and clear.
We've also added for
convenience an insert
that includes an iterator
pointing to the location.
And you might say, well, we're
already breaking the rules.
And the answer is, no, not really,
because what we'll do is check first
that that's a legal position.
The container is allowed to
reject the insert if the value
and the structure would be
broken by the insert operation.
So, it looks the same as the
insert into our ordinary list
but in fact you give the container
the opportunity to reject that insert
if it's going to mess up
the internal structure.
So in this case, of course, it's
going to be an ordered list.
And if you know a place in the list,
maybe through some previous call
to, say, includes, you have an iterator
that you know is pointing
to the right place.
And so, rather than have to do
another search to find that place,
you might get at that [inaudible].
And the iterator is the
way to insert it.
And then the standard
remove is remove by value.
But we also have a remove at a location.
And if we had an iterator pointing
to someplace in the container,
well it certainly makes sense
to say remove that element.
That can't really mess up the structure.
We would just simply remove
the element from the set.
Includes is there.
We're going to postpone
a little bit talking
about lower bound and upper bound.
You have iterator support
begin, end, empty and size,
assignment operator, constructors.
There's an explicit constructor
that takes a predicate argument.
So if you want to build one
around a specific predicate.
And there's a copy constructor
and a destructor.
And there's also technically things
like suppose you want to know, hey,
what am I using to determine
the order in this set?
You can get a copy of the
predicate and inspect it,
the const reference [inaudible].
There's the [inaudible] display
and there's a developer's
helper, which is dump.
And we're going to have a check MOlist,
which will check the internal structure
of this multimodal ordered list.
And we're going to have a check
UOlist, which will check to see
if it's a unimodal ordered list.
And these are just developer's
helpers that, you know, we have,
but we probably wouldn't leave them
in when we made this thing
available [inaudible].
So, there's really not too many
things we have to [inaudible].
Far fewer than with the
positional lists.
So, how do you make the Olist iterator?
Well, really, it's just going
to be a list const iterator.
And what it's going to do is
iterate through a list right here.
And that list presumably
has an order maintained in it,
and so iterating through it we can
do with an ordinary [inaudible].
So, right here you see the
complete implementation
of the ordered list iterator.
Those are some type definitions.
I'm sorry -- those are
some declarations.
And we're going to make MOlist
and UOlist a friend of this class.
And the terminology just is the same
as for the ordered list class itself.
So, are two of these iterators equal?
Well, they're equal if and only
if their underlying ordinary
list iterators are equal.
They're not equal if they're not equal.
Dereferencing is just returning
the dereferenced private member.
And so you just translate everything
down to the private variable,
which you've stored.
And that's [inaudible],
this iterator class.
So the completed implementation is
right here before you [inaudible].
Now, we're going to make a unimodal
list, ordered list, by deriving it off
of the multimodal ordered list.
And the question is what would we
have to change to make it unimodal?
Well, the only thing that's different
is the dual personality of insert.
So we would have to override the insert
methods with nothing [inaudible].
Of course, you've got to define the big 4,
the constructors and
assignment operator.
And that's it.
So, all we've got to do
functionally is override the insert
to make it unimodal instead
of multimodal.
And everything else will
work just the same.
And by the way, our Olist
iterator works perfectly well
for either multimodal or unimodal.
So that same iterator class works
for both of these two set types.
So, here's your multimodal
insert operation.
We're going to make a list iterator,
not an ordered list iterator,
type, iter, starting at the
beginning of our underlying list.
Do sequential search.
That's what this is.
And it keeps going as long as --
I have written there not
paren T comma star iter.
That's like saying star iter is less than
or equal to T. And it stops
as soon as T is strictly
less than star iter.
And then it calls the underlying list
insert operation at that location.
And it's going to return
true if and only if --
that's probably supposed
to be a [inaudible].
So we have this iter
returned by the list insert.
That returns an iterator
pointing to the new element.
If it fails, it returns
the end iterator.
So this should be a [inaudible]
that you can see right there.
I need to go fix that in this slide.
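The multimodal insert just described can be sketched on top of std::list. This is not the slide's code. The function name list_mo_insert is hypothetical, and std::list::insert reports failure by throwing, so the end-iterator check is kept only to mirror the description.

```cpp
#include <functional>
#include <list>

// Insert t into a sorted list, after any elements equal to t.
// pred is the ordering predicate, for example std::less<T>().
template <typename T, class P>
bool list_mo_insert(std::list<T>& sorted, const T& t, const P& pred)
{
  typename std::list<T>::iterator iter = sorted.begin();
  // Sequential search: keep going while !(t < *iter), that is, *iter <= t.
  while (iter != sorted.end() && !pred(t, *iter))
  {
    ++iter;
  }
  iter = sorted.insert(iter, t);  // returns an iterator to the new element
  return iter != sorted.end();
}
```

Because the search runs past everything less than or equal to t, a duplicate always lands after the existing copies and the size always grows by one.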
And then, now, for the unimodal one,
you perform the same sequential search
except that your condition is you want
to have the value less than t.
And then
you stop that sequential search.
And presumably that's the first place
where the value is not less than t.
And then you have to check
whether the iterator is valid.
And you want to see if it's
at the end or not at the end.
Then you want to see if
it's not at the beginning.
And you want to see if it's equal to t.
So that would be the case where you
found the item already in the list.
And if you did, then what you do is
you take the item that's in the list
and you overwrite it with
the incoming argument t.
So that's the personality
phase in which you overwrite
because you've found t already in there.
But if you didn't find
it, then you just insert.
So remember, the two
personalities are exhibited here.
In this one, we always insert.
That's the multimodal.
And in this one we look
for it and if we find it,
we overwrite instead of inserting.
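A sketch of the unimodal insert on std::list, under the same caveat that this is an illustration and not the slide's code. The names Entry, ByKey and list_uo_insert are hypothetical, and the return value (true when the list grew) is a choice made here.

```cpp
#include <list>
#include <utility>

typedef std::pair<int, char> Entry;  // (key, data)

// Order entries by key only, so two entries with equal keys are "the same".
struct ByKey
{
  bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
};

// Insert t, or overwrite the stored element if an equivalent one exists.
// Returns true if the list grew, false if an element was overwritten.
template <typename T, class P>
bool list_uo_insert(std::list<T>& sorted, const T& t, const P& pred)
{
  typename std::list<T>::iterator iter = sorted.begin();
  // Sequential search: keep going while *iter < t.
  while (iter != sorted.end() && pred(*iter, t))
  {
    ++iter;
  }
  // Here *iter is not less than t. If t is also not less than *iter,
  // the two are equivalent and we overwrite in place.
  if (iter != sorted.end() && !pred(t, *iter))
  {
    *iter = t;
    return false;
  }
  sorted.insert(iter, t);
  return true;
}
```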
Now, let's talk about how
much time these things take.
Notice that insert, and by the way,
the remove operation, includes,
lower bound, upper bound
all use sequential search
to find a place to examine.
So, as you know, sequential search
is an O of n or theta of n operation.
The average sequential
search time is theta
of n. It's actually n over 2, right?
But we call that theta of n. And so that
means all of these operations: insert,
remove, includes, lower
bound, upper bound,
all have average case
runtime [inaudible].
Can we improve on that?
And let me just point out
exactly where we can improve.
And that is in our search.
We can improve the quality of
a search if we have a vector
and that vector is ordered, and we
can call our binary search algorithms.
And this search will become binary
search instead of sequential search,
and therefore will have a
logarithmic runtime as the [inaudible].
Now, the downside is that you
can't just insert into a vector.
In order to
wedge something into a vector you have
to make more space in the vector
and then copy all the elements
from that index and higher up one index
so there's space to put that t in.
So when you do an insert, even
though you save on the search for it,
you're still going to pay a theta of
n price when you actually make room
for that item in the vector because you
have to do what I call leapfrog copy
of all the existing elements
one index higher.
You don't have to do that in the list.
And similarly remove from a
vector [inaudible] everything has
to stay contiguous.
To remove something you have to
do what I call a leapfrog copy
down from all the elements to the right
to get them back down
into the [inaudible].
But, if you're only doing a
search, so if you're only talking
about includes, that's just a search.
And the same for the lower bound
and the same for upper bound.
So your search methods
can become logarithmic
by just changing our underlying
platform from a list to a vector.
It's a little more complicated to
implement that, but it's worth it
because you've got a very
efficient implementation of the set
or associative container [inaudible].
Now, a way to
do this is to talk
about a multimodal ordered vector.
So we'll follow the same
model we did with list.
We have an underlying vector,
an underlying predicate
and a protected [inaudible].
We're going to have a push front
and a push back that are private.
And they have
certain special properties.
We're going to have leapfrog insert
and leapfrog remove at an iterator
and leapfrog remove of an entire
range given by two iterators.
And by the way, these are more or
less in your textbook in these forms.
So you can read about a
different, a slightly different way
of looking at it in the textbook.
Now, the public part is the same
as it was for the list [inaudible].
So you've got your standard
insert and your standard remove
and your standard clear plus the
insert with an iterator and remove
at a single location, then upper
bound, lower bound and includes.
And so these are the three items
that we're going to be able
to make logarithmic runtime
for a vector, whereas
for a list they were linear runtime.
So, the ordered vector iterator
looks just like it did for the list.
You have an underlying --
well, we're going to actually,
instead of building this on
an actual vector we're going
to build it on a [inaudible].
But anyway, so we've got
this pointer of type T.
And that's going to be our iterator.
So, even though this is a vector,
by the way, it's based on a vector.
The API is for a set, not a vector.
So we still have only bidirectional
iterators and they're const.
So we don't have a bracket
operator for the client programmer
to use for this ordered set.
So the MO one is just like when
we would base it on a list.
We [inaudible] on to
the multimodal line,
all we have to do is override
the two different ways to insert,
insert by [inaudible] and insert
a value of an [inaudible].
So, the problem is you have
these helper functions.
And I'm not going to go
through this code line by line,
but I'm giving you the idea.
Push front is going to set
an iterator to the beginning
and do a leapfrog insert
at that iterator.
Push back is going to just
push back on the vect.
Those are private for good reason
because it's pretty clear that,
if you hand those over to a
client, the client can use those
to destroy the order structure
that your container maintains.
And those, remember, are both
protected methods.
The leapfrog inserter, you've
got an iterator and an element.
And first you do some checking
to see if the iterator's valid
and see if it's the end iterator.
If it is the end iterator, then we're
going to push back this element,
t. Again, notice this is protected.
We couldn't let the client
[inaudible] something like that.
We will use it judiciously and
make sure that when we make
this push back call,
it's the largest --
it's larger than anything
that's already in the set.
So, the leapfrog part comes like this.
What you've got to do is --
when you're leapfrogging things up
you start at the top index and copy
that one step up from the top,
top to top plus one, and then the next one
up to the top, and so on all the way
down to the place where you
need to wedge this new item in.
And that's what this for loop does.
It starts out at the size of the
vector and keeps going until you get
down to your location and
decrements i as we go.
And it's just copying the vector
element i minus 1 to a vector element i.
And then we can put t
in at the location.
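The leapfrog insert loop can be sketched on a std::vector as follows. Using an index for the location and the name leapfrog_insert are assumptions for this illustration. In practice std::vector::insert does the same shifting for you.

```cpp
#include <cstddef>
#include <vector>

// Wedge t into v at index loc by copying every element from loc upward
// one index higher, starting at the top and working down.
template <typename T>
void leapfrog_insert(std::vector<T>& v, std::size_t loc, const T& t)
{
  if (loc >= v.size())      // end position: nothing to shift
  {
    v.push_back(t);
    return;
  }
  T last = v.back();
  v.push_back(last);        // grow by one; the old top element now sits one higher
  for (std::size_t i = v.size() - 1; i > loc; --i)
  {
    v[i] = v[i - 1];        // copy element i-1 up to index i
  }
  v[loc] = t;               // the slot at loc is now free for t
}
```

The loop touches every element from the top down to loc, which is where the linear cost of insert comes from.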
So leapfrog remove is similar except
that the leapfrogging
goes down instead of up.
And so you have to start
at the location and copy things
down 1 [inaudible] all the way
out to the end of the vector
and then get rid of the last item.
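The downward leapfrog can be sketched the same way. The name leapfrog_remove and the index parameter are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Remove the element at index loc by copying every element above it one
// index lower, then dropping the now-duplicated last element.
template <typename T>
void leapfrog_remove(std::vector<T>& v, std::size_t loc)
{
  if (loc >= v.size()) return;   // nothing to remove at this position
  for (std::size_t i = loc; i + 1 < v.size(); ++i)
  {
    v[i] = v[i + 1];             // copy element i+1 down to index i
  }
  v.pop_back();                  // get rid of the last item
}
```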
Now, leapfrog removing an
entire range is similar.
I'll let you look through that.
So now let's look at the public methods.
So, how do we do a multimodal insert?
Okay. You're going to have an
iterator and you're going to let
that iterator's private value be
the result of calling g upper bound.
That takes vect dot begin
and vect dot end and t
and the predicate that you have stored.
So you're going to call your
generic upper bound [inaudible].
And then you're going to
leapfrog insert at that location.
So you first use the upper
bound search algorithm to find
where to insert the item to maintain
order and then you put it there.
Notice that g upper bound always finds
one place past the last thing that's
equal to t. So what you're ending
up doing is inserting t at the end
of all the other elements equal
t that are in there already.
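A sketch of the multimodal vector insert, assuming std::upper_bound plays the role of the generic upper bound algorithm and std::vector::insert does the shifting. The names Entry, ByKey and vector_mo_insert are hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<int, char> Entry;  // (key, data)

// Order entries by key only.
struct ByKey
{
  bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
};

// Binary search for the first position whose value is strictly greater
// than t, then insert t there. Equal elements stay in insertion order.
template <typename T, class P>
void vector_mo_insert(std::vector<T>& sorted, const T& t, const P& pred)
{
  typename std::vector<T>::iterator i =
      std::upper_bound(sorted.begin(), sorted.end(), t, pred);
  sorted.insert(i, t);  // shifts the tail up one index, linear in the worst case
}
```

The search is logarithmic, but the insert still has to move the tail of the vector.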
The unimodal version, again,
is a little more complicated.
You first search with lower
bound and then you have
to decide, well, did I find t?
And if you did find t, then you copy t
on top of the place where you found it
without increasing the
size of the vector.
If you didn't find it, then you do
a leapfrog insert at that position.
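The unimodal vector insert can be sketched with std::lower_bound. As before, Entry, ByKey and vector_uo_insert are hypothetical names, and returning true only when the vector grew is a choice made for this illustration.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<int, char> Entry;  // (key, data)

// Order entries by key only.
struct ByKey
{
  bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
};

// Binary search for the first position whose value is not less than t.
// If that element is equivalent to t, overwrite it; otherwise insert there.
template <typename T, class P>
bool vector_uo_insert(std::vector<T>& sorted, const T& t, const P& pred)
{
  typename std::vector<T>::iterator i =
      std::lower_bound(sorted.begin(), sorted.end(), t, pred);
  if (i != sorted.end() && !pred(t, *i))
  {
    *i = t;             // found t: update in place, size unchanged
    return false;
  }
  sorted.insert(i, t);  // not found: make room and put t in
  return true;
}
```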
So the analysis of that is
that includes and all these things
use binary search
instead of sequential search.
The insert and remove, they
also call the leapfrog method,
which is an [inaudible] method.
So, that's not so good.
But the average, of course,
average case binary search runtime is
log n. The worst case binary search time
is log n. Average leapfrog
time is n, right?
And the worst case, O of n. So
the conclusion is that insert and
remove have average case runtime theta of
n just like when we built it on a list.
Well, the good news is that the
ones that don't do any leapfrogging,
which is includes, lower
bound and upper bound,
only call the binary search algorithm.
So they have worst case
runtime logarithmic of n. So,
right here's what we got so far.
We've got average case, worst case
runtimes and we've got an O list,
implementation of set, if you like.
And an O vector implementation of set.
Insert, they're the same.
They're theta of n in the average
case, O of n in the worst case.
Remove is theta of n in the average
case and O of n in the worst case.
Includes, for the list, is
theta of n in the average case
and O of n in the worst case.
The vector is theta of
n in both cases --
I'm sorry, theta of log n in both cases.
So for your search without modifying
the set, there are three ways to do it:
includes, lower bound and upper bound.
And those things have
vastly improved run times.
They're going to have
a log n instead of n.
And once you've done your
homework 2 assignment,
you will have a definite appreciation
for the improvement log
n runtime [inaudible] n,
especially if you enter a number a
little bit too big and have to wait
for 15 minutes for your -- when you
run your algorithms [inaudible].
So, this is really good.
So we could significantly
improve the search time.
And for an ordered associative
container,
that would be the best we
will ever be able to do.
So, an ordered associative container
we will never, in this class,
be able to have a better search time
than theta of log n. The bad news is
that our insert and remove
times will remain linear.
So, this is an [inaudible] ordered
vector is an excellent choice
in real applications in situations
where you do very few inserts
and lots and lots of searches.
So it's an excellent,
small footprint, really fast way
to [inaudible] a set, either
a multiset or a uniset.
And that's the [inaudible].
And there are a lot of examples where
that's the kind of thing you need.
And one is something that I
would call a password server.
So, the password server, most of
the time, sits there and looks
up pass/fail on passwords,
which is just a search.
That would be a call
to the includes method.
And occasionally, and
very, very occasionally,
a user gets added to
a system, maybe what?
Once a day, at most normally,
on average.
And probably even less often a
user gets removed from a system.
But, many, many times
a day the users log in.
And so it's really important for the
search to be fast, but the insert
and remove not so important.
And in any case, when you insert a new
user, you've got to also modify files.
When you remove a user, you've
got to also modify files.
And so, because of that, [inaudible]
remove is really dominated
by the cost of the file
access rather than the mere O
of n you have in your leapfrog copying.
So, those are kind of irrelevant.
But for looking up passwords
it's super-fast.
And this would actually be the
technology of choice for such a system,
even though we're going to
build more elaborate versions
of this as [inaudible].