>> Okay let's talk a little bit about
Homework 2, it's going to relate to all
of these things here and back
to the algorithms chapter.
I really enjoyed putting this together, and I hope you enjoy working with it. It's been sort of coming to life over the last several weeks; I finally put it together this past weekend. There's still a draft notice up here, but it's mostly done; the stuff that's missing has to do with the nature of a report that I'm going to ask you to write. Of course, that comes way after you've done the coding, so let's just talk a little bit about it.
The subtitle is, Exploration of Algorithm Runtime Using a Trojan Horse Comparison Operator. So at the end of this I'd like for you to be able to define function class, function object, predicate class, predicate object, and generic algorithm; design and implement function and predicate class templates; design and implement generic algorithms; use function objects and predicate objects in client programs; and then comes the spy part.
We're going to measure the number
of calls made to an atomic operation
and use these measured
counts of atomic operations
to empirically corroborate known
theoretical runtimes of algorithms
and discuss the advantages and disadvantages of various known implementations of algorithms.
So those are the kinds of things I
would like you to take away from this.
So, operational objectives: you need to create a predicate class, it's going to be called less than spy and it's going to be in the file compare_spy.h, and a generic algorithm called g_lower_bound, just like the g_lower_bound we talked about in the previous chapter. It's going to be in a different namespace though, called seq; I just made that up for sequential. You're going to put it in the file gssearch.h. If you replace that first s with a b, that's the file where your generic binary search already is.
Your deliverables are three files: compare_spy.h, gssearch.h, and report.txt.
Procedurally, copy all the files out of the hw2 directory in the library as usual.
What you'll get out of that is some client programs: one is called sort_spy and one's called search_spy. sort_spy is a client of your less than spy class; search_spy is going to be a client of your less than spy class and your seq search algorithms.
[Inaudible section] I wonder if I stated
this correctly...yeah you've got a lower
bound and an upper bound too.
Okay, and then you've got a helper up here called ranuint; you just compile that and it can generate files of random numbers [inaudible]. And there's an h_sort.cpp, which is a sorter: if you have a bunch of numbers in a file and you want them sorted, you just use it with redirection, h_sort.x < file1 > file2, and [inaudible] file1 sorted into file2 [inaudible].
And of course there's submit script.
So you've got to create the files compare_spy.h and gssearch.h, test them, and make sure they work correctly. Use the supplied spy clients, and if you choose to make modifications, that's fine.
Generate some runtime data
and write a little report
about the discoveries you
make with that runtime data.
So, code requirements for less than spy: it's going to be a predicate class template, and its objects maintain a count of the number of times operator paren paren is called since the object was created or the last time reset was called. The API for less than spy looks like this: it's got your operator paren paren, which returns a bool; that makes it a predicate class. It takes t1 and t2, and it returns true if and only if t1 is less than t2. So its behavior, in that respect, is exactly the same as the less than function object.
But it has a method called reset, which sets the internal counter to zero, and a method called count, which is const, and just returns the value of the counter. The default constructor also sets the internal counter to zero, so the internal counter is initialized to zero on startup and can be reinitialized to zero by calling reset.
So this is going to be a very interesting little object to have, because where we use the less than object in a generic algorithm, you can drop this in instead and the functionality of the generic algorithm will be exactly the same. But this little Trojan horse gets dropped in and collects data on how many times that algorithm calls the less than operator. So that's why it's kind of a Trojan horse: it goes in and behaves just like a less than operator. It's a spy because it's returning information about the internal workings of that generic algorithm.
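As a concrete illustration, here is a minimal sketch of such a counting predicate class template. The class and method names follow the API described above, but they are only a guess at the assignment's actual spec:

```cpp
#include <cstddef>

// Sketch of a counting predicate class template; assumes the API
// described above (operator(), reset, count, default constructor).
template <typename T>
class LessThanSpy
{
public:
  LessThanSpy() : count_(0) {}   // internal counter starts at zero
  bool operator()(const T& t1, const T& t2)
  {
    ++count_;                    // record this call
    return t1 < t2;              // same behavior as the LessThan object
  }
  void Reset() { count_ = 0; }   // reinitialize the counter
  std::size_t Count() const { return count_; }  // calls since last reset
private:
  std::size_t count_;
};
```

Because operator() both counts and compares, dropping a spy object in where a plain less than object is expected leaves the algorithm's behavior unchanged while recording the number of comparisons made.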
[ Background noise ]
Code requirements for sequential lower bound and upper bound: they're both generic algorithms, and both operate on forward iterators, a much weaker class of iterators than random access iterators. A precondition for successful operation is that the range to which they are applied is sorted using the same predicate as is used in the algorithm call. g_lower_bound returns the lower bound of t in the range; the lower bound is defined exactly the same way as it is for the generic binary search lower bound.
g_upper_bound returns the upper bound of t. Basically, the lower bound is the iterator pointing to the first place in the range that is greater than or equal to t, and the upper bound is the first place in the range that is greater than t. So remember, if t is not in there, then equal isn't going to happen, and so lower bound and upper bound return the same thing, namely the iterator to the first place that is bigger than t. Now that could be anywhere in the range, up to and including the end of the range, because t could be bigger than everything in there, so it could return the end; but it will always return an iterator into the range or to the end of the range.
Now, we're in the namespace seq, and both of these things are going to be in the file gssearch.h. You're going to write a report about your findings with this, but we'll talk about that report later; actually, I think it will be pretty self-explanatory.
Here are some hints. All these things can be compiled with one command; you don't even need a makefile. If you want, you're welcome to make a makefile, but I just compile them with my c4530 macro. If you haven't gotten your c4530 macro up and running, you can copy it out of examples/scripts; there is a cl4530 in there as well. Put it in your .vn, change it to executable, and call rehash or log out and log back in, and it will be available for you everywhere in your environment.
Second hint: some models. You've got compare, which is obviously a pretty good model to start with for compare_spy, and you've got gbsearch, which is a good model to start with for how to create gssearch. The implementations are going to be different, but the prototypes can look quite similar. One thing you want to be sure of is that you can use compare and compare_spy in the same program, which means that your protection against multiple includes for compare_spy can't be the same as it is for compare.
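That protection is presumably an include guard, so the idea looks something like this; the macro names here are only assumptions, not the assignment's actual ones:

```cpp
// compare_spy.h : sketch of a distinct include guard. If compare.h
// guards itself with a macro such as _COMPARE_H, this header must use
// a different macro so that both headers can coexist in one program.
// The macro name below is an assumption for illustration only.
#ifndef _COMPARE_SPY_H
#define _COMPARE_SPY_H

// ... LessThanSpy and any other spy predicate classes go here ...

#endif
```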
Don't forget your best practices
particularly when you're talking
about constructors for classes.
Here's a hint about the naming: g_lower_bound and g_upper_bound are used in at least three namespaces in this course. They all have the same names and all accomplish the same thing, really, but in different ways. In namespace fsu they're implemented iteratively, using the binary search idea, just like in our lecture notes and just like in the standard template library. These require random access iterators; that's in your file gbsearch.h.
In namespace alt they're implemented recursively, as divide and conquer algorithms. They still require random access iterators; these are also in your library, implemented in rbsearch.h. And finally, in namespace seq, that's the ones you're building: they're implemented iteratively, but they use sequential search and operate with the less restrictive forward iterator assumptions. These go in your file gssearch.h.
Now, sort_spy and search_spy: that's source code you're being given. I'm still tinkering with it a little bit, so be sure to get fresh copies whenever you restart, and of course, as usual, you have the executable file [inaudible].
So the next thing I will do is log back in to linprog here.
[ Typing ]
There's a couple of sticky keys on this machine, so let's see, I can do c4530 sort_spy. I can save that [inaudible]; you guys can't use it yet, I've got to distribute gsort.h. Or take it out...so let's see, we'll just...I need to do [inaudible].
[ Background keystrokes ]
Search, so I guess both of them, so [inaudible] will work. So, sort_spy: of course, for sorting, you know, you've got a file of stuff that needs sorting and then you sort it, right? So it's going to require an input file and an output file, and that means I will need to make myself some files of data. So I just compile ranuint.x; it wants some information, so: ranuint.x, I'm going to put n.100, that will be the name of the file; the upper bound will be 1000; and the number of items will be 100. So that's going to give me numbers between 0 and 1000, probably 999, and it will give me 100 of them and put them in the file n.100. I'm going to go ahead and up the ante on all this: get me 1000 numbers, and maybe 10,000 numbers, and maybe 100,000 numbers.
So now I can run sort_spy and it will remind me of what it wants: it wants an input file and an output file. Now, because this is sort_spy, it's not actually going to write the sorted data [inaudible]; if you want to sort the data, just use g_sort. What it's going to write to the file is the measurements that it's making. So let's say...what was my first one? n.100, 100 numbers. Oh, it wants an output file too.
It basically wrote this same little [inaudible] to the output. So what we see there is that it has used your less than spy object, passed it to these generic algorithms, and deduced from that how many comparisons were called. So here's your selection_sort, which, called on 100 objects, made 5050 calls to less than. insertion_sort, 2579; heap_sort, 1034; merge_sort, 582; and list_sort, which is the implementation of merge_sort for lists, 558.
These are just calculated for your visual convenience: that's n, the number put in, and that's n log n. This is just a calculation; n log n is 664, and notice, for example, that merge_sort and list_sort come in under that. n times n plus 1 over 2, that's your order n squared. I use n(n+1)/2 instead of n squared because it just happens to be exactly how many times the comparison operator gets called by selection_sort.
No matter what. So we could up the ante here and go for 1000, and you'll see...you begin to notice, for example, that insertion_sort is still taking a whopping amount of time, but only about half as much as selection_sort.
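The convenience columns being described here, n log n and n(n+1)/2, are easy to reproduce. This is just a sketch of that arithmetic, not code from the assignment, and the helper names are made up:

```cpp
#include <cmath>

// Reference curves matching the convenience columns in sort_spy's
// output: n*log2(n), and n(n+1)/2, the exact selection_sort count.
unsigned long NLogN(unsigned long n)
{
  return static_cast<unsigned long>(n * std::log2(static_cast<double>(n)));
}

unsigned long NTimesNPlus1Over2(unsigned long n)
{
  return n * (n + 1) / 2;  // comparisons made by selection_sort
}
```

For n = 100 these give 664 and 5050, the numbers quoted above; merge_sort's 582 and list_sort's 558 both come in under the n log n curve.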
I'm using time in a little bit of an elevated sense; it's not actual clock time. But it's time in the sense that we've got an atomic operation, and we imagine the number of times the lowest-level details of the algorithm execute and think of that as sort of a measure of time. So insertion_sort on this round [inaudible] runs about twice as fast as selection_sort. But when you get down to merge_sort, merge_sort in its two different forms, it's looking real good: it's on the order of n log n instead of n squared.
So crank it up and get [inaudible], so we crank this up some more; I believe this was the 100,000 mark. This may take a little while, because, you know, whenever you multiply the input by 10, if you have an O(n squared) algorithm, if your input size goes up by a factor of 10, your runtime is going to go up by a factor of 100, which is 10 squared. So if it took, say, insertion_sort one second to sort 10,000 data, it's going to take 100 seconds to sort 100,000. If the size went up by 10, the time goes up by 100.
So this is a pretty good lesson in why you want efficient algorithms. So we're still...well, we finished the selection_sorts, so that's clearly the most time consuming. The insertion_sort takes only about half as much time, but then the rest of them come out of there in no time. So this is beginning to be a rather dramatic difference. Again, notice that merge_sort is coming in at something under n log n here; heap_sort is still in that ballpark, just by looking at the number of digits; and by looking at the number of digits, insertion_sort is about 1000 times more.
So okay, onward; let's check out search_spy. With search_spy you can enter a question mark; you have to put it in single quotes, otherwise UNIX intercepts it and thinks you're trying to do UNIX [inaudible], but if you put it in single quotes it will just regard that question mark as an actual command line argument for search_spy. If you put that in, it tells you how to operate the program, the possible command line arguments. Of course, the question mark gives you the [inaudible]. Your first command line argument may be a command file, in which case everything is processed in batch mode. Your second command line argument, maybe put [inaudible] there; the truth is it can be anything. If it has any second command line argument, it will operate in batch mode silently.
When you have very, very large amounts of data, that's what you're going to want. But let me see what we got here: we're in the uint32, I need to change this to
[ typing ]
notice how...I'm going to change types here. I'm going to uncomment [inaudible] char and comment out uint32. That's good for testing the functionality of your algorithms and for kind of display purposes. But because there are only 26 letters, you can't really do massive searches with char, so we switch to the uint to do massive amounts of data. So now we have type char.
So I'm just going to run this thing, you know, right in interactive mode here, and it's going to look similar to something we were looking at before. So we'll enter some characters; first I'll put in an assortment, and you know, it doesn't ask you to enter them in alphabetic order; it sorts them for you.
So what we've got here is a deque and a list. We're using both because, in the deque, we have random access iterators, and so we can call the binary search versions of lower bound and upper bound. But in the list, we have to call the sequential search versions, which you have to write; that's why those are both there.
So now, what do we want to look for? Let's look for e. Okay, so there's the lower bound for e in the deque and the upper bound for e in the deque. There's alt; remember, alt is the recursive implementation, that's also in your library; notice it is of course coming up with the same answers [inaudible], as you would hope it would. And finally, the lower bound and upper bound in seq, the sequential namespace; we're doing this search in a list just to prove the point that these algorithms only require forward iterators, they don't [inaudible]. But of course they come out with the same thing.
So let's look for a; they should all come up with [inaudible]. If you look for v, v is in there, so the lower bound will be the first occurrence of v, and the upper bound will be one past the last, because v is the biggest item in there. If we look for z, everything is pointing past the last element. And then we can do some more searches; I want to do a few more here.
I don't really care about looking at it too much right now, but I'm going to enter a dot here, and then we've got a little find report, and the find report tells us the minimum comps, the average number of comps, and the maximum comps for each call to the search algorithms, okay? These are the search algorithms that got called: lower bound and upper bound in the fsu namespace, lower bound and upper bound in the alt namespace; all four of those are in your library. Then lower bound and upper bound in the seq namespace; those are your deliverables.
Then it computes some interesting numbers: that's the size of the search space, this is the log of the size of the search space, and the ceiling of the log of the size of the search space; that's of course [inaudible] algorithm.
So notice that in lower bound and upper bound, the minimum number of comps is 4 and the maximum number is 5, and the average is in between those, 5 being the ceiling and 4 being the floor of the log of the size. So what that tells you, and it kind of reminds you of a point we made early on, is that those binary search implemented lower and upper bounds run to completion every time. They go all the way to the bottom of the search every time; there's no test for early [inaudible] of those things, and there's a good reason for that, and we've talked about it before.
Sequential search, of course: sometimes you find the item at the beginning, so there's only one comparison; sometimes you find it at the end, or you don't find it at all, in which case you've got to look at everybody, so there are 20 comparisons, and the average is about halfway between. So there's a much bigger range in how many comparisons you make in the sequential search implemented [inaudible] algorithm than there is in the binary search [inaudible]. So I want to just...I've pre-made some data files; I need to go back and compile this [inaudible].
Get into some big [inaudible] spaces
[typing]
[inaudible section] both
of these run, copy,
[typing]
search_spy.x2.
Search space number.x and
if we recompile it again
for the characters I'll make a
copy of that so I can do, okay 109.
Okay, so there are some...we made some command files here. It's a little more complicated than just reading a file and executing on the file, like for the sorts, where you just read the input, sort, and that's the end. But here, what we've got to do is first read in the search space and then call a bunch of individual search commands, and I've done some of that in i1.com and i2.com and i3.com. i for integer, I guess.
I'll show you what i1.com looks like. So the [inaudible] will be zero; this will be the search space, okay, so it will read all those in [inaudible]. You hit send a second time and it goes into search mode, and then it calls search for those lines; you hit it the last time and the program quits executing.
So we will do i1.com. This is slightly verbose; it's not actually showing the picture of the search, because integers don't have constant width, so there really isn't a good way to paint a picture of that action, so that was omitted. But it does give you the data here like before, and it's very similar to what we did. Now, i2 though: i2 is much bigger. You want this to be silent, you don't want to see all of that stuff, so when I said silent, it just tells me when it's loaded the search space, and then it goes through some searches, right.
So that's the size of the search space, and this is the ceiling of the log of the size, 14. The search space is 10,000. Notice that the lower bound and upper bound for both the fsu and the alt namespaces, which require random access iterators (that's the algorithm we talked about the very first week of class), go between the floor and the ceiling of the log of the size in number of comps in every case: the smallest is 13, the maximum is 14. So it kind of leans more toward the 13 [inaudible].
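The floor and ceiling bounds quoted here are easy to compute; this is a hedged helper sketch for checking the report numbers, not part of the assignment's code:

```cpp
#include <cmath>

// Floor and ceiling of log2(n): the bounds on the comparison counts
// for binary-search-based lower/upper bound on a search space of size n.
int FloorLog2(unsigned long n)
{
  return static_cast<int>(std::floor(std::log2(static_cast<double>(n))));
}

int CeilLog2(unsigned long n)
{
  return static_cast<int>(std::ceil(std::log2(static_cast<double>(n))));
}
```

For n = 10,000 these give 13 and 14, and for n = 100,000 they give 16 and 17, matching the comp counts reported for the fsu and alt versions.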
The sequential lower bound, where you're doing sequential search of 10,000 items here: the average is 5000 comps, the smallest is 1, and the largest is nearly 10,000. So notice that for a search space of size 10,000, the difference between taking log n time and taking n time is beginning to be huge.
Okay, I made up an even bigger one here, but you can put these together yourself quite easily with those random number generators. Okay, so the search space is loaded; now we're doing a bunch of searches, quite a few, and these sequential searches are going to take quite a bit of time. When we're all done, it will give us a report on that. So while we're waiting here, this is drumming into us the importance of using an efficient algorithm when you can; of course, if you need to search a list, you have to use sequential search. We are fortunate that for lists there are efficient sort algorithms, namely [inaudible], but searching is sequential.
So again, the size here: 100,000 is the size of the search space; the floor of the log is 16, the ceiling is 17. Between 16 and 17 is how many comps are required by each of the binary search computations; half of 100,000 is the average cost of the sequential searches, with a minimum of 3 and a maximum of nearly 100,000.
So I've kind of shown you how the tests [inaudible] work, and given you, I think, at least an informal flavor of the kinds of things you should be concluding when you do your experiments and put in your report. Of course, to make all this work, you've got to write less than spy, and you've got to write g_lower_bound and g_upper_bound in the seq namespace, the sequential namespace. I've talked at length about how sequential search works [inaudible], so I think you'll enjoy that experience. After that, you will enjoy playing with these programs that help you analyze algorithms. So that's the end of what I have to say about that.
[ Background noise ]