>> Okay, let's talk a little bit about Homework 2. It's going to relate to all of these things here, and back to the algorithms chapter. I really enjoyed putting this together, and I hope you enjoy working with it. It's been sort of coming to life over the last several weeks; I finally put it together this past weekend. There's still a draft notice up here, but it's mostly done; the stuff that's missing has to do with the nature of a report that I'm going to ask you to write. Of course that comes well after you've done the coding, so let's just talk a little bit about it. The subtitle is "Exploration of Algorithm Runtime Using a Trojan Horse Comparison Operator." At the end of this, I'd like for you to be able to: define function class, function object, predicate class, predicate object, and generic algorithm; design and implement function and predicate class templates; design and implement generic algorithms; and use function objects and predicate objects in client programs. And then comes the spy part: we're going to measure the number of calls made to an atomic operation, use those measured counts of atomic operations to empirically corroborate known theoretical runtimes of algorithms, and discuss the advantages and disadvantages of various known implementations of algorithms. So those are the kinds of things I would like you to take away from this. For the operational objectives, you need to
create a predicate class called LessThanSpy, which goes in the file compare_spy.h, and a generic algorithm called g_lower_bound, just like the g_lower_bound we talked about in the previous chapter. It's going to be in a different namespace, though, called seq; I just made that up for "sequential." You're going to put it in the file gssearch.h. If you replace that first s with a b, you get gbsearch.h, the file where your generic binary search already is. Your deliverables are three files: compare_spy.h, gssearch.h, and report.txt. Procedurally, copy all the files out of
the hw2 directory in the library, as usual. What you get out of that is some client programs: one is called sort_spy and one is called search_spy. sort_spy is a client of your LessThanSpy class; search_spy is a client of your LessThanSpy class and your seq search algorithms. [Inaudible] I wonder if I stated this correctly... yeah, you've got a lower bound and an upper bound too. Okay, and then you've got a helper up here called ranuint; you just compile that, and it can generate files of random numbers. And there's an h_sort.cpp, which is a sorter: if you have a bunch of numbers in a file and you want them sorted, you just use it with redirection, h_sort.x < file1 > file2, and file2 will be file1, sorted. And of course there's the submit script. So you've got to create the files,
compare_spy.h and gssearch.h, test them, and make sure they work correctly. Use the supplied spy clients; if you choose to make modifications, that's fine. Generate some runtime data, and write a little report about the discoveries you make with that runtime data. Now, code requirements. LessThanSpy is going to be a predicate class
template. Its object is going to maintain a count of the number of times operator() is called since the object was created, or since the last time Reset was called. The API for LessThanSpy looks like this: it's got your operator(), which returns a bool; that makes it a predicate class. It takes a t1 and a t2 and returns true if and only if t1 is less than t2, so its behavior in that respect is exactly the same as the LessThan function object. But it has a method called Reset, which sets the internal counter to zero, and a method called Count, which is const and just returns the value of your counter. The default constructor also sets the internal counter to zero, so the internal counter is initialized to zero on startup and can be reinitialized to zero by calling Reset. This is going to be a very interesting little object to have, because where we use the LessThan object in a generic algorithm, you can drop this in and the functionality of the generic algorithm will be exactly the same. But this little Trojan horse gets dropped in and collects data on how many times that algorithm calls the less-than operator. So that's why it's kind of a Trojan horse: it goes in and behaves just like a less-than operator. And it's a spy because it's returning information about the internal workings of that generic algorithm.
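From that description, a minimal sketch of such a class might look like the following. This is only an illustration of the API just described, not the course's actual compare_spy.h; the counter type and member layout here are my own assumptions.

```cpp
#include <cstddef>

// Sketch of a counting predicate class along the lines described above.
// It orders objects exactly like operator<, but counts its own calls.
template <typename T>
class LessThanSpy
{
public:
  LessThanSpy() : count_(0) {}              // counter starts at zero
  bool operator()(const T& t1, const T& t2) // the predicate itself
  {
    ++count_;                               // the "spy" part: record the call
    return t1 < t2;                         // the ordinary less-than part
  }
  void   Reset()       { count_ = 0; }      // re-zero the counter
  size_t Count() const { return count_; }   // read the counter
private:
  size_t count_;
};
```

Because operator() updates the counter, the object has state; a generic algorithm that takes the predicate by reference lets you read the count afterward.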
Now, the code requirements for sequential lower bound and upper bound. They're both generic algorithms, and both operate on forward iterators, a much weaker class of iterators than random access iterators. A precondition for successful operation is that the range to which they are applied is sorted using the same predicate as is used in the algorithm call. g_lower_bound returns the lower bound of t in the range, and the lower bound is defined exactly the same way as it is for the generic binary search lower bound: basically, it's the iterator pointing to the first place in the range that's greater than or equal to t. Upper bound is the first place in the range that is greater than t. So remember, if t is not in there, then "equal" isn't going to happen, and lower bound and upper bound return the same thing, namely the iterator to the first place that's greater than t. Now, that could be anywhere from the middle of the range to the end of the range, because t could be greater than everything in the range; so it could return end, but it will always return an iterator into the range or to the end of the range. These are in the namespace seq, and both of them go in the file gssearch.h. You're going to write a report about your findings with this, but we'll talk about that report later; actually, I think it will be pretty self-explanatory.
Here are some hints. First, all of these things can be compiled with one command; you don't even need a makefile. You're welcome to make one if you want, but I just compile them with my c4530 macro. If you haven't gotten your c4530 macro up and running, you can copy it out of examples/scripts (there's a cl4530 in there as well), put it in your script directory, change it to executable, and call rehash, or log out and log back in, and it will be available for you everywhere in your environment. Second hint: some models. You've got compare.h, which is obviously a pretty good model to start with for compare_spy.h, and you've got gbsearch.h, which is a good model for seeing how to create gssearch.h. The code in the implementations is going to be different, but the prototypes can look quite similar. One thing you want to be sure of is that you can use compare.h and compare_spy.h in the same program, which means that your protection against multiple includes for compare_spy.h can't be the same as it is for compare.h. Don't forget your best practices, particularly when you're talking about constructors for classes.
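That multiple-inclusion remark is about include guards: if compare_spy.h reused compare.h's guard symbol, whichever header was included second would be silently skipped. The guard symbols below are illustrative; use whatever convention the course files actually follow.

```cpp
// In compare.h (existing, shown for contrast):
//   #ifndef _COMPARE_H
//   #define _COMPARE_H
//   ...
//   #endif

// compare_spy.h needs a DIFFERENT symbol so both headers can coexist:
#ifndef _COMPARE_SPY_H
#define _COMPARE_SPY_H

// ... LessThanSpy definition goes here ...

#endif
```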
Here's a hint about the naming: g_lower_bound and g_upper_bound are used in at least three namespaces in this course. They all have the same name, and they all accomplish the same thing, really, but in different ways. In namespace fsu, they're implemented iteratively using the binary search idea, just as in our lecture notes and just as in the Standard Template Library; these require random access iterators, and they're in the file gbsearch.h. In namespace alt, they're implemented recursively, as divide and conquer algorithms; they still require random access iterators, and these are also in your library, implemented in rbsearch.h. And finally, in namespace seq (those are the ones you're building), they're implemented iteratively, but they use sequential search and operate under the less restrictive forward iterator assumptions. These go in your file gssearch.h. Now, sort_spy and search_spy: that's source code that
Now sort_spy and search_spy that's source code that
you're being given, I'm still tinkering with a little bit so be sure to get
fresh copies whenever you restart and of course as usual you have
the executable file [inaudible]. So the next thing I will do
is log back in linprog here. [ Typing ] There's a couple of sticky keys in this
machine, so let's see. I can do c4530 sort_spy... I can't have you use that yet; I've still got to distribute gsort.h. So let's see, we'll just... I need to do search_spy too, so both of them will compile. So, sort_spy: of course for sorting, you know, you've got a file with stuff that needs sorting, and then you sort it, right? So it's going to require an input file and an output file, and that means I will need to make myself some files of data. So I just compile ranuint.x; it wants some information, so for ranuint.x I'm going to put n.100 (that will be the name of the file), the upper bound will be 1000, and the number of items will be 100. So that's going to give me 100 numbers between 0 and 1000 (probably 999, actually), and it will put them in the file n.100. I'm going to go ahead and up the ante on all this: get me 1,000 numbers, and maybe 10,000 numbers, and maybe 100,000 numbers. So now I can run sort_spy, and it will remind me of what it wants: an input file and an output file. Now, because this is sort_spy, it's not actually going to write the sorted data; if you want to sort the data, just use g_sort. What it's going to write to the file is the measurements that it's making. So let's see... what was my first one? n.100, 100 numbers. Oh, it wants an output file too. It basically wrote the same
one, n.100, 100 numbers oh, it wants an output file too. It basically wrote this same
little [inaudible] to output, so what we see is there is that it has
passed in, it's used your comp count, it's used your spy, your
last end spy object to pass to these generic algorithms and deduced
from that how many pairs are called up. So here's your selection_sort
which called on 1000 objects, 5050
calls to less than. Insertion_sort, 2579, heap_sort
1034, merge_sort 582 and list_sort which is the implementation
of merge_sort list 558. These are just calculated for
your visual convenience, that's n, that's the main number put in and that's
n log n. This is just a calculation, n log n is 664, noticed for example
merge_sort and list_sort come in under that part, n times n plus 1 over
2 that's your n of n squared. I use n of n plus 1 over 2 instead
of n squared because it just happens to be exactly how many times comparison
operator gets called by selection_sort. No matter what, so we could up
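Those convenience columns are just arithmetic, and it's easy to reproduce them yourself. A quick sketch (the helper names here are mine; this is not part of the assignment code):

```cpp
#include <cmath>

// n(n+1)/2 is the exact comparison count of selection_sort on n items;
// n * log2(n) is the benchmark the efficient sorts are measured against.
long selection_comps(long n) { return n * (n + 1) / 2; }
double nlogn(long n)         { return n * std::log2(double(n)); }

// Note: multiplying n by 10 multiplies n(n+1)/2 by roughly 100, which
// is why the O(n^2) sorts fall so far behind as the input grows.
```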
So we could up the ante here and go for 1,000, and you'll see... you begin to notice, for example, that insertion_sort is still taking a whopping amount of time, but only about half as much as selection_sort. I'm using "time" in a little bit of an elevated sense; it's not actual clock time. But it's time in the sense that we've got an atomic operation; imagine the number of times the lowest-level details of the algorithm execute, and think of that as the measure. So insertion_sort on this round runs about twice as fast as selection_sort. But when you get down into merge_sort, in its two different forms, it's looking real good; it's on the order of n log n instead of n squared.
So we crank this up some more; I believe this was the 100,000 mark. This may take a little while, because, you know, if you have an O(n squared) algorithm and your input size goes up by a factor of 10, your runtime is going to go up by a factor of 100, which is 10 squared. So if it took one second for, say, insertion_sort to sort 10,000 data, it's going to take 100 seconds to sort 100,000: the size went up by 10, the time goes up by 100. So this is a pretty good lesson in why you want efficient algorithms. We're still going... well, we finished the selection_sort, so that's clearly the most time consuming. insertion_sort again takes about half as long, but then the rest of them just come out of there in no time. So this is beginning to be a rather dramatic difference. Again, notice that merge_sort is coming in at something under n log n here, and heap_sort is still in that ballpark; just by looking at the number of digits, insertion_sort is about 1000 times more. So okay, onward, let's
check out search_spy. With search_spy, you can enter a question mark; you have to put it in single quotes, because otherwise UNIX intercepts it, but if you put it in single quotes it will just regard that question mark as an actual command line argument for search_spy. If you put that in, it tells you how to operate the program and the possible command line arguments. Your first command line argument may be a command file, in which case everything is processed in batch mode. Your second command line argument, the truth is, can be anything; if there is any second command line argument at all, it will operate in batch mode silently. When you have very, very large amounts of data, that's what you're going to want. But let me see what we've got here: we're
in the uint32 version; I need to change this. Notice how... I'm going to change types here: I'm going to uncomment char and comment out uint32. Char is good for testing the functionality of your algorithms, and for display purposes. But because there are only 26 letters, you can't really do massive searches with char, so we switch to the unsigned ints when we want massive amounts of data. So now we have type char, and I'm just going to run this thing, you
know, right here, and it's going to look similar to something we were looking at before. So we'll enter some characters; I'll put some in, and, you know, it doesn't ask you to enter them in alphabetic order; it stores them for you. So what we've got here is a deque and a list. In the deque we have random access iterators, and so we can call the binary search versions of lower bound and upper bound; but in the list we have to call the sequential search versions, which you have to write. That's why those are both there. So now, what do we want to
look for? Let's look for e. Okay, so there's the lower bound for e in the deque, and the upper bound for e in the deque. There's alt; remember, alt is the recursive implementation, which is also in your library. Notice it is, of course, coming up with the same answers, as you would hope it would. And finally, the lower bound and upper bound in seq, the sequential namespace; we're doing this search in a list just to prove the point that these algorithms only require forward iterators. But of course they come out with the same thing. So let's look for a; they should all come up with the same place. If you look for v: v is in there, so the lower bound will be the first occurrence of v, and the upper bound will be one past the last, because v is the biggest item in there. If we look for z, everything points past the last element, to the end of the range, and then we can do some more
searches. I want to do a few more here; I don't really care about looking at them too much now. Then I'm going to enter a dot here, and then we've got a little find report, and the find report tells us the minimum comps, the average number of comps, and the maximum comps for each call to the search algorithms, okay. These are the search algorithms that got called: lower bound and upper bound in the fsu namespace, and lower bound and upper bound in the alt namespace; all four of those are in your library. Then lower bound and upper bound in the seq namespace; those are the ones you supply, your deliverables. Then it computes some interesting numbers: that's the size of the search space, this is the floor of the log of the size of the search space, and the ceiling of the log of the size of the search space, which is of
course relevant to the binary search algorithm. So notice that in the lower bound and upper bound, the minimum number of comps is 4 and the maximum number is 5, and the average is in between, 5 being the ceiling and 4 being the floor of the log of the size. So what that tells you, and it kind of reminds you of a point we made early on, is that those binary-search-implemented lower and upper bounds run to completion every time. They go all the way to the bottom of the search every time; there's no test for early termination in those things, and there's a good reason for that, which we talked about earlier.
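Those 4s and 5s are no accident: with a search space of size n, a binary search that runs to completion costs between floor(log2 n) and ceiling(log2 n) comparisons. A quick check against the sizes used in these demo runs (helper names are mine):

```cpp
#include <cmath>

// Binary-search lower/upper bound always runs to completion, so its
// comparison count lands between floor(log2 n) and ceil(log2 n).
// Demo sizes: 20 items -> 4..5 comps, 10,000 -> 13..14, 100,000 -> 16..17.
int floor_log2(int n) { return (int)std::floor(std::log2((double)n)); }
int ceil_log2 (int n) { return (int)std::ceil (std::log2((double)n)); }
```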
Sequential search, of course: sometimes you find the item at the beginning, so there's only one comparison; sometimes you find it at the end, or you don't find it at all, in which case you've got to look at everybody, so there are 20 comparisons; and the average is about halfway in between. So there's a much bigger range in how many comparisons you make with the sequential-search-implemented algorithms than there is with the binary search ones. So I want to just... I've pre-made some data files; I need to go back and compile this and get into some big search spaces. [Inaudible] Both of these will run; I'll make a copy, search_spy.x2, and if we recompile it again for the characters I'll make a copy of that too, so I can do both. Okay, so we've made some
command files here. It's a little more complicated than just reading a file and executing on the file, like for the sorts; there, you just read the data in and that's the end of it. But here, what we've got to do is first read in the search space and then call a bunch of individual search commands, and I've done some of that in i1.com, i2.com, and i3.com; i for integer, I guess. I'll show you what i1.com looks like: this first block will be the search space, so it will read all of those in; when you hit the separator a second time, it goes into search mode, and then it calls search for those lines; when you hit it the last time, the program quits executing. So we will do i1.com. This is slightly
verbose; it's not actually showing the picture of the search, because integers don't have constant width, so there really isn't a good way to paint a picture of that action, and that part is omitted. But it does give you the data, like before, and it's very similar to what we did. Now, i2, though: i2 is much bigger, and you want this one to be silent; you don't want to see all of that stuff. So when I say silent, it just tells me when it has loaded the search space, and then it goes through the searches, right. So that's the size of the search space, and this is the ceiling of
the log of the size, 14. The search space is 10,000. Notice that the lower bound and upper bound for both the fsu and the alt namespaces, which require random access iterators (that's the algorithm we talked about the very first week of class), take between the floor and the ceiling of the log of the size in comps in every case: the smallest is 13, the maximum is 14, and it kind of leans more toward 13. The sequential lower bound, where you're doing sequential search over 10,000 items: the average is 5,000 comps, the smallest is 1, and the largest is nearly 10,000. So notice that, for a search space
the size of 10,000, the difference between taking log n time and taking n time is beginning to be huge. Okay, I made up an even bigger one here, but you can put these together yourself quite easily with those random number generators. Okay, so the search space is loaded, and now we're doing a bunch of searches, quite a few, and these sequential searches are going to take quite a bit of time; when we're all done, it will give us a report on that. So while we're waiting here, this is drumming into us the importance of using an efficient algorithm when you can; of course, if you need to search a list, you have to use sequential search. We are fortunate that for lists there are efficient sort algorithms, namely merge sort, but searching is sequential. So again, the sizes here: 100,000 is the size of the search space, the floor of the log is 16, and the ceiling is 17. Between 16 and 17 is how many comps are required by each of the binary search computations; half of 100,000 is the average cost of the sequential searches, with a minimum of 3 and a maximum of nearly 100,000. So I've kind of shown you how
the tests went, and given you, I think, at least a little bit of an informal flavor of the kinds of things you should be concluding when you do your experiments and put them in your report. Of course, to make all this work, you've got to write LessThanSpy, and you've got to write g_lower_bound and g_upper_bound in the seq namespace, the sequential namespace. I've talked at length about how sequential search works, so I think you'll enjoy that experience. After that, you will enjoy playing with these programs that help you analyze algorithms. So that's the end of what I have to say about that.