>> Alright, so today we're going to talk about C strings and dynamic
memory allocation, and it's going to be a particularly
thorny topic for you when defining classes, so
let's get right to it. If you look in the course
organizer, this is chapter 7 on C strings, I'm sorry, chapter 5 on C/C++
pointers, and what we want to talk about is pointers,
this notion of binding time, arrays and the bracket
operator, and pointer arithmetic. I think this is
somewhat of a review of a topic that you may not have fully
grasped the first time you saw it in a previous class. The
whole concept of a pointer is that we have static and dynamic values,
so let's look at some code here. Suppose we declare an int variable
n, and we all know what that means: n is a box in memory that's entitled
to contain or store an integer value. Now if I, on the next line down,
declare a pointer to type int, that would be int star followed by the
name you choose to give it, and I'm going to choose to give it n
pointer, hopefully to remind me that that's a pointer to type
int; so those are two declarations. Now, what do they really mean? Well, let's look at our picture.
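A minimal sketch of the two declarations, together with the allocation and dereference steps that the lecture turns to next, might look like this in C++. The spelling nPtr and the function name PointerBasics are assumptions made for illustration; the slide code itself is not part of the transcript.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helper: walks through the declarations read out in lecture.
// Returns the value that ends up stored in n.
int PointerBasics()
{
  int n = 0;            // a box in memory that stores an int
  int* nPtr = nullptr;  // a box in memory that stores the address of an int

  nPtr = new int;       // ask operator new for space to hold one int
  *nPtr = 3;            // dereference: store 3 in the box nPtr points to
  assert(*nPtr == 3);   // reading through the pointer gives the value back
  n = *nPtr;            // copy the pointed-to value into the ordinary variable

  std::cout << *nPtr << '\n';  // prints the int value
  std::cout << nPtr << '\n';   // prints the address held by nPtr (format varies by platform)

  delete nPtr;          // give the space back to the allocator
  return n;
}
```

Calling PointerBasics() returns 3, the value stored through the pointer; the address it prints will differ from machine to machine and run to run.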
n represents a box in memory that can store an integer value. What
n pointer does is represent a box in memory that can store
the address of an integer, so n pointer would have an address in
it, and that address points into memory, and in the memory at that address
you're allowed to store an integer. [ Silence ] So, back up to our declarations:
when I declare int star n pointer, that declares this box, the
one I'm pointing to here, but it does not declare this other box, the one it points to.
So these are ordinary variables, both of those two things, n and n pointer;
they're both ordinary declared variables with scope that's determined
by the usual rules of scoping, so if this was in a function, their
scope is that function. [ Pause ] Now how do I get that space for
n pointer to point to, as we say? You do that with a call to operator
new, like this: n pointer equals new int. new is an operator, int is a type, and
what that is asking for is an address that can be assigned to n
pointer, which will be reserved for n pointer to store an integer into. That request results in
just such a relationship between n pointer, the address it holds, and the memory
that it points to and is allowed to use. [ Pause ] So, to dereference a pointer, I'm
going to go back to the previous slide. Dereferencing a pointer means
apply the, sorry about that, apply the star operator to the
pointer, and what that does is take you from the pointer value to what it points
to. So in this picture the dereference of n pointer
would be 3; it points to 3. So let's look at some more code. int
star n pointer declares n pointer; n pointer equals new int assigns
an int address to n pointer, just like we talked about before. If I dereference n pointer, that takes me
to the storage box that I just allocated, and I can assign 3 into that box,
so I can do star n pointer equals 3. Notice that I could not
do n pointer equals 3. That wouldn't make sense,
because 3 is not an address. You can output star n
pointer and you would get 3, and you can also output n pointer
and you would get the address it holds. Not that it would be
particularly meaningful, but it would output the address
that that pointer currently owns, and it would typically output it
in hexadecimal notation, so on a 64-bit machine you would see up to 16
hexadecimal digits for that address. [ Pause ] Now the reason I waited
to talk about pointers until after I'd introduced
classes is that you can do this with user-defined types just as easily
as you can with ordinary built-in types. So if I have a type T that
I've defined, you can call new T, and here's what that would do. It will allocate memory to store a T
object, whatever T is, and it will call a T constructor; well, new T
will call the T default constructor after that memory is allocated. Now this little sequence of code,
let me read it for you and see if you can understand it. First I'm going to declare an int n,
then I'm going to declare a pointer to type BitVector. You don't know
what BitVector is, but you can assume from that code that it's a
type, and you know right off that it's a user-defined type, because
it certainly isn't a built-in part of the C++ language, and I'm
going to call that bvp pointer. I'm going to read a number into
the int variable, and then I'm going to create a BitVector
based on that number. BitVectors have a certain thing
called size, and what that's going to do is create a BitVector
object based on that size, the n that somebody entered,
allocate memory for it, and give bvp pointer the
address of that allocation. So again, bvp pointer,
just like n pointer on the previous slide, has
standard scope, block scope as I called it here, so wherever
it got declared, that's its scope. The object that bvp pointer points to
after operator new has global scope, and this takes
a little getting used to; the full significance
of it may take a couple of courses to
sink in completely. This BitVector
object that got allocated by operator new has global scope in the sense that it stays alive, and reachable
through its address from anywhere in the program, until it is explicitly deleted. So what you have here is
the distinct possibility that the pointer variable can go out
of scope, but the thing it points to lives on, because it has
global scope. That's both necessary and extremely difficult to
manage at the same time. It's necessary because you
want to be able to create objects and allow client programs to do anything
they want with those objects. On the other hand,
this allows you to create an object and lose the way to find the
object: if your initial pointer to the object goes out of scope
and you haven't used another pointer to capture that address,
you have lost the handle, so to speak, on that object forever. By the
way, that is known as a memory leak, and you can see what that means:
new memory got allocated big enough to store a BitVector in, and you lost
the handle on that BitVector object, and so it's out there tying up memory, and potentially slowing your
program down, for the rest of its run. [ Pause ] Operator delete needs to be invoked
in concert with operator new. Operator delete is how you give back
the memory that you were allocated with operator new, so
whenever you create an object this way, since you don't want it to live forever,
eventually it has to be deleted. Operator new calls a
constructor for type T; operator delete calls
the destructor for type T. We haven't mentioned destructors
yet, but we will very soon. A destructor is going to be a special
function that you define in a class that tells how
to decommission an object of that class. So, when we do delete bvp pointer,
what that's going to do is get rid of that memory that got allocated;
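The pairing of operator new with operator delete, and of constructor with destructor, can be sketched as follows. Tracker is a hypothetical stand-in class invented for this illustration (it is not the course's BitVector); all it does is count how many of its objects are currently alive. The helper names MakeTracker and CreateThenDelete are likewise assumptions.

```cpp
#include <cassert>

// Hypothetical class for illustration only: counts its live objects.
class Tracker
{
public:
  Tracker()  { ++liveCount; }   // runs when operator new creates the object
  ~Tracker() { --liveCount; }   // runs when operator delete destroys it
  static int liveCount;
};

int Tracker::liveCount = 0;

// The local pointer t goes out of scope when this function returns, but the
// object it points to stays alive. Returning the address keeps a handle on
// the object so that it can be deleted later instead of leaking.
Tracker* MakeTracker()
{
  Tracker* t = new Tracker;
  return t;
}

// Returns true if the object count goes up by one after new
// and comes back down after delete.
bool CreateThenDelete()
{
  int before = Tracker::liveCount;
  Tracker* p = MakeTracker();
  bool aliveAfterNew = (Tracker::liveCount == before + 1);
  delete p;                       // destructor runs first, then the memory is released
  bool goneAfterDelete = (Tracker::liveCount == before);
  return aliveAfterNew && goneAfterDelete;
}
```

If the pointer returned by MakeTracker() were discarded instead of deleted, the count would never come back down, which is the memory leak described earlier.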
it's first going to call the destructor for BitVector on that footprint, and
then it's going to give that memory back to the memory allocator for reuse
by some other piece of the program. Now, when that object is deleted it goes
out of scope. So it comes into scope, and the scope is global: it comes into
existence globally with operator new, and it goes out of existence
with operator delete. During its lifetime it's
accessible globally, from any code running in that space that has its address. [ Pause ] That's all I'll say about that for now. You can make arrays of objects:
operator new T with a bracket after it and an integer n inside the
bracket allocates n objects of type T in the form of an array. So let's look at code. We've got int star n pointer, and we're going
to say n pointer equals new int 20, with square brackets. That gives
us an array of 20 integers, and n pointer is its address.
You can do this with user-defined types too: you can do BitVector
star bvp pointer, and bvp pointer equals new BitVector
20, and that will be an array of 20 10-bit BitVectors, the 10 coming
from the 10 that's in the parens, as opposed to the square brackets. [ Pause ] Operator delete with
brackets can and should be invoked whenever you
use new with brackets. Now, when you call operator
new T bracket n, that's going to first allocate memory, and then
for each one of those n footprints of type T
it's going to call the T constructor, [inaudible] the T
default constructor, at each one of those footprints. So operator new
T bracket n properly initializes all n of those T objects, calling
the T constructor one at a time, once for each footprint. Operator delete with brackets first goes
through the entire array and calls the T destructor for
each one of the elements of that array, and then it gives the
memory back to the memory allocator. So it's important to use
those brackets with delete. If you just said delete, what would typically
happen is the memory would get given back, but the destructors for all
those objects would not get called; strictly speaking the behavior is undefined, and that could end up being a
serious problem in your program. [ Pause ] So let me just take a minute
because I mentioned arrays before, and you may not have ever seen
what I was doing called an array. So let's go back and talk about
what arrays are officially. We can take a type T, whatever it is, say int, and you can do T a and
then bracket n, right? That is a way to declare an array,
and it's the one you got used to in your first programming course. There's essentially another
way to declare an array, and that is dynamically: you can
have T star a equals new T bracket n, and that will declare an array and allocate it dynamically
instead of statically. The advantage of doing it dynamically
is you might not know the size you need when you write the code. When
you declare an array statically, you have to
know what size the array needs to be when you're actually writing the code
into the file, before you compile it. But it's easy to imagine instances where
you don't know how many things you need in your array until after you
get some user input, which means after the program has been compiled
and is running, and it's too late to put that parameter in statically. That's
when you would use the dynamic way of creating an array. Your static
declaration would be T star a, which declares a to be a variable
of type pointer to T, and then later on in the program you can
call operator new T bracket n, where n is something a user
might have given to you during the run of the program. Now there are some subtle differences
between these two ways of creating arrays,
and we'll discuss those over time,
but the basic mechanism is the same; the way you access elements is the same.
Either way you can do a bracket
i, and either way you can dereference. So a bracket i is the ith
object in the array, right? Star a and a bracket zero both
refer to the index-zero object in the array, which is the first one;
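The two ways of getting an array that were just compared can be sketched side by side. The function names below are assumptions made for illustration. The first function sizes its array from a value known only at run time; the second checks that star and bracket-zero agree for both a static and a dynamic array.

```cpp
#include <cassert>
#include <cstddef>

// Dynamic array: the size n is only known while the program is running.
// Returns 0*0 + 1*1 + ... + (n-1)*(n-1).
int SumOfSquaresDynamic(std::size_t n)
{
  int* a = new int[n];                  // allocate n ints at run time
  for (std::size_t i = 0; i < n; ++i)
    a[i] = static_cast<int>(i * i);     // bracket access works on the pointer

  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += a[i];

  delete [] a;                          // new [] is paired with delete []
  return sum;
}

// Static array (size fixed in the source) next to a dynamic one:
// in both cases *array and array[0] name the same first element.
bool FirstElementMatches()
{
  int s[4] = {10, 20, 30, 40};          // static declaration
  int* d = new int[4];                  // dynamic allocation
  for (int i = 0; i < 4; ++i)
    d[i] = s[i];

  bool same = (*s == s[0]) && (*d == d[0]) && (*d == 10);
  delete [] d;
  return same;
}
```

For example, SumOfSquaresDynamic(4) adds 0, 1, 4 and 9 and returns 14.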
a is the address of the first element of the array in both cases. Now let's talk about the
array bracket operator. Remember, this a is the
address of the block of memory that was allocated, either
statically or dynamically. So how do you get the address of
the ith element in that array? Well, you need to know what the offset
is, and the offset depends on the size of these T objects.
Let's say it takes 4 bytes to house a T object; then 4 times i is
how far you have to go up in memory to get to where the ith object is stored.
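The offset calculation can be written out by hand as a rough sketch. AddressOfElement is a hypothetical helper, not anything from the course library; it treats the block as raw bytes and moves up i times sizeof(T) bytes from the base address, which is the arithmetic the bracket operator is described as doing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: compute the address of element i "by hand".
template <typename T>
const T* AddressOfElement(const T* a, std::size_t i)
{
  const char* base = reinterpret_cast<const char*>(a);  // view the block as bytes
  const char* target = base + i * sizeof(T);            // byte offset of element i
  return reinterpret_cast<const T*>(target);
}

// Checks the hand calculation against the bracket operator and against
// ordinary typed pointer arithmetic.
bool OffsetMatchesBracket()
{
  double a[5] = {0.5, 1.5, 2.5, 3.5, 4.5};
  return AddressOfElement(a, 3) == &a[3]
      && AddressOfElement(a, 3) == a + 3
      && *AddressOfElement(a, 3) == 3.5;
}
```

In ordinary code there is no need for the casts: writing a + 3 or a[3] makes the compiler apply the sizeof(T) scaling automatically, because the pointer knows its type.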
And so you can literally calculate where the ith element of that array is
by that arithmetic process, and in fact when you invoke the bracket operator
that's exactly what is happening: you're doing that arithmetic under the hood,
so to speak, to come up with the address of an element. Notice that the
pointer has to know its type in order to do this, because you've got to
have the sizeof going in there. That's why, when you declare
pointers, you're giving them a type, so they know what type they're pointing to.
And here is a question that you have to understand for all kinds of reasons,
including passing a question on the midterm: what is the difference between a and
a of zero? a is the address where a of zero is stored; a
of zero is the actual value. [ Pause ] [ Inaudible Partial Question ] >> Values to the functions when it
comes to arrays and these pointers, because sometimes I put the address,
then I'll put the address with the value, and it will not take that
statement, so it will come out with a bunch of garbage. >> Yeah, so passing arrays
into functions can be tricky. Your first real homework assignment, first real programming assignment,
is homework 2, stats, and we'll talk about that maybe later today if we have
time. You've got to do a lot of passing of arrays in and out of functions in
that assignment, and it can be tricky. Like most things, it ceases
to be tricky as soon as you completely understand
what you're doing and what you're passing, so let's go on. I want to talk now about something
called pointer arithmetic. Remember, pointers are
variables, and their values are addresses, which behave a lot
like integers. So it kind of makes sense that you could add numbers to pointers
and get other addresses, right? And in fact there are some arithmetic
operations that pointers have. First of all, you can increment
them and decrement them, so you can take a pointer
a and do plus plus a, and what do you think that would do? a is pointing to something
in memory, right? A footprint of some object. Well,
plus plus a says, okay, point to the next object in memory of that
type; minus minus a says, okay, point to the previous object in memory of
that type. So it makes perfect sense. Using that concept, it makes
sense to talk about adding an integer to a pointer. So let's
say you've got a pointer to an array, call it a;
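The pointer arithmetic operations discussed in this part of the lecture can be collected into one small sketch. It is an illustration under assumptions: the function name is invented, and the array is filled so that element k holds the value k, which means the value read through a pointer reveals which index it points to.

```cpp
#include <cassert>
#include <cstddef>

// Runs increment, decrement, pointer-plus-integer, plus-equals and pointer
// difference on a dynamic array of 20 ints. Returns true if every step
// lands on the expected index.
bool PointerArithmeticDemo()
{
  int* block = new int[20];          // keep the original address for delete []
  for (int k = 0; k < 20; ++k)
    block[k] = k;

  int* a = block;
  bool ok = true;

  ok = ok && (*(a + 10) == a[10]);   // a + 10 is the address of the index 10 element
  ok = ok && (*(a + 5) == 5);        // *(a + 5) means the same as a[5]; a is unchanged

  ++a;                               // a now points to index 1
  int* b = a--;                      // postfix: b gets index 1, then a steps back to index 0
  ++b;                               // b now points to index 2
  ok = ok && (*a == 0) && (*b == 2);

  std::ptrdiff_t diff = b - a;       // difference of indices, an integer, not a pointer
  ok = ok && (diff == 2) && (a - b == -2);

  a = b + 3;                         // b + 3 is another pointer: index 5
  a += 5;                            // plus-equals changes a itself: index 10
  ok = ok && (*a == 10);

  delete [] block;                   // delete through the address new [] returned
  return ok;
}
```

Keeping the original address in block matters: after a has been moved around, it no longer holds the address that operator new returned, and that is the address delete [] needs.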
what would a plus 10 mean? Well, that would mean, in effect, incrementing
a 10 times, so it would point to the index 9, well, actually the
index 10 element in that array. So a plus zero would point to the index
zero element, which is a bracket zero; a plus 1 would point to the a
bracket 1 element; a plus 2 points to the a bracket 2 element, and so on. So that is an alternative
way of dealing with arrays: you can use pointer arithmetic
and dereference instead of the bracket operator. I'm going
to write something on the board here, I don't know if it's going to
come out well in the video or not, but let's say I have
an int star a, and I'm going to
allocate a equals new int bracket 20. Well, a is now a dynamically allocated
array of 20 integers. So a is the address of the
entire block of allocation, which in particular is the address
of a of zero. So what is star a? Well, star a is what a points to, right? Dereference a, the pointer, and you get
what it's pointing to. Well, it's pointing to a of zero, so star a and a bracket
zero basically mean the same thing. I should write that with three
bars; it means like that. If I did a plus 5, that's the address
of the index 5 element of the array, so that means if I dereference a plus 5, that would mean the same
thing as a bracket 5. [ Pause ] Now, you can notice that this
just takes a and adds 5 to it, but it doesn't replace a with a plus 5. There is a plus-equals operator
defined, because you can add, so you could do a plus equals 5,
and that would have the effect of changing a
itself to point to the index 5 element. [ Pause ] Okay, here are some more
pointer arithmetic examples. So I'm declaring a to be an array of
type T [inaudible] with 20 elements in it. I'm also declaring b
to be a pointer to type T, but I'm not calling operator new
for b. So I can do plus plus a; that
means a now points to the
index 1 element of that array, right? No longer the index zero
element, but the index 1 element. I can do b equals a minus minus;
notice that that minus minus is on the right-hand side of a. Remember,
we have prefix and postfix operators for increment and decrement, so the
effect of this is b will now point to the index 1 element, and then a
gets decremented, so a now points to the index zero element.
So at this stage right here. [ Pause ] At that stage right there, a is
pointing to the index zero element and b is pointing to
the index 1 element. Now I increment b, so b would now
be pointing to the index 2 element; a is still pointing to the index zero
element. So you can subtract those two pointers, b minus a,
and that value would be 2, because the effect of that is
index 2 minus index zero, which is 2. I could also do a equals b plus 3. These are just examples of the things
you can do with pointer arithmetic. So b plus 3 is another pointer,
and it can be assigned to a; b minus a is not a pointer, it's an
integer. It's the difference between b and a in terms of the number of objects
in that array; it's the difference in the indices
of those two elements. Note that it can be positive or negative, because
one of a and b might be ahead of or behind
the other in the array. [ Pause ] Okay, here's another code example.
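A reconstruction of the example being read out here might look like the following. The slide is not part of the transcript, so the function name and the extra pointer block (kept so the array can be deleted correctly) are assumptions; instead of printing, the sketch collects the characters that would be output, in order.

```cpp
#include <cassert>
#include <string>

// Returns the characters the example outputs, in order.
std::string CharPointerDemo()
{
  char* block = new char[4];             // original address, kept for delete []
  char* a = block;
  for (unsigned int i = 0; i < 4; ++i)
    a[i] = static_cast<char>('a' + i);   // a[0]='a', a[1]='b', a[2]='c', a[3]='d'

  std::string out;

  char* x = a++;     // postfix: x points to index 0, then a moves to index 1
  out += *x;         // 'a'
  out += a[0];       // 'b', because a itself has moved

  x = ++a;           // prefix: a moves to index 2 first, then x gets that address
  assert(x == a);    // x and a are now the same pointer
  out += *x;         // 'c'
  out += a[1];       // 'd', one element past where a now points

  delete [] block;
  return out;
}
```

The returned string is "abcd": the letter a, then b, then the pair cd from the last two outputs.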
This is just something to learn to read; I'm just exercising the
possibilities here. I'm going to have an unsigned int i, and I'm going
to have a equals new char bracket 4, so a is now an array of 4 characters. For i from zero up to, but not including, 4, I'm going to let
a of i be the character 'a' plus i, so that'll end up effectively making
a of zero be the lowercase a, a of 1 lowercase b, a of 2 lowercase
c, a of 3 lowercase d. So now I can do x equals a plus plus. If I output star x, star
x will be the letter a, right? Because a pointed to the index zero
element when it was assigned to x, but I incremented it postfix, so
after the assignment a is now pointing to the index 1 element, and so when I output a bracket zero it outputs
a b, not an a, because a got incremented. Now I'm going to do x equals plus plus
a. That's prefix increment, so x and a are now the same
pointer, for one thing, and for another, a has been
incremented one more time, so a is now pointing to the original
index 2 element, and so is x. That element is
c. But if a is pointing to the original index 2 element, then a bracket 1 is
one index higher than that, so that would be the original index 3
element, and so a bracket 1 is now d. So if I output star x followed
by a bracket 1, I get cd. It's just exercising
the arithmetic here. In the narrative, I believe,
in one of these places I have a pretty
good example; it's right here, yes. I'd pull that up, but I'm not
tech person enough to do that. [ Pause ] No, can't do it. This is an illustration of how pointers can
sometimes be more convenient and more intuitive than indices. What I have here is an
array of characters; it's got 10 characters
in it, a through j, right? Right here, this is in the narrative.
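The course narrative itself is not reproduced in this transcript, so what follows is only a guess at the shape of that example: three nested loops over a 10-character array, controlled by char pointers i, j and k rather than integer indices. The function name and the choice to collect the combinations in a vector instead of printing them are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds every three-letter combination of 'a'..'j', using pointers
// as the loop control variables.
std::vector<std::string> ThreeLetterCombinations()
{
  const char a[] = "abcdefghij";    // 10 letters plus the terminating null character

  std::vector<std::string> result;
  for (const char* i = a; i != a + 10; ++i)        // start at a, stop at a + 10
    for (const char* j = a; j != a + 10; ++j)
      for (const char* k = a; k != a + 10; ++k)
      {
        std::string word;
        word += *i;                 // dereference each pointer to get its letter
        word += *j;
        word += *k;
        result.push_back(word);
      }
  return result;
}
```

With 10 choices for each of the three positions this produces 1000 combinations, from "aaa" through "jjj".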
a is an array with 10 characters in it, a through j. And what I'm going to
have is 3 pointers to type char: i, j and k. Now I'm going
to have a loop; I'm just illustrating how you can control
a loop with pointers instead of indices. So I can do for i equals a, that's
initializing the pointer i to a, terminating when i becomes a plus 10,
and incrementing i at each stage, and then I can do the same thing for j
and the same thing for k. Then I'm going to output what all three of those point to, and,
you know, it'll output a lot of letters, but basically that will output every
possible three-letter combination using a through j. That's
not all that interesting; what's interesting is how you can
use the pointers in place of indices to point to different places in
that original array. [ Pause ] Are there any questions
about that so far? [ Pause ] Okay, I'm going to go ahead and talk
about binding times, just briefly. When you declare a variable, we
call that compile-time binding. [ Pause ] The binding is of the variable to some
place in memory that holds an integer, so the declaration is
the compile-time binding. It makes c represent the
address of an integer. Here you're also giving it a value
at compile time, namely 15. So here again is a compile-time
binding, of n to a memory location, and then during the run of the
program you might give n a value. That would be runtime valuation of n,
so you have compile-time binding of n
to a memory address and runtime
giving n a value, and this, of course, is another way of
giving n a value at runtime. Okay, so now I can do
int star n pointer, and again that's an ordinary declaration,
so it's a compile-time binding between n pointer and a place
to store an integer address. So now I can do n pointer equals new
int. That happens during the run of the program, so this is a runtime
binding between what n pointer points to in memory and the value of n pointer.
That's called a runtime binding, which is very different
from a compile-time binding. This one happens when you
run the compiler, right, and create executable code; this one happens when the program is actually
executing, when you're in runtime. And of course now we can do a runtime
valuation of that thing as well. Unbinding occurs
with operator delete, at runtime as well: because you bound it to an
address at runtime, you've got to break that binding while the program is
still running. When something is bound at compile time you don't
have to worry about that. [ Pause ] So, just to reiterate
what we said before: for static variables, ordinary declarations,
the usual scoping rules apply, but dynamic variables created with
operator new always have global scope. And here are some examples. int star
n pointer: that's a compile-time binding, a
static memory allocation
of type int star. n pointer equals new
int: that's a runtime binding, which is a dynamic memory allocation
of type int whose address is stored in n pointer. And then delete n pointer
is a runtime unbinding, if you will. So in that little snippet of code, the
usual scope rules apply to n pointer, but the variable allocated at
star n pointer has global scope, and this variable remains in scope
even after n pointer itself goes out of scope; it remains in
scope until you've deleted it. [ Pause ] So all of this comes to rest in
one particularly important place, and that is so-called C strings. C to some people stands for
the C programming language, and to other people it
is an abbreviation for
character; either way is fine. C strings, however, are the way that the C
programming language deals with strings, and strings are basically the
way we communicate, right? So they're clearly a very important
thing in computer programs and something that you need to be able to
store, write, read, etcetera. So you need to know how C strings work. You will need to know
this almost immediately to get your homework assignments done:
in homework assignment 3 you will have to do dynamic allocation and deallocation
of C strings, and project 1 will have a
substantial number of class variables that are C strings that need
to be allocated [inaudible]. Okay, so that's this course, but later on you will take a course
called operating systems, and that course will be
taught using the C language, most of which you're really learning this
semester or have already learned in a previous course.
Because operating systems are written in C, you don't have access to objects
in operating systems, and
all the strings in operating
systems are C strings. So you deal with C strings in operating systems to
the point it'll drive you to distraction, because if you think about
it, everything you type on the keyboard is first entered as a
string and then has to be interpreted. Whenever you give a
command, it's a string, so there are strings flying
around everywhere in operating systems all the time, and you need to be able to program with C
strings because of that. So anyway. C strings are basically strings of characters; functionally speaking,
they're strings of char. I'm sorry, not strings; they are arrays of
characters, but, and there's a big but, the assumption is that whenever
you have an array of characters, and this is specifically characters, not
integers or any other kind of object, you
have one extra place in that array, and in that extra last place you store
what is called the null character. The null character is represented
in C++ as backslash zero, and of course, like all characters,
you have to put it in single quotes if you mention it in a program. That character has got to be at the
end of the characters in that array in order to form a correct C string, and
it plays a very important role in the way you compute with C strings. So, look at this code here: char str1
bracket 11. That's a statically declared array of type char. If I access the index 10 element of that array, that's the last
element allocated, right? The index 10 element is the 11th element,
and so at that very last place I'm going to put in this null character;
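The C-string code that this part of the lecture reads through, the static array here and the dynamic one that comes up shortly, can be sketched as below. This is a reconstruction under assumptions: the function name is invented, std::strcpy comes from the standard header cstring, and the result is returned as a std::string so it can be checked rather than sent to cout.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copies "abcdefghij" into a static array, then into a dynamic array, and
// returns what streaming both with a blank between them would show.
std::string CStringDemo()
{
  char str1[11];                     // 10 characters plus one slot for the null character
  str1[10] = '\0';                   // null character in the last slot
  std::strcpy(str1, "abcdefghij");   // destination first, source second

  char* str2 = new char[11];         // dynamic version: allocate before copying
  str2[10] = '\0';
  std::strcpy(str2, str1);           // copies the characters of str1 into str2

  std::string shown = std::string(str1) + " " + str2;
  delete [] str2;                    // new [] is paired with delete []
  return shown;
}
```

The library version of strcpy also copies the terminating null character, so setting index 10 by hand is redundant here; it is kept because the lecture does it explicitly.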
you can see that. [ Pause ] Right there, that's where I put in
the null character, in the last slot. Then I'm going to call a function,
strcpy; this is a library function. We'll talk about which library in
a minute, but it of course stands for string copy, and what it's going to
do is copy this string into this string. If you count those elements
you'll see that I have 10 characters, a through j, and what string copy does
is copy characters until it runs into the null character, and then it stops;
it copies one slot at a time. This string here is represented
in code by another character array with [inaudible] slots in it and
backslash zero in the last slot. [ Pause ] So, let's look at some more code. Suppose I had char star str2;
str2 is now a pointer to type char. Then I do str2 equals new char bracket 11, so
I've now allocated 11 character slots. In index 10, the last slot, I'm going to stick a
backslash zero, the null character.
Now I'm going to call string copy of str2,
str1. Notice that this copies from right to left; in other words, like the
assignment statement, the thing on the right gets copied
to the thing on the left. So that copies all the characters
in str1 to the places in str2. And finally, the output operator
is overloaded for character strings, so when I do cout str1,
then a blank, then str2, and then a newline, what you'll get is a
through j, a blank, and a through j again. Now, hidden assumption number 1 is that these strings are all correctly
null-terminated; that's what it means to have the null character there. And hidden assumption number 2
is that memory has been allocated. Of course, in a static declaration
the memory is allocated for you; with dynamic allocation we have to make
an [inaudible] call to new, operator new. What would happen if I'd called
this strcpy command without that? Suppose I had forgotten
to do the allocation for str2. The copy would happen anyway. The characters in str1 would be
copied to the address in str2 blindly;
strcpy has no way of knowing that
you, the programmer, forgot to allocate memory there. And
that clearly might be a bad thing,
right, because you're copying
characters into a place in memory that has not been allocated
to your running program. It could, for example, write over
some important piece of that program, and that would make your
program crash. Or it might happen to write into memory that you're not
currently using to run the program, and you'd see no effect for maybe
10 more minutes, and then you'd get to the portion
that you just overwrote, and then the program would crash. So it would have unpredictable results,
and in most cases pretty much guaranteed to be disastrous results;
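One way to make the allocation hard to forget is to bundle it with the copy. The helper below is a hypothetical sketch, not part of the standard library or the course code: it measures the source with std::strlen, allocates that length plus one slot for the null character, and only then copies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: allocate exactly enough room, then copy.
// The caller owns the result and must release it with delete [].
char* DuplicateCString(const char* src)
{
  std::size_t length = std::strlen(src);   // characters before the null character
  char* copy = new char[length + 1];       // the allocation that must not be forgotten
  std::strcpy(copy, src);                  // copy has room for length + 1 characters
  return copy;
}

// Checks that the duplicate has the same characters but its own storage.
bool DuplicateIsIndependent()
{
  const char* original = "abcdefghij";
  char* copy = DuplicateCString(original);
  bool ok = (std::strcmp(copy, original) == 0)   // same characters
         && (copy != original)                   // different storage
         && (std::strlen(copy) == 10);
  delete [] copy;
  return ok;
}
```

This still relies on the first hidden assumption: src has to be correctly null-terminated, or strlen will run past the end of the array.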
don't forget that allocation. Now, this opens up a lot of
possibilities for making mistakes, and mistakes have been made programming
in C, a lot of mistakes. It also opens up the
possibility of intentional misuse of programs, tremendously. Some
of the security breaches and exploits in Unix systems, and by the way
the same goes for Windows, which is also written largely in C, come from mishandling of character
strings at the operating system level. [ Pause ] So, this is a possible implementation
of strcpy, and it's actually close in spirit to the way it's
implemented in the library, although the library version returns the destination pointer and also copies the null character. So, the function header, which is
the prototype without the semicolon: it's got void return type; the name is
strcpy; it's got a receiving variable, which is str2 here, of type
char star, pointer to char. The variable from which you're
assigning, str1, is of type const char star; const char star means that
you don't intend to change any of the values str1 points to. The fact that there's no const here says
I probably am going to change the values that str2 points to. And so here's
the way you could write that: while str1 doesn't point to the null
character, copy what str1 points to into what str2 points to, and then
increment str2 and increment str1. Remember I showed
you how pointers can behave like loop control variables? That's what
this is doing. So str1 will increment, increment, increment, and then ultimately
it will find that null character and stop this loop. Well, if you
forgot to put the null character in there, it won't stop, right, until
it finds one somewhere in memory; this will just keep on copying. So if you forget to null-terminate
str1, you've got a copy gone wild here. The other point is str2: suppose
you forgot to allocate memory for that.
Well, then you're writing, right here,
you're copying stuff from this string into places in memory that
you don't own, again with possibly disastrous consequences.
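The copy loop described above can be written out as a short sketch. It is named StrCopySketch, an assumed name, so that it does not collide with the library's strcpy, and its parameters are called dest and src rather than str2 and str1. One line goes beyond the loop as read out in lecture: after the loop it writes the null character into the destination, so the copy is terminated even if the destination was not prepared in advance.

```cpp
#include <cassert>
#include <cstring>

// Sketch of the copy loop. Assumptions, as in the lecture: src is
// null-terminated, and dest points to memory large enough for the copy.
void StrCopySketch(char* dest, const char* src)
{
  while (*src != '\0')   // stop when src reaches the null character
  {
    *dest = *src;        // copy one character
    ++dest;              // advance both pointers one slot
    ++src;
  }
  *dest = '\0';          // addition to the loop as described: terminate the copy
}

// Compares the sketch against the library function on the same input.
bool CopyMatchesLibrary()
{
  char mine[11];
  char theirs[11];
  StrCopySketch(mine, "abcdefghij");
  std::strcpy(theirs, "abcdefghij");
  return std::strcmp(mine, theirs) == 0;
}
```

Nothing in the function can check either assumption. If src has no null character the loop keeps reading and writing past both arrays, and if dest was never allocated the very first assignment writes into memory the program does not own.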
And this is literally how this was done. With the C language you
didn't really have a better option at that low level of stuff,
and this was the system that
got put in place. It works great as long as the programmers don't make
mistakes, but you can see just from this sample implementation
of string copy how the assumption that str1 is null-terminated is used
implicitly, and how the assumption that str2 has had memory allocated
to it is also clearly needed. So, one of the things we'll
talk about in a week or two is that in C++ we now have this concept
of class, and so we can build up strings as objects instead of C strings.
In C++ you can begin to work with string objects as if they're
sort of first-class entities: you can assign one to the other and
things like that, and they'll work as you would have hoped they would, whereas
down in C you couldn't really work with them that way. You can't even
assign str1 to str2; well, you can, but that assigns the address, it
doesn't copy what they point to.
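The closing remark about assignment can be illustrated with a small sketch. The names alias and buffer and the function name are assumptions for illustration. Assigning one char pointer to another copies only the address, so both names refer to the same characters, while strcpy copies the characters into separate storage.

```cpp
#include <cassert>
#include <cstring>

// Returns true if the assigned pointer sees a later change to str1
// while the strcpy'd buffer does not.
bool AssignmentSharesStorage()
{
  char str1[11] = "abcdefghij";   // 10 letters plus the null character
  char buffer[11] = "";           // separate storage for a real copy

  char* alias = str1;             // assignment copies the address only
  std::strcpy(buffer, str1);      // strcpy copies the characters

  str1[0] = 'X';                  // change the original string

  bool aliasSeesChange = (alias[0] == 'X');    // alias shares str1's storage
  bool copyUnchanged   = (buffer[0] == 'a');   // buffer kept the old characters
  return aliasSeesChange && copyUnchanged;
}
```

A string class can make assignment behave like the strcpy line instead of the alias line, which is the direction the lecture says it will take in a week or two.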