The most current version of this document can be found at http://www.augustcouncil.com/~tgibson/tutorial/. Please contact me with any errata, comments, suggested changes, or improvements: Send email to tgibson@augustcouncil.com


Copyright 2001-2014 Todd A. Gibson. All Rights Reserved.

While this document is copyright by me with all rights reserved, permission is granted to freely distribute verbatim copies of this document provided that no modifications outside of formatting be made, and that this notice remain intact.


Pointers

The source code to all code listings is available as a tarball and as a zip file.

Using Variables

Essentially, the computer's memory is made up of bytes. Each byte has a number, an address, associated with it.

The picture below represents several bytes of a computer's memory. In the picture, addresses 924 through 940 are shown.

[Figure: Basic ribbon of memory]

Try:

C Code Listing 1
 1:#include <stdio.h>
 2:int main()
 3:{
 4:  float fl=3.14;
 5:  printf("%.2f\n", fl);
 6:  return 0;
 7:}
C++ Code Listing 1
 1:#include <iostream>
 2:int main()
 3:{
 4:  float fl=3.14;
 5:  std::cout << fl << std::endl;
 6:  return 0;
 7:}

At line (4) in the program above, the computer reserves memory for fl. In our examples, we'll assume that a float requires 4 bytes. Depending on the computer's architecture, a float may require 2, 4, 8 or some other number of bytes.

[Figure: Variable fl allocated]

When fl is used in line (5), two distinct steps occur:

  1. The program finds and grabs the address reserved for fl (in this example, 924).
  2. The contents stored at that address are retrieved.

To generalize, whenever any variable is accessed, the above two distinct steps occur to retrieve the contents of the variable.

Separating the Steps

Two operators are provided that, when used, cause these two steps to occur separately.

operator   meaning                            example
&          do only step 1 on a variable       &fl
*          do step 2 on a number (address)    *some_num

Try this code to see what prints out:

C Code Listing 2
1: #include <stdio.h>
2: int main()
3: {
4:   float fl=3.14;
5:   printf("fl's address=%lu\n", (unsigned long int) &fl);
6:   return 0;
7: }
C++ Code Listing 2
1: #include <iostream>
2: int main()
3: {
4:   float fl=3.14;
5:   std::cout << "fl's address=" << (unsigned long int) &fl << std::endl;
6:   return 0;
7: }

On line (5) of the example, the & operator is being used on fl, so only step 1 is performed on the variable:

1. The program finds and grabs the address reserved for fl...

It is fl's address that is printed to the screen. If the & operator had not been placed in front of fl, then step 2 would have occurred as well, and 3.14 would have been printed to the screen.


Keep in mind that an address is really just a simple number. In fact, we can store an address in an integer variable. Try this:

C Code Listing 3
1: #include <stdio.h>
2: int main()
3: {
4:   float fl=3.14;
5:   unsigned long int addr=(unsigned long int) &fl;
6:   printf("fl's address=%lu\n", addr);
7:   return 0;
8: }
C++ Code Listing 3
1: #include <iostream>
2: int main()
3: {
4:   float fl=3.14;
5:   unsigned long int addr=(unsigned long int) &fl;
6:   std::cout << "fl's address=" << addr << std::endl;
7:   return 0;
8: }

[Figure: The address of fl is stored in addr]

The above code shows that there is nothing magical about addresses. They are just simple numbers that can be stored in integer variables.


Now let's test the other operator, the * operator that retrieves the contents stored at an address:

C Code Listing 4
1: #include <stdio.h>
2: int main()
3: {
4:   float fl=3.14;
5:   unsigned long int addr=(unsigned long int) &fl;
6:   printf("fl's address=%lu\n", addr);
7:   printf("addr's contents=%.2f\n", * (float*) addr);
8:   return 0;
9: }
C++ Code Listing 4
1: #include <iostream>
2: int main()
3: {
4:   float fl=3.14;
5:   unsigned long int addr=(unsigned long int) &fl;
6:   std::cout << "fl's address=" << addr << std::endl;
7:   std::cout << "addr's contents=" << * (float*) addr << std::endl;
8:   return 0;
9: }

In line (7), step 2 has been performed on a number:

2. The contents stored at that address [addr] are retrieved.

OK, but why do we need & and *?

We have shown that two distinct steps occur when accessing a variable, and that we can make those steps occur separately. But why is this useful?

To see why, let's first look at how functions work in C/C++. Try this code:

C Code Listing 5
 1: #include <stdio.h>
 2: void somefunc(float fvar)
 3: {
 4:   fvar=99.9;
 5: }
 6: int main()
 7: {
 8:   float fl=3.14;
 9:   somefunc(fl);
10:   printf("%.2f\n", fl);
11:   return 0;
12: }
C++ Code Listing 5
 1: #include <iostream>
 2: void somefunc(float fvar)
 3: {
 4:   fvar=99.9;
 5: }
 6: int main()
 7: {
 8:   float fl=3.14;
 9:   somefunc(fl);
10:   std::cout << fl << std::endl;
11:   return 0;
12: }

What prints out? 3.14? 99.9? It turns out that 3.14 prints out. The general term used to describe this behavior is pass by value. When somefunc(fl) is called at line (9):

  1. Execution jumps to line (2) to run the function.
  2. fvar is created as its own variable and fl's value is copied into fvar.
    [Figure: fvar stores value passed into function]
  3. On line (4), 99.9 is assigned to fvar.
    [Figure: 99.9 is assigned to fvar]
  4. Now that the function is finished, execution resumes in main where it left off (line 10). The fl variable is unchanged, so 3.14 prints out.

We can circumvent this pass by value behavior and change values passed into functions by using the & and * operators.

C Code Listing 6
 1: #include <stdio.h>
 2: void somefunc(unsigned long int fptr)
 3: {
 4:   *(float*)fptr=99.9;
 5: }
 6:
 7: int main()
 8: {
 9:   float fl=3.14;
10:   unsigned long int addr=(unsigned long int) &fl;
11:   somefunc(addr);
12:   printf("%.2f\n", fl);
13:   return 0;
14: }
C++ Code Listing 6
 1: #include <iostream>
 2: void somefunc(unsigned long int fptr)
 3: {
 4:   *(float*)fptr=99.9;
 5: }
 6:
 7: int main()
 8: {
 9:   float fl=3.14;
10:   unsigned long int addr=(unsigned long int) &fl;
11:   somefunc(addr);
12:   std::cout << fl << std::endl;
13:   return 0;
14: }

Quite simply, the two steps that normally occur when accessing a variable are being separated to allow us to change the variable's value in a different function.

  1. The floating point variable fl is created at line (9) and given the value 3.14.
    [Figure: Variable fl allocated]
  2. The & operator is used on fl at line (10) (do only step 1, get the address). The address is stored in the integer variable addr.
    [Figure: The address of fl is stored in addr]
  3. The function somefunc is called at line (11) and fl's address is passed as an argument.
  4. The function somefunc begins at line (2); fptr is created and fl's address is copied into fptr.
    [Figure: The argument addr is copied to fptr]
  5. The * operator is used on fptr at line (4) (do step 2, get the contents stored at an address). In this example, the contents at address 924 are accessed.
  6. The contents at address 924 are assigned the value 99.9.
    [Figure: 99.9 is assigned to fl]
  7. The function finishes. Control returns to line (12).
  8. The contents of fl are printed to the screen.

Pointer Variables

Even though we have shown that an address is nothing more than a simple integer, the creators of the language were afraid we might confuse variables in our programs. We might confuse integers we intend to use for program values (e.g. variables storing ages, measurements, counters, etc.) with integers we intend to use for holding the addresses of our variables.

The language creators decided the best way to eliminate confusion was to create a different type of variable for holding addresses. A first attempt at this might have looked something like this:

1:...
2:  float fl=3.14;
3:  float Ptr addr = &fl;
4:...

On line (3), here is how to describe the addr variable:
addr is a pointer to a float
(A) addr is an integer. (B) However, it is a special integer designed to hold the address of a (C) float

In the code above, line (3) is close to what the creators of the language wanted except for one thing: using Ptr would require introducing another keyword into the language. If there is one thing that all C instructors like to brag about, it is how there are only a very small number of keywords in the language. Well, using line (3) as shown above would mean adding Ptr as another keyword to the language.

To avoid this threat to the very fabric of the universe, the creators cast about for something already being used in the language that could do double duty as Ptr shown above. What they came up with was the following:

1:...
2:  float fl=3.14;
3:  float * addr = &fl; 
4:...

Even with the * instead of Ptr, addr is described the same way:
addr is a pointer to a float
(A) addr is an integer. (B) However, it is a special integer designed to hold the address of a (C) float

These variables are described this way, regardless of the type:

addr is a pointer to a char
(A) addr is an integer. (B) However, it is a special integer designed to hold the address of a (C) char

addr is a pointer to an int
(A) addr is an integer. (B) However, it is a special integer designed to hold the address of an (C) int

This "...special integer..." way of describing these variables is a mouthful, so we shorten it and just say "addr is a float pointer" or "addr is a pointer to a float" (or char, or int, etc.).

Unfortunately, the language creators chose the * character to replace Ptr. The * character is confusing because the * character is also used to get the contents at an address ("do step 2 on a number"). These two uses of the * character have nothing to do with each other.

What is all that "syntax sugar" anyway? (Casting)

Let's take one last look at our original code that illustrates the utility of separating out steps 1 & 2.

C Code Listing 7
 1: #include <stdio.h>
 2: void somefunc(unsigned long int fptr)
 3: {
 4:   *(float*)fptr=99.9;
 5: }
 6:
 7: int main()
 8: {
 9:   float fl=3.14;
10:   unsigned long int addr=(unsigned long int) &fl;
11:   somefunc(addr);
12:   printf("%.2f\n", fl);
13:   return 0;
14: }
C++ Code Listing 7
 1: #include <iostream>
 2: void somefunc(unsigned long int fptr)
 3: {
 4:   *(float*)fptr=99.9;
 5: }
 6:
 7: int main()
 8: {
 9:   float fl=3.14;
10:   unsigned long int addr=(unsigned long int) &fl;
11:   somefunc(addr);
12:   std::cout << fl << std::endl;
13:   return 0;
14: }

Nearly all of the code samples so far contain bits of code that have gone unexplained. These bits have always appeared around those areas where we are either taking the address of a variable or getting the contents at an address (doing step 1 or step 2 on a variable).

Those bits of "syntax sugar" are there to keep the compiler from complaining. The first example of this in the above program is on line (10).

On line (10) we are taking the address of the floating point number fl ("do only step 1 on a variable"). After we get that address, we store it in addr.

Why would the compiler complain? Because when we assign the address of fl to addr, the compiler does not expect addr to be an unsigned long int. The compiler expects addr to be a float *. That is, a special integer designed to hold the address of a float. To keep the compiler from complaining, we tell the compiler to treat &fl as an unsigned long int rather than a float *.

This "syntax sugar" that causes the compiler to treat variables and expressions differently is called casting. The way a programmer describes line (10) is: "The address of fl is being cast into an unsigned long int and assigned to addr."

The other place casting occurs is on line (4). On line (4), we are getting the contents at an address ("do step 2 on a number/address"). Why would the compiler complain? Because the compiler will only get the contents at an address when it knows what type of value is stored there, for example through a float *. The address of our float is stored in fptr, which is an unsigned long int, not a float *. We tell the compiler to treat fptr as the address of a floating point number by casting it into a float *. Once we tell the compiler this, we can get the contents at the address without complaint.

Putting it all together

From the previous section, you might be left with the impression that whenever you deal with addresses and pointers, there is a lot of casting. Not so. The only reason our examples up till now have required casting is because we were storing our addresses in unsigned long int variables. The language designers want us to store addresses in the "special integer" variables, that is, the pointer variables they designed for just such a purpose.

Once we replace our unsigned long int variables with these pointer variables, none of the casting is required:

C Code Listing 8
 1: #include <stdio.h>
 2: void somefunc(float* fptr)
 3: {
 4:   *fptr=99.9;
 5: }
 6:
 7: int main()
 8: {
 9:   float fl=3.14;
10:   float* addr = &fl;
11:   somefunc(addr);
12:   printf("%.2f\n", fl);
13:   return 0;
14: }
C++ Code Listing 8
 1: #include <iostream>
 2: void somefunc(float* fptr)
 3: {
 4:   *fptr=99.9;
 5: }
 6:
 7: int main()
 8: {
 9:   float fl=3.14;
10:   float* addr = &fl;
11:   somefunc(addr);
12:   std::cout << fl << std::endl;
13:   return 0;
14: }

Revision History

2013 February 26 Correcting formatting code in printf to accommodate "long" unsigned ints
2012 September 10 Machines, they be growing! Added "long" to "unsigned int" to accommodate 64 bit g++.
2006 April 7 Corrected spelling errors
2005 December 1 Added tarball and zip file of source code
2005 March 01 Some formatting corrections made at the behest of my former mentor, and all-around upstanding citizen Jeff H. You're a wonderful human being, Jeff. Thanks!
2003 June 12 Reformatted document to be valid XHTML. Minor code corrections.
2002 June 02 Updated the C++ I/O preprocessor directives and I/O calls to conform to standard.
2001 April 30 Some minor corrections.
1999 March 19 Added C version of code. Minor corrections to text.

Miscellaneous

The graphics in this tutorial were created using the freely distributed image manipulation program The GIMP. Information on The GIMP can be found at http://www.gimp.org/