Pointers, arrays, and string literals
A question recently posted on Stack Overflow highlighted a misconception about pointers and arrays that is common among programmers learning C.
The confusion comes from treating pointers, arrays, and string literals as interchangeable. A pointer holds an address in memory. It often points to an element of an array, as the parameter str does in the function strtoupper in the following code:
#include <ctype.h>
#include <stdio.h>

void strtoupper(char *str)
{
    if (str) { // null ptr check, courtesy of Michael
        while (*str != '\0') {
            // destructively modify the contents at the current pointer location
            // using the dereference operator to access the value at the current
            // pointer address.
            *str = toupper(*str);
            ++str;
        }
    }
}

int main()
{
    char my_str[] = "hello world";
    strtoupper(my_str);
    printf("%s", my_str);
    return 0;
}
Strictly speaking, my_str is an array of chars, but in most expressions it is converted to a pointer to its first element. This allows us to use address arithmetic to reach elements of the array and modify them using the dereference operator. In fact, an array index such as my_str[3] is identical to the expression *(my_str + 3).
char my_str[] = "hello world";
*my_str = toupper(*my_str);
*(my_str + 6) = toupper(*(my_str + 6));
printf("%s", my_str); // prints, "Hello World"
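The hard-coded offsets above can be generalized by walking a pointer along the array. The following is a minimal sketch, not part of the original post: capitalize_words is a hypothetical helper, it treats only the space character as a word separator, and the cast to unsigned char is an added precaution because toupper expects a value representable as unsigned char (or EOF).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercases the first letter of every
 * space-separated word, modifying the array in place. */
void capitalize_words(char *s)
{
    if (s == NULL)
        return;

    int at_word_start = 1;              /* the first char starts a word */
    for (char *p = s; *p != '\0'; ++p) {
        if (*p == ' ') {
            at_word_start = 1;          /* the next non-space char starts a word */
        } else if (at_word_start) {
            *p = (char)toupper((unsigned char)*p);
            at_word_start = 0;
        }
    }
}
```

Called on an array declared as char my_str[] = "hello world", this leaves "Hello World" in the array, the same result as the two manual assignments.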
However, if my_str is declared as a char pointer to the string literal “hello world” rather than a char array, the same operations have undefined behavior, and on most platforms the program crashes:
char *my_str = "hello world";
*my_str = toupper(*my_str); // fails
*(my_str + 6) = toupper(*(my_str + 6)); // fails
printf("%s", my_str);
Let’s explore the difference between the two declarations.
char *a = "hello world";
char b[] = "hello world";
In the compiled program, “hello world” is likely stored literally inside the executable, often in a read-only section. It is effectively an immutable, constant value. Pointing char *a to it gives the program a pointer to memory it must not modify, even though the type char * does not say so. Attempting to write through a is undefined behavior: the program may crash, or, because compilers are allowed to share storage between identical literals, other code that points to the same memory may behave erratically (read this response to the above post on Stack Overflow for an excellent explanation of this behavior).
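One way to let the compiler catch this mistake is to declare pointers to literals as const char *, and to copy the text whenever a modified version is needed. A minimal sketch, assuming a hypothetical helper named upper_copy (not from the original post) that allocates with malloc and leaves freeing to the caller:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated uppercase copy of src,
 * or NULL if src is NULL or the allocation fails. The caller must free
 * the result. The source is only read, so passing a literal is safe. */
char *upper_copy(const char *src)
{
    if (src == NULL)
        return NULL;

    size_t len = strlen(src);
    char *copy = malloc(len + 1);       /* +1 for the terminating '\0' */
    if (copy == NULL)
        return NULL;

    for (size_t i = 0; i < len; ++i)
        copy[i] = (char)toupper((unsigned char)src[i]);
    copy[len] = '\0';
    return copy;
}
```

With const char *a = "hello world", a call such as upper_copy(a) works on the copy and never touches the literal, while an accidental a[0] = 'H' is rejected at compile time.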
The declaration of char b[] instead declares a locally allocated block of memory that is then filled with the chars of “hello world”, including the terminating '\0'. b is the array itself; in most expressions it is converted to a pointer to the array's first element. The complete statement, combining the declaration and initialization, is shorthand. Dispensing with the array size (e.g., char b[] instead of char b[12]) is permitted because the compiler can work out the size from the string literal used to initialize it.
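The difference between the two declarations shows up in sizeof. The sketch below is illustrative only; the three function names are hypothetical and simply wrap the declarations so that their sizes can be compared.

```c
#include <stddef.h>
#include <string.h>

/* Size of an array initialized from the literal. */
size_t array_size_demo(void)
{
    char b[] = "hello world";       /* 11 characters plus '\0' */
    return sizeof b;                /* size of the whole array: 12 */
}

/* Size of a pointer aimed at the same literal. */
size_t pointer_size_demo(void)
{
    const char *a = "hello world";  /* points at the literal's first char */
    return sizeof a;                /* size of the pointer, not the text */
}

/* Once an array is passed to a function it has been converted to a
 * pointer, so the callee must scan for '\0' to find the length. */
size_t length_seen_by_callee(const char *s)
{
    return strlen(s);               /* 11 for "hello world" */
}
```

The array reports 12 bytes, while the pointer reports whatever a char * occupies on the platform (commonly 4 or 8), regardless of how long the string is.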
In both cases, indexing can be used to read the elements:
int i;
for (i = 0; a[i] != '\0'; ++i)
    printf("%c", toupper(a[i]));
However, only with b is the program able to modify the values in memory, since the literal's contents are copied to a mutable location on the stack by the declaration:
int i;
for (i = 0; b[i] != '\0'; ++i)
    b[i] = toupper(b[i]);
printf("%s", b);
You have a bug in strtoupper(): it will crash if NULL is passed in, i.e. strtoupper(NULL);
This is a safer version:
void strtoupper(char *str)
{
if( str )
while (*str)
{
*str = toupper(*str);
str++; // kept separate: *str++ = toupper(*str) reads and modifies str without sequencing
}
}
Thanks. I added a check for null pointers.
There’s another subtle bug in strtoupper:
char a[] = "";
strtoupper(a);
will break. The pointer is not null (so it passes the null check), but then inside the do, the first (\0) char gets toupper’ed, which is not a problem. On the while check, the pointer is advanced first and checked second, so the (only) \0 of the string isn’t seen.
A correct implementation would be as Michael wrote, checking the content first, and stepping forward last.
Also, a better (IMHO) way to handle errors is to return as early as possible, like so:
char* strtoupper(char *str)
{
if(NULL == str) {
return NULL;
}
char *start = str;
while (*str) {
*str = toupper(*str);
str++;
}
return start;
}
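(I’ve kept the assignment and the increment as separate statements. A combined form such as *str++ = toupper(*str); modifies str and reads it again with no sequencing between the two, which is undefined behavior, and the separate form is an itty bit cleaner anyway.)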
I’ve updated the function, putting the null terminator test at the beginning of the loop.
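For anyone who wants to check the edge cases raised in this thread (a NULL pointer and an empty string), here is a minimal sketch. It repeats the early-return version from the comment above so that it compiles on its own; the unsigned char cast and the checking function strtoupper_edge_cases_pass are additions for illustration, not part of the original code.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Early-return version from the thread, with a cast added as a precaution. */
char *strtoupper(char *str)
{
    if (str == NULL)
        return NULL;

    char *start = str;
    while (*str) {                      /* test the content first ... */
        *str = (char)toupper((unsigned char)*str);
        str++;                          /* ... and step forward last */
    }
    return start;
}

/* Hypothetical check: returns 1 if all three edge cases behave as expected. */
int strtoupper_edge_cases_pass(void)
{
    char empty[] = "";
    char word[] = "hello";

    return strtoupper(NULL) == NULL
        && strtoupper(empty) == empty && empty[0] == '\0'
        && strtoupper(word) == word && strcmp(word, "HELLO") == 0;
}
```

Because the loop tests *str before touching anything, the empty string is returned unchanged and the terminator is never stepped over.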
Oh wow. I know about these kinds of things generally, which makes reading/writing C easier. But I thought char * vs char[] was simply a syntax nicety. Thanks for clearing that up. Explicit pointer syntax is stupid. Do language developers ever think about these kinds of easily solved syntax problems? I think it’s disastrous that Lisp uses CAR/CDR instead of FIRST/REST. These are issues that newbies discover every time they are introduced to a language. How come engineers with PhDs can’t imagine them?
Often, it’s because a language is developed organically rather than top-down. That is especially true for Lisp, which was conceived more than half a century ago when the available alternatives were Fortran, assembly, and punched cards. CAR and CDR are much simpler than their equivalent forms in assembly. From http://www.statemaster.com/encyclopedia/Car-and-cdr:
The 704 assembler macro for cdr was
LXD JLOC,4
CLA 0,4
PDX 0,4
PXD 0,4
It can be even more insidious than that, in fact. There are really 4 possibilities, and 3 look quite similar.
char a[] = "This is a test"; // array: cannot be reassigned, contents are mutable
char *b = "this is another test"; // mutable pointer to immutable memory
char *c = malloc(256); // mutable pointer, mutable memory
const char *const d = "this is the fourth test"; // immutable pointer, immutable memory -- really, really constant
strcpy(c, "this is yet another test");
a[2] = 'I'; // compiles, works
b[2] = 'I'; // compiles, but undefined behavior (typically a crash)
c[2] = 'I'; // compiles, works
d[2] = 'I'; // will not compile
a = b; // compile error
c = a; // Legal, works (but the malloc'd block is leaked)
d = c; // compile error
This also touches on the fact that saying const char *foo does not give you a constant pointer. You can’t change the thing it points to, but the pointer itself is fair game. Strings in C are a minefield of trickiness.
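The two placements of const can be seen side by side in a small sketch. The function names reaim and write_through are hypothetical, and the commented-out lines mark what a conforming compiler should reject.

```c
/* Pointer to const char: the pointer may be re-aimed,
 * but the chars cannot be written through it. */
const char *reaim(const char *p, const char *other)
{
    p = other;          /* fine: p itself is not const */
    /* p[0] = 'X'; */   /* would not compile: *p is const */
    return p;
}

/* Const pointer to mutable char: the chars may be written,
 * but the pointer cannot be re-aimed. */
void write_through(char *const q)
{
    q[0] = 'X';         /* fine: the pointed-to memory is mutable */
    /* q = 0; */        /* would not compile: q is const */
}
```

Combining both, as in const char *const d, rules out both operations, which is the fourth case in the list above.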
By the way, it was nice meeting you at PyOhio this weekend.
You too, Ian. Have you looked at the safe c library? It contains some very well-written string functions. The code is extremely clean and readable, too.
It would be great if you elaborated on your article with pointer declarations like this:
char *string = { "This is a string" };
I was not sure how to pass a string like this by reference, or how to handle the const-correctness issue.
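C has no pass-by-reference, so the usual approach is to pass a pointer to the pointer. What follows is a hedged sketch of one way to handle a declaration like the one above: use_default_greeting is a hypothetical name, and the pointer is declared const char * so that the compiler rejects writes to the literal. The braces around the initializer are optional for a scalar such as a pointer.

```c
#include <stddef.h>

/* Hypothetical helper: re-aims the caller's pointer at a different
 * literal. The literal itself is never modified. */
void use_default_greeting(const char **ps)
{
    if (ps != NULL)
        *ps = "default greeting";
}
```

A caller would declare const char *string = { "This is a string" }; and then call use_default_greeting(&string); afterwards string points at the new literal, and the original text is untouched.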