Why does this code work (chp.6)?

wrockinator · December 1st, 2008, 07:38 AM

I simply don't get it. Please have a look at this function that is listed on p.302 and is supposed to kill spaces in a string:

while( (*(str + i) = *(str + j++)) != '\0') // Loop while character
    if( *(str + i) != ' ') // Increment i as
        i++;

Within the while condition the parentheses will be evaluated first, i.e.:

*(str + i) = *(str + j++)

Here we have to go first to the increment (j++). Now that means in round 1 we get:

*(str + 0) = *(str + 1)

But this assignment shouldn't work since we use two dereferenced pointers, i.e. values.

Can anyone help?

Old Pedant · December 1st, 2008, 07:23 PM

No, you misunderstand a couple of things.

(1) The expression j++ means "USE the CURRENT value of j and THEN increment it by one."

Note the BIG difference between this and ++j which says "FIRST increment and THEN use the value."

So, assuming that i and j are both initialized to zero (you don't show that, but it's a reasonable assumption), then indeed doing
    *(str + i) = *(str + j++)
is the same as doing
    *(str + 0) = *(str + 0)
(with the side effect that j will be one greater after the expression is finished), which is of course the same as
    *(str) = *(str)

(2) When a pointer is dereferenced (that is, when the * operator is used on a pointer), the meaning depends on whether an lvalue is needed/used or not. On the left side of the equals sign, we need an lvalue, and so that is saying "put a value into the location referenced by the pointer str". Then, on the right side of the equal sign, the dereference of str means "get the character pointed to by str."

And so, indeed, we copy a character from the location pointed to by str to the location pointed to by str. In other words, we don't really change anything. But the copy still occurs.
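If you want to check point (1) for yourself, here is a minimal sketch. The helper names are made up for illustration; all they do is hand back the value that each form of the increment yields.

```cpp
#include <cassert>

// Hypothetical helpers that expose the value each expression yields.
int value_of_post_increment(int& j) { return j++; }  // yields old j, then j grows
int value_of_pre_increment(int& j)  { return ++j; }  // j grows, then yields new j

void check_increment_forms()
{
    int j = 0;
    assert(value_of_post_increment(j) == 0);  // used the CURRENT value
    assert(j == 1);                           // and THEN incremented

    int k = 0;
    assert(value_of_pre_increment(k) == 1);   // incremented FIRST
    assert(k == 1);                           // same final value either way
}
```

Both variables end up at 1; the only difference is which value the surrounding expression gets to see.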

Now, the rest of that expression:
    while ( (...what we just analyzed) != '\0' )
simply says "do this until we copy a null character." Which makes perfect sense. Right?

So now, continuing on with the code:
    while (all of that)
    {
        if ( *(str+i) != ' ' ) i++;
    }
That simply says, "if the character we just copied *INTO* is NOT a space, then advance the value of i by one."

(I put the braces in for clarity, only.)

Okay, so now consider where we are at on the next time through this while loop:

The value of j will *ALWAYS* be incremented. So it is now 1. The value of i will be incremented *UNLESS* we copied a space. Let's assume we did NOT copy a space. So the value of i is ALSO 1.

So we loop back to the top. And we execute
    *(str + i) = *(str + j++)
which we now know is the same as
    *(str + 1) = *(str + 1)
(again, with the side effect that j is bumped by one AFTER the copy).

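One more thing about that pointer arithmetic. C and C++ do the scaling for you. They treat
    str+1
the same as (and here I assume that str was declared as char *)
    (char *)( ((uintptr_t)str) + 1*sizeof(char) )
That is, when you add 1 to a pointer, you are, effectively, adding ONE ELEMENT of the type pointed to by that pointer. (I use uintptr_t rather than int for the cast because an int is not guaranteed to be wide enough to hold a pointer.)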

And so we are now copying the NEXT character from itself to itself.
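The "one element, not one byte" rule for pointer addition can be seen with a small sketch like the following. The helper name is hypothetical, and it assumes p points at an array element so that p + 1 is a valid pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many bytes apart are p and p + 1?
template <typename T>
std::size_t bytes_per_step(T* p)
{
    assert(p != nullptr);
    // Viewing both addresses as char* makes the subtraction count bytes.
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(p + 1) - reinterpret_cast<char*>(p));
}
```

For a char * the step is 1 byte, which is why str + 1 is simply the next character; for an int * the same "+ 1" moves sizeof(int) bytes.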

But let's assume we copied a space this time. So we will *NOT* increment i (it stays at 1). But we know that j was already incremented (it is now 2). And so, when we loop to the top again and do
    *(str + i) = *(str + j++)
we will NOW be doing
    *(str + 1) = *(str + 2)

And so we will OVERWRITE the space with the next character.
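To watch the walk-through above happen, here is a rough sketch that runs the same loop and records which (i, j) pair each copy used. The function name and the trace vector are my own additions for illustration, and i and j are assumed to start at zero, as discussed.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical name; the loop is the one from the question.
// Returns the (write index, read index) pair used by each copy.
std::vector<std::pair<int, int>> strip_spaces_traced(char* str)
{
    assert(str != nullptr);
    std::vector<std::pair<int, int>> steps;
    int i = 0, j = 0;                       // assumed starting values
    while ((*(str + i) = *(str + j++)) != '\0')
    {
        steps.push_back({i, j - 1});        // j was already bumped, so the read index is j - 1
        if (*(str + i) != ' ')
            i++;                            // keep this character
    }
    steps.push_back({i, j - 1});            // the final copy moved the '\0'
    return steps;
}
```

With the input "a b" the recorded pairs are (0,0), (1,1), (1,2), (2,3): the third copy is exactly the *(str + 1) = *(str + 2) step, and the string ends up as "ab".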

Does this help???

Old Pedant · December 1st, 2008, 07:59 PM

This stupid forum ate my nice long answer, but I think it's important enough that I'll try again.

***************

I, personally, think that the code you show there is BAD code.

It's yet another example of C/C++ authors trying for the most compact code and, in the process, not even necessarily producing the FASTEST code. The fact that the result is hard to understand just exacerbates the situation.

Here are, I think, THREE better ways:

(1) Use a pair of pointers:

Code:

     char * src = str;   // where we read from
     char * dest = str;  // where we write to
     while ( ( *dest = *src++ ) != '\0' )
     {
          if ( *dest != ' ' ) ++dest; // or dest++ ... does not matter
     }

That will almost surely produce faster machine code than the original. Hard to imagine an architecture where it would be slower.

(2) If you want to use indices, USE THEM AS INDICES:

Code:

    while ( ( str[i] = str[j++] ) != '\0' )
    {
         if ( str[i] != ' ' ) ++i; // or i++
    }

By the very RULES of C/C++, that code SHOULD compile identically to the original version. But isn't it easier to understand?

(3) Stop trying to be tricky and instead write CLEARLY READABLE code!

Code:

    while ( char c = str[j++] )   // the loop ends when c is '\0'
    {
        if ( c != ' ' )
        {
            str[i++] = c;
        }
    }
    str[i] = '\0';   // the loop never copies the terminator, so add it here

And you want to hear something funny? A really good optimizing compiler could probably produce code from that clearer code that matches the performance of any of the other versions.

But that's not really my point. Unless this code is going to be in the critical path of some often-occurring event, the few nanoseconds difference in the performance of any of those shouldn't really be a factor. So opt for the version that helps out the poor sap who has to come along in 3 years and maintain your code. Opt for clarity.
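For anyone who wants to confirm that the approaches agree, here is a minimal sketch that wraps the pointer-pair idea and the readable idea in two functions. The function names are mine, not the book's; the pointers are named read and write to say which side is which, and the readable variant writes the terminator after its loop.

```cpp
#include <cassert>

// Hypothetical name: pointer-pair variant. 'read' visits every character,
// 'write' only advances past characters that are kept.
void strip_spaces_ptr(char* str)
{
    assert(str != nullptr);
    char* read = str;
    char* write = str;
    while ((*write = *read++) != '\0')
    {
        if (*write != ' ')
            ++write;
    }
}

// Hypothetical name: readable variant. Copies only non-space characters,
// then terminates the shortened string explicitly.
void strip_spaces_clear(char* str)
{
    assert(str != nullptr);
    int i = 0;
    int j = 0;
    while (char c = str[j++])   // stops when c is '\0'
    {
        if (c != ' ')
            str[i++] = c;
    }
    str[i] = '\0';
}
```

Feeding both the same buffer contents, say " x  y z ", should leave "xyz" in each.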

jabney · December 3rd, 2008, 03:16 AM

I went back and looked at my source for this chapter (the calculator program) and remembered having a similar struggle understanding the code.

To sort it out, I wrote three versions of loops that did the same thing. One of them looks nearly identical to Old Pedant's #2 version:

Code:

while( (str[i] = str[j++]) != '\0' )
{    
    if( str[i] != ' ' )
        i++;
}

For some reason it took me an hour of fiddling to get my head around it. I'm glad I spent the time however, because I can see pointer notation more clearly when I look at it now, having constructed some alternatives to understand what the code was doing.

In defense of the author, he doesn't always show the optimal way to construct simple code segments, preferring to show different ways of accomplishing the same thing. While this can sometimes lead to frustration, the persistent student will benefit, as he'll gain practice with alternative ways of accomplishing the same task.

I agree with OP, and prefer array notation in most circumstances -- it's clearer and easier to read (and write); however the C/C++ programmer should be comfortable with at least basic pointer arithmetic techniques. These little exercises are designed to help give that familiarity.

I settled on a for loop version, because I liked being able to initialize the iterators together for compactness:

Code:

for( int i=0, j=0; (str[i] = str[j]) != '\0'; j++ )
    if( str[i] != ' ' ) 
        i++;

I agree that tight source can create obfuscation, and so clearer code is usually desired, but this is a small function that works fine as a black box: pass it a string, and it will eat the spaces.
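As a sketch of the black-box idea, the for loop can be wrapped in a function like this. The name eat_spaces is my own placeholder, and the function assumes it is handed a writable, null-terminated buffer.

```cpp
#include <cassert>

// Hypothetical wrapper name; the body is the for-loop version above.
// Removes every ' ' from str in place.
void eat_spaces(char* str)
{
    assert(str != nullptr);
    for (int i = 0, j = 0; (str[i] = str[j]) != '\0'; j++)
        if (str[i] != ' ')
            i++;
}
```

Passing it a buffer holding "3 + 4 * 2" should leave "3+4*2", and an empty string or a string of nothing but spaces should come back empty.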

Plus, I was trying to think a little outside-the-box in my for loop usage. This was the first time I'd used compound initialization in a for loop, and an evaluation expression that wasn't "i is less than something."

Thanks for the code samples, Old Pedant.