I know I've harped on EVERYONE about writing your sites to use register_globals = off, but I never addressed a more real-world (and terribly important) issue: converting an older site to use the stricter settings.
(my original FAQ is here:
http://p2p.wrox.com/archive/beginnin...2002-11/17.asp)
The problem with most register_globals = on sites is that the programmer often deliberately takes advantage of naming conflicts and variable copying order to write an easier program.
For example, if you have a form that submits some data via post, but have hyperlinks submit the same data via get, and store the user's most recent choice in the cookie, the programmer need only access $foo and assume that whichever version was submitted ($_GET['foo'], $_POST['foo'], $_COOKIE['foo']) is what's supposed to be used.
Herein lies the root of the problem -- the programmer mistakenly views his own laziness as cleverness and elegance, and a security risk is born.
When going through books and tutorials that are written using register_globals, go through the application as it's written and absorb the author's original intent. In essence, converting a site to conform with a stricter configuration is like translating a story from one language to another. You need to understand the gist of the story before you can dive in and start rewriting it; many phrases have multiple meanings and understanding context is essential to getting the translation correct.
When you're confident you understand what's going on, turn register_globals off and set error_reporting to E_ALL. Make a note of the errors you see, then go through the code and find the lines where each error occurs.
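As a rough sketch of that setup step: error reporting can be raised from a script, but register_globals itself cannot be changed at runtime, so it has to be switched off in php.ini or per directory. The check at the end is only an illustration, and on PHP versions where the directive no longer exists it simply reports nothing.

```php
<?php
// Show every notice and warning while converting the site.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// register_globals cannot be changed with ini_set(); turn it off in php.ini
// ("register_globals = Off") or, under mod_php, in .htaccess
// ("php_flag register_globals off").
if (ini_get('register_globals')) {
    echo "register_globals is still on for this script\n";
}
```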
Pretty much all of your undefined index or undefined variable warnings are caused by two things --
1) The variable is a global variable and assumed to be copied from GET, POST, COOKIE, SESSION, or ENV superglobals.
2) The variable appears in a conditional expression to test for its existence.
Let's go over each of them in detail.
1) The variable is a global variable and assumed to be copied from GET, POST, COOKIE, SESSION, or ENV superglobals.
The first case is difficult to handle because the programmer might have intended the variable to come from a variety of sources. The simplest workaround somewhat defeats the purpose of turning off register_globals -- simply copy all your request variables into global scope in the order the original programmer intended. This can be risky, though, as it reopens some of the security holes that turning register_globals = off was intended to fix.
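If you do fall back on this workaround, a slightly safer variant is to copy only the names you expect, in an order you choose. The sketch below assumes PHP 7 or later; import_request_vars() is a made-up helper name, and the cookie, GET, POST order is just an example of what the original programmer might have relied on.

```php
<?php
// Hypothetical helper: copy only the listed names from each source array.
// Sources are processed in order, so later sources overwrite earlier ones.
function import_request_vars(array $names, array $sources): array
{
    $vars = [];
    foreach ($sources as $source) {
        foreach ($names as $name) {
            if (isset($source[$name])) {
                $vars[$name] = $source[$name];
            }
        }
    }
    return $vars;
}

// Assumed order: cookie first, then GET, then POST (so POST wins).
$imported = import_request_vars(['foo'], [$_COOKIE, $_GET, $_POST]);

// Creates $foo in the current scope only if one of the sources supplied it.
extract($imported);
```

Anything not named in the first argument is never copied, which closes off the "inject an arbitrary global" hole, but the values themselves are still unvalidated user input.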
A much more difficult task is to refactor the original code to be much more explicit. If a variable can come from multiple locations, then test only for the location that makes sense in the context of the original variable.
For example, look at this simple logged-in validity check:
if ($logged_in)
You don't want to check $_GET['logged_in'] to see if a user session has their logged_in flag set to true. Also, you'll want to check that the value of $logged_in is a boolean true, not just a "not-false" value.
if (isset($_SESSION['logged_in']) && ($_SESSION['logged_in'] === true))
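To see why the strict comparison matters, here is a small sketch that wraps the same check in a function. is_logged_in() is a hypothetical name, and it takes the session array as a parameter only so it can be tried without starting a real session.

```php
<?php
// Hypothetical helper: true only when the flag exists and is exactly boolean true.
function is_logged_in(array $session): bool
{
    return isset($session['logged_in']) && $session['logged_in'] === true;
}

var_dump(is_logged_in(['logged_in' => true]));  // bool(true)
var_dump(is_logged_in(['logged_in' => 'yes'])); // bool(false): "not-false" is not enough
var_dump(is_logged_in([]));                     // bool(false): flag was never set
```

In a real page you would pass $_SESSION after calling session_start().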
Here's a trickier example -- suppose you have a form that retrieves data from a database and displays the data in a table with sortable columns. The user can submit their "sort by" column in the POST form, but can change their sort by clicking on the column headers, which are links that set the new sort column via GET. Finally, the user's last sort method is stored in their session data, so they can browse through multiple pages of results without losing the proper sort.
A lazy programmer would just use a single variable, $sort_by, to handle all three scenarios. I must admit that it seems elegant and harmless to keep things this way, but with register_globals off that variable is no longer filled in for us. This is one place where it is tempting to simply copy the "sort_by" variable from your request input.
However, we should be a little more careful to make sure that any incoming values are valid. If we generate the form input fields and header columns, then we should know in advance what the available sort_by values are. Suppose we put these in an array called $columns. We use an if/else chain to test whether or not a valid sort_by was passed in by a form or link, and reset the session value if it has.
After we test and process incoming form data, we run the query. At this time, we ONLY need to check the session for a "sort_by" to see if the user has ever selected a sort method -- it's guaranteed to be his/her latest choice.
if (isset($_POST['sort_by']) && in_array($_POST['sort_by'], $columns))
{
    $_SESSION['sort_by'] = $_POST['sort_by'];
}
else if (isset($_GET['sort_by']) && in_array($_GET['sort_by'], $columns))
{
    $_SESSION['sort_by'] = $_GET['sort_by'];
}
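The query step that follows might look something like the sketch below. It assumes PHP 7 or later; sort_column(), the column names, and the "items" table are all made up for illustration, and the session array is passed in as a parameter only to keep the example self-contained.

```php
<?php
// Hypothetical helper: use the session's sort column if it is one we know,
// otherwise fall back to a default.
function sort_column(array $session, array $columns, string $default): string
{
    if (isset($session['sort_by']) && in_array($session['sort_by'], $columns, true)) {
        return $session['sort_by'];
    }
    return $default;
}

$columns = ['name', 'created', 'price']; // assumed list of sortable columns
$sort_by = sort_column(['sort_by' => 'price'], $columns, 'name');

// Safe to concatenate only because $sort_by can only be a value from $columns.
$sql = "SELECT * FROM items ORDER BY " . $sort_by;
echo $sql, "\n";
```

Re-checking against $columns here is cheap insurance in case some other page writes to the session without validating.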
2) The variable appears in a conditional expression to test for its existence.
The second case works when NOTICE level warnings are suppressed because PHP creates a variable with a default value if it is accessed before it's initialized. This value depends on the context in which the variable is used: boolean (false), number (0), string (""), array (empty array), or just NULL.
This means that many lazy programmers use this line to test if $foo exists.
if ($foo)
One problem is that $foo can exist with a value that evaluates to false when converted to boolean context. Be as specific as you need to be when writing your conditional expressions -- many times, testing for existence isn't enough.
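if (isset($foo)) // existence only (set and not NULL)
if (isset($foo) && ($foo !== '')) // exists and not empty string
// more examples:
if (isset($foo) && !empty($foo)) // exists and not "empty"; note that empty() also treats 0 and "0" as empty
if (isset($foo) && is_array($foo) && !empty($foo)) // exists, is an array, and has at least one element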
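To get a feel for how these tests differ, a quick sketch like this one can help. describe() is a hypothetical helper that reports what isset(), a plain truth test, and empty() each say about a value.

```php
<?php
// Hypothetical helper: compare the three common tests for one value.
function describe($foo): array
{
    return [
        'isset'  => isset($foo),   // false only for NULL
        'truthy' => (bool) $foo,   // what a bare if ($foo) would see
        'empty'  => empty($foo),   // always the opposite of 'truthy' here
    ];
}

foreach ([null, 0, '0', '', [], 'abc'] as $value) {
    var_dump($value, describe($value));
}
```

Values such as 0, "0", and the empty string are set but still fail a bare if ($foo), which is exactly the case that a plain truth test gets wrong.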
Best of luck!
Take care,
Nik
http://www.bigaction.org/