A few months ago I started a pretty ambitious project: writing a new PHP framework from scratch. One of the driving principles of the framework was that it should support "clean URLs" while using a single PHP file to process all requests. The goals of the project were:
- Only files not found in the normal file system get processed by my PHP framework
- Query strings must still work and be available
- The document cannot return a 404 error on successful requests, as this could affect search indexing and the like
After some snooping around I came to the conclusion that there were two methods of achieving this. The first was via 404 error pages and the second via the Apache mod_rewrite module. At first the 404 method seemed like the easiest, as I had no experience with mod_rewrite. So I began writing my framework using the 404 method.
In Apache it is trivial to specify a custom 404 error document. In httpd.conf, you can do something like this, either globally or within your virtual host configuration.
Code:
ErrorDocument 404 /index.php
ErrorDocument 500 /index.php
ErrorDocument 401 /index.php
ErrorDocument 400 /index.php
ErrorDocument 503 /index.php
ErrorDocument 403 /index.php
A few things became apparent immediately. Apache does not supply POST or GET data to the 404 error document. This means that in PHP the $_GET and $_POST variables aren't populated when you post to a document that doesn't exist.
You may be asking yourself: since you're creating clean URLs, why do you need query strings? I still want the option to use query strings if I need them. Certain internal functionality can still be done with query strings, while I retain the benefit of clean URIs for the majority of my static content.
I found a work-around for GET data: it can be found in $_SERVER['REQUEST_URI']. There is no work-around for the POST method, though. So for a while I was content with using query strings when required, and posting directly to the index.php page when I needed to send POST data.
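The GET work-around can be sketched roughly as follows. This is only an illustration, assuming a reasonably modern PHP (7.0 or later); the function name query_from_request_uri is made up for the example and is not part of the framework.

```php
<?php
// Hypothetical helper: rebuild the GET parameters from the original request
// URI, because $_GET is empty inside the 404 error document.
function query_from_request_uri(string $requestUri): array
{
    // parse_url() with PHP_URL_QUERY returns the text after "?", or null
    // when the URI has no query string.
    $query = parse_url($requestUri, PHP_URL_QUERY);
    $params = [];
    if (is_string($query) && $query !== '') {
        // parse_str() decodes "a=1&b=2" into an associative array.
        parse_str($query, $params);
    }
    return $params;
}

// Inside the error document this would be called as:
// $_GET = query_from_request_uri($_SERVER['REQUEST_URI']);
$params = query_from_request_uri('/articles/php?page=2&sort=date');
```

Nothing comparable exists for POST, because the request body never reaches the error document.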
Later, after building a couple of AJAX-enabled applications, I realized that there were more problems with this method. These problems seemed to center around redirecting to the 404 error document. They persisted even after explicitly setting the HTTP status header in the response headers of the PHP document as:
Code:
Status: 200 OK HTTP/1.1
My AJAX applications failed in certain browsers. Opera was especially bad, as nothing worked. Explorer was buggier than usual.
On top of these woes, another problem manifested itself. In one of my AJAX applications I included download links to files stored in my framework's virtual file system (by creating a clean-URL-driven application, I am essentially creating my own file system). These links failed in Explorer, Opera, and Mozilla 1.8 Beta! It seems that something I wasn't seeing was being sent in the HTTP response headers. Viewing the response headers didn't reveal any new information, and I was at an impasse. Clearly the 404 method wasn't going to work out. Time to revisit Apache mod_rewrite.
I read about Apache's mod_rewrite early on in the development of my framework. Essentially, it exists for the purpose of mapping a URL to another file on the server (hence its name). The mod_rewrite docs were informative, but they didn't shed much light on the approach that I wanted to take. Through a few Google searches and a few more reviews of the docs, I managed to get a working solution.
mod_rewrite must be enabled in httpd.conf. It is included in the default Apache installation on Windows (though it still must be enabled), and on Linux it must be compiled in when Apache is built.
After some tweaking, I came up with the following (which must appear after DOCUMENT_ROOT is set):
Code:
<IfModule mod_rewrite.c>
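# Turn the rewrite engine on for this server or virtual host
RewriteEngine On
# Optional debug log; level 0 disables logging, raise it while debugging
# (these two directives were removed in Apache 2.4)
RewriteLog logs/rewrite.log
RewriteLogLevel 0
# If the REQUEST_FILENAME does not exist as a file or directory
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-d
# Send the request to index.php, carrying the original query string along
RewriteRule ^(.*) /index.php?%{QUERY_STRING} [L]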
</IfModule>
RewriteEngine On activates mod_rewrite. You can also specify a log for debugging, to see which URIs fit the criteria in the conditions and which don't. Then comes the real magic: I specify two conditions that decide whether or not the subsequent rule is executed. When there are two or more conditions, as I have here, they are joined with an implicit logical '&&' or 'AND' operator. Alternatively, an explicit [OR] flag can be supplied after the first condition to change the logic to '||' or 'OR' (see the mod_rewrite docs for syntax). The rule says that if no file exists at the requested path under the document root, and no directory exists by that name either, then the request is sent to index.php, whatever its path. Therefore http://www.example.com/any/directory/anypage.html can be sent to index.php, where I can analyze the path and output accordingly.
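As a rough sketch of what "analyzing the path" in index.php might look like: after the rewrite, $_SERVER['REQUEST_URI'] still holds the URI the browser asked for, so it can be split into segments. The helper name path_segments is invented for this example, and PHP 7.0 or later is assumed.

```php
<?php
// Hypothetical helper: split the originally requested path into segments.
function path_segments(string $requestUri): array
{
    // Keep only the path part; with mod_rewrite the query string is
    // already available through $_GET.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    // Split on "/" and drop the empty pieces left by leading, trailing,
    // or doubled slashes.
    $parts = array_filter(explode('/', $path), function ($part) {
        return $part !== '';
    });
    // Decode %xx escapes per segment and reindex from zero.
    return array_values(array_map('rawurldecode', $parts));
}

// In index.php this would be path_segments($_SERVER['REQUEST_URI']).
$segments = path_segments('/any/directory/anypage.html?lang=en');
// $segments is now ['any', 'directory', 'anypage.html']
```

From there the segments can be matched against whatever the framework stores, for example as a lookup key into the virtual file system.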
This also replaces any 404 document. If the request cannot be found within the framework, an explicit
Code:
Status: 404 Not Found
can be supplied, since it is now impossible to invoke Apache's normal 404 error handling for missing files. A small tradeoff, IMO, as I can log my own 404 errors and react appropriately.
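A minimal sketch of that fallback in PHP, under some loud assumptions: the $pages array is a stand-in for the framework's real lookup, the function name resolve_page is invented, and http_response_code() (PHP 5.4 or later) is used as one way of setting the status rather than writing the Status header by hand.

```php
<?php
// Hypothetical page store standing in for the framework's real lookup.
$pages = ['/about' => 'About us', '/contact' => 'Contact'];

// Hypothetical resolver: returns the status code and body for a path.
function resolve_page(array $pages, string $path): array
{
    if (isset($pages[$path])) {
        return ['status' => 200, 'body' => $pages[$path]];
    }
    return ['status' => 404, 'body' => 'Not Found: ' . $path];
}

$result = resolve_page($pages, '/missing');

// Set the status before any output is sent.
if (!headers_sent()) {
    http_response_code($result['status']);
}
if ($result['status'] === 404) {
    // Log the miss ourselves, since Apache no longer records it as a 404
    // through its own error handling.
    error_log('404 for /missing');
}
echo $result['body'];
```

Under CGI, header('Status: 404 Not Found') sends the header shown above; http_response_code() does the same job without depending on how PHP is hooked into Apache.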
With this in place I can now make up any URL I like, and fully create my own database-driven virtual file system using PHP and MySQL.
Regards,
Rich
--
http://www.smilingsouls.net
Mail_IMAP: A PHP/C-Client/PEAR solution for webmail
Author: Beginning CSS: Cascading Style Sheets For Web Design