Your first priority is to plug the hole that lets dirty data into your database; the second priority is to clean the existing data.
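The simplest way to plug the hole is to wrap the input data in a CDATA section, or equivalently to escape "<" and "&" as entity references. If you use CDATA, make sure that any "]]>" in the input is handled, since otherwise it would end the section early. Note that this means users won't be able to enter HTML markup and have it interpreted as markup; instead, <li> will appear on the browser screen as the literal text "<li>". If you want to allow users to enter HTML markup, then your best bet might be to put it through the HTMLTidy program before storing it in the database.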
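As a rough illustration of the CDATA approach, here is a minimal Python sketch. The function names wrap_in_cdata and escape_text are made up for this example, and it assumes the input arrives as a plain string that will be embedded as element content in an XML document.

```python
from xml.sax.saxutils import escape


def wrap_in_cdata(text: str) -> str:
    """Wrap text in a CDATA section so that it is treated as character data."""
    # A CDATA section ends at the first "]]>", so an embedded terminator is
    # split across two adjacent sections to keep it from closing the section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def escape_text(text: str) -> str:
    """Alternative with the same effect: escape &, < and > as entity references."""
    return escape(text)
```

Either form should round-trip through an XML parser as the original string, so stored input such as "<li>" comes back as text rather than as an element.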
HTMLTidy is also the way to clean your existing data. Take each record, process it using HTMLTidy, and then write it back.
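The clean-up pass could be sketched along the following lines. Everything specific here is an assumption for illustration: a SQLite database, a hypothetical table named comments with columns id and body, and the tidy command-line program available on the PATH. The cleaning function is passed in as a parameter so that another cleaner can be substituted.

```python
import sqlite3
import subprocess
from typing import Callable


def tidy_fragment(html: str) -> str:
    """Run one HTML fragment through the tidy command-line tool."""
    result = subprocess.run(
        ["tidy", "-q", "-asxhtml", "-utf8",
         "--show-body-only", "yes", "--show-warnings", "no"],
        input=html, capture_output=True, text=True, encoding="utf-8",
    )
    # tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
    if result.returncode >= 2:
        raise ValueError("tidy could not repair the input: " + result.stderr)
    return result.stdout


def clean_records(conn: sqlite3.Connection, clean: Callable[[str], str]) -> int:
    """Clean every record and write it back; return the number of rows changed."""
    # Table and column names (comments, id, body) are hypothetical.
    rows = conn.execute("SELECT id, body FROM comments").fetchall()
    changed = 0
    for row_id, body in rows:
        cleaned = clean(body)
        if cleaned != body:
            conn.execute(
                "UPDATE comments SET body = ? WHERE id = ?", (cleaned, row_id)
            )
            changed += 1
    conn.commit()
    return changed
```

A call such as clean_records(conn, tidy_fragment) would then process the whole table. It is probably wise to try it on a copy of the data first, since tidy may rewrite markup in ways you don't expect.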
Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference