June 18th, 2003, 09:27 PM
David Cameron
Friend of Wrox
Join Date: Jun 2003
Location: Sydney, NSW, Australia.
Posts: 215

Ok then, possible solutions:

1. Maybe by IP address. Most people will post from a limited range of IP addresses. If I'm posting from work there is just one IP address; from home it is a little more difficult. Still, most people will get their dialup IP address from a limited pool. The problem with this solution is how to find that IP address range. One possibility may be to allow those people who *can* specify a few IP addresses to post by email. Also, each time a person logs into the site, you could record that IP address and offer to set it as the email IP address if it differs from the current one.

2. AFAIK headers follow a standard format, <name>: value. In email postings you could also get a number of >> in front of them. The number of headers in use can't be huge, so why not generate a regexp based on the known headers and remove them? E.g. ">{0,10}(From: |To: ).*\n". Furthermore, you could update your list of headers from the incoming emails, which will give you a list of all the headers that are being used by different email clients (and the people using them). This may not be a 100% solution, but it should be worth a try. The worst problem that I can see is the chance of someone posting some text by email that looks like a header and that text being deleted.

3. Clear any lines that are prefaced by more than 2 >, then compare the remaining lines against the earlier messages. If a line matches a line in an earlier message, delete it.

4. Can't comment, but limiting posters to those who have accounts, and not showing those email addresses on the site, should have some effect. Also, if suggestion 1 is followed this should be less of an issue. The only danger is spammers signing up for accounts. Still, I guess that is already an issue. If I cared enough I could write a Perl script to automatically create a new account, log in and post a message to all forums.

5. Don't know enough about the issue, but surely Out of Office messages generally conform to a pattern.

6. Ditto as above. Possibly for this and the previous point you could add a button to the page to flag a message as a new format that has slipped through. That way your filter can be "trained" with little work on your part.

7. Ditto as above + comments by Jeff.
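To make suggestion 1 a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the in-memory `allowed_networks` store, the function names, and the choice to remember a whole /24 block around each login address (IPv4 only). It also records the address automatically, where a real site would probably ask the user first.

```python
import ipaddress

# Hypothetical in-memory store: user name -> set of networks allowed for email posting.
allowed_networks = {}

def record_login(user, ip, prefix=24):
    """Remember the network a user logged in from (a /24 block is an assumption)."""
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    allowed_networks.setdefault(user, set()).add(network)

def may_post_by_email(user, ip):
    """Return True if the sending IP falls inside a network recorded for the user."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in allowed_networks.get(user, ()))
```

Storing a block rather than a single address is meant to cover the dialup case, where the address changes but usually stays within one pool. How wide that block should be is exactly the open question in suggestion 1.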
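For suggestion 2, something along these lines might work. It is only a sketch in Python: the starter list in `KNOWN_HEADERS` and the function names are made up for the example, and the limit of 10 quote markers is just carried over from the regexp above.

```python
import re
from email import message_from_string

# Assumed starter list; in practice it would grow from incoming mail.
KNOWN_HEADERS = ["From", "To", "Subject", "Date"]

def build_header_pattern(headers):
    """Build one regexp that matches any known header line, quoted or not."""
    names = "|".join(re.escape(name) for name in headers)
    # Up to 10 ">" markers (each optionally followed by a space), then "Name: value".
    return re.compile(rf"^(?:> ?){{0,10}}(?:{names}): .*\n?", re.MULTILINE)

def strip_headers(body, headers=KNOWN_HEADERS):
    """Remove every line of the body that looks like a known header."""
    return build_header_pattern(headers).sub("", body)

def learn_headers(raw_message, headers):
    """Add header names seen in an incoming raw email to the known list."""
    for name in message_from_string(raw_message).keys():
        if name not in headers:
            headers.append(name)
```

The false positive mentioned in suggestion 2 is easy to reproduce: a poster who starts a line with "From: my point of view" would lose that line, so the learned list probably needs a human to look over it now and then.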
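Suggestion 3 could be sketched like this, again in Python and under a few assumptions of mine: `strip_quoted` and its parameters are hypothetical names, earlier messages are available as plain strings, and only lines carrying at least one > are compared against earlier messages, so that a short unquoted reply which happens to repeat an earlier line is not deleted.

```python
import re

# Splits a line into its run of ">" quote markers and the remaining text.
QUOTE_PREFIX = re.compile(r"^((?:> ?)*)(.*)$")

def strip_quoted(body, earlier_messages, max_depth=2):
    """Drop deeply quoted lines and quoted lines already seen in earlier messages."""
    seen = {
        line.strip()
        for message in earlier_messages
        for line in message.splitlines()
        if line.strip()
    }
    kept = []
    for line in body.splitlines():
        prefix, text = QUOTE_PREFIX.match(line).groups()
        depth = prefix.count(">")
        if depth > max_depth:
            continue  # prefaced by more than max_depth ">" markers
        if depth >= 1 and text.strip() in seen:
            continue  # quoted line that matches a line in an earlier message
        kept.append(line)
    return "\n".join(kept)
```

Exact matching will miss quotes that the mail client has re-wrapped, so this would catch the easy cases only.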

I don't think that there will be a perfect solution, but there may be one that comes close to fitting the requirements.

David Cameron