#18
June 19th, 2003, 08:42 AM
Hal Levy
Friend of Wrox
Points: 3,489, Level: 24
Activity: 0%
Join Date: Jun 2003
Location: Central, NJ, USA.
Posts: 1,102
Thanks: 0
Thanked 2 Times in 2 Posts

quote:Originally posted by David Cameron
1. Maybe by IP address. Most people will post from a limited range of IP addresses. If I'm posting from work there is just one IP address; if from home it is a little more difficult. Still, most people will get their dialup IP address from a limited pool. The problem with this solution is how to find that IP address range. One possibility may be to allow those people who *can* specify a few IP addresses to post by email. Also, each time a person logs into the site, you could record that IP address and offer to set that as the email IP address if different from the current email IP address.
You would rather connect to the web site to validate your IP address than use PGP to sign the e-mail that you're sending? I was also thinking that perhaps we could hold any e-mail that can't be identified, attach it to the user account it claims to come from, and require a quick login to the web site to confirm the posting (or generate an e-mail to the poster to let them confirm via e-mail). Kind of self-moderation.
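
Just to make the self-moderation idea concrete, here is a rough Python sketch of the hold-and-confirm step. Everything in it (the `PendingPosts` class, its method names, the in-memory dictionary) is made up for illustration; a real version would sit on top of the forum's own user and message tables.

```python
import secrets


class PendingPosts:
    """Holds e-mailed posts until the claimed sender confirms them.

    Hypothetical helper: the names and the in-memory storage are
    assumptions, not part of any existing forum code.
    """

    def __init__(self):
        self._pending = {}  # token -> (user, message)

    def hold(self, user, message):
        """Park an unverified message and return a one-time token."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (user, message)
        return token

    def confirm(self, user, token):
        """Release the message if the token belongs to this user.

        Returns the message on success, or None if the token is unknown,
        already used, or was issued for a different account.
        """
        entry = self._pending.get(token)
        if entry is None or entry[0] != user:
            return None
        del self._pending[token]
        return entry[1]
```

The token could be shown after a web login or mailed back to the address on the account, so either confirmation route ends in the same `confirm` call.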

2. AFAIK headers follow a standard format, <name>: value. In email postings you could also get a number of >> in front of it. The number of headers used can't be huge, so why can't we generate a regexp based on the known headers and remove them? e.g. ">{0,10}(From: |To: ).*\n". Furthermore you could update your list of headers from the incoming emails, which will give you a list of all the headers that are being used by different email clients (and people using the email clients). This may not be a 100% solution, but should be worth a try. The worst problem that I can see is the chance of someone posting some text by email that looked like a header and the text being deleted.
Yes, headers should have a pretty standard format and we should be able to catch most of them with a regex.
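
As a starting point, something like the following Python sketch could build that regex from a list of known header names. The header list and the function names are assumptions for illustration only; the real list would come from the headers actually seen in incoming mail.

```python
import re

# Assumed starter list; extend it from headers observed in real messages.
KNOWN_HEADERS = ["From", "To", "Cc", "Subject", "Date"]


def build_header_pattern(headers):
    """Compile a regex matching one header line, optionally quoted with '>'."""
    names = "|".join(re.escape(h) for h in headers)
    # Up to 10 '>' quote markers (each optionally followed by a space or tab),
    # then a known header name, a colon, and the rest of the line.
    return re.compile(r"^(?:>[ \t]?){0,10}(?:%s):.*\n?" % names, re.MULTILINE)


def strip_headers(body, headers=KNOWN_HEADERS):
    """Remove every line of body that looks like a known header."""
    return build_header_pattern(headers).sub("", body)
```

Note that a line of ordinary text such as "To: be honest, ..." would be deleted as well, which is exactly the false-positive risk raised in the quoted point.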

3. Clear any lines that are prefaced by more than 2 >, and compare the earlier messages. If you get a match to a line in an earlier message, delete it.
As you said in another message, this is complex because of line breaking. We also need to deal with Outlook's love of the "indented" reply and filter that as appropriate. For that matter, we need to figure out how to filter all the HTML garbage that Outlook (and others) add to the e-mail.
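
Here is one possible way to soften the line-breaking problem, sketched in Python: collapse the whitespace in the earlier messages so that a re-wrapped quoted line still shows up as a substring. The function name and the `max_depth` parameter are assumptions, and the sketch does nothing about Outlook's indented replies or HTML.

```python
import re

# One or more '>' markers at the start of a line, each optionally
# followed by a space or tab.
QUOTE_PREFIX = re.compile(r"^((?:>[ \t]?)+)")


def _collapse(text):
    """Replace every run of whitespace (including newlines) with one space."""
    return " ".join(text.split())


def strip_quoted(reply, earlier_messages, max_depth=2):
    """Drop deeply quoted lines and quoted lines repeated from earlier posts."""
    earlier = [_collapse(m) for m in earlier_messages]
    kept = []
    for line in reply.splitlines():
        m = QUOTE_PREFIX.match(line)
        if m:
            depth = m.group(1).count(">")
            content = _collapse(line[m.end():])
            if depth > max_depth:
                continue  # quoted more than max_depth levels deep
            # Substring match against the collapsed text, so the wrap
            # position of the quoted line does not matter.
            if content and any(content in e for e in earlier):
                continue
        kept.append(line)
    return "\n".join(kept)
```

Very short quoted fragments will match almost any earlier message, so a minimum length check would probably be needed before trusting this on real traffic.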

quote:4. Can't comment, but limiting posters to those who have accounts, and not showing those email addresses on the site, should have some effect. Also if suggestion 1 is followed this should be less of an issue. The only danger is spammers signing up for accounts. Still, I guess that is already an issue. If I cared enough I could write a Perl script to automatically create a new account, log in and post a message to all forums.
I agree: #1 largely covers this unless a spammer signs up for an account. I would say the best way to prevent that is to use the system used elsewhere: create a graphic that a human has to read and type into the form to create an account.

quote:5. Don't know enough about the issue, but surely Out of Office messages generally conform to a pattern.
Honestly, I have no idea whether they do or don't. Does someone want to investigate this?

quote:6. Ditto as above. Possibly for this and above you could add a button to the page to set the message as a new format that has slipped through. That way your filter can be "trained" with little work on your part.

7. Ditto as above + comments by Jeff.
I agree that 4, 5, 6, and 7 merge together into "Filtering Unwanted Messages".
I agree that problem #4 goes away if we implement #1 well.

I'd really like to come up with a way to meet Requirement #1 that everyone is happy with.

Hal Levy
Daddyshome, LLC
NOT a Wiley/Wrox Employee