Email Filtering: Killing the Killer App

by Geoff Duncan <geoff@tidbits.com>

One of the things I handle behind the scenes for TidBITS is bounce management: the tedium of figuring out which addresses should be removed from our various mailing lists due to delivery errors. We consider maintaining "clean" mailing lists part of running an email-based publication responsibly: just as we don't want to send TidBITS to people who don't want it, we don't want to waste bandwidth, effort, or time (for us or anyone else) trying to deliver TidBITS to addresses which aren't accepting it. I can't claim there are no undeliverable addresses on our mailing lists - that's an impossible goal - but we try to run a tight ship. And it's necessary work: Internet access providers regularly shut down, are acquired, and change their names; and - if our experience is any indicator - people simply abandon (or are forced to abandon) email addresses far more often than they unsubscribe from mailing lists. So we get lots of bounces.

I briefly outlined TidBITS's bounce management process in "Not Your Grampa's Mailing List" back in TidBITS-420, and although some of the details have changed, the idea remains the same. Basically, a custom tool I wrote ferrets out bouncing email addresses from the collection of bounces we receive each week, determining whether an address is eligible for removal based on the number and types of errors that come back over a particular period. Different lists have different removal criteria: it might take four to eight weeks of errors for an address to be removed from the main TidBITS list (which only sends a message once a week), while addresses would be removed from a discussion list like TidBITS Talk more quickly (although a higher number of errors would be required).

<http://db.tidbits.com/getbits.acgi?tbart=04761>
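The removal decision described above can be sketched in a few lines of Python. This is not the actual TidBITS tool, whose internals aren't described here; the `RemovalPolicy` fields, the function name, and every threshold below are assumptions chosen only to mirror the idea that a weekly list waits several weeks while a discussion list acts faster but demands more errors. The sketch also ignores error types, which the real process takes into account.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RemovalPolicy:
    """Hypothetical per-list removal criteria (illustrative values only)."""
    window_days: int    # how far back to look for bounces
    min_bounces: int    # bounces required inside that window
    min_span_days: int  # errors must persist for at least this long


def eligible_for_removal(bounce_dates, policy, today):
    """Return True if an address bounced often enough, for long enough."""
    cutoff = today - timedelta(days=policy.window_days)
    recent = sorted(d for d in bounce_dates if cutoff <= d <= today)
    if len(recent) < policy.min_bounces:
        return False
    # Require the errors to be spread over time, not a one-day outage.
    return (recent[-1] - recent[0]).days >= policy.min_span_days


# Assumed numbers: the weekly list needs about four weeks of errors; the
# discussion list decides within days but wants many more individual errors.
WEEKLY_ISSUE_LIST = RemovalPolicy(window_days=56, min_bounces=4, min_span_days=21)
DISCUSSION_LIST = RemovalPolicy(window_days=14, min_bounces=10, min_span_days=5)
```

With these made-up policies, an address that bounced on four consecutive weekly mailings would qualify for removal from the weekly list, while the same four bounces would not be enough to remove it from the discussion list.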

In the last year or so, we've noticed a new trend: some weeks, we get errors from hundreds (or even thousands) of subscribers whose servers refuse delivery of TidBITS issues. On the heels of these errors, we usually receive a flurry of complaints: "Why didn't I get this week's issue?" or "Please fix my subscription - I didn't get TidBITS today but your system says I'm still on the list!"

The reason for these errors is that from time to time, some email systems conclude that TidBITS is spam or - worse - an email-borne worm or virus. These email systems are utterly wrong - TidBITS is never sent to any address that has not subscribed, and an issue of TidBITS has never contained a worm or virus - but they serve to highlight some interesting points:

  • Email is increasingly being filtered for its content;
  • That filtering is often being done without the knowledge or consent of affected users;
  • Over time, inaccurate filtering will substantially reduce the general utility of email.

In short, we're starting to see signs that email, often hailed as the Internet's "killer app," is in danger of becoming an unreliable, arbitrarily censored medium - and there's very little we can do about it.

Them's Spam-Fighting Words!
What causes some email systems to misinterpret TidBITS as spam or malicious email? I can't be specific here - or thousands of subscribers will never receive this TidBITS issue! - but I can point to some recent examples:

  • Jeff Carlson's article on the Palm i705 in TidBITS-635 made a passing reference to a well-known Pfizer drug for men, technically known as sildenafil citrate. Our mail error logs indicate over 2,500 TidBITS issues were rejected by over 1,000 sites because they contained the drug's name; many of the rejections were from relatively high-profile sites like the Association for Computing Machinery (ACM) and VeriSign. (Even leaving aside errors which cited that particular word, we received a substantially above-average number of errors for the week, which probably puts the total closer to 4,000 rejected issues, or about 10 percent of that week's mailing.) <http://db.tidbits.com/getbits.acgi?tbart=06856>
  • Adam's article on bandwidth limitations on Apple's Mac.com service in TidBITS-634 caused TidBITS to be rejected as a worm by approximately 250 sites because it contained the proper name of Apple's Web page hosting service and the words "my" and "pictures" in succession. <http://db.tidbits.com/getbits.acgi?tbart=06851>
  • In a particularly bizarre example, approximately 180 mail servers rejected TidBITS issues containing Matt Neuburg's articles on Unicode under Mac OS X, seemingly because the title of his articles named a particular fruit and the text contained the words "keystroke" and/or "keycode." <http://db.tidbits.com/getbits.acgi?tbser=1217>
  • Adam's article in TidBITS-618 on copyright caused issues to be rejected by approximately 120 servers because it mentioned the name of a well-known peer-to-peer music swapping service and the name of a pop music group. <http://db.tidbits.com/getbits.acgi?tbart=06729>
  • Adam's article "A Couple of Cool Concepts" caused TidBITS-616 to be rejected by over 1,100 sites because it sarcastically referred to an advertising campaign for a particular type of wireless video camera. Still other sites rejected it because it contained the word "undress" and another word describing a hair color. <http://db.tidbits.com/getbits.acgi?tbart=06720>

Filter Me Timbers
It's important to note that these TidBITS issues are being rejected by mail servers - typically run by businesses, organizations, or ISPs - rather than by individual mail clients like Eudora or Outlook Express. Current email programs can process incoming mail in any number of ways, and there's no way to prevent users from intentionally - or unwittingly - creating a rule or filter which marks TidBITS as spam and deletes it outright. In fact, publications like TidBITS have run afoul of client-side filtering such as that included in Microsoft's Outlook Express and Entourage. <http://db.tidbits.com/getbits.acgi?tbart=05647>

Although the utter opacity of tools like Microsoft's Junk Mail Filter somewhat belies this distinction, the crucial difference between client-side mail filtering and server-side mail filtering is that the former is largely under the control of individual email users, while the latter is typically governed by organizational policy. In an organization, this may mean that only one or two people in charge of thousands of email accounts determine what mail will or won't be accepted, and there's often no way for users to determine whether or how their email is being filtered.

For instance, the servers which rejected Adam's article on Mac.com services largely did so because they were running particular commercial anti-virus packages, and those organizations trusted those products would not reject legitimate email. Obviously, they were wrong. On the flip side, every copy of TidBITS-601 sent to subscribers at a large aerospace company (whose name sounds like "boing!") was rejected because it contained a particular URL; apparently, an email administrator somewhere within this organization of tens of thousands of people decided that any email message containing that URL should be rejected outright. Ironically, the offending URL was owned by a company that counts the aerospace company among its clients. Oops.

Senseless Censors
It's hard to argue with the practical necessity of filtering email, given the tremendous amount of spam clogging the Internet. (A company that provides an anti-spam filtering service to large organizations, Brightmail, estimates that the amount of spam has gone up by 600 percent this year.) The costs of spam are quite real in terms of storage, bandwidth, and processing power, not to mention vast amounts of human time deleting, filtering, identifying, and cleaning up after spam. There's no denying administrators are trying to save time, trouble, and (in some cases) actual harm by assaying email before it gets to users' desktops. Even TidBITS performs some very basic filtering on incoming mail, and I'm more aggressive with mail filtering on my business's servers. <http://www.brightmail.com/>

The thing to remember is that, like Web content filtering, email content filtering is at best unintelligent and arbitrary. A rule which seems perfectly sensible to reject spam regarding long distance telephone service may have the unintended consequence of rejecting all email from your Aunt Tillie, simply because Aunt Tillie's Internet provider has IP numbers which contain a subset of a spammer's advertised phone number. (That's a real problem one of my clients encountered - although Aunt Tillie's name has been changed.) Similarly, a rule designed to screen out promotions for adult Web sites might prevent a user from participating in a breast cancer support group's mailing list. It's easy to come up with countless examples where blocking mail based on specific words, terms, and phrases in email can do the wrong thing.
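A small sketch shows how easily this kind of rule misfires. Everything in it is hypothetical: the phone number, the addresses (taken from ranges reserved for documentation), the two rules, and the `naive_filter` function are invented for illustration and don't reproduce any real product's behavior. It assumes a filter that scans the raw message, headers included, for administrator-supplied patterns.

```python
import re


def naive_filter(message, blocked_patterns):
    """Return the first blocked pattern found anywhere in the raw message, else None."""
    for pattern in blocked_patterns:
        if re.search(pattern, message, flags=re.IGNORECASE):
            return pattern
    return None


# Hypothetical rules an administrator might add after a spam run.
PHONE_RULE = r"203\D?0\D?113"  # digits from a made-up advertised phone number,
                               # tolerant of whatever punctuation separates them
ADULT_RULE = r"breast"         # meant to catch promotions for adult Web sites
RULES = [PHONE_RULE, ADULT_RULE]

spam = "Subject: Cheap long distance!\n\nCall (555) 203-0113 today."
aunt = ("Received: from mail.example.net [203.0.113.25]\n"
        "Subject: Thanksgiving\n\nSee you Thursday. Love, Aunt Tillie")
support = "Subject: Breast cancer support group\n\nThe meeting has moved to Tuesday."
lunch = "Subject: Lunch\n\nAre we still on for Friday?"

for name, msg in [("spam", spam), ("aunt", aunt), ("support", support), ("lunch", lunch)]:
    hit = naive_filter(msg, RULES)
    print(name, "REJECTED by " + hit if hit else "accepted")
```

Under these assumptions the spam is rejected as intended, but so are Aunt Tillie's note (her provider's IP address happens to contain the same digits) and the support group's announcement. Only the lunch invitation gets through.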

As much as on-target filtering might save administrators and users time, money, and trouble, filtering that backfires also has direct costs. Part of that cost is passed off to the sender whose email has been improperly identified: every time spam filtering hits TidBITS, I get to track the problem down, deal with email administrators, and assuage irritated subscribers. (That's time I could be spending - should be spending - doing useful things like writing articles or improving TidBITS services.) Part of the cost also stays with the organization doing the filtering, largely in supporting users who didn't receive expected email and in dealing with remote administrators like me to figure out what's going wrong. Misfiring filters reduce the utility of email for all involved.

Put a Sock In It
We've sometimes tried to avoid words and terms in TidBITS that might trigger overly broad content filters. (Here "we" mostly means "me," because I'm the staff member most familiar with the email errors and problems TidBITS encounters.) For instance, we changed portions of Dan Kohn's "Steal This Essay" series to omit a term describing adult materials (it starts with the letter P and rhymes with "corn"), and lately hardly a week goes by where we don't make changes to an issue to avoid phrases and terms which have set off overly aggressive filters. Recently self-censored articles include Adam's series on converting to Mac OS X, "Corrupt Audio Disks Stick in Mac's Craw" in TidBITS-631, "Goodies from Kensington" in TidBITS-630, "Mac OS X: Curse of the New" in TidBITS-629, and "Was Bill Gates Lying?" in TidBITS-628. These articles run the gamut of everything TidBITS covers, from analysis and commentary to news and reviews. As you've noticed, in this article I'm also trying to avoid terms or sequences of words which have caused TidBITS to be rejected.

To a degree, publishing offensive or controversial terms is a judgment call: is the editorial value worth the potential backlash and arbitrary rejection of TidBITS? But when we reach a point where TidBITS cannot mention the name of Apple's Web hosting service in the same issue as a phrase such as "my" followed by "pictures" without confusing hundreds of readers and committing (already limited) staff hours to sorting out the problem, a line has been crossed. When TidBITS cannot publish the name of a common fruit in the same issue as a word like "keystroke," mention a type of medication even in passing, or discuss a well-known online advertising campaign, we've exited the Realm of the Reasonable and landed squarely on Planet Preposterous.

All Done Now
There's no way TidBITS can hope to self-censor against these types of mishaps: the terms and phrases are simply too arbitrary and unpredictable. Maybe tomorrow someone will release a new Windows worm, and commercial anti-virus software will start blocking all email containing the words "stopwatch" and "banana." (If you didn't get this issue as expected via email, maybe that's why!)

As a result, there's no way we can make reasonable assurances TidBITS will be able to reach you via email: we simply have no way of knowing what you or your provider might consider content non grata. We will continue to make reasonable efforts to avoid controversial or offensive terms, and may "dress up" such terms in ways so they are likely to get by some types of email filtering. We will not, however, refrain from publishing commentary about topics that are likely to set off spam filters: that's knuckling under to the email administrators who - probably unintentionally - have caused this situation. And although all discussions of true censorship and freedom of the press are generally only relevant in relation to the government, if this sort of content filtering continues to become more prevalent, there will be no freedom of speech through email.

So here's what you should do. If TidBITS doesn't arrive when you expect in email, first check our Web site to make sure the issue was published (we do take a couple of issues off each year). Then send email to <tidbits@tidbits.com>, which should always return the current issue, probably within minutes. If it hasn't arrived in an hour or two, it's a good bet that whoever manages your email server has a foolish content filter in place that we've failed to anticipate in our use of the English language. (If this requested issue does arrive, it's more likely that there were communication problems between our servers and yours that have cleared up since we sent the first copy.) The next step is to ask your email administrator - nicely - if they are performing content filtering on incoming email because you haven't received mail you expected. You may wish to ask them to remove their content filtering for all the reasons mentioned above: feel free to point them at this article. These actions won't solve the larger problem, but they might make administrators think a little harder about the impacts of email filtering.

If all else fails, you can subscribe to the announcement version of TidBITS, which delivers a brief email message containing an abstract of the issue and a table of contents with links to articles on the Web. Because the announcement version of TidBITS doesn't contain the full text of the issue, it has a good chance of passing through content filters. <http://www.tidbits.com/about/list.html>


Wellington Macintosh Society Inc. 2002