Content Filtering

In the IT world, if someone says the word “anti-sp*m”, everyone adds the word “filter” in their minds. It seems that “of course they just go together”. But why? We just cannot figure it out.

It must be something like “of course the world is flat” or “of course cigarettes are safe” or “of course General Motors is a solid investment”. These are the things that “everybody knows”, right up until reality shocks them.

Well, content filtering of network traffic is cool technology. And, when it can help make simple, clear yes/no, good/bad, black/white decisions, it can be very useful. For instance, in data leak prevention (DLP) products, filtering can work well. If the filter “sees” a credit card number or a social security number where there should not be one, then actions can be taken. A totally deterministic process.
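To make the DLP case concrete, here is a minimal sketch of that kind of yes/no check. It is not taken from any particular DLP product: the pattern, the function names and the choice to confirm candidates with the Luhn checksum are all assumptions made for illustration.

```python
import re

# Assumed pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Given the same text, the function always returns the same answer, and nothing in it has to guess what the message means.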

Even anti-virus scanning is a form of content filtering. If a virus signature is matched, block the message. Again, a totally deterministic process.
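The same idea can be sketched for signature matching. The signature names and byte patterns below are made up for illustration, and real anti-virus engines are far more elaborate, but the decision has the same shape: a known pattern is either present or it is not.

```python
from typing import Optional

# Hypothetical signature database: name -> byte pattern (not real signatures).
SIGNATURES = {
    "Example.TestA": bytes.fromhex("deadbeef00ff"),
    "Example.TestB": b"HYPOTHETICAL-MALWARE-MARKER",
}


def match_signature(payload: bytes) -> Optional[str]:
    """Return the name of the first signature found in payload, or None."""
    for name, pattern in SIGNATURES.items():
        if pattern in payload:
            return name
    return None


def disposition(payload: bytes) -> str:
    """Block the message if any signature matches; otherwise deliver it."""
    return "block" if match_signature(payload) is not None else "deliver"
```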

The Failure: Using Content Filtering to Determine Meaning

But using content filtering for the “semantic content” (i.e., the “meaning”) of an email message seems crazy. Sure, some content would be “obviously” bad, but most of it is open to interpretation. If you have ten people read a message, and four find it offensive, three don’t care and three really like it, what should software do? This is NOT a deterministic process. This is why, with email, filters have failed.

Again, let’s be blunt. If email filters really worked, we would not still be having these problems (which are actually getting worse). But since filters are really just “guessing” at the meaning of a message, the bad guys keep adapting their approaches and the garbage just keeps flowing in.

What It Comes Down To: Who, or What, Do You Trust?

It comes down to who, or what, you trust, and whether information is flowing in or out. If you truly trust your employees, then there is nothing to worry about with regard to the wrong information leaving the company. In the real world, even good people can make mistakes, so many companies are deploying DLP products because it really is all about the data.

On the incoming side, it is NOT about the data. It’s about whether you trust the sender of the data. If you do not trust them, why do you want ANYTHING they send? And the answer is, you don’t.
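A sender-based decision can be sketched in a few lines. The trusted addresses and domains below are placeholders and the function name is an assumption. The sketch also takes the From header at face value, which a real system could not do without first verifying the sender.

```python
from email.utils import parseaddr

# Placeholder trust lists, for illustration only.
TRUSTED_SENDERS = {"alice@example.com"}
TRUSTED_DOMAINS = {"partner.example"}


def accept_message(from_header: str, body: str) -> bool:
    """Accept or reject based only on the sender; the body is never inspected."""
    _, address = parseaddr(from_header)
    address = address.lower()
    domain = address.rpartition("@")[2]
    return address in TRUSTED_SENDERS or domain in TRUSTED_DOMAINS
```

The body parameter is deliberately unused. Once the sender is trusted or not, the content of the message plays no part in the decision.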