These classnotes are deprecated. As of 2005, I no longer teach the classes. Notes will remain online for legacy purposes.

UNIX03/DCC - Distributed Checksum Clearinghouse


The DCC, or Distributed Checksum Clearinghouse, is currently a system of many clients and more than 120 servers that collect and count checksums related to several million mail messages per day, most as seen by Internet Service Providers. SMTP servers and mail user agents can use the counts to detect and then reject or filter spam (unsolicited bulk mail). DCC servers exchange, or "flood," common checksums among themselves. The checksums include values that stay constant across common variations in bulk messages, including "personalizations."

There are graphs of recently detected spam you can see here:

The idea of the DCC is that if mail recipients could compare the mail they receive, they could recognize and deal with unsolicited bulk mail. A clearinghouse server totals reports of checksums of messages from clients and answers queries about the total counts for individual checksums. Each recipient decides independently how to handle each bulk message. A DCC client reports and asks about the total counts for several different checksums for a mail message in each transaction. If one of the totals for a message is higher than a threshold set by the client, a DCC client that is part of an SMTP server can log, discard, or reject the message. DCC clients that are parts of mail user agents can discard, file, or score messages based on their "bulkiness."
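
The report-and-query cycle described above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions, not the real DCC implementation: the names Clearinghouse, message_checksums, and is_bulk are hypothetical, SHA-256 stands in for whatever checksums the DCC actually computes, and the whitespace/case normalization is a made-up stand-in for the DCC's handling of message variations.

```python
import hashlib
from collections import Counter


class Clearinghouse:
    """Toy stand-in for a DCC server: it only totals checksum reports."""

    def __init__(self):
        self.totals = Counter()

    def report(self, checksums):
        """Count one more report of each checksum and return the new totals."""
        for checksum in checksums:
            self.totals[checksum] += 1
        return {checksum: self.totals[checksum] for checksum in checksums}


def message_checksums(mail_from, body):
    """Hypothetical checksum set for one message (not the DCC's real set)."""
    # Assumption: collapsing case and whitespace imitates a checksum that
    # survives trivial variations between copies of a bulk message.
    normalized = " ".join(body.lower().split())
    return {
        "from": hashlib.sha256(mail_from.encode()).hexdigest(),
        "body": hashlib.sha256(normalized.encode()).hexdigest(),
    }


def is_bulk(totals, threshold):
    """A message is bulk if any one total is higher than the client's threshold."""
    return any(count > threshold for count in totals.values())
```

A client would call report() once per message with all of that message's checksums, then apply its own threshold to the returned totals. The server never decides what is spam; each recipient picks a threshold and an action (log, discard, reject, file, or score).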

The DCC does cause some additional network traffic, unless it is used with isolated servers, in which case it loses some of its power. However, the client-server interaction for a mail message consists of exchanging a single pair of UDP/IP datagrams. That is often less than the several pairs of UDP/IP datagrams required for a single DNS query. Most SMTP servers make at least one DNS query for every message to check the envelope Mail_From value, and often several more.
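
To make "a single pair of UDP/IP datagrams" concrete, here is a minimal loopback sketch of one request datagram and one reply datagram. It does not speak the real DCC wire protocol: the function name one_round_trip, the payload, and the "totals-for:" reply prefix are all invented for illustration.

```python
import socket


def one_round_trip(payload: bytes) -> bytes:
    """Send one datagram to a loopback 'server' and read the single reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.settimeout(2.0)
        client.settimeout(2.0)

        # Datagram 1: client -> server (report and query travel together).
        client.sendto(payload, server.getsockname())
        request, client_addr = server.recvfrom(2048)

        # Datagram 2: server -> client (made-up reply carrying the "totals").
        server.sendto(b"totals-for:" + request, client_addr)
        reply, _ = client.recvfrom(2048)
        return reply
    finally:
        server.close()
        client.close()
```

There is no connection setup or teardown, which is why the per-message cost stays at two packets. The same property is what makes the exchange easy to spoof, since nothing in plain UDP proves who sent a datagram.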

There is a small security concern here. Because the DCC uses UDP/IP datagrams, it is susceptible to many different types of spoofing and forgery (see section 5.2 of the book). However, with a properly set up content filtration system (like the one we will be setting up here), the worst thing a compromised DCC could do is let more spam through or block more legitimate mail.


Last edited June 7, 2003 12:18 am
(C) Copyright 2003 Samuel Hart
This work is licensed under a Creative Commons License.