Yahoo Groups archive

Milter-greylist

Re: [milter-greylist] Re: is this a DoS?

2004-06-01 by Enrico Scholz

manu@... (Emmanuel Dreyfus) writes:

>> > 2) ensure regexes are matched on the address before it is truncated
>> Sounds good, and is probably the best choice. Doing this would allow
>> storing hashes in the database: the comparisons with settings in the
>> configuration file (rcpt/from) happen on the real addresses (pointers
>> given by the milter interface), and the lookup in the database is done
>> with hashes.
>> 
>> This would make milter-greylist very efficient:
>
> I have no idea how costly cryptographic hash computation is. Are you sure
> it won't consume more CPU if we go that way?

My P4 2.6 calculates the md5sum of 1 GiB of data in about 5 seconds and the
sha1sum in about 15 seconds:

| $ dd if=/dev/zero bs=1024k count=1024 | time md5sum
| 4.62user 0.51system 0:05.24elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
|
| $ dd if=/dev/zero bs=1024k count=1024 | time sha1sum
| 14.56user 0.40system 0:15.18elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k


For milter-greylist, we would calculate the checksum of 50-200 bytes of
data per connection. Somewhere I read about 200k connections/day; that is
at most about 40 MB per day (200k x 200 bytes), far below the 1 GiB
above, so the cost of md5/sha1 should not matter.
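
The estimate above can be redone as a rough calculation. This is only a sketch in Python: the function name is made up for illustration, and it assumes that hashing many small inputs runs at the same throughput as the bulk `dd | md5sum` runs, which ignores per-call setup and padding overhead.

```python
# Back-of-the-envelope estimate that extrapolates the bulk timings quoted above.
# Assumption: throughput on tiny inputs equals bulk throughput. Per-call setup
# and padding are ignored, so real costs will be somewhat higher.

GIB = 1024 ** 3


def hash_seconds_per_day(connections, bytes_per_connection,
                         bench_seconds, bench_bytes=GIB):
    """Scale a measured timing (bench_seconds for bench_bytes) to a daily load."""
    daily_bytes = connections * bytes_per_connection
    return daily_bytes * bench_seconds / bench_bytes


# Worst case from the text: 200k connections/day, 200 bytes each.
# 5.24 s and 15.18 s are the elapsed times from the two runs above.
md5_cost = hash_seconds_per_day(200_000, 200, bench_seconds=5.24)
sha1_cost = hash_seconds_per_day(200_000, 200, bench_seconds=15.18)

print(f"md5:  {md5_cost:.2f} s/day")
print(f"sha1: {sha1_cost:.2f} s/day")
```

Under these assumptions both hashes stay well below one second of CPU time per day.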


>> * generate *one* hash over the entire triple, e.g.
>>   | sha1("%s#%s#%s", relay, from, rcpt)  -->  40 bytes ASCII resp. 20 bytes binary
>
> You can't do that: you lose the ability to perform subnet matching.

Mmh... perhaps I am missing something here, but why not do

| relay &= mask;

*before* calculating the checksum? E.g. '10.0.0.0' would be used in the
hash calculation for both 10.0.0.1 and 10.0.0.2 when using '-L 24'.
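
A minimal sketch of that idea, in Python rather than the C of milter-greylist itself. The function name `triple_key` and the parameter `prefix_len` (standing in for the '-L' value) are hypothetical, and the sketch handles the mask via the standard `ipaddress` module instead of a bitwise AND on a raw address.

```python
import hashlib
import ipaddress


def triple_key(relay, sender, rcpt, prefix_len=24):
    """Return the SHA-1 digest of 'relay#from#rcpt' with the relay masked.

    prefix_len plays the role of the '-L' value; the name is hypothetical.
    """
    # strict=False accepts a host address and yields its enclosing network,
    # which is equivalent to 'relay &= mask'.
    network = ipaddress.ip_network(f"{relay}/{prefix_len}", strict=False)
    masked_relay = str(network.network_address)
    data = f"{masked_relay}#{sender}#{rcpt}".encode("utf-8")
    return hashlib.sha1(data).digest()  # 20 bytes binary, 40 chars as hex


# Two relays in the same /24 map to the same key, a relay outside does not.
a = triple_key("10.0.0.1", "alice@example.org", "bob@example.com")
b = triple_key("10.0.0.2", "alice@example.org", "bob@example.com")
c = triple_key("10.0.1.1", "alice@example.org", "bob@example.com")
print(a == b, a == c)
```

Since the mask is applied before hashing, subnet matching survives as long as the mask width stays the same; a different width produces different keys.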


> At most, you can hash from and rcpt addresses, but not the IP. Or
> you store the IP next to the hash, and each time the config file is
> changed, you regenerate the hashes.

Yes, you will lose the auto-whitelist when changing the '-L' parameter, but
since such changes are an exception and the greylisting database rebuilds
itself from normal traffic, it should not be a problem.


>>   - I would recommend sha1 over md5 since it is more collision resistant
>
> I was wondering what was the best hash to use here. I suspect we don't care
> about the security usage of hashes (ie: it is difficult to create two identical
> hashes with different data on purpose),

At relatively high cost, a spammer could generate some addresses
with the same hash. But indeed, I do not see how this could be used to
circumvent greylisting.



>> * do a binary search over these lists (insert is more expensive than
>>   the current linked-list implementation, but there should be far more
>>   'lookup' than 'insert' operations)
>
> I thought about this too, but before going down that path, we need to measure
> how long a lookup takes. It's useless to optimize something that is cheap
> compared to other problems. I suspect the database dump to be the biggest
> CPU hog.

ACK; but the delay-option should reduce the impact.
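
For reference, the binary-search variant discussed above could look roughly like the following Python sketch. The class name `SortedKeyList` is invented for illustration and is not taken from the milter-greylist source. Lookups are O(log n), while inserts still have to shift elements and are therefore O(n).

```python
import bisect


class SortedKeyList:
    """Keeps hash keys in sorted order so lookups can use binary search."""

    def __init__(self):
        self._keys = []

    def contains(self, key):
        # bisect_left returns the position where key is, or would be inserted.
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def insert(self, key):
        # O(log n) to find the slot, O(n) to shift the tail of the list.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def remove(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            return True
        return False


pending = SortedKeyList()
for key in (b"\x03", b"\x01", b"\x02", b"\x01"):
    pending.insert(key)
print(pending.contains(b"\x02"), pending.contains(b"\x09"))
```

Whether this pays off depends on the measurement asked for above, since the sketch says nothing about the cost of the database dump.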


> Would you like to run some tests?

I fear that I cannot present representative results since I have only
low mail traffic (50-person company, some mailing lists, the most frequently
connecting relays are whitelisted). For today, I get

(1) neither in pending nor auto-white: 2 lookup() & 1 insert() & 0 remove() --> 192 times
(2) in pending, but still in timeout:  1 lookup() & 0 insert() & 0 remove() -->  66 times
(3) in pending, reached timeout:       2 lookup() & 1 insert() & 1 remove() -->  66 times
(4) already in auto-whitelist:         2 lookup() & 0 insert() & 0 remove() --> 129 times

1511 entries in auto-whitelist, 230 in pending. I do not have numbers
about expiration of database entries.
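
Summing the four cases gives the overall operation mix for that day. A small Python sketch of the arithmetic, using only the per-case counts listed above:

```python
# (lookups, inserts, removes, occurrences) for cases (1) to (4) above.
cases = [
    (2, 1, 0, 192),  # (1) neither in pending nor auto-white
    (1, 0, 0, 66),   # (2) in pending, but still in timeout
    (2, 1, 1, 66),   # (3) in pending, reached timeout
    (2, 0, 0, 129),  # (4) already in auto-whitelist
]

# Multiply each per-case operation count by how often the case occurred.
lookups = sum(look * n for look, _, _, n in cases)
inserts = sum(ins * n for _, ins, _, n in cases)
removes = sum(rem * n for _, _, rem, n in cases)

print(lookups, inserts, removes)
```

This comes to 840 lookups, 258 inserts and 66 removes, so lookups outnumber inserts by a bit more than three to one on this (small) sample.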


Results were gathered with

| # grep 'please come back in' /var/log/maillog  | \
|      grep    <initial-timeout> | wc -l              ## --> (1)
| # grep 'please come back in' /var/log/maillog  | \
|      grep -v <initial-timeout> | wc -l              ## --> (2)
| # grep 'X-Greylist:' /var/log/maillog | wc -l       ## --> (3)
| # grep 'for more' /var/log/greylist | wc -l         ## --> (4)


>> Advantages:
>> * enhanced privacy
>
> I disagree with the last advantage: the log file is only read by the
> administrator, and the ability to easily debug things is probably more
> useful than the ability to hide what was greylisted from the sysadmin.

I am not an expert in privacy law, but some people could interpret
'greylist.db' as a collection of profiles that goes beyond ordinary logging.



Enrico
