manu@... (Emmanuel Dreyfus) writes:
>> > 2) ensure regexes are matched against the address before it is truncated
>> Sounds good, and is probably the best choice. Doing this would allow us to
>> store hashes in the database: the comparisons with settings in the
>> configuration file (rcpt/from) happen on the real addresses (pointers
>> given by the milter interface), and the lookup in the database is done
>> with hashes.
>>
>> This would make milter-greylist very efficient:
>
> I have no idea how costly cryptographic hash computation is. Are you sure
> it won't consume more CPU if we go that way?
My 2.6 GHz P4 calculates the md5sum of 1 GiB of data in 5 seconds and the
sha1sum in 15 seconds:
| $ dd if=/dev/zero bs=1024k count=1024 | time md5sum
| 4.62user 0.51system 0:05.24elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
|
| $ dd if=/dev/zero bs=1024k count=1024 | time sha1sum
| 14.56user 0.40system 0:15.18elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
For milter-greylist, we will calculate the checksum of 50-200 bytes of
data per connection. Somewhere I read about 200k connections/day; even at
200 bytes each, that is only about 40 MB/day, far below the 1 GiB above
(well under a second of sha1 time per day at the measured rate), so the
cost of md5/sha1 should not matter.
>> * generate *one* hash over the entire triple, e.g.
>> | sha1("%s#%s#%s", relay, from, rcpt) --> 40 bytes ASCII or 20 bytes binary
>
> You can't do that: you lose the ability to perform subnet matching.
Mmh... perhaps I am missing something here, but why not do
| relay &= mask;
*before* calculating the checksum? E.g. '10.0.0.0' will be used in the
hash calculation for both 10.0.0.1 and 10.0.0.2 when using '-L 24'.
> At most, you can hash from and rcpt addresses, but not the IP. Or
> you store the IP next to the hash, and each time the config file is
> changed, you regenerate the hashes.
Yes, you will lose the auto-whitelist when changing the '-L' parameter, but
since such changes are exceptional and the greylist database repopulates
itself, it should not be a problem.
>> - I would recommend sha1 over md5 since it is more collision-resistant
>
> I was wondering what the best hash to use here would be. I suspect we don't
> care about the security properties of hashes (i.e. that it is difficult to
> deliberately create two different inputs with the same hash),
At relatively high cost, a spammer could generate some addresses
with the same hash. But indeed, I do not see how this could be used to
circumvent greylisting.
>> * do a binary search over these lists (insert is more expensive than
>> the current linked-list implementation, but there should be far more
>> 'lookup' than 'insert' operations)
>
> I thought about this too, but before going down that path, we need to
> measure how long a lookup takes. It's useless to optimize something whose
> cost is low compared to other problems. I suspect the database dump is the
> biggest CPU hog.
ACK, but the delay option should reduce its impact.
> Would you like to run some tests?
I fear that I cannot present representative results since I have only
low mail traffic (a 50-person company, some mailing lists; the most
frequently connecting relays are whitelisted). For today, I get
(1) neither in pending nor auto-white: 2 lookup() & 1 insert() & 0 remove() --> 192 times
(2) in pending, but still in timeout: 1 lookup() & 0 insert() & 0 remove() --> 66 times
(3) in pending, reached timeout: 2 lookup() & 1 insert() & 1 remove() --> 66 times
(4) already in auto-whitelist: 2 lookup() & 0 insert() & 0 remove() --> 129 times
1511 entries in the auto-whitelist, 230 in pending. I do not have numbers
on the expiration of database entries.
Results were gathered with
| # grep 'please come back in' /var/log/maillog | \
| grep <initial-timeout> | wc -l ## --> (1)
| # grep 'please come back in' /var/log/maillog | \
| grep -v <initial-timeout> | wc -l ## --> (2)
| # grep 'X-Greylist:' /var/log/maillog | wc -l ## --> (3)
| # grep 'for more' /var/log/greylist | wc -l ## --> (4)
>> Advantages:
>> * enhanced privacy
>
> I disagree with the last advantage: the log file is only read by the
> administrator, and the ability to easily debug things is probably more
> useful than the ability to hide from the sysadmin what was greylisted.
I am not an expert in privacy law, but some people could interpret
'greylist.db' as a list of profiles, which goes beyond ordinary logging.
Enrico
Re: [milter-greylist] Re: is this a DoS?
2004-06-01 by Enrico Scholz