eclark wrote:

> Matt, that depends entirely on how greedy the regex in question is, and how efficient one stack is vs. another.

Agreed.

> I would bet money that the regex evaluation in the milter is vastly inferior to the DNS resolution stack that the milter is compiled against.

Agreed. Fortunately, I'm talking about 2 orders of magnitude CPU usage difference. Even if the regex eval in the milter is a factor of 2 slower than it should be, it would still end up with 50 times less CPU usage than the RBL query.

I don't care how fast your DNS resolver is, or how local it is, it's still going to involve context switches. Context switches HURT. BADLY. Talk to someone who writes schedulers sometime. Even a good scheduler is bound by how long it takes the processor to save and load context. If context switches were fast, the HZ (rate of timer interrupts and basis for when the scheduler runs) in a standard Linux kernel would be about 1 million, not 100 or 1000.

A regex involves no context switches and no I/O; it's all memory operations in your process space. An RBL query involves at least 3 context switches and may involve network I/O. On a damn good OS and CPU, a context switch is about 1 us by itself. That's the kind of context-switch overhead folks BRAG about. In my testing, even locally cached queries over lo take about 50 us a pop.

> Moreover, network latency is a nonissue when you locally mirror RBLs inside your own network, or populate local sendmail databases on a regular basis with content pulled in an automatic fashion.

True. I'm speaking mostly to the CPU usage side, not the latency. The latency numbers are so hugely different it's not even worth discussing. From here forward, unless otherwise specified, every reference to SPEED or FAST refers to EXECUTION TIME on the processor, not clock time on the wall.

Also, not everyone mirrors the RBLs in question inside their own networks, so there you're making a broad, possibly false, assumption. But check my other post.
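To put the figures above side by side, here is a rough back-of-the-envelope sketch in Python. The constants are the estimates quoted in this post (about 50 us per locally cached query, about 1 us per context switch, at least 3 switches, a 2-orders-of-magnitude gap); the per-regex cost is only derived from that claimed gap, not measured, and the function names and the `slowdown` parameter are made up for illustration.

```python
# Back-of-the-envelope CPU cost comparison using the figures quoted in the post.
# These are the post's rough estimates, not fresh measurements.

RBL_QUERY_US = 50.0        # post: locally cached query over lo, about 50 us
CONTEXT_SWITCH_US = 1.0    # post: about 1 us per switch on a very good OS/CPU
MIN_CONTEXT_SWITCHES = 3   # post: an RBL query needs at least 3 switches
ORDERS_OF_MAGNITUDE = 2    # post: claimed CPU usage gap between regex and RBL


def regex_cost_us(slowdown: float = 1.0) -> float:
    """Per-regex cost implied by the claimed gap (derived, not measured).

    `slowdown` is a hypothetical factor for a regex engine that is slower
    than it should be.
    """
    return RBL_QUERY_US / (10 ** ORDERS_OF_MAGNITUDE) * slowdown


def rbl_to_regex_ratio(slowdown: float = 1.0) -> float:
    """How many times more CPU one RBL query costs than one regex."""
    return RBL_QUERY_US / regex_cost_us(slowdown)


def context_switch_floor_us() -> float:
    """Lower bound on RBL query cost from context switches alone."""
    return MIN_CONTEXT_SWITCHES * CONTEXT_SWITCH_US


if __name__ == "__main__":
    print(rbl_to_regex_ratio(1.0))    # 100.0
    print(rbl_to_regex_ratio(2.0))    # 50.0
    print(context_switch_floor_us())  # 3.0
```

Under these assumptions, even a regex engine that is twice as slow as it should be leaves the RBL query 50 times more expensive, and the context switches alone put a floor of about 3 us under any query.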
Even locally cached DNS queries are NOT nearly as fast as you seem to think they are. Sure, they're "fast" compared to running out over the wire. But they're not fast compared to a regex.

> The bigger point is, the broader the stroke you can cut, the more you can knock out at once, which ultimately means less resource consumption, even if you are using remote network connectivity to do some of your work (as remote connectivity is limited to what got past the initial greylist in the first place).

That statement is patently false. It would only be true if the two tests making the "cut" were of equal complexity. In that case, yes, the broader brush works better. In this case the broader brush is only about twice as effective, and consumes about 100 times the CPU of a small handful of regexes.

> I can speak firsthand about the ailments that are caused by RBLs; we have seen mailservers brought to their knees running sendmail RBLs as if it were nothing, with no additional filtering at all. Drop nameservice to a mailserver, and your RBLs will wait for an extended period of time until process death, and back up the runqueue in no time. No, RBLs are not an end-all solution, and were not suggested as one.

I agree.

> They were pointed out as a replacement for an unknown number of expensive regexes.

And I contest the claim that those regexes are "expensive" when compared to the RBL.

Most folks who talk about how "expensive" regexes are are talking in comparison to memcmp() operations. memcmp() is essentially 1 clock cycle per 4 bytes, plus memory I/O overhead. It's hard to be faster than that.

Folks who talk about how "expensive" regexes are are NOT talking in comparison to disk or network I/O, or parsing through a handful of recursive queries in a DNS resolver.

> You are making the fatal error of assuming the only expressions being evaluated were the two pasted; I seriously doubt this is the case, and figure there are probably many more in his conf as well.

Ok, fair enough.
I have a lot more too. I'd still venture to guess it would take 100 regexes of similar complexity to hit the "break even" point with the RBL in terms of CPU usage.

That said, I'd love to ditch some of my regexes for an RBL query... but not because it's faster. I'd do it because it's more accurate, and I'm willing to increase my CPU usage, and substantially increase my latency, in order to do it.

> At what point would you concede that use of alternate, broader methods of checking would be superior to a list of expressions? 10 greedy regexes? 15? 5?

Depends on how greedy you're talking.

About 100 of my revised versions of those regexes.
About 50 of the original ones.
About 10 that have .* in them without much good fixed-match text at the start (i.e. /a.*b/).

And of course, 1 really badly written one could do it. Anyone can write a regex that is more-or-less a DoS attack by itself. Use a ton of back references, make it about 5 miles long, lots of .*'s... yeah, it gets ugly.

> If replacing an entire greylisting mechanism made of 15 or more greedy expressions with one locally based hostname lookup in a mirror in your immediate network is considered less effective, then you truly have me stumped as to what might be considered more efficient.

Well, given that it took me 2 weeks of work to even make an RBL-enabled build of milter-greylist that doesn't instantly segfault, given that milter-greylist's usage of RBLs is new and probably pretty suboptimally written, and given that I'm pretty reasonable at regex tuning (I don't use * or + and I rarely use . in my regexes for milter-greylist):

Yes. I do consider the regexes more efficient, as long as you keep their numbers down to a sane count, and keep them reasonably written.

RBL queries are REALLY SLOW by comparison to short, reasonably written regexes. Period.
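For anyone who wants to check the "how greedy" point on their own box, here is a minimal Python sketch. It times a tightly written pattern against a /a.*b/-style one using `time.process_time()`, which counts CPU time rather than wall-clock time, matching the definition of "fast" used in this post. Both patterns and the test subjects are hypothetical, not taken from anyone's milter-greylist config, and Python's `re` engine is not the engine the milter is built against, so only the relative shape of the numbers means anything.

```python
import re
import time

# Hypothetical patterns for illustration only.
# Tight: anchored, fixed text up front, bounded repetition, no * or +.
ANCHORED = re.compile(r"^dsl-[0-9]{1,3}-[0-9]{1,3}\.example\.net$")
# Loose: the /a.*b/ style, little fixed text and an unbounded .*
GREEDY = re.compile(r"a.*b")


def cpu_seconds(pattern, subject, repeats=200):
    """CPU time (not wall-clock time) spent running `repeats` searches."""
    start = time.process_time()
    for _ in range(repeats):
        pattern.search(subject)
    return time.process_time() - start


if __name__ == "__main__":
    host = "dsl-12-34.example.net"
    # Worst case for a.*b: many candidate 'a' starts and no 'b' anywhere,
    # so every starting position scans to the end and then backtracks.
    worst = "a" * 200

    print("anchored, matching host:", cpu_seconds(ANCHORED, host))
    print("greedy, matching-free host:", cpu_seconds(GREEDY, host))
    print("greedy, worst-case input:", cpu_seconds(GREEDY, worst))
```

No expected timings are given here because they depend entirely on the machine and the regex engine; the point is only that the cost of the loose pattern grows with the input while the tight one stays flat.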
Re: [milter-greylist] Re: Limiting resident memory usage
2006-11-03 by Matt Kettler