Yahoo Groups archive

Milter-greylist

thread leak

2014-02-11 by Emmanuel Dreyfus

Hello everybody

Sometimes, milter-greylist gets overloaded, and stops answering requests
in time. sendmail says it goes "to error state", and I have to restart
milter-greylist to get it working again.

Today I looked closely at the problem and found that the frozen
milter-greylist had more than 1600 threads.

Has anyone experienced thread leakage with libmilter?

-- 
Emmanuel Dreyfus
manu@...

Re: [milter-greylist] thread leak

2014-02-11 by Peter Bonivart

On Tue, Feb 11, 2014 at 4:18 PM, Emmanuel Dreyfus <manu@...> wrote:
> Hello everybody
>
> Sometimes, milter-greylist gets overloaded, and stops answering requests
> in time. sendmail says it goes "to error state", and I have to restart
> milter-greylist to get it working again.
>
> Today I looked closely at the problem and found that the frozen
> milter-greylist had more than 1600 threads.
>
> Has anyone experienced thread leakage with libmilter?

I've used milter-greylist for many years on both Solaris and RHEL, and
on both it can sometimes go to error state, combined with the process
consuming huge amounts of memory or simply crashing. Restarting the
process always works. Nowadays I don't see it often though; I usually
get many months of uptime, by which point it's time to reboot the
servers for patching anyway.

/peter

RE: [milter-greylist] thread leak

2014-02-12 by Bruncsak, Attila

> Today I looked closely at the problem and found that the frozen
> milter-greylist had more than 1600 threads.
> 

How many sendmail (or postfix) processes
did you have on the system at that time?
Around 1600, or far fewer?

Re: [milter-greylist] thread leak

2014-02-12 by manu@...

Bruncsak, Attila <attila.bruncsak@...> wrote:

> How many sendmail (or postfix) processes
> did you have on the system at that time?
> Around 1600, or far fewer?

Much less, of course.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

RE: [milter-greylist] thread leak

2014-02-12 by Bruncsak, Attila

> > How many sendmail (or postfix) processes
> > did you have on the system at that time?
> > Around 1600, or far fewer?
> 
> Much less, of course.
> 
Are the threads normal working threads created in libmilter?

Re: [milter-greylist] thread leak

2014-02-12 by Johann Klasek

On Tue, Feb 11, 2014 at 03:18:48PM +0000, Emmanuel Dreyfus wrote:
> 
> Sometimes, milter-greylist gets overloaded, and stops answering requests
> in time. sendmail says it goes "to error state", and I have to restart
> milter-greylist to get it working again.
> 
> Today I looked closely at the problem and found that the frozen
> milter-greylist had more than 1600 threads.

Is there any hint as to what these threads are doing?
What does
pstack PID_OF_MG_PROCESS
say?

Re: [milter-greylist] thread leak

2014-02-12 by manu@...

Johann Klasek <johann@...> wrote:

> Is there any hint what these threads are doing?

All sleeping. I suspect libmilter fails to track threads and leaves some
behind.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] thread leak

2014-02-12 by manu@...

Bruncsak, Attila <attila.bruncsak@...> wrote:

> Are the threads normal working threads created in libmilter? 

It seems they are: 1600 sleeping threads. That's odd.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] thread leak

2014-02-12 by Johann Klasek

On Wed, Feb 12, 2014 at 08:07:54PM +0100, manu@... wrote:
> Johann Klasek <johann@...> wrote:
> 
> > Is there any hint what these threads are doing?
> 
> All sleeping. I suspect libmilter fails to track threads and leaves some
> behind.

I think libmilter does not actually track its threads. They are
created/cloned to process an SMTP session and self-terminate later ...

Do you have a sample backtrace of your threads? Are they all the
same (besides the organizational ones)?

With Linux Fedora 16 most of the workers look like this:

Thread 2 (Thread 0x7f6f2cc13700 (LWP 21649)):
#0  0x00000036df0e8283 in select () from /lib64/libc.so.6
#1  0x000000000041fbd5 in mi_rd_cmd ()
#2  0x000000000041f3ec in mi_engine ()
#3  0x000000000041c478 in mi_handle_session ()
#4  0x000000000041b129 in mi_thread_handle_wrapper ()
#5  0x00000038a8807d90 in start_thread () from /lib64/libpthread.so.0
#6  0x00000036df0eeddd in clone () from /lib64/libc.so.6

apart from the dumper, sync_master, sync_sender and signaling threads.

RE: [milter-greylist] thread leak

2014-02-13 by Bruncsak, Attila

> > Is there any hint what these threads are doing?
> 
> All sleeping. I suspect libmilter fails to track threads and leaves some
> behind.
> 

Did you have the libmilter compilation option "_FFR_WORKERS_POOL" defined?
(FFR: for future release)
By the way, which version of sendmail does your libmilter come from?

Re: [milter-greylist] thread leak

2014-02-14 by Emmanuel Dreyfus

On Thu, Feb 13, 2014 at 08:48:00AM +0000, Bruncsak, Attila wrote:
> Did you have the libmilter compilation option "_FFR_WORKERS_POOL" defined?
> (FFR: for future release)
> By the way, which version of sendmail does your libmilter come from?

No, and it is 8.14.7.

But I managed to track down the offending code. It was tricky because
once milter-greylist gets too many threads, gdb becomes unable to
inspect them. Catching the process soon enough (351 threads) gives me this:

#0  0x00007f7ff6875d6a in ___lwp_park50 () from /usr/lib/libc.so.12
#1  0x00007f7ff70088f1 in ?? () from /usr/lib/libpthread.so.1
#2  0x00007f7ff78245a1 in ldap_send_initial_request ()
   from /usr/pkg/lib/libldap_r-2.4.so.2
#3  0x00007f7ff7815668 in ldap_pvt_search ()
   from /usr/pkg/lib/libldap_r-2.4.so.2 
#4  0x00007f7ff781576f in ldap_pvt_search_s ()
   from /usr/pkg/lib/libldap_r-2.4.so.2
#5  0x00007f7ff7815839 in ldap_search_ext_s ()
   from /usr/pkg/lib/libldap_r-2.4.so.2
#6  0x00000000004157d9 in ldapcheck_validate (ad=<optimized out>,
    stage=<optimized out>, ap=0x7f7fdffff4d0, priv=0x7f7ff5110800)
    at ldapcheck.c:502
#7  0x00000000004120e8 in acl_filter (stage=AS_RCPT, ctx=<optimized out>,
    priv=0x7f7ff5110800) at acl.c:2407
#8  0x0000000000408f53 in real_envrcpt (ctx=0x7f7ff7332220,
    envrcpt=0x7f7ff511b3d0) at milter-greylist.c:725
#9  0x000000000040928f in mlfi_envrcpt (ctx=0x7f7ff7332220,
    envrcpt=0x7f7ff511b3d0) at milter-greylist.c:230
#10 0x00000000004231b6 in st_rcpt ()
#11 0x000000000042301a in mi_engine ()
#12 0x0000000000420bbf in mi_handle_session ()
#13 0x000000000041faf9 in mi_thread_handle_wrapper ()
#14 0x00007f7ff700b2ce in ?? () from /usr/lib/libpthread.so.1
#15 0x00007f7ff6875d80 in ___lwp_park50 () from /usr/lib/libc.so.12

ldap_send_initial_request() uses two mutexes. I think one thread gets
stuck in connection opening or request sending, and the other threads
wait for it.

The timelimit option of ldap_search_ext_s() will not help: this is
a server-side timeout for the request.

I think the fix is to start a new LDAP connection when we detect the
deadlock. Detection could be triggered either because the number of
threads involved in LDAP operations reaches a threshold, or because the
oldest operation hits a timeout. I suspect the second approach is better.

There is still a problem with that approach: correctly handling the
case where the LDAP directory is misbehaving. We do not want to open an
infinite number of connections if they all get stuck.

-- 
Emmanuel Dreyfus
manu@...
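
The timeout-based detection and the cap on connections discussed in the message above can be sketched as a small decision function. This is only an illustration under stated assumptions, not code from milter-greylist: the struct, the enum, the function name and the two thresholds (OP_TIMEOUT_SECS, MAX_CONNS) are all hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping for the shared LDAP connections. */
struct ldap_pool_state {
    time_t oldest_op_start; /* start of the oldest pending request, 0 if none */
    int    open_conns;      /* connections currently open, stuck ones included */
};

#define OP_TIMEOUT_SECS 10  /* assumed: how long a request may stay pending */
#define MAX_CONNS        4  /* assumed: hard cap on open connections */

enum ldap_action { USE_EXISTING, OPEN_NEW, FAIL_FAST };

/* Decide what a worker thread should do before issuing an LDAP request. */
static enum ldap_action
ldap_next_action(const struct ldap_pool_state *st, time_t now)
{
    bool stuck;

    if (st->open_conns == 0)
        return OPEN_NEW;            /* nothing to reuse yet */

    /* The oldest pending request has been waiting too long. */
    stuck = st->oldest_op_start != 0 &&
            now - st->oldest_op_start >= OP_TIMEOUT_SECS;

    if (!stuck)
        return USE_EXISTING;
    if (st->open_conns < MAX_CONNS)
        return OPEN_NEW;            /* replace the stuck connection */
    return FAIL_FAST;               /* directory looks unhealthy: stop piling up */
}
```

With the cap in place, a misbehaving directory costs at most MAX_CONNS stuck connections; past that point workers would return a temporary error instead of blocking.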

RE: [milter-greylist] thread leak

2014-02-14 by Bruncsak, Attila

> There is still a problem with that approach: correctly handling the
> case where the LDAP directory is misbehaving. We do not want to open an
> infinite number of connections if they all get stuck.
> 

This pseudocode just shows the concept of how I imagine implementing
the client-side timeout:

Worker thread:

If no ldap_connection_established
then
    get_the_lock
    If no ldap_opener_thread_working
    then
        Spawn ldap_opener_thread (detach, etc...)
    fi
    Mark ldap_opener_thread_working
    release_the_lock
    Wait only very briefly (client-side timeout) and return LDAP error if no connection yet
fi

Ldap opener thread:

Open ldap connection
If no ldap_connection_established
then
    Sleep for a while (to throttle the connection attempts)
fi
get_the_lock
Unmark ldap_opener_thread_working
release_the_lock
Exit
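
The pseudocode above could be written with POSIX threads roughly as follows. This is a minimal sketch, not the milter-greylist implementation. All names are hypothetical, the `open_connection` function pointer stands in for the real (possibly blocking) LDAP connection setup, and the two stub functions exist only to exercise the logic. A condition variable with a deadline provides the short client-side wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Shared state, protected by `lock`. All names here are hypothetical. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  state_cv = PTHREAD_COND_INITIALIZER;
static bool connection_established = false;
static bool opener_working = false;

/* Stand-in for the real connection setup, which may block for a long time. */
static bool (*open_connection)(void);

/* Stubs for illustration: an instant success and a slow failure. */
static bool stub_open_ok(void)    { return true; }
static bool stub_open_stuck(void) { sleep(2); return false; }

#define RETRY_DELAY_SECS 1  /* assumed throttle between connection attempts */

/* Opener thread: the only place that may block on connection setup. */
static void *
opener_thread(void *arg)
{
    bool ok;

    (void)arg;
    ok = open_connection();
    if (!ok)
        sleep(RETRY_DELAY_SECS);   /* throttle the connection attempts */

    pthread_mutex_lock(&lock);
    if (ok)
        connection_established = true;
    opener_working = false;
    pthread_cond_broadcast(&state_cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Worker side: true if a connection is usable within wait_ms milliseconds. */
static bool
wait_for_connection(long wait_ms)
{
    struct timespec deadline;
    pthread_t tid;
    bool ok;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += wait_ms / 1000;
    deadline.tv_nsec += (wait_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    /* Spawn at most one opener at a time, however many workers arrive. */
    if (!connection_established && !opener_working &&
        pthread_create(&tid, NULL, opener_thread, NULL) == 0) {
        pthread_detach(tid);
        opener_working = true;
    }
    /* Short client-side wait: give up once the deadline passes. */
    while (!connection_established) {
        if (pthread_cond_timedwait(&state_cv, &lock, &deadline) == ETIMEDOUT)
            break;
    }
    ok = connection_established;
    pthread_mutex_unlock(&lock);
    return ok;
}
```

A worker would call something like `wait_for_connection(100)` and return an LDAP error to its caller when the result is false. Because only the single opener thread ever blocks on the connection setup, the number of threads stuck behind a misbehaving directory stays bounded.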

Re: [milter-greylist] thread leak

2014-02-15 by manu@...

Bruncsak, Attila <attila.bruncsak@...> wrote:

> This pseudocode just shows the concept of how I imagine implementing
> the client-side timeout:

I realized that a much simpler way of dealing with the issue is to use
LDAP asynchronous requests. I am testing it right now.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...
