Yahoo Groups archive

Milter-greylist

Milter-greylist crashes on DragonFly-2.8

2011-01-22 by Francois Tigeot

Hi!

I have been running sendmail + milter-greylist 4.2.6 on a DragonFly
host. I recently upgraded this machine to DragonFly-2.8 (from 2.6), and
milter-greylist now crashes during startup with this error message:

  conf.c:351 BUG: conf_retain called twice?

However, this does not always happen: the program fails to start about 9
times out of 10 but if I'm really obstinate and try to start it manually
long enough, it ends up running as it should and never crashes after
that.

As an experiment, I removed the assert on line 351 of conf.c:

  --- conf.c.orig
  +++ conf.c
  @@ -349,7 +349,6 @@ conf_retain(void) {
          if (GET_CONF()) {
                  mg_log(LOG_ERR, "%s:%d BUG: conf_retain called twice?",
                                  __FILE__, __LINE__);
  -               assert(0);
          }

I still got the error message at startup, but milter-greylist ran
fine after that.

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-23 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> I still got the error message at startup, but milter-greylist ran
> fine after that.

Could you try replacing the assert(0) with a return?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-23 by Francois Tigeot

On Sun, Jan 23, 2011 at 05:43:44AM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > I still got the error message at startup, but milter-greylist ran
> > fine after that.
> 
> Could you try replacing the assert(0) by a return ?

Done:
  -       assert(0);
  +       return;

Apart from the conf_retain error message, milter-greylist started fine
and is now running normally.

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-23 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> Apart from the conf_retain error message, milter-greylist started fine
> and is now running normally.

And you only get it on startup, right? What happens on config reload?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-23 by Francois Tigeot

On Sun, Jan 23, 2011 at 03:23:51PM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > Apart from the conf_retain error message, milter-greylist started fine
> > and is now running normally.
> 
> And you only get it on startup, right? What happens on config reload?

There's no error message, it works fine:

Jan 23 15:39:32 pizza milter-greylist: reloading config file "/etc/mail/greylist.conf"
Jan 23 15:39:32 pizza milter-greylist: reloaded config file "/etc/mail/greylist.conf" in 0.009047s

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-26 by Francois Tigeot

On Sun, Jan 23, 2011 at 10:35:30AM +0100, Francois Tigeot wrote:
> On Sun, Jan 23, 2011 at 05:43:44AM +0100, manu@... wrote:
> > Francois Tigeot <ftigeot@...> wrote:
> > 
> > > I still got the error message at startup, but milter-greylist ran
> > > fine after that.
> > 
> > Could you try replacing the assert(0) by a return ?
> 
> Done:
>   -       assert(0);
>   +       return;
> 
> Apart from the conf_retain error message, milter-greylist started fine
> and is now running normally.

That was on a single test machine (the MX for wolfpond.org).

I have now upgraded a second server to DragonFly-2.8 and applied this same
modification to milter-greylist-4.2.6 (installed from pkgsrc).

Startup is still fine, but this time, milter-greylist crashes after a few
minutes with a signal 11:

  [1]    20946 segmentation fault  /usr/pkg/bin/milter-greylist -D -p /var/milter-greylist/milter-greylist.sock

FWIW, the first machine was a single VIA C3 and the second one (the one where
the crashes occur) is a 4-core Xeon running a SMP kernel.

What can I do to debug this? I'd like to at least get a core dump.

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-27 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

>   [1]    20946 segmentation fault  /usr/pkg/bin/milter-greylist -D -p /var/milter-greylist/milter-greylist.sock
> 
> FWIW, the first machine was a single VIA C3 and the second one (the one where
> the crashes occur) is a 4-core Xeon running a SMP kernel.
> 
> What can I do to debug this ? I'd like to at least get a core dump.

Make sure you built with -g, and run inside gdb:
# gdb milter-greylist
(gdb) r -Dv -p /var/milter-greylist/milter-greylist.sock

Once you crash, type bt to get a backtrace

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-27 by Francois Tigeot

On Thu, Jan 27, 2011 at 10:48:54AM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> >   [1]    20946 segmentation fault  /usr/pkg/bin/milter-greylist -D -p /var/milter-greylist/milter-greylist.sock
> > 
> > What can I do to debug this ? I'd like to at least get a core dump.
> 
> Make sure you built with -g, and run inside gdb:
> # gdb milter-greylist
> (gdb) r -Dv -p /var/milter-greylist/milter-greylist.sock
> 
> Once you crash, type bt to get a backtrace

This is what I got:

Program received signal SIGSEGV, Segmentation fault.
0x281342ff in select () from /usr/lib/libc.so.7
(gdb) bt
#0  0x281342ff in select () from /usr/lib/libc.so.7
#1  0x280b0eae in select () from /usr/lib/libpthread.so.0
#2  0x280d7983 in mi_listener () from /usr/lib/libmilter.so.3
#3  0x280d660f in smfi_main () from /usr/lib/libmilter.so.3
#4  0x0804f2ff in main (argc=0, argv=0xbfbff4ec) at milter-greylist.c:1687

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-29 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> This is what I got:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x281342ff in select () from /usr/lib/libc.so.7
> (gdb) bt
> #0  0x281342ff in select () from /usr/lib/libc.so.7
> #1  0x280b0eae in select () from /usr/lib/libpthread.so.0
> #2  0x280d7983 in mi_listener () from /usr/lib/libmilter.so.3
> #3  0x280d660f in smfi_main () from /usr/lib/libmilter.so.3
> #4  0x0804f2ff in main (argc=0, argv=0xbfbff4ec) at milter-greylist.c:1687

That suggests things are quite rotten, and that ignoring the assertion
in conf_retain() was not the right approach.

Can you try replacing it with a return? I am not sure it can work, but it
is easy enough to be worth trying.

        if (GET_CONF()) {
                mg_log(LOG_ERR, "%s:%d BUG: conf_retain called twice?",
                                __FILE__, __LINE__);
                return;
        } 
 

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-29 by Francois Tigeot

On Sat, Jan 29, 2011 at 12:13:10PM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > This is what I got:
> > 
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x281342ff in select () from /usr/lib/libc.so.7
> > (gdb) bt
> > #0  0x281342ff in select () from /usr/lib/libc.so.7
> > #1  0x280b0eae in select () from /usr/lib/libpthread.so.0
> > #2  0x280d7983 in mi_listener () from /usr/lib/libmilter.so.3
> > #3  0x280d660f in smfi_main () from /usr/lib/libmilter.so.3
> > #4  0x0804f2ff in main (argc=0, argv=0xbfbff4ec) at milter-greylist.c:1687
> 
> That suggests things are quite rotten, and that ignoring the assertion
> in conf_retain() was not the right approach.

Do you think this sort of crash may be consistent with a bug in select() ?

> Can you try replacing it by a return? I am not sure it can work, but it
> is easy enough to be worth trying.

I already did since you asked a few days ago.
My experiments were all done with "return;" on line 352 of conf.c instead of
a comment; I'm sorry, I should have made this more clear.

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-29 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> > That suggests things are quite rotten, and that ignoring the assertion
> > in conf_retain() was not the right approach.
> Do you think this sort of crash may be consistent with a bug in select() ?

I would assume that select() is above any suspicion, otherwise most of
your program would crash. But if milter-greylist memory becomes
corrupted, you can get a crash in any system call, including select().

> I already did since you asked a few days ago.
> My experiments were all done with "return;" on line 352 of conf.c instead of
> a comment; I'm sorry, I should have made this more clear.

This is bad news. We will have to discover how you re-entered
conf_retain a second time, and the first step is to find out where it is
being called from.

I'm afraid you will have to grep conf_retain *.c and add this before
each occurrence:
    if (GET_CONF())
       mg_log(LOG_WARN, "conf_retain called from %s", __func__); 

There are 18 occurrences in the sources. Note that you will also have to
include "conf.h" in order to get it to build.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-29 by manu@netbsd.org

Emmanuel Dreyfus <manu@...> wrote:
> I'm afraid you will have to grep conf_retain *.c and add this before
> each occurrence:

And while you are there, logging the thread ID will help:
     if (GET_CONF())
        mg_log(LOG_WARN, "%d: conf_retain called from %s",
                       pthread_self(), __func__); 

As I understand it, the problem you are seeing can only occur if the same
thread re-enters conf_retain.
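
The "%d" conversion above assumes that pthread_t fits in an int, which POSIX does not promise (pthread_t is an opaque type). A minimal sketch of a more defensive formatter follows. It assumes pthread_t is an integer or pointer type on the target platform, which is common but not guaranteed; format_thread_id() is a hypothetical helper and not a milter-greylist function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write "<thread id>: conf_retain called from <func>"
 * into buf. Assumes pthread_t converts cleanly to uintptr_t, which holds
 * where pthread_t is an integer or a pointer but is not required by POSIX.
 * Returns the number of characters written, excluding the trailing NUL. */
int
format_thread_id(char *buf, size_t len, const char *func)
{
        uintptr_t id = (uintptr_t)pthread_self();
        int n;

        n = snprintf(buf, len, "%lu: conf_retain called from %s",
                     (unsigned long)id, func);
        assert(n >= 0 && (size_t)n < len);      /* buffer must be large enough */

        return n;
}
```

The resulting string can then be passed on as mg_log(LOG_WARNING, "%s", buf).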

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-29 by Francois Tigeot

On Sat, Jan 29, 2011 at 03:17:41PM +0100, manu@... wrote:
> Emmanuel Dreyfus <manu@...> wrote:
> > I'm afraid you will have to grep conf_retain *.c and add this before
> > each occurrence:
> 
> And while you are there, logging the thread ID will help:
>      if (GET_CONF())
>         mg_log(LOG_WARN, "%d: conf_retain called from %s",
>                        pthread_self(), __func__); 

I had to use LOG_WARNING but otherwise there was no particular issue with the
added lines.

> As I understand it, the problem you are seeing can only occur if the same
> thread re-enters conf_retain.

It seems to always be the same thread indeed.

[With the assertion in conf.c:352 intact]
Jan 29 16:43:37 milter-greylist: 673055056: conf_retain called from dumper
Jan 29 16:43:37 milter-greylist: conf.c:351 BUG: conf_retain called twice?
Jan 29 16:43:42 milter-greylist: 673055056: conf_retain called from dumper
Jan 29 16:43:42 milter-greylist: conf.c:351 BUG: conf_retain called twice?
Jan 29 16:43:47 milter-greylist: 673055056: conf_retain called from dumper
Jan 29 16:43:47 milter-greylist: conf.c:351 BUG: conf_retain called twice?

[With return in conf.c:352]
Jan 29 16:49:25 milter-greylist: 673055056: conf_retain called from dumper
Jan 29 16:49:25 milter-greylist: conf.c:351 BUG: conf_retain called twice?
Jan 29 16:57:30 milter-greylist: 673055056: conf_retain called from dumper
Jan 29 16:57:30 milter-greylist: conf.c:351 BUG: conf_retain called twice?

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-29 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> It seems to always be the same thread indeed.
> 
> [With the assertion in conf.c:352 intact]
> Jan 29 16:43:37 milter-greylist: 673055056: conf_retain called from dumper
> Jan 29 16:43:37 milter-greylist: conf.c:351 BUG: conf_retain called twice?

Great, we now have our culprit. Unfortunately there are two calls to
conf_retain() in dumper(), therefore you need to alter the debug
statement to include the line number:

    mg_log(LOG_WARNING, "%d: conf_retain called from %s, line %d",
                 pthread_self(), __func__, __LINE__); 

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-29 by Francois Tigeot

On Sat, Jan 29, 2011 at 05:40:05PM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > [With the assertion in conf.c:352 intact]
> > Jan 29 16:43:37 milter-greylist: 673055056: conf_retain called from dumper
> > Jan 29 16:43:37 milter-greylist: conf.c:351 BUG: conf_retain called twice?
> 
> Great, we now have our culprit. Unfortunately there are two calls to
> conf_retain() in dumper(), therefore you need to alter the debug
> statement to include the line number:
> 
>     mg_log(LOG_WARNING, "%d: conf_retain called from %s, line %d",
>                  pthread_self(), __func__, __LINE__); 

I guess it's on line 133:

Jan 29 20:45:53 milter-greylist: 673055056: conf_retain called from dumper, line 133
Jan 29 20:45:53 milter-greylist: conf.c:351 BUG: conf_retain called twice?
Jan 29 20:49:22 milter-greylist: 673055056: conf_retain called from dumper, line 133
Jan 29 20:49:22 milter-greylist: conf.c:351 BUG: conf_retain called twice?

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> I guess it's on line 133:

The first call, on top of dumper(), right?

This means we launch two dumper threads, which is a good reason for
getting into trouble.

Can you add debug messages at the beginning of dumper(), in places where
dumper() is called (there should be only dumper_start()), and before the
dumper_start() call (there should be only one, in main())?

I have trouble figuring out how we get there twice. If we don't understand,
we will always have the possibility of using a flag to prevent the second
execution, but I would like to understand.
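
The flag fallback mentioned above could look roughly like the following. This is a minimal sketch assuming a mutex-protected static flag; start_once() and launches() are hypothetical names for illustration and do not appear in milter-greylist. A guard like this only hides a double start; it does not explain one.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t started_lock = PTHREAD_MUTEX_INITIALIZER;
static int started = 0;         /* set once the worker has been launched */
static int launch_count = 0;    /* illustration only: counts real launches */

/* Hypothetical guard: perform the launch step at most once per process.
 * Returns 1 if this call performed the launch, 0 if it was already done. */
int
start_once(void)
{
        int first, rc;

        rc = pthread_mutex_lock(&started_lock);
        assert(rc == 0);
        first = !started;
        if (first) {
                started = 1;
                launch_count++; /* a real program would call pthread_create() here */
        }
        rc = pthread_mutex_unlock(&started_lock);
        assert(rc == 0);

        return first;
}

/* Number of launches actually performed so far. */
int
launches(void)
{
        int n, rc;

        rc = pthread_mutex_lock(&started_lock);
        assert(rc == 0);
        n = launch_count;
        rc = pthread_mutex_unlock(&started_lock);
        assert(rc == 0);

        return n;
}
```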

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by Francois Tigeot

On Sun, Jan 30, 2011 at 04:05:44AM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > I guess it's on line 133:
> The first call, on top of dumper(), right?

That's right.

> This means we launch two dumper threads, which is a good reason for
> getting into trouble.
> 
> Can you add debug messages at the beginning of dumper(), in places where
> dumper() is called (there should be only dumper_start()), and before the
> dumper_start() call (there should be only one, in main())?

I get some messages when I start milter-greylist:

673054832: calling dumper_start() from main, line 1711
673054832: calling dumper() from dumper_start, line 116
673055056: conf_retain called from dumper, line 134
conf.c:351 BUG: conf_retain called twice?

But there's nothing when it crashes.

> I have trouble figuring out how we get there twice. If we don't understand,
> we will always have the possibility of using a flag to prevent the second
> execution, but I would like to understand.

All I can say for now is there has been a great deal of work on SMP and
threading between this DragonFly release and the previous one.

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> 673054832: calling dumper_start() from main, line 1711
> 673054832: calling dumper() from dumper_start, line 116
> 673055056: conf_retain called from dumper, line 134
> conf.c:351 BUG: conf_retain called twice?

But there is a mystery here: we do not see two conf_retain() calls. I suggest
you log pthread_self() on entry to conf_retain().

I wonder if the problem could not be that Thread Specific Storage gets
inherited (AFAIK it should not). Could you build and run this?

/*  cc -Wall -Werror -ansi -lpthread -o tss tss.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <err.h>
#include <sysexits.h>

pthread_key_t key;

void *
child(arg)
        void *arg;
{
        void *tss;

        printf("child (%d) starts, key = %p\n", 
               (int)pthread_self(), (void *)key);

        if ((tss = pthread_getspecific(key)) == NULL)
                errx(EX_OSERR, "pthread_getspecific() failed in child");
        
        printf("child (%d) TSS key %p readen as %p\n", 
               (int)pthread_self(), (void *)key, tss);
                
        sleep(2);

        printf("child (%d) exit\n", (int)pthread_self());

        return NULL;
}

int
main(void) {
        char test[] = "foo";
        pthread_t tid;
        void *tss;

        if (pthread_key_create(&key, NULL) != 0)
                err(EX_OSERR, "pthread_key_create() failed");

        if (pthread_setspecific(key, test) != 0)
                err(EX_OSERR, "pthread_setspecific() failed");
        
        printf("parent (%d) TSS key %p set to %p\n", 
               (int)pthread_self(), (void *)key, test);
                
        if (pthread_create(&tid, NULL, child, NULL) != 0)
                err(EX_OSERR, "pthread_create() failed");

        if ((tss = pthread_getspecific(key)) == NULL)
                errx(EX_OSERR, "pthread_getspecific() failed in parent");
        
        printf("parent (%d) TSS key %p readen as %p\n", 
               (int)pthread_self(), (void *)key, tss);
                
        sleep(3);

        printf("parent (%d) exit\n", (int)pthread_self());

        return 0;
}


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by Francois Tigeot

On Sun, Jan 30, 2011 at 10:17:39AM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > 673054832: calling dumper_start() from main, line 1711
> > 673054832: calling dumper() from dumper_start, line 116
> > 673055056: conf_retain called from dumper, line 134
> > conf.c:351 BUG: conf_retain called twice?
> 
> But there is a mystery here: we do not see two conf_retain() calls. I suggest
> you log pthread_self() on entry to conf_retain().

Done.
/var/log/maillog log extract:

Jan 30 11:23:33 milter-greylist: 673054832: calling dumper_start() from main, line 1711
Jan 30 11:23:33 milter-greylist: 673054832: calling dumper() from dumper_start, line 116
Jan 30 11:23:33 milter-greylist: 673055056: conf_retain called from dumper, line 134
Jan 30 11:23:33 milter-greylist: 673055056: Entering conf_retain()
Jan 30 11:23:33 milter-greylist: conf.c:353 BUG: conf_retain called twice?

Terminal output is quite large. I have put it in a separate file:
http://dl.zefyris.com/milter-greylist.log.txt

In the end, milter-greylist crashes with a bus error:
25197 bus error  /usr/pkg/bin/milter-greylist -D -p /var/milter-greylist/milter-greylist.sock

Previously, it was a signal 11.

> I wonder if the problem could not be that Thread Specific Storage gets
> inherited (AFAIK it should not). Could you build and run this?
> 
> /*  cc -Wall -Werror -ansi -lpthread -o tss tss.c */

parent (672792688) TSS key 0x0 set to 0xbfbff6c8
parent (672792688) TSS key 0x0 readen as 0xbfbff6c8
child (672792912) starts, key = 0x0
child (672792912) TSS key 0x0 readen as 0x28083500
child (672792912) exit
parent (672792688) exit

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> parent (672792688) TSS key 0x0 set to 0xbfbff6c8
> parent (672792688) TSS key 0x0 readen as 0xbfbff6c8
> child (672792912) starts, key = 0x0
> child (672792912) TSS key 0x0 readen as 0x28083500

Ok, I think we tracked down the problem.

I am not a POSIX.1 expert, but I suspect this is a pthread bug. This TSS
has not been set in the child, so reading the key in the child should
fail. I only say I am suspicious, since it might be a grey area in the
specification, where both behaviors are compliant with the letter of the
standard.

Example on NetBSD 5.0.2
parent (-1080033280) TSS key 0x0 set to 0xbfbfecd8
parent (-1080033280) TSS key 0x0 readen as 0xbfbfecd8
child (-1151336448) starts, key = 0x0
tss: pthread_getspecific() failed in child

On Linux 2.6.18
parent (16384) TSS key (nil) set to 0xbfa6dd34
parent (16384) TSS key (nil) readen as 0xbfa6dd34
child (16386) starts, key = (nil)
tss: pthread_getspecific() failed in child

On Darwin 8.11.0 (Mac OSX 10.4.11 PPC)
parent (-1610551928) TSS key 0x6 set to 0xbffffb9c
parent (-1610551928) TSS key 0x6 readen as 0xbffffb9c
child (25167360) starts, key = 0x6
tss: pthread_getspecific() failed in child

On Darwin 10.6.0 (Mac OSX 10.6 i386)
parent (1885527200) TSS key 0x101 set to 0x7fff5fbffc40
parent (1885527200) TSS key 0x101 readen as 0x7fff5fbffc40
child (745472) starts, key = 0x101
tss: pthread_getspecific() failed in child

(Other mailing list subscribers are welcome to post results from other
systems.)

Could you run the test on the previous DragonflyBSD release, which ran
milter-greylist fine? 

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by manu@netbsd.org

Emmanuel Dreyfus <manu@...> wrote:

> I am not a POSIX.1 expert, but I suspect this is a pthread bug. This TSS
> has not been set in the child, so reading the key in the child should
> fail. I only say I am suspicious, since it might be a grey area in the
> specification, where both behaviors are compliant with the letter of the
> standard.

It seems to be a real bug in DragonflyBSD; here is what the standard says:
"Upon thread creation, the value NULL shall be associated with all defined keys
in the new thread."
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
 
This is not what happens on DragonflyBSD with my tss.c test.

Can anyone knowledgeable comment?
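
The requirement quoted above can also be checked without aborting the program. What follows is a minimal sketch assuming a POSIX threads implementation; new_thread_sees_null() is a hypothetical helper written for illustration and is not the tss.c test posted earlier in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t probe_key;

/* Runs in the new thread: report the value it sees for probe_key. */
static void *
probe(void *arg)
{
        (void)arg;
        return pthread_getspecific(probe_key);
}

/* Hypothetical check: returns 1 if a freshly created thread sees NULL for
 * a key that the creating thread has set to a non-NULL value (the behavior
 * POSIX requires), 0 if the new thread sees any non-NULL value. */
int
new_thread_sees_null(void)
{
        static char marker[] = "parent value";
        pthread_t tid;
        void *seen = marker;
        int rc;

        rc = pthread_key_create(&probe_key, NULL);
        assert(rc == 0);
        rc = pthread_setspecific(probe_key, marker);
        assert(rc == 0);

        rc = pthread_create(&tid, NULL, probe, NULL);
        assert(rc == 0);
        rc = pthread_join(tid, &seen);  /* seen = value observed by the new thread */
        assert(rc == 0);

        (void)pthread_setspecific(probe_key, NULL);
        (void)pthread_key_delete(probe_key);

        return seen == NULL;
}
```

On a system showing the behavior reported for DragonFly 2.8, this would be expected to return 0.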

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by manu@netbsd.org

Here is a bug workaround attempt. The patch is against latest sources but I
tested it with success on latest 4.2.x.

I think it should be ifdef only for DragonflyBSD, but what label should I use?
#ifdef __DragonflyBSD__ ?

Index: conf.c
===================================================================
RCS file: /cvsroot/milter-greylist/conf.c,v
retrieving revision 1.69
diff -U 4 -r1.69 conf.c
--- conf.c      16 Jun 2010 01:30:30 -0000      1.69
+++ conf.c      30 Jan 2011 15:11:06 -0000
@@ -150,8 +150,9 @@
        FILE *stream;
        struct timeval tv1, tv2, tv3;
        struct conf_rec *currconf, *threadconf, *newconf;
 
+       WORKAROUND_BROKEN_PTHREAD_KEY();
        CONF_LOCK;
        currconf = TAILQ_FIRST(&conf_list_head);
        CONF_UNLOCK;
        assert(conf_cold ? (currconf == NULL) : (currconf != NULL));
Index: conf.h
===================================================================
RCS file: /cvsroot/milter-greylist/conf.h,v
retrieving revision 1.51
diff -U 4 -r1.51 conf.h
--- conf.h      9 Sep 2009 12:19:17 -0000       1.51
+++ conf.h      30 Jan 2011 15:11:06 -0000
@@ -150,8 +150,10 @@
 #define C_ALL          0x3
 
 extern struct conf_rec defconf;
 extern pthread_key_t conf_key;
+#define WORKAROUND_BROKEN_PTHREAD_KEY(dontcare) \
+    (void)pthread_setspecific(conf_key, NULL);
 #define GET_CONF() ((struct conf_rec *)pthread_getspecific(conf_key))
 #define conf (*GET_CONF())
 extern char *conffile;
 extern int conf_cold;
Index: dump.c
===================================================================
RCS file: /cvsroot/milter-greylist/dump.c,v
retrieving revision 1.41
diff -U 4 -r1.41 dump.c
--- dump.c      31 Oct 2009 21:28:03 -0000      1.41
+++ dump.c      30 Jan 2011 15:11:06 -0000
@@ -116,8 +116,9 @@
 {
        struct conf_rec *confp;
        struct timeval start;
 
+       WORKAROUND_BROKEN_PTHREAD_KEY();
        conf_retain();
        confp = GET_CONF();
        gettimeofday(&start, NULL);
        for (;;) {
Index: milter-greylist.c
===================================================================
RCS file: /cvsroot/milter-greylist/milter-greylist.c,v
retrieving revision 1.235
diff -U 4 -r1.235 milter-greylist.c
--- milter-greylist.c   12 Jul 2010 01:38:14 -0000      1.235
+++ milter-greylist.c   30 Jan 2011 15:11:06 -0000
@@ -174,8 +174,9 @@
        _SOCK_ADDR *addr;
 {
        sfsistat r;
 
+       WORKAROUND_BROKEN_PTHREAD_KEY();
        conf_retain();
        r = real_connect(ctx, hostname, addr);
        conf_release();
        return r;
Index: sync.c
===================================================================
RCS file: /cvsroot/milter-greylist/sync.c,v
retrieving revision 1.89
diff -U 4 -r1.89 sync.c
--- sync.c      16 Jun 2010 01:30:30 -0000      1.89
+++ sync.c      30 Jan 2011 15:11:06 -0000
@@ -760,8 +760,9 @@
        void *arg;
 {
        struct sync_master_sock *sms = arg;
 
+       WORKAROUND_BROKEN_PTHREAD_KEY();
        conf_retain();
        for (;;) {
                sockaddr_t raddr;
                socklen_t raddrlen;
@@ -1030,8 +1031,9 @@
        socklen_t addrlen;
        time_t date;
        time_t aw;
        
+       WORKAROUND_BROKEN_PTHREAD_KEY();
        conf_retain();
 
        aw = time(NULL) + conf.c_autowhite_validity;
        fprintf(stream, "200 Yeah, what do you want?\n");
@@ -1402,8 +1404,9 @@
 sync_sender_start(void) {
        pthread_t tid;
        int error;
 
+       WORKAROUND_BROKEN_PTHREAD_KEY();
        if ((error = pthread_create(&tid, NULL, 
            (void *(*)(void *))sync_sender, NULL)) != 0) {
                mg_log(LOG_ERR, "pthread_create failed: %s", strerror(error));
                exit(EX_OSERR);


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by Francois Tigeot

On Sun, Jan 30, 2011 at 01:05:15PM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > parent (672792688) TSS key 0x0 set to 0xbfbff6c8
> > parent (672792688) TSS key 0x0 readen as 0xbfbff6c8
> > child (672792912) starts, key = 0x0
> > child (672792912) TSS key 0x0 readen as 0x28083500
> 
> Ok, I think we tracked down the problem.
> 
> I am not a POSIX.1 expert, but I suspect this is a pthread bug.

Damn. Doesn't surprise me :-(

> Could you run the test on the previous DragonflyBSD release, which ran
> milter-greylist fine? 

I'll try to setup a test system soon.

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by Francois Tigeot

On Sun, Jan 30, 2011 at 04:21:18PM +0100, manu@... wrote:
> Here is a bug workaround attempt. The patch is against latest sources but I
> tested it with success on latest 4.2.x.
 
I have cleaned-up some of the mg_log() calls and added your last patches.
milter-greylist now crashes in a different way:

Jan 30 16:56:16 milter-greylist: 673054832: calling dumper_start() from main, line 1712
Jan 30 16:56:16 milter-greylist: conf.c:381 BUG: conf_release before conf_retain

> I think it should be ifdef only for DragonflyBSD, but what label should I use?
> #ifdef __DragonflyBSD__ ?

Please use __DragonFly__
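
With that label, the workaround macro from the patch could be restricted to DragonFly. The following is a minimal sketch, assuming the macro becomes a no-op on every other platform. Here conf_key is a local stand-in for the variable declared in conf.h, the argument-less macro form is an illustrative variation on the posted patch, and init_conf_key() and value_after_workaround() are hypothetical helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the conf_key declared in milter-greylist's conf.h. */
pthread_key_t conf_key;

#ifdef __DragonFly__
/* Affected platform: clear the key explicitly in the calling thread. */
#define WORKAROUND_BROKEN_PTHREAD_KEY() \
        ((void)pthread_setspecific(conf_key, NULL))
#else
/* Elsewhere new threads already start with NULL: do nothing. */
#define WORKAROUND_BROKEN_PTHREAD_KEY() ((void)0)
#endif

/* Hypothetical helper: create the key. Returns 0 on success. */
int
init_conf_key(void)
{
        return pthread_key_create(&conf_key, NULL);
}

/* Illustration: set conf_key to `value` in the calling thread, apply the
 * workaround, and return what the key holds afterwards. */
void *
value_after_workaround(void *value)
{
        int rc = pthread_setspecific(conf_key, value);

        assert(rc == 0);
        WORKAROUND_BROKEN_PTHREAD_KEY();
        return pthread_getspecific(conf_key);
}
```

On DragonFly the helper returns NULL because the macro clears the slot; elsewhere it returns the value unchanged, which shows the macro costs nothing on unaffected systems.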

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by manu@netbsd.org

Francois Tigeot <ftigeot@...> wrote:

> I have cleaned-up some of the mg_log() calls and added your last patches.
> milter-greylist now crashes in a different way:
(...)
> Jan 30 16:56:16 milter-greylist: conf.c:381 BUG: conf_release before conf_retain

Now you have to find, by adding mg_log everywhere, the location where
conf_release is called while conf_retain has not been (that is, while
GET_CONF() returns NULL).

Adding this before each conf_release() call will help:
 if (!GET_CONF())
  mg_log(LOG_WARNING, "calling conf_release() without a config at %s:%d in %d",
                __FILE__, __LINE__, pthread_self());

Sorry, you have to find the many places where the DragonflyBSD bug will
hit. I think this is the last round, assuming we do not encounter
another bug later.
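
Both BUG messages appear to come from the same invariant: a thread's conf_key slot must be empty before conf_retain() and set before conf_release(). The following is a simplified model of that pairing, assuming a bare reference count and integer return codes in place of mg_log() and assert(0). The names model_init(), model_retain(), model_release() and struct model_conf are hypothetical, and the real conf.c does considerably more.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_conf {
        int refcount;           /* number of threads currently holding it */
};

static pthread_key_t model_key;
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_conf current = { 0 };

/* Create the per-thread slot. Returns 0 on success. */
int
model_init(void)
{
        return pthread_key_create(&model_key, NULL);
}

/* Returns 0 on success, -1 if this thread's slot is already set
 * (the "conf_retain called twice?" case). */
int
model_retain(void)
{
        int rc;

        if (pthread_getspecific(model_key) != NULL)
                return -1;

        pthread_mutex_lock(&model_lock);
        current.refcount++;
        pthread_mutex_unlock(&model_lock);

        rc = pthread_setspecific(model_key, &current);
        assert(rc == 0);
        return 0;
}

/* Returns 0 on success, -1 if this thread's slot is empty
 * (the "conf_release before conf_retain" case). */
int
model_release(void)
{
        struct model_conf *c = pthread_getspecific(model_key);
        int rc;

        if (c == NULL)
                return -1;

        pthread_mutex_lock(&model_lock);
        c->refcount--;
        pthread_mutex_unlock(&model_lock);

        rc = pthread_setspecific(model_key, NULL);
        assert(rc == 0);
        return 0;
}
```

In this model, a thread that starts life with a stale non-NULL slot fails its very first model_retain(), which matches the first symptom reported in this thread.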

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by Francois Tigeot

On Sun, Jan 30, 2011 at 01:05:15PM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > parent (672792688) TSS key 0x0 set to 0xbfbff6c8
> > parent (672792688) TSS key 0x0 readen as 0xbfbff6c8
> > child (672792912) starts, key = 0x0
> > child (672792912) TSS key 0x0 readen as 0x28083500
> 
> I am not a POSIX.1 expert, but I suspect this is a pthread bug. This TSS
> has not been set in the child, so reading the key in the child should
> fail. I just say I am suspicious, since it might be a grey area in the
> specification, where both behavior are compliant to the letter of the
> standard.
> 
> Example on NetBSD 5.0.2
> parent (-1080033280) TSS key 0x0 set to 0xbfbfecd8
> parent (-1080033280) TSS key 0x0 readen as 0xbfbfecd8
> child (-1151336448) starts, key = 0x0
> tss: pthread_getspecific() failed in child
> 
> Could you run the test on the previous DragonflyBSD release, which ran
> milter-greylist fine? 

DragonFly 2.6 gives the same answers as most other systems:

parent (672727152) TSS key 0x0 set to 0xbfbffbc8
parent (672727152) TSS key 0x0 read as 0xbfbffbc8
child (672727376) starts, key = 0x0
tss: pthread_getspecific() failed in child

DragonFly 2.8 and 2.9 give these answers:

parent (5505216) TSS key 0x0 set to 0x7ffffffff1d0
parent (5505216) TSS key 0x0 read as 0x7ffffffff1d0
child (5505568) starts, key = 0x0
child (5505568) TSS key 0x0 read as 0x800533900
child (5505568) exit
parent (5505216) exit

I'll get in touch with the kernel guys to try to get this fixed.

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-30 by Francois Tigeot

On Sun, Jan 30, 2011 at 07:55:17PM +0100, manu@... wrote:
> Francois Tigeot <ftigeot@...> wrote:
> 
> > I have cleaned-up some of the mg_log() calls and added your last patches.
> > milter-greylist now crashes in a different way:
> (...)
> 
> Now you have to find, by adding mg_log everywhere, the location where
> conf_release is called while conf_retain has not been (that is, while
> GET_CONF() returns NULL).

I think I won't need to: I've reported the pthread issue with your test
program to the kernel list.
Matt Dillon has found the bug and has just committed a fix:
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/bfa38597b518100bb00712479ce7836a551e04a2

I'm upgrading the servers right now; it may take a while but I hope to
report good news tomorrow.

-- 
Francois Tigeot

Re: [milter-greylist] Milter-greylist crashes on DragonFly-2.8

2011-01-31 by Francois Tigeot

On Sun, Jan 30, 2011 at 11:25:14PM +0100, Francois Tigeot wrote:
> On Sun, Jan 30, 2011 at 07:55:17PM +0100, manu@... wrote:
> > Francois Tigeot <ftigeot@...> wrote:
> > 
> > Now you have to find, by adding mg_log everywhere, the location where
> > conf_release is called while conf_retain has not been (that is, while
> > GET_CONF() returns NULL).
> 
> I think I won't need to: I've reported the pthread issue with your test
> program to the kernel list.
> Matt Dillon has found the bug and has just committed a fix:
> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/bfa38597b518100bb00712479ce7836a551e04a2

I have good news: since upgrading the system, milter-greylist has stopped
crashing and has now been running for more than an hour without any bad
side effects.

I'll consider this one really fixed :-)

Thanks for your insight Manu, you've been a tremendous help.

-- 
Francois Tigeot
