Re: Understanding, trapping and debugging stack overflow on an AVR
2011-01-12 by Gary Skinner
Yahoo Groups archive
2011-01-12 by Gary Skinner
Are you sure the breakpoint at main is not because of a watchdog timeout?

Gary Skinner, ESI
2011-01-13 by David VanHorn
On Wed, Jan 12, 2011 at 12:19 PM, Gary Skinner <g.skinner@elec-solutions.com> wrote:
> Are you sure the breakpoint at main is not because of a watchdog timeout?

Very likely.

Suggestion: Put a breakpoint at the reset vector, if you get there, you will know.
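Besides a breakpoint, another way to answer the watchdog question is to read the AVR reset-cause flags once at startup and classify them. The following is only a minimal sketch, not code from this thread. It assumes the MCUSR bit layout found on many ATmega parts (PORF, EXTRF, BORF, WDRF in bits 0 to 3; check your datasheet), and `classify_reset`, `classify_reset_selftest` and the `CAUSE_*` and `RESET_*` names are hypothetical. The flag byte is passed in as a parameter so the logic can be exercised off-target; on hardware you would pass a copy of the reset-flag register saved early in startup and then clear that register.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions (MCUSR on many ATmega parts; verify in the datasheet). */
#define RESET_PORF  (1u << 0)   /* power-on reset  */
#define RESET_EXTRF (1u << 1)   /* external reset  */
#define RESET_BORF  (1u << 2)   /* brown-out reset */
#define RESET_WDRF  (1u << 3)   /* watchdog reset  */

/* Hypothetical result type for this sketch. */
typedef enum {
    CAUSE_NONE,       /* no flag set: we did not get here through a reset */
    CAUSE_POWER_ON,
    CAUSE_BROWN_OUT,
    CAUSE_EXTERNAL,
    CAUSE_WATCHDOG
} reset_cause_t;

/* Classify a saved copy of the reset flags.
 * Power-on is tested first because a power-up can leave the brown-out
 * and external flags set as well. */
reset_cause_t classify_reset(uint8_t flags)
{
    if (flags & RESET_PORF)  return CAUSE_POWER_ON;
    if (flags & RESET_BORF)  return CAUSE_BROWN_OUT;
    if (flags & RESET_EXTRF) return CAUSE_EXTERNAL;
    if (flags & RESET_WDRF)  return CAUSE_WATCHDOG;
    return CAUSE_NONE;
}

/* Small self-check documenting the two cases that matter in this thread. */
void classify_reset_selftest(void)
{
    assert(classify_reset(RESET_WDRF) == CAUSE_WATCHDOG);
    assert(classify_reset(0) == CAUSE_NONE);
}
```

If main() is reached and the result is CAUSE_NONE, the restart was not a hardware reset at all, which points toward a jump through a corrupted return address or pointer rather than the watchdog.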
2011-01-15 by Chuck Hackett
> From: Gary Skinner
>
> Are you sure the breakpoint at main is not because of a watchdog timeout?

Good thought, but yes, I'm sure, for a couple of reasons: 1) I disable the WDT when debugging (more below), and 2) I had a breakpoint in the 'init3' section, where I decipher the reason for a reset such as power, WDT, brownout, or software trap.

If the code reaches a "This can never happen" condition I use a macro to store a trap code in no-init SRAM and then loop until the WDT resets the processor - at which time I pick up the (non-zero) trap code and flash it on an on-board status LED, and the controller recovers if this was caused by a particular event timing/sequence. If this (trap code) condition happens when debugging I can break into the infinite loop (since the WDT is disabled) and look at the execution context.

> From: David VanHorn
> ....
> Suggestion: Put a breakpoint at the reset vector, if you get there,
> you will know.

Yup, I did that, as well as adding the "Catch All" ISR as suggested by Don, below. No interrupts were occurring ...

> From: Don Kinzer
> > ....
> > Other than that, any suggestions on tracking this
> > down would be welcome.
> It is often useful to capture the "reset flags" (typically in the MCUSR
> register, MCUCSR on some parts, but check your datasheet) and then clear
> them. This register has bits for each reset cause. If you get back to
> main() and no reset flags are set then you know that you got there by a
> means other than a reset.

Yup, had this code in for other reasons (see my answer to Gary above) and this is how I determined that the 'restart' was not due to "the usual suspects".

> You can set a breakpoint on the "catch all" ISR to detect when that is the
> cause of restarting.

Thanks for that suggestion - I have now added that code.

> You can "seed" the stack area with a known value and then inspect it at
> various times to detect when the stack is all "used up". This will be more
> complicated in a multi-tasking environment because you undoubtedly have
> multiple stacks.

FreeRTOS does this for you a little bit even when 'stack checking' is not turned on (compile time). With it turned on it fills the stack with a known value so it can report the 'high-water' mark for each task.

> >[...] but very shortly after that it re-executes the "main"
> >function (as shown by a breakpoint).
> That sounds like it could be data corruption due to the stack colliding with
> statically allocated data. It could, however, be an interrupt for which you
> have defined no handler.

Or in my case, since FreeRTOS allocates task stacks from the heap, either the stack growing out of its allotted area or a "wayward pointer" or "buffer overflow" of another allocated area.

In my case, I was suspecting stack overflow, but ...

I ended up finding the problem - it was caused by a wayward "&" in pointer calculations within one of my doubly-linked list routines. The clue was that the problem occurred when the second item was added to a particular linked list. It was using the location of the pointer (on the stack) as opposed to the contents of the pointer. The return PC value happened to be located such that it was clobbered when the function was setting the "previous" pointer of the linked element.

One of those head-slapping "dah!" moments :-(

Thanks to all for the assist. Now I can get back to making progress instead of beating my head against the wall - or at least move on to another wall :-)

I've got about three weeks to get as much functionality done as possible before I have to have the next version of the controller in the field for testing during a large gathering we'll be having in February. Lots of trains running, lots of opportunities to shake things out ...

Cheers,

Chuck Hackett
"Good judgment comes from experience, experience comes from bad judgment"
7.5" gauge Union Pacific Northern (4-8-4) 844
http://www.whitetrout.net/Chuck
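For readers who hit something similar, here is a hedged illustration of the kind of mistake described above. It is not the actual routine from the controller firmware: the `node_t` and `list_t` types and the `list_push_front` and `list_is_consistent` functions are hypothetical names made up for this sketch. The working code links through the contents of a local pointer; the comment shows how a stray "&" would instead write relative to the pointer's own location on the stack, and why the fault only shows up once the list already holds an item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node and list types, for illustration only. */
typedef struct node {
    struct node *next;
    struct node *prev;
} node_t;

typedef struct {
    node_t *head;
    node_t *tail;
} list_t;

/* Insert item at the front of the list. */
void list_push_front(list_t *list, node_t *item)
{
    assert(list != NULL && item != NULL);

    node_t *old_head = list->head;   /* local pointer: it lives on the stack */

    item->prev = NULL;
    item->next = old_head;

    if (old_head != NULL) {
        /* Correct: follow the pointer's contents to the existing node. */
        old_head->prev = item;

        /* A wayward "&" would look something like
         *     ((node_t *)&old_head)->prev = item;
         * which treats the stack slot holding old_head as if it were a
         * node and stores into the memory just past it. What sits there
         * depends on the compiler's frame layout; in the thread it
         * happened to be the return address. This branch is skipped for
         * the first item, so the fault appears with the second one. */
    } else {
        list->tail = item;
    }
    list->head = item;
}

/* Walk the list and verify that every prev link mirrors the next links.
 * Returns 1 if the list is consistent, 0 otherwise. */
int list_is_consistent(const list_t *list)
{
    const node_t *prev = NULL;
    for (const node_t *n = list->head; n != NULL; n = n->next) {
        if (n->prev != prev) {
            return 0;
        }
        prev = n;
    }
    return list->tail == prev;
}
```

A consistency walk like `list_is_consistent` is cheap enough to call from a debug build after each insertion, and it would have flagged the broken "previous" link on the second insert, assuming the corrupted return address had not already sent execution elsewhere.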