Yahoo Groups archive

Lpc2000

Index last updated: 2026-04-28 23:31 UTC

Message

A tale of timing and short pulses

2004-02-06 by Robert Adsett

I just spent a day tracing down a problem.  I thought I'd share the, umm 
adventure, in the hopes that the information might be useful (or 
entertaining in a "thank god it happened to him not me" way).

I've been updating my timing routines and testing and documenting 
them.  One of the final steps was to test the one-wire communication 
library I've been working on.  Now this library is a port of the Dallas 
library adapted for the LPC and they've just updated it so thought I'd 
bring over the changes at the same time.  That was probably a mistake.

Transferring the changes went without too many problems.  (It did reveal an 
apparent bug in PC-Lint but I haven't managed to make a small enough test 
case that shows the problem).  The first test of the functionality went 
without problem.

Feeling confident I moved on to the second test.  Compile, load run.  Looks 
OK.  Run first step, that's odd where's the output.  No output, I figured 
something had changed in the way the routines worked or the display 
routines and I hadn't noticed. Some time later and I hadn't found anything.

Well, next step is to add debugging prints to figure out where it stops 
behaving as expected. Add a few and narrow it down to an area where the 
devices are being enumerated.

Add debug prints at coarse level to enumeration.  Uh Oh! It suddenly starts 
working.  Now it's looking like some sort of memory problem or timing issue.

Feeling confident in the timing routines (I'd just finished testing them 
after all) I starting down the road looking for memory issues, such as 
overflow, bad pointers (especially in newly modified code) and even code 
generation.  Hours later no progress except rather more puzzlement.

Oscilloscope time.  The puzzle deepens.  With the debug prints in the 
enumeration process is clearly proceeding with each device responding in 
turn.  Take out the prints and the process collapses into something that is 
not even consistent between tests.  Eeekk!

Replace debug prints with calls to WaitUs (a delay routine).  With large 
waits between enumerations (5mS) everything appears to work.  I didn't see 
why that should be necessary, time to explore further.  Shorten the delay, 
the whole thing collapses again.  Hmm, that's pointing back to timing again.

Look more carefully at the collapse.   Things make less sense than 
before.  A wait after the enumeration appears to affect the 
enumeration.  IE enumerate(), wait for 5ms works, but enumerate wait for 
1ms fails.  The program is managing to effects events after they 
occur?  Obviously I'm missing something.  Time to stop trying to solve the 
problem and just start gathering information, something I should have 
started earlier.

Add pin toggles around the individual enumerations so that the process is 
easy to correlate on the oscilloscope.  Looks right on the 
oscilloscope.  The enumeration bit pattern pauses at the pause between 
enumerations.

Remove the delay between enumerations.  It still works.  What the...!  The 
pin toggle appears not to happen between enumerations.  After a few more 
tests and fooling around I find it.  The pin toggle is just narrow enough 
to hide behind an oscilloscope graticule.

Remove the toggle between the enumerations.  It collapses again.  Good, at 
least it hasn't just disappeared on me.

Now, let's zoom in on the enumeration and see if I can't find out exactly 
where it's breaking down.  Gather up code to trace along the path as I 
manually decode.  (Note that each bit signal is 80-90uS long and there are 
70 odd bits so seeing each bit does involve going to a higher level of 
detail, and as it turns out the devil is in the detail).  Repeat several 
measurements at higher resolution, the first 1/2 dozen or so bits are not 
consistently repeatable, they seen to fall into two patterns.  Ahh, now I'm 
are getting somewhere.

Freeze a sample and look in greater detail (thank heavens for DSOs).  Reset 
OK, bit 1 OK, bit 2 OK, bit 3 OK, bit 4 OK. Wait a minute what's that 
glitch doing there and why is bit 4 so much longer than the others?

Take a break.  An important part of the process, probably too long delayed.

Come back to problem.  Take several measurements.  Save the first one as a 
reference and take others until I have one that is different from the first 
for comparison.  Hmm the two measurements are very close.  Where I have a 
glitch on the first one, I have a full rise on my second 
measurement.  Check the timing and indeed the glitch occurs where the 
output should normally rise and the extended low period covers the low 
period for two bits.  It appears something is killing the output of the 
timers match register as it occurs.

Check, read code, check documentation.  Some time later I notice the 
following sequence and realize its possible significance

   - Set output low
   - Set timer match to set output high in 75 uS
   - Wait 15 uS
   - Sample input
   - Wait 60 uS
   - Disable timer match (1)
   - Wait additional time

The disable at (1) is there to prevent the match from occurring on wrap 
around after it's no longer needed.  Unfortunately if the waits are 
accurate (and I just spent a fair amount of time ensuring they were) that 
disable happens at approximately the same time that the match actually 
occurs and cancels it just as it starts to affect the output.  It appears 
there is a race condition in the timer module.  The fix is simple 
enough.  Move the disable to later in the process.

I don't think I could hit this consistently if I tried (I expect the timing 
would change if you looked at it crosswise), so why am I (relatively) 
confident that this is the source of the problem?  A couple of 
reasons.  The sensitivity of the effect to code changes.  Almost any change 
in code position or timing eliminates the problem.  Changing the timing 
using the fix suggested works as does adding time to the waits before the 
disable.  Also the glitch coming out of the CPU is only 30nS wide and I've 
not run across anything in the micro that can produce a pulse that narrow 
under normal circumstances (hmm if that could be made repeatable it might 
be useful).  Finally reducing the wait  time in the sequence above 
eliminates the rise entirely.

I don't suppose anyone can confirm that the timer has this apparent 
glitch?  I know whenever I see an issue like that mentioned I wonder "how 
did anyone ever run across that?".  Well now I know :)

Robert


" 'Freedom' has no meaning of itself.  There are always restrictions,
be they legal, genetic, or physical.  If you don't believe me, try to
chew a radio signal. "

                         Kelvin Throop, III

Attachments

Move to quarantaine

This moves the raw source file on disk only. The archive index is not changed automatically, so you still need to run a manual refresh afterward.