Yahoo Groups archive

Lpc2000

A tale of timing and short pulses

2004-02-06 by Robert Adsett

I just spent a day tracing down a problem.  I thought I'd share the, umm 
adventure, in the hopes that the information might be useful (or 
entertaining in a "thank god it happened to him not me" way).

I've been updating my timing routines and testing and documenting 
them.  One of the final steps was to test the one-wire communication 
library I've been working on.  Now this library is a port of the Dallas 
library adapted for the LPC and they've just updated it so I thought I'd
bring over the changes at the same time.  That was probably a mistake.

Transferring the changes went without too many problems.  (It did reveal an 
apparent bug in PC-Lint but I haven't managed to make a small enough test 
case that shows the problem).  The first test of the functionality went 
without problem.

Feeling confident I moved on to the second test.  Compile, load, run.  Looks
OK.  Run first step; that's odd, where's the output?  No output.  I figured
something had changed in the way the routines worked or the display
routines and I hadn't noticed.  Some time later I still hadn't found anything.

Well, next step is to add debugging prints to figure out where it stops 
behaving as expected. Add a few and narrow it down to an area where the 
devices are being enumerated.

Add debug prints at coarse level to enumeration.  Uh Oh! It suddenly starts 
working.  Now it's looking like some sort of memory problem or timing issue.

Feeling confident in the timing routines (I'd just finished testing them
after all) I started down the road looking for memory issues, such as
overflow, bad pointers (especially in newly modified code) and even code 
generation.  Hours later no progress except rather more puzzlement.

Oscilloscope time.  The puzzle deepens.  With the debug prints in, the
enumeration process is clearly proceeding with each device responding in
turn.  Take out the prints and the process collapses into something that is 
not even consistent between tests.  Eeekk!

Replace debug prints with calls to WaitUs (a delay routine).  With large 
waits between enumerations (5mS) everything appears to work.  I didn't see 
why that should be necessary, time to explore further.  Shorten the delay, 
the whole thing collapses again.  Hmm, that's pointing back to timing again.

Look more carefully at the collapse.   Things make less sense than 
before.  A wait after the enumeration appears to affect the 
enumeration.  I.e. enumerate(), wait for 5 ms works, but enumerate(), wait for
1 ms fails.  The program is managing to affect events after they
occur?  Obviously I'm missing something.  Time to stop trying to solve the
problem and just start gathering information, something I should have 
started earlier.

Add pin toggles around the individual enumerations so that the process is 
easy to correlate on the oscilloscope.  Looks right on the 
oscilloscope.  The enumeration bit pattern pauses at the pause between 
enumerations.

Remove the delay between enumerations.  It still works.  What the...!  The 
pin toggle appears not to happen between enumerations.  After a few more 
tests and fooling around I find it.  The pin toggle is just narrow enough 
to hide behind an oscilloscope graticule.

Remove the toggle between the enumerations.  It collapses again.  Good, at 
least it hasn't just disappeared on me.

Now, let's zoom in on the enumeration and see if I can't find out exactly 
where it's breaking down.  Gather up code to trace along the path as I 
manually decode.  (Note that each bit signal is 80-90uS long and there are 
70 odd bits so seeing each bit does involve going to a higher level of 
detail, and as it turns out the devil is in the detail).  Repeat several 
measurements at higher resolution, the first 1/2 dozen or so bits are not
consistently repeatable, they seem to fall into two patterns.  Ahh, now I'm
getting somewhere.

Freeze a sample and look in greater detail (thank heavens for DSOs).  Reset 
OK, bit 1 OK, bit 2 OK, bit 3 OK, bit 4 OK. Wait a minute what's that 
glitch doing there and why is bit 4 so much longer than the others?

Take a break.  An important part of the process, probably too long delayed.

Come back to problem.  Take several measurements.  Save the first one as a 
reference and take others until I have one that is different from the first 
for comparison.  Hmm the two measurements are very close.  Where I have a 
glitch on the first one, I have a full rise on my second 
measurement.  Check the timing and indeed the glitch occurs where the 
output should normally rise and the extended low period covers the low 
period for two bits.  It appears something is killing the output of the
timer's match register as it occurs.

Check, read code, check documentation.  Some time later I notice the
following sequence and realize its possible significance:

   - Set output low
   - Set timer match to set output high in 75 uS
   - Wait 15 uS
   - Sample input
   - Wait 60 uS
   - Disable timer match (1)
   - Wait additional time

The disable at (1) is there to prevent the match from occurring on wrap 
around after it's no longer needed.  Unfortunately if the waits are 
accurate (and I just spent a fair amount of time ensuring they were) that 
disable happens at approximately the same time that the match actually 
occurs and cancels it just as it starts to affect the output.  It appears 
there is a race condition in the timer module.  The fix is simple 
enough.  Move the disable to later in the process.
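
The sequence and the fix can be illustrated with a small host-side model.  This is only a sketch under stated assumptions, not the actual LPC timer code: the name run_slot, the 1 uS tick, the 120 uS slot length and the rule that a disable landing on the match tick cancels the match are all hypothetical, chosen to mirror the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one bit slot, stepped in 1 uS ticks.
   Nothing here touches real LPC registers. */
enum { MATCH_AT_US = 75, SLOT_END_US = 120 };

/* Returns true if the output ends the slot high, i.e. the timer
   match took effect before software disabled it. */
static bool run_slot(int disable_at_us)
{
    assert(disable_at_us >= 0);

    bool output_high = false;  /* "Set output low" */
    bool match_armed = true;   /* "Set timer match to set output high in 75 uS" */

    for (int t = 0; t <= SLOT_END_US; ++t) {
        /* "Disable timer match (1)" lands on this tick. */
        if (t == disable_at_us)
            match_armed = false;

        /* Assumption: a disable on the same tick as the match wins,
           so the output never rises.  The real hardware produced a
           short glitch instead; the model only records the outcome. */
        if (t == MATCH_AT_US && match_armed)
            output_high = true;
    }
    return output_high;
}
```

In this model run_slot(75) leaves the output low because the disable collides with the match, while run_slot(90) lets the match take effect first, which corresponds to moving the disable later in the process.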

I don't think I could hit this consistently if I tried (I expect the timing 
would change if you looked at it crosswise), so why am I (relatively) 
confident that this is the source of the problem?  A couple of 
reasons.  The sensitivity of the effect to code changes.  Almost any change 
in code position or timing eliminates the problem.  Changing the timing 
using the fix suggested works as does adding time to the waits before the 
disable.  Also the glitch coming out of the CPU is only 30nS wide and I've 
not run across anything in the micro that can produce a pulse that narrow 
under normal circumstances (hmm if that could be made repeatable it might 
be useful).  Finally reducing the wait  time in the sequence above 
eliminates the rise entirely.

I don't suppose anyone can confirm that the timer has this apparent 
glitch?  I know whenever I see an issue like that mentioned I wonder "how 
did anyone ever run across that?".  Well now I know :)

Robert


" 'Freedom' has no meaning of itself.  There are always restrictions,
be they legal, genetic, or physical.  If you don't believe me, try to
chew a radio signal. "

                         Kelvin Throop, III

Re: [lpc2100] A tale of timing and short pulses

2004-02-06 by microbit

Hi Robert,

I read thru your Email and tried to soak it all up.
Didn't do that 100% yet, but this way I'm sure that when I'm ever stuck
with a pr*ck of a thing like this, it'll ring a bell, and I'll be studying
this deeper :-)

I like the "hmm if that could be made repeatable it might be useful"
!!!!!!!!
30 nS programmable pulses - Yippie !!!

Ok, I think this again confirms clearly the definition of "TRIVIAL" :
" Trivial - means you've found the problem ......"

Best regards,
Kris




Re: A tale of timing and short pulses

2004-02-07 by Richard

Robert,
    Could you explain why this is a race condition in the timer?  I too
am trying to absorb what you are describing.  Excellent description, I
could feel the hair pulling, especially with the pulse behind the
graticule!  Ugh!

Richard


Re: [lpc2100] Re: A tale of timing and short pulses

2004-02-07 by Robert Adsett

At 12:16 AM 2/7/04 +0000, you wrote:
>     Could you explain why this is a race condition in the timer?  I too
>am trying to absorb what you are describing.  Excellent description, I
>could feel the hair pulling, especially with the pulse behind the
>graticule!  Ugh!
Thanks.

As for explaining the race condition:  microcontrollers (all that I'm
aware of) are essentially synchronous devices, everything happens on a
clock edge.  One of the biggest reasons for doing this is to ensure 
everything happens in a predictable fashion (i.e. if bits from an addition 
operation arrive over several nS it doesn't matter as long as they all 
arrive before the clock edge).

Now the output from the match register is essentially a programmable 
flip-flop.  The output on a match depends on the match occurring, and the 
condition set for match (do nothing, set, clear, toggle; sounds like a J-K
flip-flop).  This should always produce a well defined output.  The problem
occurs when the match condition changes at the same time as the match 
occurs (the fact that microcontrollers are synchronous slightly increases 
the chance of this occurring).  The logic output will glitch with its 
output depending on the time through various gates and small differences in 
timing and parasitic delays will change the output.  If you know HW digital
logic this will be familiar; if not, then a simple digital logic
introduction that discusses flip-flops should introduce the concept.
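
The idea can be sketched as a tiny software model of the match output.  The names match_action, next_output and is_ambiguous are hypothetical and do not correspond to real register encodings; the sketch only shows that when the action changes on the same edge as the match, the result depends on which value the logic happens to see.

```c
#include <assert.h>
#include <stdbool.h>

/* The four match actions named in the text (hypothetical encoding). */
typedef enum { ACT_NONE, ACT_SET, ACT_CLEAR, ACT_TOGGLE } match_action;

/* Output level after a clock edge, given the level before it,
   the configured action and whether a match occurred on that edge. */
static bool next_output(bool current, match_action action, bool match)
{
    assert(action <= ACT_TOGGLE);

    if (!match)
        return current;
    switch (action) {
    case ACT_SET:    return true;
    case ACT_CLEAR:  return false;
    case ACT_TOGGLE: return !current;
    case ACT_NONE:   break;
    }
    return current;
}

/* True when rewriting the action from `before` to `after` on the same
   edge as a match gives two different possible outputs.  Which one real
   hardware produces is then down to gate and parasitic delays. */
static bool is_ambiguous(bool current, match_action before, match_action after)
{
    return next_output(current, before, true)
        != next_output(current, after, true);
}
```

With the output low, changing from "set" to "do nothing" on the match edge is ambiguous in this model, which is the situation described in the original post; changing from "do nothing" to "clear" is not, since both leave the output low.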

Could this be fixed in the HW?  Almost certainly.  Is it worth fixing?  I 
doubt it.  It is, however, worth knowing about.

On the other hand it probably would have taken me longer to find the 
problem if the race condition didn't exist and produce a glitch.  Sometimes 
apparent flaws are your friend.

Robert

