--- In lpc2000@yahoogroups.com, Tom Walsh <tom@o...> wrote:
>
> rtstofer wrote:
>
>> I have spent literally hours searching the Philips web site for
>> documentation other than the User Manual for the LPC2106 and a few
>> app notes.
>>
>> Is there a document that has the electrical and timing specs for
>> this chip?
>>
>> One reason I am curious is that I don't seem to be able to get
>> very short output pulses when I wiggle a pin. Something on the
>> order of 100 ns is about the shortest I can get. Now, I am
>> running at 4x 14.7456 MHz (I think!) and the VPBDIV is set to
>> 0x01. It would seem that, at 59 MHz, I should be able to get very
>> short pulses.
>>
>> This isn't a problem, just a curiosity. But, I would like to
>> review the specs.
>
>
> You cannot really do that with a modern processor, and yes, ARM falls
> into that category. Modern CPUs do two things differently that
> make computing exact execution times difficult: they cache
> instructions and they execute instructions in parallel. While looping
> inside a cache boundary, you get your best performance, because the CPU
> doesn't need to reload or dump the cache from external RAM (in the case
> of the LPC2xxx, it is the on-chip SRAM, no difference, just a bit
> faster than external [S]DRAM).
>
> ARM processors also execute opcodes in parallel with each other,
> using predicated execution. Take the "moveq r1,#0" instruction: it
> is a conditional instruction based on the state of the zero flag.
> While the previous instruction is executing, ARM pipelines the next
> instruction into the microcode unit and sets it up. In this case,
> it gets a value of zero all set to be put into R1, but the
> instruction is held up until the value of the zero flag is known to
> be stable. Once it is time to execute the move, the processor
> either does the instruction or discards it.
>
> Meanwhile, another instruction has already been loaded and it, too,
> is ready to go! The predictive execution can extend beyond just a
> few instructions, up to the width of the cache.

Actually, the ARM7TDMI-S is not that modern, and it is not that
difficult to calculate the instruction timings. See
http://groups.yahoo.com/group/lpc2000/message/7808

The ARM7TDMI-S has a simple three-stage pipeline (fetch, decode,
execute), and all register accesses and calculations happen in the
execute stage. There are no stalls if an instruction depends on the
result of the previous instruction, be it a register or a flag value.
There is no out-of-order or superscalar execution, and there is no
microcode; all instructions are hardwired.

And there are no caches to make the timing unpredictable. The only
things that add extra clocks are the MAM and accesses to the
registers behind the AHB bridge (the VIC registers and the VPB
peripheral registers). Philips documents the MAM timings in the User
Manual, and I have measured the VPB timings in the link above. They
are not unpredictable.

For the full details of the ARM7TDMI-S core, see the manual at
http://www.arm.com/pdfs/DDI0234A_7TDMIS_R4.pdf , in particular the
chapter "Instruction Cycle Timings", table 7-2. All four kinds of
cycles (I, C, N, S) take one core clock on the LPC2xxx, except for
MAM misses and AHB bridge accesses.

For example, LDR with an R15 destination (such as LDR PC, [R0, #0]) is
listed as "+N +I +N +2S", which for the LPC2xxx means 1+1+1+2 = 5
clocks. Assuming that R0 points to RAM and the word loaded from
there points somewhere in flash, add 0 for the RAM load (RAM always
has 0 wait states) and 2 for the initial nonsequential code fetch
from flash (with MAMTIM=3), giving a total of 7 clocks.

For newer ARM cores, such as the ARM920 with its caches and multiple
pipeline stages after decode, it is a different story.

Karl Olsen