Author: Anders Eriksson (t2o82p59.telia.com)
Date: 10-24-2000 04:59
Hello,
here are a few obvious tips for optimizing the routine above, which I'm almost sure you have already thought about :-) Anyway, here goes..
1. Remove all the dbra loops! Unroll them into one big codelist instead.
2. See if your routine that writes into the chunky buffer can write nibbles instead of bytes. That way you can skip the lsl.w #4 instructions.
3. For the linedoubling, make a separate movem.l loop that uses as many registers as you can.
Example of one 160-pixel-wide c2p line (to a 320-pixel-wide screen), which assumes the pixels are in "nibble" format and that the linedoubling is done externally (faster):
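q: set 0 ; byte offset into the screen line
rept 20 ; 20 groups of 16 screen pixels = 320
moveq.l #0,d0 ; clear the upper word of the index
move.w (a0)+,d0 ; fetch 4 chunky pixels (one nibble each)
lsl.l #2,d0 ; scale to a longword table offset
move.l (a1,d0.l),d0 ; look up the planar longword
movep.l d0,q(a2) ; write the even bytes of the 4 plane words
moveq.l #0,d0
move.w (a0)+,d0 ; next 4 chunky pixels
lsl.l #2,d0
move.l (a1,d0.l),d0
movep.l d0,q+1(a2) ; write the odd bytes of the same plane words
q: set q+8 ; advance to the next 16-pixel group
endr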
For more than one line, make the codelist larger. The 68000 has no caches and really doesn't like dbra loops; this way you also get rid of the addq's and the line-jump instructions.
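The lookup table that a1 points to isn't shown in the example, so here is a rough host-side sketch in C of how it could be generated, plus a small model of what the two movep.l writes do to one screen line. It assumes the leftmost pixel sits in the top nibble of each word, that each chunky pixel is doubled horizontally, and the usual ST low-res layout (four interleaved plane words, plane 0 = bit 0 of the colour). The names c2p_entry, c2p_table, build_c2p_table, movep_l and convert_line are made up for illustration; your own table may well be laid out differently.

```c
#include <stdint.h>

/* Hypothetical table: one longword per possible 16-bit chunky word.
   65536 entries * 4 bytes = 256 KB, indexed by (word << 2) on the 68000. */
static uint32_t c2p_table[65536];

/* Build the longword for four nibble pixels (assumption: leftmost pixel
   in bits 15-12). Each pixel is doubled, so it covers two adjacent bits
   in every plane byte. Byte order matches movep.l: the most significant
   byte is plane 0, the least significant byte is plane 3. */
static uint32_t c2p_entry(uint16_t w)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; i++) {
        unsigned pix  = (w >> (12 - 4 * i)) & 0xFu; /* colour of pixel i   */
        unsigned mask = 0xC0u >> (2 * i);           /* two screen bits     */
        for (int plane = 0; plane < 4; plane++) {
            if (pix & (1u << plane))
                out |= (uint32_t)mask << (24 - 8 * plane);
        }
    }
    return out;
}

static void build_c2p_table(void)
{
    for (uint32_t w = 0; w < 65536u; w++)
        c2p_table[w] = c2p_entry((uint16_t)w);
}

/* Model of movep.l d0,(dst): four bytes written to every other address,
   most significant byte first. */
static void movep_l(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[2] = (uint8_t)(v >> 16);
    dst[4] = (uint8_t)(v >> 8);
    dst[6] = (uint8_t)v;
}

/* Model of the unrolled codelist for one line: 40 chunky words in,
   160 bytes (320 pixels, 4 planes) out. */
static void convert_line(const uint16_t *chunky, uint8_t *screen)
{
    for (int g = 0; g < 20; g++) {
        movep_l(screen + 8 * g,     c2p_table[chunky[2 * g]]);     /* q   */
        movep_l(screen + 8 * g + 1, c2p_table[chunky[2 * g + 1]]); /* q+1 */
    }
}
```

Call build_c2p_table() once, dump c2p_table as big-endian longwords and include the result in your program. convert_line() is only there to check the table against the screen layout on the host; on the ST the codelist does that job.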
Maybe not a revolution, as your previous routine was quite good (except for a few details), but perhaps you'll get some ideas from it.
--
Anders Eriksson
support@atari.org (http://www.dhs.nu)