Timings using leau as a 16-bit loop counter

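This compares U versus Y as a 16-bit loop counter. X is included for reference because it gives both the fastest and the smallest loop. Neither leau nor leas affects the CC flags, so a loop counted in U (or S) needs an explicit compare before the branch. That makes them slower and less convenient loop counters than X or Y, because leax and leay do set the Z flag.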

With Y as the counter, the initial ldy is 1 cycle slower and 1 byte larger than ldu. However, each pass through the loop saves 4 cycles because no cmpu is needed, and the loop as a whole is 3 bytes smaller.
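The break-even arithmetic can be written out using the per-instruction cycle counts from the listing. These are 3 for ldu or ldx, 4 for ldy, 5 for each lea, 4 for cmpu, and 3 for bne (a 6809 short branch takes 3 cycles whether or not it is taken). The cost of N passes through each loop is then:

$$
\begin{aligned}
C_U(N) &= 3 + (5 + 4 + 3)\,N = 3 + 12N \\
C_Y(N) &= 4 + (5 + 3)\,N = 4 + 8N \\
C_X(N) &= 3 + (5 + 3)\,N = 3 + 8N \\
C_U(N) - C_Y(N) &= 4N - 1
\end{aligned}
$$

Y is therefore already ahead after a single pass (15 vs. 12 cycles). For the ten passes set up by `#10`, the totals are 123, 84 and 83 cycles.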

From worst to best:

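| Counter decrement | Bytes | Cycles (setup + one pass) | Cycles per pass |
| ----------------- | ----- | ------------------------- | --------------- |
| leau -1,u         | 11    | 15                        | 12              |
| leay -1,y         | 8     | 12                        | 8               |
| leax -1,x         | 7     | 11                        | 8               |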

Assembler listing from pmode1test.asm. The columns are address, object code, instruction size in bytes, cycles, and running cycle total. The size column on each opt cc line is the loop's total size, and opt cc resets the running total.

```
3F00 CE000A     3  (3)   3          ldu   #10
3F03 335F       2  (5)   8  loop@   leau  -1,u
3F05 11830000   4  (4)  12          cmpu  #0
3F09 26F8       2  (3)  15          bne   loop@
               11                   opt   cc

3F0B 108E000A   4  (4)   4          ldy   #10
3F0F 313F       2  (5)   9  loop@   leay  -1,y
3F11 26FC       2  (3)  12          bne   loop@
                8                   opt   cc

3F13 8E000A     3  (3)   3          ldx   #10
3F16 301F       2  (5)   8  loop@   leax  -1,x
3F18 26FC       2  (3)  11          bne   loop@
                7
```
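To check the table or try other iteration counts, here is a small hypothetical cost model in Python. It is not part of the original test. It sums the per-instruction sizes and cycle counts copied from the listing, charging the setup once and the loop body once per pass. The names `LOOPS`, `loop_bytes` and `loop_cycles` are made up for this sketch, and the cmpu cycle count is taken from the listing as-is.

```python
# Hypothetical cost model for the three 16-bit countdown loops.
# Sizes and cycle counts are copied from the pmode1test.asm listing.
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    size: int    # bytes
    cycles: int  # cycles per execution


# Each entry: (setup instructions run once, body instructions run every pass)
LOOPS: dict[str, Tuple[List[Instr], List[Instr]]] = {
    "leau -1,u": ([Instr("ldu #10", 3, 3)],
                  [Instr("leau -1,u", 2, 5), Instr("cmpu #0", 4, 4),
                   Instr("bne loop@", 2, 3)]),
    "leay -1,y": ([Instr("ldy #10", 4, 4)],
                  [Instr("leay -1,y", 2, 5), Instr("bne loop@", 2, 3)]),
    "leax -1,x": ([Instr("ldx #10", 3, 3)],
                  [Instr("leax -1,x", 2, 5), Instr("bne loop@", 2, 3)]),
}


def loop_bytes(name: str) -> int:
    """Total code size of the setup plus the loop body."""
    setup, body = LOOPS[name]
    return sum(i.size for i in setup + body)


def loop_cycles(name: str, passes: int) -> int:
    """Setup cycles once, plus body cycles for each pass (bne costs 3 either way)."""
    setup, body = LOOPS[name]
    return sum(i.cycles for i in setup) + passes * sum(i.cycles for i in body)


if __name__ == "__main__":
    for name in LOOPS:
        print(f"{name}: {loop_bytes(name):2d} bytes, "
              f"{loop_cycles(name, 1):3d} cycles for 1 pass, "
              f"{loop_cycles(name, 10):3d} cycles for 10 passes")
```

Calling `loop_cycles(name, 1)` reproduces the cycle column of the table, and `loop_bytes(name)` reproduces the byte column.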