Soon after writing the game, I mentioned some of how I'd optimised the code in an email to a friend. In case it's of any interest to a wider audience, here it is (slightly modified, as you might expect).

My first entry was a port of my `ztrack', which was loosely based on Casio's old "Turbo Drive" LCD driving game. What made this particularly suited to the speccy was the way the display worked. I could stick a bitmap on the screen with all the track lanes and cars in each position; then, during the game, I could modify the attributes to light up the relevant bits of the `LCD'. Finally, a situation where the attribute system is actually an advantage... :-)

There were just two problems with this:

- a full-screen bitmap would take 6144 bytes (256x192 pixels at one bit each).
- I wasn't sure if I could port the ztrack code itself into 1k even without a memory-hogging bitmap.

The only practical solution to the bitmap problem seemed to be to use a very small version of one half of the track display, then expand and mirror it to the screen. Just how small I reckoned it would have to be was a bit of a shock - around 32x32. With the mirroring, and the track display not taking up the whole screen vertically, the effective resolution of this is 64x48, i.e. the resolution of ZX81 `PLOT' graphics. Less than great. :-(

There was only one real approach to the other half of the problem, the port of the ztrack code - try it and see. I decided to largely ignore how much space things took up, preferring instead to get the thing working. Then once I had it working, I could look at how things were on the memory front.

It all seemed to go rather well. The non-display code was shaping up to fit in the 500 to 600 byte range even in the initial implementation, and with the bitmap taking up about 128 bytes (32x32 bits), there should have been a respectable amount left over for display code. Because obviously, the display code would be simple.

But things were not quite so obvious as I might have hoped.
:-) The problem I'd overlooked was how exactly to hide/show the various car and lane graphics. The scheme I ended up with was to use what are essentially fixed-position attribute `sprites', which worked well enough - but the sprite data and code, on top of the expansion/mirroring code and the surprisingly large main loop, gave me a rather troubling total. The code was over 1300 bytes.

Now, I've optimised Z80 stuff to save memory before, with ZCN, so it would be fair to say I've got a few chops in this department. :-) But this one was bloody hard. Having to save over 300 bytes (leaving room for the Basic wrapper, etc.) on something which was already pretty small took me about 10 hours in the end, and drove me to some fairly desperate measures to save room. I can't remember everything, but I'll just give a few examples:

- probably the simplest one was replacing my `convert pixel line co-ord to screen address' routine with a call to the ROM's own. I'd originally thought I couldn't use that, as Basic uses a weird co-ordinate system (starting 16 pixels up from the bottom-left of the screen, with the bottom two character lines inaccessible) which the ROM routine follows. But if you call it partway in, you can do lookups for the whole screen with it. It's fair to say this was one of the less desperate measures. :-)

- my original format for the attribute sprite data was something like this:

      defw 05800h+12*32+14    ; address of the top-left attribute
      defb 4                  ; number of lines in the bitmap
      defb 01100000b          ; bitmap: 1 = change this attribute
      defb 11110000b
      defb 11110000b
      defb 10010000b

  That's the address of the top-left position (in the attribute area), then the number of lines in the sprite bitmap, then the bitmap data showing which attributes to change. But using a whole separate byte for the number of lines - which could actually fit in 3 bits - seemed downright profligate.
  :-) And since the address was known to be in the attribute area, I thought I could get away with just having an attribute-area offset (which fits in 10 bits), and use the top 4 bits for the number of lines, giving:

      defw 4*4096+12*32+14    ; lines in the top 4 bits, offset in the low 10
      defb 01100000b
      defb 11110000b
      defb 11110000b
      defb 10010000b

  Now, that's all very well, but you need to be able to get at the data without the code that extracts the two fields taking more than a few extra bytes. Even if I'd needed no more code to deal with the new format, I'd only save 25 bytes, so keeping it small was important. The obvious approach would have involved bit-shifts and ANDs, and would have taken 11 bytes. But I had a sneakier approach which ended up taking 9. Whether it was worth the extra hassle for two bytes is debatable, but still... :-)

  The Z80 has two instructions, RLD and RRD, intended for shifting BCD digits in memory left and right, probably as an aid to doing long multiplication/division with BCD numbers. As you might expect, these instructions do the shift in 4-bit chunks, and use the accumulator as a carry digit. So for RLD you end up with something like this happening:

               _______   ____
              |       | |    |
              v       | v    |
      ,-----+-----. ,-----+-----.
      |hi  A    lo| |hi (HL)  lo|
      `-----+-----' `-----+-----'
              |              ^
              |______________|

  That led to me using the following code to get the top then bottom half of the byte at HL non-destructively:

      ld d,(hl)    ; save the original byte
      xor a        ; a = 0
      rld          ; a = top nibble of (hl), which is now altered
      ld b,a       ; b = top nibble
      ld (hl),d    ; restore (hl)
      rrd          ; a = bottom nibble of (hl) (a's top nibble is still 0)
      ld (hl),d    ; restore (hl) again

  To be honest, I was just chuffed to have finally had a reason to use RLD/RRD. :-)

- random numbers are vitally important for ztrack, and unfortunately the easy way out, using the R register, wasn't good enough. The `random' numbers when using that gave really crappy results. So I reluctantly put in my usual Z80 random-number routine, which is about 50 bytes. That stayed in for quite a while. But as I squeezed the rest of the code harder and harder and struggled to find something else to cut down, the `rand' routine was something which had to give.

  So I turned to the ROM. It doesn't have a directly-callable version of Basic's RND, and even if it did, the code uses floating-point numbers and is slow by machine-code standards. But I had plenty of CPU time to spare, so I had a go. The basic idea was to copy the code from the ROM to RAM, so I could get it to do a RET, and also so I could stop it leaving a copy of the random number on the ROM's FP calculator stack, which would eventually fill the memory if left unchecked.

  The code for that is below. It shows another example of the insane extent to which I took things - the printer buffer happens to come directly after the attribute area, and I happened to have a routine to initialise all the attributes just before. So I decided to save 3 bytes by using the printer buffer for my copy of the RND code, since DE was already pointing there after the attribute setup. :-)

      ;set up basic attrs
      ...
      ldir            ; this leaves de pointing just past the attributes (05b00h)

      ;copy guts of the ROM's RND routine so we can call it sanely
      ;b is still zero
      ;de is pointing at rnd_bit (printer buffer)
      ld hl,025fdh
      ld c,40         ; bc = 40 bytes to copy
      ldir

      ;stick a RET on the end
      ld a,0c9h       ; 0c9h is the opcode for RET
      ld (de),a

      ;stop it doing a dup to leave an FP return value, which otherwise
      ;would gradually fill memory
      ld hl,038h      ; end-calc (038h), then a Z80 nop (00h) in the next byte
      ld (05b18h),hl

  (The "end-calc" is a bytecode used by the ROM's FP calculator, BTW. When you call the calculator you use these bytecodes inline, with "end-calc" marking the endpoint.)

  I also needed a little wrapper routine to replace `rand', but this still cut the RNG overhead to about 20 bytes. By the time I managed this one, I have to say, saving 30 bytes seemed like a miracle. :-)

At any rate, I eventually ended up getting the thing to fit. And wouldn't you know it, no sooner had I struggled to manage that than I noticed an easy way to save another 9 bytes. Bah.
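Here is a worked check of the bitmap sizes mentioned at the start. A full one-bit 256x192 screen is 256*192/8 = 6144 bytes, while a 32x32 half-track bitmap is 32*32/8 = 128 bytes. The sketch below is a minimal Python model of the expand-and-mirror step. It assumes each bitmap pixel becomes a 4x4 block, since 256/4 by 192/4 gives the 64x48 figure. It also assumes the stored half is the left one and is reflected to form the right half. The function name, the one-int-per-row encoding and the plain 0/1 output grid are illustrative choices. They are not the Spectrum's interleaved display layout or the game's actual routine.

```python
SCALE = 4       # assumed: each source pixel becomes a 4x4 block on screen
HALF_W = 32     # width of the stored half-track bitmap, in pixels
HALF_H = 32     # height of the stored bitmap, in pixels


def expand_and_mirror(half: list[int]) -> list[list[int]]:
    """Expand a 32x32 one-bit bitmap and mirror it into a full-width image.

    `half` holds one int per row, bit 31 being the leftmost pixel. Returns
    HALF_H*SCALE rows of 2*HALF_W*SCALE pixels, each 0 or 1.
    """
    assert len(half) == HALF_H
    screen = []
    for row_bits in half:
        left = []
        for x in range(HALF_W):
            bit = (row_bits >> (HALF_W - 1 - x)) & 1
            left.extend([bit] * SCALE)          # widen each pixel
        full = left + left[::-1]                # reflect to form the right half
        for _ in range(SCALE):                  # repeat each row SCALE times
            screen.append(list(full))
    return screen


half = [0] * HALF_H
half[0] = 1 << 31                               # leftmost pixel of the top row
image = expand_and_mirror(half)
print(len(image), len(image[0]))                # 128 256
```

In this example, setting only the leftmost pixel of the first row lights a 4x4 block at each top corner of the 256-pixel-wide output. That leaves 64 of the screen's 192 lines free below the track area.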
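The `convert pixel line co-ord to screen address' job described above is fiddly on the Spectrum because the display file is interleaved by screen third, character row and pixel line. The sketch below computes the standard 48K display-file address for a pixel row counted from the top of the full 192-line screen. It also includes a helper for the Basic-style co-ordinate the text describes, which has 0 at the bottom of the upper 176 lines. It models the arithmetic only and is not a transcription of the ROM routine. Both function names are hypothetical.

```python
def pixel_line_address(row: int, column: int = 0) -> int:
    """Display-file address on a 48K Spectrum for pixel row `row`
    (0 = top of screen, 0..191) and character column `column` (0..31)."""
    if not (0 <= row < 192 and 0 <= column < 32):
        raise ValueError("position is off the screen")
    return (0x4000
            | ((row & 0xC0) << 5)    # screen third (0, 1 or 2)
            | ((row & 0x07) << 8)    # pixel line within the character cell
            | ((row & 0x38) << 2)    # character row within the third
            | column)


def basic_y_to_row(basic_y: int) -> int:
    """Convert a Basic PLOT y (0 = bottom of the 176 plottable lines) to a top-origin row."""
    if not 0 <= basic_y <= 175:
        raise ValueError("Basic y must be 0..175")
    return 175 - basic_y


print(hex(pixel_line_address(1)))          # 0x4100: next pixel line is 256 bytes on
print(hex(pixel_line_address(8)))          # 0x4020: next character row is 32 bytes on
print(hex(pixel_line_address(191, 31)))    # 0x57ff: last byte of the display file
```

Rows 176 to 191, the bottom two character lines, have no Basic y value at all. That is why a routine that insists on Basic co-ordinates can't reach the whole screen.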
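The two attribute-sprite header formats described above can be modelled in a few lines of Python. This sketch uses hypothetical function names and works in three steps:

- it packs the header the way the second `defw' does, with the line count in the top 4 bits and a 0 to 767 attribute offset in the low 10;
- it decodes the header with the `obvious' shifts and masks;
- it writes an attribute value into each cell whose bit is set, treating bit 7 of each bitmap byte as the leftmost cell.

The bit order, the attribute value written (0x47) and the bytearray standing in for the attribute area at 05800h are all assumptions for illustration.

```python
ATTR_BASE = 0x5800    # start of the attribute area
ATTR_COLS = 32        # attribute cells per character row


def pack_header(lines: int, row: int, col: int) -> int:
    """New-style header word: line count in the top 4 bits,
    attribute-area offset (0..767) in the low 10 bits."""
    offset = row * ATTR_COLS + col
    if not (0 < lines < 16 and 0 <= offset < 768):
        raise ValueError("sprite does not fit the packed format")
    return (lines << 12) | offset


def unpack_header(word: int) -> tuple[int, int]:
    """The 'obvious' shift-and-mask decode: returns (lines, attribute address)."""
    return word >> 12, ATTR_BASE + (word & 0x3FF)


def apply_sprite(attrs: bytearray, word: int, rows: bytes, value: int) -> None:
    """Write `value` into each attribute cell whose bit is set in `rows`.

    `attrs` stands in for the 768-byte attribute area; bit 7 of each
    bitmap byte is assumed to be the leftmost cell of that line.
    """
    lines, address = unpack_header(word)
    start = address - ATTR_BASE
    for line in range(lines):
        for bit in range(8):
            if rows[line] & (0x80 >> bit):
                attrs[start + line * ATTR_COLS + bit] = value


header = pack_header(4, 12, 14)
assert header == 4 * 4096 + 12 * 32 + 14          # same value as the packed defw
attrs = bytearray(768)
apply_sprite(attrs, header, bytes([0b01100000, 0b11110000, 0b11110000, 0b10010000]), 0x47)
print(sum(1 for a in attrs if a == 0x47))          # 12
```

With the example sprite from the text, 12 attribute cells change (2 + 4 + 4 + 2 set bits). Each sprite in the packed format is one byte smaller than in the original format.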
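The RLD/RRD sequence above should hand back both nibbles and leave the byte intact. This small Python emulation checks that, modelling the two instructions as the Z80 documentation defines them and stepping through the same instruction order. The byte count works out as 1+1+2+1+1+2+1 = 9, because RLD and RRD are two-byte ED-prefixed opcodes and the other instructions are one byte each. The helper names are hypothetical, and only the accumulator and the memory byte are modelled (not the flags).

```python
def rld(a: int, m: int) -> tuple[int, int]:
    """Z80 RLD: (HL) low nibble -> (HL) high, (HL) high -> A low, A low -> (HL) low."""
    return (a & 0xF0) | (m >> 4), ((m << 4) & 0xF0) | (a & 0x0F)


def rrd(a: int, m: int) -> tuple[int, int]:
    """Z80 RRD: (HL) low nibble -> A low, A low -> (HL) high, (HL) high -> (HL) low."""
    return (a & 0xF0) | (m & 0x0F), ((a & 0x0F) << 4) | (m >> 4)


def split_nibbles(m: int) -> tuple[int, int, int]:
    """Step through the 9-byte sequence; returns (b, a, byte at HL afterwards)."""
    d = m                  # ld d,(hl)
    a = 0                  # xor a
    a, m = rld(a, m)       # rld    -> a = top nibble
    b = a                  # ld b,a
    m = d                  # ld (hl),d
    a, m = rrd(a, m)       # rrd    -> a = bottom nibble (a's top nibble is still 0)
    m = d                  # ld (hl),d
    return b, a, m


print(split_nibbles(0x41))     # (4, 1, 65)
```

Take the example header from the text, 4*4096+12*32+14 = 418Eh. Assuming HL points at the high byte (41h) of that little-endian word, the sequence leaves the 4 lines in B and 1, the top bits of the attribute offset, in A.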
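The arithmetic behind Basic's RND, which the copied ROM code evaluates, is generally documented as a simple multiplicative generator. The next seed is (75*(seed+1)) mod 65537 - 1, and RND returns seed/65536. The Python sketch below models just that recurrence and says nothing about how the ROM's FP calculator evaluates it. The function names, and the `rnd_below' helper in particular, are hypothetical. `rnd_below' is one plausible shape for a wrapper that cuts the result down to a small integer range, not the author's wrapper routine.

```python
def spectrum_rnd_step(seed: int) -> int:
    """Next seed: (75 * (seed + 1)) mod 65537 - 1, always in 0..65535."""
    return (75 * (seed + 1)) % 65537 - 1


def rnd(seed: int) -> tuple[float, int]:
    """Basic-style RND: returns (new_seed / 65536, new_seed), a value in [0, 1)."""
    new_seed = spectrum_rnd_step(seed)
    return new_seed / 65536, new_seed


def rnd_below(seed: int, n: int) -> tuple[int, int]:
    """Hypothetical wrapper: INT (RND*n) in exact arithmetic, plus the new seed."""
    new_seed = spectrum_rnd_step(seed)
    return (new_seed * n) >> 16, new_seed


seed = 0
for _ in range(3):
    value, seed = rnd_below(seed, 3)
    print(value, seed)
```

65537 is prime and 75 generates its multiplicative group, so the seed runs through all 65536 values before repeating. Even so, as a linear congruential generator it is better suited to a game than to anything statistical.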