Well, he did say it was just a brain exercise rather than project code, so I figure anything goes. I didn't post my fastest methods, though, since the goal was size.
Those are just (partially or fully) unrolled loops anyway.
I definitely agree that code should be kept easy to read and understand, except inside loops that run several times per frame. There you definitely should use little tricks like the one dvdmth posted wherever possible. Ditching that compare when copying 32 bytes saves 2 cycles per pass, or 64 CPU cycles in total (over half a scanline). The 2 fewer ROM bytes only matter if you're writing a 1024-byte game or something.
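For anyone who wants to check that arithmetic, here's a rough sketch in Python (just the math, not 6502 code). It assumes the trick is the usual one of counting the index down so the loop can end with DEX / BPL instead of INX / CPX #32 / BNE, and it assumes the Atari 2600's 76 CPU cycles per scanline. The constant and function names are made up for illustration.

```python
# Standard 6502 timing: CPX #imm costs 2 cycles and 2 ROM bytes.
CPX_IMMEDIATE_CYCLES = 2
CPX_IMMEDIATE_BYTES = 2

# Assumption: Atari 2600, where one scanline is 228 color clocks / 3 = 76 CPU cycles.
CYCLES_PER_SCANLINE = 76


def compare_overhead(iterations):
    """Cycles spent on the per-pass CPX in a count-up copy loop."""
    return iterations * CPX_IMMEDIATE_CYCLES


# Copying 32 bytes means 32 passes through the loop.
saved = compare_overhead(32)
scanline_fraction = saved / CYCLES_PER_SCANLINE

print(saved)                        # 64
print(round(scanline_fraction, 2))  # 0.84
print(CPX_IMMEDIATE_BYTES)          # 2 ROM bytes saved, once
```

The cycle savings scale with the loop count, while the ROM savings are a flat 2 bytes no matter how many bytes get copied.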
Also, there's nothing wrong with writing code that works, then coming back later and optimizing if needed. When I first wrote Roadkill (while still learning how to code), most of the frame was taken up by hit detection code. When I optimized it some years later, the results were the same but the code was something like 300% faster. When handling 62 sprites, that's a ridiculously huge difference.