(Started writing this and then noticed others got to it before me. Still, I think this might give some more explanation of what's happening :)
Loopy's "The skinny on NES scrolling" is still the best reference on this. I know some people have had difficulty getting a grip on it, though, so here's a "quick" summary.
- $2005 and $2006 both write to the same temporary register 't'. 't' is normally copied into the VRAM address 'v' only at the start of the first scanline, but v may be partially updated during rendering by writing $2005 and $2006. v is what the PPU uses for addressing and fine scrolling when rendering the screen.
- The layout of the 18-bit register v is yyyVHYYYYYXXXXXxxx. HXXXXXxxx is the 9-bit x scroll coordinate and VYYYYYyyy is the 9-bit y scroll coordinate.
- The double-write functionality of $2005 and $2006 shares the *same toggle*. Thus, first writing to $2006 and then to $2005 would write the high byte of $2006 as usual and then the Y-scroll instead of the X-scroll, since the first write toggled the double-write latch for $2005 as well.
- Writing the second byte to $2006 is the *only* way to update the Y-scroll during rendering, transferring t to v.
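To make the bit layout above a bit more concrete, here is a minimal Python sketch that packs and unpacks the 18-bit value. The function names `pack_scroll` and `unpack_scroll` are made up for illustration, and it follows this post's convention of folding the 3-bit fine X (which Loopy's doc keeps as a separate register) into the low bits of v.

```python
def pack_scroll(x, y):
    """Build the 18-bit yyyVHYYYYYXXXXXxxx value from 9-bit x and y coordinates."""
    fine_x, coarse_x, nt_h = x & 0x07, (x >> 3) & 0x1F, (x >> 8) & 0x01
    fine_y, coarse_y, nt_v = y & 0x07, (y >> 3) & 0x1F, (y >> 8) & 0x01
    return ((fine_y << 15) | (nt_v << 14) | (nt_h << 13)
            | (coarse_y << 8) | (coarse_x << 3) | fine_x)


def unpack_scroll(v18):
    """Split the 18-bit value back into (x, y), each a 9-bit coordinate."""
    fine_y = (v18 >> 15) & 0x07    # yyy
    nt_v = (v18 >> 14) & 0x01      # V
    nt_h = (v18 >> 13) & 0x01      # H
    coarse_y = (v18 >> 8) & 0x1F   # YYYYY
    coarse_x = (v18 >> 3) & 0x1F   # XXXXX
    fine_x = v18 & 0x07            # xxx
    x = (nt_h << 8) | (coarse_x << 3) | fine_x   # HXXXXXxxx
    y = (nt_v << 8) | (coarse_y << 3) | fine_y   # VYYYYYyyy
    return x, y


v18 = pack_scroll(100, 333)
print(f"{v18:018b}", unpack_scroll(v18))
```

Note that this does not check that the coarse Y part stays within the 30 visible tile rows; it only shuffles bits.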
Some games did split-screen effects by changing the x-value: they wrote $2005 twice during a particular scanline. Quoting Loopy's doc, this is what happens:
2005 first write:
t:xxxxxxxxxxxABCDE=d:ABCDExxx
x=d:xxxxxABC
Or, if you prefer:
t[4:0] = d[7:3]
fine scroll x is directly set to d[2:0]
- t now contains 0??????? ???XXXXX
2005 second write:
t:xxxxxxABCDExxxxx=d:ABCDExxx
t:xABCxxxxxxxxxxxx=d:xxxxxABC
Or, if you prefer:
t[9:5] = d[7:3]
t[14:12] = d[2:0]
- t now contains 0yyy??YY YYYXXXXX
scanline start (if background or sprites are enabled):
v:xxxxxAxxxxxBCDEF=t:xxxxxAxxxxxBCDEF
Or, if you prefer:
v[13] = t[10]
v[7:3] = t[4:0]
Note here that at scanline start, the bits in v corresponding to the Y-scroll value are *NOT* copied from t - they remain as they were. Thus, changing the Y-scroll through $2005 during rendering does nothing at all.
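If it helps, the $2005-only case can be played through in a small Python model. This is just a sketch under my own assumptions: the class name `PPUScroll` and its methods are invented for illustration, and it uses Loopy's original layout (15-bit t and v, with fine X in its own 3-bit register x), so bit 10 here is the H bit.

```python
class PPUScroll:
    """Toy model of t, v, fine X and the shared write toggle (not a full PPU)."""

    def __init__(self):
        self.t = 0  # 15-bit temporary address
        self.v = 0  # 15-bit address used while rendering
        self.x = 0  # 3-bit fine X
        self.w = 0  # write toggle shared by $2005 and $2006

    def write_2005(self, d):
        if self.w == 0:
            # first write: coarse X into t[4:0], fine X set directly
            self.t = (self.t & ~0x001F) | (d >> 3)
            self.x = d & 0x07
        else:
            # second write: coarse Y into t[9:5], fine Y into t[14:12]
            self.t = (self.t & ~0x73E0) | ((d & 0xF8) << 2) | ((d & 0x07) << 12)
        self.w ^= 1

    def scanline_start(self):
        # only the horizontal bits are copied: H (bit 10) and coarse X (bits 4:0)
        self.v = (self.v & ~0x041F) | (self.t & 0x041F)


ppu = PPUScroll()
ppu.v = 0x30E2            # pretend rendering is somewhere mid-frame
y_bits_before = ppu.v & 0x7BE0
ppu.write_2005(0x55)      # new X scroll
ppu.write_2005(0xAA)      # new Y scroll, lands in t only
ppu.scanline_start()
print(hex(ppu.v), ppu.x, (ppu.v & 0x7BE0) == y_bits_before)
```

The Y value written through $2005 sits in t but never reaches v here, which is the point made above.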
The most flexible way to update v is to write $2005 and $2006 in the order $2006, $2005, $2005, $2006. To see what happens here, again refer to Loopy's doc:
2006 first write:
t:xxABCDEFxxxxxxxx=d:xxABCDEF
t:ABxxxxxxxxxxxxxx=0 (bits 14,15 cleared)
Or, if you prefer:
t[13:8] = d[5:0]
t[15:14] = '00'
- t now contains 00yyVHYY ????????
2005 'second' write: (*TOGGLE FLIPPED BY WRITE TO $2006*)
t:xxxxxxABCDExxxxx=d:ABCDExxx
t:xABCxxxxxxxxxxxx=d:xxxxxABC
Or, if you prefer:
t[9:5] = d[7:3]
t[14:12] = d[2:0]
- t now contains 0yyyVHYY YYY?????
2005 'first' write:
t:xxxxxxxxxxxABCDE=d:ABCDExxx
x=d:xxxxxABC
Or, if you prefer:
t[4:0] = d[7:3]
fine scroll x is directly set to d[2:0]
- t now contains 0yyyVHYY YYYXXXXX
2006 second write:
t:xxxxxxxxABCDEFGH=d:ABCDEFGH
v=t
Or, if you prefer:
t[7:0] = d[7:0]
v[17:3] = t[14:0]
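- t still contains 0yyyVHYY YYYXXXXX (this write only touches the low byte)
- v has been updated to yyyVHYYYYYXXXXXxxx (all three yyy bits come through, since the $2005 write set t[14] again after the first $2006 write had cleared it)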
scanline start (if background or sprites are enabled):
v:xxxxxAxxxxxBCDEF=t:xxxxxAxxxxxBCDEF
Or, if you prefer:
v[13] = t[10]
v[7:3] = t[4:0]
Thus, the trick is to consistently write the same bits over again redundantly - just to get that update of v that the second write to $2006 triggers. This requires some bit shuffling that's best done using lookup tables, like in Memblers's example.
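For reference, here is roughly what that bit shuffling looks like when written out in Python instead of lookup tables. It's a sketch, not anyone's actual routine: `split_scroll_writes` and `apply_writes` are hypothetical names, x and y are assumed to be 8-bit pixel scroll values with the nametable number (0-3) passed separately, and the little model uses Loopy's 15-bit t/v with fine X kept apart.

```python
def split_scroll_writes(x, y, nametable):
    """Return the (register, byte) pairs for the $2006, $2005, $2005, $2006 order."""
    return [
        (0x2006, (nametable & 0x03) << 2),                # V and H into t[11:10]
        (0x2005, y & 0xFF),                               # coarse Y and fine Y
        (0x2005, x & 0xFF),                               # coarse X and fine X
        (0x2006, ((y & 0x38) << 2) | ((x & 0xFF) >> 3)),  # low byte again, then v=t
    ]


def apply_writes(writes):
    """Play the writes through Loopy's rules and return (v, fine_x)."""
    t = v = fine_x = w = 0
    for reg, d in writes:
        if reg == 0x2005 and w == 0:
            t = (t & ~0x001F) | (d >> 3)
            fine_x = d & 0x07
        elif reg == 0x2005:
            t = (t & ~0x73E0) | ((d & 0xF8) << 2) | ((d & 0x07) << 12)
        elif w == 0:
            t = (t & 0x00FF) | ((d & 0x3F) << 8)   # $2006 first write, top bits cleared
        else:
            t = (t & 0x7F00) | d                   # $2006 second write
            v = t
        w ^= 1                                     # one toggle for both registers
    return v, fine_x


writes = split_scroll_writes(100, 77, 2)
v, fine_x = apply_writes(writes)
print([hex(d) for _, d in writes], hex(v), fine_x)
```

The last byte repeats the low three coarse Y bits and the coarse X bits that the two $2005 writes already put into t; it is there only to trigger the copy into v.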
But as Tokumaru says, this is probably overkill for the application you are writing now. Since you just want to display the same cubes over and over again, you can get away with just writing zero to $2006 twice after each palette update. This will make your display start rendering from coordinate (0,0), which will just contain the same blocks of pixels spaced apart. And those blocks will of course have been updated with fresh colors from your skillfully hblank-hidden palette updates. But it's still good to know the whole picture, I think.
Happy coding! :)