Here are approximate times (in CPU clocks) that the tests look for:
0 VBL begins
2272 Flag cleared
2429 Earliest it can be set on first scanline
2465 Latest it can be set on first scanline (when 64th sprite causes overflow)
2542 Earliest it can be set on second scanline
29595 Earliest it can be set on last scanline
The overflow flag time for a scanline is the earliest time for that scanline plus 2 PPU clocks for each sprite past sprite #8, counting up to the sprite that caused the overflow (numbering sprites from zero). The earliest match can be on sprite #8 and the latest on sprite #63, so the spread in the timing above is 63 - 8 = 55 sprites * 2 PPU clocks per sprite = 110 PPU clocks, or 110 / 3 ≈ 36.7 CPU clocks.
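The rule above can be sketched as a small calculator. This is a sketch under stated assumptions, not the tests' actual code: it assumes NTSC timing (341 PPU clocks per scanline, 3 PPU clocks per CPU clock, visible scanlines 0..239), takes 2429 as the first scanline's earliest time, and truncates to whole CPU clocks. The function and constant names are hypothetical.

```python
# Assumed NTSC PPU figures (not stated in the post): 341 PPU clocks per
# scanline and 3 PPU clocks per CPU clock.
PPU_CLOCKS_PER_SCANLINE = 341
PPU_CLOCKS_PER_CPU_CLOCK = 3
PPU_CLOCKS_PER_SPRITE = 2       # from the post: 2 PPU clocks per sprite

FIRST_SCANLINE_EARLIEST = 2429  # CPU clocks after VBL begins (table above)
FIRST_OVERFLOW_SPRITE = 8       # earliest sprite # that can cause overflow
LAST_SPRITE = 63
LAST_SCANLINE = 239             # assumed: visible scanlines are 0..239


def overflow_time(scanline: int, sprite: int) -> int:
    """Approximate CPU clock (relative to the start of VBL) at which the
    sprite overflow flag is set when `sprite` (numbered from zero) causes
    the overflow on visible `scanline`. Truncated to a whole clock (assumed)."""
    if not 0 <= scanline <= LAST_SCANLINE:
        raise ValueError("scanline must be 0..239")
    if not FIRST_OVERFLOW_SPRITE <= sprite <= LAST_SPRITE:
        raise ValueError("overflow can only be caused by sprites #8..#63")
    # Work in PPU clocks so everything stays in integers until the final
    # conversion to CPU clocks.
    ppu_offset = (PPU_CLOCKS_PER_SCANLINE * scanline
                  + PPU_CLOCKS_PER_SPRITE * (sprite - FIRST_OVERFLOW_SPRITE))
    return FIRST_SCANLINE_EARLIEST + ppu_offset // PPU_CLOCKS_PER_CPU_CLOCK


if __name__ == "__main__":
    for scanline, sprite in [(0, 8), (0, 63), (1, 8), (239, 8)]:
        print(scanline, sprite, overflow_time(scanline, sprite))
```

With these assumptions the sketch reproduces the four overflow entries in the table (2429, 2465, 2542 and 29595). Converting to CPU clocks only at the end avoids accumulating rounding error from the fractional 341 / 3 ≈ 113.67 CPU clocks per scanline.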
I added another useful test case to 3.Obscure and updated the archive. Oh, and luckily in this case 3.Obscure doesn't depend on 2.Timing working at all; it only depends on 1.Basics passing. I think my summary of the obscure behavior in the readme should be easier to understand than the one on the Wiki, since mine describes only this one aspect of the behavior rather than all the internal operations.