In working with the SNES, I've found the 6KB per frame upload limit to be very restrictive. I've been very reluctant to use forced blanking, but I thought to myself about how most TVs probably don't even display the entire screen. Mine doesn't; it leaves off the top 8 pixels and the bottom 8 pixels almost exactly (it fluctuates based on how bright the screen is).
Too bad you made this an "only one option" poll, or I'd be able to go through all TVs in my house and NovaSquirrel's and run the overscan test in 240p Test Suite (PVSnesLib edition) on all of them. So I went based on the HDTV in front of me, which shows all 256x224 pixels when zoomed out. (When zoomed in, it's closer to 256x168.) One Apex TV that I have packed away has a huge overscan; I've been meaning to figure out how to get into its service menu.
The official title safe area (for text and status bars) is 224x192, or a margin of 16 pixels from each side.
Agreed, I have 3 TVs and they all show differently. I'll try to measure it sometime.
I just edited it to include more poll options. The way I know my TV blocks out that much is that R-Type III has black bars on the top and bottom 8 pixels, and comparing it to an emulator, what my TV shows begins exactly where that black ends. I think 90% of games wouldn't fit within the safe zone (I even have trouble with some games), so that seems very harsh. I never paid attention to how much of the screen is hidden on the sides, but I just had the idea that you could cut 4 pixels off each side of the screen with one window layer (?) and then only need a 32x32 tilemap. I had been looking at the possibility of using a tilemap with 16x16-pixel tiles, as I'm trying to port a Neo Geo game, but I found you actually still save memory with 8x8 tiles, because you can cut a BG more tightly if it has large transparent areas. Also, I like how, if I were to update tiles, I could do them in one straight shot.
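The side-masking idea can be checked with a little arithmetic. This is a minimal sketch. It assumes 8-pixel-wide tiles, a 32-column tilemap (256 pixels) that wraps horizontally, and a window that hides the same number of pixels on the left and right edges. The window register setup isn't shown, and the function names are made up for illustration. When the fine scroll is nonzero, one tilemap column shows up at both screen edges. The question is how wide the mask must be so that both copies are never visible at once, which leaves that column free to be rewritten.

```python
SCREEN_W = 256  # visible width in pixels
TILE_W = 8      # BG tile width in pixels
# A 32-column tilemap is exactly SCREEN_W pixels wide, so it wraps once per screen.

def both_seam_copies_visible(scroll_x, mask):
    """True if the tilemap column shared by both screen edges is visible on
    both sides when `mask` pixels are hidden at each edge."""
    fine = scroll_x % TILE_W
    if fine == 0:
        return False  # 32 whole columns fit exactly; no column is split
    # Left copy covers x in [0, TILE_W - fine); right copy covers [SCREEN_W - fine, SCREEN_W).
    # The unmasked area is [mask, SCREEN_W - mask).
    left_visible = TILE_W - fine > mask
    right_visible = SCREEN_W - fine < SCREEN_W - mask
    return left_visible and right_visible

def smallest_safe_mask():
    """Smallest per-side mask width for which no scroll value exposes both copies."""
    for mask in range(TILE_W):
        if not any(both_seam_copies_visible(x, mask) for x in range(TILE_W)):
            return mask
    return TILE_W

print(smallest_safe_mask())  # 4
```

With 4 pixels masked per side, the shared column is only ever visible on one edge at a time. While the fine scroll is below 4, its left-edge copy shows; while it is above 4, its right-edge copy shows. So, assuming the upload lands in the vblank before the fine scroll crosses 4, the new column can be written into that slot with no visible seam.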
I see all the active video, horizontally and vertically (but only because I've configured all my TVs to do that).
Espozo wrote:
In working with the SNES, I've found the 6KB per frame upload limit to be very restrictive. I've been very reluctant to use forced blanking, but I thought to myself about how most TVs probably don't even display the entire screen. Mine doesn't; it leaves off the top 8 pixels and the bottom 8 pixels almost exactly (it fluctuates based on how bright the screen is).
...and with OAM you only have a little more than 5KB left, and only 4KB if you're doing a lot of individual 16x16 sprites.
Quote:
How much does your TV display?
And how are we supposed to know? Isn't there a test ROM or a common game that can be used to test this?
Since I have a PAL SNES and PAL TVs, I suspect the overscan is always showing anyway, but that's not an interesting case, so I won't answer the poll.
You can count the number of wood panels around the title screen of Super Mario World.
Bregalad wrote:
Quote:
How much does your TV display?
And how are we supposed to know? Isn't there a test ROM or a common game that can be used to test this?
tepples wrote:
run the overscan test in 240p Test Suite (PVSnesLib edition)
That's this.
Espozo wrote:
SNES [...] 6KB per frame [...] forced blanking [...] leaves off the top 8 pixels and the bottom 8 pixels almost exactly
So. SNES. Therefore 170.5 bytes of DMA per scanline.
6KB per frame means ≈35 scanlines of blanking, meaning you're already talking about NTSC SNES in the 224-line mode.
I haven't seen an NTSC CRT TV manufactured since the late '80s that displayed meaningfully fewer than 224 lines of picture.
Does the SNES not bother outputting 240 whole lines? Huh.
The SNES can either output 240 lines (which nearly no game does) or 224 lines (99% of games). Many games, already using 224 lines, will use forced blanking to be able to upload more data. I want to use 224 lines, and then force blank out the top 8 and the bottom 8 for 208 lines total.
The SNES emits either 224 or 239 (yes, really) scanlines.
I'm really skeptical of Espozo's assertion that "many" or, for that matter, more than a tiny handful used forced blanking to reduce the vertical height from 224 scanlines.
lidnariq wrote:
I'm really skeptical of Espozo's assertion that "many" or, for that matter, more than a tiny handful used forced blanking to reduce the vertical height from 224 scanlines.
Agreed. Also, it'd be interesting to know which of those so-called 1% of games use the 240-line mode.
It's actually 165.5 bytes (DMA cycles) per line, because of the stupid 5-cycle DRAM refresh.
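The per-line figures above can be reproduced from the NTSC timing. This is a rough sketch built on these assumptions:

- a 1364-master-clock scanline;
- 8 master clocks per DMA byte;
- a 40-master-clock DRAM refresh on each line (the 5 cycles mentioned, at 8 clocks each);
- 262 lines per frame;
- DMA is usable from the first vblank line (225 or 240) through line 261.

It ignores NMI latency, per-transfer setup overhead and the occasional short scanline, so treat the results as upper bounds. The function names are illustrative only.

```python
MASTER_CLOCKS_PER_LINE = 1364   # NTSC scanline length
MASTER_CLOCKS_PER_DMA_BYTE = 8  # general-purpose DMA rate
DRAM_REFRESH_CLOCKS = 40        # refresh stall on every line (5 cycles x 8 clocks)
LINES_PER_FRAME = 262           # NTSC, non-interlaced

def dma_bytes_per_line(include_refresh=True):
    usable = MASTER_CLOCKS_PER_LINE
    if include_refresh:
        usable -= DRAM_REFRESH_CLOCKS
    return usable / MASTER_CLOCKS_PER_DMA_BYTE

def vblank_dma_budget(visible_lines=224, extra_blank_lines=0):
    """Upper bound on DMA bytes per frame outside active display.

    visible_lines: 224 or 239, the two heights the SNES actually outputs
    extra_blank_lines: lines of active display replaced by forced blanking
    """
    blank_lines = LINES_PER_FRAME - (visible_lines + 1) + extra_blank_lines
    return blank_lines * dma_bytes_per_line()

print(dma_bytes_per_line(include_refresh=False))     # 170.5
print(dma_bytes_per_line())                          # 165.5
print(vblank_dma_budget(224))                        # 6123.5
print(vblank_dma_budget(239))                        # 3641.0
print(vblank_dma_budget(224, extra_blank_lines=16))  # 8771.5
```

The 224-line result lines up with the ~6KB figure in this thread, and the 239-line result with the ~3.5KB mentioned for the 240-line mode. Force-blanking 8 lines at the top and 8 at the bottom, as Espozo proposed, adds roughly 2.6KB (16 × 165.5 bytes).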
I think the 240 line mode is only available on PAL systems. I've tried using 240 mode before, nothing happened.
The 239-line mode works. I think Tetris & Dr. Mario use it, but I could be wrong. I know 240p Test Suite uses 224, 239, 448, and 478 line modes.
It's the Sega systems where the bigger-than-224-line mode was broken on NTSC.
I'm highly triggered by the lack of "as much as I set it to display via technician's/configuration mode" poll choice. ;-)
And I'm triggered by manufacturers making service mode instructions hard to find and not exposing picture size controls to the user the way CRT computer monitors used to.
But seriously, I think the intent was to check the box for what kind of overscan you see on a TV set as you received it, because most end users will be using TVs set as they received them.
lidnariq wrote:
I'm really skeptical of Espozo's assertion that "many" or, for that matter, more than a tiny handful used forced blanking to reduce the vertical height from 224 scanlines.
There are quite a few games with suspicious black borders:
Super Mario Kart
Wolfenstein 3D
Out Of This World
Star Ocean (during combat)
Final Fight 1/2/3
Street Fighter 2/Turbo/Super/Alpha 2
Every Super FX game (yes, even Yoshi's Island)
probably more I'm forgetting or unaware of
...not to mention Unholy Night...
I'm pretty sure most games use the default screen height, but it's hardly unknown for games with large sprites or extensive software rendering to need (or at least look like they need) a little more space...
I can't speak for any other games than Final Fight (the original) -- and that's because I actually took the time to examine it.
The game sets forced blank ($2100=$8f) at scanlines 214 and 225. It clears forced blank ($2100=$0f) at scanline 18. I checked with a screenshot: scanlines 0-18 are blanked out (black), as are scanlines 214 through 224. Thus, you only get scanlines 19 through 213 displaying anything.
I simply set a write breakpoint on $2100 (thankfully the debugger is smart enough to know 2100 means "in/for any bank"). Couple frames just to demonstrate:
Code:
Breakpoint 0 hit (31).
0088c7 sta $2100 [002100] A:008f X:0006 Y:0050 S:1fe4 D:0000 DB:00 NvMxdIzc V:214 H:319 F:26
Breakpoint 0 hit (32).
0088c7 sta $2100 [002100] A:008f X:0004 Y:0040 S:1fe6 D:0000 DB:00 NvMxdIzc V:225 H:193 F:26
Breakpoint 0 hit (33).
008817 sta $002100 [002100] A:000f X:0002 Y:30a4 S:1e24 D:0000 DB:00 nvMxdIzc V: 18 H:298 F:27
Breakpoint 0 hit (34).
0088c7 sta $2100 [002100] A:008f X:0006 Y:0010 S:1fe4 D:0000 DB:00 NvMxdIzc V:214 H:310 F:27
Breakpoint 0 hit (35).
0088c7 sta $2100 [002100] A:008f X:0004 Y:0000 S:1fe6 D:0000 DB:00 NVMxdIzc V:225 H:197 F:27
Breakpoint 0 hit (36).
008817 sta $002100 [002100] A:000f X:0002 Y:30a4 S:1e24 D:0000 DB:00 nvMxdIzc V: 18 H:293 F:28
Here's the relevant code for both points. $88c2-88c7 says it all -- Capcom sets this intentionally.
Code:
0088c2 lda $00e8
0088c5 ora #$80
0088c7 sta $2100
008813 lda $0000e8
008817 sta $002100
And for fun, I decided to nop out the ora #$80 just to see what the effect was. I expected bad graphical corruption. Result: backgrounds were fine, barring the very bottom (looks like more or less what I'd expect), but also sprites were basically MIA (all invisible/non-rendered). Going through things frame by frame, I could see the player and enemies basically "undrawing" a frame at a time (hard to put into words). I've attached a screenshot of the full results of that. I should've tried forcing rendering on ($2100=$0f) at scanline 0 or scanline 1, just to see what the results were at the top of the screen. I imagine just more corruption.
I get the strong impression they did this because they truly did need the extra CPU time for general processing.
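To put a rough number on what that blanking frees up, here is a sketch. It assumes the blanked ranges read off the log above: lines 1 to 18 and 214 to 224 of the normal 224-line display (line 0 is never shown anyway). It also assumes roughly 165.5 bytes of DMA per NTSC scanline after DRAM refresh. The helper name is illustrative.

```python
DMA_BYTES_PER_LINE = 165.5  # NTSC, after DRAM refresh

def blanked_lines(ranges):
    """Count scanlines covered by inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in ranges)

# Forced-blank ranges inside the normal 1..224 display, per the breakpoint log
final_fight_ranges = [(1, 18), (214, 224)]

extra = blanked_lines(final_fight_ranges)
print(extra)                       # 29
print(224 - extra)                 # 195 lines actually shown (19..213)
print(extra * DMA_BYTES_PER_LINE)  # 4799.5 extra bytes of VRAM/OAM access time per frame
```

Forced blanking doesn't make the CPU any faster, so this ~4.8KB is only the extra window for VRAM/OAM access. Whether that time actually went to uploads or to something else isn't something this arithmetic can settle.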
Street Fighter 2 uses the bottom bar to update OAM, so Final Fight probably does the same.
Which games get past the DMA limit through alternating DMA priorities?
-DKC
-DKC 2
-DKC 3
-Alisha's Adventure (my game)
Haunted: Halloween '85 gets past the NES's counterpart of the DMA limit by double-buffering all six actors' sprite cels. When a frame of animation is displayed, it guesses the next frame based on that frame. Then each actor's CHR RAM uploader is active for one or two frames, transferring 128 bytes per frame, until the next frame is ready. In predictable cases, such as a run cycle or subsequent frames of an attack combo, this completely hides lag. Though some actions are less predictable, such as the player jumping or skidding or the first frame of an attack combo, the fact that all other sprites' cels are predicted means you end up with only one extra frame of lag most of the time.
I wish they had done the DKC trick for beat 'em ups. They would've had 2 players and 6 enemies with little to no forced blanking.
psycopathicteen wrote:
I wish they had done the DKC trick for beat 'em ups. They would've had 2 players and 6 enemies with little to no forced blanking.
Assuming you limited each character to 2k, you'd have nothing left in the sprite segments of vram for anything else (items on the ground, weapons, etc).
That's why you need to do my glorious cpu-hungry dynamic vram allocation scheme.
I find people having trouble fitting all their sprite data into 16KB when they're using 8x8 and 16x16 sprites a little ridiculous. I mean, shoot, every sprite could have its own 16x16 slot, although that would be terrible for uploading tile data.
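Here is the arithmetic behind the one-slot-per-sprite remark. A 16x16 sprite at 4 bits per pixel is four 8x8 tiles of 32 bytes each, OAM holds 128 sprites, and sprite VRAM is two 8KB name tables. The constants are standard SNES figures; the function name is made up for illustration.

```python
BYTES_PER_TILE_4BPP = 32       # 8x8 pixels x 4 bits
OAM_ENTRIES = 128              # hardware sprite count
SPRITE_VRAM_BYTES = 16 * 1024  # two 8KB OBJ name tables

def sprite_bytes(width, height):
    """Tile data size of one 4bpp sprite of the given pixel size."""
    tiles = (width // 8) * (height // 8)
    return tiles * BYTES_PER_TILE_4BPP

per_slot = sprite_bytes(16, 16)
print(per_slot)                                     # 128
print(per_slot * OAM_ENTRIES == SPRITE_VRAM_BYTES)  # True
print(SPRITE_VRAM_BYTES // 2048)                    # 8 characters at 2KB each fill it
```

So a dedicated 16x16 slot for every OAM entry exactly fills sprite VRAM. The same sum explains the 2KB-per-character objection: eight such characters leave nothing over.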
Espozo wrote:
I mean, shoot, every sprite could have its own 16x16 slot, although that would be terrible for uploading tile data.
In that case, you would use the "peeing on the registers" technique.
Code:
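macro dma(n) {
txs //stack points to $4305 (assumes X = $4305)
pei ({dma_bank}+{n}) //low byte = source bank -> $4304, high byte = dma length -> $4305
pei ({dma_address}+{n}) //source address -> $4302 (low), $4303 (high)
ldy.b {dma_destination}+{n}
sty $2116 //VRAM word address
sta $420b //a = $01, start channel 0
}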
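Here is why the macro lands each byte in the right register. PEI pushes a 16-bit value with the high byte going to S and the low byte to S-1, and then S drops by 2. Starting from S = $4305, two PEIs therefore fill $4305/$4304 and then $4303/$4302. The Python model below covers only those stores; the byte values are hypothetical examples.

```python
def pei(memory, s, value):
    """Model of 65816 PEI's stores: push a 16-bit value, high byte first.
    Returns the new stack pointer."""
    memory[s] = (value >> 8) & 0xFF  # high byte at S
    memory[s - 1] = value & 0xFF     # low byte at S-1
    return s - 2

regs = {}
s = 0x4305               # txs with X = $4305
s = pei(regs, s, 0x407E) # hypothetical: bank $7E in the low byte, length $40 in the high byte
s = pei(regs, s, 0x2000) # hypothetical source address $2000

for addr in sorted(regs):
    print(f"${addr:04X} = ${regs[addr]:02X}")
# $4302 = $00   (source address low)
# $4303 = $20   (source address high)
# $4304 = $7E   (source bank)
# $4305 = $40   (byte count low)
```

Note that $4306, the count's high byte, is never written. That works when it is already zero, as it is after the channel completes a transfer, so each transfer has to stay under 256 bytes. Also, because S points into the DMA registers while this runs, an interrupt would push its return state there. Presumably it runs where none can fire, such as inside the NMI handler with IRQs masked.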
Just played around with 240 line mode and it does work for NTSC, just not for every emulator.
If you want to make a game using 256x240 mode, just make sure you do a good job hiding 32x32 sprites. Either make them pop off the top and bottom of the screen, turn them into a pair of 16x16 sprites, hide them behind a status bar, or simply not use 32x32 sprites.
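The reason 32x32 sprites are awkward here is that OBJ Y positions are 8-bit and wrap. A sprite can only be parked completely off-screen if the lines below the picture (256 minus the visible height) are at least as tall as the sprite. Below is a simplified sketch of that check. It ignores any small offset between the OBJ Y value and the scanline the sprite lands on, and the function name is hypothetical.

```python
def can_hide_offscreen(sprite_height, visible_lines):
    """True if some 8-bit OBJ Y puts every row of the sprite below the display.
    Rows wrap at 256, so only the 256 - visible_lines lines below the picture
    are available for parking."""
    return sprite_height <= 256 - visible_lines

for lines in (224, 239):
    print(lines, [h for h in (8, 16, 32, 64) if can_hide_offscreen(h, lines)])
# 224 [8, 16, 32]
# 239 [8, 16]
```

In the 239-line mode only about 17 lines remain below the picture. A 32x32 sprite parked there wraps around and shows its lower part at the top of the screen.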
Goes to show how widely it was used... Surprisingly, of all emulators, I remember ZSNES having it. That's what I used to use before I knew better. I had a giant list of ROMs I got, and I randomly went clicking around (which is actually how I discovered R-Type III) and saw this crappy James Bond Jr. game, and the window expanded vertically a little (224 to 240) when it loaded. I guess the reason this wasn't really used is because the amount of bandwidth with 224 rows is already bad enough, and whatever you're going to gain is going to be off-screen. Now that I think about it, NES games are cut off more than SNES games on my TV. It cuts off the top 8, and then the bottom 24 pixels if I remember correctly.
With 240 mode, you only have about 3.5kB of DMA total, with OAM about 3kB left, and for a lot of small sprite updates only 2kB. I guess, if you have 12 characters, about 1kB in size, animated at 10fps, you can just make it. Or if you're using larger, less flexible slot sizes, at 15fps. I know that if I change the limit in Alisha's Adventure, from 4kB to 2kB, some sprite animation would even get stuck because some of the animation frames are bigger than 2kB, and would have to be trimmed/optimized to fit.
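The 12-character estimate works out as follows. Each character uploads a new cel every 60/fps frames, so the average cost per frame is characters × cel size × fps / 60. This is a minimal sketch, assuming 60 frames per second and uploads staggered evenly across frames; the function is hypothetical.

```python
def avg_upload_per_frame(characters, cel_bytes, anim_fps, frame_rate=60):
    """Average sprite-tile DMA per frame if each character uploads one new
    cel every frame_rate/anim_fps frames, staggered evenly across frames."""
    return characters * cel_bytes * anim_fps / frame_rate

print(avg_upload_per_frame(12, 1024, 10))  # 2048.0
print(avg_upload_per_frame(12, 1024, 15))  # 3072.0
```

At 10fps, the 2048 bytes per frame sit right at the ~2kB left for many small sprite updates. At 15fps the same characters need about 3kB, which is roughly the budget left with larger slots (fewer, bigger DMAs). A single cel bigger than the per-frame budget can never go out in one frame, which is the stuck-animation case described.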
You know, I thought of a few optimizations for my tile uploading routine, and I figured I'd post them here. Here's the complete code that I'll talk about:
Code:
.proc tile_uploader
sep #$10
rep #$20
lda #$4300
tcd
lda #$1801 ;Set DMA mode (word, normal increment) and destination register (VRAM write register)
sta $00
sta $10
sta $20
sta $30
ldy #$80
sty a:$2115
lda a:TileRequestCounter16x16
beq tile_uploader_32x32
ldx #$00
tile_uploader_16x16_loop:
lda #$0040
sta $05
sta $15
;16x16 Top Half
lda a:TileRequest16x16LoWordTable,x
sta $02
clc
adc #$0040
sta $12
lda a:TileRequest16x16BankByteTable,x
tay
sty $04
sty $14
lda a:TileRequest16x16VramAddressTable,x
sta a:$2116
ldy #$01 ;Initiate DMA transfer (channel 0)
sty a:$420B
;16x16 Bottom Half
clc
adc #$0100
sta a:$2116
ldy #$02 ;Initiate DMA transfer (channel 1)
sty a:$420B
inx
inx
beq tile_uploader_done
cpx a:TileRequestCounter16x16
bne tile_uploader_16x16_loop
tile_uploader_32x32:
lda a:TileRequestCounter32x32
beq tile_uploader_done
ldx #$00
tile_uploader_32x32_loop:
lda #$0080
sta $05
sta $15
sta $25
sta $35
;32x32 Top Part
lda a:TileRequest32x32LoWordTable,x
sta $02
clc
adc #$0080
sta $12
adc #$0080
sta $22
adc #$0080
sta $32
lda a:TileRequest32x32BankByteTable,x
tay
sty $04
sty $14
sty $24
sty $34
lda a:TileRequest32x32VramAddressTable,x
sta a:$2116
ldy #$01 ;Initiate DMA transfer (channel 0)
sty a:$420B
;32x32 Upper Middle Part
clc
adc #$0100
sta a:$2116
ldy #$02 ;Initiate DMA transfer (channel 1)
sty a:$420B
;32x32 Lower Middle Part
adc #$0100
sta a:$2116
ldy #$04 ;Initiate DMA transfer (channel 2)
sty a:$420B
;32x32 Bottom Part
adc #$0100
sta a:$2116
ldy #$08 ;Initiate DMA transfer (channel 3)
sty a:$420B
inx
inx
cpx #$40
beq tile_uploader_done
cpx a:TileRequestCounter32x32
bne tile_uploader_32x32_loop
tile_uploader_done:
lda #$0000
tcd
stz TileRequestCounter16x16
stz TileRequestCounter32x32
rts
.endproc
The main thing I've done is set the direct page to $4300, because that's the 256-byte area I'm interacting with the most, and because it's page-aligned, each direct-page access takes one cycle less. I'm using Y for loads and stores when I can, which is faster because it's only 8-bit. And finally, I only use "clc" once, before the first of a chain of additions, because the sum should never overflow: a graphic won't be split across two different banks. I'm sure you've already done these optimizations, though; I'm still learning.
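The single CLC works because each ADC's carry feeds into the next one. The sketch below models 16-bit ADC (A + operand + carry) and assumes the same $80-byte step the 32x32 path uses between rows. As long as the cel doesn't cross a bank boundary, the carry stays clear and the chained sums match independent additions. The function names are illustrative.

```python
def adc16(a, operand, carry):
    """16-bit ADC: returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFFFF, int(total > 0xFFFF)

def row_sources(low_word, step=0x80, rows=4):
    """Source address for each DMA channel, using one CLC then chained ADCs."""
    a, carry = low_word, 0  # clc
    sources = [a]
    for _ in range(rows - 1):
        a, carry = adc16(a, step, carry)
        sources.append(a)
    return sources

print([hex(x) for x in row_sources(0x8000)])  # ['0x8000', '0x8080', '0x8100', '0x8180']
print([hex(x) for x in row_sources(0xFF00)])  # ['0xff00', '0xff80', '0x0', '0x81']
```

The second case shows a cel straddling a bank. The address wraps (and the single bank byte written to $x4 would be wrong anyway), and the leftover carry bumps the next address by one.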
One thing I've been dealing with that I should stop worrying about is the vram finding routine. I thought of some extra stuff to speed up finding slots and although I shouldn't need to look through as many slots to find an empty one, each slot is slower. I'll have to see what's faster, but for that I'd have to get something working first...
Edits: I found a lot of bugs that, by chance, just so happened not to affect anything. I just fixed them.
If 16x16 tiles are set up like that, you don't need to add #$0040 to $4302. It leaves off at the end of the dma copy.
Okay. I tested it, and you're right, I'll change the code above. I don't think there's much left you can do to make it faster now.
It only works if it's the same channel.
God, I'm an idiot. I realized I only said it worked because I only ran the 32x32 code that I noticed didn't have the changes done to it yet. Although you don't need to do the addition thing if it's the same channel, you still have to set everything else up again, so I'll pass.
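The exchange above comes down to one behavior. After a transfer, a channel's source address register has moved past the bytes it copied, and its byte count has dropped to zero. Reusing the same channel for the contiguous second half therefore needs only a new count (and a new VRAM address), not a new source. Below is a toy model; the class is hypothetical, not an emulator API, and it ignores the bank byte and 16-bit wraparound.

```python
class DmaChannel:
    """Toy model of one DMA channel's source pointer and byte count."""
    def __init__(self):
        self.source = 0  # source address register (bank omitted)
        self.count = 0   # byte count register

    def run(self, rom):
        """Copy `count` bytes from `rom`, advancing the source as the hardware does."""
        data = rom[self.source:self.source + self.count]
        self.source += self.count
        self.count = 0   # the byte counter is used up
        return data

rom = bytes(range(256))
cel = 0x40  # hypothetical 16x16 cel: 64 top-half bytes, then 64 bottom-half bytes

ch0 = DmaChannel()
ch0.source, ch0.count = cel, 0x40
top = ch0.run(rom)
ch0.count = 0x40  # reload only the count; the source already points at the bottom half
bottom = ch0.run(rom)
print(top == rom[0x40:0x80], bottom == rom[0x80:0xC0])  # True True
```

In the posted 16x16 code, channel 1 never ran the top half, so its source register hasn't advanced and still has to be set. That is why the shortcut only applies when one channel does both halves.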
After careful observation, I'm losing exactly zero pixels on both of my test TVs (1 LCD, 1 CRT), on my US (NTSC) SNES.
That's on all 4 edges.
Oh...
I've come to the conclusion that I'm going to have to use forced blanking after all, as there's no way I'll be able to fit all the updates in the small 6KB window. Although I might be able to pull off updating sprite tiles evenly at 20 to 30fps, there'd be absolutely no room for anything else, and possibly not always room for the sprite tiles either.
How many of you have a TV where you can see a little overscan on the left or right, but none on the top and bottom? My girlfriend's TV does it when we played Super Nintendo games.