Hi.
I'm trying to implement clean 4-way scrolling. Jurassic Park is a good example of this working really well. I'm planning to also hide the top/bottom 8 pixels, but that's not done yet.
I'm getting really close to a bug-free version of it working, but now my code has become kind of a mess.
When I started I was hoping for something clean and symmetric, but since then many things crept in:
- Because of mirroring, vertical and horizontal scrolling have to be treated quite differently, so that almost doubles the code right off the bat.
- Crossing the nametable boundary (or wrapping) adds some complexity.
- Having 16x16 metatiles, but scrolling by adding 8x8 tiles at a time requires the code to properly handle coming from the right/left, top/bottom and adding the individual tiles. Also dealing with the possibility of single tiles on the corners.
- Using the upper left corner as a reference point makes going right/left, up/down ever so slightly different adding more tests/branches.
- Attributes are also super annoying to manage.
Anyone here managed to implement this while keeping the code relatively compact and clean? If so, do you have any advice on what you did to keep this as painless as possible?
-Mat
You will need separate counters for Camera Y position and Nametable Y position to keep things simple.
bleubleu wrote:
I'm trying to implement clean 4-way scrolling. Jurassic Park is a good example of this working really well. I'm planning to also hide the top/bottom 8 pixels, but that's not done yet.
Do note that Jurassic Park blanks scanlines by using blank tiles, rather than forced blanking, so it can still use the MMC3's scanline counter normally. If you have mapper IRQs at your disposal, this is indeed a pretty good solution, but if you don't, timing the blank scanlines right may be harder than it sounds, especially at the bottom of the screen.
Quote:
Because of mirroring, vertical and horizontal scrolling have to be treated quite differently, so that almost doubles the code right off the bat.
This gets better if you use 4-screen instead.
Quote:
Crossing the nametable boundary (or wrapping) adds some complexity.
Having your row/column updates always be composed of 2 parts (even if the length of one of those parts is 0) will make this the norm, instead of an exception.
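A minimal Python sketch of the "always two parts" idea, assuming two nametables side by side (64 tile columns) and a row update of at most 33 tiles. The function name `split_row_update` and the tuple layout are made up for illustration; a real engine would emit PPU addresses rather than tuples.

```python
NT_WIDTH = 32  # tile columns in one nametable

def split_row_update(start_col, length):
    """Split a horizontal run of tiles into exactly two segments.

    start_col: first column, 0..63, across two side-by-side nametables
               (an assumed layout for this sketch).
    length:    number of tiles to write, assumed to be at most 33.
    Returns two (nametable, column, count) tuples. The second count is 0
    when the run does not cross the nametable boundary.
    """
    start_col %= 2 * NT_WIDTH
    nt = start_col // NT_WIDTH
    col = start_col % NT_WIDTH
    first_len = min(length, NT_WIDTH - col)
    second_len = length - first_len
    # The second segment always starts at column 0 of the other nametable.
    return (nt, col, first_len), (nt ^ 1, 0, second_len)

print(split_row_update(20, 33))
print(split_row_update(0, 32))
```

Because the caller always gets two segments, the upload code never has to branch on "did we cross the boundary"; it simply skips a segment whose count is 0.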
Quote:
Having 16x16 metatiles, but scrolling by adding 8x8 tiles at a time requires the code to properly handle coming from the right/left, top/bottom and adding the individual tiles. Also dealing with the possibility of single tiles on the corners.
I update whole metatiles at once, so I don't really have a problem with this. If you're updating only half metatiles because of VRAM bandwidth concerns, I suggest you still do things like if you were updating entire metatiles (i.e. decode entire metatiles from the level map and buffer the tiles in RAM), but upload them to VRAM one half at a time on consecutive vblanks. You may need to pick which half goes first depending on the direction the camera is moving, but that should be easy. I don't really get what the problem with the corners is.
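Roughly what I mean, as a Python sketch for a column update. The `METATILES` table and both function names are hypothetical; the only point is that the whole metatile is decoded once and the two halves are merely uploaded in a direction-dependent order.

```python
# Hypothetical metatile table: id -> (top-left, top-right, bottom-left, bottom-right) tile ids.
METATILES = {
    0: (0x00, 0x01, 0x10, 0x11),
    1: (0x02, 0x03, 0x12, 0x13),
}

def decode_metatile_column(metatile_ids):
    """Decode a column of 16x16 metatiles into two 8-pixel-wide tile columns."""
    left, right = [], []
    for m in metatile_ids:
        tl, tr, bl, br = METATILES[m]
        left += [tl, bl]
        right += [tr, br]
    return left, right

def upload_order(metatile_ids, moving_right):
    """Return the two tile columns in the order they should reach VRAM.

    Moving right, the left half of a new metatile becomes visible first,
    so it is uploaded on the first vblank; moving left, the right half is.
    """
    left, right = decode_metatile_column(metatile_ids)
    return [left, right] if moving_right else [right, left]

first, second = upload_order([0, 1], moving_right=True)
print(first, second)
```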
Quote:
Using the upper left corner as a reference point makes going right/left, up/down ever so slightly different adding more tests/branches.
You just need to add the camera's dimensions to the respective axes and handle wrapping, no big deal. Or keep a separate set of coordinates for the bottom right corner that you update in sync with the top left, so you can just mix and match coordinates as needed depending on where the new tiles are supposed to go.
Quote:
Attributes are also super annoying to manage.
Yup. I usually keep a mirror of the attribute tables in RAM, which I modify in place when necessary, so that later I can just copy entire rows/columns of bytes to VRAM.
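As a rough Python sketch of that RAM mirror (names like `set_metatile_palette` and `attribute_column` are invented here, and only a single nametable's 64-byte attribute table is modeled): each attribute byte covers a 32x32-pixel area, with 2 bits per 16x16 quadrant in the order top-left, top-right, bottom-left, bottom-right from the low bits up.

```python
attr_mirror = bytearray(64)  # RAM copy of one nametable's attribute table

def set_metatile_palette(mirror, mt_x, mt_y, palette):
    """Set the palette of one 16x16 metatile in the mirror, in place.

    mt_x: 0..15, mt_y: 0..14 (metatile coordinates within one nametable).
    Returns the index of the attribute byte that was modified.
    """
    index = (mt_y // 2) * 8 + (mt_x // 2)
    shift = ((mt_y & 1) << 2) | ((mt_x & 1) << 1)  # 0, 2, 4 or 6
    keep = ~(3 << shift) & 0xFF
    mirror[index] = (mirror[index] & keep) | ((palette & 3) << shift)
    return index

def attribute_column(mirror, col):
    """Collect the 8 attribute bytes of one column, ready to copy to VRAM."""
    return bytes(mirror[col::8])

set_metatile_palette(attr_mirror, 0, 0, 3)
set_metatile_palette(attr_mirror, 1, 0, 2)
print(hex(attr_mirror[0]))
```

The read-modify-write happens entirely in RAM, so the vblank code only ever copies finished bytes.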
Quote:
Anyone here managed to implement this while keeping the code relatively compact and clean?
I've done it a couple of times for unfinished projects, and the things you mentioned do indeed have to be actively taken care of. The NES wasn't really designed for free scrolling like this, so things like the attribute tables and mirroring will definitely get in the way if you don't design everything around them from the start.
Quote:
If so, do you have any advice on what you did to keep this as painless as possible?
If you can, ditch the mirroring and go with 4-screen, that'll definitely make things easier. I made the switch to 4-screen in my most recent scrolling engine and I don't regret it. It may seem like the cheap way out, but if you're using CHR-RAM in a discrete logic mapper, 4-screen is basically free since you can't even buy 8KB RAM chips anymore, so you can easily wire the cartridge to use the excess CHR-RAM as NT RAM.
Dwedit's advice is good too. Keeping different sets of Y coordinates (one relative to the level and another relative to the name tables) and updating them in sync will save you from having to convert back and forth between them, which is a pain.
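In Python terms, the two-counter idea could look something like this. The class name and fields are illustrative, and it assumes a single 240-line nametable vertically; with two nametables stacked the wrap value would be 480 instead. Level-boundary clamping is left out.

```python
NT_HEIGHT = 240  # pixel rows in one nametable (30 tile rows); an assumption of this sketch

class VerticalScroll:
    """Tracks the same camera position in two coordinate spaces at once."""

    def __init__(self):
        self.camera_y = 0  # position within the level, in pixels
        self.nt_y = 0      # position within the nametable, always 0..239

    def move(self, dy):
        # Both counters are updated together, so nothing ever has to
        # convert a level coordinate into a nametable coordinate.
        self.camera_y += dy
        self.nt_y = (self.nt_y + dy) % NT_HEIGHT

scroll = VerticalScroll()
scroll.move(250)
print(scroll.camera_y, scroll.nt_y)
```

On the 6502 the modulo would just be an add followed by a compare against 240 and a conditional subtract, which is much cheaper than dividing the camera position every frame.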
Another thing worth mentioning is that there's another technique that can be used to achieve glitch-free 8-way scrolling on the NES, particularly useful if you don't have mapper IRQs at your disposal: use horizontal mirroring and have the PPU hide the leftmost 8 pixels of the screen, while sprites are used to hide the rightmost 8 pixels. Alfred Chicken and Felix the Cat do this. The main drawbacks are the amount of sprites used and the fact that you only get to place 7 more sprites worth of actual game objects per scanline before flickering begins. These 2 games are pretty cool though... Felix the Cat even manages to display pretty large sprites, like vehicles Felix can get in, despite having lost 1 sprite per scanline.
It does get pretty messy no matter what you do.
Here's my Lua prototype implementation of what I was planning to put in my engine:
https://github.com/fo-fo/ngin/blob/mast ... roller.lua. It might give you some ideas.
It's completely generic, supporting any mirroring mode. It can scroll 8 pixels at a time, but multiple calls to the function can be made to scroll more pixels per frame. The API basically consists of scrollHorizontal() and scrollVertical(). Those then call scroll(), which is sort of a generic routine that can scroll in any direction. Then scroll() calls update() and updateAttributes() to fill the PPU update buffer. There are some simplifications for purposes of prototyping, though. For example, map data is read by "random access" (MapData.readTile( mapX, mapY )). In an optimized implementation you would want to avoid recalculating the map data address all the time (but this, too, can get tricky if you implement stuff like metatiles in metatiles...)
If you can, I'd suggest using macros to write the code for just a single scroll direction (e.g., to the right), and then expand that macro for all 4 scroll directions. It might turn into a bit of an if/else mess, but it's still better than duplicating a ton of code. (Just a fair warning: it's very easy to make oversimplifications in the code when considering only one scroll direction...)
It has sort of already been mentioned, but conditionally updating 2 rows and 2 columns any time the nametables need to be overwritten helps with the attribute problem.
thefox wrote:
It does get pretty messy no matter what you do.
That. I couldn't have said it better.
Just plan well from the beginning and it isn't so bad, especially if you use 4-screen.
tokumaru wrote:
If you can, ditch the mirroring and go with 4-screen, that'll definitely make things easier.
That was going to be my advice also. The other thing that can make things easier (in some ways) is using 32x32 metatiles. That way you can write an entire attribute byte when you draw instead of having to mirror attribute ram or read-then-write it.
Quote:
This gets better if you use 4-screen instead.
Quote:
Just plan well from the beginning and it isn't so bad, especially if you use 4-screen.
If for some reason you care about developing a game under the same limitations that actual games faced during the NES' life, do not use "4-screen" unless you absolutely must, as only 3 games did that, period. RAM was expensive back then and wasting it just for ease of coding was unrealistic.
Now if you don't care, then go ahead and use 4-screen, then it sounds like it'd be much easier but it'd feel like cheating.
For "perfect" scrolling with no artifacts you need to blank at least the top 8 scanlines (if using 8x8 sprites), or even the top 16 scanlines (when using 8x16 sprites) in order to avoid sprite pop-up when scrolling vertically. For that reason you'll need to mess with timing and PPU/mapper registers mid-frame for any clean vertical scrolling. So the only con of using vertical mirroring instead of 4-screen is that you need to blank 16 scanlines, and 8 is not an option anymore. (Technically 15 would be enough when using 8x16 sprites, but that doesn't make a great difference.)
gauauu wrote:
The other thing that can make things easier (in some ways) is using 32x32 metatiles. That way you can write an entire attribute byte when you draw instead of having to mirror attribute ram or read-then-write it.
Except that every other screen, vertically, is misaligned with the attribute grid due to name tables being 30 tiles (7.5 attribute bytes) tall. Unless you "cheat" and treat the last row of each screen of the level map as non-existent, simplifying rendering but complicating collision detection.
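The arithmetic behind the misalignment, as a small Python check (the function name `attr_offset` is just for illustration): a screen is 240 pixels tall and an attribute byte covers 32 pixels, so each screen boundary lands half an attribute byte further along.

```python
SCREEN_HEIGHT = 240  # pixels per screen of level map
ATTR_CELL = 32       # pixels covered by one attribute byte

def attr_offset(screen_index):
    """Pixel offset of a screen's top edge from the nearest 32-pixel grid line above it."""
    return (screen_index * SCREEN_HEIGHT) % ATTR_CELL

print(SCREEN_HEIGHT / ATTR_CELL)            # attribute bytes per screen
print([attr_offset(n) for n in range(4)])   # alternates between aligned and half a cell off
```

So 32x32 metatiles line up with the attribute grid on even screens and straddle two attribute bytes on odd ones, unless the level map is treated as 224 or 256 pixels per screen instead.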
Bregalad wrote:
RAM was expensive back then and wasting it just for ease of coding was unrealistic.
Tons of games packed an extra 8KB of WRAM for "ease of coding" though.
Some NES programmers nowadays like to role-play and pretend they are in the 80's, so they won't do things that weren't commonly done back then (they'll hardly give up modern emulators, debuggers and tools though), even if they're cheap/trivial today.
I prefer to think that as technology advanced, the way things were done changed during the years the NES was active. Hardware got cheaper, tools got better, and so on, and it's only natural that development happening after that time continues to change. Each developer will have a different opinion on what's cheating and what's not.
tokumaru wrote:
Except that every other screen, vertically, is misaligned with the attribute grid due to name tables being 30 tiles (7.5 attribute bytes) tall. Unless you "cheat" and treat the last row of each screen of the level map as non-existent, simplifying rendering but complicating collision detection.
Even then, you can work with attribute nybbles of 4x2 tiles instead of having to work with quantities of 2-bit at a time; this is still complicated but less than working with entirely 16x16 metatiles.
Quote:
I prefer to think that as technology advanced, the way things were done changed during the years the NES was active. Hardware got cheaper, tools got better, and so on, and it's only natural that development happening after that time continues to change. Each developer will have a different opinion on what's cheating and what's not.
In this case, why limit yourself to 4 screens? You can just put enough RAM so that the entire level is decoded to VRAM at once, and nametables are just bankswitched in and out as the player scrolls. This makes a lot of sense.
tokumaru wrote:
gauauu wrote:
The other thing that can make things easier (in some ways) is using 32x32 metatiles. That way you can write an entire attribute byte when you draw instead of having to mirror attribute ram or read-then-write it.
Except that every other screen, vertically, is misaligned with the attribute grid due to name tables being 30 tiles (7.5 attribute bytes) tall. Unless you "cheat" and treat the last row of each screen of the level map as non-existent, simplifying rendering but complicating collision detection.
That's absolutely true. Which is why I qualified the "some ways" -- attributes are easier. Other factors become issues instead.
(I "cheat" in my game, and skip the last 16 pixels of each room. It does complicate things in other spots)
Bregalad wrote:
You can just put enough RAM so that the entire level is decoded to VRAM at once, and nametables are just bankswitched in and out as the player scrolls.
I wouldn't do this myself, but if someone decided that this was worth the trouble of creating a new mapper (including modifying emulators for testing), I wouldn't think this was a bad idea. It's pretty cool, actually.
Like I said, everyone draws the line somewhere. Creating mappers is a little beyond my skillset, so I'd rather use what's readily available, but I won't restrict myself to the configurations that were common because that's supposedly more "authentic". We're not even talking about something new here, 4-screen WAS used back then, just not by many games.
tokumaru wrote:
Bregalad wrote:
You can just put enough RAM so that the entire level is decoded to VRAM at once, and nametables are just bankswitched in and out as the player scrolls.
I wouldn't do this myself, but if someone decided that this was worth the trouble of creating a new mapper (including modifying emulators for testing), I wouldn't think this was a bad idea. It's pretty cool, actually.
It's a fun idea for sure... could even be prototyped on PowerPak with its 512 KB PPU-mappable RAM.
Another option uses a mapper with a timer (e.g. MMC3) and vertical mirroring.
Code:
$2000                            $2400
 ___:___:___:___:___:___:___:___+___:___:___:___:___:___:___:___
|                                                               |
|  Playfield 512x224            .                               |
|                                                               |
|                               .                               |
|                                                               |
|                               .                               |
|                                                               |
|                               .                               |
|                                                               |
|                               .                               |
|                                                               |
|                               .                               |
|                                                               |
|_______________________________._______________________________|
|_Status bar 256x16_____________|_Blank area_256x16_____________|
Rows 0-27 are the playfield: 512x224 pixels, 64x28 tiles, or 16x7 attribute bytes.
Rows 28-29 are things that appear above the playfield.
Divide the screen into 4 horizontal strips from top to bottom:
- A 16-pixel-high blank strip near the top, drawn using the bottom right 256x16 of the map
- A 16-pixel-high status bar, drawn using the bottom left 256x16 of the map
- The variable-height part of your playfield from the current scroll position to the end of row 27
- The remainder of your playfield starting at row 0
The screen then looks like this:
Code:
 ___:___:___:___:___:___:___:___
|_Blank area_256x16_____________|
|_Status bar 256x16_____________|
|                               |
| 256x208 chunk of playfield    |
|                               |
|                               |
|                               |
| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |
|^^^ Skip status after IRQ      |
|                               |
|                               |
|                               |
|                               |
|                               |
|_______________________________|
You set an IRQ such that when the scroll is about to reach row 28/line 224 of the tilemap, where the status bar is, the scroll position is reset to the top of the tilemap.
This way, you can update 2 entire rows (16 pixels) of the map at a time without artifacts and without having to shift attribute nibbles, though you may need to combine nibbles from two different 32x32 pixel metatiles at the seam. Adding an additional strip of blank area at the bottom reduces visible area to 256x192, eliminating even that.
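To make the IRQ placement concrete, here is a hedged Python sketch of the arithmetic only. It assumes the layout described above (16 blank lines plus a 16-line status bar on top, then a 208-line window onto a 224-line playfield) and a made-up function name; actually programming the MMC3 counter and the earlier splits for the two top strips are not modeled.

```python
PLAYFIELD_HEIGHT = 224              # tilemap rows 0-27, in pixels
HEADER_HEIGHT = 32                  # blank strip + status bar at the top of the screen
VIEW_HEIGHT = 240 - HEADER_HEIGHT   # 208 visible playfield lines

def wrap_split_scanline(scroll_y):
    """Screen scanline at which the scroll must be reset to tilemap line 0.

    scroll_y: tilemap line shown at the top of the playfield window.
    Returns None when the window ends before reaching tilemap line 224,
    in which case no wrap IRQ is needed this frame.
    """
    scroll_y %= PLAYFIELD_HEIGHT
    lines_before_wrap = PLAYFIELD_HEIGHT - scroll_y
    if lines_before_wrap >= VIEW_HEIGHT:
        return None
    return HEADER_HEIGHT + lines_before_wrap

for y in (0, 16, 17, 100, 223):
    print(y, wrap_split_scanline(y))
```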
tokumaru wrote:
Tons of games packed an extra 8KB of WRAM for "ease of coding" though.
That's ... not true?
NesCartDB says 271 out of 1385 games had WRAM. Of those, 61 did NOT have a battery. It's not as extremely rare as the set that had 4-screen nametables, but it's still quite rare. Space for extra program state is also conspicuously more useful and can solve more problems than 4-screen nametables, for the same monetary cost.
thefox wrote:
It's a fun idea for sure... could even be prototyped on PowerPak with its 512 KB PPU-mappable RAM.
Mappers with ROM nametables (VRC6, N163) would already let you play with this using contemporary hardware, albeit at the loss of compression.
4-screen might not have been common, but how is Gauntlet "cheating"? (Other than its publisher falling from Nintendo's grace, that is.)
To me, the NES is mostly an interesting interface providing a common centerpoint of expressions and experiences for its associated software. Every board except a NROM could in some way be considered as "cheating", be it battery backed ram, program bank switching, irq timers, or making use of that empty memory range for 1 or 2 more nametables, or some fill mode functionality, but the thing is this was always intended in order to extend the life of the console. To me, that's exactly what homebrewers are effectively doing: extending its life some more. You don't necessarily need to come up with new hardware tricks to do so, but.. i think anything goes.
If someone came up with a game + pcb that used nametable bankswitching to produce a virtually larger nametable space, i'd play that and be intrigued.
Why is Gauntlet cheating?
Because 4-screen makes the problem easy to solve. And we all know that something is only worth doing if it's terribly complicated.
Only half sarcastic.
Why climb a mountain when you can use the escalator?
One argument that occurs to me is that Gauntlet literally just runs in a 64x60 world, the entire level is just stored in the nametable RAM, and there's no dynamic updates for scrolling.
It feels like the least interesting thing you could possibly do with 4-screen nametables while still technically using all of them.
One side effect of that is that you can have a larger 'naturally occurring' wraparound of 2x2 screens' worth rather than 1x1, 2x1 or 1x2. Of course with software implementations you can have a wrap-around as big as you want if that's a feature of your design.
Would it be cheating to prerecord some PCM quality Audio and put a tiny mp3 player inside the cartridge, and feed it to some expansion audio port, to play independent of the game code?
We could do all sorts of crazy things these days and get it to work on a real NES.
I suppose the question is SHOULD we do things that most NES games didn't do?
dougeff wrote:
Would it be cheating to prerecord some PCM quality Audio and put a tiny mp3 player inside the cartridge, and feed it to some expansion audio port, to play independent of the game code?
IMO, yes.
But I'd say no if the buttons on the MP3 player were controlled by the 2A03.
By "independent" I meant the CPU wouldn't have to waste any time processing the audio, just triggering it to play.
I suppose in that scenario, you would still have the 2A03 to play sound effects.
In the case of an mp3 player, that'd pass one threshold i think is rather significant: the sound isn't processed in any way by the NES. It's just passively mixed. A side-point to that is it could be done with or without the internals; and both methods require some tampering. But on the other hand, the triggering of sounds is still some level of interaction between the unit, the player module, and the user.
(some comments on the last item were made just as i hit post)
Edit: expansion sound synths are also just passively mixed.
lidnariq wrote:
dougeff wrote:
Would it be cheating to prerecord some PCM quality Audio and put a tiny mp3 player inside the cartridge
I'd say no if the buttons on the MP3 player were controlled by the 2A03.
And it's especially not cheating if the MP3-playing MCU presents an interface to the 2A03 resembling that of the mapper of Moero Pro Yakyuu (the Japanese version of Bases Loaded).
I mean, an MP3 player isn't particularly different from streaming Red Book audio, and that's basically half of what's supported by the PCEngine CD, SegaCD, or MSU-1. Sure, the Famicom was the previous generation, but I don't think that's a big enough difference.
It's more that mp3 files are a symbolic representation of the period between the mid-90s and most of the 00s. So some might experience a cultural dissonance stemming from associations and categorizations. Which is kind of where i think the question of whether 4-screen is legit or not stems from, too.
dougeff wrote:
Would it be cheating to prerecord some PCM quality Audio and put a tiny mp3 player inside the cartridge, and feed it to some expansion audio port, to play independent of the game code?
It would be a little lame though since it wouldn't work without modifying the console. (Or without an audio passthrough dongle... and even with that not on NES-101.)
dougeff wrote:
Would it be cheating to prerecord some PCM quality Audio and put a tiny mp3 player inside the cartridge, and feed it to some expansion audio port, to play independent of the game code?
We could do all sorts of crazy things these days and get it to work on a real NES.
I suppose the question is SHOULD we do things that most NES games didn't do?
Asking "is it cheating" is inherently the wrong question. There are no "rules" for hobby homebrew games (other than if you consider legal rules about copyright). The question comes down to: What do YOU want to do?
Some people care about making (or buying, or playing) games that use only the technology available at nes release (NROM?)
Some people care about games that use technology that was common (or at least available) at the time.
Others just want to push their nes to do whatever they can make it do.
None of these goals are invalid, just different niches of the hobby, or different goals. I may not be interested in some goofy mp3 player cart, or an ARM chip that runs the game logic separately and shoves data into CHR-RAM, but that doesn't make those things somehow less worthy to exist. This is a hobby, and people can make what they want without being criticized for cheating.
Now that I got that off my chest, my preference for my own games, and what I find interesting, is: Was this technology available during the commercial life of the system?
4-screen wasn't common, but yes, it existed.
"Happened ever" isn't exactly the same as "was seen with any significant frequency"
"Happened 3-4 times in an entire library of 1300-2000 games" is awfully rare. More games used ROM nametables than had 4-screen RAM nametables, and we don't talk about them anywhere near as often.
On the other hand, Gauntlet was a smash hit in the arcades and influenced games like Druid. It probably had influence on Diablo as well even if Rogue is cited as David Brevik's personal source of inspiration (he wanted it to be turn based, but there was a group decision against it). If you look past procedural map generation, i think Diablo has more in common with Gauntlet (multiplayer cooperation, swarming enemies, realtime action). Anyway, it has a prominent cultural significance. I'm not sure i can say the same most of the time if i single out a game from the lower half of the NES library (which i feel like no one wanted to remember up until reviewing disappointing retro games became a youtube phenomenon).
The nes adaptation was very decent, even if it had to butcher the smoothness of enemy movement and some other things.
I think Rad Racer II is of less significance, but at least it's a follow-up to a very strong title in its day.
Anyway.. rare or not, if 4-screen helps me or somebody i collaborate with putting out a game that people can enjoy, that's all the motivation i need to use it. And one of the most financially project-friendly pcb:s out there has it, so if you're using that pcb, you might as well use its features whenever it might be called for.
Yeah, modern hardware that gives 4-screen (i.e. GTROM or INL's 4-screen variant of UNROM512) for incrementally "free" is a very different calculus.
Don't forget the four Sachen games (Jurassic Boy II, Zhongguo Daheng, Rocman X, Street Heroes) as part of the 4-Screen universe. Together with Gauntlet, Rad Racer and Napoleon Senki, that's an astounding seven games!
I had been trying for a while to hack Super Mario Bros. 3 to use four-screen mirroring but gave up trying. Now that's what I would call a worthy improvement hack.
Unfortunately, four-screen mirroring does not solve the problem of sprites suddenly appearing at the left and upper edges. I've been wondering how cycle-intensive it would be for a CHR-RAM game to automatically shift the pattern data to provide a clean cutoff without hard-clipping left eight screen pixels completely. I usually play in a window, where I find such clipping quite objectionable.
FWIW, the VS Uni/Dualsystem also has 4-screen nametable memory.
The original poster didn't say whether they wanted a status bar or not. I bet they also didn't expect this to turn into a multi-page philosophical discussion either, heheh. Welcome to the forum, bleubleu.
I bring up the status bar because I made the nametables banked on GTROM to make that easier. You can put the status bar in its own nametable bank, so the main screen can scroll freely.
Memblers wrote:
FWIW, the VS Uni/Dualsystem also has 4-screen nametable memory.
I'm not certain how many of those games really take advantage of all four screens; a bunch (especially ports from NES games) seem to treat it as "simultaneously your choice of H or V"
Quote:
Unfortunately, four-screen mirroring does not solve the problem of sprites suddenly appearing at the left and upper edges.
off the top of my head...
The left edge problem can be fixed by turning the left 8 pixels off.
The top edge could be fixed with an 8 Sprite limit (using 8 blank sprites) on the top 8 pixels, and keeping BG rendering off for the top 8 pixels. Perhaps 9, since sprites are 1 pixel lower.
But, old TVs would hide the top 8+ pixels anyway, so most NES games didn't care about it.
dougeff wrote:
Why climb a mountain when you can use the escalator?
Because you enjoy climbing mountains, and don't want to become overweight and have a heart attack in the next 10 years?
Quote:
"Happened 3-4 times in an entire library of 1300-2000 games" is awfully rare. More games used ROM nametables than had 4-screen RAM nametables, and we don't talk about them anywhere near as often.
Exactly my point. If 4-screen is really useful I'd advocate going ahead and using it. However I think it's really lame to use it if it's not actually needed and it's just because it's easier to code scrolling engine, as Tokumaru advocates.
dougeff wrote:
The left edge problem can be fixed by turning the left 8 pixels off.
That's not a solution but a cop-out. I don't want anything clipped. The whole point of considering the use of four-screen mirroring is to have four-way scrolling without having to chop off a part of the screen completely. Four-screen mirroring will be completely useless if one then still has to chop off a part of the screen because of the NES' inability to properly handle negative sprite X/Y positions.
4 screen mirroring would still get rid of the glitch colors you see in SMB3 and Kirby's Adventure.
Quote:
I don't want anything clipped.
Only solution i can think of that prevents sprites from wrapping around because of being scrolled out that doesn't disable the leftmost column is to check if OAMbuffer+OAMstruct::xpos,x is > #248 and if so move them off-screen on the y axis. It means sprites will snap in/out to the far right, but it can be considered cleaner.
For wider metasprites, you may also want to do some checks to the left with the conditionals based on the entities' xpos and xradius so that you can conditionally "cancel" tiles that have wrapped all the way around.
It helps make a simpler algorithm if all entities are either center-aligned or left-aligned, xpos-wise. Don't mix anchorpoints.
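A small Python sketch of the snap-out pass, run over the OAM buffer after metasprites are written and before the OAM DMA. It assumes the standard 4-byte OAM entry layout (Y, tile, attributes, X); the function name and the default threshold parameter are only illustrative.

```python
OFFSCREEN_Y = 0xFF  # Y values $EF-$FF place a sprite below the visible picture

def hide_right_edge_sprites(oam, threshold=248):
    """Move sprites whose X exceeds the threshold off-screen on the Y axis.

    oam: bytearray of 4-byte entries laid out as Y, tile, attributes, X.
    Sprites at X > 248 would extend past the right edge, so they are
    hidden entirely instead of being drawn partially.
    """
    for i in range(0, len(oam), 4):
        if oam[i + 3] > threshold:
            oam[i] = OFFSCREEN_Y
    return oam

buf = bytearray([100, 1, 0, 250,   # too far right: gets hidden
                 50, 2, 0, 248])   # still fully on screen: untouched
hide_right_edge_sprites(buf)
print(list(buf))
```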
dougeff wrote:
But, old TVs would hide the top 8+ pixels anyway, so most NES games didn't care about it.
Not on PAL systems, unfortunately. That is why I prefer to try and keep any scroll glitches on the sides. But sprite pop-in is hard to hide.
NewRisingSun wrote:
I don't want anything clipped. The whole point of considering the use of four-screen mirroring is to have four-way scrolling without having to chop off a part of the screen completely. Four-screen mirroring will be completely useless if one then still has to chop off a part of the screen because of the NES' inability to properly handle negative sprite X/Y positions.
That was exactly my point, and since you HAVE to disable both the left and the top 8 scanlines to ever hope to scroll sprites smoothly on the NES screen (this is a hardware limitation), this makes 4-screen mirroring almost entirely useless.
The only way you can have smooth scrolling without sprite pop-up AND not having to "hide" any part of the screen is to scroll in 8-pixel increments. For some cases it might work (Famicom Wars or Dragon Quest's dungeons come to mind), but in the general case this is going to look worse than side screen artifacts and/or screen clipping in my opinion.
Quote:
4 screen mirroring would still get rid of the glitch colors you see in SMB3 and Kirby's Adventure.
So will "vertical mirroring".
well, there's still a span of different expressions how artifacts and/or safety measures manifest themselves. Each to their own preference. It's a judgement call game-for-game, from one designer to another.
Quote:
and since you HAVE to disable both the left
You don't *have* to if you widen the range of acceptable ideals a little. You have the somewhat combinable choices of:
-Column disabling
-Sprite overflow
-Prioritize background over sprites (perhaps conditionally) & have the status bar be all-solid or mostly solid.
-Snapping troublesome hardware sprites out of view. This is done conditionally & piecemeal to the buffer between the 1st write pass (when metasprites are written to the buffer) and the oamdma upload. This causes virtual cancellation on the rightmost column as a tradeoff.
-Accepting some wraparound
-(Maybe something else i don't know about?)
Each is fine. The OP asked for techniques for making an all-directional scroller simpler, but not necessarily more compliant to a specific ideal other than preventing nametable/attribute glitches as i understood it. 4-screen mode means you have a treadmill larger than a screen in both axes simultaneously, so that certainly helps.
Quote:
And since you HAVE to disable both the left and the top 8 scanlines to ever hope to scroll sprites smoothly on the NES screen (this is a hardware limitation), this makes 4-screen mirroring almost entirely useless.
Not really. If you have CHR pattern memory to spare, you can keep shifted versions of each sprite to use at the left and upper edges. So if your two-horizontal-sprites object (with anchor point in its top-left corner) is at position X=0, you put sprite A-normal at X=0 and sprite B-normal at X=8, but if the object is at position X=-2, you put sprite A-shift2 at X=0 and sprite B-normal at X=6. Same thing for the upper edge.
Of course, you'd have to prepare each sprite eight by eight times for all possible shift positions in CHR-ROM memory if you want one-pixel granularity, which is why I was wondering how CPU-intensive it would be for a CHR-RAM game to create these shifted sprites on-the-fly as they are needed. I could also imagine a custom mapper hardware monitoring writes to RAM at $0200 to detect sprite data that needs to be shifted (for example, by setting one of the unused bits in OAM Byte 2), and automatically providing shifted CHR pattern data when the PPU requests them.
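To illustrate the horizontal case in Python (a sketch only: the tile naming scheme "A-shift2" follows the example above, both function names are invented, and only objects less than 8 pixels past the left edge are handled). NES pattern bytes hold one row of 8 pixels per bit plane with bit 7 as the leftmost pixel, so shifting every byte left by n drops the n leftmost pixel columns.

```python
def shift_tile_left(tile, n):
    """Shift an 8x8 2bpp tile n pixels to the left.

    tile: 16 bytes, 8 rows of bit plane 0 followed by 8 rows of bit plane 1.
    Pixels pushed past the left edge are discarded; the vacated columns on
    the right become transparent (colour 0).
    """
    return bytes((b << n) & 0xFF for b in tile)

def place_two_wide(object_x):
    """Pick tiles and screen X positions for a two-sprite-wide object.

    object_x is the (possibly negative) X of the object's top-left corner.
    Assumes -8 < object_x, i.e. at most the first sprite is partly cut off.
    """
    if object_x >= 0:
        return [("A-normal", object_x), ("B-normal", object_x + 8)]
    shift = -object_x
    return [(f"A-shift{shift}", 0), ("B-normal", object_x + 8)]

print(place_two_wide(-2))
print(shift_tile_left(bytes([0b11110000] * 16), 2).hex())
```

Generating the shifted patterns at run time costs 16 shifted bytes per 8x8 tile plus the VRAM upload, which gives at least a lower bound on the CPU and vblank budget for the CHR-RAM variant.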
Or instead of shifting, if you have chr-ram, and have it organized in dedicated chr space slots for entities (like in Solstice), you can mask individual lines of pixels off the contents in each slot in order to mask out any lines causing a visual wraparound. So, the wraparound is still happening, but the player would never know.
The problem with this approach is you likely have to update the CHR in round-robin/cascadewise. Maybe some clever chr bank cycling can be used depending on mapper to either improve the update rate or what slivers of chr space are masked.
But... it's quite the effort for something as trivial as a little bit of wraparound. It's for those of you who want a mountain to climb.
I thought sprites disappear off the side, not wrap around.
Best example i could find in under a minute:
Look at this video:
https://www.youtube.com/watch?v=EEtz3g8_kXs
Play it back at 0.25x and pause-play when the enemy wraps around. Note that the leftmost column is disabled though.
sprites wrap around because the sprite xpos is 8 bits (ie 256 positions) which is exactly the width of the screen area. There are no positions off-screen except on the y axis.
Kid Icarus makes frequent use of this as a feature. Vanilla Metroid could have done it in vertical shafts but didn't in practice.
Games that make sprites disappear when going over the left/right edge do so willfully in software by moving them to a non-visible y-position.
Quote:
sprites wrap around because the sprite xpos is 8 bits (ie 256 positions) which is exactly the width of the screen area. There are no positions off-screen except on the y axis.
That does not explain why a sprite at X=254 would not just be drawn at X=254..255 but also at X=0..5.
psycopathicteen wrote:
not wrap around.
You mentioned wrap-around quite a few times, but actually this does not happen (in hardware) for sprites. It happens for BG on the wrong mirroring axis. Normally any metasprite engine would check individual sprites so that no wrapping around happens, period.
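As a rough illustration of that per-sprite check, here is a small Python sketch (not NES code). It assumes the object position is kept as a signed value wider than 8 bits and that each metasprite part carries an X offset; `visible_parts` is a hypothetical name. Only the X axis is modelled.

```python
def visible_parts(object_x, parts):
    """Return (screen_x, tile) for every part whose X fits in 0..255.

    `parts` is a list of (dx, tile) offsets relative to the object.
    Parts whose X falls outside the 8-bit range are skipped, which on
    real hardware would mean hiding them at an off-screen Y position.
    Truncating such an X to 8 bits instead is what would make a part
    show up on the opposite side of the screen.
    """
    shown = []
    for dx, tile in parts:
        x = object_x + dx
        if 0 <= x <= 255:
            shown.append((x, tile))
    return shown


metasprite = [(0, "A"), (8, "B")]
print(visible_parts(-2, metasprite))   # [(6, 'B')]
print(visible_parts(250, metasprite))  # [(250, 'A')]
```

In the first call part A pops out as soon as it touches the left edge, which is the sprite pop-up discussed in this thread; in the second call part A at X=250 is simply cut off by the right edge of the screen.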
Quote:
Not really. If you have CHR pattern memory to spare, you can keep shifted versions of each sprite to use at the left and upper edges. So if your two-horizontal-sprites object (with anchor point in its top-left corner) is at position X=0, you put sprite A-normal at X=0 and sprite B-normal at X=8, but if the object is at position X=-2, you put sprite A-shift2 at X=0 and sprite B-normal at X=6. Same thing for the upper edge.
Technically this is correct but it'd be so complicated to implement that... it's much simpler to disable the top and left pixels.
Just repeating what Bregalad said.
Sprites do NOT wrap on the NES.
They do wrap on the SNES. Which makes pushing them offscreen downward potentially more problematic, if sprite sizes are set larger than 16x16.
One thing that I will never understand is why someone would program a vertically-scrolling game to use Vertical Mirroring. I'm looking at you, Star Force, Star Soldier, Legendary Wings...
dougeff wrote:
They do wrap on the SNES. Which makes pushing them offscreen downward potentially more problematic, if sprite sizes are set larger than 16x16.
Only if 239-line mode is set and the sprite sizes are 32 and 64. In 224-line mode, y=225 should be safe.
NewRisingSun wrote:
One thing that I will never understand is why someone would program a vertically-scrolling game to use Vertical Mirroring. I'm looking at you, Star Force
Tecmo's Star Force scrolls in all four directions. Its playfield is slightly wider than 1 screen, with the background moving horizontally in the opposite direction of the player's craft. It loads the nametable straight across for the same reason that Super Mario Bros. 3 loads the nametable straight down: not having to manage the complexity of both vertical and horizontal scroll seams.
NewRisingSun wrote:
One thing that I will never understand is why someone would program a vertically-scrolling game to use Vertical Mirroring. I'm looking at you, Star Force, Star Soldier, Legendary Wings...
Because it makes it easy to use the other screen and a sprite-0 split for a status bar or some other non-scrolling element. My nesdev competition games were both vertically-scrolling, and used vertical mirroring (so I could put a status bar at the top of Spacey McRacey, and put spikes at the bottom of Robo-Ninja Climb) But why games like Legendary Wings used it and DIDN'T have a status bar, I don't know.
bregalad wrote:
but actually this does not happen (in hardware) for sprites.[...] Normally any metasprite engine would check individual sprites so that no wrapping around happens, period.
Ah! Thanks for correcting. Then the problem is even less of a problem.. since you can display the background in the leftmost column while hiding sprites.
gauauu wrote:
But why games like Legendary Wings used it and DIDN'T have a status bar, I don't know.
The code might've been copied directly from another project in their repository to save a bit of time.
FrankenGraphics wrote:
Ah! Thanks for correcting. Then the problem is even less of a problem.. since you can display the background in the leftmost column while hiding sprites.
You can, but this is not what I'd call scrolling properly. Many games do not write to $2001 during gameplay so I invite you to use some cheat code to write $1a to $2001 (instead of the normal $1e or $18) during gameplay and see what happens. This looks super weird and is not what I'd call "proper scrolling". $18 is the canonical way to get 100% "proper" sprite scrolling, but in many cases using $1e and having sprites pop up is acceptable.
I must say I find occasional sprite pop-up less annoying than constantly-black left 8 columns, especially if the graphics obviously weren't designed for it. A Boy and His Blob is the perfect example: thanks to the black bar on the left, the entire red border becomes asymmetric. Ugly.
If people don't notice any popping on Kid Icarus, then it's not a big problem.
I don't mind a little sprite popping, but it's annoying when games make entire characters pop.
NewRisingSun wrote:
I must say I find occasional sprite pop-up less annoying than constantly-black left 8 columns, especially if the graphics obviously weren't designed for it. A Boy and His Blob is the perfect example: thanks to the black bar on the left, the entire red border becomes asymmetric. Ugly.
On a real TV, you won't notice any black border, let alone with a real CRT TV. However, if the graphics aren't designed around it, that's another problem. I must say 31 columns is not a very easy number to deal with.
Hi!
First of all, thanks everyone for the advice. I'm not going to reply to everyone but i did read the whole thing. I managed to make the whole thing work, the code is somewhat elegant, i think.
One thing I would recommend to anyone doing that kind of thing is to take a few hours to create yourself a little reference implementation in a language that is a bit more expressive/flexible than ASM. I made myself a little C# control that behaves exactly like a PPU and can show me which tiles/attributes are updated (red = tile, yellow = attribute). It allowed me to figure out what my algorithm was going to be and then I simply translated it to ASM. And when I had bugs in the ASM, I could simply compare and figure out where things went wrong. See attached image. If Mesen could do this, it would be awesome.
One last problem I have is that in extreme conditions, like when going diagonally and being perfectly aligned in X and Y, and being on a frame where a full row and column of tiles AND attribute will load in, i will exceed the NMI cpu cycle limit by about ~400 cycles.
Since I am going to blank the top/bottom 16 scanlines, would it be possible to offload some of the PPU update work there? Like update the palettes there or part of the tiles/attributes? How common is this as a technique?
(I am also aware I could simply change my algorithm to, for example, just process 1 row or column per frame, but im too lazy to change that right now).
-Mat
bleubleu wrote:
One last problem I have is that in extreme conditions, like when going diagonally and being perfectly aligned in X and Y, and being on a frame where a full row and column of tiles AND attribute will load in, i will exceed the NMI cpu cycle limit by about ~400 cycles.
I'm fairly sure it should be possible to fit updates in VBlank, assuming you're talking about ONE row and ONE column of 8x8 tiles (not 16x16 metatiles).
You should use $2000.4 to your advantage when updating the nametable column; when updating an attribute table column this is more limited but you can still use this to your advantage knowing it will skip 3 rows, but you can still use 4 bulks of 2 bytes instead of 8 bulks of 1 byte.
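The "4 bulks of 2 bytes" trick for an attribute column can be checked with a few lines of Python (a sketch, not NES code). It assumes the attribute table of the first nametable at $23C0 with 8 bytes per row; `attribute_column_bulks` is a made-up helper name. With the +32 increment, the second write of each bulk lands 4 attribute rows below the first.

```python
ATTR_BASE = 0x23C0  # attribute table of the nametable at $2000
ATTR_ROW_BYTES = 8  # 8 attribute bytes per attribute row
INCREMENT = 32      # PPU address increment in "column mode"


def attribute_column_bulks(col, base=ATTR_BASE):
    """Return 4 bulks; each is the list of the 2 addresses it writes.

    Bulk k starts at attribute row k and, because the address then jumps
    by 32 bytes (4 rows of 8), its second write hits attribute row k + 4.
    """
    bulks = []
    for row in range(4):
        start = base + row * ATTR_ROW_BYTES + col
        bulks.append([start, start + INCREMENT])
    return bulks


for bulk in attribute_column_bulks(0):
    print([hex(address) for address in bulk])
```

Between them the four bulks touch each of the 8 attribute bytes of the column exactly once, so the update buffer has to be ordered as rows 0, 4, 1, 5, 2, 6, 3, 7 rather than top to bottom.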
So you should have the following:
- Update a nametable row : Done in two bulks (because of vertical mirroring, you need to write to two screens), total of 32 bytes
- Update an attribute table row : Done in one bulk of 8 bytes
- Update a nametable column : Done in one bulk, total of 30 bytes (uses column mode)
- Update an attribute table column : The most annoying, it has to be done in 4 bulks of 2 bytes. (uses column mode)
This means, in the absolute worst case, you have to write new address to $2006 8 times, and write 78 bytes of data to $2007. Assuming 4 cycles for load and 4 cycles for writing to the register, that's 8*(4+4+4+4) + 78*(4+4) = 752 cycles. Of course more cycles are needed for logic, etc... but this should be doable in VBlank without using any further tricks.
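For anyone who wants to play with the numbers, the estimate above can be reproduced in a short Python sketch. The per-instruction costs are the same rough assumption as in the post (4 cycles per load, 4 per register write); logic overhead is deliberately left out.

```python
# (number of $2006 address setups, bytes written to $2007) per update type
UPDATES = {
    "nametable row": (2, 32),     # split across two nametables
    "attribute row": (1, 8),
    "nametable column": (1, 30),  # +32 increment mode
    "attribute column": (4, 8),   # 4 bulks of 2 bytes
}

SET_ADDRESS_CYCLES = 4 + 4 + 4 + 4  # LDA/STA $2006, twice
WRITE_BYTE_CYCLES = 4 + 4           # LDA + STA $2007

address_sets = sum(sets for sets, _ in UPDATES.values())
data_bytes = sum(count for _, count in UPDATES.values())
total = address_sets * SET_ADDRESS_CYCLES + data_bytes * WRITE_BYTE_CYCLES

print(address_sets, data_bytes, total)  # 8 78 752
```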
Quote:
Since I am going to blank the top/bottom 16 scanlines, would it be possible to offload some of the PPU update work there? Like update the palettes there or part of the tiles/attributes? How common is this as a technique?
This technique is uncommon, but was probably made popular by the game Battletoads (and its sequel Battletoads and Double Dragon), which are very popular among NESDevers. Personally, unless I really needed the extra blanking time, I'd rather hide them using either a blank CHR-ROM bank or by disabling the background only and having 8 high priority sprites at Y=0 hiding the real sprites, avoiding Battletoads-style forced blanking on the top of the screen and all the problems this creates.
Also: if you aim at great scrolling you should hide the top scanlines, not the bottom, because sprites can't be shown partially on the screen on the top of the screen, but they can on the bottom. Also turning sprites rendering off during the frame can cause erratic problems.
If you turn rendering off at the top of the screen, as opposed to using blank tiles like Jurassic Park does, you can indeed use that time to keep accessing VRAM, but there are a couple of catches: Firstly, the NTSC dot crawl pattern will be different, because the variable PPU cycle at the beginning of the frame doesn't happen when rendering is off; Secondly, you don't get to use the MMC3 scanline counter to time the blanking area anymore, because it doesn't work when rendering is off. Sprite 0 hits are also not an option.
If you can deal with the slightly different appearance of the image (IIRC, Battletoads is like this, for example), and you have an alternate way to time the blanking area, then yeah, you can get quite a bit of extra vblank time.
Are you using a zero page buffer?
Quote:
You should use $2000.4 to your advantage when updating the nametable column; when updating an attribute table column this is more limited but you can still use this to your advantage knowing it will skip 3 rows, but you can still use 4 bulks of 2 bytes instead of 8 bulks of 1 byte.
Right now I split my stuff into 3 buffers which use different strides: 1, 8 and 32. The 1 and 32 ones use $2000 to avoid having to increment the address manually. The 8 byte one is for attributes and needs to be handled manually.
But you are right, I think I will try to avoid using generic buffers (which need loops/logic) and I will try to unroll them for common update scenarios (like a full column, etc.) in order to minimize the update cost.
Quote:
Are you using a zero page buffer?
No. Right, that should save a few cycles. I will look into that.
Quote:
If you turn rendering off at the top of the screen, as opposed to using blank tiles like Jurassic Park does, you can indeed use that time to keep accessing VRAM, but there are a couple of catche
Thanks. I have a lot to learn...
-Mat
You can also use the stack instead of a zero page buffer. (If you're not.) Then you don't need to do iny or inx (if you are). Just pla, sta $2007 X times.
Wouldn't you need rows of 33 tiles instead of 32?
Bregalad wrote:
I'm fairly sure it should be possible to fit updates in VBlank, assuming you're talking about ONE row and ONE column of 8x8 tiles (not 16x16 metatiles).
You can actually fit a lot in vblank depending on how optimized your code is. My engine can do both a column and a row of metatiles (i.e. 132 tiles) plus their attributes, along with a sprite DMA. I use completely unrolled code (i.e. no index increments or branches, which saves a lot of time) to barely fit this all in standard vblank time, and other types of updates (palettes, patterns, etc.) can only be done when the scrolling isn't taking all the time, but that's OK, because no game will ever scroll diagonally at 16 pixels per frame every frame, so there are plenty of opportunities for other types of updates.
Kasumi wrote:
You can also use the stack instead of a zero page buffer.
The stack is slower, though. That being said, I do find it a bit difficult to take advantage of ZP's faster load time. If you use indexing, the speed advantage is gone (takes the same time as absolute indexed or PLA, which's 4 cycles), so you need unrolled code to load from constant memory locations, but since 8-way scrolling means that rows and columns are nearly always split across 2 name tables, that's not trivial. It can be done, but you have to be clever.
tokumaru wrote:
My engine can do both a column and a row of metatiles (i.e. 132 tiles) plus their attributes, along with a sprite DMA. I use completely unrolled code (i.e. no index increments or branches, which saves a lot of time) to barely fit this all in standard vblank time, and other types of updates (palettes, patterns, etc.) can only be done when the scrolling isn't taking all the time, but that's OK, because no game will ever scroll diagonally at 16 pixels per frame every frame
That infamous hill in Sonic the Hedgehog 2: Chemical Plant Zone act 2 is the exception that proves the rule.
It's a good thing I'm not particularly fond of Chemical Plant Zone so I wouldn't want to design a level like it anyway. Still, full speed on both axes is way too fast, so if at least one of the axes is slightly slower than 16 pixels per frame, maybe 14 or so, there'll still be some opportunities for other types of updates.
Another thing that prevents this from being a huge problem is that when the screen is scrolling that fast, the lack of other updates is much harder for the human eye to notice, and if someone does notice, they'll slow down to look at it and things will immediately go back to normal, and there'll be nothing to see!
If you're using an unrolled loop, how do you jump across name tables?
The unrolled loop has several entry points, that you select based on the amount of tiles to transfer, and by using indexed addressing the index can be manipulated so the correct part of the buffer is read.
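As a rough sketch of where those transfer sizes come from, here is how a 32-tile row splits into runs when it straddles two nametables. This is Python, not NES code, and it assumes vertical mirroring with the nametables at $2000 and $2400 side by side and a starting column counted from 0 to 63 across both; `row_runs` is a hypothetical helper name.

```python
def row_runs(start_col, row):
    """Split one 32-tile row into (ppu_address, length) runs.

    `start_col` is the leftmost column to write, 0..63 across the two
    side-by-side nametables; `row` is the tile row, 0..29.
    """
    nametable = (start_col // 32) & 1
    col = start_col % 32
    runs = [(0x2000 + nametable * 0x400 + row * 32 + col, 32 - col)]
    if col:
        # The rest of the row wraps to column 0 of the other nametable.
        runs.append((0x2000 + (nametable ^ 1) * 0x400 + row * 32, col))
    return runs


for address, length in row_runs(40, 2):
    print(hex(address), length)
```

Each run length would then pick an entry point into the unrolled code, with the buffer index adjusted so the second run continues where the first one stopped.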
bleubleu wrote:
If mesen could do this, it would be awesome.
Not a bad idea, shouldn't be too hard to highlight tile/attribute modifications in the nametable viewer, I think - I'll add it to my list.
All right guys.
Thanks to all your advice I got my NMI running in < 1820 cycles at all times, even with crazy diagonal updates.
I unrolled all column loops, optimized the row (tile/att) updates, moved some stuff on ZP and everything works. My palette update loop wasn't unrolled, and not on ZP... shame on me.
It even simplified the X scrolling algorithm a bit.
Thanks!
-Mat
tokumaru wrote:
I use completely unrolled code (i.e. no index increments or branches, which saves a lot of time)
Most of the time, a partially unrolled loop will do almost just as well as a fully unrolled loop, but without wasting a ridiculous amount of ROM. For example, in a "normal" loop you'd spend half the time doing increment/decrement and compare, and half the time actually transferring data, 50%/50%; this is the worst case. A partially unrolled loop can get you around 20%/80%, while a fully unrolled loop would get you to 0%/100%; the closer you get to fully unrolled, the more ROM you waste for a very marginal time gain.
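Those percentages can be sanity-checked with a tiny Python model. The cycle counts are assumptions for one plausible loop shape, not measurements of anyone's code: 8 cycles per byte (LDA buffer,X + STA $2007), 7 cycles of loop control for a rolled loop (INX, CPX #imm, taken BNE), and 13 cycles per pass when unrolled (TXA, CLC, ADC #imm, TAX, CPX #imm, taken BNE).

```python
TRANSFER_CYCLES_PER_BYTE = 4 + 4  # LDA buffer,X + STA $2007


def overhead_fraction(bytes_per_pass, control_cycles):
    """Share of each loop pass spent on loop control instead of copying."""
    transfer = bytes_per_pass * TRANSFER_CYCLES_PER_BYTE
    return control_cycles / (transfer + control_cycles)


for label, bytes_per_pass, control in [
    ("rolled", 1, 7),
    ("unrolled x4", 4, 13),
    ("unrolled x8", 8, 13),
    ("fully unrolled", 32, 0),
]:
    print(f"{label}: {overhead_fraction(bytes_per_pass, control):.0%} overhead")
```

Under these assumptions the rolled loop spends a bit under half its time on loop control, unrolling by 8 brings that below 20%, and every further doubling buys less.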
Bregalad wrote:
Most of the time, a partially unrolled loop will do almost just as well as a fully unrolled loop
I can assure you I do need every cycle I can get. I have like, 6 or so cycles of vblank time left in the case I mentioned above, where both a column and a row of metatiles need to be updated.
Quote:
...without wasting a ridiculous amount of ROM.
ROM is a cheap resource most of the time, so I'll gladly sacrifice it if that means improvements on aspects that are not as flexible (e.g. RAM and CPU time). Not all kinds of unrolled loops need "ridiculous amounts of ROM" though: 128 bytes (PLA + STA $2007 32 times) is hardly a ridiculous amount of space. I usually put my vblank handlers in a separate PRG bank anyway, so I have plenty of room for unrolled code that'll allow me to make the most out of the limited vblank time.
Yeah, if you have to use fully unrolled loops because partially unrolled doesn't make it, then using the stack with PLA makes the most sense, because it uses much less ROM (even though it's the same speed as a LDA $xxxx,X).
I don't deny that fully unrolled makes sense in some cases, such as yours, but in the majority of cases where a fully rolled loop barely doesn't make it in time, a partially unrolled loop will. I was just pointing that out.
Oh yeah, if what you have is a fully rolled loop, just partially unrolling it should result in a significant speed improvement.
On the NES how many cycles does OAM DMA take? Are you doing any kind of dynamic animation?
According to this wiki article, itself referring to Disch's document:
Quote:
On NTSC, count on being able to copy 160 bytes to nametables or the palette using a moderately unrolled loop, plus one 256-byte display list to OAM
And in the PPU OAM article, there's a more precise specification:
Quote:
Not counting the OAMDMA write tick, the above procedure takes 513 CPU cycles (+1 on odd CPU cycles): first one (or two) idle cycles, and then 256 pairs of alternating read/write cycles. (For comparison, an unrolled LDA/STA loop would usually take four times as long.)
Any dynamic updates are best done to the buffer, rather than individual edits to OAM. If you want to beat OAMDMA you must restrain yourself to updating fewer than 16 sprites (that is, fewer than 64 OAM bytes) per average vblank.
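The break-even point can be worked out in a couple of lines of Python. The 513-cycle DMA figure is from the quote above; the 8 cycles per manually copied byte (a 4-cycle load plus a 4-cycle store) is an assumption about a fully unrolled copy.

```python
OAM_DMA_CYCLES = 513          # from the PPU OAM article quoted above
MANUAL_CYCLES_PER_BYTE = 4 + 4  # assumed: one load + one store per byte
BYTES_PER_SPRITE = 4

break_even_bytes = OAM_DMA_CYCLES // MANUAL_CYCLES_PER_BYTE
break_even_sprites = break_even_bytes // BYTES_PER_SPRITE
full_copy_cycles = 256 * MANUAL_CYCLES_PER_BYTE

print(break_even_bytes, break_even_sprites, full_copy_cycles)  # 64 16 2048
```

The 2048 cycles for a full manual copy is also where the "four times as long" in the quote comes from.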
Dynamic updates as in CHR-RAM updates. Although if I was making an NES game, I'd probably use bankswitched CHR-ROM instead.
Hey guys.
Sorry to resurrect my own thread. I'm doing the black bars at the top and bottom.
Jurassic Park does it by bank switching. One thing I overlooked is that they assume that every palette has a black color at the same location. I think this is a big constraint on colors, so I would like to avoid that. But I'd still like the bars to be black (having them the BG color is easy).
For the bottom one, it's kind of easy. Receive an IRQ, wait until hblank, disable the PPU and do a palette swap real quick and the bottom of the screen goes black. Done. If timed correctly it's 100% clean.
The top is really hard. Almost impossible. Palette swaps are possible, but it messes the scrolling. It can be fixed, but it requires that weird 2006/2005/2005/2006 trick described in that skinny doc. But by the time you swapped the palette and fixed the scrolling, 1-2 lines of garbage will have time to draw. Am I missing something here?
The other way that was mentioned is with a sprite overflow. But I'm not sure it solves anything. I guess I would still have to do a palette swap in that case too.
Is there a 3rd option I am missing?
Thanks!
-Mat
What you're trying to do is way more complicated than what Jurassic Park does. Messing with the palette mid-screen is not for the faint of heart. Disabling/enabling rendering mid-screen can also corrupt sprites in certain cases.
If you're dead set on doing this the harder way just so you can have black borders, I suggest you figure out how "that weird 2006/2005/2005/2006 trick" actually works, otherwise you'll end up in a world of frustration.
Another tip that may help you out, is that when rendering is off and the PPU address register (the one you set via $2006) is pointing to the palette area, the color at that address gets displayed instead of the background color. Combine that with the fact that even though they're never displayed during rendering, color 0 of the last 3 background palettes ($3F04, $3F08 and $3F0C) still exist in memory, so you can have one of them set as black and simply point at it when rendering is off, eliminating the need to change the background color. For the top border, you still have to set the scroll, of course, since the address register will be pointing at the black palette entry. And you'll need to time the top border using something other than MMC3 IRQs if rendering is off.
A sprite overflow would only mask sprites, you'd still have to do something about the background.
Are you sure that the background color isn't good enough for the border? If you settle for that you can just do what JP does and use blank (i.e. color 0) tiles for the border.
You could also use the grayscale and/or color emphasis bits to make the border look different from the background color without much trouble, since those don't affect the scroll.
Actually, I'm a moron. Never mind.
The top/bottom 8px that don't get rendered normally (or cropped by real CRTs) are enough to hide any attribute garbage.
No need for additional black bar, if you code it correctly. Which I am doing right now.
-Mat
bleubleu wrote:
The top/bottom 8px that don't get rendered normally
They absolutely ARE rendered normally.
Quote:
(or cropped by real CRTs)
This is the case of most NTSC sets (although it's not a clean "8 at the top, 8 at the bottom", it can vary a lot depending on the TV), but I've heard that PAL TVs show everything.
Overscan is very close to a clean 8/8 on digital sets, but on analog sets, you're right that it varies.
I've measured a couple TVs in the wild, and you can measure your own with a PowerPak and 240p Test Suite.
I would recommend not assuming that the top and bottom eight scanlines are shown --- so don't put anything important there --- but not assuming that they are not shown either. If you can avoid showing garbage in any part of the screen, then by all means, do avoid it. As a PAL NES owner, I have always borne a grudge against game developers who just nonchalantly tolerate ugly scroll seams at the top and bottom screen edges.
tokumaru wrote:
but I've heard that PAL TVs show everything.
Those extra 50 scanlines in the PAL PPU "vblank" are actually part of the active picture area in a PAL TV, so there's around 25 scanlines of black padding at the top and bottom of the picture. That's why the full 240 scanlines (well, actually 239 in the PAL PPU) are displayed despite most TVs hiding some of the scanlines.
NewRisingSun wrote:
I have always borne a grudge against game developers who just nonchalantly tolerate ugly scroll seams at the top and bottom screen edges.
I agree, and this is >95% of Japanese-developed NTSC games.
NewRisingSun wrote:
I would recommend not assuming that the top and bottom eight scanlines are shown --- so don't put anything important there --- but not assuming that they are not shown either. If you can avoid showing garbage in any part of the screen, then by all means, do avoid it.
How insistent are you about "all means"? Would you accept, for example, a game where the NTSC cartridge costs $40 but the PAL cartridge costs $50 because it contains a more advanced mapper or more RAM on the cartridge, such as an upgrade from UNROM/UOROM to MMC3, from no WRAM to WRAM, or from CHR ROM only to CHR ROM and CHR RAM, solely to implement means of hiding the seam? Would you be willing to wait for the developer to see how many copies a game sells in NTSC markets and only a year later make a PAL version reengineered to hide artifacts if the sales quantity warrants? And if so, how many others like you would be willing to wait and/or pay just to hide the seam?
Why an upgrade from UNROM/UOROM to MMC3 might be necessary:
Hiding artifacts requires access to a programmable interval timer (PIT) or fine-grained CHR bank switching, such as switching to the bank containing CHR data for sprites that have been software-clipped at the top.
Why adding WRAM might be necessary:
Additional scratch space to hold CHR data for sprites that have been software-clipped at the top.
Additional scratch space to hold additional rows of cached decoded tilemap data.
Why an upgrade adding VRAM might be necessary:
Hold CHR data for sprites that have been software-clipped at the top for display.
Hold additional nametable data for 4-screen VRAM.
Hold additional nametable data for status bar.
tepples wrote:
How insistent are you about "all means"?
As insistent as the phrase "If you can avoid showing garbage" implies. If it's unavoidable, then so be it, but don't nonchalantly show garbage thinking that nobody will ever see it and judge your game for it. For example, PAL Solstice's top screen garbage at the title screen is completely avoidable, therefore inexcusable, and should have been grounds for "Seal of Quality" (cough) denial.
Quote:
They absolutely ARE rendered normally.
You are right. I just meant they are not visible... From what I've read (and please correct me if I'm wrong) emulators will generally tend to crop the top/bottom 8 scanlines. CRTs are similar (on average) but some will be biased a bit lower (like show from 12 to 236).
Either way is fine with me; if I ever need to, it will be very easy to add a small black bar at the bottom to account for these CRTs.
Anyhow, I managed to get everything working. Completely artefact free 4-way scrolling, NO need for additional 16px black bars like JP did (again, working under the assumption that only lines 8 to 232 are visible). A cool ~1000 lines of ASM (lots of spaces/comments tbh).
-Mat