I figured I should not clog up the forum by asking a question every so often, so I made this thread...
Currently I'm struggling to understand what CHR-RAM exactly is. So it's an 8k RAM chip on the board... but how do I use it? The confusion first began when I read about converting an NROM game to use CHR RAM, which seemingly copy-pasted all the tile data into the PPU "manually" using $2007. Then I had a look at a few UNROM games to see where that RAM is present. Emulators only show a huge wall of simultaneously changing bytes, including my own assembled UNROM project, which does nothing with the PPU yet other than initializing it and performing the OAM transfer in NMI.
This made me even more confused...
Then I checked online what an UNROM board looks like, to see if there really is RAM there. It is, so now I have no idea how I need to use CHR RAM. Do I simply copy from PRG to $6000-$7FFF and whatever I put in $6000, the PPU will automatically see at PPU $0000? It would make sense considering that CHR ROM is connected and the PPU "sucks in" whatever a mapper allows it to see. Or do I need to take the graphics data from PRG ROM, copy it to CHR RAM, and then copy it again to the PPU via $2007 manually? This would make the whole idea of using a RAM chip almost pointless, because you could just copy stuff directly with $2007 and leave the RAM completely unused. I know the truth is somewhere in there, but I can't see it myself.
Also, how does constructing a cart work with UNROM donors, if they have the wrong nametable mirroring soldered? Can you simply remove that and solder the other one?
Great, I'm not 100% sure on this, so someone can correct me if I'm wrong, but here goes:
When you are using CHR-RAM, you are simply storing the gfx data in the PRG-ROM, and then use
your own routines to copy that data to the gfx RAM in the PPU. The CHR-ROM way stores the gfx
on a separate ROM chip on the GamePAK.
The $6000-7FFF area is WRAM or Work RAM. It's just extra RAM you can add to a GamePAK that
your game can use.
CHR-RAM can be good if you want to animate your background, like creating faux parallax scrolling, or
to compress the gfx data (I don't know if any NES game does this).
The CHR RAM chip lies under the full control of the PPU (the PPU bus), so writes to $2006/$2007 during vblank or forced blank are the only way to do it.
za909 wrote:
Do I simply copy from PRG to $6000-$7FFF and whatever I put in $6000, the PPU will automatically see at PPU $0000?
Unfortunately no. Like DoNotWant said, $6000-$7FFF is a completely separate optional RAM chip that lies on the CPU/PRG bus.
DoNotWant wrote:
When you are using CHR-RAM, you are simply storing the gfx data in the PRG-ROM...
Better said as "you are reading the gfx data contained in the PRG-ROM".
za909 wrote:
Also, how does constructing a cart work with UNROM donors, if they have the wrong nametable mirroring soldered? Can you simply remove that and solder the other one?
I think so.
In the same way that a game copies 30 rows of nametable data, each 32 bytes in length, to $2000-$23BF, a CHR RAM game copies up to 512 tiles of pattern table data, each 16 bytes in length, to $0000-$1FFF. See
Switching to CHR RAM.
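To make that concrete, a minimal CHR RAM upload might look like this (a hedged sketch; `tile_data`, `ptr` and the label names are hypothetical, and it assumes rendering is disabled):
Code:
PPUSTATUS = $2002
PPUADDR   = $2006
PPUDATA   = $2007

; Sketch: copy 4096 bytes (256 tiles) from PRG-ROM to CHR RAM at PPU $0000.
; Rendering must be off (forced blank); 'ptr' is a zero-page pointer and
; 'tile_data' is wherever the graphics live in the currently switched-in bank.
load_chr:
    bit PPUSTATUS       ; reset the $2006 write latch
    lda #$00
    sta PPUADDR         ; PPU address = $0000 (pattern table 0)
    sta PPUADDR
    lda #<tile_data
    sta ptr+0
    lda #>tile_data
    sta ptr+1
    ldx #16             ; 16 pages of 256 bytes = 4096 bytes
    ldy #0
@copy:
    lda (ptr),y         ; fetch a byte from PRG-ROM
    sta PPUDATA         ; write it to CHR RAM through the PPU data port
    iny
    bne @copy
    inc ptr+1           ; advance to the next 256-byte page
    dex
    bne @copy
    rts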
Thank you, so now I understand how this works. It was simply the name that misled me, because it made me assume that the RAM being mentioned is something on the cart and not the PPU pattern RAM at $0000-$1FFF.
I assume it's very easy to make the faux background parallax scrolling by ROL-ing or ROR-ing twice through all 16 bytes of a tile, I just have to get the lowest bit from the last one into the carry to start off.
Also, mirrored sprites could be easier to make this way, because instead of physically swapping the left and right sprites, I can swap them in CHR RAM instead. At least for the player I could do that, since I will never draw anything during VBlank, as no scrolling will be used for the playfield; only fadeouts, drawing the next room with rendering turned off, and then reenabling it the next VBlank when it's finished. This gives me a lot of VBlank time to use for tile animations and palette changes.
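For reference, the per-tile shift idea from the previous post could be sketched like this (hypothetical names; `tilebuf` is assumed to be a 16-byte RAM copy of the tile that gets uploaded to CHR RAM afterwards):
Code:
; Sketch: move one tile's pixels right by 1, wrapping the edge pixel around.
; Each of the 16 bytes (two 8-byte bitplanes) is rotated independently.
shift_tile_right:
    ldx #15
@row:
    lda tilebuf,x       ; read the row
    lsr a               ; carry = bit 0, the pixel falling off the right edge
    ror tilebuf,x       ; rotate the row right; carry re-enters on the left
    dex
    bpl @row
    rts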
Now that sound is fully operational, I'm wondering if there's a better way to test run times than putting a breakpoint at the jsr to the Play routine and then Stepping over in the FceuX debugger. Some automated test would be much better because I usually run everything in 1300-1400 cycles when not much is happening, but when a sound effect plays and/or music commands are read and processed, I can get around 2300 cycles. I kind of need to know what the absolute biggest workload can be, and not just assume based on averages and high peaks I happened to catch manually.
za909 wrote:
Thank you, so now I understand how this works. Simply the name was very misleading to me because it made me assume that this RAM being mentioned is something on the cart and not the PPU pattern RAM $0000-$1FFF.
It is on the cart. The cart controls what memory is mapped to each PPU region within $0000-$2FFF. It can map a given address to its own memory (the CHR RAM) or to the 2K video memory in the console. In fact, a few carts (Gauntlet, Rad Racer II, and Napoleon Senki) map nametable addresses to a RAM chip in the cart, and one (Magic Floor) uses half the video memory in the console for pattern table instead of nametable. But for the vast majority of carts, $0000-$1FFF is on the cart, and $2000-$2FFF is in the console (with some form of mirroring). And the UNROM board contains a RAM that it maps to $0000-$1FFF.
Quote:
I'm wondering if there's a better way to test run times than putting a breakpoint at the jsr to the Play routine and then Stepping over in the FceuX debugger. Some automated test would be much better because I usually run everything in 1300-1400 cycles when not much is happening, but when a sound effect plays and/or music commands are read and processed, I can get around 2300 cycles. I kind of need to know what the absolute biggest workload can be, and not just assume based on averages and high peaks I happened to catch manually.
To visually see how long your code is taking, you can turn on bit 0 of PPU port $2001 (layer enable and tint control) for about 340 cycles and then turn it off. This temporarily forces the PPU to use column 0 of the palette (light grays) for a few scanlines, which should produce a gray stripe across your screen. The lower the stripe, the more CPU time you're using.
Code:
PPUMASK = $2001
BG_ON = %00001010
OBJ_ON = %00010100
LIGHTGRAY = %00000001
; Draw a light gray bar for 3 scanlines, which is 341 cycles
draw_timing_stripe:
ldy #BG_ON|OBJ_ON|LIGHTGRAY
sty PPUMASK
ldy #67 ; this is (340/5) - 1
@wait_around:
dey
bne @wait_around
ldy #BG_ON|OBJ_ON
sty PPUMASK
rts
za909 wrote:
I assume it's very easy to make the faux background parallax scrolling by ROL-ing or ROR-ing twice through all 16 bytes of a tile
Simple, yes, but also slow. VBlank time is quite short (about 2273 CPU cycles), so even with a lot of tricks and code unrolling you can't send more than 20 or so tiles each frame. With forced blank you can maybe double that number before players notice that a big part of the screen is missing.
But that would be if you were only updating patterns, which is almost never the case, since you also have to update sprites, palettes, name tables, and so on.
Quote:
Also, mirrored sprites could be easier to make this way, becaus instead of physically swapping the left and right sprites, I can swap them in CHR RAM instead.
You might enjoy programming for the Master System, where there's no hardware sprite flipping (it has background flipping though, which the NES lacks), but on the NES this is not practical at all, because of the short VBlank time I already mentioned.
If you think that physically mirroring the sprite positions is too much trouble, a better solution is to write 2 metasprite definitions for each animation frame, one facing left and another facing right.
Quote:
At least for the player I could do that, since I will never draw anything during VBlank as no scrolling will be used for the playfield
You'll have to do the math and decide if this is worth the trouble. Even if you're doing nothing more than a sprite DMA and setting the scroll in your VBlank handler, there's only enough time left for updating 12 or so tiles, with highly optimized code.
Thanks, I got it to work. For now I made a (not very tidy) sprite 0 hit loop that waits for the end of the prerender line and then for the sprite 0 hit, since I don't need any kind of screen split. It looks fairly consistent, but every so often a huge peak occurs; it's good to know about, and hopefully I'll see the improvement as I hunt down badly optimised routines. Don't mind the periodically occurring test sound effect.
EDIT: I was relying on the FceuX OLD PPU startup state with the pattern tables, so it would not work on most emulators, I fixed that.
If each character is 16x32 pixels, each cel of animation is 8 tiles. This means you can easily upload one cel to VRAM per vblank without extending vblank, so long as you don't have all characters changing their cel at the same time. For example, you can animate five characters independently at the Disney-standard 12 fps.
I have read around, and now that I've also got the controller working and can fill CHR-RAM at tile level, I've asked my friend who could put together carts for me if this actually becomes a thing... and apparently I could have 256k EPROMs available. I'm not sure, though, whether UNROM donors natively support bankswitching a 256k ROM, given the 4-bit latch, or whether they don't even physically have that bit, leaving only UOROM donors to work with without replacing the mapper logic to support 16 banks instead of 8.
Discrete logic boards are so very simple that you should seriously look into making them new instead of reworking donors.
That said, all UxROM boards always have a 74'161 and a 74'32 and could be modified into supporting 256KiB by rewiring four lines. And even if you do start with a donor, your field of options is a lot larger; you can add a 74'161 and 74'32 (since both should be quite cheap) to basically anything with CHR-RAM (and a CIC if you're in NES-land instead of Famiclone-land).
(Also, make sure you got 256 KiB of EPROM or EEPROM, not 256 Kibit)
Ok, I've been making my new system functions quite confidently. But now that I'm getting around to designing my metatile system for the background, I need to ask this: what are the pros and cons of different data formats, and of hard-coded collision maps vs. generated ones in RAM? Also, what kinds of data sizes should be expected when planning my data budget? Currently I have one fixed bank, one sound bank, two graphics banks for bg and sprites, and another one with 4kB of tile animations, palette data and any common routines that don't have to take space from my fixed bank.
So I've been thinking, and for a cutscene I will probably want to show a scrolling background (just with the two nametables repeating over and over again) and show text at the same time, so I'd have to split the screen with a sprite 0 hit. But when do I need to do that? I read that accessing $2000 during rendering with vertical mirroring can cause problems, or is that completely gone if I turn rendering off? Getting a few lines of blackness would be fine because the text field would be a black rectangle at the bottom anyway. Or do I need to time setting the new scroll in HBlank? If so, which PPU cycles are actually during HBlank? From 257 and onward or what?
And I've also been thinking about my own way of detecting the system region. Is it a good idea to select a one-shot NMI handler for the first NMI which then does a loop to burn more CPU cycles than the length of an NTSC VBlank, but less cycles than a PAL VBlank, and then check if the VBlank flag has been cleared or not?
Quote:
I read that accessing $2000 during rendering with vertical mirroring can cause problems, or is that completely gone if I turn rendering off?
I wouldn't worry about that too much. This bug was only discovered quite recently, and it only happens sometimes, and in the worst case you'll just get 1 glitchy scanline before the scrolls catch up again with the value you wrote. If you write to $2000 near the end of the scanline, you'll have nothing to worry about.
On the other hand, turning the rendering off and on again is more complex and more sensitive to bugs, gotchas or NTSC/PAL differences.
Writing to $2000 and $2005 to change the horizontal scrolling is (in my opinion) the simplest split-screen effect you can do.
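As a rough sketch of that (hedged; `new_scroll_x` is a hypothetical variable, and sprite 0 is assumed to overlap opaque background at the split point), a sprite-0-based X-scroll split could look like:
Code:
PPUSTATUS = $2002
PPUSCROLL = $2005

; Sketch: wait for the sprite 0 hit, then change the horizontal scroll
; for the rest of the frame.
@wait_clear:
    bit PPUSTATUS
    bvs @wait_clear     ; wait until last frame's sprite-0 flag clears
@wait_hit:
    bit PPUSTATUS
    bvc @wait_hit       ; wait for this frame's sprite 0 hit
    lda new_scroll_x
    sta PPUSCROLL       ; new X scroll, takes effect on following scanlines
    lda #$00
    sta PPUSCROLL       ; second (Y) write; mostly ignored mid-frame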
Quote:
If so, which PPU cycles are actually during HBlank? From 257 and onward or what?
The naming of cycles within a scanline is arbitrary, and if I remember well there are different conventions. In Nintendulator, if I remember well, HBlank is between 256 and 341. In any case, the best approach is to do it by trial and error (adding or removing nops before the register writes), using an accurate emulator such as Nestopia or Nintendulator, and then verify on real hardware (if you're more patient you could test directly on hardware).
za909 wrote:
And I've also been thinking about my own way of detecting the system region. Is it a good idea to select a one-shot NMI handler for the first NMI which then does a loop to burn more CPU cycles than the length of an NTSC VBlank, but less cycles than a PAL VBlank, and then check if the VBlank flag has been cleared or not?
Here's the code I use to distinguish among NTSC, PAL NES, and Dendy systems.
za909 wrote:
And I've also been thinking about my own way of detecting the system region. Is it a good idea to select a one-shot NMI handler for the first NMI which then does a loop to burn more CPU cycles than the length of an NTSC VBlank, but less cycles than a PAL VBlank, and then check if the VBlank flag has been cleared or not?
No need to use NMIs for this.
Here's how I do it. I like tepples' method too.
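For readers without the links, the frame-length-counting idea can be sketched roughly like this (a hedged sketch, not either of the linked implementations; the exact thresholds would need to be verified in accurate emulators and on hardware):
Code:
PPUSTATUS = $2002

; Sketch: measure one frame's length in ~12-cycle loop iterations.
; NMI must be disabled and the PPU warmed up before calling this.
detect_region:
    bit PPUSTATUS       ; clear any stale VBlank flag
@wait:
    bit PPUSTATUS
    bpl @wait           ; wait for VBlank to begin
    ldx #$00
    ldy #$00
@count:
    inx                 ; 2 cycles
    bne @skiphi         ; 3 cycles when taken
    iny                 ; high byte of the iteration count
@skiphi:
    bit PPUSTATUS       ; 4 cycles
    bpl @count          ; 3 cycles; loop until the next VBlank begins
    ; Y:X now hold the frame length in loop iterations: an NTSC frame
    ; (~29780 CPU cycles) gives a smaller count than a PAL NES or Dendy
    ; frame, so comparing Y against tuned thresholds identifies the region.
    rts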
za909 wrote:
I read that accessing $2000 during rendering with vertical mirroring can cause problems
This only happens if you write to $2000 at the exact start of hblank. If you write to it during ANY other pixel on the line it's perfectly fine. With a sprite-0 hit it's extremely easy to avoid writing at the wrong time.
The reason it's a problem for SMB is that it is not timing its write to $2000. It's using it to turn the NMI back on when the game logic (of variable, untimed length) completes.
Also, 2/3 of the time, the PPU will start up in an alignment that makes the glitch impossible to trigger. There are 3 PPU cycles per CPU cycle, so there's a 2/3 chance that on reset the PPU ends up aligned so the start of hblank is in between CPU cycles, where it will never be interfered with. (Edit: correction, should have said 3/4, it's a different kind of alignment than I was thinking.)
rainwarrior wrote:
Also, 2/3 of the time, the PPU will start up in an alignment that makes the glitch impossible to trigger. There are 3 PPU cycles per CPU cycle, so there's a 2/3 chance that on reset the PPU ends up aligned so the start of hblank is in between CPU cycles, where it will never be interfered with.
I thought it was more related to the fact that there are 4 master clocks per PPU cycle and 12 per CPU cycle. The CPU can access any dot alignment to the PPU; in fact the alignment shifts by 4 master clocks after each scanline. It just can't access fractions of a dot.
Oh, so there's an other, other form of PPU alignment? Hmm. Okay, well then I should have said 3/4 instead of 2/3, I guess.
Here's the thread where it was discovered and discussed:
Random glitchy line in Super Mario Bros. on real hardware?
Edit: found a thread explaining the PPU alignments:
CPU - PPU clock alignment (this info maybe needs to migrate to the wiki)
So it's been a while, and things are going pretty well; I'm slowly getting my game loop system in order. The thing is, I'm already quite "scared" of running out of resources, especially since any routine that at some point does bankswitching (to call a sound effect or something) has to go in the fixed bank. And this fixed bank is also supposed to contain the "script" of my game, so what if that 16kB bank is eaten up at some point?
One temporary solution I came up with is putting the requested sound ID in a variable and calling the sound from the last bank after the routine that wanted the sound is over and has returned to the last bank; this would allow me to move some code to another bank. But it only works with my options menu, because that only bankswitches for the menu sound effects; I could not do this for more complex things.
What's the right way to go about this?
You should be able to put anything in a bank, really. Other than DPCM samples, there's very little that has to stay banked-in all the time.
rainwarrior wrote:
Other than DPCM samples, there's very little that has to stay banked-in all the time.
That and the reset stub, and the IRQ and NMI handlers when IRQ or NMI is enabled.
Well, if you want to get pedantic about it, no, none of those need to be always banked-in. They could all go in RAM, for example.
(This is kind of a joke, but I did release a
cart that does this.)
More practically, though, in my game I found it useful to just have a stub NMI routine that bankswitches to call the main NMI handler. This saved some space where it was a bit critical.
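For illustration, such a stub might look like this (a hedged sketch; `current_bank`, `banktable`, `nmi_main` and `NMI_BANK` are hypothetical names, and it assumes UxROM-style switching with a bus-conflict-avoidance table of bank numbers in ROM):
Code:
NMI_BANK = $03          ; hypothetical bank holding the real handler

nmi_stub:               ; in the fixed bank, pointed to by the NMI vector
    pha                 ; the stub clobbers A and X; nmi_main saves Y itself
    txa
    pha
    lda current_bank    ; shadow variable tracking the switched-in bank
    pha                 ; remember which bank was interrupted
    ldx #NMI_BANK
    stx current_bank
    lda banktable,x     ; banktable holds 0..7 at matching offsets,
    sta banktable,x     ; so this write avoids UxROM bus conflicts
    jsr nmi_main        ; the full NMI handler, in the switchable bank
    pla
    tax
    stx current_bank
    lda banktable,x     ; restore the interrupted bank the same way
    sta banktable,x
    pla
    tax
    pla
    rti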
If you want high level advice about how to organize your banks for switching, what I recommend is to divide up your banks by function. Put the screen loading code, along with the screen data, in its own bank together. Put the music and sound data and player code together in a bank. Put all your sprite data and sprite rendering code together in a bank. Et cetera.
When you do need to do inter-bank calls, you can facilitate this with little trampoline routines in your fixed bank. Something along the lines of:
Code:
; in fixed bank
play_sound:
lda current_bank ; every bank should store its ID at a fixed address
pha
lda #SOUND_BANK
sta $8000 ; bankswitch to sound bank
jsr play_sound_internal
pla
sta $8000 ; bankswitch back to previous bank
rts
rainwarrior wrote:
Well, if you want to get pedantic about it, no, none of those need to be always banked-in. They could all go in RAM, for example.
Yeah, I thought that would enable a huge number of usable DPCM samples for games; it just needs extra RAM and a mapper capable of switching in the $C000-$FFFF range. I guess that could be another way to utilize the MMC5 ExRAM.
And turns out I had a subroutine to call the sfx init like that but never used it... so yeah that is the right way. I still need to clean up my macros related to loading CHR-RAM because those all need to switch to the selected graphics bank.
I'm still shocked that I tried finding 256kB EPROMs but I could only ever find 256 kilobit ones... Well I guess NROM projects are quite easy to get the hardware for...
za909 wrote:
I'm still shocked that I tried finding 256kB EPROMs but I could only ever find 256kilobit ones... Well I guess NROM projects are quite easy to get the hardware for...
You might be looking for the wrong part number? But I'd recommend using flash nowadays anyway, such as the SST39SF020.
za909 wrote:
I'm still shocked that I tried finding 256kB EPROMs but I could only ever find 256kilobit ones... Well I guess NROM projects are quite easy to get the hardware for...
I'm surprised that you can't find 256k EPROMS. Do you know that the 8-bit EPROM naming convention changes when 1 Megabit (128k) is reached?
32k = 27C256
64k = 27C512
128k = 27C010
256k = 27C020
etc.
rainwarrior wrote:
I'm surprised that you can't find 256k EPROMS.
Yeah, 128 and 256KB used to be some of the most common sizes a while ago... The rarest were definitely the smaller ones, anything below 32KB was nearly impossible to find. Good thing that you can use partially filled bigger chips by grounding address lines or replicating the content over and over.
So... I wanted to save an extra 32 bytes by using 1-byte looped DPCM ramps instead of the 17-byte unlooped ones, but I got a weird behavior, where the amplitude changes in APU2 didn't happen fast enough or something... is there really something different going on when the loop flag is set?
Also, since I've relocated all my game loop code to a switchable bank, I can use much more space for samples, and I'd like to experiment with custom loop points for melodic samples like a choir or a guitar, by using an IRQ to start the looped section of the sample when the one-shot part has finished. Now, NMI is probably going to interfere every once in a while, but can it also COMPLETELY make the CPU forget about the asserted IRQ and thus never start the sample I want? (If IRQ and NMI happen at the exact same moment.)
Better way to organize game loops?
I've finally come up with a system I can make use of universally, but I'd be interested to hear what other ways there are. (This post and the answers to it could actually be split into a new thread, so that anyone looking for game loop systems can find it more easily.)
There are parts of it I suppose could be made more efficient but are they really better that way?
So this is my main loop:
Code:
WaitForNMI:
lda vblankready
beq WaitForNMI
lda #$00
sta vblankready
inc fpsregulator
; 50 FPS regulator for NTSC
lda systemregion
bmi RunThisFrame ; if systemregion bit 7 is set, skip the skipping for PAL
lda fpsregulator
cmp #$06
bcc RunThisFrame ; less than 6, run the first 5 frames of each group
lda #$00
sta fpsregulator ; reset this
jmp MAINLOOPEND ; skip every 6th frame (60 FPS -> 50 FPS)
RunThisFrame:
; Run the current game loop
inc randomnum+0 ; frame counter for randomness
ldy #$03
jsr SwitchBanksSafe
CommonLoop:
jmp (commonloopvector+0)
GameLoop:
jmp (gameloopvector+0)
MAINLOOPEND:
; All processes have been completed, waiting for next NMI
jsr draw_timing_stripe ; CPU usage graphic
lda #$01
sta gamelogicready
jmp WaitForNMI
CommonLoop is used to avoid having to include the same jsr-s in every game loop, such as calling the music engine, reading the controller, etc., and I still get to decide what runs and when by setting the vector to another set of calls.
But can these calls be packed better? (pushing their address-1 to the stack?)
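The address-minus-one idea would look roughly like this (a sketch using the routine names from the snippets here; note each inline entry costs 6 bytes versus 3 for a plain jsr, so it only really pays off when the addresses live in a data table you loop over):
Code:
; Sketch: push (address-1) of each routine and let rts chain through them.
; Entries are pushed in reverse execution order: the last one pushed runs first.
RunCommonViaStack:
    lda #>(GameLoop-1)      ; runs last
    pha
    lda #<(GameLoop-1)
    pha
    lda #>(CallPlaySound-1)
    pha
    lda #<(CallPlaySound-1)
    pha
    lda #>(CallDoScrolling-1) ; runs first
    pha
    lda #<(CallDoScrolling-1)
    pha
    rts                     ; "returns" into CallDoScrolling; each
                            ; routine's final rts pops the next address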
Code:
RUNCommonLoop0:
; Music, Controller, Fade engine, Random generator, Scrolling
; Animations
jsr CallDoScrolling
jsr CallController
jsr CallAnimation
jsr CallFadeEngine
jsr CallRandom
jsr CallPlaySound
jmp GameLoop ; end common routines
I have noticed that my CPU usage has slightly increased, and all these jsrs might be the culprit, because they all have to switch banks from the fixed bank:
Code:
CallInitSound:
; Inits the song or sfx in the low 7 bits of A
pha
ldy #$0E ; switch to sound bank
jsr SwitchBanksSafe
pla
jmp InitSound
CallFadeEngine:
; Fade palette if requested
ldy #$02
jsr SwitchBanksSafe
jmp FadePalette
CallRandom:
; Generate a pseudo-random 16-bit value
ldy #$02
jsr SwitchBanksSafe
jmp RandomGenerator
And all these Call routines: are they really better like this, or unified with a jump table?
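One possible unified form (a hedged sketch; the table contents, the target names `FadePalette`, `RandomGenerator` and `DoScrolling`, and the `SwitchBanksSafe` calling convention are assumptions based on the snippets above):
Code:
; Sketch: one generic trampoline in the fixed bank, indexed by X.
; The tables store (address - 1) so the rts trick can dispatch.
calltab_lo:   .byte <(FadePalette-1), <(RandomGenerator-1), <(DoScrolling-1)
calltab_hi:   .byte >(FadePalette-1), >(RandomGenerator-1), >(DoScrolling-1)
calltab_bank: .byte $02,              $02,                  $03

CallBanked:             ; X = routine index
    ldy calltab_bank,x
    jsr SwitchBanksSafe ; bank in the routine's home bank
    lda calltab_hi,x
    pha
    lda calltab_lo,x
    pha
    rts                 ; dispatch; the routine's own rts returns to the caller
A caller would then do something like ldx #id / jsr CallBanked. Whether this beats the individual stubs depends on how many there are: the table costs 3 bytes per routine plus one shared trampoline, versus about 8 bytes per CallXxx stub.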
Thanks for any responses, this is getting more and more exciting!
So I have to make this a bumped triple post... (at least it started a new page)
But is it a good time to read the controller at the end of NMI (which is not using all its time anyway) ?
I now use DMC IRQs for raster splits and I launch the sample at the prerender line, and disable the DMC during NMI, so hopefully there's no conflict and therefore no reason to do the comparison between two controller reads?
Read the controller in the same thread in which you run game logic. If you run game logic in NMI after uploading VRAM changes, as Super Mario Bros. does, read the controller in NMI. If you run game logic in the main thread, read the controller in the main thread. This way, a single pass through the game loop won't see two different values for the controller.
Thanks, I got it to work, and the DMC IRQ as well (moved to the start of NMI, after saving the registers). The thing is, though, it seems to have really unstable timing, and I don't know whether that is to be expected or I'm doing something incorrectly, because there's a 0 to 3 scanline error up and down in its accuracy. I could hide it, though, because there are two rows of blank tiles around the intended split point:
This could get problematic during gameplay use if I want multiple scroll layers, as I won't be able to do accurate waits in an IRQ to get to the exact scanline I want.
Because the 0 to 3 scanline delay is constant within any single frame, you can measure this delay and use it later to compensate for the delay.
- Each frame, trigger a DMC IRQ at a known time, such as when the sprite 0 or sprite overflow flag turns off or on. You could even trigger it based on NMI if you don't have much to upload to VRAM.
- Count how long until the IRQ occurs. Save this as t.
- Trigger another DMC IRQ.
- Count to k - t, where k is a constant.
Thanks, that got me somewhere, though the behavior seems to be different in every single emulator I tried.
In FCEUX I get the effect perfect (too perfect, in fact, because there is no jitter at all), but the measurement gives results all over the place. Even though I wait for line -1, start the sample immediately (length 0, rate F), get out of NMI and start polling $4015, it still somehow takes 3300-3500 cycles to get the measurement IRQ. Nintendulator, on the other hand, shows no scrolling effect at all, but the measurements are in the expected range!
So I'm really clueless now as to what is causing the problem.
This is my measurement code right after exiting NMI:
Code:
; IRQ sync measurement
sei
IRQmeasure: ; poll $4015
lda APU_STATUS ; 4
bmi @result ; 2
inc IRQcount+0 ; count more ; 5
jmp IRQmeasure ; 3
; this loop is 14 cycles
@result:
lda #31 ; the approx. number of loops in 432 cycles
sec
sbc IRQcount+0
sta IRQcount+1 ; save amount of delay time
cli
And then the IRQ handler waits like so:
Code:
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; delay the IRQ handling by the time required
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ldy IRQid ; which IRQ is this?
beq @dontwait
ldx IRQcount+1
; might have to subtract the time of the following code?
@loop:
bit temp+0 ;3
bit temp+0 ;3
bit temp+0 ;3
dex ; 2
bne @loop ;3
; this loop is 14 cycles
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; launch next IRQ now
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@dontwait:
; ...
EDIT: I needed to stop the runaway DMC at the start of NMI, because to get away with less code I start, from the last IRQ of the frame, a long sample that's more than enough to keep any more IRQs from happening, and this fixed most of the randomness of the measurement. But every so often my measurement sample takes ~64 lines to end for some reason, and during those lines the CPU keeps polling $4015, so I lose CPU time for a random result which overflowed multiple times anyway.
It seems that I'm misinterpreting the entire process here... I could be close or really far off, and if I can't make it work I'll just leave it as it is. Don't get it, don't have it.
So I have tried to set it up in numerous ways with the cycle wasting at different points:
NMI:
- At the start of NMI, launch a rate F DMC, measure by polling $4015
- Save the result, then calculate time to waste (number of loops in 432 cycles - result)
- Immediately start a new DMC, wait by polling $4015
- Waste 432-result cycles (should I do it here?)
- Update VRAM
- At line -1 start the DMC again (should I waste cycles here?), which should be used for splits
IRQ:
- Decrement number of IRQs left before the split
(should I waste cycles here?)
- Restart next DMC
- If IRQs left is not 0, rti
(should I waste cycles here?)
- If it's 0, do the split
I've dug up this thread to avoid cluttering the forum more. I'm currently at the point where I need to run the AI of my objects. There are 32 slots for objects, and there is an automated system before this to spawn objects on a timer, with conditional spawns and other "effects" similar to a music engine. Then every existing object is moved using speed variables, and then the speed variables are modified by acceleration constants. If something needs to be moved by its own AI instead I can give the object an initial speed of 0 and 0 acceleration. But in what order should I run the AIs? The player ($00) always occupies the first slot, and I might save the next few slots for the player bullets (this is what Megaman games do). Or does it really matter? And should collision checks be intertwined with this process? The only things I need to check are: player bullets vs. enemies, player vs. everything else except player bullets.
za909 wrote:
But in what order should I run the AIs?
I think this probably depends a lot on your game structure.
In my current project, I separate players, enemies, player bullets, and enemy bullets into different loops, so that each of them can be as minimal as possible. The workload for each is quite different. Bullets don't need to do background collision detection or run through metasprite engines.
The logic order has a lot to do with this separation as well. There are various different ways I could have grouped players and enemies together, but I wanted things to happen in a certain order to maximize the processor frame. Players always move first to give player control top priority over anything. I plan to make the game quite hard but I don't want control to ever be responsible for that. So I do, player, then I do player bullets, which eliminate any enemies before their own logic. So, the player also actually has an extra frame to respond compared to enemies. Overall this won't really impact the difficulty or be particularly noticeable, but I don't want the player to ever feel the game didn't respond properly.
Then before enemies, I do enemy bullets. The main reason for that order is so that bullets will never need to be spawned and execute logic in the same frame. Spawning a bullet isn't a particularly small task, but eliminating the logic of one bullet for every one spawned makes it even out. Since the bullet doesn't move, the enemy basically has a point-blank shot in that one single frame. I'll probably tie a muzzle flash sprite to the enemy behavior for that first frame, so you'll at least have a visual indication if you get hit.
Anyway, I don't expect that there's any standard at all. I imagine it's all what works best for your game. All I know to do is share my recent thoughts on the matter for my project and hope it helps. You may need something totally different.
Quote:
And should collision checks be intertwined with this process?
Definitely.
If you're talking about collision checks with backgrounds, you need to do them after you move the object, and before you finalize its position. The collision check might force a change in object position, and if that's done at a different time, then you would have been executing object logic from an errant position.
If you're talking about collision checks with objects, you're going to need to adjust your object accordingly. Maybe you kill the object right then. Most likely the object goes into a death state for a few frames and probably even changes speed, so you'll need to apply this before moving the object.
IMO, the main thing that complicates the order of object updates is when objects are able to affect other objects' positions. Things like moving platforms, moving walls, items that can be carried, and so on. Ideally, objects that carry others, for example, would be updated first, and each object that can be carried would then copy the motion from the previously updated object (i.e. apply the same X and Y displacements). This can get a little complicated for objects that can mutually affect each other's positions.
One thing I'm particularly against is having one object deliberately manipulate another object, doing things like changing its position, its state, or anything else. Only the object itself should be allowed to modify its own attributes, to make sure this is done in a consistent and safe way. This means that an enemy's A.I. shouldn't be directly modifying the player's health or changing its state to "dead"... it should merely inform the player about their collision and let the player handle the consequences.
This may seem obvious, but many games have problems with these things. If I'm not mistaken, moving platforms in the Sonic games have flags indicating which characters are standing on them (Sonic, Tails, and/or Knuckles), probably so the platforms can displace the characters... It works, I guess, but it's not a very good model. The platforms can't even be universally solid, because only a few select objects are able to ride on them.
Anyway, think carefully about the types of interactions you're gonna need between objects, and the optimal order in which to handle these interactions. For example, you don't want to update the player first and then, when an enemy collides with it, go back to the player's A.I. to handle this collision. In many cases, a 1-frame delay for collision responses isn't such a big deal. It is a big deal when objects can carry/push or be carried/pushed by other objects, though, because their movement will look very jerky.
Yes, I'm mostly worried about problems that could arise due to these 1-frame delays, like things phasing through other objects. The background collision is handled very primitively (just a pair of Y coordinates per stage that the player is not allowed to cross), so I only need to do object vs. object stuff. I'll definitely have to do something about enemies spawning bullets, though. Not sure how that'll work out, but I might separate them from regular objects... or it could be more standardized if I could allocate object slots (marked with $FF or something special) for the bullets of a parent object, so that the parent can spawn them into slots that come before its own slot, meaning they can't have their AI run in the same frame they were created. Thank you, I guess I'll be able to figure something out.
za909 wrote:
so they can't have their AI run in the same frame they were created. Thank you, I guess I'll be able to figure something out.
This might not be a serious concern for you. If your game doesn't have a lot of bullets, and if those bullets don't have to aim in the direction of the player, then I don't think you need to structure your logic around it.
The main point that I wanted to make is that how you structure your logic is going to depend a lot on the design aspects of your game.
Quote:
Yes I'm mostly worried about problems that could arise due to these 1-frame delays, like things phasing through other objects.
With this, I think a bigger concern is the speed of your objects and their hitbox sizes, and how you handle the collisions. If you check every viable set of collisions every frame, and no object is fast enough to pass through another, then you won't miss a collision. If you check the collision at the end of a frame after an object has already been handled and placed in the OAM buffer, the important thing would be to make sure that the object executes the proper logic for having been in a collision at the next time it's handled.
You could avoid this being an issue by moving every object that will be involved in a collision before checking it. I did it my way for speed, because I'm supporting a lot of bullets. And since player bullets execute before enemy objects, I arranged it so that only the player actually has this one-frame delay. Essentially this means that the player could evade damage from a collision that would have happened, or fire a bullet on the same frame that they're killed. To me, this feels less like a "delay" and more like giving priority to responsive control.
Honestly though, from what I've heard described of your game, I think the biggest concern with how you structure this might be what's easiest to write and maintain. I doubt a 1-screen game will max out the processor unless it's a complex 1-screen game. It could be. But if you don't need to worry about speed, I wouldn't.
You'll undoubtedly need portions of the object logic that are different for different objects. Enemies won't read the controller (unless your AI outputs a mock 'controller' byte), players and bullets won't have AI, different enemies work differently, etc. Is it more logical for you, and for your game design, to accomplish providing these different objects with different routines via some sort of conditional branching, or by creating different loops for different object types? What's easier and more clear for you to code and create changes to?
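The "different loops vs. branching" choice above can be sketched as a per-type dispatch table, the high-level analogue of a 6502 jump table; the routines and type names here are illustrative stand-ins.

```python
def update_player(obj):
    obj["x"] += obj["dx"]   # would read the controller in a real game

def update_enemy(obj):
    obj["x"] -= 1           # trivial stand-in for enemy AI

def update_bullet(obj):
    obj["x"] += obj["dx"]   # bullets just keep moving

# each object type gets its own routine, selected by type id
UPDATE_TABLE = {
    "player": update_player,
    "enemy":  update_enemy,
    "bullet": update_bullet,
}

def update_all(objects):
    for obj in objects:
        UPDATE_TABLE[obj["type"]](obj)
```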
This thread has brought me to reconsider the way I was structuring some things and I feel that I've probably given you somewhat bad advice.
The issue with testing two objects before both have been moved is that it creates a variance between the positions of the objects displayed in the game and the way the game itself is processing collisions. It won't miss collisions in the sense of one object passing through another, but it may miss collisions relative to what's shown on the screen.
1. Object 0 moves from its frame 0 position to its frame 1 position, then tests collisions against object 1's frame 0 position.
2. Object 1 moves from its frame 0 position to its frame 1 position.
3. The frame 1 positions of both objects are updated to VRAM.
So it's possible that two objects may show a collision on the screen and never intersect by the way the game is perceiving time, in a manner of speaking.
After consideration, I think that all collisions need to be tested from their states in the same frame. I don't see why it would matter whether this is before or after movement, as long as it's the same for all. Both the positions going in and coming out should be stable.

This is a key point for your main question about structuring the logic and collisions along with object movement. The only time an object's position should be potentially unstable is during its own movement loop, and it should be corrected through background collision detection during that loop. Even a simple bounding-box collision should have these tests as part of the movement; otherwise your positions may be unstable for larger sections of your logic. Object collisions should be executed either before any of the objects involved have been moved, or after all objects have been moved and corrected.
Testing for collisions BEFORE moving the objects would mean detecting collisions AFTER they've already been displayed on the screen, which would result in jitter if two solid objects were pressing against each other.
Perhaps it means: calculate all objects' new positions, find the pairs of objects whose new positions collide, and then resolve those objects' collisions.
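That last scheme can be sketched as a three-pass frame: move everything, detect colliding pairs from the same-frame positions, then resolve. `overlap` here is a deliberately simple 1D interval test standing in for real hitbox checks.

```python
def overlap(a, b):
    # simple 1D proximity test: centers closer than the average width
    return abs(a["x"] - b["x"]) < (a["w"] + b["w"]) // 2

def step(objects):
    # 1) move every object first, so all positions are stable
    for o in objects:
        o["x"] += o["dx"]
    # 2) detect colliding pairs from the same-frame positions
    pairs = []
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            if overlap(objects[i], objects[j]):
                pairs.append((i, j))
    # 3) resolve the detected collisions
    for i, j in pairs:
        objects[i]["hit"] = objects[j]["hit"] = True
    return pairs
```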