APU I/O ports, TV cycles, Cx4 and other miscellaneous things

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.

APU I/O ports, TV cycles, Cx4 and other miscellaneous things
by Gekko on 2009-07-01 (#48476)

1) Every document I've come across that deals with the APU I/O ports goes over the SNES side briefly, not giving me enough information on what format I should upload the data in, and normally, how I should upload it. (I do have the latter information now but I'm unsure if I'm thinking of it properly.)
As such, I want to know what format the data is uploaded in or, preferably, a clear explanation of what you are doing at each step, as opposed to merely saying "copy a byte here" or whatever. An example of what I don't want is here.
2) This is simply statistical. Anyway, I want to know how long the following take, in terms of FastROM master clock cycles, although machine cycles and non-FastROM (SlowROM?) would be acceptable too.
Anyway, the things I want this data for, are: V-blank, H-blank and the difference between H-blanks (time to go from the beginning of a scanline to the end of it).

Any help would be appreciated, greatly!

by Memblers on 2009-07-01 (#48479)

1. I found the explanations to be confusing, too. There's nothing special about the data format, but what was helpful to me was just to disassemble the internal ROM of the SPC. It's only 64 bytes I think, you can extract it from any .SPC file. I found the code much easier to follow than any explanations of the loading process. You only need to use that once, then you can communicate any way you want (from within your loaded code).

by Gekko on 2009-07-01 (#48482)

If I understand your post properly, you think I'm talking about the SPC700 data as opposed to the format that the SNES stores it in. (Correct me if I'm wrong.) In the SNES Development manual, I got the following information beforehand:
1) The first word is the length of the block
2) The second word is the address of the block
3) The rest is the SPC700 data.
What I want to know is:
1) What is that address for? Is it for where the SPC700 puts it in its RAM/cache/whatever it is? (I'm not well versed in SPC700. In fact, I'm still learning it. It's just I'm asking for information on the SNES side, not the SPC700 side).
2) How is the SPC700 data sent? I'm pretty sure it would just be one word at a time, but would that be reversed byte order (little-endian) or standard order (big-endian)?
In addition, I want to know what exactly you are inputting into the APU I/O ports.
Sorry for not being clear enough beforehand! (And not realizing what exactly it was that I wanted)

by Memblers on 2009-07-01 (#48484)

I don't remember about the endianness. It's been years since I've used the SPC. I don't know why, but I can't find that part of my source code (just a broken commented out one). If that code was right, the data is sent as bytes. And you're right about the address.

by Memblers on 2009-07-01 (#48485)

oh nevermind, I found what I did. i ripped someone else's code.

Code:
;Sound.ASM
;
;Support routines for the SPC700
;(C) 1999 Realtime Simulations and Roleplaying Games
;
;Grog's worst nightmares come true with this bloody CPU

InitSoundCPU:
phk
plb
php
rep #$30
sep #$20
.mem 8
.index 16

ldx #$0400 ;Target SPC address for program
stx $2142
ldx #$0000
lda #$01
sta $2141
lda #$CC
sta $2140
- cmp $2140 ;Wait for SPC to sync
bne -

SoundSendLoop:
lda spcprogg,X
sta $2141 ;Send the next data byte
txa
sta $2140 ;Send the byte counter (low 8 bits of X)
- cmp $2140
bne - ;Wait for SPC to echo the counter
inx
cpx #spcend-spcprogg ;Check for last data byte
bne SoundSendLoop

stz $2141 ;Mark end of data
ldy #$0400 ;Set starting address of SPC code
sty $2142
inx
inx
txa
sta $2140 ;Tell SPC to begin executing its program

plp
rtl

spcprogg:
.incbin "SPC.obj"
;.incbin "FLUTEC4.BRR"
;.incbin "MOO.BRR"
spcend:
.dcb $FF

by Gekko on 2009-07-01 (#48490)

O.K., a few things.
1) While this is definitely easier to understand than the other routines I've seen, I am going more for an actual understanding of the ports. It doesn't matter much but it might be nice. (After all, having a routine isn't like knowing what the routine does.)
2) Why did you never check $2140 to be #$AA and $2141 to be #$BB? I recall that being the SPC700's signal that it is ready. Do you assume that someone already did that?
3) What is spcend-spcprogg?
4) Just in case you were wondering, little-endian reverses the byte order (so the highest address stores the most significant byte and the lowest address stores the least significant byte), while big-endian is what we would expect, namely the least significant byte is at the highest address and the most significant byte is stored at the lowest address.

Yet again, I apologize for a lack of clarity and any such examples to be shown in the future.
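
To make point 4 concrete, here is a small Python sketch of the two byte orders. The helper names `le16` and `be16` are made up for illustration. It only shows how a 16-bit value is laid out in memory, using the $AA/$BB ready values and the $0400 address from this thread as sample data.

```python
import struct

def le16(value: int) -> bytes:
    """Little-endian: least significant byte at the lowest address."""
    return struct.pack("<H", value)

def be16(value: int) -> bytes:
    """Big-endian: most significant byte at the lowest address."""
    return struct.pack(">H", value)

# $AA at $2140 and $BB at $2141 read back as the 16-bit value $BBAA
# on a little-endian CPU such as the 65816.
ready_bytes = le16(0xBBAA)    # b"\xaa\xbb"

# The target address $0400 in both byte orders.
addr_little = le16(0x0400)    # b"\x00\x04"
addr_big = be16(0x0400)       # b"\x04\x00"
```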

by Memblers on 2009-07-02 (#48508)

1. IIRC, you can only write 8-bits to the port at a time. I'm not sure what else I can add. I couldn't locate the disassembly of the SPC's ROM. I did have another routine in my SNES program, to transfer data to it every frame (and definitely not using the internal loader thing), I could dig that out if that'd be useful.

2. I didn't write that particular routine, but the one I did write did check for #$AA and #$BB like the docs say, but it apparently didn't work because I commented it all out. Seems it can be ignored. I'm guessing it may be for when games want to re-use that loader routine later (for further data, samples, etc.).

3. The "spc.obj" file there is a binary, it's the whole SPC program/data, assembled with TASM (table assembler). spcend-spcprogg lets it calculate the filesize.

by Gekko on 2009-07-02 (#48510)

1) Well, don't worry about it. I looked over the places I previously found information and that helped me understand even further.

2) What do you mean? Do you mean sound effects or different music? Also, should I add a
Code:
LDA #$BBAA
CMP $2140
BEQ $02 ;Skip the PLP and RTL or RTS
PLP
RTL ;or RTS

between the REP #$30 and the SEP #$20?

3) That clears that one up!

Another thing, though: It just occurred to me that that routine sets the address manually. Does this mean that if I were to use multiple sounds, (as in sound effects and music at the same time) I would want to use indexed pointers, for example?

Also, thank you for your great help!

by Memblers on 2009-07-02 (#48512)

2. I mean music, code, basically anything.

For doing music/sfx at the same time, really you'll want all the sample data to be preloaded in SPC's RAM. Then have some variables that you transfer to the SPC every frame, to trigger the sound effects. I suppose you could reuse that loader thing to transfer a few variables, but it seems kinda slow and complicated.

The debugger in old ZSNES was really helpful when I was working on communications since it could step through the code for both CPUs at the same time.

Gekko wrote:
What do you mean? Do you mean sound effects or different music? Also, should I add a
Code:
LDA #$BBAA
CMP $2140
BEQ $02 ;Skip the PLP and RTL or RTS
PLP
RTL ;or RTS

between the REP #$30 and the SEP #$20?

I'd highly suspect a 16-bit access won't work, as the port is probably an 8-bit link between the chips.

by Gekko on 2009-07-02 (#48515)

Memblers wrote:
I'd highly suspect a 16-bit access won't work, as the port is probably an 8-bit link between the chips.

Actually, I've seen it before, even in commercial games.

Anyway, thank you for taking so much time to answer my questions!

If anyone could answer my other question, I would be thankful as well.

by koitsu on 2009-07-02 (#48519)

Memblers wrote:
Gekko wrote:
Also, should I add a
Code:
LDA #$BBAA
CMP $2140
BEQ $02 ;Skip the PLP and RTL or RTS
PLP
RTL ;or RTS

between the REP #$30 and the SEP #$20?

I'd highly suspect a 16-bit access won't work, as the port is probably an 8-bit link between the chips.

The CMP in question would actually read 8 bits from $2140 (APU_PORT_0) and 8 bits from $2141 (APU_PORT_1). So the "16-bit read" method should work just fine. The official developers manual even uses this method in example SPC bootloader code; see Section D.4 (of manual revision A, dated 1992/05/01).

And yes, you should wait for APU_PORT_0==$AA and APU_PORT_1==$BB before transferring any data to/from the SPC.
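
As a rough summary of the S-CPU side of the upload, the sketch below lists the port accesses in the order the InitSoundCPU routine earlier in this thread performs them, with the $BBAA ready check added in front. It is Python rather than 65816 code, so it only models the sequence and does not talk to hardware. The function name `ipl_upload_steps` and the tuple format are made up for illustration.

```python
def ipl_upload_steps(payload: bytes, dest: int, entry: int):
    """Return the S-CPU port operations for one upload, in order.

    Each step is (operation, port, value). "wait" steps mean: poll the
    port until it reads back the given value.
    """
    steps = [
        ("wait16", 0x2140, 0xBBAA),   # $2140=$AA, $2141=$BB: SPC is ready
        ("write16", 0x2142, dest),    # target address in SPC RAM
        ("write8", 0x2141, 0x01),     # nonzero: a data block follows
        ("write8", 0x2140, 0xCC),     # start the transfer
        ("wait8", 0x2140, 0xCC),      # SPC echoes $CC
    ]
    for index, value in enumerate(payload):
        counter = index & 0xFF        # only the low 8 bits fit in the port
        steps += [
            ("write8", 0x2141, value),    # the data byte
            ("write8", 0x2140, counter),  # the byte counter
            ("wait8", 0x2140, counter),   # SPC echoes the counter
        ]
    steps += [
        ("write16", 0x2142, entry),   # address to start executing at
        ("write8", 0x2141, 0x00),     # zero: no more data, run the program
        ("write8", 0x2140, (len(payload) + 2) & 0xFF),  # counter + 2, as in the routine
    ]
    return steps


if __name__ == "__main__":
    for step in ipl_upload_steps(b"\x12\x34", dest=0x0400, entry=0x0400):
        print(step)
```

Note that every value goes through the ports one byte at a time; the only 16-bit accesses are the ready check and the address writes, which simply touch two adjacent ports.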

by Memblers on 2009-07-03 (#48524)

koitsu wrote:
And yes, you should wait for APU_PORT_0==$AA and APU_PORT_1==$BB before transferring any data to/from the SPC.

A couple guys built my NSF player on a cartridge, and it worked. Looks like I run InitSoundCPU somewhat early in reset, too (after RAM/VRAM clearing). So maybe I just got lucky that time. SPC loading was quite an ordeal at first, so I'm glad if I can help anyone else with that.

Thanks for clearing up my guesses, probably would make more sense if I referred to docs instead of my old source code and memories.

by byuu on 2009-07-03 (#48531)

It doesn't take the S-SMP long to clear 240 bytes of stack and signal to the S-CPU that it's ready. Clearing all of WRAM / VRAM gives you plenty of time for it to finish its setup process.

Still, it's good advice. For a one-time check, especially for a library routine, it's best to just add the cmp #$bbaa. Never know when someone might call it right off the bat.

by Gekko on 2009-07-10 (#48764)

I already posted two unrelated questions, so why not more?
3) This is a Cx4 question. (or set) What is the purpose (or what do you store to, and in what format) of the registers that are labeled MSBs/MSB of the above in this documentation? For example, how does this work: (command $00, subcommand $03)
$7f8c: Height
$7f8d-e: ??? (MSBs of above?)
In addition, for that same command, what about the format of $7F83-5 and $7F86-8? Are they bb.bbbb?
EDIT: It's the most significant byte/bytes for the value, not of the value.
4) Does DMA actually stop the CPU from reading off more instructions (like using the DSP-1 would) or can the CPU still go on reading instructions during DMA?
5) This isn't a question as much as the others but I've noticed that many places reference things like SPC7110 information floating around. Since I want to know about all of the chips, where can I find information on the following chips:
DSP-2
DSP-3
DSP-4
OBC-1
S-DD1
S-RTC
SPC7110
MX15001TFC
ST010
I already know of this place but I don't understand C, except for the parts that are English enough. (And the if, then, else things)

by koitsu on 2009-07-11 (#48769)

Gekko wrote:
4) Does DMA actually stop the CPU from reading off more instructions (like using the DSP-1 would) or can the CPU still go on reading instructions during DMA?

Yes. Once you enable a DMA transfer via $420B, the transfer takes place and the main CPU is essentially "halted" until the transfer finishes.

E.g.:

Code:
...

$8004: LDA #$01 ; Enable channel 0
$8006: STA $420B ; Begin DMA transfer
;
; CPU is stalled/held until the DMA transfer is completed.
; Below instructions won't occur until the DMA transfer has
; completed.
;
$8009: REP #$10
$800B: LDX #$1234

...

by Gekko on 2009-07-13 (#48802)

For #2..., I have the following estimations/exact numbers in terms of master clock cycles. (I figured them out from the numbers here.)
Scanline: 1024
H-blank: 340
V-blank: 586 (31958 on 256x239 and 52418 on 256x224)
Correct me if I'm wrong, of course.

by byuu on 2009-07-13 (#48820)

All scanlines are 1364 cycles long, with one exception:
NTSC-only with interlace off on scanline 240 on an odd field ($213f.d7=1) is only 1360 clock cycles. It has something to do with the NTSC color burst, I don't know exactly, but it really is short, crazy as it sounds.

Also, 40 clocks from every scanline are spent on DRAM refresh, so you effectively can only execute 1324/1320 clocks per scanline.

DRAM refresh occurs at H=530 on CPU revision 1, and it alternates between (starting at) H=538 and H=534 on each subsequent scanline (except that scanline that's only 1360 clocks, it's the same on that scanline as the scanline after that). DRAM refresh will supersede everything on the S-CPU side, even DMA.

NTSC has 262 scanlines per field, PAL has 312. Interlace even fields have 263 or 313 scanlines, respectively.

I know that (1364*262*60)-4 != exactly 315/88*6,000,000 (i.e. 21.477MHz). It's apparently close enough for TVs to display the image though.

Like I said, I'd really appreciate it if someone could run an oscilloscope on the S-CPUs of both NTSC and PAL consoles.
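
The scanline arithmetic above can be checked with a few lines of Python. This is only a sketch built from the figures in this post (1364 clocks per line, 262 lines, 40 refresh clocks) plus the 315/88 MHz colorburst times 6 master clock; the constant names are made up, and the one short scanline is ignored.

```python
from fractions import Fraction

MASTER_HZ = Fraction(315, 88) * 6_000_000   # about 21.477 MHz
CLOCKS_PER_LINE = 1364
NTSC_LINES = 262
REFRESH_CLOCKS = 40

clocks_per_frame = CLOCKS_PER_LINE * NTSC_LINES         # 357368
usable_per_line = CLOCKS_PER_LINE - REFRESH_CLOCKS      # 1324
frame_rate = MASTER_HZ / clocks_per_frame               # a bit over 60 Hz

if __name__ == "__main__":
    print(f"master clock: {float(MASTER_HZ):.2f} Hz")
    print(f"clocks per frame: {clocks_per_frame}")
    print(f"frame rate: {float(frame_rate):.4f} Hz")
```

The frame rate comes out slightly above 60 Hz, which is why 1364*262*60 does not land exactly on the master clock frequency.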

by tepples on 2009-07-13 (#48827)

byuu wrote:
All scanlines are 1364 cycles long, with one exception:
NTSC-only with interlace off on scanline 240 on an odd field ($213f.d7=1) is only 1360 clock cycles. It has something to do with the NTSC color burst, I don't know exactly, but it really is short, crazy as it sounds.

The NTSC NES has similar behavior. Ordinarily, the PPU runs four master clock cycles per dot and 341 dots per scanline. But on every other field, if rendering is turned on, it skips one dot near the end of the "pre-render" scanline (y=-1). This dot is not skipped in games like Battletoads that leave rendering disabled during the pre-render scanline.

by koitsu on 2009-07-13 (#48830)

Re: cycles and timing: I posted some details in the below thread which might be helpful, but byuu's mostly covered them. :)

http://nesdev.com/bbs/viewtopic.php?t=5367&start=15

by byuu on 2009-07-13 (#48832)

Quote:
This dot is not skipped in games like Battletoads that leave rendering disabled during the pre-render scanline.

Since you mentioned that, I should add that I've tested to see if that was the case with the SNES as well. It is not, that dot is always skipped no matter what. And to get more complicated, the two 'long dots' (6 clock cycles each instead of 4) of the PPU counters aren't 'long' anymore, so you still see dots 0-340 when you latch the counters. IRQs are based off the S-CPU's own counter, so they of course don't have any understanding of the long dots. Isn't the SNES fun? ;)

I believe blargg mentioned that on the NES Battletoads, that it messed with the 3-scanline 'dot crawl' effect of the video output. I could be mistaken, however.

Quote:
Number of clock cycles per pixel: MODEs 5,6 == 2 cycles, MODEs 1-4,7 == 4 cycles

I see what you're going for, but I'm not sure that's what's really happening.

My belief is that both normal and hires (and indeed pseudo-hires) are one and the same. The differences are just in the frequency of tile fetches, and how the main and sub screens blend together to produce the output.

I'm afraid to test, but my theory is that it's possible to toggle pseudo-hires mid-scanline. If so, that will pretty much force internal rendering to happen at 512-width for the sake of all that is sane. Unless of course you want to write an HQ2x filter that can blend a screen where every single line is potentially a different width ;)

Of course, it's largely pointless to know for sure since it's theoretical. The analog output of the chip cannot be tested or emulated, at least to the extent that there is absolutely no reason to do so, as the difference cannot be observed by an SNES program.

by Gekko on 2009-07-14 (#48836)

I thought that the pixels outside of the visible scanline counted as H-blank and the scanlines outside of the actual viewing range were counted as V-blank. Again, correct me if I'm wrong. (I forgot about DRAM refresh. I'm assuming that it would cut 40 cycles out of H-blank and V-blank, though.) If I'm wrong, though, I think that would leave H-blank and V-blank with very few cycles.

Also, another question:
6) What is the clock speed of the Cx4 chip? How many cycles does it take to use each instruction?

by koitsu on 2009-07-14 (#48838)

byuu wrote:
Quote:
Number of clock cycles per pixel: MODEs 5,6 == 2 cycles, MODEs 1-4,7 == 4 cycles

I see what you're going for, but I'm not sure that's what's really happening.

My belief is that both normal and hires (and indeed pseudo-hires) are one and the same. The differences are just in the frequency of tile fetches, and how the main and sub screens blend together to produce the output.

I'm afraid to test, but my theory is that it's possible to toggle pseudo-hires mid-scanline. If so, that will pretty much force internal rendering to happen at 512-width for the sake of all that is sane. Unless of course you want to write an HQ2x filter that can blend a screen where every single line is potentially a different width ;)

Of course, it's largely pointless to know for sure since it's theoretical. The analog output of the chip cannot be tested or emulated, at least to the extent that there is absolutely no reason to do so, as the difference cannot be observed by an SNES program.

My numbers come directly from the official developers manual, so if they're wrong, you know where to send complaints to. :)

by Gekko on 2009-07-22 (#49124)

The NES's V-blank is 6810 Ricoh 5A22 master cycles. I still need to know if that's consistent with the SNES's V-blank and what it would be for H-blank. (Also, this means that overscan is not part of H/V-blank.)

And now, for another question: If I used an indirect long mode that uses $2180 as its destination (i.e. LDA [$80] if the DP were $2100), would I get what would be the equivalent of [$[2181]], or would it include $2181 and $2182 as the intermediate and high bytes, respectively?

by byuu on 2009-07-23 (#49144)

I already mentioned exactly how many cycles per scanline and scanlines per frame there were. This was based off the 21MHz clock (315/88*6m). To get your numbers, just divide.

If you lda [$80] with D=2100, you will end up fetching whatever byte is at the address pointed to by $2181-2183, and then that value will be repeated for the next two reads. So say $2181-3 point to $7e0123, which contains #$5a. You'll end up reading from $5a5a5a. This is because $2181-3 are write-only. Reads return open bus, or the MDR (memory data register.) That register will be set upon the $2180 port read.
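
The open bus behaviour described above can be modelled in a few lines of Python. This is a toy model under stated assumptions, not emulator code: the `Bus` class and `indirect_long_pointer` function are made up for illustration, only the $2180 read updates the MDR, and every other address is treated as open bus.

```python
class Bus:
    """Toy model: $2180 reads WRAM and latches the MDR; other reads return the MDR."""

    def __init__(self, wram: dict, wram_addr: int):
        self.wram = wram            # sparse WRAM contents: offset -> byte
        self.wram_addr = wram_addr  # value previously written to $2181-3
        self.mdr = 0x00             # last value seen on the data bus

    def read(self, addr: int) -> int:
        if addr == 0x2180:
            self.mdr = self.wram.get(self.wram_addr, 0x00)
            # The WRAM port address advances after each access.
            self.wram_addr = (self.wram_addr + 1) & 0x1FFFF
        # $2181-$2183 are write-only, so a read just returns the MDR.
        return self.mdr


def indirect_long_pointer(bus: Bus, dp: int, offset: int) -> int:
    """Fetch the 24-bit pointer that LDA [offset] would use with D = dp."""
    base = dp + offset
    low = bus.read(base)
    mid = bus.read(base + 1)
    high = bus.read(base + 2)
    return low | (mid << 8) | (high << 16)


# $2181-3 point at $7E0123 (WRAM offset $00123), which holds $5A.
bus = Bus(wram={0x00123: 0x5A}, wram_addr=0x00123)
pointer = indirect_long_pointer(bus, dp=0x2100, offset=0x80)   # 0x5A5A5A
```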

by Gekko on 2009-07-23 (#49145)

I know how to calculate what you think I'm trying to calculate.
What I'm actually trying to calculate is the length of H-blank and V-blank (in cycles). I am not trying to calculate the length of each scanline, pixel or whatever. I know that H-blank and V-blank are not overscan, so my previous numbers were wrong. In addition, that also means that they are not considered by your formula or they are in a part of a scanline, where I do not know the length (and the H-blanking and V-blanking (moving the opposite way) themselves are not actually calculated).

Oh right, they are open bus.... Well, at least there is an easy way around not being able to use it as 16-bit/24-bit. Thank you!

by byuu on 2009-07-24 (#49180)

Quote:
What I'm actually trying to calculate is the length of H-blank and V-blank (in cycles).

Have you tried looking at emu sources for your answers? Eg:
src/cpu/scpu/mmio/mmio.cpp:mmio_r4212() states that 0-3 + 1096-1363 (1359 short scanline) are hblank, and lines 225-261 (262 interlace) are vblank. If overscan is enabled, Vblank starts at V=240.

272 clocks for 99% of Hblanks, 268 clocks for NTSC non-interlace scanline 240 on odd fields.

NTSC Vblank for non-interlace even: 50468 clocks.
NTSC Vblank for non-interlace odd: 50464 clocks.
NTSC Vblank for interlace even: 51832 clocks.
NTSC Vblank for interlace odd: 50468 clocks.

PAL has 312/313 scanlines, and it's a tossup whether overscan is on or not for these games.

PAL Vblank with overscan on non-interlace even: 98208 clocks.
PAL Vblank with overscan off non-interlace even: 118668 clocks.

Should be able to calculate the rest from here.
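
For anyone who wants to work out the remaining cases, the rule above (clocks 0-3 and 1096 onward are hblank, vblank runs from line 225 to the last line of the field) can be turned into a small Python sketch. The function and constant names are made up for illustration.

```python
CLOCKS_PER_LINE = 1364
SHORT_LINE_CLOCKS = 1360   # NTSC non-interlace scanline 240 on odd fields


def hblank_clocks(line_clocks: int = CLOCKS_PER_LINE) -> int:
    """Hblank covers clocks 0-3 plus clock 1096 up to the end of the line."""
    return 4 + (line_clocks - 1096)


def vblank_clocks(total_lines: int, first_vblank_line: int = 225,
                  short_line: bool = False) -> int:
    """Clocks from the first vblank line to the end of the field."""
    clocks = (total_lines - first_vblank_line) * CLOCKS_PER_LINE
    if short_line:
        # The short scanline (240) falls inside vblank, so subtract 4 clocks.
        clocks -= CLOCKS_PER_LINE - SHORT_LINE_CLOCKS
    return clocks


if __name__ == "__main__":
    print(hblank_clocks())                        # 272
    print(hblank_clocks(SHORT_LINE_CLOCKS))       # 268
    print(vblank_clocks(262))                     # NTSC non-interlace even: 50468
    print(vblank_clocks(262, short_line=True))    # NTSC non-interlace odd: 50464
    print(vblank_clocks(263))                     # NTSC interlace even: 51832
    print(vblank_clocks(312))                     # PAL, overscan off: 118668
```

Passing `first_vblank_line=240` covers the overscan case.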

by tepples on 2009-07-24 (#49182)

byuu wrote:
Quote:
What I'm actually trying to calculate is the length of H-blank and V-blank (in cycles).

Have you tried looking at emu sources for your answers?

I thought we all learned not to do that near the end of the Nesticle era.

by Gekko on 2009-07-24 (#49187)

Ah, thank you!
Also, I would have but I don't know much C/C++, and since I can't find anything in x86 I'd have to use a disassembler, which would take even longer: the output would not only likely be disorganized, due to the way compilers work, but it would also have a lot of junk bytes (if you don't believe me, look here) and a lack of comments, making it very difficult. (Not to mention that I'm not extremely well acquainted with x86, anyway. Just more so than C/C++.) Well, I guess I'll try to learn C/C++ more quickly so I can hope to understand the data in emulator source code soon. Sorry for troubling you!
I guess this thread can be closed or whatever you do to threads that have served their purpose.

by byuu on 2009-07-24 (#49189)

No trouble, just figure the sources would be quicker than waiting on us is all.

by Gekko on 2009-07-30 (#49369)

Oh, joy! I can't read it at all. I even tried BSNES's source code but I still got nothing out of it. Oh well!

Anyway, here is another question:
How does offset-per-tile mode work (particularly mode 6, although I presume that mode 2 is the same)? I know that you specify locations with layer 3 registers and the offset is for columns, not tiles. However, I want to know what gets displayed when you place one column of tiles on top of another one? In other words, if the offsets were all zero aside from one of them, other than the first, which was $10 (or would it be 8, since it uses half pixels?), what would show up on the column to the right of the normal position of the previously mentioned column?