So the project has officially been accepted by my department at school; there is a little blurb about the project. I've been slacking on posting our progress, so you'll have to excuse the large post as I attempt to catch up.
The post about the initial planning is here.
So instead of making separate posts for each little thing that comes up I'll try to keep everything in this post for now.
For the most part we've been spending the summer playing around with the kazzo and considering using some variant of it to program the onboard memory with an onboard AVR mcu. But really we won't need to be as versatile as the kazzo, since I would like reading/writing of all the memory via USB to be independent of whatever mapper is currently loaded on the CPLD.
We've also played around with an NROM dev cart. Gotta start somewhere, may as well be something with no mapper.
We used the ReproPak, with some modifications to allow for battery backing the SRAM. One thing I found out seemed a little backward to me: he ties the PRG memory's /CE to ground and has PRG /CE driving the /OE. That wouldn't generally be a problem, but since we were using the kazzo to program the memory, having it constantly enabled proved to be a problem, since the kazzo ties the PRG and CHR buses together. So we just cut the ground on /CE and have /OE and /CE tied together. The battery backup circuitry is the same as most NES carts, but we had to add pull-up resistors on the /CE lines to keep from draining the battery. Two of the switches at the top control whether the /WE signal is controlled by the cart edge or tied to Vcc for write protection. That was the only way we could find to keep the data from being corrupted every time. One switch is for mirroring and the other for PRG ROM size (16KB/32KB) selection. Everything is in working condition now, but I've got some bug that causes the first byte in CHR to always be programmed to 0xA0 vice 0x00. Not sure why, I think it's the kazzo, but I was able to write my own program for the kazzo to load up via bootloader and write it back to 0x00 after everything else was programmed.
Last weekend I made up what I like to call the "NES protoboard". I've seen a similar idea mentioned before; I think it was Memblers. Basically the board holds the PRG, CHR, and WRAM and routes any signal that may be used or controlled by a mapper to the back of the board, where I'll be using some female header to connect up the CPLD. Since I had a little space, I found some room for things that could possibly be used. The only things I didn't route to the back end were the lower address byte of CHR. But I do have some extra pads on the outer "wing" area that anything could be routed to by hand.
I added some other bells and whistles to deal with things that commonly frustrate me. One is mounting the board to the back shell of a cart. That's why I made those little tabs, which allow the board to be screwed securely to the cart's plastic bottom without needing the top shell, which wouldn't fit anyway.
I also extended the cart edge pins INTO the actual case, because if/when you want to tap off of them they're normally outside of the case housing, and without my extended pins soldered wires could potentially interfere with a connector.
The only things that change between flash/EPROM/EEPROM/SRAM or whatever are the upper address bytes and control signals. Those are conveniently controlled by the mapper, generally speaking, so they all got sent to the back of the cart. Because of this my protoboard can accept almost any memory in a DIP package.
I've also provided battery backup circuitry, and each memory has its power selectable by solder bridge. That may also prove prudent if the cart is powered via USB, to prevent attempting to power the NES through the cart connector.
Here's the schematic and PCB files, including the cart connector and other items I've had to create for the library. I've designed everything using DesignSpark, which is free and pretty easy to learn in my experience thus far.
Here's a preview, I should have them in next week.
There's nothing actually over the connector, it just defaults user created items to that 3d rendering height...
Our goal over the next month, before school and the project officially start, is to get a simple "discrete mapper devboard" working. We're hoping to support N/A/U/BNROM etc. and program it with the kazzo. We'll be able to connect up our little 72 macrocell CPLD to the NES protoboard and test out some designs we've started in the Xilinx IDE.
Took the first step with a CPLD on the NES tonight. Replicated the UxROM mapper with Xilinx schematic and programmed it on a CPLD devboard. Just tested it out with a standard board with EPROMs and the '32 and '161 removed. Next up is to upgrade the NROM devboard and test out flashing the cart. Then we'll try out some other mappers.
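For anyone not familiar with UxROM, here is a minimal behavioral sketch in Python of what that mapper does. It is not the Xilinx schematic from the post, just a model assuming the common UNROM arrangement: any CPU write to $8000-$FFFF loads a bank latch, $8000-$BFFF shows the selected 16KB bank, and $C000-$FFFF is fixed to the last bank. The class and method names are made up for illustration.

```python
BANK_SIZE = 16 * 1024  # UxROM switches PRG in 16KB banks


class UxROMModel:
    """Behavioral model of a UxROM-style mapper (hypothetical names)."""

    def __init__(self, prg_size):
        # Assumes PRG size is a power-of-two multiple of 16KB.
        assert prg_size % BANK_SIZE == 0
        self.bank_count = prg_size // BANK_SIZE
        self.bank = 0  # latch contents (the '161's job on a stock board)

    def cpu_write(self, addr, value):
        # Any write to $8000-$FFFF loads the bank latch.
        if 0x8000 <= addr <= 0xFFFF:
            self.bank = value & (self.bank_count - 1)

    def prg_offset(self, addr):
        # Translate a CPU address into an offset within PRG memory.
        if 0x8000 <= addr <= 0xBFFF:
            bank = self.bank            # switchable window
        elif 0xC000 <= addr <= 0xFFFF:
            bank = self.bank_count - 1  # fixed to the last bank
        else:
            raise ValueError("address is outside PRG space")
        return bank * BANK_SIZE + (addr & (BANK_SIZE - 1))


mapper = UxROMModel(128 * 1024)
mapper.cpu_write(0x8000, 3)
print(hex(mapper.prg_offset(0x8000)), hex(mapper.prg_offset(0xFFFC)))
```

The fixed upper window is what keeps the reset and interrupt vectors reachable no matter which bank is selected.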
Here's some pictures
Much better than those two little DIPs
Apologies for the lack of updates as of late. Not certain how many people are still interested in the project, but things are still moving forward. Most of our time spent working on the project was in the form of papers required for the first term, which happens to be a writing-intensive course...
We've been posting everything formally here, as is required for school credit. If you snoop around there are block diagrams and such.
Now that most of the writing/research is done, we're working on the final design for the next 2 weeks. For the most part I've narrowed it down to using a Lattice CPLD (MachXO2 with 640 Mcells) because of the excessive capabilities it has compared to its Xilinx/Altera equivalents while at a lower price. I'm trying to determine which mcu to use now and am considering the atmega325A and atmega128A. They are nearly pin compatible, with the exceptions being USB and I2C pins. The atmega325 would do the trick for the project, but the 128 may be more desirable if it were ever used as a coprocessor for the NES (which is a little outside the scope).
Because the CPLD is so massive, I'll be using it both to I/O extend the mcu and as a mapper on the same flash. I would really like to get all discrete mappers, MMC1, and MMC3 in the CPLD at the same time, allowing the mcu to select the desired mapper without reflashing the CPLD.
We're looking at 512KB of PRG/CHR memory in the form of SRAM, 32KB WRAM, and potentially an extra 512KB of PRG-ROM.
The goal is to allow the ROM to be programmed to the cart while connected to the NES by "removing" it from the NES with level shifting buffers for the whole 72 pin connector. That satisfies the ultimate goal of quick and non-cumbersome programming, which current solutions lack.
Total cost of components is still less than $50. PCB and case costs vary heavily depending on needs.
I'll be posting up the final design once it's complete next weekend. If anyone has inputs I'm more than willing to hear them out, but I can't really consider large changes at this point... Specifically any input on mcu/CPLD/memory connections and capabilities are open for modifications in the next week.
I'm interested in how this goes. Like I mentioned previously, it's sorta like a board I want to make eventually, with a Spartan3 FPGA and PIC32 MCU (both are 5V tolerant). It's pretty damn cool to see new NES boards with this kind of stuff on it, yours would be the most advanced to date.
I looked at that Mach XO2, sounds pretty nice!
I appreciate the feedback. At times I feel like the project is a bit overboard, and I question if any of the features I'm trying to provide will actually be exercised. But in reality I'm setting it up to be more than just a ROM/game development cart, but a hardware development board too. So I guess that gives the broad scope some justification.
I agree it is always fun to see these new parts connected up to the NES, I don't think that'll ever get old
Yeah that XO2 man, the only downer is it's not 5V tolerant. We had been planning on using a 9500 series Xilinx to stay 5V tolerant. But we decided to just level shift everything for other reasons, and now the XO2 is too tempting to pass up with its size, cost, and all the hardened features of SPI, I2C, Flash, Dual ported RAM, etc, etc...
The idea would be that you could use the dev cart to design a mapper for another CPLD, though, one that may at least be 5V tolerant. Even if it weren't, and a game was actually produced with the XO2, the requirement to level shift isn't nearly as bad once you've defined all your signals: you would only need a few extra cheap ICs for level shifting if you kept 5V memory. But with all the signals essentially undefined, it was easier to just level shift everything on the way into the cart and gain the benefit of 3.3V for everything on board.
Just caught wind of this. Awesome project!
So most things are pretty well laid out for this thing now. If anyone's curious of specifics for the design we had to post everything up here:
http://beaversource.oregonstate.edu/projects/44x201109
My goal has kind of transformed to allow for a lot of capability and functionality. If I've seen a mapper idea mentioned recently I've noted it and tried to provide the hardware required to make it possible. The parts cost is around $50 right now for individual quantities. If it ever actually gets produced it should be able to stay under $100 as hoped.
Highlights:
512 KB of PRG and CHR SRAM
512 KB of PRG Flash
32 KB of WRAM
possibly some extra serial eeprom tossed on because it's cheap
Atmega325
MachXO2 640 Mcell CPLD
Entire board is level shifted to 3.3V as the signals enter the cart. This also allowed for the cartridge to be "removed" from the NES for programming. The main goal was to be quick and convenient to program. So assuming your PC is close enough for a USB cable to reach, you can leave everything plugged in from one build programming to another. You should be able to leave the power on as well; you'd just have to hit reset after programming (the NES would freeze up during programming unless you kept it running off internal RAM for some reason).
USB read and write access to ALL memories on board, and should be able to reconfigure the CPLD as well via USB (flash configuration on CPLD provides over 100K write cycles). I ran a quick demo of programming 8KB of SRAM in under a second with a prototype of our firmware and software. Picture below and quick video here:
http://www.youtube.com/watch?v=jYlYKQpxwA4&context=C3dc84ddADOEgsToPDskLuEW6BpKAuzi5yxhSsTajc
[EDIT: image attached]
I calculated it out to run about 40KB/sec for this setup. The final one will be a little different. But at that speed most games would program in around 10 sec, and all memory space on board could be programmed in 20+ seconds.
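To make that estimate easy to play with, here is a rough sketch in Python. It only divides size by a sustained rate; the 40KB/sec default comes from the figure above, while the function name and the example sizes are just assumptions for illustration, and protocol overhead, erase times, and so on are ignored.

```python
def program_time_seconds(size_kb, rate_kb_per_s=40.0):
    """Time to send size_kb kilobytes at a sustained rate (ignores overhead)."""
    return size_kb / rate_kb_per_s


# Illustrative sizes only: the 8KB demo, a mid-sized game, one full 512KB memory
for label, size_kb in [("8KB SRAM demo", 8), ("256KB game", 256), ("512KB memory", 512)]:
    print(f"{label}: {program_time_seconds(size_kb):.1f} s")
```

Swapping in a different rate shows how sensitive the "hit program, hit reset" workflow is to the link speed.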
But the CPLD is sort of dual functioned. It I/O extends the mcu during programming and also runs as the mapper during play/testing. Since the CPLD was fairly large we were able to do this to keep chip count and part cost down.
I've got nearly every pin connected to the CPLD, so all PRG addresses can be decoded and optionally all CHR as well. I've left some of this configurable with jumpers for the time being because I started to run out of I/O on the CPLD. So it's possible to get as low as 128-byte banks on CHR memory, but at the cost of not having lower CHR address bits as inputs. 128-byte pages gives you down to A6, 256-byte to A5, etc.
I had planned for 8KB PRG bank switching but I was wondering if anyone can think of a benefit to less than that? I can't think of any and the I/O seemed more useful for decoding all PRG addresses for things such as dec $4011 and such.
The CPLD is what I think I'm most excited about though. I discussed it above, but it's stuffed with goodies. Lots of capabilities with the Dual ported SRAM and other hardened features without the cost of logic elements. It opens the door for using the mcu as a co-processor and everything possible there. It's not as cheap as some CPLDs but still reasonable to put in a production cart. It has a LOT more to offer than a $4-5 CPLD but at about twice the cost.
I don't think it would ever be that reasonable to produce a game that made use of everything on board. But reading through some of the old posts everyone has their own ideas of what they'd like. I tried to remove limitations where possible with the thought that limitations could be placed by the user in a final production.
I kind of think my goals are a bit lofty at times, but I'm having fun working on it and getting school credit at the same time so either way I win
Next steps are to port the firmware from the atmega8 in the demo to the kazzo and do some testing with the NESprotoboard. Should be ordering the prototype and parts within a month.
NSFs use 4KB PRG banking. Musicians would definitely be interested in a cart they can use for performances with that fast uploading.
bunnyboy wrote:
NSFs use 4KB PRG banking. Musicians would definitely be interested in a cart they can use for performances with that fast uploading.
Thanks for that info, I had originally planned that PRG A0 would get lost as an input if one wanted 4KB banks for PRG ROM. But I think I'll make it the default to have 4KB banks and all PRG address inputs then. I might make it so you can't have CHR A0 and PRG A0 at the same time instead.
I've talked a bit with Andy over at http://www.batslyadams.com/ (he floats around nesdev a bit too); he's pretty involved with the music scene. I'll have to check back in and see if he has any last-minute input.
infiniteneslives wrote:
bunnyboy wrote:
NSFs use 4KB PRG banking. Musicians would definitely be interested in a cart they can use for performances with that fast uploading.
Thanks for that info, I had originally planned that PRG A0 would get lost as an input if one wanted 4KB banks for PRG ROM. But I think I'll make it the default to have 4KB banks and all PRG address inputs then. I might make it so you can't have CHR A0 and PRG A0 at the same time instead.
I've talked a bit with Andy over at
http://www.batslyadams.com/ (floats around nesdev a bit too) he's pretty involved with the music scene. I'll have to check back in and see if he has any last minute inputs.
Then it would be a totally different mapper than FME-7/SS-5b, because it is not backwards compatible.
Since you are changing it, maybe you can make it backwards compatible by making a new mapper revision that uses a MODE for PRG bank size.
For this mode you can add another port to write to at $A000, as Port $10 (%0001xxxx).
Hamtaro126 wrote:
Then it would be a totally different mapper than FME-7/SS-5b, because it is not backwards compatible.
The mapper isn't intended to be anything concrete. The goal is for it to be something that can be reprogrammed over USB as well.
But since the CPLD is pretty big, I do plan to make some mappers that are selectable. For instance, an "all-in-one" that would contain all the discrete mappers in one CPLD configuration. Then the user can select which one to use when programming, or the hardware can decode the mapper number from the .nes file header.
So it's possible one could make an FME-7 mapper with selectable bank size like you're saying, I just don't think it'll be the default configuration.
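As a rough idea of what "decode the mapper number from the .nes file header" involves on the host side, here is a sketch in Python based on the standard iNES header layout. This is not the project's actual host software; the function name and returned dictionary keys are made up, and NES 2.0 extensions and the four-screen bit are ignored.

```python
def parse_ines_header(header):
    """Decode the fields a mapper-select scheme would need from 16 header bytes."""
    if len(header) < 16 or header[0:4] != b"NES\x1a":
        raise ValueError("not an iNES header")
    flags6, flags7 = header[6], header[7]
    return {
        "prg_kb": header[4] * 16,  # PRG-ROM size is stored in 16KB units
        "chr_kb": header[5] * 8,   # CHR-ROM size in 8KB units (0 means CHR-RAM)
        # Mapper number: low nibble lives in flags 6, high nibble in flags 7
        "mapper": (flags7 & 0xF0) | (flags6 >> 4),
        "mirroring": "vertical" if flags6 & 0x01 else "horizontal",
        "battery": bool(flags6 & 0x02),
    }


# Example: a hand-built header for a 128KB UxROM (mapper 2) image, CHR-RAM
example = b"NES\x1a" + bytes([8, 0, 0x21, 0x00]) + bytes(8)
print(parse_ines_header(example))
```

Only a handful of these bits (mapper, sizes, mirroring) would need to be handed to the CPLD to pick one of the selectable mappers.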
Okay, that makes sense then.
Some previews of the PCB. We're ordering tomorrow hopefully, getting excited to see it all together.
I now know why they made NES carts so huge...
[EDIT: images re-attached]
Looks great! What are the final dimensions of your pcb?
captncraig wrote:
Looks great! What are the final dimensions of your pcb?
They are around 4.4"x5.0"; it basically takes up all the space in the cart.
Right now I'm just trying to figure out where I can get a small quantity run for $200 or less. It's proving to be a pain with the ass-ton of signals on the 4 layers since most deals have minimum specs that are pretty big.
If anyone has any ideas for manufacturers, let me know. So far Advanced Circuits is a no go. Imagineering is a possibility depending on some technicalities. I'm trying to see if I can route to Dorkbot's minimums right now...
I have got good figures from several Chinese places through a friend. I am working on a rather big project whose PCB is 170 x 245mm and 4 layers, and it costs me 260eur for 10 boards.
Remember that NES carts are not the normal PCB thickness. They are 1.2mm which many places will not do. You can jam a 1.5mm board into the cart slot but the connector won't last long. I haven't done 4 layer stuff with them but get all my other boards from MyroPCB.
I order from MyroPCB also, been using them since 2005 and they've always been good.
Thanks for the tips. Does Myro have some sort of membership fee or something? I seem to remember you saying something to me about that a while ago, Memblers. Maybe it was for assembly or something though, because I can't find anything like that on their site.
EDIT: has anyone ever tried to run the NES off of 3.3v before? I'm guessing most things MIGHT work but I'm figuring the video filtering and such might be gunked up. I've got a breakout board for my CPLD but no buffers set up until I get the PCB. I think my FCmobile runs off 3.3v though so I could probably just do some prelim testing with that.
Well, I was pretty impressed with running the NES at a lowered voltage. I was kind of right about my predictions of the video. Turns out the NES (REV-07 PCB) ran pretty stable at 3.3v and even down to around 3.15-3.2v; below that it would crash. The video looked pretty good, it was just darkened, and the sound was a little quieter, which makes sense.
For reference, if anyone is curious, I had the CIC defeated and was running one of Memblers' repro carts with more modern memories on board, because I didn't want the cart to be the cause of not operating at lowered voltage. I was also running a UNROM mapper (California Raisins).
My FCmobile looks like it's pretty close to the max input of my CPLD (3.75v). My scope showed some signals ghosting above that from time to time, but it probably won't cause much damage.
Well, I couldn't get it made for anything less than $400 looking for only a couple boards... So I finally got some sense, broke down, and routed everything by hand. I had to completely reassign pins on all the memories and the CPLD as I drew every last signal by hand. I was able to drastically optimize it by doing that and took it from a 4-layer board to 2 layers, with twice the design tolerances and less than half the vias...
Auto routing madness on 4 layers:
Hand routed on 2 layers:
[EDIT: re-attached below]
What it'll kind of look like now:
[EDIT: re-attached below]
So tomorrow I'll be ordering the first boards. I can get em for less than $40 each in single quantities. Some autoquoting showed that I should be able to get unassembled boards for under $10 for quantities of 25. So that was nice to see.
There's around $50-60 worth of parts onboard, $8-10 for case and CIC. So looks like it'll still be reasonable to obtain for $100 or less with assembly figured in.
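A quick sanity check of that budget, as a sketch in Python. It takes the worst-case end of each figure quoted above; the variable names are only for illustration, and the assembly cost itself is not known here, so the script just reports how much room is left under the $100 target.

```python
# Worst-case figures (USD) quoted in the posts above
parts = 60          # "$50-60 worth of parts"
case_and_cic = 10   # "$8-10 for case and CIC"
bare_pcb = 10       # "under $10" per board at quantities of 25

target = 100
worst_case_total = parts + case_and_cic + bare_pcb
assembly_headroom = target - worst_case_total
print(f"worst case before assembly: ${worst_case_total}, headroom: ${assembly_headroom}")
```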
I also got my hands on a breakout board with the mach XO2 cpld on it. I asked the guys at Lattice and they hooked me up for free sponsoring the school project.
Had to do a quick video demo for school so there's a video here:
http://www.youtube.com/watch?v=MuHV_ATmHw8&context=C333a56dADOEgsToPDskJwe_q-gpTOsTGPKN8ZGBmt
Your 2-layer version there is looking much better than the other one.
BTW, I'm not sure what CAD software you're using, but if you can export/import Specctra files, there is a really good autorouter here:
http://www.freerouting.net/
I actually use it even for doing manual routing, because I use Proteus, which doesn't have the "push and shove" feature (which is really handy for making dense boards). With its autorouter you might have to leave it optimizing for 5 days straight depending on your board complexity, but it gives a pretty good result.
Thanks for those resources Memblers.
Yeah I used DesignSpark for the board, it's not my favorite but it's what I've got the most experience with and haven't decided to devote more time to learning another tool and my component libraries. It doesn't look like it supports Specctra export/import, looks like people have been requesting it though.
Even still I don't think any computer tool could have come close to optimizing like I could. The main reason being I had to assign memory address and data lines as I routed the signals by hand interweaving them between pins and such as they best fit. I literally created the netlist for most parts as I drew it out. I had to ignore the memory pinout from the datasheets. No D0-7 or A0-18, just D's and A's... And obviously the same with the CPLD with the general IOs.
That tool can't optimize by reassigning the netlist like I could, right? If so that would be AWESOME. Even if it would have taken 5 days on its own, it would have been better than the 5 days I spent doing everything by HAND... At least I've got an intimate relationship with every trace on the board now.
I haven't checked out that Proteus yet, but if you and Farid are having good results with it I might check that one out if/when I decide to toss DesignSpark in the trash.
That board looks very awesome
It makes me feel like such a masochist for doing stuff like this without any netlists and autoroute and manually created components :
http://www.fileden.com/files/2008/4/21/ ... arDone.png
I'm doing same kind of "don't care about A and D order for sake of easy routing" on all memory chips whenever possible
TmEE wrote:
It makes me feel like such a masochist for doing stuff like this without any netlists and autoroute and manually created components :
http://www.fileden.com/files/2008/4/21/ ... arDone.png
Wowzers, NO auto-tools huh? I've seen your board before but knowing that especially after my simple project makes yours much more impressive. What are you using for software?
I kind of know what you're saying, and would have to agree that masochism is a fitting word.
Once you get so much done by hand you start to lose interest in the auto-tools and want the gratification of doing it ALL by hand regardless of the pain/time involved...
I used Minimal Board Editor to create it. I think I am the biggest user of it, and I have made quite a few feature requests which have been added now too
http://www.suigyodo.com/online/e/
Only real problem with it is lack of component library, I had to make everything myself (which I actually enjoy doing).
TmEE wrote:
I'm doing same kind of "don't care about A and D order for sake of easy routing" on all memory chips whenever possible
That might make it slightly harder to program a flash chip when the bits associated with the write commands get switched around or (worse) when it makes individual banks non-contiguous. I think that's why the GBA Movie Player, for example, has such a strange write sequence to reprogram the flash. (ObNES: Dwedit made a version of PocketNES that installs to unused parts of the GBAMP's flash.)
Actually with Proteus there are some features like that, but I've never tried to use them (so I'm not sure if it's 100% what we'd want). One is gate-swap, which is more typically used for swapping different gates in a 74HC00 for example. When creating components there's a "swappable pins" section that I never filled out when making stuff. The other feature is back-annotation, where somehow you can change stuff on the PCB and it can automatically change the schematic to match. I've not tried either of those, though I really should. Currently in that situation I just use the 'ratsnest' view and change the pins on the schematic manually until it looks like the signals don't cross as much.
Unfortunately I don't think that autorouter supports that kind of stuff. But I donno really, I suppose the 'swappable pins' theoretically could be part of the Specctra format. But as usual with routing and autorouting alike, the main part of good routing results lies in the component placement (and pin use).
The names certainly suggest what you wrote, if reality is same is another question. I only tried Protel briefly and decided I liked something else more. Eventually I settled with MBE because it did all I wanted in easy and painless way.
tepples wrote:
That might make it slightly harder to program a flash chip when the bits associated with the write commands get switched around or (worse) when it makes individual banks non-contiguous. I think that's why the GBA Movie Player, for example, has such a strange write sequence to reprogram the flash. (ObNES: Dwedit made a version of PocketNES that installs to unused parts of the GBAMP's flash.)
I have actually kept flash pins as original for the very reason, luckily for me things have matched up with not so much headache when keeping original signal order. I could only swap individual bits in the 4 bytes anyway but not across the bytes, unless I want some extremely messy firmware update routine, at least for boot flashes anyway...
tepples wrote:
TmEE wrote:
I'm doing same kind of "don't care about A and D order for sake of easy routing" on all memory chips whenever possible
That might make it slightly harder to program a flash chip when the bits associated with the write commands get switched around or (worse) when it makes individual banks non-contiguous. I think that's why the GBA Movie Player, for example, has such a strange write sequence to reprogram the flash. (ObNES: Dwedit made a version of PocketNES that installs to unused parts of the GBAMP's flash.)
Effectively the Flash was the only part that I wasn't able to swap pin assignments around on, because of what you're saying. Since it was the only one, it didn't really affect my routing. I just let the flash be the deciding factor on which net was which. The SRAM and CPLD simply followed suit.
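The reason the swap is harmless for SRAM but not for flash can be shown with a small Python sketch. The bit mapping below is a made-up example (a full reversal of D0-D7), not the actual routing on this board, and the 0xAA command byte is the first byte of the typical JEDEC-style unlock sequence used by many parallel flash parts, assumed here for illustration.

```python
def permute_bits(value, mapping):
    """Route bit i of value to bit mapping[i] (models swapped PCB traces)."""
    out = 0
    for src, dst in enumerate(mapping):
        if value & (1 << src):
            out |= 1 << dst
    return out


# Hypothetical data-line swap chosen for easy routing: D0<->D7, D1<->D6, ...
DATA_SWAP = [7, 6, 5, 4, 3, 2, 1, 0]
INVERSE_SWAP = [DATA_SWAP.index(i) for i in range(8)]

sram = {}


def sram_write(addr, data):
    # The chip stores the scrambled byte...
    sram[addr] = permute_bits(data, DATA_SWAP)


def sram_read(addr):
    # ...and the same traces unscramble it on the way back out.
    return permute_bits(sram[addr], INVERSE_SWAP)


sram_write(0x10, 0xA7)
print(hex(sram[0x10]), hex(sram_read(0x10)))   # stored scrambled, read back intact

# Flash is different: the chip itself interprets command bytes.
print(hex(permute_bits(0xAA, DATA_SWAP)))      # what the chip sees instead of 0xAA
```

The SRAM never cares what the bits mean, so the round trip cancels out. A flash chip watching for a specific command value would see a different byte unless the programming firmware pre-scrambles every command and address, which is the kind of strange write sequence tepples describes.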
Memblers wrote:
The other feature is back-annotation, where somehow you can change stuff on the PCB and it can automatically change the schematic to match. I've not tried either of those, though I really should. Currently in that situation I just use the 'ratsnest' view and change the pins on the schematic manually until it looks like the signals don't cross as much.
Yeah I had hopes that I could use the back annotation feature of DesignSpark, but it turned out it was ONLY for renaming nets and parts. It would NOT actually revise the netlist, so I had to go back and back annotate the nets myself. Hopefully Proteus doesn't have that issue.
I did do the 'ratsnest' method and the memories seemed to line up pretty well with the pinout assigned with the data sheet. But it still epically failed when it came to the autoroute. It probably would have done better if I had through-hole components or smaller design tolerances to give some breathing room. Even still I don't think it would have taken me from 4 to 2 layers though.
Moving right along...
I've got the programming interface completed (some IO extending within the CPLD). No more jumpers to swap around during programming. I've also got the valuable part of the .nes header stored inside the CPLD to implement the proper mapper, ROM size, mirroring, etc. I've officially got NROM and UxROM support now with everything working cleanly. I'll be adding the rest of the discrete mappers after the boards show up and everything gets ported over next week.
At that point the project is pretty much complete as far as school is concerned. But I'll be keeping it going and start implementing some complex mappers and features. I'll test out qbradq's FME-7 and see if he actually has it working (where is he nowadays anyways?). Assuming there are bugs, I'll work on getting MMC1/3 working, then dig back into FME-7.
The other day someone mentioned that you'd be crazy to try to develop a game by testing on hardware alone. With this thing being so quick and easy to program, it's not really much more cumbersome than loading up an emulator. Yeah, I won't have all the tools of an emulator, but I think I'm going to take that as a personal challenge to build my first game using hardware testing only, no emulator...
infiniteneslives wrote:
The other day someone mentioned that you'd be crazy to try to develop a game by testing on hardware alone. With this thing being so quick and easy to program, it's not really much more cumbersome than loading up an emulator. Yeah, I won't have all the tools of an emulator, but I think I'm going to take that as a personal challenge to build my first game using hardware testing only, no emulator...
I did that with Glider, it was definitely a fun challenge
Didn't use anything but the screen for debugging, so there was lots of printing hex numbers, or using the grayscale bit as a notifier that something happened. I don't think it made me code any more carefully tho. Maybe if it took the 2 minutes to program and swap eproms it would have.
Got some progress and eye candy to share. The CPLD was a bit of a pain to solder. We had a stencil made up that helped out, but we still had some shorts. Luckily we had some scopes available to get in there and see what we were doing to remove them. Everything else was cake to solder on after we learned some lessons with the CPLD.
Got the board together yesterday and am slowly moving through and testing all the hardware and connections before we start trying to do a full test with it playing games. I've got it recognized by USB now and should be playing games by the end of the week, assuming nothing blows up.
Last week with the breakout board set up I got all the discrete mappers working. I've got about 6-7 mappers now, and I started working on MMC1. Right now the bare-bones programming interface and NROM mapper take up about 40-50 Mcells (~7% of available), and after adding most discrete mappers (UxROM, AxROM, BNROM, CNROM, M/GxROM, and Color Dreams) all on the same configuration I'm using about 110 Mcells (~17%). By my calculations I shouldn't have much issue fitting MMC1, MMC3, and FME-7 on the same configuration as well, with plenty of room to spare.
The more I play around with this Lattice CPLD, the happier I am with my decision to build with it. I like the IDE a lot better than Xilinx's and haven't had any issues with the whole setup really. Oh, and I was also excited to see Lattice's larger MachXO2 CPLDs are available now. We've got the 1200 with 640 Mcells, but the 7000 has >3400 Mcells, making it one of the biggest 3.3V CPLDs on the market for cheap. Ours is $7.35 in individual quantities and dirt cheap IMO at $6.40 for quantities of 25. The 7000 is only $15, with several intermediate steps between. They are pin compatible as well, so I know where I'm going if I fill this thing up.
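For anyone wanting to check the utilization percentages, here is the arithmetic as a small Python sketch. The 640 Mcell total is the figure quoted in this thread for the part; 45 is simply the midpoint of the 40-50 range, and the function name is made up.

```python
TOTAL_MCELLS = 640  # figure quoted in this thread for the CPLD in use


def utilization(used, total=TOTAL_MCELLS):
    """Percentage of macrocells consumed."""
    return 100.0 * used / total


for label, used in [("programming interface + NROM", 45), ("most discrete mappers", 110)]:
    print(f"{label}: {used} Mcells = {utilization(used):.1f}%, {TOTAL_MCELLS - used} left")
```

How many macrocells MMC1, MMC3, and FME-7 will actually take is not known at this point in the thread, so the remaining count is only a budget to design against.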
I'll have to ask Loopy how much logic his MMC5 is taking up at the moment.
[EDIT: images re-attached]
Oh wow, this looks so fantastic
That's awesome! So are the EXP port pins connected (via buffers) to the CPLD?
How does this board differ from the PowerPak? I'm not asking about parts, technologies and such (although that information could be used to figure out how the price will differ), but actual use in development/playing.
I'm mainly interested in development obviously, since I don't see much that can be improved about playing when compared to the PowerPak. The PowerPak lacks a lot as a development tool though, even though it's marketed as such.
Is it possible to integrate all the add-on chips from the Japanese games?
FME-7 and MMC5 were among them, but what about the rest of them?
MMC5 will need 8192 microcells just for the 1024 bytes of ExRAM unless you put the ExRAM in a separate chip. Or do they make CPLDs with a block of RAM?
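The 8192 figure is just one storage bit per macrocell, which can be sketched in Python as below. The macrocell counts are the ones quoted earlier in the thread (the 7000's is given only as ">3400", so 3400 is used as a rough stand-in), and the helper name is made up. This only covers building RAM out of macrocell flip-flops; it says nothing about the hardened dual-ported RAM mentioned earlier for the XO2, which is the kind of block the question asks about.

```python
def bits_needed(num_bytes, bits_per_byte=8):
    """Storage bits (one flip-flop each) needed to hold num_bytes."""
    return num_bytes * bits_per_byte


exram_bits = bits_needed(1024)  # MMC5 ExRAM size from the post above
for name, mcells in [("MachXO2-1200", 640), ("MachXO2-7000", 3400)]:
    print(f"{name}: {exram_bits} bits vs {mcells} Mcells, fits: {exram_bits <= mcells}")
```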
tokumaru wrote:
How does this board differ from the PowerPak? I'm not asking about parts, technologies and such (although that information could be used to figure out how the price will differ), but actual use in development/playing.
Would hope that the on board processor+USB will load games faster than the 5KB/s through controller port or 25KB/s through USB CopyNES. Maybe the USB can be used as a serial debug port too?
Would also be nice if it runs on clones
chykn: The EXP pins are buffered so that they can be I/O of the CPLD. But I was limited on I/O, so I have little solder pads that can be used to jumper the signals to the unused I/O made available on the board. The only issue is the buffers are directional, and I only set the data buffers up to be able to change direction. So EXP0-3 are always into the cart and EXP 4-9 are always out of the cart.
tokumaru: I really wanted this to be more focused as a development board. My goal was to provide for a lot of capability and remove limitations where possible. The biggest thing is the fact I've got the atmega mcu on board. It allows for a lot of things the powerpak can't do. The mcu provides the USB interface for quickly programming the memories on board. But it is also interfaced with the CPLD so that one could use it as a coprocessor or something if they really wanted playing around with dual ported memory or as a synth etc.
The other notable difference is I'm using a non-5v tolerant CPLD vice the power pak's 5v tolerant FPGA. The CPLD is convinent in that it's non-volatile and doesn't need to be configured at start up like an fpga. IMO it makes it simpler for a developer. That and any cart that would get produced would most likely be on a CPLD so it made sense to me that it was also. Keep in mind the write cycle limit on the CPLD's configuration FLASH is like 100K+ cycles compared the 10K of EEPROM cplds. The fact I'm not 5v tolerant also required level shifters to be used, but they make it easier to program the cart since you don't have to power off the NES or hold reset or anything.
The biggest goal was quick and convenient programming of memories. So it's set up where you can keep the cart plugged into the NES and your PC and when you've got a new build to test you just upload the .nes ROM to the host software on the PC. Then the buffers remove the cart from the NES reguardless of state and program the memories. At that point you would have to turn the console on or press reset if you left it running. No connections to be made or broken just click program and hit reset. And it's pretty quick ~10 sec or less for most games.
Also everything is open source and I hope to have tutorials and stuff on how to modify or create mappers making it a useful tool that doesn't need to be reverse engineered to make full use of for mapper and game development.
im-pulze: FME-7 is actually planned to be implemented as one of the first mappers. Lots of the MMC5's capabilities are easily possible but I can't be sure that it would all fit in the CPLD at this point. But realistically I could support any number of mappers if one was willing to design them.
tepples: That's one of the awesome things about the Mach XO2. It has 10bit of distributed SRAM that can be configured as dual port without adding much more logic. I'm not certain I can do the full 1KB with this CPLD, but I shouldn't have much problem with the larger CPLDs in the family.
bunnyboy: It's running at around 40KBytes/sec right now. There is some room for improvement though. One should be able to do some level of debugging via USB too but I don't plan to focus on this in the near future.
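(A quick sanity check on the speeds mentioned so far, as a rough sketch only. It assumes a steady rate with no protocol overhead, treats "KB" as KiB, and uses a made-up 384 KiB ROM as the example; none of those numbers come from the actual host software.)

```python
def transfer_seconds(rom_kib: float, rate_kib_per_s: float) -> float:
    """Time to move rom_kib kibibytes at a steady rate, ignoring overhead."""
    return rom_kib / rate_kib_per_s

# Hypothetical example ROM: 256 KiB PRG + 128 KiB CHR.
rom_kib = 256 + 128

# Rates as quoted in the thread (KB/s).
for label, rate in [("controller port", 5), ("USB CopyNES", 25), ("this cart", 40)]:
    print(f"{label}: {transfer_seconds(rom_kib, rate):.1f} s")
```

At 40 KB/s that example ROM lands just under the "~10 sec" figure, which is at least consistent with the claim above.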
Quote:
Also everything is open source and I hope to have tutorials and stuff on how to modify or create mappers making it a useful tool that doesn't need to be reverse engineered to make full use of for mapper and game development.
I'll buy one of these babies as a complement to my two Powerpak gaming carts for this reason alone :)
bunnyboy wrote:
Would hope that the on board processor+USB will load games faster than the 5KB/s through controller port
It's actually 10KB/s through the controller port.
bunnyboy wrote:
Would also be nice if it runs on clones
I was hoping that it would be. But I'm not sure of what all would make it incompatible. Why is it exactly that the power pak doesn't work on clones? One thought I had was that the clones usually operate at 3-4Vdc and there could be issues with power supply. This should be good in that respect, the level shifters will operate at whatever voltage they get supplied with. And the regulator is low drop out so it should operate without USB power if the console provides ~3.5V. Worst case it would have to have USB power to work with clones that operate at or near 3.3V.
My other thought was that there is some issue with the power pak programming of the memories being done by clone CPUs. Not sure what the exact issue would be there, but it obviously wouldn't be a problem with this since it's all done by the mcu here.
The only clone I have is a FCmobile II. I did test my breakout board set up with it and everything worked. But that didn't include the buffers and power supply circuitry that's implemented in the final design.
Figuring out the problem is on my todo list! Could be power (clones have weak power supplies), or a wiring problem (they expect CHR /A13 and CIRAM /CE to be connected), or something in the timing differences.
Designing for USB power might be a good idea anyways. That way you could do power on tests without losing the SRAM/CPLD contents.
Hmmm, what about hooking up a bluetooth spp module to transfer data?
They can be had for about $6:
http://dx.com/wireless-bluetooth-rs232- ... dule-80711
These things are pretty easy to interface with AVRs, I've used them for a few of my projects.
bunnyboy wrote:
Designing for USB power might be a good idea anyways. That way you could do power on tests without losing the SRAM/CPLD contents.
Yeah it's set up to run on either NES or USB power and seamlessly switch between the two. So you don't need the NES to be on to program it and you don't need it to be plugged into USB to play on the console. But like you're saying you can leave it plugged into USB the entire time so you won't lose power/memory contents after shutting off the console. In the event that a clone doesn't supply 3.5V or more you'd always need USB power supplied.
Also we've got a battery on board that can be used to power WRAM, PRG RAM, or CHR RAM as desired. If you stored something in the CPLD's RAM you'd lose it though, but there is flash in the CPLD one could use. The CPLD user flash would be nifty for saving game data without use of a battery or external non-volatile memory.
drk421 wrote:
Hmmm, what about hooking up a bluetooth spp module to transfer data?
I probably won't implement it myself, I find it hard to beat the speed, reliability, and compatibility of a USB cable. But the good news is, I left the serial pins available on the mcu with solder contacts on the board. So in the spirit of the project one could take this as a base to easily add something like that and DEVELOP your own BT interface for the cartridge
Anything that can be a USB host can be a Bluetooth host. In fact, that's how the Wii console connects to its remote: through a built-in USB Bluetooth adapter. Is this USB chipset OTG (cable-selected host or client) or client-only?
tepples wrote:
Is this USB chipset OTG (cable-selected host or client) or client-only?
I never really thought about someone wanting to use it as a host...
No it's not OTG, just client only. I'm using V-USB, and I'm pretty sure there is nothing available with it that allows it to act as a host. Another option was LUFA, I'm not sure if hosting is possible with that or not. I don't have any USB specific hardware aside from the AVR mcu and a couple resistors. I've never looked into the possibility of hosting but my guess is you'd have to add hardware or write your own USB host code to do it.
Did you have something specific in mind?
I'm guessing using it for peripherals such as a keyboard/mouse or something? Simple items like that would probably be easier, and need no extra hardware (aside from a socket/cable), if you used the serial port that's already built into the MCU.
If one wanted to make it a host for data storage for something like a flash drive, the easier/nicer option would be an SD card slot. There are tools already out there for getting an SD card on a AVR, it would then just need to be connected to the SPI bus on the cart and maybe 1-2 of the free pins on the mcu.
Aside from that adding the BT card might add some hosting capabilities. But that one at least is still slower than USB which should beat it out in data transfer speed by 2-3 times.
Nice looking piece of kit you've got there. I'm looking forward to this entering production.
infiniteneslives wrote:
Also everything is open source and I hope to have tutorials and stuff on how to modify or create mappers making it a useful tool that doesn't need to be reverse engineered to make full use of for mapper and game development.
This is the number one draw for me. I've come up with a few different ideas for new mappers and it would be nice to have a platform to prototype them on. (That and I'm a pretty big fan of open source. Not quite a zealot, but definitely convinced of the benefits.) Most of my ideas could be built out of a handful of discrete chips or a small programmable device, so they'd be cheap to make carts of later.
Quote:
It has 10bit of distributed SRAM that can be configured as dual port without adding much more logic.
I'm assuming that's a typo. Ten bits!? How much RAM does it actually have? Dual ported RAM gives me all kinds of crazy ideas. (But those ideas require a lot of it...)
Karatorian wrote:
Quote:
It has 10bit of distributed SRAM that can be configured as dual port without adding much more logic.
I'm assuming that's a typo. Ten bits!? How much RAM does it actually have? Dual ported RAM gives me all kinds of crazy ideas. (But those ideas require a lot of it...)
Ahh yeah... Not sure how I came up with that... This CPLD has 74Kbits of SRAM that can be easily configured as true dual port still much more than MMC5. The big dog of the family (pin compatible) has a whopping 240Kbits! Should satisfy most desires on the NES
In other news we finished porting over all the AVR code today and I just finished debugging it all. Everything works GREAT! I even tested it out quick on my portable clone that is only operating at ~3.5V and it worked beautifully even without USB power. All my mappers tested out great as well and there don't seem to be any issues with the buffer circuitry either.
So now it's time to start exercising this thing
infiniteneslives wrote:
This CPLD has 74Kbits of SRAM that can be easily configured as true dual port still much more than MMC5.
Are you using any of that?
The idea I had was to have a dual ported CHR-RAM mapped into the CPU's address space. This would allow for more time to update the tiles by allowing writes to the offscreen page during rendering and utilizing a simple page swap during v-blank. (It's funny how I just had this idea early this morning and then I read about how your cart could be used to prototype the idea.)
No I'm not using any of it in the base mapper as of right now so it's completely free to play around with. Although I did have plans to play around with it to see how it works.
Isn't your idea basically the same thing as EXRAM that the MMC5 has?
ExRAM can be used only as a nametable, not as a pattern table. MMC5 has no provision for writable pattern tables.
Okay so I dug a little deeper into the SRAM capabilities of this thing. Basically there are two different types of SRAM available. There is Embedded Block RAM (EBR) and Distributed RAM in each LUT (look-up table, similar to a macrocell).
The EBR is designed to provide "large" amounts of configurable SRAM (single port, dual port, pseudo dual port, etc.) and, as far as we're concerned, comes in 1KB chunks that are 9 bits wide (9216 bits); assuming the 9th bit isn't used, they are just 1KB x 8 bits wide. This EBR can be single port, true dual port, or pseudo dual port (read only on one port, read & write on the other) without costing any logic elements (LUTs). This CPLD has 7 blocks of EBR so I can easily have 7KBytes of true dual ported SRAM. Larger members of the family have 8, 10, and 26 blocks (one KByte per block).
The Distributed RAM is contained within each logic element (LUT, but you can think of this like a macrocell). So along the lines of what Tepples brought up about making SRAM from macrocells, SRAM can be created from the general logic cells available. However, this can only be configured as single or pseudo dual port (true dual port isn't available here without using obscene amounts of logic), and putting SRAM here is very costly, like Tepples brought up. Configuring LUTs as RAM gets you about 21 bits per LUT. And I've got 640 LUTs. So to make 1KB of pseudo dual ported SRAM it takes about 328 LUTs, which is HALF of the logic I have available.
So long story short, there isn't much point to implement distributed SRAM unless you REALLY need it and have lots of logic to spare. The EBR true dual ported SRAM is great in the EBR (why it's there). Unfortunately I'm 1KB short of having 8KB for both nametables.
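(The block and LUT arithmetic above can be checked with a few lines. This is only a sketch using the figures quoted in the post; the function names are made up for illustration. Note that at 21 bits per LUT the estimate comes out nearer 390 LUTs, while 328 LUTs corresponds to roughly 25 bits per LUT, so both figures should be read as approximate.)

```python
import math

EBR_BLOCKS = 7             # blocks on this CPLD, per the post
EBR_BITS_PER_BLOCK = 9216  # 1K x 9 bits
TOTAL_LUTS = 640           # figure quoted in the post

def ebr_usable_bytes(blocks: int) -> int:
    """Usable bytes when each 1K x 9 block is treated as 1K x 8."""
    return blocks * 1024

def luts_for_ram(num_bytes: int, bits_per_lut: float) -> int:
    """LUTs needed to build num_bytes of distributed RAM."""
    return math.ceil(num_bytes * 8 / bits_per_lut)

print(ebr_usable_bytes(EBR_BLOCKS))        # bytes of dual-port EBR
print(EBR_BLOCKS * EBR_BITS_PER_BLOCK)     # raw EBR bits, 9th bit included
for bits in (21, 25):
    n = luts_for_ram(1024, bits)
    print(bits, n, f"{n / TOTAL_LUTS:.0%} of the LUTs")
```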
The only trick is that the SRAM is synchronous... Not much of a problem with PRG RAM; I can just drive it off of M2. But the CHR side is a little tricky. From what I can see on my scope, the CHR /CE (A13) can't be used as a clock because it doesn't toggle on each access like PRG /CE does, which makes sense. The only real signals available are CHR /RD and CHR /WR. It looks like CHR /RD toggles nearly every cycle, except for a write cycle, in which case CHR /WR toggles. So I'm thinking a clock could be generated by NORing CHR /RD and /WR. The only issue is that if the clock is delayed behind CHR /WR then there could be some timing violations with using CHR /WR as my /WE line. But adding some delay to CHR /WR could resolve this if needed. It looks like the address and data lines hang out long enough to prevent issues there.
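(A tiny simulation of the clock idea, strictly as a sketch. It assumes a simplified two-phase bus cycle, idle then strobe asserted, which is not taken from scope captures, and the helper names are made up. It reads the "NOR" above in terms of asserted strobes: the derived clock goes high whenever either active-low strobe is low, which on the raw pins is a NAND. A literal NOR of the raw pins would only go high if both strobes were low at once.)

```python
def derived_clock(rd_n: int, wr_n: int) -> int:
    """High whenever either active-low strobe is asserted (NAND of raw pins)."""
    return 0 if (rd_n and wr_n) else 1

def literal_nor(rd_n: int, wr_n: int) -> int:
    """NOR of the raw pins: high only if both strobes are low together."""
    return int(not (rd_n or wr_n))

def strobes(kind: str):
    """Simplified bus cycle: (/RD, /WR) idle first, then the strobe asserted."""
    idle = (1, 1)
    active = (0, 1) if kind == "read" else (1, 0)
    return [idle, active]

# Hypothetical run of PPU bus cycles.
cycles = ["read", "read", "write", "read"]
samples = [s for kind in cycles for s in strobes(kind)]

clock = [derived_clock(*s) for s in samples]
rising_edges = sum(1 for a, b in zip([0] + clock, clock) if a == 0 and b == 1)
print(clock, rising_edges)                 # one rising edge per bus cycle
print([literal_nor(*s) for s in samples])  # stays low in this model
```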
Karatorian: did you want to write something up to demo this? If so, what kind of mapper set up were you thinking? Would you want just one full nametable (4KB) or the full 7KB I've got available and just map the original NT to the last/first 1KB? Where would you want it to be mapped on the PRG side? The convenient thing about being just under 8KB is that it would fit in the MMC5's EXRAM location. So if my math is right 7KB could be mapped to $4800-$7FFF. Otherwise it could just sit where WRAM normally does assuming there wasn't any. Or a single NT could be mapped to $6000-$7FFF.
As for the bank switching I'm thinking something like a smaller CNROM but with CHR-RAM. Then just swap the standard VRAM out for the dual ported SRAM like you're saying. The standard VRAM would just fill in the whole of the 7KB. Or just swap out a single NT in the 4KB option.
infiniteneslives wrote:
So long story short, there isn't much point to implement distributed SRAM unless you REALLY need it and have lots of logic to spare. The EBR true dual ported SRAM is great in the EBR (why it's there). Unfortunately I'm 1KB short of having 8KB for both nametables.
But with 7 KiB, you could still make a bank of MMC5 style extended attributes for each nametable (1 KiB each), four 1 KiB pattern table banks like Chinese TQROM, and 1 KiB extra for saving like MMC6.
Quote:
So if my math is right 7KB could be mapped to $4800-$7FFF.
7 KiB would fit in $4400-$5FFF.
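(The address math is easy to check. A minimal sketch; `span_bytes` is just a helper name made up here.)

```python
def span_bytes(first: int, last: int) -> int:
    """Number of bytes in the inclusive CPU address range first..last."""
    return last - first + 1

for first, last in [(0x4800, 0x7FFF), (0x4400, 0x5FFF), (0x6000, 0x7FFF)]:
    size = span_bytes(first, last)
    print(f"${first:04X}-${last:04X}: {size} bytes = {size / 1024:g} KiB")
```

So $4800-$7FFF is actually a 14 KiB window, $4400-$5FFF is exactly 7 KiB, and $6000-$7FFF is the usual 8 KiB WRAM window.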
tepples wrote:
infiniteneslives wrote:
So long story short, there isn't much point to implement distributed SRAM unless you REALLY need it and have lots of logic to spare. The EBR true dual ported SRAM is great in the EBR (why it's there). Unfortunately I'm 1KB short of having 8KB for both nametables.
But with 7 KiB, you could still make a bank of MMC5 style extended attributes for each nametable (1 KiB each), four 1 KiB pattern table banks like Chinese TQROM, and 1 KiB extra for saving like MMC6.
Quote:
So if my math is right 7KB could be mapped to $4800-$7FFF.
7 KiB would fit in $4400-$5FFF.
Yeah I need to brush up on my PPU memory map a bit, I'm getting all mixed up. I really could do a lot more with that 7KB than I was thinking. As for the last 1KB of saving like MMC6, it would have to work a little differently. That SRAM is volatile and battery backing the whole CPLD isn't really an option. But there is 8KB of user flash memory available on chip. And you could still use that last 1KB for all kinds of stuff: dual porting with the AVR or other functions within the CPLD.
I hadn't really thought about name tables yet, just pattern tables. The idea was inspired by some GBA code I wrote that faked a bitmap mode by filling the screen with a sequential tile pattern and using a custom tile for each one. (Yes, I'm aware the GBA has a real bitmap mode). So the name tables never needed changing. (Yes, it's horribly inefficient.)
Of course, on the NES, with only 512 tiles total (using both banks), this wouldn't quite work. So it would be necessary to update the name tables too. Assuming that you only use 256 tiles for the background, 8k would be enough. One 4k page for the front buffer and one 4k page for the back buffer. This requires that the sprites share the same bank as the background.
If you wanted the sprites to have their own bank, then you'd need 12k. I'm assuming you could use whatever RAM you currently have onboard for CHR-RxM already. However, switching banks for the sprite table reads would require a level of PPU monitoring similar to the MMC5. If you wanted the sprites double buffered too, you'd need 16k, all of it dual ported. (Or at least pseudo dual: PPU read, CPU read/write.)
Unfortunately, the 7k you've got easy access to isn't quite enough for even the basic setup. My suggestion would be to use 6k of it as two 3k pages. That leaves 1k for other stuff (a third name table, extended attributes, etc.) That would give 192 double buffer tiles. The other 64 could be used for something fixed, like alphanumerics. Not too shabby if you ask me.
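(The buffer sizes above follow from 16 bytes per 8x8 tile at 2 bits per pixel. A quick sketch of the arithmetic; the helper name is made up for illustration.)

```python
BYTES_PER_TILE = 16      # 8x8 pixels, 2 bitplanes of 8 bytes each
AVAILABLE = 7 * 1024     # dual-port RAM available on this CPLD, per the thread

def tile_bytes(tiles: int, buffers: int = 1) -> int:
    """Pattern table bytes needed for `tiles` tiles in `buffers` copies."""
    return tiles * BYTES_PER_TILE * buffers

print(tile_bytes(256))                 # one full 4k page
print(tile_bytes(256, buffers=2))      # front + back buffer: 8k, over budget
print(tile_bytes(192, buffers=2))      # two 3k pages
print(AVAILABLE - tile_bytes(192, 2))  # what is left for other uses
print(tile_bytes(64))                  # the 64 fixed tiles
```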
Of course the real limitation (which I ought to know, but don't) is how fast the NES can update these tiles. How many bytes can the NES move in one frame. Assuming you're just grabbing the tiles from PRG-ROM and they're not dynamically composed (like one needs to do variable width fonts, vector graphics, or bitmap emulation), then all you need to do is move the bits. There's not much point in supporting double buffers larger than the CPU can fill in a frame anyway. (Unless you wanted to cut the frame rate to 30 FPS.)
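(A back-of-the-envelope upper bound for that question, as a sketch only. It assumes an NTSC console with about 29780.5 CPU cycles per frame, that the dual ported RAM is mapped into CPU space so a plain LDA/STA copy works, and that the CPU does nothing else all frame. Those assumptions are mine, not measurements from this cart.)

```python
NTSC_CPU_CYCLES_PER_FRAME = 29780.5  # approx. 1.789773 MHz / 60.0988 Hz
BYTES_PER_TILE = 16

# Cycle costs per byte copied (no page crossings assumed).
LOOP_COPY = 4 + 5 + 2 + 3   # LDA abs,X / STA abs,X / INX / BNE taken
UNROLLED_COPY = 4 + 4       # LDA abs / STA abs, fully unrolled

def bytes_per_frame(cycles_per_byte: int, budget: float = 1.0) -> int:
    """Bytes copied in one frame if `budget` of the CPU time goes to copying."""
    return int(NTSC_CPU_CYCLES_PER_FRAME * budget / cycles_per_byte)

for name, cost in [("loop", LOOP_COPY), ("unrolled", UNROLLED_COPY)]:
    n = bytes_per_frame(cost)
    print(f"{name}: {n} bytes/frame, about {n // BYTES_PER_TILE} tiles")
```

Even the unrolled case stays under a full 4k page per frame, so buffers much larger than 3k to 4k would not get refilled at 60 FPS by a plain CPU copy.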
As for how the rest of the mapper would be set up, I hadn't gotten that far. When I first had the idea, which was basically "Hey, dual ported CHR-RAM could be used for double buffering", I was assuming the dual ported RAM would be a separate chip. Then I started thinking about how many IO lines the mapper would need:
Code:
PRG-CART-A 16
PRG-CART-D 8
PRG-ROM-A 15+
PRG-ROM-D 8
CHR-CART-A 14
CHR-CART-D 8
CHR-RAM-P1-A 15+
CHR-RAM-P1-D 8
CHR-RAM-P2-A 15+
CHR-RAM-P2-D 8
Which is a bare minimum of 115 pins for just addressing and data. Not to mention the chip enables and stuff. Plus even more for PRG-RAM. Which isn't required, but nice to have. (At least as an option.) So that's as far as my design went.
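(The total checks out if each "15+" entry is counted as its minimum of 15. A small sketch that just sums the table above.)

```python
# Pin counts from the table above; "15+" entries counted as 15.
pins = {
    "PRG-CART-A": 16, "PRG-CART-D": 8,
    "PRG-ROM-A": 15, "PRG-ROM-D": 8,
    "CHR-CART-A": 14, "CHR-CART-D": 8,
    "CHR-RAM-P1-A": 15, "CHR-RAM-P1-D": 8,
    "CHR-RAM-P2-A": 15, "CHR-RAM-P2-D": 8,
}

total = sum(pins.values())
print(total)
```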
Another idea I had for working around the 7k limit was to only use 4k and implement a DMA engine to copy it to the real CHR-RAM during V-blank. It's probably not a viable idea though.
And now for something completely different...
With 7k of built in RAM, you could implement the various things the MMC5 uses ExRam for, all at the same time. The first thing that comes to mind is true four screen mirroring (which is a misnomer, 'cause they're not mirrored at that point), rather than the three the '5 has. And extended attributes on all of them at the same time. That alone would be pretty impressive.
Quote:
Unfortunately, the 7k you've got easy access to isn't quite enough for even the basic setup. My suggestion would be to use 6k of it as two 3k pages. That leaves 1k for other stuff (a third name table, extended attributes, etc.) That would give 192 double buffer tiles. The other 64 could be used for something fixed, like alphanumerics. Not too shabby if you ask me.
When I started the project the only XO2 CPLD available was the one I have now. But since the bigger ones have become available I've been tempted to officially step up to a bigger one. I didn't have much legitimate reason, but when you put it all like this it's more convincing, assuming people want me to produce these. The next larger chip has 8KB of EBR (true dual port) but the one bigger than that is the SAME cost and gives 10KB of EBR. In production quantities we're only talking $3 or less. For a dev cart it seems justifiable. On the flip side, if one was ever to want to produce a game with the Mach XO2 you could scale down to the smaller $5-6 devices if not using the extra features. Interestingly enough 8K of dual ported SRAM is about the same cost of larger cplds anyways. So if one wanted dual ported SRAM the Mach XO2 really looks like the best option (not considering FPGAs).
Quote:
Of course the real limitation (which I ought to know, but don't) is how fast the NES can update these tiles. How many bytes can the NES move in one frame. Assuming you're just grabbing the tiles from PRG-ROM and they're not dynamically composed (like one needs to do variable width fonts, vector graphics, or bitmap emulation), then all you need to do is move the bits. There's not much point in supporting double buffers larger than the CPU can fill in a frame anyway. (Unless you wanted to cut the frame rate to 30 FPS.)
So I feel like I'm starting to dive too deep into what may be possible with this, I'm sure some people will say if you want to do all this go to a different console. But Nintendo went from NROM to MMC5 didn't they? I'll share the thought and you can do with it what you will. Depending on what you wanted to do exactly there are several different ways you could greatly increase the number of bits that got banged around. It really all depends on what you were trying to do, but if you were just moving bits from the PRG-ROM you could provide specific instructions to some logic in the CPLD running at HIGH speed, 50-100 MHz. Then have it remove the PRG-ROM from the NES with the buffers (not possible at the moment but would be by re-appropriating one CPLD pin). Then while the CPU sat idle for a couple cycles several KB of data could be moved around. And even more complex yet, if you wanted some processing done you could do all kinds of stuff with the AVR.
But enough of all that nonsense...
Quote:
Which is a bare minimum of 115 pins for just addressing and data. Not to mention the chip enables and stuff. Plus even more for PRG-RAM. Which isn't required, but nice to have. (At least as an option.) So that's as far as my design went.
I think you're a little off there. Many of those assignments can and should be doubled up. Why are PRG CART and ROM on different pins? For the higher non-addressable pins sure, but not A0-13 and the data bus. Same argument with CHR side. And why would each page of CHR RAM have its own full set of address and data lines??? Unless I'm missing something you only need to toggle ONE upper address bit to swap the pages. If you wanted a cart with CHR-RxM, PRG-ROM, WRAM and separate dual ported SRAM mapped to fixed locations on both busses most of those memories would be tied together. So the PLD would only need to have IO for the upper address lines, control signals, and PRG-data bus for controlling bank switching. That could be done with a cheap little ~40 pin CPLD. Something comparable to the MMC3 really. Now my cart has other things going on and really does need the full CHR and PRG address and data busses since the dual ported SRAM is inside it, but still you could do it with quite a bit less than 115 IO. I'm doing it with 108 but could do it with 80-90 assuming you didn't have an MCU to interface with like I do.
But yes like we're all saying there is still a LOT that can be done with what I've already set up and that 7KB available.
Karatorian wrote:
The idea was inspired by some GBA code I wrote that faked a bitmap mode by filling the screen with a sequential tile pattern and using a custom tile for each one.
And I did the same thing for the menu system in the last versions of Lockjaw. I'd bet some GBA programs did the same so that they could mix bitmapped text with tiled game objects or get a backdrop layer behind the bitmap layer, as the GBA's 8bpp and 16bpp bitmap modes support only one layer. Furthermore, the DS's 2D is mostly the same as the GBA, and a 4bpp surface takes up far less VRAM than an 8bpp or 16bpp bitmap.
Quote:
If you wanted the sprites double buffered too, you'd need 16k, all of it dual ported. (Or at least pseudo dual: PPU read, CPU read/write.)
Even if you have a separate pair of tiles for each of 64 8x16 pixel sprites, double-buffered sprite cels would need only 2 KiB per buffer.
Quote:
There's not much point in supporting double buffers larger than the CPU can fill in a frame anyway. (Unless you wanted to cut the frame rate to 30 FPS.)
And look at how slow the frame rates were in a few Super NES games, namely Wolfenstein 3D, Jurassic Park, and Star Fox/Wing.
Quote:
Another idea I had for working around the 7k limit was to only use 4k and implement a DMA engine to copy it to the real CHR-RAM during V-blank. It's probably not a viable idea though.
That or reuse the circuitry for counting fetches and detecting end of scanline to implement what kevtris has called a "stuffer": queue up to sixteen writes in a FIFO, take CHR RAM off the bus, and execute them during the garbage nametable fetches at x=257, 259, 265, 267, ...
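(A toy model of the "stuffer" idea, just to make the bookkeeping concrete. It is a sketch under assumptions: the slot list is extrapolated from "x=257, 259, 265, 267, ..." as two garbage nametable fetches per 8-dot sprite fetch group, eight groups per scanline, and the class and method names are made up. It says nothing about how kevtris actually implemented it.)

```python
from collections import deque

FIFO_DEPTH = 16
# Assumed garbage nametable fetch positions within one scanline.
GARBAGE_SLOTS = [257 + 8 * i + off for i in range(8) for off in (0, 2)]

class Stuffer:
    """Queues CPU writes and drains one per garbage fetch slot."""

    def __init__(self):
        self.fifo = deque()

    def queue_write(self, addr: int, value: int) -> bool:
        """Accept a write if there is room; return False when the FIFO is full."""
        if len(self.fifo) >= FIFO_DEPTH:
            return False
        self.fifo.append((addr, value))
        return True

    def run_scanline(self, chr_ram: dict) -> int:
        """Drain up to one queued write per slot; return how many were done."""
        done = 0
        for _x in GARBAGE_SLOTS:
            if not self.fifo:
                break
            addr, value = self.fifo.popleft()
            chr_ram[addr] = value
            done += 1
        return done

stuffer = Stuffer()
accepted = [stuffer.queue_write(0x1000 + i, i) for i in range(17)]
chr_ram = {}
written = stuffer.run_scanline(chr_ram)
print(GARBAGE_SLOTS[:4], accepted.count(True), written)
```

With sixteen slots per scanline under that assumption, a full FIFO drains in a single line.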
infiniteneslives wrote:
Interestingly enough 8K of dual ported SRAM is about the same cost of larger cplds anyways.
That's kinda strange. Good to know for future reference.
Quote:
So I feel like I'm starting to dive too deep into what may be possible with this, I'm sure some people will say if you want to do all this go to a different console.
"The person who says it cannot be done should not interrupt the person doing it." --Chinese Proverb
Quote:
I think you're a little off there.
More than a little off actually. Thanks for pointing out this glaring thinko. As the design never made it out of my head and onto paper (or pixels), I missed the obvious.
Quote:
Why are PRG CART and ROM on different pins? For the higher non-addressable pins sure, but not A0-13 and the data bus. Same argument with CHR side.
So we can have bitwise granularity with mapping! Just kidding. It's simply because the ideas in my head were abstract, and when I tried to make them concrete, I didn't take the time to think things through all the way.
Here's a block diagram of the version with all those pins:
Obviously, this is not ideal.
Quote:
And why would each page of CHR RAM have its own full set of address and data lines?
Um, P1 and P2 were the two ports of the dual ported SRAM. If the CPU is writing to VRAM and the PPU is reading from VRAM, then they need to be on separate addressing and data buses. Of course, as you pointed out, the CHR and PRG buses are already separate.
Quote:
If you wanted a cart with CHR-RxM, PRG-ROM, WRAM and separate dual ported SRAM mapped to fixed locations on both busses most of those memories would be tied together.
Quote:
So the PLD would only need to have IO for the upper address lines, control signals, and PRG-data bus for controlling bank switching. That could be done with a cheap little ~40 pin CPLD.
You are entirely correct. Here's a block diagram of the proper way:
tepples wrote:
Even if you have a separate pair of tiles for each of 64 8x16 pixel sprites, double-buffered sprite cels would need only 2 KiB per buffer.
Doh. I didn't think of that. So then, no combination of sprites really need the whole 4k page. I'll have to remember that.
Quote:
That or reuse the circuitry for counting fetches and detecting end of scanline to implement what kevtris has called a "stuffer": queue up to sixteen writes in a FIFO, take CHR RAM off the bus, and execute them during the garbage nametable fetches at x=257, 259, 265, 267, ...
That is an interesting idea. Sounds like it could have its uses. Kinda like an H-Blank DMA.
Well it's been awhile since I've given an update...
I had to post this video for school:
http://www.youtube.com/watch?v=sqHCfyMRl24&feature=g-u-u it's a few months old, but shows the operations of it. The only thing that has really changed is some mapper support. As I've posted in other threads, I've gotten the MMC1 and MMC3 up and running on it. I started on the FME-7 and have everything working except the counter (the only part that's a challenge). Once I sit back down and debug it I shouldn't have much issue, I just haven't devoted the time yet.
So as far as the project goes it's pretty much done in respect to what was required for the school project. But have no fear I'm not dropping it here
As for future plans I plan to finish up the FME-7 and then RAMBO-1 because it's basically a cross between the MMC3 and FME-7. So the RAMBO-1 (or similar) I would imagine would appeal to a lot of people and I should be able to fit into my MMC3(ish) reproboards as well.
I don't have many plans for original game mappers beyond that unless there are some big requests. Maybe some VRC stuff but I don't know if I have the motivation to figure out the sound stuff. I might know someone who can help out though. Since there is an HDL source available for the Sunsoft 5B's YM2149 (AY-3-8910 with bells on) I plan to implement that once I toss a DAC on the board. I think that would satisfy most sound guys.
I do plan to develop some of the MMC5's abilities, but I don't intend to copy it EXACTLY. Mostly because there are too many unknowns and I don't see the value in probing the real deal to try and reverse engineer everything to the T. I'd rather re-design something that is just as good, or better actually. And since we came up with it, everything about it would be KNOWN, no guessing and hoping things are proper. After all it's not meant to be a device to play every ROM on the planet, it's truly a development tool.
I may be slow to make progress in the upcoming months due to graduation, moving, and the new job. But I'm looking forward to more free time to work on it once things get going at the new place. I'm also taking some time to learn how to program on the NES and get a better understanding of the PPU. I've gotten through most of bunnyboy's Nerdy Nights in the past week and my knowledge holes are quickly becoming filled. So thanks for those bunnyboy
And with the programming knowledge I'll be able to write some ROMs to test out some of these harebrained ideas we've been discussing with the dual ported memory and all. The more I learn about the PPU the more I get excited about what will become possible with this thing. So that's some good motivation. My only concern right now is getting the synchronous SRAM to act as asynchronous. I've got a plan so hopefully it'll work out.
AFAIK a DAC alone is not enough. VRC6 and VRC7 use wave tables. I would try to make these sound extensions with discrete ICs
Well the VRC7 is running a derivative of the YM2413 as well. With the logic available in the CPLD I won't need much more than a DAC to get the YM2413 running. Unless I'm missing something here I wouldn't expect the VRC7 to be much different. What are you suggesting I'd need for hardware? Sorry I know you have a love for the discrete IC's but I don't think there will be much of those in the design...
I'm no sound guru though (nor do I have plans to become one), so I'm probably wrong. That's why I was suggesting I'd need some collaboration to get anything beyond the standard YM2413. I know someone with the VRC6 and he may very well be interested in doing some work with this.
When you can do it in discrete logic, it's more compact and easy to convert into a CPLD. I know that the VRC7 is a derivative, but how far does it differ? I haven't compared them yet, but I have a real YM2413. And I will try to get a real VRC7 as soon as one turns up at the right price (Hello to Poland)
I think you need a quite large CPLD to put the YM in it, people put YMs in FPGAs and they use up fair chunk of them.
infiniteneslives wrote:
I plan to implement [Sunsoft 5B audio] once I toss a DAC on the board.
A delta-sigma DAC can be as simple as an adder and a register as wide as the digital signal. Every M2 cycle, add the DAC's current value to the register. Put the adder's carry out on one pin, and that's your audio signal. There'll be a bunch of noise well over 100 kHz in the signal, but a simple low-pass filter will remove that.
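(Here is that adder-and-register scheme as a small simulation, as a sketch rather than HDL. The 7-bit width is an assumption chosen to match the $4011 range discussed in this thread, and the function name is made up. Over 2^width cycles the number of carry pulses equals the input value, which is what the low-pass filter averages out.)

```python
def delta_sigma(value: int, width: int, cycles: int) -> list[int]:
    """First-order delta-sigma: accumulate `value`, emit the adder's carry out."""
    mask = (1 << width) - 1
    acc = 0
    bits = []
    for _ in range(cycles):
        total = acc + value
        bits.append(total >> width)  # carry out of the width-bit adder (0 or 1)
        acc = total & mask           # register keeps the low `width` bits
    return bits

WIDTH = 7  # assumed sample width
for sample in (0, 32, 100, 127):
    stream = delta_sigma(sample, WIDTH, 1 << WIDTH)
    print(sample, sum(stream), f"duty = {sum(stream) / len(stream):.3f}")
```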
The other problem is that most 72-pin consoles haven't been modified for expansion sound. Your DAC would operate by causing an IRQ whenever the audio signal level changes and providing the digital signal on $4011 reads in the range 1 through 127. Then the IRQ handler reads the value from the mapper, writes it back, and returns:
Code:
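irq:          ; IRQ entry costs 7 cycles
  dec $4011   ; read-modify-write on $4011, 6 cycles
  rti         ; return from interrupt, 6 cycles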
Yeah, I know, 19 CPU cycle overhead to process, but it could be worth it for slower-paced parts of a game.
If it's for the NES, why not take the sound output straight from the cart?
An RCA or 3.5mm jack will do the job.
How would the audio from the RCA or 3.5mm jack be mixed with the audio from the NES?
TmEE wrote:
I think you need quite a large CPLD to put the YM in it. People put YMs in FPGAs and they use up a fair chunk of them.
Yeah I know it supposedly sucks up around 6000 logic elements on a Xilinx FPGA. Sounds like a problem when the largest Mach XO2 has 7000 elements right? WRONG!
I compiled it on the Mach XO2 and it only took up 200-250 LUTs depending on the implementation. The reason is these babies have built-in 'User Flash Memory', which is great for things like the tables in the 8910/2149. So where the Xilinx FPGA sucks up a TON of logic just for ROM, the Mach XO2 puts that data in flash, leaving plenty of room to spare for actual logic use. I've got it compiled on my current CPLD and the 2149 takes up about (EDIT: 30-40%) of available logic. Which is pretty awesome when you consider it's only a $5-8 CPLD.
There is still plenty of room for an MMC5 with extras on my current device. But I've kind of already decided to step up to a larger device since it doesn't cost much more. On that device it would only use about 10%, leaving room for MMC5ish capabilities along with other boards on the same configuration.
Tepples: I like your delta-sigma DAC idea, especially since it wouldn't take up many CPLD pins and would only add discrete components. I've had plans to use your DEC $4011 trick since I was designing it as well. My thought was to have circuitry to support both the DEC $4011 option and the EXP2/6 method for those modding the NES/exp jumper/or FC.
80sFREAK: Sorry, the RCA cable hanging off it isn't artistic enough for this cheeseburger.
That's very cool ^^
Now I'm thinking shouldn't the tables and stuff end up in SRAM cells of the FPGAs ...? You can easily use them to store LUTs with minimal extra cost...?
It's my understanding that the FPGA would be storing the tables in SRAM cells, but there are only a few cells per logic element. The Xilinx FPGAs are very homogeneous as I understand it: they don't have a separate area for SRAM, it's distributed throughout the logic elements. So when you use that distributed SRAM (in the logic elements) you're taking up the entire element for a few bits of SRAM. So the cost isn't minimal, and the 64Kb of tables sucks up lots of cells in the Xilinx FPGA.
I haven't researched the details of the Xilinx architecture, so take what I'm saying with a grain of salt. But if the YM2149 really does utilize 10% of the XC2S300 like the wiki says I'm not sure what else could explain why the Lattice devices have such a huge advantage.
I am more familiar with Altera FPGAs and they got nice chunks of SRAM to use in your stuff with minimal extra cost. Optimized for storing LUTs and ROM etc. At least so datasheets say ^^
The YM2149 is only around 200 PLD macrocells and the chip itself doesn't have any ROM unlike the FM chips. The cheeseburger cores online use ROM for a 3D volume LUT since you'd have to think outside the bun (R) to calculate logarithmic attenuation, average 3 channels, scale to 16-bit and normalize in hardware.
Edit: Of course Xilinx devices have block RAM, nobody in their right mind would use distributed RAM (or ROM in Altera's case) to hold 64 KiB, I don't think even the latest chips have that much.
Yeah that's what I've come to realize kyuusaku. I tried compiling it on a smaller Mach XO2 that didn't have much flash memory and when I used the larger tables the thing exploded because it filled up the flash and started using logic cells. It wanted to consume several thousand logic units that weren't available. It was okay with the smaller table or a device with more flash.
kyuusaku wrote:
you'd have to think outside the bun (R) to calculate logarithmic attenuation, average 3 channels, scale to 16-bit and normalize in hardware.
As I understand the logarithmic output of an AY-3-8910 or SN76489A, we'll need a 3-cycle sequencer, a mux, a 16-entry log to linear table, and a pair of registers.
Cycle 1: Latch previous accumulator value into DAC output, look up channel 1's volume in 16-entry table, and load it into an accumulator.
Cycle 2: Look up channel 2's volume in 16-entry table and add it to the accumulator.
Cycle 3: Look up channel 3's volume in 16-entry table and add it to the accumulator.
I can has free french fryz nao? Or what did I get wrong in the above pseudo-HDL?
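The three-cycle sequencer described above can be sketched as a small software model. The class name `ThreeCycleMixer` and the table contents are assumptions: the table below is an illustrative curve of roughly 3 dB per step scaled to 8 bits, not values measured from a real chip.

```python
# ASSUMPTION: illustrative 16-entry log-to-linear table, roughly 3 dB per step,
# scaled to 8 bits with level 0 silent. Not measured from a real chip.
LOG_TO_LIN = [0] + [round(255 * 10 ** (-3 * (15 - n) / 20)) for n in range(1, 16)]


class ThreeCycleMixer:
    def __init__(self):
        self.phase = 0   # which of the three cycles comes next (0, 1, 2)
        self.acc = 0     # running sum of the current frame
        self.dac = 0     # value currently presented to the DAC

    def clock(self, volumes):
        """Advance one cycle; volumes is a 3-tuple of 4-bit channel levels."""
        linear = LOG_TO_LIN[volumes[self.phase] & 0x0F]
        if self.phase == 0:
            self.dac = self.acc   # cycle 1: latch the previous frame's sum
            self.acc = linear     # ...and load channel 1
        else:
            self.acc += linear    # cycles 2 and 3: add channels 2 and 3
        self.phase = (self.phase + 1) % 3
        return self.dac


mixer = ThreeCycleMixer()
outputs = [mixer.clock((15, 8, 0)) for _ in range(6)]
```

Because the latch happens on cycle 1, the DAC output always lags the channel levels by one three-cycle frame: the first three outputs here are zero and the mixed sum appears from the fourth cycle on.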
Yeah you get your fries Tepples.
I should be able to run the sequencer over a fairly wide range of frequencies by clocking it off the PLL in the CPLD, and then find out what works best. Now I just need to study up on VHDL so I can understand exactly what is going on with this code and try to test something out. Being a Verilog guy, it's mostly Greek to me at the moment...
Do you take (multi-bit) PWM output from the significant accumulator bits? Looks fine for that if the entries are unsigned and the logic runs at 1.79 MHz. Or if you have a wide DAC the adder can be left out for TDM which will sound better. If you're meant to take PCM from it... hm, not sure. Are the table entries signed and the accumulator clamped against over/underflow? Then I think fryz. A lot of people want PCM since they have an AC'97 DAC to work with.
kyuusaku wrote:
Do you take (multi-bit) PWM output from the significant accumulator bits?
That or I could run the adder all the time in delta-sigma (PDM) mode. Something like N163 that inherently processes one channel at a time could benefit just as well from this architecture, as it'd already have the sequencer.
Okay so I've got a bit of an update on this thing.
First off, our "final commercial" for the thing, made to finish off the school project, is here.
Now I know there is at least one person who's been waiting patiently for this thing for a few years now. I want to make my release plans and current status public. My goal is to have it released before the end of the year; ideally it'll be ready by the end of summer. But mass production availability kind of depends on demand. I haven't settled on a price yet, and won't really until I consider my manufacturing options once it's complete. I expect it will be in the $100 range, but we'll see.
So right now I'm working on the second, more final revision of the PCB. There were a few minor errors, and mostly I had some things I wanted to clean up. Also I decided to step up to one of the larger members of the Mach XO2 family for the CPLD. It's only a few dollars more and has 3-4x the capability. One small improvement that has a large effect is 8 more I/O pins. With those I can do away with the annoying solder jumpers that traded minimum bankswitching size for lower address inputs. So the final spec will be:
*All PRG and CHR addressing available. This was really required to support some of my dual ported memory goals and other trickery to come.
*Smallest PRG banks 2KB to support sound needs.
*Smallest CHR banks 128 bytes. Hooray animations.
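To put those bank sizes in perspective, here is a rough sketch of the arithmetic. It assumes the usual 32 KB PRG region at $8000-$FFFF and 8 KB CHR region at $0000-$1FFF; `banked_address` and `bank_regs` are hypothetical names for illustration, not how the CPLD mappers are actually written.

```python
PRG_WINDOW = 2 * 1024   # smallest PRG bank from the spec above
CHR_WINDOW = 128        # smallest CHR bank from the spec above


def banked_address(cpu_addr, base, window, bank_regs):
    """Map an address in a banked region to an offset in the memory chip.

    bank_regs is a hypothetical list with one bank number per window.
    """
    offset = cpu_addr - base
    slot = offset // window                      # which window the access hits
    return bank_regs[slot] * window + offset % window


prg_slots = 0x8000 // PRG_WINDOW   # 32 KB of PRG space at $8000-$FFFF
chr_slots = 0x2000 // CHR_WINDOW   # 8 KB of CHR space at $0000-$1FFF
```

At the smallest sizes that works out to 16 independently switchable PRG windows and 64 CHR windows, which is where the extra address lines and I/O come in.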
Current mapper support:
*Most discrete mappers (all possible)
*MMC1
*MMC3
Planned mappers before release:
*FME7
*RAMBO-1
*MMC2/4 are likely in the near future but not necessarily before release.
The main thing I need to complete before release is the ability to configure the mcu and CPLD via USB. Currently they require external programmers, and the CPLD programmer happens to be $200. Luckily the CPLD is easily configurable via the SPI bus, which should make it fairly simple for me to configure it with the mcu. That will be just as easy as loading new games, just slow. Keep in mind that's only if you create or modify a mapper, not for switching between preconfigured mappers that are already on there. Also I need to get the bootloader working for the mcu.
I also want to experiment with boosting the upload speed on this thing. NROM is done in a few seconds, but something the size of SMB3 takes around 15-20 sec. One easy approach I have in mind is tailored for development. Basically it assumes that when you make successive builds, the entire ROM isn't modified. For example, you usually wouldn't modify the CHR and PRG at the same time. So only the modified parts would be uploaded, and this would be done by the app, unseen by the user. I've got some other optimizations in the programming sequence that should allow for decent speed-ups as well.
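The "only upload what changed" idea could look something like this on the app side. This is a minimal sketch under stated assumptions: `changed_chunks` and the 256-byte `CHUNK_SIZE` are hypothetical, and the real app and USB protocol may work quite differently.

```python
CHUNK_SIZE = 256  # ASSUMPTION: hypothetical upload granularity in bytes


def changed_chunks(previous, current, chunk_size=CHUNK_SIZE):
    """Return (offset, data) pairs for chunks of `current` that differ from `previous`."""
    updates = []
    for offset in range(0, len(current), chunk_size):
        new = current[offset:offset + chunk_size]
        old = previous[offset:offset + chunk_size]
        if new != old:
            updates.append((offset, new))
    return updates


# Example: a 32 KB PRG image in which a single byte changed between builds.
old_build = bytes(32 * 1024)
new_build = bytearray(old_build)
new_build[0x1234] = 0xEA
updates = changed_chunks(old_build, bytes(new_build))
```

Here only the one chunk containing offset 0x1234 would be sent. For flash, the chunk size would probably have to follow the erase sector size, since a changed sector generally needs erasing before it can be rewritten.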
The last thing I would like to implement is a test ROM of sorts. Basically it would do a quick check of cartridge connectivity and show a simple splash screen or something. The idea would be that this would be stored in the flash memory. So if you just booted up the cart without loading it via USB, you'd have the test run by default. This could easily be removed by the user if desired, especially if you wanted to use the flash for something else. But it would provide some verification that everything is working properly, and any bugs found in your build wouldn't be as easily blamed on connection problems and such.
Once I've got all that done it's pretty much ready for the public as far as I'm concerned. I'll also have the bear of writing up all the documentation including the source files and schematics for everything. I don't want anything about this thing to be secret.
After I release it I'll still be working on improvements such as crazy mappers that will create much controversy I'm guessing. I also want to test out some sound features with the 8910 and such. At that point I'd also like to work on creating some simple tutorials on how to create and modify mappers and the mcu code and everything. Basically much more to come if I can help it.