You have to make your own data format (or use an existing one, but there isn't one specifically for sound effects), then write a player routine that decodes it and writes the results to the NES sound registers. The player code is run once every frame (about 60 times a second on NTSC). Normally there's also a second subroutine that you call once to initialize a particular song or effect before playing it.
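To make the init/play split concrete, here's a rough sketch in C rather than 6502 assembly. Everything in it is made up for illustration: the stream format (register/value pairs plus `SFX_WAIT` and `SFX_END` markers), the names `sfx_init` and `sfx_play`, and the `apu_regs` array, which just stands in for the memory-mapped registers at $4000-$4017. A real player would be written in assembly and would probably use a more compact format.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stream format (an assumption, not a standard):
     reg, value   write value to APU register $4000+reg (reg 0x00..0x17)
     SFX_WAIT     stop processing until the next frame
     SFX_END      the effect is finished */
enum { SFX_WAIT = 0xFE, SFX_END = 0xFF };

/* Stand-in for the memory-mapped registers $4000-$4017. */
static uint8_t apu_regs[0x18];

/* Current read position; NULL means nothing is playing. */
static const uint8_t *sfx_ptr = NULL;

/* Init routine: call once to start an effect. */
void sfx_init(const uint8_t *data) {
    sfx_ptr = data;
}

/* Play routine: call once per frame, e.g. from the NMI handler. */
void sfx_play(void) {
    if (sfx_ptr == NULL) return;
    for (;;) {
        uint8_t cmd = *sfx_ptr++;
        if (cmd == SFX_WAIT) return;      /* resume here next frame */
        if (cmd >= 0x18) {                /* SFX_END or malformed data */
            sfx_ptr = NULL;
            return;
        }
        apu_regs[cmd] = *sfx_ptr++;       /* register write */
    }
}

/* Example effect: a one-frame beep on pulse channel 1. */
static const uint8_t beep[] = {
    0x15, 0x01,   /* $4015: enable pulse 1 */
    0x00, 0xBF,   /* $4000: 50% duty, constant volume 15 */
    0x02, 0xFD,   /* $4002: timer period, low byte */
    0x03, 0x00,   /* $4003: timer period high bits, length counter load */
    SFX_WAIT,
    0x00, 0xB0,   /* $4000: same settings, volume 0 */
    SFX_END
};
```

You'd call `sfx_init(beep)` when the effect should start and `sfx_play()` once every frame after that. The first frame writes the four registers and stops at `SFX_WAIT`; the second frame silences the channel and marks the effect as done.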
Here's a reference on the registers.
http://nesdev.com/wiki/?page=NES+APU
This is pretty much how the NSF format works too. If the memory areas the NSF code uses are free in your program, you can include the NSF code and JSR to its init and play routines, and it works. I did that a lot myself with NT2 before I changed the player code to work with the assembler I normally use.
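If you want to find where those routines live, the addresses are stored in the 128-byte NSF header. Here's a small C sketch that pulls them out. The struct and function names (`NsfInfo`, `nsf_parse_header`) are just ones I picked for the example; the offsets are the ones from the NSF header layout (load address at $08, init at $0A, play at $0C, all little-endian).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Fields of interest from the NSF header (names are illustrative). */
typedef struct {
    uint8_t  total_songs;  /* offset $06 */
    uint16_t load_addr;    /* offset $08: where the code/data gets loaded */
    uint16_t init_addr;    /* offset $0A: routine to call once per song */
    uint16_t play_addr;    /* offset $0C: routine to call every frame */
} NsfInfo;

/* Read a 16-bit little-endian value. */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 1 and fills *out on success, 0 if the header looks invalid. */
int nsf_parse_header(const uint8_t *hdr, size_t len, NsfInfo *out) {
    if (len < 0x80) return 0;                       /* header is 128 bytes */
    if (memcmp(hdr, "NESM\x1A", 5) != 0) return 0;  /* magic bytes */
    out->total_songs = hdr[0x06];
    out->load_addr   = read_le16(hdr + 0x08);
    out->init_addr   = read_le16(hdr + 0x0A);
    out->play_addr   = read_le16(hdr + 0x0C);
    return 1;
}
```

In your own program you'd place the NSF data at `load_addr`, JSR to `init_addr` once with the song number in A, and then JSR to `play_addr` every frame. This sketch ignores bankswitched NSFs, which need extra setup.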