Difference between ld and objcopy (GCC)

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Difference between ld and objcopy (GCC)
by on (#113370)
This is something that has always puzzled me. I understand what an assembler, a compiler, and an executable file are, and I also have a good idea of what an executable with added debug info is.

So far my understanding is the following. A "modern" assembler does not generate executable code directly, as was the case with very simple/naive assemblers (where .org statements were your only control over where code would be located, and where there was never a debugger).

More modern assemblers instead produce an "object file", which contains machine code plus a header, placeholders for unresolved references (they show up as 0x00 bytes in a hex dump), and debug labels at the end, so that other programs can link with it, or even call into this file directly while executing (which is the case for .dll files on Windows).

That is the part I understand. From here on there are things I'm not too sure about:

The role of the program called "ld" is to deal with these object files. It can combine multiple object files into one, and one can repeat this again and again if one wants. A plain ld call on a single object file would simply produce the exact same file as output. On every (explicit or implicit) call of ld, a linkscript has to be provided, or else ld doesn't know where to put each section in memory when resolving the labels.

If a linkscript is not provided, some "default linkscript", which is either hard-coded into ld or located somewhere I have no idea about, will be used. In the case of computer program development, one does not have to worry about linkscripts at all, as everything will work fine with the default one. On the other hand, when doing embedded development, it is extremely unlikely that the default linkscript describes your target system correctly. In that case it is necessary to write a good linkscript, or to use some kind of development kit that hides it from you (it is sad, but apparently this solution is the most common one: at least 90% of programmers have no idea a linkscript even exists; I myself didn't know they existed until last month).
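To make the two jobs described above concrete (the linkscript deciding where each section goes, and ld then filling in the 0x00 placeholders), here is a toy model in Python. It is a sketch under heavy simplifying assumptions, not how GNU ld or the ELF format actually work: the `ToyObject` class, the single section per object, the 4-byte little-endian absolute relocation, and the dictionary standing in for a linkscript are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ToyObject:
    """Hypothetical, simplified object file with a single code section."""
    name: str
    code: bytearray                      # machine code, with 0x00 placeholders
    symbols: dict[str, int]              # label -> offset within this section
    relocs: list[tuple[int, str]] = field(default_factory=list)  # (offset, label) to patch

def link(objects, layout):
    """Place each object's section at the address `layout` (our stand-in
    linkscript) assigns, then patch every placeholder with a final address."""
    # Pass 1: compute the final absolute address of every label.
    global_syms = {}
    for obj in objects:
        base = layout[obj.name]
        for label, off in obj.symbols.items():
            if label in global_syms:
                raise ValueError(f"duplicate symbol {label}")
            global_syms[label] = base + off
    # Pass 2: overwrite each 4-byte placeholder with the resolved address.
    image = {}
    for obj in objects:
        code = bytearray(obj.code)
        for off, label in obj.relocs:
            if label not in global_syms:
                raise ValueError(f"undefined reference to {label}")
            code[off:off + 4] = global_syms[label].to_bytes(4, "little")
        image[layout[obj.name]] = bytes(code)
    return image, global_syms

# main.o refers to `helper`, which is defined in util.o.
main = ToyObject("main.o", bytearray(8), {"main": 0}, [(4, "helper")])
util = ToyObject("util.o", bytearray(4), {"helper": 0})
image, syms = link([main, util], {"main.o": 0x08000000, "util.o": 0x08000100})
print(hex(syms["helper"]))         # 0x8000100
print(image[0x08000000].hex())     # 0000000000010008
```

Changing only the `layout` dictionary moves the code and changes the patched bytes, which is roughly why an embedded target needs a linkscript that matches its memory map.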

Apparently, under certain conditions, ld produces a "final" executable, which is stripped of the debug labels and header. The output can then no longer be linked. I'm not too sure when this happens or how it is done. For this reason, at least one call to ld has to be made when compiling a program. If gcc is called without "-S" or "-c", it runs ld behind the scenes without the user even knowing it. The only reason I can see for this is to simplify compiling very simple single-file programs.

On the other hand, objcopy is almost the same as ld. The only difference is that it can only take a single object file as input, while ld can take many.

Therefore I don't see the use of objcopy at all.
Re: Difference between ld and objcopy (GCC)
by on (#113371)
On some platforms, ld combines multiple object files into a single file in the same underlying format as object files, so that (for instance) the resulting file can include debugging symbols. An emulator with debug support can load these object files directly, or one can use objcopy to translate them into a raw binary for burning onto a cartridge or use in a more basic emulator. This was true of devkitARM, the major Game Boy Advance homebrew toolchain, back when I was into gbadev: the included link scripts for arm-eabi-ld to target GBA ROM and GBA multiboot produced .elf files, and running them through arm-eabi-objcopy would produce a proper executable.
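Conceptually, the .elf-to-ROM step (what `objcopy -O binary` is used for here) takes each loadable section with its load address and lays the bytes out in one flat file starting at the lowest address. Symbols and debug info have no place in that output, which is why the .elf is the file you keep for a debugging emulator. This is a minimal sketch assuming the sections have already been extracted as `(load_address, data)` pairs; it is not an ELF parser, and the zero fill for gaps is an assumption (objcopy lets you choose the fill byte with `--gap-fill`).

```python
def flatten(sections, fill=0x00):
    """sections: list of (load_address, data) pairs for loadable sections.
    Returns (start_address, raw_bytes), the image starting at the lowest address."""
    loaded = [(addr, data) for addr, data in sections if data]
    if not loaded:
        return 0, b""
    start = min(addr for addr, _ in loaded)
    end = max(addr + len(data) for addr, data in loaded)
    image = bytearray([fill]) * (end - start)   # gaps between sections get `fill`
    for addr, data in loaded:
        image[addr - start:addr - start + len(data)] = data
    return start, bytes(image)

start, rom = flatten([(0x08000000, b"\x01\x02"), (0x08000004, b"\xAA")])
print(hex(start), rom.hex())   # 0x8000000 01020000aa
```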
Re: Difference between ld and objcopy (GCC)
by on (#113373)
OK, so in fact an .elf is simply a .o that has been created by merging several .o files and just "artificially" named differently, right? And how come I do not need objcopy when developing for my PC, but I need it when developing for the GBA, for instance?

Is this because a ".exe" (or the Linux equivalent of it) is not a raw binary file like those used on embedded systems, but rather simply an object file that my operating system "objcopies" into memory at runtime?

EDIT: Also, this would make linking a useless step for single-file programs, right? Because the assembler generates an object file that can already be executed, since all labels are resolved.

For bigger projects, linking is typically done a single time, but it could be done 45 times in series if one wanted to, right?
Re: Difference between ld and objcopy (GCC)
by on (#113379)
Bregalad wrote:
Is this because a ".exe" (or the Linux equivalent of it) is not a raw binary file like those used on embedded systems, but rather simply an object file that my operating system "objcopies" into memory at runtime?

Yes. Linux executables are ELF, and Windows executables are PE, a variant of the COFF format. Wine began as just an experimental PE loader, and it became able to run Windows applications once someone got the bright idea to reimplement Win32 APIs.
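On Linux, a .o and a linked executable really are the same container format; what tells them apart is the `e_type` field in the ELF header (1 = ET_REL for relocatable object files, 2 = ET_EXEC for executables, 3 = ET_DYN for shared libraries and position-independent executables, which many modern distributions build by default). A PE file instead begins with the DOS "MZ" stub. Here is a small sniffer sketch; it only looks at the first bytes and does not validate anything else, and the function name `sniff` is just for illustration.

```python
ELF_TYPES = {1: "relocatable object (ET_REL)",
             2: "executable (ET_EXEC)",
             3: "shared object / PIE (ET_DYN)"}

def sniff(header: bytes) -> str:
    """Classify a file from its first bytes (ELF needs at least 18 of them)."""
    if header[:4] == b"\x7fELF":
        # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
        order = "little" if header[5] == 1 else "big"
        # e_type is a 2-byte field right after the 16-byte e_ident.
        e_type = int.from_bytes(header[16:18], order)
        return "ELF " + ELF_TYPES.get(e_type, f"type {e_type}")
    if header[:2] == b"MZ":
        return "DOS/PE (MZ stub)"
    return "unknown (maybe a raw binary)"

# Usage on a real file, e.g.:
# with open("main.o", "rb") as f:
#     print(sniff(f.read(64)))
```

Running it on a .o and on the final executable built from it should show the same ELF magic with different `e_type` values, while a GBA .gba ROM falls into the "unknown" branch because it is raw bytes with no container.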

Quote:
EDIT: Also, this would make linking a useless step for single-file programs, right? Because the assembler generates an object file that can already be executed, since all labels are resolved.

Real-world programs on platforms larger than the NES aren't "single-file programs", as they use system libraries such as libc. When you link such a program, ld looks through libc for functions that your program calls but does not define.
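The "looks through libc" step can be modeled as a loop over a static archive: keep a set of undefined symbols, pull in any archive member that defines one of them, and add that member's own unresolved references to the set. This is a simplified sketch; the archive is a hypothetical dictionary rather than a real `libc.a`, and it rescans a single archive until nothing new is needed, while real ld also has ordering rules across multiple archives on the command line.

```python
def select_members(undefined, defined, archive):
    """undefined/defined: symbol-name sets from the program's own object files.
    archive: dict member_name -> (symbols it defines, symbols it references).
    Returns (members pulled in, symbols still unresolved)."""
    defined = set(defined)
    undefined = set(undefined) - defined
    pulled = []
    changed = True
    while changed:                       # rescan until no new member is needed
        changed = False
        for member, (defs, refs) in archive.items():
            if member in pulled or not (defs & undefined):
                continue
            pulled.append(member)        # this member resolves something we need
            defined |= defs
            undefined = (undefined | refs) - defined
            changed = True
    return pulled, undefined

# Hypothetical mini libc: only members reachable from `printf` get linked in.
libc = {
    "write.o":    ({"write"}, set()),
    "printf.o":   ({"printf"}, {"vfprintf"}),
    "vfprintf.o": ({"vfprintf"}, {"write"}),
    "sin.o":      ({"sin"}, set()),
}
pulled, unresolved = select_members({"printf"}, {"main"}, libc)
print(pulled)   # ['printf.o', 'vfprintf.o', 'write.o']
```

Note that `sin.o` is never pulled in: members nobody references stay out of the executable.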

Quote:
For bigger projects, linking is typically done a single time, but it could be done 45 times in series if one wanted to, right?

Provided you're making 45 different executables. Linking is what allows subroutines in one file to see subroutines in others.
Re: Difference between ld and objcopy (GCC)
by on (#113388)
GBA emulators will load ELF files and internally convert them to BIN as they load them, to make things nicer for developers. But you can't flash that onto a cartridge.
Re: Difference between ld and objcopy (GCC)
by on (#113484)
I'd like to do a hack of a GBA game. Is it possible to do a reverse objcopy of the original ROM so that it becomes an object file, and link it (patch it) with the object file of my romhack, which would be carefully mapped so that the overwritten addresses are exactly the ones I want? That would be amazing.
Re: Difference between ld and objcopy (GCC)
by on (#113510)
I think for that purpose, it's better to use macros and incbins to include parts of the original binary.
Re: Difference between ld and objcopy (GCC)
by on (#113515)
Hey, that's a good idea you have there! I would never have thought of that. So basically I'd compile my "new" version, which simply "includes" the old version. I'll have to see how this works.
Re: Difference between ld and objcopy (GCC)
by on (#113518)
Example of a simple GBA hack attached.
Provide your own copy of "Advance Wars.gba".

Warning: this is probably broken when handling absolute addresses; will fix later...
Re: Difference between ld and objcopy (GCC)
by on (#113526)
I'm not sure I understand what you are doing here.
Why use a macro? And what does that \address command do? Does it actually work?

And besides, is it correct to use baseaddress = 0x8000000 like you did? Shouldn't it be something like .org = 0x8000000 or . = 0x8000000?

And what should I write in my linkscript ?
Re: Difference between ld and objcopy (GCC)
by on (#113532)
Here's my second try: adding new code works, and it gets the correct addresses after being generated, but you can't add literal pools (boo!). Whenever I try to add a literal pool, the "patchat" macro can no longer be used without throwing an error.

What am I doing here?
I'm using a macro so I can just write "patchat 0x08xxxxxx" and it will automatically grab the original bytes from the current address up to the target address; the current address then becomes the one I just specified.
\address is how you use a macro argument: you prefix its name with a backslash, and the assembler substitutes the value provided for address instead of the literal text 'address'.

Baseaddress is just a regular symbol; I'm setting it to a number. I'm using it so I can use real addresses in "patchat" instead of file addresses. If you prefer using file addresses instead of memory addresses, remove that stuff.
Never use .org in the GNU assembler unless you want to see it generate 128 MB output files.
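The macro itself was in the attachment and isn't reproduced in the thread, but the behavior described above can be modeled in Python: keep a "current address"; for each patch target, first copy the untouched original ROM bytes from the current address up to the target (what the `.incbin` in the macro does), then emit the new bytes, and continue from the end of them. This is a sketch, not Dwedit's code: `patch_rom` and its list-of-patches input are hypothetical, and `BASE_ADDRESS = 0x08000000` mirrors the thread's baseaddress so real GBA addresses convert to file offsets.

```python
BASE_ADDRESS = 0x08000000   # GBA cartridge ROM is mapped here: file offset 0 == this address

def patch_rom(original: bytes, patches):
    """patches: list of (address, new_bytes), sorted by address and non-overlapping.
    Models repeated 'patchat': copy original bytes up to each address, then the new bytes."""
    out = bytearray()
    current = BASE_ADDRESS
    for address, new_bytes in patches:
        if address < current:
            raise ValueError(f"patch at {address:#x} goes backwards (current {current:#x})")
        # Roughly: .incbin "original.gba", current - base, address - current
        out += original[current - BASE_ADDRESS:address - BASE_ADDRESS]
        out += new_bytes                  # replaces the same number of original bytes
        current = address + len(new_bytes)
    out += original[current - BASE_ADDRESS:]   # rest of the ROM after the last patch
    return bytes(out)

rom = bytes(range(8))
print(patch_rom(rom, [(0x08000002, b"\xAA\xBB")]).hex())   # 0001aabb04050607
```

Because every copy is computed from "where am I now" to "where do I want to be", the output offset always equals the real address minus the base, which is why the macro needs the assembler to know the current address at each patch point.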
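The 128 MB figure falls out of how `.org` works in GNU as: its argument is an offset from the start of the current section, not an absolute memory address, and it can only move the location counter forward, filling the gap. So `.org 0x8000000` in a section that starts at offset 0 pads out 0x8000000 bytes. A tiny worked version of that arithmetic (the helper name `org_padding` is just for illustration):

```python
def org_padding(current_offset: int, org_value: int) -> int:
    """Fill bytes emitted for '.org org_value' when the location counter is at
    current_offset (both measured from the start of the section)."""
    if org_value < current_offset:
        raise ValueError(".org cannot move the location counter backwards")
    return org_value - current_offset

pad = org_padding(0, 0x08000000)
print(pad, pad // (1024 * 1024), "MiB")   # 134217728 128 MiB
```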

I'm trying to do this without any linkscripts; I think I got it working okay...

I also tried to do this with the standard GBA makefiles, but I couldn't figure out how to exclude crt0.
Re: Difference between ld and objcopy (GCC)
by on (#113820)
GENIUS, it works!

This will make developing v3.0 of my FF5 Advance Sound Restoration a whole lot easier. It will also be a breeze to patch the game for the (J), (U), and (E) versions instead of hard-wiring it to the (E) version. @Dwedit: you'll EVENTUALLY be able to run it on your 32kb flashcard :wink:
Re: Difference between ld and objcopy (GCC)
by on (#113860)
OK, it doesn't work so well.

The final patchat macro doesn't work if there is any other ".incbin" in the patch sequence. I don't know what's happening, but I need some ".incbin"s for my samples.
Re: Difference between ld and objcopy (GCC)
by on (#113862)
Doesn't devkitARM or libgba come with bin2s? (If not, download GBFS and look in the tools folder.) That way, you can narrow the problem down to the implementation of .incbin or to a large data size.
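For reference, a tool like bin2s turns a binary file into assembler source: a global label followed by `.byte` directives, so the data goes through the normal assemble-and-link path instead of `.incbin`. Here is a rough Python equivalent; the exact output of devkitARM's bin2s (label naming, alignment, size symbol) may differ, so the format below is an assumption, and `bin_to_asm` is a hypothetical name.

```python
def bin_to_asm(data: bytes, label: str, per_line: int = 16) -> str:
    """Render raw bytes as GNU as source defining `label` and `label_size`."""
    lines = [
        "    .section .rodata",
        "    .balign 4",
        f"    .global {label}",
        f"{label}:",
    ]
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("    .byte " + ", ".join(f"0x{b:02x}" for b in chunk))
    lines += [
        "    .balign 4",
        f"    .global {label}_size",
        f"{label}_size:",
        f"    .int {len(data)}",
    ]
    return "\n".join(lines) + "\n"

# e.g. open("sample.s", "w").write(bin_to_asm(open("sample.raw", "rb").read(), "sample"))
```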
Re: Difference between ld and objcopy (GCC)
by on (#113864)
Ah, OK, I figured out that the actual problem had nothing to do with .incbin.

The actual problem was that I used .align 4 after them, and after that the assembler could no longer keep track of the "." symbol (the current location), which generated an error when I used it.

I guess I'll have to get rid of those .align statements and pad manually where I need to. Note that this sucks, because technically it's very possible to keep track of the address after an .align statement.
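Padding manually only needs the number of fill bytes, which for a power-of-two boundary is (-address) mod boundary. One detail worth knowing: on ARM targets GNU as treats the argument of `.align` as a power of two, so `.align 4` means a 16-byte boundary (`.balign 16` spells the byte count explicitly). A minimal sketch of the arithmetic; the computed count could then be emitted with something like `.space 13` (function names are hypothetical):

```python
def align_padding(address: int, boundary: int) -> int:
    """Fill bytes needed to bring `address` up to a multiple of `boundary`
    (a power of two). Equivalent to (-address) % boundary."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    return -address & (boundary - 1)

def arm_align_to_bytes(n: int) -> int:
    """ARM GNU as: '.align n' aligns to a 2**n-byte boundary."""
    return 1 << n

pad = align_padding(0x08000123, arm_align_to_bytes(4))
print(pad)   # 13  (0x08000123 + 13 == 0x08000130)
```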
Re: Difference between ld and objcopy (GCC)
by on (#113865)
Or you can pack your added assets in an archive and be certain that each object in the archive is aligned to a 16-byte boundary. (Again, see GBFS.)
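The archive idea moves alignment out of the assembly source and into the packing tool: if the packer places every entry at an offset that is a multiple of 16, and the archive itself lands on a 16-byte boundary, then every entry is aligned without a single `.align` after the `.incbin`. A sketch of such a packer follows; the layout (a 32-bit count, then offset/size pairs, then the data) is a hypothetical format for illustration, not GBFS's actual format.

```python
import struct

ALIGN = 16

def pack(blobs: list[bytes]) -> bytes:
    """Hypothetical archive: u32 count, then (u32 offset, u32 size) per entry,
    then each blob starting on a 16-byte boundary relative to the archive start."""
    header_size = 4 + 8 * len(blobs)
    entries, offset = [], header_size
    for blob in blobs:
        offset += -offset & (ALIGN - 1)   # round up to the next 16-byte boundary
        entries.append((offset, len(blob)))
        offset += len(blob)
    out = bytearray(struct.pack("<I", len(blobs)))
    for off, size in entries:
        out += struct.pack("<II", off, size)
    for (off, _), blob in zip(entries, blobs):
        out += bytes(off - len(out))      # zero padding up to the aligned offset
        out += blob
    return bytes(out)

def unpack(archive: bytes) -> list[bytes]:
    (count,) = struct.unpack_from("<I", archive, 0)
    result = []
    for i in range(count):
        off, size = struct.unpack_from("<II", archive, 4 + 8 * i)
        result.append(archive[off:off + size])
    return result
```

The whole archive can then go into the ROM with one `.incbin` preceded by a single 16-byte alignment directive.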
Re: Difference between ld and objcopy (GCC)
by on (#114633)
I am afraid that for my hack I will have to use .align statements and I can't do otherwise, so this will cause some problems.

I will have to either find a way to link objects over the ROM, or use Dwedit's solution for one part of the hack and a more conventional solution for another part of it.