I'm not sure if this is the right place to ask this question, but I think the rules are a bit weird. I've programmed in C for years, and I am STILL a bit confused. Many books and references are a bit too quick, or not detailed enough for my taste, for such a confusing topic.
So I'll write down my understanding in my own words, and someone please tell me if any of it is wrong.
It's a well known fact that an array used in an expression is converted ("decays") to a pointer to its first element, not to the array itself. The exceptions are when the array is the operand of sizeof or of unary &, or when a string literal initializes an array. This is not hard to understand.
What is complicated, however, is knowing when a declaration declares an array and when it declares a pointer.
As far as I understand, array notation can sometimes declare a pointer (in function parameters), and pointer and array notation can be used interchangeably in expressions, which is what makes the two so easy to confuse.
RULES FOR UNINITIALIZED POINTERS AND ARRAY VARIABLES
The declaration is what decides whether the variable is a pointer or an array (obviously). You cannot define an array variable without knowing its size:
Code:
int a[]; // INVALID as a local variable (extern int a[]; is fine as a declaration)
int *b;
int c[12];
The size of c is 12 times sizeof(int) (it consists of 12 consecutive ints in memory), but the size of b is just the size of a pointer (typically a single machine word).
On a side note, an array of 1 element is NOT like a pointer; it is more like the element itself.
Code:
int d[1];
int e;
Here, both declarations reserve the same storage (the space of one int). The difference is that when you use d in an expression you get an address (&d[0]), while using e gives you the variable directly. Also, d cannot be assigned to as a whole.
RULES FOR INITIALIZED ARRAY AND POINTER VARIABLES
If the '=' sign is used in the declaration itself, the variable is initialized.
Just as in the case of uninitialized variables, the notation distinguishes pointers from arrays.
However, there is a special case for char *, which can be initialized with the address of a string literal, as in:
Code:
char *c = "hello";
char d[] = "hello";
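The first line makes c point to an array of chars containing "hello\0" that the compiler stores somewhere in static storage (usually a read-only data section, not the code itself). d, however, is a real array of 6 chars initialized with a copy of that string. If d is a local variable, the copy is made at run time each time the declaration is reached, so c avoids a copy, but that does not make it simply "better": the two behave differently.
It is not true that it makes no difference to the user. d[0] = 'j' is fine, while modifying c[0] is undefined behavior (string literals are not modifiable, which is why const char *c is the safer spelling). Conversely, c can be re-pointed to another string, while d cannot be assigned to.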
sizeof c is the size of a single pointer, while sizeof d is 6 (5 letters + 1 terminating '\0' character).
This notation does not work for non-string arrays with a plain brace list:
Code:
int *a = {1, 2, 3};
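will unfortunately not work: the braces can only initialize the single pointer itself, they do not create an array for it to point to.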
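The size inside the brackets is not only a check. If the initializer has more elements than the size, the compiler issues a diagnostic; if it has fewer, the remaining elements are set to zero, so int a[10] = {1, 2, 3}; is a 10-element array.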
Both declarations below are equivalent, and both declare an array of 3 elements:
Code:
int a[3] = {1, 2, 3};
int b[] = {1, 2, 3};
RULES FOR ARRAY AND POINTERS WITHIN STRUCTS AND UNIONS
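Within structures and unions, the same rules as for uninitialized variables apply, i.e. array members need an explicit size. The one exception (since C99) is a flexible array member: an array of unspecified size declared as the last member of a struct, whose storage you allocate yourself.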
RULES FOR ARRAY AND POINTER PASSED TO A FUNCTION
So far the rules were consistent, but now there is a total shift. No matter which of the 3 notations is used (sized array, unsized array or pointer), the parameter is a pointer, and a pointer is what is passed to the function.
Even if a size is given, it only serves as documentation and is ignored by the compiler, including by sizeof: inside the function, sizeof c is the size of a pointer. There is no check that the caller passes an array of the right size.
When it comes to return types, array notation is not allowed at all: a function cannot return an array, only a pointer. Typedefs are different, though: typedef int triple[3]; is a valid array type, you just can't use it as a return type.
The following function takes 3 pointers to int (we don't even know whether they point to a single element, to an array, or to invalid data), and returns a pointer to int (again, we can make no assumption about what it points to):
Code:
int *f(int *a, int b[], int c[3]);
There is no difference internally in how the a, b and c arguments work; it's only different syntax. The only way to pass an array "by value", or to return one by value, is to wrap it in a struct or a union. In that case, the entire content of the array is copied to/from the function. Not that it's a good idea to do this anyway.
RULES WITHIN EXPRESSIONS
Because arrays and pointers are used the same way in an expression, something like
Code:
a[i];
"a" could be a pointer or an array; you can't tell without looking at the declaration of a, and it's the compiler that makes the distinction.
In both cases a[i] means *(a + i): either the ith element of the array a, or the ith element after the location pointed to by a.
That's all the rules for one-dimensional arrays, as I understand them.
Now let's quickly review how this extends to multiple dimensions:
RULES FOR DECLARING TWO LEVELS OF INDIRECTION
Declaration of multiple levels of arrays/pointers is simple enough.
Code:
int **a; // Pointer to pointer
int *b[3]; // Array of pointers
int (*c)[10]; // Pointer to array (of known size)
int d[10][3]; // Array of array
This is simple enough, but what is confusing is that all of them can be dereferenced once or twice, and this generates different code in each case. Again, only the compiler can tell the difference.
Code:
a[2] // A pointer to int, read 2 pointers after the one a points to
b[2] // The third pointer to int stored in b
c[2] // An array of 10 int (it decays to a pointer to int), starting 20 ints after where c points
d[2] // An array of 3 int (the third row of d), starting 6 ints after the start of d
If the variable is dereferenced twice, the rules for the second level are the same as for a normal pointer/array.
RULES WHEN PASSING TWO LEVELS OF INDIRECTION TO A FUNCTION
Again, it doesn't matter which of the four notations above is used: a pointer is passed to the function.
I said before that the only change was for sizeof. This time, there IS a major difference in how the pointer is dereferenced: the size of the pointed-to type determines how far each index step moves (as shown in the above examples).
As we saw, a[2] and b[2] yield the pointer stored 2 pointers after the base address, while c[2] yields the row starting 20 ints after it and d[2] the row starting 6 ints after it.
The a and b notations can be used interchangeably when passing pointers to pointers or arrays of pointers, and the c and d notations when passing pointers to arrays (of known size).
In this example:
Code:
void f(int a[][10], int (*b)[10], int c[3][10]);
a, b, and c work exactly the same: all three parameters have type int (*)[10]. That includes sizeof, which gives the size of a pointer for all three (not 30 ints for c).
All sizes except the first (leftmost) must be specified, so that when you apply a dereference, the size of what is pointed to is known.
In this case it might also be preferable to make only a single level of indirection explicit, and compute the second level with explicit arithmetic in the code (row * cols + col), so that it's clear you know what you're doing.