Tuesday, June 7, 2011

How does the compiler tool chain handle symbols defined globally?

What happens when the following two files are compiled together? (Let's take the GNU C Compiler 'gcc-4.4.3' for this discussion.) Note that both files declare the same symbol 'g_Var' globally.

file01.c
--------
#include <stdio.h>

void vFunction(void);   /* defined in file02.c */

int g_Var;

int main()
{
    g_Var = 4;

    vFunction ();

    if (8 == g_Var)
    {
        printf ("Passed\n");
    }
    else
    {
        printf ("Failed\n");
    }

    return 0;
}

file02.c
--------
int g_Var;

void vFunction()
{
    g_Var = 8;
}

The compiler tool chain (the compiler, the assembler and the linker) builds the program successfully, and on execution the result is "Passed".

Reading further is required only if you wonder "why didn't the compiler tool chain shout about a multiple definition of the symbol 'g_Var' during linking?"

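The compiler tool chain didn't shout because it treats uninitialized global symbols as common (also called communal) symbols. Note that this is the default behaviour of older GCC releases such as the one used here; GCC 10 and later default to -fno-common, under which the same program fails to link with a multiple definition error unless -fcommon is given.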

What is a common symbol?

A common symbol is a symbol that names a common data area shared by symbols of the same name declared/defined in different source files. When two or more source files define the same common symbol, each of them refers to the start address of that one common data area.

So the symbol 'g_Var' in file01.c and file02.c shares one and the same data area.
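The C language has a related rule inside a single source file: repeated uninitialized file-scope declarations of one name are tentative definitions that all refer to one object. The sketch below illustrates that rule in one translation unit. It is only an analogy for what the linker does across files, and the helper run_first_example() is a hypothetical name that is not part of the original program.

```c
/* Two uninitialized file-scope declarations of the same name. Inside one
 * translation unit the C standard treats both as tentative definitions of
 * a single object, so only one 'g_Var' exists. */
int g_Var;
int g_Var;

/* Stands in for vFunction() from file02.c. */
void vFunction(void)
{
    g_Var = 8;
}

/* Hypothetical helper: replays the body of main() from file01.c and
 * returns the value of g_Var seen after the call. */
int run_first_example(void)
{
    g_Var = 4;
    vFunction();
    return g_Var;
}
```

Because both declarations name the same object, the store made in vFunction() is visible to the caller, which is the same effect the common symbol gives across two files.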

Let's first see how the compiler handles the symbol in file01.c and file02.c. This can be done by probing the assembly files that the compiler generates for file01.c and file02.c respectively.

On providing the following on the command prompt, we get the assembly files file01.s and file02.s.

$gcc file01.c file02.c -S

In file01.s, we can see the following:

.file   "file01.c"
        .comm   g_Var,4,4

In file02.s, we can see the following:

.file   "file02.c"
        .comm   g_Var,4,4

From the assembly files, we can see that the symbol 'g_Var' is defined as a common symbol in both source files (on ELF targets the .comm directive gives the symbol name, its size in bytes and its alignment). Now, to see that both common symbols share the same data area, we shall probe the map file generated by the linker.

On providing the following on the command prompt, we get the map file file.map.

$gcc file01.c file02.c -Wl,-Map,file.map

In file.map, we can see the following:

Allocating common symbols
Common symbol       size              file

g_Var               0x4               /tmp/cciZ1YQm.o

The above snippet from file.map shows a single entry for 'g_Var', i.e. the symbols in file01.c and file02.c share the same data area.

What if the symbol 'g_Var' is declared globally in file01.c and file02.c, but with different sizes, as shown below?

file01.c
--------
#include <stdio.h>

void vFunction(void);   /* defined in file02.c */

int g_Var;

int main()
{
    g_Var = 4;

    vFunction ();

    if (8 == g_Var)
    {
        printf ("Passed\n");
    }
    else
    {
        printf ("Failed\n");
    }

    return 0;
}

file02.c
--------
long long g_Var;

void vFunction()
{
    g_Var = 8;
}

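This behaves the same as the previous case: the program links, and both files share one data area (on a little-endian machine such as x86 it still prints "Passed"). The point to note is that the common data area allocated for the symbol takes the size of the bigger data type.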

Let us probe the assembly files and map file for better understanding of this case.

On providing the following on the command prompt, we get the assembly files file01.s and file02.s.

$gcc file01.c file02.c -S

In file01.s, we can see the following:

.file   "file01.c"
        .comm   g_Var,4,4

In file02.s, we can see the following:

.file   "file02.c"
 .comm g_Var,8,8

The common symbol 'g_Var' is defined with size '4' in file01.s, while the same symbol is defined with size '8' in file02.s. Now, to see that both symbols share the same data area with size '8' (the bigger of the two, '4' and '8'), we shall probe the map file generated by the linker.

On providing the following on the command prompt, we get the map file file.map.

$gcc file01.c file02.c -Wl,-Map,file.map

In file.map, we can see the following:

Allocating common symbols
Common symbol       size              file

g_Var               0x8               /tmp/cczBMPJt.o

The above snippet from file.map shows that the symbols in file01.c and file02.c share the same data area, and that the bigger size '8' is allocated for it.
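One way to picture this shared area is as a union of the two types. The sketch below is only a single-file model under that assumption: the union, its member names and the helper functions are hypothetical, and what the int view reads back depends on the byte order of the machine.

```c
#include <stddef.h>

/* Models the single data area reserved for 'g_Var': file01.c views it as
 * an int, file02.c views it as a long long. Illustrative names only. */
union shared_area {
    int       as_file01;   /* view used by file01.c */
    long long as_file02;   /* view used by file02.c */
};

static union shared_area g_area;

/* Size of the modeled area: at least the size of the bigger type. */
size_t shared_area_size(void)
{
    return sizeof g_area;
}

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    unsigned int one = 1;
    return *(unsigned char *)&one == 1;
}

/* Replays the example: file01.c stores 4 through the int view, file02.c
 * stores 8 through the long long view. Returns 1 if the int view then
 * reads 8, which is the condition for printing "Passed". */
int replay_second_example(void)
{
    g_area.as_file01 = 4;
    g_area.as_file02 = 8;
    return g_area.as_file01 == 8;
}
```

On a little-endian machine the low-order bytes of the long long sit at the start of the area, so the int view reads 8. On a big-endian machine the int view would overlap the high-order bytes instead.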

An exercise for you: based on the above discussion, can you say what the result of executing the following program would be?

file01.c
--------
#include <stdio.h>

void vFunction(void);   /* defined in file02.c */

long long g_Var;

int main()
{
    g_Var = 0xffffffffffffffffLL;

    printf ("g_var before calling vFunction = %llx\n", g_Var);

    vFunction ();

    printf ("g_var after calling vFunction = %llx\n", g_Var);

    return 0;
}

file02.c
--------
int g_Var;

void vFunction()
{
    g_Var = 0x0;
}

What if the symbol 'g_Var' is defined globally with an initializer in file01.c and declared globally without one in file02.c, as shown below?

file01.c
--------
#include <stdio.h>

void vFunction(void);   /* defined in file02.c */

int g_Var = 4;

int main()
{
    vFunction ();

    if (8 == g_Var)
    {
        printf ("Passed\n");
    }
    else
    {
        printf ("Failed\n");
    }

    return 0;
}

file02.c
--------
int g_Var;

void vFunction()
{
    g_Var = 8;
}

In such cases, 'g_Var' of file01.c is defined as a global symbol, while 'g_Var' of file02.c is defined as a common symbol that refers to the data area defined by the global symbol 'g_Var' of file01.c.

Let us probe the assembly files and map file for better understanding of this case.

On providing the following on the command prompt, we get the assembly files file01.s and file02.s.

$gcc file01.c file02.c -S

In file01.s, we can see the following:

.file   "file01.c"
.globl g_Var
 .data
 .align 4
 .type g_Var, @object
 .size g_Var, 4
g_Var:
 .long 4
 .section .rodata

In file02.s, we can see the following:

.file   "file02.c"
 .comm g_Var,4,4

In file01.s, 'g_Var' is defined as a global symbol, and in file02.s, 'g_Var' is defined as a common symbol. Now, to see that the common symbol 'g_Var' refers to the data area defined by the global symbol 'g_Var' of file01.c, we shall probe the map file generated by the linker.

On providing the following on the command prompt, we get the map file file.map.

$gcc file01.c file02.c -Wl,-Map,file.map

In file.map, we can see the following:

.data          0x000000000804a014        0x4 /tmp/ccQO9fBj.o
                0x000000000804a014                g_Var

In the map file, we can see that there is no common symbol entry for 'g_Var'; it is listed only once, in the .data section. That is, the symbol 'g_Var' in both file01.c and file02.c refers to the same data area.
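The same pairing of an uninitialized and an initialized declaration is also legal inside a single C source file, where the initialized one supplies the storage and the initial value. The sketch below shows that single-file analogue. It does not reproduce the linker's behaviour itself, and run_third_example() is a hypothetical helper that is not part of the original program.

```c
int g_Var;       /* uninitialized: a tentative definition, as in file02.c */
int g_Var = 4;   /* initialized: the actual definition, as in file01.c */

/* Stands in for vFunction() from file02.c. */
void vFunction(void)
{
    g_Var = 8;
}

/* Hypothetical helper: stores the value g_Var holds before the call in
 * *before, calls vFunction(), and returns the value seen afterwards. */
int run_third_example(int *before)
{
    *before = g_Var;
    vFunction();
    return g_Var;
}
```

Called once at program start, the helper reports 4 before the call and 8 after it, since both declarations name the one initialized object.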

And finally, what if the same symbol 'g_Var' is defined globally with an initializer in both file01.c and file02.c?

The compiler would define the symbol 'g_Var' as a global symbol in both file01.s and file02.s, and the linker would shout about a multiple definition of the symbol 'g_Var'. Probe this case on your own as we did for the other cases above.
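A common way to share a global without depending on common symbols at all is to keep exactly one definition and give every other file an extern declaration. The sketch below collapses that layout into one listing so that it can be compiled on its own. The split into a header and two source files is indicated only by comments, and the header name globals.h and the helper read_g_var() are assumptions made for illustration.

```c
/* --- would live in a shared header, e.g. a hypothetical globals.h --- */
extern int g_Var;        /* declaration only: reserves no storage */

/* --- would live in exactly one source file, e.g. file01.c --- */
int g_Var = 4;           /* the single definition of the symbol */

/* --- would live in another source file that includes the header --- */
void vFunction(void)
{
    g_Var = 8;           /* refers to the one definition above */
}

/* Hypothetical helper that returns the current value of g_Var. */
int read_g_var(void)
{
    return g_Var;
}
```

With this layout only one object file defines 'g_Var', so the linker has a single global symbol to resolve and nothing to report as a multiple definition.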