FCI-Helwan blog

Just another FCI-H weblog

Advanced C++ part 9 : Templates Part 1

I’ve decided to jump ahead a little and talk about templates, so that we can dive into the world of the STL and the wonders it can do.

As a kick-start, you can think of templates as a search-and-replace facility. No, it is not like preprocessor macros: macros do the replacement at the textual level, before the compiler proper sees the code, while templates do it at the semantic level, where the compiler knows about types.

What’s the difference between a class and an object of that class? The class defines how a whole set of objects acts and what data they hold, and a single class may have numerous objects. The difference between a class template and a template class is similar!

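A class template is to a template class what a class is to an object. A class template defines what its template classes look like, and a single class template may have several template classes (a template class is what the standard calls a specialization of the class template).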

As an example, consider the class Stack. It’s a simple stack with pop() and push() operations, an int array, and a top-of-stack counter as members. The class is not concerned with what is in the int array or what the top-of-stack counter holds; it merely states that they exist and how they are modified and handled.

Now consider the class template Stack<T>. “It’s a simple stack with pop() and push() operations, a T array, and a top-of-stack counter as members.” “The class template is not concerned with the actual type of the array.” It just states that we need an array of that type, and how that array will be treated.

The template class Stack<int> is just like the normal stack we mentioned earlier. Note that Stack<int> and Stack<double> are totally different types. Their code is generated separately and sits in separate places in the exe file. There is no relation between them whatsoever. This is a very strong difference between templates and generics in Java or C#. Java generics, for instance, reuse the same code that handles Object and cast back and forth to the type specified in the generic definition (C# generics work differently at run time, but they cannot take values as parameters either). Since this is not the case with templates, templates can also take values as parameters (non-type parameters).
For example, you might define a Stack<int,5> that -for example- allows only 5 items in the stack. Template parameters can also have defaults: if you write Stack<>, the element type might default to int.
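
To make this a bit more concrete, here is a minimal sketch of what such a Stack could look like. The names (Stack, Capacity, items_, top_) and the default values are my own assumptions for illustration, not code from a real library, and defining templates is covered properly in the next article.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>

// Hypothetical sketch: T is the element type, Capacity is a value parameter.
// Both have defaults, so Stack<> means Stack<int, 5>.
template <typename T = int, std::size_t Capacity = 5>
class Stack
{
public:
    void push(const T& value)
    {
        if (top_ == Capacity)
            throw std::overflow_error("stack is full");
        items_[top_++] = value;
    }

    T pop()
    {
        if (top_ == 0)
            throw std::underflow_error("stack is empty");
        return items_[--top_];
    }

    std::size_t size() const { return top_; }

private:
    T items_[Capacity] = {};   // the "T array" from the text
    std::size_t top_ = 0;      // the top-of-stack counter
};

// Stack<int> and Stack<double> are unrelated types generated from one template.
static_assert(!std::is_same<Stack<int>, Stack<double>>::value, "distinct types");
```

With this sketch, Stack<> holds up to 5 ints, while Stack<double, 2> is a separate type that holds at most 2 doubles and throws on the third push.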

We’ve been talking about using templates, but we still haven’t talked about defining them. That’s the next article, God willing.

(P.S. Don’t worry at all if you didn’t get anything from this article; it will get clearer as you read and see the examples in the next articles, God willing. It’s totally normal.)

August 25, 2007 Posted by | AdvancedC++ | 1 Comment

Advanced C++ part 6 : Advanced Memory Management part 3 : overriding new and delete, and why you would need that

You can override new and delete for a certain class, so that your operators are called only when that class (or one of its derived classes) is dynamically allocated. You can also override the global new and delete, so that your operators are called on every dynamic allocation that occurs in your program.

Question:

Why would I need to override new and delete?

Answer:
For global or class-based:
When you want to implement your own garbage collector, or any other tool that needs to keep track of memory allocations (a profiler, a reference counter, etc.).

For class-based:
Some classes may need to be allocated in a certain memory location (such as video memory or a shared memory area; I don’t know whether you might need that for a memory-mapped file).

Question:

How do I override them, then?

Answer:

This is just example code to show you how you can do it, not how you should do it.

Global

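#include <cstdlib>   // std::malloc, std::free
#include <new>       // std::bad_alloc

void* operator new (std::size_t size)
{
    // malloc(0) may legally return NULL, so always ask for at least one byte
    void* allocated_mem = std::malloc(size ? size : 1);
    if (!allocated_mem)
        throw std::bad_alloc(); // if failed to allocate, throw bad_alloc exception
    return allocated_mem;
}

void operator delete (void* pointer) noexcept
{
    std::free(pointer); // free(NULL) is a no-op, so no check is needed
}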

Class-based

Declaration (inside the class body)

static void* operator new (size_t size);
static void operator delete (void* p);

Implementation

void* Class1::operator new (size_t size)
{
    void* allocated_mem = malloc(size);
    if (!allocated_mem)
        throw std::bad_alloc(); // report failure the same way the global version does
    return allocated_mem;
} // After this returns, the Class1 constructor gets called (automatically)


// Before this is entered, the Class1 destructor has already been called (automatically)
void Class1::operator delete (void* pointer)
{
    free(pointer);
}
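
To see the class-based version do something observable, here is a small sketch in the same spirit. The class name Tracked and its live_allocations counter are hypothetical names of mine; the point is only that the operators run for heap allocations of that class and nothing else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class used only for illustration.
class Tracked
{
public:
    static int live_allocations;   // how many Tracked objects are on the heap

    static void* operator new(std::size_t size)
    {
        void* mem = std::malloc(size);
        if (!mem)
            throw std::bad_alloc();
        ++live_allocations;
        return mem;                // the constructor runs after we return
    }

    static void operator delete(void* p) noexcept
    {
        if (!p)
            return;
        --live_allocations;        // the destructor has already run
        std::free(p);
    }

    int payload = 0;
};

int Tracked::live_allocations = 0;
```

A Tracked created with new bumps the counter, while a Tracked declared as a local variable does not touch it, because no dynamic allocation happens for it.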


The next article, God willing, will talk about placement new.

Further reading:
http://www.informit.com/guides/content.asp?g=cplusplus&seqNum=40&rl=1

February 7, 2007 Posted by | AdvancedC++ | 2 Comments

Advanced C++ part 6 : Advanced Memory Management part 2 : malloc and new, what’s the difference?

In the days of C, dynamic memory allocation was performed through the malloc() function. You just specify the number of bytes, and it returns a pointer to the allocated memory. If the allocation fails, it returns NULL. malloc does not initialize the allocated memory.

As C++ introduced objects, the requirements for dynamic allocation changed. In old C, you usually allocated space for homogeneous arrays of the same primitive type. Even for arrays of structs, you would usually initialize them with default values, not values dependent on the result of some computation. In C++, constructors encapsulate the initialization logic. malloc won’t call any constructors; it just allocates plain bytes.

Operator new has more information than is available to malloc: it knows the type. That lets it know the size of a single object, releasing you from the burden of calculating how many bytes you need to allocate; you just give the number of objects you need. It also makes new able to call the constructor of that type on the newly allocated memory.
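
A small sketch of that difference, using a hypothetical struct Counted that counts its own constructor and destructor calls (the names are mine, chosen for illustration):

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical type that records how often it is constructed and destroyed.
struct Counted
{
    static int constructed;
    static int destroyed;
    int value;

    Counted() : value(42) { ++constructed; }
    ~Counted() { ++destroyed; }
};

int Counted::constructed = 0;
int Counted::destroyed = 0;

void compare_malloc_and_new()
{
    // malloc: we compute the byte count ourselves, and no constructor runs.
    void* raw = std::malloc(3 * sizeof(Counted));
    assert(raw != nullptr);
    assert(Counted::constructed == 0);
    std::free(raw);                       // no destructor runs either

    // new[]: we only give the number of objects; each one is constructed.
    Counted* objects = new Counted[3];
    assert(Counted::constructed == 3);
    assert(objects[0].value == 42);
    delete[] objects;                     // each one is destroyed
    assert(Counted::destroyed == 3);
}
```

Calling compare_malloc_and_new() once leaves both counters at 3: all of the constructor and destructor calls came from new[] and delete[], none from malloc and free.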

With malloc you have to check every returned pointer against NULL to know whether the allocation has succeeded. new throws an exception instead (you can make it return NULL instead)[1]. Example:

try
{
    MyClass* m = new MyClass();
    // use m here
}
catch (const std::bad_alloc& e) // declared in <new>; catch by const reference
{
    // report an error
}

Or you can simply do the following to prevent the exception from being thrown:

MyClass* m = new (std::nothrow) MyClass();
if(m == NULL)
// report an error

[1]: Older Microsoft implementations of new (Visual C++ 6.0, for example) did not throw an exception, contrary to the standard; they returned NULL. Check what your compiler version does.

There is another thing you can do to handle failed allocations. There is a function that new calls when it fails to allocate. Its type [2] is new_handler, which is defined as a function that takes no parameters and returns void. By default no new_handler is installed, and in that case new itself throws the bad_alloc exception (so installing a handler lets you modify that behaviour). To install your new_handler, call set_new_handler( your_new_handler ). This function returns the old new_handler. Your new_handler is expected to do one of 3 things:

  1. Call abort() or exit()
  2. Throw bad_alloc or a type inherited from it
  3. Make more memory available for allocation by some means

[2]: function pointer type
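
Exhausting real memory just to watch a new_handler run is impractical, so the sketch below fakes it. Widget is a hypothetical class with its own operator new that refuses to allocate once a made-up budget (g_budget) runs out, and then follows the same protocol the global new uses: ask for the installed handler, throw bad_alloc if there is none, otherwise call it and retry. The handler grow_budget takes option 3 from the list above. std::get_new_handler needs C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical toy limit: how many Widget objects may still be allocated.
static int g_budget = 1;
static int g_handler_calls = 0;

// Option 3: make more memory available, then return so the allocation is retried.
void grow_budget()
{
    ++g_handler_calls;
    g_budget += 1;
}

struct Widget
{
    int value = 0;

    static void* operator new(std::size_t size)
    {
        for (;;)
        {
            if (g_budget > 0)
            {
                void* mem = std::malloc(size);
                if (mem)
                {
                    --g_budget;
                    return mem;
                }
            }
            // Allocation failed: consult the installed new_handler, if any.
            std::new_handler handler = std::get_new_handler();
            if (!handler)
                throw std::bad_alloc();
            handler();   // may free memory, throw, or abort
        }
    }

    static void operator delete(void* p) noexcept
    {
        if (!p)
            return;
        std::free(p);
        ++g_budget;
    }
};
```

After std::set_new_handler(grow_budget), a second new Widget succeeds because the handler raised the budget; with no handler installed and the budget at zero, the same expression throws std::bad_alloc.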

Operator new and delete automatically call the constructor and the destructor, respectively. We will talk in a forthcoming article about some variants where you must call the destructor yourself or where you can prevent new from automatically calling the constructor so you can call it manually as an optimization in case of very large arrays.

Note: we will talk later about the details of exceptions and namespaces; for now just take them “as is” or read about them in some reference.

The next article in this sub-series, God willing, will talk about overriding new and delete and the reasons why you might need that in practice.

References:
http://h30097.www3.hp.com/cplus/new_handler_3c__std.htm

January 26, 2007 Posted by | AdvancedC++ | Leave a comment

Advanced C++ part 6 : Advanced Memory Management part 1

This article was planned to talk only about the “placement new” feature. But there are other interesting memory management features in the C++ standard that are worth mentioning. So we will expand this article into a sub-series of articles talking about those features.

This (part 1) article will merely state the topics that will be discussed.

1. malloc and new, what’s the difference?
2. Overriding new and delete (for a certain class or globally).
3. Placement new and performance tweaks.
4. Allocators.

Note: I’ve found some source on the net that contains some useful information: C++ Reference Guide; be sure to read the table of contents and browse through it (and you might find some information about linking too).

January 20, 2007 Posted by | AdvancedC++ | 1 Comment

Advanced C++ part 5 : Calling Convention and Intermixing C and Assembly

[Note: as calling conventions are not standardized, most details are compiler/vendor/architecture dependent, so what is stated here might differ from what you actually find. The essence of this article is to give an overview of the concept as far as the author knows it, so don’t take the specific details mentioned here as a reference.]

Parameters get passed to functions through the call stack. There is exactly one call stack for each thread of execution in the system. The stack is also used to store the return address, so a function knows where to return to. Local variables are put on the stack too. A calling convention is responsible for the way parameters are sent between functions.

If we have a function W ( int x, int y ), will you push x first or y when calling W? Well, it doesn’t matter as long as the called function reads them in the correct order. A calling convention sets how a function should be called. It doesn’t only define the parameter ordering; it also defines who pops the parameters from the stack, the caller or the callee, and how the return value is returned (i.e. via which register etc.). Name decoration also differs from one calling convention to another (the decorated name of the same function might differ under another calling convention in the same version of the same compiler). The C++ standard doesn’t specify any calling conventions.

There are 4 calling conventions in common use (on the 32-bit x86 architecture; calling conventions are very dependent on the hardware architecture). Other calling conventions on other platforms, and obsolete calling conventions on x86, are beyond our scope (calling conventions on Intel’s 64-bit Itanium architecture, for example, provide greater performance using register windows and a register stack, but we are not going to discuss that). The 4 calling conventions are cdecl, stdcall, thiscall, and fastcall. I am not going to mention every single detail about each of them, just the main points, because the more detailed the points are, the more they can differ from one compiler to another [see this for details].

cdecl

This is the default calling convention for C and C++ functions. Parameters get pushed from right to left, so the first parameter is the nearest one to the top of the stack (the callee sees them in natural order: first argument = stack[0], second = stack[1], etc.). The caller pops the parameters; this is useful with variable-length parameter lists, where the callee doesn’t know how many parameters there are.
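
Variable-length parameter lists are the classic case. In the sketch below (sum_ints is a hypothetical helper of mine), the callee only learns the number of arguments because the caller passes it explicitly; the callee cannot discover it on its own, which is why it makes sense for the caller to be the one that cleans up. On platforms other than 32-bit x86 the arguments may travel in registers rather than on the stack, but the source code looks the same.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper: sums `count` int arguments passed after `count`.
int sum_ints(int count, ...)
{
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);   // read the next int argument
    va_end(args);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) reads exactly three integers. If the caller lied about the count, the callee would have no way to notice.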

stdcall

This is the one used by the Win32 API. Remember “LRESULT WINAPI WndProc( HWND hWnd…”? The WINAPI macro is defined as “__stdcall”, which is the MSVC keyword that gives this function the stdcall calling convention. This calling convention is almost the same as cdecl, except that the callee pops the arguments from the stack, so variable-length parameter lists are not possible. The advantage is that it is a little faster than having the caller pop them, because the callee uses a combined instruction that adds a certain value to the top-of-stack pointer while returning. If the caller pops, it takes at least one more instruction.

thiscall

The calling convention used for non-static member functions. There is a hidden parameter: the this pointer. In MSVC, this calling convention is exactly like stdcall except that the this pointer gets passed in a register (namely ECX).
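
One way to picture the hidden parameter is to write the same operation twice: once as a member function, and once as a free function that takes the object pointer explicitly. Counter and Counter_add are hypothetical names; the second form is only a mental model of what the compiler arranges, not something it literally generates.

```cpp
#include <cassert>

struct Counter
{
    int total = 0;

    // Member function: `this` is passed behind the scenes.
    int add(int amount)
    {
        total += amount;
        return total;
    }
};

// The same logic with the hidden parameter written out by hand.
int Counter_add(Counter* self, int amount)
{
    self->total += amount;
    return self->total;
}
```

Calling a.add(5) and Counter_add(&b, 5) on two fresh counters leaves both in the same state; the only difference is who supplies the object pointer.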

fastcall

This calling convention is rarely used. Its main advantage is that it sends one or two of the parameters through registers rather than through the stack (the stack is in memory), thus making calls faster.

Brainteaser

Think of a calling convention that might be faster than fastcall :). You can implement whatever calling convention you like if you are using Assembly; you have direct access to the stack. I’ve implemented a calling convention for my recursive method for solving the 8 queens problem, and it proved 0.7 seconds faster than MSVS fastcall for number of queens = 15. That was when I was in first grade. It is simple; don’t think too hard.

Intermixing C and Assembly code

In x86 Assembly there are no functions as we know them. You only have a stack and the memory address of the first instruction of the function. There is nothing called a parameter or a return value. To call a function you just tell the processor to go execute that address. The only function support is that the return address is pushed on the stack so that you can return using the “ret” instruction[1]. A function name is just like a variable name, there is no syntactic difference; they both are labels. To send parameters, you have to handle that yourself, using a predefined or a custom calling convention. To call Assembly code from your C code, you just implement a calling convention understood by the C compiler in the function that is to be called from the C source[2]. You are free, of course, to implement whichever calling convention you want in other functions, as long as they are not called from another language.

To call a C function from Assembly, you have to know the calling convention that this function implements (including name decoration)[also see 2]. Of course the C file produces an object file, and your Assembly file produces another one. You will have to use the linker to link both.

[1]: That’s plain, common Assembly. MASM (Microsoft Assembler or Macro Assembler) implements some higher-level constructs by giving you special syntax to call functions, which it then translates into the appropriate calling convention. For reference, see this link and search the page for “invoke”. Off topic: MASM also implements special “if” condition and loop syntax -similar to higher-level languages- that have nothing to do with Assembly. It even supports Object Oriented Programming in Assembly!

[2]: You also have to declare that “label” as global, as labels are not global by default (at least in NASM, the Netwide Assembler). Function names are global by default in C and C++. A global symbol is written in the object file header along with its address inside the file. A local symbol is not written at all, so the linker can only see global symbols.

Conclusion

The next article, God willing, will be about something different from all this low-level stuff. We will start talking about the higher-level language features. Posts will be shorter and less complicated than this one, I hope. The next article will be about placement new. Google it if you wish. I am trying to follow the original order in which I specified the articles. Anyway, don’t forget this “Introduction” to the compilation process, because, God willing, we will be referring to it a lot, and you will need it in some later article.

Further reading (finally a lot of links 😀 ):
C++ Reference Guide > Understanding Calling Conventions
The Old New Thing : The history of calling conventions, part 3
Calling conventions on the x86 platform
MSDN Calling Example: function prototype and call
CodeProject: Calling Conventions Demystified

January 15, 2007 Posted by | AdvancedC++ | 2 Comments

Advanced C++ part 4 : Name Decoration and Intermixing C and C++ Code

If a file has two functions, add(int,int) and add(float,float), how would the linker distinguish them if some other file wanted the “add” function?
The symbol name generated in this case cannot simply be “add”; it might be __add_i_i or __add_f_f. It won’t be “add” even if only one add function is defined, in case another one is defined in another file.
The compiler is the one that chooses the symbol name; the linker is the one that uses it blindly. That’s because the linker has no knowledge of the internal type system of a language, or whether a language supports overloading or not (some linkers understand C++, but that’s out of our scope). This enables the linking of several languages into the same executable, for example linking Pascal with C or C with C++.
The operation of converting a function/variable name to a symbol is known as name decoration or name mangling. There is no standard defining this operation; each compiler can do it the way it likes, i.e. it is implementation specific. Even different versions of the same compiler can do it differently. So it might be hard to link with old object files or files generated by other compilers (that’s not a bug, it’s a feature). Shared libraries compiled with one version of GNU GCC might not link with programs compiled with another version. Name mangling in C is simpler: it just adds an underscore ‘_’ before the function name (or in some cases leaves the function name as it is). Surprisingly, C does NOT support function overloading, at least not in the standard; to get it you would have to modify the compiler to apply name decoration yourself.
Name mangling of C++-specific features, like namespaces and classes, can look more cryptic. For example:


void x(float z){}
void x(int i){}
int main ()
{
x(8.0f);
x(4);
}

would give the symbols (on GCC 4.1.2):


main
_Z1xf
_Z1xi

but in this example:


namespace MyNameSpace
{
void x(float z){}
class MyClass { public:
static void x(int i){}
};
}
int main ()
{
MyNameSpace::x(8.0f);
MyNameSpace::MyClass::x(4);
}

would give the symbols:


main
_ZN11MyNameSpace1xEf
_ZN11MyNameSpace7MyClass1xEi

If some other file wants to call one of these functions, it has to put the exact mangled name, as above, in its object file so the linker can find it.

If you have some old libraries written in C, and you are using C++ and want to call them, you have to declare them like this:

C++ source file:


extern "C" void x(int i );

or


extern "C" {
    void x(int i );
    int y (float z);
    int f;
}

Note that you can still overload these functions, as long as you don’t mark your overloaded versions as extern “C”. The extern “C” directive marks these functions to be mangled the C way, not the C++ way, so the linker can find the right function.
That’s how you can call a C function from C++.
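
In practice the declarations usually sit in a header that both C and C++ compilers can read, guarded by the __cplusplus macro. The sketch below assumes a hypothetical C function legacy_add; in a real project its body would live in a .c file compiled by the C compiler, and it is defined here only so the example stands on its own.

```cpp
#include <cassert>

// What a header shared between C and C++ typically looks like.
#ifdef __cplusplus
extern "C" {
#endif

int legacy_add(int a, int b);   /* hypothetical function from the old C library */

#ifdef __cplusplus
}
#endif

// An overload with ordinary C++ linkage: allowed, because it is not extern "C".
double legacy_add(double a, double b)
{
    return a + b;
}

// Stand-in for the C implementation, so this sketch links by itself.
extern "C" int legacy_add(int a, int b)
{
    return a + b;
}
```

Here legacy_add(2, 3) resolves to the C-mangled symbol and legacy_add(1.5, 2.0) to the C++-mangled overload. Only one function of a given name can have C linkage.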
Calling a C++ function from C can be done in 2 ways. The first is marking your C++ function as extern “C”.

For example:


#include <iostream>
extern "C" void print() {
    int x;               // a local to read into
    std::cin >> x;
    std::cout << "this is C++ code";
}

This method applies only to free functions, not to members of a class (and since the C name carries no namespace, it is best kept to functions in the global namespace).
The second method lets you call functions inside a namespace, or static member functions, but it is a manual way.
We have the function


_ZN11MyNameSpace1xEf

You would use that exact name in the C source, taking into account how the C compiler will mangle it further. Perhaps this method can even let you call non-static member functions, but it requires you to send the this pointer yourself.

Next post will be about calling convention and intermixing C and assembly code.

Further reading:
Google “name decoration”

January 14, 2007 Posted by | AdvancedC++ | 6 Comments

Advanced C++ part 3: Linking

This post, God willing, will make the previous post clearer.
When the C++ compiler processes source files, it generates an object file (.o or .obj file) for each source file it has processed.
The object file typically contains a series of symbols along with their implementation. For now, a symbol is a variable name or a function name (including member functions; we will discuss calling conventions and name decoration later). A symbol can be defined in that file, or marked as an external symbol.
The linker operates on a group of files, knowing what each file has defined and what each file needs from other files. The linker’s mission is to put all the object files into one executable file (.bin or .exe for a standalone file, .so or .dll for dynamic link libraries; we won’t talk about dynamic linking here), linking external symbols from one file to their implementation in other files, hence the name.
An example is better than 1000 words, so:
say file a.cpp has the following symbols (for scientific honesty, these would not be the real symbol names generated):


main
count [this is an integer, not a function]
external add
external subtract

And file b.cpp has the following symbols:


subtract
external count

And file c.cpp has the following symbols:


add
external count

The linker would put the address of the add function from c.cpp into the empty slot in a.cpp, and likewise the address of subtract from b.cpp. It would put the address of count in the empty slots of b.cpp and c.cpp, and then put all 3 files together. Now a.cpp can call the functions it wanted from the other files. Note that what really happens is more complicated; this is a simplified version.
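
The three files above could look roughly like this. To keep the sketch compilable as a single unit, the “files” are marked with comments instead of being split up, and run_a (a name of mine) stands in for main; the function bodies are made up for illustration. In a real build each section would be its own .cpp file and the linker would do the matching.

```cpp
#include <cassert>

// ---- a.cpp: defines count, needs add and subtract from elsewhere ----
int count = 0;
int add(int amount);        // external: only the declaration is visible here
int subtract(int amount);   // external

int run_a()                 // stands in for main()
{
    add(5);
    subtract(2);
    return count;
}

// ---- b.cpp: defines subtract, needs count from elsewhere ----
extern int count;
int subtract(int amount)
{
    count -= amount;
    return count;
}

// ---- c.cpp: defines add, needs count from elsewhere ----
extern int count;
int add(int amount)
{
    count += amount;
    return count;
}
```

When the a.cpp part is compiled, the compiler accepts the calls to add and subtract knowing only their declarations; finding the actual code is left for later.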

The moral of this story 😀 is to show you how the compiler only has to worry about one file at a time. Note that producing an executable from source files is a 2-phase process: compilation and linking.

The next post will be about calling conventions and name decoration (aka name mangling) (related to the function signature); that will show how overloading works and what the use is of declaring the argument types of an external function, not just its name.

As Ramy has suggested:
Further reading for the last post:
* I am sorry, I tried searching Google, but I didn’t find anything directly useful. My sources, however, were the solutions I read for problems I’ve faced before. As a matter of fact, I collected this information one statement at a time; i.e. it is not from one direct source. The most useful source, however, was when I was dreaming of making a C++ compiler and read a lot about compilers; there were some hints here and there about the operation of a C compiler, not even C++. (You can read more about Makefiles and GNU Make; you need those to manage compiling large applications in a custom way, and using makefiles will help you deeply understand the compilation process. It’s tedious at the beginning: a trial-and-error method that takes a lot of time.)
Further reading for this post:
* Same as the above; I got it from diagnosing linking problems over MSDN, i.e. searching for one linking error after another. The other useful source was when I was working on the OS and met a lot of problems, because linking an OS is something totally different from linking a normal program (the MS linker can’t do it, btw). I once faced a problem where I needed to put the address of the main function in the first 4 KB of the executable so GRUB could read it, and there was no way to enforce that in the MS linker. I spent 2 weeks trying several linkers, and there were linker scripts involved. There was another one where I needed to get the size of the kernel at run-time; I hadn’t implemented any file system yet, so I had to rely on placing a variable at the end of the kernel and getting its address! That one I had to ask about on the alt.os.development usenet group, something you won’t find directly on some web page. (You can read about GNU LD for further reading, but that would be useful if you are searching for a certain feature, not for general reading.)
Further reading for the next post: (finally something that can be directly found)
Wikipedia: Name mangling
Google search: name decoration

January 13, 2007 Posted by | AdvancedC++ | 2 Comments

Advanced C++ part 2 : Necessary Introduction to the Compilation Process

Hello all of you out there :).
I know I am awfully late in putting this part, I apologize about that :$.
Anyway, the issue was that I had intended to put it in a book. I’ve already written the first chapter, but I found it too hard to write a whole book. So I figured it would be better to write the series here and then assemble it into a book.
Okay, let’s start.
This article won’t be so interesting; it just lays out the basics you need to understand how a program gets compiled, because that will make a difference in your understanding of why each feature was designed the way it was.
Our terminology here is :
source file: the .cpp file.
header file: the .h file.
When using libraries, like cstdio, which has the printf function, or iostream, which has cin and cout,
your program can’t use them unless you include the corresponding header file (in C it was stdio.h, in C++ it is cstdio; yeah, with no .h extension).
Unlike C# or Java, the C++ compiler never looks inside any source file other than the one it is compiling right now. This means that, on its own, it can’t perform any compile-time validation when you use an external function/class; as a matter of fact, it can’t even produce a description of that function/class so the linker knows which function/class it wants (we will discuss the linker later). This has many consequences. The first one is that it introduced the need for header files. Header files get included in the source file before the compiler processes it (#include is a preprocessor directive). So the compiler processes the preprocessed source file, which includes the expanded header files; meaning the compiler doesn’t understand what a header file is, it only knows what a source file is; also meaning that if you #include <abc.tyr.eee.338.e8>, it won’t complain about the odd name as long as the file exists. Also, you don’t compile a header file on its own; it is only intended to be included in a source file.
So why don’t we #include the other source files instead? Well, there is a difference between a declaration and a definition. A source file has definitions, so if you included one source file inside the others, several object files would claim to have the code of the same function, and the linker would not know which one is right. A header file should have only declarations, which declare the type (the signature), not the implementation. So if two files include the declaration, they only say “this is the type of the thing we want linked to us”; there is no collision, because after all only one other file claims to own the implementation, which is then linked to the other files.
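
A tiny sketch of that difference, with hypothetical names (square, call_count). Repeating a declaration is harmless, which is what effectively happens when several source files include the same header, while the definition must appear exactly once in the whole program.

```cpp
#include <cassert>

// Declarations: what a header would contain. Seeing them twice is fine.
int square(int x);
int square(int x);
extern int call_count;
extern int call_count;

// Definitions: what exactly one source file would contain.
int call_count = 0;

int square(int x)
{
    ++call_count;
    return x * x;
}

// A second definition, for example from #including another .cpp file,
// would be rejected:
// int square(int x) { return x * x; }   // error: redefinition of 'square'
```

Within a single file the compiler itself reports the duplicate; across files it is the linker that complains about a symbol being defined more than once.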
[NOTE: this is why a book is easier in some cases; you can add examples and pages as you like]

Next part will be about linkers. I think this is enough for one post.
IMPORTANT: I need feedback to know whether I should simplify more or any comment you want to say.

January 12, 2007 Posted by | AdvancedC++ | 1 Comment

Advanced C++ part 1

This article series will be talking about advanced C++ concepts.

Prerequisites:
Solid understanding of programming. A sense of OOP. Understanding pointers. A slight idea about threads and concurrency. Imagination and enthusiasm.

Assumption:
I assume you can compile and run any C++ program and know how to handle any syntax error you get.

It will contain these topics:

  1. Necessary introduction about the compilation process.
  2. ‘new’ buffer allocation.
  3. Thread-safety using ‘volatile’.
  4. Exceptions.
  5. Namespaces and anonymous namespaces.
  6. General usages of the ‘using’ keyword.
  7. const pointers and const references.
  8. Run-time type identification (RTTI).
  9. Templates.
  10. Introduction to STL.
  11. Template meta-programming.
  12. General topics like executable file encryption.

Stuff I will NOT talk about :

  1. No function pointers.
  2. Nothing related to common OOP (i.e. no inheritance, no operator overloading, no friend classes, no virtual functions/overriding).
  3. Nothing about visual programming.
  4. Nothing about managed C++.
  5. Nothing about programming style guidelines.

There are other topics I want to talk about, like signals ( <csignal> header file ), but probably I won’t.
I will start publishing them as soon as I can, but I don’t think I can do it before the exams.
As I publish each part, I will associate links and book titles for further reading.

May 14, 2006 Posted by | AdvancedC++ | 5 Comments