main2(), kernel_main (c/c++)

G G

Jul 10, 2019, 2:24:12 PM

is it possible that c/c++ programs can start with something other
than a function named main()?

or is it somehow set in the compiler the name of the function
where to start?

http://www.cplusplus.com/articles/zv07M4Gy/

https://github.com/travisg/newos/blob/master/kernel/main.c

https://git.haiku-os.org/haiku/tree/src/system/kernel/main.cpp

Ben Bacarisse

Jul 10, 2019, 2:51:11 PM

G G <gdo...@gmail.com> writes:

> is it possible that c/c++ programs can start with something other
> than a function named main()?

In what are called free-standing implementations, there need be no main
function. This applies to both C and C++. Programs written for hosted
implementations must define a main function.

> or is it somehow set in the compiler the name of the function
> where to start?

The "or" is not right. There need not be a main /and/ the name of the
function to start is somehow set in the compiler.

> http://www.cplusplus.com/articles/zv07M4Gy/
>
> https://github.com/travisg/newos/blob/master/kernel/main.c
>
> https://git.haiku-os.org/haiku/tree/src/system/kernel/main.cpp

Did I need to read these? If there is something at these links which
you want people to comment on, it's better to pull out a quote and post
that (along with the URL for reference).

--
Ben.

James Kuyper

Jul 10, 2019, 2:53:08 PM

On 7/10/19 2:24 PM, G G wrote:
> is it possible that c/c++ programs can start with something other
> than a function named main()?

Since you're explicitly asking about both C and C++, I'm cross-posting
this to comp.lang.c.

Yes it is possible, but only for freestanding implementations:

"In a freestanding environment (in which C program execution may take
place without any benefit of an operating system), the name and type of
the function called at program startup are implementation-defined." (C
standard, 5.1.2.1p1)

"It is implementation-defined whether a program in a freestanding
environment is required to define a main function." (C++ standard, 6.6.1p1).

> or is it somehow set in the compiler the name of the function
> where to start?

An implementation of C or C++ can conform either as a freestanding
implementation or a hosted one. A hosted implementation must recognize
main() as the starting point. A freestanding implementation may have
other ways of identifying the starting point, but is required to
document them, and (implicitly) is required to conform to whatever
documentation it provides. Either way, it is indeed up to the
implementation to recognize the permitted ways of identifying the
starting point of a program.

It is common, but not essential, for an implementation to be divided
into several separate parts, such as a pre-processor, a compiler, and a
linker. However, the standard says nothing about how the work of an
implementation should be sub-divided. It is only the implementation as a
whole that has responsibility for implementing the correct start point
for a program.

That being said, in a typical pre-processor/compiler/linker setup, it is
indeed the compiler that is required to recognize the start point, but
it's the linker that actually does what's needed to make the program
start there.
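
As a rough illustration of the hosted/freestanding distinction quoted above, here is a minimal sketch that asks the implementation which kind it is. __STDC_HOSTED__ is a standard predefined macro; the function names implementation_is_hosted and startup_rule are made up for this example.

```c
/* Returns 1 if this translation unit was compiled for a hosted
 * implementation, 0 if it was compiled for a freestanding one.
 * __STDC_HOSTED__ is predefined by conforming C99 (and later) and
 * C++11 (and later) implementations. */
int implementation_is_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: a one-line summary of the startup rule that
 * applies to the implementation this code was compiled for. */
const char *startup_rule(void)
{
    return implementation_is_hosted()
        ? "hosted: program startup calls main()"
        : "freestanding: the startup function is implementation-defined";
}
```

With gcc, an ordinary build should report "hosted", while compiling with -ffreestanding is expected to flip the macro to 0.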

David Brown

Jul 10, 2019, 2:54:32 PM

On 10/07/2019 20:24, G G wrote:
> is it possible that c/c++ programs can start with something other
> than a function named main()?
>

Yes.

> or is it somehow set in the compiler the name of the function
> where to start?

On gcc, you can use the "--entry" option (short form "-e"), which is
handed on to the linker.

Using something other than main will be non-standard for hosted
implementations, but is sometimes used in free-standing (bare metal)
implementations where your code runs directly from the power-on reset.
But even there, there is seldom good reason to pick a different name -
it just risks confusion and possible inconsistencies.
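
A minimal sketch of what such a program might look like, assuming a GNU/Linux toolchain and POSIX write() and _exit(). The names alt_entry and run_program are hypothetical, and the build line in the comment is an assumption about a typical gcc setup rather than something tested across toolchains.

```c
#include <unistd.h>   /* write(), _exit(), ssize_t (POSIX) */

/* The real work of the program; returns an exit status. */
int run_program(void)
{
    static const char msg[] = "started without main()\n";
    ssize_t n = write(STDOUT_FILENO, msg, sizeof msg - 1);
    return n == (ssize_t)(sizeof msg - 1) ? 0 : 1;
}

/* Hypothetical entry point, selected at link time instead of by the
 * name "main". Assumed build line:
 *     gcc -nostartfiles -e alt_entry prog.c
 * Nothing called this function, so it has nowhere to return to and
 * must end the process itself. A production entry point is usually
 * written in assembly so that the stack is set up the way the ABI
 * expects; this sketch glosses over that. */
void alt_entry(void)
{
    _exit(run_program());
}
```

Keeping the work in run_program() and the process exit in alt_entry() means the same code can still be called from an ordinary main() on a hosted build.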

G G

Jul 10, 2019, 3:17:03 PM

> Did I need to read these? If there is something at these links which
> you want people to comment on, it's better to pull out a quote and post
> that (along with the URL for reference).
>
> --
> Ben.

no, but, i didn't know if someone needed to see an example
of what i was talking about

Keith Thompson

Jul 10, 2019, 4:09:03 PM

James Kuyper <james...@alumni.caltech.edu> writes:
[...]

> It is common, but not essential, for an implementation to be divided
> into several separate parts, such as a pre-processor, a compiler, and a
> linker. However, the standard says nothing about how the work of an
> implementation should be sub-divided. It is only the implementation as a
> whole that has responsibility for implementing the correct start point
> for a program.
> That being said, in a typical pre-processor/compiler/linker setup, it is
> indeed the compiler that is required to recognize the start point, but
> it's the linker that actually does what's needed to make the program
> start there.

I don't believe the compiler is required to recognize the entry point.
Under gcc, for example, it appears to be just another function.

When I compile these two functions:

    int main(void) { return 0; }

and

    int notmain(void) { return 0; }

with "gcc -S", the assembly listings are identical other than
the function name. It's the linker that recognizes "main" as the
entry point.

The compiler does recognize "main" so it can generate code
equivalent to "return 0;" if control reaches the closing "}";
I deliberately avoided that in my examples. But a conforming
compiler could just do that for all int functions.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Bart

Jul 10, 2019, 4:59:55 PM

On 10/07/2019 19:24, G G wrote:
> is it possible that c/c++ programs can start with something other
> than a function named main()?
>
> or is it somehow set in the compiler the name of the function
> where to start?

It depends on compiler+linker.

On my C implementation, I can use the entry point "start" as well as
"main". And with some extra effort, any function can be used as an entry
point.

But if I wanted the starting function to take the argc/argv parameters
commonly used with main(), then the name has to be "main", as only for
that name will the compiler make arrangements for those to be set up.

(This is for Windows; the arrangements for Linux may be different.)

Other implementations may have additional initialisations that have to
be done on start-up, and those can be associated with "main" in the same
way.

Juha Nieminen

Jul 10, 2019, 5:54:08 PM

David Brown <david...@hesbynett.no> wrote:
> On 10/07/2019 20:24, G G wrote:
>> is it possible that c/c++ programs can start with something other
>> than a function named main()?
>>
>
> Yes.
>
>> or is it somehow set in the compiler the name of the function
>> where to start?
>
> On gcc, you can use the "--entry" option.

You can also use -nostartfiles, which gives you a much more
barebones executable. Rather than main(), a function with the
signature void _start() (no parameters) will be called at
startup. (The proper way to end the program in this situation
is to call exit(). I'm not sure what happens if the _start()
function just ends. Maybe it crashes or something.)

You'll have a much harder time getting any command-line
parameters, but in many programs they aren't actually needed.

Melzzzzz

Jul 10, 2019, 5:58:30 PM

_start is the default entry symbol for ld; you can change that too.
On Linux the command-line parameters are on the stack, so you have to
fetch them with inline assembly or enter the territory of UB.
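
To make the stack layout concrete, here is a minimal sketch that walks a block of memory arranged the way the System V ABI describes the initial process stack on Linux: the argument count, then the argv pointers, a NULL, the environment pointers, and another NULL. The struct and the function read_initial_stack are hypothetical. The sketch works on a simulated block passed in as a parameter, because obtaining the real initial stack pointer is exactly the part that needs assembly.

```c
#include <stddef.h>
#include <stdint.h>

struct startup_args {
    int    argc;
    char **argv;   /* argv[argc] is NULL */
    char **envp;   /* NULL-terminated */
};

/* sp points at the first word of the block. That word holds the
 * argument count, stored here as a pointer-sized value. It is followed
 * by the argv pointers, a NULL, the environment pointers and a NULL. */
struct startup_args read_initial_stack(char **sp)
{
    struct startup_args a;
    a.argc = (int)(intptr_t)sp[0];
    a.argv = sp + 1;
    a.envp = sp + 1 + a.argc + 1;   /* skip argv[] and its NULL */
    return a;
}
```

A real _start would obtain sp from the stack pointer register before any compiled prologue has moved it, which is why this step cannot be written in portable C.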

--
press any key to continue or any other to quit...
I enjoy nothing as much as my status as an INVALID -- Zli Zec
In the Wild West there was actually not that much violence, precisely
because everyone was armed. -- Mladen Gogala

David Brown

Jul 11, 2019, 3:28:40 AM

I believe some C++ compilers have special treatment for a function
called main().

For both C and C++, there are things that need to be done before main()
starts. For C, this means zeroing uninitialised static data, copying
initialised data, and setting up the arguments for main(), and
afterwards calling atexit functions. For C++, global constructors need
to be called and possibly code for initialising other global data - and
at the end, global destructors must be called.

On some systems, the linker/loader can do all that is needed for C. On
other systems, it is done by part of the C library - the actual entry
point of the code is called something like "__startup" and is part of
the library that is included in the application binary. For C++, you
always need something like that.

But on some compilers I have seen, this startup and breakdown code is
injected directly into the "main" function by the compiler.
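
The sequence described above can be sketched as ordinary C that runs on a host machine. This is only a simulation under stated assumptions: struct image, run_startup and the demo_ names are hypothetical, and a real bare-metal startup would take the section addresses from linker-defined symbols whose names depend on the toolchain.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a program image before startup. */
struct image {
    const unsigned char *data_load;  /* initial values, e.g. in flash */
    unsigned char *data_run;         /* where initialised data lives  */
    size_t data_size;
    unsigned char *bss;              /* uninitialised static data     */
    size_t bss_size;
    void (**ctors)(void);            /* global constructors (C++)     */
    size_t ctor_count;
};

/* Performs the pre-main work, then calls the entry function. A real
 * startup would go on to run atexit handlers and global destructors
 * once entry() returns. */
int run_startup(const struct image *img, int (*entry)(void))
{
    memcpy(img->data_run, img->data_load, img->data_size); /* copy data */
    memset(img->bss, 0, img->bss_size);                    /* zero bss  */
    for (size_t i = 0; i < img->ctor_count; i++)
        img->ctors[i]();                                   /* ctors     */
    return entry();                                        /* "main"    */
}

/* Demonstration data: the "bss" starts out deliberately dirty. */
static const unsigned char demo_rom[4] = { 1, 2, 3, 4 };
static unsigned char demo_data[4];
static unsigned char demo_bss[4] = { 0xAA, 0xAA, 0xAA, 0xAA };
static int demo_ctor_ran;

static void demo_ctor(void) { demo_ctor_ran = 1; }

/* Returns 0 only if everything was set up before it was called. */
static int demo_entry(void)
{
    return (demo_data[3] == 4 && demo_bss[0] == 0 && demo_ctor_ran) ? 0 : 1;
}

int startup_demo(void)
{
    void (*ctors[])(void) = { demo_ctor };
    struct image img = {
        .data_load = demo_rom, .data_run = demo_data,
        .data_size = sizeof demo_data,
        .bss = demo_bss, .bss_size = sizeof demo_bss,
        .ctors = ctors, .ctor_count = 1,
    };
    return run_startup(&img, demo_entry);
}
```

The order matters: constructors may read static data, so the copy and the zeroing have to happen before any of them run.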

David Brown

Jul 11, 2019, 3:31:25 AM

You usually use this sort of thing in embedded programming for very
small systems. You don't have a command line, much less command-line
parameters, and your _start() function never ends because you invariably
have an infinite loop in your main code.

I have only once needed -nostartfiles, for programming a
microcontroller with no ram at all. It had 32 8-bit cpu registers, and
a three-level hardware call stack. Having _start() call main() would
have wasted at least a third of the resources of the cpu!

Martijn van Buul

Jul 11, 2019, 10:59:20 AM

* David Brown:

> I have only once needed -nostartfiles, for programming a
> microcontroller with no ram at all. It had 32 8-bit cpu registers, and
> a three level hardware call stack. Having _start() call main() would
> have wasted at least a third of the resources of the cpu!

My crystal ball predicts that this was an ATTiny 1x device.

--
Martijn van Buul - pi...@dohd.org

Keith Thompson

Jul 11, 2019, 5:15:02 PM

Is there something specific you're trying to accomplish by having
your program start with a function other than main(), or are you
just asking out of curiosity? (Nothing wrong with the latter, but
it could affect what kind of answer is going to be useful to you.)

Note that there's really no such thing as "c/c++". C and C++ are
two different languages. They're closely related, and they have
similar (but not identical) rules in this area, but it's not clear
that the answer is going to be the same for both.

G G

Jul 11, 2019, 8:20:44 PM

>
> Is there something specific you're trying to accomplish by having
> your program start with a function other than main(), or are you
> just asking out of curiosity? (Nothing wrong with the latter, but
> it could affect what kind of answer is going to be useful to you.)

asking out of curiosity, yes. also, when looking at some very
large programs (operating systems' source code and a few applications)
i was not finding a main(). some were C programs, some were C++ programs:
PostgreSQL, FreeBSD, Haiku OS, NewOS, ...

>
> Note that there's really no such thing as "c/c++". C and C++ are
> two different languages.

yes, they are. here "c/c++" just means i am asking about either the C
or the C++ programming language.

> They're closely related, and they have
> similar (but not identical) rules in this area, but it's not clear
> that the answer is going to be the same for both.

Ah, yes. James pointed that out.
>
I am currently studying C++, and Operating Systems Design and Implementation
and M68000 assembly language.

the m68000 i know is old, but the book i found is old, was cheap, and looks and
reads great.

And thanks Keith, i hope many of you all take part in An OS group, cause i do
have a few questions. :-) ok way more than a few.

Martijn van Buul

Jul 12, 2019, 3:21:10 AM

* G G:

> I am currently studying C++, and Operating Systems Design and Implementation
> and M68000 assembly language.
>
> the m68000 i know is old, but the book i found is old, was cheap, and looks
> and reads great.

Just out of curiosity: Which revision of that book are you reading?

David Brown

Jul 12, 2019, 3:38:03 AM

/Please/ keep the standard Usenet attributions!

On 12/07/2019 02:20, G G wrote:
>
>>
> I am currently studying C++, and Operating Systems Design and Implementation
> and M68000 assembly language.
>

The m68k ISA is a really nice design. The original 68000 was very
forward-looking - it was designed with a clear expansion path in the
future, contrasting with the x86 architecture which was an outdated
style almost when it started, and has been a continuous process of
patches, hacks and add-ons. If you want to learn some assembly, then it
is definitely one to recommend - even though it is not much in active
use. (It is used in ColdFire microcontrollers, which are still
available but are no longer being developed for the future.)

G G

Jul 12, 2019, 3:57:40 AM

On Friday, July 12, 2019 at 3:21:10 AM UTC-4, Martijn van Buul wrote:
> * G G:

> Just out of curiosity: Which revision of that book are you reading?
>
> --
> Martijn van Buul - pi...@dohd.org

sure, Assembly Language and Systems Programming for the M68000 Family

copyright 1992

looks like the second edition of the book, cause it mentions the previous
edition.

Scott Lurndal

Jul 12, 2019, 10:17:51 AM

David Brown <david...@hesbynett.no> writes:
>/Please/ keep the standard Usenet attributions!
>
>On 12/07/2019 02:20, G G wrote:
>>
>>>
>> I am currently studying C++, and Operating Systems Design and Implementation
>> and M68000 assembly language.
>>
>
>The m68k ISA is a really nice design. The original 68000 was very
>forward-looking

Personally, I find the PDP-11 to be a nice design. 68000 had
differentiated registers (A0-A6, D0-D7) while on the PDP-11 any
register could contain data or an address.

For that matter, modern x86_64 is actually not so bad (the biggest
warts with the original 8086/80286 were related to segmented memory).

David Brown

Jul 12, 2019, 11:21:28 AM

On 12/07/2019 16:17, Scott Lurndal wrote:
> David Brown <david...@hesbynett.no> writes:
>> /Please/ keep the standard Usenet attributions!
>>
>> On 12/07/2019 02:20, G G wrote:
>>>
>>>>
>>> I am currently studying C++, and Operating Systems Design and Implementation
>>> and M68000 assembly language.
>>>
>>
>> The m68k ISA is a really nice design. The original 68000 was very
>> forward-looking
>
> Personally, I find the PDP-11 to be a nice design. 68000 had
> differentiated registers (A0-A6, D0-D7) while on the PDP-11 any
> register could contain data or an address.

I haven't used the PDP-11, but I have used the msp430 microcontrollers
which have a very similar ISA. And they too are very pleasant cpus to
work with in assembly language or C.

>
> For that matter, modern x86_64 is actually not so bad (the biggest
> warts with the original 8086/80286 were related to segmented memory).
>

Certainly it seems to be a lot better than older x86 - more registers
and fewer register specific instructions helps.

Bart

Jul 12, 2019, 12:36:38 PM

On 12/07/2019 15:17, Scott Lurndal wrote:
> David Brown <david...@hesbynett.no> writes:
>> /Please/ keep the standard Usenet attributions!
>>
>> On 12/07/2019 02:20, G G wrote:
>>>
>>>>
>>> I am currently studying C++, and Operating Systems Design and Implementation
>>> and M68000 assembly language.
>>>
>>
>> The m68k ISA is a really nice design. The original 68000 was very
>> forward-looking
>
> Personally, I find the PDP-11 to be a nice design. 68000 had
> differentiated registers (A0-A6, D0-D7) while on the PDP-11 any
> register could contain data or an address.

You had to look at it in more detail to find out it wasn't as orthogonal
as it appeared at first, and actually wasn't that much better in that
regard than x86.

Those two kinds of registers are a prime example, if your job is to
write a compiler for it (does a function returning int* return it in A0
or D0?). Much better from that era, but eclipsed by x86 and 68k, were
the Z8000/Z80000 and the NatSemi 32032 family.

> For that matter, modern x86_64 is actually not so bad (the biggest
> warts with the original 8086/80286 were related to segmented memory).

You need to have written both a disassembler for x64 machine code, and
an assembler that generates that code, to see what a ghastly mess the
instruction encoding is.

The underlying instruction set still has largely the same design as for
the 8086, with 3 bits to specify /16/ registers, and 1 bit to specify
/4/ operand sizes, requiring various instruction prefix overrides, or
prefix bytes containing the missing bits. On top of prefix bytes to
extend the number of opcodes.
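
To illustrate the "missing bits" point, here is a minimal sketch of how a decoder might combine the 3-bit register fields of a ModRM byte with the extension bits carried in a REX prefix. The struct and the function decode_modrm are hypothetical, and the sketch ignores the SIB byte and every other prefix.

```c
#include <stdint.h>

struct modrm_fields {
    unsigned mod;   /* addressing mode, 2 bits                        */
    unsigned reg;   /* register operand, 0..15 after applying REX.R   */
    unsigned rm;    /* register/memory operand, 0..15 after REX.B     */
    unsigned wide;  /* 1 if REX.W selects a 64-bit operand size       */
};

/* rex is the REX prefix byte (0x40..0x4F), or 0 if there is none.
 * The ModRM byte only has room for 3 bits per register, so the fourth
 * bit of each register number comes from the prefix. */
struct modrm_fields decode_modrm(uint8_t rex, uint8_t modrm)
{
    struct modrm_fields f;
    unsigned has_rex = (rex & 0xF0) == 0x40;
    unsigned r = has_rex ? ((rex >> 2) & 1u) : 0u;   /* REX.R */
    unsigned b = has_rex ? (rex & 1u) : 0u;          /* REX.B */

    f.mod  = modrm >> 6;
    f.reg  = ((modrm >> 3) & 7u) | (r << 3);
    f.rm   = (modrm & 7u) | (b << 3);
    f.wide = has_rex ? ((rex >> 3) & 1u) : 0u;       /* REX.W */
    return f;
}
```

For the byte sequence 4C 89 C8 (mov rax, r9 in Intel syntax), decode_modrm(0x4C, 0xC8) yields reg 9 and rm 0. The same ModRM byte without the prefix yields reg 1, which is ecx.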

red floyd

Jul 12, 2019, 1:44:59 PM

Z8000, anyone? Similar to 68K, but the registers were all general,
no differentiation. Z8000 was 16 bit registers, but could be subdivided
into upper/lower half for bytes, and combined with an adjacent register
for 32-bit registers (e.g. RR0 was R0 and R1 combined).

Z80000 was the 32-bit version, doubled the number of registers, and
allowed 64-bit "quad" registers.

Scott Lurndal

Jul 12, 2019, 1:53:56 PM

Bart <b...@freeuk.com> writes:
>On 12/07/2019 15:17, Scott Lurndal wrote:

>> For that matter, modern x86_64 is actually not so bad (the biggest
>> warts with the original 8086/80286 were related to segmented memory).
>
>You need to have written both a disassembler for x64 machine code, and
>an assembler that generates that code, to see what a ghastly mess the
>instruction encoding is.

So two programmers in the entire world care about the instruction
encoding; the instruction encoding is not relevant to anyone else.

(actually, make that three, for those of us who write processor
simulators for a living, we need to decode (and execute) the instruction
set).

Take a look at Aarch32/Thumb or Aarch64 if you want interesting instruction
set encodings.

Take a look at Burroughs Medium Systems mainframes for a very simple encoding.

http://vseries.lurndal.org/doku.php?id=architecture#instruction_representation
http://vseries.lurndal.org/doku.php?id=instructions

>
>The underlying instruction set still has largely the same design as for
>the 8086, with 3 bits to specify /16/ registers, and 1 bit to specify
>/4/ operand sizes, requiring various instruction prefix overrides, or
>prefix bytes containing the missing bits. On top of prefix bytes to
>extend the number of opcodes.

That's to be expected from such a venerable architecture. And
it isn't relevant to more than a handful of programmers world-wide
who happen to write new assemblers for an architecture with a dozen
or more existing production-quality assemblers, most in the public domain.

Scott Lurndal

Jul 12, 2019, 2:35:55 PM

sc...@slp53.sl.home (Scott Lurndal) writes:
>Bart <b...@freeuk.com> writes:
>>On 12/07/2019 15:17, Scott Lurndal wrote:
>
>
>>> For that matter, modern x86_64 is actually not so bad (the biggest
>>> warts with the original 8086/80286 were related to segmented memory).
>>
>>You need to have written both a disassembler for x64 machine code, and
>>an assembler that generates that code, to see what a ghastly mess the
>>instruction encoding is.
>
>So two programmers in the entire world care about the instruction
>encoding; the instruction encoding is not relevant to anyone else.
>
>(actually, make that three, for those of us who write processor
>simulators for a living, we need to decode (and execute) the instruction
>set).

Ah, make that four - our hypervisor did have to simulate a handful
of instructions when trapping guest accesses to virtual PCI I/O devices.

e.g.

    // Decode the instruction.
    c_instr_iterator iter((void *)instr, rip);

    len = get_operand_length(&iter);

    // Interpret the instruction.
    switch (iter.get_opcode()) {

    case 0xb60f: // movzb mem, reg
        value = trace->mem_read(addr, BYTE);
        value >>= ((addr & 0x3) * BITS_PER_BYTE);
        value &= (1UL << 8 * BYTE) - 1;
        set_reg_value(regs, iter.get_modRM(REG), value, len);
        break;

    case 0xb70f: // movzw mem, reg
        value = trace->mem_read(addr, WORD);
        value >>= ((addr & 0x3) * BITS_PER_BYTE);
        value &= (1UL << 8 * WORD) - 1;
        set_reg_value(regs, iter.get_modRM(REG), value, len);
        break;

...etc for most other instructions that access memory.

But it is still very rare for anyone else to care about the
instruction encoding.
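
The shift-and-mask step in that fragment can be isolated into a self-contained sketch. It assumes the memory read returns the aligned 32-bit little-endian word containing the address, which the shift by (addr & 0x3) suggests but the fragment does not state. The function name extract_zero_extended is hypothetical, and accesses that cross a 4-byte boundary are not handled.

```c
#include <stdint.h>

enum { BITS_PER_BYTE = 8 };

/* dword: the aligned 32-bit word that contains addr (assumed).
 * size:  width of the access in bytes (1, 2 or 4).
 * Returns the addressed bytes, zero-extended, as movzb/movzw would
 * deliver them to the destination register. */
uint32_t extract_zero_extended(uint32_t dword, uint64_t addr, unsigned size)
{
    /* Move the addressed byte down to bit 0. */
    uint32_t value = dword >> ((addr & 0x3) * BITS_PER_BYTE);

    /* Keep only 'size' bytes; a full-width access needs no mask. */
    if (size < 4)
        value &= (UINT32_C(1) << (size * BITS_PER_BYTE)) - 1;
    return value;
}
```

With the word 0x11223344, a one-byte access at an address ending in 1 yields 0x33, and a two-byte access at an address ending in 2 yields 0x1122.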

Bart

Jul 12, 2019, 3:14:16 PM

On 12/07/2019 18:53, Scott Lurndal wrote:
> Bart <b...@freeuk.com> writes:
>> On 12/07/2019 15:17, Scott Lurndal wrote:
>
>
>>> For that matter, modern x86_64 is actually not so bad (the biggest
>>> warts with the original 8086/80286 were related to segmented memory).
>>
>> You need to have written both a disassembler for x64 machine code, and
>> an assembler that generates that code, to see what a ghastly mess the
>> instruction encoding is.
>
> So two programmers in the entire world care about the instruction
> encoding; the instruction encoding is not relevant to anyone else.
>
> (actually, make that three, for those of us who write processor
> simulators for a living, we need to decode (and execute) the instruction
> set).
>
> Take a look at Aarch32/Thumb or Aarch64 if you want interesting instruction
> set encodings.
>
> Take a look at Burroughs Medium Systems mainframes for a very simple encoding.
>
> http://vseries.lurndal.org/doku.php?id=architecture#instruction_representation
> http://vseries.lurndal.org/doku.php?id=instructions

That's not that simple. Simpler IMO was the PDP-10, with fixed-length
rather than variable-length instructions. Instruction codes, register
specifiers, and immediate data/address operands are contained within
one 36-bit word.
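
As a minimal sketch of how little work that fixed format needs, the decoder below splits a 36-bit PDP-10 word into its fields: a 9-bit opcode, a 4-bit accumulator, an indirect bit, a 4-bit index register and an 18-bit address. The struct and the function pdp10_decode are hypothetical, and the word is held in the low 36 bits of a uint64_t.

```c
#include <stdint.h>

/* Fields of a PDP-10 instruction word. Bit 0 is the most significant
 * bit of the 36-bit word, following the PDP-10 numbering. */
struct pdp10_instr {
    unsigned opcode;    /* bits 0-8                      */
    unsigned ac;        /* bits 9-12, accumulator        */
    unsigned indirect;  /* bit 13                        */
    unsigned index;     /* bits 14-17, index register    */
    uint32_t address;   /* bits 18-35, address/immediate */
};

/* Every instruction has the same layout, so decoding is a handful of
 * shifts and masks with no prefixes or length calculation. */
struct pdp10_instr pdp10_decode(uint64_t word)
{
    struct pdp10_instr in;
    in.opcode   = (unsigned)((word >> 27) & 0777);
    in.ac       = (unsigned)((word >> 23) & 017);
    in.indirect = (unsigned)((word >> 22) & 01);
    in.index    = (unsigned)((word >> 18) & 017);
    in.address  = (uint32_t)(word & 0777777);
    return in;
}
```

For the octal word 200040000100, this gives opcode 200 (MOVE), accumulator 1 and address 100, all in octal.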

>>
>> The underlying instruction set still has largely the same design as for
>> the 8086, with 3 bits to specify /16/ registers, and 1 bit to specify
>> /4/ operand sizes, requiring various instruction prefix overrides, or
>> prefix bytes containing the missing bits. On top of prefix bytes to
>> extend the number of opcodes.
>
> That's to be expected from such a venerable architecture. And
> it isn't relevant to more than a handful of programmers world-wide
> who happen to write new assemblers for an architecture with a dozen
> or more existing production quality assemblers, most in the public domain.
>

I think the number is more than that. Look at the numerous resources for
online assembly and disassembly.

And you have to wonder how the sprawling instructions impact efficiency:
how much on-chip area is used up decoding them, how many of the limited
number of bytes in an instruction pipeline are holding rex, data,
address, escape and other prefix bytes that would otherwise be more
compactly contained within the instruction.

Scott Lurndal

Jul 12, 2019, 4:00:12 PM

I can't speak for x86, but the AArch instruction encoding is baroque
specifically to make the hardware design easier.

Given the microcoded nature of the x86 processor and given that the
majority of frequently used instructions have a single-cycle
latency, I don't see that the complexity of the instruction set has
impacted performance. It does, to a certain extent, impact the area
required, but the vast majority of area on a processor chip is devoted
to L1 and L2 caches.

David Brown

Jul 13, 2019, 6:46:41 AM

On 12/07/2019 19:53, Scott Lurndal wrote:
> Bart <b...@freeuk.com> writes:
>> On 12/07/2019 15:17, Scott Lurndal wrote:
>
>
>>> For that matter, modern x86_64 is actually not so bad (the biggest
>>> warts with the original 8086/80286 were related to segmented memory).
>>
>> You need to have written both a disassembler for x64 machine code, and
>> an assembler that generates that code, to see what a ghastly mess the
>> instruction encoding is.
>
> So two programmers in the entire world care about the instruction
> encoding; the instruction encoding is not relevant to anyone else.
>
> (actually, make that three, for those of us who write processor
> simulators for a living, we need to decode (and execute) the instruction
> set).
>

It has an indirect effect on others - a poor choice of instruction
encoding means code is bigger than it need be, meaning instruction
caches are less effective, memory bandwidths must be higher, and so on.

Other than that, you are right - very few people need to look below the
assembly level.

(As a teenager, instruction coding was more important to me - for
learning Z80A assembly on my ZX Spectrum, I had to hand-assemble the code
to machine code. And I wrote a 6502 disassembler once.)

> Take a look at Aarch32/Thumb or Aarch64 if you want interesting instruction
> set encodings.
>
> Take a look at Burroughs Medium Systems mainframes for a very simple encoding.
>
> http://vseries.lurndal.org/doku.php?id=architecture#instruction_representation
> http://vseries.lurndal.org/doku.php?id=instructions
>
>>
>> The underlying instruction set still has largely the same design as for
>> the 8086, with 3 bits to specify /16/ registers, and 1 bit to specify
>> /4/ operand sizes, requiring various instruction prefix overrides, or
>> prefix bytes containing the missing bits. On top of prefix bytes to
>> extend the number of opcodes.
>
> That's to be expected from such a venerable architecture. And
> it isn't relevant to more than a handful of programmers world-wide
> who happen to write new assemblers for an architecture with a dozen
> or more existing production quality assemblers, most in the public domain.
>

I think that with the x86-64 AMD had the chance to re-do the instruction
encoding entirely while keeping a similar basic ISA. Having a similar
ISA would avoid duplication of too much of the internals (they did not
want a repeat of the Itanium!) and be familiar to programmers - keeping
the same strong memory model would be vital. But would anyone (other
than the 3 people mentioned above) have cared if the encoding for "mov"
was completely different in x86-32 and x86-64 ?