Sections and Objective-C

Ivan Vučica

May 9, 2013, 2:28:01 PM
to emscripte...@googlegroups.com
I have submitted a pull-request containing some basic testing code for Objective-C. A program is correctly generated, but without a runtime, it naturally does not work. (Related runtime functions cannot be found when the program is started.)

To get a runtime to work, a list of classes and other runtime information is needed. In Objective-C (with the Apple runtime's ABI), classes are described as structures tagged with __attribute__((used, section("__OBJC, __class"))). Other section names are used as well.

Any tips on how these can get exposed?

I think the actual problem might be that __attribute__((used)) is ignored. Can I somehow get Emscripten to obey this flag? And, when I do, how can I get the variables out? 

Aside from __attribute__((used)), it's perhaps worth noting that these variables are static -- i.e. per object file. Yet when linking, not only should they be retained, but there will probably be more than one of them. All contain information vital at runtime and need to be exposed at runtime.
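
One way the "retained and exposed" requirement could be met without any section support is to have each object file register its own static record from a constructor function. This is only a sketch under that assumption: every `demo_` name is hypothetical, the record layout is not Apple's, and it presumes the toolchain runs `__attribute__((constructor))` functions before main().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified class record; not Apple's real layout. */
struct demo_class {
    const char *name;
    struct demo_class *next;   /* link in the global registry */
};

/* Head of the registry that a runtime would consult at startup. */
static struct demo_class *demo_registry = NULL;

/* Hypothetical helper: push a record onto the registry. */
static void demo_register(struct demo_class *cls) {
    assert(cls != NULL && cls->name != NULL);
    cls->next = demo_registry;
    demo_registry = cls;
}

/* Each "object file" carries a static record plus a constructor that
 * registers it. The constructor references the record, so the record
 * is not dropped as unused. */
static struct demo_class demo_class_X = { "X", NULL };
static struct demo_class demo_class_Y = { "Y", NULL };

__attribute__((constructor))
static void demo_load_X(void) { demo_register(&demo_class_X); }

__attribute__((constructor))
static void demo_load_Y(void) { demo_register(&demo_class_Y); }

/* Look a class up by name, as a runtime would. */
static struct demo_class *demo_lookup(const char *name) {
    for (struct demo_class *c = demo_registry; c != NULL; c = c->next)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}
```

With this shape there can be any number of records, one or more per object file, and none of them depends on section names surviving the trip through the compiler.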


For additional information on what's going on, see README.md in tests/objc/ in my fork (and the pull request #1161).


(Side note: while long term it would be ideal if Emscripten could use .ll directly generated from Objective-C without the trip into the world of C, the little work I've done does not focus on that.)

Here is a snippet of the relevant part of C code that was generated from Objective-C by Clang. 


struct _objc_method {
        SEL _cmd;
        char *method_types;
        void *_imp;
};

static struct {
        struct _objc_method_list *next_method;
        int method_count;
        struct _objc_method method_list[1];
} _OBJC_CLASS_METHODS_X __attribute__ ((used, section ("__OBJC, __cls_meth")))= {
        0, 1
        ,{{(SEL)"print", "v8@0:4", (void *)_C_X_print}
         }
};

struct _objc_class {
        struct _objc_class *isa;
        const char *super_class_name;
        char *name;
        long version;
        long info;
        long instance_size;
        struct _objc_ivar_list *ivars;
        struct _objc_method_list *methods;
        struct objc_cache *cache;
        struct _objc_protocol_list *protocols;
        const char *ivar_layout;
        struct _objc_class_ext  *ext;
};

static struct _objc_class _OBJC_METACLASS_X __attribute__ ((used, section ("__OBJC, __meta_class")))= {
        (struct _objc_class *)"X", 0, "X", 0,2, sizeof(struct _objc_class), 0
        , (struct _objc_method_list *)&_OBJC_CLASS_METHODS_X
        ,0,0,0,0
};

static struct _objc_class _OBJC_CLASS_X __attribute__ ((used, section ("__OBJC, __class")))= {
        &_OBJC_METACLASS_X, 0, "X", 0,1,0,0,0,0,0,0,0
};

struct _objc_symtab {
        long sel_ref_cnt;
        SEL *refs;
        short cls_def_cnt;
        short cat_def_cnt;
        void *defs[1];
};

static struct _objc_symtab _OBJC_SYMBOLS __attribute__((used, section ("__OBJC, __symbols")))= {
        0, 0, 1, 0
        ,&_OBJC_CLASS_X
};


struct _objc_module {
        long version;
        long size;
        const char *name;
        struct _objc_symtab *symtab;
};

static struct _objc_module _OBJC_MODULES __attribute__ ((used, section ("__OBJC, __module_info")))= {
        7, sizeof(struct _objc_module), "", &_OBJC_SYMBOLS
};
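
For illustration, this is roughly how a runtime could get from a module record to a class by name using structures like the ones above. It is a sketch with trimmed-down stand-in types (all `demo_` names are hypothetical, and the layouts are deliberately not ABI-compatible with the generated code).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed-down stand-ins for the structures above; only the fields
 * needed for the walk are kept. */
struct demo_class  { const char *name; };
struct demo_symtab { short cls_def_cnt; void *defs[2]; };
struct demo_module { const char *name; struct demo_symtab *symtab; };

static struct demo_class  demo_X = { "X" };
static struct demo_class  demo_Y = { "Y" };
static struct demo_symtab demo_symbols = { 2, { &demo_X, &demo_Y } };
static struct demo_module demo_mod = { "", &demo_symbols };

/* Walk module -> symtab -> defs[] and return the class called `name`,
 * or NULL if the module does not define it. */
static struct demo_class *demo_find_class(const struct demo_module *mod,
                                          const char *name) {
    assert(mod != NULL && name != NULL);
    const struct demo_symtab *tab = mod->symtab;
    if (tab == NULL)
        return NULL;
    for (short i = 0; i < tab->cls_def_cnt; i++) {
        struct demo_class *cls = tab->defs[i];
        if (strcmp(cls->name, name) == 0)
            return cls;
    }
    return NULL;
}
```

The walk only works if the runtime can find the module record in the first place, which is exactly the part that goes missing when the static variables are stripped.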


Alon Zakai

May 9, 2013, 3:42:20 PM
to emscripte...@googlegroups.com
On Thu, May 9, 2013 at 11:28 AM, Ivan Vučica <iv...@vucica.net> wrote:
I have submitted a pull-request containing some basic testing code for Objective-C. A program is correctly generated, but without a runtime, it naturally does not work. (Related runtime functions cannot be found when the program is started.)

To get a runtime to work, a list of classes and other runtime information is needed. In Objective-C (with the Apple runtime's ABI), classes are described as structures tagged with __attribute__((used, section("__OBJC, __class"))). Other section names are used as well.

Any tips on how these can get exposed?

Hmm, I have no idea what a section is. What does that cause to happen in the .ll?
 

I think the actual problem might be that __attribute__((used)) is ignored. Can I somehow get Emscripten to obey this flag? And, when I do, how can I get the variables out? 


That should work, for example

void __attribute__((used)) waka() { printf("waka"); }

is kept alive when I test it now.

- Alon

Ivan Vučica

May 9, 2013, 4:31:12 PM
to emscripte...@googlegroups.com
On 9. 5. 2013., at 21:42, Alon Zakai <alonm...@gmail.com> wrote:

Hmm, I have no idea what a section is.

These are "sections" in executable formats such as ELF. Variables commonly go into one section, executable code into another. One section can then be loaded into pages that are executable but read-only, while the other can be loaded into non-executable pages that are read-write. I think that was the guiding principle.


section ("section-name")
Normally, the compiler places the objects it generates in sections like data and bss. Sometimes, however, you need additional sections, or you need certain particular variables to appear in special sections, for example to map to special hardware. The section attribute specifies that a variable (or function) lives in a particular section. For example, this small program uses several specific section names:
 <snip>

Use the section attribute with global variables and not local variables, as shown in the example.

You may use the section attribute with initialized or uninitialized global variables but the linker requires each object be defined once, with the exception that uninitialized variables tentatively go in the common (or bss) section and can be multiply “defined”. Using the section attribute changes what section the variable goes into and may cause the linker to issue an error if an uninitialized variable has multiple definitions. You can force a variable to be initialized with the -fno-common flag or the nocommon attribute.

Some file formats do not support arbitrary sections so the section attribute is not available on all platforms. If you need to map the entire contents of a module to a particular section, consider using the facilities of the linker instead. 
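
As a small illustration of the attribute described in that excerpt, the sketch below places one variable in a custom section. The section names and the `demo_` identifiers are made up for the example; Mach-O expects a "segment,section" pair while ELF takes a single name, hence the #if.

```c
#include <assert.h>

/* Mach-O wants "segment,section"; ELF takes a single name.
 * Both names below are invented for this example. */
#if defined(__APPLE__)
#define DEMO_SECTION "__DATA,__demo"
#else
#define DEMO_SECTION "demo_section"
#endif

/* `used` asks the compiler to emit the variable even though it looks
 * unreferenced; `section` chooses where its bytes are placed. */
static int demo_marker __attribute__((used, section(DEMO_SECTION))) = 42;

/* Ordinary C code reads the variable the same way regardless of section. */
static int demo_read_marker(void) {
    return demo_marker;
}

/* Self-check: the initializer is intact after placement in the section. */
static void demo_check_marker(void) {
    assert(demo_read_marker() == 42);
}
```

Note that the attribute only affects placement in the object file; nothing in the C language itself lets a program list "everything in section S", which is why a runtime needs help from the linker or loader.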


What does that cause to happen in the .ll?

From what I can tell it adds a mark to the variable declaration. I'm not sufficiently familiar with LLVM to figure out what's going on.

@_OBJC_CLASS_METHODS_X = internal global %struct.anon { %struct._objc_method_list* null, i32 1, [1 x %struct._objc_method] [%struct._objc_method { %struct.objc_selector* bitcast ([6 x i8]* @.str1 to %struct.objc_selector*), i8* getelementptr inbounds ([7 x i8]* @.str2, i32 0, i32 0), i8* bitcast (void (%struct.objc_class*, %struct.objc_selector*)* @_C_X_print to i8*) }] }, section "__OBJC, __cls_meth", align 8

Could we have something akin to a FUNCTION_TABLE holding a dictionary of all symbols marked with a section that starts with __OBJC? Key would be either the section name (above "__OBJC, __cls_meth") or the variable name (above "_OBJC_CLASS_METHODS_X"), while the value would be an array of all related objects?
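
To make the idea concrete, here is a sketch of such a dictionary written in C rather than in Emscripten's generated JavaScript. Everything in it is hypothetical (the `demo_` names, the fixed capacities, the choice of the section name as key); it only shows the shape of the data, not how Emscripten would populate it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SECTIONS 8
#define DEMO_MAX_ENTRIES  16

/* One dictionary slot: a section name and the objects tagged with it. */
struct demo_section_entry {
    const char *section;
    const void *objects[DEMO_MAX_ENTRIES];
    size_t count;
};

static struct demo_section_entry demo_table[DEMO_MAX_SECTIONS];
static size_t demo_table_len = 0;

/* Find the slot for a section name, or NULL if none exists yet. */
static struct demo_section_entry *demo_find(const char *section) {
    for (size_t i = 0; i < demo_table_len; i++)
        if (strcmp(demo_table[i].section, section) == 0)
            return &demo_table[i];
    return NULL;
}

/* Record `object` under `section`. Returns 0 on success, -1 if the
 * section does not start with "__OBJC" or the table is full. */
static int demo_add(const char *section, const void *object) {
    assert(section != NULL);
    if (strncmp(section, "__OBJC", 6) != 0)
        return -1;
    struct demo_section_entry *e = demo_find(section);
    if (e == NULL) {
        if (demo_table_len == DEMO_MAX_SECTIONS)
            return -1;
        e = &demo_table[demo_table_len++];
        e->section = section;
        e->count = 0;
    }
    if (e->count == DEMO_MAX_ENTRIES)
        return -1;
    e->objects[e->count++] = object;
    return 0;
}
```

A runtime could then ask for the "__OBJC, __class" slot and iterate over its objects array to discover every class, however many object files contributed one.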

 

I think the actual problem might be that __attribute__((used)) is ignored. Can I somehow get Emscripten to obey this flag? And, when I do, how can I get the variables out? 


That should work, for example

void __attribute__((used)) waka() { printf("waka"); }

is kept alive when I test it now.

I'm attaching the intermediate C code, intermediate LLVM code, output HTML and contents of Apple's /usr/include/objc. (APSL 2.0 licensed, so it should be safe to share publicly.)  

hello.m.c, hello.ll and hello.html are all produced with code and scripts in my fork.

objc.zip
hello.m.c
hello.ll
hello.html

Alon Zakai

May 13, 2013, 7:54:11 PM
to emscripte...@googlegroups.com
Compiling the C file fails; is it missing some #includes perhaps?

I wonder if the section stuff is not confusing us. Does it work if we remove the section stuff from the C code? (In my example the attribute had nothing but used, which is why I am curious.)

We could probably parse out the section stuff, I guess. But I wonder if we actually need it. What would we do with the information?

- Alon




Ivan Vučica

May 14, 2013, 8:50:04 AM
to emscripte...@googlegroups.com
On Tue, May 14, 2013 at 1:54 AM, Alon Zakai <alonm...@gmail.com> wrote:
Compiling the C file fails; is it missing some #includes perhaps?

Can you share the error? I suspect that, since this is the raw output from Clang, it's missing #include <objc/objc.h>, the umbrella header for the Objective-C runtime. 

Please see "emobjcc-miniwrapper"; it's including "objc-prefix.h" from the command line (and that file includes <objc/objc.h>).

In case you don't have them, I have also included the objc runtime headers in the previous email.
 

I wonder if the section stuff is not confusing us. Does it work if we remove the section stuff from the c code? (Because in my example the attribute did not have anything but used, that's why I am curious).

If I remove the section stuff from the .ll code, still nothing appears. See attachment.

I've also tried removing just the section attribute on one of the static variables, and nothing appears.

Could it be that linker stage of emscripten determines these static variables are unused and strips them out?
 
We could probably parse out the Section stuff I guess. But I wonder if we actually need it. What would we do with the information?

I'm not sure, but the information itself is definitely needed by the runtime. Each .m file that contains a class definition would produce a static variable marked as "used" and "section(something)", and the runtime needs to know about each of these class definitions.

I'm not sure how to go about exposing them in a natural way.

Note that the current way of doing things is actually a major hack; ideally Emscripten should NOT go through the C stage. Clang is pretty familiar with Objective-C, and the LLVM bitcode that it spits out is quite different from what we get if we go through the C stage  :-)

See hello.direct.ll for example output of:
    clang -c -S -emit-llvm hello.m -o hello.direct.ll
Once this is sent through emcc, note that the metadata clearly present in the .ll file is once again missing in the JavaScript code.

For example, this is missing:

@"OBJC_CLASS_$_X" = global %struct._class_t { %struct._class_t* @"OBJC_METACLASS_$_X", %struct._class_t* null, %struct._objc_cache* @_objc_empty_cache, i8* (i8*, i8*)** @_objc_empty_vtable, %struct._class_ro_t* @"\01l_OBJC_CLASS_RO_$_X" }, section "__DATA, __objc_data", align 8

and that is the symbol where I think one would start looking for the class called "X".


--
Ivan Vučica - iv...@vucica.net

hello.html
hello.ll
hello.direct.ll

Alon Zakai

May 16, 2013, 5:37:21 PM
to emscripte...@googlegroups.com
On Tue, May 14, 2013 at 5:50 AM, Ivan Vučica <ivu...@gmail.com> wrote:
On Tue, May 14, 2013 at 1:54 AM, Alon Zakai <alonm...@gmail.com> wrote:


I wonder if the section stuff is not confusing us. Does it work if we remove the section stuff from the c code? (Because in my example the attribute did not have anything but used, that's why I am curious).

If I remove section stuff from .ll code, indeed, nothing still appears. See attachment.

I've also tried removing just the section attribute on one of the static variables, and nothing appears.

Could it be that linker stage of emscripten determines these static variables are unused and strips them out?

LLVM does do dead code elimination. Try compiling with -s LINKABLE=1, which disables that.
 
 
We could probably parse out the Section stuff I guess. But I wonder if we actually need it. What would we do with the information?

I'm not sure, but the information itself is definitely needed by the runtime. Each .m file that contains a class definition would produce a static variable marked as "used" and "section(something)", and the runtime needs to know about each of these class definitions.

I'm not sure how to go about exposing them in a natural way.

Note that the current way of doing things is actually a major hack; ideally Emscripten should NOT go through the C stage. Clang is pretty familiar with Objective-C and the LLVM byte code that it spits out is quite different than the one obtained if we go through the C stage  :-)

It might be, though, that going through C is easier. Objective-C might use a different subset of LLVM IR than C/C++ does, which would require more work for us to support.

- Alon

 

Michael Bishop

May 23, 2013, 4:37:35 PM
to Alon Zakai, emscripte...@googlegroups.com
I couldn't help overhearing this conversation. I've been very interested in porting the Objective-C runtime for my own reasons. I've started doing that by using the cocoatron project and compiling it with Emscripten.

So far, I've been able to compile these frameworks: Foundation, CoreFoundation, CoreServices, CFNetwork and objc, but Emscripten throws an exception when linking. If anyone wants to help out and try to get this working, you'll need three things:

Prerequisites
------------------

1 - Emscripten with this pull-request (which allows emcc to recognize .m/.mm files):


2 - This fork of Cocoatron (use the 'emscripten' branch)


3 - Ruby installed


Building
-------------
Set your current directory to the cocoatron root. Type 'rake'. This will compile the above-mentioned frameworks and the main.m file in testing/emscripten. Right now, it's not too exciting because emcc crashes but once that is fixed, we might have a program that calls into the ObjC runtime.

Finally, Alon, if you recognize any of the included stacktrace, I'd love to hear your thoughts as to where I should start looking to fix it.

Thanks!

_ michael

---
Michael Bishop
Hitpoint Studios


STACKTRACE
---------------------

tests-Mac-mini:mbtyke-emscripten mbishop$ rake
emcc -s VERBOSE=1 -o .libs/objc/test.js testing/emscripten/main.m .libs/objc/libobjc.bc -I .libs/Frameworks
clang: warning: argument unused during compilation: '-nostdinc++'
warning: unresolved symbol: pthread_cond_signal
warning: unresolved symbol: getgrgid
warning: unresolved symbol: pthread_create

undefined:540
          if (item.tokens[3].text == 'c')
                            ^
TypeError: Cannot read property 'text' of undefined
    at Object._global [as processItem] (eval at globalEval (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:105:8), <anonymous>:540:29)
    at Object.Actor.process (eval at globalEval (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:105:8), <anonymous>:248:26)
    at Object.Substrate.solve (eval at globalEval (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:105:8), <anonymous>:184:25)
    at intertyper (eval at globalEval (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:105:8), <anonymous>:1043:20)
    at finalCombiner (eval at globalEval (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:105:8), <anonymous>:1621:34)
    at JSify (eval at globalEval (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:105:8), <anonymous>:1767:3)
    at runPhase (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:265:5)
    at compile (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:281:5)
    at Object.<anonymous> (/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/src/compiler.js:291:5)
    at Module._compile (module.js:456:26)
Traceback (most recent call last):
  File "/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/emscripten.py", line 806, in <module>
    _main(environ=os.environ)
  File "/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/emscripten.py", line 794, in _main
    temp_files.run_and_clean(lambda: main(
  File "/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/tools/tempfiles.py", line 38, in run_and_clean
    return func()
  File "/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/emscripten.py", line 802, in <lambda>
    DEBUG_CACHE=DEBUG_CACHE,
  File "/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/emscripten.py", line 688, in main
    jcache=jcache, temp_files=temp_files, DEBUG=DEBUG, DEBUG_CACHE=DEBUG_CACHE)
  File "/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/emscripten.py", line 176, in emscript
    assert '//FORWARDED_DATA:' in out, 'Did not receive forwarded data in pre output - process failed?'
AssertionError: Did not receive forwarded data in pre output - process failed?
Traceback (most recent call last):
  File "/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/emcc", line 1437, in <module>
    final = shared.Building.emscripten(final, append_ext=False, extra_args=extra_args)
  File "/Users/mbishop/dev/mozilla_project/michaeljbishop/emscripten/tools/shared.py", line 1082, in emscripten
    assert os.path.exists(filename + '.o.js') and len(open(filename + '.o.js', 'r').read()) > 0, 'Emscripten failed to generate .js: ' + str(compiler_output)
AssertionError: Emscripten failed to generate .js: 



Alon Zakai

May 24, 2013, 2:30:39 PM
to emscripte...@googlegroups.com
If you want, send me the bitcode that crashes there and I'll investigate.

- Alon

Ivan Vučica

May 24, 2013, 2:34:44 PM
to emscripte...@googlegroups.com, Alon Zakai
You did far more than me, obviously -- Alon, I'll leave this in Michael's capable hands :-)

Michael Bishop

May 30, 2013, 11:31:48 AM
to emscripte...@googlegroups.com, Alon Zakai
Hi Ivan,

This is really very preliminary. While I've been able to successfully make a script to build the cocoatron source files, there are still some *huge* holes yet to fill. A group effort would definitely be needed (I also like your thoughts for finding interesting ways to integrate with Cappuccino).

Here's what's missing right now:

--- objc_msgSend() and friends ---

We have to write these ourselves.
In all the ObjC runtime implementations I've seen, this is implemented in assembly, for two major reasons:
1. It's faster
2. They can do tricks with the stack using a tail call, so objc_msgSend removes its stack frame before it jumps to the found implementation. I'm not sure if we can pull the same trick in Emscripten. If we can't, we will just have a stack with objc_msgSend *all* over the place.
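
To show what a non-assembly version would have to do, here is a sketch of the lookup half of a message send in plain C. It is not taken from any real runtime: the `demo_` types are hypothetical, selectors are compared as strings for simplicity, and the final call is an ordinary call rather than a jump, so the sender's frame stays on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*demo_imp)(void);            /* generic function pointer */

struct demo_method { const char *sel; demo_imp imp; };
struct demo_class {
    const struct demo_class *super_class;
    const struct demo_method *methods;
    size_t method_count;
};
struct demo_object { const struct demo_class *isa; };

/* Search the receiver's class, then its superclasses, for `sel`. */
static demo_imp demo_lookup_imp(const struct demo_object *self,
                                const char *sel) {
    assert(self != NULL && sel != NULL);
    for (const struct demo_class *c = self->isa; c != NULL; c = c->super_class)
        for (size_t i = 0; i < c->method_count; i++)
            if (strcmp(c->methods[i].sel, sel) == 0)
                return c->methods[i].imp;
    return NULL;
}

/* Example method with the usual (self, _cmd) leading arguments. */
static int demo_answer(struct demo_object *self, const char *cmd) {
    (void)self; (void)cmd;
    return 42;
}

static const struct demo_method demo_methods[] = {
    { "answer", (demo_imp)demo_answer },
};
static const struct demo_class demo_class_X = { NULL, demo_methods, 1 };
static struct demo_object demo_obj = { &demo_class_X };

/* "Send" a message: look up, cast back to the real type, then call.
 * An assembly objc_msgSend would jump to the IMP instead of calling it. */
static int demo_send_answer(struct demo_object *self) {
    demo_imp imp = demo_lookup_imp(self, "answer");
    assert(imp != NULL);
    return ((int (*)(struct demo_object *, const char *))imp)(self, "answer");
}
```

Because C cannot forward an arbitrary argument list, a sender written this way has to know the method's type at the call site, which is one more reason the real thing is usually written in assembly.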

I'm getting all this info from:


--- extensive function pointer casting ---
When clang compiles the Objective-C code, it appears to stuff the function pointers to the methods into a generic void * that is stored in the class object. This is a big no-no in asm.js because it builds tables of function pointers based on the signature of each function. I think we can work around this in an implementation of objc_msgSend() because we have access to the signature of the method and so can look up the function pointer manually.
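
A sketch of that workaround, in C for illustration: keep the type encoding next to the type-erased pointer and choose the call shape from it, so each indirect call goes through a pointer of the function's true type. The `demo_` names and the two-entry "encoding" are invented here; real Objective-C type encodings are richer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*demo_imp)(void);   /* type-erased slot, like the void * above */

/* A method record keeps the type encoding next to the erased pointer. */
struct demo_method {
    const char *types;   /* simplified encoding: "i" or "ii", made up here */
    demo_imp imp;
};

static int demo_get(void *self)            { (void)self; return 7; }
static int demo_add(void *self, int delta) { (void)self; return 7 + delta; }

static const struct demo_method demo_m_get = { "i",  (demo_imp)demo_get };
static const struct demo_method demo_m_add = { "ii", (demo_imp)demo_add };

/* Pick the call shape from the recorded signature. Returns -1 for a
 * signature this sketch does not know how to call. */
static int demo_call(const struct demo_method *m, void *self, int arg) {
    assert(m != NULL && m->types != NULL);
    if (strcmp(m->types, "i") == 0)
        return ((int (*)(void *))m->imp)(self);
    if (strcmp(m->types, "ii") == 0)
        return ((int (*)(void *, int))m->imp)(self, arg);
    return -1;
}
```

The cost is one branch per distinct signature, which mirrors having one function table per signature.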

But at any rate, there is plenty missing :)

_ michael

---
Michael Bishop
Hitpoint Studios
