Is it possible to erase RTTI/Reflection information when compiling Haxe code?

Saar Korren

Jan 12, 2014, 6:30:54 PM
to haxe...@googlegroups.com
Haxe is a statically typed language which can compile to native code, among other targets. Taking a page from Java, though, Haxe opts to maintain reflection information, such as class and package names, in the resulting code. It even goes so far as to emulate it in targets that do not natively support reflection, by adding class names as strings to the data section.

Reflection can be useful when creating single binary packages for multiple targets, or when using run-time code linking, e.g. for plugin systems. It allows code to discover available features at run time, while offering graceful degradation if any are missing.
For statically compiled stand-alone programs, however, it is not a good thing. Besides the unnecessary storage of extra data, it means that the final binary contains visible internal implementation data which should be blackboxed.

For Java, this was somewhat resolved with the bytecode obfuscator Proguard, which changes the names of classes and methods, and flattens package structure, to erase information that need not be exported. While it's still possible to observe the division into classes and methods, their names and purpose are no longer immediately apparent. It also reduces the resulting package size by removing the long human-readable names, and replacing them with much shorter ones.

For Haxe, such a post-processing solution is not possible, due to the multitude of targets, some of which don't even have native support for reflection.

For the Java target, it's possible to use Proguard as usual on the Haxe output. For CPP, code coverage SHOULD remove most of the type information if it is not used (although there are a few calls during the initialization phase which might confuse the optimizer).

But for targets such as SWF and JS, things are not so simple. In fact, because of the way Haxe emulates packages in ECMAScript, it doesn't lend itself to existing obfuscation tools. Even if the package and class names are never used in the code, and even if they are not exported to the global scope in any way, they are still present and visible in the resulting code, and obfuscators can't easily detect that they cannot be used.

For PHP, due to the use of a reflection-based auto-loader, post-compile obfuscation is completely impossible.

For that reason, I'm wondering if there's some compiler flag or class annotation to indicate code that does not need to use reflection, and should not maintain the same human-readable names in the produced result.

P.S. I would much prefer if Haxe had an option to flatten its code in targets where this is possible, like what you get when using Emscripten on C++ code.

Nicolas Cannasse

Jan 13, 2014, 9:15:42 AM
to haxe...@googlegroups.com
On 13/01/2014 00:30, Saar Korren wrote:
> Haxe is a statically typed language which can compile to native code,
> among other targets. Taking a page from Java, though, Haxe opts to
> maintain reflection information, such as class and package names, in the
> resulting code. It even goes so far as to emulate it in targets that do
> not natively support reflection, by adding class names as strings to the
> data section.
>
> Reflection can be useful when creating single binary packages for
> multiple targets, or when using run-time code linking, e.g. for plugin
> systems. It allows code to discover available features at run time,
> while offering graceful degradation if any are missing.
> For statically compiled stand-alone programs, however, it is not a good
> thing. Besides the unnecessary storage of extra data, it means that the
> final binary contains visible internal implementation data which should
> be blackboxed.

This has to be correctly handled in the specific code generator.

For instance for client technologies where obfuscation is more common,
Flash is not concerned since we use native Reflection and for JS
Reflection code is disabled if no reflection method is used. Using a
custom JS exporter might be an alternative if you still want
reflection+obfuscation.

Best,
Nicolas


Juraj Kirchheim

Jan 13, 2014, 9:20:22 AM
to haxe...@googlegroups.com
A few things:

1. If you are having any kind of rtti in your code, then that's
because you're using it at some point. If you don't, Haxe simply won't
generate it. Compare this code (that just prints its own
output): http://try.haxe.org/#a5E84 with the variant that uses
reflection: http://try.haxe.org/#8E3a0

2. Reflection has a wide range of applications. Some people use it for
template engines or tweening engines. Haxe serialization and thus Haxe
remoting uses it. So it's a bit more than some quirky edge case code
discovery feature.
The compiler only includes the additional data if you actually use
reflection. Including it if you don't use it or not including it, if
you do use it would be quite unreasonable after all.

3. The haxe/js should be compatible with google closure advanced mode
- the reflection free sample is compiled to this:

(function(){function a(){}a.a=function(){var
a=window.document.getElementsByTagName("script"),b;b=window.document.createElement("pre");b.innerText=a[a.length-1].textContent;window.document.body.appendChild(b)};a.a()})();

I would say that's pretty good obfuscation. Obfuscating JS is not
really hard and if you feel you can do a better job, you can always
use macros to plug in a custom JS generator.

4. The fact that Haxe generates human readable code is usually thought
of as a feature. It makes tracking errors a lot easier. Since for most
backends there are already excellent obfuscators that manage to turn
beautifully written code into meaningless output, those are the tools
that you should be using. For php and nodejs you can issue precompiled
binaries, for swf you can obfuscate the actual swf. And so on. There's
no really good reason to duplicate that work. And if you have some
specific tool that has trouble processing the Haxe output, then I
think that is what we really should be talking about.

5. Don't obsess about obfuscation. Most programmers are so strongly
passionate about reinventing the wheel, that they wouldn't take your
code even if it came for free all with documentation ;)

Regards,
Juraj

Nicolas Cannasse

Jan 13, 2014, 12:27:45 PM
to haxe...@googlegroups.com
On 13/01/2014 15:20, Juraj Kirchheim wrote:
> A few things:
>
> 1. If you are having any kind of rtti in your code, then that's
> because you're using it at some point. If you don't, Haxe simply won't
> generate it. If you look at this code (that just prints its own
> output): http://try.haxe.org/#a5E84 and the variant that uses
> reflection: http://try.haxe.org/#8E3a0

Note: that's only true for the generators we did efforts to optimize
this way (JS only ATM I think).

Best,
Nicolas

Saar Korren

Jan 13, 2014, 12:35:21 PM
to haxe...@googlegroups.com
Taking a slice of code from my own haxe-built JS output:
var org = {}
org.slugfiller = {}
org.slugfiller.aspect = {}
org.slugfiller.aspect.Advice = function() { }


And this from an auto-build interface.
This code cannot be obfuscated with traditional obfuscators, because of JavaScript's expando features which allow something like this:
org["slug"+"filler"]

I agree that this should probably be reduced to "a.a.a.a", or even flattening the package structure to make it just "a". But this is not something that can be handled at the JS level, which is no longer package-aware.
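
To make the expando problem concrete, here is a minimal JavaScript sketch. It is an illustration only: `buildTree` and `renameKeys` are hypothetical names, not part of Haxe's output or of any real obfuscator. It renames property keys the way a naive renamer might, and shows that an access written out statically can be rewritten along with the definition, while a key computed at run time no longer resolves.

```javascript
// Hypothetical stand-in for the package tree Haxe emits for JS.
function buildTree() {
  var org = {};
  org.slugfiller = {};
  org.slugfiller.aspect = {};
  org.slugfiller.aspect.Advice = function () {};
  return org;
}

// Naive renamer: returns a copy of the tree with every key replaced
// according to `mapping` (keys missing from the mapping are kept).
function renameKeys(node, mapping) {
  if (node === null || typeof node !== "object") {
    return node; // functions and primitives are passed through unchanged
  }
  var out = {};
  Object.keys(node).forEach(function (key) {
    var known = Object.prototype.hasOwnProperty.call(mapping, key);
    out[known ? mapping[key] : key] = renameKeys(node[key], mapping);
  });
  return out;
}

var original = buildTree();
var renamed = renameKeys(original, { slugfiller: "a", aspect: "b", Advice: "c" });

// A static access can be rewritten together with the definition...
var staticAccess = renamed.a.b.c;
// ...but a key computed at run time still asks for the old name.
var computedAccess = renamed["slug" + "filler"];
```

Here `staticAccess` is still the `Advice` function, whereas `computedAccess` comes back `undefined`. A tool working on the JS alone has to rule out the second kind of access before it can rename anything.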

I'm not sure what the SWF source looks like, since it's auto-compiled, but a look at the binary with a text editor shows the package and method names are all there in plain text, so I'm guessing it suffers from the same issues as JavaScript.

As for CPP, I've opened my executable in a text editor, and found all the class and method names there. This, despite not using anything from the Haxe reflection API. I believe the culprit is this:
void TestMain_obj::__register()
{
    hx::Static(__mClass) = hx::RegisterClass(HX_CSTRING("org.slugfiller.haxetest.TestMain"), hx::TCanCast< TestMain_obj> ,sStaticFields,sMemberFields,
        &__CreateEmpty, &__Create,
        &super::__SGetClass(), 0, sMarkStatics, sVisitStatics);
}
This method is apparently called from "__boot_all", and ensures code-coverage for the class name and all fields and members.
Again, no remoting or reflection APIs were used.
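
The effect of such a boot-time registration can be modelled in a few lines of JavaScript. This is only a sketch of the pattern, not hxcpp's implementation: `registerClass`, `bootAll` and `classRegistry` are hypothetical names. Because the registration call has a side effect on a global table, the string literal passed to it has to survive in the output even though the program never looks a class up by name.

```javascript
// Hypothetical global class table, filled in at start-up.
var classRegistry = {};

// Side-effecting registration: stores the name and the member list.
function registerClass(name, memberNames, ctor) {
  classRegistry[name] = { members: memberNames, create: ctor };
}

function TestMain() {
  this.counter = 0;
}
TestMain.prototype.run = function () {
  this.counter += 1;
  return this.counter;
};

// Stand-in for a generated boot routine that registers every class.
function bootAll() {
  registerClass("org.slugfiller.haxetest.TestMain", ["counter", "run"], TestMain);
}

// The program itself is fully static: no lookup by name anywhere.
function main() {
  bootAll();
  return new TestMain().run();
}

var result = main();
// The names are nevertheless present at run time.
var leakedNames = Object.keys(classRegistry);
```

An optimizer cannot drop the call to `registerClass` without proving that nothing ever reads `classRegistry`, which is the same difficulty as with the generated `__register` methods.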

If at least the CPP target would not generate this information, I would have no issues using Emscripten for JavaScript. I would generally prefer to target asm.js anyway, for performance reasons.

In PHP, this little line:
spl_autoload_register('_hx_autoload');
makes any attempt at obfuscation or code pre-compilation useless.
I've actually spent quite a few pages riffing on Zend's SPL, and frameworks that use it. Just like people who use Java are perfectly capable of using "import" for every class they use in a file, there's no reason why PHP should not use "require_once", perhaps with some healthy helping of "dirname(__FILE__)".

Since the package and class names are in the produced filenames themselves, which are addressed at run-time, even using code-encryption wouldn't work, since the filenames can't be changed without breaking the code that loads the classes and imports.

I can see how this direct translation can be useful for debugging. But for release builds, packaging everything into one flat file would be better.

I realize output like that is not "for everyone". That's why I've been looking for a compiler flag, both in the compiler documentation, and in the "tips and tricks" (Why isn't there a single reference/man page with ALL the flags?), but couldn't find anything. What I did find is that there is a specific metadata for enabling RTTI ("@:rtti") for a specific class (Although I'm not sure what additional RTTI could be added to what is already present), but none for disabling all RTTI and reflection for a class or interface.

Also, I don't think it has to be fully done at the code generation level. Something like converting "my.really.long.package.name.AndClass" to "a1234" can be done at the typing level. It won't completely remove unused reflection code for targets like CPP, but it would at least reduce the amount of whitebox information in targets like JS, PHP and SWF.

Juraj Kirchheim

Jan 13, 2014, 1:49:34 PM
to haxe...@googlegroups.com
On Mon, Jan 13, 2014 at 6:35 PM, Saar Korren <slugf...@gmail.com> wrote:
> Taking a slice of code from my own haxe-built JS output:
> var org = {}
> org.slugfiller = {}
> org.slugfiller.aspect = {}
> org.slugfiller.aspect.Advice = function() { }
>
> And this from an auto-build interface.
> This code cannot be obfuscated with traditional obfuscators, because of
> JavaScript's expando features which allow something like this:
> org["slug"+"filler"]

If you avoid reflection, then it's reasonable to assume that Haxe also
does. In that case there is no reason why the generated JS should
contain anything that breaks.

> I agree that this should probably be reduced to "a.a.a.a", or even
> flattening the package structure to make it just "a". But this is not
> something that can be handled at the JS level, which is no longer
> package-aware.

Quite simply, you either want this to work or not. You can just as
well do `Type.resolveClass("org.sl"+"ug.filler"+".aspect.Advice")`.
I suggest you run your output through closure and report any concrete issues.

> I'm not sure how the SWF source looks like, since it's auto-compiled, but a
> look at the binary with a text editor shows the package and method names are
> all there in plain-text, so I'm guessing it suffers from the same issues as
> JavaScript.

There is no SWF "source" as the generator spits out byte code directly.
As for obfuscation, the same argument applies as above. If you're
building and resolving class names at runtime, any obfuscator will
break. If you don't they will work.

> As for CPP, I've opened my executable in a text editor, and found all the
> class and method names there. This, despite not using anything from the Haxe
> reflection API. I believe the culprit is this:
> void TestMain_obj::__register()
> {
> hx::Static(__mClass) =
> hx::RegisterClass(HX_CSTRING("org.slugfiller.haxetest.TestMain"),
> hx::TCanCast< TestMain_obj> ,sStaticFields,sMemberFields,
> &__CreateEmpty, &__Create,
> &super::__SGetClass(), 0, sMarkStatics, sVisitStatics);
> }
> This method is apparently called from "__boot_all", and ensures
> code-coverage for the class name and all fields and members.
> Again, no remoting or reflection APIs were used.
>
> if at least the CPP target would not generate this information, I would have
> no issues using Emscripten for JavaScript. I would generally prefer to
> target asm.js anyway, for performance reasons.
>
> In PHP, this little line:
> spl_autoload_register('_hx_autoload');
> makes any attempt at obfuscation or code pre-compilation useless.

I agree that it makes it somewhat harder, but not impossible/useless.
For example, if you have a clean build in Haxe, all the files that are
in the output are used. Crawling the directory to generate an
include_all.php to work with is trivial. With `--next -main
MakeIncludeAll --interp` you can add that to your build without any
external tools.
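
The include file itself is simple to produce. As a rough sketch of the idea (written in JavaScript here rather than as the Haxe `--interp` script mentioned above; `makeIncludeAll` and the example paths are made up for illustration), given the list of generated files relative to the output directory:

```javascript
// Builds the text of an include_all.php from a list of generated
// file paths, given relative to the output directory.
// Only ".php" files are included; the list is sorted for stable output.
function makeIncludeAll(relativePaths) {
  var lines = ["<?php"];
  relativePaths
    .filter(function (p) { return /\.php$/.test(p); })
    .sort()
    .forEach(function (p) {
      // Escape backslashes and single quotes for a PHP single-quoted string.
      var escaped = p.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
      lines.push("require_once dirname(__FILE__) . '/" + escaped + "';");
    });
  return lines.join("\n") + "\n";
}

// Example file list; a real build would get it by crawling the output directory.
var includeAll = makeIncludeAll([
  "lib/org/slugfiller/aspect/Advice.class.php",
  "lib/Std.class.php",
  "res/readme.txt"
]);
```

Note that sorting alphabetically ignores inheritance order. Without the autoloader, a real script would probably have to emit base classes before the classes that extend them.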

> I've actually spent quite a few pages riffing on Zend's SPL, and frameworks
> that use it. Just like people who use Java are perfectly capable of using
> "import" for every class they use in a file, there's no reason why PHP
> should not use "require_once", perhaps with some healthy helping of
> "dirname(__FILE__)".

It would bloat output or impact on runtime performance.
You can either put the requires at the top, which makes you lose lazy
loading and thus is expensive in uncached php environments (which
unfortunately are still pretty much the default). Or you put them into
the method bodies. Which makes the code rather unreadable.

> Since the package and class names are in the produced filenames themselves,
> which are addressed at run-time, even using code-encryption wouldn't work,
> since the filenames can't be changed without breaking the code that loads
> the classes and imports.
>
> I can see how this direct translation can be useful for debugging. But for
> release builds, packaging everything into one flat file would be better.

Flattening is just as trivial as generating an include_all.
And again, depending on your environment it might not be what you want.
If you have a lousy webspace, where caching is not enabled, then this
will impact execution time. And if the bandwidth is in the same
league, deployment will take significantly longer, since you always
have to upload the whole app instead of the files that were changed.

> I realize output like that is not "for everyone".

That's exactly the point. The output now is for everyone. And if you
have specific needs, then you will have to walk the extra mile to meet
them.

> That's why I've been looking for a compiler flag, both in the compiler documentation, and in the
> "tips and tricks" (Why isn't there a single reference/man page with ALL the
> flags?), but couldn't find anything. What I did find is that there is a
> specific metadata for enabling RTTI ("@:rtti") for a specific class
> (Although I'm not sure what additional RTTI could be added to what is
> already present), but none for disabling all RTTI and reflection for a class
> or interface.
>
> Also, I don't think it has to be fully done at the code generation level.
> Something like converting "my.really.long.package.name.AndClass" to "a1234"
> can be done at the typing level.

To make my above point clear: If - and only if - you write your code
in such a way that you avoid building and resolving names at runtime,
then obfuscation is easily achieved. If not, then it's pretty much
impossible (unless you can add some sufficiently cryptic name
resolution code).

For example you can use Context.onGenerate to add a macro that will
decorate every class with a `@:native("<CrypticName>")` metadata to
replace the name in the output. Similarly, with a build macro you can
take every method and give it some obscure name, and generate an
inline method that will just forward the call. With a few extra tricks
you can even make it work for accessors and fields.
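
The renaming half of that idea is just a mapping from class paths to short names. The sketch below shows only that mapping, in JavaScript; it does not use the Haxe macro API, and `shortName` and `buildNativeNames` are hypothetical helpers. In an actual build the resulting names would be what the macro attaches via `@:native`.

```javascript
// Converts 0, 1, 2, ... into "a", "b", ..., "z", "aa", "ab", ...
function shortName(index) {
  var letters = "abcdefghijklmnopqrstuvwxyz";
  var name = "";
  var n = index;
  do {
    name = letters.charAt(n % 26) + name;
    n = Math.floor(n / 26) - 1;
  } while (n >= 0);
  return name;
}

// Assigns a short, flat name to every class path.
// Paths are sorted first so the same input always yields the same names.
function buildNativeNames(classPaths) {
  var mapping = {};
  classPaths.slice().sort().forEach(function (path, i) {
    mapping[path] = shortName(i);
  });
  return mapping;
}

var nativeNames = buildNativeNames([
  "org.slugfiller.haxetest.TestMain",
  "org.slugfiller.aspect.Advice"
]);
```

A real scheme would also have to skip names that are keywords on the target (for example `do` or `if`) and leave alone any class that is resolved by name at run time.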

Or use any of the established tools for the platform.

Regards,
Juraj

Tarwin Stroh-Spijer

Jan 13, 2014, 2:49:24 PM
to haxe...@googlegroups.com
I think using "autoload" in PHP is really the suggested way to work these days. Just a note.




Saar Korren

Jan 13, 2014, 3:20:15 PM
to haxe...@googlegroups.com
I will concede that Google Closure at maximal settings produces good output, erasing human readable data, removing unused interfaces, and flattening package names. The code isn't as efficient as asm.js, but I can just use Emscripten for that.

For PHP, I don't see how the code can be post-processed when it uses an auto-loader with calculated filenames. It will simply create issues once it tries to require a file which doesn't exist. Such processing would require complete replacement code for the boot-loader.
Also, from what I can gather, your argument is that it will be more efficient in non-code-caching PHP environments (I assume you mean CGI, since mod_php and FastCGI both cache), provided they only use a small fraction of the defined classes per call. Do tell me if you ever find such gloriously modular code in the wild.

I am interested in the idea of using a macro to @:native-ize all the classes to something more flat. The suggestion to use @:build for methods is less practical, though, not the least bit if @:build macros are already used in the code. If @:native could work on methods, that would solve that issue.

I would be willing to see this thread resolved if you can answer me this: How do I prevent the CPP target from generating and/or using "__register" methods for every single class. (And if possible, also "hx::RegisterResources", if no resources are used)
Even if obfuscated, it adds unnecessary data and bloat to the final program which is not optimized by any compiler I know. Even some "ifndef" in the generated "__boot_all" would probably do the trick. That should make all the other data (aside from "__GetType" which is unfortunately virtual) be removed by code coverage.

Nicolas Cannasse

Jan 14, 2014, 5:16:20 AM
to haxe...@googlegroups.com
[...]
> I would be willing to see this thread resolved if you can answer me
> this: How do I prevent the CPP target from generating and/or using
> "__register" methods for every single class. (And if possible, also
> "hx::RegisterResources", if no resources are used)
> Even if obfuscated, it adds unnecessary data and bloat to the final
> program which is not optimized by any compiler I know. Even some
> "ifndef" in the generated "__boot_all" would probably do the trick. That
> should make all the other data (aside from "__GetType" which is
> unfortunately virtual) be removed by code coverage.

I'm not sure it's possible atm. Try to submit a feature request on Github.

Best,
Nicolas

Hugh

Jan 15, 2014, 12:06:19 AM
to haxe...@googlegroups.com
Hi
All used classes must be registered so that their statics can be marked by the GC.
Most of the other stuff is needed for dynamic access - including in generics etc, like:

function foo(bar:{x:Float})
will do a named lookup on "bar.x", as will:
var bar:Dynamic = new Point();
trace(bar.x);
trace(MyClass) will need the name of "MyClass".
var bar = { x:1.2, y:1.2 }; should still be ok if you removed the named-lookup on fixed class members.

Removing the named-lookup should be possible with a simple #define, but should be accompanied by a compiler error if any of the dynamic-lookup fields are used.  Or, ideally, only include the fields that are named.  Both these options are  harder than simply removing the values with a define.

Removing the member lists should be easy enough, although would not offer much savings unless the dynamic-lookups are removed too, since the strings for each function name would still be present.

Not too sure about code bloat, if you are trying to remove 100 bytes * 100 classes, it might not make that much difference.  Would have to see the final numbers.

Hugh

Saar Korren

Jan 15, 2014, 8:23:06 AM
to haxe...@googlegroups.com
Well, I realize any setting of a Dynamic member inherently uses Strings for field keys. As a rule, I try to avoid using that class altogether.

Code like this, however

var bar:Dynamic = new Point();
Could simply be translated to
var temp = new Point();
var bar:Dynamic = {};
bar.x = temp.x;
bar.y = temp.y;

So the RTTI information is only available at the point of assignment, and not globally for the class.

I was somewhat disappointed to learn that anonymous objects are implemented with Dynamic. I had hoped they were implemented with helper classes, considering their signature is fully known at compile time.

I am interested in what you said about the GC, though. Why does the GC need to know a class's structure? AFAIK, it only needs to know its reference graph. Something along these lines:
// Excuse my pseudo-code
class MyClass
{
private:
  GCNode __node;
  GCStrongReference<MyOtherClass> myMember;

public:
  MyClass() {
    __node.setObject(this);
    myMember.setNode(__node);
  }

  void __newNode(GCStrongReference<MyClass>& obj) {
    obj.setValueFromNode(__node);
    __node.allowGC();
  }

  void setMyOtherClass(GCStrongReference<MyOtherClass>& newval) {
    // myMember = newval;
    myMember.setValueFromReference(newval);
  }

  GCStrongReference getMyOtherClass() {
    return myMember;
  }

  void generateMyOtherClass() {
    // myMember = new MyOtherClass();
    var temp = new MyOtherClass();
    temp.__newNode(myMember);
  }

  void doFoo() {
    // myMember.doBar();
    GCStrongReference temp;
    temp.setValueFromReference(myMember);
    temp.getNode().getObj().doBar();
  }
}

Alternately, if the GC must have run-time information of a class's internal memory structuring, wouldn't it be better to have it set as a public member, as is done with the virtual function table, instead of globally registering the type?

All of this static registration really makes it hard for a compiler to optimize out unused data or classes.

Hugh

Jan 16, 2014, 12:00:36 AM
to haxe...@googlegroups.com
The marking of instance fields is held in an instance (virtual) method, as is the lookup-method-by-name.  It is the marking of the static fields that needs the registration (they are GC roots).
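
To illustrate why roots have to be known to the collector, here is a toy mark phase in JavaScript. It is a sketch of the general mark-and-sweep idea, not of hxcpp's collector; `registerRoot`, `mark` and the object shapes are invented for the example. An object that is reachable only from a static field is found only if that static has been registered as a root.

```javascript
// Toy heap object: a label plus outgoing references.
function makeObject(label) {
  return { label: label, refs: [], marked: false };
}

// Hypothetical root table: one callback per registered class, each
// returning the objects currently held in that class's static fields.
var rootProviders = [];
function registerRoot(provider) {
  rootProviders.push(provider);
}

// Mark phase: flags everything reachable from the registered roots
// and returns the labels of the objects that would survive a sweep.
function mark(heap) {
  heap.forEach(function (obj) { obj.marked = false; });
  var stack = [];
  rootProviders.forEach(function (provider) {
    provider().forEach(function (obj) { stack.push(obj); });
  });
  while (stack.length > 0) {
    var obj = stack.pop();
    if (obj.marked) continue;
    obj.marked = true;
    obj.refs.forEach(function (child) { stack.push(child); });
  }
  return heap.filter(function (o) { return o.marked; })
             .map(function (o) { return o.label; });
}

// "config" is held only by a static field; "child" hangs off it.
var config = makeObject("config");
var child = makeObject("child");
config.refs.push(child);
var heap = [config, child];

var liveBefore = mark(heap); // statics not registered yet
registerRoot(function () { return [config]; });
var liveAfter = mark(heap);  // statics registered as roots
```

Before the registration nothing is marked, so both objects would be swept; afterwards both survive. Note that this needs only a per-class callback for the statics, not the class or member names.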

The class structure itself only takes about 50 bytes. On top of this are the member lists.  The member lists could be pruned, but since the member names will also show up in the instance find-field-by-name, you would need to remove those too to have a significant saving. Haxe, with dead-code-elimination, will remove unused classes entirely, so they will not be registered.

Anon fields and type specifications mix up a bit, and can be abused, eg:

var x:Dynamic = { a:1, b:"hello" };
  x.x = 1.1;

function foo(bar:{x:Float}) ...
foo(new Point(1,1));
foo(x);

For foo to use strongly typed values, adapter classes would need to be generated for each combination of source (fake interface adapter) and destination (fake interface definition). The anon-type x could be made of a fake class, with members a and b, plus optional other fields. These changes are actually possible, but pretty low on the priority list.  Also, if you are worried about program size, this would not help.

The fake class for anon-types is something that could possibly be done at the haxe AST level for all targets (so I'm waiting for someone else to do this :) ), or may even be possible with a macro (left as exercise to user).

I'm still not 100% sure of the problem - is it obfuscation, or perceived or measured exe or memory size?  Generally, I'm prepared to trash 50k of memory if it makes my life easier.

Hugh

Saar Korren

Jan 16, 2014, 7:28:11 AM
to haxe...@googlegroups.com
You don't need an adapter class, just a constructor that takes another class as argument. The advantage is that optimizing compilers can inline such small methods, and save the method call.

I come from an era where 64k was all you needed, so all this dynamic typing doesn't sit well with me. I don't like verbose typing, string-based addressing, unnecessary virtuals,  or anything else that can't be inlined or removed by LLVM. Haxe's DCE can remove entire classes, yes, but it doesn't remove unused members, especially not ones that aren't defined in the code in the first place.

The reason I picked up Haxe in the first place is because I wanted JS's coding ease, but without the expando mess. Although the thing that really made me pick it over C was the macro system. If C's preprocessor was half as powerful, I'd have probably stuck with it.

But if Haxe can't produce code that is roughly equivalent to what I'd make in C (if I could be arsed to use a C-based GC, and do all the syntax desugaring and macro resolution manually), it kind of defeats the purpose. Like, if I wanted native deployment with all the mess of an untyped language, I could just use V8, it has BSD license, and libuv is MIT. But when I target native, I want real native, not just "standalone" or "native-like".

Haxe is nicely statically typed, which is all a language should need for generating static code. At the very least, if I avoid using haxe.Dynamic and haxe.Reflect, it should allow me to create flat code without any string-based member/class indexing. The sort of code static optimizers can have a field day with.

Saar Korren

Jan 16, 2014, 8:04:33 AM
to haxe...@googlegroups.com
Actually, on second thought, I don't get the point on adapters, since Haxe doesn't allow "down-assignment":
var x = {a: 1, b: 2, c: 3};
var y : {a: Int, b: Int} = x; // Causes a compile-time error

Anonymous aren't just statically typed, they are also rigidly typed - they do not extend lesser objects. They're not particularly polymorphic, so they wouldn't really require virtual functions for implementation. They may as well be implemented with C-style "struct"s, or the language equivalent thereof.

The only time you'd need to class adapt is when converting to and from Dynamic, and that applies to pretty much all the types.

But, as I've said before, I'm talking about compiler flags for indicating "I don't intend to use Dynamic or Reflect or anything from the haxe.rtti package in this code. And give me a compile-time error if I accidentally did". You know, for us old-school programmers who grew up on assembler.

Juraj Kirchheim

Jan 16, 2014, 9:12:13 AM
to haxe...@googlegroups.com
On Thu, Jan 16, 2014 at 2:04 PM, Saar Korren <slugf...@gmail.com> wrote:
> Actually, on second thought, I don't get the point on adapters, since Haxe
> doesn't allow "down-assignment":
> var x = {a: 1, b: 2, c: 3};
> var y : {a: Int, b: Int} = x; // Causes a compile-time error
>
> Anonymous aren't just statically typed, they are also rigidly typed - they
> do not extend lesser objects. They're not particularly polymorphic, so they
> wouldn't really require virtual functions for implementation. They may as
> well be implemented with C-style "struct"s, or the language equivalent
> thereof.

Not exactly true. The following code compiles:

var x : {a: Int, b: Int, c: Int } = {a: 1, b: 2, c: 3};
var y : {a: Int, b: Int} = x;

So does this by the way:

var p: { x:Int, y:Int } = new Point(3,4);

So in fact to allow for this, Point will have to ad-hoc extend { x:Int, y:Int }.
It's not really trivial, but IIRC the Java backend actually does this
at least partially. Maybe Caue can comment on that.

> But, as I've said before, I'm talking about compiler flags for indicating "I
> don't intend to use Dynamic or Reflect or anything from the haxe.rtti
> package in this code. And give me a compile-time error if I accidentally
> did". You know, for us old-school programmers who grew up on assembler.

The problem is not that easy to solve. And at this point it has become
very unclear what the problem is.
If it is memory, then there simply is no problem. It's irrelevant that
the result of programming a few decades back used a lot less
resources. If that is really what you're after, then it'd be best to
use programming tools that were designed to meet such constraints.

If it is obfuscation, then I think you have more options within Haxe.
You could tweak the hxcpp's Class.cpp to make the registration calls
be nops and thus be thrown out by the final compilation and write a
Context.onGenerate macro that statically checks whether no problems
were caused by that.

Regards,
Juraj

Saar Korren

Jan 16, 2014, 2:17:35 PM
to haxe...@googlegroups.com
The discussion about anonymous objects has gotten a bit OT, and I can just not use them, so I'm going to set that aside and focus on the relevant part.

I've tried the suggested change to Class.cpp, and also tried adding every DCE-related flag to MinGW, but no go. I could still find the names of every class and field in the binary in plaintext, including ones that are never used or instantiated.

As for using different tools, I agree that I could do that. I will miss the freedom given by the macro system, but if all I was looking for was a language with meta-programming, I could use Lua or Python. The front page to Haxe's official website boasts:
If you could only learn one programming language, Haxe would be it.
It's universal. It's powerful. It's easy-to-use.
If it can't produce clean output, then that's somewhat of an exaggeration.

I agree that this is no simple problem to solve. That's why I asked about it as soon as I observed it. I need to know if it can be solved. If it can't, then I'm barking up the wrong tree. The tree being Haxe.

Juraj Kirchheim

Jan 16, 2014, 4:08:52 PM
to haxe...@googlegroups.com
I think this discussion is leading nowhere. You do not clearly state
what it is you want.

But whatever it is, it can't be that hard. Worst case you will just
have to load all true strings (as in *content*) externally and
redefine HX_STRING to return NULL or something. That way there
shouldn't be any strings in your binary. Or you just add a step that
throws out the registration calls from the generated cpp source. Or
you figure out how to make your cpp compiler eliminate the unused
function arguments. Or reevaluate the practicality of the other macro
approach I suggested. But barking is definitely not going to solve
your problem ;)

Regards,
Juraj

Juraj Kirchheim

Jan 16, 2014, 4:10:06 PM
to haxe...@googlegroups.com
That's assuming HX_STRING is a macro, but it seems like a good guess ...

Saar Korren

Jan 16, 2014, 4:58:44 PM
to haxe...@googlegroups.com
I don't want to eliminate all the strings from the program(As a NULL HX_STRING would do), just the ones that hold class and field names. And I also want the arrays that hold them eliminated. And I don't want them to be loaded externally, either.

As far as I understand, they are mainly used for interacting with haxe.Dynamic, haxe.Reflect and haxe.rtti. Basically, they are meant for achieving, in C++, what JS, SWF, Java, and PHP have natively, on account of being reflective VM languages.

Haxe allows me to make working code, even cross platform one, without having to use the above classes and package. But even if I follow that kind of convention, it still generates the code to support them.

Mind you, that what I'm asking isn't for Haxe to automatically detect that I'm doing this, but rather give me some way to explicitly indicate "I'm following this convention, please don't add garbage to my code". You know, in the same way that, in JavaScript, having the string "use asm" at the top of a function indicates that you are using a subset of JavaScript that can be easily compiled ahead of time. I mean, I suppose you can equally argue that you don't see the point to asm.js, or the problem it aims to solve.

The problem I want to solve is roughly the same. Here's a language that is strictly typed 95% of time. I want a flag for indicating that I'm using a subset that is 100% strictly and rigidly typed, and to gain the implied optimization advantages of that.
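
Short of a compiler flag, a crude approximation is a source-level check run before the build. The JavaScript sketch below is only that: a textual scan with hypothetical names (`findDynamicUsage`, `forbiddenPatterns`) that knows nothing about Haxe's type system, so it would miss values inferred as Dynamic and could flag comments or strings. It merely shows the kind of error such a flag would raise.

```javascript
// Textual patterns that suggest reflection or dynamic typing.
// This list is an assumption for the sketch, not an official one.
var forbiddenPatterns = [
  { name: "Dynamic",   regex: /\bDynamic\b/ },
  { name: "Reflect",   regex: /\bReflect\./ },
  { name: "Type",      regex: /\bType\./ },
  { name: "haxe.rtti", regex: /\bhaxe\.rtti\b/ }
];

// Returns one message per offending line of the given Haxe source text.
function findDynamicUsage(fileName, source) {
  var problems = [];
  source.split("\n").forEach(function (line, index) {
    forbiddenPatterns.forEach(function (pattern) {
      if (pattern.regex.test(line)) {
        problems.push(fileName + ":" + (index + 1) + ": uses " + pattern.name);
      }
    });
  });
  return problems;
}

var sample = [
  "class Main {",
  "  static function main() {",
  "    var p:Dynamic = new Point(1, 2);",
  "    trace(Reflect.field(p, \"x\"));",
  "  }",
  "}"
].join("\n");

var problems = findDynamicUsage("Main.hx", sample);
```

For the sample above, `problems` holds one entry for line 3 (Dynamic) and one for line 4 (Reflect). A proper check would have to run on the typed program, for instance from a macro, rather than on the text.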

Hugh

Jan 17, 2014, 12:06:01 AM
to haxe...@googlegroups.com
Hi,

> You don't need an adapter class, just a constructor that takes another class as argument. The advantage is that optimizing compilers can inline
> such small methods, and save the method call.
This is what I mean by an adapter class. However, there will be no inlining since all functions are virtual, and only the signature is visible in the header file, and the function may store a reference to the class.

The class names will pretty much always appear in the exe - as the default 'toString' function
class Test
{
    public function new() {}
    public static function main() { trace(new Test()); }
}
So I think all that is really possible is to remove the name-list from the class registration at the same time as removing the lookup-member-by name.
But since the size of the function implementations is considerably greater than that of the function name data, the percentage saving would be minimal, and therefore it is not a priority.

The hxcpp target is very much like a vm with ahead-of-time compiling, and size efficiency is simply not a goal of the project.  I understand that you are after a more "raw" direct translation, but this is just not what hxcpp is about, so consider this fair warning that it may not in fact be suitable for your project.

Hugh

Saar Korren

Jan 17, 2014, 5:26:05 PM
to haxe...@googlegroups.com

> The hxcpp target is very much like a vm with ahead-of-time compiling, and size efficiency is simply not a goal of the project.  I understand that you are after a more "raw" direct translation, but this is just not what hxcpp is about, so consider this fair warning that it may not in fact be suitable for your project.


Yeah, this is exactly what I asserted above, and said I have somewhat of an issue with.

I've been thinking about it for a while, and my issue is not with the language, but the compiler. I considered looking into the compiler source code, but for some reason, it's written in OCaml. Like, I was kind of expecting it to be written in Haxe.

I might consider building a semi-compatible Haxe to LLVM compiler, written in Haxe. It'd be simple enough to bootstrap, because the NodeJS target works smoothly enough. It's a large undertaking, though, so I have to consider pros and cons.

Saar Korren

Jan 21, 2014, 10:04:07 PM
to haxe...@googlegroups.com
Well, I tried to do this simply with changes to HXCPP, but quickly hit a brick wall, since both haxe.Log.trace and cpp.Lib.println only accept Dynamic as parameters. And despite crippling the program to the point it no longer worked, I still haven't managed to erase the field names from the final executable.

I also tried Juraj's suggestion, but trying to add "@:native" to built in types (e.g. haxe.String) causes a compilation error. Also, there is no "@:native" for methods, only classes.
