
[Haskell-cafe] hs-plugins and memory leaks


Evan Laforge

Oct 20, 2010, 5:14:06 PM
to haskell
I was happy to see the recent announcement about hs-plugins being
updated to work with newer ghc. I have a project and had always been
planning to use it.

However, there are some questions I've had about it for a long time.
The 'yi' paper mentions both 'yi' and 'lambdabot' as users of
hs-plugins. However, both those projects have long since abandoned
it. I can't find any documentation on why, or even any documentation
at all for Yi wrt its dynamic code execution system, but from looking
at the source it looks like it uses hint for dynamic code execution
and dyre for configuration. Dyre in turn uses serialization to pass
the old state to the reconfigured app. So we have retreated from the
idea of hotswapping the application state.

It seems to me that the advantages as put forth in the 'yi' paper
still hold. Changing the configuration in yi is rather heavyweight.
Relinking the entire editor takes a long time, and yi is still a
relatively small program. Editors can keep most of their state on
disk and can have very simple GUI state, so perhaps the serialization
and deserialization isn't such a problem, but this doesn't hold for
other programs. It seems to me the loss is significant: there's a big
difference between being able to experiment with a command by editing
and rerunning it immediately, and having to wait 10s or more for the
app to recompile, relink, shut down the ui, serialize all state, and
restart. And if you add hint, you are linking in large parts of ghc,
with an even slower link time. So, yi is no longer a dynamically
reconfigurable application, and is now merely a configurable
application.

The apparent loss of such a useful feature (you might even say a
defining feature) would presumably only happen if keeping it was
untenable. And of course that makes me reluctant to make any kind of
design that relies on it without first knowing why all existing users
jumped ship.

I can think of one possible reason, and that's a memory leak. In
ghc/rts/Linker.c:unloadObj there's a commented out line '//
stgFree(oc->image);'. In a test program I wrote that behaves like
'plugs', every executed line increases the size of the program by
12-16k. I have to remove the resolveObjs call from plugs for it to
work, but once I do it displays the same leak.

So my questions are:

Why did lambdabot and yi abandon plugins?

Is unloadObj a guaranteed memory leak? As far as I can tell, it's
never called within ghc itself. If the choices are between a memory
leak no matter how you use it and dangerous but correct if you use it
right, shouldn't we at least have the latter available as an option?
E.g. a reallyUnloadObj function that also frees the image.

If I uncomment that line will it fix the problem? Is it safe to do so
if I first force all thunks that might contain unloaded code?

Long shot, but are there any more principled ways to guarantee no
pointers to a chunk of code exist? The only thing I can think of is
to have the state be totally strict and consist only of types from the
static core. Would it be possible to hand responsibility for the
memory off to the garbage collector?
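
As a rough illustration of the "totally strict state" idea, here is a minimal Haskell sketch. The names AppState, forceList and applyPlugin are hypothetical, and the step function only stands in for plugged-in code. The sketch shows how a result can be forced completely before it is stored; it does not touch the RTS linker and does not by itself prove that unloading would then be safe. It also assumes every type in the state comes from the static core, since forcing cannot remove references to constructors defined in the plugin itself.

```haskell
import Control.Exception (evaluate)

-- Hypothetical application state: strict fields, static-core types only.
data AppState = AppState
  { stCount   :: !Int
  , stSamples :: ![Double]
  } deriving (Eq, Show)

-- Walk the whole list so that no element and no tail is left as a thunk.
forceList :: [Double] -> [Double]
forceList xs = foldr seq () xs `seq` xs

-- Run a step function (standing in for plugin code) and force the
-- result completely before handing it back to the application.
applyPlugin :: (AppState -> AppState) -> AppState -> IO AppState
applyPlugin step st = do
  let AppState n samples = step st
  evaluate (AppState n (forceList samples))
```

After applyPlugin returns, the new state holds only evaluated Ints and Doubles, so no thunk created by the step function survives inside it.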

GHC now supports dynamic libraries. Given that plugins may need to
link large portions of the static core "library", can it be loaded as
a dynamic library so both the core and the plugins can share the same
code? I haven't been able to find many references to ghc's support
for dynamic linking.
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Don Stewart

Oct 20, 2010, 5:19:37 PM
to Evan Laforge, haskell
qdunkan:

> However, there are some questions I've had about it for a long time.
> The 'yi' paper mentions both 'yi' and 'lambdabot' as users of
> hs-plugins. However, both those projects have long since abandoned
> it. I can't find any documentation on why, or even any documentation
> at all for Yi wrt its dynamic code execution system, but from looking
> at the source it looks like it uses hint for dynamic code execution
> and dyre for configuration. Dyre in turn uses serialization to pass
> the old state to the reconfigured app. So we have retreated from the
> idea of hotswapping the application state.

Once active development of hs-plugins stopped, and given the
portability issues, it behooved projects such as xmonad or yi to aim
for simpler reconfiguration strategies than native code hot
loading.

> I can think of one possible reason, and that's a memory leak. In
> ghc/rts/Linker.c:unloadObj there's a commented out line '//
> stgFree(oc->image);'. In a test program I wrote that behaves like
> 'plugs', every executed line increases the size of the program by
> 12-16k. I have to remove the resolveObjs call from plugs for it to
> work, but once I do it displays the same leak.

> So my questions are:
>
> Why did lambdabot and yi abandon plugins?

Because it was unmaintained for around 5 years, and was fundamentally
less portable than simpler state serialization solutions that offered
some of the same benefits as full code hot swapping.

> Is unloadObj a guaranteed memory leak? As far as I can tell, it's
> never called within ghc itself. If the choices are between a memory
> leak no matter how you use it and dangerous but correct if you use it
> right, shouldn't we at least have the latter available as an option?
> E.g. a reallyUnloadObj function that also frees the image.

GHC never unloads object code, so yes, it will "leak" old code.

> Long shot, but are there any more principled ways to guarantee no
> pointers to a chunk of code exist? The only thing I can think of is
> to have the state be totally strict and consist only of types from the
> static core. Would it be possible to hand responsibility for the
> memory off to the garbage collector?

It's really hard.

-- Don

Evan Laforge

Oct 20, 2010, 6:26:45 PM
to Don Stewart, haskell
>> So my questions are:
>>
>> Why did lambdabot and yi abandon plugins?
>
> Because it was unmaintained for around 5 years, and was fundamentally
> less portable than simpler state serialization solutions that offered
> some of the same benefits as full code hot swapping.

Fair enough. The idea of being able to make changes and see them
quickly enough for it to have an interactive feel is very appealing,
but maybe there are other ways to get there, such as improving link
time with dynamic linking (my current link time is around 24 seconds).
State serialization + restart is definitely simpler and more robust.
But if it's impossible to get it fast enough otherwise, and there
aren't any other show stopping problems (I think even a known memory
leak may be dwarfed by the amount of data the app keeps in memory
anyway), then it might be worth it to me to maintain hs-plugins.

>> Is unloadObj a guaranteed memory leak? As far as I can tell, it's
>> never called within ghc itself. If the choices are between a memory
>> leak no matter how you use it and dangerous but correct if you use it
>> right, shouldn't we at least have the latter available as an option?
>> E.g. a reallyUnloadObj function that also frees the image.
>
> GHC never unloads object code, so yes, it will "leak" old code.

So would freeing oc->image fix the leak? In my case, it's not too
hard to force all data structures that might reference it.

>> Long shot, but are there any more principled ways to guarantee no
>> pointers to a chunk of code exist? The only thing I can think of is
>> to have the state be totally strict and consist only of types from the
>> static core. Would it be possible to hand responsibility for the
>> memory off to the garbage collector?
>
> It's really hard.

It happens in python for python bytecode, since it exists as a plain
data structure in the language. E.g. 'code = compile('xyz')'.
Couldn't a haskell solution be along the same lines? 'code <- load
"X.o"; makeFunction code', and then makeFunction holds a ForeignPtr to
the actual code and there's some kind of primitive to call a chunk of
code as a function.
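
To make the lifetime half of that proposal concrete, here is a minimal sketch using only base. A plain malloc'd buffer stands in for the loaded object image, and CodeImage, loadImage, withImage and unloadImage are hypothetical names, not part of GHC or hs-plugins. There is no primitive here for calling the image as a function; the sketch only shows a ForeignPtr finalizer taking responsibility for freeing the memory.

```haskell
import Data.IORef
import Data.Word (Word8)
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)

-- Hypothetical handle to a loaded code image. The buffer is a stand-in:
-- nothing is actually linked or executed in this sketch.
newtype CodeImage = CodeImage (ForeignPtr Word8)

-- Allocate an image of the given size and count it as live. The finalizer
-- frees the buffer and decrements the counter.
loadImage :: IORef Int -> Int -> IO CodeImage
loadImage live size = do
  buf <- mallocBytes size
  modifyIORef' live (+ 1)
  fp <- newForeignPtr buf (free buf >> modifyIORef' live (subtract 1))
  pure (CodeImage fp)

-- Use the image; the ForeignPtr is kept alive for the duration of the action.
withImage :: CodeImage -> (Ptr Word8 -> IO a) -> IO a
withImage (CodeImage fp) = withForeignPtr fp

-- Run the finalizer immediately instead of waiting for the garbage collector.
unloadImage :: CodeImage -> IO ()
unloadImage (CodeImage fp) = finalizeForeignPtr fp
```

If unloadImage is never called, the finalizer runs at some point after the garbage collector finds the CodeImage unreachable. That only covers references to the handle itself; pointers into the image held by unevaluated thunks are exactly the part that remains hard.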

Andy Stewart

Oct 20, 2010, 9:05:41 PM
to haskel...@haskell.org
Hi Evan,
Evan Laforge <qdu...@gmail.com> writes:

>>> So my questions are:
>>>
>>> Why did lambdabot and yi abandon plugins?
>>
>> Because it was unmaintained for around 5 years, and was fundamentally
>> less portable than simpler state serialization solutions that offered
>> some of the same benefits as full code hot swapping.
>
> Fair enough. The idea of being able to make changes and see them
> quickly enough for it to have an interactive feel is very appealing,
> but maybe there are other ways to get there, such as improving link
> time with dynamic linking (my current link time is around 24 seconds).
> State serialization + restart is definitely simpler and more robust.
> But if it's impossible to get it fast enough otherwise, and there
> aren't any other show stopping problems (I think even a known memory
> leak may be dwarfed by the amount of data the app keeps in memory
> anyway), then it might be worth it to me to maintain hs-plugins.
>

I have a project designed around dynamic linking; I even built 'pdynload'
(http://hackage.haskell.org/package/pdynload-0.0.3) following Don's PhD
thesis.

In the end I temporarily removed the pdynload code from my project, for the
following reasons:

1) Preserving running state is difficult, e.g. network state in a browser or
running state in a terminal emulator.

2) Link time is too long. My Haskell OS project
(http://www.flickr.com/photos/48809572@N02/) has many sub-modules, each
sub-module is very big, and linking takes too long.

3) The memory leak, as you said.

>>> Is unloadObj a guaranteed memory leak?  As far as I can tell, it's
>>> never called within ghc itself.  If the choices are between a memory
>>> leak no matter how you use it and dangerous but correct if you use it
>>> right, shouldn't we at least have the latter available as an option?
>>> E.g. a reallyUnloadObj function that also frees the image.
>>
>> GHC never unloads object code, so yes, it will "leak" old code.
>
> So would freeing oc->image fix the leak? In my case, it's not too
> hard to force all data structures that might reference it.

It's not safe for the GHC runtime system, since you don't know when it is
safe to unload the old code.

Don's idea is to keep the old state in memory even after you load the new
state, so that hot-swapping is safe.


>
>>> Long shot, but are there any more principled ways to guarantee no
>>> pointers to a chunk of code exist?  The only thing I can think of is
>>> to have the state be totally strict and consist only of types from the
>>> static core.  Would it be possible to hand responsibility for the
>>> memory off to the garbage collector?
>>
>> It's really hard.
>
> It happens in python for python bytecode, since it exists as a plain
> data structure in the language. E.g. 'code = compile('xyz')'.
> Couldn't a haskell solution be along the same lines? 'code <- load
> "X.o"; makeFunction code', and then makeFunction holds a ForeignPtr to
> the actual code and there's some kind of primitive to call a chunk of
> code as a function.

Anyway, I have been rethinking hot-swapping in Haskell for some time. My idea
is:

    multi-process framework
  + hot-swapping core entry
  + mixing old/new sub-modules at runtime

The core and the sub-modules all run in separate processes.

In my project (http://www.flickr.com/photos/48809572@N02/), the editor and
browser (and many others ...) are sub-modules.

The core doesn't do anything itself; it just controls how sub-modules are loaded.

The core has 'entry code', like 'pageBufferNewFun' in
https://patch-tag.com/r/AndyStewart/manatee/snapshot/current/content/pretty/Manatee.hs
'sourceBufferNew' and 'browserBufferNew' are 'entry functions' that load a
sub-module in a *new* process.

The core process is always running, so we only need to hot-swap the 'entry
code' after we update a sub-module library with cabal. Then we can use the new
'entry code' to load the sub-module in a new process while the old sub-module
code keeps running in the old process.

Discussion is welcome. :)

Cheers,

-- Andy

Evan Laforge

Oct 20, 2010, 9:37:23 PM
to Andy Stewart, haskel...@haskell.org
> In the end I temporarily removed the pdynload code from my project, for the
> following reasons:
>
> 1) Preserving running state is difficult, e.g. network state in a browser or
> running state in a terminal emulator.

This doesn't seem too hard to me. Provided you are not swapping the
module that defines the state in the first place, simply reload the
module, and replace the old symbol in the state with the reloaded one.
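
A minimal sketch of what "replace the old symbol in the state" could look like, assuming the state type lives in the static core and holds the swappable function behind an IORef. AppState, runCommand and swapCommand are hypothetical names, and the replacement function is passed in directly; in a real setup it would come from whatever loader is in use.

```haskell
import Data.IORef

-- Hypothetical application state, defined in the static core. The command
-- is the only part that is meant to be swapped out.
data AppState = AppState
  { stBuffer  :: IORef String
  , stCommand :: IORef (String -> String)
  }

newAppState :: String -> (String -> String) -> IO AppState
newAppState buf cmd = AppState <$> newIORef buf <*> newIORef cmd

-- Apply whichever command is currently installed to the buffer.
runCommand :: AppState -> IO ()
runCommand st = do
  cmd <- readIORef (stCommand st)
  modifyIORef' (stBuffer st) cmd

-- Install a new command. The buffer contents are left untouched.
swapCommand :: AppState -> (String -> String) -> IO ()
swapCommand st = writeIORef (stCommand st)
```

For example, starting with buffer "abc" and command reverse, runCommand leaves "cba"; after swapCommand st (++ "!") a second runCommand leaves "cba!", with the buffer state carried across the swap.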

> 2) Link time is too long. My Haskell OS project
> (http://www.flickr.com/photos/48809572@N02/) has many sub-modules, each
> sub-module is very big, and linking takes too long.

This is discouraging, since one of the main reasons to use dynamically
loaded code instead of recompiling the whole app is to avoid long link
times. Presumably you would compile the majority of the app (the API
that the plugins use, and the internal code also uses) as a dynamic
library:

main.o -> tiny stub that just calls app.so
app.so -> large library containing all app logic
plugin.so -> links against app.so when loaded

So the plugin needs to read a lot of hi files when recompiling, but
the dynamic link time should be proportional to the number of
unresolved symbols in plugin.so that point into app.so, not
proportional to the overall size of the app, right?

>> So would freeing oc->image fix the leak? In my case, it's not too
>> hard to force all data structures that might reference it.
> It's not safe for the GHC runtime system, since you don't know when it is
> safe to unload the old code.

But that's just my question, I *do* (think I) know when it's safe,
which is after the data that has passed through plugged-in code has
been fully forced. Can't I just call unloadObj then?

E.g., loading and unloading plugins for audio processing is totally
standard. Since the data is strict arrays of primitive types, there's
no risk of stray pointers to unloaded code.

> Anyway, I have been rethinking hot-swapping in Haskell for some time. My idea
> is:
>
>     multi-process framework
>   + hot-swapping core entry
>   + mixing old/new sub-modules at runtime
>
> The core and the sub-modules all run in separate processes.

How would you pass state between processes?

Andy Stewart

Oct 20, 2010, 10:50:43 PM
to Evan Laforge, haskel...@haskell.org
Evan Laforge <qdu...@gmail.com> writes:

>> In the end I temporarily removed the pdynload code from my project, for the
>> following reasons:
>>
>> 1) Preserving running state is difficult, e.g. network state in a browser or
>> running state in a terminal emulator.
>
> This doesn't seem too hard to me. Provided you are not swapping the
> module that defines the state in the first place, simply reload the
> module, and replace the old symbol in the state with the reloaded one.
>
>> 2) Link time is too long. My Haskell OS project
>> (http://www.flickr.com/photos/48809572@N02/) has many sub-modules, each
>> sub-module is very big, and linking takes too long.
>
> This is discouraging, since one of the main reasons to use dynamically
> loaded code instead of recompiling the whole app is to avoid long link
> times. Presumably you would compile the majority of the app (the API
> that the plugins use, and the internal code also uses) as a dynamic
> library:
>
> main.o -> tiny stub that just calls app.so
> app.so -> large library containing all app logic
> plugin.so -> links against app.so when loaded
>
> So the plugin needs to read a lot of hi files when recompiling, but
> the dynamic link time should be proportional to the number of
> unresolved symbols in plugin.so that point into app.so, not
> proportional to the overall size of the app, right?

Yes, it is not proportional to the size of the application,
but link time depends on the dependent packages that haven't been linked yet.

For example, the GHC API code in the 'pdynload' package looks up where a symbol
is defined in the GHC package database to find which packageId needs
re-linking, then links it with the code below:

Linker.linkPackages flags [packageId]

The function 'linkPackages' links the specified package and its "dependent
packages"; the bigger the dependent packages, the longer the link time.

So a long link time is unavoidable for a *big* package.

>
>>> So would freeing oc->image fix the leak?  In my case, it's not too
>>> hard to force all data structures that might reference it.
>> It's not safe for the GHC runtime system, since you don't know when it is
>> safe to unload the old code.
>
> But that's just my question, I *do* (think I) know when it's safe,
> which is after the data that has passed through plugged-in code has
> been fully forced. Can't I just call unloadObj then?

Yes, unloadObj can work if you design carefully, but it's also easy to crash
your program if you miss something.

>
> E.g., loading and unloading plugins for audio processing is totally
> standard. Since the data is strict arrays of primitive types, there's
> no risk of stray pointers to unloaded code.
>
>> Anyway, I have been rethinking hot-swapping in Haskell for some time. My idea
>> is:
>>
>>     multi-process framework
>>   + hot-swapping core entry
>>   + mixing old/new sub-modules at runtime
>>
>> The core and the sub-modules all run in separate processes.
>
> How would you pass state between processes?

In fact, I won't pass any state between processes.

My framework looks like this: http://www.flickr.com/photos/48809572@N02/5031811365/lightbox/

Every sub-module runs in a render process, and to the daemon process a
render process is just a *Tab*.

When you need to update the current sub-module, just recompile the new code
into the Cabal/GHC database, then start a *new* process to load the new code;
we can use the dyre technique to restore state in the new process.

It's not as powerful as hs-plugins, but it's perfectly safe and has no
*memory leak*.
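
A minimal sketch of the save-and-restore step this approach relies on, assuming the state is small enough to round-trip through a string. TabState, saveState and restoreState are hypothetical names, and derived Show/Read is used only for illustration; this is not dyre's actual API or the framework's real state type.

```haskell
import Text.Read (readMaybe)

-- Hypothetical per-tab state that the old process hands to the new one.
data TabState = TabState
  { tabUrl    :: String
  , tabScroll :: Int
  } deriving (Eq, Show, Read)

-- Serialize the state before the old process exits, e.g. to a file or
-- a command-line argument for the new process.
saveState :: TabState -> String
saveState = show

-- Parse the state in the new process. Nothing means the input was not
-- valid, so the caller should fall back to a fresh state.
restoreState :: String -> Maybe TabState
restoreState = readMaybe
```

Because the new process starts from serialized data only, no pointer into the old code can survive the restart, which is where the safety comes from.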

-- Andy
