#1 problem with scoutess at the moment

Jeremy Shaw

Nov 14, 2012, 10:43:36 PM

to scou...@googlegroups.com

There has been some debate about whether the current design of scoutess is the best approach or not. I have outlined how the current design works, and it would be great to hear some concrete descriptions of exactly where that design 'goes wrong'.

One thing that is clear, though, is that there is a major flaw in the current implementation of that design. The VersionInfo type currently includes a full, parsed version of the .cabal file:

data VersionInfo = VersionInfo
    { viGPD            :: GenericPackageDescription -- ^ The parsed .cabal file
    , viVersionTag     :: Text                      -- ^ If two packages have the same name and version,
                                                    --   this is the tiebreaker
    , viSourceLocation :: SourceLocation            -- ^ Where this package came from
    } deriving Show

As a result, fetchVersionsHackage in this module:

http://hub.darcs.net/alp/scoutess/browse/Scoutess/Service/Source/Hackage.hs

will attempt to load the parsed .cabal file for every package version ever uploaded to Hackage into RAM at once -- which requires an absurd amount of RAM (multiple gigabytes). We should certainly be able to do much better than that. Keep in mind that cabal-install needs access to the entire set of packages on Hackage in order to do its dependency solving -- and it runs just fine! cabal-install works by never loading the whole package database into RAM at once. It first creates a file ~/.cabal/packages/hackage.haskell.org/00-index.cache, which contains a bunch of lines like:

pkg: detrospector 0.1 b# 2
pkg: detrospector 0.2 b# 7
pkg: helisp 0.1 b# 13
pkg: hscurses-fish-ex 1.3.0 b# 17
pkg: hscurses-fish-ex 1.3.1 b# 20
pkg: unpack-funcs 0.1.2 b# 23
pkg: unpack-funcs 0.1.0 b# 27
pkg: unpack-funcs 0.2.0 b# 31
pkg: unpack-funcs 0.1.1 b# 35
pkg: unpack-funcs 0.3.0 b# 39
pkg: MemoTrie 0.4.8 b# 43

This is basically a list of available package names, versions, and the offset into 00-index.tar where the rest of the package information can be found. I assume that when solving dependencies it then loads the .cabal files it needs on demand -- and probably only the relevant information from those .cabal files (i.e., not the package descriptions, which are large and of type String).
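To make the format concrete, here is a minimal sketch of reading those lines into a small in-memory index. It is not cabal-install's actual code: the CacheEntry type and the parseCacheLine / parseCache names are made up for illustration, and it only handles the "pkg:" lines shown above, skipping anything else.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- | One "pkg:" line of 00-index.cache (hypothetical type, for illustration).
data CacheEntry = CacheEntry
    { cePackage :: String  -- ^ package name, e.g. "MemoTrie"
    , ceVersion :: String  -- ^ version, kept as plain text here
    , ceOffset  :: Int     -- ^ the number after "b#": where to look in 00-index.tar
    } deriving (Show, Eq)

-- | Parse a single line of the form "pkg: NAME VERSION b# N".
--   Returns Nothing for blank lines or lines in any other shape.
parseCacheLine :: String -> Maybe CacheEntry
parseCacheLine line =
    case words line of
      ["pkg:", name, ver, "b#", off] -> CacheEntry name ver <$> readMaybe off
      _                              -> Nothing

-- | Parse a whole cache file, silently dropping lines we do not understand.
parseCache :: String -> [CacheEntry]
parseCache = mapMaybe parseCacheLine . lines
```

The point is that each entry is just two short strings and an Int, so an index of all of Hackage stays small compared to holding every parsed .cabal file.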

I think we can use a similar trick for scoutess. VersionInfo should only contain the essential information that we need such as the package name, the version number, its SourceLocation, and instead of a parsed package description, it should just contain the path to the .cabal file on the disk. (Unlike cabal-install, we currently unpack the .tar file, making it easier to access the individual files).
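A rough sketch of what I mean, under some assumptions: the field names viName, viVersion and viCabalPath are invented here, SourceLocation is a stand-in for the real scoutess type, String is used instead of Text to keep the example dependency-free, and the helper assumes the unpacked index keeps the NAME/VERSION/NAME.cabal layout of 00-index.tar. Actually parsing the file (with the Cabal library) would happen in whatever code calls readCabalFile, and only for the packages it needs.

```haskell
import Data.List (intercalate)
import Data.Version (Version, showVersion)

-- | Stand-in for scoutess's real SourceLocation type (assumption).
data SourceLocation = Hackage | LocalDir FilePath
    deriving (Show, Eq)

-- | Slimmed-down VersionInfo: no parsed package description, just enough
--   to identify the package and find its .cabal file again later.
data VersionInfo = VersionInfo
    { viName           :: String          -- ^ package name
    , viVersion        :: Version         -- ^ package version
    , viVersionTag     :: String          -- ^ tiebreaker for same name and version
    , viSourceLocation :: SourceLocation  -- ^ where this package came from
    , viCabalPath      :: FilePath        -- ^ path to the .cabal file on disk
    } deriving (Show, Eq)

-- | Load the raw .cabal file contents only when they are actually needed.
readCabalFile :: VersionInfo -> IO String
readCabalFile = readFile . viCabalPath

-- | Hypothetical helper: build a VersionInfo for a package in an index
--   unpacked under 'root', assuming the NAME/VERSION/NAME.cabal layout.
hackageVersionInfo :: FilePath -> String -> Version -> VersionInfo
hackageVersionInfo root name ver = VersionInfo
    { viName           = name
    , viVersion        = ver
    , viVersionTag     = ""
    , viSourceLocation = Hackage
    , viCabalPath      = intercalate "/" [root, name, showVersion ver, name ++ ".cabal"]
    }
```

With something like this, fetchVersionsHackage would only need to walk the directory tree and build these small records, never touching the contents of the .cabal files up front.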

This seems like an easy modification for someone to make, and should make scoutess far faster and more usable.

- jeremy

Jeremy Shaw

Dec 12, 2012, 5:39:19 PM

to scou...@googlegroups.com

This has been 'fixed' now. Performance could still be improved -- but the server can now run with < 600MB of free RAM. Still not great, but better than the ungodly amount it was using before. I think we can get it down more if we make a go at it.

It still takes a very long time to run. It would be nice if the log had more details and timestamps so we could figure out why (and fix that).

Anyway, the fact that you can now run it (on a system with at least 1GB of RAM) makes it a lot easier to make more improvements. Hopefully this will get us unstuck.

- jeremy

Alp Mestanogullari

Dec 12, 2012, 7:51:27 PM

to Jeremy Shaw, scou...@googlegroups.com

This has been improved again. Now scoutess eats something like 300MB of RAM here. We probably can reduce this further, but it's now usable and we can tackle the rest of the tasks.