Comparison of LIAM2 with Genesis (SAS based model)


Howard Redway

Apr 26, 2013, 3:50:54 PM
to liam...@googlegroups.com
LIAM2 Development Team (and others who may be interested)

Background

As you know, I have been evaluating LIAM2 as one of several long-term options for replacing Genesis, the dynamic microsimulation tool we use in the Department for Work and Pensions.  Genesis is a program generator that reads the specification of a model from Excel workbooks and writes, then runs, a large SAS program.  Most of the program is Generic Code, generated from Excel, but some procedures are coded by the model developer in SAS; we call this Static Code as it does not need to be modified by users to model different scenarios.  The main strength of Genesis is that users can change the model without writing any SAS code.  The main disadvantage, and the reason we are considering alternatives, is the long runtimes of most of the models.  The Static Code is efficient and accounts for only about 3% of the runtime.  It is in the Generic Code, generated from the Excel workbooks, where we believe significant reductions are possible.

Comparison Test

To compare Genesis and LIAM2 runtimes I specified a model that includes examples of most types of procedure used in our models.  I coded and ran this model in both Genesis and LIAM2.  The runtimes for these could then be used to estimate the runtime for one of our models, Pensim2, if run under LIAM2, taking account of the frequency of each type of procedure.  I have modified the original model to take advantage of any major new features and efficiency improvements in each release of LIAM2.  With the help of Geert and Gaëtan, to whom I am most grateful, I have been able to incorporate into my model the global arrays available in release 0.6.1.

Unfortunately we are not yet able to run the two versions of the model on comparable hardware.  But even with LIAM2 running on a modest mid-range PC and Genesis on a UNIX server, LIAM2 runs the Generic Code processes considerably faster, overall by a factor of about 20.  On the assumption that there would be no improvement for the Static Code if coded in Python, a LIAM2/Python model could take about 10% of the time of Genesis.
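
The estimate above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the 3% Static Code share and the factor of 20 both refer to the same Genesis baseline, and the function name and the `static_speedup` parameter are made up for this example.

```python
def relative_runtime(static_share, generic_speedup, static_speedup=1.0):
    """Fraction of the original runtime that remains after the speedups.

    static_share:    share of the original runtime spent in Static Code
    generic_speedup: factor by which the Generic Code gets faster
    static_speedup:  factor for the Static Code (1.0 means unchanged,
                     values below 1.0 mean it gets slower)
    """
    generic_share = 1.0 - static_share
    return static_share / static_speedup + generic_share / generic_speedup


# Static Code unchanged, Generic Code about 20 times faster.
unchanged_static = relative_runtime(static_share=0.03, generic_speedup=20.0)

# Pessimistic variant: Static Code runs at half its SAS speed in Python.
slower_static = relative_runtime(0.03, 20.0, static_speedup=0.5)

print(f"Static Code unchanged:  {unchanged_static:.3f} of the Genesis runtime")
print(f"Static Code half speed: {slower_static:.3f} of the Genesis runtime")
```

With these inputs the result is a little under 8% when the Static Code is unchanged and about 11% if it ran at half speed, so "about 10%" is a reasonable round figure as long as the Static Code does not become much slower.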

Limitations of LIAM2

The current version of LIAM2 does have at least one limitation that prevents our models being implemented in it: the control of random numbers.  LIAM2 appears to draw random numbers from a single stream in the order required by the simulation, with the only control being the specification of the initial seed.  Genesis manages random numbers in such a way that if a procedure is unchanged then, if required, the same random numbers can always be drawn even if the number of random draws required for earlier procedures differs.  This is an important feature of Genesis as it reduces stochastic differences to a minimum when comparing two scenarios.  This is essential for the analysis of gainers and losers from a policy proposal as it ensures that for most policy options the same units exist in two runs and can be compared at the micro level.  This would be a major obstacle were we to wish to convert Genesis models to the current release of LIAM2.
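
To illustrate the kind of control meant here, the following is a minimal sketch in plain Python. It is not how Genesis manages its random numbers and it is not a LIAM2 feature; it simply assumes that each procedure gets its own generator, seeded from the run seed, the procedure name and the period. The function and procedure names are hypothetical.

```python
import random


def procedure_rng(base_seed, procedure_name, period):
    """Return a generator that depends only on seed, procedure and period.

    A string seed is converted to a number deterministically by
    random.Random, so the same triple always gives the same stream.
    """
    return random.Random(f"{base_seed}/{procedure_name}/{period}")


def run(base_seed, draws_in_earlier_procedure):
    # The earlier procedure uses a scenario-dependent number of draws.
    earlier = procedure_rng(base_seed, "earlier_procedure", 2013)
    for _ in range(draws_in_earlier_procedure):
        earlier.random()

    # The unchanged procedure has its own stream, so its draws do not
    # depend on how many numbers the earlier procedure consumed.
    later = procedure_rng(base_seed, "unchanged_procedure", 2013)
    return [later.random() for _ in range(3)]


baseline = run(base_seed=42, draws_in_earlier_procedure=10)
reform = run(base_seed=42, draws_in_earlier_procedure=250)
print(baseline == reform)
```

With a single shared stream the three draws of the unchanged procedure would differ between the two runs; with one stream per procedure they are identical, which is what keeps stochastic differences between scenarios to a minimum.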

Other differences would require workarounds, be less transparent, or increase maintenance:
Missing Values:  LIAM2 only supports a missing value indicator, nan, for float variables.  Missing integers are coded as -1 and missing booleans as false, both of which are valid values.  SAS, and therefore Genesis, supports missing value indicators for all variable types and Genesis code makes use of these.  It even allows different indicators to be specified by the user for different types of missing value.
Selection Groups:  Many procedures in Genesis models are large and apply different probabilities or regression equations to different population groups.  Genesis has a reasonably concise syntax for specifying such procedures.  The introduction of global arrays in 0.6.0 was a major improvement in converting Genesis code to LIAM2 procedures using choice, particularly when probabilities were period-specific; the enhancement also enabled significant reductions in runtimes to be achieved.  But if a Genesis regression involves several different formulae then a separate LIAM2 regression process is required for each, although these may be specified within nested if terms.
Alignment with Choice:  Several Genesis procedures that would use choice in LIAM2 are aligned but LIAM2 does not support alignment for choice.  There are several ways to work round this.
Ordered Regressions:  Genesis supports ordered multinomial regressions but LIAM2 supports only binary stochastic outcomes.  A workaround would be necessary in LIAM2, requiring a separate assignment for each of the ordered outcomes.
Support for Dates:  SAS supports a large number of date functions, most of which can be used in the specification of Genesis generic code.
Modular Structure:  Genesis has a modular structure that enables groups of related procedures to be specified in separate Excel workbooks.  A LIAM2 model is specified as a program in a single text file, together with a data file that may contain macro variables and alignment parameters.  Even for the small comparison model the program of 1700 lines has become too large to work with conveniently.  A LIAM2 implementation of Pensim2 may be between 10 and 20 times larger.
Processing LIAM2 if:  The processing of LIAM2 if statements can result in unexpected crashes, requiring a workaround, and may also involve unnecessary additional processing.  Gaëtan explained the reason for this in an email to me in March: “. . . the vectorized nature of Liam2 which means that both branches of an if are always evaluated (for all individuals), and the if only "selects" one value or another.  For expressions which evaluate to a "bad" value outside the filter, this is not a problem, for expressions which "crash" outside the filter, like in this case, it is.”  Does the same issue apply to the use of filters?
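
The behaviour described in the last item can be made concrete with a small sketch. It uses plain Python lists as stand-ins for columns and does not reproduce LIAM2 internals or the expression that failed in the comparison model; the logarithm of zero is just an assumed example of an expression that "crashes" for some individuals, and `vectorized_if` is a hypothetical helper.

```python
import math


def vectorized_if(cond, if_true, if_false):
    """Pick element-wise from two columns that are already fully computed."""
    return [t if c else f for c, t, f in zip(cond, if_true, if_false)]


income = [1200.0, 0.0, 850.0]
has_income = [x > 0 for x in income]
zeros = [0.0] * len(income)

# Naive version: the "true" branch is computed for everyone before the
# selection happens, so math.log(0.0) raises ValueError for person 2
# even though that person would have received the "false" value.
try:
    log_income = vectorized_if(has_income, [math.log(x) for x in income], zeros)
except ValueError:
    log_income = None  # the whole computation fails

# Workaround: first make the input safe for every individual, then select.
safe_income = vectorized_if(has_income, income, [1.0] * len(income))
log_income_safe = vectorized_if(
    has_income, [math.log(x) for x in safe_income], zeros
)
print(log_income, log_income_safe)
```

The workaround costs an extra pass over the whole population, which is the "unnecessary additional processing" mentioned above.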
Final Remarks

The above points may also be relevant to other users of LIAM2, particularly any with very large and complex models.  In addition there are several issues for the Department, including the interface with SAS, which will continue to be the main data processing and analytical tool, and the size of any conversion process.

I have inevitably focused on the negative side of LIAM2 from the rather narrow perspective of a potential alternative to Genesis.  Even on this basis LIAM2 has several positive features, in addition to the very impressive runtimes.


Howard Redway
Model Development Unit
UK Department for Work and Pensions

how...@howard-redway.co.uk

Apr 29, 2013, 4:07:45 AM
to liam...@googlegroups.com
I have a further point on the limitations for which workarounds are required.

Probabilities in Choice:  Gaëtan also explained that “the choice function currently does not support "vector" arguments (ie a different value for each person)”.  This means that if the probabilities vary between groups of individuals and by period then a separate choice procedure with an if filter is necessary for each combination.  So it is not possible to specify all the probabilities in a global array and use these as prob_option parameters in choice.  In my model I have used Gaëtan’s suggestion of not using choice but implementing the procedure by testing uniform() < prob_option.  When more than two outcomes are possible it is necessary to convert the probabilities for each outcome to a series of cumulative probabilities (or specify these in the global array) and use a series of nested ifs.  Recoding my previous model to use global arrays in this way reduced runtimes considerably and would also make updating the probabilities much easier.
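
The cumulative-probability approach can be sketched in plain Python as follows. This is only an illustration of the technique, not LIAM2 syntax: the `PROBS` table stands in for a global array, and its groups, periods and values are invented for the example.

```python
import random
from itertools import accumulate


def draw_outcome(probs, u):
    """Return the index of the first outcome whose cumulative probability exceeds u.

    This is equivalent to a chain of nested ifs:
    if u < p0: 0, else if u < p0 + p1: 1, else ...
    """
    for outcome, threshold in enumerate(accumulate(probs)):
        if u < threshold:
            return outcome
    # Guard against rounding when the probabilities sum to just under 1.
    return len(probs) - 1


# Hypothetical table: probabilities of three outcomes by (group, period).
PROBS = {
    ("men", 2013): [0.70, 0.20, 0.10],
    ("women", 2013): [0.50, 0.30, 0.20],
}

rng = random.Random(12345)
people = [("men", 2013), ("women", 2013), ("men", 2013)]

# One uniform draw per person, compared with that person's own thresholds.
outcomes = [draw_outcome(PROBS[key], rng.random()) for key in people]
print(outcomes)
```

Because each person looks up their own row of probabilities, one procedure covers every combination of group and period, and updating the probabilities only means editing the table.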

Howard Redway
Model Development Unit
UK Department for Work and Pensions

Gaëtan de Menten

Apr 29, 2013, 5:47:43 AM
to liam...@googlegroups.com

On 26/04/2013 21:50, Howard Redway wrote:

> *As you know I have been evaluating LIAM2 as one of several long term
> options for replacing Genesis,

Thanks a lot for sharing the results of your comparison.

> On the
> assumption that there would be no improvement for the /Static Code /if
> coded in Python then a LIAM2/Python model could take 10% of the time of
> Genesis.

Well, it is very possible (and even probable) that (the current version
of) Liam2 is slower than SAS on your "static" code.

> *Limitations of LIAM2*
>
> The current version of LIAM2 does have at least one limitation that
> prevents our models being implemented in LIAM2, the control of random
> numbers. LIAM2 appears to draw random numbers from a single stream in
> the order required by the simulation with the only control being the
> specification of the initial seed. Genesis manages random numbers in
> such a way that if a procedure is unchanged then, if required, the same
> random numbers can always be drawn even if the number of random draws
> required for earlier procedures differs. This is an important feature
> of Genesis as it reduces stochastic differences to a minimum when
> comparing two scenarios. This is essential for the analysis of gainers
> and losers from a policy proposal as it ensures that for most policy
> options the same units exist in two runs and can be compared at the
> micro level. This would be a major obstacle were we to wish to convert
> Genesis models to the current release of LIAM2.

This feature is planned for the 0.8 release (which should hopefully come
out sometime during the summer). I will write a separate mail about this
feature.

> Other differences would require workarounds, be less transparent, or
> increase maintenance
>
> /Missing Values: / LIAM2 only supports a missing value indicator
> /nan/ for float variables. Missing integers are coded as -1 and
> missing booleans as false, both of which are valid values. SAS, and
> therefore Genesis, supports missing value indicators for all
> variable types and Genesis code makes use of these.

Yes, this is indeed a problem. It has been on my TODO list for years
now, but it has never made it to the top of my priority list, so do not
hold your breath on this...

> It even allows
> different indicators to be specified by the user for different types
> of missing value.

How does that work exactly?

> /Selection Groups: / Many procedures in Genesis models are large and
> apply different probabilities or regression equations to different
> population groups. Genesis has a reasonably concise syntax for
> specifying such procedures. The introduction of global arrays in
> 0.6.0 was a major improvement in converting Genesis code to LIAM2
> procedures using choice, particularly when probabilities were
> period-specific; the enhancement also enabled significant reductions
> in runtimes to be achieved. But if a Genesis regression involves
> several different formulae then a separate LIAM2 regression process
> is required for each, although these may be specified within
> nested /if/ terms.

I am not sure I understand what you mean. Could you provide an example
and possibly a suggestion of syntax you would like?

> /Alignment with Choice: / Several Genesis procedures that would use
> choice in LIAM2 are aligned but LIAM2 does not support alignment for
> choice. There are several ways to work round this.

> /Ordered Regressions:/ Genesis supports ordered multinomial
> regressions but LIAM2 supports only binary stochastic outcomes. A
> workaround would be necessary in LIAM2, requiring a separate assignment for
> each of the ordered outcomes.

> Support for Dates: SAS supports a large number of date functions,
> most of which can be used in the specification of Genesis generic code.

These three items should be relatively easy to implement.

> /Modular Structure: /Genesis has a modular structure that enables
> groups of related procedures to be specified in separate Excel
> workbooks. A LIAM2 model is specified as a program in a single text
> file, together with a data file that may contain macro variables and
> alignment parameters. Even for the small comparison model the
> program of 1700 lines has become too large to work with
> conveniently. A LIAM2 implementation of Pensim2 may be between 10
> and 20 times larger.

An import functionality is already implemented in the development
version (future 0.7).

> /Processing LIAM2 if: / The processing of LIAM2 /if/ statements
> can result in unexpected crashes, requiring a workaround, and may
> also involve unnecessary additional processing.

It should of course not crash. This is simply a bug you uncovered, that
will be fixed before the next release.

> Gaëtan explained
> the reason for this in an email to me in March “ . . . the
> vectorized nature of Liam2 which means that both branches of an /if/
> are always evaluated (for all individuals), and the /if/ only
> "selects" one value or another. For expressions which evaluate to a
> "bad" value outside the filter, this is not a problem, for
> expressions which "crash" outside the filter, like in this case, it
> is.”

> Does the same issue apply to the use of filters?

The answer is different on a case by case basis. I usually choose
whichever way is fastest for a particular operation. Taking a subset of
the population is a more costly operation than one might imagine because
you have to allocate a whole new block of memory and copy all
individuals that match the filter, so sometimes, computing for the whole
population is the faster alternative. Yes, this is not ideal as it does
some (potentially a lot of) useless computation, but when it is the
fastest alternative on average on real-world scenarios (with the set of
technologies/libraries Liam2 currently uses), I do not see any reason to
implement it otherwise.

> *Final Remarks*
>
> The above points may also be relevant to other users of LIAM2,
> particularly any with very large and complex models. In addition there
> are several issues for the Department, including the interface with SAS,
> which will continue to be the main data processing and analytical tool,
> and the size of any conversion process.
>
> I have inevitably focused on the negative side of LIAM2 from the rather
> narrow perspective of a potential alternative to Genesis. Even on this
> basis LIAM2 has several positive features, in addition to the very
> impressive runtimes.

All these limitations should be reasonably easy to overcome with a few
months of development. Should you choose to switch to Liam2 and
implement them yourselves, I will try to help you as much as I can, and
I will certainly be open to merging them back into the core product. If
you would prefer me to implement them, we are always open to discussing
a formal collaboration of some sort.

Best regards,
Gaëtan


