Rethinking deterministic problem file generation in Pyomo

41 views
Skip to first unread message

Bill Hart

unread,
Jul 5, 2018, 1:20:19 PM7/5/18
to Pyomo Developers
Hi, all:

Currently, problem files like LP and GAMS files are ensured to be deterministic by sorting variable, constraint and other component names when generating these files.

With recent changes in Pyomo's master branch, Set objects now have a deterministic internal data structure, which should eliminate the need for this sorting.  I've prototyped this change in PR #602.

However, this PR exhibits non-deterministic behavior when using Python 3.5, which is almost surely due to the fact that that CPython implementation does not guarantee that dictionary keys are traversed in insertion order.  In fact, until Python 3.7 (just released), Python did not provide any guarantees of dictionary key traversal order, though in practice the CPython 3.5 release was anomalous.

I'm left wondering what strategy makes the most sense for ensuring determinism in Python 3.5 (and perhaps other earlier versions of Python).  Here are two options that seem most interesting to me:

1. Used OrderedDict in Pyomo for Python 3.5 and earlier instead of dict()

2. Keep the 'determinism' option in the writers, but change the default:  use sorting for Python 3.5 and earlier, and do not use sorting for Python 3.6 and later.

I'm leaning towards (2), though I fear that it could complicate testing.  We may need two different baselines for tests that depend on the version of Python being used to test.

--Bill

Gabriel Hackebeil

unread,
Jul 5, 2018, 1:36:42 PM7/5/18
to pyomo-de...@googlegroups.com
I vote for (1), for the reason you state about complicating testing. It will be a non-issue within 2 years, so why go through all the trouble of rewriting tests. It think using OrderedDict will have a relatively small impact on performance as well.

Gabe

--
You received this message because you are subscribed to the Google Groups "Pyomo Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyomo-develope...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bill Hart

unread,
Jul 5, 2018, 1:53:06 PM7/5/18
to Pyomo Developers
We might be able to support (2) without separate baselines by fixing some tests to use sorting (e.g. file_determinism=2).  This would especially be appropriate for tests outside of the core writer logic (e.g. pysp).  I'm looking into that now.

--Bill
Gabe

To unsubscribe from this group and stop receiving emails from it, send an email to pyomo-developers+unsubscribe@googlegroups.com.

Siirola, John D

unread,
Jul 6, 2018, 7:29:54 PM7/6/18
to pyomo-de...@googlegroups.com

I feel pretty strongly that switching Python versions should not cause a user’s model to perform differently.  So, regardless of the Python version we should output the same model (in the same order) to the solver.  That would imply an “option 3: continue to do what we have done in the past to ensure determinism across platforms”

 

john

--

You received this message because you are subscribed to the Google Groups "Pyomo Developers" group.

To unsubscribe from this group and stop receiving emails from it, send an email to pyomo-develope...@googlegroups.com.

Bill Hart

unread,
Jul 6, 2018, 10:47:37 PM7/6/18
to Pyomo Developers
John,

I think that your suggestion (3) amounts to making sorted files the default, regardless of the underlying representation.  Yes?

Your suggesting that we go beyond the principle of generating the same file deterministically from one run to the next, and that generating the same sorted file representation for all versions of Python should be the default behavior.

There are performance implications for that, though not particularly large (~2% cost for sorting in pmedian8).  I guess I'm OK with this default behavior unless we see larger performance implications somewhere.

--Bill


To unsubscribe from this group and stop receiving emails from it, send an email to pyomo-developers+unsubscribe@googlegroups.com.

Bill Hart

unread,
Jul 6, 2018, 11:22:22 PM7/6/18
to Pyomo Developers
I should note that making sorted representations the default means that Pyomo will not intrinsically support indexing with noncomparable set values (in Python3).  

--Bill

Siirola, John D

unread,
Jul 7, 2018, 3:37:15 PM7/7/18
to pyomo-de...@googlegroups.com

Yes.  I am advocating that we should be more conservative than Python when it comes to determinism.  This is both for our own sanity (a single set of tests) and because as has been pointed out to me on numerous occasions, Pyomo is a math programming environment and our core users are OR/optimization folks and not CS/Python programmers.  That said, I would be happy with eventually moving to insertion order as the default for all discrete sets.  I argue for staying with sorted order for the time being because (a) that is what the writers are already doing, (b) it breaks less code (we are not yet as deterministic as we would like to be in the rest of the code – like in transformations), and (c) you advocated using OrderedDict, which is painfully slow in Python 2.7 [and despite the push for 3.x, 2.7 is still the fastest python environment].

 

In the set-rewrite branch, I have started prototyping a slightly different (and more robust/consistent) approach to sets, and am (re)learning some things along the way.  The first is that “sorted” only makes sense in the context of discrete sets (a distinction that is confounded with “concrete/virtual”  on master).  This is causing me to rethink some of our approaches to storage and iteration in IndexedComponents – especially for IndexedComponents indexed by non-discrete or semi-nondiscrete sets (think “RangeSet * Any”).  The other is that we should rely on a common sorting interface (and NOT directly use Python’s `sorted` function).  We frequently confound “sorted” with “ordered”.  Some things – like the determinism flags – draw this distinction (usually as “deterministic” vs “sorted”).  I think this should be more prominently called out in the API.

 

Finally, there is nothing that prevents Pyomo from sorting noncomparable values.  Pyomo already has a `sorted_robust` method that will deterministically sort lists of mixed-type objects.  In set-rewrite, this utility is used for the `sorted()` method on discrete sets (and can be extended in that context to deterministically order types that support no comparison operators at all by sorting by insertion order).  That said, the preceding paragraph implies that `sorted_robust` (which I have never liked the name of) should be renamed something closer to `robust_ordered()` (and the `sorted()` method on new Sets should be `ordered(sorted=bool)`.

 

john

 

From: pyomo-de...@googlegroups.com [mailto:pyomo-de...@googlegroups.com] On Behalf Of Bill Hart
Sent: Friday, July 06, 2018 9:22 PM
To: Pyomo Developers <pyomo-de...@googlegroups.com>

To unsubscribe from this group and stop receiving emails from it, send an email to pyomo-develope...@googlegroups.com.


For more options, visit https://groups.google.com/d/optout.

--

You received this message because you are subscribed to the Google Groups "Pyomo Developers" group.

To unsubscribe from this group and stop receiving emails from it, send an email to pyomo-develope...@googlegroups.com.

Gabe Hackebeil

unread,
Jul 7, 2018, 5:38:51 PM7/7/18
to pyomo-de...@googlegroups.com

> OrderedDict, which is painfully slow in Python 2.7 [and despite the push for 3.x, 2.7 is still the fastest python environment].

Can you verify that this actually creates a noticeable slowdown for a large model before throwing this idea out? I’ve used OrderedDict by default in Kernel since the beginning and I never noticed any performance issue, but I very likely worked mostly in Python 3.

Gabe

Siirola, John D

unread,
Jul 7, 2018, 5:57:43 PM7/7/18
to pyomo-de...@googlegroups.com
We used to have OrderedDicts in Block and removing them (and adding the PseudoMap) reduced overall runtimes significantly. I have tested it (in isolation) several times -- and of course cannot find the tests or the email where I sent out the results. However, Michael looked at it a little over a year ago and found 20% for lookup and nearly 5-10x for insert/delete. From his note:

For the timing results below, I added one million items, checked containment for all million of those items, then deleted all million of those items.

Python 2.7.12:
dict __setitem__ time: 0.288820028305
OrderedDict __setitem__ time: 2.47396302223
dict __contains__ time: 0.275412082672
OrderedDict __contains__ time: 0.331539869308
dict __delitem__ time: 0.2212870121
OrderedDict __delitem__ time: 1.47877311707

Python 3.6.1:
dict __setitem__ time: 0.20435404777526855
OrderedDict __setitem__ time: 0.244096040725708
dict __contains__ time: 0.16991281509399414
OrderedDict __contains__ time: 0.17050814628601074
dict __delitem__ time: 0.1795508861541748
OrderedDict __delitem__ time: 0.20944905281066895

John

-----Original Message-----
From: pyomo-de...@googlegroups.com [mailto:pyomo-de...@googlegroups.com] On Behalf Of Gabe Hackebeil
Sent: Saturday, July 07, 2018 3:39 PM
To: pyomo-de...@googlegroups.com
Subject: Re: [EXTERNAL] Rethinking deterministic problem file generation in Pyomo


> OrderedDict, which is painfully slow in Python 2.7 [and despite the push for 3.x, 2.7 is still the fastest python environment].

Can you verify that this actually creates a noticeable slowdown for a large model before throwing this idea out? I’ve used OrderedDict by default in Kernel since the beginning and I never noticed any performance issue, but I very likely worked mostly in Python 3.

Gabe

William Hart

unread,
Jul 8, 2018, 1:04:40 AM7/8/18
to pyomo-de...@googlegroups.com
John:

I think that your comments confound sorting in the set representation from sorting in the problem writers. 

As with dictionary ordering, it's trickier to ensure that you can ensure that sorted operations user a custom sorting function, unless you're proposing that we replace the native sorted() function (which seems like a bad idea).

So, I still see the changes to the writers as being a bit orthogonal to the set representation logic.

--Bill
Reply all
Reply to author
Forward
0 new messages