Multi-Dimensional Arrays


Howard Redway

May 2, 2014, 6:46:23 AM
to liam...@googlegroups.com

This post raises two separate issues:

1. A Problem with the Use of Boolean Variables in Arrays and the Benefits of Booleans

2. The Relevance of the Variable Names in Multi-Dimensional Array Files


1. A Problem with the Use of Boolean Variables in Arrays and the Benefits of Booleans

I have a problem when I try to use a multi-dimensional global array where one dimension is an integer and the other a Boolean variable.   

I have reduced my problem to a simple two-dimensional test.

The array file imported to global array Array_Probability is

SEX,INWORK,
,False,True
1,0.10,0.90
2,0.20,0.80

The array is referenced in the following process (note SEX is an integer taking values 1,2):

entities:
    PA1:
        processes:
            LM03_OUTOFWORK_Array:
                - ASSIGN1: Array_Probability[SEX-1,FULLYR]


This gives the error message

- 6/6 LM03_OUTOFWORK_Array ... 

    * ASSIGN1 

ERROR: shape mismatch: objects cannot be broadcast to a single shape

If I change the FULLYR variable from Boolean to a 0,1 variable the same process runs without errors and produces the expected results.

Can anyone see an error in the Boolean version of my test?

In section 8.1 of the User Guide (LIAM2 0.8.0) there is an example of an array with a mix of integer and Boolean dimensions, although this is in the context of alignment arrays.

gender | work | civilstate | | |
| | 1 | 2 | 3 | 4
False | False | 5313 | 1912 | 695 | 1222
False | True | 432 | 232 | 51 | 87
True | False | 4701 | 2185 | 1164 | 1079
True | True | 369 | 155 | 101 | 116

Are there any memory or runtime performance benefits of using a Boolean variable rather than an integer 0,1?  

I have not been able to find any on the internet.

https://docs.python.org/2.3/whatsnew/section-bool.html states that “Python's Booleans were added with the primary goal of making code clearer”.

http://legacy.python.org/dev/peps/pep-0285/ gives no indication of any performance benefits.

Unless there is an error in my code or clear advantages in using Boolean variables I plan to convert all the Boolean variables to integers.  This gives me the flexibility to use these with other integer variables to access arrays should this be a requirement at a future date.


2. The Relevance of the Variable Names in Multi-Dimensional Array Files

Except for arrays with a period column, the index value of each dimension is used to access the data.  So, in the Array_Probability example above, could I access the data with Array_Probability[x,y], where x and y are any integers taking the values 0 or 1?


Gaëtan de Menten

May 7, 2014, 7:23:53 AM
to liam...@googlegroups.com
On 02/05/2014 12:46, Howard Redway wrote:
> This post raises two separate issues:
>
> 1.A Problem with the Use of Boolean Variables in Arrays and the Benefits
> of Booleans
>
> 2.The Relevance of the Variable Names in Multi-Dimensional Array Files
>
>
> 1.A Problem with the Use of Boolean Variables in Arrays and the Benefits
> of Booleans
>
> I have a problem when I try to use a multi-dimensional global array
> where one dimension is an integer and the other a Boolean variable.
>
> I have reduced my problem to a simple two dimension test.
>
> The array file imported to global array Array_Probability is
>
> SEX,INWORK,
>
> ,False,True
>
> 1,0.10,0.90
>
> 2,0.20,0.80
>
> The array is referenced in the following process (note SEX is an integer
> taking values 1,2):
>
> entities:
>
> PA1:
>
> processes:
>
> LM03_OUTOFWORK_Array:
>
> - ASSIGN1: Array_Probability[SEX-1,FULLYR]
>
>
> This gives the error message
>
> - 6/6 LM03_OUTOFWORK_Array ...
>
> * ASSIGN1
>
> ERROR: shape mismatch: objects cannot be broadcast to a single shape
>
> If I change the FULLYR variable from Boolean to a 0,1 variable the same
> process runs without errors and produces the expected results.
>
> Can anyone see an error in the Boolean version of my test?

Well, the "indexing" operation ([]) currently only supports integer
indices. If you have a boolean variable, you have to convert it to
integers before you use it to index an array. There are many ways to do
so (none of them really pretty). bool_expr + 0 is probably the fastest
way to do it currently.
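
As a rough illustration of the point above, here is a minimal sketch that assumes the global array behaves like a NumPy array (the error message suggests this, but the thread does not state it). The per-individual columns `sex` and `fullyr` are made-up data. A boolean array used as an index is interpreted as a mask rather than as positions 0/1, so it cannot be combined with the integer index; adding 0 turns it into integers first. The exact exception type may differ between NumPy versions.

```python
import numpy as np

# Rows: SEX 1, 2 (positions 0, 1); columns: INWORK False, True (positions 0, 1).
array_probability = np.array([[0.10, 0.90],
                              [0.20, 0.80]])

# Hypothetical per-individual columns, for illustration only.
sex = np.array([1, 2, 1, 2])
fullyr = np.array([True, False, True, True])

# A boolean array in an index is treated as a mask, not as positions 0/1,
# so mixing it with an integer index fails.
try:
    array_probability[sex - 1, fullyr]
    bool_index_failed = False
except (IndexError, ValueError):
    bool_index_failed = True

# Adding 0 converts the booleans to integers 0/1, which are valid positions.
probs = array_probability[sex - 1, fullyr + 0]
print(bool_index_failed, probs)
```

In LIAM2 syntax the equivalent workaround would be Array_Probability[SEX-1, FULLYR + 0].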

> In section 8.1 of the User Guide (LIAM2 0.8.0) there is an example of an
> array with a mix of integer and Boolean dimension, although this is in
> the context of alignment arrays.
>
> gender | work | civilstate | | |
>
> | | 1 | 2 | 3 | 4
>
> False | False | 5313 | 1912 | 695 | 1222
>
> False | True | 432 | 232 | 51 | 87
>
> True | False | 4701 | 2185 | 1164 | 1079
>
> True | True | 369 | 155 | 101 | 116
>
> Are there any memory or runtime performance benefits of using a Boolean
> variable rather than an integer 0,1?

There are both, plus a readability benefit:

* memory: a bool uses 1 byte whereas an integer uses 4
* runtime: with integers you need extra operations in many places. For example,
if(inwork and married, x, y) is two operations (and and if) but
if(inwork == 1 and married == 1, x, y) is four (two ==, and, if).
This is because all boolean operators (and, or and not) as well as the
"if" function require booleans, not integers.
* readability: if(inwork and not married, x, y) is a lot more readable
than if(inwork == 1 and married == 0, x, y)

That said, numerical operations like == are so fast in comparison with
some other operations like alignment that I would not worry about the
runtime if I were you. Memory might be an issue depending on your model,
but readability is certainly the biggest issue for me.
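
The memory and operation-count points can be checked with a small sketch. It uses plain NumPy as a stand-in for the LIAM2 expressions (an assumption: `np.where` plays the role of "if" and `&` the role of "and"), with a made-up population size and 32-bit integers for the 0/1 version.

```python
import numpy as np

n = 1_000_000  # hypothetical population size

# Memory: one byte per boolean versus four per 32-bit integer.
inwork_bool = np.ones(n, dtype=bool)
inwork_int = np.ones(n, dtype=np.int32)
bool_bytes = inwork_bool.nbytes
int_bytes = inwork_int.nbytes

married_bool = np.zeros(n, dtype=bool)
married_int = np.zeros(n, dtype=np.int32)
x, y = 1.0, 0.0

# Booleans: two operations (and, if).
from_bools = np.where(inwork_bool & married_bool, x, y)

# Integers: four operations (two ==, and, if), same result.
from_ints = np.where((inwork_int == 1) & (married_int == 1), x, y)

print(bool_bytes, int_bytes)
```

Both expressions give the same values; the integer version only costs the two extra comparisons and the larger columns.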

> I have not been able to find any on the internet.
>
> https://docs.python.org/2.3/whatsnew/section-bool.html states that
> “Python's Booleans were added with the primary goal of making code clearer”.
>
> http://legacy.python.org/dev/peps/pep-0285/ gives no indication of any
> performance benefits.
>
> Unless there is an error in my code or clear advantages in using Boolean
> variables I plan to convert all the Boolean variables to integers. This
> gives me the flexibility to use these with other integer variables to
> access arrays should this be a requirement at a future date.

Well this is something we are currently working on. More specifically,
we are creating a new "n-dimensional array" class. When it is done, it
will be used in several internal projects here at the BFP and for Liam2.
I will make sure to find an acceptable solution for this problem. I
think the best way would be to introduce a specific method to index the
array instead of using the [] operator, which already has a precise
meaning in the libraries we use. So you would have to write something like:

- ASSIGN1: Array_Probability.lookup(SEX, FULLYR)

This might take a few months before we get to that point though.
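
To give an idea of what such a label-based lookup could do, here is a purely hypothetical sketch. The class `LabeledArray` and its `lookup` method are invented for illustration and are not the LIAM2 API or the planned n-dimensional array class. It assumes each dimension carries the labels from the array file (1, 2 for SEX and False, True for INWORK) and translates labels to positions before indexing.

```python
import numpy as np

class LabeledArray:
    """Hypothetical helper, NOT the LIAM2 API: an array with labelled axes."""

    def __init__(self, data, labels):
        self.data = np.asarray(data)
        # One {label: position} mapping per dimension.
        self.positions = [
            {label: i for i, label in enumerate(dim)} for dim in labels
        ]

    def lookup(self, *keys):
        # Translate each column of labels into integer positions, then index.
        index = tuple(
            np.array([pos[k] for k in np.asarray(col).tolist()])
            for pos, col in zip(self.positions, keys)
        )
        return self.data[index]

array_probability = LabeledArray(
    [[0.10, 0.90], [0.20, 0.80]],
    labels=[[1, 2], [False, True]],
)

# Made-up per-individual columns.
sex = np.array([1, 2, 1, 2])
fullyr = np.array([True, False, True, True])

# No SEX-1 and no bool-to-int conversion needed: labels are used directly.
result = array_probability.lookup(sex, fullyr)
print(result)
```

With labels, the caller no longer has to know that SEX 1 sits at position 0 or that True sits at position 1.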

> 2.The Relevance of the Variable Names in Multi-Dimensional Array Files
>
> Except for arrays with a period column the index value of the dimension
> is used to access the data.

Indeed.

> So in the example of Array_Probability
> array above could I access the data by Array_Probability[x,y] where x
> and y are any 0,1 integers?

Of course you can. x and y can be any expression returning valid indices
for your array.

Hope it helps,
Gaëtan