NMatrix constructor refactoring


John Woods

Aug 23, 2013, 1:56:43 PM
to sciru...@googlegroups.com
The NMatrix constructor is up for refactoring. Clearly no one but me uses the current initialize, which is:

NMatrix.new(stype, shape, c, dtype)

- with optional stype defaulting to :dense, and
- with optional c as the initial capacity for yale, the default value for a list matrix, or an initial array or value for dense, and
- with dtype optional for dense or list when c is provided, and never optional for yale.

This has the following problems:

- It confuses new users that c can mean so many things.
- It's slow.
- It's currently not possible to control Yale's default value, except via #cast, but it should be.

I propose that we rewrite this constructor to be as clear and fast as possible and then make use of it in each of the shortcuts (zeros, ones, N[], etc.).

Here are the parameters that should be provided:

- stype mandatory
- initial_capacity for Yale (technically can be thought of as 0 for list and #size for dense)
- default_value (optional; 0 for yale/list, 0 for :object or :rational dense, undefined for other dense)
- default_array (allows us to pass in an Array of initial values to be repeated throughout the matrix; overrides default_value for dense)
- dtype (can be extrapolated from default_value/default_array)
- initial_capacity (optional and ignored for non-Yale types)
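The defaulting rules in the list above can be sketched in plain Ruby. This is only an illustration under stated assumptions: resolve_constructor_defaults is a hypothetical helper, not part of NMatrix, and the mapping from Ruby values to dtype symbols is a guess at what "extrapolated" might mean.

```ruby
# Hypothetical helper, NOT part of NMatrix: applies the defaults listed above.
def resolve_constructor_defaults(stype:, dtype: nil, default_value: nil, default_array: nil)
  # Extrapolate dtype from the first available initial value when none is given.
  # The value-to-dtype mapping below is an assumption made for this sketch.
  sample = default_array ? default_array.first : default_value
  dtype ||= case sample
            when Integer  then :int64
            when Float    then :float64
            when Rational then :rational128
            when nil      then raise ArgumentError, "dtype is required without an initial value"
            else :object
            end

  # 0 for yale/list and for :object or :rational dense; left undefined (nil) otherwise.
  if default_value.nil?
    zero_filled = stype != :dense || dtype == :object || dtype.to_s.start_with?("rational")
    default_value = 0 if zero_filled
  end

  { stype: stype, dtype: dtype, default_value: default_value, default_array: default_array }
end

resolve_constructor_defaults(stype: :yale, dtype: :float64)[:default_value]  # => 0
resolve_constructor_defaults(stype: :dense, default_value: 5.0)[:dtype]      # => :float64
```

Keeping this resolution in one place would let the shortcuts (zeros, ones, N[]) share it, whichever argument style the constructor ends up with.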

I suspect an options hash will be slow, but would probably be the clearest way to set all of these options. I want fast. I'm tempted to just ask for all six values and ignore those that are irrelevant.

We'll probably still allow the old type of construction too, for backwards compatibility.

Thoughts?

John


John Prince

Aug 23, 2013, 4:08:36 PM
to SciRuby
I think this is generally a good idea, although it is not exactly simple.  Still, simple is what users will want, generally.

One paradigm that I've seen work well (and this is what NArray and Array do) is to use NMatrix.new to focus on initializing objects in a very complete way.  Then, you can use the method NMatrix[ ] as a convenience method, mainly for converting over arrays with code written by hand.

There has been some debate over the use of the 'N' shortcut.  On one hand, I really like it for brevity.  I suspect that NMatrix eventually is going to be used *a lot* and it should be trivial to create numeric arrays.  Still, new users may be confused by what the 'N' represents, or they may want to use 'N' for some other designation.  I propose this solution (sort of an amalgamation of all the above thoughts):

1. NMatrix.new focuses on initializing empty arrays and having a fast, complete interface for doing so.  It is the rough equivalent to NArray.new and Array.new.  I think we should study Array.new and NArray.new for inspiration.

# create an 8 row, 6 column matrix initialized to 5.0
# (this still needs development, but the current .new method is a good start)
NMatrix.new( [8, 6], 5.0) 

2. The NMatrix[ ] method is the de facto convenience method for creating arrays easily.  This is equivalent to the NArray[ ] method, the '[]' method (which makes a Ruby array), and the numpy array() function.  It does not aim to provide a complete interface, or even to be fast.  It is merely the simplest method to create numeric arrays.

NMatrix[ [3,4,5], [5,3,2] ]  # create a 2 row, 3 column matrix
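
A bracket-style convenience constructor like the one above can be sketched as a class-level [] method that infers the shape from the nesting. BracketMatrix is a hypothetical stand-in used only for illustration; it is not how NMatrix[] is actually implemented.

```ruby
# Hypothetical stand-in for NMatrix; only illustrates the bracket constructor.
class BracketMatrix
  attr_reader :shape, :elements

  def initialize(shape, elements)
    @shape = shape
    @elements = elements
  end

  # BracketMatrix[[3, 4, 5], [5, 3, 2]] infers [rows, columns] from the nesting.
  def self.[](*rows)
    if rows.all? { |r| r.is_a?(Array) }
      widths = rows.map(&:size).uniq
      raise ArgumentError, "rows must have equal length" unless widths.size == 1
      new([rows.size, widths.first], rows.flatten)
    else
      # In this sketch a flat list becomes a single-row matrix.
      new([1, rows.size], rows)
    end
  end
end

m = BracketMatrix[[3, 4, 5], [5, 3, 2]]
m.shape     # => [2, 3]
m.elements  # => [3, 4, 5, 5, 3, 2]
```

The convenience method only works out the arguments and then hands off to new, so the complete constructor remains the single place where storage is set up.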

3. N[ ] can be used by requiring one single file, which also requires nmatrix.  Alternatively, the user could add the line "N = NMatrix" to their code.  It now becomes explicit (since the require makes it obvious what the 'N' is) but it also makes it really easy to get up and running with the N notation (which is for sure popular, or will be, with at least a subset of the community).  Here's how it would look:

require 'nmatrix/N'  # merely calls "require 'nmatrix'" and "N = NMatrix"
N[ 1, 2, 3 ]
N.ones( 8, 4 ) 
...
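
As a rough illustration of why the one-line alias is enough: assigning a class to a second constant gives both names the same class object, so every class method is reachable through the short name. AliasDemoMatrix is a hypothetical stand-in so the sketch runs without nmatrix installed.

```ruby
# Hypothetical stand-in class; with the real library the alias would be N = NMatrix.
class AliasDemoMatrix
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def self.[](*values)
    new(values)
  end

  def self.ones(rows, cols)
    new(Array.new(rows * cols, 1))
  end
end

# The alias is just a second constant pointing at the same class object.
N = AliasDemoMatrix

a = N[1, 2, 3]
b = N.ones(8, 4)
a.values       # => [1, 2, 3]
b.values.size  # => 32
a.class        # => AliasDemoMatrix
```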

This doesn't really solve the interface rewrite issue (I don't address dense/yale, etc), but it provides some separation of responsibilities and may make it easier to do so.  If we can achieve simple, fast, and complete all in one interface, then we could certainly set NMatrix.new to equal NMatrix[] and that would be great (have cake and eat it too).  However, it may be useful to at least consider these approaches separately for a time.

One other note on this:

> We'll probably still allow the old type of construction too, for backwards compatibility.

My personal thinking on this is that in a pre-beta release we shouldn't worry too much about backwards compatibility.  Maybe just rename your current new method to something like oldnew so it is easy to keep things running (i.e., find and replace most things).  Then we can gradually go back in and phase out oldnew as needed.




Carlos Agarie

Aug 28, 2013, 9:48:12 PM
to sciru...@googlegroups.com
I like the idea of refactoring it if we can really improve its speed. Its simplicity (for a newcomer to NMatrix) isn't really that important, because of the MATLAB-inspired shortcuts and NMatrix[]/N[].

And... we're not even on 0.1.0 yet, so backwards compatibility shouldn't be an issue anyway. Let's break things! :)


-----
Carlos Agarie
Software Engineer @ Geekie (geekie.com.br)
@carlos_agarie



Ryan Taylor

Sep 3, 2013, 6:53:50 PM
to sciru...@googlegroups.com
As a user of the current initializer, I believe that simple and intuitive is the Ruby way.   http://rubyhacker.com/coralbook/intro.html#what 

Right now, I think the standard initializer should contain few if any explicit requirements.  Personally, I would love to have to specify only the shape.  I think all the others can be declared explicitly as options.  Ruby 2.0 has some beautiful simplicity (keyword arguments) we could definitely leverage here.  That makes it easy to allow the stype and dtype declarations in the same initializer, while also providing the parsed version we are discussing ( N[...] ) for ease of declaration, primarily for use in irb and the like (at least in my experience).

Currently, I like that default values do not require formatting, but can be dumped as an array to the initial value argument.  

I would love to see this then: 

NMatrix.new [4,5], [1,2,3,4,5,6,7,8,9, ...] 

and this

NMatrix.new [1,2]

and this 

NMatrix.new [4,5], stype: :yale, dtype: :float32

Reasonable defaults are great, and provide the simplicity we want to offer a new user (or why would they choose NMatrix over SciPy or the default Matrix class??) without sacrificing the power of options for the advanced user.  
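
The three calls above could all be served by one initializer that uses Ruby 2.0 keyword arguments. The following is a minimal sketch under stated assumptions: KeywordMatrix is a hypothetical stand-in, and the defaults (dense storage, dtype guessed from the first value) are illustrative choices rather than NMatrix's behavior.

```ruby
# Hypothetical stand-in for NMatrix showing a keyword-argument initializer.
class KeywordMatrix
  attr_reader :shape, :initial, :stype, :dtype

  # Only the shape is required; everything else has a default.
  def initialize(shape, initial = nil, stype: :dense, dtype: nil)
    @shape = shape
    @initial = initial
    @stype = stype
    @dtype = dtype || guess_dtype(initial)
  end

  private

  # Assumed rule for this sketch: look at the first initial value, if any.
  def guess_dtype(initial)
    sample = initial.is_a?(Array) ? initial.first : initial
    case sample
    when Integer then :int64
    when Float   then :float64
    else :object # also used when no initial value is given
    end
  end
end

a = KeywordMatrix.new([4, 5], [1, 2, 3, 4, 5, 6, 7, 8, 9])
b = KeywordMatrix.new([1, 2])
c = KeywordMatrix.new([4, 5], stype: :yale, dtype: :float32)
a.dtype  # => :int64
b.stype  # => :dense
c.stype  # => :yale
```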

I don't think the default value should be required, nor should its behavior change with other options (if at all possible) because Ruby is for the sanity of the programmer, right?

Just my 2¢...

Ryan

John Woods

Sep 16, 2013, 8:20:02 PM
to sciru...@googlegroups.com
I'm finally crunching through this email, and it's got some good points in it.

I, too, agree that we need to be able to do:

NMatrix.new [4,5], [1,2,3,4,5,6,7,8,9, ...] 

or something like it -- but that's part of the problem. This is a great c'tor for Dense. But it's a pretty terrible one for Yale or List, because you *don't* want to store that entire initial Array at any point during the construction of those sparse types.

I think what it's going to come down to is having a "flat" c'tor where all options must be provided -- and a Hash c'tor where args have defaults. But even figuring out the proper order of options for the flat c'tor is kind of complicated.
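
One way to picture the flat/Hash split is a strict positional constructor plus a keyword-based wrapper that fills in defaults and delegates to it. This is a sketch only: TwoCtorMatrix, the argument order, and the default values are all assumptions, not the constructor that was eventually written.

```ruby
# Hypothetical stand-in; the argument order and defaults are assumptions.
class TwoCtorMatrix
  attr_reader :stype, :shape, :dtype, :default_value, :capacity

  # "Flat" constructor: every argument is required, in a fixed order.
  def initialize(stype, shape, dtype, default_value, capacity)
    @stype = stype
    @shape = shape
    @dtype = dtype
    @default_value = default_value
    @capacity = capacity
  end

  # Hash-style constructor: fills in defaults, then calls the flat one.
  def self.create(shape, stype: :dense, dtype: :float64, default_value: 0, capacity: nil)
    # Dense capacity is the full size; sparse types start empty in this sketch.
    capacity ||= stype == :dense ? shape.inject(:*) : 0
    new(stype, shape, dtype, default_value, capacity)
  end
end

dense = TwoCtorMatrix.create([4, 5])
yale  = TwoCtorMatrix.create([4, 5], stype: :yale, capacity: 8)
dense.capacity  # => 20
yale.capacity   # => 8
```

With this arrangement the performance-sensitive path never touches a Hash, while casual callers still get defaults.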

John


Ryan Taylor

Sep 17, 2013, 5:41:01 PM
to sciru...@googlegroups.com
While the complexity of NMatrix requires the consideration of yale and dense and list separately, I think that for purposes of the constructor, it can be adequate to consider them as options of each other.

I don't think it is a horrible idea to store (or pass a pointer to?) the given initial values array.  Maybe I don't understand why it would be bad, but it can't be that big of an initial values array, and clearly it is already in the code somewhere, so keeping it around for long enough to reference the values from it shouldn't cause major problems.

I think that I should be able to say 
NMatrix.new [4,5], :yale, [1,0,0,0,0,0,0,0,0,0,0,2] 
and have a formed :yale NMatrix returned to me.  

While the internal complexities are there, I don't think the user should ever HAVE to see them.

Perhaps the .new constructor can accept a :sparse_initial_values term which would contain something like  [v1, i1, j1, v2, i2, j2, ...] referenced values to build a :yale matrix.  That might be a good model for sparse datapoint entry, but that doesn't have to be part of the default constructor behavior.  I think, to be clean, it really does have to be a different input, or perhaps constructor.
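
To make the [v1, i1, j1, v2, i2, j2, ...] idea concrete, here is a rough sketch that turns such a flat list into compressed-sparse-row arrays without ever building the dense array. triples_to_csr is a hypothetical helper, and the plain CSR layout used here is an assumption that may differ from NMatrix's actual Yale storage.

```ruby
# Hypothetical helper: converts [v1, i1, j1, v2, i2, j2, ...] into CSR-style arrays.
def triples_to_csr(shape, flat_triples)
  unless (flat_triples.size % 3).zero?
    raise ArgumentError, "expected a multiple of three entries"
  end

  rows, cols = shape
  # Sort by (row, column) so each row's entries end up contiguous.
  entries = flat_triples.each_slice(3).sort_by { |_v, i, j| [i, j] }

  values = []
  col_indices = []
  row_ptr = Array.new(rows + 1, 0)

  entries.each do |v, i, j|
    in_bounds = (0...rows).cover?(i) && (0...cols).cover?(j)
    raise IndexError, "(#{i}, #{j}) is outside #{rows}x#{cols}" unless in_bounds
    values << v
    col_indices << j
    row_ptr[i + 1] += 1 # per-row counts for now
  end

  # A prefix sum turns the per-row counts into row start offsets.
  (1..rows).each { |r| row_ptr[r] += row_ptr[r - 1] }

  { values: values, col_indices: col_indices, row_ptr: row_ptr }
end

csr = triples_to_csr([4, 5], [1, 0, 0, 2, 2, 1])
csr[:values]       # => [1, 2]
csr[:col_indices]  # => [0, 1]
csr[:row_ptr]      # => [0, 1, 1, 2, 2]
```

Only the two stored entries are ever held in memory, which is the property a dense initial-values array cannot offer for large sparse matrices.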

Just my musings on the topic... I just don't think we have to hamper the simplicity of the constructor to handle the sparse data entry.  It is my understanding that the majority of the methods we have will only work for dense anyway, so it is a reasonable default (primary) behavior to present :dense as the first interface.  I think a sparse constructor option would be great for those cases when it does make more sense.  

Also, I think that we should try to use a single constructor, just by passing defaults.  There might be sets of required parameters, like :yale or :list as :stype if you want to pass the :sparse_values array, but that is the beauty of named parameters, and beautiful code makes happy users?  

John Woods

Sep 17, 2013, 7:17:02 PM
to sciru...@googlegroups.com
I actually hammered out a constructor today. It's been pushed, along with all the necessary spec changes. I'm working on documentation right now. I think it addresses all of the concerns people raised and incorporates a lot of the various ideas folks suggested.

John