>> On that point, would it make sense to just make npyio an extended wrapper
>> around the io.Reader object?
>
> it already is, kinda.
> (or maybe I am not completely getting your point?)
>
>> API idea (may or may not be any good):
>>
>> Rank() int64
>> Shape([]int64) []int64 //
>> Write(interface{}) (int, error)
>> Err() error
>>
>> Datatype(type interface{}) bool
>>
>> Float64() []float64
>> Float32() []float32
>> Uint64() []uint64
>
> I am not sure I completely get the intended usage of this API.
>
> 1) Rank() int64.
> ok. not sure if we need an int64. surely int is enough. (famous last words?)
>
> 2) Shape([]int64) []int64
> I get the intended return value usage (but, here again, I would just use []int)
> but what is the input slice argument needed for?
>
> 3) Write(interface{}) (int, error)
> so a reader is also a writer?
> is it to support data file updates?
>
> 4) Err() error
> ok.
>
> 5) Datatype(typ interface{}) bool
> what is this meant to do?
> is it to ask npyio if the given typ type is compatible with on-disk data type?
> if so, having a proper DataType interface as I was hinting in my first
> mail would, IMHO, be a better avenue.
> and all the "Foo() []Foo" methods wouldn't be needed.
>
> what do you think?
>
> -s
On Fri, Mar 11, 2016 at 8:45 PM, Kunde21 <kun...@gmail.com> wrote:
>> >> On that point, would it make sense to just make npyio an extended
>> >> wrapper
>> >> around the io.Reader object?
>> >
>> > it already is, kinda.
>> > (or maybe I am not completely getting your point?)
>
>
> By this I mean we would interact with the npyio object by reading and
> writing buffers rather than passing slice/mat64.Dense/numgo.Array64 objects
> into npyio and requiring object-specific read/write logic within the npyio
> library. I haven't tackled slicing within numgo, but that could cause the
> numgo object to read the same buffer (float64 slice) in a different way.
> I'd rather force that logic within numgo than heap that on npyio.
one could indeed imagine having something like that:
func (r *Reader) Reader() io.Reader { /* ... */ }
which would be very similar to the archive/zip.Reader API.
or, departing a bit from zip.Reader:
func (r *Reader) Bytes() []byte { ... }
but I believe having people implement an interface would be better and
more inter-operable.
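to make the "implement an interface" idea concrete, here is a minimal, self-contained sketch. the NDim interface name, its methods, and the Matrix type are all hypothetical illustrations, not part of npyio:

```go
package main

import "fmt"

// Hypothetical sketch: instead of npyio special-casing slices,
// mat64.Dense, numgo.Array64, etc., user types could implement a
// small interface that npyio consumes. Names are illustrative only.
type NDim interface {
	Shape() []int    // dimensions of the data
	Data() []float64 // flat, row-major buffer
}

// A toy user type implementing the interface.
type Matrix struct {
	rows, cols int
	buf        []float64
}

func (m *Matrix) Shape() []int    { return []int{m.rows, m.cols} }
func (m *Matrix) Data() []float64 { return m.buf }

// A writer-side function would only ever see the interface,
// so the per-type read/write logic stays out of npyio.
func describe(v NDim) string {
	return fmt.Sprintf("shape=%v len=%d", v.Shape(), len(v.Data()))
}

func main() {
	m := &Matrix{rows: 2, cols: 3, buf: make([]float64, 6)}
	fmt.Println(describe(m)) // shape=[2 3] len=6
}
```

the slicing logic Kunde21 mentions would then live entirely in numgo's implementation of such an interface.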
>> > 3) Write(interface{}) (int, error)
>> > so a reader is also a writer?
>> > is it to support data file updates?
>
>
> In my mind, this would replace the data buffer in the file. There would be
> a requirement to write the new shape before writing the buffer to the file
> (thus, returning an error if the received buffer is the wrong size). This
> is where I'm not certain about the writer API, ensuring the shape and buffer
> match before the file is closed.
that's why npyio.Write is done in one go.
alternatively, if (e.g.) a dtype-based interface is devised, it could be used.
I suppose it depends on the use case, but I personally don't modify
.npy files; they are mostly write-once, read-many.
if updates were felt to be that important, I would probably err towards
providing a dedicated npyio.Update(w, ptr) function.
>
> With your suggestions, a better API idea might look more like:
>
> Rank() int
> Len
> Shape([]int) []int // Should this force []int64 or []int32 type? The
>   npy file spec isn't very clear, using the hardware int type to read
>   and write the shape buffer.
> Read [ReadBuffer?] (buf interface{}) int // Will this read in chunks
>   (read pointer and Seek functionality) or is it an all-or-nothing read?
> Write [WriteBuffer?] (buf interface{}) int // Will this write in chunks
>   (read pointer + Seek) a la the Read call question?
> Append [AppendBuffer?] (buf interface{}) int
>
> Open(fname string) npyio, error
> Close() error
> Flush() error
> Err() error
I don't know.
this seems to conflate an API for npyio and one for describing n-dim data.
I'd like to try to disentangle the two.
- have, say, a numpy-like dtype type (interface or otherwise) to describe (possibly n-dim) data
- have a way, leveraging this dtype, to seamlessly handle user types (and matrix, n-dim arrays, ... user types) inside npyio (but also other gonum- and non-gonum-related) formats.
dtype could look like:
package dtype

type Type interface {
	Kind() Kind   // reflect.Kind + ndim-array-kind
	Rank() int    // 0: scalar, 1: slice/array, ... (perhaps not needed, as the same info can be obtained from len(Shape()))
	Shape() []int
	// more stuff ?
	// basically reflect.Type without the Func/Method/Interface/Chan support ?
}
On Tue, Mar 22, 2016 at 12:47 AM, Kunde21 <kun...@gmail.com> wrote:
> Ugh, hit post a bit early. Let's try that again.
>
> dtype package would be useful in converting from the numpy-style text to and
> from a readable/writeable interface.
>
> I would see dtype useful as a separate package with an API like:
>
> package dtype
>
> type DTypeElement struct {
> 	Name     string
> 	Type     reflect.Type
> 	Shape    []int
> 	Elements []DTypeElement
> }
>
> type DType struct {
> 	Order    binary.ByteOrder
> 	Elements []DTypeElement
> }
>
> GetType(dtype string) DType
> DTypeOf(interface{}) DType
> (d *DType) String() string
something like that, yes.
but, dtype.GetType(name string) dtype.Type is probably too pythonic.
also, dtype shouldn't be too tied to numpy's dtype.
I'd prefer it to deal only with types, w/o any disk-based
representation considerations (so, no binary.Order) and in effect a
superset of reflect.Type.
IMHO, for dtype.Type to be useful and widely used, it should be
"reflect.Type with support for describing ndim-data."
I think having this:
///
package dtype
type Type interface {
	Shape() []int
	Kind() Kind
	Elem() Type // pointers, arrays, slices, etc...
	NumField() int
	Field(i int) StructField
	// ... like reflect.Type ...
}
///
is workable.
there are of course interesting implementation details I am glossing over.
e.g.:
- what dtype.Type should [2][3][4]int be? especially its Shape() and
  Elem(): []int{2,3,4} and TypeOf(int(0))? or []int{2} and
  TypeOf([3][4]int{})?
- how should "ragged arrays" be represented? should they be
  represented at all? e.g. [][][]int: a Shape() of []int{-1,-1,-1} and
  Elem() of int?
also: I am not clear on what (*NpyFile).Append(buffer interface{})
(int,error) is supposed to do.
is it a method for a .npz file instead of an .npy one (.npz files are
zip-like, with multiple key/values ndim-data. .npy files have only one
ndim-data) ?
or is it really:
data := []float64{0,1,2,3,4}
f, err := npyio.Create("foo.npy")
_, err = f.Write(data)
f.SetShape([]int{len(data)})
data = append(data, 42)
_, err = f.Append(data[5:])
f.SetShape([]int{len(data)})
?
if the latter, then I think this is a different concept that warrants
its own set of interfaces. (and I am not sure the .npy file format
really supports that use case.)
AFAICT, the .npy file format is meant to be used as a pure one-shot
save/load facility.
That's why I "designed" sbinet/npyio the way it is, with top-level
npyio.Write and npyio.Read functions.
thanks for taking the time to discuss these interesting topics.
-s
From a brief Google search (I rarely use NumPy), an .npz is just a ZIP archive containing NumPy data files, so it's merely a matter of composing from the examples in Seb's docs and in pkg/archive/zip: https://golang.org/pkg/archive/zip/#pkg-examples
f, err := npyio.CreateNpz("file.npz")
f.AddNpy(name string, npy npyio.File)
...
f.Close()

f, err := npyio.CreateNpz("file2.npz")
f.Write(map[string]npyio.File{"a": fileA, "b": fileB})
f.Close()

Or read like:

NpyMap, err := npyio.OpenNpz("file.npz")

NpyMap being a simple map[string]npyio.File object.