I have the following map:
/////////
type tids struct {
    taxid int
    name  string
}
type taxon struct {
    *tids
    taxon string
}
type Taxnode struct {
    this     taxon
    parentId int
}
tree := make(map[int]*Taxnode)
//////
It is populated by reading and processing a lot of records from a file (1.4M).
The whole process takes some time. What is the best way to "dump" the data structure (the map) to a file and read it back (as the same data structure) efficiently?
Thanks in advance!
M;
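Would it also be possible to use the "unsafe" package to "map" the
data structure to the file and read it back again?
I understand that this would be very fast.
Trying to figure out if this is possible I have found a post [1] that
does this with structs: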
func struct2bytes(x *myStruct) []byte {
    return (*[unsafe.Sizeof(*x)]byte)(unsafe.Pointer(x))[0:]
}
but I am not able to convert it back:
func bytes2struct (s []byte) *field {
    f := &field{}
    f = (*field)(unsafe.Pointer(&s))
    return f
}
This causes a segfault.
If I can solve this, would it also be possible to use the same
strategy with a map[int]*MyStruct?
Thanks in advance,
M;
[1] http://groups.google.com/group/golang-nuts/browse_thread/thread/85a3306f24a6ebb2
Please try gob and let us know if it's too slow.
You could use unsafe if your data structure had no
pointers in it, but once pointers are involved you can't
just write the memory out to disk and read it back in.
That's a recipe for disaster.
Russ
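For reference, a minimal sketch of what "try gob" could look like for a map of this shape. It assumes the struct fields are exported (gob skips unexported fields, so the lower-case field names from the first post would not be written out) and that the embedded *tids becomes a named field. The sample record and the helper name roundTrip are made up for illustration:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Exported fields only: gob does not encode unexported struct fields.
type Tids struct {
	Taxid int
	Name  string
}

type Taxon struct {
	Tids  *Tids
	Taxon string
}

type Taxnode struct {
	This     Taxon
	ParentId int
}

// roundTrip (hypothetical helper) gob-encodes the tree into memory and
// decodes it into a fresh map. Pointers are followed and their targets
// are written by value, which is what a raw memory dump cannot do.
func roundTrip(tree map[int]*Taxnode) (map[int]*Taxnode, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(tree); err != nil {
		return nil, err
	}
	out := make(map[int]*Taxnode)
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Illustrative record, not taken from the real data files.
	tree := map[int]*Taxnode{
		2: {This: Taxon{Tids: &Tids{Taxid: 2, Name: "Bacteria"}, Taxon: "superkingdom"}, ParentId: 1},
	}
	out, err := roundTrip(tree)
	if err != nil {
		fmt.Println("gob round trip failed:", err)
		return
	}
	fmt.Println(out[2].This.Tids.Name, out[2].ParentId)
}
```

The decoded map holds newly allocated nodes with the same values, so it does not matter that the original addresses are gone.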
> Would it also be possible to use the "unsafe" package to "map" the
> data structure to the file and read it back again?
> I understand that this would be very fast.
>
> Trying to figure out if this is possible I have found a post [1] that
> does this with structs:
>
> func struct2bytes(x *myStruct) []byte {
> return (*[unsafe.Sizeof(*x)]byte)(unsafe.Pointer(x))[0:]
> }
Note that this views the object as an array and then slices the array.
> but I am not able to convert it back:
>
> func bytes2struct (s []byte) *field {
> f := &field{}
> f = (*field)(unsafe.Pointer(&s))
> return f
> }
Here you are trying to view a slice, not an array, as though it were the
object. You need to get to the underlying array, not the slice. In
other words, do this instead:
func bytes2struct (s []byte) *field {
    return (*field)(unsafe.Pointer(&s[0]))
}
Not that I recommend this kind of thing. It works until something,
anything, changes, and then you can't read any of your old data files.
Doing proper serialization and unserialization takes longer, but it
means that you won't lose access to your data.
You said it took "some time" to read 1.4M. It shouldn't take all that
long. How are you reading it? What does "some time" mean to you?
Ian
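To make the array-versus-slice point concrete, here is a small sketch of the round trip with the corrected conversion. It is only meant for a fixed-size struct with no pointers, strings, slices or maps; the field struct and its members below are hypothetical, and the copy into a fresh byte slice stands in for writing to and reading from a file. All of Ian's caveats about this approach still apply:

```go
package main

import (
	"fmt"
	"unsafe"
)

// field is a hypothetical struct containing only fixed-size, pointer-free members.
type field struct {
	ID    int32
	Score float32
}

// fieldSize is the in-memory size of field on this platform.
const fieldSize = int(unsafe.Sizeof(field{}))

// structToBytes views the struct's memory as a byte slice without copying:
// the struct is treated as a byte array, and the array is then sliced.
func structToBytes(x *field) []byte {
	return (*[fieldSize]byte)(unsafe.Pointer(x))[:]
}

// bytesToStruct views the slice's backing array as a struct without copying.
// It takes the address of the first element (&s[0]), not of the slice
// header (&s), and returns nil if the slice is too short.
func bytesToStruct(s []byte) *field {
	if len(s) < fieldSize {
		return nil
	}
	return (*field)(unsafe.Pointer(&s[0]))
}

func main() {
	in := field{ID: 9606, Score: 0.5}
	raw := make([]byte, fieldSize)
	copy(raw, structToBytes(&in)) // stands in for a file write followed by a read
	out := bytesToStruct(raw)
	fmt.Println(out.ID, out.Score)
}
```

The bytes depend on the platform's byte order and on the struct layout, so a file written this way is only readable by a binary with the same layout.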
Protocol Buffers maybe?
Go has support for those.
--
André Moraes
http://andredevchannel.blogspot.com/
> Protocol Buffers maybe?
>
> Go has support for those.
Protocol buffers require you to write a data specification and can only handle structs; on the other hand, they allow interoperability between Go and Java, C++, and Python.
Gobs do not require a data specification and can handle anything except a channel or function, including slices, arrays, and basic types; on the other hand, they are Go-only. They're much easier to use and comparable in performance.
For what you're asking, I think gobs are the better choice but either would work.
-rob
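As a small illustration of the "no data specification" point, the sketch below writes a plain int and a slice of strings back to back on one gob stream and reads them back in the same order, with no struct wrapper or schema. The helper names writeValues and readValues and the sample values are made up for this example:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// writeValues (hypothetical helper) gob-encodes a basic type and a slice,
// one after the other, into a single in-memory stream.
func writeValues(count int, files []string) (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	enc := gob.NewEncoder(buf)
	if err := enc.Encode(count); err != nil {
		return nil, err
	}
	if err := enc.Encode(files); err != nil {
		return nil, err
	}
	return buf, nil
}

// readValues (hypothetical helper) decodes the two values in the order
// they were written.
func readValues(buf *bytes.Buffer) (int, []string, error) {
	dec := gob.NewDecoder(buf)
	var count int
	var files []string
	if err := dec.Decode(&count); err != nil {
		return 0, nil, err
	}
	if err := dec.Decode(&files); err != nil {
		return 0, nil, err
	}
	return count, files, nil
}

func main() {
	buf, err := writeValues(1400000, []string{"nodes.dmp", "names.dmp"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	count, files, err := readValues(buf)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(count, files)
}
```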
Oops... I see, thanks!
>
> Not that I recommend this kind of thing. It works until something,
> anything, changes, and then you can't read any of your old data files.
> Doing proper serialization and unserialization takes longer, but it
> means that you won't lose access to your data.
>
Yes, that's true. Speed is important in this particular case. I will try
with gob and compare the implementations.
> You said it took "some time" to read 1.4M. It shouldn't take all that
> long. How are you reading it? What does "some time" mean to you?
>
It is 1.4M records, not 1.4 MB of data.
Thanks for the help!
M;
> On Fri, May 13, 2011 at 06:56, Miguel Pignatelli
> <miguel.p...@uv.es> wrote:
>> Would it also be possible to use the "unsafe" package to "map" the data
>> structure to the file and read it back again?
>> I understand that this would be very fast.
>
> Please try gob and let us know if it's too slow.
Using gob I am able to speed up the process ~2.6x.
A good improvement without much work.
Cheers
M;
What is this compared to?
The original idea was to speed up building a tree (based on a map) from data stored in 2 text files (80 MB of data). So the idea is to build the map (tree) once and dump the data structure to a file, then load the generated file/tree when it is needed (without preprocessing).
These are the numbers (some code below):
$ ./test_IOtree
Parsed the tree in 8.07656 seconds
30216971 bytes successfully written to file
tree written to file in 3.47415 seconds
tree read from file in 3.06392 seconds
This speeds up the process by a factor of ~2.6x (based on several runs).
I know this is only useful in this particular context.
More generally, taking more than 3 seconds to dump 29 MB of data with gob doesn't seem very fast to me. I am aware that the Go developers are not currently optimizing the code, so: i) for now I am happy with this 2.6x speedup; ii) I expect the I/O to become more efficient with time.
The code....
type tids struct {
    Taxid int
    Name  string
}
type taxon struct {
    Tids  *tids
    Taxon string
}
type Taxnode struct {
    This     taxon
    ParentId int
}
type taxonomy map[int]*Taxnode
func New(nodes, names string) taxonomy {
    /////// Function to build the tree from the files nodes and names
}
func (t taxonomy) Store (fname string) os.Error {
    b := new(bytes.Buffer)
    enc := gob.NewEncoder(b)
    err := enc.Encode(t)
    if err != nil {
        return err
    }
    // O_TRUNC so that an older, longer file does not leave stale bytes behind
    fh, eopen := os.OpenFile(fname, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0666)
    if eopen != nil {
        return eopen
    }
    // defer the Close only after the open is known to have succeeded
    defer fh.Close()
    n,e := fh.Write(b.Bytes())
    if e != nil {
        return e
    }
    fmt.Fprintf(os.Stderr, "%d bytes successfully written to file\n", n)
    return nil
}
func Load (fname string) (taxonomy, os.Error) {
    fh, err := os.Open(fname)
    if err != nil {
        return nil, err
    }
    // close the file once decoding is done
    defer fh.Close()
    t := make(taxonomy)
    dec := gob.NewDecoder(fh)
    err = dec.Decode(&t)
    if err != nil {
        return nil, err
    }
    return t, nil
}
func main() {
    nodes := "/Users/pignatelli/src/tests/nodes.dmp"
    names := "/Users/pignatelli/src/tests/names.dmp"
    s1 := time.Nanoseconds()
    newtax := New(nodes, names)
    s2 := time.Nanoseconds()
    if newtax == nil {
        fmt.Fprintf(os.Stderr,"the map is empty\n")
    }
    // float64 rather than float32: nanosecond counts exceed float32 precision
    fmt.Printf("Parsed the tree in %.5f seconds\n", float64(s2-s1)/1e9)
    s1 = time.Nanoseconds()
    e := newtax.Store("file.bin")
    s2 = time.Nanoseconds()
    if e != nil {
        fmt.Println(e)
        os.Exit(1)
    }
    fmt.Printf("tree written to file in %.5f seconds\n", float64(s2-s1)/1e9)
    s1 = time.Nanoseconds()
    _,err := Load("file.bin")
    s2 = time.Nanoseconds()
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    fmt.Printf("tree read from file in %.5f seconds\n", float64(s2-s1)/1e9)
}
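The code above uses os.Error and time.Nanoseconds, which predate Go 1. For anyone trying this with a current toolchain, a rough sketch of the same Store/Load pair follows. It uses the built-in error type and time.Since, and it streams the encoding to the file through bufio instead of building the whole output in a bytes.Buffer first. Whether that changes the timings for this data set is an assumption that has not been measured here. The sample record and the temporary file name are made up:

```go
package main

import (
	"bufio"
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

type tids struct {
	Taxid int
	Name  string
}

type taxon struct {
	Tids  *tids
	Taxon string
}

type Taxnode struct {
	This     taxon
	ParentId int
}

type taxonomy map[int]*Taxnode

// Store streams the gob encoding of t to fname through a buffered writer.
func (t taxonomy) Store(fname string) (err error) {
	fh, err := os.Create(fname) // creates the file or truncates an existing one
	if err != nil {
		return err
	}
	defer func() {
		// Report a failed Close unless an earlier error is already set.
		if cerr := fh.Close(); err == nil {
			err = cerr
		}
	}()
	w := bufio.NewWriter(fh)
	if err = gob.NewEncoder(w).Encode(t); err != nil {
		return err
	}
	return w.Flush()
}

// Load decodes a taxonomy previously written by Store.
func Load(fname string) (taxonomy, error) {
	fh, err := os.Open(fname)
	if err != nil {
		return nil, err
	}
	defer fh.Close()
	t := make(taxonomy)
	if err := gob.NewDecoder(bufio.NewReader(fh)).Decode(&t); err != nil {
		return nil, err
	}
	return t, nil
}

func main() {
	// Illustrative one-node tree; the real one comes from New(nodes, names).
	tree := taxonomy{1: {This: taxon{Tids: &tids{Taxid: 1, Name: "root"}, Taxon: "no rank"}, ParentId: 1}}
	fname := filepath.Join(os.TempDir(), "taxonomy-sketch.gob") // hypothetical file name

	start := time.Now()
	if err := tree.Store(fname); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Printf("tree written to file in %.5f seconds\n", time.Since(start).Seconds())

	start = time.Now()
	loaded, err := Load(fname)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Printf("tree read from file in %.5f seconds (%d nodes)\n", time.Since(start).Seconds(), len(loaded))
}
```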