Help with Dataframes


Vikram Rawat

May 29, 2017, 7:23:51 AM
to golang-nuts
Can anybody please tell me how to write GOTA Golang dataframes to a CSV file?

I have been trying for 2 days to find a way to write dataframes to a CSV file. Can anybody please help me understand what io.Writer means and how to use it?

I have given up on understanding it...

Any help will be appreciated.

Jesper Louis Andersen

May 29, 2017, 7:33:00 AM
to Vikram Rawat, golang-nuts
Don't give up! When things become too daunting, go do something else, then come back later. Brains need some processing time.

Your post suggests that you are missing some background information and that you are plunging into deep waters. io.Writer is an interface, which is a concept central to Go. Make sure you have a good understanding of interfaces first. io.Writer is an abstraction over something you can write to, so it generalizes files, memory buffers, network sockets and so on.
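To make the "abstraction over something you can write to" point concrete, here is a minimal sketch. The helper names writeGreeting and greetingString are made up for illustration; only io.Writer, bytes.Buffer, fmt.Fprintf and os.Stdout come from the standard library. The same function writes to the terminal and to a memory buffer without knowing which one it was given.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// writeGreeting is a hypothetical helper. It only requires that w has a
// Write method, so it works with files, buffers, sockets and so on.
func writeGreeting(w io.Writer, name string) error {
	_, err := fmt.Fprintf(w, "hello, %s\n", name)
	return err
}

// greetingString sends the same output to an in-memory buffer instead
// and returns what was written.
func greetingString(name string) (string, error) {
	var buf bytes.Buffer
	if err := writeGreeting(&buf, name); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// os.Stdout is an *os.File, which implements io.Writer.
	if err := writeGreeting(os.Stdout, "terminal"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// A *bytes.Buffer implements io.Writer too.
	s, err := greetingString("buffer")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(s)
}
```

A function that accepts an io.Writer, such as a CSV writer, can be handed the result of os.Create in exactly the same way.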

GOTA dataframes seem to have some packages written for them already, so if the format is complex it is perhaps better to use a library that already exists.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sebastien Binet

May 29, 2017, 7:33:54 AM
to Vikram Rawat, golang-nuts
Vikram,

On Mon, May 29, 2017 at 1:23 PM, Vikram Rawat <vikram...@gmail.com> wrote:
Can anybody please tell me how to write GOTA Golang dataframes to a CSV file?

I have been trying for 2 days to find a way to write dataframes to a CSV file. Can anybody please help me understand what io.Writer means and how to use it?

I am not an expert wrt gota/dataframe but here is what I got:

$> go run ./main.go 
$> cat out.csv 
COL.1,COL.2,COL.3
b,1,3.000000
a,2,4.000000

with the following main.go file:
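package main

import (
	"log"
	"os"

	// Import paths assumed from the godoc link given elsewhere in this thread.
	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	// Build a small dataframe with one string, one int and one float column.
	df := dataframe.New(
		series.New([]string{"b", "a"}, series.String, "COL.1"),
		series.New([]int{1, 2}, series.Int, "COL.2"),
		series.New([]float64{3.0, 4.0}, series.Float, "COL.3"),
	)

	// os.Create returns an *os.File, which satisfies io.Writer.
	o, err := os.Create("out.csv")
	if err != nil {
		log.Fatal(err)
	}
	// Safety net for early returns; the explicit Close below reports errors.
	defer o.Close()

	err = df.WriteCSV(o)
	if err != nil {
		log.Fatal(err)
	}

	// Close explicitly so that a failed flush to disk is not silently ignored.
	err = o.Close()
	if err != nil {
		log.Fatal(err)
	}
}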

here is the doc+examples for gota/dataframe:

hth,
-s

Ayan George

May 29, 2017, 7:42:30 AM
to golan...@googlegroups.com


On 05/29/2017 07:23 AM, Vikram Rawat wrote:
> Can anybody please tell me

io.Writer is an interface that matches any concrete type that implements
the Write() method:

https://golang.org/pkg/io/#Writer
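As a small illustration of "any concrete type that implements the Write() method", the sketch below defines its own writer. The countingWriter type and the countBytes helper are hypothetical names invented for this example; nothing here comes from gota. The type never mentions io.Writer, yet it can be assigned to one because it has a matching Write method.

```go
package main

import (
	"fmt"
	"io"
)

// countingWriter is a hypothetical type for illustration. It discards the
// data and only remembers how many bytes it was asked to write.
type countingWriter struct {
	n int
}

// Write has the signature required by io.Writer, which is all it takes
// for *countingWriter to satisfy the interface.
func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += len(p)
	return len(p), nil
}

// countBytes writes s through the io.Writer interface and reports how many
// bytes reached the underlying countingWriter.
func countBytes(s string) int {
	cw := &countingWriter{}
	var w io.Writer = cw // compiles only because *countingWriter has Write
	if _, err := io.WriteString(w, s); err != nil {
		return -1
	}
	return cw.n
}

func main() {
	fmt.Println(countBytes("a,b,c\n1,2,3\n"))
}
```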

os.Create returns an *os.File, which implements io.Writer, so you can use it to write to a file like:

w, err := os.Create("myfile.csv")

and you can write your CSV to it using the WriteCSV() method described
below:

https://godoc.org/github.com/kniren/gota/dataframe

So based on the documentation, something like the code below should work:

df := dataframe.LoadRecords(
	[][]string{
		[]string{"A", "B", "C", "D"},
		[]string{"a", "4", "5.1", "true"},
		[]string{"b", "4", "6.0", "true"},
		[]string{"c", "3", "6.0", "false"},
		[]string{"a", "2", "7.1", "false"},
	},
)

w, err := os.Create("myfile.csv")

if err != nil {
	/* handle os.Create() error here. */
}
defer w.Close()

if err := df.WriteCSV(w); err != nil {
	/* handle WriteCSV() error here. */
}

...


Vikram Rawat

May 29, 2017, 7:48:20 AM
to golang-nuts
Thank You
Thank You
Thank You
Thank You
Thank You
Thank You
Thank You
Thank You
Thank You
Thank You

Very very very MUCH

My brain was about to bleed to death... I am not a programmer, but somebody suggested Golang to me and I started just a month ago.

It's quite different and hard to grasp, but if it has an active group like you guys, it will surely not die a slow death.

Thanks again everybody....


Jesper, Sebastien and Ayan, thanks again guys...

Pee Jai

Jul 26, 2020, 5:59:50 AM
to golang-nuts
I created https://github.com/rocketlaunchr/dataframe-go to make dealing with data much easier.
It has an example code snippet in the docs showing how to write dataframes to a CSV file.

I created it because I found gota to be very cumbersome to use.

Yassine KICH

Sep 24, 2025, 3:14:43 PM
to golang-nuts

Jason E. Aten

Sep 24, 2025, 6:19:23 PM
to golang-nuts
Hi Vikram,

Sounds like you got it working--great!  Also the LLMs are terrific for explaining language concepts
if you are stuck conceptually.

If you need a dataframe package that scales to big data (as it turns out, parsing floating-point numbers is a very slow operation), I wrote a use-all-cores, fast, parallel-loading dataframe for Go called SlurpDF. I was envious of how fast R's data.table could read in CSV files in parallel. See


See slurp_test.go for an example of writing back to CSV on disk.

(this was in service of a little Xgboost-like gradient boosted decision 
tree ensemble machine learner, e.g. https://github.com/glycerine/gocortado)

Enjoy,
Jason

robert engels

Sep 24, 2025, 6:31:17 PM
to Jason E. Aten, golang-nuts
As an aside, your slurp isn’t really doing what you think.

The line byby := bytes.Split(buf, newline) is causing the entire file to be read into memory on a single core, which is unnecessary.

You need to modify the code a bit to get the optimum performance.

You should calculate a base offset which is (total file size / number of cores).

Then calculate the actual offsets by seeking to that point and advancing to the next newline, then do the same for the rest. You then have an array of slices, each of which is a portion of the file.
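The offset calculation described above could be sketched roughly as follows. This is not SlurpDF's code: chunkStarts, nextLineStart and splitChunks are hypothetical names, and the sketch assumes '\n' line endings and no newlines inside quoted CSV fields. It takes an io.ReaderAt (which *os.File satisfies), so only small windows around each boundary are read rather than the whole file.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// nextLineStart scans forward from off and returns the offset just past the
// next '\n', or size if no newline follows.
func nextLineStart(r io.ReaderAt, off, size int64) (int64, error) {
	buf := make([]byte, 4096)
	for off < size {
		n, err := r.ReadAt(buf, off)
		if i := bytes.IndexByte(buf[:n], '\n'); i >= 0 {
			return off + int64(i) + 1, nil
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			return 0, err
		}
		off += int64(n)
	}
	return size, nil
}

// chunkStarts returns the start offsets of at most n chunks. Each boundary
// begins at base offset i*(size/n) and is pushed forward to a line start.
func chunkStarts(r io.ReaderAt, size int64, n int) ([]int64, error) {
	starts := []int64{0}
	if n < 2 || size/int64(n) == 0 {
		return starts, nil
	}
	base := size / int64(n)
	for i := 1; i < n; i++ {
		off, err := nextLineStart(r, int64(i)*base, size)
		if err != nil {
			return nil, err
		}
		// Drop boundaries at end of input or ones that repeat the previous
		// boundary (this happens when one line spans several base offsets).
		if off < size && off > starts[len(starts)-1] {
			starts = append(starts, off)
		}
	}
	return starts, nil
}

// splitChunks applies chunkStarts to an in-memory string for demonstration.
func splitChunks(data string, n int) ([]string, error) {
	starts, err := chunkStarts(strings.NewReader(data), int64(len(data)), n)
	if err != nil {
		return nil, err
	}
	chunks := make([]string, 0, len(starts))
	for i, s := range starts {
		end := int64(len(data))
		if i+1 < len(starts) {
			end = starts[i+1]
		}
		chunks = append(chunks, data[s:end])
	}
	return chunks, nil
}

func main() {
	chunks, err := splitChunks("a,1\nb,2\nc,3\nd,4\n", 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Because every chunk begins and ends on a line boundary, each goroutine would get a disjoint byte range with whole lines only, which is one way to avoid coordinating over lines split between workers.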



Jason E. Aten

Sep 24, 2025, 11:18:42 PM
to golang-nuts
Thanks Robert. Of course you are right, and a pull request would be welcome :)

Seriously though -- I do appreciate the comment. At the time, if
I remember -- this was 2 years ago when I wrote it -- I recall
not wanting to complicate the code by having to deal
with the CSV file lines that got split between two goroutines
if I didn't find the newlines first. Once you do that
you need more locking to resolve the conflict and
not step on the same memory another goroutine is using...
much more coordination seemed necessary.

That and the true bottleneck usually being the
parsing of the floats means once I matched what
the C code for data.table was doing, I moved on. So yes,
it could be faster, but the simpler code was appealing.

- J



