F# data frame library


Tomas Petricek (Info)

May 13, 2013, 6:13:42 PM
to fsharp-o...@googlegroups.com

Hi everyone,

I just released a new version of the F# Data library. One new feature is that the CSV provider also generates a couple of methods for working with the data once it is loaded (e.g. the Filter and Truncate methods; see "Transforming CSV Files" in [1]).

 

While these are nice, I feel that this is a somewhat ad hoc extension (and that it does not logically belong in the CSV type provider). After discussing this with Don and Howard Mansell, I realized that what we actually need is a library representing a "data frame" in F#. So, I would like to start a discussion about this.

 

In dynamic languages, a data frame is quite an easy thing to work with (see Python [2] or R [3]). Assuming I have three sequences of values (or vectors) with dates (dates) and prices (opens and closes), I can write something like this in R:

 

library(zoo)  # provides rollmean

# Create a data frame with 3 columns named Date, Open and Close
df = data.frame(Date=dates, Open=opens, Close=closes)

# Drop the Close column from the data frame
df_do = df[, c("Date", "Open")]
# Take the first 100 rows from the data frame
df_dosm = df_do[1:100, ]

# Add column with averages over 5 day floating window
# (fill = NA pads the result to the length of the data frame)
df$FloatingClose = rollmean(df$Close, 5, fill = NA)

# Calculate the mean of the numeric columns in the data frame
colMeans(df[, c("Open", "Close")])

 

The question is, how can we get something like this in a type-safe (as much as possible) way in F#? Our current CSV data type lets you do a couple of things, but:

 

· It does not implement more mathematical features (like mean, variance, …) because the goal was not to implement a data frame (but it would be useful: https://github.com/fsharp/FSharp.Data/issues/62)

· It does not support adding/removing columns. We could generate a DropXyz method for every column Xyz, but adding columns is a bit harder (perhaps we could do this if we required the user to give a list of all columns they want to use in the whole file).

· We do not really provide any decent API for constructing data frames at the moment (again, the CSV provider is about reading CSV data).

 

So, I think it would be really nice to have something like FSharp.Data.DataFrame inside F# Data (or elsewhere), integrate this with the CSV provider, but add all other functionality that is needed by a proper data frame library (but not necessarily required by a CSV provider).

 

I would like to hear your feedback on this: do people think that a type-safe data frame using F# type providers is a good idea? Would you be happy to help? Any suggestions about the design of this? What things do you like/dislike about data frames in R, Pandas, Matlab or elsewhere?

 

(I imagine it would look something like the code below.)

 

Thanks!

Tomas

 

[1] http://fsharp.github.io/FSharp.Data/library/CsvProvider.html

[2] http://pandas.pydata.org/

[3] http://www.r-tutor.com/r-introduction/data-frame

 

PS: I have not thought about this significantly, but I think this might be doable:

 

type DF = DataFrame<"Date (date), Open (decimal), Close (decimal), FloatingClose (float)">

// Create a data frame with 3 columns named Date, Open and Close
// Perhaps we can use some overloading to just say:
//    DF.Create(Date=dates, Open=opens, Close=closes)
// but I’m not entirely sure if this would work. We want
// df.Open to be defined, but df.FloatingClose not to be!
let df = DF.Create().WithDate(dates).WithOpen(opens).WithClose(closes)

// Drop the Close column from the data frame
let df_do = df.DropClose()
// Take the first 100 rows from the data frame
let df_dosm = df_do[1 .. 100]

// Add column with averages over 5 day floating window
// (This can work on the data-frame as a whole, so we do not
// need to talk about individual columns – but here I just project
// Close at the end.)
let winCloses = df.Windowed(5).Map(fun win -> win.Mean()).Close
// (Bound to a new name, because rebinding df at the top level is not allowed.)
let dfFloating = df.WithFloatingClose(winCloses)

// Calculate the mean of all columns in the data frame
let dfMeans = dfFloating.Mean()

 

 

Mathias Brandewinder

May 13, 2013, 7:34:51 PM
to fsharp-o...@googlegroups.com, in...@tomasp.net
I had some similar thoughts recently looking at the CSV type provider - looking at some of the Kaggle data sets and some of the problems that came up when working with it, I vaguely considered a "CSV for datascientist" type provider.

The "issue" with the CSV provider is that it looks at the world as a collection of records, whereas a data scientist will typically look at it in 2 directions: records, and features (columns).

I am not sure I would want something exactly like a data frame, but here are a few thoughts on what I might be interested in:

* it would be great to be able to access the dataset by features, and not only by record
* it would be great to have basic statistics, specifically min/max, on features, or set of values for categoricals
* it would be great to infer types of variables (continuous/double, categorical, ordinal), or be able to specify it
* missing values are a pain. R handles this pretty nicely; not sure what the right approach is. Making everything an option is heavy. On the other hand, I recently had a case where the training set had no missing data, but sure enough missing data came up in real data.
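
To make the wish list above concrete, here is a minimal F# sketch of feature-wise access over plain records, with missing numeric values modelled as options. Everything in it (the Passenger record and the feature, range and levels helpers) is hypothetical and made up for illustration; it is not part of the CSV provider or any existing library.

```fsharp
// Hypothetical record type for illustration; missing numeric values are options.
type Passenger = { Age: float option; Fare: float option; Class: string }

/// Project one feature (column) out of a sequence of records.
let feature (selector: 'R -> 'V) (records: seq<'R>) : 'V[] =
    records |> Seq.map selector |> Array.ofSeq

/// Min and max of a numeric feature, skipping missing values.
let range (values: float option[]) : (float * float) option =
    match values |> Array.choose id with
    | [||] -> None
    | present -> Some (Array.min present, Array.max present)

/// Sorted set of distinct values of a categorical feature.
let levels (values: string[]) : string[] =
    values |> Array.distinct |> Array.sort

let passengers =
    [ { Age = Some 22.0; Fare = Some 7.25; Class = "third" }
      { Age = None;      Fare = Some 71.3; Class = "first" }
      { Age = Some 38.0; Fare = None;      Class = "third" } ]

// Column-wise views over the same records.
let ageRange = passengers |> feature (fun p -> p.Age) |> range
let classes = passengers |> feature (fun p -> p.Class) |> levels
```

This shows the "heavy" side of the option approach: every numeric field carries an option, but statistics can then skip missing values explicitly instead of silently propagating them.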

Bohdan Szymanik

May 13, 2013, 9:58:06 PM
to fsharp-o...@googlegroups.com, in...@tomasp.net

I find in R that my dataframes tend to evolve as I go. Typically I'll be creating columns and then renaming them progressively. They become living data structures. Very useful.

John Tarbox

May 14, 2013, 12:22:35 AM
to fsharp-o...@googlegroups.com, in...@tomasp.net
I am curious what the advantage of implementing data-frames is compared to an enhanced implementation of arrays? In the specific case of CSV files, isn't it often the case that the data found in spreadsheets is really an irregular matrix (http://en.wikipedia.org/wiki/Irregular_matrix)?
 
Looked at a different way, data-frames seem quite limited and "special case" compared to arrays. In particular would it be worth considering carrier arrays (see: "Carrier arrays: an idiom-preserving extension to APL" by Paul Geoffrey Lowney www.cs.yale.edu/publications/techreports/tr256.pdf and dl.acm.org/citation.cfm?id=567533 )
With carrier arrays it should be possible to represent any arbitrary workbook including multiple sheets (simply an additional dimension).

Bohdan Szymanik

May 14, 2013, 4:32:46 AM
to fsharp-o...@googlegroups.com, in...@tomasp.net
As long as they're easy to manipulate interactively including adding and deleting something that looks like a column.

Howard Mansell

May 15, 2013, 8:12:52 AM
to fsharp-o...@googlegroups.com, in...@tomasp.net
I think this would be an awesome project!

We have a data frame library at BlueMountain which is implemented in C# but designed for, and mainly used from, F#.  We use it for exploratory data analysis, but the scripts we write, if found useful, live for a long time and run as an automated process (producing some report as output).  So we walk a fine line between quick-and-dirty usage (where you might argue you want a dynamically typed Python-like environment) and the safety of static type checking (so that we can update libraries and ensure we don't break things as a result).

We might open-source our library in the future, but it's not in a state where that is possible at the moment, and we don't have the resources to dedicate to it at this point.  Here's a few lessons learned on approaches that work well:
  • The most useful capability of DF for us is time-series alignment.  Almost every data frame is keyed on date or date/time.  Our data frame allows us to set up a timeline for the data frame and then snap a bunch of time-series to it.  Even if one series has more frequent data (and hence a different number of points) we get sensible behavior.  
  • It's very useful to allow data frames with heterogeneous value types (one column being a double, another a string, another some kind of record instance).  I note this because Saddle [http://saddle.github.io] for Scala is more type safe, but only allows homogeneous column types.
  • We have a Series<K, V> type which represents a series keyed on K.  Series operations are at least as important as the data frame operations.  We allow a DF to be viewed as a Series<K, DynamicRow>, where DynamicRow allows dynamic access to values in the row by name (or by ? operator).
  • We toyed with static typing but basically our DF is dynamically typed (it just has a static type parameter which is the type of the index – usually Date [our own type] or DateTimeOffset).  While I am a fan of static typing, it has much more value between modules/functions and less within the body of a function/script.  If I am developing some script where I can see the results instantly and I can see all the names of columns easily, static typing gives me fewer benefits.  However, if I want to expose data from a library (where I didn't create it) or pass it in, then there is a lot of value in that being statically typed.  We originally had the vision of implementing a dynamic DF for use within modules, and then exposing a statically-typed variant across functions.  E.g. DataFrame and DataFrame<Schema>, where you could assert the schema at entry/exit points.  But so far we haven't implemented that static version.
  • It's worth thinking a lot about mutability. We found it productive for the DF to be immutable except for adding columns.  Other operations produce new data frames.  We would probably be able to cope with everything being immutable but it just requires rebinding names too much.  Small syntactic differences can dramatically affect productivity and interactivity.  Again, it might be useful to have a column-immutable schematized version… 
  • We have two kinds of series – ordered and indexed.  Ordered requires an order on the index (useful for date/time) and data is stored in order, indexed allows an arbitrary order (the order data is supplied) and doesn't sort or require comparison to be supported.  Turns out that indexed is not very useful, and can be dealt with just by using an ordinal series (ordered on the row number).  Ordered indices allow much faster implementations of operations in most cases.
  • We support adding columns that are either Series<K, V> or seq<>.  The latter is useful for transforming columns and then assigning the result back into the DF.  When adding an IEnumerable it has to be the same length as the DF.  Series are snapped according to their keys.  This makes building up data frames super-easy.
  • Missing value support is important, though we haven't done this particularly well.  For doubles we just use NaN and that works fine.
  • We found using the ? operator super useful for accessing and adding columns.  For example you can do 'df?A <- df?B * 2.0'.  Getting a column via ? defaults to it being a double series (otherwise the typing becomes too difficult).
I'm happy to go over more specifics if anyone has questions…
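
Several of the points above (snapping series to a timeline by key, NaN for missing doubles, and the ? operator for reading and adding columns) can be illustrated with a small F# sketch. This is not BlueMountain's API: the Frame type, its float-only columns, and the operator definitions are assumptions made purely for illustration.

```fsharp
open System
open System.Collections.Generic

/// Hypothetical dynamic frame: a fixed, ordered index plus named float columns.
type Frame<'K when 'K : comparison>(index: 'K list) =
    let columns = Dictionary<string, Map<'K, float>>()
    member this.Index = index
    member this.Item
        with get (name: string) : Map<'K, float> = columns.[name]
        and set (name: string) (series: Map<'K, float>) =
            // Snap the incoming series to the frame's index by key;
            // keys absent from the series become NaN (missing).
            columns.[name] <-
                index
                |> List.map (fun k -> k, defaultArg (Map.tryFind k series) nan)
                |> Map.ofList

// Dynamic column access: df?Name reads a column, df?Name <- series adds or replaces one.
let (?) (df: Frame<'K>) (name: string) = df.[name]
let (?<-) (df: Frame<'K>) (name: string) (series: Map<'K, float>) = df.[name] <- series

let days = [ for d in 13 .. 15 -> DateTime(2013, 5, d) ]
let df = Frame(days)

// B has no point for May 14, so that row is filled with NaN when snapped.
df?B <- Map.ofList [ days.[0], 1.0; days.[2], 3.0 ]
// Derive column A from column B; NaN propagates through the arithmetic.
df?A <- Map.map (fun _ v -> v * 2.0) (df?B)
```

In this sketch the frame is immutable except for adding columns, which mirrors the mutability trade-off described above; a fuller design would also need heterogeneous column types.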

Tomas Petricek (Info)

May 15, 2013, 10:39:10 PM
to fsharp-o...@googlegroups.com

Thanks very much for the detailed reply describing BlueMountain’s data frame!

 

As I had a chance to play with the library, I found it very powerful and easy to use. I think the number of features that you described shows why I think this should be separated from the CSV provider itself – there are so many useful things that the data-frame should do (but that ordinary users of CSV provider might not need).

 

I think time-series alignment, heterogeneous values, missing data handling are certainly things that should be supported.

 

As for the type-safety, I think we could follow the same design that other F# Data type providers follow: there is a dynamic underlying implementation (e.g. with ? and ?<- operators) and on top of that, we could build some type-safe wrapper (perhaps using type providers) that you may or may not use (perhaps in your scenario, you would start with dynamic, but then add these additional types to guarantee more safety?)

 

Thanks!

Tomas

Dave

May 20, 2013, 7:30:18 AM
to fsharp-o...@googlegroups.com, in...@tomasp.net
Have you considered timeframe compression or filtering by time?  Something comparable to R's xts package.

Howard Mansell

May 21, 2013, 6:56:08 PM
to fsharp-o...@googlegroups.com, in...@tomasp.net
Not sure exactly what you mean by compression, but our dataframe and series support resampling (which snaps data to a new timeline, applying some arbitrary sampler to determine how to combine points).  We also support Between, Before and After.

We originally implemented these things on time-indexed dataframes.  We later generalized to any ordered series.  The samplers themselves are often specific to the type of index, though.  For example, sampling intraday data to daily transforms the index type from DateTimeOffset to Date (our internal date type).

These operations are pretty fundamental.
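
As a rough illustration of resampling with an arbitrary sampler, and of how the index type can change from DateTimeOffset to a date, here is a minimal F# sketch. The resample and between functions and their signatures are hypothetical, not the actual BlueMountain operations; DateTime stands in for the internal Date type mentioned above.

```fsharp
open System

/// Resample a series: map every key onto a new timeline and combine the
/// points that land on the same new key using the supplied sampler.
let resample (toNewKey: 'K1 -> 'K2) (sampler: 'V[] -> 'V) (series: seq<'K1 * 'V>) : ('K2 * 'V) list =
    series
    |> Seq.groupBy (fun (k, _) -> toNewKey k)
    |> Seq.map (fun (k, points) -> k, points |> Seq.map snd |> Array.ofSeq |> sampler)
    |> Seq.sortBy fst
    |> List.ofSeq

/// Keep only the points whose key lies in the inclusive range [lo, hi].
let between lo hi (series: seq<'K * 'V>) =
    series |> Seq.filter (fun (k, _) -> lo <= k && k <= hi)

// Helper for building intraday timestamps in May 2013 (UTC).
let t (day: int) (hour: int) = DateTimeOffset(2013, 5, day, hour, 0, 0, TimeSpan.Zero)

let intraday = [ t 20 9, 10.0; t 20 16, 11.0; t 21 9, 12.0 ]

// Intraday -> daily, taking the last observation of each day.
// The key type changes from DateTimeOffset to DateTime.
let daily = intraday |> resample (fun (k: DateTimeOffset) -> k.Date) Array.last

// Only the points that fall on May 20.
let may20 = intraday |> between (t 20 0) (t 20 23) |> List.ofSeq
```

Passing Array.average or Array.max instead of Array.last gives a different sampler without touching the resampling logic itself.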

--
--
You received this message because you are subscribed to the Google
Groups "FSharp Open Source Community" group.
To post to this group, send email to fsharp-o...@googlegroups.com
To unsubscribe from this group, send email to
fsharp-opensou...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/fsharp-opensource?hl=en
 
---
You received this message because you are subscribed to a topic in the Google Groups "FSharp Open Source Community" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/fsharp-opensource/3mb3HO3AvzA/unsubscribe?hl=en-US.
To unsubscribe from this group and all its topics, send an email to fsharp-opensou...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

David Terk

May 22, 2013, 3:31:00 PM
to fsharp-o...@googlegroups.com, in...@tomasp.net
I realize they are fundamental, which is why I would like to see them if this library is created and made available.  We are talking about the same thing: compression/resampling.

Nicolas

Jun 26, 2013, 6:43:34 PM
to fsharp-o...@googlegroups.com, in...@tomasp.net

I am glad that this particular structure gets attention, as this highlights a few points that make dynamic languages such versatile and privileged tools. They do have strengths and people use them for a reason, but where exactly? Taking R as a vantage point is a fruitful place to start.

****

One of the traits to consider is the whole chain in which the data gets used, as it is quite a different beast from your standard types. For instance, the same data often gets replicated to different stores of different reliability. Just as we have a type system to assert our assumptions about types, data in a workflow follows a similar refinement of shape and quality. As a result, different repositories holding the same data can be given authority for different purposes (official accounting, trading dataset, third party), or be optimised for different access patterns.

But going back even before the access pattern, what is really driving the usage of the dataframe structure at the application level in the first place?

This is the real question to be asked.

*****

One can find a strong hint for an answer in Hadley Wickham's work. His operations, shape and melt, are all about exploiting the kind of flexibility allowed by the dataframe.

In one of his papers, 'Tidy data' (http://vita.had.co.nz/papers/tidy-data.pdf), he provides a down-to-earth summary of what his libraries are for and frames part of the (vast) problem space.

That should provide a good foundation as to where one should aim, I think.

****

Regarding the functionalities mentioned by Howard-san, one might want to separate the operations for shaping data from those using the data further down the pipe. And among those, time series oriented operations from the rest.

Dave

Jun 28, 2013, 1:03:32 PM
to fsharp-o...@googlegroups.com, in...@tomasp.net
Is there anything going on with this currently?  

Tomas Petricek (Info)

Jun 28, 2013, 1:32:29 PM
to Dave, fsharp-o...@googlegroups.com

I do not think so.

 

At the moment, the CSV provider in F# Data has some of the functionality that – I think – should instead be in Data Frame. I would be keen to restructure this and add more features to DataFrame, but I probably won’t be able to do much until August. (Another problem is, that we probably need to do some experiments first to figure out what the best option is… especially with respect to type safe vs. easy to use)

 

T.

Jomo Fisher

Jun 29, 2013, 2:35:53 AM
to fsharp-o...@googlegroups.com, Dave, in...@tomasp.net
I think dataframe-for-F# is a really important project. I use R quite a lot these days and it is really an awful language except for the rich ecosystem that has built up around dataframe. It would be great to have a platform-strength dataframe type that libraries could build up around.

The tension between type safety and convenience is the main decision point. My view is that the F# type system is already as good as it gets for typesafe data manipulation. I would prefer a dataframe that fully captures the benefit of dynamic typing over structured data sequences. There should be facilities for returning dataframe to strong-type-land once convenient manipulation (merging, melting, slicing, summarizing, etc) is done--think duck-casting into a record sequence.

toDataframe: IEnumerable<'T> -> DataFrame
toEnumerable: DataFrame -> IEnumerable<'T>
// Mean, median, percentiles, for each column
summarize: DataFrame -> DataFrame
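
One way the three signatures above could be realized is sketched below in F#, using reflection over record types for the "duck-casting" in and out of a dynamic frame. The DataFrame representation (named columns of boxed values) is an assumption made for illustration, and summarize here computes only the mean of the float columns rather than medians and percentiles.

```fsharp
open Microsoft.FSharp.Reflection

/// Hypothetical dynamically typed frame: ordered (column name, boxed values) pairs.
type DataFrame = { Columns: (string * obj[]) list }

/// Build a frame from a sequence of F# records, one column per record field.
let toDataframe (rows: seq<'T>) : DataFrame =
    let fields = FSharpType.GetRecordFields(typeof<'T>)
    let data = rows |> Seq.map (fun r -> FSharpValue.GetRecordFields(box r)) |> Array.ofSeq
    { Columns =
        fields
        |> Array.mapi (fun i f -> f.Name, data |> Array.map (fun row -> row.[i]))
        |> List.ofArray }

/// Return to strongly typed land: rebuild records of type 'T from columns
/// matched by field name (fails at run time if a column is missing or mistyped).
let toEnumerable<'T> (df: DataFrame) : seq<'T> =
    let fields = FSharpType.GetRecordFields(typeof<'T>)
    let cols = dict df.Columns
    let n = match df.Columns with [] -> 0 | (_, c) :: _ -> c.Length
    Seq.init n (fun i ->
        let values = fields |> Array.map (fun f -> cols.[f.Name].[i])
        FSharpValue.MakeRecord(typeof<'T>, values) :?> 'T)

/// One-row frame holding the mean of every column whose values are all floats.
let summarize (df: DataFrame) : DataFrame =
    { Columns =
        df.Columns
        |> List.choose (fun (name, values) ->
            match values with
            | [||] -> None
            | _ when values |> Array.forall (fun v -> v :? float) ->
                Some (name, [| box (values |> Array.averageBy unbox<float>) |])
            | _ -> None) }

// Hypothetical record type used only for this example.
type Quote = { Symbol: string; Open: float; Close: float }

let quotes =
    [ { Symbol = "MSFT"; Open = 1.0; Close = 2.0 }
      { Symbol = "MSFT"; Open = 3.0; Close = 4.0 } ]

let df = toDataframe quotes
let means = summarize df
let roundTrip = toEnumerable<Quote> df |> List.ofSeq
```

The type check happens only at the toEnumerable boundary, so all manipulation in between stays dynamic, which is the trade-off argued for above.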

Nicolas

Jun 30, 2013, 5:41:18 PM
to fsharp-o...@googlegroups.com, Dave, in...@tomasp.net
Another thing that can be handy is this supply of a name through the constructor

> aa = c(1,10,3)
> bb = c (2,6,79)
> df <- data.frame(attributeA = aa, attributeB = bb)
> df
  attributeA attributeB
1          1          2
2         10          6
3          3         79

This looks like a small-scale, specialized, name-binding type provider,
which I guess would require some compiler change and annotation to 'open' the names of the class.
The effect of such a thing is to lower the abstraction: we don't look at a structure with a generic 'fieldK' field with 'fieldKName' set to the concrete value somewhere else.


---

As a thought experiment, suppose we had some standard operations for manipulation (melting etc.) that output CSV to the file system,
and some TP that reads the previously written CSV and then performs some additional ops.
Then there would be a type-safe way to manipulate that data.
Obviously it would be much better to have the type system compose those 'future' input types for us, propagate the 'future types', etc., but I don't think we'll get that in the type system tomorrow.

----

Howard Mansell

Jul 1, 2013, 9:26:05 AM
to fsharp-o...@googlegroups.com, Dave, in...@tomasp.net
Agree that the named inputs are useful.  We achieve this at BlueMountain through a reflection-based constructor, though that means you have to declare a record type.  Ironically, in C# we have anonymous types, so we actually have more concise construction syntax.  If F# had anonymous types (which would be useful for LINQ too), then we could support a nice data frame constructor.



Howard Mansell

Aug 5, 2013, 3:43:22 PM
to fsharp-o...@googlegroups.com, Dave, in...@tomasp.net
I would like to kick off building out the data frame library.  Adam Klein, who co-wrote Pandas (Python data frame library), recently joined my team.  He obviously has a lot of good ideas about how to build a data frame library.  He also built a Scala-based data frame library called Saddle.  He doesn't yet know F#, but he is keen to get involved.

Since we already have a working Data Frame library at BlueMountain, we cannot justify building this 100% ourselves.  So I'd like some volunteers to be involved in the project.

Initially I'd like to throw around ideas on how it should work, figuring out the key concepts.  Then we'll need help with implementation.

Please reply if you would like to be involved in one or both of these activities.

Howard

Jomo Fisher

Aug 5, 2013, 4:54:35 PM
to fsharp-o...@googlegroups.com, Dave, in...@tomasp.net
Hi Howard, I'm interested in helping. My background is that I was on the team at Microsoft that shipped F# and did quite a lot of work on Type Providers. I've since moved to a job where I use R and python/pandas daily for statistical analysis.
Unfortunately, I don't have much spare time right now. Maybe three or four hours on the weekends.



Nabil Chouk

Aug 6, 2013, 5:42:10 PM
to fsharp-o...@googlegroups.com
Hi Howard, I would be interested in contributing. I developed an F# time series library which deals with similar issues (alignment & aggregation relative to a timeframe, sampling, application of pointwise or accumulative transforms ...), so its ideas and implementation techniques might be of use in the context of a more general data frame library. In any case, I would be happy to help with the implementation.
Nabil.

Howard Mansell

Aug 7, 2013, 9:23:21 AM
to fsharp-o...@googlegroups.com
I've posted a Doodle poll to determine a mutually convenient time for a kick-off Skype call.  Prior to this call, Adam and I will send out a google doc covering typical concepts in a data frame library and some specific topics for discussion (which you will all be free to add to).

If you would like to attend the call, please indicate availability here:

Colin Bull

Aug 7, 2013, 11:18:56 AM
to fsharp-o...@googlegroups.com
Hi Nabil, 

Slight deviation, but is this time series library available anywhere? We have something similar that I was going to open source, but it might make sense to combine them.

Cheers

Colin

Howard Mansell

Aug 7, 2013, 3:12:35 PM
to fsharp-o...@googlegroups.com
Our current in-house C# data frame library has a class Series<K,T>, which is a generic series.  We also have OrderedSeries<K,T>, a subclass that keeps the series ordered on key.  Many time-series operations can be expressed as general-purpose operations that operate on OrderedSeries.  For those that are truly time-series operations, we define them as extension methods on OrderedSeries<DateTimeOffset,T>.

So, my hope is that you wouldn't need a timeseries library as such, if we implement a suitably capable DataFrame/Series library.  Or perhaps more accurately, we/you could create a time-series library that uses the core Series type and provides time-series functionality on top of that.



Nabil Chouk

Aug 7, 2013, 3:16:24 PM
to fsharp-o...@googlegroups.com
Hi Colin,
 
No, I haven't open sourced it.
 
I am not against the principle, although I'm not sure it is a good idea in its current form, mainly
because it relies upon a specific Matrix & Vector library supporting reference slicing, which users
might not want to take a dependency on.
 
Furthermore, if the data frame library includes solid time series functionalities, would it
make sense to have projects with overlapping features ?
 
Anyways, I can send you an .fsi so that you can get an idea of what's implemented and if
there is some potential of combination.
 
And thank you for your own open source efforts, your actor library seems nice.
 
Nabil.
 

Richard Minerich

Aug 8, 2013, 3:01:13 PM
to fsharp-o...@googlegroups.com
I've been thinking a lot about dataframes too.  If F# is going to be pushing data to and from different machine learning languages it would be best if we had a good representation for it.  

As far as features I'd like to see:
- Static column/type representation with a limited subset of available types
- Split/merge (immutable) 
- Conversion to matrix/vector (by a secondary function)
- Matrix index to field name (even one to many in the case of flattened categorical features)
- Fast Serialization/Deserialization (for saving to disk)

Just about everything else can be done more easily with the matrix format I think.
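
To sketch the matrix conversion and the "matrix index to field name" mapping from the list above (including the one-to-many case for flattened categorical features), here is a minimal F# example. The Column type and toMatrix function are hypothetical names chosen for illustration, and one-hot encoding is just one possible way to flatten a categorical feature.

```fsharp
/// A column is either numeric or categorical (hypothetical representation).
type Column =
    | Numeric of float[]
    | Categorical of string[]

/// Flatten named columns into a row-major matrix, one-hot encoding the
/// categorical ones. Also returns, for each matrix column index, the
/// name of the field it came from (one field may map to many columns).
let toMatrix (columns: (string * Column) list) : float[][] * string[] =
    let encoded =
        columns
        |> List.collect (fun (name, col) ->
            match col with
            | Numeric xs -> [ name, xs ]
            | Categorical xs ->
                xs
                |> Array.distinct
                |> Array.sort
                |> List.ofArray
                |> List.map (fun level ->
                    name, xs |> Array.map (fun x -> if x = level then 1.0 else 0.0)))
    let fieldNames = encoded |> List.map fst |> Array.ofList
    let nRows = match encoded with [] -> 0 | (_, xs) :: _ -> xs.Length
    let matrix =
        Array.init nRows (fun r ->
            encoded |> List.map (fun (_, xs) -> xs.[r]) |> Array.ofList)
    matrix, fieldNames

let matrix, fieldNames =
    toMatrix
        [ "Price", Numeric [| 1.0; 2.0; 3.0 |]
          "Sector", Categorical [| "tech"; "energy"; "tech" |] ]
// Matrix columns: Price, Sector=energy, Sector=tech
```

Keeping the field-name array alongside the matrix is what lets results computed on matrix columns be traced back to the original features.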

-Rick

Howard Mansell

Aug 8, 2013, 6:58:19 PM
to fsharp-o...@googlegroups.com
It's probably because we work in different domains, but I rarely think of data frames as being similar to matrices.  In some respects, of course, they are.  But when doing time-series analysis data frames are very useful because they allow automatic alignment of series, and have good handling of missing values.  Also my data frames are quite often heterogeneous (in terms of column types), which makes statically typing them much more difficult.



Richard Minerich

Aug 10, 2013, 3:35:23 AM
to fsharp-o...@googlegroups.com
Oh, in my domain they are almost always heterogeneous too.  In fact, deciding exactly how to turn your data into a matrix is a big deal.  So, once I have it that way, I want to try a bunch of stuff with it. It is a problem with the F# type system; the ideal would be to base it around tuples and lenses, I think.  Still, I don't mind too much having to statically parameterize it when I take data out; I just want it to be fast.

-Rick

Howard Mansell

Aug 10, 2013, 12:33:43 PM
to fsharp-o...@googlegroups.com
I usually think of matrices as being homogeneous. So I wonder how you do most of your work with matrices if your data is heterogeneous.

I guess if you are doing ML then data frame would be useful for retrieving and  cleansing your data set, and encoding features numerically. And then the rest would be linear algebra, so matrices make more sense. I definitely don't think data frame is well suited for linear algebra.  So I guess what you need is easy/fast conversion between data frames and matrices. 

Howard

Howard Mansell

Aug 12, 2013, 9:05:04 AM
to fsharp-o...@googlegroups.com
We are having a kick-off meeting to discuss the Data Frame project, on Thursday 15th at 1800 UTC / 2pm EDT.

If you would like to attend, please add me (hmansell) to your Skype contacts, then connect to me via Skype at the time.  I will be running a Skype call.

Howard Mansell

Aug 14, 2013, 9:51:45 AM
to fsharp-o...@googlegroups.com
Here is a write-up of concepts, prior implementations and design questions that Adam and I have put together.  Please read before the call tomorrow, if possible.

Nabil Chouk

Aug 22, 2013, 11:54:24 AM
to fsharp-o...@googlegroups.com

Hi everyone,

 

Here is an overview of the time series library I mentioned above, which doesn’t have the scope of a full data frame library but can provide potentially interesting design ideas and usage patterns to people interested in the project. I use it in production (in analytic web services) and for ad-hoc research.

 

Nabil.

fsharp-time-series-library-overview.pdf

Tomas Petricek (Info)

Aug 24, 2013, 5:37:09 PM
to fsharp-o...@googlegroups.com

Hi Nabil,

This is an excellent write-up! Thanks very much for sharing this – I’ll have a detailed look in a few days, but the lazy loading sounds like an interesting feature (we have this in mind for the design, but did not actually try to implement it yet… so learning from your experience will certainly help!)

 

Also, thanks for sharing the sample analyses. I’ll certainly try to re-write them using the new library to see if we’re missing something.

 

Tomas

 

PS: We were hoping to share something by the end of this week, but I did not quite finish some design changes that I started on Friday – so I’ll try to send the prototype to the group on Monday.

David Terk

Sep 27, 2013, 7:53:03 PM
to fsharp-o...@googlegroups.com
Did a prototype ever make it out in the wild?



Tomas Petricek (Info)

Sep 27, 2013, 11:44:51 PM
to fsharp-o...@googlegroups.com

Hi David,

There is a prototype and it can be easily found on public GitHub of BlueMountain Capital (and there is a fork on my profile too). We are not actively publicizing the project at the moment, because there is still lots of work to do and we want to coordinate the release with other activities of the F# Data Science working group.

 

Of course, everyone is welcome to try it & submit issues and pull requests :-). If you want to keep in touch with current discussions, the best way is to join the F# Data Science WG (http://fsharp.org/technical-groups). I plan to send some update there in a couple of days.

 

T.

 
