Use cases for tables and records

jackh...@gmail.com

Feb 21, 2019, 2:59:29 PM
to Racket Users
Hi folks! I'm looking for use cases for a few small data structure libraries I'm working on:

Records, which are dictionaries mapping keywords to values. Keys must be keywords, which allows for more efficient behavior in various cases and sometimes cooperates nicely with keyword arguments. Example:

> (define rec (record #:person "Joe Schmoe" #:age 30 #:favorite-color 'blue))
> (record-ref rec '#:age)
30
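
A minimal sketch of how keyword keys can cooperate with keyword arguments, using only racket/base. It assumes a plain hash with keyword keys as a stand-in for a record (the `record` library itself is not used), and `describe` and `apply-with-keywords` are hypothetical names introduced for illustration:

```racket
#lang racket/base

;; Stand-in for a record: an immutable hash whose keys are keywords.
;; (Assumption: this is NOT the record library from the post.)
(define rec (hash '#:name "Joe Schmoe" '#:age 30))

;; A function whose keyword arguments match the keys above.
(define (describe #:name name #:age age)
  (format "~a is ~a years old" name age))

;; Hypothetical helper: pass every key/value pair as a keyword argument.
;; keyword-apply requires the keywords to be sorted by keyword<?.
(define (apply-with-keywords f kw-hash)
  (define kws (sort (hash-keys kw-hash) keyword<?))
  (define vals (for/list ([kw (in-list kws)]) (hash-ref kw-hash kw)))
  (keyword-apply f kws vals '()))

(apply-with-keywords describe rec)
```

Because the keys are already keywords, no symbol-to-keyword conversion is needed before calling `keyword-apply`.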

Tables, which are like a list of records that all have the same keywords. Tables are similar to dataframes and are intended to make it easy to process spreadsheet-like data such as CSV files. Example:

(table (columns #:name #:population #:capital-city)
       (row "Argentina" 43800000 "Buenos Aires")
       (row "Greece" 10800000 "Athens")
       (row "Nigeria" 198600000 "Abuja")
       (row "Japan" 126400000 "Tokyo"))
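
To make "a list of records that all have the same keywords" concrete, here is a rough sketch in plain racket/base. It does not use the `table` form above; `column-names`, `rows`, and `row->record` are hypothetical names, and each record is approximated by a hash with keyword keys:

```racket
#lang racket/base

;; Column keywords and row data, copied from the example in the post.
(define column-names '(#:name #:population #:capital-city))

(define rows
  '(("Argentina" 43800000 "Buenos Aires")
    ("Greece" 10800000 "Athens")
    ("Nigeria" 198600000 "Abuja")
    ("Japan" 126400000 "Tokyo")))

;; Hypothetical helper: pair each value in a row with its column keyword.
(define (row->record row)
  (for/hash ([kw (in-list column-names)]
             [v (in-list row)])
    (values kw v)))

;; The "table" viewed as a list of records sharing the same keys.
(define records (map row->record rows))

(hash-ref (car records) '#:capital-city)
```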

The libraries are really just bare-bones skeletons at the moment and are missing a lot of core features. Still, if they seem like things that you would use, please let me know how! Pointers to code "in the wild" where you think these libraries would help are especially useful. Pointers to similar libraries (like the data-frame package) and discussions about their advantages / disadvantages are also helpful.

Matt Jadud

Feb 22, 2019, 8:14:20 AM
to jackh...@gmail.com, Racket Users
On Thu, Feb 21, 2019 at 2:59 PM <jackh...@gmail.com> wrote:

> Tables, which are like a list of records that all have the same keywords. Tables are similar to dataframes and are intended to make it easy to process spreadsheet-like data such as CSV files. Example:


Everyone must be thinking the same things these past few months...


I have been looking at Pyret's interface to tables, as well as the data-frame package in the Racket pkg collection, and R's interface to dataframes. My goal is something that is syntactically/conceptually simple for students to use for exploratory data analysis (EDA), and that potentially scales well to more interesting questions involving data. In the fall, I'll be doing work with students around environmental sensing as part of their coursework, and I want something for working with data that fits into the HtDP approach to introducing students to thinking about designing programs.

Right now, I'm still exploring, and haven't made significant progress on documentation, largely because I've just lifted the library to a point that I can start using it myself for some experimentation with data in a project of my own. This has illustrated some things that are missing, are not as clean as they could be, etc., so I'm going to circle around again on the library as I explore. My thinking is that if I can simplify the interfaces and operations on a project with a heavy data lift, I might be heading in the right direction for small data projects as well.

At the moment, I can import public spreadsheets from Google, slurp in MySQL tables, SQLite tables, and CSV files. Although it is proprietary, I'll probably add support for Airtable as an input as well, and will eventually look at some of the MQTT dashboards for IoT (e.g. io.adafruit.com). I have not tested the living daylights out of the library, but nascent tests are proceeding alongside development to watch for regressions as I explore. I can insert, select, and sieve (filter) from tables; I like the nomenclature that Pyret uses for tables, so I'm borrowing their ideas for the interface to table operations for now.
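
As a rough illustration of what select and sieve operations might look like, here is a sketch in plain racket/base. It is not the interface of the library described above or of Pyret: the table is assumed to be a list of `hasheq` rows, and `sieve` and `select` are hypothetical definitions written for this example (the data comes from the first post):

```racket
#lang racket/base

;; Assumption: a table is a list of rows, each row a hasheq keyed by symbols.
(define countries
  (list (hasheq 'name "Argentina" 'population 43800000)
        (hasheq 'name "Greece" 'population 10800000)
        (hasheq 'name "Nigeria" 'population 198600000)
        (hasheq 'name "Japan" 'population 126400000)))

;; sieve: keep only the rows for which the predicate holds.
(define (sieve table keep?)
  (filter keep? table))

;; select: keep only the named columns of every row.
(define (select table cols)
  (for/list ([row (in-list table)])
    (for/hasheq ([c (in-list cols)])
      (values c (hash-ref row c)))))

;; Names of countries with more than 100 million people.
(define large-names
  (select (sieve countries
                 (lambda (row) (> (hash-ref row 'population) 100000000)))
          '(name)))

large-names
```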

I like the interface you've proposed for quickly specifying columns (although the rationale for keywords is unclear to me), but I'd personally have to think about the role of types on those columns. I like Pyret's sanitizers, which provide cleanliness guarantees that the user can specify, thus protecting the programmer from ill-formatted data in the table. 

I suppose data-table is a usable library at this point, but it's highly fragile while I'm working on it. However, I'm happy to push to GitHub as well as Bitbucket (which apparently does not play well with the package distribution system?), so that it can be poked at by others.

Cheers,
Matt


Sam Tobin-Hochstadt

Feb 22, 2019, 8:47:33 AM
to Matt Jadud, Jack Firth, Racket Users
Bitbucket should work fine with the package system -- just provide the
URL for the git repository as the source and everything should be good
to go.

Sam

travis.h...@gmail.com

Feb 22, 2019, 10:22:07 AM
to Racket Users
The data-science package isn't focused on the table (or data frame) structure (it uses lists of lists) but it includes tooling that is useful for working with data stored in that type of structure such as "split -> apply -> combine", column indexing, subsetting, grouping, and aggregating.
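
For readers unfamiliar with "split -> apply -> combine" on lists of lists, a minimal sketch using only `group-by` from racket/list follows. It does not use the data-science package's API; the readings are made-up sample data and `mean-by-site` is a hypothetical name:

```racket
#lang racket/base
(require racket/list)

;; Made-up sample data: (site reading) rows stored as a list of lists.
(define readings
  '(("north" 12) ("south" 20) ("north" 18) ("south" 30)))

(define (mean xs)
  (/ (apply + xs) (length xs)))

;; split:   group rows by their first column
;; apply:   take the mean of the second column within each group
;; combine: collect one (site mean) row per group
(define (mean-by-site rows)
  (for/list ([group (in-list (group-by first rows))])
    (list (first (first group))
          (mean (map second group)))))

(mean-by-site readings)
```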


In particular, I find the documentation for that package very approachable for people coming from R/Matlab.

Greg Hendershott

Feb 22, 2019, 10:50:11 AM
to Racket Users
The overall idea sounds great. I don't really understand the
motivation for "records" with #:keywords?

Maybe you could add a quick explanation about how/when/why they would
be preferred over "more Rackety" choices:

When the keys aren't known at compile time:

- hasheq hash-tables with symbol keys, like jsexprs
- association lists

When the keys are known at compile time:

- structs
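
For comparison, here is the same small piece of data in each of the three choices listed above. This is only an illustrative sketch; the names `person-hash`, `person-alist`, and `person` are made up for this example:

```racket
#lang racket/base

;; Keys not known at compile time: a hasheq with symbol keys (as in jsexprs).
(define person-hash (hasheq 'name "Joe Schmoe" 'age 30))
(define age-from-hash (hash-ref person-hash 'age))

;; Keys not known at compile time: an association list.
(define person-alist '((name . "Joe Schmoe") (age . 30)))
(define age-from-alist (cdr (assq 'age person-alist)))

;; Keys known at compile time: a struct with generated accessors.
(struct person (name age) #:transparent)
(define joe (person "Joe Schmoe" 30))
(define age-from-struct (person-age joe))

(list age-from-hash age-from-alist age-from-struct)
```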


P.S. None of the above is about the name. But about that: I'm not a
"classic Scheme" person, but I think "record" ~= "struct" there, so that
might be confusing?


Ryan Kramer

Feb 22, 2019, 10:55:13 AM
to Racket Users
On the topic of tables, I recently thought "It would be nice if DrRacket had some awareness of tabular data, in the same way that picts and syntax objects get special treatment."

For my project, I just wrote a quick-and-dirty function to make an ASCII art table and moved on: https://github.com/default-kramer/plisqin/blob/3d48cacd3c4239aa0a3a96d712cfdef09272422c/private/rows-result-to-string.rkt#L80

But if we're all working with tables, perhaps it would be worth the time investment to enhance DrRacket so that tabular data is formatted better, copies well to a spreadsheet, and is scrollable in the case of larger data sets. Maybe more features I'm not thinking of.
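
As a point of reference for the plain-text approach, a quick-and-dirty table formatter can be sketched in a few lines. This is not the code from the linked file; `rows->string` is a hypothetical function, and it assumes every row has the same number of cells as the header:

```racket
#lang racket/base
(require racket/format racket/string)

;; Render a header and rows as an aligned plain-text table.
;; Assumption: every row has as many cells as the header.
(define (rows->string header rows)
  (define all-rows (cons header rows))
  ;; Width of each column = longest printed cell in that column.
  (define widths
    (for/list ([i (in-range (length header))])
      (apply max
             (for/list ([r (in-list all-rows)])
               (string-length (~a (list-ref r i)))))))
  ;; Pad each cell on the right to its column width.
  (define (format-row r)
    (string-join
     (for/list ([cell (in-list r)] [w (in-list widths)])
       (~a cell #:min-width w))
     " | "))
  ;; Separator line between the header and the data rows.
  (define rule
    (string-join (for/list ([w (in-list widths)]) (make-string w #\-))
                 "-+-"))
  (string-join (append (list (format-row header) rule)
                       (map format-row rows))
               "\n"))

(displayln (rows->string '("name" "capital")
                         '(("Greece" "Athens") ("Japan" "Tokyo"))))
```

A string like this is readable in the REPL, but it does not copy cleanly into a spreadsheet or scroll for large data sets, which is the gap the DrRacket idea above is aimed at.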

I have no idea how much work this would be. But if this sounds like a good idea, and if someone could give me some guidance, I wouldn't mind making an attempt.

James Platt

Feb 22, 2019, 12:55:06 PM
to Racket Users
In R, I have extensively used the sqldf package, which allows you to execute SQL commands on one or more data frames and get the results back as another data frame. You can connect it to different database engines to handle the SQL. Although SQLite is the default, I mostly used PostgreSQL because of its extensive features. Windowing queries from PostgreSQL, for example, can be a really good solution in some circumstances.

As far as I understand the term, I think this is a good example of language oriented programming. It makes sense to me that Racket should have a way of using SQL to directly manipulate data structures.
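
Racket has no direct sqldf equivalent that I can point to, but a similar workflow can be approximated today by loading list data into an in-memory SQLite database through the `db` library. The sketch below assumes the SQLite native library is available on the system; the table name `countries` and the data (taken from the first post) are only illustrative:

```racket
#lang racket/base
(require db)

;; Rows held as ordinary Racket data (a list of lists).
(define rows
  '(("Argentina" 43800000)
    ("Greece" 10800000)
    ("Nigeria" 198600000)
    ("Japan" 126400000)))

;; Load the rows into a throwaway in-memory SQLite database.
(define conn (sqlite3-connect #:database 'memory))
(query-exec conn "CREATE TABLE countries (name TEXT, population INTEGER)")
(for ([row (in-list rows)])
  (query-exec conn "INSERT INTO countries VALUES (?, ?)"
              (car row) (cadr row)))

;; Query with SQL and get the result back as a Racket list.
(define large
  (query-list conn
              "SELECT name FROM countries WHERE population > ? ORDER BY name"
              100000000))

(disconnect conn)
large
```

This copies the data into the database rather than querying the Racket data structure in place, so it is a workaround rather than the language-level integration described above.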

Philip McGrath

Feb 22, 2019, 4:57:13 PM
to Sam Tobin-Hochstadt, Matt Jadud, Jack Firth, Racket Users
On Fri, Feb 22, 2019 at 8:14 AM Matt Jadud <ma...@jadud.com> wrote:
> However, I'm happy to push to GitHub as well as Bitbucket (which apparently does not play well with the package distribution system?), so that it can be poked at by others.

On Fri, Feb 22, 2019 at 8:47 AM Sam Tobin-Hochstadt <sa...@cs.indiana.edu> wrote:
> Bitbucket should work fine with the package system -- just provide the
> URL for the git repository as the source and everything should be good
> to go.
 
Just in case anyone got the idea that Bitbucket might not work well with the package system from https://github.com/racket/racket/issues/2385, which I opened: that issue does not appear to be related to Bitbucket after all, and in any case it only applies to private repositories.

I use Bitbucket with no problems to host public packages like ricoeur-tei-utils.

-Philip