documentation suggestions


ivo welch

Feb 10, 2016, 10:58:37 AM
to julia-users

ladies and gents---I am not (yet) a julia user.

may I suggest adding more examples in two places where new julia users face early hurdles?

[1] the I/O docs of julia. e.g., reading and writing csv files that are compressed and decompressed on the fly, even if not in the most efficient manner. a large fraction of the time and frustration of new users is consumed by the task of shoehorning data into and out of new computer languages. with all of R's problems, 'd <- read.csv("f.csv")' and 'd <- read.csv(pipe(paste("gzcat ", fname)))' reduced this entry frustration greatly. perhaps xml file reading and writing, too. perhaps...

[2] more 'standard task' programs would be great: read a csv file, run a regression on variables named on the command line, print the output, draw a graph. I know there are fragments throughout the docs, but a section with complete, ready-to-run programs would be good, perhaps at the end of the manual.
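As an illustration of item [2] only, here is a minimal sketch of such a standalone program using nothing but Base Julia 1.x. The file name regress.jl, the command-line usage, and the helpers read_numeric_csv, colindex and ols are all hypothetical; it assumes a purely numeric CSV with a header row and no quoted fields or missing values, and it leaves out the graph (that would need a plotting package such as Plots.jl).

```julia
# regress.jl: hypothetical standalone script (file name and usage are assumptions)
#   julia regress.jl data.csv y x1 x2
# Reads a purely numeric CSV with a header row, regresses column `y` on the
# named columns plus an intercept by ordinary least squares, and prints the
# coefficients. Base Julia only: no quoted fields, no missing values.

# Return the header names and an n-by-k Float64 matrix of the data rows.
function read_numeric_csv(path::AbstractString)
    lines = filter(!isempty, readlines(path))
    header = String.(split(lines[1], ','))
    rows = [parse.(Float64, split(l, ',')) for l in lines[2:end]]
    data = permutedims(reduce(hcat, rows))   # each file row becomes a matrix row
    return header, data
end

# Look up a column by name, failing with a readable message.
function colindex(header::Vector{String}, name::AbstractString)
    i = findfirst(==(name), header)
    i === nothing && error("column '$name' not found in header $header")
    return i
end

# OLS of y on an intercept and the named regressors; for a tall X,
# `\` solves the least-squares problem via a (pivoted) QR factorization.
function ols(header, data, yname, xnames)
    n = size(data, 1)
    y = data[:, colindex(header, yname)]
    X = hcat(ones(n), (data[:, colindex(header, nm)] for nm in xnames)...)
    beta = X \ y
    return vcat("(Intercept)", xnames), beta
end

function main(args)
    path, yname, xnames = args[1], args[2], args[3:end]
    header, data = read_numeric_csv(path)
    coefnames, beta = ols(header, data, yname, xnames)
    for (nm, b) in zip(coefnames, beta)
        println(rpad(nm, 14), round(b; digits = 4))
    end
end

# Run only when executed as a script with enough arguments.
if abspath(PROGRAM_FILE) == @__FILE__ && length(ARGS) >= 3
    main(ARGS)
end
```

Using X \ y rather than forming inv(X'X) explicitly is the usual choice, since the QR-based solve is numerically more stable. For anything beyond a toy, DataFrames.jl and GLM.jl (mentioned in the next reply) are the natural tools.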

in a year, I hope to switch my students from R to julia.

regards,

/iaw

Josh Day

Feb 10, 2016, 1:25:03 PM
to julia-users
I think a lot of what you're looking for already exists.  It's just that things like "run a regression according to variable names" wouldn't belong in base Julia.  If you haven't already, I'd take a look at StatsBase.jl, DataFrames.jl, and GLM.jl.

ivo welch

Feb 10, 2016, 3:49:47 PM
to julia...@googlegroups.com

indeed.  thank you, josh.  I would add a final chapter at 


with a set of links to various further resources, examples, full stand-alone programs, etc. for me, at least, the perl cookbook and sets of self-contained snippet programs to start with were the main reason why I learned perl many years ago.

the key obstacle to my using julia instead of R with my students is that I do not have a resident julia expert at UCLA. this won't change anytime soon, because such experts are hard to find (and hire) :-(. this google forum is great, but it's scary to switch without a double hull. many, many full *working* standalone examples are the next best thing for me.

regards,

/iaw


----
Ivo Welch (ivo....@gmail.com)
http://www.ivo-welch.info/
J. Fred Weston Distinguished Professor of Finance
Anderson School at UCLA, C519
Free Finance Textbook, http://book.ivo-welch.info/
Exec Editor, Critical Finance Review, http://www.critical-finance-review.org/
Editor and Publisher, FAMe, http://www.fame-jagazine.com/

Jeffrey Sarnoff

Feb 10, 2016, 5:47:46 PM
to julia-users, ivo....@gmail.com
That is a reasonable want; it may take Anderson some time to institute scholarships for expertise in Julia.
If you were already expert with Julia, what would you have your students doing?


Jeffrey Sarnoff

Feb 10, 2016, 5:59:34 PM
to julia-users, ivo....@gmail.com
If you want to use it, the julia-jobs forum exists to let people know about opportunities posted there.

ivo welch

Feb 10, 2016, 6:26:34 PM
to Jeffrey Sarnoff, julia-users

oops... I leaked my signature. not a problem, but it was not necessarily what I had meant to say. for those who are interested, here is a little background from my side of the world.

ucla anderson, like most other business schools, has been pretty ignorant with respect to any kind of research computing expertise.

this is beginning to change, as management schools (incl. us) are moving towards quantitatively oriented one-year masters programs. anderson already has a master of financial engineering program and is about to start a masters program in data analytics. as for me, I am also trying to figure out how to offer more of this to MBA students, our traditional bread and butter, but it is not clear whether this can be implemented. so, in the future, we will need more data, programming, and other computing support than we did in the past, like every other industry.

it is exceedingly difficult to hire good programmers in a context like ours. universities do not pay much, for institutional reasons. individuals who are good at this tend to be lured away to industry, and those who are bad are non-terminable. a year goes by very fast---we may find someone for one year, but then not the next. yet any program has to be prepared to run for decades; we cannot shut down a masters program for lack of a critical person.

our current IT departments (both UCLA's and Anderson's) mostly handle basics, such as the network and Microsoft apps. as far as I can tell, http://www.ats.ucla.edu/stat/ offers some R expertise, though its depth has varied with the individuals working there. there is no julia support afaik.

our best choices are typically individuals who want to get a phd and just happen to have good expertise (R, julia, etc.). another choice would be someone who wants to work half-time on a project like julia and the other half-time on direct program support. job has nice benefits...

just getting a position approved can take UC about 3-6 months and is a high-effort affair. we have rules up the wazoo. there is also about a month's worth of data expertise that anyone would need to learn (WRDS, CRSP, Compustat); it can take a month of full-time work to get there. sigh.

so, for the most part, the few of us faculty and phd students who like programming have been bootstrapping it ourselves. at UCLA Anderson, we are luckier in this respect than many other places (Keith Chen, Peter Rossi, John Mamer, ...), but it's tough.

julia expertise would be great for us to have; it would have great externalities for us. if anyone with deep julia expertise wants to apply to UCLA for a few years (phd, undergrad, master's), with a side job at Anderson, then drop me an email ;-). for obvious reasons, faculty have, and want, no power to make admission decisions (or we would be besieged by our friends and family), but I could put in a good word with our admissions department(s). it matters on the margin. if someone working on julia wants a regular job, please also email me.

/iaw

Douglas Bates

Feb 11, 2016, 3:37:40 PM
to julia-users
Hi Ivo,

Good to hear from you.
My main use of the RCall package is to import datasets from R into Julia.  If I have a dataset in an R package I use, e.g.

julia> using RCall

julia> ds = rcopy("lme4::Dyestuff")
30x2 DataFrames.DataFrame
| Row | Batch | Yield  |
|-----|-------|--------|
| 1   | "A"   | 1545.0 |
| 2   | "A"   | 1440.0 |
| 3   | "A"   | 1440.0 |
| 4   | "A"   | 1520.0 |
| 5   | "A"   | 1580.0 |
| 6   | "B"   | 1540.0 |
| 7   | "B"   | 1555.0 |
| 8   | "B"   | 1490.0 |
| 9   | "B"   | 1560.0 |
| 10  | "B"   | 1495.0 |
| 11  | "C"   | 1595.0 |
| 12  | "C"   | 1550.0 |
| 13  | "C"   | 1605.0 |
| 14  | "C"   | 1510.0 |
| 15  | "C"   | 1560.0 |
| 16  | "D"   | 1445.0 |
| 17  | "D"   | 1440.0 |
| 18  | "D"   | 1595.0 |
| 19  | "D"   | 1465.0 |
| 20  | "D"   | 1545.0 |
| 21  | "E"   | 1595.0 |
| 22  | "E"   | 1630.0 |
| 23  | "E"   | 1515.0 |
| 24  | "E"   | 1635.0 |
| 25  | "E"   | 1625.0 |
| 26  | "F"   | 1520.0 |
| 27  | "F"   | 1455.0 |
| 28  | "F"   | 1450.0 |
| 29  | "F"   | 1480.0 |
| 30  | "F"   | 1445.0 |

If I wanted to read a CSV file using the facilities in R I could use

julia> rcopy("read.csv('/usr/share/distro-info/debian.csv')")
17x6 DataFrames.DataFrame
| Row | version | codename       | series         | created      | release      | eol          |
|-----|---------|----------------|----------------|--------------|--------------|--------------|
| 1   | 1.1     | "Buzz"         | "buzz"         | "1993-08-16" | "1996-06-17" | "1997-06-05" |
| 2   | 1.2     | "Rex"          | "rex"          | "1996-06-17" | "1996-12-12" | "1998-06-05" |
| 3   | 1.3     | "Bo"           | "bo"           | "1996-12-12" | "1997-06-05" | "1999-03-09" |
| 4   | 2.0     | "Hamm"         | "hamm"         | "1997-06-05" | "1998-07-24" | "2000-03-09" |
| 5   | 2.1     | "Slink"        | "slink"        | "1998-07-24" | "1999-03-09" | "2000-10-30" |
| 6   | 2.2     | "Potato"       | "potato"       | "1999-03-09" | "2000-08-15" | "2003-07-30" |
| 7   | 3.0     | "Woody"        | "woody"        | "2000-08-15" | "2002-07-19" | "2006-06-30" |
| 8   | 3.1     | "Sarge"        | "sarge"        | "2002-07-19" | "2005-06-06" | "2008-03-30" |
| 9   | 4.0     | "Etch"         | "etch"         | "2005-06-06" | "2007-04-08" | "2010-02-15" |
| 10  | 5.0     | "Lenny"        | "lenny"        | "2007-04-08" | "2009-02-14" | "2012-02-06" |
| 11  | 6.0     | "Squeeze"      | "squeeze"      | "2009-02-14" | "2011-02-06" | "2014-05-31" |
| 12  | 7.0     | "Wheezy"       | "wheezy"       | "2011-02-06" | "2013-05-04" | ""           |
| 13  | 8.0     | "Jessie"       | "jessie"       | "2013-05-04" | "2015-04-25" | ""           |
| 14  | 9.0     | "Stretch"      | "stretch"      | "2015-04-25" | ""           | ""           |
| 15  | 10.0    | "Buster"       | "buster"       | "2018-07-01" | ""           | ""           |
| 16  | NA      | "Sid"          | "sid"          | "1993-08-16" | ""           | ""           |
| 17  | NA      | "Experimental" | "experimental" | "1993-08-16" | ""           | ""           |


(It turns out that R's allowing either ' or " for enclosing strings is an advantage for quoting strings within strings.)
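Julia string literals always use double quotes (single quotes delimit a Char), so the Julia-side counterpart of this trick is a triple-quoted string, inside which unescaped double quotes are allowed; escaping with a backslash also works. The minimal sketch below only builds the R code as Julia text (nothing is sent to R), and the variable names are illustrative. RCall also provides an R"..." string macro for writing R code inline.

```julia
path = "/usr/share/distro-info/debian.csv"

r1 = "read.csv('$path')"        # R single quotes inside an ordinary Julia string
r2 = """read.csv("$path")"""    # R double quotes inside a triple-quoted string
r3 = "read.csv(\"$path\")"      # the same text built with escaped quotes

println(r2)   # read.csv("/usr/share/distro-info/debian.csv")
```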

ivo welch

Feb 11, 2016, 4:06:45 PM
to julia-users

hi doug---and vice-versa.  it's interesting that a core function (reading a .csv file) would not be in a native julia library.  when are you switching your students to julia?  regards,  /iaw


Ariel Katz

Feb 11, 2016, 4:39:56 PM
to julia-users, ivo....@gmail.com
Hello,

With regard to your specific point about CSV I/O, there are several ways to read CSV files in Julia.

- DataFrames.jl:
df = readtable("data.csv")

- Base:

readdlm(source, delim::Char, T::Type; options...)

- And the current state of the art with regard to speed: CSV.jl, with its DataStreams integration.

Unless you are reading fairly large CSV files, I would stick to DataFrames.jl.

I would caution you, though, that data I/O in Julia is still in its infancy, and some readers are either slower than their Python/R counterparts or missing entirely (xls, etc.).
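The original request also asked about CSV files that are compressed and decompressed on the fly, as with R's read.csv(pipe(paste("gzcat ", fname))). One Base-only way to get a similar effect in Julia 1.x is to pipe through an external gzip process. This is a minimal sketch: it assumes a gzip executable on the PATH, the helpers write_csv_gz and read_csv_gz are hypothetical, and it splits naively on commas, so quoted fields are not handled. Packages such as CodecZlib.jl offer an in-process alternative.

```julia
# Write a header and rows as comma-separated lines, gzip-compressing on the fly.
# Assumes an external `gzip` program; its stdout is redirected to `fname`.
function write_csv_gz(fname::AbstractString, header, rows)
    open(pipeline(`gzip -c`, stdout = fname), "w") do io
        println(io, join(header, ','))
        for r in rows
            println(io, join(r, ','))
        end
    end   # closing the stream waits for gzip to finish writing the file
end

# Read the file back, decompressing on the fly through `gzip -dc`.
# Naive split on ',': quoted fields containing commas are NOT handled.
function read_csv_gz(fname::AbstractString)
    open(`gzip -dc $fname`, "r") do io
        header = split(readline(io), ',')
        rows = [split(line, ',') for line in eachline(io) if !isempty(line)]
        return header, rows
    end
end
```

The fields come back as strings; converting them (for example with parse(Float64, s)) is left to the caller, which is part of what DataFrames.jl and CSV.jl automate.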

Zooming out a bit, I've found the Data Wookie "Month of Julia" blog series to be the best getting-started guide for practical, data-sciency Julia stuff.

Douglas Bates

Feb 12, 2016, 12:49:27 PM
to julia-users, ivo....@gmail.com
On Thursday, February 11, 2016 at 3:06:45 PM UTC-6, ivo welch wrote:

hi doug---and vice-versa.  it's interesting that a core function (reading a .csv file) would not be in a native julia library.  when are you switching your students to julia?  regards,  /iaw

Writing a function to read a .csv file is not trivial, partly because CSV is not well-defined. It is also a case of itches getting scratched: if those working on Julia who have the skills to write such a function don't need to read .csv files, that particular functionality stagnates.
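To make the "not trivial" point concrete: the obvious split on commas already fails once a field is quoted and contains a comma or a quote. Below is a minimal, hypothetical helper (split_csv_line, Base Julia 1.x) following the common convention, also used in RFC 4180, that fields may be wrapped in double quotes and that a doubled quote inside them stands for one literal quote. It is illustrative only, not how DataFrames.jl or CSV.jl actually parse.

```julia
# Split one CSV line into fields, honoring double-quoted fields and the
# doubled-quote escape ("" inside quotes means a literal "). A sketch only:
# it does not handle fields spanning several lines, type inference, or
# missing values, which real CSV readers must also deal with.
function split_csv_line(line::AbstractString; delim::Char = ',', qc::Char = '"')
    fields = String[]
    buf = IOBuffer()
    chars = collect(line)
    inquotes = false
    i = 1
    while i <= length(chars)
        c = chars[i]
        if inquotes
            if c == qc && i < length(chars) && chars[i + 1] == qc
                write(buf, qc)          # "" inside quotes -> one literal quote
                i += 1
            elseif c == qc
                inquotes = false        # closing quote
            else
                write(buf, c)
            end
        elseif c == qc
            inquotes = true             # opening quote
        elseif c == delim
            push!(fields, String(take!(buf)))
        else
            write(buf, c)
        end
        i += 1
    end
    push!(fields, String(take!(buf)))
    return fields
end

line = "1,\"Smith, John\",\"say \"\"hi\"\"\""
split(line, ',')        # naive: 4 pieces, the quoted comma breaks the name apart
split_csv_line(line)    # 3 fields: "1", "Smith, John", "say \"hi\""
```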

The definition and functionality of data frames in Julia, which are the natural output when reading a CSV file, are still being debated. In R the choices were much easier because R was designed to emulate S version 3, in which a data frame was a central construct. Sacrifices in performance were made to allow checking for NAs during each atomic arithmetic operation. That trade-off wouldn't fly in Julia. Also, all R vector structures allow element names, again at some cost in performance.
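For contrast, Julia 1.x (well after this thread) took a different route: Base has a missing value, and whether a vector can contain it is recorded in its element type. Code working on a plain Vector{Float64} therefore needs no per-element NA check, while a Vector{Union{Missing, Float64}} propagates missing much as R propagates NA. A small illustration in Base Julia:

```julia
x = [1.0, 2.0, 3.0]            # Vector{Float64}: cannot hold missing
y = [1.0, missing, 3.0]        # Vector{Union{Missing, Float64}}

eltype(x)                      # Float64
eltype(y)                      # Union{Missing, Float64}
sum(y)                         # missing: it propagates, like NA in R
sum(skipmissing(y))            # 4.0, comparable to sum(y, na.rm = TRUE) in R
```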

I'm not really in a position to convert my students, as I am now an Emeritus Professor. I do still offer a seminar series on "Statistics with Julia" and have convinced some students to use Julia in thesis research.

I would be quite happy with Julia if only git and I got along better. I just lost three days' worth of work this morning because of yet another git disaster.

Jeffrey Sarnoff

Feb 12, 2016, 1:11:49 PM
to julia-users, ivo....@gmail.com
Doug, after months of being bitten by git, I found some shelter using SmartGit (free for non-commercial use): www.syntevo.com/smartgit/

Kristoffer Carlsson

Feb 12, 2016, 4:40:27 PM
to julia-users
It is hard to actually lose committed work with git. With git reflog you can always see where your HEAD has been, and you can then use git reset to return to a previous revision.

Po Choi

Feb 13, 2016, 2:17:33 AM
to julia-users, jeffrey...@gmail.com, ivo....@gmail.com
Before seriously hiring someone to bring Julia into your school, perhaps you could first try the commercial services of http://juliacomputing.com/ to organize some workshops or events and see how students and other faculty feel about Julia's potential.