Read_result outside of S? This would avoid defining the same type alias in all implementations (ok, there are only three of them, but still).
read_item instead of get
--
You received this message because you are subscribed to the Google Groups "biocaml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biocaml+u...@googlegroups.com.
To post to this group, send email to bio...@googlegroups.com.
Visit this group at http://groups.google.com/group/biocaml.
To view this discussion on the web visit https://groups.google.com/d/msgid/biocaml/CAMu2m2K6TNXHqeDqNN-dOs%2BfOpXh5tABYHYNsAkMusX5TfAoTQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.
- in Biocaml_fastq, I suggest read_item instead of get
- do you plan to add signature items for writing too?
- do you foresee any trouble to use transforms in this setting?
One more thing. Maybe I should make Future a separate library. I can already see myself wanting it for other projects.
--
Functorizing everything over a deferred-like signature: we have already tried it, and it does not scale very far (preserving type equalities, especially with Result.t's, slowly becomes a nightmare).
--
I'd like to keep the signature as equivalent to Async.Std as possible. I don't want to introduce yet another API that people have to learn. My hope is to say "Future.S is Async. Go read Async's documentation".
> - in Biocaml_fastq, I suggest read_item instead of get
read_item sounds like you get just one item, but actually this function returns a stream from which you can read all items.
> - do you foresee any trouble to use transforms in this setting?
One reason I like this approach is that we don't need transforms anymore. My feeling is the Transform module is difficult to understand and discourages contributions. The motivation for it was to enable Lwt and Async support, but now we're getting that without going through Transform.
Also note that Pipe is a more sophisticated implementation of Transform. Admittedly it is also difficult to understand (if you use its more complex features), but at least it is part of a more widely used library. That goes a long way to making it more usable. For example, Pipe is covered in Real World OCaml.

Similarly, we have Future.S.Deferred.Result.t, which can replace Flow.t, but better IMO. For one thing, Flow is defined only for Lwt. The functorization now gives us the Flow monad for Lwt, Async, and blocking calls. And again, we have the benefit of uniformity with an existing library, rather than introducing yet another API.
Overall, we can remove Transform and Flow, substantially simplifying Biocaml's implementation, while at the same time increasing its functionality.
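
To make the idea of functorizing over an Async-like signature concrete, here is a minimal sketch. It is not the actual Future.S from the `future` branch: the signature `FUTURE`, the in-memory `Reader.of_lines`, the `run` function and the `Count_lines` functor are all hypothetical names chosen for illustration, and only the general shape (a `Deferred` monad plus a reader returning `` `Ok ``/`` `Eof ``) is meant to resemble Async.

```ocaml
(* Hypothetical, much-reduced stand-in for an Async-like signature. *)
module type FUTURE = sig
  module Deferred : sig
    type 'a t
    val return : 'a -> 'a t
    val bind : 'a t -> ('a -> 'b t) -> 'b t
  end

  module Reader : sig
    type t
    (* Assumption: an in-memory reader, so the sketch needs no files. *)
    val of_lines : string list -> t
    val read_line : t -> [ `Ok of string | `Eof ] Deferred.t
  end

  (* Assumption: a way to get the final value out of a deferred. *)
  val run : 'a Deferred.t -> 'a
end

(* Blocking instance: a deferred value is just the value itself. *)
module Blocking : FUTURE = struct
  module Deferred = struct
    type 'a t = 'a
    let return x = x
    let bind x f = f x
  end

  module Reader = struct
    type t = string list ref
    let of_lines ls = ref ls
    let read_line r =
      match !r with
      | [] -> `Eof
      | l :: rest -> r := rest; `Ok l
  end

  let run x = x
end

(* Code written once against FUTURE works for any instance. *)
module Count_lines (F : FUTURE) = struct
  open F

  let count reader =
    let rec loop n =
      Deferred.bind (Reader.read_line reader) (function
        | `Eof -> Deferred.return n
        | `Ok _ -> loop (n + 1))
    in
    loop 0
end

module C = Count_lines (Blocking)

let n =
  Blocking.run
    (C.count (Blocking.Reader.of_lines [ "@r1"; "ACGT"; "+"; "IIII" ]))
```

An Lwt or Async instance would supply the same signature with its own `Deferred.t`, and `Count_lines` would not need to change.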
--
On Fri, Jan 17, 2014 at 12:16 PM, Ashish Agarwal <agarw...@gmail.com> wrote:
On Fri, Jan 17, 2014 at 11:26 AM, Sebastien Mondet <sebastie...@gmail.com> wrote:
> functorizing everything over a deferred-like signature, we have already tried, and it does not scale very far (preserving type-equalities, especially with Result.t's, slowly becomes a nightmare).

Okay, I'll keep working in the `future` branch and get to the point of a command line app. Can you expand on why Result.t's are especially difficult?
Actually it's not really because of Result.t itself; some of the problems I remember came from *open* polymorphic variants (which we want on the `Error` side of a Result.t).
My point was just that as long as Read_result only contains a polymorphic variant type, it could be removed from S without altering compatibility with Async. But ok, at some point you'll want to equip Read_result with operations (monad operations, to begin with).
read/write seems a better alternative to me (Fastq.read ic).
I think we have to be careful about performance too before deciding to switch. Have you got some insights on this yet?
Yes, the "with type" annotations do not play well with open polymorphic variant types, you're right.
On Sat, Jan 18, 2014 at 4:52 AM, Philippe Veber <philipp...@gmail.com> wrote:
> Yes, the "with type" annotations do not play well with open polymorphic variant types, you're right.

Is there a simple example demonstrating the issue? I don't see what it could be given the functorization doesn't affect at all the type variables that the polymorphic variants would instantiate.
> > Yes, the "with type" annotations do not play well with open polymorphic variant types, you're right.
>
> Is there a simple example demonstrating the issue? I don't see what it could be given the functorization doesn't affect at all the type variables that the polymorphic variants would instantiate.

Err, now I'm not sure what I meant either :o). I remember having trouble with open polymorphic variant types and functors, but I have nothing specific in mind right now.
In order to be a little more constructive I started a benchmark on the future branch [1]. The bench is on counting the lines of a Fastq file, I do it with Transform and Future, using blocking or Lwt threads. The raw results:
Thread      Transform   Future
----------  ----------  ----------
Blocking    0.63        0.51
Lwt         0.99        3.53
A couple of remarks:
- I did not pay attention to size of read buffers
- the lwt/transform version furiously leaks memory so don't try the bench on a big file (up to a couple millions of lines is ok)
- more generally the lwt/transform version certainly is poorly written (I tried to get inspiration on some code of biocaml_app_common)
- it lacks an Async line, but I'd like to have a better lwt/transform version before starting it
Ok, so let's say it's a start, please feel free to improve it directly or to give me some indications on how to improve the lwt/transform version.
Cheers,
Philippe.
[1] https://github.com/pveber/biocaml/tree/future
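
For readers who want a feel for what such a line-counting bench measures, here is a minimal sketch of a blocking baseline using only the OCaml standard library. It is not the code from the branch above and uses neither Transform nor Future; the helper names (`count_lines`, `time`, `write_sample`) and the generated sample file are assumptions made for illustration.

```ocaml
(* Count lines with plain blocking reads from the standard library. *)
let count_lines path =
  let ic = open_in path in
  let rec loop n =
    match input_line ic with
    | _ -> loop (n + 1)
    | exception End_of_file -> n
  in
  let n = loop 0 in
  close_in ic;
  n

(* Return the result of [f x] together with the CPU time it took. *)
let time f x =
  let t0 = Sys.time () in
  let r = f x in
  (r, Sys.time () -. t0)

(* Write a small synthetic FASTQ file: four lines per record. *)
let write_sample path n_records =
  let oc = open_out path in
  for i = 1 to n_records do
    Printf.fprintf oc "%s%d\nACGT\n+\nIIII\n" "@read" i
  done;
  close_out oc

let n_lines, elapsed =
  let path = Filename.temp_file "bench" ".fastq" in
  write_sample path 1000;
  let result = time count_lines path in
  Sys.remove path;
  result

let () = Printf.printf "%d lines in %.3f s of CPU time\n" n_lines elapsed
```

Note that `Sys.time` reports processor time, not wall-clock time, so this kind of measurement would understate any time spent waiting in a cooperative-thread version.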
--
In the code base we had at NYU, the main library was abstracted over an IO monad (up to May 2012). The problem was that the whole module dependency tree had to be "followed" by the functorization.

Here a simple example would be:

If we have:

    Line: IO -> LINE
    Fastq: IO -> FASTQ
    Fasta: IO -> FASTA

Then the implementation of Fastq.Make must instantiate Line(IO) to use it, same for Fasta.Make.
Then a program using both Fastq and Fasta will end up with incompatible Line.t's unless all the "with type" and/or "with module" have been passed around from bottom to top correctly (and that means *every* possible type or submodule defined anywhere).

The other solution is to implement the dependency tree with the functors:

    Line: IO -> LINE
    Fastq: LINE -> FASTQ
    Fasta: LINE -> FASTA
    Biocaml: FASTA -> FASTQ -> ... -> BIOCAML

and that is also a big pain: if suddenly there is something like a Log module that sits between IO and LINE, we have to redefine everything (we can create mid-level "common stuff" modules but it is still very painful), or just imagine the 20 MB error message we'd get when we change an error type that is exposed in BIOCAML.

(yes, before the defunctorization, I was often crashing/freezing emacs with a simple `M-x compile` and one of those signature mismatch messages)
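
The first scenario (each format functor instantiating Line over IO internally) can be sketched as follows. This is only an illustration under assumptions: `IO`, `LINE`, `FORMAT`, `Make_line`, `Make_fastq`, `Make_fasta` and `Blocking` are hypothetical, reduced stand-ins, not the NYU or Biocaml modules, and a single shared `FORMAT` signature is used for both formats to keep it short. The point is the `with type` constraint on each functor's result: it is what has to be "passed around from bottom to top".

```ocaml
(* Hypothetical, reduced signatures for illustration only. *)
module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val run : 'a t -> 'a
end

module type LINE = sig
  type t
  val of_string : string -> t
  val length : t -> int
end

module Make_line (Io : IO) : LINE = struct
  type t = string
  let of_string s = Io.run (Io.return s)
  let length = String.length
end

module type FORMAT = sig
  module Line : LINE
  val header : string -> Line.t
end

(* Each functor instantiates Make_line itself. The [with type] constraint
   exposes that its Line.t is Make_line(Io).t. Without it, Line.t would be
   a fresh abstract type in each result. *)
module Make_fastq (Io : IO) : FORMAT with type Line.t = Make_line(Io).t =
struct
  module Line = Make_line (Io)
  let header name = Line.of_string ("@" ^ name)
end

module Make_fasta (Io : IO) : FORMAT with type Line.t = Make_line(Io).t =
struct
  module Line = Make_line (Io)
  let header name = Line.of_string (">" ^ name)
end

module Blocking = struct
  type 'a t = 'a
  let return x = x
  let run x = x
end

module Fq = Make_fastq (Blocking)
module Fa = Make_fasta (Blocking)

(* Mixing values from both modules type-checks only because both
   Line.t's are known to equal Make_line(Blocking).t. *)
let headers : Fq.Line.t list = [ Fq.header "read1"; Fa.header "seq1" ]
let total = List.fold_left (fun acc l -> acc + Fa.Line.length l) 0 headers
```

If the two `with type` constraints are dropped, `Fq.Line.t` and `Fa.Line.t` become unrelated abstract types and the `headers` list is rejected by the type checker. With one type and two functors this is easy to manage; the complaint above is about doing it for every type and submodule across a whole library.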
> In order to be a little more constructive I started a benchmark on the future branch [1]. The bench is on counting the lines of a Fastq file, I do it with Transform and Future, using blocking or Lwt threads. The raw results:
>
> Thread      Transform   Future
> ----------  ----------  ----------
> Blocking    0.63        0.51
> Lwt         0.99        3.53
>
> A couple of remarks:
> - I did not pay attention to size of read buffers
> - the lwt/transform version furiously leaks memory so don't try the bench on a big file (up to a couple millions of lines is ok)

It's strange to leak memory there, and it still performs pretty well compared to the functor (0.99 vs 3.53)?

> - more generally the lwt/transform version certainly is poorly written (I tried to get inspiration on some code of biocaml_app_common)
> - it lacks an Async line, but I'd like to have a better lwt/transform version before starting it
> Ok, so let's say it's a start, please feel free to improve it directly or to give me some indications on how to improve the lwt/transform version.
> Cheers,
> Philippe.
>
> [1] https://github.com/pveber/biocaml/tree/future
--
I might have missed this, but what is the reasoning for providing an
async interface for reading these file formats?
Philippe:
To be more precise, isn't it enough to write types of the form F(M).t in signatures to avoid type equations most of the time, if not always (in our particular case of course)?
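
The F(M).t suggestion can be sketched like this: instead of exporting a Line submodule and adding type equations afterwards, each signature mentions `Make_line(Io).t` directly. This relies on OCaml's applicative functors, and all module names here (`IO`, `Make_line`, `Make_fastq`, `Make_fasta`, `Blocking`) are hypothetical stand-ins rather than Biocaml code.

```ocaml
(* Hypothetical, reduced IO signature for illustration only. *)
module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val run : 'a t -> 'a
end

module Make_line (Io : IO) : sig
  type t
  val read : string -> t Io.t
  val to_string : t -> string
end = struct
  type t = string
  let read s = Io.return s
  let to_string s = s
end

(* The result signatures name Make_line(Io).t directly, so no
   "with type" equation is needed on either functor. *)
module Make_fastq (Io : IO) : sig
  val first_line : string -> Make_line(Io).t Io.t
end = struct
  module Line = Make_line (Io)
  let first_line name = Line.read ("@" ^ name)
end

module Make_fasta (Io : IO) : sig
  val first_line : string -> Make_line(Io).t Io.t
end = struct
  module Line = Make_line (Io)
  let first_line name = Line.read (">" ^ name)
end

module Blocking = struct
  type 'a t = 'a
  let return x = x
  let run x = x
end

module Line = Make_line (Blocking)
module Fq = Make_fastq (Blocking)
module Fa = Make_fasta (Blocking)

(* Values from both modules share the type Make_line(Blocking).t. *)
let both : string list =
  List.map Line.to_string
    [ Blocking.run (Fq.first_line "r1"); Blocking.run (Fa.first_line "s1") ]
```

Whether this stays manageable for a full library, where signatures are named and reused rather than written inline, is exactly the open question in this exchange.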
On Tue, Jan 21, 2014 at 9:04 AM, Malcolm Matalka <mmat...@gmail.com> wrote:
> I might have missed this, but what is the reasoning for providing an
> async interface for reading these file formats?
Good question! Well, we want to test the performance differences between the various approaches (using an OS thread or not), but we can't unless we first have the options available. Also, it would be nice to have the API for the sake of uniformity, since it's not hard to provide (despite our lengthy discussion of how to provide it, there's no fundamental challenge; the code is trivial). Finally, having the Lwt version would let us play with js_of_ocaml.
--
> But have you done that for big files?
>
> Here what we want is to go through very large files in a streaming fashion,
> so if the parsing is in a posix-thread there will be a lot of inter-thread
> communication (every time a chunk is read it has to pass the hand to the
> main async "thread", it's easy to do with Lwt_condition but I don't know if
> the performance is good enough: there is a lot of "OS" context switching).

IME, if you are processing a large file, you are better off tossing the
async loop out as is, because it's doing a lot of work you don't want
and it will cost you. I think it's better to toss those operations in
another process and let the OS handle it. Depending on the problem
you're solving, you'll want multi-core support anyways. And you'll
probably want to be able to run across a cluster, so processes are the
optimal (IMO) unit for such things.
I think, for big files, you're better off specializing the processing
program and using an async program to orchestrate things, like a
workflow engine.
> we still depend on core, with js_of_ocaml it is still not really viable

I'm hopeful Core will become js_of_ocaml compatible over time. They continue to make improvements in this direction.
--
If you relax the definition of "compatible" to "it compiles and doesn't crash at start-up" then I think Core_kernel is already there (they removed "Num"),
but a web application that loads megabytes of unused code and runs tons of unused module initializations will remain a joke by any standard (and those initializations leave the library in a very risky, inconsistent state; it's difficult to know which functions can be called or not).