Hi,
I had replaced list_vt with qlist and now it is the most efficient version I could get now:
```
patscc -O2 -DATS_MEMALLOC_LIBC -I"../libs/" -o test7 TEST/test7.dats
/run/current-system/sw/bin/time ./test7
1077871
0.84user 0.01system 0:00.85elapsed 100%CPU (0avgtext+0avgdata 35108maxresident)k
0inputs+0outputs (0major+8486minor)pagefaults 0swaps
```
in comparison to haskell's:
```
1.16user 0.02system 0:01.18elapsed 99%CPU (0avgtext+0avgdata 74492maxresident)k
0inputs+0outputs (0major+17827minor)pagefaults 0swaps
```
I have to note, that test4 is more correct than test3: it is the same, except that it is using GC to collect stream()
```
patscc -O2 -DATS_MEMALLOC_GCBDW -lgc -o test4 TEST/test4.dats
/run/current-system/sw/bin/time ./test4
1077871
2.13user 0.03system 0:02.13elapsed 101%CPU (0avgtext+0avgdata 97344maxresident)k
```
which is expected to use less memory, but increases time to GC (as expected as well), but, technically, this version is the closest to haskell's one.
for 2^30 constraint:
```
patscc -O2 -DATS_MEMALLOC_LIBC -I"../libs/" -o test7 TEST/test7.dats
/run/current-system/sw/bin/time ./test7
54400028
227.79user 0.63system 3:48.43elapsed 99%CPU (0avgtext+0avgdata 1701428maxresident)k
0inputs+0outputs (0major+425064minor)pagefaults 0swaps
```
haskell version with GHC8.10:
```
54400028
278.48user 0.62system 4:38.75elapsed 100%CPU (0avgtext+0avgdata 3016564maxresident)k
0inputs+0outputs (0major+753319minor)pagefaults 0swaps
```
and test4
```
patscc -O2 -DATS_MEMALLOC_GCBDW -lgc -o test4 TEST/test4.dats
/run/current-system/sw/bin/time ./test4
54400028
704.39user 1.81system 11:46.22elapsed 99%CPU (0avgtext+0avgdata 4966216maxresident)k
0inputs+0outputs (0major+1241156minor)pagefaults 0swaps
```
I am wondering why test4 performs quite poor(I guess, Boehm GC is just too general/unoptimized), because, I had expected, that GC should free stream_con() instances as soon as they will be converted into stream_vt_con() by stream_t2vt... That is what haskell's GC is doing with intermediate data. I wish I was able to free stream_con() instance forcibly in stream_t2vt(). In this case, I expect it will be on par with haskell (I assume, that qlist is implemented as a flat array (I haven't actually checked, hehe))
Meanwhile:
1. I had started this topic, because Haskell's source code is a good example of composability of lazy evaluation: it is short, but quite powerful as I had spent quite some time to produce an outperforming version which is more difficult as well. Basically, I had to implement memoization using qlist() which is built-in into Haskell's runtime;
2. now I have an example of stateful lazy stream processing in ATS2 and it was very interesting as I had the impression that it is hard in ATS to keep an implicit state (as resource usage is harder without GC);
3. I have a food for thoughts, because, even if Haskell's version is short but powerful, lazy-by-default evaluation has a side-effect of having to dig and decide what is worse in particular case: code being too lazy or code being too strict. Just a simple example is a `foldr` implementation (
https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/src/Data.Foldable.html#foldr): which requires some time to understand how it is being done. Also it is hard to not to meet
https://downloads.haskell.org/~ghc/latest/docs/html/libraries/deepseq-1.4.6.1/Control-DeepSeq.html for forcing evaluation. So Haskell hides laziness but requires understanding how to force evaluation and marshall data to fit GC's expectations. ATS on its turn, exposes laziness primitives and requires understanding how to use laziness. For me, it seems that it is a more fair approach: if you don't use laziness, then you don't pay for it, unlike with haskell: if you write a strict code, you will still need to be aware of laziness/GC/marshalling and etc...;
4. I was missing `qlist_free` function;
5. it is still interesting if there a way how to improve test4: I understand, that having stream_free() will be impossible, as stream() is a datatype and type checker will not be able to protect from double free, as well as having possibility to have a recursive linear stream like this:
```
val rec ones: stream_vt(int) = $ldelay( stream_vt_cons( 1, ones))
```
will probably won't help as well, as it will require a possibility to forcibly walk through a stream without consuming it, which is hard due to requirement of usage of $ldelay....
So it looks like some Boehm GC optimization is the correct answer... Or, maybe it means that linear streams and qlist is the "only" correct version for such task... Especially, as we already know that older GHC version produces worse results
I hope that the topic of lazy streams in ATS is interesting for others :)