Yesterday I uploaded a new package on Hackage called generic-deepseq. It
implements the 'deepseq' function generically using the new GHC.Generics
framework as found in GHC >= 7.2. It can be used as a replacement for
the deepseq package.
Given that hackage is currently down, here is an URL from a mirror where
the package description & documentation can be found:
http://hackage.factisresearch.com/package/generic-deepseq-1.0.0.0
Any suggestions are welcome.
Cheers,
Maxime Henrion
Nice work but it would be nice to have this functionality directly in
the deepseq package as in:
#ifdef GENERICS
{-# LANGUAGE DefaultSignatures, TypeOperators, FlexibleContexts #-}
#endif
class NFData a where
rnf :: a -> ()
rnf a = a `seq` ()
#ifdef GENERICS
default rnf :: (Generic a, GNFData (Rep a)) => a -> ()
rnf = grnf . from
class GNFData f where
grnf :: f a -> ()
instance GNFData U1 where
grnf U1 = ()
{-# INLINE grnf #-}
instance NFData a => GNFData (K1 i a) where
grnf = rnf . unK1
{-# INLINE grnf #-}
instance GNFData f => GNFData (M1 i c f) where
grnf = grnf . unM1
{-# INLINE grnf #-}
instance (GNFData f, GNFData g) => GNFData (f :+: g) where
grnf (L1 x) = grnf x
grnf (R1 x) = grnf x
{-# INLINE grnf #-}
instance (GNFData f, GNFData g) => GNFData (f :*: g) where
grnf (x :*: y) = grnf x `seq` grnf y
{-# INLINE grnf #-}
#endif
Unfortunately this is not possible since the two default
implementations conflict. I see two solutions:
1) Change the DefaultSignatures extension to always give preference to
the default signature. I think giving preference to the default
signature makes sense since it's usually more specific (more
constraint) and thus "more correct" than the default implementation.
2) Remove the default implementation of rnf. I understand the default
implementation gives some convenience when writing instances for types
that have an all strict representation, as in:
instance NFData Int
instance NFData Word
instance NFData Integer
...
However, I think having the default implementation can mask some bugs as in:
data T = C Int; instance NFData T
which will neither give a compile time error nor warning.
I don't think it's that much more inconvenient to write:
instance NFData Int where rnf = rnf'
instance NFData Word where rnf = rnf'
instance NFData Integer where rnf = rnf'
...
where
rnf' :: a -> ()
rnf' a = a `seq` ()
So I would vote for option 2, removing the default rnf implementation.
If I find some time I will turn this into an official proposal.
Regards,
Bas
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
I agree it would have been nice to have that functionality directly in
the deepseq package, or at least in a way that extends the existing
functionality rather than completely replace it. However, as you noted,
it isn't possible to do that in a backwards compatible way, unless we
hack the implementation of the DefaultSignatures extension. That being
said, even if it was possible to do this in a backwards compatible way,
I'm not entirely sure it would be desirable to do so because there is
one subtle difference between this code and the deepseq package.
With the generic-deepseq package, you should only need to provide an
explicit DeepSeq instance for some type if it is abstract, because you
can't get a Generic instance in that case (unless the library author
derived Generic himself, but that would be a weird and dangerous thing
to do for an abstract datatype). If you're not dealing with an abstract
datatype, you _shouldn't_ have an explicit instance, because it would be
possible to write an incorrect one, while that is impossible if you just
derive a generic implementation (as long as the generic code is correct,
of course).
So, knowing that it would necessarily be backwards incompatible (I
wasn't intending to hack on GHC :-), and also that, in the end, this is
not quite the same class as the NFData class from the deepseq package, I
thought it made more sense to create another package that would be
mostly compatible with deepseq, but with a different class name so as to
force people to reevaluate the need for their instances if they have
some. I'd be interested in knowing what you and others think about that.
Maybe I'm being overly cautious?
I kept the rest of the API identical so that it's still easy to switch
to this package, thus you can still use the ($!!), force, and rnf
functions. I'm guilty of not having preserved the "rnf :: a -> ()"
function as the class function though, it's a wrapper around "deepseq"
in my code. I just didn't see the point of having a class function with
such a signature versus having a function just like "seq :: a -> b ->
b". In retrospect, that might have been a bad idea, and maybe I should
switch to have an "rnf :: a -> ()" class function to make switching even
easier?
Thanks a lot for your input!
Maxime Henrion
I agree. I hadn't considered this advantage yet. I guess it's the same
argument for why it's better to automatically derive Data and Typeable
instances using the DeriveDataTypeable extension.
> So, knowing that it would necessarily be backwards incompatible (I
> wasn't intending to hack on GHC :-), and also that, in the end, this is
> not quite the same class as the NFData class from the deepseq package, I
> thought it made more sense to create another package that would be
> mostly compatible with deepseq, but with a different class name so as to
> force people to reevaluate the need for their instances if they have
> some. I'd be interested in knowing what you and others think about that.
> Maybe I'm being overly cautious?
I do think it's better to integrate this into the deepseq package (and
thus removing the default implementation of rnf). Otherwise we end up
with two ways of evaluating values to normal form.
> I'm guilty of not having preserved the "rnf :: a -> ()"
> function as the class function though, it's a wrapper around "deepseq"
> in my code. I just didn't see the point of having a class function with
> such a signature versus having a function just like "seq :: a -> b ->
> b". In retrospect, that might have been a bad idea, and maybe I should
> switch to have an "rnf :: a -> ()" class function to make switching even
> easier?
I'm not sure but maybe a method like "rnf :: a -> ()" is easier to optimize.
Also in my experience (with generics support in aeson and cereal) it's
a very good idea (performance-wise) to INLINE your methods like I did
in my previous message. Of course the only way to know for sure is the
create some (criterion) benchmarks.
One last issue: Say I have a type like: "data T = C !Int"
Currently GHC Generics can't express the strictness annotation. This
means that your deepseq will unnecessarily evaluate the Int (since it
will always be evaluated already). It would be nice if the strictness
information could be added to the K1 type. (José, would it be hard to
add this to GHC.Generics?)
I'm okay with that, but I was interested in knowing whether you think my
reasoning for changing the class name and thus deliberately breaking the
API slightly more was sane or not (I say "more" because removing the
default implementation of rnf already constitutes an API breakage in
that the generic replacement would be optional and depends on having
Generic instances to work).
> > I'm guilty of not having preserved the "rnf :: a -> ()"
> > function as the class function though, it's a wrapper around "deepseq"
> > in my code. I just didn't see the point of having a class function with
> > such a signature versus having a function just like "seq :: a -> b ->
> > b". In retrospect, that might have been a bad idea, and maybe I should
> > switch to have an "rnf :: a -> ()" class function to make switching even
> > easier?
>
> I'm not sure but maybe a method like "rnf :: a -> ()" is easier to optimize.
>
> Also in my experience (with generics support in aeson and cereal) it's
> a very good idea (performance-wise) to INLINE your methods like I did
> in my previous message. Of course the only way to know for sure is the
> create some (criterion) benchmarks.
Yeah, I should definitely get going on the benchmarks.
> One last issue: Say I have a type like: "data T = C !Int"
> Currently GHC Generics can't express the strictness annotation. This
> means that your deepseq will unnecessarily evaluate the Int (since it
> will always be evaluated already). It would be nice if the strictness
> information could be added to the K1 type. (José, would it be hard to
> add this to GHC.Generics?)
Assuming there is way to differentiate strict constructors in
GHC.Generics, and that I have a specific instance so as to not call seq
in that case, can you actually do such an optimization safely? Consider
this code (imports omitted for simplicity):
data T = C !Int deriving Generic
instance DeepSeq T
x :: T
x = undefined
main :: IO ()
main = print (x `deepseq` ())
I would expect this to diverge, just like it does if one uses `seq`. If
we implement the optimization you suggest, I believe that deepseq
wouldn't diverge, but I admit I'm not 100% sure either.
Cheers,
Maxime Henrion
I do think it's better to integrate this into the deepseq package (and
thus removing the default implementation of rnf). Otherwise we end up
with two ways of evaluating values to normal form.
One last issue: Say I have a type like: "data T = C !Int"
Currently GHC Generics can't express the strictness annotation. This
means that your deepseq will unnecessarily evaluate the Int (since it
will always be evaluated already). It would be nice if the strictness
information could be added to the K1 type. (José, would it be hard to
add this to GHC.Generics?)
Isn't it also an issue that bang patterns only guarantee WHNF, but
there might be unevaluated data "further inside"? Obviously not a
problem for !Int; I don't know if the logic is there to tell the
difference.
> Also in my experience (with generics support in aeson and cereal) it's
> a very good idea (performance-wise) to INLINE your methods like I did
> in my previous message. Of course the only way to know for sure is the
> create some (criterion) benchmarks.
btw, how much is GHC able to optimize the GHC.Generics based NFData
instance as e.g. compared to what GHC is able to optimize with
deepseq-th[1]?
> One last issue: Say I have a type like: "data T = C !Int"
> Currently GHC Generics can't express the strictness annotation. This
> means that your deepseq will unnecessarily evaluate the Int (since it
> will always be evaluated already). It would be nice if the strictness
> information could be added to the K1 type. (José, would it be hard to
> add this to GHC.Generics?)
that's btw one thing that I tried hard to avoid in deepseq-th[1], by
having a hacky predicate
decWhnfIsNf :: Dec -> Q (Maybe Bool)
whis is able to detect whether WHNF is the same as NF for a given
declaration;
e.g. in the following case (see also the example in [1]):
data Foo = Foo !Int !Int !Int
data Bar = Bar !Foo !Foo
the instance generated would be:
instance NFData Foo
instance NFData Bar
since the WHNF=NF property holds for Foo as well as for Bar
[1]: http://hackage.haskell.org/packages/archive/deepseq-th/0.1.0.2/doc/html/Control-DeepSeq-TH.html
Well I wrote some dumb criterion benchmarks that run deepseq over
increasingly bigger lists of numbers, and it appears that using rnf as
the member function of the DeepSeq class indeed makes a _huge_
difference.
According to criterion, the performance of the old generic-deepseq code
was 6 to 7 times worse than that of the deepseq package. After switching
the class function to rnf, it got on par, if not better than the deepseq
package. I'm saying "if not", because I've observed contradicting
results from criterion, when I ran benchmarks for both packages at once,
and when I ran those separately. When running both at once,
generic-deepseq is slower than deepseq, except for the test with the
bigger list (see report-both.html). When ran separately, generic-deepseq
is consistantly faster (see report-deepseq.html and
report-gdeepseq.html). The criterion benchmark can be found on the
bitbucket repo at http://mu.org/~mux/report-deepseq.html.
I have released a 2.0.0.0 version of this package on hackage with this
change and a few more instances for base types.
Cheers,
Maxime Henrion
When running criterion benchmarks use the -g flag to prevent
benchmarks from interfering with each other due to GC. Also, set a
high initial heap size (e.g. -H1G) so the benchmarks that run first
don't have to pay the price of GHC growing the heap.
-- Johan
Thanks Johan! I thought the -g option was the default, so hadn't
bothered with it. Unfortunately though, it doesn't seem to make a
difference in my case, I get similar results than previously when
running both tests at once (see attached report).
Cheers,
Maxime
Nice, that's what I expected. Have you checked if adding INLINE
pragma's helps even more? (I guess not since it's already on par with
manual written code, as you mentioned)
BTW I would also recommend making a benchmark for a big sum type.
Some nitpicking:
* In the instance:
instance GDeepSeq U1 where grnf _ = ()
I think it makes sense to pattern match on the U1 constructor, as in:
grnf U1 = ().
I haven't checked if that's necessary but my fear is that assuming:
data Unit = Unit deriving Generic; instance DeepSeq Unit
rnf (⊥ :: Unit) would equal: () while I would expect it to equal ⊥.
* Why do you have the instance:
instance GDeepSeq V1 where grnf _ = ()
The only way to construct values of a void type is using ⊥. And I
would expect that rnf ⊥ = ⊥, not (). I think the best thing is to just
remove the V1 instance.
Cheers,
Oh, I had forgotten to mention that: INLINE pragmas indeed didn't make
any significant difference in my tests, so I let them out.
> BTW I would also recommend making a benchmark for a big sum type.
Yes, I should definitely do that.
> Some nitpicking:
>
> * In the instance:
>
> instance GDeepSeq U1 where grnf _ = ()
>
> I think it makes sense to pattern match on the U1 constructor, as in:
> grnf U1 = ().
>
> I haven't checked if that's necessary but my fear is that assuming:
> data Unit = Unit deriving Generic; instance DeepSeq Unit
> rnf (⊥ :: Unit) would equal: () while I would expect it to equal ⊥.
That's a good point, I will do tests to see whether this makes a
difference. This seems to mirror my thinking when I asked you whether it
would be safe to skip seq calls in the case the constructor of a type is
strict in another e-mail (have you seen it?).
> * Why do you have the instance:
>
> instance GDeepSeq V1 where grnf _ = ()
>
> The only way to construct values of a void type is using ⊥. And I
> would expect that rnf ⊥ = ⊥, not (). I think the best thing is to just
> remove the V1 instance.
This would have the consequence that any type tagged with a phantom type
(for whatever reason) couldn't be used with deepseq, it would return
bottom. What if I want to deepseq a 2-3 finger tree tagged with a
type-level natural that ensures the proper shape of the tree statically?
It seemed to me that I should be able to do that; this is why I added
this V1 instance.
Cheers,
Maxime
This would have the consequence that any type tagged with a phantom type
> * Why do you have the instance:
>
> instance GDeepSeq V1 where grnf _ = ()
>
> The only way to construct values of a void type is using ⊥. And I
> would expect that rnf ⊥ = ⊥, not (). I think the best thing is to just
> remove the V1 instance.
(for whatever reason) couldn't be used with deepseq, it would return
bottom. What if I want to deepseq a 2-3 finger tree tagged with a
type-level natural that ensures the proper shape of the tree statically?
It seemed to me that I should be able to do that; this is why I added
this V1 instance.
Yes, such as the usual type-level naturals (not using DataKinds):
data Z
data S n
Those can be used to tag a type which also contains actual values that
you would want to deepseq? For example, a length-type vector? I seemed
to remember a similar construct for 2-3 finger trees that would
statically guarantee that the shape of the tree is valid, so I took that
as an example, but I don't remember the specifics.
Cheers,
Maxime
>
>
data Proxy t = Proxy
Oh, ok; in that case, I probably don't need that V1 instance indeed. I
should probably write QuickCheck tests using the ChasingBottoms package
in order to ensure correct behaviour of this code.
Thanks,
Maxime
Cheers,
Andres
I don't understand what's going on here. Instances for V1 should of
course be defined if they can be! And in this case, a V1 instance
makes sense and should be defined. The definition itself doesn't
matter, as it'll never be executed.
data Ze deriving Generic
class DeepSeq a where
rnf :: a -> ()
default rnf :: (Generic a, GDeepSeq (Rep a)) => a -> ()
rnf = grnf . from
instance DeepSeq Ze
class GDeepSeq f where
grnf :: f a -> ()
instance GDeepSeq V1 where
grnf _ = ()
instance GDeepSeq a => GDeepSeq (M1 i c a) where
grnf = grnf . unM1
-- other instances are not relevant now
t :: Ze
t = undefined
>> I don't understand what's going on here. Instances for V1 should of
>> course be defined if they can be! And in this case, a V1 instance
>> makes sense and should be defined. The definition itself doesn't
>> matter, as it'll never be executed.
>
>
> The definition certainly matters:
[...]
You're right. I was too quick to conclude the definition doesn't
matter. But it should still be there. V1 can occur in representations
of non-empty types (even if the current mechanism might not generate
them). You'd still want to be able to call generic functions on such
types.
Would you have an example of a type for which it would be useful to have
a DeepSeq instance, and that would require a V1 instance? I cannot think
of one now; I originaly thought it would be necessary to permit deriving
DeepSeq instances for types tagged with "void" types, but as José
explained, in that case, the V1 instance isn't needed because those void
types don't show up in the representation.
Cheers,
Maxime
I just tested this and you were right; I have corrected the code in the
mercurial repository.
> * Why do you have the instance:
>
> instance GDeepSeq V1 where grnf _ = ()
>
> The only way to construct values of a void type is using ⊥. And I
> would expect that rnf ⊥ = ⊥, not (). I think the best thing is to just
> remove the V1 instance.
I have confirmed what Jos explained, and V1 instances are indeed not
necessary for the use case I originally intended them for, that is for
types tagged with void types. I have removed the V1 instance for now.
Thanks,
Maxime
While void datatypes are rare, it just doesn't make sense to exclude
them. It's an arbitrary restriction. Here's a constructed example:
data X a = C1 Int | C2 a
data Z -- empty
type Example = X Z
We're using Z as a parameter to X in order to exclude the use of the
C2 case. Without a V1 case, you cannot use deepSeq on values of type
Example.
> Would you have an example of a type for which it would be useful to haveWhile void datatypes are rare, it just doesn't make sense to exclude
> a DeepSeq instance, and that would require a V1 instance? I cannot think
> of one now; I originaly thought it would be necessary to permit deriving
> DeepSeq instances for types tagged with "void" types, but as José
> explained, in that case, the V1 instance isn't needed because those void
> types don't show up in the representation.
them. It's an arbitrary restriction. Here's a constructed example:
data X a = C1 Int | C2 a
data Z -- empty
type Example = X Z
We're using Z as a parameter to X in order to exclude the use of the
C2 case. Without a V1 case, you cannot use deepSeq on values of type
Example.
Thanks for all the input guys. I have just released generic-deepseq
2.0.1.0 to hackage, with fixed U1 and V1 instances per this discussion.
Cheers,
Maxime