Would it be possible to get a Data instance for Data.Text.Text?
No, that's definitely not correct, or even remotely scalable as we
increase the number of abstract types in disparate packages. If
someone suggests it's necessary for their generics library, I suggest
you use Uniplate ;-)
There are two options, both listed in the above email.
1) Use string conversion in the instance. This is morally correct, and
works perfectly. However, as mentioned, it doesn't perform well. The
Map/Set instances both do a similar trick.
2) Just add a deriving clause for Data to the type, and hope no one
abuses the internals. This is what ByteString does: it works great,
it's fast, but you are violating some amount of abstraction. You have
to trust people not to break that abstraction, but it's not a simple
abstraction to break - it's the moral equivalent of pointer prodding
in a std::string; no one breaks it accidentally.
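As a rough illustration of option 1, here is a minimal sketch of a Data instance that goes through String. The type Packed and the functions pack, unpack and reverseStrings are hypothetical stand-ins (not the real Data.Text API), and the single pseudo-constructor named "pack" is an assumption in the spirit of the Map/Set trick, not what any particular library actually ships. It assumes a GHC recent enough to derive Typeable automatically.

```haskell
import Data.Data

-- Hypothetical abstract type standing in for Text. In a real library the
-- constructor would be hidden and the representation would not be a String.
newtype Packed = Packed String

pack :: String -> Packed
pack = Packed

unpack :: Packed -> String
unpack (Packed s) = s

-- The instance pretends the type has one constructor, "pack", whose only
-- field is the String form of the value. Every generic traversal therefore
-- pays for a conversion to String and back.
instance Data Packed where
  gfoldl f z p  = z pack `f` unpack p
  toConstr _    = packConstr
  gunfold k z c = case constrIndex c of
    1 -> k (z pack)
    _ -> error "Example.Packed: gunfold"
  dataTypeOf _  = packedDataType

packConstr :: Constr
packConstr = mkConstr packedDataType "pack" [] Prefix

packedDataType :: DataType
packedDataType = mkDataType "Example.Packed" [packConstr]

-- A generic transformation that reverses any String it is handed and
-- leaves every other type untouched.
reverseStrings :: Data d => d -> d
reverseStrings d = case cast d of
  Just s  -> maybe d id (cast (reverse (s :: String)))
  Nothing -> d
```

With these definitions, gmapT reverseStrings (pack "abc") rebuilds the value through pack, so the abstraction is never opened up; the cost is the String round trip on each traversal.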
> If that feels too arduous, I'd consider adding your suggested instance of
> Data until such time as the One True Generics Package emerges to walk the
> earth. But please give it a think first.
Data.Data is the one true runtime reflection package, so Data
instances are strongly advised, totally ignoring Generics stuff. I
would pick option 2, but a Data instance really is useful.
Thanks, Neil
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Data/Text/Array.hs:104:35:
    Couldn't match kind `#' against `*'
    When matching the kinds of `ByteArray# :: #' and `d :: *'
      Expected type: d
      Inferred type: ByteArray#
    In the first argument of `z', namely `Array'
The problem with a Data instance for Text is that it is using this
ByteArray# type, which can't easily interact with the Data type-class
because it's a special type. I would suggest providing a Data instance
for ByteArray#, but I don't think that's possible either. As far as I
can understand it all, your Data instance is probably the closest you
are going to get to having a decent Data instance without something else
(GHC/SYB) changing significantly.
Thanks,
Neil.
No, that's definitely not correct, or even remotely scalable as we increase the number of abstract types in disparate packages.
[..]
> Any other suggestions?
4. Write a new package:
* serialize-text
* text-instances (which would be a placeholder for more instances)
I would try solution 2 first, and otherwise solution 4.
--
Nicolas Pouillard
http://nicolaspouillard.fr
The only safe rule is: if you don't control the class, C, or you don't
control the type constructor, T, don't make instance C T. Application
writers can often relax that rule as the set of dependencies for the
whole application is known and in many cases any reasonable instance
for a class C and constructor T is acceptable. Under those
conditions, the worst-case scenario is that the application writer may
need to remove an instance declaration when migrating to new versions
of the dependencies. When you control a class C, you should make as
many (relevant) type constructors instances of it as is reasonably
possible, i.e. without adding any extensive dependencies. So at the
very least, all standard type constructors. Similarly for those who
control a type constructor T. This is for convenience. These
correspond to solutions #1 and #2 only significantly weakened.
Definitely, making a package depend on tons of other packages just to
add instances is NOT the correct solution.
The library writers depending on a package for a class and another
package for a type are the problem case. There are three potential
solutions in this case, which basically reduce the problem to one
of the above three cases: either introduce a new type and add it to a
class, introduce a new class and add the types to it, or try to push
the resolution of such things onto the application writer. The first
two options have the benefit that they also protect you from the
upstream libraries introducing instances that won't work for you.
These two options have the drawback that they are usually less
convenient to use. The last option has the benefit that it usually
corresponds to having a more flexible/generic library; in some cases
you can even go so far as to remove your dependence on the libraries
altogether.
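The first option (introduce a new type and add it to the class) might look roughly like the sketch below. Everything here is hypothetical: Pretty stands in for a class owned by one upstream package, Version for a type owned by another, and PrettyVersion is the wrapper a library author would define locally.

```haskell
-- Pretend this class comes from an upstream package we do not control.
class Pretty a where
  pretty :: a -> String

-- Pretend this type comes from a different upstream package.
data Version = Version Int Int

-- Our own wrapper. We control this type constructor, so the instance
-- below is not an orphan and cannot clash with one added upstream later.
newtype PrettyVersion = PrettyVersion Version

instance Pretty PrettyVersion where
  pretty (PrettyVersion (Version major minor)) =
    show major ++ "." ++ show minor
```

The price is the inconvenience mentioned above: callers have to wrap and unwrap, for example pretty (PrettyVersion v) instead of pretty v.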
One solution to this problem, though it usually can't be done post hoc,
is to simply not use the class mechanism except as a convenience.
This has the benefit that it usually leads to more flexibility and it
helps to realize the third option above. Using Monoid as an example,
one can provide functions of the form: f :: m -> (m -> m -> m) -> ...
and then also provide f' = f mempty mappend :: Monoid m => ... The
parameters can be collected into a record as well. You could even
systematize this into: class C a where getCDict :: CDict a, and then
write f :: CDict a -> ... and f' = f getCDict :: C a => ...
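The dictionary-passing pattern described above can be sketched as follows. The names concatWith, concatM, MonoidDict, HasMonoidDict, concatD and concatD' are made up for illustration; only Monoid, mempty and mappend come from the standard library.

```haskell
-- Explicit version: the caller passes the unit and the combining function,
-- so no class instance is required at all.
concatWith :: m -> (m -> m -> m) -> [m] -> m
concatWith unit combine = foldr combine unit

-- Convenience wrapper for types that already have a Monoid instance.
concatM :: Monoid m => [m] -> m
concatM = concatWith mempty mappend

-- The same parameters collected into a record.
data MonoidDict m = MonoidDict
  { dictUnit    :: m
  , dictCombine :: m -> m -> m
  }

-- The systematic form: a class whose only job is to hand out the record.
class HasMonoidDict a where
  getMonoidDict :: MonoidDict a

instance HasMonoidDict [b] where
  getMonoidDict = MonoidDict [] (++)

-- Core function written against the record...
concatD :: MonoidDict m -> [m] -> m
concatD dict = foldr (dictCombine dict) (dictUnit dict)

-- ...and the class-based convenience derived from it.
concatD' :: HasMonoidDict m => [m] -> m
concatD' = concatD getMonoidDict
```

An application writer who has no suitable instance, or wants a different one, can still call concatWith 0 (+) or build a MonoidDict by hand, which is how this style pushes instance resolution out of the library.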
Whatever one does, do NOT add instances of type constructors you don't
control to classes you don't control. This can lead to cases where
two libraries can't be used together at all.
I agree in principle, but in the real world you can't live by this rule.
For example, I want to use Uniplate to traverse the tree built by haskell-src-exts.
Using Data.Data is too slow, so I need to make my own instances.
HSE provides like 50 types that need instances, and it has to be
exactly those types.
Also, Uniplate requires instances of a particular class it has.
I don't own either of these packages. Including the HSE instances in
Uniplate would just be plain idiotic.
Including the Uniplate instances with HSE would make some sense, but
would make HSE artificially depend on Uniplate for those who don't
want the instances.
So, what's left is to make orphan instances (that I own). It's not
ideal, but I don't see any alternative to it.
-- Lennart
The problem with Data for Text isn't that we have to write a new
instance, but that you could argue that proper handling of Text with
Data would not be using a type class, but have special knowledge baked
in to Data. That's far worse than the Serialise problem mentioned
above, and no one other than the Data authors could solve it. Of
course, I don't believe that, but it is a possible interpretation.
The Serialise problem is a serious one. I can't think of any good
solutions, but I recommend you give knowledge of your serialise class
to Derive (http://community.haskell.org/~ndm/derive/) and then at
least the instances can be auto-generated. Writing lots of boilerplate
and regularly ripping it up is annoying; setting up something to
generate it for you reduces the pain.
>> The only safe rule is: if you don't control the class, C, or you don't
>> control the type constructor, T, don't make instance C T.
>
> I agree in principle, but in the real world you can't live by this rule.
> Example, I want to use Uniplate to traverse the tree built by haskell-src-exts,
> Using Data.Data is too slow, so I need to make my own instances.
> HSE provides like 50 types that need instances, and it has to be
> exactly those types.
> Also, Uniplate requires instances of a particular class it has.
Read my recent blog post
(http://neilmitchell.blogspot.com/2010/01/optimising-hlint.html). I
optimised Uniplate for working with HSE on top of the Data instances -
it's now significantly faster in some cases, which may mean you don't
need to resort to the Direct stuff. Of course, if you do, then
generating them with Derive is the way to go.
Thanks, Neil
Hi,
    instance Serialize MyType where
      getCopy = gget
      putCopy = gput
    instance (Data a) => Serialize a where ...
Hi Jeremy,
As Neil Mitchell said before, if you really don't want to expose the internals of Text (by just using a derived instance) then you have no other alternative than to use String conversion. If you've been using it already and performance is not a big problem, then I guess it's ok.
Regarding the Serialize issue, maybe I am not understanding the problem correctly: isn't that just another generic function? There are generic implementations of binary get and put for at least two generic programming libraries in Hackage [1, 2], and writing one for SYB shouldn't be hard either, I think. Then you could have a trivial way of generating instances of Serialize, namely something like
    instance Serialize MyType where
      getCopy = gget
      putCopy = gput
On Tue, Jan 26, 2010 at 11:52:34AM -0600, Jeremy Shaw wrote:
> + toConstr _ = error "toConstr"
> + gunfold _ _ = error "gunfold"
Isn't it better to write
    error "Data.Text.Text: toConstr"
Usually I try to do this as we don't get stack traces for _|_.
--
Felipe.
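To make the suggestion concrete, here is a small sketch of an instance whose unsupported methods fail with a qualified message. Packed, pack and unpack are hypothetical stand-ins for an abstract text type, and the name "Example.Packed" is invented; this is not the patch under discussion.

```haskell
import Data.Data

-- Hypothetical abstract type; a real one would hide its constructor.
newtype Packed = Packed String

pack :: String -> Packed
pack = Packed

unpack :: Packed -> String
unpack (Packed s) = s

instance Data Packed where
  -- Traversals still work, by way of the String form.
  gfoldl f z p = z pack `f` unpack p
  -- The unsupported methods name the type and the method, so the message
  -- alone says where the failure came from.
  toConstr _   = error "Example.Packed: toConstr"
  gunfold _ _  = error "Example.Packed: gunfold"
  dataTypeOf _ = mkNoRepType "Example.Packed"
```

A bare error "toConstr" would be indistinguishable from the same failure in any other instance written the same way, which matters when no stack trace is available.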
>> The problem with Data for Text isn't that we have to write a new
>> instance, but that you could argue that proper handling of Text with
>> Data would not be using a type class, but have special knowledge baked
>> in to Data. That's far worse than the Serialise problem mentioned
>> above, and no one other than the Data authors could solve it. Of
>> course, I don't believe that, but it is a possible interpretation.
>
> Right.. that is the problem with Text. Do you think the correct thing to do for gunfold and toConstr is to convert the Text to a String and then call the gunfold and toConstr for String? Or something else?
No idea sadly - the SYB stuff was never designed to work with abstract
structures, or structures containing strict/unboxed components.
Converting the Text to a String should work, so in the absence of any
better suggestions, that seems reasonable.
>> The Serialise problem is a serious one. I can't think of any good
>> solutions, but I recommend you give knowledge of your serialise class
>> to Derive (http://community.haskell.org/~ndm/derive/) and then at
>> least the instances can be auto-generated. Writing lots of boilerplate
>> and regularly ripping it up is annoying, setting up something to
>> generate it for you reduces the pain.
>
> We currently use template haskell to generate the Serialize instances in most cases (though some data types have more optimized encodings that were written by hand). However, you must supply the Version and Migration instances by hand (they are super classes of Serialize).
> I am all for splitting the Serialize stuff out of happstack .. it is not really happstack specific. Though I suspect pulling it out is not entirely trivial either. I think the existing code depends on syb-with-class.
If you switch to Derive then you can generate the classes with
Template Haskell, or run the Derive tool as a preprocessor. Derive
abstracts over these details, and also tends to be much easier than
working within Template Haskell (which I always find surprisingly
difficult).
I think so... none of the other instances do.. but I guess that is not a very good excuse :)
I have attached a new version that should work with GHC 6.10, though I have not tested it.