[Haskell-cafe] Efficient use of ByteString and type classes in template system

129 views
Skip to first unread message

Johan Tibell

unread,
Apr 14, 2007, 11:35:45 AM4/14/07
to haskel...@haskell.org
Hi Haskell Café!

I'm writing a perl/python like string templating system which I plan
to release soon:

darcs get http://darcs.johantibell.com/template

The goal is to provide simple string templating; no inline code, etc..
An alternative to printf and ++.

Example usage:

> import qualified Data.ByteString as B
> import Text.Template
>
> helloTemplate = "Hello, $name! Would you like some ${fruit}s?"
> helloContext = [("name", "Johan"), ("fruit", "banana")]
>
> test1 = B.putStrLn $ substitute (B.pack helloTemplate) helloContext

I want to make it perform well, especially when creating a template
once and then rendering it multiple times. "Compiling" the template is
a separate step from rendering in this use case:

> compiledTemplate = template $ B.pack helloTemplate
>
> test2 = B.putStrLn $ render compiledTemplate helloContext

A template is represented by a list of template fragments, each
fragment is either a ByteString literal or a variable which is looked
up in the "context" when rendered.

> data Frag = Lit ByteString | Var ByteString
> newtype Template = Template [Frag]

This leads me to my first question. Would a lazy ByteString be better
or worse here? The templates are of limited length. I would say the
length is usually between one paragraph and a whole HTML page. The
Template data type already acts a bit like a lazy ByteString since it
consists of several chunks (although the chunck size is not adjusted
to the CPU cache size like with the lazy ByteString).

Currently the context in which a template is rendered is represented
by a type class.

> class Context c where
> lookup :: ByteString -> c -> Maybe ByteString
>
> instance Context (Map String String) where
> lookup k c = liftM B.pack (Map.lookup (B.unpack k) c)
>
> instance Context (Map ByteString ByteString) where
> lookup = Map.lookup
>
> -- More instance, for [(String, String)], etc.

I added this as a convenience for the user, mainly to work around the
problem of not having ByteString literals. A typical usage would have
the keys in the context being literals and the values some variables:

> someContext = Map.fromList [("name", name), ("fruit", fruit)]

I'm not sure if this was a good decision, With this I'm halfway to the
(in)famous Stringable class and it seems like many smarter people than
me have avoided introducing such a class. How will this affect
performace? Take for example the rendering function:

> render :: Context c => Template -> c -> ByteString
> render (Template frags) ctx = B.concat $ map (renderFrag ctx) frags
>
> renderFrag :: Context c => c -> Frag -> ByteString
> renderFrag ctx (Lit s) = s
> renderFrag ctx (Var x) = case Text.Template.lookup x ctx of
> Just v -> v
> Nothing -> error $ "Key not found: " ++ (B.unpack x)

How will the type dictionary 'c' hurt performance here? Would
specializing the function directly in render help?

> render (Template frags) ctx = B.concat $ map (renderFrag f) frags
> where f = flip Text.Template.lookup ctx
>
> renderFrag f (Var x) = case f x of

I can see the implementation taking one of the following routes:
- Go full Stringable, including for the Template
- Revert to Context = Map ByteString ByteString which was the original
implementation.
- Some middle road, without MPTC, for example:
> class Context c where
> lookup :: ByteString -> c ByteString ByteString -> Maybe ByteString
This would allow the user to supply some more efficient data type for
lookup but not change the string type. Having a type class would allow
me to provide things like the possibility to create a Context from a
record where each record accessor function would server as key.
Something like:

> data Person { personName :: String, personAge :: Int }
would get converted (using Data?) to:
> personContext = [("personName", show $ personName aPerson),
> ("personAge", show $ personAge aPerson)]
but not actually using a Map but the record itself.

I guess my more general question is: how do I reason about the
performance of my code or any code like this? Are there any other
performance improvements that could be made?

Also, I would be grateful if someone could provide some feedback on
the implementation, anything goes!

I still have some known TODOs:

- Import error messages for invalid uses of "$".
- Improve the regex usage overall.
- Add some more functions; the plan is to add those function which
could be expressed in efficiently with the current interface. An
example is things like renderAndWrite, when writing doing a B.concat
first is unnecessary.

Cheers,

Johan Tibell
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Donald Bruce Stewart

unread,
Apr 15, 2007, 9:04:48 PM4/15/07
to Johan Tibell
johan.tibell:
> Hi Haskell Caf?!

>
> I'm writing a perl/python like string templating system which I plan
> to release soon:
>
> darcs get http://darcs.johantibell.com/template
>
> The goal is to provide simple string templating; no inline code, etc..
> An alternative to printf and ++.

Ok. You might also want to briefly look at the other templating system I
know of in Haskell, this small module by Stefan Wehr,

http://www.cse.unsw.edu.au/~dons/code/icfp05/tests/unit-tests/VariableExpansion.hs

Just a quick thing he did for the ICFP contest, but does indicate one
way to do it (i.e. via pretty printing).

>
> Example usage:
>
> >import qualified Data.ByteString as B
> >import Text.Template
> >
> >helloTemplate = "Hello, $name! Would you like some ${fruit}s?"
> >helloContext = [("name", "Johan"), ("fruit", "banana")]
> >
> >test1 = B.putStrLn $ substitute (B.pack helloTemplate) helloContext
>
> I want to make it perform well, especially when creating a template
> once and then rendering it multiple times. "Compiling" the template is
> a separate step from rendering in this use case:
>
> >compiledTemplate = template $ B.pack helloTemplate
> >
> >test2 = B.putStrLn $ render compiledTemplate helloContext
>
> A template is represented by a list of template fragments, each
> fragment is either a ByteString literal or a variable which is looked
> up in the "context" when rendered.
>
> >data Frag = Lit ByteString | Var ByteString
> >newtype Template = Template [Frag]
>
> This leads me to my first question. Would a lazy ByteString be better
> or worse here? The templates are of limited length. I would say the
> length is usually between one paragraph and a whole HTML page. The
> Template data type already acts a bit like a lazy ByteString since it
> consists of several chunks (although the chunck size is not adjusted
> to the CPU cache size like with the lazy ByteString).

Probably lazy bytestrings are better here, since you get O(n/k) append
cost, rather than O(n). If most strings are small, it mightn't be
noticeable.

> Currently the context in which a template is rendered is represented
> by a type class.
>
> >class Context c where
> > lookup :: ByteString -> c -> Maybe ByteString
> >
> >instance Context (Map String String) where
> > lookup k c = liftM B.pack (Map.lookup (B.unpack k) c)
> >
> >instance Context (Map ByteString ByteString) where
> > lookup = Map.lookup
> >
> >-- More instance, for [(String, String)], etc.
>
> I added this as a convenience for the user, mainly to work around the
> problem of not having ByteString literals. A typical usage would have
> the keys in the context being literals and the values some variables:

note sure if it is relevant, but:

pack "Foo"

will be converted via rewrite rules to a bytestring literal at compile
time. So there's no overhead for having String literals.

>
> >someContext = Map.fromList [("name", name), ("fruit", fruit)]
>
> I'm not sure if this was a good decision, With this I'm halfway to the
> (in)famous Stringable class and it seems like many smarter people than

Yes, seems a little worrying.

> me have avoided introducing such a class. How will this affect
> performace? Take for example the rendering function:
>
> >render :: Context c => Template -> c -> ByteString
> >render (Template frags) ctx = B.concat $ map (renderFrag ctx) frags
> >
> >renderFrag :: Context c => c -> Frag -> ByteString
> >renderFrag ctx (Lit s) = s
> >renderFrag ctx (Var x) = case Text.Template.lookup x ctx of
> > Just v -> v
> > Nothing -> error $ "Key not found: " ++
> > (B.unpack x)
>
> How will the type dictionary 'c' hurt performance here? Would
> specializing the function directly in render help?

Hmm. Hard to say: look at the Core code and we will know.

Really though, you'll need some stress test cases to be able to make
resonable conclusions about performance.

I'd suggest: keep it simple and fast. Then work out what extra stuff you
need.

-- Don

Thomas Hartman

unread,
Apr 17, 2007, 6:32:48 AM4/17/07
to Haskell-cafe
Created wiki page

http://haskell.org/haskellwiki/String_Interpolation

and referenced various topics mentioned in this thread, there.

2007/4/16, Donald Bruce Stewart <do...@cse.unsw.edu.au>:

Johan Tibell

unread,
Apr 17, 2007, 8:39:33 AM4/17/07
to Thomas Hartman
Great!

I've written some QuickCheck tests now (not commited) so I can start
to swap out the implementation and benchmark it. After I get it to run
fast enough and some nice utility methods (like the possibility of
using records as context) I'll announce a version 1.0.

Johan

Reply all
Reply to author
Forward
0 new messages