A word on Ledger structure


John Wiegley

Mar 14, 2012, 1:45:10 AM
to ledge...@googlegroups.com
Ledger is developed as a tiered set of functionality, where lower tiers know
nothing about the higher tiers. In fact, I build multiple libraries during
the process, and link unit tests to these libraries, so that it is a link
error for a lower tier to violate this modularity.

Those tiers are:

- Utility code

There's a lot of general utility code in Ledger for time parsing, working
with Boost.Regex, error handling, etc. It's all done in a way that can be
reused in other projects as needed.

- Commoditized Amounts (amount_t, commodity_t and friends)

A numerical abstraction combining multi-precision rational numbers (via
GMP) with commodities. These structures can be manipulated like regular
numbers in either C++ or Python (as Amount objects).
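
As a rough illustration of the idea (not Ledger's actual amount_t), a
commoditized amount can be sketched in Python, with the standard library's
Fraction standing in for GMP rationals. The Amount class and its methods
below are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Amount:
    """Toy commoditized amount: an exact rational plus a commodity symbol."""

    quantity: Fraction
    commodity: str = ""  # empty string means "no commodity"

    def __add__(self, other: "Amount") -> "Amount":
        # Mixing commodities is not meaningful for a single amount;
        # in this sketch it is simply rejected.
        if self.commodity != other.commodity:
            raise ValueError("cannot add amounts with different commodities")
        return Amount(self.quantity + other.quantity, self.commodity)

    def __mul__(self, factor) -> "Amount":
        # Scaling by a plain number keeps the commodity.
        return Amount(self.quantity * Fraction(factor), self.commodity)


# Exact arithmetic: 10.10 + 0.20 is exactly 103/10, with no float rounding.
total = Amount(Fraction("10.10"), "USD") + Amount(Fraction("0.20"), "USD")
```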

- Commodity Pool

Commodities are all owned by a commodity pool, so that future parsing of
amounts can link to the same commodity and establish a consistent price
history and record of formatting details.
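
The ownership idea can be sketched as a small interning table: asking the
pool for the same symbol twice yields the same object, so details learned
during one parse are visible to the next. The class and attribute names
here are assumptions for illustration, not Ledger's API.

```python
class Commodity:
    """Toy commodity record shared by every amount that uses its symbol."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.precision = 0       # display precision observed so far
        self.price_history = []  # (date, price) pairs would accumulate here


class CommodityPool:
    """Owns all commodities and hands out one shared object per symbol."""

    def __init__(self):
        self._commodities = {}

    def find_or_create(self, symbol):
        commodity = self._commodities.get(symbol)
        if commodity is None:
            commodity = Commodity(symbol)
            self._commodities[symbol] = commodity
        return commodity


pool = CommodityPool()
first = pool.find_or_create("USD")
first.precision = 2                  # learned while parsing "$10.00"
second = pool.find_or_create("USD")  # a later parse sees the same object
```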

- Balances

Adds the concept of multiple amounts with varying commodities. Supports
simple arithmetic, and multiplication and division with non-commoditized
values.
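
A minimal sketch of a balance, assuming it can be modelled as a mapping
from commodity symbol to an exact quantity. The Balance class and its add
method are hypothetical and much simpler than the real balance_t.

```python
from fractions import Fraction


class Balance:
    """Toy balance: one exact quantity per commodity."""

    def __init__(self, amounts=None):
        self.amounts = dict(amounts or {})

    def add(self, commodity, quantity):
        # Return a new balance with the quantity folded into its commodity.
        result = dict(self.amounts)
        result[commodity] = result.get(commodity, Fraction(0)) + Fraction(quantity)
        if result[commodity] == 0:
            del result[commodity]  # drop commodities that net out to zero
        return Balance(result)

    def __mul__(self, factor):
        # Multiplying by a non-commoditized number scales every component.
        return Balance({c: q * Fraction(factor) for c, q in self.amounts.items()})


bal = Balance().add("USD", 10).add("EUR", 5).add("USD", "2.50")
doubled = bal * 2
```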

- Price history

Amounts have prices, and these are kept in a data graph which the amount
code itself is only dimly aware of (there are three points of access so an
amount can query its revalued price on a given date).
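
To illustrate just the "price on a given date" query, here is a deliberately
simplified sketch: a date-sorted list for one commodity pair rather than the
graph described above. PriceHistory and its methods are hypothetical names.

```python
import bisect
from datetime import date
from fractions import Fraction


class PriceHistory:
    """Toy price history for a single commodity, kept sorted by date."""

    def __init__(self):
        self._dates = []
        self._prices = []

    def add_price(self, when, price):
        # Insert while keeping both lists ordered by date.
        index = bisect.bisect_right(self._dates, when)
        self._dates.insert(index, when)
        self._prices.insert(index, Fraction(price))

    def price_on(self, when):
        # Most recent price at or before the given date, or None.
        index = bisect.bisect_right(self._dates, when)
        return self._prices[index - 1] if index else None


history = PriceHistory()
history.add_price(date(2012, 3, 1), "30.00")
history.add_price(date(2012, 1, 1), "25.00")
```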

- Values

Often the higher layers in Ledger don't care if something is an amount or a
balance; they just want to add stuff to it or print it. For this, I
created a type-erasure class, value_t/Value, into which many things can be
stuffed and then operated on. They can contain amounts, balances, dates,
strings, etc. If you try to apply an operation between two values that
makes no sense (like dividing an amount by a balance), an error occurs at
runtime, rather than at compile-time (as would happen if you actually tried
to divide an amount_t by a balance_t).

This is the core data type for the value expression language.
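
The runtime-checked behaviour can be sketched as follows. This is a toy
wrapper under the assumption that only numbers and strings are stored; the
Value class here is illustrative and not the real value_t interface.

```python
from fractions import Fraction


class Value:
    """Toy type-erased wrapper; the payload type is only checked when used."""

    def __init__(self, payload):
        self.payload = payload

    def __add__(self, other):
        a, b = self.payload, other.payload
        # Only number+number and string+string make sense in this sketch.
        if isinstance(a, (int, Fraction)) and isinstance(b, (int, Fraction)):
            return Value(a + b)
        if isinstance(a, str) and isinstance(b, str):
            return Value(a + b)
        # Nonsensical combinations fail when the operation runs,
        # not when the program is written.
        raise TypeError(f"cannot add {type(a).__name__} and {type(b).__name__}")

    def to_string(self):
        return str(self.payload)


number_sum = Value(Fraction(1, 2)) + Value(2)
```

Adding Value("cash") to Value(1) in this sketch raises a TypeError at the
moment the addition is attempted.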

- Value expressions

The next layer up adds functions and operators around the Value concept.
This lets you apply transformations and tests to Values at runtime without
having to bake it into C++. The set of functions available is defined by
each object type in Ledger (posts, accounts, transactions, etc.), though
the core engine knows nothing about these. At its base, it only knows how
to apply operators to values, and how to pass them to and receive them from
functions.

- Query expressions

Expressions can be onerous to type at the command-line, so there's a
shorthand for reporting called "query expressions". These add no
functionality of their own, but are purely translated from the input string
(cash) down to the corresponding value expression (account =~ /cash/).
This is a convenience layer.
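
A toy version of that translation, covering only bare account terms. The
function name is hypothetical, and joining several terms with "|" is an
assumption of this sketch rather than a statement about Ledger's parser.

```python
def translate_query(terms):
    """Translate bare query terms into a value-expression string."""
    clauses = [f"(account =~ /{term}/)" for term in terms]
    # Assumption of this sketch: several terms are OR-ed together.
    return " | ".join(clauses)


single = translate_query(["cash"])
several = translate_query(["cash", "food"])
```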

- Format strings

Format strings let you interpolate value expressions into strings, with the
requirement that any interpolated value have a string representation.
Really all this does is calculate the value expression in the current
report context, call the resulting value's "to_string()" method, and stuff
the result into the output string. It also provides printf-like behavior,
such as min/max width, right/left justification, etc.
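
The mechanism can be sketched with a small regex-based expander. The
"%[-][min][.max](name)" syntax below is a toy notation for this example,
and a dictionary lookup stands in for evaluating a value expression in the
report context.

```python
import re

# Toy field syntax: %[-][min][.max](name)
_FIELD = re.compile(r"%(-?)(\d*)(?:\.(\d+))?\((\w+)\)")


def format_report_line(template, context):
    """Expand each field by looking it up and applying width rules."""

    def expand(match):
        left, min_width, max_width, name = match.groups()
        text = str(context[name])  # stand-in for expression evaluation
        if max_width:
            text = text[: int(max_width)]  # truncate to the maximum width
        if min_width:
            width = int(min_width)
            # A leading "-" means left-justify; otherwise right-justify.
            text = text.ljust(width) if left else text.rjust(width)
        return text

    return _FIELD.sub(expand, template)


line = format_report_line(
    "%-10(account)|%8(amount)|", {"account": "Cash", "amount": "$5.00"}
)
```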

- Journal items

Next is a base type shared by anything that can appear in a journal: an
item_t. It contains details common to all such parsed entities, like what
file and line it was found on, etc.

- Journal posts

The most numerous object found in a Journal, postings are a type of item
that contain an account, an amount, a cost, and metadata. There are some
other complications, like the account can be marked virtual, the amount
could be an expression, etc.

- Journal transactions

Postings are owned by transactions, always. This subclass of item_t knows
about the date, the payee, etc. If a date or metadata tag is requested
from a posting and it doesn't have that information, the transaction is
queried to see if it can provide it.
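
The fallback from posting to transaction can be sketched like this. The
class layout and attribute names are assumptions for illustration only.

```python
import datetime


class Transaction:
    """Toy transaction: owns its postings and holds shared details."""

    def __init__(self, date, payee, tags=None):
        self.date = date
        self.payee = payee
        self.tags = dict(tags or {})
        self.postings = []

    def add_posting(self, posting):
        posting.xact = self  # every posting points back at its owner
        self.postings.append(posting)
        return posting


class Posting:
    """Toy posting that defers to its transaction for missing details."""

    def __init__(self, account, amount, date=None, tags=None):
        self.account = account
        self.amount = amount
        self._date = date
        self._tags = dict(tags or {})
        self.xact = None

    @property
    def date(self):
        # Use the posting's own date if present, else the transaction's.
        return self._date if self._date is not None else self.xact.date

    def get_tag(self, name):
        if name in self._tags:
            return self._tags[name]
        return self.xact.tags.get(name)


xact = Transaction(datetime.date(2012, 3, 14), "Grocer", tags={"trip": "weekly"})
food = xact.add_posting(Posting("Expenses:Food", 20))
cash = xact.add_posting(
    Posting("Assets:Cash", -20, date=datetime.date(2012, 3, 15))
)
```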

- Journal accounts

Postings are also shared by accounts, though the actual memory is managed
by the transaction. Each account knows all the postings within it, but
contains relatively little information of its own.

- The Journal object

Finally, all transactions with their postings, and all accounts, are owned
by a journal_t object. This is the go-to object for querying and reporting
on your data.

- Textual journal parser

There is a textual parser, wholly contained in textual.cc, which knows how
to parse text into journal objects, which then get "finalized" and added to
the journal. Finalization is the step that enforces the double-entry
guarantee.
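
In its simplest form, the double-entry check amounts to requiring that a
transaction's postings sum to zero in every commodity. The sketch below
shows only that core idea; the finalize function here is hypothetical and
ignores costs, virtual postings, and inferred amounts.

```python
from collections import defaultdict
from fractions import Fraction


def finalize(postings):
    """Check that (account, quantity, commodity) postings balance to zero."""
    totals = defaultdict(Fraction)  # Fraction() is an exact zero
    for _account, quantity, commodity in postings:
        totals[commodity] += Fraction(quantity)
    unbalanced = {c: q for c, q in totals.items() if q != 0}
    if unbalanced:
        raise ValueError(f"transaction does not balance: {unbalanced}")
    return True


balanced = finalize(
    [("Expenses:Food", "20.00", "USD"), ("Assets:Cash", "-20.00", "USD")]
)
```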

- Iterators

Every journal object is "iterable", and these iterators are defined in
iterators.h and iterators.cc. This iteration logic is kept out of the
basic journal objects themselves for the sake of modularity.

- Comparators

Another abstraction isolated to its own layer, this class encapsulates the
comparison of journal objects, based on whatever value expression the user
passed to --sort.
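
As a sketch of the idea, with dictionary fields standing in for a value
expression, items can be ordered by several keys using repeated stable
sorts. The sort_items helper and the (field, descending) convention are
assumptions of this example.

```python
from operator import itemgetter


def sort_items(items, sort_keys):
    """Sort by (field, descending) pairs, most significant key first."""
    ordered = list(items)
    # Apply keys from least to most significant; sorted() is stable,
    # so earlier orderings survive as tie-breakers.
    for field, descending in reversed(sort_keys):
        ordered = sorted(ordered, key=itemgetter(field), reverse=descending)
    return ordered


posts = [
    {"payee": "Grocer", "amount": 20},
    {"payee": "Baker", "amount": 5},
    {"payee": "Grocer", "amount": 7},
]
by_payee_then_largest = sort_items(posts, [("payee", False), ("amount", True)])
```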

- Temporaries

Many reports bring pseudo-journal objects into existence, like postings
which report totals in a "<Total>" account. These objects are created and
managed by a temporaries_t object, which gets used in many places by the
reporting filters.

- Option handling

There is an option handling subsystem used by many of the layers further
down. It makes it relatively easy for me to add new options, and to have
those option settings immediately accessible to value expressions.

- Session objects

Every journal object is owned by a session, with the session providing
support for that object. In GUI terms, this is the Controller object for
the journal Data object, where every document window would be a separate
session. They are all owned by the global scope.

- Report objects

Every time you create report output, a report object is created to
determine what you want to see. In the Ledger REPL, a new report object is
created every time a command is executed. In CLI mode, only one report
object ever comes into being, as Ledger immediately exits after displaying
the results.

- Reporting filters

The way Ledger generates data is this: it asks the session for the current
journal, and then creates an iterator applied to that journal. The kind of
iterator depends on the type of report.

This iterator is then walked, and every object yielded from the iterator is
passed to an "item handler", whose type is directly related to the type of
the iterator.

There are many, many item handlers, which can be chained together. Each
one receives an item (post, account, xact, etc.), performs some action on
it, and then passes it down to the next handler in the chain. There are
filters which compute the running totals; that queue and sort all the input
items before playing them back out in a new order; that filter out items
which fail to match a predicate, etc. Almost every reporting feature in
Ledger is related to one or more filters. Looking at filters.h, I see over
25 of them defined currently.
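
The chaining pattern described above can be sketched as follows. The
handler names are invented for this example and are not the ones in
filters.h; items are plain dictionaries rather than journal objects.

```python
class Handler:
    """Base item handler: by default, pass everything to the next handler."""

    def __init__(self, next_handler=None):
        self.next_handler = next_handler

    def handle(self, item):
        if self.next_handler is not None:
            self.next_handler.handle(item)

    def flush(self):
        if self.next_handler is not None:
            self.next_handler.flush()


class FilterItems(Handler):
    """Drop items that fail a predicate."""

    def __init__(self, predicate, next_handler):
        super().__init__(next_handler)
        self.predicate = predicate

    def handle(self, item):
        if self.predicate(item):
            super().handle(item)


class SortItems(Handler):
    """Queue every item, then play them back in sorted order on flush."""

    def __init__(self, key, next_handler):
        super().__init__(next_handler)
        self.key = key
        self.queue = []

    def handle(self, item):
        self.queue.append(item)

    def flush(self):
        for item in sorted(self.queue, key=self.key):
            super().handle(item)
        self.queue.clear()
        super().flush()


class RunningTotal(Handler):
    """Annotate each item with the running total of amounts seen so far."""

    def __init__(self, next_handler):
        super().__init__(next_handler)
        self.total = 0

    def handle(self, item):
        self.total += item["amount"]
        super().handle({**item, "total": self.total})


class Collect(Handler):
    """End of the chain for this sketch: remember what arrived."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle(self, item):
        self.items.append(item)


posts = [
    {"account": "Expenses:Food", "amount": 20},
    {"account": "Assets:Cash", "amount": -523},
    {"account": "Expenses:Rent", "amount": 500},
    {"account": "Expenses:Coffee", "amount": 3},
]
sink = Collect()
chain = FilterItems(
    lambda p: p["account"].startswith("Expenses"),
    SortItems(lambda p: p["amount"], RunningTotal(sink)),
)
for post in posts:
    chain.handle(post)
chain.flush()  # sorting handlers only release their items here
```

The order of wiring matters: in this sketch the running total is computed
after sorting, so the totals follow the sorted order rather than the
journal order.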

- The filter chain

How filters get wired up, and in what order, is a complex process based on
all the various options specified by the user. This is the job of the
chain logic, found entirely in chain.cc. It took a really long time to get
this logic exactly right, which is why I haven't exposed this layer to the
Python bridge yet.

- Output modules

Although filters are great and all, in the end you want to see stuff. This
is the job of special "leaf" filters called output modules. They are
implemented just like a regular filter, but they don't have a "next" filter
to pass the item on down to. Instead, they are the end of the line and
must do something with the item that results in the user seeing something
on their screen or in a file.
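
A minimal sketch of such a leaf, assuming handlers expose handle and flush
methods and items are dictionaries (both are conventions of this example,
not Ledger's interface). It writes to any file-like stream.

```python
import io


class PrintPosts:
    """Leaf handler: no next handler, each item is written to a stream."""

    def __init__(self, stream):
        self.stream = stream

    def handle(self, item):
        # Account left-justified in 20 columns, amount right-justified in 10.
        self.stream.write(f"{item['account']:<20}{item['amount']:>10}\n")

    def flush(self):
        self.stream.flush()


out = io.StringIO()  # stands in for the screen or an output file
printer = PrintPosts(out)
for post in [
    {"account": "Expenses:Food", "amount": 20},
    {"account": "Assets:Cash", "amount": -20},
]:
    printer.handle(post)
printer.flush()
report = out.getvalue()
```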

- Select queries

Select queries know a lot about everything, even though they implement
their logic by implementing the user's query in terms of all the other
features thus presented. Select queries have no functionality of their
own; they are simply a shorthand to provide access to much of Ledger's
functionality via a cleaner, more consistent syntax.

- The Global Scope

There is a master object which owns every other object, and this is
Ledger's global scope. It creates the other objects, provides REPL
behavior for the command-line utility, etc. In GUI terms, this is the
Application object.

- The Main Driver

This creates the global scope object, performs error reporting, and handles
command-line options which must precede even the creation of the global
scope, such as --debug.

And that's Ledger in a nutshell. All the rest are details, such as which
value expressions each journal item exposes, how many filters currently exist,
which options the report and session scopes define, etc.

John

Jim Robinson

Mar 14, 2012, 7:40:02 PM
to ledge...@googlegroups.com
On Tuesday, March 13, 2012 10:45:10 PM UTC-7, John Wiegley wrote:

And that's Ledger in a nutshell.  All the rest are details


I believe the term is "a small matter of programming." :-D
 
Thanks very much for posting this overview, it's very interesting
to read how it is all laid out.

Jim

Alexandre Rademaker

Mar 14, 2012, 9:43:57 PM
to ledge...@googlegroups.com
Hello John,

Many thanks for this email. Does cl-ledger (lisp version) have a similar
architecture? What are the differences? I don't know how many of the
ledger's users are programmers but making Ledger's architecture more
transparent will, for sure, help people understand and contribute to
ledger.

Best,

Alexandre Rademaker
http://arademaker.github.com/

John Wiegley

Mar 14, 2012, 11:17:57 PM
to ledge...@googlegroups.com
>>>>> Alexandre Rademaker <arademaker-Re5JQ...@public.gmane.org> writes:

> Many thanks for this email. Does cl-ledger (lisp version) have a similar
> architecture? What are the differences? I don't know how many of the
> ledger's users are programmers but making Ledger's architecture more
> transparent will, for sure, help people understand and contribute to ledger.

CL-Ledger is based on the same essential design. On that platform, "report
filters" are SERIES functions, so that all evaluation is performed lazily.
Otherwise, everything else is quite similar.

John

Simon Michael

Mar 19, 2012, 4:58:06 PM
to ledge...@googlegroups.com, hle...@googlegroups.com
That was a great tour of Ledger's architecture, John, thanks for writing it up. It's also a nice guide for other
implementors of Ledger-likes, and for documentors.

The strict testing of layering at link time is pretty neat. hledger's layering emerged as needed to avoid GHC "import
cycle" errors. It's good to see the similarities between my layers and your (lower) layers. Our terminology has also
become pretty consistent. Some time I should do a similar writeup following this format. Your post gives me some nice
ideas and food for thought.

-Simon
