# Current Design Choices


### Mike K

Oct 1, 2008, 4:38:02 PM
to vdash
A mistake that often gets made is to design too far into the future,
as this can cause project stagnation. A more agile methodology is
required, and vdash is indeed pursuing the correct one.

In the initial stages there are two main requirements that need to be
satisfied:
1) Data entry must be dead simple
2) Verifying data entered should be fairly easy to integrate.

MediaWiki is a great choice to facilitate #1, and also allows for some
of the interconnectedness which will become necessary in later stages.
The barrier to entry by novices with MediaWiki is very small, both for
data entry and data consumption. Of course, various fancy semantic
features are sacrificed, but given the amount of data foreseen to be
entered in the near future, that's not too much of a loss.

MediaWiki has a pretty simple design, and thus integrating
verification via Isabelle will be about as easy as any other interface
layer can provide.

The next stage of development will have a different set of
requirements. A strong interconnectedness of the data will be a major
component of that. The wide use of MediaWiki will be a boon here.
There are many interfaces for connection to MediaWiki db data, and
some converters that can work with that data. Extending will be
possible and from there building the data into a Linked Data web will
be an incremental process.

Overall, an agile approach such as this attempts to minimize energy
costs at each stage. And any potential future overhead from less than
theoretically perfect avenues taken will be amortized over the course
of development. Maximum level of contribution of both data and code is
paramount to the success of vdash. Otherwise, as with many academic
software projects (open source or not), it will remain a very small
and mostly closed community.

Mike.

### Hendrik Boom

Oct 6, 2008, 1:37:11 PM
to vdash

On Oct 1, 4:38 pm, Mike K <kats...@gmail.com> wrote:
> A mistake that often gets made is to design too far into the future as
> this can cause project stagnation. A more agile methodology is
> required, and vdash is indeed pursuing the correct ones.
>
> In the initial stages there are two main requirements that need to be
> satisfied:
> 1) Data entry must be dead simple
> 2) Verifying data entered should be fairly easy to integrate.

Speaking of agile approaches, consider the combination of monotone
with ikiwiki.

monotone would seem to meet the requirements for a back-end data
base. Officially, it's a distributed revision control system, and is
much used for computer programming, which probably has similar
characteristics to formalized proof writing.

In addition, it has a mechanism for "certifying" specific revisions.
This could be used for automatically recording the results of
automated proof checking. You can have as many different kinds of
certificates as you like, for example, certification by different
versions of the theorem prover (useful in case a bug is discovered),
certification by different theorem provers, even certification using
different logics.
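The certificate mechanism described here can be pictured with a small model. Everything below (the class names, the cert name "proof-checked", the revision hash) is an illustrative sketch, not monotone's actual storage format or API:

```python
# Illustrative model of revision certificates: free-form name/value
# pairs attached to a revision and signed by a key. The names here are
# invented for the sketch, not taken from monotone itself.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cert:
    revision: str   # hash identifying the revision
    name: str       # e.g. "proof-checked"
    value: str      # e.g. "Isabelle2008/ZF: ok"
    signer: str     # key of whoever (or whatever bot) signed it

class CertDB:
    def __init__(self):
        self.certs = []

    def add(self, cert):
        self.certs.append(cert)

    def certified(self, revision, name):
        """All values certified for this revision under this cert name."""
        return [c.value for c in self.certs
                if c.revision == revision and c.name == name]

db = CertDB()
# The same revision can carry certificates from different checker
# versions, different provers, or even different logics:
db.add(Cert("abc123", "proof-checked", "Isabelle2008/ZF: ok", "checker-bot"))
db.add(Cert("abc123", "proof-checked", "Isabelle2009/HOL: ok", "checker-bot"))
```

A bug found in one checker version then invalidates only the certificates naming that version, not the revisions themselves.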

One common way for programmers to use monotone is to check in a
nonworking revision, then chat over email about technical problems
with it (while anyone can read the revision from the data base), and
then to certify it only after the problems have been resolved and the
code has been debugged. This is an effective way of combining
creative anarchy with rigorous quality control.

Because it is distributed, there need be no central site. There can
be multiple sites with the same data base of theorems -- extremely
useful in case the central site were to go down (for technical or
other reasons) or suffer from load problems. The data base could
easily outlive any single funding source, for example.

Finally, monotone has several web interfaces. The one I suggest here
is ikiwiki. ikiwiki can be configured to use monotone as its backing
store. When so configured, it takes its raw data from the monotone
data base, compiles it into HTML, and caches the HTML for download.
This translation can be tinkered with, enabling a variety of views to
be used.
Initially, we'd want to keep it simple, just translating formalism
into human-readable content. Later we could have it look at proof-
check certificates -- well, who knows what we'd want to do here --
isn't experimentation to find out what's compatible with human beings
part of the point of this project?

Since the underlying monotone data base is distributed, different
sites could even experiment with different presentation and
interaction tools -- even different wikis or nonwikis -- all without
adversely affecting the contents of the data base itself.

-- hendrik

### slawekk

Oct 7, 2008, 7:51:37 PM
to vdash
> Speaking of agile approaches, consider the combination of monotone
> with ikiwiki.

Here is an example of formalized proof in Isar, Isabelle/ZF:

============================================================================

theorem disjointness_symetric:
  shows "A {is disjoint with} B \<longleftrightarrow> B {is disjoint with} A"
proof -
  have "A {is disjoint with} B \<longrightarrow> B {is disjoint with} A"
  proof -
    { assume "A {is disjoint with} B"
      then have "A \<inter> B = 0" using AreDisjoint_def by simp
      hence "B \<inter> A = 0" by auto
      then have "B {is disjoint with} A"
        using AreDisjoint_def by simp
    } thus ?thesis by simp
  qed
  moreover have "B {is disjoint with} A \<longrightarrow> A {is disjoint with} B"
  proof -
    { assume "B {is disjoint with} A"
      then have "B \<inter> A = 0" using AreDisjoint_def by simp
      hence "A \<inter> B = 0" by auto
      then have "A {is disjoint with} B"
        using AreDisjoint_def by simp
    } thus ?thesis by simp
  qed
  ultimately show ?thesis by blast
qed

====================================================================

It has the essential elements: structured proofs and mathematical
symbols. Let's see how it can be rendered by different wiki engines. I
was going to suggest DokuWiki, but was unable to get expandable proofs
in it. Here is the MediaWiki markup:

=================================================================================================

''theorem'' disjointness_symetric:
''shows'' $A \text{ is disjoint with } B \longleftrightarrow B \text{ is disjoint with } A$
{{hidden |ta1=left| proof |
''have'' $A \text{ is disjoint with } B \longrightarrow B \text{ is disjoint with } A$
{{hidden |ta1=left| proof |
{ ''assume'' $A \text{ is disjoint with } B$
''then have'' $A \cap B = 0$ ''using'' AreDisjoint_def by simp
''hence'' $B \cap A = 0$ by auto
''then have'' $B \text{ is disjoint with } A$ ''using'' AreDisjoint_def by simp
} ''thus'' thesis by simp
''qed''
}}
''moreover have'' $B \text{ is disjoint with } A \longrightarrow A \text{ is disjoint with } B$
{{hidden |ta1=left| proof |
{ ''assume'' $B \text{ is disjoint with } A$
''then have'' $B \cap A = 0$ ''using'' AreDisjoint_def by simp
''hence'' $A \cap B = 0$ by auto
''then have'' $A \text{ is disjoint with } B$ ''using'' AreDisjoint_def by simp
} ''thus'' thesis by simp
''qed''
}}
''ultimately show'' thesis by blast
''qed''
}}

==============================================================================================

You can copy and paste it into Wikipedia sandbox to see the rendering.

### slawekk

Oct 7, 2008, 8:10:42 PM
to vdash
Copy and paste does not work too well: some long lines are broken and
some leading spaces are removed. I guess this is due to HTML-to-text
conversion. But you can get the idea.

Slawek

### Cameron Freer

Oct 10, 2008, 11:50:56 AM
to vdash

The idea of ikiwiki + monotone is intriguing. The distributed
nature of monotone (and other revision control systems like git and
mercurial) would add robustness, and promote the development of
alternative interfaces, as you suggest. Also, the cryptographic
hashes of monotone bring to mind an infrastructure with
independent trusted Isabelle verification nodes, which
asynchronously inspect the database and certify results. This would
seem to provide a convenient way of decoupling the verification
engine from the database, and likewise the database from the wiki
and other human interfaces. Even if they're all running on the same
server at first, this will help us scale over time.

I'm not sure whether ikiwiki and/or monotone specifically are ideal
choices (now or eventually), but we should definitely look more into
this. In any case, there are broader architectural issues, whatever
wiki and database we choose in the short- & long-term.

1. Several people have been thinking about how Isabelle might
asynchronously process proof documents:
http://homepages.inf.ed.ac.uk/rpollack/mathWiki/slides/Wenzel.pdf
http://www.ags.uni-sb.de/~omega/workshops/UITP08/gast.pdf

2. There are several possible ways of dealing with dependencies and
changes in definitions and theorem statements. Slawekk has proposed
some relatively simple rules, and I think we'd do well with something
like that initially, though
people have thought through more elaborate options (e.g., slides 14-39 of
http://homepages.inf.ed.ac.uk/rpollack/mathWiki/slides/Corbineau.pdf ).
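For illustration, one very simple rule of this kind is transitive invalidation: when a definition or theorem statement changes, everything depending on it must be rechecked. The dependency graph and function names below are hypothetical, not the proposal referenced above:

```python
# Hypothetical dependency graph: item -> set of items it uses.
# All names here are invented for the sketch.
deps = {
    "thm_union_comm": {"def_union"},
    "thm_union_assoc": {"def_union"},
    "thm_big": {"thm_union_comm", "thm_union_assoc"},
}

def invalidated_by(changed, deps):
    """Return the set of items whose verification must be redone
    after `changed` is edited, closing under dependency."""
    dirty = {changed}
    grew = True
    while grew:
        grew = False
        for item, uses in deps.items():
            if item not in dirty and uses & dirty:
                dirty.add(item)
                grew = True
    return dirty
```

Editing `def_union` would dirty both lemmas and the theorem built on them, while editing a leaf theorem dirties only itself.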

As Mike K pointed out, we should make our initial design decisions
so as to facilitate data entry and verification, even if that means
knowing that we'll have to add more sophisticated layers later.

Thoughts about these two design decisions, or other architectural issues?

Best,
- Cameron

### hen...@topoi.pooq.com

Oct 13, 2008, 4:57:17 AM

Shortly after posting here, I encountered a few lines on the
monotone mailing list. They were talking about the history of
monotone, and the reason it is called monotone. It almost seems
prescient, looking at these words from the vdash perspective.

: Feb 09 16:32:22 graydon I initially wrote monotone in order to support
: the use-case of a "monotonically improving branch", where there's a view
: of a branch which never gets worse, constructed by selecting only the
: branch certs signed by an automatic tester.
: Feb 09 16:32:41 graydon this way a customer or user who wanted "new, but
: never worse" software could view the branch that way
: Feb 09 16:33:22 graydon or alternatively, a developer could configure
: their update rules to exclude stuff other people wrote which isn't
: working yet

The "branch certs" are certificates that a given revision is on a
particular development branch; in this case the development branch that
consists of code that is actually recommended to users.

The "automatic tester" they had in mind is something like a
regression suite, a common tool in software development.
But we would be using a correctness tester, not just a regression
tester!

Along the lines of software development, we could have the following
model. Each mathematician has access to the repository. He uses it to
check out theorems, proofs, and theories, and writes new ones, using
locally available software of his choice. Such software could of
course also be part of the repository. When done to his satisfaction,
possibly checked locally with his own version of the verifier,
he checks in his new proofs.

Sometime later, an official repository verifier (which could be
operating on a different copy) certifies the changes or not.
These certificates are then spread far and wide to the replicas of
the repository.

Where does the wiki fit into this model? Wherever anyone likes. It
could be, initially, as simple as a graphical tool to examine the
repository. It could be a web site that displays it. And it could
actually allow changes to be checked in and propagated.

The minimum required to start this is:
(a) the repository system
(b) a verifier
(c) a tool for applying the verifier to a copy of the repository and
generating certificates.
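Component (c) can be sketched as a loop over uncertified revisions. The function and the stand-in verifier below are illustrative assumptions; a real tool would invoke Isabelle on each revision's proof documents and sign a certificate for the result:

```python
# Minimal sketch of the certificate-generating tool: walk the
# revisions, skip those already certified, run the verifier on the
# rest, and record a verdict certificate for each.
def certify_pending(revisions, already_certified, verify):
    """Return the list of (revision, verdict) certificates produced."""
    new_certs = []
    for rev in revisions:
        if rev in already_certified:
            continue
        ok = verify(rev)
        new_certs.append((rev, "ok" if ok else "failed"))
    return new_certs

# Stand-in verifier: pretend revisions whose id starts with "good" pass.
certs = certify_pending(
    ["good-1", "bad-2", "good-3"],
    already_certified={"good-1"},
    verify=lambda rev: rev.startswith("good"))
```

Because the tool only ever adds certificates, it can run asynchronously against any replica of the repository without interfering with authors.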

Displaying everything nicely (as a web site or a wiki) can come later,
but for public-relations purposes, it should come soon. It makes it
possible to have people use the system without having their own
repository.

-- hendrik

> 1. Several people have been thinking about how Isabelle might
> asynchronously process proof documents:
> http://homepages.inf.ed.ac.uk/rpollack/mathWiki/slides/Wenzel.pdf
> http://www.ags.uni-sb.de/~omega/workshops/UITP08/gast.pdf
>
> 2. There are several possible ways of dealing with dependencies and
> changes in definitions and theorem statements. Slawekk has proposed
> some relatively simple rules, and I think we'd do well with something
> like that initially, though
> people have thought through more elaborate options (e.g., slides 14-39 of
> http://homepages.inf.ed.ac.uk/rpollack/mathWiki/slides/Corbineau.pdf ).
>
> As Mike K pointed out, we should make our initial design decisions
> so as to facilitate data entry and verification, even if that means
> knowing that we'll have to add more sophisticated layers later.

like the distributed repository and a verifier? The web site/wiki could
be one of the first later layers.

>
> Thoughts about these two design decisions, or other architectural issues?

There's one theoretical problem with the use of monotone, or any
distributed revision control system. They use hash codes to identify
entities. The hash codes are cryptographically secure, and it is
unlikely that two theorems or documents or programs will ever have
the same hash codes by accident, but the possibility is there.

I suggested to the monotone developers that they provide a more
explicitly managed naming system for all the entities they are
managing. They refused, on grounds of reliability. It turns out that
systems systematically organised to provide unique names fail in
practice because of human or other mismanagement.

For example, there have been duplicate IP numbers and duplicate domain
names on the internet, and there have been ethernet devices with
duplicate MAC addresses.

All of these have happened with far greater frequency than duplicate
64-but cryptographic hash codes.

There should be no practical problem using hash codes that
are long enough that collision would be unlikely from now until the sun
goes nova.
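The birthday bound makes this concrete: n random k-bit hashes collide with probability roughly n^2 / 2^(k+1). (Note that monotone actually identifies revisions with 160-bit SHA-1 hashes, not 64-bit ones, so the practical risk is far smaller than the figure discussed above.)

```python
# Back-of-the-envelope birthday bound on hash collisions:
# with n random k-bit hashes, P(collision) ~ n^2 / 2^(k+1).
def collision_bound(n, bits):
    return n * n / 2 ** (bits + 1)

# For a billion objects:
p64 = collision_bound(10**9, 64)    # a few percent: 64 bits is marginal
p160 = collision_bound(10**9, 160)  # ~1e-31: negligible for SHA-1
```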

-- hendrik

### hen...@topoi.pooq.com

Oct 13, 2008, 5:12:55 AM
On Mon, Oct 13, 2008 at 04:57:17AM -0400, hen...@topoi.pooq.com wrote:
>
>
> All of these have happened with far greater frequency than duplicate
> 64-but cryptographic hash codes.

And far lower frequency than my typing errors :-( I meant, of course,

64-bit cryptographic hash codes.

-- hendrik

### slawekk

Oct 14, 2008, 8:09:52 PM
to vdash
One thing that would be useful to have from the start is for the
connection between the wiki and Isabelle to be over the network,
something in the spirit of the [Isabelle server](http://www.csse.monash.edu.au/hons/projects/2003/Mostyn.Bramley-Moore/thesis_html/node33.html).
The reason this is important is that there probably has to be an
instance of Isabelle running for every editing user. Running Isabelle
is quite resource intensive; even a strong machine will noticeably
slow down with 2-3 instances of Isabelle running. With a network-based
protocol, those instances of Isabelle can run anywhere on the local
network.
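To make the idea concrete, here is a minimal sketch of a dispatcher that hands each editing session its own prover instance from a pool of machines. The host names and the per-machine cap of two instances (matching the resource observation above) are illustrative assumptions:

```python
# Sketch of a one-prover-per-editing-session dispatcher over a pool
# of networked machines. Host names are invented for the example.
class ProverPool:
    def __init__(self, hosts, per_host_limit=2):
        self.hosts = hosts
        self.limit = per_host_limit        # Isabelle is heavy: cap per box
        self.load = {h: 0 for h in hosts}  # running instances per host

    def acquire(self):
        """Pick the least-loaded host with spare capacity, or None."""
        free = [h for h in self.hosts if self.load[h] < self.limit]
        if not free:
            return None
        host = min(free, key=lambda h: self.load[h])
        self.load[host] += 1
        return host

    def release(self, host):
        """An editing session ended; free its slot."""
        self.load[host] -= 1

pool = ProverPool(["prover-a", "prover-b"], per_host_limit=2)
```

Adding capacity is then just a matter of adding hosts to the pool, which fits the cloud-instance idea raised below.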

### Michael Katsevman

Oct 14, 2008, 8:26:38 PM
Network operation is indeed optimal, since it would allow using "the
cloud". Donations (or grants) could pay for more EC2 instances running
Isabelle. Jobs could be queued using SQS.

Mike.

### Cameron Freer

Oct 14, 2008, 9:17:05 PM
to vdash

This makes a lot of sense. In principle the wiki (or other UI),
database (including a possible revision control system), and
theorem prover (Isabelle) could all be on different machines, but
it's the latter component that will require the most computing power.

Does it make sense to do this over PGIP (like the Isabelle server
to which you linked), or something more elaborate like Gast's
Asynchronous Proof Processing (APP) proposal?:
http://www.ags.uni-sb.de/~omega/workshops/UITP08/gast.pdf

It might be best to do something much simpler, since revision
control could keep track of the versions, and in any case we need
much less fine-grained interactivity than the typical uses for
either PGIP or APP. At first, we can probably give each revision a
unique ID, and just have Isabelle process each document once,
shortly after it first enters the database. (As Hendrik suggests
with his comments on Monotone, maybe this is the right way to do
things indefinitely also.)

What fields in the database should the Isabelle server update?
We don't need any real interactivity, but we should try to harvest
as much data from the Isabelle run-through as we can. In addition
to updating certain flags, it could tell Monotone to add a given
entry to the current version of "all verified truths", but
(especially for documents that don't parse correctly) it would be
useful for the wiki (and future more elaborate UIs) to have access
to a log of detailed error messages and other Isabelle output as well.
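As a sketch of what such a run-through might record (the field names below are invented for illustration, not a fixed schema):

```python
# One possible record of a single Isabelle run-through, harvesting
# everything useful whether or not the document checked out.
def make_run_record(revision, prover_version, ok, errors, log):
    return {
        "revision": revision,      # which revision was processed
        "prover": prover_version,  # exact checker used, in case of bugs
        "verified": ok,            # the flag the wiki displays
        "errors": errors,          # parsed messages with locations
        "log": log,                # raw output, for future richer UIs
    }

rec = make_run_record(
    "abc123", "Isabelle2008", False,
    errors=[("line 12", "Failed to finish proof")],
    log="...")
```

On success the record reduces to the certificate data; on failure the parsed errors are what the wiki would surface to contributors.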

- Cameron

### Gary Boswell

Oct 15, 2008, 8:46:16 AM
You are right about designing too far into the future. Yet it is very
important to write down in a rank order, each of the design goals that
must be met. The design goals must be complete and ambitious. The
reason for the rank order is that as the actual design proceeds and it
becomes clear that all the goals cannot be met you already have a
clear indication of which goals must be given up first. This decision
was made at an early stage in the project when it could be made with
more logic and less heat.

Gary Boswell

### hen...@topoi.pooq.com

Oct 15, 2008, 8:01:48 PM
On Tue, Oct 14, 2008 at 09:17:05PM -0400, Cameron Freer wrote:
>
> What fields in the database should the Isabelle server update?

If you use monotone, you don't change the verified document if Isabelle
approves it; you add a certificate to the data base. The certificate
will refer to the revision it certifies. It should probably
also identify the exact copy and version of Isabelle used for the
verification, just in case.

If Isabelle disapproves, it's not clear what you want to put in the data
base. Are the failure reports from Isabelle detailed and bulky?
Are they worth distributing to all replications of the repository
forever? I suspect they will be only of transient interest.
Perhaps they are best placed on a suitable mailing list, and/or sent to
the mathematician that contributed the alleged piece of proof.

I was originally imagining that the originating mathematician would be
able to use his own version of Isabelle, in which case less of this
would be necessary.

-- hendrik

### Cameron Freer

Oct 15, 2008, 10:03:49 AM

Right, in the case of success, there's not much to add other than
the certificate. Maybe a list of dependencies (gathered by the
Isabelle server from the database, before parsing with Isabelle
proper) would be useful.

In the case of failure, I was thinking that the error from Isabelle
(suitably parsed) could be usefully displayed in the wiki. Some
people will want to polish documents on their own local copy, but for
those writing directly to the wiki (something I think we should
encourage), a brief indication of where Isabelle failed could help
those working on future revisions.

- Cameron

### hen...@topoi.pooq.com

Oct 15, 2008, 9:07:40 PM

It would probably save effort if those submitting anything were
required to make it explicit if they thought it was ready for
formal checking.


### Slawomir Kolodynski

Oct 15, 2008, 7:00:10 PM

> Does it make sense to do this over PGIP (like the Isabelle server
> to which you linked), or something more elaborate like Gast's
> Asynchronous Proof Processing (APP) proposal?:

Definitely not anything more elaborate. PGIP would be the right way, but I would not use even that, because it is still too much work: it was designed to do much more than we need. I would do something much simpler and ad hoc, to get things going.

I think the right moment to do verification would be when the user hits the "Preview" button. What he sees after that should give him an opportunity for corrections. I will contribute material to the wiki that will be verified, but we cannot make that a requirement. We should not require users to install anything. We don't need an elaborate GUI, but some level of interaction is necessary to call the thing a wiki.
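The verify-on-Preview flow might look like the sketch below; the function names and the stand-in checker are invented for illustration, and a real checker would hand the formal text to a networked Isabelle instance:

```python
# Sketch of verification triggered by the wiki's "Preview" button:
# render the page and, in the same step, run the checker and attach
# its verdict and messages so the author can correct before saving.
def preview(text, check):
    """Return rendered text plus the checker's verdict and messages."""
    ok, messages = check(text)
    return {"html": "<pre>%s</pre>" % text, "ok": ok, "messages": messages}

# Stand-in checker: accept anything that ends with "qed".
result = preview("lemma l: ... qed",
                 check=lambda t: (t.strip().endswith("qed"), []))
```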

Slawekk