Announce: TOPIC database specification


Feb 25, 2011, 8:30:47 AM


This document describes the formal specifications of the TOPIC
database format and serves as its canonical reference.

Designed for humans first and machines second, the TOPIC format
attempts to provide a standardized structure for plain text
databases that's easy to read and edit in most text editors, and
easy to programmatically parse as well [i].

Uses include: knowledge-bases, glossaries, apropos, notes...


- TOPIC databases are OS neutral.

- TOPIC databases are self-indexing.

- TOPIC databases provide associations linking blocks of data.

- TOPIC databases are written and read as standard ASCII [ii], so
virtually any plain text editor is suitable for editing chores.

- TOPIC databases use fundamentally simple markup [iii] employing
only the tab and comma characters to delimit content.

- TOPIC databases allow the end user to label data in a
straightforward, intuitive manner.


- Tags are always located above the block they describe, alone on a
single line.

- Tags contain only alphanumeric characters A-Z, a-z, 0-9, and
optionally spaces [HEX:20] [iv].

- A tag can be either a single word, or a group of words.

- Multiple tags are comma delimited [HEX:2C] [iv].
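The tag rules above could be checked with a few lines of Python. This is only a sketch: the function name parse_tag_line is hypothetical, and it assumes that spaces around each comma are not part of the tag and that an empty tag is invalid, neither of which the specification states.

```python
import re

# One or more alphanumeric words separated by spaces (A-Z, a-z, 0-9, HEX:20).
_TAG_RE = re.compile(r"[A-Za-z0-9]+(?: +[A-Za-z0-9]+)*")

def parse_tag_line(line):
    """Return the list of tags on a tag line, or None if the line is invalid."""
    if line.startswith("\t"):
        # Lines beginning with a tab belong to a block, not to a tag line.
        return None
    # Assumption: spaces surrounding a comma are padding, not tag content.
    tags = [part.strip(" ") for part in line.split(",")]
    if all(_TAG_RE.fullmatch(tag) for tag in tags):
        return tags
    return None
```

With these assumptions, 'Apples, Fruit' yields two tags, while a line containing punctuation other than commas is rejected.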


- A block is always located below the tags that describe it.

- A block may contain any number of lines, each beginning with a
horizontal tab character [HEX:09] [iv].

- Empty lines within a block are valid.
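As an illustration of the block rules, the sketch below recovers the text of a block from its raw lines. The name block_text is hypothetical, and removing exactly one leading tab (so that any further indentation survives) is an assumption rather than something the specification requires.

```python
def block_text(lines):
    """Join the raw lines of one block into plain text."""
    out = []
    for line in lines:
        if line.startswith("\t"):
            # Assumption: only the first tab is markup; the rest is content.
            out.append(line[1:])
        elif line == "":
            # Empty lines within a block are valid and are kept as-is.
            out.append("")
        else:
            raise ValueError("not a block line: %r" % line)
    return "\n".join(out)
```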


- Lines are terminated with one of CR [HEX:0D], LF [HEX:0A], or a
CR/LF pair [iv].

- No limits are imposed on the length of a given line [v].
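A reader that accepts all three terminators might look like the following sketch. The name split_lines is hypothetical. An explicit pattern is used here instead of Python's str.splitlines(), because that method also breaks on other characters (form feed, for example) that this specification does not list as terminators.

```python
import re

# CR/LF pair first, so it is consumed as a single terminator.
_EOL = re.compile(r"\r\n|\r|\n")

def split_lines(text):
    """Split text on CR, LF, or CR/LF; no line-length limit is applied."""
    lines = _EOL.split(text)
    # Assumption: a terminator at the very end does not start a new line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```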


Using multiple tags establishes associations between otherwise
unrelated blocks. In the example below, the first block has a tag
named 'Apples', the second block has a tag named 'Oranges', and both
blocks share a common tag named 'Fruit':

Apples, Fruit

	Block line 1
	Block line 2
	Block line n...

Oranges, Fruit

	Block line 1
	Block line 2
	Block line n...

This means you can stream the first block with the 'Apples' tag,
stream the second block with the 'Oranges' tag, or stream both
blocks via the 'Fruit' tag. The advantage gained is that your data
can be filtered in an arbitrary manner. For instance, you could have
twelve blocks, each with differing month tags, and a common year tag
allowing you to scrutinize your data by month as well as year...
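To make the association idea concrete, here is a rough Python sketch that loads a database into (tags, block) records and then filters them by tag. The names parse_topic, blocks_for, and SAMPLE are hypothetical, the sample data is invented for illustration, and dropping empty lines at the edges of a block is an assumption.

```python
import re

def _trim(block):
    # Assumption: empty lines at the start or end of a block are padding.
    start, end = 0, len(block)
    while start < end and block[start] == "":
        start += 1
    while end > start and block[end - 1] == "":
        end -= 1
    return block[start:end]

def parse_topic(text):
    """Return a list of (tags, block_lines) records."""
    records, tags, block = [], None, []
    for line in re.split(r"\r\n|\r|\n", text):
        if line.startswith("\t"):
            if tags is not None:
                block.append(line[1:])   # drop the leading tab
        elif line == "":
            if tags is not None:
                block.append("")         # empty lines are valid in a block
        else:
            if tags is not None:
                records.append((tags, _trim(block)))
            tags = [t.strip() for t in line.split(",")]
            block = []
    if tags is not None:
        records.append((tags, _trim(block)))
    return records

def blocks_for(records, tag):
    """Return every block that carries the given tag."""
    return [block for tags, block in records if tag in tags]

SAMPLE = "Apples, Fruit\n\n\tRed or green\n\nOranges, Fruit\n\n\tOrange\n"
records = parse_topic(SAMPLE)
apples = blocks_for(records, "Apples")   # one block
fruit = blocks_for(records, "Fruit")     # both blocks
```

The same filter would serve the month and year case: twelve records, each tagged with its month and a shared year.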


- TOPIC databases are parsed line-by-line sequentially from top to
bottom, and left to right.

- Parsing ignores blocks, seeking only tags matching the current
query, and when a match is found, outputs the associated block.

- Because a given tag can label multiple blocks, the entire database
should be scanned for each query rather than stopping at the first
match.
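The parsing rules above amount to a single sequential pass per query. One possible sketch in Python follows. The name query is hypothetical, and exact, case-sensitive tag matching is an assumption, since the specification does not say how tags are compared.

```python
import re

def query(text, wanted):
    """Yield the lines of every block whose tag line contains `wanted`."""
    matching = False
    for line in re.split(r"\r\n|\r|\n", text):
        if line.startswith("\t") or line == "":
            # Block content: skipped unless the current tags matched.
            if matching:
                yield line[1:] if line else line
        else:
            # Tag line: decide whether the block that follows is wanted.
            tags = [t.strip() for t in line.split(",")]
            matching = wanted in tags
    # The loop never stops early, so every matching block is output.
```

Because the function is a generator, matching lines can be streamed as they are found without holding the whole result in memory.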


There are no formally sanctioned modifications to the TOPIC database
specification. However, the user is free to extend and alter the
format as best fits the need, provided the licensing terms stated in
this document are observed.


ASCII/hexadecimal equivalents used in this document:

    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F

0   ^@  ^A  ^B  ^C  ^D  ^E  ^F  ^G  ^H  ^I  ^J  ^K  ^L  ^M  ^N  ^O
1   ^P  ^Q  ^R  ^S  ^T  ^U  ^V  ^W  ^X  ^Y  ^Z  ^[  ^\  ^]  ^^  ^_
2   SPC !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL
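The table is read with the row as the first hex digit and the column as the second, so [HEX:2C] is row 2, column C. As a small illustration, the Python sketch below reproduces a table entry from its code; the function name table_cell is hypothetical.

```python
def table_cell(code):
    """Return the table entry for an ASCII code in the range 0x00-0x7F."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not an ASCII code: %r" % code)
    if code < 0x20:
        # Control characters are shown in caret notation: ^@ through ^_.
        return "^" + chr(code + 0x40)
    if code == 0x20:
        return "SPC"
    if code == 0x7F:
        return "DEL"
    return chr(code)

tab = table_cell(int("09", 16))     # the block-line prefix, shown as ^I
comma = table_cell(int("2C", 16))   # the tag delimiter
```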


Updates, parsing examples, and other resources are located at:


i. Parse: To scan/analyze data looking for a desired pattern.

ii. ASCII: American Standard Code for Information Interchange.

iii. Markup: A system for annotating text.

iv. See topic 'HEX TABLE' for ASCII/hexadecimal equivalents.

v. Caveat: The user should recognize the constraints governing
both the hardware and software rendering the data.


The TOPIC database specification is copyright Topcat Software LLC
and is absolutely free for anyone to use for any reason in
perpetuity. A single-line citation is requested in the form of:

TOPIC database specification by Topcat Software LLC.

