Transaction File System - a replacement of JFS

Cheen Liao
Jan 17, 2003, 8:49:28 PM

There have been recent discussions about JFS on FreeBSD. I think my
company's development plan may meet that demand.

My company is planning to build a Transactional File System (TFS) on
FreeBSD, which will have both journaling (logging) and database
capabilities. The basic idea is to build a file system on top of a database
engine. When it is done, its database functionality should let it supersede
JFS.

TFS also has some advantages over traditional UNIX file systems. If we
put all "inodes" in a btree table, path lookup becomes faster, because a
single btree search for the inode replaces the paired inode and directory
block IOs for each path component. If designed properly, a btree search
takes a little more than 1 IO on average, depending on how many internal
pages are cached. Also, a small file's contents can be stored in a regular
variable-length field as part of the "inode". This greatly improves both
performance and space usage for small files.
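To make the path-lookup argument concrete, here is a minimal sketch, assuming a single table of "inodes" keyed by (parent inode number, name). A sorted Python list searched with bisect stands in for the on-disk btree, so resolving a path costs one keyed search per component. The names InodeTable, INLINE_LIMIT, and ROOT_INO are hypothetical, chosen for illustration only; they are not part of any published TFS design.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

ROOT_INO = 2        # assumption: root inode number, borrowed from UFS convention
INLINE_LIMIT = 64   # hypothetical cutoff for keeping file data in the record


@dataclass
class Inode:
    ino: int
    is_dir: bool
    inline_data: Optional[bytes] = None  # small-file contents stored in the "inode"


class InodeTable:
    """Toy stand-in for a btree table keyed by (parent_ino, name)."""

    def __init__(self):
        self._keys = []  # sorted list of (parent_ino, name) tuples
        self._vals = []  # Inode records, kept parallel to _keys

    def insert(self, parent_ino, name, inode):
        key = (parent_ino, name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = inode  # replace an existing entry
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, inode)

    def search(self, parent_ino, name):
        """One keyed search; replaces the separate inode and directory-block reads."""
        key = (parent_ino, name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None


def new_small_file(ino, data):
    """Keep a small file's contents inline in its record."""
    if len(data) > INLINE_LIMIT:
        raise ValueError("large files would use a separate block table")
    return Inode(ino, is_dir=False, inline_data=data)


def lookup(table, path):
    """Resolve an absolute path with one table search per component."""
    node = Inode(ROOT_INO, is_dir=True)
    for name in (c for c in path.split("/") if c):
        if not node.is_dir:
            raise NotADirectoryError(path)
        child = table.search(node.ino, name)
        if child is None:
            raise FileNotFoundError(path)
        node = child
    return node


if __name__ == "__main__":
    t = InodeTable()
    t.insert(ROOT_INO, "etc", Inode(10, is_dir=True))
    t.insert(10, "hosts", new_small_file(11, b"127.0.0.1 localhost\n"))
    print(lookup(t, "/etc/hosts").inline_data)  # prints b'127.0.0.1 localhost\n'
```

In a real btree the number of page reads per search is bounded by the tree height; with the internal pages cached, each component would touch roughly one leaf page, which is the effect described above.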

TFS is a long-term project and is now in the early planning stage. Here
is the rough plan, with no schedule :)

. develop a prototype on FreeBSD 4.x.
. use PostgreSQL as the internal database engine.
. define the database schema for
a) storing directory and file "inodes".
b) storing large objects (i.e. storing the block numbers for large files).
. write all VFS functions in user mode, using the PostgreSQL library.
. write a file system that will "call back (or pop up)" to the user-mode
functions described above.
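One possible shape for the schema items a) and b) above, sketched with Python's built-in sqlite3 module purely as a stand-in for PostgreSQL so the example runs without a server. The table and column names (inode, dirent, lo_block) and the split between them are assumptions for illustration, not the actual TFS schema.

```python
import sqlite3

ROOT_INO = 2  # assumption: root inode number, borrowed from UFS convention

# Hypothetical schema: sqlite3 stands in for PostgreSQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE inode (
    ino         INTEGER PRIMARY KEY,
    is_dir      INTEGER NOT NULL,
    size        INTEGER NOT NULL DEFAULT 0,
    inline_data BLOB                  -- small-file contents; NULL for large files
);
CREATE TABLE dirent (
    parent_ino  INTEGER NOT NULL REFERENCES inode(ino),
    name        TEXT    NOT NULL,
    ino         INTEGER NOT NULL REFERENCES inode(ino),
    PRIMARY KEY (parent_ino, name)    -- the index used for path lookup
);
CREATE TABLE lo_block (
    ino         INTEGER NOT NULL REFERENCES inode(ino),
    file_block  INTEGER NOT NULL,     -- logical block index within the file
    disk_block  INTEGER NOT NULL,     -- block number on the device
    PRIMARY KEY (ino, file_block)
);
"""


def open_fs(path=":memory:"):
    """Create the tables and the root directory inode."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO inode (ino, is_dir) VALUES (?, 1)", (ROOT_INO,))
    return conn


def lookup_child(conn, parent_ino, name):
    """Directory lookup: one indexed query on (parent_ino, name)."""
    row = conn.execute(
        "SELECT ino FROM dirent WHERE parent_ino = ? AND name = ?",
        (parent_ino, name),
    ).fetchone()
    return row[0] if row else None


def map_block(conn, ino, file_block):
    """Large-object lookup: logical file block -> device block number."""
    row = conn.execute(
        "SELECT disk_block FROM lo_block WHERE ino = ? AND file_block = ?",
        (ino, file_block),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_fs()
    with conn:  # creating the file is one atomic transaction
        conn.execute("INSERT INTO inode (ino, is_dir, size) VALUES (20, 0, 16384)")
        conn.execute("INSERT INTO dirent VALUES (?, 'big.dat', 20)", (ROOT_INO,))
        conn.executemany("INSERT INTO lo_block VALUES (20, ?, ?)",
                         [(0, 5000), (1, 5001)])
    print(lookup_child(conn, ROOT_INO, "big.dat"), map_block(conn, 20, 1))
```

Keeping directory entries separate from inode records leaves room for hard links; wrapping each file-system operation in one transaction is what would give TFS its journaling behaviour for free.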

At the end of this stage, we will have a running prototype of TFS, which
will obviously have serious performance problems. Also, some of the
database functions are not good enough for file system use and need to be
strengthened. With the database engine inside, we can easily add extended
attributes to each directory or file object and search on them. So in the
next stage, we will

. move the core database engine into the kernel. This has to be the
FreeBSD 5.0 kernel, because some of the database functions can take a long
time to run. The pre-5.0 kernel is non-preemptive, so the system could hang
in the kernel because of these long-running functions. Obviously we will
need a lot of help from the FreeBSD community to make the move smooth. In
particular, merging the database buffers with the system cache will be a
big challenge.

. strengthen the database functions:
a) add new free space management suitable for database extension, so the
engine can run on a raw block device.
b) improve the btree: store records in the btree itself (clustered btree),
and add a btree deletion function.
c) improve large object storage, including the clustering policy and the
recovery policy.
d) make logging robust, so that it handles "torn writes".
e) expose the database functions through new system calls or other
creative methods.
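Item d) refers to torn writes: a page write interrupted part-way, leaving some sectors new and some old. One common way to detect them, sketched below as an illustration rather than the planned TFS mechanism, is to store a checksum in every page, verify it on read, and fall back to a full page image kept in the log. The 8 KB page size, 512-byte sector size, and all function names are hypothetical.

```python
import struct
import zlib

PAGE_SIZE = 8192               # hypothetical page size
SECTOR_SIZE = 512              # unit the disk is assumed to write atomically
HEADER = struct.Struct("<I")   # 4-byte CRC32 stored at the start of each page
BODY_SIZE = PAGE_SIZE - HEADER.size


def seal_page(payload: bytes) -> bytes:
    """Build an on-disk page image: CRC32 header followed by the zero-padded body."""
    if len(payload) > BODY_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(BODY_SIZE, b"\0")
    return HEADER.pack(zlib.crc32(body)) + body


def page_is_intact(page: bytes) -> bool:
    """A torn page mixes old and new sectors, so its CRC no longer matches."""
    (stored,) = HEADER.unpack_from(page)
    return stored == zlib.crc32(page[HEADER.size:])


def read_page(on_disk: bytes, logged_image: bytes) -> bytes:
    """Use the on-disk page if intact, else recover from the full image in the log."""
    if page_is_intact(on_disk):
        return on_disk
    if not page_is_intact(logged_image):
        raise IOError("both the page and its logged image are damaged")
    return logged_image


def simulate_torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """What the disk holds if power fails after the first few sectors of `new`."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]
```

For the fallback to be safe, the full page image has to reach the log before the page itself is overwritten in place, which is the usual write-ahead logging rule.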

At this stage, we should have all the basic TFS functionality working, and
we will need a lot of fine tuning and tools, such as

. performance tuning - a task that never ends.
. fsck on TFS - in case the logs are lost, it will bring TFS back to a
consistent state.
. add snapshot capability - it will be a piece of cake with logging
support.
. add replication by shipping the log to another system and replaying it
there.
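The replication idea can be illustrated with a toy model, assuming the log consists of after-image records tagged with an increasing log sequence number (LSN). The replica remembers the last LSN it applied, so re-shipping the same records is harmless. Primary, Replica, and LogRecord are hypothetical names; a real engine would ship log pages over the network and replay them through its normal recovery code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int      # log sequence number, strictly increasing
    key: str      # hypothetical row key, e.g. "inode:42"
    value: bytes  # after-image: the new value for that key


class Primary:
    def __init__(self):
        self.data = {}
        self.log = []
        self.next_lsn = 1

    def write(self, key, value):
        rec = LogRecord(self.next_lsn, key, value)
        self.next_lsn += 1
        self.log.append(rec)    # append to the log before applying (write-ahead)
        self.data[key] = value
        return rec.lsn

    def ship_since(self, lsn):
        """Return the log records a replica has not applied yet."""
        return [r for r in self.log if r.lsn > lsn]


class Replica:
    def __init__(self):
        self.data = {}
        self.applied_lsn = 0

    def replay(self, records):
        for r in records:
            if r.lsn <= self.applied_lsn:
                continue        # already applied, so replay is idempotent
            if r.lsn != self.applied_lsn + 1:
                raise RuntimeError("gap in shipped log")
            self.data[r.key] = r.value
            self.applied_lsn = r.lsn
```

Because records carry after-images and the replica skips anything at or below its applied LSN, a batch that is shipped twice (for example after a network retry) leaves the replica unchanged.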

By now, if you are still reading, you probably know what we are trying to
achieve. Suggestions and discussions on TFS are extremely welcome. Does
anyone have suggestions on how to merge our efforts with any existing
efforts in the BSD community, and speed up development?

Thanks,
Cheen


To Unsubscribe: send mail to majo...@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

Dan Ellard
Jan 17, 2003, 9:21:44 PM

On Sat, 18 Jan 2003, Cheen Liao wrote:

> Recently there are discussions on JFS on FreeBSD. I think my company's
> development plan may meet the demands.
>
> My company is planning to build a Transactional File System (TFS) on
> FreeBSD, which has journaling (logging) capability and database capability.
> The basic idea is to build a file system on a database engine. When it is
> done, it should supersede JFS with its database functionality.

> ...

You should get in contact with Lex Stein (st...@eecs.harvard.edu) and
Mike Tucker (mtu...@eecs.harvard.edu). They have built a file system
on top of Berkeley DB, and it's completely transaction-oriented. It's
open source and available to download now. The basic idea sounds like
almost exactly what you're planning to do, except that it's based on
Berkeley DB instead of Postgres, and its interface is a user-level
NFSv3 server instead of VFS. (I don't know whether they've thought
about the niftier features like snapshots/replication, beyond what is
already provided by BDB.)

Even if you don't like exactly what they've done, and really want to
use VFS, I think you'll find it much easier to cram BDB into the
kernel than Postgres! If you're determined to stick with Postgres,
however, you should check out Michael Olson's work on the "Inversion"
file system, which used Postgres as the basis for a file system that
did some of the things you are thinking about, circa 1993. (But note
that following in Michael Olson's footsteps will also lead you back to
Berkeley DB...)

-Dan

Cheen Liao
Jan 17, 2003, 10:09:34 PM

This is great information. I will check it out. Let me give a quick
explanation of the rationale behind some of our decisions:

We chose PostgreSQL because it has a true BSD license: it does not matter
whether it is used for commercial redistribution or not. BDB's license is
not like that. PostgreSQL also has great query support and migration
support. Users can migrate their commercial database applications to
PostgreSQL, or, in the future, to TFS.

We chose the VFS approach because a lot of functionality, from both the
open source community and my company, is built on the VFS layer. Note that
it is cleaner to run the database engine in the kernel and treat VFS as
just one way to view the data in the database; NFS can certainly be
another. I expect the main challenge of the project lies in merging the
resources managed by the database engine into the kernel. Adding more
interfaces for accessing the data can be done at a later stage.

Again, thank you for the information and your interest in the project,

Cheen