Use of Protobuf as a Simple Database

Stephen Miller

Dec 12, 2008, 6:36:31 PM

to Protocol Buffers

First, let me apologize if this is not the forum to speak about design
decisions.

I program as a hobby, so I do not consider myself a guru when it
comes to making smart design choices. I'm starting a new project.
This project is in C++ and will have lots of persistent data.

It is not transactional in the sense that it needs to maintain
integrity of data. It is time-sensitive in that I want it to run
without hiccups or lag, and it is fine if I lose some data once in
a while, since I will be making regular backups and rewinding to a
moment in time is alright. For all these reasons I do not think I
*NEED* to use something like a relational database.

I'm concerned about speed, and about memory, since I expect there to
be a lot of data in memory whenever the program is running, which is
all the time.

So, having said all that ...

Is it stupid to use protobuf to store data in files and to use it
basically as a database?

If no:

Would it be wise to store most messages in one large file with some
custom-defined structure to figure out where messages begin and
end and what type they are? I've noticed games typically put all
their data in one large file. Is that because it's faster to fseek
than it is to open up a new file? Of course, single-player games
don't seem to store changing data in one large file, maybe because
corrupting that file can corrupt all the data?
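A minimal sketch of the "one large file" idea, assuming each payload is the byte string produced by a generated message's `SerializeToString()`. The 8-byte record header (a 4-byte type id followed by a 4-byte length, both little-endian) and the names `Record`, `EncodeRecords` and `DecodeRecords` are hypothetical choices for illustration; protobuf itself does not define a multi-message file format. (Java's `writeDelimitedTo()`/`parseDelimitedFrom()` use a similar idea, with a varint length prefix and no type id.)

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One record in the hypothetical single-file layout:
//   [4-byte type id][4-byte payload length][payload bytes]
struct Record {
    uint32_t type_id;     // hypothetical application-defined message type number
    std::string payload;  // serialized protobuf bytes, treated as opaque here
};

// Append a 32-bit integer in little-endian byte order.
static void PutU32(std::string& out, uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read a little-endian 32-bit integer; returns false if fewer than 4 bytes remain.
static bool GetU32(const std::string& in, size_t& pos, uint32_t& v) {
    if (pos + 4 > in.size()) return false;
    v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<uint32_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
    pos += 4;
    return true;
}

// Concatenate records into one buffer (e.g. the contents of the data file).
std::string EncodeRecords(const std::vector<Record>& records) {
    std::string out;
    for (const Record& r : records) {
        assert(r.payload.size() <= UINT32_MAX);  // length must fit in the header
        PutU32(out, r.type_id);
        PutU32(out, static_cast<uint32_t>(r.payload.size()));
        out += r.payload;
    }
    return out;
}

// Walk the buffer and recover each record. Decoding stops at the first
// incomplete record, so a partially written tail loses only that record.
std::vector<Record> DecodeRecords(const std::string& data) {
    std::vector<Record> records;
    size_t pos = 0;
    while (pos < data.size()) {
        uint32_t type_id = 0, len = 0;
        if (!GetU32(data, pos, type_id) || !GetU32(data, pos, len)) break;
        if (len > data.size() - pos) break;  // truncated payload
        records.push_back({type_id, data.substr(pos, len)});
        pos += len;
    }
    return records;
}
```

Because the decoder stops at the first incomplete record, a crash during an append loses only the tail of the file; damage in the middle of the file, however, would cut off every record after it.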

I recognize that databases give you all of these features, like very
intelligent fast searching of data, the ability to have multiple
things hitting the data at once, transactional updates, etc. But,
would messages saved to hard disk files give faster data lookups AND
updates than a relational SQL database?

From my perspective, protobuf is far easier to work with than building
SQL statements all over your code and figuring out some way to bind
C++ classes with containers, pointers, and other indirection to
relational databases. Plus, it seems friendlier to adding new data
to those classes. Has anyone thought it might be interesting to
have an automatic way of generating SQL statements to create tables to
hold protobuf objects, and automatic conversion of protobufs to SQL
updates/inserts? That would make it easy both to communicate with remote
applications and to bind objects to databases... There are a lot
of issues with that I am not aware of, I'm sure.
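A rough sketch of generating SQL from a message description. To keep it self-contained, the field list is a hypothetical `FieldSpec` vector; in a real program the same information could be read through protobuf's reflection API (`Descriptor::field_count()`, `Descriptor::field(i)`, `FieldDescriptor::name()`, `FieldDescriptor::cpp_type()`). The SQL type mapping assumes SQLite-style types, and repeated or nested fields (the containers and indirection mentioned above) are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified description of a scalar message field.
enum class FieldKind { kInt64, kDouble, kString, kBool };

struct FieldSpec {
    std::string name;
    FieldKind kind;
};

// Map a field kind to an SQLite-style column type (an assumption, not a standard).
static const char* SqlType(FieldKind kind) {
    switch (kind) {
        case FieldKind::kInt64:  return "INTEGER";
        case FieldKind::kDouble: return "REAL";
        case FieldKind::kString: return "TEXT";
        case FieldKind::kBool:   return "INTEGER";  // stored as 0 or 1
    }
    return "BLOB";
}

// Build a CREATE TABLE statement with one column per scalar field.
std::string CreateTableSql(const std::string& table, const std::vector<FieldSpec>& fields) {
    assert(!fields.empty());
    std::string sql = "CREATE TABLE " + table + " (";
    for (size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) sql += ", ";
        sql += fields[i].name + " " + SqlType(fields[i].kind);
    }
    return sql + ")";
}

// Build a parameterized INSERT; the values are bound separately by the SQL library.
std::string InsertSql(const std::string& table, const std::vector<FieldSpec>& fields) {
    std::string cols, params;
    for (size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) { cols += ", "; params += ", "; }
        cols += fields[i].name;
        params += "?";
    }
    return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
}
```

For a message with an `id` and a `name` field, this yields `CREATE TABLE player (id INTEGER, name TEXT)` and `INSERT INTO player (id, name) VALUES (?, ?)`.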

And, finally, is it smarter to have an instance of the proto class
encapsulated by a C++ object, with reads/writes going directly to the
proto, as opposed to copying the data into a class, deleting the proto,
and generating a new one any time you want to save the object? Here, it
seems that the first approach lends itself to data that changes often
and is saved frequently, and the second to data that is almost never
changing.
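A sketch contrasting the two approaches. `PlayerData` is a hypothetical stand-in that mimics the accessor style of a protoc-generated class (`name()`, `set_name()`, ...) so the example compiles without protobuf; `Player` and `PlainPlayer` are likewise illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a protoc-generated message class. It only mimics
// the generated accessor style; it is not protobuf code.
class PlayerData {
public:
    const std::string& name() const { return name_; }
    void set_name(std::string v) { name_ = std::move(v); }
    int64_t score() const { return score_; }
    void set_score(int64_t v) { score_ = v; }
private:
    std::string name_;
    int64_t score_ = 0;
};

// Approach 1: the C++ object owns the message and reads/writes it directly.
// Saving means serializing data_ as-is; nothing is copied beforehand.
class Player {
public:
    void AddPoints(int64_t points) { data_.set_score(data_.score() + points); }
    const PlayerData& data() const { return data_; }  // what a save would serialize
    PlayerData& mutable_data() { return data_; }
private:
    PlayerData data_;
};

// Approach 2: plain C++ fields; a fresh message is built only when saving.
struct PlainPlayer {
    std::string name;
    int64_t score = 0;

    PlayerData ToMessage() const {
        PlayerData msg;
        msg.set_name(name);
        msg.set_score(score);
        return msg;
    }
};
```

With the first approach a save is just a serialization of the held message; with the second, every save also pays for building a new message, which matters little for data that rarely changes.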

With gratitude,
N

Kenton Varda

Dec 12, 2008, 8:18:44 PM

to Stephen Miller, Protocol Buffers

All these questions are hard to answer definitively. It depends on the details of your use case. In many cases there are many possible solutions and it's impossible to say which one is "best".

One thing that probably matters a lot is the complexity of the queries you intend to execute on this "database". If you are just looking up entries by key, that's pretty easy to implement, but complex queries involving matching multiple columns -- to say nothing of joins -- will be more difficult, especially if you want them to be efficient. Again, all depends on the use case.
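To make "looking up entries by key" concrete, here is a minimal sketch, assuming the values are serialized messages and the whole key set fits in memory: an append-only data stream plus a hash map from each key to the offset of its latest record. `KeyedStore` is a hypothetical name, lengths are written in native byte order, and rebuilding the index at startup (by scanning the file) and compacting superseded records are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Append-only store with an in-memory key -> offset index. Works on any
// std::iostream, e.g. an std::fstream opened with in | out | binary, or an
// std::stringstream for testing.
class KeyedStore {
public:
    explicit KeyedStore(std::iostream& io) : io_(io) {}

    // Update = append a new record and repoint the index. The old bytes stay
    // in the stream until some compaction pass (not shown) rewrites the file.
    void Put(const std::string& key, const std::string& value) {
        io_.seekp(0, std::ios::end);
        std::streamoff offset = io_.tellp();
        uint32_t len = static_cast<uint32_t>(value.size());
        io_.write(reinterpret_cast<const char*>(&len), sizeof(len));
        io_.write(value.data(), static_cast<std::streamsize>(value.size()));
        index_[key] = offset;
    }

    // Lookup = one hash-map probe, one seek, and two reads.
    bool Get(const std::string& key, std::string* value) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        io_.seekg(it->second);
        uint32_t len = 0;
        io_.read(reinterpret_cast<char*>(&len), sizeof(len));
        value->resize(len);
        io_.read(&(*value)[0], len);
        return static_cast<bool>(io_);
    }

private:
    std::iostream& io_;
    std::unordered_map<std::string, std::streamoff> index_;
};
```

Anything beyond exact-key lookup, such as matching on several fields or joining, would need further indexes maintained by hand, which is where a real database starts to pay for itself.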

But I don't see anything obviously wrong with your ideas.

Alain M.

Dec 12, 2008, 8:58:21 PM

to ProtBuf List

This topic is very interesting, and I have put a lot of thought in it. Let's share:

I have noticed that in most of my applications, most of the columns in the database are just storage for later reports; nothing else happens with them. I guess this is what you call persistence.

I was thinking of structuring the tables like this:
1) a few columns hold information useful for index creation and for joins.
2) one column stores a blob, consisting of a protobuf binary, with all the rest of the information.
- the columns in (1) will probably be duplicates of, or derived from, what is in (2)
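A sketch of the table layout described above, using an orders table as a hypothetical example: only the columns needed for indexes and joins are real SQL columns, and the full serialized message goes in `payload`. The schema text assumes SQLite syntax; `Order`, `OrderRow` and `MakeOrderRow` are illustrative names, with `Order` standing in for a generated message class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical schema: a few duplicated, indexable columns plus one blob.
const char* const kOrdersSchema = R"sql(
CREATE TABLE orders (
  order_id    INTEGER PRIMARY KEY,  -- duplicated from the message
  customer_id INTEGER NOT NULL,     -- duplicated, used for joins
  created_at  INTEGER NOT NULL,     -- duplicated, used for range queries
  payload     BLOB NOT NULL         -- the full serialized message
);
CREATE INDEX orders_by_customer ON orders (customer_id);
)sql";

// Hypothetical decoded form of the message (stand-in for a generated class).
struct Order {
    int64_t order_id = 0;
    int64_t customer_id = 0;
    int64_t created_at = 0;
    // ...many more fields that are only read back for reports
};

// One row ready to be bound to an INSERT into the table above.
struct OrderRow {
    int64_t order_id;
    int64_t customer_id;
    int64_t created_at;
    std::string payload;
};

// The indexed columns are always derived from the same message whose bytes
// go into payload, so the duplicated values stay consistent with the blob.
OrderRow MakeOrderRow(const Order& order, std::string serialized) {
    return OrderRow{order.order_id, order.customer_id, order.created_at,
                    std::move(serialized)};
}
```

Reports can then read `payload` and decode it, while queries and joins touch only the small indexed columns.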

This sounds very promising if there is a server attached to the database and all access goes through protobuf. A lot of time can be saved, since the blob is already protobuf binary and does not need to be converted to or from another representation. Also, because protobuf is forward and backward compatible across versions, no database schema update is needed when adding new fields, even optional ones.
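Why new fields do not break old readers can be seen in the wire format: each field is written as a tag (`field_number << 3 | wire_type`) followed by data whose size is determined by the wire type, so a reader built from an older `.proto` can skip fields it does not know. The hand-written decoder below only illustrates that rule (`ReadVarint` and `ReadIdV1` are hypothetical names); generated parsing code does this for you.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Read a base-128 varint; returns false on truncated or over-long input.
static bool ReadVarint(const std::string& buf, size_t& pos, uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) return false;
        uint8_t byte = static_cast<uint8_t>(buf[pos++]);
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;  // more than 10 bytes: malformed
}

// An "old" reader that only knows field 1 (a varint-encoded id). Every other
// field, e.g. one added later by a newer writer, is skipped using its wire type.
bool ReadIdV1(const std::string& buf, int64_t& id) {
    size_t pos = 0;
    bool found = false;
    while (pos < buf.size()) {
        uint64_t tag = 0;
        if (!ReadVarint(buf, pos, tag)) return false;
        uint64_t field_number = tag >> 3;
        uint32_t wire_type = static_cast<uint32_t>(tag & 7);
        uint64_t v = 0;
        switch (wire_type) {
            case 0:  // varint
                if (!ReadVarint(buf, pos, v)) return false;
                if (field_number == 1) { id = static_cast<int64_t>(v); found = true; }
                break;
            case 1:  // fixed 64-bit
                if (buf.size() - pos < 8) return false;
                pos += 8;
                break;
            case 2:  // length-delimited: strings, bytes, sub-messages
                if (!ReadVarint(buf, pos, v) || v > buf.size() - pos) return false;
                pos += static_cast<size_t>(v);
                break;
            case 5:  // fixed 32-bit
                if (buf.size() - pos < 4) return false;
                pos += 4;
                break;
            default:  // groups (wire types 3 and 4) are not handled in this sketch
                return false;
        }
    }
    return found;
}
```

This is what lets the blob column go unmigrated; a new field that has to be queryable would still need its own indexed column.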

Alain
