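This March, Mad Penguin interviewed Josh Berkus, a PostgreSQL Core
Team member (in charge of PR). Below is the interview transcript; I
will translate parts of it from time to time, as I expect everyone
will be interested.
In what follows, MP stands for Mad Penguin. Read as Chinese pinyin,
the abbreviation of Josh Berkus looks a bit indecent, especially to
anyone who has spent a lot of time on BBSes, so I use "Josh" (「喬」)
instead: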
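MP: Today is Monday, March... the seventh. Thank you, Josh. We are
interviewing Josh, the marketing lead for the PostgreSQL project.
Josh, do you have any other titles?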
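Josh: Well, actually my title is Core Team member, and I happen to
be the PR (public relations) lead as well. But that came about only
after I was elected to the Core Team.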
MP: Let's briefly review the highlights of PostgreSQL 8.0, the
version that was just released.
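Josh: Sure. The feature that has gotten the most press is, of
course, the native Windows port. This is the first version of
PostgreSQL that runs natively on Windows while keeping performance
comparable to what you get on Linux or BSD. That is what we have
seen in all the press, and it is what has brought us roughly
100,000 new users. Beyond that, a number of other features fill
gaps in what our large enterprise users need, including:
point-in-time recovery, also known as continuous backup; nested
transactions, or "Savepoints" in SQL standard terminology;
tablespaces, which are a way of making use of more disks; and some
memory and I/O improvements, intended mainly to make large
multi-processor machines run better.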
MP: We said earlier that this version represents a third hump that
PostgreSQL has crossed. How does this version represent a milestone
in PostgreSQL's history?
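Josh: Speaking of the three humps, I have worked on quite a few
different open source projects, and as I see it there are three
humps to get over before you can call yourself a "big project." The
first is going from a single developer to multiple developers. The
second is when you pick up momentum among users in the open source
community, when users you have never met start saying on the
mailing lists, "Hey, buddy, I'm using your software, is there
anything I can do to help?" That stage carries you forward for a
while, and if you keep growing in that direction you reach the
third hump. At that point it is no longer just individuals who turn
up; companies start saying, "Hey, your project is pretty cool, we
use the software, we want to be publicly involved with it, and our
PR people think that's a good idea too."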
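That is the so-called third hump, and we [PostgreSQL] saw it happen
over the last year. In the last year it was not just the companies
that have supported us for years, such as SRA, PostgreSQL Inc.,
Command Prompt, Credativ, TDMSoft, and a few others, that were
active; we also picked up Pervasive, a proprietary database vendor
that is going open source, and Fujitsu, a large Japanese
corporation with annual revenue of $43 billion, plus a few other
companies that have not gone public yet.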
MP: Let's talk about Fujitsu. They made a rather stunning
commitment. Can you tell us about it?
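Josh: Basically, the head of their open source business
applications division, Mr. Takayuki Nakazawa (中澤孝之; translator:
can anyone help me render this name in Chinese?), said, "We are
committed to helping make PostgreSQL the leading database
management system." Of course, we at the PostgreSQL project are
naturally delighted to have this vote of confidence from such a
large company, one with more revenue than Yahoo (translator's note:
I made this phrase up, embellishing something I read on the mailing
list; the literal translation is "with revenue in the hundreds of
millions"). It shows confidence in us, and it is not just that
Fujitsu has contributed some code and features; in fact, Fujitsu
has contributed quite a lot of code and features, such as
tablespaces, nested transactions, and so on.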
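In addition, one challenge that open source software often runs
into as it expands into the commercial market is legitimacy. Put
simply, when users choose new software, especially for something as
critical as an enterprise data center, nobody wants to be the
sucker. They usually have a herd mentality of "other people are
using it too," which is what makes them feel safe, and the bigger
those "other users" are, the better. So backing from Fujitsu,
Pervasive, and some newly founded companies gives a company's IT
managers real confidence, so that they can go to their bosses and
say, "Hey, I found something to use for our project: PostgreSQL."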
MP: How much do you think Fujitsu stands to gain from PostgreSQL
8.0?
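Josh: People in the US may not be familiar with Fujitsu, but in
Japan it is one of the major software vendors, if not the largest,
yet it has no large database of its own. It does, however, have a
lot of very useful database tools, so incorporating an open source
database into their product line, one they can attach their own
database tools to, rather than licensing a database from another
commercial company, was a natural choice for them. If they do it
well, it gives them a way to compete with IBM, Microsoft, and the
like.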
MP: Let's talk about the open source community's general reaction
to PostgreSQL 8.0. How long has 8.0 been out, and roughly how many
downloads have there been?
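Josh: PostgreSQL was officially released on January 17, but before
that it went through a long six-month beta. Fixing all the problems
on the Windows platform really took us an unprecedented amount of
time. Since the release, though, we have had nearly 200,000
downloads from our main FTP site and BitTorrent site, and those are
not the only places to get PostgreSQL. These downloads may
represent about half of our recent new users; of course, we still
get most of our users through the major Linux distributions, for
which we have no numbers and whose adoption we will be seeing over
the next two years.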
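So, overall, the reaction has been huge. One thing that encourages
me is that the Windows port has led to 100,000 downloads and new
users, people who previously did not dare use PostgreSQL because
they had no experts around on Linux, FreeBSD, Solaris, or other
Unix-like systems. The Windows port will continue to help grow our
community.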
MP: What does this release mean for the star called PostgreSQL in
the glittering open source "sky"?
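Josh: I don't think anyone doubts that we are somewhere in the
realm of the "big-name" projects. The press and the companies
sponsoring PostgreSQL certainly think so. In terms of exposure,
your class A projects are most likely Linux and Apache. (laughs)
Then you have the second tier, which includes a whole range of
applications. We are in either the second tier or the third.
Certainly, we have users in the six figures; we have a couple
hundred code contributors; and whenever I talk to big companies
about databases, it always turns out that somewhere in some
department, if not company-wide, PostgreSQL is already in use.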
MP: What attracts developers to join the project?
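Josh: A lot of it is things we have had all along. For example,
PostgreSQL is a completely community-owned project. We are not a
company, and we do not even have a foundation that controls the
code. We have a foundation for fundraising, but it does not control
code contributions. So openness is not only part of our design but
part of our culture. The idea is that PostgreSQL is there for you
to hack. If you need something that most users don't need or don't
use, but that your project or business goal specifically requires,
you can roll up your sleeves and code it. That culture, combined
with the business license, means you can write your code and then
commercialize it. You can even release it under a commercial
license. It is 100% free, and that is the main thing that attracts
people to PostgreSQL.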
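Of course, there are other features besides the license. For
example, a very clear coding style and a very accessible code base
make it easy to write extensions. We have a pluggable architecture
that makes it easy to write your own extensions to the database.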
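Part of our general architecture design attracts developers from
all [programming] languages. Unlike any other database system I
know of, PostgreSQL supports 11 to 12 different languages for
writing stored procedures. So whatever programming language you
choose, Perl, PHP, Java, whatever, you can probably write stored
procedures in it. That opens the door to database programming for
developers who, without support for so many languages, probably
would not go near it.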
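For existing users, we have added things to keep them happy. For
example, Christopher Kings-Lynne completely rewrote the backup and
restore software, eliminating a lot of annoying little problems in
backup and restore. The PL/Perl server-side scripting language
(which lets you write Perl scripts inside the database) also went
through a major overhaul, which should appeal strongly to Perl
users.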
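Other features in this release aimed at existing users and DBAs
include greatly expanded logging options and filling a few gaps in
our SQL support for database design, mainly related to managing
database permissions and the characteristics of database objects.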
MP: Where is PostgreSQL headed over the next 12 to 24 months?
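Josh: For the last several years we have released a new version
about once a year, and I see no reason to change that habit. It is
a balance between developers and users: developers usually want
fast releases, about every six months, while users usually favor
caution and would actually prefer a release every 18 months.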
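Because we are a completely community-organized project, if you
want to develop something and have the resources to do it, you just
jump on the hackers list and say, "Hey, guys, I want to develop
something, and this is how I want to do it..." People will then
analyze your idea, pick holes in it, and suggest changes so that it
fits PostgreSQL in general. And then you just do it! You don't have
to worry about any pre-determined marketing goal.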
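On the other hand, this also means that version 8.1 has no title.
We have no specific goal for the release. For example, one of the
things people are working on now is SQL-standard-compliant stored
procedures. We already have stored procedures, but they are not
compliant with the standard syntax. We are also working on bitmap
indexes, which are a big feature for Oracle users; on greatly
improving the performance of other forms of advanced indexing; on
two-phase commit, another big feature that distributed application
users need; and on integrating auto-vacuum (an administrator's
maintenance task) into the back end, so that it no longer needs a
separate process.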
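Another question I would like to answer is replication, because I
hear it all the time: unlike other database systems, in PostgreSQL
replication is an add-on. It is a separate application. That is not
a development accident. We did it on purpose.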
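There are several reasons for this. One is that replication is
actually not a single feature. It is a set of four or five
different, related implementations, each serving a different need.
As a result, we do not want to tie one particular kind of
replication to the main database, because it could not suit all
users. Our best replication project right now (in terms of
popularity) is Slony-I, led by Jan Wieck, who is also a Core Team
member. The project is actually very popular, and is about the best
of all master-slave high-availability replication systems. Jan is
now working on Slony-II, which will be synchronous multi-master
replication software for database clusters. Judging by the pace of
his past work, I estimate it will be available in about a year. But
don't expect to see it in the release notes for the main PostgreSQL
version, because it is a separate, parallel project.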
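MP: You are also the marketing lead for OpenOffice.org, which is
also a large cross-platform project. PostgreSQL is a cross-platform
project too. What lessons from the OpenOffice.org project also
apply to the PostgreSQL project?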
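Josh: I think what most open source projects have learned, and what
those that have not should learn, is a simple fact: millions of
people use Windows, and millions of users use only Windows. If you
do not have a port to that platform, you have shut those people out
of your project.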
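That is a harsh reality for a lot of us. I personally do not have
any Windows machines. This is an all-Linux office. But that does
not mean I fail to recognize that there are not only individual
users who use only Windows, but also many companies that have
standardized on, say, a Windows 2000 server environment and have no
heterogeneous environment available. And a lot of developers work
on notebooks running Windows XP, even though their final
applications will run on Linux or Solaris servers. So the
conclusion is that if you really want your open source project to
grow and take off and reach more than a million users, and if that
is appropriate for what your project is for, then you need a
Windows port, so that those people will download it and try it on
the operating system they already have, regardless of whether they
end up using it.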
MP: Is there anything else you would like to add?
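Josh: The main thing is that PostgreSQL is a community project. We
always welcome new people. If you are interested in PostgreSQL, you
can download it and try it, and if you find it all good except for
one thing that bugs you, jump on a mailing list and talk to people
about it, because whatever that thing is, fixing it may well be
easier than you think.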
Josh: Well, actually my title is Core Team member, I happen to be
sort of the de-facto PR lead as well. But that evolved after I was
elected to the Core Team.
MP: Let's talk real briefly about the highlights of PostgreSQL
8.0, the version that was recently released.
Josh: Well the big new feature that's getting the most press is, of
course, our native Windows port. This is the first version of
PostgreSQL that will run natively on Windows with something
comparable to the performance that you can expect on Linux or BSD.
That's what's gotten us all the press, and that's what's gotten
us probably somewhere around like 100,000 new users. Other than
that, there's a number of other features that have filled holes
that our big enterprise users want, including: point in time
recovery, which is otherwise known as continuous backup; nested
transactions or "Savepoints" in SQL standard terminology; table
spaces, which is a way of making use of more disks; and some
memory and I/O improvements, which were intended primarily to help
large multi-processor machines.
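Editor's aside: as a rough illustration of the "Savepoints" feature
mentioned above, here is a minimal Python sketch of how a savepoint
lets part of a transaction be undone while the rest is kept. It is
a stand-in under stated assumptions, not PostgreSQL code: it uses
Python's built-in sqlite3 module (SQLite also understands SAVEPOINT
and ROLLBACK TO SAVEPOINT) so that it runs without a database
server, and the table name orders, the savepoint name risky_step,
and the function name demo_savepoint are all made up for the
example.

```python
import sqlite3


def demo_savepoint():
    """Insert two rows in one transaction, undo only the second one."""
    # isolation_level=None disables the module's implicit transactions,
    # so the BEGIN / SAVEPOINT / COMMIT statements below are in control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    cur = conn.cursor()
    cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

    cur.execute("BEGIN")
    cur.execute("INSERT INTO orders (item) VALUES ('keep me')")

    # Mark a point inside the open transaction that we can return to.
    cur.execute("SAVEPOINT risky_step")
    cur.execute("INSERT INTO orders (item) VALUES ('undo me')")

    # Undo everything done since the savepoint; the outer transaction
    # (and the first insert) is still open and intact.
    cur.execute("ROLLBACK TO SAVEPOINT risky_step")
    cur.execute("COMMIT")

    items = [row[0] for row in cur.execute("SELECT item FROM orders ORDER BY id")]
    conn.close()
    return items


if __name__ == "__main__":
    print(demo_savepoint())
```

Calling demo_savepoint() returns only the first row, because the
second insert is rolled back to the savepoint before the commit.
Against PostgreSQL 8.0 the same SAVEPOINT statements should work
through whatever client driver is in use.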
MP: We were talking earlier about this version representing a
"third hump" for PostgreSQL. How does this version represent a
major milestone for PostgreSQL?
Josh: To mention the three humps, I've been involved with a number
of open source projects, and as I see it, there are three humps to
get over before you are a "big project." The number one is when
you go from being one developer to being multiple developers.
Number two is when you pick up momentum among open source users,
when people you never met before start jumping on mailing lists
and saying, "Hey, I've used the software, is there anything I can
do to help you out?" That gets you a certain distance, and then
you grow even further in that direction to reach the third
hump, where rather than just individuals, companies start saying
"Hey, you've got a cool project, we use the software, we want to
contribute, we want to be publicly involved with your project,
it's good for our PR, too."
That's sort of the third hump, and we [PostgreSQL] have seen that
happen in the last year. Over the last year we have seen not just
SRA, PostgreSQL Inc., and Command Prompt, Credativ and TDMSoft and
a few other companies who have supported us for years, but we've
also picked up Pervasive, a proprietary database vendor that's
going open source; and Fujitsu, the $43 billion Japanese
mega-corporation, and a few others who haven't been public.
MP: Let's talk a bit about Fujitsu. They made a rather stunning
statement of commitment. Can you tell us a bit about it?
Josh: Basically, the head of their open source business
applications division, Mr. Takayuki Nakazawa, said, "We are
committed to helping make PostgreSQL the leading database
management system." Of course, we at the PostgreSQL project
really appreciated that vote of confidence from a large,
multi-billion dollar corporation, not just because Fujitsu has
been contributing code and features, which they have, for example,
tablespaces, nested transactions, and others.
In addition, one of the battles that you always face as open
source software growing on the commercial market is legitimacy.
Basically, no business that's adopting a new software,
particularly something as vital as an enterprise data center, wants
to be a maverick. They all want to have the confidence that
someone else is using it, and the bigger the corporation that's
endorsing it, the better. So if you have endorsements from
companies like Fujitsu, Pervasive, in addition to the start-ups,
that really gives IT managers the confidence they need to go to
their bosses and say, "Hey, I found something to use for our
project: PostgreSQL."
MP: What do you think was the big draw for Fujitsu for PostgreSQL
8.0?
Josh: People in the US are not really familiar with it, but in
Japan, Fujitsu is a major software vendor, if not THE leading
software vendor, but Fujitsu does not themselves have a big
database offering. Yet they do have a lot of very substantial
database tools, so incorporating an open source database into
their repertoire, something they can attach their database tools
to, rather than licensing something from another commercial
company, was a natural move for them. When it comes right down to
it, it is a way for them to compete with IBM and Microsoft.
MP: Let's talk about the general open source community's reaction
to PostgreSQL 8.0. How long has 8.0 been out, and what are the
download numbers looking like?
Josh: PostgreSQL came out officially on January 17, but it was in
beta 6 months before that. Fixing all of the bugs in the Windows
platform really took us an inordinate amount of time. But since it
came out, we've had about 200,000 downloads from our primary FTP
and BitTorrent sites. That's not the only place you can get
PostgreSQL. That may represent about half of all of our recent new
users, and of course, we get the majority of our users through
major Linux distributions, for which we don't have numbers and for
which we will be seeing adoption for the next two years.
So, overall, there has been a huge reaction, and one of the things
that I am encouraged by is that the Windows port has resulted in
over 100,000 downloads and new users, potentially, people who
weren't able to use PostgreSQL before because they didn't have
access to experts on Linux or FreeBSD or Solaris or other
Unix-like operating systems. And that's going to continue to help
grow our community.
MP: What's this launch going to mean for PostgreSQL's star in the
constellation of the overall open source "sky"?
Josh: I don't think that anyone questions that we're somewhere in
the realm of the "big-name" projects. Certainly both the press
and the companies that are affiliating with the PostgreSQL project
seem to think so. In terms of ratings, your class A projects are
pretty much Linux and Apache. (laughing) And then you have your
second tier, which include a whole host of applications. And we're
either in the second tier or the third tier. Certainly, we have
users into the six figures; we have a couple hundred code
contributors; and at this point, any time I talk to major
corporations about databases, it turns out that somewhere in some
department, if not corporation wide, they are already using
PostgreSQL.
MP: What are the major draws that will attract developers to the
project?
Josh: A lot of the features are stuff that we've had for a long
time. For example, PostgreSQL is a completely community owned
project. We're not corporate, we don't even have a foundation that
governs the code. We have a foundation for fundraising, but it
does not govern code contributions. So that openness is not only
part of our design, but part of our culture. The idea being that
PostgreSQL is there for you to hack. If you need something that
most users don't need, or don't use, but it's special for your
project or your business goal, you can go ahead and hack it. That,
combined with the business license, lets you go ahead and hack
it, and then commercialize it. You can even release under a
commercial license. It's there to be completely, 100% free, and
that's the main thing that attracts people to PostgreSQL.
And of course, there are other features that complement the
license. For example, a clear coding style, and a very accessible
code base make it easy to hack. We have a pluggable architecture
that makes it easy to write your own extensions to the database.
Part of our general architecture design is meant to attract
developers from all [programming] languages: unlike any other
database system that I know of, PostgreSQL supports 11 or 12
different languages for writing stored procedures. So whatever
your chosen programming language is, Perl, PHP, Java, whatever,
you can probably write stored procedures in it. That opens up the
world of database programming to developers who otherwise might
not approach it.
For current users, we have added stuff to keep them on board. For
example, Christopher Kings-Lynne completely overhauled the backup
and restore stuff to eliminate a lot of the annoying issues with
that. The PL/Perl server-side scripting language that allows you
to write Perl scripts inside the database has been vastly
enhanced, which should be a big attraction for Perl users.
Other features in this release intended for existing users and
DBAs include vastly expanded logging options and filling in a few
holes in our SQL support for database design in terms of managing
database permissions and database object characteristics.
MP: Where is PostgreSQL heading over the next 12 to 24 months?
Josh: We have been releasing a new version about once per year for
the last several years, and I don't see any reason for that
pattern to change. It's a good compromise between how often the
developers would like to release, which is about once every six
months, and how often our users would like us to release, which is
actually more like once every 18 months.
Because we're a completely community-organized project, if you
want to develop something, and have the resources to develop it,
you just jump on the hackers' list, and say "hey, this is
something that I want to develop, and this is how I want to do
it." People will criticize your ideas, and suggest changes to fit
into PostgreSQL in general. And then you do it! You don't have to
fit some pre-determined marketing goal.
On the other hand, this also means that version 8.1 doesn't have a
title. There's no specific goal for the release. An example of
what people are working on right now is SQL standard compliant
stored procedures. We have procedures now, but they're not
compliant with the standard syntax. We are also working on the
bit-mapped indexes, which is a big feature for our Oracle users
out there; also, vastly improved performance on other
forms of advanced indexing; two-phase commit, which is another big
thing for distributed application users; migrating auto-vacuum
(a maintenance administrator) into the back end, so that it's no
longer a separate process.
One other question that I would like to answer is replication,
because I get this question all the time: unlike some other
database systems, within PostgreSQL replication is an add-in. It's
a separate application. That isn't an accident. It's done on
purpose.
There are several reasons for that. One is that replication is
actually not a single feature. It is a set of four or five
different related implementations, which serve four or five
different needs. As a result, we don't want to bundle one
particular kind of replication with the main database, because
that's not suitable to all users. Our leading replication project,
in terms of popularity, is something called Slony-I, led by Jan
Wieck, who is also on the Core Team. That has actually been quite
popular as one of the leading master-slave high availability
replication systems of any kind. Jan is currently working on
Slony-II, which will be synchronous multi-master replication for
database server clusters. Based on the pace of his past work, I
would anticipate that it would be available in about a year or so.
But don't look for that information in the main release notes for
PostgreSQL, because it will always be a separate parallel
project.
MP: You were the marketing lead for OpenOffice.org, which is a
huge cross-platform project. Now PostgreSQL is a cross-platform
project, too. What did you learn from the OpenOffice.org project
that will be applicable to the PostgreSQL project?
Josh: I think that the thing that lots of open source projects have
learned, and that those that haven't should learn, is the simple
fact that millions of people use Windows, and millions of people
use only Windows. If you don't have a port to that platform, you
have denied them access to your project.
That's a tough thing for lots of us. I personally do not have any
Windows machines. This is an all-Linux office. But that doesn't
mean I don't recognize that not only do individuals only have a
Windows machine to use, but there are companies that have
standardized, say, on a Windows 2000 server environment, and don't
have a heterogeneous environment available. And there are a lot
of developers who do their work on a notebook running Windows XP,
although their final applications will run on Linux or a Solaris
server. So, as a result, if you really want your open source
project to grow and take off, and reach millions of people, if
it's appropriate for what your project does, then you need to have
a Windows port of your project so that those people can download
it and try it out on the hardware and the operating system that
they already have, regardless of what they may use it on later.
MP: Is there anything else you would like to add?
Josh: The main thing is that PostgreSQL is a community project. We
always welcome new people. If you're interested in PostgreSQL,
you downloaded it, you tried it, you liked it except for one
thing, then jump on a mailing list and talk to people about it,
because whatever that one thing is, fixing it might be closer than
you think.