please comment on technologies

luc wastiaux

Apr 26, 2003, 4:06:19 PM
Hello, I am writing the specifications for a school project (the subject
is free for us to choose). I was thinking about building a news-server
archive database just like Google Groups, but on a smaller scale (it would
only archive a limited number of newsgroups, not all of Usenet). I got this
idea because my school runs a news server, and a lot of valuable
information is lost when old messages are erased from the server's spool.

The project would be developed by a group of 4 to 6 CS students
(including me) over a three-month timespan (but not full time, since we
have to go to classes and other stuff).

I have a little experience with web stuff, mostly with PHP and some
Perl, and I'm learning Python right now.

I'm (almost) sure we will use Python as the main language, and
either MySQL or PostgreSQL for the database (I'm familiar with MySQL, but
maybe it's too limited for what I want to do?). I just found out about
Spyce today and it looks like I could use it for the web interface.

Then I have this idea about using XSL for templating. The idea is not only
to separate the data from the representation, but also to allow programmers
to "leech off" the search engine (which is easier if the output is
XML).

Please comment on everything, including the technological choices. If
you want to suggest any features that we should add to the project, please
don't hesitate to tell me about them.

thanks a lot!

--
luc wastiaux
$ finger l...@info.4002.org

Alex Martelli

Apr 26, 2003, 4:59:19 PM
luc wastiaux wrote:

> Hello, I am writing the specifications for a school project (the subject
> is free for us to choose), I was thinking about doing a news-server
> archive database just like google groups, but on a smaller scale (it would
> only archive a limited number of newsgroup, not all of USENET). I got this
> idea because my school uses a news server and a lot of valuable
> information is lost when old messages are erased from the server's spool.

Sounds like a cool and feasible idea to me.


> The project would be developed by a group of 4 to 6 CS students
> (including me), over a three months timespan (but not full time, we have
> to go to classes and other stuff)

The most valuable thing you'll learn will be how to structure your
work. I heartily recommend incremental development (most crucial
features first), test-driven design, pair programming, and a healthy
diffidence towards "big design up front" - you may get the temptation
of designing in some big chunks of infrastructure preparing for future
enhancements... that may never come.


> I have a small experience with web stuff, but mostly with PHP, and some
> perl. and I'm learning python right now.
>
> I'm (almost) sure we will make use of python as the main language, and use

Sounds like an excellent idea to me.

> either Mysql or postgresql for the database (I'm familiar with mysql but
> maybe it's too limited for what I want to do?). I just found out about

MySQL has limitations, but it's reasonably fast and simple for use
within its limitations. You might want to program most of your app
to a "database engine independent layer" -- so you could start using
MySQL but still be able to easily switch to e.g. PostgreSQL if you
do turn out to need something that's a problem in MySQL, e.g. views.
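
A minimal sketch of what such an engine-independent layer might look like,
assuming Python's DB-API (PEP 249). The class name `ArticleStore`, the table
layout, and the use of the standard-library `sqlite3` module as a stand-in
engine are illustrative assumptions, not anything specified in this thread.
A MySQL or PostgreSQL driver would be passed in the same way, though its
placeholder style (usually `%s` rather than `?`) may differ.

```python
import sqlite3


class ArticleStore:
    """Hypothetical storage layer: the rest of the app calls these
    methods and never writes SQL itself."""

    def __init__(self, connection, placeholder="?"):
        # Any DB-API 2.0 connection should work; the parameter placeholder
        # differs per driver ("?" for sqlite3, "%s" for many others).
        self.conn = connection
        self.ph = placeholder

    def create_schema(self):
        cur = self.conn.cursor()
        cur.execute(
            "CREATE TABLE IF NOT EXISTS articles ("
            "message_id VARCHAR(250) PRIMARY KEY, "
            "newsgroup VARCHAR(100), subject VARCHAR(250), body TEXT)"
        )
        self.conn.commit()

    def add_article(self, message_id, newsgroup, subject, body):
        cur = self.conn.cursor()
        sql = "INSERT INTO articles VALUES ({0}, {0}, {0}, {0})".format(self.ph)
        cur.execute(sql, (message_id, newsgroup, subject, body))
        self.conn.commit()

    def subjects_in(self, newsgroup):
        cur = self.conn.cursor()
        sql = ("SELECT subject FROM articles WHERE newsgroup = {0} "
               "ORDER BY subject").format(self.ph)
        cur.execute(sql, (newsgroup,))
        return [row[0] for row in cur.fetchall()]


# Example with an in-memory SQLite database standing in for the real engine.
store = ArticleStore(sqlite3.connect(":memory:"))
store.create_schema()
store.add_article("<1@example.invalid>", "school.general", "Exam dates", "...")
store.add_article("<2@example.invalid>", "school.general", "Books for sale", "...")
print(store.subjects_in("school.general"))  # ['Books for sale', 'Exam dates']
```

Switching engines then means changing only the connection (and placeholder)
handed to the constructor, provided the SQL stays within what both engines
accept.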

> Spyce today and it looks like I could use it for the web interface.

Sure, that's one possibility. Personally I'd choose Webware, which
offers an amount of flexibility I appreciate, but Spyce's OK too.

> Then I have this idea about using XSL for templating, the idea is not only
> to separate the data from the representation, but also allow programmers
> to "leech off" the search engine (it's easier to do this if the output is
> XML).

The ability for other programmers to send "queries" to your search
engine and get XML output is good. However, personally, I would avoid
XSL (unless you guys are already well familiar with it -- or NEED to
get very familiar with it anyway to pursue your studies). It's simpler
to have separate URLs, or a form field requesting XML or HTML output,
and just use different templates with the same contents -- doing your
templating in Cheetah (or preppy/Spyce/whatever).
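
The "different templates, same contents" idea can be sketched roughly as
follows. This uses the standard library's `string.Template` purely as a
stand-in for Cheetah or another templating package; the function name
`render`, the `fmt` argument (which would come from the URL or a form
field), and the result fields are all hypothetical.

```python
from html import escape
from string import Template

# Two templates over the same data: one for browsers, one for programs.
HTML_ROW = Template('<li><a href="$url">$subject</a> ($author)</li>')
XML_ROW = Template('<article url="$url"><subject>$subject</subject>'
                   '<author>$author</author></article>')


def render(results, fmt="html"):
    """Render the same search results as an HTML list or an XML document."""
    row = XML_ROW if fmt == "xml" else HTML_ROW
    # Escape every value once; &, < and > and quotes are encoded in a way
    # that is valid in both HTML and XML.
    rows = [row.substitute({key: escape(value) for key, value in r.items()})
            for r in results]
    body = "".join(rows)
    if fmt == "xml":
        return "<results>" + body + "</results>"
    return "<ul>" + body + "</ul>"


results = [{"url": "/msg/1", "subject": "Exams & deadlines", "author": "luc"}]
print(render(results, "html"))
print(render(results, "xml"))
```

The search code produces `results` once; only the last step knows which
output format was requested.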


Alex

luc wastiaux

Apr 26, 2003, 5:39:45 PM
In article <HqCqa.8931$3M4.2...@news1.tin.it>, Alex Martelli wrote:
>> The project would be developed by a group of 4 to 6 CS students
>> (including me), over a three months timespan (but not full time, we have
>> to go to classes and other stuff)
>
> The most valuable thing you'll learn will be how to structure your
> work. I heartily recommend incremental development (most crucial
> features first), test-driven design, pair programming, and a healthy
> diffidence towards "big design up front" - you may get the temptation
> of designing in some big chunks of infrastructure preparing for future
> enhancements... that may never come.
>

In short, you recommend following Extreme Programming guidelines. Did I
understand your advice correctly?


>> Spyce today and it looks like I could use it for the web interface.
>
> Sure, that's one possibility. Personally I'd choose Webware, which
> offers an amount of flexibility I appreciate, but Spyce's OK too.

I will have a look at Webware. I was looking forward to using Spyce because
it features <?php-style tags to embed code into HTML templates. Does
Webware have that?
I only know Webware from this URL:
http://colorstudy.com/docs/shootout.html

>
>> Then I have this idea about using XSL for templating, the idea is not only
>> to separate the data from the representation, but also allow programmers
>> to "leech off" the search engine (it's easier to do this if the output is
>> XML).
>
> The ability for other programmers to send "queries" to your search
> engine and get XML output is good. However, personally, I would avoid
> XSL (unless you guys are already well familiar with it -- or NEED to
> get very familiar with it anyway to pursue your studies). It's simpler
> to have separate URLs, or a form field requesting XML or HTML output,
> and just use different templates with the same contents -- doing your
> templating in Cheetah (or preppy/Spyce/whatever).

Ok, I thought XSL was the definitive answer to all templating needs, but
you don't seem very excited about it ?

thanks a lot for all your comments and advice!

andrew cooke

Apr 26, 2003, 6:52:14 PM

luc wastiaux said:
> In article <HqCqa.8931$3M4.2...@news1.tin.it>, Alex Martelli wrote:
>>> The project would be developed by a group of 4 to 6 CS students
>>> (including me), over a three months timespan (but not full time, we
>>> have
>>> to go to classes and other stuff)
>>
>> The most valuable thing you'll learn will be how to structure your
>> work. I heartily recommend incremental development (most crucial
>> features first), test-driven design, pair programming, and a healthy
>> diffidence towards "big design up front" - you may get the temptation
>> of designing in some big chunks of infrastructure preparing for future
>> enhancements... that may never come.
>
> In short, you recommend following extreme programming guidelines, did I
> understand your advice correctly ?

[i'm adding unsolicited comments here!]

there's no one right answer in my opinion - everything depends on
circumstance. if you're a tight, cohesive group that know each other and
have lots of experience then incremental development works (can't comment
on pair programming). a lot of planning might be better if you're new
together, haven't had much experience, or have different opinions on
what's happening(!). most likely you'll find your own compromise -
planning at a high level, followed by implementing the main structures,
followed by iterations of planning and implementation at lower levels,
with some "jumping outside the box" to ensure that you're not duplicating
work as you work down the tree of tasks. alternatively, you can try going
bottom up and try writing "one to throw away" in the first third of your
time available. might work with python, could be risky...

the one thing that does seem to be (almost!) always worth doing is writing
plenty of tests, when it's easy to do so (and consider modifying your
design to make testing easier).

>>> Spyce today and it looks like I could use it for the web interface.
>>
>> Sure, that's one possibility. Personally I'd choose Webware, which
>> offers an amount of flexibility I appreciate, but Spyce's OK too.
>
> I will have a look at webware. I was looking forward to using spyce because
> it features <?php style tags to embed code into HTML templates, does
> webware have that ?
> I just know webware from this url:
> http://colorstudy.com/docs/shootout.html

[...]


>>> Then I have this idea about using XSL for templating, the idea is not
>>> only
>>> to separate the data from the representation, but also allow
>>> programmers
>>> to "leech off" the search engine (it's easier to do this if the output
>>> is
>>> XML).
>>
>> The ability for other programmers to send "queries" to your search
>> engine and get XML output is good. However, personally, I would avoid
>> XSL (unless you guys are already well familiar with it -- or NEED to
>> get very familiar with it anyway to pursue your studies). It's simpler
>> to have separate URLs, or a form field requesting XML or HTML output,
>> and just use different templates with the same contents -- doing your
>> templating in Cheetah (or preppy/Spyce/whatever).
>
> Ok, I thought XSL was the definitive answer to all templating needs, but
> you don't seem very excited about it ?

in my personal experience, xsl is a very neat solution that's very hard to
sell in practice. if you're a bunch of smart programmers that are capable
with functional programming then i'd say go with it - you'll save time.
but it's a difficult job getting people used to more traditional
approaches to take it seriously for long enough to get past the learning
curve.

that's from working with java - i don't know what python's xsl support is
like (when i last looked i was less than impressed with python compared to
java in this regard, but i suspect it's improved since then. you should
check this out as there's nothing worse than having to work round bugs in
external libraries that are critical to the project).

have fun,
andrew

--
http://www.acooke.org/andrew

Aahz

Apr 26, 2003, 9:15:47 PM
In article <slrnbalpl...@trillian.dont-panic.info>,

luc wastiaux <l...@nospam.com> wrote:
>
>Hello, I am writing the specifications for a school project (the
>subject is free for us to choose), I was thinking about doing a
>news-server archive database just like google groups, but on a smaller
>scale (it would only archive a limited number of newsgroup, not all of
>USENET). I got this idea because my school uses a news server and a lot
>of valuable information is lost when old messages are erased from the
>server's spool.

One problem you *do* want to think about up-front is how you're going to
query the database. Full-text searching can be a daunting task.
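
To give a feel for the problem, here is a toy inverted index, a common
starting point for full-text search. It is only a sketch: the class and
method names are made up for illustration, everything lives in memory, and
it ignores ranking, stemming, phrase queries and on-disk storage, which is
where most of the difficulty lies.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index: maps each lowercased word to the set of
    message ids whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, message_id, text):
        # Split on anything that is not a letter or digit.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[word].add(message_id)

    def search(self, query):
        """Return ids of messages containing ALL words in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


index = InvertedIndex()
index.add(1, "Python templating with Cheetah")
index.add(2, "XSL templating questions")
index.add(3, "Python and MySQL")
print(sorted(index.search("python templating")))  # [1]
```
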
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"In many ways, it's a dull language, borrowing solid old concepts from
many other languages & styles: boring syntax, unsurprising semantics,
few automatic coercions, etc etc. But that's one of the things I like
about it." --Tim Peters on Python, 16 Sep 93

Alex Martelli

Apr 27, 2003, 4:27:09 AM
luc wastiaux wrote:

> In article <HqCqa.8931$3M4.2...@news1.tin.it>, Alex Martelli wrote:
>>> The project would be developed by a group of 4 to 6 CS students
>>> (including me), over a three months timespan (but not full time, we have
>>> to go to classes and other stuff)
>>
>> The most valuable thing you'll learn will be how to structure your
>> work. I heartily recommend incremental development (most crucial
>> features first), test-driven design, pair programming, and a healthy
>> diffidence towards "big design up front" - you may get the temptation
>> of designing in some big chunks of infrastructure preparing for future
>> enhancements... that may never come.
>
> In short, you recommend following extreme programming guidelines, did I
> understand your advice correctly ?

Not necessarily ALL of them. "Customer on site and part of the team" may
be particularly hard here for example -- if your "customer" is a professor
he may not WANT to be part of your team, for understandable reasons. And
while the "40-hour week" is a wonderful rule, it's hard to stick to when one
is a student in some engineering discipline -- what with the courseload
and all, I remember my time as a student as one of EIGHTY-hour weeks,
month in month out (looking back a quarter century, I can't believe I ALSO
managed to have something of a social life -- guess I must not have slept
for a few years on end;-).

But I'm definitely recommending some variation on *Agile* development
methodologies, and particularly recommending *AGAINST* the "waterfall"
model of development that you may have been taught and, in my opinion
and experience, is a disaster. In "waterfall", you're supposed to do
all analysis first; then, all design; then, all the coding; and then
all the testing. It just plain don't work. It's better to plan for
incremental, iterative development from the start. Do SOME analysis,
identify an absolutely minimal set of features you MUST have at all
costs, i.e., in your case, a set such that you'll fail your course if
you don't have AT LEAST those features. Put all the "oh yeah and it
would be nice to have THIS too" on one side for now -- index cards do
work quite nicely for that (XP uses them, but so do many other approaches,
particularly agile ones).

Having identified an absolutely-minimal, will-die-without-these set of
features, then do SOME design -- as little as you can get away with:
i.e., design the minimal infrastructure you MUST have to implement the
features in the minimal set. This is where some (at least preliminary)
technology choices MUST be made, too; for the purpose you'll probably
need "spikes", where two of you go off on the side for a bit (could be
hours or a couple of days) and try out some technological possibility
to ascertain it's well understood and stable and rich enough for you.

Avoid "big design up front" -- don't make the infrastructure rich and
powerful to support "future features" that may never come to be -- that's
an EASY trap to fall into. What you DON'T want is to have the deadline
come and by that time have an analysis that supports 4 times as many
features as you could sensibly implement in the given time, a halfway-
done infrastructure that might support all of those 4-times-as-many
features plus as many again if they were ever dreamed up, and about 1/10
the working application code and about 1/100 the unit and acceptance
tests you should have. I'm being light-hearted, but this IS what happens
in some software projects, particularly but not exclusively ones where the
participants lack actual in-the-trenches experience.

Divide the work up into SMALL increments, and make sure every one of
you is reasonably familiar with all the subsystems as well as the
overall architecture. Pair programming, with ever-shifting pairs,
helps here. One thing I've found helps things move forwards, even
though XP doesn't support it, is to have each of you take "primary
architectural responsibility" for some of the subsystems and modules
you have identified -- and don't forget to nominate one of you as
"overall architect". This means that, while consensus is sought, when
consensus cannot be reached the person who wears the architect hat
for the specific subsystem has to make the decision and get the whole
project moving again -- decisions can later be reversed, at the cost
of some rework, but I've found that trying to do EVERYthing by consensus
can slow things down excessively (a variant of "analysis paralysis").

*DON'T SKIMP ON TESTS*. Once you THINK you have a design for some
subsystem, feature, module, class, function -- *WRITE TESTS FOR IT*
before you start coding it. The tests will fail at first -- of course
they must, as the subsystem/etc ain't there yet. But they DRIVE your
coding, acting as "executable docs and specifications" to some extent.

They're better than discursive docs/specs because they're executable --
thus necessarily unambiguous (not necessarily COMPLETE -- so, adding
SOME discursive material in terms of comments and docstrings is good
too -- but don't be surprised if NON-executable stuff fails to get
maintained and updated as your understanding grows... sadly, it almost
inevitably happens all of the time).
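
As a small illustration of tests acting as an executable specification,
here is a sketch using the standard `unittest` and `email` modules. The
function `summarize` and its expected behaviour are hypothetical examples
for this project, not something defined in the thread. In a test-first
workflow the test class would be written before `summarize` exists; the two
are shown together here only so the snippet runs as-is.

```python
import unittest
from email import message_from_string


def summarize(raw_article):
    """Return (message_id, subject) for a raw news article."""
    msg = message_from_string(raw_article)
    return msg["Message-ID"], msg["Subject"]


class SummarizeTest(unittest.TestCase):
    RAW = (
        "From: someone@example.invalid\n"
        "Newsgroups: school.general\n"
        "Subject: Exam dates\n"
        "Message-ID: <1@example.invalid>\n"
        "\n"
        "The exam is on Monday.\n"
    )

    def test_extracts_message_id_and_subject(self):
        self.assertEqual(summarize(self.RAW),
                         ("<1@example.invalid>", "Exam dates"))

    def test_missing_subject_is_none(self):
        # The spec, stated as a test: a missing header yields None.
        raw = "Message-ID: <2@example.invalid>\n\nbody\n"
        self.assertEqual(summarize(raw), ("<2@example.invalid>", None))


# Run the tests explicitly so the snippet also works when pasted into
# an interactive session.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SummarizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```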

So -- I'm not specifically recommending XP for your needs (in some
areas, such as the advice to have "chief architects", my advice runs
counter to XP's) -- but I _AM_ recommending an Agile and deliberately
DESIGNED process for your work, AND avoidance of "waterfall".


>>> Spyce today and it looks like I could use it for the web interface.
>>
>> Sure, that's one possibility. Personally I'd choose Webware, which
>> offers an amount of flexibility I appreciate, but Spyce's OK too.
>
> I will have a look at webware. I was looking forward to using spyce because
> it features <?php style tags to embed code into HTML templates, does
> webware have that ?

Webware does let you embed Python in HTML -- even though more often
than not this is an inferior approach, Webware's designers are well
aware of the unhealthy fascination the approach holds for people
coming from PHP or ASP, and therefore support it. But that's just
part of Webware's flexibility. Servlets and templating are the approaches
you should REALLY be using, IMHO. Anyway, Webware supports them
all (AND lets you connect your code to the webserver with commendable
flexibility too -- so you can use one model when you're developing
and unit-testing and another for more extensive tests, for example,
without changing your code, just the "admin" procedures that you
use for starting it up).


>>> Then I have this idea about using XSL for templating, the idea is not
>>> only to separate the data from the representation, but also allow
>>> programmers to "leech off" the search engine (it's easier to do this if
>>> the output is XML).
>>
>> The ability for other programmers to send "queries" to your search
>> engine and get XML output is good. However, personally, I would avoid
>> XSL (unless you guys are already well familiar with it -- or NEED to
>> get very familiar with it anyway to pursue your studies). It's simpler
>> to have separate URLs, or a form field requesting XML or HTML output,
>> and just use different templates with the same contents -- doing your
>> templating in Cheetah (or preppy/Spyce/whatever).
>
> Ok, I thought XSL was the definitive answer to all templating needs, but
> you don't seem very excited about it ?

I guess I shouldn't put my name to this -- it's not a "respectable"
thing to say, and might inveigle me in flamewars I'd rather avoid --
but I think XSL is an approach that makes things harder for you, and
is therefore less pragmatically useful, than ad-hoc templating based
on mostly-procedural approaches. It's worth learning it (particularly
if you have no other experience with non-procedural languages!) because
it expands your mind -- but I personally wouldn't want to have "mastery
of XSL" as a do-or-die prereq for any crucial project I'm responsible
for. Perhaps I'm just not looking at it the right way.

> thanks a lot for all your comments and advice!

You're welcome!


Alex

andrew cooke

Apr 27, 2003, 8:51:20 AM

Alex Martelli said:
>> Ok, I thought XSL was the definitive answer to all templating needs, but
>> you don't seem very excited about it ?
>
> I guess I shouldn't put my name to this -- it's not a "respectable"
> thing to say, and might inveigle me in flamewars I'd rather avoid --
[...]

not sure if this is an oblique reference to my comments about it being
difficult to sell XSL in another post, but i wasn't looking for a
flamewar, just relating direct experience. i used XSL in my current
project and cannot get the client to use it. instead they are writing
specific chunks of VB code (with hand rolled XML parsers!) to rearrange
data. they were the people i was thinking of, not you!

if you've never used XSL then another reason to avoid it is that it can be
fiendishly tricky to solve some problems (even if it is turing complete).
it's not a general purpose language by any means. but it does rearrange
xml well (and i did manage to write something that verified the checksum
of chilean ID numbers once ;-)

andrew

--
http://www.acooke.org/andrew

Alan Kennedy

Apr 27, 2003, 9:50:51 AM
luc wastiaux wrote:

> Ok, I thought XSL was the definitive answer to all templating needs, but
> you don't seem very excited about it ?

Luc,

I like the sound of your project. I'm sure it's going to be fun and a
good learning experience.

As for XSLT, I fell in love with it when it first appeared in 1998. It
seemed like the perfect solution for my web-site creation needs (which
were relatively simple back then).

I liked it so much because it allowed me, as I thought, to separate
content from format in a clean fashion. And although the syntax can be a
little tiresome sometimes, the functional programming aspect of XSLT can
be picked up. The most important thing to understand about writing XSLT
code is the Xpath data model.

However, over the years I've drifted away from XSLT, for the following
reasons.

1. Memory hog. Although the XSLT transformation sheets themselves may
not consume too much memory, the documents that they are transforming to
and from can consume *enormous* amounts of memory. The output side can
be dealt with by not building an output tree, but instead outputting to
SAX2 events and then into a character buffer. But to do that with many
existing XSLT processors, you're looking at some coding work. And there
is NO WAY to avoid constructing the huge input tree: it MUST be
constructed so that you can run Xpath queries on it.

2. CPU hog. You really have to be very careful what you're doing with
XSLT. When you use an Xpath expression to select nodes in a tree, the
order of (in)efficiency of those Xpath expressions can be much higher
than you think. This is generally not a problem for the people who are
thoroughly familiar with Xpath. But it can be a huge problem for
newbies. Especially when you're using XSLT in a web application and
you're trying to figure out why your HTML pages take 20 seconds to
render. (I'm facing this problem right now. I'm load-testing a web app
that uses JSP and XSLT to generate a series of 17-20 HTML pages to
interact with the user. It all works really well, until you get to the
particular stage when it actually uses XSLT to render a group of HTML
pages. For a single user, it takes 20 seconds, and for 20 simultaneous
users, it takes nearly 5 minutes each user! Completely unacceptable. I
reckon the reason for this is because the development team were told to
use XSLT, without having the relevant training or experience to use it
*properly*).

3. Poor design integration. Now that I work on web sites that have to be
*designed*, i.e. using GUI design tools, it is essential that whatever
template language I use be able to integrate with the likes of
DreamWeaver or FrontPage. With XSLT, this can be quite difficult to do.
One possible solution is to auto-generate XSLT templates from the
GUI-created HTML page, using a combination of "tidy"ing the page into
XHTML and using a private (XML) namespace of attributes to tell the
template generator which bits of the GUI-edited HTML it should be
putting into which XSLT templates. And there's still often plenty of
hand-editing to be done after this process is complete.

If you want to use the best and most pythonic markup templating language
in existence (IMHO ;-) look no further than Template Attribute
Language, aka TAL. This was created by the BDFL (can't get more
pythonic than that), and solves all of the problems mentioned above. TAL
used to be a Zope-only thing, but now there are multiple standalone
implementations available. A list of them can be found in this Usenet
posting.

http://tinyurl.com/af8i

As far as I'm concerned, TAL is *the* best templating language, for the
following reasons:

1. Efficient: in terms of memory and cpu.
2. Simple: easiest thing in the world for people to understand.
3. Design-integration: As described above.

Just my 0,02 euro.

Best of luck with your project.

regards,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan

Evan Simpson

Apr 28, 2003, 11:45:00 AM
Alan Kennedy wrote:
> If you want to use the best and most pythonic markup templating language
> in existence (IMHO ;-) look no further than Template Attribute
> Language, aka TAL. This was created by the BDFL (can't get more
> pythonic than that), and solves all of the problems mentioned above. TAL
> used to be a ZOPE-only thing, but now there are multiple standalone
> implementations available. A list of them can be found in this Usenet
> posting.

As Official Cheerleader for ZPT (and by extension, TAL) I must point out
that it was actually created by a team at Zope Corp. that *included*
Guido, and is based on ideas from both Hiperlógica's HiperDOM and
Enhydra's XMLC. A maintained list of implementations (plus pointers to
documentation) can be found at:

http://dev.zope.org/Wikis/DevSite/Projects/ZPT/FrontPage

Cheers,

Evan @ 4-am

Cameron Laird

Apr 28, 2003, 4:02:04 PM
In article <slrnbalpl...@trillian.dont-panic.info>,
luc wastiaux <l...@nospam.com> wrote:
>Hello, I am writing the specifications for a school project (the subject
>is free for us to choose), I was thinking about doing a news-server
>archive database just like google groups, but on a smaller scale (it would
>only archive a limited number of newsgroup, not all of USENET). I got this
>idea because my school uses a news server and a lot of valuable
>information is lost when old messages are erased from the server's spool.
>
>The project would be developed by a group of 4 to 6 CS students
>(including me), over a three months timespan (but not full time, we have
>to go to classes and other stuff)
.
.

.
>I'm (almost) sure we will make use of python as the main language, and use
>either Mysql or postgresql for the database (I'm familiar with mysql but
>maybe it's too limited for what I want to do?). I just found out about
.
.
.
I'll say a few things I've read from no one else.

If a commercial client told me that he depended on Usenet, and
it was working well, but "a lot of valuable information is lost
...", my first recommendation would be to tune his Usenet
service. That's a way to re-use a lot of existing infrastructure.
You can set up a special-purpose NetNews server, with
its own local datastore of just the groups you want, and
configure expiration times to "never". That's the most
conservative approach that comes to my mind, and, in
return-on-investment calculations, I'm a conservative.

Also, while it's fine with me that you practice your MySQL or
PostgreSQL skills, I don't get the point. You're just storing
messages, right? It's a write-once-read-many-delete-never
model? You can do worse than just dumping stuff into the file
system. Again, look at the existing NetNews server
implementations.
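
A rough sketch of the "just dump it into the file system" approach, for
illustration only. The class name `FileArchive` and the directory layout
(one file per article, named after a SHA-1 hash of its Message-ID) are
assumptions made for this example; real NetNews servers use their own spool
formats, which this does not try to imitate.

```python
import hashlib
import tempfile
from pathlib import Path


class FileArchive:
    """Write-once store: one file per article, named after a hash of
    its Message-ID so any id maps to a safe, fixed-length filename."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, message_id):
        digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
        # Two-level fan-out keeps any single directory from growing huge.
        return self.root / digest[:2] / digest[2:]

    def put(self, message_id, raw_article):
        path = self._path(message_id)
        if path.exists():
            return False  # never overwrite: write once, delete never
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(raw_article)
        return True

    def get(self, message_id):
        return self._path(message_id).read_bytes()


# Example run against a throwaway temporary directory.
archive = FileArchive(tempfile.mkdtemp())
print(archive.put("<1@example.invalid>", b"Subject: Exam dates\n\nMonday.\n"))  # True
print(archive.put("<1@example.invalid>", b"duplicate"))  # False
print(archive.get("<1@example.invalid>"))
```

Lookup by Message-ID needs no database at all here; anything beyond that
(search by subject, author, or body text) is the query problem.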

These simplifications free attention to be focused on the
query model, which is likely to be a more interesting problem
than you realize.

For historical interest, I'll note that
<URL: http://phaseit.net/claird/news.lists/newsgroup_archives.html >
indexes a few thousand (well, it used to; maybe a couple hundred are
still live) special-purpose Usenet archives. Perhaps it'll
amuse you to see how others have worked in this area before
you.
--

Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://phaseit.net/claird/home.html
