First steps...


Elias Bizannes

Aug 7, 2008, 12:11:49 AM
to Silicon Beach Australia - distributed database initiative
Sorry everyone, I've been stupidly busy with work and at a client site
with shoddy internet, so haven't been able to contribute to
discussions as much as I want.

But I would like to gauge what the next steps are. Are we happy with
the concept? What challenges do we need to address? There is already a
diverse group on this list, which is good because we can allocate
tasks according to different strengths.

So my question is....what do we need to do? A list of tasks?

Elias Bizannes

Aug 7, 2008, 12:19:08 AM
to Silicon Beach Australia - distributed database initiative
The data
- How does a user do it? Is it a plugin, a generator, a partner site?
- What data does the user need to contribute? Myles in the other
thread said Geo coordinates - good idea. What else?

The aggregation
- How will the user's data store communicate with the central server?
- How do we store the data? Do we cache it, and spider once a week?

The product
- What can people do when using this data? Are people generating
reports? What information would be of value to the community?

My walkthrough of the process
- A user somehow stores this information on a site they designate.
Blog or otherwise
- Every time they update this information, it sends a notification to
the SBA.org
- SBA.org caches the information in a database
- a user interface can interact with the database to query it for
whatever information people require
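The walkthrough above could be sketched as a tiny pipeline. Everything here (the class and method names, the hash-based storage) is purely illustrative, not settled design:

```ruby
# Illustrative sketch of the walkthrough: user sites notify us (or we
# poll them), SBA.org caches the result, and a UI layer queries it.
class Aggregator
  def initialize
    @db = {}   # site URL => most recently fetched profile data
  end

  # Steps 2-3: an update notification triggers a re-fetch and cache.
  def update(url, fetched_data)
    @db[url] = fetched_data
  end

  # Step 4: the user interface queries the cached data.
  def query
    @db.values.select { |profile| yield profile }
  end
end
```

A front-end would then call something like `agg.query { |p| p[:city] == 'Perth' }`.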


Some issues that need discussion
- What data does a user need to store?
- What information do we hope to extract from the datastore?

Wayne Meissner

Aug 7, 2008, 6:29:51 AM
to SiliconBeachAu...@googlegroups.com
Way too many questions in one email, Elias :)

2008/8/7 Elias Bizannes <elias.b...@gmail.com>:


>
> The data
> - How does a user do it? Is it a plugin, a generator, a partner site?

Which is the simplest of these that will work?

> - What data does the user need to contribute? Myles in the other
> thread said Geo coordinates - good idea. What else?

Geo coordinates would be v.useful. See below.

>
> The aggregation
> - How will the user's data store communicate with the central server?

Would RSS be any good here? I know it's polling, but with say less
than 1K users, doing so once a day shouldn't be a problem - and it's
possibly simpler to set up than push is.

> - How do we store the data? Do we cache it, and spider once a week?

With a decentralised service, you really need to store/cache locally
to make sure the data is always available.

>
> The product
> - What can people do when using this data? Are people generating
> reports? What information would be of value to the community?

With the geo coords stuff, we should be able to do stuff like:
1) google maps/earth layers
2) iphone app/service that shows what other SBAers are nearby.
3) ????
4) Profit!


>
> My walkthrough of the process
> - A user somehow stores this information on a site they designate.
> Blog or otherwise
> - Every time they update this information, it sends a notification to
> the SBA.org
> - SBA.org caches the information in a database
> - a user interface can interact with the database to query it for
> whatever information people require

The only extra nicety to the above would be an API that we can use to
query the SBA DB.

Warren Seen

Aug 7, 2008, 6:33:49 PM
to SiliconBeachAu...@googlegroups.com
Hi All.

Some good comments, Wayne. Just adding my bit on how I think it would
work using hResume - bear in mind that I've never actually done this
before; this is just my impression from reading the various specs.

>
> Way too many questions in one email, Elias :)
>
> 2008/8/7 Elias Bizannes <elias.b...@gmail.com>:
>>
>> The data
>> - How does a user do it? Is it a plugin, a generator, a partner site?
>
> Which is the simplest of these that will work?

Leveraging existing hResume sites and plugins for the various blogs
is easiest. But really, anyone with basic HTML chops can put together
a page that complies with the format, or use a generator like
http://hresume.weblogswork.com/hresumecreator/


>
>> - What data does the user need to contribute? Myles in the other
>> thread said Geo coordinates - good idea. What else?
>
> Geo coordinates would be v.useful. See below.

Agreed. hResume includes hCard for your contact details, which in
turn can optionally contain geo coords. It's probably fairly simple
to bodge up City/geo mapping on our side for data sourced from sites
that don't include this. People doing their own page can simply use a
service like addressfix.com to turn their address (or one a
comfortable distance away for the paranoid) into geo format.


>
>>
>> The aggregation
>> - How will the user's data store communicate with the central server?
>
> Would RSS be any good here? I know it's polling, but with say less
> than 1K users, doing so once a day shouldn't be a problem - and it's
> possibly simpler to set up than push is.

No reason we couldn't embed it into an RSS feed, but I wonder if
there's any real benefit to doing so as it would cause some problems
with data pulled from existing pages. The way I would see it working
is folks simply submit the URL to their hResume for spidering, and it
gets polled at regular intervals.

>
>> - How do we store the data? Do we cache it, and spider once a week?
>
> With a decentralised service, you really need to store/cache locally
> to make sure the data is always available.

Agreed. It's probably worth caching both the raw format (i.e. anything
within the hResume container div), and processing it to populate an
indexable db which maps to the format, so we're not having to parse
each entry when someone searches by location, skills, etc., but can
still present the data in its original format.
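The raw-plus-parsed split described here might look something like the sketch below: keep the fetched fragment as-is, and pull the microformat class names out into an indexable record. It uses Ruby's stdlib REXML over a well-formed fragment; the sample values and the helper name are illustrative only.

```ruby
require 'rexml/document'

# A minimal, well-formed hCard fragment of the kind an hResume page
# would contain (class names per the hCard microformat).
raw = <<EOF
<div class="vcard">
  <span class="fn">Jane Example</span>
  <span class="locality">Perth</span>
  <span class="geo">
    <span class="latitude">-31.95</span>
    <span class="longitude">115.86</span>
  </span>
</div>
EOF

# Text of the first element carrying the given microformat class name.
def mf_field(doc, name)
  el = REXML::XPath.first(doc, "//*[@class='#{name}']")
  el && el.text.strip
end

doc = REXML::Document.new(raw)

# +raw+ would be cached verbatim; +record+ goes into the indexable db.
record = {
  :name      => mf_field(doc, 'fn'),
  :locality  => mf_field(doc, 'locality'),
  :latitude  => mf_field(doc, 'latitude').to_f,
  :longitude => mf_field(doc, 'longitude').to_f
}
```

A real spider would run this over whatever HTML each member's page serves, so a more forgiving HTML parser would eventually be needed.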

>
>>
>> The product
>> - What can people do when using this data? Are people generating
>> reports? What information would be of value to the community?
>
> With the geo coords stuff, we should be able to do stuff like:
> 1) google maps/earth layers
> 2) iphone app/service that shows what other SBAers are nearby.
> 3) ????
> 4) Profit!
>

2a) Show you what their house looks like on street view? :-p

>
>>
>> My walkthrough of the process
>> - A user somehow stores this information on a site they designate.
>> Blog or otherwise
>> - Every time they update this information, it sends a notification to
>> the SBA.org
>> - SBA.org caches the information in a database
>> - a user interface can interact with the database to query it for
>> whatever information people require
>
> The only extra nicety to the above would be an API that we can use to
> query the SBA DB.

As mentioned above, we need to poll I think, there's no easy way to
push a notification from existing services.

+1 for an API.

In terms of implementing this, what have we got to work with in terms
of hosting platform, etc? Any constraints or restrictions?

Cheers,

Warren.

Elias Bizannes

Aug 7, 2008, 8:00:14 PM
to SiliconBeachAu...@googlegroups.com
In terms of implementing this, what have we got to work with in terms
of hosting platform, etc? Any constraints or restrictions?
 
The host I use at the moment is site5.com, but
 
 
 





--
Elias Bizannes
http://liako.biz

Elias Bizannes

Aug 7, 2008, 8:00:46 PM
to SiliconBeachAu...@googlegroups.com
On 8/8/08, Elias Bizannes <elias.b...@gmail.com> wrote:
In terms of implementing this, what have we got to work with in terms
of hosting platform, etc? Any constraints or restrictions?
 
The host I use at the moment is site5.com, but I am happy to bootstrap another service if it better serves our needs.
 
 
 

Jason Stirk

Aug 7, 2008, 10:05:40 PM
to SiliconBeachAu...@googlegroups.com
Hi All,

2c below:

2008/8/8 Warren Seen <warre...@gmail.com>

> 2008/8/7 Elias Bizannes <elias.b...@gmail.com>:
>>
>> The data
>> - How does a user do it? Is it a plugin, a generator, a partner site?
>
> Which is the simplest of these that will work?

Leveraging existing hResume sites and plugins for the various blogs
is easiest. But really, anyone with basic HTML chops can put together
a page that complies with the format, or use a generator like
http://hresume.weblogswork.com/hresumecreator/

Agreed. Initially I suspect many of us will go through and mark up our existing pages/resumes with the hResume data. Picking something structured means the tools to handle it all can easily be built once that's done.

 
>> - What data does the user need to contribute? Myles in the other
>> thread said Geo coordinates - good idea. What else?
>
> Geo coordinates would be v.useful.  See below.

>
Agreed. hResume includes hCard for your contact details, which in
turn can optionally contain geo coords. It's probably fairly simple
to bodge up City/geo mapping on our side for data sourced from sites
that don't include this. People doing their own page can simply use a
service like addressfix.com to turn their address (or one a
comfortable distance away for the paranoid) into geo format.

Agreed.

 
>> The aggregation
>> - How will the user's data store communicate with the central server?
>
> Would RSS be any good here? I know it's polling, but with say less
> than 1K users, doing so once a day shouldn't be a problem - and it's
> possibly simpler to set up than push is.

No reason we couldn't embed it into an RSS feed, but I wonder if
there's any real benefit to doing so as it would cause some problems
with data pulled from existing pages. The way I would see it working
is folks simply submit the URL to their hResume for spidering, and it
gets polled at regular intervals.

I'd go for a normal XHTML/HTML page. Means folks can leverage their existing portfolio/bio/blog/resume, and would be simplest to implement.

We can still do clever things like If-Modified-Since and things like that to save bandwidth for the polling, honouring gzip encoding, etc. Whilst we can hammer the pages once a day (or some other sane value) we can still be friendly to their server.
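The conditional-fetch bookkeeping mentioned here might look like the following sketch (the class name and hash-based storage are assumptions; a real spider would wire this into `Net::HTTP` or similar): remember each page's `Last-Modified` value and send it back as `If-Modified-Since`, so an unchanged page costs a cheap 304 instead of a full download.

```ruby
# Sketch of polite polling state: conditional-GET headers out, cached
# bodies in. A 304 Not Modified response means reuse the cached copy.
class PoliteCache
  def initialize
    @entries = {}   # url => { :body => ..., :last_modified => ... }
  end

  # Headers to send when polling +url+, based on what we already hold.
  def request_headers(url)
    headers = { 'Accept-Encoding' => 'gzip' }
    entry = @entries[url]
    headers['If-Modified-Since'] = entry[:last_modified] if entry
    headers
  end

  # Record a 200 response; on a 304 we simply keep the cached copy.
  def store(url, body, last_modified)
    @entries[url] = { :body => body, :last_modified => last_modified }
  end

  def body(url)
    entry = @entries[url]
    entry && entry[:body]
  end
end
```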

I've done a fair bit of work fetching and aggregating RSS and Atom feeds in Ruby, including sensibly queuing up when feeds should be checked, etc. Fetching HTML and parsing the microformats would be pretty trivial in comparison.

 
>> - How do we store the data? Do we cache it, and spider once a week?
>
> With a decentralised service, you really need to store/cache locally
> to make sure the data is always available.

Agreed. It's probably worth caching both the raw format (i.e. anything
within the hResume container div), and processing it to populate an
indexable db which maps to the format, so we're not having to parse
each entry when someone searches by location, skills, etc., but can
still present the data in its original format.

Agreed - keep the content we fetch (also makes it easier if we find a bug in the parser - just re-parse all the recently fetched content) and parse out all the data into a database. The model wouldn't really need to be _that_ complex.

Perhaps the hardest bit would be normalizing the skills, etc. but that can probably be left as a task for later - I would expect the initial version would just treat these as text tags which would be used to do simple matching. We can define synonyms, related tags, etc. later.
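The "text tags with simple matching" starting point could be as small as this sketch (method names are illustrative): normalize by case and whitespace, match by set intersection, and leave synonyms and related tags for a later pass.

```ruby
# Skills as plain text tags, nothing clever yet.
def normalize_tags(tags)
  tags.map { |t| t.strip.downcase }.uniq
end

# Profiles whose skill tags intersect the requested tags at all.
def matching_profiles(profiles, wanted)
  wanted = normalize_tags(wanted)
  profiles.select { |p| (normalize_tags(p[:skills]) & wanted).any? }
end
```

A later synonym table ("rails" implies "ruby", etc.) could simply expand `wanted` before the intersection.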

 
>>
>> The product
>> - What can people do when using this data? Are people generating
>> reports? What information would be of value to the community?
>
> With the geo coords stuff, we should be able to do stuff like:
> 1) google maps/earth layers
> 2) iphone app/service that shows what other SBAers are nearby.
> 3) ????
> 4) Profit!

2a) Show you what their house looks like on street view? :-p

The ability for organizations (AWIA, WIPA) to register themselves and post out announcements to nearby folks would also be good.

eg. I'm in Perth, and I want to know about events happening within 100KM of my address. Ok - AWIA has 6 events within 100KM in the next 6 months, CFLUG has 2, etc. Think of last.fm's recommended events if you're familiar with that.
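For "within 100 km" queries like that, great-circle distance from the hCard geo coordinates is enough to start with - the standard haversine formula:

```ruby
include Math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius

# Great-circle distance between two lat/long points given in degrees.
def haversine_km(lat1, lon1, lat2, lon2)
  dlat = (lat2 - lat1) * PI / 180
  dlon = (lon2 - lon1) * PI / 180
  a = sin(dlat / 2)**2 +
      cos(lat1 * PI / 180) * cos(lat2 * PI / 180) * sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * asin(sqrt(a))
end
```

A nearby-events check is then just `haversine_km(my_lat, my_lon, ev_lat, ev_lon) <= 100`; at larger scale this would want to live in SQL or a spatial index rather than Ruby.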

I've also got a few ideas on how to build dynamic "*Planet" aggregators using the data - e.g. all the blogs of Ruby developers in WA, aggregated. I think that's better as a consumer of the SBDB data rather than part of the core project, though.

 
>>
>> My walkthrough of the process
>> - A user somehow stores this information on a site they designate.
>> Blog or otherwise
>> - Every time they update this information, it sends a notification to
>> the SBA.org
>> - SBA.org caches the information in a database
>> - a user interface can interact with the database to query it for
>> whatever information people require
>
> The only extra nicety to the above would be an API that we can use to
> query the SBA DB.

Definitely!

 
As mentioned above, we need to poll I think, there's no easy way to
push a notification from existing services.

Push would put too much responsibility on the author. A middle ground might be to allow authors to "ping" the DB for a fetch in its next cycle (a la Technorati, Google Sitemaps). Otherwise, we just fetch it at the normal intervals (say, 7 days). It would also be feasible to give users the ability to say how long they want it cached for, using either some sort of ad-hoc microformat or other parameter, e.g. <span id="sbdb-check-every">30 days</span>
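That ad-hoc cache hint could be parsed with something this small. Note the `sbdb-check-every` id and the "number plus unit" content format are only the hypothetical suggestion above, not any spec:

```ruby
# Parse the text content of a hypothetical <span id="sbdb-check-every">
# hint (e.g. "30 days") into seconds, falling back to the normal cycle.
UNITS_IN_SECONDS = { 'hour' => 3600, 'day' => 86_400, 'week' => 604_800 }
DEFAULT_INTERVAL = 7 * 86_400   # the normal 7-day polling interval

def check_interval(text)
  if text =~ /\A\s*(\d+)\s+(hour|day|week)s?\s*\z/
    $1.to_i * UNITS_IN_SECONDS[$2]
  else
    DEFAULT_INTERVAL   # ignore anything we can't parse
  end
end
```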
 
In terms of implementing this, what have we got to work with in terms
of hosting platform, etc? Any constraints or restrictions?

To get it off the ground I'm happy to host either PHP or Rails. Once it gets bigger I might have to change my mind, but it would probably want its own server once it's in production anyway.
 
I'm also happy to start taking a stab at the initial coding. I know Myles was interested - any other devs on board?

(I'm getting excited - I think this project really opens the doors for folks to do some awesome things with the data).

All the best,
Jason Stirk
http://griffin.oobleyboo.com/
http://twitter.com/j_stirk

Warren Seen

Aug 7, 2008, 10:49:09 PM
to SiliconBeachAu...@googlegroups.com


To get it off the ground I'm happy to host either PHP or Rails. Once it gets bigger I might have to change my mind, but it would probably want its own server once it's in production anyway.

Can work with either, but I vote +eleventy million for Ruby and either Rails or Merb.

 
I'm also happy to start taking a stab at the initial coding. I know Myles was interested - any other devs on board?

I'm interested, though pinched for time at the moment. If we host this on something like github, I can drop in and out as time permits to lend a hand. Are we going to open source this? The value as far as I can see is in how you use the contacts you find, not the code itself. 

Jason Stirk

Aug 7, 2008, 11:00:14 PM
to SiliconBeachAu...@googlegroups.com


2008/8/8 Warren Seen <warre...@gmail.com>

Can work with either, but I vote +eleventy million for Ruby and either Rails or Merb.

I like your style!
 
I'm interested, though pinched for time at the moment. If we host this on something like github, I can drop in and out as time permits to lend a hand. Are we going to open source this? The value as far as I can see is in how you use the contacts you find, not the code itself. 

I'd agree and say that it makes sense for the DB code to be open for precisely that reason - folks can come in and contribute features as they see fit. We've all got other projects we need to spend time on, and I think it'd be a great way to get the coders on the list working together a bit.

GitHub seems to be flavour of the month - I'm a little late to jump on the Git bandwagon myself, but it seems to be a great way to get this sort of distributed development running. I'll defer to someone else who's actually had experience working and developing on GitHub, but it sounds like a good plan to me.

Elias Bizannes

Aug 7, 2008, 11:02:33 PM
to Silicon Beach Australia - distributed database initiative
Sweet, thanks. I'm keen to utilise the unique skill set of everyone in
this mailing list. Some good ideas above - let's try to break this
down into a project with unique tasks and areas of responsibility.
Structuring this will help us develop a plan of attack.

Some areas I can think of:
- Use cases: We need use cases of how we intend to use this data, once
we can magically aggregate it. Determining use cases will aid the
design. Question this answers: What can I do with this service?

- Data acquisition: We need to determine and build the way in which
users store their data (i.e., a CMS plugin, a generator...and what
exactly do these generators generate...). Question this answers: What
data do we need from people, and how do they create it?

- Data sync: We need to investigate (as well as develop) the smartest
way to collect the data from people's websites and have it interface
with the sba.org server, i.e. the discussion above about polling etc.
Question this answers: How do we get this data?

- API: We need to investigate how we can build a programming interface
to manipulate the data from the aggregator, with the view of opening
up to remote queries in the future. Question this answers: How do we
store this data?

- Ontology: We need to determine what the master ontology is, so that
all the skills people list on their resumes can be globally defined
in a consistent manner.
Question this answers: What skills are there?


Does everyone like this approach? And if so, is there anything else you
can think of, or correct me on? If we can do this, then we can split
up into focus areas, and have threads on specific areas.



On Aug 8, 12:05 pm, "Jason Stirk" <jst...@gmail.com> wrote:
> Hi All,
>
> 2c below :
>
> 2008/8/8 Warren Seen <warren.s...@gmail.com>
>
> > > 2008/8/7 Elias Bizannes <elias.bizan...@gmail.com>:

Elias Bizannes

Aug 7, 2008, 11:12:04 PM
to SiliconBeachAu...@googlegroups.com
 
I'm interested, though pinched for time at the moment. If we host this on something like github, I can drop in and out as time permits to lend a hand. Are we going to open source this? The value as far as I can see is in how you use the contacts you find, not the code itself. 
 
Open-sourcing it is a great idea. Yes, the value is in the data collected and what can be interpreted from it. It would be great for other communities to adopt this for their own purposes.
 
With the example we set up with siliconbeachaustralia.org, it's likely I will be able to get the initiative global attention with the DataPortability Project as an implementation.

Warren Seen

Aug 7, 2008, 11:44:53 PM
to SiliconBeachAu...@googlegroups.com
Or, we take the agile approach: build the simplest thing that could possibly
work, and iterate from there.

By all means, develop use cases to drive the iterations, and make sure
there's flexibility in the data acquisition method.

Talk however of APIs and ontologies is premature IMHO when we don't know
what's in people's data that's already out there that we can access in
hResume format. It might turn out that it’s more effective to apply natural
language and text clustering techniques than to try and retrofit a master
ontology to existing data. It needs to be a friction-free process for the
user, otherwise it just won't get the takeup needed to be successful -
people should not have to reformat their existing data to suit us, we should
adapt to their data instead.

After all, it's just text, marked up with HTML - it's not as though
spidering, parsing and searching it isn't already a well developed field.
;-)

Anyway, that's just the way I would tackle it. Call me a heretic if you
must...

Jason Stirk

Aug 8, 2008, 12:52:15 AM
to SiliconBeachAu...@googlegroups.com


2008/8/8 Warren Seen <warre...@gmail.com>


Or, we take the agile approach: build the simplest thing that could possibly
work, and iterate from there.

By all means, develop use cases to drive the iterations, and make sure
there's flexibility in the data acquisition method.

I agree with the "simple first - special later" approach. Let's get some data and a prototype and see where that takes us.

That said, I do also see a good point in working out what needs to be done, and splitting it up in some sort of "responsibility".

I say this as there's no point 6 of us writing the spider code, or 6 of us writing the search code. We need to know what the others are working on to some extent.

I also suspect that, as it's not an altogether complex app, any one of us could easily write it all in one fell swoop without much input from the others. Note that I'm talking about the core here - once that's done and we have some data, I suspect we'll have plenty of tasks to keep everyone interested.

So, I'm trying to work out how to divide the labour in a way that everyone feels involved as much as they want to be.

One thing I'm certain of is that there are going to be a few things that must be done to get up and running :
  • Git repo needs to be set up, and that person's going to have to be "in charge" of pulling from everyone else's repos, and managing that. I'm not familiar with Git, but I'm willing to learn if that needs to be the case. Otherwise, I'm happy to follow someone else if they really want to do it.

  • Initial project structure needs to be set up in whatever language/framework we're using. Ruby seems to be the call so far, but I guess we should wait to see if anyone else weighs in before we set that in stone?

  • Start writing some test data and examples of what we're aiming to accept, and how we want that data parsed.

    Here, I'm thinking that each of us on the list goes out and writes up their hResume profile as they would list on SB.org.au. That can be something they author, or a service they already use that produces hResume content. The point here is to get a list of test data that we can work with, test against, etc.
I think from there we have more than enough to get something rolling. I don't think things need to be "designed by committee" at this stage - it's far more important to get something out, kick the tires, say "man, that code there is UGLY!", and have someone sweep in with awesome code to save the day. That's how these things work, isn't it??? :P

 
Talk however of APIs and ontologies is premature IMHO when we don't know
what's in people's data that's already out there that we can access in
hResume format. It might turn out that it's more effective to apply natural
language and text clustering techniques than to try and retrofit a master
ontology to existing data. It needs to be a friction-free process for the
user, otherwise it just won't get the takeup needed to be successful -
people should not have to reformat their existing data to suit us, we should
adapt to their data instead.

Exactly, and that's why I think the test data is going to be the first step. It'll also give a lot of us that helping push into seeing just how feasible the content is to write, and what existing services there are. Lots of us are saying "hResume is the way to go", but how many of us have something marked up with it?

Doing this might even get some folks starting on the other tools such as blog plugins, etc. to produce the content.




 

Wayne Meissner

Aug 8, 2008, 6:55:51 AM
to SiliconBeachAu...@googlegroups.com
2008/8/8 Jason Stirk <jst...@gmail.com>:

> Git repo needs to be set up, and that person's going to have to be "in
> charge" of pulling from everyone else's repos, and managing that. I'm not
> familiar with Git, but I'm willing to learn if that needs to be the case.
> Otherwise, I'm happy to follow someone else if they really want to do it.

You may as well do it. Git is not exactly a new-user-friendly VCS,
but you do get used to it. And setting stuff up on GitHub is very
easy.

>
> Initial project structure needs to be set up in whatever language/framework
> we're using. Ruby seems to be a call so far, but I guess we should wait to
> see if anyone weighs in before we set that in stone?

I say go for Ruby. Not just because I'm biased toward it, but in the
time we'd spend debating which language, you could have the thing
written ;)


> I think from there we have more than enough to get something rolling. I

Yep, I say start hacking on something.

Elias Bizannes

Aug 9, 2008, 3:32:24 AM
to Silicon Beach Australia - distributed database initiative
I like that plan, fellas. A little hesitant with agile because it will
make it more feature-driven rather than focussed on the core purpose,
but then again, my job here is to get a result...and if you guys have
a way you want to do this, I'm getting out of the way :)

Disagree about the ontologies, but not important now - we can discuss
at another time.

I think that's a great idea about getting everyone on this mailing
list to create their own resume, which will allow us to experiment on
its potential. Can everyone commit to that? Either play around with
existing blog plugins, or a HTML page you create - and for the less
technically savvy, a Word document so we can at least analyse the data.

I think the key data we need for now are
- name
- location
- skills

So not quite a complete resume...

Mark Neely

Aug 20, 2008, 10:28:07 AM
to SiliconBeachAu...@googlegroups.com
Elias,

Apologies for wading in late, but I was wondering if the group has a
neat/concise statement about the 'problem' the DDI is designed to solve?

I think there are many opportunities to leverage the group's
capabilities/resources/IP, but it would be useful to stipulate (pref. in less
than 50 words) what it is we're trying to achieve.

Regards,

Mark
-----
Mark Neely
Master Strategist
Infolution Pty Ltd

e: m...@infolution.com.au
m: +61 (0)412 0417 29
skype: mark.neely

Read my blogs --> www.infolution.com.au
www.neelyready.com
Connect on LinkedIn --> www.linkedin.com/in/markneely

-----Original Message-----
From: SiliconBeachAu...@googlegroups.com
[mailto:SiliconBeachAu...@googlegroups.com] On Behalf Of Elias
Bizannes
Sent: Thursday, 7 August 2008 2:12 PM
To: Silicon Beach Australia - distributed database initiative
