Documentation wiki on Github

Tom Morris

Oct 29, 2012, 3:15:11 PM
to google...@googlegroups.com
I didn't move the whole wiki over from Google Code to Github because I
thought it'd be a good opportunity to review the content, and a lot of
it was going to need editing anyway to account for changes in hosting,
name changes, etc., but now I'm thinking perhaps I should do a
first-pass automatic conversion to cut down on the labor required,
particularly if we change markup languages.

On Fri, Oct 26, 2012 at 5:20 AM, Mateja Verlic
<mateja...@zemanta.com> wrote:
> On Friday, October 26, 2012 10:53:41 AM UTC+2, Mateja Verlic wrote:
>>
>> If Edit mode is set to Markdown, there is even more editing, because each
>> segment with a code needs to start with at least 4 spaces. If Edit mode is
>> set to MediaWiki, code fragments display nicely.
>> My suggestion would be to use MediaWiki format for all pages...
>
> Just saw that all other pages were written with Markdown, so I'll change the
> mode of pages I've added...

I think the use of Markdown is more an artifact of it being the Github
default. I don't have a favorite markup language, so I'm happy to use
whatever language folks like best. Anyone want to make an argument
for their favorite language? Against any of the languages supported
by Github: ASCIIDoc, Creole, Markdown, GitHub Flavored Markdown, Org,
Pod, RDoc, ReStructuredText, Textile, MediaWiki.

As far as automatic vs manual conversion goes, the basic flows would be:

Manual - pages are transferred one by one by hand, with markup changes
done by hand. Each page is reviewed and updated as it is transferred,
but references to the old site are left in place until the target of
the reference has been moved. As each page is transferred the
referring page is updated to point to the Github version instead of
the Google Code version. Anything hosted on Github has had at least a
cursory review.

Automatic - Google Code wiki repository converted from SVN to git.
Global edits made for all obvious stuff like name changes, hosting
references. Resulting git wiki repository uploaded to Github. Pages
reviewed and updated as time allows (perhaps we can come up with a
reviewed/not-reviewed tagging). Much less manual work, but no
guarantee that a page that is viewed on Github has necessarily been
reviewed or is correct (although that's generally true of any wiki).
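
The "global edits" step of the automatic flow could be sketched roughly like this. It is only an illustration under stated assumptions: the replacement table, the `.wiki` file suffix, the target URL, and the function names are all hypothetical and would need to be agreed on before running anything against the real wiki repository.

```python
import re
from pathlib import Path

# Hypothetical replacement table (an assumption for illustration only).
# The real list of name and hosting changes is a project decision.
REPLACEMENTS = [
    (re.compile(r"\bGoogle Refine\b"), "OpenRefine"),
    (re.compile(r"https?://code\.google\.com/p/google-refine/"),
     "https://github.com/OpenRefine/OpenRefine/"),
]


def apply_global_edits(text):
    """Return text with every replacement applied, in table order."""
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text


def convert_tree(root, suffix=".wiki"):
    """Rewrite every wiki page under root in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        original = path.read_text(encoding="utf-8")
        updated = apply_global_edits(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Running `convert_tree` on a git checkout and reading the resulting `git diff` before committing would give a cheap review step for the bulk changes.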

Thoughts? Volunteers to take the automatic conversion task off our hands?

Tom

Martin Magdinier

Oct 29, 2012, 5:07:29 PM
to google...@googlegroups.com
I am supporting Mediawiki syntax as it is a common syntax for documentation.

I also agree we should take this opportunity to reorganize the wiki. As I said earlier in another thread, I'll do a test on a fork of the current wiki to see how we can put things together. In doing this I want to merge the existing documentation with my blog content and add links to good tutorials I've archived in my Delicious account.

Then, if it is OK with everyone, we can take this version and merge my edits into the trunk (I understand that docs on GitHub are managed the same way as code).

A Mediawiki version of the current documentation is not mandatory, as I'll be manually rewriting and reordering every page.

Thanks

--
Martin

Tom Morris

Oct 29, 2012, 6:33:29 PM
to google...@googlegroups.com
On Mon, Oct 29, 2012 at 1:07 PM, Martin Magdinier
<martin.m...@gmail.com> wrote:

> I am supporting Mediawiki syntax as it is a common syntax for documentation.

Noted. One vote for Mediawiki. Here are a couple of resources for
others who want to evaluate the pros/cons of various languages:

http://en.wikipedia.org/wiki/Lightweight_markup_language#Comparison_of_language_features
http://www.terminally-incoherent.com/blog/2008/06/18/the-problem-with-wikis/

> I also agree we should take this opportunity to reorganize the wiki.

I think what I said was review and update, not reorganize, and I'm
having second thoughts about even lumping together the move with the
review/update pass. In general, I believe smaller, incremental
changes are better.

> A Mediawiki version of the current documentation is not mandatory, as I'll
> be manually rewriting and reordering every page.

That sounds like a lot of work. We certainly appreciate you
volunteering, but I'd suggest waiting for a consensus on the markup
language and general direction before investing so much effort.

Tom

Tim McNamara

Oct 29, 2012, 7:37:47 PM
to google...@googlegroups.com
On 30 October 2012 07:33, Tom Morris <tfmo...@gmail.com> wrote:
> On Mon, Oct 29, 2012 at 1:07 PM, Martin Magdinier
> <martin.m...@gmail.com> wrote:
>
>> I am supporting Mediawiki syntax as it is a common syntax for documentation.
>
> Noted. One vote for Mediawiki. Here are a couple of resources for
> others who want to evaluate the pros/cons of various languages:

I feel that the community shouldn't re-litigate where the documentation should live. The choice was for GitHub, which implies Markdown.

>> I also agree we should take this opportunity to reorganize the wiki.
>
> I think what I said was review and update, not reorganize, and I'm
> having second thoughts about even lumping together the move with the
> review/update pass. In general, I believe smaller, incremental
> changes are better.

My preference is for a simple port of the content. That will at least allow people to reread material. Anything that seems like it should be touched up or removed can be added as a GitHub issue.

>> A Mediawiki version of the current documentation is not mandatory, as I'll
>> be manually rewriting and reordering every page.
>
> That sounds like a lot of work. We certainly appreciate you
> volunteering, but I'd suggest waiting for a consensus on the markup
> language and general direction before investing so much effort.

I agree with Tom here. Being bold is great, but if you want to undertake a substantial cull and rewrite of existing material, then you should probably create an ebook or other manual on OpenRefine. Please don't be so bold as to break away from the community's effort entirely.

Gianni

Oct 29, 2012, 10:05:03 PM
to google...@googlegroups.com
The wiki on GitHub is an open space where all GitHub users can write a page (a guide, a recipe, etc.). The wiki supports many markup languages to allow everyone to share their knowledge. 'Locking' the markup is not helpful. The wiki is not the best place for clean, tidy, beautiful docs.
-- Gianni

2012/10/29 Tim McNamara <mcnama...@gmail.com>

Thad Guidry

Oct 30, 2012, 1:08:22 AM
to google...@googlegroups.com
I disagree with other comments and vote for Creole as a standard for our Default English language documentation.  It's 90% close to Google Code wiki syntax, easing the burden, and supports tables and other bits that we have used in our Refine documentation.  I volunteer to continue our wiki documentation and updates as I always have.  Having many wiki syntaxes across the default English language OpenRefine wiki will cease my voluntarism.  ;)
--
-Thad
http://www.freebase.com/view/en/thad_guidry

Peter Ring

Oct 29, 2012, 11:01:04 PM
to google...@googlegroups.com
Wiki markup can indeed be used for clean, tidy, beautiful docs.

IMHO, Markdown is not really suited for anything more complex than
commit messages and the like.
The syntax is a mess with no governance and no clear paths for growth.
Markdown dialects proliferate.

Finally, Markdown is in a state of flux:
* http://www.codinghorror.com/blog/2012/10/the-future-of-markdown.html
* http://www.w3.org/community/markdown/

Consider reStructuredText and, to a lesser extent, Creole and
ASCIIDoc, which are already pretty mature.

RST is now used for official Python documentation. All of it.
RST can be used in a simple style, like Markdown, or as a lightweight
alternative to DocBook XML markup.

Creole is a fusion between common wikitext syntaxes. If you know any
of them, you'll feel at home.

ASCIIDoc is a bit simpler than RST, and has a nice little toolset.

/Peter

Tom Morris

Oct 30, 2012, 3:57:51 AM
to google...@googlegroups.com
I believe that we should settle on a common wiki markup language, but
don't think that it has to be Markdown and don't believe that the
choice of Github as the wiki host implied that Markdown would be the
language. Github's wiki, Gollum [1], supports any format supported by
Github-Markup [2], namely that entire list I posted before and, as far
as I can tell, it supports them equally.

I don't consider Markdown's political squabbles to be a huge negative,
but I don't like it for other reasons, including its lack of table
support. I'm happy to consider Thad's Creole or any of the options
mentioned by Peter.

I don't see compatibility with Google Code to be a strong requirement
unless it's important to our wiki editors. I've got a little Python
script [3] that does GC Wiki to Github Markdown conversion and it'd be
pretty easy to modify it for a different target language. We can do a
bulk conversion of the entire wiki without too much effort.
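
To give a feel for what retargeting such a conversion at Creole might involve, here is a minimal standalone sketch. It is not taken from the script in [3]; the function and pattern names are made up for illustration, and it assumes (to the best of my understanding of the two syntaxes) that Google Code wiki writes bold as `*text*` and external links as `[url label]`, while Creole writes them as `**text**` and `[[url|label]]`. Headings, lists, and `{{{ }}}` code blocks are left untouched.

```python
import re

# Google Code style external link: [http://example.com label text]
_LINK = re.compile(r"\[(https?://[^\s\]]+)\s+([^\]]+)\]")
# Google Code style bold: *text*, with no whitespace just inside the asterisks,
# so list bullets ("  * item") and existing "**bold**" are not touched.
_BOLD = re.compile(r"(?<![*\w])\*([^\s*](?:[^*\n]*[^\s*])?)\*(?![*\w])")


def gc_line_to_creole(line):
    """Convert the inline markup of one Google Code wiki line to Creole."""
    line = _LINK.sub(r"[[\1|\2]]", line)
    line = _BOLD.sub(r"**\1**", line)
    return line
```

A real converter would also have to skip text inside code blocks and handle tables, so this only covers the easy inline cases.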

Unfortunately, the style of localization support offered by Google
Code where a page in the language sent by the user's browser is
served, if available, with fallback to a default language, is
apparently not available on Github, so we'd have to cook up our own
solution of parallel sets of pages if we wanted to support
localization.
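
The fallback behavior described above (serve the page in the reader's language if it exists, otherwise the default language) can be sketched as follows. This is purely illustrative: the `Home-fr` naming scheme for parallel pages and both function names are assumptions, and since Github's wiki does not do this on the server, the logic would have to live in whatever tooling generates links or index pages.

```python
def localized_name(page, language, default_language="en"):
    """Hypothetical naming scheme: 'Home' for the default, 'Home-fr' otherwise."""
    if language == default_language:
        return page
    return f"{page}-{language}"


def pick_page(page, accepted_languages, existing_pages, default_language="en"):
    """Pick the best existing page for a reader's language preferences.

    accepted_languages is ordered from most to least preferred, as a browser
    would send them. existing_pages is the set of page names in the wiki.
    Returns None when not even the default-language page exists.
    """
    for language in list(accepted_languages) + [default_language]:
        candidate = localized_name(page, language, default_language)
        if candidate in existing_pages:
            return candidate
    return None
```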

So far I think we have one vote each for Mediawiki, Markdown, Creole,
and reStructuredText. I'd like to hear some more opinions,
particularly those which include supporting rationales.

Tom

1. https://github.com/github/gollum
2. https://github.com/github/markup
3. https://github.com/tfmorris/googlecode2github/blob/master/wikiconvert.py

Martin Magdinier

Oct 30, 2012, 1:19:57 PM
to google...@googlegroups.com

Following Peter's email, I am OK with Creole too, as I work with various wiki languages (DokuWiki and MediaWiki).
Recipes and tutorials can also be written by non-engineers / non-technical people. The wiki language should be a common one so as not to turn away those potential contributors (even if I still think GitHub will turn them away in the first place).

John David Smith

Nov 7, 2012, 7:52:41 PM
to google...@googlegroups.com
Hosting OpenRefine documentation on the Semantic Mediawiki makes a lot of sense to me.  Pros include:
  • Widely used syntax
  • Systematic approach to increasing the value of documentation text
  • Alliance with an existing community that has related goals and values
John

On Tuesday, November 6, 2012 11:53:54 AM UTC-8, Joel Natividad wrote:
Folks,
I am an active member of the Semantic Mediawiki community (semantic-mediawiki.org).  If the community accepts, we'd be more than happy to set it up and host it gratis for the OpenRefine Community.

FYI, Semantic Mediawiki is the sister project of WikiData, and I can see all kinds of use-cases where we can get structured data from the documentation wiki and even Linked Open Data sources to munge on OpenRefine.

We're also cleaning up a lot of NYC data and we're thinking of creating an extension ala the FreeBase extension so people can do reconciliation operations against NYC Linked Open Data.

Best,
Joel

Martin Magdinier

Nov 7, 2012, 11:44:32 PM
to google...@googlegroups.com
I just discovered that there is no search function on the GitHub wiki. I see that as a major drawback!

Regarding the Semantic Web offer, will it be merged with your current documentation or hosted on a totally different wiki? Can we set it up on a subdomain like wiki.openrefine.org?

Thanks

Martin

Joel Natividad

Nov 8, 2012, 7:17:30 PM
to google...@googlegroups.com
Yes Martin.  We were thinking of hosting it in an Amazon AMI gratis for the community.

And to John's points, it's a no-brainer for all the reasons he cited, and we can even use the semantic features of SMW to create a gallery of OpenRefine projects, among other things.

Just so the community can check it out, I'll spin up a sandbox SMW and give y'all admin access over the weekend.

Best,
Joel

Thad Guidry

Nov 8, 2012, 7:40:58 PM
to refine
SMW might be a very good idea, I think as well.  I recall in the past that we wanted a place for users to easily upload and store short recipes and small Refine projects as examples that users could then download and import into Refine.  Of course any storage location could do that for us, including Github, but the more I think about it, the more it makes sense to have an SMW instance to handle this and a lot more for us.  SMW has nice features, like automatically creating lists like this http://semanticweb.org/wiki/Africa , instead of manual entry upkeep.  SMW also supports exporting of formats like CSV, JSON, & RDF already (and there are triplestore connectors as well), which is in line with Refine's import / export capabilities.  Finally, the SMW+ (Halo) extension with its Ontology browser is sooo very very cool http://www.smwplus.net/index.php/HaloExtension_Product_Information

Joel, If the sandbox looks good to me, then I might just change my vote, as long as we have assurances from the SMW community & yourself that they would help out "long term".  That kind of community dedication is what we are trying to drive ;)

John David Smith

Nov 8, 2012, 10:31:36 PM
to google...@googlegroups.com

What I see in the SMW is that it would lower the barrier to entry for publishing recipes and snippets (people worrying that it’s not good enough) while raising the level of utility by providing for after-the-fact linking and classification.

 

John

* John David Smith ~ Voice: 503.963.8229 ~ Skype & Twitter: smithjd http://gplus.to/smithjd

* Portland, Oregon, USA http://www.learningAlliances.net

* "In a world presenting unique challenges and ambiguity,

* play prepares these bears for an evolving planet." -- Stuart Brown

Joel Natividad

Nov 13, 2012, 11:44:37 PM
to openr...@googlegroups.com, google...@googlegroups.com, john....@learningalliances.net
Hi Folks,
Just wanted to give you an update on the SMW+ sandbox instance.

Troubleshooting some issues, but I anticipate the sandbox should be up and running before the end of the week.  Sorry about the delay...

Best,
Joel

Thad Guidry

Nov 14, 2012, 3:39:23 PM
to openr...@googlegroups.com
Thanks Joel.  No problem on the delay.  We are all busy and understand.

Will the sandbox be easily converted to a permanent instance?  Is that your plan?


Joel Natividad

Nov 14, 2012, 7:34:09 PM
to openr...@googlegroups.com
Yes Thad,
That is the plan.  It will be a VM hosted on Azure.

There is a way in MW/SMW to reset the wiki.  Will also put in a Project/Recipe Gallery using SMW features (Semantic Forms, Ontology Browser, etc.) so we can play around with that.  SMW+ also has faceted search using Lucene so it should help with finding stuff on the wiki.

With a VM, backups should also be easy as we just take snapshots.  We can even make the image available to the community should they want to mount private OpenRefine SMWs behind their firewall.

Best,
Joel

Martin Magdinier

Nov 20, 2012, 3:30:07 PM
to openr...@googlegroups.com
Hi 

Any progress on where the OpenRefine documentation will be hosted? I have been tied up with work for the last few weeks and I'm just catching up with OpenRefine.

Thanks!

Martin




Joel Natividad

Nov 26, 2012, 1:52:05 AM
to openr...@googlegroups.com
Hi Folks,
Sorry for the delay.  I tried setting up SMW+ for the group, but decided to use the latest SMW instead.

Thad expressed an interest in using the Halo extension's Ontology Browser, but it had so many dependencies on extensions that were not current that I decided it best to use the regular distro of SMW.

In terms of support from the SMW community, it's also best to use SMW in its vanilla version, as more people are familiar with the codebase.

Anyway, the test wiki is at http://openrefinetest3.cloudapp.net/wiki.  It's an Ubuntu 12.0 LTS VM running in Azure that we can run gratis for the OpenRefine community (including backups, maintenance, access through SSH for admins, etc.).  It has the latest version of SMW and all relevant extensions (http://openrefinetest3.cloudapp.net/wiki/index.php/Special:Version).

I also went ahead and installed extensions that should help with creating sample data (External Data, Data Transfer, Google Spreadsheet Widget).  We also installed additional widgets that should help with creating a vibrant community (DISQUS, Google Form, YouTube player, Twitter Search, Slideshare, etc.).

To demonstrate the power of a Semantic Wiki, I also went ahead and created a simple Slideshare Ontology that we can use to crowdsource maintaining a gallery of slideshares in the wiki - http://openrefinetest3.cloudapp.net/wiki/index.php/Presentations

Once the community accepts the wiki, I suggest we promote the VM to production and have it serve up documentation at openrefine.org/wiki

Thanks!
Joel

John David Smith

Nov 26, 2012, 8:59:43 PM
to openr...@googlegroups.com, google...@googlegroups.com
Wow!  Don't know what else to say.

Martin Magdinier

Dec 2, 2012, 6:19:59 PM
to openrefine, google...@googlegroups.com
Hey,

Thanks, Joel, for the wiki sandbox. I took some time to play around with it and the features are nice. However, the core team thinks that the combination of GitHub wiki + Creole is the safest choice in terms of:
  1. longevity & administrative burden, as we want participants to focus their effort on developing docs and code rather than performing system admin work
  2. ease of participation: once you have a GitHub account it's easy to contribute to the code, docs, and issues, whereas another platform would require users to maintain multiple identities (and using a separate system would increase the sys admin workload)
  3. expressiveness, as Creole is a robust and easy-to-learn language for anyone used to wikis
Thank you all for your input and participation on this topic. We will use GitHub with Creole syntax as the starting point and iterate from there if it doesn't meet our needs.

Currently we need to get things moving, as OpenRefine has been split between Google Code and GitHub for too long (more than two months now!). We want to build a strong community that is easy to engage with to support OpenRefine's growth over time. The current situation doesn't help, and this GitHub / Creole mix is the leanest way out.

Thank you all for your participation on this topic.


--
Martin Magdinier





Joel Natividad

Jan 3, 2013, 7:42:53 PM
to openr...@googlegroups.com, google...@googlegroups.com
Hi Martin,

NP.  I just wanted to give back and I thought a data-driven wiki was a great fit for Open Refine.

Hopefully, when the WikiData project is ready and it has integrated some of SMW's features, a complementary site can be developed.

I see Open Refine as a must-have tool in the pragmatic semantics toolbelt and we hope to contribute to the community some SMW-related extensions in the future.

Best,
Joel  

Martin Magdinier

Jan 4, 2013, 2:33:46 PM
to openrefine

Hi Joel

Thanks for coming back to this topic and offering to help develop semantic extensions for OpenRefine. I'm looking forward to seeing and using them.

Thanks also for the offer to host the documentation through the WikiData project. OpenRefine is a data-driven application. However, its usage goes beyond semantic applications (see the usage survey details) and its documentation isn't that data-oriented. The main goal of the wiki is to provide information on how to use the tool through descriptions of functionality, some recipes, and a few tutorials. The need to host data on the wiki is really minimal. I think a data-driven wiki might be overkill and would add extra barriers for people willing to participate (extra login, new platform to get used to ...).

Tom migrated the wiki to GitHub last month. It needs to be reviewed and some formatting updated. Feel free to participate.

Thanks 

Martin 
