
Embed revision control in Python application?


duncan smith

Jun 22, 2012, 11:58:23 AM
Hello,
I have an application that would benefit from collaborative
working. Over time users construct a "data environment" which is a
number of files in JSON format contained in a few directories (in the
future I'll probably place these in a zip so the environment is
contained within a single file). At the moment there is one individual
constructing the data environment, and me occasionally applying
corrections after being e-mailed the files. But in the future there
might be several individuals in various locations.
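
A minimal sketch of the single-file idea, assuming the environment is a directory tree of .json files and a current Python 3 interpreter (Python 2.6 would need small changes). The function names pack_environment and read_member are hypothetical, chosen only for illustration:

```python
import json
import os
import tempfile
import zipfile


def pack_environment(env_dir, zip_path):
    """Write every .json file under env_dir into one zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(env_dir):
            for name in sorted(files):
                if name.endswith(".json"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to env_dir so the archive
                    # does not depend on where the directory lives.
                    archive.write(full_path, os.path.relpath(full_path, env_dir))


def read_member(zip_path, member):
    """Load one JSON document straight out of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return json.loads(archive.read(member).decode("utf-8"))


if __name__ == "__main__":
    # Build a throwaway environment with one graph file, then pack it.
    env = tempfile.mkdtemp()
    os.makedirs(os.path.join(env, "graphs"))
    with open(os.path.join(env, "graphs", "g1.json"), "w") as handle:
        json.dump({"edges": [["a", "b"]]}, handle)
    target = os.path.join(tempfile.mkdtemp(), "environment.zip")
    pack_environment(env, target)
    print(read_member(target, "graphs/g1.json"))
```

A zip is convenient for e-mailing or copying the environment around, but it does not by itself record who changed what, so it would complement rather than replace version control.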

As a minimum requirement I need to embed some sort of version control,
so that changes committed by one individual will be seen in the local
environments of the others. Some of the work involves editing graphs
which have restrictions on their structure. In this case it would be
useful for edits to be committed / seen in real time. The users will not
be particularly technical, so the version control will have to happen
relatively quietly in the background.

My immediate thoughts are to (somehow) embed Mercurial or Subversion. It
would certainly be useful to be able to revert to a previous version of
the data environment if an individual does something silly. But I'm not
actually convinced that this is the whole solution for collaborative
working. Any advice regarding the embedding of a version control system
or alternative approaches would be appreciated. I haven't tried anything
like this before. The desktop application is written in Python (2.6)
with a wxPython (2.8) GUI. Given the nature of the application / data
the machines involved might be locally networked but without web access
(if this makes a difference). TIA.

Duncan

Emile van Sebille

Jun 22, 2012, 12:42:25 PM
to pytho...@python.org
On 6/22/2012 8:58 AM duncan smith said...
> Hello,
> I have an application that would benefit from collaborative working.
> Over time users construct a "data environment" which is a number of
> files in JSON format contained in a few directories

You don't say what your target platform is, but on Linux I've done some
testing with python-fuse, which lets you intercept file access and take
whatever action you like; in your case, archiving the prior version on write.

Might be worth a look.

Emile

Prasad, Ramit

Jun 22, 2012, 12:48:04 PM
to pytho...@python.org
Why not just stick the configs (binary blob or JSON string) in something
like a sqlite database and store that database somewhere centrally
accessible[1]? Something like subversion might be overkill.

A table like the following would work if you want to track each file
separately:
table_name( revision_number, file/config name, data, username, timestamp ).

Otherwise, if you want to track "environments" and not files:
table_name( revision_number, data, username, timestamp ).

Where revision number can be a sequence used to track / change
current configuration. I recommend storing each file separately in the
database.
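
A minimal sketch of the per-file table using the standard library's sqlite3 module. The table name revisions and the helper names (open_store, commit_file, latest, changes_since) are assumptions made for illustration, and the data column is assumed to hold the JSON text of each file:

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS revisions (
    revision_number INTEGER PRIMARY KEY AUTOINCREMENT,
    file_name       TEXT NOT NULL,
    data            TEXT NOT NULL,
    username        TEXT NOT NULL,
    timestamp       REAL NOT NULL
)
"""


def open_store(path):
    """Open (or create) the revision database at path."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def commit_file(conn, file_name, data, username):
    """Append a new revision of one file and return its revision number."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO revisions (file_name, data, username, timestamp) "
            "VALUES (?, ?, ?, ?)",
            (file_name, data, username, time.time()),
        )
    return cursor.lastrowid


def latest(conn, file_name):
    """Return the most recent stored data for file_name, or None."""
    row = conn.execute(
        "SELECT data FROM revisions WHERE file_name = ? "
        "ORDER BY revision_number DESC LIMIT 1",
        (file_name,),
    ).fetchone()
    return row[0] if row else None


def changes_since(conn, revision_number):
    """List (revision_number, file_name) pairs newer than revision_number."""
    return conn.execute(
        "SELECT revision_number, file_name FROM revisions "
        "WHERE revision_number > ? ORDER BY revision_number",
        (revision_number,),
    ).fetchall()
```

Because rows are only ever appended, reverting a file amounts to committing an older row's data again, and a client that remembers the last revision number it saw can ask for just the newer rows.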

[1] http://www.sqlite.org/faq.html#q5


Ramit


Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423


duncan smith

Jun 22, 2012, 2:19:57 PM
I develop on Linux, but most users would be running some flavour of
Windows. Initially I'd like to get something up and running that would
allow me to collaborate from an Ubuntu box at home with someone using a
Windows machine (not sure which version) in an office at the University
of Manchester. The most likely end users would (probably) be running
Windows machines on a local network with no internet access.

I expect it would generally be possible to have an always-on server, but
I'm also thinking about peer to peer style communication (information
might not always be completely available, but it's better than being
totally unaware of changes being made by others).

I don't have much experience getting applications to communicate across
a network, particularly in a reasonably secure fashion. Someone I know
also suggested RabbitMQ. Any pointers that help me to reduce the options
to a manageable number of candidates will be appreciated. A shallow
learning curve would also be good (given that ATM this is an idea I want
to try out rather than paid work). I am looking at fuse at the moment.
Thanks.

Duncan

Emile van Sebille

Jun 22, 2012, 4:34:15 PM
to pytho...@python.org
On 6/22/2012 11:19 AM duncan smith said...
> On 22/06/12 17:42, Emile van Sebille wrote:
>> On 6/22/2012 8:58 AM duncan smith said...
>>> Hello,
>>> I have an application that would benefit from collaborative working.
>>> Over time users construct a "data environment" which is a number of
>>> files in JSON format contained in a few directories

So, will the users modify their local environment and you'd push the
revisions to a known location for redistribution? How might peer-to-peer
work? How would you know which peers get the change, or would all peers
get the change?

I've been working with rpyc (in as much spare time as I can manage) on
some similar sounding issues and am now settling on a central system
which provides convenient administration and potential relaying or
pulling. See http://rpyc.sourceforge.net/

I just tested my in-process development status and find 64 remote
machines up and 5 non-responsive which in my case are likely machines
that are not yet configured properly. As this has been on the back
burner the past two months I'm pleased with how it's fared in the face
of neglect.

At least with rpyc (which does have a low learning curve) you'll be
fully in python.

Emile

duncan smith

Jun 22, 2012, 9:45:06 PM
On 22/06/12 21:34, Emile van Sebille wrote:
> On 6/22/2012 11:19 AM duncan smith said...
>> On 22/06/12 17:42, Emile van Sebille wrote:
>>> On 6/22/2012 8:58 AM duncan smith said...
>>>> Hello,
>>>> I have an application that would benefit from collaborative working.
>>>> Over time users construct a "data environment" which is a number of
>>>> files in JSON format contained in a few directories
>
> So, will the users modify their local environment and you'd push the
> revisions to a known location for redistribution?

Yes. My rough idea is that each time a user opens the application it
will connect to a server and download the current data environment (or
preferably just the changes made since the application was last
connected). Thus the user can start with an up to date environment. As
the application is used to make changes to the data environment any
changes are uploaded to the server for immediate redistribution to other
connected application instances.

Part of the application involves the construction of directed acyclic
graphs. If I add an edge to a graph I want anyone else editing the same
graph to be able to see the edge in something approaching real time so
that cycles are avoided. (Being able to lock the file so that only one
user can edit it concurrently might be another solution to this specific
issue.)
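
A minimal sketch of the cycle check itself, assuming a graph is held as a dict mapping each node to the set of nodes it points to. The names creates_cycle and add_edge are hypothetical:

```python
def creates_cycle(edges, source, target):
    """Return True if adding source -> target would close a cycle.

    edges maps each node to the set of nodes it points to.
    """
    # A new edge source -> target closes a cycle exactly when source
    # is already reachable from target, so walk forward from target.
    stack = [target]
    seen = set()
    while stack:
        node = stack.pop()
        if node == source:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return False


def add_edge(edges, source, target):
    """Add source -> target, refusing edges that would break acyclicity."""
    if creates_cycle(edges, source, target):
        raise ValueError("edge %s -> %s would create a cycle" % (source, target))
    edges.setdefault(source, set()).add(target)


if __name__ == "__main__":
    graph = {}
    add_edge(graph, "a", "b")
    add_edge(graph, "b", "c")
    print(creates_cycle(graph, "c", "a"))
```

With several people editing at once, the check only helps if it runs against a copy of the graph that already contains everyone's edges (on a server, say). Two edits that each pass the check locally can still combine into a cycle.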

> How might peer-to-peer
> work? How would you know which peers get the change, or would all peers
> get the change?
>

All peers. I'm not sure about the peer to peer thing though. It would be
better if the user could be guaranteed that the environment they see is
current, rather than having changes residing on someone else's machine
that happens to be switched off. I suppose the alternative must be that
the information is sat on a server somewhere.

> I've been working with rpyc (in as much spare time as I can manage) on
> some similar sounding issues and am now settling on a central system
> which provides convenient administration and potential relaying or
> pulling. See http://rpyc.sourceforge.net/
>
> I just tested my in-process development status and find 64 remote
> machines up and 5 non-responsive which in my case are likely machines
> that are not yet configured properly. As this has been on the back
> burner the past two months I'm pleased with how it's fared in the face
> of neglect.
>
> At least with rpyc (which does have a low learning curve) you'll be
> fully in python.
>

Yes, it looks very interesting. Cheers.

Duncan

rusi

Jun 23, 2012, 1:45:07 AM
On Jun 22, 8:58 pm, duncan smith <buzz...@urubu.freeserve.co.uk>
wrote:
If you are looking at mercurial and subversion you may want to look at
git also.

From http://en.wikipedia.org/wiki/Git_%28software%29#Implementation
(quoting Linus Torvalds)
---------------------------
In many ways you can just see git as a filesystem — it's content-
addressable, and it has a notion of versioning, but I really really
designed it coming at the problem from the viewpoint of a filesystem
person (hey, kernels is what I do), and I actually have absolutely
zero interest in creating a traditional SCM system.

More details https://git.wiki.kernel.org/index.php/Git#Design
-------------------------
Of course, it's good to say upfront that git is mostly C + shell, i.e.
it's not Python.
There is GitPython http://packages.python.org/GitPython/0.1/tutorial.html
but I know nothing about it.

duncan smith

Jun 23, 2012, 12:27:51 PM
Thanks. I'm trying to figure out whether I'm better off with a version
control system, a virtual filesystem (e.g.
http://code.google.com/p/pyfilesystem/), remote procedure calls or some
combination of these.

What I really need is a flexible framework that I can experiment with,
as it's not clear what the best strategy for collaborative working might
be. e.g. It might be best to restrict working on certain elements of the
data environment to a single individual. Cheers.

Duncan