Open source tools and non-programmers

Aline Bernstein

Jun 7, 2010, 4:22:32 PM
to Open Standards for eDiscovery (OpSED)
I'd like to lift/paraphrase a few of Troy's remarks from another
thread and start a new discussion about tools in general.

> the person who needs X done is unlikely to download perl and figure it out

> anyone want to take a crack at it in python?

Doesn't a tool written in python still require the end user to
download runtime libraries and configure their environment? I see this
as the type of hurdle that limits how extensively open source tools
can spread. I'm the only programmer on a litsupport team of seven
people. Even if I had the time to learn python or perl, I couldn't
build a toolset based on those languages if the rest of my team
couldn't run them.

Are there open source development tools that allow programs to be
compiled to native Windows executables?

Kyle Jones

Jun 7, 2010, 5:00:16 PM
to op...@googlegroups.com
Aline,

There actually are methods for converting Python to a Windows executable. Py2Exe (http://www.py2exe.org/) is one, and there are also a few others. I imagine there are similar tools for other languages, including Perl or Ruby, etc.

You do make an excellent point, though: what good are "open" tools if few can even make use of them? If we are going to be releasing tools of this sort, it is likely that an equal amount of work will need to go into deployment and distribution as into development itself. There are obviously pieces of the project which will not need "easy" distribution (model implementations of readers, etc), but for tools like the ones you refer to, a good amount of effort will need to be made to make the tools as accessible as possible.

Coming from the other side, I spend most of my time on a Mac or Linux. I realize that most any tool will need to be (easily) usable in Windows, as that is by far the most used environment; however, part of my interest in open tools, etc, is in implementations which are reasonably cross platform. To that end, a scripting language such as Python is perfect: simple syntax, reasonably quick development time, and very good cross-platform compatibility.

There will likely need to be tools which compile natively and will need to be written in something like C#, but for smaller tools which do not have extreme speed needs, focusing on smaller dynamic languages is a good way to keep the tools as open as possible.

This would be a good avenue to follow for pre-planning: documentation of tools and processes for deploying/distributing tools developed by OpSED. Perhaps guidelines for target platforms, people in charge of distribution, etc.

Thoughts?

- Kyle

Troy Howard

Jun 7, 2010, 5:01:17 PM
to op...@googlegroups.com
Aline,

I'm glad you brought this up. Python is actually an ideal language for
this situation. From the perspective of learning Python or editing
existing Python code, it's widely considered one of the easiest
programming languages to learn, with the lowest barrier of entry. It
has a massive collection of existing tools in its base libraries,
which often results in only needing a few lines of code to do fairly
complicated tasks. It's powerful enough in its capabilities and fast
enough to scale to the tasks that eDiscovery requires, while still
being easy to use for a beginner.

You're correct that it runs on a runtime, and thus requires a separate
install to execute Python files. Mac OS X has Python installed by
default, as do most Linux distributions. Windows has an easy to
install distribution (click-next type installer) that is a one-time
task. After that you can just double click on .py files.

The download for Windows is here:

http://www.python.org/download/windows/

That said, you can also package Python code into free-standing
executables, which removes the need for the separate runtime install.
These tools bundle the Python interpreter with your script rather than
compiling it to machine code, but the result runs like a native
executable. There are tools for the three major platforms. Those are:

py2exe - makes a windows exe
Freeze - makes a Unix executable
py2app - makes a MacOS executable

This topic is covered on the Python wiki at:

http://wiki.python.org/moin/PythonInstalledByDefault


For OpSED tooling, I think the best way to distribute the tools would
be to make the .py source code available as well as binary builds
(with installers) for each of the major platforms (Windows, Mac, and
Linux variants). The simplest way to use the tools would be to
download the binaries. When you're ready to start hacking the source
and customizing the tools, you can download the runtime and the source
code.

That said, we look to Python as a first choice because it's so ideal
for this purpose, but it is not the only option. Java and C++ are
also both widely used for this purpose; however, both are more
complicated to deploy and compile.

In other situations, languages like CPL might be the best option
(automating something in Concordance), and of course we'd gladly host
and distribute those options.

In the end, the goal is to facilitate the sharing and reuse of
solutions, in whatever way is easiest for the end user.

Thanks,
Troy



Aline Bernstein

Jun 8, 2010, 12:08:51 AM
to Open Standards for eDiscovery (OpSED)
Let's say I've installed Python for XP from the current production msi
and am too busy to RTFM and start learning from the ground up. The
three things I know I'm going to need to do in a litsupport context
are: (1) process all files in a folder structure, (2) for each file,
perform some sort of transformation, and (3) save the output
either over the original or to a new location. Does someone have a
small program of this type where the transformation is minor (or
replace with a comment: "your code here")? Maybe making this example
program available would make it easier for newbies to get started. ;)

Troy Howard

Jun 8, 2010, 3:37:38 AM
to op...@googlegroups.com
Aline,

As an example, I created a couple of Python scripts and posted them up
at opsed.org for download.

Here are the links:

http://opsed.org/attachments/download/2/walk-files.py
http://opsed.org/attachments/download/3/show-mimetypes.py

To run the files, you can just double click them, or run from the DOS
command prompt like this:

C:\Python26\python.exe C:\opsed\walk-files.py

To view the source code, just open them in a text editor...

The first script, 'walk-files.py', simply loops recursively over every
subdirectory/file of a root directory and prints out the name to the
console. For the sake of showing the simplicity of the Python code, I
didn't include many comments.

It currently expects a directory called 'C:\Test' to exist, so if it
doesn't exist on your machine, it won't do anything... To change the
root directory, just edit the fourth line to point to a different
directory.

Example:

To scan 'C:\My\Folder\Path'...

rootdir = 'C:\Test'

would change to:

rootdir = 'C:\My\Folder\Path'
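
For anyone who wants to see the shape of such a script before
downloading it, here is a minimal sketch of a recursive walk. It is
not the posted walk-files.py, just an illustration of the same idea;
the function name list_files is made up for this example. It also
writes the path as a raw string (the r prefix), which is a safer habit
on Windows because a backslash followed by certain letters (\t, \n)
would otherwise be read as a special character.

```python
import os

def list_files(rootdir):
    """Return the full path of every file under rootdir, recursively."""
    paths = []
    # os.walk visits rootdir and every subdirectory below it
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths

if __name__ == '__main__':
    # Raw string: backslashes are kept literally, never treated as escapes
    rootdir = r'C:\Test'
    for path in list_files(rootdir):
        print(path)
```

If the root directory does not exist, os.walk quietly yields nothing,
which matches the "it won't do anything" behavior described above.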


To show something a bit more complex, I made 'show-mimetypes.py' which
first asks the user to enter the root directory, then scans all the
files under it recursively, and prints out the filename and the mime
type to the console. It then prompts the user to 'Press any key to
exit'. For the sake of explaining all the details, I include a bunch
of comments.
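
As a rough illustration of the mime type lookup (again, a sketch and
not the posted show-mimetypes.py), the standard library's mimetypes
module can guess a type from a filename. The function name
mime_types_under is hypothetical. Note that mimetypes.guess_type looks
only at the file extension, not at the file's contents, so a renamed
file will be reported under its new extension.

```python
import mimetypes
import os

def mime_types_under(rootdir):
    """Return (path, mime type) pairs for every file under rootdir."""
    results = []
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # guess_type returns (type, encoding); type is None if unknown
            mime_type, encoding = mimetypes.guess_type(path)
            results.append((path, mime_type))
    return results

# Example use (the path is only an illustration):
#   for path, mime_type in mime_types_under(r'C:\Test'):
#       print(path, mime_type)
```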

This should give you a rough idea of how to accomplish similar tasks.
As an exercise, why don't we put together a few simple tools like this
to see how things can be done. It'll be great to do this in the
context of eDiscovery, so let's choose something valuable to typical
post-processing tasks that a litigation support team might do.

The tools we discussed from requests on the litsupport list are a bit
more complex to accomplish, but maybe we can come up with something
equally useful but simpler as a learning exercise.
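
As one possible starting point for that exercise, here is a sketch of
the three-step pattern Aline asked about: walk a folder structure,
transform each file, and save the result to a new location. The names
transform and process_tree are invented for this example, and the
transformation is a do-nothing placeholder ("your code here"). It
assumes the output folder sits outside the source folder, and it reads
each file fully into memory, which is fine for small files only.

```python
import os

def transform(data):
    """Placeholder transformation: your code here."""
    # Returns the bytes unchanged; replace with real processing
    return data

def process_tree(src_root, dst_root):
    """Copy every file under src_root to dst_root, applying transform."""
    written = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        # Mirror the source subfolder structure under dst_root
        rel = os.path.relpath(dirpath, src_root)
        out_dir = os.path.normpath(os.path.join(dst_root, rel))
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)
        for name in filenames:
            with open(os.path.join(dirpath, name), 'rb') as f:
                data = f.read()
            out_path = os.path.join(out_dir, name)
            with open(out_path, 'wb') as f:
                f.write(transform(data))
            written.append(out_path)
    return written
```

Writing to a separate folder leaves the originals untouched, which
seems like the safer default in a litsupport setting.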

Thanks,
Troy

Aline Bernstein

Jun 8, 2010, 6:37:00 PM
to Open Standards for eDiscovery (OpSED)
Okay, one IndentationError was all it took to convince me that Python
is NOT the language for me. I haven't been forced into that corner
since COBOL in the 90s. I'll stick with CPL for my own programming.
But I'll be happy to be a tester for the rest of you Python gurus.

Troy Howard

Jun 8, 2010, 6:57:27 PM
to op...@googlegroups.com
Yeah... Python uses whitespace to determine scope. It's honestly the
one thing that I both love and hate about the language. Love it,
because it makes the code easier to read. Hate it, because it causes
problems like this. At my day job, we use C#, and that has all the
lovely curly braces from C/C++/Java.. Hard to say which is better.
Curly braces create a lot of noise, and it can sometimes be difficult
to know what level of depth/scope you're at with those as well.

One thing that can mess a lot of people up is that Python treats tabs
and spaces differently.. For example, in my text editor, when I hit
<tab> it will insert 4 spaces, not a <tab> character. In other
people's editors, it will insert a <tab> character.

If your text editor is inserting <tab> characters, and you tried to
edit a file that I made when I was using spaces, it would upset
Python. Your tab may look the same as my four spaces on screen, but
Python does not count it that way: Python 2 expands a <tab> to the
next multiple of eight columns, and Python 3 simply refuses a file
that mixes the two inconsistently. The easy way to deal with that is
to configure the text editor to always use spaces instead of tabs, or
vice-versa. Then the problem just goes away. I choose to use spaces,
because most text editors will insert spaces instead of tab
characters, so it has the greatest chance of avoiding this problem,
but that's still just a chance.
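
If you want to check whether a file has this problem, a few lines of
Python can count how each line is indented. This is only a sketch (the
function name indentation_report is made up, and it looks at every
line, including lines inside multi-line strings), but it is enough to
spot a file where tabs and spaces have been mixed.

```python
def indentation_report(source_text):
    """Count lines indented with tabs, with spaces, or with a mix."""
    counts = {'tabs': 0, 'spaces': 0, 'mixed': 0}
    for line in source_text.splitlines():
        stripped = line.lstrip(' \t')
        indent = line[:len(line) - len(stripped)]
        if not indent or not stripped:
            continue  # skip unindented lines and blank lines
        if '\t' in indent and ' ' in indent:
            counts['mixed'] += 1
        elif '\t' in indent:
            counts['tabs'] += 1
        else:
            counts['spaces'] += 1
    return counts
```

A file is in good shape when only one of the 'tabs' and 'spaces'
counts is non-zero and 'mixed' is zero. The standard library also
ships a more careful checker, tabnanny, which can be run as
"python -m tabnanny yourfile.py".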

It's a common gotcha with Python, but it's mostly due to the way
that various text editors all work differently by default.

Anyhow, a user/tester/discussion partner is just as useful as a
programmer to OpSED. Many of our projects have nothing to do with
writing code, and those are in fact, our more important projects. The
tools project is just something that happened to come up. That
project's original focus was just a place to compile a list of
existing tools and how to use them, with the thought that we could
write tools when there wasn't already an existing free/open source one
available.

There are a lot of people out there who don't know our industry, but
do know Python, who could write the code for these kinds of tools. But
only people like yourself can describe what kinds of tools will be
most useful to you, and how you want them to work. Of course, it's
ideal if you can also write or modify your tools, because that gives
you more power as a user.

Thanks,
Troy
