Fwd: [litsupport] DAT file extraction


Troy Howard

Jun 4, 2010, 1:00:20 AM
to op...@googlegroups.com
Here's another opportunity that cropped up on the litsupport mailing list to create a simple tool that would be quite handy to a lot of people. As usual, no one is stepping forward and just providing the tool. Jim probably typed more text in his email talking about how easy it would be to write in Perl than it would have taken to just make the tool in the first place.

This is where OpSED can really close the gap. The fact is, as much as Jim would like him to, the original poster is not likely to download Perl and figure it out... or TextPipe Pro, for that matter. Someone, somewhere is going to cash in on doing this trivial task, and when they are done invoicing their billable hours, they are not going to publish their tools or methods for general reuse, and the world will not be bettered for it.

This is one of the many problems in our industry that OpSED is meant to be a solution for.

This would be a great thing to add into the Open Tools project, and we could put it together really quickly, as it's a very simple task.

Anyone want to take a crack at whipping up a Python version of this? 

Thanks,
Troy

---------- Forwarded message ----------
From: Jim Monty <jim....@yahoo.com>
Date: Wed, Jun 2, 2010 at 10:59 AM
Subject: Re: [litsupport] DAT file extraction
To: litsu...@yahoogroups.com


 

Tim Piganelli asked:


> Has anyone heard of a tool that will extract the text out of a DAT file?
> I know you can get Multi-Page Text files exported out of Concordance
> once the DAT file is loaded. I am trying to bypass loading the data into
> Concordance and try to get the Text field "parsed" out and put into a
> text file.
>
> Any ideas?

Tim,

I routinely use simple AWK and Perl scripts to do this. Perl and other modern scripting languages (AWK, Python, Ruby, PowerShell) excel at simple text and file processing tasks like this one: extracting the plain text of whole documents embedded within a character-separated value (CSV) file (e.g., a Concordance DAT file) and putting the text in individual document files on the file system with rational names derived from data in the same CSV file (e.g., beginning Bates numbers).

There's a commercial software application called TextPipe Pro you can use to accomplish this same task. However, TextPipe Pro is a GUI application and so doesn't support standard I/O streams (stdin/stdout/stderr, redirection and pipelines). It has a steep learning curve. To do useful things with it, you must use filters, and these filters are written in VBScript and JScript, scripting languages that do *not* excel at text processing tasks compared to Perl and other languages. In my opinion, if you're going to spend the time it takes to learn a text processing tool, you'd be better off in the long run learning AWK or one of its descendants (Perl, Python, Ruby) instead of TextPipe Pro.

AWK is easy to learn compared to other options. You can quickly begin wielding it to do very useful things. For example, a trivial version of an AWK script to extract the text of whole documents from a Concordance DAT file would probably be less than twenty lines of code and would be very understandable, even to a novice programmer.
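
For comparison, here is a minimal sketch in Python (the language asked for at the top of this thread) of the parsing step such a script needs. Nothing in the thread states the delimiters, so the sketch assumes the usual Concordance defaults: ASCII 20 between fields, the thorn character (ASCII 254) around each value, and ASCII 174 standing in for line breaks inside a field. It also assumes one record per physical line and a header row. The field names `BEGDOC` and `TEXT` and the function name `parse_dat` are made up for illustration.

```python
import io

# Assumed Concordance defaults; none of these are stated in the thread.
FIELD_SEP = "\x14"   # field separator (ASCII 20)
QUOTE = "\xfe"       # text qualifier, the thorn character (ASCII 254)
NEWLINE = "\xae"     # stand-in for line breaks inside a field (ASCII 174)


def parse_dat(lines):
    """Yield one dict per record, keyed by the header row's field names."""
    header = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the separator, strip the qualifier from each value,
        # and restore real line breaks inside the value.
        values = [
            v.strip(QUOTE).replace(NEWLINE, "\n")
            for v in line.split(FIELD_SEP)
        ]
        if header is None:
            header = values  # first non-blank line holds the field names
        else:
            yield dict(zip(header, values))


# Tiny in-memory sample with hypothetical field names and content.
sample = io.StringIO(
    "\xfeBEGDOC\xfe\x14\xfeTEXT\xfe\n"
    "\xfeABC0001\xfe\x14\xfeFirst line\xaeSecond line\xfe\n"
)
records = list(parse_dat(sample))
```

A real DAT file may use different delimiters or a different encoding, so the three constants would need checking against the actual export before relying on this.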

AWK, Perl, Python and Ruby are all free (free-as-in-freedom and free-as-in-free-beer) and available for all modern operating systems (Microsoft Windows, Mac OS X, Unix/Linux). TextPipe Pro costs money, is proprietary, and is only available for Windows.

I hope this suggestion helps you.

--
Jim Monty
Tempe, AZ

Copyright 2008.  The Litigation Support Mailing List

Post message: litsu...@yahoogroups.com
List owner: litsuppo...@yahoogroups.com

To subscribe, unsubscribe or change delivery options, please go to:

http://groups.yahoo.com/group/litsupport


Caleb

Jun 4, 2010, 12:06:25 PM
to Open Standards for eDiscovery (OpSED)
Sounds like a Saturday afternoon hack to me... I just might have to do
this. Heck, I've got half of this written already (the part to parse
the DAT file) - to write the results out to named text files should be
a snap.

Now, if I only had a sample DAT file... hmm?
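
The "write the results out to named text files" half could be sketched roughly as follows. This is only an illustration, not anyone's actual tool: it assumes the parsed records arrive as dictionaries, and the field names `BEGDOC` and `TEXT`, the function name `write_text_files`, and the demo records are all hypothetical.

```python
import re
import tempfile
from pathlib import Path


def write_text_files(records, out_dir, name_field="BEGDOC", text_field="TEXT"):
    """Write each record's text to <out_dir>/<name>.txt and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        # Replace anything that is unsafe in a file name with an underscore.
        name = re.sub(r"[^A-Za-z0-9._-]", "_", record[name_field])
        path = out_dir / f"{name}.txt"
        path.write_text(record[text_field], encoding="utf-8")
        written.append(path)
    return written


# Made-up records standing in for parsed DAT rows.
demo_records = [
    {"BEGDOC": "ABC0001", "TEXT": "Body of the first document.\n"},
    {"BEGDOC": "ABC 0002", "TEXT": "Body of the second document.\n"},
]
demo_dir = tempfile.mkdtemp()
paths = write_text_files(demo_records, demo_dir)
```

Naming each file after the beginning Bates number follows Jim's suggestion. The sketch does not handle two records that share the same number, or a record with an empty one.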