Program for comparing Chinese texts


Morten Schlütter

Feb 25, 2015, 1:31:02 PM
to chine...@googlegroups.com, Morten Schlutter
Hi all,

Does anyone have suggestions for a program that will run a comparison of two Chinese texts? I am looking for matching character strings; the texts themselves are quite different.

Thanks,

Morten
---
Morten Schlütter
Associate Professor of Chinese Religion
Department of Religious Studies

Director, Center for Asian and Pacific Studies
International Programs

The University of Iowa
Iowa City, IA 52242

Ph.: 319-335-2165



Jens Østergaard Petersen

Feb 25, 2015, 2:33:47 PM
to Morten Schlütter, chine...@googlegroups.com
--
--
You received this message because you are subscribed to the Chinese Mac group.
For answers to frequently-asked questions, visit http://www.yale.edu/chinesemac
To start a new topic, send a new message to chine...@googlegroups.com
To unsubscribe, send a message to chinesemac-...@googlegroups.com
For more options, visit http://groups.google.com/group/chinesemac

D G Rossiter

Feb 26, 2015, 9:49:51 AM
to chine...@googlegroups.com, morten-s...@uiowa.edu
I think the Unix "diff" command would work fine. I use it within Emacs, but it is also a standalone program. Since Mac OS X is based on Darwin (a Unix variant), it is built in and you can run it from the Terminal.
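Because diff compares line by line, and two Chinese texts rarely share line breaks, one workaround is to put each character on its own line before diffing. A minimal sketch of that idea, assuming both texts are already loaded as Unicode strings; it uses Python's standard `difflib.unified_diff` rather than the Unix diff binary, and `char_diff` is a hypothetical helper name:

```python
import difflib


def char_diff(a: str, b: str) -> list[str]:
    """Diff two texts character by character.

    Each character is treated as one "line", which mimics writing one
    character per line and then running an ordinary line-based diff.
    """
    return list(difflib.unified_diff(list(a), list(b),
                                     fromfile="a", tofile="b", lineterm=""))


if __name__ == "__main__":
    for line in char_diff("學而時習之", "學而時溫之"):
        print(line)
```

In the output, lines prefixed with `-` and `+` mark characters that differ, and lines prefixed with a space are shared context.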

Jens Østergaard Petersen

Feb 26, 2015, 10:47:58 AM
to D G Rossiter, chine...@googlegroups.com, morten-s...@uiowa.edu
A diff shows differences, but Morten wants matches. Plagiarism checkers find matches, even fuzzy ones. Perhaps what Morten is thinking of is the algorithm that finds parallel passages on ctext.org <http://ctext.org/tools/parallel-passages>.

A lot of Mac apps can do diffs – for instance, TextWrangler <http://www.barebones.com/products/textwrangler/>.

Jens

D G Rossiter

Feb 26, 2015, 10:50:01 AM
to chine...@googlegroups.com, morten-s...@uiowa.edu
Well yes, but non-differences are matches!!
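The "non-differences" can be pulled out directly. A sketch using Python's `difflib.SequenceMatcher`, whose `get_matching_blocks()` returns the runs two sequences have in common; `common_runs` and the `min_len` threshold of 3 characters are illustrative assumptions, and the sample strings are made up for the example:

```python
import difflib


def common_runs(a: str, b: str, min_len: int = 3) -> list[str]:
    """Return the shared character runs that a diff would leave unmarked.

    SequenceMatcher aligns the two texts in order and reports the blocks
    they share; runs shorter than min_len are dropped as likely noise.
    """
    # autojunk=False keeps frequent characters from being ignored in long texts.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    runs = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_len:
            runs.append(a[block.a:block.a + block.size])
    return runs


if __name__ == "__main__":
    print(common_runs("子曰學而時習之不亦說乎", "孔子曰學而時習之人不知而不慍"))
```

Because the alignment is order-preserving, a passage that appears early in one text and late in the other may not be reported.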


On Wednesday, 25 February 2015 13:31:02 UTC-5, Morten Schlütter wrote:

Jens Østergaard Petersen

Feb 26, 2015, 11:36:00 AM
to D G Rossiter, chine...@googlegroups.com, morten-s...@uiowa.edu
Yes, but "the texts themselves are quite different," as Morten writes, and diff checks line by line and then character by character. What if the lines are all jumbled? Applications like QuotationFinder go beyond a simple diff, using different techniques to find matches anywhere in the text.
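Matching "anywhere in the text" can be approximated by indexing every n-character window of one text and looking up each window of the other, so the order of passages does not matter. This is a minimal sketch of exact n-gram matching, not how QuotationFinder or ctext.org actually work; `shared_passages` and the default `n=4` are hypothetical choices, and fuzzy matches would need additional techniques:

```python
from collections import defaultdict


def shared_passages(a: str, b: str, n: int = 4) -> list[tuple[int, int, str]]:
    """Find passages of at least n characters shared by a and b, in any order.

    Every n-character window of b is indexed; each window of a is looked up,
    and each hit is extended to the right as far as the texts keep agreeing.
    Returns (start_in_a, start_in_b, passage) tuples, ordered by position in a.
    """
    # Map each n-gram of b to every position where it starts.
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(b) - n + 1):
        index[b[j:j + n]].append(j)

    results = []
    for i in range(len(a) - n + 1):
        for j in index.get(a[i:i + n], []):
            # Skip hits that continue a match starting one character earlier;
            # that longer match is reported from its own starting point.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the match to the right while the characters agree.
            k = n
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            results.append((i, j, a[i:i + k]))
    return results


if __name__ == "__main__":
    text_a = "學而時習之不亦說乎有朋自遠方來"
    text_b = "有朋自遠方來人不知而不慍學而時習之"
    for start_a, start_b, passage in shared_passages(text_a, text_b):
        print(start_a, start_b, passage)
```

Both shared passages are found even though they occur in opposite order in the two sample texts. Raising `n` suppresses short coincidental overlaps such as a single shared 不; lowering it finds shorter parallels at the cost of more noise.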

TenThousandThings

Feb 26, 2015, 12:16:09 PM
to chine...@googlegroups.com, cyru...@gmail.com, morten-s...@uiowa.edu
On Thursday, February 26, 2015 at 10:47:58 AM UTC-5, Jens Østergaard Petersen wrote:
Perhaps what Morten is thinking of is the algorithm which finds parallel passages in ctext.org <http://ctext.org/tools/parallel-passages>.

Note that you can submit texts to the project here:


Not sure what Sturgeon's attitude is toward whatever you might want to add, but the requirements don't mention any restrictions on the types of texts:


Might make sense to contact him and ask for advice:

TenThousandThings

Feb 26, 2015, 12:19:18 PM
to chine...@googlegroups.com, morten-s...@uiowa.edu

This isn't what you are looking for, but I think it is worth mentioning here. AntConc is a Mac OS X-friendly concordance generator, which has recently added a basic segmentation tool for Japanese and Chinese texts:


AntConc homepage: http://www.laurenceanthony.net/software/antconc/


The segmentation tool is here: 


http://www.laurenceanthony.net/software/segmentant/


I wouldn't bet the house on it working well with Chinese religious texts, but you never know. I think this stuff is geared toward linguistics and translation research. The developer is based in Japan at Waseda University -- he is currently active and seems open to suggestions.
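For readers unfamiliar with concordancers: a keyword-in-context (KWIC) listing shows every occurrence of a search string with a few characters on either side. Working on raw characters sidesteps word segmentation entirely. A minimal sketch, not AntConc's implementation; `kwic` and its `width` parameter are hypothetical, and the ideographic space (U+3000) is used for padding so short left contexts line up in a monospaced CJK font:

```python
def kwic(text: str, keyword: str, width: int = 5) -> list[str]:
    """Return a keyword-in-context line for every occurrence of keyword.

    Operates on raw characters, so no word segmentation is needed.
    width is the number of characters of context shown on each side.
    """
    lines = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        # Pad short left contexts with ideographic spaces so keywords line up.
        left = text[max(0, start - width):start].rjust(width, "\u3000")
        right = text[end:end + width]
        lines.append(f"{left}【{keyword}】{right}")
        start = text.find(keyword, start + 1)
    return lines


if __name__ == "__main__":
    sample = "學而時習之不亦說乎有朋自遠方來不亦樂乎人不知而不慍不亦君子乎"
    for line in kwic(sample, "不亦", width=3):
        print(line)
```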
