The idea of the open-source program that I sent before

alaa hamed

Oct 2, 2010, 7:07:18 AM
to 3_...@googlegroups.com
// Details: Reads in the list of documents and dynamically allocates a table of document entries, one for
//   each document. It then reads in each document, converts each word to a 32-bit hash-coded value
//   and dynamically allocates space to store those lists of hash-coded word values. Each document
//   entry ends up with pointers to (1) a sequential list of the hash-coded word values, (2) a sorted
//   list of those same hash-coded word values, and (3) a list of the word numbers associated with
//   the sorted hash-coded word values (so that the program can figure out where each of the
//   sorted hash-coded words actually appears in the document).
//   After it reads in all the documents and creates the hash-coded lists, the program begins to
//   compare documents. It selects two documents, left and right, and goes through their sorted
//   hash-coded word lists, beginning with the first entries for words of more than 3 characters.
//   It ignores words that don't appear in both documents. When it finds words common to both
//   documents, it searches around those words in the normal-order hash-coded word lists to see
//   how long the matching phrases are. If they are long enough, it marks the matches in its lists.
//   The program must treat redundant words carefully--words that appear more than once in either or
//   both documents. For each copy of a redundant word in the left document, it looks through all
//   copies of that redundant word in the right document. In general, the right document's counter
//   changes first and recycles when redundant words are encountered.
//   If the program finds more than the threshold number of matching words in two documents, it
//   generates two HTML files. It embeds the document text in each file, with HTML tags to
//   underline the matching text.
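
A rough sketch of the scheme described in the comment above, written in Python for brevity, not the program's actual code. Several things here are assumptions of mine and not from the program: CRC-32 stands in for the unspecified 32-bit hash, words are split on non-word characters, "long enough" is taken to be 3 words, and the names build_entry, find_matches and to_html are made up. Short words are simply left out of the sorted list, which simplifies "beginning with the first entries for words of more than 3 characters".

```python
import html
import re
import zlib

MIN_WORD_LEN = 4    # "words of more than 3 characters"
MIN_PHRASE_LEN = 3  # assumed value; the description only says "long enough"


def hash_word(word):
    # CRC-32 is a stand-in for the unspecified 32-bit hash.
    return zlib.crc32(word.lower().encode("utf-8")) & 0xFFFFFFFF


def build_entry(text):
    """Build one document entry: sequential hashes, sorted hashes, word numbers."""
    words = re.findall(r"\w+", text)
    hashes = [hash_word(w) for w in words]
    # Word numbers of the longer words, ordered by hash value.
    order = sorted(
        (i for i, w in enumerate(words) if len(w) >= MIN_WORD_LEN),
        key=lambda i: hashes[i],
    )
    return {
        "words": words,
        "hashes": hashes,                             # (1) sequential list
        "sorted_hashes": [hashes[i] for i in order],  # (2) sorted list
        "sorted_positions": order,                    # (3) word numbers
    }


def find_matches(left, right, min_phrase=MIN_PHRASE_LEN):
    """Return per-word match flags for the left and right documents."""
    lh, rh = left["hashes"], right["hashes"]
    ls, rs = left["sorted_hashes"], right["sorted_hashes"]
    lp, rp = left["sorted_positions"], right["sorted_positions"]
    lmark, rmark = [False] * len(lh), [False] * len(rh)
    i = j = 0
    while i < len(ls) and j < len(rs):
        if ls[i] < rs[j]:    # word only in the left document: skip it
            i += 1
        elif ls[i] > rs[j]:  # word only in the right document: skip it
            j += 1
        else:
            # Find the run of redundant copies of this word on each side.
            h = ls[i]
            i_end, j_end = i, j
            while i_end < len(ls) and ls[i_end] == h:
                i_end += 1
            while j_end < len(rs) and rs[j_end] == h:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):  # right counter changes first, then recycles
                    sl, sr = lp[a], rp[b]
                    # Walk back to the start of the matching phrase.
                    while sl > 0 and sr > 0 and lh[sl - 1] == rh[sr - 1]:
                        sl, sr = sl - 1, sr - 1
                    # Count matching words forward from that start.
                    n = 0
                    while (sl + n < len(lh) and sr + n < len(rh)
                           and lh[sl + n] == rh[sr + n]):
                        n += 1
                    if n >= min_phrase:
                        for k in range(n):
                            lmark[sl + k] = rmark[sr + k] = True
            i, j = i_end, j_end
    return lmark, rmark


def to_html(entry, marks):
    """Re-emit the words, underlining the matched ones."""
    return " ".join(
        f"<u>{html.escape(w)}</u>" if m else html.escape(w)
        for w, m in zip(entry["words"], marks)
    )


left = build_entry("the quick brown fox jumps over the lazy dog")
right = build_entry("a quick brown fox jumps high")
lm, rm = find_matches(left, right)
print(to_html(right, rm))  # a <u>quick</u> <u>brown</u> <u>fox</u> <u>jumps</u> high
```

The count of matching words would then be sum(lm), which can be compared against whatever threshold is chosen before writing the two HTML files. The sketch rebuilds plain word lists and does not preserve the original document layout, which the real program would need to do.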

alaa hamed

Oct 2, 2010, 7:14:36 AM
to 3_...@googlegroups.com
Fingerprinting :D:D:D:D


From: alaa hamed <alaa_hamed...@yahoo.com>
To: 3_...@googlegroups.com
Sent: Sat, October 2, 2010 4:07:18 AM
Subject: The idea of the opensource program that I sent before