sgrep-0.99 - A tool for searching structured text

Jani Jaakkola

May 1, 1996, 3:00:00 AM

-----BEGIN PGP SIGNED MESSAGE-----


The Document Management Group at the Department of Computer Science
of the University of Helsinki, Finland, proudly presents

---------------------------------------------------------------------------

SGREP v0.99 - A tool for searching files for structured patterns

---------------------------------------------------------------------------


INTRODUCTION
- ------------

If you have ever wondered how to

o Locate only TITLE and H1 .. H6 elements from HTML documents
o Remove all <FONT> tags from an HTML document
o Rename all B elements to STRONG elements
o Find out how many FIG elements there are under SUBPARA
elements but not under PARA elements in your SGML file
o Print out the TITLE elements from a set of HTML documents
in which the word 'SGML' is mentioned more than 12 times, or
which contain the word 'SGML' inside H1 or H2 elements
o Find the senders of those messages in a set of mail files
that contain the word 'SGML' in the subject line, do not
contain 'HTML' in the body, were sent in 1996, and were not
sent from the address fl...@hot.com

then sgrep is the tool for you.

Sgrep (structured grep) is a tool for searching text files and
filtering text streams using structural criteria. Sgrep implements
a query language based on so-called region expressions.
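
To give a rough feel for what a region expression computes, here is a
minimal Python sketch. It is not sgrep's query syntax and not its
implementation; the function names (phrase, between, containing) are
hypothetical and chosen only for illustration. A region is modelled as a
(start, end) pair of character offsets, and a query combines sets of regions.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) character offsets, end exclusive


def phrase(text: str, needle: str) -> List[Region]:
    """Return every occurrence of a literal string as a region."""
    regions = []
    pos = text.find(needle)
    while pos != -1:
        regions.append((pos, pos + len(needle)))
        pos = text.find(needle, pos + 1)
    return regions


def between(starts: List[Region], ends: List[Region]) -> List[Region]:
    """Pair each start region with the nearest end region that follows it."""
    result = []
    for s in starts:
        following = [e for e in ends if e[0] >= s[1]]
        if following:
            # min() picks the end region with the smallest start offset
            result.append((s[0], min(following)[1]))
    return result


def containing(outer: List[Region], inner: List[Region]) -> List[Region]:
    """Keep the outer regions that enclose at least one inner region."""
    return [o for o in outer
            if any(o[0] <= i[0] and i[1] <= o[1] for i in inner)]


doc = "<TITLE>SGML tools</TITLE><H1>Intro</H1><H1>About SGML</H1>"
h1 = between(phrase(doc, "<H1>"), phrase(doc, "</H1>"))
hits = containing(h1, phrase(doc, "SGML"))
print([doc[s:e] for s, e in hits])  # prints ['<H1>About SGML</H1>']
```

The point of the sketch is that every step takes sets of regions and
returns a set of regions, so the steps can be combined freely.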

Like grep, sgrep can be used on any kind of text file. However, it
is most useful for files containing some kind of structured text,
that is, text which obeys some syntax. Examples of structured text
files are SGML, HTML, C, TeX and mail files.

ENVIRONMENT
- -----------

Sgrep needs a Unix-like system to run. It has been tested on the following
platforms:
SunOS 5.4 sparc
Linux 1.3.85 alpha
Linux 1.2.13 intel, a.out binaries
Linux 1.2.13 intel, elf binaries
HP-UX 9000/735
OSF1 alpha

It has also been reported to run on
SGI/Irix 5.2

A macro preprocessor is very useful as a front end to sgrep.
The authors use m4, and the distribution package contains example macro
files written for m4. However, the C preprocessor or some other macro
processor could be used instead of m4.

COPYRIGHT
- ---------

Sgrep is distributed under the GNU General Public License.

WHERE CAN I FIND IT ?
- ---------------------

We have put up some WWW pages on sgrep at

http://www.cs.helsinki.fi/~jjaakkol/sgrep.html

The WWW pages also contain the queries that solve the problems
listed above.

Source for sgrep can be downloaded from

ftp://ftp.cs.helsinki.fi/pub/Software/Local

Sorry, there are no binary distributions (yet).
Send mail to jjaa...@cs.helsinki.fi if you have a problem that you
cannot solve yourself.

CREDITS
- -------

Sgrep was created by Jani Jaakkola (jjaa...@cs.helsinki.fi) and Pekka
Kilpeläinen (kilp...@cs.helsinki.fi).

We wish to thank Professor Heikki Mannila for suggesting that we design
and implement sgrep.

Sgrep is based upon the paper "An algebra for structured text search and
a framework for its implementation" by C. L. A. Clarke, G. V. Cormack and
F. J. Burkowski, The Computer Journal, 38(1):43-56, 1995.
A preliminary version of their paper is available from
ftp://cs-archive.uwaterloo.ca/cs-archive/CS-94-30
However, sgrep is not a strict implementation of the language of Clarke,
Cormack and Burkowski. Unlike their language, sgrep is able to deal
with nested regions, e.g., lists within lists (within lists ...).
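
As an illustration of what "nested regions" means, the following Python
sketch pairs opening and closing tags with a stack, so that an inner list
and the outer list enclosing it are both reported. This is an assumption
made for illustration only; it is not taken from the sgrep sources or from
the paper, and the function name nested_regions is hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) character offsets, end exclusive


def nested_regions(text: str, open_tag: str, close_tag: str) -> List[Region]:
    """Match each closing tag with the most recent unmatched opening tag."""
    stack: List[int] = []       # start offsets of tags still open
    regions: List[Region] = []
    pos = 0
    while True:
        next_open = text.find(open_tag, pos)
        next_close = text.find(close_tag, pos)
        if next_close == -1:
            break               # no more closing tags, nothing left to match
        if next_open != -1 and next_open < next_close:
            stack.append(next_open)
            pos = next_open + len(open_tag)
        else:
            if stack:           # ignore a closing tag with no open partner
                regions.append((stack.pop(), next_close + len(close_tag)))
            pos = next_close + len(close_tag)
    return sorted(regions)


doc = "<UL>a<UL>b</UL>c</UL>"
lists = nested_regions(doc, "<UL>", "</UL>")
print([doc[s:e] for s, e in lists])
# prints ['<UL>a<UL>b</UL>c</UL>', '<UL>b</UL>']
```

A scheme that only pairs each start tag with the next end tag would report
"<UL>a<UL>b</UL>" for the outer list; the stack is what keeps the outer
and inner lists apart.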

LSM entry
- ---------

Begin3
Title:          sgrep - A tool for searching files for structured patterns
Version:        0.99
Entered-date:   30Apr96
Description:    Sgrep is a convenient tool for making queries to almost
                any kind of text file with some well-known structure.
                These include programs, mail folders, news folders,
                HTML, SGML, etc. With relatively simple queries you
                can display mail messages by their subject or sender,
                extract titles, links, or any other regions from HTML
                files, extract function prototypes from C, or make
                complex queries to SGML files based on the DTD of the file.
Keywords:       text structure SGML HTML grep search
Author:         jjaa...@cs.helsinki.fi (Jani Jaakkola)
                kilp...@cs.helsinki.fi (Pekka Kilpelainen)
Maintained-by:  jjaa...@cs.helsinki.fi (Jani Jaakkola)
Primary-site:   ftp://ftp.cs.helsinki.fi/pub/Software/Local/Sgrep
                72kB sgrep-0.99.tar.gz
Alternate-site: sunsite.unc.edu /pub/Linux/utils/text
Platforms:      any Unix-like OS
Copying-policy: GPL
End

Enjoy!

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2i

iQCVAwUBMYdMyoQRll5MupLRAQEdDwQAsRDI3QKbWSrexB5MybZCrbXMB+0HrHlw
pxHVbZhbgN20OCD+ctQibdlNdSP4dx/0Jiqw47nF9pvv0tIxs470QItusZh0bttY
Hb5Or9IP5xDb7Ge80/uEHunWLQvXWrsLqKSww4JVwEL8NVU3oETCRnw3i3wvFlYw
WLQaOC9RDUU=
=qZ4B
-----END PGP SIGNATURE-----

--
This article has been digitally signed by the moderator, using PGP.
Finger wirz...@kruuna.helsinki.fi for PGP key needed for validating signature.
Send submissions for comp.os.linux.announce to: linux-a...@news.ornl.gov
PLEASE remember a short description of the software and the LOCATION.
