htdig: htsearch logging problem

Doug

Jan 22, 1999, 3:00:00 AM
Now that things are running the way I want them, I started looking at
some of the other features. When I enable the logging feature htsearch
dumps core every time. Here is the relevant info:

57$ uname -a
SunOS 5.6 Generic_105181-06 sun4m sparc sun4m

htdig 3.1.0b4 compiled with gcc 2.8.1 and libstdc++ 2.8.1

Here is the relevant info from the core file:

#0 0xef624614 in strlen ()
(gdb) where
#0 0xef624614 in strlen ()
#1 0xef65a4ec in _doprnt ()
#2 0xef663990 in vsnprintf ()
#3 0xef64fe84 in _vsyslog ()
#4 0xef64faa8 in syslog ()
#5 0x2e2b4 in Display::logSearch (this=0xefffea38, page=1,
matches=0x94260)
at Display.cc:1044
#6 0x2b3f4 in Display::display (this=0xefffea38, pageNumber=1)
at Display.cc:175
#7 0x2fe98 in main (ac=1001616, av=0xefffef30) at htsearch.cc:295


If anyone has any ideas I'm all ears. :) This isn't crucial because we
can always use GET, but it would be nice to have more info available.

Thanks,

Doug
----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-...@sdsu.edu containing the single word "unsubscribe" in
the body of the message.

Gilles Detillieux

Jan 22, 1999, 3:00:00 AM
Just a hunch, but do the core dumps happen only when testing htsearch
from the command line, or do they happen when running searches from
your browser too? If it's just from the command line, it may be that
your syslog is dying when it tries to access the NULLs returned by
getenv() for variables that are undefined (REMOTE_HOST & HTTP_REFERER).

When run by the http server, these variables should be defined, so I
can't imagine what else would be the problem, if htsearch dumps core
even then. If this is the case, you'll need to poke around in logSearch()
to see what's failing. Given that the problem is happening in strlen(),
called indirectly from syslog(), the most likely culprit is a NULL string
pointer, i.e. char * 0 (as opposed to an empty string), being passed as
an argument to syslog().
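
A minimal sketch of the guard described above, assuming nothing about htsearch's real code: the helper names `env_or` and `format_log_line` are hypothetical, and `snprintf` stands in for `syslog()` so the result can be inspected. The point is only that a `%s` argument must never be a NULL pointer.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of ht://Dig): return the value of an
// environment variable, or the given fallback when getenv() yields NULL.
const char* env_or(const char* name, const char* fallback)
{
    const char* value = std::getenv(name);
    return value != nullptr ? value : fallback;
}

// Format a shortened log line the way a printf-style logger would.
// Both arguments must be valid C strings; passing NULL for a %s
// conversion is undefined behaviour and is what crashes in strlen().
std::string format_log_line(const char* host, const char* referer)
{
    char buf[512];
    std::snprintf(buf, sizeof buf, "%s -- %s", host, referer);
    return std::string(buf);
}
```

Called as `format_log_line(env_or("REMOTE_HOST", "-"), env_or("HTTP_REFERER", "-"))`, this yields a placeholder dash for each unset variable instead of handing a NULL to the formatter.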

--
Gilles R. Detillieux E-mail: <grd...@scrc.umanitoba.ca>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930

Doug

Jan 22, 1999, 3:00:00 AM
Gilles Detillieux wrote:
>
> Just a hunch, but do the core dumps happen only when testing htsearch
> from the command line, or do they happen when running searches from
> your browser too? If it's just from the command line, it may be that
> your syslog is dying when it tries to access the NULLs returned by
> getenv() for variables that are undefined (REMOTE_HOST & HTTP_REFERER).
>
> When run by the http server, these variables should be defined, so I
> can't imagine what else would be the problem, if htsearch dumps core
> even then. If this is the case, you'll need to poke around in logSearch()
> to see what's failing. Given that the problem is happening in strlen(),
> called indirectly from syslog(), the most likely culprit is a NULL string
> pointer, i.e. char * 0 (as opposed to an empty string), being passed as
> an argument to syslog().

*Nod* I suspected this to be the case, and no, I'm not doing it from
the command line, I'm doing it with a real browser, search page, etc. On
the same hunch I tried commenting out the empty "restrict" and "exclude"
variables in the form, but that didn't help either. As for other
possibly unset variables, we're using Apache, so my understanding is
that all of the "normal" variables should be filled in. When I run
Apache's "test.cgi" as a CGI script, all the environment variables it
lists have values. Although come to think of it... our servers don't
always fill in REMOTE_HOST; in fact it's not always possible to fill it
in if the IP doesn't have a PTR record or the record is misconfigured.
Also, the lookup is a toggle in Apache which defaults to OFF. So if
htdig's logging routine relies on REMOTE_HOST having a value, that
should probably be conditionalized.
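
One way the logging side could be made conditional, as a rough sketch only: the function name `client_host` is hypothetical and this is not htsearch's code. It assumes the server may leave REMOTE_HOST unset (for example when Apache's HostnameLookups directive is off) and falls back to REMOTE_ADDR, then to a "-" placeholder. Treating an empty value like an unset one is an extra assumption made here.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: pick the best available client identifier for a
// log line. Tries REMOTE_HOST first, then REMOTE_ADDR, and finally
// returns "-" so the caller never sees a NULL pointer.
std::string client_host()
{
    const char* names[] = {"REMOTE_HOST", "REMOTE_ADDR"};
    for (const char* name : names) {
        const char* value = std::getenv(name);
        // Assumption: an empty value is as useless as a missing one.
        if (value != nullptr && *value != '\0')
            return value;
    }
    return "-";
}
```

The returned string can then be passed to a printf-style logger via `client_host().c_str()`.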

If there is something else I can take a look at I'll be glad to,
however now that this project is basically finished the boss is
redirecting my attention to other areas. If someone comes up with a
patch I could test it to see if it helps, but I won't be able to work on
it myself (which isn't saying much given my lack of C++ ability :).

On a slightly related note, once the webmaster puts the final touches
on the nooks and crannies our site's search engine is going to go
public, probably this upcoming Monday. Is this the appropriate forum to
announce a "Hey, take a look at this!" type post, and if not can someone
point me to one? :) We'd also like to be included in the list of sites
that use htdig if that's appropriate.... the boss is *really* happy with
the end product, so once again my thanks.

Doug

Geoff Hutchison

Jan 22, 1999, 3:00:00 AM
At 7:37 PM -0400 1/21/99, Doug wrote:
>patch I could test it to see if it helps, but I won't be able to work on
>it myself (which isn't saying much given my lack of C++ ability :).

This is from the latest CVS tree, so it will probably require some fuzz to
apply correctly:

diff -u -r1.35 -r1.39
--- htdig3/htsearch/Display.cc 1999/01/20 19:18:54 1.35
+++ htdig3/htsearch/Display.cc 1999/01/22 04:40:57 1.39
@@ -1263,21 +1278,28 @@
void
Display::logSearch(int page, List *matches)
{
- // Currently unused char *env_host;
// Currently unused time_t t;
int nMatches = 0;
int level = LOG_LEVEL;
int facility = LOG_FACILITY;
+ char *host = getenv("REMOTE_HOST");
+ char *ref = getenv("HTTP_REFERER");
+
+ if (host == NULL)
+ host = getenv("REMOTE_ADDR");
+
+ if (ref == NULL)
+ *ref = '-';

if (matches)
nMatches = matches->Count();

openlog("htsearch", LOG_PID, facility);
syslog(level, "%s [%s] (%s) [%s] [%s] (%d/%s) - %d -- %s\n",
- getenv("REMOTE_HOST"),
+ host,
input->exists("config") ? input->get("config") : "default",
config["match_method"], input->get("words"), logicalWords.get(),
nMatches, config["matches_per_page"],
- page, getenv("HTTP_REFERER")
+ page, ref
);
}

>announce a "Hey, take a look at this!" type post, and if not can someone
>point me to one? :) We'd also like to be included in the list of sites
>that use htdig if that's appropriate.... the boss is *really* happy with

Sure you can announce that. I don't think anyone will mind. As for the
list, I suggest filling out the handy form on
http://www.htdig.org/uses.html ;-)


-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

Doug

Jan 23, 1999, 3:00:00 AM
Doug wrote:
>
> Now that things are running the way I want them, I started looking at
> some of the other features. When I enable the logging feature htsearch
> dumps core every time.

Ok, Gilles' patch fixed the logging... thanks once again. :) Here is
the complete info on the changes I made for our version of htdig. If
anyone spots something really silly, please don't hesitate to let me
know.

Doug
Attachment: SN-htdig.diff

diff -cr htdig-3.1.0b4-stock/htlib/URL.cc htdig-3.1.0b4/htlib/URL.cc
*** htdig-3.1.0b4-stock/htlib/URL.cc Thu Dec 24 09:20:20 1998
--- htdig-3.1.0b4/htlib/URL.cc Thu Jan 21 12:32:17 1999
***************
*** 137,142 ****
--- 137,145 ----
// Thanks goes to David Filiatrault <d...@WebThreads.Com> for suggesting
// this removal process.
//
+
+ char *stupid_question_mark = strchr(ref, '?');
+
char *anchor = strchr(ref, '#');
char *params = strchr(ref, '?');
if (anchor)
***************
*** 155,160 ****
--- 158,171 ----
}
}

+
+ // Well, if that works for anchors will it work for the stinkin' ?'s
+ if (stupid_question_mark)
+ {
+ *stupid_question_mark = '\0';
+ }
+
+
//
// If, after the removal of a possible '#' we have nothing left,
// we just want to use the base URL.
***************
*** 277,285 ****
--- 288,303 ----
// Ignore any part of the URL that follows the '#' since this is just
// an index into a document.
//
+
+ char *stupid_question_mark = strchr(nurl, '?');
char *p = strchr(nurl, '#');
if (p)
*p = '\0';
+
+ if (stupid_question_mark)
+ {
+ *stupid_question_mark = '\0';
+ }

//
// Extract the service
diff -cr htdig-3.1.0b4-stock/htsearch/Display.cc htdig-3.1.0b4/htsearch/Display.cc
*** htdig-3.1.0b4-stock/htsearch/Display.cc Tue Dec 22 18:15:39 1998
--- htdig-3.1.0b4/htsearch/Display.cc Fri Jan 22 15:47:11 1999
***************
*** 114,119 ****
--- 114,120 ----
#include <ctype.h>
#include <syslog.h>

+ extern char *sn_query_string;

//*****************************************************************************
//
***************
*** 351,356 ****
--- 352,359 ----
int matchesPerPage = config.Value("matches_per_page");
int nPages = (nMatches + matchesPerPage - 1) / matchesPerPage;

+ vars.Add("SN_QUERY_STRING", new String(sn_query_string));
+
if (nPages < 1)
nPages = 1; // We always have at least one page...
vars.Add("MATCHES_PER_PAGE", new String(config["matches_per_page"]));
***************
*** 498,503 ****
--- 501,511 ----
s << "format=" << input->get("format") << '&';
if (input->exists("matchesperpage"))
s << "matchesperpage=" << input->get("matchesperpage") << '&';
+
+ // Add query string
+ if (sn_query_string)
+ s << "SN_query_string=" << sn_query_string << '&';
+
if (input->exists("words"))
s << "words=" << input->get("words") << '&';
s << "page=" << pageNumber;
***************
*** 1023,1037 ****
int level = LOG_LEVEL;
int facility = LOG_FACILITY;

if (matches)
nMatches = matches->Count();

openlog("htsearch", LOG_PID, facility);
syslog(level, "%s [%s] (%s) [%s] [%s] (%d/%s) - %d -- %s\n",
! getenv("REMOTE_HOST"),
input->exists("config") ? input->get("config") : "default",
config["match_method"], input->get("words"), logicalWords.get(),
nMatches, config["matches_per_page"],
! page, getenv("HTTP_REFERER")
);
}
--- 1031,1056 ----
int level = LOG_LEVEL;
int facility = LOG_FACILITY;

+ char *host = getenv("REMOTE_HOST");
+ char *ref = getenv("HTTP_REFERER");
+
+ if (host == NULL)
+ host = getenv("REMOTE_ADDR");
+ if (host == NULL)
+ host = "-";
+
+ if (ref == NULL)
+ ref = "-";
+

if (matches)
nMatches = matches->Count();

openlog("htsearch", LOG_PID, facility);
syslog(level, "%s [%s] (%s) [%s] [%s] (%d/%s) - %d -- %s\n",
! host,
input->exists("config") ? input->get("config") : "default",
config["match_method"], input->get("words"), logicalWords.get(),
nMatches, config["matches_per_page"],
! page, ref
);
}
diff -cr htdig-3.1.0b4-stock/htsearch/htsearch.cc htdig-3.1.0b4/htsearch/htsearch.cc
*** htdig-3.1.0b4-stock/htsearch/htsearch.cc Tue Dec 22 17:53:16 1998
--- htdig-3.1.0b4/htsearch/htsearch.cc Fri Jan 22 15:59:15 1999
***************
*** 100,105 ****
--- 100,106 ----
int debug = 0;
int minimum_word_length = 3;

+ char *sn_query_string;

//*****************************************************************************
// int main()
***************
*** 218,223 ****
--- 219,227 ----
if (input.exists("keywords"))
requiredWords.Create(input["keywords"], " \t\r\n");

+ if (input.exists("SN_query_string"))
+ sn_query_string = input["SN_query_string"];
+
minimum_word_length = config.Value("minimum_word_length", minimum_word_length);

Parser *parser = new Parser();

Gilles Detillieux

Jan 23, 1999, 3:00:00 AM
I've GOT to get in the habit of reading all my mail before responding to
it. I was all set to post a patch for this, when I saw you had beat me
to it. Your patch is a little cleaner than mine, but I saw a couple
problems with it. First of all, if REMOTE_HOST & REMOTE_ADDR are both
undefined, host will remain NULL. Secondly, if ref is NULL, you can't
assign to *ref. Here's my amended patch, applied to 3.1.0b4:

--- htsearch/Display.cc.envbug Tue Dec 22 20:15:39 1998
+++ htsearch/Display.cc Fri Jan 22 10:41:57 1999
@@ -1017,21 +1017,30 @@
void
Display::logSearch(int page, List *matches)
{
- // Currently unused char *env_host;
// Currently unused time_t t;
int nMatches = 0;
int level = LOG_LEVEL;
int facility = LOG_FACILITY;
+ char *host = getenv("REMOTE_HOST");
+ char *ref = getenv("HTTP_REFERER");
+
+ if (host == NULL)
+ host = getenv("REMOTE_ADDR");
+ if (host == NULL)
+ host = "-";
+
+ if (ref == NULL)
+ ref = "-";

if (matches)
nMatches = matches->Count();

openlog("htsearch", LOG_PID, facility);
syslog(level, "%s [%s] (%s) [%s] [%s] (%d/%s) - %d -- %s\n",
- getenv("REMOTE_HOST"),
+ host,
input->exists("config") ? input->get("config") : "default",
config["match_method"], input->get("words"), logicalWords.get(),
nMatches, config["matches_per_page"],
- page, getenv("HTTP_REFERER")
+ page, ref
);
}

--

Gilles R. Detillieux E-mail: <grd...@scrc.umanitoba.ca>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930