subfolders


thealexangroup

Feb 4, 2008, 10:04:15 AM
to SOFTplus GSiteCrawler
how do i set up GSiteCrawler to spider subfolders? i was able to do
this before, but for some reason i now keep getting "error 403" if i have GSC
crawl a specific folder with folders within it. prior to this, i did
not make any changes other than to add a robots.txt and uncheck
"import known pages to google..." AND "delete all manual links".

Now GSC works if i go to each individual subfolder (each with only
files within) but it would take forever to do it this way. (8 main
folders hold approx 30-80 subfolders)

alx

Chris Wright

Feb 4, 2008, 10:53:13 AM
to gsitec...@googlegroups.com
thealexangroup wrote:
> how do i set up GSiteCrawler to spider subfolders? i was able to do
> this before, but for some reason i now keep getting "error 403" if i have GSC
> crawl a specific folder with folders within it. prior to this, i did
> not make any changes other than to add a robots.txt and uncheck
> "import known pages to google..." AND "delete all manual links".
>
Did you verify if directory indexing has been disabled?

> Now GSC works if i go to each individual subfolder (each with only
> files within) but it would take forever to do it this way. (8 main
> folders hold approx 30-80 subfolders)
>
>
If I am following your problem above, it is the manual entering of these
sub-folders that is taking the time.

So...

In the links within your current site structure, I am assuming you have
references to http://www.example.com/sub-folder/
rather than a direct reference to a file such as
http://www.example.com/sub-folder/file1.php

Do you have an index.*** in each of the sub-folders?
(You can instruct your web server to first look for index.htm then
index.html then index.php and so on and so on)

If you don't have an index file in your sub-folder and directory
indexing is disabled, then when you tell GSC to go look in that
sub-folder your web server will return a 403 error code.
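
To make that concrete, here is a rough Python sketch of how a web server typically decides what to send back for a folder URL. It is only an illustration, not GSC's code or any real server's code; the function name `resolve_directory_request` and its `index_names` and `autoindex` parameters are made up for the example.

```python
def resolve_directory_request(files,
                              index_names=("index.htm", "index.html", "index.php"),
                              autoindex=False):
    """Return (status, served) for a request to a folder URL.

    files       -- names of the entries in the folder (hypothetical input)
    index_names -- index files the server looks for, in order
    autoindex   -- True if directory indexing (a generated listing) is enabled
    """
    # First try each configured index file, in order.
    for name in index_names:
        if name in files:
            return 200, name
    # No index file: fall back to a generated listing if that is allowed.
    if autoindex:
        return 200, "<directory listing>"
    # No index file and no listing allowed: the request is refused.
    return 403, None


# A folder that only holds sub-folders and no index file is refused:
print(resolve_directory_request({"loans", "cars"}))                  # (403, None)
# The same folder with directory indexing enabled returns a listing:
print(resolve_directory_request({"loans", "cars"}, autoindex=True))  # (200, '<directory listing>')
# A sub-folder that has an index.php is served normally:
print(resolve_directory_request({"index.php", "page2.php"}))         # (200, 'index.php')
```

That matches what you describe: the sub-folders with an index file crawl fine, while a parent folder with neither an index file nor a listing comes back as 403.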

If you have http://www.example.com/sub-folder/file1.php as your 'main
page' in that sub-folder, you can also instruct your web server to look
for that file INSTEAD OF or AS WELL AS the index.*** file.

But that is only any good if you always call it file1.php in ALL your
sub-folders.

But, you could always just enable indexing for the whole domain for the
time being...

Because I've not seen your domain and your current setup, I'm still
having to guess at the problem.
I'd like to run GSC and Xenu from here, if that is OK with you?

Regards

Chris


soji san

Feb 4, 2008, 11:45:59 AM
to gsitec...@googlegroups.com
hey chris,
 
i was wondering if you were going to get this because i didn't send it as a reply to you.
anyway, sorry if i'm babbling all over the place. i should actually explain what i'm trying to do first of all.
 
i'm not using GSC to actually make a sitemap. i'm using it to build a list of the urls in particular folders, because it seems to be the fastest way to get all my urls into one text list (yahoo urllist). so the only filters i used were to ban the external google urls that were attached to some (most) of my file links, and to keep only the php files.
 
incidentally, the folder/files i am trying to access are adsense article pages which will total roughly 30k file pages. i need each url listed.
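
roughly, the filtering described above could be sketched like this in python. this is only an illustration and not how GSC does it internally; the function name `keep_php_urls` and the sample urls are made up for the example.

```python
from urllib.parse import urlparse


def keep_php_urls(urls, host):
    """Keep only URLs on `host` whose path ends in .php; drop external links."""
    kept = []
    for url in urls:
        parts = urlparse(url)
        if parts.netloc.lower() != host.lower():
            continue  # external link (for example an ad URL), skip it
        if parts.path.lower().endswith(".php"):
            kept.append(url)
    return kept


# Hypothetical sample data, just to show the shape of the result.
sample = [
    "http://www.example.com/sub-folder/index.php",
    "http://ads.example.net/click.php",
    "http://www.example.com/sub-folder/picture.jpg",
]
for url in keep_php_urls(sample, "www.example.com"):
    print(url)  # one URL per line, as in a plain-text url list
```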
 
okay, on to your questions:
 
is directory indexing "disabled" by default? (i don't actually know where that setting is)
 
-yes i reference it as follows (actual):
http://thealexangroup.com/amo4/     (now this USED TO work fine, i don't know what setting i touched, it just doesn't work this way anymore)
 
-will crawl if i go one folder down like so:
http://thealexangroup.com/amo4/loans/
 
-all subfolders have one or more index.php pages
 
so basically, you are saying if i "enable" directory indexing i will be able to crawl
from http://thealexangroup.com/amo4/  like before, even if amo4 doesn't have an index?
 
if so, i just need to know where "directory indexing" is located at.
 
thanx for your ongoing help, much appreciated,
 
alx
 
 
 
 
 
 
 
 



webado

Feb 4, 2008, 10:18:18 PM
to SOFTplus GSiteCrawler
You have to fix your server settings, sorry.


