Hi Karthick,
First of all, I have to say I have no first-hand knowledge of how to
manage such very large sites. The largest site I have has about 12000
URLs, easily managed by GSiteCrawler (GSC), though it takes 3 hours or
so to recrawl. GSiteCrawler has the option of making sitemap indexes
for multiple sitemaps.
For your site you need to produce multiple sitemaps, each with no more
than 50000 URLs (the default in GSiteCrawler is 40000) and no larger
than 10MB when uncompressed, and gzip each of the individual sitemaps.
If a sitemap ends up larger than 10MB, reduce the number of URLs in
each individual sitemap.
Then you connect all of them under a sitemap index, itself gzipped if
it's too large.
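
To make the splitting concrete, here is a rough Python sketch of the
idea. It is not something GSiteCrawler produces; the function names,
the file names (sitemap-1.xml.gz, sitemap-index.xml) and the example
URLs are all made up for illustration, and I am assuming you can get
your URLs as a plain list of strings.

```python
import gzip
import os
from xml.sax.saxutils import escape

MAX_URLS = 40000                 # stay below the 50000-URL limit
MAX_BYTES = 10 * 1024 * 1024     # 10MB limit on the uncompressed file

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="%s">\n' % NS
FOOTER = "</urlset>\n"


def chunk_urls(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Yield lists of <url> entries that respect both limits."""
    overhead = len(HEADER.encode("utf-8")) + len(FOOTER.encode("utf-8"))
    batch, size = [], overhead
    for url in urls:
        entry = "  <url><loc>%s</loc></url>\n" % escape(url)
        entry_size = len(entry.encode("utf-8"))
        # Start a new sitemap when either limit would be exceeded.
        if batch and (len(batch) >= max_urls or size + entry_size > max_bytes):
            yield batch
            batch, size = [], overhead
        batch.append(entry)
        size += entry_size
    if batch:
        yield batch


def write_sitemaps(urls, base_url, out_dir=".",
                   max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Write gzipped sitemaps plus a sitemap index; return the file names."""
    names = []
    for i, batch in enumerate(chunk_urls(urls, max_urls, max_bytes), start=1):
        name = "sitemap-%d.xml.gz" % i   # hypothetical naming scheme
        path = os.path.join(out_dir, name)
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.write(HEADER + "".join(batch) + FOOTER)
        names.append(name)

    index = ['<?xml version="1.0" encoding="UTF-8"?>\n',
             '<sitemapindex xmlns="%s">\n' % NS]
    for name in names:
        loc = base_url.rstrip("/") + "/" + name
        index.append("  <sitemap><loc>%s</loc></sitemap>\n" % escape(loc))
    index.append("</sitemapindex>\n")
    index_path = os.path.join(out_dir, "sitemap-index.xml")
    with open(index_path, "w", encoding="utf-8") as f:
        f.writelines(index)
    return names
```

You would call it as write_sitemaps(all_urls, "http://www.example.com/")
and then submit only sitemap-index.xml.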
However, keep in mind that recrawling such a huge site will be a very
long process. GSiteCrawler does not have the option to manage
incremental re-crawls.
If you decide to separate your site into multiple subsites (maybe
based on subfolders), then you can build a sitemap for each such
subsite, according to your own schedule. Again the same deal: each
subsite's sitemap can be a sitemap index with individual sitemaps
gzipped.
As far as I know, a sitemap index cannot list other sitemap index
files, so you would submit each subsite's sitemap index separately
rather than wrapping them all in one.
Also, if the GSC Access database grows too large you will need to
install the SQL Server version of the program, because an Access
database cannot grow beyond a certain size.
Another, and probably more viable, idea would be to use a server-side
sitemap generator (not available from GSiteCrawler) which can be
tailored to produce incremental sitemaps according to what is added
and what is changed. Depending on the software used to build your site,
there may be tools or plugins to handle that.
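
As a rough idea of what such a server-side generator does, here is a
small Python sketch. It assumes your CMS can give you (URL, last
modified) pairs; the function names and the sample data are
hypothetical, and a real generator would read them from your database.

```python
from datetime import datetime
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def changed_since(pages, since):
    """Keep only the (url, modified) pairs modified after `since`."""
    return [(url, modified) for url, modified in pages if modified > since]


def render_sitemap(pages):
    """Render (url, modified) pairs as sitemap XML, newest first."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="%s">' % NS]
    for url, modified in sorted(pages, key=lambda p: p[1], reverse=True):
        lines.append("  <url><loc>%s</loc><lastmod>%s</lastmod></url>"
                     % (escape(url), modified.strftime("%Y-%m-%d")))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Made-up sample data standing in for a database query.
pages = [
    ("http://www.example.com/news/old-story", datetime(2024, 1, 1)),
    ("http://www.example.com/news/new-story", datetime(2024, 1, 9)),
]
last_run = datetime(2024, 1, 5)   # when the sitemap was last generated
print(render_sitemap(changed_since(pages, last_run)))
```

Each run then writes a small sitemap containing only new or changed
pages, instead of recrawling the whole site.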
Finally, for a news-based website, perhaps a news sitemap is better
suited. See Google's help on how to build and submit news sitemaps.
Keep in mind that you cannot combine different types of sitemaps
(e.g. general sitemaps and news sitemaps) under a single sitemap
index. Keep them separate.
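
For orientation only, here is a Python sketch of what a news sitemap
entry looks like. The element names follow Google's news sitemap
format as I understand it, so please check them against the current
help pages before relying on this. The function names, the
publication name "Example News" and the sample article are made up.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def news_entry(url, title, published,
               publication="Example News", language="en"):
    """Build one <url> entry; `published` is a date string like 2024-01-15."""
    return (
        "  <url>\n"
        "    <loc>%s</loc>\n"
        "    <news:news>\n"
        "      <news:publication>\n"
        "        <news:name>%s</news:name>\n"
        "        <news:language>%s</news:language>\n"
        "      </news:publication>\n"
        "      <news:publication_date>%s</news:publication_date>\n"
        "      <news:title>%s</news:title>\n"
        "    </news:news>\n"
        "  </url>\n"
    ) % (escape(url), escape(publication), escape(language),
         escape(published), escape(title))


def news_sitemap(articles):
    """Render (url, title, published) tuples as a news sitemap string."""
    head = ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="%s"\n        xmlns:news="%s">\n'
            % (SITEMAP_NS, NEWS_NS))
    body = "".join(news_entry(u, t, p) for u, t, p in articles)
    return head + body + "</urlset>\n"


print(news_sitemap([
    ("http://www.example.com/news/1", "Markets & rates", "2024-01-15"),
]))
```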
On Nov 10, 2:24 am, Karthick <sifychen...@gmail.com> wrote:
> Hi There
> I am karthick working for Sify Technologies India. I work for the
> domain sify.com it is one of the premier portal in India we have
> decided to create sitemap for our domain, We have many channels like
> sports, news, finance, movies. we have planned to create separate
> sitemap for all channels. The issue here is there will be thousands of
> the news coming in daily from various sources to our channels. we
> tried it in a test phase and the file size went up to 10 mb so can you
> please suggest a possible solutions for this.
> Waiting for your earliest of reply.
> Regards,
> Karthick