Google sitemap again

kev

Oct 20, 2007, 2:49:12 PM
to TiddlyWiki
I've been testing other wikis - Dokuwiki - http://dokuwiki.healthwealthandmusic.co.uk/
and Mediawiki - http://mediawiki.healthwealthandmusic.co.uk/ and they
are both very good. However it is always a relief to get back to
Tiddlywiki - is there anything easier or nicer to do productive work
with!?

However, it still has this one big flaw - no sitemap! And Google does
not index it at all well. I have posted before on this topic but it
seems to me to be of extreme importance. It is the very last piece of
the jigsaw to make TW perfect.

I used Eric's scripts - http://groups.google.com/group/TiddlyWiki/browse_thread/thread/ed9772d8287f8a1c
and they are excellent and I pasted the results into my Joomla site
and they have been crawled by Google.

Google eventually crawled my TW but all it reads is Jeremy's intro! I
tried the index.html route with the list of URLs produced by Eric's
scripts but no luck. Dokuwiki produces a sitemap and gets indexed but
not TW.

TW is a perfect mini website but almost useless if it does not get
crawled and indexed. It produces quite nice Newsfeeds that output
here: http://dokuwiki.healthwealthandmusic.co.uk/doku.php/news:my_web_sites
under "Computer Information" but basically I'm using "back-door"
tricks. I know there is the SEO plugin but it kind of defeats the
purpose of a TW if I have to produce an entire mini website out of the
one TW.

Is there no solution in sight? TW produces an RSS feed which is very
similar to an xml sitemap. Is there not a way of converting the rss
feed into a sitemap or better still of using the same engine to
produce a sitemap? I don't know enough about programming
to do it myself. I've searched for a way of converting from an RSS xml to a
Google sitemap but have not found anything yet. It would be great if
this one last essential element can be an integral part of TW like the
RSS generator is.
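Xavier's reply further down notes that Google also accepts RSS 2.0 and Atom feeds as sitemaps directly, but the conversion asked about here is simple enough to sketch. The JavaScript below is a minimal, hypothetical converter (rssToSitemap is a made-up name): it assumes a well-formed RSS 2.0 feed in which every <item> has exactly one <link>, and it uses regular expressions rather than a real XML parser, so treat it as a starting point only.

```javascript
// Sketch: convert an RSS 2.0 feed into a sitemap (sitemaps.org 0.9 format).
// Assumes well-formed RSS where each <item> has exactly one <link>;
// regular expressions are used for brevity, not as a real XML parser.
function rssToSitemap(rssXml) {
  const locs = [];
  // Look only inside <item> blocks so the channel-level <link> is skipped
  const itemRe = /<item\b[^>]*>([\s\S]*?)<\/item>/g;
  let item;
  while ((item = itemRe.exec(rssXml)) !== null) {
    const link = /<link>\s*([\s\S]*?)\s*<\/link>/.exec(item[1]);
    // The link text is already XML-escaped in the feed, so it is copied as-is
    if (link) locs.push(link[1]);
  }
  const entries = locs.map(loc => "  <url>\n    <loc>" + loc + "</loc>\n  </url>\n");
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries.join("") +
    "</urlset>\n";
}

// Usage with a tiny hand-written feed (example.com URLs are placeholders)
const rss = `<?xml version="1.0"?>
<rss version="2.0"><channel>
<link>http://example.com/wiki.html</link>
<item><title>A</title><link>http://example.com/wiki.html#A</link></item>
<item><title>B</title><link>http://example.com/wiki.html#B</link></item>
</channel></rss>`;
console.log(rssToSitemap(rss));
```

Note that for a TiddlyWiki the resulting <loc> entries all differ only after the #, which is the crux of the rest of this thread.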

Ken Girard

Oct 21, 2007, 4:36:29 PM
to TiddlyWiki
Google reads far more than Jeremy's intro. For example do a google
search for:
'TiddlyWiki "Try clicking on various links and see what happens - you
cannot damage tiddlywiki.com" '
I get only 3 sites with that, and 2 of them are TWs. It is in a
tiddler titled "GettingStarted" on tiddlywiki.com, which means that
Google has to have read through 55 other tiddlers to get to that one.

Of course Google makes the link for this as http://tiddlywiki.com/ as
Google points at the page it is in, not understanding that there are
smaller parts that might mean something (OK, a comparison that came to
me: Tiddlers are to pages, as protons are to atoms).

But a Google sitemap generator plugin.... there is an idea. Eric
made a script that creates a link to every tiddler and lists them in
standard HTML. It seems like getting it to write XML and include
the tiddler content shouldn't be that hard a switch.

Ken Girard

schilke

Oct 21, 2007, 5:41:13 PM
to TiddlyWiki
sorry, Ken, but I still disagree strongly. Like I mentioned before
it's impossible for the spiders to read *any* javascript (even Google
would struggle to pay for the computing power needed to crawl
through millions of sites and execute the javascript...).
Because the output is *generated* by javascript, and that
javascript code resides in script or CDATA tags or inside HTML <!--
comments -->, it's impossible to have much luck getting anything
indexed (other than by accident/malformed code). Even worse: if the
stuff *were* indexed, visitors would rarely find what they
were searching for unless Google indexed the content together with
the corresponding #tiddlerReference.

To get to the point: there's no useful outcome unless you use
some "cloaking method", e.g. save all tiddlers as HTML files and add
a redirect for javascript-enabled browsers.
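A rough JavaScript sketch of that cloaking idea: for each tiddler, write a small static HTML page holding its text for crawlers, plus a script that sends javascript-enabled browsers on to the tiddler's permalink in the real TiddlyWiki. The helper names, the choice to escape the raw tiddler text instead of rendering its wiki markup, and the example URL are all assumptions for illustration; the permalink encoding is intended to mirror TiddlyWiki's (titles containing spaces wrapped in [[...]]), but check it against your TW version. As Ken points out below, search engines may still treat this kind of redirect as cloaking.

```javascript
// Sketch of the "static page + redirect" idea. All names are hypothetical.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Intended to mirror TiddlyWiki's permalinks: titles containing spaces
// are wrapped in [[...]] before being URI-encoded
function permalinkFor(twUrl, title) {
  const link = title.indexOf(" ") === -1 ? title : "[[" + title + "]]";
  return twUrl + "#" + encodeURIComponent(link);
}

// Static page for one tiddler: crawlers read the escaped text,
// while browsers with JavaScript are sent straight to the live TiddlyWiki
function staticTiddlerPage(twUrl, title, text) {
  const target = permalinkFor(twUrl, title);
  return "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">\n" +
    "<title>" + escapeHtml(title) + "</title>\n" +
    // JSON.stringify produces a valid JavaScript string literal for the URL
    "<script>location.replace(" + JSON.stringify(target) + ");</script>\n" +
    "</head><body>\n<h1>" + escapeHtml(title) + "</h1>\n" +
    "<p>" + escapeHtml(text) + "</p>\n" +
    "<p><a href=\"" + escapeHtml(target) + "\">Open in the TiddlyWiki</a></p>\n" +
    "</body></html>\n";
}

console.log(staticTiddlerPage("http://example.com/tw.html", "Getting Started",
  "Try clicking on various links and see what happens"));
```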

> Google points at the page it is in, not understanding that there are
> smaller parts that might mean something

Not quite: Google does understand that a link like
http://tiddlywiki.com/#GettingStarted points to a small part of a
single page - but when it follows that link, the spider doesn't find
any content different from http://tiddlywiki.com/ :-|

--s

Saq Imtiaz

Oct 21, 2007, 6:26:24 PM
to Tiddl...@googlegroups.com
Hey guys,

Loic and I actually did some experimenting with this just recently. I wrote a plugin that generates a static version of the TiddlyWiki (just the content dumped into an html file) and a sitemap that lists each tiddler in the TW. However, after about 4 weeks there is no significant difference in the indexing of the site by google.

I'll get that plugin online tomorrow in case anyone else wants to take a look and give it a shot... and I have some other thoughts on this as well, that I will try to share at that time.

Cheers,
Saq

TiddlyThemes.com ( http://tiddlythemes.com ) : a gallery of TiddlyWiki themes.
TiddlySnip ( http://tiddlysnip.com ) : a firefox extension that turns TiddlyWiki into a scrapbook!
LewcidTW ( http://tw.lewcid.org ) : a repository of extensions for TiddlyWiki

Ken Girard

Oct 21, 2007, 6:45:02 PM
to TiddlyWiki
OK, try this rewrite of Eric's original code, but before you do read
my comments below the code after "and this at the bottom: </urlset>".

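<script>
// One sitemap <url> entry per tiddler (tiddlers tagged excludeLists are skipped).
// Make sure the SiteUrl tiddler holds this file's public address;
// otherwise the current address (minus any #fragment) is used.
var url = store.getTiddlerText("SiteUrl");
if (!url) url = document.location.href.split("#")[0];
var out = "";
var tids = store.getTiddlers("title", "excludeLists");
for (var t = 0; t < tids.length; t++) {
  // Same encoding TiddlyWiki uses for its permalinks
  var permalink = encodeURIComponent(String.encodeTiddlyLink(tids[t].title));
  out += "<url>\n<loc>" + url + "#" + permalink + "</loc>\n<priority>0.5</priority>\n</url>\n";
}
// {{{ }}} makes TiddlyWiki show the result as plain preformatted text
return "{{{\n" + out + "\n}}}\n";
</script>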

Install the InlineJavascriptPlugin
(http://www.TiddlyTools.com/#InlineJavascriptPlugin).
Put the above code in a tiddler (DO NOT tag it systemConfig).
Click "done" and wait for all of your XML code to be generated.
You might not see all of the code, but cut-n-paste all of it (in view
mode) into an XML file.

Put the following at the top of the page:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

and this at the bottom:
</urlset>

Not that I think this is going to help you, as the bot will follow the
link to the html page and read the entire page, not the single div you
are pointing at, and then create a link to everything in that page.
That right there is the main problem with this idea.
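If you want to try it anyway, the header and footer don't have to be pasted in by hand. Below is a standalone JavaScript sketch (buildSitemap and xmlEscape are hypothetical names, not TiddlyWiki functions) that produces the complete file from a base URL and a list of tiddler titles; inside TiddlyWiki the titles would come from store.getTiddlers() as in the script above, and the [[...]] wrapping is assumed to match String.encodeTiddlyLink. It also XML-escapes each <loc>, which the sitemap protocol requires. The same caveat applies: every entry still differs only after the #.

```javascript
// Sketch: build a complete sitemap file from a base URL and tiddler titles.
// xmlEscape/buildSitemap are hypothetical names, not TiddlyWiki functions.
function xmlEscape(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&apos;");
}

function buildSitemap(baseUrl, titles) {
  // Drop any #fragment from the base address; each entry adds its own
  const base = baseUrl.split("#")[0];
  const entries = titles.map(title => {
    // Assumed to match String.encodeTiddlyLink: spaced titles become [[...]]
    const link = title.indexOf(" ") === -1 ? title : "[[" + title + "]]";
    const loc = base + "#" + encodeURIComponent(link);
    return "  <url>\n    <loc>" + xmlEscape(loc) + "</loc>\n" +
      "    <priority>0.5</priority>\n  </url>\n";
  });
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries.join("") + "</urlset>\n";
}

console.log(buildSitemap("http://example.com/tw.html", ["GettingStarted", "Site Map"]));
```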

One of the problems I see for your site is that your pages are not
ranked by Google, Alexa or Compete, so they get placed near the
bottom of any search, if they are shown at all. Does any other website
link to your pages?

Using TW is not the reason your page is not getting ranked. I say this
as tiddlywiki.com is ranked by Google as Page Rank 7, and both Alexa
& Compete rank it in the 200,000s. One of my own TW pages is Page Rank
3, and in the 500,000s by Alexa & Compete.

Ken Girard
A man who has often experienced Frubble.

Ken Girard

Oct 21, 2007, 7:25:40 PM
to TiddlyWiki
You are correct: Google does not read js. So for it to have given me
this for my search:
-----
TiddlyWiki - a reusable non-linear personal web notebook - 6 visits -
Oct 7

Try clicking on various links and see what happens - you cannot damage
tiddlywiki.com or your browser. (Use the <<closeAll>> button over on
the right to ...
www.tiddlywiki.com/ - 410k - Cached - Similar pages - Note this
----
then it must have read the storeArea (which is located at the bottom
of the page), and not bothered with the js. But it did start the
summary at the beginning of the <div>. But as all of the other search
engines are currently designed,

It seems to me that there are only two ways to get what you want:
1) Fool the search engines (a system that will work until they figure
out what is going on and then blackball your site. They did this to
one of the major car makers, so you can imagine what your chance is of
winning that battle). And they are already on to websites that have
tons of redirects pointing at them, as it is a common spammer tactic.
2) Convince search engines that TWs (and similar systems) should be
read in a different way than other pages.

Ken Girard

Xavier Vergés

Oct 21, 2007, 7:45:07 PM
to TiddlyWiki
On Oct 21, 11:41 pm, schilke <googlegroups.tt.ch...@xoxy.net> wrote:
> it's impossible to have much luck getting anything
> indexed (other than by accident/malformed code).

1. Firefox.
2. Disable javascript (Tools->Options, Content tab)
3. go to tiddlywiki.com
4. View->Page Style->No style
5. You are looking now at google's view of tiddlywiki.com. Lots of
plain text to be indexed, but no links and no structure.
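The same check can be approximated in code. The JavaScript sketch below (crawlerText is a hypothetical name) strips <script> and <style> blocks, HTML comments and all remaining tags from a page's source and collapses whitespace, roughly what is left for an engine that ignores scripts and styling. It uses regular expressions instead of an HTML parser and does not decode entities such as &amp;, so it is only good for eyeballing, not for reproducing what Google actually stores.

```javascript
// Rough approximation of what a crawler that ignores JavaScript and CSS sees:
// drop <script>/<style> blocks and comments, then all tags, then collapse whitespace.
// Regexes are not an HTML parser; entities are left undecoded.
function crawlerText(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<!--[\s\S]*?-->/g, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

const sample = '<html><head><script>var x = "<b>";</script></head><body>' +
  '<div id="storeArea"><div title="GettingStarted">Try clicking on various links</div></div>' +
  '</body></html>';
console.log(crawlerText(sample)); // Try clicking on various links
```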

-Xv

schilke

Oct 22, 2007, 3:41:31 AM
to TiddlyWiki
> then it most have read the storeArea (which is located at the bottom
> of the page), and not bothered with the js

there's an even easier testing case: the <noscript> tag should be read
by all user agents without javascript - around line 67 it says
"Microsoft Internet Explorer you may need to click on the yellow bar
above and select 'Allow Blocked Content'" - if you try to find that
with Google - it will return around 500 pages - so, like Xavier
mentions - the content will be indexed but the calculated relevance is
set to almost zero.
Relevance depends on factors such as "link popularity" (who
links to your site), "quality of information" (structure - e.g.
headings, paragraphs etc. plus quantity of text) and continuing/
further information (e.g. links).

> 1) Fool the search engines (a system that will work until they figure
> out what is going on and then blackball your site.

There's one important thing about using cloaking techniques with a
TiddlyWiki: the purpose is totally different from any spammer's
purpose - it could not be considered "true fooling" as it leads to
the promised information.
(BTW: that's also what's done by dynamically created content and/or
server side redirects...)
Additionally, the javascript redirects could hardly be discovered
automatically. If a human investigator sees what those redirects
do - I personally doubt that he would treat it as spamming.
That's my true belief - but as you said (I did, too, some time ago):
there's no guarantee...

Talking to Google, for example, doesn't seem like a bad idea.
It's worth a try, as there might also be some suggestions
from the trouble-makers' side...

--s

kev

Oct 22, 2007, 11:15:01 AM
to TiddlyWiki
Thanks for feedback. So it is a big problem, in fact the final problem
for TW. If only the Google programmers would treat TW files
differently! I'll make the request.

All I want to see is the content indexed. All my sites are indexed by
Google. If you do site:www.healthwealthandmusic.co.uk you can see.
That does not mean that my ranking is high or that I have a ranking
but that is another problem altogether. My subdomain is indexed - "Pages
from your site are included in Google's index." - but the only content
found for site:wikis.healthwealthandmusic.co.uk is "Welcome to TiddlyWiki
by Jeremy Ruston, Copyright © 2006 Osmosoft Limited. This page
requires JavaScript to function properly ...." Can't imagine how that
particular message is the only scanned content! Also if you click on
"cached" the page gets stuck with the Splashscreen - could that also be
a block to successful indexing? I'll try disabling it.

=====================

Eric produced an excellent script that creates an index of all
tiddlers which as I said have been read by Google but via a page in my
Joomla site NOT from the index.html I created in the subdomain folder
where my TW files are located. I hoped Google would just read the list
of permalinks but it didn't. Perhaps google thinks it's one of those
false indexes.

What about my main inquiry? If it is possible to produce such a good
rss file which is an xml file, is it not possible to make TW produce a
sitemap xml because an rss output is basically the same thing - a list
of pages, except that the container tags are different. In fact I'm
surprised I can't find a conversion tool already out there in
cyberspace!
========================
If I can't find a way of indexing my TW files by hook or by crook then
I can see I'll have to move the content into Dokuwiki to get it
indexed and keep TW only for personal notes which would be a complete
waste of the program's potential. There simply isn't another program
like it that I have found so far.

Xavier Vergés

Oct 22, 2007, 11:56:48 AM
to TiddlyWiki
> My subdomain is index - "Pages
> from your site are included in Google's index." but the only content
> found is site:wikis.healthwealthandmusic.co.uk - Welcome to TiddlyWiki
> by Jeremy Ruston, Copyright © 2006 Osmosoft Limited. This page
> requires JavaScript to function properly .... Can't imagine how that
> particular message is the only scanned content!
It is not:
http://www.google.com/search?hl=en&q=site%3Awww.healthwealthandmusic.co.uk%2Fwikis+heavens

With the google webmaster tools (that I have never used), you can
learn when your site was last indexed.

>
> What about my main inquiry? If it is possible to produce such a good
> rss file which is an xml file, is it not possible to make TW produce a
> sitemap xml because an rss output is basically the same thing - a list
> of pages, except that the container tags are different.

You can use atom or rss 2.0 as a sitemap.
http://www.google.com/support/webmasters/bin/answer.py?answer=34606

However, this would be pointless: you would only be adding relative
links to your TW, that google would probably ignore.

-Xv

kev

Oct 22, 2007, 2:27:00 PM
to TiddlyWiki
Interesting! That is not what the Google sitemap index report says.
So who says Google is perfect? In fact it is becoming an obstacle to
fair play. If you do site:wikis.healthwealthandmusic.co.uk you get
the result indicated. Notice that when you hit "cached" the site does not
load because of the Splashscreen. I very much hope that removing the
Splashscreen allows Google to go deeper. In which case the
Splashscreen needs altering because it too is important - who is going
to hang around while a site takes so long to load?

I'll experiment with the other methods in your link and report back.
Even so it would still be better if TW produced a full sitemap. As
indicated by Google's info page, a newsfeed only produces the most recent
documents (unless there is a way to output ALL the tiddlers for the
sole purpose of using this alternative method). I did try a newsfeed as
an experiment but the sitemapper rejected my versions - but probably I
did not use RSS 2.0. Here we go again! This is what I mean - it could be
weeks before this stupid Google indexes the new files. Much simpler
using ONE regular reliable method. Hope there is still a solution.

kev

Oct 22, 2007, 2:33:48 PM
to TiddlyWiki
Also, I forgot to add that whatever works there is still the problem
of what happens when you have more than 1 TW file? I would have to
create extra subdomains. Ideally, Google should simply index TWs
differently! Some hope but you never know - possibly Jeremy might have
more influence since he created the whole thing and programmers are
more likely to listen to him?

Daniel Baird

Oct 22, 2007, 7:17:19 PM
to Tiddl...@googlegroups.com
On 23/10/2007, kev <kj...@hotmail.com> wrote:
>
> Notice that hitting "cached" the site does not
> load because of the Splashscreen. I very much hope that removing the
> Splashscreen allows Google to go deeper. In which case the
> Splashscreen needs altering because it too is important - who is going
> to hang around while a site take so long to load.

I'm pretty sure google doesn't see the splash screen. do you see it
when javascript is disabled?

;Daniel

--
Daniel Baird
"In teh beginnin Invisible Man was invisible, and he maded the skiez
and da earths, but he did not eated it." -- Genesis 1:1,
lolcatbible.com

Ken Girard

Oct 22, 2007, 9:13:14 PM
to TiddlyWiki
I am going to guess that the Google cache never gets past the
splash screen because you are using PrinceTW. Google seems to render
internal js, but doesn't enable external.

Ken Girard

schilke

Oct 23, 2007, 7:05:08 AM
to TiddlyWiki
> splash screen

don't mix things up: Google's spiders won't even know that there's
such a thing like a splash screen - but: if the page is cached, the
whole source code will be - including javascript. If you have
problems loading the page, it might be the result of some external
calls, for example (you could verify that by looking at the document
source)

> > What about my main inquiry? If it is possible to produce such a good
> > rss file which is an xml file, is it not possible to make TW produce a
> > sitemap xml because an rss output is basically the same thing - a list
> > of pages, except that the container tags are different.
>
> You can use atom or rss 2.0 as a sitemap.
>
> However, this would be pointless: you would only be adding relative
> links to your TW, that google would probably ignore

the funny thing is: yes, Xavier is right, you still point to a single
file with those links - otherwise it would be much easier to get TWs
indexed, because Google already reads, scans, indexes and caches XML
files :-/
Which leads back to cloaking: I mentioned some time ago that one way
to go is probably to hack the RSS generation to point to static
HTML files.
If you have a full blown web server (i.e. full control over your web
space) it should be possible to do a sneaky redirect with mod_rewrite. It
wouldn't even be necessary to generate those static files then
(<loudthinking>maybe tiddlyspot would be a place to test and
automate such a thing?</loudthinking>).

--s

ldachary

Oct 23, 2007, 7:26:56 AM
to TiddlyWiki

Saq Imtiaz

Oct 23, 2007, 7:34:06 AM
to Tiddl...@googlegroups.com
Of course now in hindsight it makes perfect sense that Google would treat all the URLs in the sitemap as being the same file as the only difference is the hash....

What if the static content was baked into the TiddlyWiki file? File size is obviously a concern, but any thoughts on whether that would help?

Saq
--
TiddlyThemes.com ( http://tiddlythemes.com ) : a gallery of TiddlyWiki themes.

Ken Girard

Oct 23, 2007, 8:46:37 AM
to TiddlyWiki
Schilke,
I think we are trying to say the exact same thing with small
differences of the definition of words. I have never thought that any
bot was reading the js, but it does read the entire storeArea, as well
as the shadow areas and indexes it. I am not saying that it indexes it
in the format of "This is a tiddler", "That is a tiddler" but in the
form of "This page contains the following content" just like every
other html file on the net. As far as I know, that is what every
search engine means by 'indexing' a page.

Kev stated that "Google eventually crawled my TW but all it reads is
Jeremy's intro!", and I've been trying to point out that if it got
that far, then it read the entire page and indexed it.... in the
normal Google fashion. "This page contains the following content".

Till a search engine starts using a bot that can understand that File
A is a TW, and reads it that way, then I see no hope of getting each
tiddler to be treated as separate content. And even then there will be
issues, as I've seen people putting <html><div> words </div></html> in
tiddlers, which could lead a bot that links to each div to point at
these non-tiddler divs. (OK, maybe if the bot looked for <div
title="Words">.... Idea for looking at later).

Ken Girard

schilke

Oct 23, 2007, 10:56:45 AM
to TiddlyWiki
> but it does read the entire storeArea, as well
> as the shadow areas and indexes it. I am not saying that it indexes it
> in the format of "This is a tiddler", "That is a tiddler" but in the
> form of "This page contains the following content" just like every
> other html file on the net. As far as I know, that is what every
> search engine means by 'indexing' a page.

I can't find any of those results - do you have some examples? As I
said many times before: all my (raw & quick) tests result in a "no
content indexed" down the line so far - I've never seen a useful piece
of content of all those thousands of TW's on the net.
I would like someone to tell me that's bullshit - but that's my
experience :-/

Furthermore I noticed some problems with requests from bots like search
engine spiders - unfortunately I haven't had time to investigate that
further, but I suspect either the <noscript> tag, the "xhtml strict"
declaration and/or TW's custom tags to be responsible.

> issues as I've seen people putting <html><div> words </div></html> in
> tiddlers, which seems like a bot trying to make a link to each div
> point at this non-tiddler div.

normally this will result in

&lt;html&gt;&lt;div&gt; words &lt;/div&gt;&lt;/html&gt;

Which won't do anything with bots ;)

Personally I don't have any plans to use a TiddlyWiki for web
publishing - that's not the purpose I like it for - but it's totally
clear to me that there are other circumstances and tasks where this
might be useful, so I am also interested in improving TW's lack of
SEO-friendliness.

--s

Xavier Vergés

Oct 23, 2007, 11:29:01 AM
to TiddlyWiki
> I can't find any of those results - do you have some examples?

http://www.google.com/search?hl=en&q=site%3Atiddlyspot.com+bananas&btnG=Search

-Xv

Xavier Vergés

Oct 23, 2007, 11:40:52 AM
to TiddlyWiki
On Oct 23, 1:34 pm, "Saq Imtiaz" <lew...@gmail.com> wrote:
> Of course now in hindsight it makes perfect sense that Google would treat
> all the url's in the sitemap as being the same file as the only difference
> is the hash....
If instead of using a # we were using a ?query string, google would
change its mind and treat them as different pages.
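The difference is easy to see by comparing URLs as documents: the #fragment is never sent to the server, so it has to be dropped before deciding whether two links point at the same page. The JavaScript sketch below uses the standard URL class; documentKey is a hypothetical helper, and ?tiddler=... is a made-up parameter name (something would still have to map it to the right tiddler).

```javascript
// Sketch: compare URLs as documents, ignoring the #fragment
// (browsers never send the fragment to the server).
function documentKey(href) {
  const u = new URL(href);
  u.hash = "";
  return u.href;
}

const withHash = [
  "http://tiddlywiki.com/#GettingStarted",
  "http://tiddlywiki.com/#AnotherTiddler",
];
// ?tiddler= is a made-up parameter name for illustration
const withQuery = [
  "http://tiddlywiki.com/?tiddler=GettingStarted",
  "http://tiddlywiki.com/?tiddler=AnotherTiddler",
];

console.log(new Set(withHash.map(documentKey)).size);  // 1: same document
console.log(new Set(withQuery.map(documentKey)).size); // 2: different documents
```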

On a related note: we probably want a plugin that checks the
referrer and, if it can identify it as a search, searches within the
TW on opening it; but maybe this is too early for a plugin?
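A minimal JavaScript sketch of the first half of such a plugin: pull the search phrase out of a referrer URL such as document.referrer. searchTermsFromReferrer is a hypothetical name, and it only recognizes Google-style result URLs that carry the phrase in a q parameter; operators like site: are dropped. A plugin would then hand the result to TiddlyWiki's own search, which is not shown here.

```javascript
// Hypothetical helper: extract the search phrase from a search-engine referrer.
// Only Google-style hosts with a "q" parameter are recognized.
function searchTermsFromReferrer(referrer) {
  let u;
  try {
    u = new URL(referrer);
  } catch (e) {
    return null; // empty or malformed referrer
  }
  if (!/(^|\.)google\./.test(u.hostname)) return null;
  const q = u.searchParams.get("q"); // "+" and %xx escapes are decoded here
  if (!q) return null;
  // Drop operators like site:example.com, keeping the plain words
  const words = q.split(/\s+/).filter(w => w && w.indexOf(":") === -1);
  return words.length ? words.join(" ") : null;
}

console.log(searchTermsFromReferrer(
  "http://www.google.com/search?hl=en&q=site%3Atiddlyspot.com+bananas&btnG=Search"));
```

With the tiddlyspot search link Xavier posted above, this yields "bananas".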

-Xavier

kev

Oct 23, 2007, 12:29:51 PM
to TiddlyWiki
If you look at my TW websites http://www.healthwealthandmusic.co.uk/wikis/worldwideweb.html
and http://www.healthwealthandmusic.co.uk/wikis/homepage.html and
those already on the web I think you can see how great a TW is for web
publishing. Obviously for full blown interactive sites I use Joomla or
something like it but if I want to get any work done fast and uploaded
in seconds then there simply isn't anything better in terms of
portability and ease of use (once past the learning curve). If there
are not too many TW websites out there possibly it is because of the
Google problem. How easy and useful it would be to publish information
on for example a novel that students are studying (my field) plus
comments and questions etc etc. So easy to update and publish - but
not if it does not get indexed.
=====
I made the request to treat TW differently here:
http://groups.google.com/group/Google_Webmaster_Help-Requests/browse_thread/thread/c1c535ac3c6e5e2d
and got one silly reply which I'm not going to bother responding to.
=====
As a simple-minded "end-user" of TW, I unfortunately can't comment on
the programming side. I can only report my experience. I have removed
the splashscreen because my Webmaster site's cached TW link gets stuck
at the opening screen, so perhaps Google will now report more after
all. I notice that the quickest indexing (a week) was done on my
Mediawiki site (I'm testing a lot!).
=====
Any knowledge as to how the rss output could be adapted, extended and
integrated or perhaps Eric's code that produces all the tiddler links?
I will also try submitting the links in the text-only sitemap format
mentioned above when I know Google has read the sites minus the
splashscreen.

Eric Shulman

Oct 23, 2007, 1:39:57 PM
to TiddlyWiki
> I made the request to treat TW differently here: http://groups.google.com/group/Google_Webmaster_Help-Requests/browse_...
> and got one silly reply which I'm not going to bother responding to.

Wow! That "webado" person really is a jerk! The ignorant, dismissive
tone of her non-answers just serves to emphasize how well-behaved and
helpful people are here in the TiddlyWiki google groups.

Oh, and by the way, for a real belly-laugh, check out this part of
her profile:
------------------
I am a late-comer to the world of web design, website building and to
the internet in general. A long-standing programmer-analyst by trade,
I believe I understand computers and software and the software
creators sufficiently well to be able to troubleshoot issues of
interest to webmasters and offer solutions.
------------------

yeah... right. (pffft! indeed)

I'm much too annoyed at her stupid comments to reply just yet... or at
all... but if someone else wants to try to educate her about
TiddlyWiki and the use of intensive client-side javascript for
"Web2.0" applications, that would be helpful...

-e

FND

Oct 23, 2007, 2:42:22 PM
to Tiddl...@googlegroups.com
> I'm much too annoyed at her stupid comments to reply just yet... or at
> all... but if someone else wants to try to educate her about
> TiddlyWiki and the use of intensive client-side javascript for
> "Web2.0" applications, that would be helpful...

Actually, I think this person is little more than a troll.
I would recommend to just ignore her ramblings; Don't Feed the Troll.

Ken Girard

Oct 23, 2007, 5:44:54 PM
to TiddlyWiki
Do a google search for:

Tiddlywiki "Try clicking on various links and see what happens - you
cannot damage tiddlywiki.com"

I get 3 sites with that:
swik.net/tiddlywiki+instructions
www.tiddlywiki.com/
tiddlyspot.com/twhelp/

Here is the summary of that search for TiddlyWiki.com:
----
TiddlyWiki - a reusable non-linear personal web notebook - 7 visits -
3:56am

Try clicking on various links and see what happens - you cannot damage
tiddlywiki.com or your browser. (Use the <<closeAll>> button over on
the right to ...
www.tiddlywiki.com/ - 396k - Cached - Similar pages - Note this
----

Mind you, none of us is trying to say that these links
take you to the correct tiddler. But they do show that the bot reads
the page, even the js. It does not run the js, but you can do a search
for it and find it.

Try to search for this:
w.subWikify(createTiddlyElement(w.output,this.element),this.terminator)

I got 160 hits for it. The first one is a TW that got messed up so the
code is visible, the next 3 were normal TWs, I didn't look any
further.

I get hits each day from people using search engines. They do not find
me from reading Jeremy's intro (No matter how nicely he wrote it).

Ken Girard

schilke

Oct 24, 2007, 5:34:18 AM
to TiddlyWiki
you're right - I dug a bit deeper and at least at tiddlywiki.com I
got one result for each keyword I searched on...

Anyway - that might help but does not solve the problem: the visitor
is not able to find the proper location/tiddler by simply hitting the
resulting link.

Compared with other sites at the same page rank level (Google 7, Alexa
209392, Compete 243749 for TiddlyWiki.com) it's simply a bunch of bull
that you get one (1) result if you do a search for the keyword
"tiddlywiki" although there are 545 occurrences of the word.
Measurements for keyword density of "tiddlywiki":
Location | Matches | Total words | Density
Title    |       1 |           8 | 12.5%
Body     |     516 |       48206 |  1.1%
Links    |      27 |         371 |  7.3%
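The percentages follow from density = matches / total words * 100, rounded to one decimal place; the small JavaScript function below (density is just an illustrative name) reproduces the three rows.

```javascript
// Keyword density as in the table: matches / total words * 100,
// rounded to one decimal place.
function density(matches, totalWords) {
  return (100 * matches / totalWords).toFixed(1) + "%";
}

console.log("Title: " + density(1, 8));      // Title: 12.5%
console.log("Body: " + density(516, 48206)); // Body: 1.1%
console.log("Links: " + density(27, 371));   // Links: 7.3%
```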

Due to the fact that a lot of people already use a TiddlyWiki for web
publishing this seems to be more than a minor problem - I will start a
new thread in the dev group later today (anyone feel free to do so if
you have more time...).

I'll also try to dig through the three groups for all related threads over the
years...

> But they do show that the bot reads
> the page, even the js. It does not run the js, but you can do a search
> for it and find it.
>
> Try to search for this:
> w.subWikify(createTiddlyElement(w.output,this.element),this.terminator)
>
> I got 160 hits for it.

Although I've already been disabused regarding the indexed content, I
strongly doubt that the spiders read javascript (unless it's damaged
code), and my results look a bit different: I just get 56 with Google,
and most of the pages either aren't TWs, display the code intentionally,
or are stuff from trac.tiddlywiki.org
(even if you put the search term in quotes, Google treats some of the
chars as search syntax rather than what they are - e.g. brackets and
periods - which leads to a bunch of results with partial matches
only...)

--s

schilke

Oct 24, 2007, 5:59:28 AM
to TiddlyWiki
> Due to the fact that a lot of people already use a TiddlyWiki for web
> publishing this seems to be more than a minor problem - I will start a
> new thread in the dev group later today (anyone feel free to do so if
> you have more time...).

Done.
So it might be a good idea to go on there: "How to solve the
unsatisfying SEO problem with TiddlyWikis used for web publishing"
http://groups.google.com/group/TiddlyWikiDev/browse_frm/thread/d14ec01af14e4f8f

--s

kev

Oct 24, 2007, 9:05:15 AM
to TiddlyWiki
Yeah, ignore the Trolls. Too silly.

Any answer from the experts on the main question:

"Any knowledge as to how the rss output could be adapted, extended and
integrated or perhaps Eric's code that produces all the tiddler links?
"

Samuel Reynolds

Dec 13, 2007, 3:12:45 PM
to Tiddl...@googlegroups.com
At 04:25 PM 10/21/2007, you wrote:
>It seems to me that there are only two ways to get what you want:
>1) Fool the search engines (a system that will work until they figure
>out what is going on and then blackball your site. They did this to
>one of the major car makers, so you can imagine what your chance is of
>winning that battle). And they are already on to websites that have
>tons of redirects pointing at them, as it is a common spammer tactic.
>2) Convince search engines that TWs (And similar systems) should be
>read in a different way then other pages.

3) Use Google's alternate submission mechanism: Give it an XML
document that indicates, for a given URL, what content to index.
I don't recall the details, but info should be available via
(and on!) Google.

- Sam

