However, it still has this one big flaw - no sitemap! And Google does
not index it at all well. I have posted before on this topic but it
seems to me to be of extreme importance. It is the very last piece of
the jigsaw to make TW perfect.
I used Eric's scripts - http://groups.google.com/group/TiddlyWiki/browse_thread/thread/ed9772d8287f8a1c
and they are excellent and I pasted the results into my Joomla site
and they have been crawled by Google.
Google eventually crawled my TW but all it reads is Jeremy's intro! I
tried the index.html route with the output of url produced by Eric's
scripts but no luck. Dokuwiki produces a sitemap and gets indexed but
not TW.
TW is a perfect mini website but almost useless if it does not get
crawled and indexed. It produces quite nice Newsfeeds that output
here: http://dokuwiki.healthwealthandmusic.co.uk/doku.php/news:my_web_sites
under "Computer Information" but basically I'm using "back-door"
tricks. I know there is the SEO plugin but it kind of defeats the
purpose of a TW if I have to produce an entire mini website out of the
one TW.
Is there no solution in sight? TW produces an RSS feed which is very
similar to an xml sitemap. Is there not a way of converting the rss
feed into a sitemap or better still of using the same engine to
produce a sitemap? I don't know anything very much about programming
to do it. I've searched for a way of converting from an RSS xml to a
Google sitemap but have not found anything yet. It would be great if
this one last essential element can be an integral part of TW like the
RSS generator is.
Of course Google makes the link for this as http://tiddlywiki.com/ as
Google points at the page it is in, not understanding that there are
smaller parts that might mean something (OK, a comparison that came to
me: Tiddlers are to pages, as protons are to atoms).
But a Google sitemap generator plugin.... there is an idea. Eric
made a script that creates a link to every tiddler and lists them in
standard html. Seems like getting it to write xml instead and include
the tiddler content shouldn't be that hard of a switch.
Ken Girard
On Oct 20, 1:49 pm, kev <kj...@hotmail.com> wrote:
> I've been testing other wikis - Dokuwiki - http://dokuwiki.healthwealthandmusic.co.uk/
> and Mediawiki - http://mediawiki.healthwealthandmusic.co.uk/ and they
> are both very good. However it is always a relief to get back to
> Tiddlywiki - is there anything easier or nicer to do productive work
> with!?
>
> However, it still has this one big flaw - no sitemap! And Google does
> not index it at all well. I have posted before on this topic but it
> seems to me to be of extreme importance. It is the very last piece of
> the jigsaw to make TW perfect.
>
> I used Eric's scripts - http://groups.google.com/group/TiddlyWiki/browse_thread/thread/ed9772...
To bring it to the point: there's no useful outcome unless you use
some "cloaking method", e.g. save all tiddlers as HTML files and make
a redirect for javascript enabled browsers.
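The "save all tiddlers as HTML files and redirect" idea can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the example URL and the page layout are hypothetical, the tiddler text is assumed to be already available as plain text, and the wiki URL is assumed to be a trusted value. As discussed elsewhere in this thread, search engines may still treat such redirects as cloaking.

```javascript
// Hypothetical sketch, not part of TiddlyWiki: build a static HTML copy of
// one tiddler that sends javascript-enabled browsers on to the real TW.

// Escape characters that are special in HTML text.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// wikiUrl: address of the TiddlyWiki file (assumed trusted),
// title: tiddler title, plainText: tiddler content as plain text.
function staticTiddlerPage(wikiUrl, title, plainText) {
  var target = wikiUrl + "#" + encodeURIComponent(title);
  return "<!DOCTYPE html>\n<html>\n<head>\n" +
    "<title>" + escapeHtml(title) + "</title>\n" +
    // JSON.stringify yields a quoted, escaped javascript string literal.
    "<script>window.location.replace(" + JSON.stringify(target) + ");</script>\n" +
    "</head>\n<body>\n" +
    "<h1>" + escapeHtml(title) + "</h1>\n" +
    "<p>" + escapeHtml(plainText) + "</p>\n" +
    "</body>\n</html>\n";
}

console.log(staticTiddlerPage("http://example.com/wiki.html", "My Notes", "a < b"));
```

A spider without javascript reads the heading and paragraph; a browser with javascript runs the one-line script and lands on the matching tiddler permalink.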
> Google points at the page it is in, not understanding that there are
> smaller parts that might mean something
a bit different: Google understands that you point to a small part of
a single page if it's pointed to e.g. http://tiddlywiki.com/#GettingStarted
- but: if you do that the spider doesn't find different content from
http://tiddlywiki.com/ :-|
--s
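<script>
// Build one sitemap <url> entry per tiddler (tiddlers tagged excludeLists are skipped).
var out="";
// Use the SiteUrl tiddler if it exists, otherwise the current address without its #fragment.
var url=store.getTiddlerText("SiteUrl");
if (!url) url=document.location.href.split("#")[0];
var tids=store.getTiddlers("title","excludeLists");
for (var t=0; t<tids.length; t++) {
	var permalink=encodeURIComponent(String.encodeTiddlyLink(tids[t].title));
	out+="<url>\n<loc>"+url+"#"+permalink+"</loc>\n<priority>0.5</priority>\n</url>\n";
}
// Wrap the result in {{{ }}} so TiddlyWiki shows it as preformatted text.
return "{{{\n"+out+"\n}}}\n";
</script>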
Install the InlineJavascriptPlugin
(http://www.TiddlyTools.com/#InlineJavascriptPlugin)
Put the above code in a tiddler (DO NOT tag it systemConfig)
Click on done and wait for all of your xml code to be made.
You might not see all of the code, but cut-n-paste all of it (In view
mode) into an xml page.
Put the following at the top of the page:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
and this at the bottom:
</urlset>
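The manual steps above (generate the entries, then paste the header and footer around them) could also be done in one go. The following is a minimal sketch that runs outside TiddlyWiki, assuming the tiddler titles are already available as a plain list. The names buildSitemap and escapeXml and the example URL are hypothetical, and it uses plain encodeURIComponent on the title rather than String.encodeTiddlyLink, so the permalinks may differ from the script above for titles containing spaces.

```javascript
// Hypothetical helpers; not part of TiddlyWiki or of the script above.

// Escape the five characters that are special in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a complete sitemap document from a base URL and a list of titles.
function buildSitemap(baseUrl, titles) {
  var entries = titles.map(function (title) {
    var loc = baseUrl + "#" + encodeURIComponent(title);
    return "<url>\n<loc>" + escapeXml(loc) + "</loc>\n" +
      "<priority>0.5</priority>\n</url>";
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries.join("\n"),
    "</urlset>"
  ].join("\n") + "\n";
}

console.log(buildSitemap("http://example.com/wiki.html",
  ["GettingStarted", "My Notes"]));
```

Escaping the URL matters as soon as the base address contains an ampersand, which would otherwise make the XML invalid.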
Not that I think this is going to help you, as the bot will follow the
link to the html page and read the entire page, not the single div you
are pointing at, and then create a link to everything in that page.
That right there is the main problem with this idea.
One of the problems I see for your site is that your pages are not
ranked by Google, Alexa or Compete, so they get placed near the
bottom of any search, if they are shown at all. Do any other websites
link to your pages?
Using TW is not the reason your page is not getting ranked. I say this
as tiddlywiki.com is ranked by Google as Page Rank 7, and both Alexa
& Compete rank it in the 200,000s. One of my own TW pages is Page Rank
3, and in the 500,000s by Alexa & Compete.
Ken Girard
A man who has often experienced Frubble.
It seems to me that there are only two ways to get what you want:
1) Fool the search engines (a system that will work until they figure
out what is going on and then blackball your site. They did this to
one of the major car makers, so you can imagine what your chance is of
winning that battle). And they are already on to websites that have
tons of redirects pointing at them, as it is a common spammer tactic.
2) Convince search engines that TWs (and similar systems) should be
read in a different way than other pages.
Ken Girard
On Oct 21, 4:41 pm, schilke <googlegroups.tt.ch...@xoxy.net> wrote:
> sorry, Ken, but I still disagree strongly. Like I mentioned before
> it's impossible for the spiders to read *any* javascript (even Google
> would have problems to pay for the calculating power needed to crawl
> through millions of sites and execute the javascript...).
> Due to the fact that the output is *generated* by javascript and that
> javascript code resides in script or CDATA tags or inside of HTML <!--
> comments --> it's impossible to have much luck getting anything
> indexed (other than by accident/malformed code). Even worse: if the
> stuff *would* be indexed the visitors rarely would find what they
> were searching for unless Google would index the content together with
> the corresponding #tiddlerReference.
>
> To bring it to the point: there's no useful outcome unless you use
> some "cloaking method", e.g. save all tiddlers as HTML files and make
> a redirect for javascript enabled browsers.
>
> > Google points at the page it is in, not understanding that there are
> > smaller parts that might mean something
>
> a bit different: Google understands that you point to a small part of
> a single page if it's pointed to e.g. http://tiddlywiki.com/#GettingStarted
1. Firefox.
2. Disable javascript (Tools->Options, Content tab)
3. go to tiddlywiki.com
4. View->Page Style->No style
5. You are looking now at google's view of tiddlywiki.com. Lots of
plain text to be indexed, but no links and no structure.
-Xv
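The same "no javascript, no style" view can be roughly approximated in code. This is only a sketch with a hypothetical function name: it uses simple regular expressions rather than a real HTML parser, and it is an assumption about what a text-only spider keeps, not a description of how Google actually processes a page.

```javascript
// Hypothetical helper: strip scripts, styles, comments and tags from an
// HTML string, leaving roughly the plain text a non-javascript reader sees.
function textOnlyView(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ") // drop script blocks
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")   // drop style blocks
    .replace(/<!--[\s\S]*?-->/g, " ")              // drop HTML comments
    .replace(/<[^>]+>/g, " ")                      // drop remaining tags
    .replace(/\s+/g, " ")                          // collapse whitespace
    .trim();
}

var sampleHtml =
  '<html><head><script>var a="<b>";</script></head>' +
  '<body><div title="X">Hello <b>world</b></div><!-- c --></body></html>';
console.log(textOnlyView(sampleHtml));
```

Note that the text survives but the link targets and the structure are lost, which matches the "lots of plain text, but no links and no structure" observation.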
there's an even easier testing case: the <noscript> tag should be read
by all user agents without javascript - around line 67 it says
"Microsoft Internet Explorer you may need to click on the yellow bar
above and select 'Allow Blocked Content'" - if you try to find that
with Google - it will return around 500 pages - so, like Xavier
mentions - the content will be indexed but the calculated relevance is
set to almost zero.
This behaviour relies on factors like e.g. "link popularity" (who
links to your site), "quality of information" (structure - e.g.
headings, paragraphs etc. plus quantity of text) and continuing/
further information (e.g. links)
> 1) Fool the search engines (a system that will work until they figure
> out what is going on and then blackball your site.
There's one important thing about using cloaking techniques with a
TiddlyWiki: the purpose is totally different from any spammer's
purpose - it could not be considered "true fooling" as it will lead to
the promised information.
(BTW: that's what's also done by dynamically created content and/or
server side redirects...)
Additionally the javascript redirects could hardly be discovered
automatically. If a human investigator sees what those redirects
do - I personally doubt that he would treat it as spamming.
That's my true belief - but as you said (I did, too, some time ago):
there's no guarantee...
The idea of talking to Google for example seems to be not too bad.
That should be given a try as there might be also some suggestions
from the trouble-makers' side...
--s
All I want to see is the content indexed. All my sites are indexed by
Google. If you do site:www.healthwealthandmusic.co.uk you can see.
That does not mean that my ranking is high or that I have a ranking,
but that is another problem altogether. My subdomain is indexed - "Pages
from your site are included in Google's index." - but the only content
found for site:wikis.healthwealthandmusic.co.uk is: Welcome to TiddlyWiki
by Jeremy Ruston, Copyright © 2006 Osmosoft Limited. This page
requires JavaScript to function properly .... Can't imagine how that
particular message is the only scanned content! Also if you click on
"cached" the page gets stuck on the splash screen - could that also be
a block to successful indexing? I'll try disabling it.
=====================
Eric produced an excellent script that creates an index of all
tiddlers which as I said have been read by Google but via a page in my
Joomla site NOT from the index.html I created in the subdomain folder
where my TW files are located. I hoped Google would just read the list
of permalinks but it didn't. Perhaps google thinks it's one of those
false indexes.
What about my main inquiry? If it is possible to produce such a good
rss file which is an xml file, is it not possible to make TW produce a
sitemap xml because an rss output is basically the same thing - a list
of pages, except that the container tags are different. In fact I'm
surprised I can't find a conversion tool already out there in
cyberspace!
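For what it's worth, the conversion being asked about can be sketched in a few lines. This is a rough illustration under assumptions: the function name and sample feed are hypothetical, the regular expressions only cope with simple, well-formed RSS 2.0 without CDATA sections, and the result has not been checked against Google's tools. As other posts in this thread point out, every link in a TW feed still points at the same file plus a #fragment, so this alone may not change what gets indexed.

```javascript
// Hypothetical sketch: pull each <item>'s <link> out of an RSS 2.0 string
// and re-wrap it as a sitemap <url> entry.
function rssToSitemap(rssXml) {
  var items = rssXml.match(/<item\b[\s\S]*?<\/item>/g) || [];
  var urls = [];
  items.forEach(function (item) {
    var link = /<link>([\s\S]*?)<\/link>/.exec(item);
    // The link text is already XML-escaped in the feed, so pass it through.
    if (link) urls.push("<url><loc>" + link[1].trim() + "</loc></url>");
  });
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    urls.join("\n") + "\n</urlset>\n";
}

var sampleFeed =
  '<rss version="2.0"><channel><title>Demo</title>' +
  "<link>http://example.com/</link>" +
  "<item><title>GettingStarted</title>" +
  "<link>http://example.com/#GettingStarted</link></item>" +
  "<item><title>Notes</title>" +
  "<link>http://example.com/#Notes</link></item>" +
  "</channel></rss>";
console.log(rssToSitemap(sampleFeed));
```

Only links inside <item> elements are converted; the channel-level link is left out on purpose.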
========================
If I can't find a way of indexing my TW files by hook or by crook then
I can see I'll have to move the content into Dokuwiki to get it
indexed and keep TW only for personal notes which would be a complete
waste of the program's potential. There simply isn't another program
like it that I have found so far.
With the google webmaster tools (that I have never used), you can
learn when your site was last indexed.
>
> What about my main inquiry? If it is possible to produce such a good
> rss file which is an xml file, is it not possible to make TW produce a
> sitemap xml because an rss output is basically the same thing - a list
> of pages, except that the container tags are different.
You can use atom or rss 2.0 as a sitemap.
http://www.google.com/support/webmasters/bin/answer.py?answer=34606
However, this would be pointless: you would only be adding relative
links to your TW, that google would probably ignore.
-Xv
I'll experiment with the other methods in your link and report back.
Even so it would still be better if TW produced a full sitemap. As
indicated by Google's info, a newsfeed only lists the most recent
documents (unless there is a way to output ALL the tiddlers for the
sole purpose of using this alternative method). I did try a newsfeed as
an experiment but the sitemapper rejected my versions - but probably I
did not use RSS 2.0. Here we go again! This is what I mean - it could be
weeks before this stupid Google indexes the new files. Much simpler
using ONE regular reliable method. Hope there is still a solution.
On Oct 22, 4:56 pm, Xavier Vergés <xver...@gmail.com> wrote:
> > My subdomain is index - "Pages
> > from your site are included in Google's index." but the only content
> > found is site:wikis.healthwealthandmusic.co.uk - Welcome to TiddlyWiki
> > by Jeremy Ruston, Copyright © 2006 Osmosoft Limited. This page
> > requires JavaScript to function properly .... Can't imagine how that
> > particular message is the only scanned content!
>
> It is not: http://www.google.com/search?hl=en&q=site%3Awww.healthwealthandmusic....
>
> With the google webmaster tools (that I have never used), you can
> learn when your site was last indexed.
>
>
>
> > What about my main inquiry? If it is possible to produce such a good
> > rss file which is an xml file, is it not possible to make TW produce a
> > sitemap xml because an rss output is basically the same thing - a list
> > of pages, except that the container tags are different.
>
> You can use atom or rss 2.0 as a sitemap. http://www.google.com/support/webmasters/bin/answer.py?answer=34606
I'm pretty sure Google doesn't see the splash screen. Do you see it
when javascript is disabled?
;Daniel
--
Daniel Baird
"In teh beginnin Invisible Man was invisible, and he maded the skiez
and da earths, but he did not eated it." -- Genesis 1:1,
lolcatbible.com
Ken Girard
Don't mix things up: Google's spiders won't even know that there's
such a thing as a splash screen - but: if the page is cached, the
whole source code will be - including the javascript. If you have
problems loading the cached page, it might be the result of some
external calls, for example (you could verify that by having a look at
the document source).
> > What about my main inquiry? If it is possible to produce such a good
> > rss file which is an xml file, is it not possible to make TW produce a
> > sitemap xml because an rss output is basically the same thing - a list
> > of pages, except that the container tags are different.
>
> You can use atom or rss 2.0 as a sitemap.
>
> However, this would be pointless: you would only be adding relative
> links to your TW, that google would probably ignore
the funny thing is: yes, Xavier is right, you still point to a single
file with those links - otherwise it would be much easier to get TW's
indexed because Google already reads, scans, indexes and caches XML
files :-/
Which leads to cloaking again: I mentioned some time ago that it's
probably a way to go, to hack the RSS generation to point to static
HTML files.
If you have a full-blown web server (i.e. full control over your virtual
web) it should be possible to do a sneaky redirect with mod_rewrite. It
would not even be necessary to generate those static files, then
(having said <loudthinking>that tiddlyspot would probably be a place
to test and automate such a thing?</loudthinking>).
--s
http://garden.dachary.org/
http://garden.dachary.org/#TiddlyStaticPlugin
http://garden.dachary.org/#TiddlyStaticDavHooksPlugin
and the generated sitemap at
http://garden.dachary.org/sitemap.xml
Enjoy
Kev stated that "Google eventually crawled my TW but all it reads is
Jeremy's intro!", and I've been trying to point out that if it got
that far, then it read the entire page and indexed it.... in the
normal Google fashion. "This page contains the following content".
Till a search engine starts using a bot that can understand that File
A is a TW, and reads it that way, then I see no hope of getting each
tiddler to be treated as separate content. And even then there will be
issues as I've seen people putting <html><div> words </div></html> in
tiddlers, which seems like a bot trying to make a link to each div
point at this non-tiddler div. (OK maybe if the bot looked for <div
title="Words">.... Idea for looking at later).
Ken Girard
I can't find any of those results - do you have some examples? As I
said many times before: all my (raw & quick) tests result in a "no
content indexed" down the line so far - I've never seen a useful piece
of content of all those thousands of TW's on the net.
I would like someone to tell me that's bullshit - but that's my
experience :-/
Furthermore I noticed some problems with requests from bots like search
engine spiders - unfortunately I have not had time to investigate that
further, but I suspect either the <noscript> tag, the "xhtml strict"
declaration and/or TW's custom tags to be responsible.
> issues as I've seen people putting <html><div> words </div></html> in
> tiddlers, which seems like a bot trying to make a link to each div
> point at this non-tiddler div.
normally this will result in
<html><div> words </div><html>
Which won't do anything with bots ;)
Personally I don't have any plans to use a TiddlyWiki for web
publishing - as that's not the purpose I like it for, but it's totally
clear to me that there are other circumstances and tasks where this
might be useful - so I am also interested in improving TW's lack of
SEO-friendliness.
--s
http://www.google.com/search?hl=en&q=site%3Atiddlyspot.com+bananas&btnG=Search
-Xv
On a related note: we probably want to have a plugin that checks the
referrer and, if it can identify it as a search, searches within the
TW on opening it; but maybe this is too early for a plugin?
-Xavier
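The referrer check for such a plugin might start out like the sketch below. It is only an assumption-laden illustration: the function name is hypothetical, it assumes the search engine passes the query in a "q" parameter (as in the Google URLs quoted in this thread), and what to do with the terms inside TiddlyWiki is left out entirely.

```javascript
// Hypothetical helper: return the search terms found in a referrer URL,
// or null if the referrer does not look like a search result page.
function searchTermsFromReferrer(referrer) {
  if (!referrer) return null;
  var url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // not a parseable absolute URL
  }
  var q = url.searchParams.get("q"); // assumption: query is in "q"
  return q && q.trim() ? q.trim() : null;
}

console.log(searchTermsFromReferrer(
  "http://www.google.com/search?hl=en&q=tiddlywiki+sitemap"));
```

In a browser the function would be called with document.referrer when the TW opens.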
Wow! That "webado" person really is a jerk! The ignorant, dismissive
tone of her non-answers just serves to emphasize how well-behaved and
helpful people are here in the TiddlyWiki google groups.
Oh, and by the way, for a real belly laugh, check out this part of
her profile:
------------------
I am a late-comer to the world of web design, website building and to
the internet in general. A long-standing programmer-analyst by trade,
I believe I understand computers and software and the software
creators sufficiently well to be able to troubleshoot issues of
interest to webmasters and offer solutions.
------------------
yeah... right. (pffft! indeed)
I'm much too annoyed at her stupid comments to reply just yet... or at
all... but if someone else wants to try to educate her about
TiddlyWiki and the use of intensive client-side javascript for
"Web2.0" applications, that would be helpful...
-e
Actually, I think this person is little more than a troll.
I would recommend to just ignore her ramblings; Don't Feed the Troll.
Do a Google search for:

Tiddlywiki "Try clicking on various links and see what happens - you
cannot damage tiddlywiki.com"
I get 3 sites with that:
swik.net/tiddlywiki+instructions
www.tiddlywiki.com/
tiddlyspot.com/twhelp/
Here is the summary of that search for TiddlyWiki.com:
----
TiddlyWiki - a reusable non-linear personal web notebook - 7 visits -
3:56am
Try clicking on various links and see what happens - you cannot damage
tiddlywiki.com or your browser. (Use the <<closeAll>> button over on
the right to ...
www.tiddlywiki.com/ - 396k - Cached - Similar pages - Note this
----
Mind you that in no way are any of us trying to say that these links
take you to the correct tiddler. But they do show that the bot reads
the page, even the js. It does not run the js, but you can do a search
for it and find it.
Try to search for this:
w.subWikify(createTiddlyElement(w.output,this.element),this.terminator)
I got 160 hits for it. The first one is a TW that got messed up so the
code is visible, the next 3 were normal TWs, I didn't look any
further.
I get hits each day from people using search engines. They do not find
me from reading Jeremy's intro (No matter how nicely he wrote it).
Ken Girard
Anyway - that might help but does not solve the problem: the visitor
is not able to find the proper location/tiddler by simply following the
resulting link.
Compared with other sites on the same page rank level (Google 7, Alexa
209392, Compete 243749 for TiddlyWiki.com) it's simply a bunch of bull
that you get one (1) result if you do a search for the keyword
"tiddlywiki" although there are 545 occurrences of the word.
Measurements for keyword density of "tiddlywiki":

Location   Matches   Total words   Percentage
Title            1             8        12.5%
Body           516         48206         1.1%
Links           27           371         7.3%
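The percentages above are simply matches divided by total words. A minimal sketch of that arithmetic follows; the function name is hypothetical and the word-splitting rule (runs of letters and digits, case-insensitive) is an assumption, since the tool used for the measurements is not named here.

```javascript
// Hypothetical helper: count how often a keyword occurs among the words
// of a text and express that as a percentage of all words.
function keywordDensity(text, keyword) {
  // Assumption: a "word" is any run of letters or digits.
  var words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  var needle = keyword.toLowerCase();
  var matches = words.filter(function (w) { return w === needle; }).length;
  var percentage = words.length ? (matches / words.length) * 100 : 0;
  return { matches: matches, totalWords: words.length, percentage: percentage };
}

var pageTitle = "TiddlyWiki - a reusable non-linear personal web notebook";
var density = keywordDensity(pageTitle, "tiddlywiki");
// prints "1 / 8 = 12.5%"
console.log(density.matches + " / " + density.totalWords + " = " +
  density.percentage.toFixed(1) + "%");
```

With this splitting rule the tiddlywiki.com page title yields 1 match in 8 words, the same 12.5% as the Title row.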
Due to the fact that a lot of people already use a TiddlyWiki for web
publishing this seems to be more than a minor problem - I will start a
new thread in the dev group later today (anyone feel free to do so if
you have more time...).
I'll also try to dig the three groups for all related threads over the
years...
> But they do show that the bot reads
> the page, even the js. It does not run the js, but you can do a search
> for it and find it.
>
> Try to search for this:
> w.subWikify(createTiddlyElement(w.output,this.element),this.terminator)
>
> I got 160 hits for it.
Although I've already been disabused regarding the indexed content, I
strongly doubt that the spiders read javascript (except where the code
is damaged), and my results look a bit different: I get just 56 with
Google, and most of the pages either aren't TWs, display the code
intentionally, or are from trac.tiddlywiki.org
(even if you put the search term in quotes Google treats some of the
chars as search syntax rather than what they are - e.g. brackets and
periods - which leads to a bunch of results with partial matches
only...)
--s
Done.
So it might be a good idea to go on there: "How to solve the
unsatisfying SEO problem with TiddlyWikis used for web publishing"
http://groups.google.com/group/TiddlyWikiDev/browse_frm/thread/d14ec01af14e4f8f
--s
Any answer from the experts on the main question:
"Any knowledge as to how the rss output could be adapted, extended and
integrated or perhaps Eric's code that produces all the tiddler links?
"
On Oct 23, 10:44 pm, Ken Girard <ken.gir...@gmail.com> wrote:
> Do a google search for:
>
> Tiddlywiki "Try clicking on various links and see what happens - you
> cannot damage tiddlywiki.com"
>
> I get 3 sites with that:
> swik.net/tiddlywiki+instructions
> www.tiddlywiki.com/
> tiddlyspot.com/twhelp/
>
> Here is the summary of that search for TiddlyWiki.com:
> ----
> TiddlyWiki - a reusable non-linear personal web notebook - 7 visits -
> 3:56am
> Try clicking on various links and see what happens - you cannot damage
> tiddlywiki.com or your browser. (Use the <<closeAll>> button over on
> the right to ...
> www.tiddlywiki.com/ - 396k - Cached - Similar pages - Note this
3) Use Google's alternate submission mechanism: Give it an XML
document that indicates, for a given URL, what content to index.
I don't recall the details, but info should be available via
(and on!) Google.
- Sam