capturing html output for static html 'search engine' page


excyberlabber

Jul 9, 2012, 4:45:19 AM
to simile-...@googlegroups.com
Hi,

I have searched the group and found: http://www.simile-widgets.org/wiki/How_to_make_Exhibit_search_engine_friendly.  However my Exhibit is old.  It is a Joomla extension created for Joomla version 1.5.  It has no 'copy all' that I can find.  I need to create a static html page that google can crawl - is there a good way?  Or have I overlooked something?


Thanks in advance!

Regards,

David Karger

Jul 10, 2012, 2:24:40 AM
to simile-...@googlegroups.com, excyberlabber
It isn't actually important to make html. Just select and copy the
html, and you'll have a blob of text that has the words you want Google to
index. Or, download the contents of
http://www.hucompute.org/data/team/team.json2 . Take the contents and
put them in your page inside a <noscript></noscript> tag.
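
As a rough illustration of the second suggestion, the sketch below flattens the values of each item into escaped text and wraps the result in a <noscript> element. It assumes the downloaded file has a top-level "items" list, which is how Exhibit JSON files are usually laid out; the function name `noscript_block` and the sample data are hypothetical and not part of Exhibit.

```python
import html
import json


def noscript_block(exhibit_json: str) -> str:
    """Flatten every item's values into escaped text inside <noscript>."""
    data = json.loads(exhibit_json)
    paragraphs = []
    # Assumption: the file has a top-level "items" list.
    for item in data.get("items", []):
        words = []
        for value in item.values():
            # A value may be a single string or a list of strings (e.g. "author").
            values = value if isinstance(value, list) else [value]
            words.extend(str(v) for v in values)
        # Escape the text so characters like & and < cannot break the page.
        paragraphs.append("<p>" + html.escape(" | ".join(words)) + "</p>")
    return "<noscript>\n" + "\n".join(paragraphs) + "\n</noscript>"


# Hypothetical sample data, only to show the shape of the output.
sample = '{"items": [{"label": "A & B", "author": ["Warner, Paul"], "year": "2010"}]}'
print(noscript_block(sample))
```

The printed block can be pasted into the page as is. Whether a search engine actually indexes text placed in <noscript> is outside what this sketch can show.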
> --
> You received this message because you are subscribed to the Google
> Groups "SIMILE Widgets" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/simile-widgets/-/CVtu0Dm7kA0J.
> To post to this group, send email to simile-...@googlegroups.com.
> To unsubscribe from this group, send email to
> simile-widget...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/simile-widgets?hl=en.


Paul Warner

Jul 10, 2012, 3:23:23 AM
to simile-...@googlegroups.com
Super!  Thanks for the tip.


On 7/9/2012 4:45 AM, excyberlabber wrote:
> http://www.hucompute.org/publikationen

Paul Warner

Sep 4, 2012, 6:27:26 AM
to simile-...@googlegroups.com
I have tried the second suggestion below, plugging the json code into the html output between  <noscript></noscript> tags.  But it seems google does not index it.  Or, to be more exact, google seems to have indexed only a small number of the publications in our page.

Here is an example of a json entry from our file that google has NOT indexed:

{
  "pdf" :          "/data/pdf/gleim_warner_mehler_2010.pdf",
  "booktitle" :    "Proceedings of the 6th International Conference on Web Information Systems and Technologies (WEBIST '10), April 7-10, 2010, Valencia",
  "pub-type" :     "inproceedings",
  "uri" :          "urn:9d04c97ae1f239d6a44b4984decc33ea",
  "date" :         "2010",
  "author" :       [
    "Gleim, R\u00FCdiger",
    "Warner, Paul",
    "Mehler, Alexander"
  ],
  "authoreditor" : [
    "Gleim, R\u00FCdiger",
    "Warner, Paul",
    "Mehler, Alexander"
  ],
  "type" :         "Publication",
  "year" :         "2010",
  "label" :        "eHumanities Desktop - An Architecture for Flexible Annotation in Iconographic Research",
  "key" :          "Gleim:Warner:Mehler:2010"
},

Is there something else I should do to indicate that the pdf should be indexed?

I have also tried opening the 'Generated source code' view with the Firefox extension Web Developer, but that apparently overloads the browser: Firefox crashes every time before it generates the code.

The first suggestion, copying and pasting the text, misses the links to the bibliography and especially to the PDFs that we want indexed.  How can I copy and paste the link information as well?

Thanks for any help!
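
One possible way to keep the link information is to generate the static HTML from the JSON file rather than copying it from the browser. The sketch below turns entries shaped like the one above into a list with ordinary <a href> links to the PDFs. The function names are hypothetical, and the base URL http://www.hucompute.org/ is an assumption about where the relative "pdf" paths resolve.

```python
import html
from urllib.parse import urljoin


def publication_to_li(item: dict, base_url: str) -> str:
    """Render one publication entry as an <li>, linking the title to its PDF."""
    authors = item.get("author", [])
    if isinstance(authors, str):
        authors = [authors]
    text = html.escape("; ".join(authors))
    year = item.get("year")
    if year:
        text += " (" + html.escape(str(year)) + ")"
    title = html.escape(item.get("label", ""))
    pdf = item.get("pdf")
    if pdf:
        # Resolve the relative path against the (assumed) site root.
        href = html.escape(urljoin(base_url, pdf), quote=True)
        title = '<a href="' + href + '">' + title + "</a>"
    return "<li>" + text + ". " + title + "</li>"


def publications_to_html(items: list, base_url: str) -> str:
    """Render all entries as one static <ul> block."""
    rows = [publication_to_li(item, base_url) for item in items]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


# Fields copied from the entry quoted in this message.
entry = {
    "pdf": "/data/pdf/gleim_warner_mehler_2010.pdf",
    "author": ["Gleim, R\u00FCdiger", "Warner, Paul", "Mehler, Alexander"],
    "year": "2010",
    "label": "eHumanities Desktop - An Architecture for Flexible Annotation in Iconographic Research",
}
print(publications_to_html([entry], "http://www.hucompute.org/"))
```

The resulting list contains plain links that a crawler can follow without running JavaScript. It could sit in the page's static HTML next to, or instead of, the raw JSON.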