how to get list of all possible URLs in a Hippo site


mari....@west.cmu.edu

May 27, 2015, 8:10:38 PM
to hippo-c...@googlegroups.com
Hello,

We have a Hippo site and are wondering if there is a way to generate a list of all possible URLs that are accessible on the site, similar to doing "rake routes" in Ruby on Rails.  For example, "rake routes" produces output like this:

$ rake routes
                        Prefix Verb   URI Pattern                                              Controller#Action
                          root GET    /                                                        pages#index
                        signup GET    /signup(.:format)                                        users#new
                         login GET    /login(.:format)                                         sessions#new
                        logout DELETE /logout(.:format)                                        sessions#destroy
                      sessions GET    /sessions(.:format)                                      sessions#index
                               POST   /sessions(.:format)                                      sessions#create
                   new_session GET    /sessions/new(.:format)                                  sessions#new
                  edit_session GET    /sessions/:id/edit(.:format)                             sessions#edit
                       session GET    /sessions/:id(.:format)                                  sessions#show
                               PATCH  /sessions/:id(.:format)                                  sessions#update
                               PUT    /sessions/:id(.:format)                                  sessions#update
                               DELETE /sessions/:id(.:format)                                  sessions#destroy
                         users GET    /users(.:format)                                         users#index
                               POST   /users(.:format)                                         users#create
                      new_user GET    /users/new(.:format)                                     users#new
                     edit_user GET    /users/:id/edit(.:format)                                users#edit
                          user GET    /users/:id(.:format)                                     users#show
                               PATCH  /users/:id(.:format)                                     users#update
                               PUT    /users/:id(.:format)                                     users#update
                               DELETE /users/:id(.:format)                                     users#destroy


I want to end up with a list that has each accessible URL on a line so that I can make a CSV file out of it, like this:

http://www.mysite.com/
http://www.mysite.com/news
http://www.mysite.com/about-us
http://www.mysite.com/product1
http://www.mysite.com/product2
etc.

Thank you.

Ard Schrijvers

May 28, 2015, 4:33:41 AM
to hippo-c...@googlegroups.com
Hey,

On Thu, May 28, 2015 at 2:10 AM, <mari....@west.cmu.edu> wrote:
> Hello,
>
> We have a Hippo site and are wondering if there is a way to generate a list
> of all possible URLs that are accessible on the site,

We do not have a default utility for this. Also, it is not always
possible: with faceted navigation, for example, the number of
possible URLs easily exceeds the number of particles in the universe.
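
To put a rough number on the faceted navigation remark, here is a small back-of-the-envelope sketch. It assumes (my assumption, not something stated in this thread) that every facet value is an independent on/off filter and that every combination gets its own URL, and it uses the commonly quoted ballpark of 10^80 particles. The function names are made up for the illustration.

```python
# Rough, commonly quoted estimate; an assumption for this illustration.
PARTICLES_IN_UNIVERSE = 10 ** 80


def facet_combinations(num_facet_values):
    """Each facet value is either selected or not: 2**n possible selections."""
    return 2 ** num_facet_values


def smallest_n_exceeding(limit):
    """Smallest number of facet values whose combinations exceed the limit."""
    n = 0
    while facet_combinations(n) <= limit:
        n += 1
    return n


threshold = smallest_n_exceeding(PARTICLES_IN_UNIVERSE)
print(threshold)
```

Under those assumptions a few hundred facet values are already enough, and URLs that encode the order in which facets were picked would only make the count larger.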

What is fairly easy would be the following:

1) Create all URLs for all explicit sitemap items (including their
ancestors). An explicit sitemap item is an item that does not contain
wildcards
2) For every document in the repository, ask the HST to generate its URL

Then remove the duplicates from (1) and (2). It shouldn't be hard to
build this (if you know what to do, I admit :-) )
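
The two steps and the duplicate removal can be sketched roughly as follows. This is plain Python over a toy data model, not HST code: `SiteMapItem` is a hypothetical stand-in for a sitemap item, the document paths are made up, and step (2) is represented by a ready-made list instead of a real call into the HST. The only Hippo-specific detail is that `_default_` and `_any_` are treated as the wildcard names.

```python
from dataclasses import dataclass, field

# Names used for wildcard sitemap items in an HST sitemap.
WILDCARD_MARKERS = ("_default_", "_any_")


@dataclass
class SiteMapItem:
    """Hypothetical stand-in for a sitemap item: a path segment plus children."""
    name: str
    children: list = field(default_factory=list)


def is_wildcard(item):
    return any(marker in item.name for marker in WILDCARD_MARKERS)


def explicit_paths(items, prefix=""):
    """Yield the path of every item whose own name and ancestors are explicit."""
    for item in items:
        if is_wildcard(item):
            continue  # skip the wildcard item and everything below it
        path = f"{prefix}/{item.name}"
        yield path
        yield from explicit_paths(item.children, path)


def all_urls(base_url, sitemap_items, document_paths):
    """Combine step (1) and step (2), dropping duplicates but keeping order."""
    combined = [base_url + path for path in explicit_paths(sitemap_items)]
    combined += [base_url + path for path in document_paths]
    return list(dict.fromkeys(combined))


sitemap = [
    SiteMapItem("news", [SiteMapItem("_default_.html")]),
    SiteMapItem("about-us"),
    SiteMapItem("products", [SiteMapItem("_any_")]),
]
# In a real setup these would come from asking the HST for each document's
# URL; the values here are invented for the example.
document_paths = ["/news/launch.html", "/about-us", "/products/product1"]

for url in all_urls("http://www.mysite.com", sitemap, document_paths):
    print(url)
```

In this toy run `/about-us` shows up in both steps but is listed only once, and the wildcard items contribute nothing on their own; the pages behind them only appear through the document URLs.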

Note that for (2), it makes most sense to return the canonical URLs
(HST has context-aware link rewriting: we can show one and the same
document at different URLs depending on the current context).

Regards Ard
> --
> Hippo Community Group: The place for all discussions and announcements about
> Hippo CMS (and HST, repository etc. etc.)
>
> To post to this group, send email to hippo-c...@googlegroups.com
> RSS:
> https://groups.google.com/group/hippo-community/feed/rss_v2_0_msgs.xml?num=50
> ---
> You received this message because you are subscribed to the Google Groups
> "Hippo Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to hippo-communi...@googlegroups.com.
> Visit this group at http://groups.google.com/group/hippo-community.
> For more options, visit https://groups.google.com/d/optout.



--
Hippo Netherlands, Oosteinde 11, 1017 WT Amsterdam, Netherlands
Hippo USA, Inc.- 745 Atlantic Ave, Eight Floor, Boston MA 02111,
United states of America.

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Jeroen Reijn

May 28, 2015, 4:59:31 AM
to hippo-c...@googlegroups.com
Both points above are things that the Google sitemap plugin on the Forge does.
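
If the plugin's output is a standard sitemaps.org XML sitemap (an assumption here; check what your installation actually serves), turning it into the one-URL-per-line CSV from the original question could look roughly like this. The sample XML and the function names are made up for the illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> value from a sitemaps.org <urlset>, in order."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def to_csv(urls):
    """Return CSV text with one URL per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()


# Invented sample; a real sitemap would be fetched from the site.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.mysite.com/</loc></url>
  <url><loc>http://www.mysite.com/news</loc></url>
  <url><loc>http://www.mysite.com/about-us</loc></url>
</urlset>"""

urls = urls_from_sitemap(sample_sitemap)
print(to_csv(urls), end="")
```

Going through the `csv` module rather than joining lines by hand means a URL that happens to contain a comma is quoted correctly.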




--
Jeroen Reijn
Hippo


Ard Schrijvers

May 28, 2015, 5:08:22 AM
to hippo-c...@googlegroups.com
Does that also crawl documents? Great! Didn't know that. Thanks Reijn!

Mathijs den Burger

May 29, 2015, 7:21:05 AM
to hippo-c...@googlegroups.com
Note that the sitemap plugin is a certified one. Its documentation can be found at http://www.onehippo.org/library/concepts/plugins/sitemap/about.html.

Mathijs

Woonsan Ko

May 29, 2015, 11:09:26 AM
to hippo-c...@googlegroups.com
I'm not sure whether the crawling feature of the sitemap plugin is
production-ready yet, judging by this issue:
- https://issues.onehippo.com/browse/HIPPLUG-710

Also, the documentation doesn't seem to explain that feature properly yet.



--
w....@onehippo.com www.onehippo.com
Boston - 745 Atlantic Ave, 8th Floor, Boston MA 02111
Amsterdam - Oosteinde 11, 1017 WT Amsterdam