Cruftless links -- *best* way to catch and strip fluff from links like .htm, .jsp, .asp, .php, etc.


Gary Hewett

Dec 22, 2014, 6:05:19 PM
to lif...@googlegroups.com
Yes of course I will build sites that have links like: 

some.site/
some.site/about-us
some.site/contact-us

etc

However, I also get to replace existing sites that have a lot of external links that I have no control over, and I want to 302 them to keep Google happy and the link juice flowing.

So externally the links that have been stacked up might be: 

some.site/index
some.site/home
some.site/home.htm
some.site/home.html
some.site/home.jsp
some.site/home.php
some.site/home.asp

All of which I want to 302 over to 

some.site/

instead of simply 404ing them. 

Ditto for all the deep pages. 

I do see this and will go play with the suggestion in the last entry: 


 >> I've got it. The next example redirects from /hello.html to /index.html
 >> LiftRules.statelessDispatchTable.append({       
 >>  case Req(List("hello"), _,_) => () => {Full(new RedirectResponse("/index.html", null)).asInstanceOf[Box[LiftResponse]]       }
 >> })

Curious as to if/how that might muck up isLoggedOn (sic?) but I suspect it won't...

I would also like a more generic "let's take care of every page" function so I don't need a million rules for bigger sites. I suspect .map over the Menu tree might become my friend...
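A generic "every page" rule could be sketched in plain Scala as below. This is only an illustration working on a list of path segments and is not wired into Lift; the object name `CruftlessPaths`, the extension list, and the set of "index-like" page names are all assumptions made for the example.

```scala
object CruftlessPaths {
  // Hypothetical lists: extensions to strip, and page names that stand for "the directory itself".
  val cruftExtensions: Set[String] = Set("htm", "html", "jsp", "php", "asp")
  val indexNames: Set[String] = Set("index", "home")

  // Drop a trailing ".ext" from one path segment when ext is a known cruft extension.
  def stripExtension(segment: String): String = {
    val dot = segment.lastIndexOf('.')
    if (dot > 0 && cruftExtensions.contains(segment.substring(dot + 1).toLowerCase))
      segment.substring(0, dot)
    else segment
  }

  // Some(cleanPath) when the path needs a redirect, None when it is already clean.
  def canonical(path: List[String]): Option[String] = {
    val stripped = path match {
      case Nil => Nil
      case _   => path.init :+ stripExtension(path.last)
    }
    // A trailing "index" or "home" collapses into its parent directory.
    val cleaned =
      if (stripped.lastOption.exists(s => indexNames.contains(s.toLowerCase))) stripped.init
      else stripped
    if (cleaned == path) None else Some(cleaned.mkString("/", "/", ""))
  }
}
```

With these assumptions, `CruftlessPaths.canonical(List("home.htm"))` yields `Some("/")`, while an already clean path such as `List("about-us")` yields `None`, so a single rule can cover deep pages as well as the front page.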

Gary Hewett

Dec 22, 2014, 6:19:57 PM
to lif...@googlegroups.com
Sorry - I need a 301 not a 302 in most situations - minor detail I know...

Diego Medina

Dec 22, 2014, 6:39:45 PM
to Lift
I would say that a cleaner way to handle that is by having something like nginx in front of Lift.


--
--
Lift, the simply functional web framework: http://liftweb.net
Code: http://github.com/lift
Discussion: http://groups.google.com/group/liftweb
Stuck? Help us help you: https://www.assembla.com/wiki/show/liftweb/Posting_example_code

---
You received this message because you are subscribed to the Google Groups "Lift" group.
To unsubscribe from this group and stop receiving emails from it, send an email to liftweb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Diego Medina
Lift/Scala consultant
di...@fmpwizard.com
http://fmpwizard.telegr.am

ti com

Dec 23, 2014, 12:44:59 AM
to Lift

The stateless dispatch that you posted does exactly that -- it can handle everything in one place; it's just a partial function.

Antonio Salazar Cardozo

Dec 23, 2014, 1:45:19 AM
to lif...@googlegroups.com
Agree re: nginx. Lift can do this for you, and so can the servlet container, but something like nginx
is best suited to it.
Thanks,
Antonio



Gary Hewett

Dec 23, 2014, 12:41:09 PM
to lif...@googlegroups.com
So far this is what I have figured out. 

    LiftRules.dispatch.prepend {
        case r @ Req("index.html" :: _, _, _) => () => Full(PermRedirectResponse("/", r))
        case r @ Req("index.htm" :: _, _, _) => () => Full(PermRedirectResponse("/", r))
        case r @ Req("index.jsp" :: _, _, _) => () => Full(PermRedirectResponse("/", r))
        case r @ Req("home" :: _, _, _) => () => Full(PermRedirectResponse("/", r))
    }


I'm not sure if I'm in for a world of future pain when this hits a Tomcat container, due to moving from relative URLs to absolute ones, but that is a different issue.


I'm trying to determine if there is a way to place a regex in the case statement logic but have yet to get my full bearings on reading docs, helper classes and what Scala (ahem) loosely calls constructors. You're making Java look fun once more as at least I know exactly how things got done :)


Regex (non functional) example: 


    LiftRules.dispatch.prepend {
        case r @ Req("index" (html,jsp,htm) :: _, _, _) => () => Full(PermRedirectResponse("/", r))
        case r @ Req("home" :: _, _, _) => () => Full(PermRedirectResponse("/", r))
    }

 

As for the answers to "do it elsewhere": I really want this particular "business logic" right next to where I define the Menu, to ensure that I'm not mixing development with maintenance. Doing the latter can be a nightmare at the next upgrade. Bear in mind that I may be dealing with insanely long URLs. Also bear in mind that I may want to connect this to a data store at some point and make generating this map a dynamic process -- e.g. read the database and generate the map on boot.
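The "generate the map on boot" idea might look roughly like the sketch below. It is plain Scala with no Lift or database code; the object name `RedirectTable` and the sample rows are hypothetical stand-ins for whatever a real query would return.

```scala
object RedirectTable {
  // Hypothetical rows, as they might come back from a database query at boot:
  // (old path, new path).
  val rows: Seq[(String, String)] = Seq(
    "/home.htm"          -> "/",
    "/index.jsp"         -> "/",
    "/products/list.php" -> "/products"
  )

  // Build the lookup once; a hash lookup stays cheap even for very long URLs.
  val table: Map[String, String] = rows.toMap

  // A Map is already a PartialFunction: it is defined only for the keys it contains.
  val redirectFor: PartialFunction[String, String] = table
}
```

Because the table is itself a partial function, `RedirectTable.redirectFor.isDefinedAt(path)` can serve as the guard and `RedirectTable.redirectFor(path)` as the redirect target, which keeps the rule count at one regardless of how many rows are loaded.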


Editorial note: The Lift (ahem Scala?) Brochure promised that functions were first class citizens of the language -- however I feel like methods have been railroaded out of town :)  Would love to see how to connect the dots between the two so I can actually write code once more instead of just praying! 

ti com

Dec 23, 2014, 7:19:57 PM
to liftweb

You don't need a regex, just a list of strings and a call to contains.


Antonio Salazar Cardozo

Dec 23, 2014, 8:25:40 PM
to lif...@googlegroups.com
All case matches support additional guards on the end:

  case request @ Req(start :: _, _, _) if start.matches("index.(html|jsp|htm)") =>
    () => Full(PermRedirectResponse("/", request))

Or:

  val redirects = List("home", "magic")
  val regexRedirects = List("index.(html|htm|jsp)", "root.(html|htm|jsp)")

  case request @ Req(start :: _, _, _) if redirects.contains(start) || regexRedirects.exists(start.matches(_)) =>
    () => Full(...)

Or if you want to get crazy with regexes and extract matching groups in your partial function:

  val HtmlyThing = "index.(html|htm|jsp)".r

  case request @ Req(HtmlyThing(extension) :: _, _, _) =>
    () => Full(...)
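For readers new to regex extractors, here is a standalone sketch of the same matching-group technique on a plain list of path segments, without any Lift types. The names `RegexSegments`, `LegacyIndex` and `describe` are made up for the illustration. Note that the dot is escaped here; in the patterns above an unescaped `.` matches any character.

```scala
object RegexSegments {
  // The escaped dot keeps "indexXhtml" from matching; the group captures the extension.
  val LegacyIndex = """index\.(html|htm|jsp)""".r

  // Hypothetical helper: classify a path by its first segment.
  // A Regex used as a pattern must match the whole string to succeed.
  def describe(path: List[String]): String = path match {
    case LegacyIndex(ext) :: _ => s"legacy index page with extension $ext"
    case _                     => "no match"
  }
}
```

The captured group (`ext`) is bound like any other pattern variable, so it can be used on the right-hand side to decide where to redirect.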

Generating a map on boot isn't impossible with nginx, you'd just have to reload the server (which
is a pretty painless operation),  and have its config include something you generate from Boot.
That's neither here nor there, though–if you're set on doing it this way that's fine.
Thanks,
Antonio

Vasya Novikov

Dec 25, 2014, 3:47:25 PM
to lif...@googlegroups.com
BTW, do you really need to pattern-match on the first URL part? Maybe
you want the last?

Currently, as far as I see, you would match on
domain.org/index.jsp/test
but not match on
domain.org/test/index.jsp

If I am correct, something like this would be better:

case req if req.get_? && req.path.partPath.lastOption.***

case req if req.get_? && req.path.suffix.***
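The difference between matching the first and the last path part can be tried out on plain lists, as in the sketch below. It does not attempt to fill in the elided parts of the snippets above; `LastSegment`, `legacyNames` and both helper methods are hypothetical names for the illustration.

```scala
object LastSegment {
  // Hypothetical set of file names that should trigger a redirect.
  val legacyNames: Set[String] = Set("index.html", "index.htm", "index.jsp")

  // True when the final path segment is a legacy name, wherever it sits in the tree.
  def endsWithLegacyName(path: List[String]): Boolean =
    path.lastOption.exists(legacyNames.contains)

  // Alternative using the :+ extractor, which binds both the parent path and the last segment.
  def parentOfLegacy(path: List[String]): Option[List[String]] = path match {
    case parent :+ last if legacyNames.contains(last) => Some(parent)
    case _                                            => None
  }
}
```

Under these assumptions, `List("test", "index.jsp")` is caught and `List("index.jsp", "test")` is not, which is the behaviour described above for domain.org/test/index.jsp versus domain.org/index.jsp/test.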



--
Vasya Novikov

Gary Hewett

Dec 26, 2014, 4:08:43 PM
to lif...@googlegroups.com, n1m5-goo...@yandex.ru
Antonio - awesome - thank you! That "start" fix worked like a charm.

Vasya -- yes, you're correct - I'm not looking for the site to "work" based on these pattern matches. I am looking to catch things that fall through the cracks and to be able to shunt old links that exist elsewhere to new, current pages in a way that Google understands.

I've moved from LiftRules.dispatch.prepend over to LiftRules.dispatch.append as I suspect we want this after the main SiteMap (Menu) gets checked. 

So to recap: I think the site will do what it normally does through SiteMap unless and until a link is either malformed or is an external inbound link that I have little control over and that needs to be redirected. Such a link will then be picked up by this "last chance" block of code before heading into a properly presented 404 (which I should be able to (hope to) figure out on my own...)
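The "last chance" ordering described here can be modelled with plain partial functions chained by `orElse`. This is only a model of first-match-wins ordering, not a description of Lift's internals or of where Lift consults SiteMap; the handler names, paths and response strings are all hypothetical.

```scala
object DispatchOrder {
  type Handler = PartialFunction[String, String]

  // Stand-ins for the real pieces of the site.
  val siteMapPages: Handler = {
    case "/"         => "200 home"
    case "/about-us" => "200 about"
  }
  val legacyRedirects: Handler = {
    case "/home.htm" | "/index.jsp" => "301 -> /"
  }
  val notFound: Handler = { case _ => "404" }

  // Earlier handlers win; later ones only see what fell through.
  val handle: String => String = siteMapPages orElse legacyRedirects orElse notFound
}
```

In this model a regular page never reaches the redirect block, and only paths that match neither the pages nor the redirects end up at the 404 handler.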

Appreciate the help everyone! 