Canonical URLs.

ernest meyer

Apr 7, 2016, 4:02:13 AM
to Joomla! General Development
Joomla 3.5.1 arrived with some statement about canonical URLs. Well, if there is an article on blastedsite.com with ID 2, and someone goes to blastedsite.com/2, Joomla still says the canonical URL is blastedsite.com/?2, no matter how many menus are made and no matter how many posts are written about it. So I wrote a plugin solution.

However, I am not allowed to share it, because an administrator on forums.joomla.org said it is considered self-promotion. After six months' work, several plugins posted, and more than 1,000 downloads in the last year, that is the ONLY thing anyone from Joomla has said to me; since 2006, in fact. I tried sharing something here before, and that got no response, so this is my last share unless someone says something nice. I regret I cannot provide working code, as I am informed that would be considered 'self promotion.' Here is where I got to. I hope you all have a nice day.


 
public function onAfterInitialise(){
    $this->getCheckBoxes(self::$o,$this->params->get('pluginOptions'));//setup basic parameters
    self::$o['debug'] =$this->params->get('debug');
    if($this->app->isAdmin())
        return;
    self::$o['baselen'] =strlen(JURI::root());
    self::$o['rootlen'] =strlen(JURI::root(true));
    $p=JPluginHelper::getPlugin('content', 'pagebreak');
    if(empty($p)){
        self::$o['pageplugin'] =false;
        return;
    }
    $p=new JRegistry($p->params);
    $s=$p->get('style');
    if(isset($s))
        self::$o['pagestyle'] =$s;
    if(self::$o['pagestyle']!="pages")
        return;
    if(self::$o["ysefpagination"]):
        $router=$this->app->getRouter();
        $router->attachParseRule(function(&$siteRouter,&$uri){//incoming URIs
            $query=$uri->getQuery(true); //query as array
            $path=$uri->getPath(); //path from base without leading slash
            $this->queueMessage('parse: '.$path,1);
            if(preg_match('~(.*?)(/?)section(s_all|_(\d+))$~',$path,$matches)){
                if($matches[3]=='s_all'){//underscore, to match what the pattern captures
                    $query['showall']=1;//only on articles, not blogs
                    $query['limitstart']='';
                }elseif($matches[4]==1){
                    $query['showall']='';
                    $query['limitstart']='';
                }else{
                    $query['showall']='';
                    $q=$matches[4]-1;
                    $query['start']=$q;
                }
                $uri->setQuery($query);
                $uri->setPath($matches[1]);
            }
            return array();
        });
        $router->attachBuildRule(function(&$siteRouter,&$uri){ //outgoing URIs
            $query=$uri->getQuery(true); //query as array
            $path=$uri->getPath(); //path from base without leading slash
            if(!empty($query['showall']) && $query['showall']==1){
                $path.="/sections_all"; //using underscore to avoid router hyphen bug in Joomla 3.3.6-
            }else{
                if(isset($query['limitstart'])){
                    if(!empty($query['limitstart'])){//if empty it is page 1
                        $p=$query['limitstart']+1;
                        $path.="/section_".$p;
                    }//else $path.="/section_0";
                    unset($query['limitstart']);
                }if(isset($query['start'])){
                    $p=$query['start']+1;
                    $path.="/section_".$p;
                }
            }
            unset($query['showall']);
            unset($query['limitstart']);
            unset($query['start']);
            $uri->setPath($path);
            $uri->setQuery($query);
            return $uri;
        },'postprocess');
    endif;
    //*  end onAfterInitialise ----*///
}
 
public function onAfterRoute(){
    if($this->app->isAdmin()) return;
    $input=$this->app->input;
    $option=$input->get('option'); //com_content,....
    $view=$input->get('view'); //article,category,categories,or featured
    self::$o['showall'] =$input->get('showall'); //pagination
    if(self::$o['pagestyle']=='pages' && self::$o['showall']!=1){
        self::$o['showall'] =false;
    }else{
        self::$o['showall'] =true; //whether whole page is displaying
    }
    if(is_null($input->get('format'))): //output is not rss or template
        if($option=='com_content'):
            switch($view):
                case('article'):
                case('featured'):
                    self::$o['view']=$view; break;
                case('category'):
                case('categories'):
                    if($input->get('layout')=="blog") self::$o['view']="blog";
                    else self::$o['view']="list";
                    break;
                default:
                    self::$o['view']="content"; //for addons
                    break;
            endswitch;
        elseif($view!='print' && $view!='form'):
            self::$o['view']="other"; //it is another kind of HTML page, not RSS or form or printing
        else:
            return;
        endif;
    else:
        return; //self::$o['view'] is now 'article', 'featured', 'blog', 'list', 'other', or false.
    endif;

    if(!((self::$o["ysefcontent"]||self::$o["ycanonical"]) &&self::$o['view']!=false && self::$o['view']!="other"))
        return;//done with setting up view, proceeding to SEF
    $id=$input->get('id'); //ID for article
    $start=$input->get('start'); //pagination
    $limitstart=$input->get('limitstart'); //
    $Itemid=$input->get('Itemid'); //
    if(empty($lang))$lang=0;
    $menu=$this->app->getMenu();
    $url='index.php?option='.$option.'&view='.$view;
    if(self::$o['view']=='blog')$url.='&layout=blog';
    if(is_null($id)&& $view=="categories")$id=0;//id isn't set for top-level categories list.
    if($view=="featured"){ //featured may not be home page
        $this->menuCheck('index.php?option=com_content&view=featured',$menu,$Itemid);
        if(isset($Itemid)){ //featured page may be anywhere, so check menu
            $this->menuCheck($url,$menu,$Itemid);
            $url.='&Itemid='.$Itemid;
        }else{
            $menuitem=$menu->getItems('link',$url,true);
            //$Itemid is not set in this branch, so take the ID from the menu item that was found
            if(!empty($menuitem))$url.='&Itemid='.$menuitem->id;
        }
    }elseif(isset($id)){ //can't assume menu ID is correct
        require_once(JPATH_SITE . '/components/com_content/helpers/route.php');
        if($view=="article"){ //if its article, get ID for category to compare SEF url.
            $ydb=JFactory::getDbo();
            $catid=$input->get('catid');
            $ydb->setQuery('select catid from #__content where id='.(int)$id);//cast keeps non-numeric input out of the query
            $cat=$ydb->loadResult();
            if(isset($catid) && $cat!=$catid)
                $this->redirect(JRoute::_($url.'&catid='.$catid));
            $url=ContentHelperRoute::getArticleRoute($id, $catid,$lang);
        }else{ //if its category or categories, get route.
            $url=ContentHelperRoute::getCategoryRoute($id,$lang);
        }
    }elseif(isset($Itemid)){ //no content ID, determine ID from menu ID and issue redirect
        $menuitem=$menu->getItem($Itemid);
        if(!empty($menuitem)){
            $menuid=$menuitem->id;
            $url=new JURI($menuitem->link);
            $this->redirect($url);
        }else{
            return;//404 condition
        }
    }else{
        //todo: add additional checks to find article or category name in menu or in database
        return;//404 condition
    }
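
For anyone who wants to try the pagination scheme outside Joomla, here is a minimal standalone sketch of the two router rules above. It is plain PHP with no Joomla classes; the function names parseSectionPath() and buildSectionPath() and the example path are made up for illustration, and the logic is a simplified reading of the posted code rather than the plugin itself.

```php
<?php
// Standalone sketch of the section_N / sections_all scheme. No Joomla classes are used.

/** Incoming: split a trailing pagination segment off the path into query variables. */
function parseSectionPath(string $path, array $query = []): array
{
    if (preg_match('~^(.*?)/?section(s_all|_(\d+))$~', $path, $m)) {
        if ($m[2] === 's_all') {
            $query['showall'] = 1;              // whole article on one page
        } elseif ((int) $m[3] > 1) {
            $query['start'] = (int) $m[3] - 1;  // section_3 becomes start=2
        }                                       // section_1 is simply the first page
        $path = $m[1];
    }
    return [$path, $query];
}

/** Outgoing: move showall/limitstart/start from the query into the path. */
function buildSectionPath(string $path, array $query): array
{
    if (!empty($query['showall'])) {
        $path .= '/sections_all';
    } elseif (!empty($query['limitstart'])) {
        $path .= '/section_' . ($query['limitstart'] + 1);
    } elseif (!empty($query['start'])) {
        $path .= '/section_' . ($query['start'] + 1);
    }
    unset($query['showall'], $query['limitstart'], $query['start']);
    return [$path, $query];
}
```

Building and then parsing should round-trip: buildSectionPath('blog/my-article', ['start' => 2]) gives the path blog/my-article/section_3, and parseSectionPath() on that path gives back start=2.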





jms

Apr 7, 2016, 5:48:00 AM
to joomla-de...@googlegroups.com
Hello,

The best way to get a new feature in Joomla is to make a PR on github.
Canonicals in Joomla are limited to choosing another domain in the SEF plugin parameters, which is indeed not very useful, although its limited use is now corrected in core.

Implementing canonicals in the real sense of the functionality is therefore needed.

I suggest you propose a solution on https://github.com/joomla/joomla-cms
Your code should, though, follow the Joomla formatting rules.

Regards

JM

ernest meyer

Apr 7, 2016, 9:00:40 AM
to Joomla! General Development
Very well. My code in total is 600 lines, and the tabs do not show up when I post here. If I add the whitespace to conform to Joomla formatting, it will easily be 1,000 lines. Please advise how exactly I am meant to share it as a PR without providing a link to my working plugin.
...

brian teeman

Apr 7, 2016, 11:40:22 AM
to Joomla! General Development
As J-M says - submit it as a Pull Request on github

Some documentation to help you can be found here https://docs.joomla.org/Portal:Joomla!_Code_Contributors

ernest meyer

Apr 7, 2016, 1:31:09 PM
to Joomla! General Development
I reinstalled Eclipse and extensions and applied the formatting. The full source is now 2,500 lines, and I have no wish to inflict that on others. So what I have done is abstract the essential point on the query ID issue like this, which by itself is much simpler. I would very much appreciate any feedback before submitting a PR thing. 

This addresses the problem that URIs in the format http://site.com/2 will return the article with ID 2 and the canonical URI http://site.com/?2 rather than any SEF URL or alias set for the item or menu. Some such URLs have been showing up in Google webmasters for my site as bad URLs for years now. Moreover, when site modules are set up to show only on some menu items, such query-string URLs for articles, categories, etc. do not load the modules correctly, as the module inclusion is driven by the menu, and the menu ID was not generated by the request.

Ironically, Joomla has contained a helper function that could fix the problem for some years, but it is not used during SEF routing.

So the resolution is, at its very basis, this simple, as the pertinent code for generating the SEF URL already exists. Currently I have it completely set up in a system plugin's onAfterRoute() handler, and this is just the essential action of it:

Code:
protected $app;
protected $db;
private static $url;
//(other code here to filter out RSS, forms, etc.)
$input = $this->app->input;
if ($input->get('view') == "article") {
   $id = $input->getInt('id');//integer only, since it goes straight into the query
   $db = JFactory::getDbo();
   $db->setQuery('select catid from #__content where id=' . $id);
   $cat = $db->loadResult();//category the article really belongs to
   $catid = $input->get('catid');
   if (isset($catid) && $cat != $catid) {//user fetched wrong category
      $url = JRoute::_($url . '&catid=' . $cat);//(non-SEF $url was already built higher up); keep the routed result
      header("HTTP/1.1 301 Moved Permanently");
      header("Location: " . $url);
   } elseif (empty($catid)) {//user didn't fetch by category at all
      require_once (JPATH_SITE . '/components/com_content/helpers/route.php');
      $url = ContentHelperRoute::getArticleRoute($id, $cat, $lang);
      if (rtrim($url, '/') != rtrim(JURI::current(), '/')) {
         header("HTTP/1.1 301 Moved Permanently");
         header("Location: " . $url);
      }
   }
}//do the same for categories....
self::$url = $url;
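
To make the branching easier to test in isolation, here is a small sketch of just the redirect decision, separated from Joomla. The function name redirectTarget() and the example URLs are hypothetical; it assumes the canonical URL has already been built elsewhere and only mirrors the two cases in the snippet above.

```php
<?php
/**
 * Decide whether an article request should be redirected.
 *
 * $requestedCatid  catid from the request, or null if none was sent
 * $actualCatid     catid stored for the article
 * $canonicalUrl    already-built canonical URL (hypothetical input)
 * $currentUrl      URL that was actually requested (hypothetical input)
 *
 * Returns the URL to 301-redirect to, or null if no redirect is needed.
 */
function redirectTarget(?int $requestedCatid, int $actualCatid, string $canonicalUrl, string $currentUrl): ?string
{
    if ($requestedCatid !== null) {
        // Fetched by category: redirect only when the category is the wrong one.
        return $requestedCatid !== $actualCatid ? $canonicalUrl : null;
    }
    // Not fetched by category: redirect when the URLs differ, ignoring a trailing slash.
    return rtrim($canonicalUrl, '/') !== rtrim($currentUrl, '/') ? $canonicalUrl : null;
}
```

For example, a request for http://site.com/2 with no catid would be sent to whatever canonical URL was built for the article, while a request that already matches it (with or without a trailing slash) is left alone.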

I think maybe the redirect should be an administration option, but the canonical URL should DEFINITELY be the URI produced by getArticleRoute(), not http://site.com/?2. And currently the only way to fix that reliably is with a core override, or by changing the link set by the SEF plugin, which currently requires something like this:

Code:
function onAfterRender(){
   $src = $this->app->getBody();
   //preg_replace returns the modified string, so the result has to be assigned
   $src = preg_replace('~<link rel="canonical"[^>]*>~',
      '<link rel="canonical" href="' . self::$url . '">', $src);
   $this->app->setBody($src);
}

That is a very expensive call from a Joomla core perspective, which is also why I raise the issue. I hope the code formatting is acceptable this time; I am only trying to illustrate a point, and I am a K&R guy really.
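
The replacement step can be tried on its own with a plain string. This is only a sketch, assuming the rendered page is available as a string; replaceCanonical() and its arguments are hypothetical names and nothing here calls Joomla. It escapes the URL for use in an attribute and uses a callback so that characters such as "$1" in a URL are not read as backreferences.

```php
<?php
/**
 * Replace the first <link rel="canonical"> tag in rendered HTML.
 * Plain PHP only; $html and $url are hypothetical inputs, not Joomla objects.
 */
function replaceCanonical(string $html, string $url): string
{
    $tag = '<link rel="canonical" href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">';

    // The regex functions return the new string and leave their input untouched.
    $out = preg_replace_callback(
        '~<link\s+rel="canonical"[^>]*>~i',
        function () use ($tag) {
            return $tag;
        },
        $html,
        1
    );

    // On a regex error the result is null, so fall back to the original markup.
    return $out === null ? $html : $out;
}
```

If the page has no canonical tag at all, the markup comes back unchanged, so the whole body has still been scanned for nothing. That full scan on every page is the cost mentioned above.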

Ove Eriksson

Apr 17, 2016, 7:50:24 AM
to Joomla! General Development

I don't get any canonical links after J 3.5.1 (as infograf768 wrote), and that is better than getting false ones, I guess.

Joomla needs a component, a view and a menu item to find a correct URL. The menu item can define the needed component, the view and item number for article, category or other item.

I do not get an article when I enter http://www.mysite.com/234/2352, but an item from another component, i.e. from the component used for the home page. The active page becomes the home page and shows a single item. The single item belongs to a completely different menu (if the item number is present, and I do not get any articles!).

I've tested it before with Featured articles as homepage. Then Joomla finds articles. If you created menu items for all the categories and articles the menu item might be correct.

As there is no canonical link created, I try to find a way to return a 404 if the URL isn't unambiguous but so far I didn't find any general way.

I guess that what you have is a solution for a special case with com_content.

ernest meyer

Apr 17, 2016, 9:44:20 AM
to Joomla! General Development
That's interesting. I had not tried without featured as home page. I will do some more testing. Thank you for the pointer.

Have you tried http://www.mysite.com/?2352, http://www.mysite.com/?=2352, and using an article alias or menu alias without a category? Also, there are URLs with the id and catid included after a hyphen that will return a page, but modules attached to any menu item associated with that page won't display, and the menu item will not be activated.

It is also true for the user and news components. I haven't tested with other components. 


Ove Eriksson

Apr 18, 2016, 1:01:27 AM
to Joomla! General Development
I think the following happens.

Joomla does not find the information I described above (menu item and so on), so it defaults to the homepage menu item and finds a component. Then the router for this component is called, e.g. the router for com_content, which then makes some assumptions:

com_content - routers.php - line 333 ...

There are some assumptions about ":" in the segments and about (int) numbers in the first and last part of the segments.

I took the time for some quick tests.
Example partly tested with my own homepage NOT com_content! More than one category level.

Example: full link (should be the canonical one in my case) http://localhost/joomla/menu-item/69-top-category/category-69/2796-my-item

These find the item, but with the homepage active instead of the correct menu item (appended to http://localhost/joomla, without the menu-item in front):
"/2796" -- "/69/2796-my-item" -- "2796-xxxx" -- "/69/2796" -- "my-item" -> this one works if there is a menu item for this single item with the alias "my-item"

Category: "69-top-category" -> 404, which is correct as the slug is wrong! -- "69-category-69" finds the category, active in the homepage.

These find the homepage, not the item, and without a 404:
"/?2796" -- "/?=2796"
 
That's what I found. Even if it would be possible to generate a canonical link I guess it's too late to add it to the document header at this stage. I would also prefer 404 instead of using the wrong menu item with the following styling problems. Creating menu items for each single item has only the small advantage that you do not need the numbers in the links. Is it possible to force a 404 somehow?
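
One way to decide when a 404 is appropriate is to accept a segment only when both the id and the alias match. This is a rough sketch in plain PHP; segmentIsUnambiguous() is a made-up name and the $aliases array is a hypothetical stand-in for a database lookup. It does not show how Joomla itself would raise the 404.

```php
<?php
/**
 * Check an "<id>-<alias>" URL segment against the alias stored for that id.
 * $aliases is a hypothetical id => alias map standing in for a database query.
 * Returns true only when both the id and the alias match exactly.
 */
function segmentIsUnambiguous(string $segment, array $aliases): bool
{
    if (!preg_match('~^(\d+)-(.+)$~', $segment, $m)) {
        return false; // bare ids such as "2796" carry no alias, so they are rejected
    }
    $id = (int) $m[1];
    return isset($aliases[$id]) && $aliases[$id] === $m[2];
}
```

With the examples from the tests above, "2796-my-item" and "69-category-69" would pass, while "2796", "2796-xxxx" and "69-top-category" would be candidates for a 404.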

I'm sure not The Expert though!

Ove Eriksson

Apr 18, 2016, 2:42:19 AM
to Joomla! General Development
Maybe I should add that if you add a Site Domain in the SEF plugin, you'll get some canonical links. The behaviour was corrected with 3.5.1. Its purpose is only for "multi domain sites". Read the tooltip in the plugin.

Ove Eriksson

Apr 18, 2016, 4:05:26 AM
to Joomla! General Development
I think showing items in the wrong active menu is the bigger problem. But I now happened to find a page with two URLs and one content.

Open your site mysite.com and you'll land on the homepage. If this page is titled Home, you'll get the same content if you enter mysite.com/home. Not exactly correct, I suppose (if there is no canonical link for one of them)!

I only wanted to mention it.
 