Url caching problem

Alexander Obuhovich

Jul 13, 2011, 5:07:28 AM
to In-Portal Bugs
In-Portal has a nice URL caching system that remembers what was parsed from a URL, so the next time it skips parsing and uses the data from the cache.

Cache reset also works fine.

Here is a scenario in which the cache can be misleading:
  1. you have a page http://www.website.com/directory.html with ID=2
  2. you have a page with ID=115 that is shown when a page isn't found (the 404 page)
  3. visit the directory page and see that a cache record with ID=2 is created
  4. then rename the directory page to directory2
  5. visit the directory page (using the old URL) and see that the 404 page is displayed instead, and the cache is updated to reflect that (this is where the logic is wrong)
  6. rename directory2 back to directory and visit the directory page: you get the 404 page instead of the correct page
I think we shouldn't put the URL parsing result into the cache when not all of the URL was parsed correctly.
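
A minimal sketch of that rule, in Python purely for illustration (this is not In-Portal code). The names UrlParser, ParseResult and fully_parsed are hypothetical, and a plain dict stands in for the database. The only point it shows is that an incompletely parsed URL is answered with the 404 page but never written to the cache:

```python
from dataclasses import dataclass
from typing import Dict

NOT_FOUND_PAGE_ID = 115  # ID of the 404 page from the scenario above


@dataclass
class ParseResult:
    page_id: int
    fully_parsed: bool  # False when some part of the URL could not be resolved


class UrlParser:
    """Hypothetical stand-in for the URL parser and its cache."""

    def __init__(self, pages: Dict[str, int]) -> None:
        self.pages = pages  # URL -> page ID (assumed storage)
        self.cache: Dict[str, ParseResult] = {}

    def _parse(self, url: str) -> ParseResult:
        page_id = self.pages.get(url)
        if page_id is None:
            return ParseResult(NOT_FOUND_PAGE_ID, fully_parsed=False)
        return ParseResult(page_id, fully_parsed=True)

    def resolve(self, url: str) -> int:
        cached = self.cache.get(url)
        if cached is not None:
            return cached.page_id
        result = self._parse(url)
        if result.fully_parsed:  # the proposed fix: never cache a partial parse
            self.cache[url] = result
        return result.page_id

    def rename(self, old_url: str, new_url: str) -> None:
        page_id = self.pages.pop(old_url)
        self.pages[new_url] = page_id
        # Existing behaviour: drop every cached URL that points to the changed page.
        self.cache = {u: r for u, r in self.cache.items() if r.page_id != page_id}
```

With this guard, step 5 still shows the 404 page, but step 6 shows the directory page again because nothing stale was stored for the old URL.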



--
Best Regards,

http://www.in-portal.com
http://www.alex-time.com

Dmitry A.

Jul 15, 2011, 10:40:24 AM
to in-port...@googlegroups.com
Hi Alex,


Yes, I totally agree here!

In other words, it's okay NOT to cache if the page returns 404 or ANYTHING else (e.g. no permission).

Please proceed with the task and patch. It's kind of critical for 5.1.3 in my opinion.


Thanks.


DA

Alexander Obuhovich

Jul 15, 2011, 10:45:41 AM
to in-port...@googlegroups.com
The no-permission page isn't such a case, since it doesn't affect the URL parsing process.



Phil -- wbtc.fr --

Jul 16, 2011, 11:53:08 PM
to in-port...@googlegroups.com
In your test scenario, we should get a 301 error at step 5; it's totally wrong to output a 404. We should also give the new URI: since the ID stayed the same, we know the new URI :)
The same should happen at step 6.

To be more accurate, we should have error pages in a separate folder, to handle 403, 404, 301 (and maybe 407 when a page is in cache but doesn't exist anymore?).



Alexander Obuhovich

Jul 17, 2011, 4:58:25 AM
to in-port...@googlegroups.com
The 403 and 404 page names are specified in the configuration once for all themes, which is why it's not wise to place them in a different location for each theme.

301 is not a page at all; it's a redirect.

Also, it's hard to track all page renames in order to finally display the resulting page to the user. Please refer to this task http://tracker.in-portal.org/view.php?id=360 for more information.

For example, MediaWiki creates a NEW PAGE of redirect type under the old page name, which only redirects to the correct page. And you will have a problem when you need to rename the page back, since there is already another page with that name.

Also, we could create a PageRenames table (RenameId, PageId, FromPage, ToPage, RenamedOn) and track all page renames there. Eventually, that table's content could create redirect loops, since 2 different pages could have had the same URL over time.
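
To make the loop risk concrete, here is a rough sketch in Python (illustration only, not In-Portal code). The PageRename record mirrors the columns listed above; resolve_renamed_url is a hypothetical helper that follows the rename chain and gives up as soon as it revisits a name:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageRename:
    rename_id: int
    page_id: int
    from_page: str
    to_page: str
    renamed_on: str  # ISO date string, e.g. "2011-07-17"


def resolve_renamed_url(renames: List[PageRename], url: str) -> Optional[str]:
    """Follow renames starting at `url`; return the final name, or None on a loop."""
    # Keep only the newest rename recorded for each old name.
    latest: Dict[str, str] = {}
    for r in sorted(renames, key=lambda r: (r.renamed_on, r.rename_id)):
        latest[r.from_page] = r.to_page

    seen = {url}
    current = url
    while current in latest:
        current = latest[current]
        if current in seen:  # directory -> directory2 -> directory ...
            return None
        seen.add(current)
    return current
```

Renaming directory to directory2 and back is enough to produce such a loop, so any lookup over this table needs a guard like the `seen` set.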

There are more questions than answers here.


I also don't know what other CMSes do about page renaming (except MediaWiki).

Phil -- wbtc.fr --

Jul 17, 2011, 5:35:07 AM
to in-port...@googlegroups.com
2011/7/17 Alexander Obuhovich <aik....@gmail.com>

> The 403 and 404 page names are specified in the configuration once for all themes, which is why it's not wise to place them in a different location for each theme.

Ah, I didn't know that. Where can we set the 403 page template?
 
> 301 is not a page at all; it's a redirect.

Not exactly. It's a "moved permanently" error message, with a link returned by the server in the Location field. See rfc here.
It also avoids any page loss in search engines.

> Also, it's hard to track all page renames in order to finally display the resulting page to the user. Please refer to this task http://tracker.in-portal.org/view.php?id=360 for more information.

About this task: it's not a 302 error but a 301.

My idea is not to let the cache act passively on results, but rather to update it when we rename a page and save it. This way we could record the 301 redirect in the cache and give users and search engines a proper "moved" message.
When we edit the page, we have everything at hand to update the cache if needed and tell the system about the new URL, don't we?
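
A small sketch of this "update the cache on save" idea, in Python for illustration only. CacheEntry, rename_page and lookup are hypothetical names, and the cache is assumed to be a plain URL-to-entry mapping, which is not necessarily how In-Portal stores it:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CacheEntry:
    page_id: int
    redirect_to: Optional[str] = None  # when set, answer with 301 + Location


def rename_page(cache: Dict[str, CacheEntry], page_id: int,
                old_url: str, new_url: str) -> None:
    """Called on page save when the filename has changed."""
    # Older 301 markers for this page should point straight at the newest URL.
    for entry in cache.values():
        if entry.page_id == page_id and entry.redirect_to is not None:
            entry.redirect_to = new_url
    cache[old_url] = CacheEntry(page_id, redirect_to=new_url)
    cache[new_url] = CacheEntry(page_id)  # the live record replaces any old marker


def lookup(cache: Dict[str, CacheEntry], url: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a requested URL."""
    entry = cache.get(url)
    if entry is None:
        return (404, None)
    if entry.redirect_to is not None:
        return (301, entry.redirect_to)
    return (200, None)
```

Because the live record is written last, renaming a page back to its original name turns the old marker into a normal record again instead of a redirect to itself.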
 
> For example, MediaWiki creates a NEW PAGE of redirect type under the old page name, which only redirects to the correct page. And you will have a problem when you need to rename the page back, since there is already another page with that name.

If there's another page by that name, we would get the same error message as now, "page name must be unique", or the page URI will be renamed if automatic filename is checked.
 
> Also, we could create a PageRenames table (RenameId, PageId, FromPage, ToPage, RenamedOn) and track all page renames there. Eventually, that table's content could create redirect loops, since 2 different pages could have had the same URL over time.

Isn't that too much, if we act from the beginning, when we actually rename the page?
 
> There are more questions than answers here.
>
> I also don't know what other CMSes do about page renaming (except MediaWiki).

I don't know either.

Alexander Obuhovich

Jul 17, 2011, 6:35:05 AM
to in-port...@googlegroups.com
> Ah, I didn't know that. Where can we set the 403 page template?

Configuration section location: "Configuration -> Website -> Advanced"
Setting name: "Template for "Insufficient Permissions" Error"

The setting for the 404 page is right below it.


> Not exactly. It's a "moved permanently" error message, with a link returned by the server in the Location field. See rfc here.
> It also avoids any page loss in search engines.

I see no point in creating a separate page that nobody will see, since the web browser will redirect automatically based on the 301 header that is sent.
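
For reference, a 301 answer needs little more than a status line and a Location header. The following is a bare WSGI sketch in Python, for illustration only; the MOVED mapping and the paths in it are made up, and this is not how In-Portal sends its headers:

```python
# Hypothetical mapping of old paths to their new locations.
MOVED = {"/directory.html": "/directory2.html"}


def app(environ, start_response):
    """Tiny WSGI application: 301 for moved paths, 200 otherwise."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        # No page body is needed: the browser follows the Location header.
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Length", "0")])
        return [b""]
    body = b"page content"
    start_response("200 OK",
                   [("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body)))])
    return [body]
```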


> About this task: it's not a 302 error but a 301.

Thanks, fixed in the task.


> My idea is not to let the cache act passively on results, but rather to update it when we rename a page and save it. This way we could record the 301 redirect in the cache and give users and search engines a proper "moved" message.
> When we edit the page, we have everything at hand to update the cache if needed and tell the system about the new URL, don't we?

In-Portal uses the cache differently. Each cached URL record also stores the ID of the page it belongs to. When a page is changed, all cached URLs in which it is found are deleted. When the page is then visited by its new URL, a new cache record pointing to the same ID is built.
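
Roughly, that behaviour can be pictured like this (a Python sketch for illustration; the UrlCache class and its method names are hypothetical, not the actual In-Portal implementation):

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class UrlCache:
    """URL -> page ID cache with a reverse index for invalidation by page."""

    def __init__(self) -> None:
        self._page_by_url: Dict[str, int] = {}
        self._urls_by_page: Dict[int, Set[str]] = defaultdict(set)

    def store(self, url: str, page_id: int) -> None:
        old = self._page_by_url.get(url)
        if old is not None and old != page_id:
            self._urls_by_page[old].discard(url)  # URL now belongs to another page
        self._page_by_url[url] = page_id
        self._urls_by_page[page_id].add(url)

    def get(self, url: str) -> Optional[int]:
        return self._page_by_url.get(url)

    def page_changed(self, page_id: int) -> None:
        # Delete every cached URL in which the changed page is found.
        for url in self._urls_by_page.pop(page_id, set()):
            self._page_by_url.pop(url, None)
```

After page_changed(2), the next visit by the new URL simply calls store() again and rebuilds a record pointing to the same ID.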


It's a universal caching system that doesn't know that page caches should be treated differently. Also, if I implement what you've suggested, there will be 2 cache records for 1 page:
  • old url with 301 mark
  • new url
and both records will point to the same ID.

However, the idea of marking page renames in cache records seems interesting to me. Of course, none of this will work when mod-rewrite is off.


> If there's another page by that name, we would get the same error message as now, "page name must be unique", or the page URI will be renamed if automatic filename is checked.

That is already what happens, as you say, but because of it I would need to edit 2 pages instead of 1 (as now) to restore my page's original URL.


> Isn't that too much, if we act from the beginning, when we actually rename the page?

I didn't get what you meant here.

Phil -- wbtc.fr --

Jul 17, 2011, 7:53:48 AM
to in-port...@googlegroups.com
>> Ah, I didn't know that. Where can we set the 403 page template?
>
> Configuration section location: "Configuration -> Website -> Advanced"
> Setting name: "Template for "Insufficient Permissions" Error"
>
> The setting for the 404 page is right below it.

Thanks for this info.

>> Not exactly. It's a "moved permanently" error message, with a link returned by the server in the Location field. See rfc here.
>> It also avoids any page loss in search engines.
>
> I see no point in creating a separate page that nobody will see, since the web browser will redirect automatically based on the 301 header that is sent.

This is an SEO point of view: pages already indexed won't become 404 pages, but 301 + new link.

>> My idea is not to let the cache act passively on results, but rather to update it when we rename a page and save it. This way we could record the 301 redirect in the cache and give users and search engines a proper "moved" message.
>> When we edit the page, we have everything at hand to update the cache if needed and tell the system about the new URL, don't we?
>
> In-Portal uses the cache differently. Each cached URL record also stores the ID of the page it belongs to. When a page is changed, all cached URLs in which it is found are deleted. When the page is then visited by its new URL, a new cache record pointing to the same ID is built.
>
> It's a universal caching system that doesn't know that page caches should be treated differently. Also, if I implement what you've suggested, there will be 2 cache records for 1 page:
>   • old url with 301 mark
>   • new url
> and both records will point to the same ID.
>
> However, the idea of marking page renames in cache records seems interesting to me. Of course, none of this will work when mod-rewrite is off.

Of course we are talking about mod-rewrite behavior. Keeping the 301 is just an SEO best practice, but maybe it's not needed that much: search engines get better every day and can quickly update their index...

>> If there's another page by that name, we would get the same error message as now, "page name must be unique", or the page URI will be renamed if automatic filename is checked.
>
> That is already what happens, as you say, but because of it I would need to edit 2 pages instead of 1 (as now) to restore my page's original URL.

Yes, you'll need to edit 2 pages, but that is looking at it in isolation. In the everyday life of a website, the people in charge of content editing edit dozens of pages, so it's not surprising to edit one more when they need to take over a name that is used by another page. It's the same thing in file management ;-)

>> Isn't that too much, if we act from the beginning, when we actually rename the page?
>
> I didn't get what you meant here.

It was a summary: acting from the beginning means actively updating the cache on page save (if "filename" has changed), instead of letting it discover an error.

Alexander Obuhovich

Oct 26, 2011, 9:50:49 AM
to in-port...@googlegroups.com
Duplicate discussion here:


Dmitry, the problem is still there. We need to fix it.

We forgot to create a task, since we discussed other related problems here (related to tracking page renames).

Dmitry A.

Oct 26, 2011, 10:26:41 PM
to in-port...@googlegroups.com
Here is a task:

1151: Caching Issue with 404 Page Not Found URLs


Still need to do the patch.

DA