Broken links when they are built by an event that runs from cron


Alexander Obuhovich

Apr 26, 2010, 7:18:05 AM
to In-Portal Bugs
Usually cron is used to perform maintenance and database updates. Recently I came across a situation where this was not entirely true.

Here is the background:
  • I have a page that takes 5 minutes to display (it is generated on the fly)
  • another website uses that page to import some data from my site
  • the other site can't wait 5 minutes and closes the connection before all data is retrieved
Solution:
  • prepare that page using cron and update its contents twice per day
  • save the prepared page to a file
  • when the other site asks for the generated page, return the previously saved file
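
The three steps above could be sketched roughly as follows. This is only an illustration, not In-Portal code: the function names savePreparedPage and readPreparedPage, the file name, and the sample XML are all assumptions made for the example.

```php
<?php
// Hypothetical helpers, not part of In-Portal.

/**
 * Called from cron: build the slow page and store the result in a file.
 */
function savePreparedPage(callable $generator, string $cacheFile): void
{
    $content = $generator(); // the slow step (about 5 minutes in the case above)
    $tmpFile = $cacheFile . '.tmp';

    // Write to a temporary file first and rename it afterwards,
    // so a reader never picks up a half-written file.
    file_put_contents($tmpFile, $content);
    rename($tmpFile, $cacheFile);
}

/**
 * Called from the web request: return the saved copy, or null if none exists yet.
 */
function readPreparedPage(string $cacheFile): ?string
{
    if (!is_file($cacheFile)) {
        return null;
    }

    $content = file_get_contents($cacheFile);

    return $content === false ? null : $content;
}

// Usage: cron calls savePreparedPage(), the web request calls readPreparedPage().
$cacheFile = sys_get_temp_dir() . '/export_page_example.xml';
savePreparedPage(function () {
    return '<products></products>';
}, $cacheFile);
echo readPreparedPage($cacheFile), PHP_EOL;
```

The temporary file plus rename is a design choice of this sketch: the importing site then sees either the old file or the new one, never a partial write.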
All worked fine until I needed to place links to my own site in that generated file. In that case all links looked like "http://localhost/path/to/document_root/sample_page.html" instead of "http://www.site-name.com/sample_page.html".

This happened because we use $_SERVER['HTTP_HOST'] to build the URL, and we don't have one, since cron doesn't go through the web server to run the script that generates the page.

The solution is to use the domain from the "config.php" file instead of the one from $_SERVER['HTTP_HOST'] when we are running from cron. The only change needed is in the "kApplication::BaseURL" method.
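
A minimal sketch of that idea, assuming the configured domain is already available as a string. The helpers resolveHost and buildUrl are hypothetical names for this example; the real change would live in kApplication::BaseURL, whose actual code is not shown in this thread.

```php
<?php
// Hypothetical helpers for illustration only.

/**
 * Pick the host used to build absolute URLs.
 *
 * @param array  $server       Normally $_SERVER.
 * @param string $configDomain The domain value read from config.php.
 * @param bool   $isCli        True when the script runs from cron / the command line.
 */
function resolveHost(array $server, string $configDomain, bool $isCli): string
{
    // From the command line there is no real Host header, so trust the config.
    if ($isCli || empty($server['HTTP_HOST'])) {
        return $configDomain;
    }

    return $server['HTTP_HOST'];
}

function buildUrl(string $host, string $path): string
{
    return 'http://' . $host . '/' . ltrim($path, '/');
}

// PHP_SAPI is 'cli' when PHP is started from the command line.
$host = resolveHost($_SERVER, 'www.site-name.com', PHP_SAPI === 'cli');
echo buildUrl($host, 'sample_page.html'), PHP_EOL;
```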

--
Best Regards,

http://www.in-portal.com
http://www.alex-time.com

--
You received this message because you are subscribed to the Google Groups "In-Portal Bugs Team" group.
To post to this group, send email to in-port...@googlegroups.com.
To unsubscribe from this group, send email to in-portal-bug...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/in-portal-bugs?hl=en.

Phil -- wbtc.fr --

Apr 26, 2010, 7:30:17 AM
to in-port...@googlegroups.com
Of course, this is the definitive solution. I'll forget about cURL ^-^


Dmitry A.

Apr 26, 2010, 8:19:30 AM
to In-Portal Bugs Team
Thanks for bringing this up, Alex.

Let's have a quick look at whether there is anything else critical that we
depend on from the web server environment which is not available when we
are using PHP in command-line mode.
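
One rough way to do such a check is to run a small script from the command line and see which request-related $_SERVER keys are absent. The key list below is illustrative rather than exhaustive, and findMissingServerKeys is a made-up helper name.

```php
<?php
// Keys that a web server normally fills in; this list is only an example.
$webOnlyKeys = ['HTTP_HOST', 'HTTP_USER_AGENT', 'REQUEST_URI', 'REMOTE_ADDR', 'SERVER_NAME'];

/**
 * Return the keys from $keys that are not present in $server.
 */
function findMissingServerKeys(array $server, array $keys): array
{
    return array_values(array_filter($keys, function ($key) use ($server) {
        return !array_key_exists($key, $server);
    }));
}

echo 'SAPI: ', PHP_SAPI, PHP_EOL;
echo 'Missing: ', implode(', ', findMissingServerKeys($_SERVER, $webOnlyKeys)), PHP_EOL;
```

Any code path that reads one of the reported keys without a fallback is a candidate for the same kind of problem.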


DA.


Alexander Obuhovich

Apr 26, 2010, 3:40:48 PM
to in-port...@googlegroups.com
Also, I was not using curl because it creates additional load on the web server, and nginx and apache could simply drop my connection, resulting in an empty page being saved.

Phil -- wbtc.fr --

Apr 26, 2010, 4:38:53 PM
to in-port...@googlegroups.com
I agree, curl is the last choice. I bet your question is related to sitemap generation, isn't it?


Alexander Obuhovich

Apr 26, 2010, 5:03:34 PM
to in-port...@googlegroups.com
Nope, it was a product XML export from the site, with all options and images, to another website.

Phil -- wbtc.fr --

Apr 27, 2010, 6:06:39 AM
to in-port...@googlegroups.com
OK. The same type of code could be used for sitemap generation, couldn't it?

Dmitry Andrejev

May 8, 2010, 11:36:47 PM
to in-port...@googlegroups.com
Back to the original bug.

I have checked BaseURL, and we are using the SERVER_NAME constant everywhere, but what's stranger is that the constant is already set up to work as needed. This is the code from startup.php:

safeDefine('SERVER_NAME', $_SERVER['HTTP_HOST'] ? $_SERVER['HTTP_HOST'] : $vars['Domain']);


Alex, what do you think about this?


Also, I have attached PHPINFO output from a non-Apache PHP process.




DA.
phpinfo_nowebserver.txt

Alexander Obuhovich

May 9, 2010, 7:12:18 AM
to in-port...@googlegroups.com
We have this code in cron.php, and that's why it's broken:

$_SERVER['HTTP_HOST'] = 'localhost'

This was done because accessing $_SERVER['HTTP_HOST'] directly in PHP from the command line gives an error.

Nobody had built links to the site from cron before, so this problem was not discovered.
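
To make the interaction concrete, here is a simplified stand-in for the SERVER_NAME expression from startup.php. The function pickServerName is invented for this illustration, it uses !empty() instead of the bare ternary so it stays quiet when the key is missing, and it assumes the 'localhost' placeholder is set before startup.php is evaluated.

```php
<?php
// Simplified stand-in for the SERVER_NAME expression in startup.php.
function pickServerName(array $server, array $vars): string
{
    return !empty($server['HTTP_HOST']) ? $server['HTTP_HOST'] : $vars['Domain'];
}

$vars = ['Domain' => 'www.site-name.com'];

// With the placeholder from cron.php the value is non-empty,
// so the fallback to the configured domain never fires.
echo pickServerName(['HTTP_HOST' => 'localhost'], $vars), PHP_EOL; // localhost

// Without the placeholder, the configured domain is used.
echo pickServerName([], $vars), PHP_EOL; // www.site-name.com
```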

Dmitry A.

May 14, 2010, 12:11:31 PM
to In-Portal Bugs Team
Sorry Alex, I hadn't noticed that whole block in tools/cron.php:

if (CMD_MODE) {
    define('DBG_SKIP_REPORTING', 1);
    $_SERVER['HTTP_USER_AGENT'] = 'gecko';
    $_SERVER['HTTP_HOST'] = 'localhost';
}


Yes, we definitely need a task to convert this to using the Domain
variable.

An additional note: what if we allow passing the domain name through the
arguments or something like that? I am thinking ahead to when you might
have one cron running for a site with SiteDomains, or something that
builds all URLs (your case) with the same domain in the URL.
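
A possible shape for such an argument, purely as a sketch: the "--domain=" option and the domainFromArgs helper are hypothetical, and nothing in this thread says cron.php accepts them.

```php
<?php
/**
 * Return the value of a hypothetical "--domain=..." argument,
 * or $default when the argument is absent or empty.
 */
function domainFromArgs(array $args, string $default): string
{
    $prefix = '--domain=';

    foreach ($args as $arg) {
        if (strpos($arg, $prefix) === 0) {
            $value = (string) substr($arg, strlen($prefix));

            if ($value !== '') {
                return $value;
            }
        }
    }

    return $default;
}

// $_SERVER['argv'] holds the command-line arguments when PHP runs from the CLI.
$args = isset($_SERVER['argv']) ? $_SERVER['argv'] : [];
echo domainFromArgs($args, 'www.site-name.com'), PHP_EOL;
```

With something like this, one cron entry per site domain could pass its own value, while a run without the argument would fall back to the domain from config.php.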


DA.


Alexander Obuhovich

Aug 31, 2010, 3:57:34 PM
to in-port...@googlegroups.com