On a site set up with an alternative permalink structure (rather than the default ?p=X), going to a URL with that ?p=X on it redirects you to the correct URL. What I'm seeing, though, is that while my browser does this, and header checks appear to show it, wp_remote_request returns a 200 and the content of the final URL. If I set 'redirection' to 0 in the $options array, I get a WP_Error instead. Neither result tells me the new URL. I tried to get the information from wp_remote_request through the 'response' / 'headers' keys of the returned array.
Anyone have any tips on how to use wp_remote_request AND get the destination URL?
wp_remote_head() will ignore redirects entirely (i.e. it returns the headers of the exact URL you request). wp_remote_get() will treat a max-redirections-exceeded condition as a failed request, which makes sense under normal circumstances. However, returning a "normal" style response in the WP_Error error_data field might make life easier for developers.
(Reply to Scott Kingsley Clark <sc...@skcdev.com>, 28 February 2011 20:37.)
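For the single-hop case described in the reply (wp_remote_head() returning the headers of the exact URL requested), reading the destination could be sketched as below. This is only an illustration: `example_redirect_target()` is a hypothetical helper, not a WordPress function, and it assumes the response array has the `response` / `headers` shape mentioned in the question, with lowercased header names.

```php
<?php
// Hypothetical helper (not part of WordPress): given a response shaped like
// the array wp_remote_head() returns, report where it redirects to, if anywhere.
function example_redirect_target( array $response ) {
	$code = isset( $response['response']['code'] ) ? (int) $response['response']['code'] : 0;

	// Only 3xx statuses are redirects.
	if ( $code < 300 || $code > 399 ) {
		return null;
	}

	// Assumption: header names are lowercased keys, as in WordPress responses.
	return isset( $response['headers']['location'] ) ? $response['headers']['location'] : null;
}

// Inside WordPress this might be used as follows (not run here):
// $response = wp_remote_head( 'http://example.com/?p=123' );
// if ( ! is_wp_error( $response ) ) {
//     $target = example_redirect_target( $response );
// }
```

A `null` result means either "not a redirect" or "redirect without a Location header"; a caller that cares about the difference would need to look at the status code as well.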
Not sure if anyone knows this, but does the page get loaded twice, or is the second load served from some sort of cache? I'm specifically referring to the idea of using wp_remote_head on a URL to check for a redirect, and then using wp_remote_request on the same URL to get the content, etc.
2 separate requests will be 2 separate requests. What's the use case you're working on here? Personally, if you want the body, I'd do a normal fetch, followed by a HEAD request if it failed with an exceeded-redirects error; otherwise, just get the URL. But I can't think of a case where you'd want one or the other.
(Reply to Scott Kingsley Clark <sc...@skcdev.com>, 1 March 2011 04:06.)
_______________________________________________
wp-hackers mailing list
wp-hack...@lists.automattic.com
http://lists.automattic.com/mailman/listinfo/wp-hackers
Actually, this is in regard to a plugin I'm currently developing. It's in beta right now, but it's available on WP.org. It's called Search Engine and it's like a mini-Google on your site. It spiders your site (or other sites too) and indexes content into the DB: <http://wordpress.org/extend/plugins/search-engine/>
The use case is that I want to be able to tell whether a page that's linked to on a site is really redirected elsewhere. Right now, since I switched to wp_remote_request, I only get the content of the final destination page, without any knowledge of the path it has taken. So when my script (or any script) gets content using wp_remote_request and the request is redirected, the best it can tell is that the page exists at the URL requested, oblivious to the real redirect happening. (Previously I was using a home-brewed function similar to wp_remote_request, but calling cURL and others manually.)
So it looks like right now I'll need to write a little extra code to make my own wp_remote_request-like function that does both the 301/302 redirect header check and the body content return.
(Reply to Dion Hulse (dd32), Monday, February 28, 2011 5:11:22 PM UTC-6.)
Not really. wp_remote_request simply defaults to GET; you can change it to HEAD, which seems to be what you want anyway. You can check whether the response is a redirect and then send another request. It does not sound like speed is a concern, although it is a factor, since many sites can quite frankly get up there in the number of redirects that canonical URLs might give you. (Hint: it should be at most 2 requests, one for the redirect and one for the actual page.)
You'll probably want to use wp_remote_head() instead, since wp_remote_request() is a generic function made to accommodate the rest of HTTP and the HTTP extensions (there isn't any built-in support for Subversion or WebDAV calls).
Jacob Santos
(Reply to Scott Kingsley Clark <sc...@skcdev.com>, Mon, Feb 28, 2011 at 5:22 PM.)
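The "at most 2 requests" idea from Jacob Santos (a HEAD to detect the redirect, then one GET for the actual page) might look roughly like the sketch below. Everything named `example_*` is hypothetical. `$request` stands in for any callable with the same `( $url, $args )` signature as wp_remote_request(), so the sketch can be exercised without WordPress. It assumes a single redirect hop and an absolute Location value, and it leaves out WP_Error handling.

```php
<?php
// Sketch of "HEAD first, then GET". $request is any callable with the same
// ( $url, $args ) signature as wp_remote_request(); error handling (WP_Error)
// is deliberately left out.
function example_head_then_get( $url, $request ) {
	$head = $request( $url, array( 'method' => 'HEAD', 'redirection' => 0 ) );
	$code = (int) $head['response']['code'];

	$redirected_to = null;
	if ( $code >= 300 && $code <= 399 && isset( $head['headers']['location'] ) ) {
		$redirected_to = $head['headers']['location'];
	}

	// Second (and last) request: fetch the body from wherever the HEAD pointed.
	$target = ( null !== $redirected_to ) ? $redirected_to : $url;
	$page   = $request( $target, array( 'method' => 'GET' ) );

	return array(
		'requested'     => $url,
		'redirected_to' => $redirected_to,
		'body'          => $page['body'],
	);
}

// Inside WordPress (not run here):
// $result = example_head_then_get( 'http://example.com/?p=123', 'wp_remote_request' );
```

Passing the requester in as a parameter is only a convenience for testing; a plugin could just as well call wp_remote_request() directly.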
The spidering process can really take a lot of time for a large site, and can end up eating resources and eating into the infamous PHP max_execution_time, so I was looking to cut corners. If I've gotta do two requests to do this, I'll do it. Thanks for the advice and attention.
(Reply to Jacob Santos, Monday, February 28, 2011 5:28:54 PM UTC-6.)
I have implemented manual redirection for the wp-favicons plugin. I needed the manual redirection because I need a base href when none is given in the HTML source; I then use the redirected URI as the base href.
Code is not completely done since a use case like:
does not work yet, since this site gives 4 as the redirect URL while (4) is actually a page. So I need to add another check for binary content at the beginning.
But for all non-favicon self-redirection this should work.
(Reply to Scott Kingsley Clark <sc...@skcdev.com>, Tue, Mar 1, 2011 at 12:31 AM.)
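To illustrate why the redirected URI matters as a base href: a relative favicon link has to be resolved against the URL the request finally landed on, not the one originally requested. The sketch below is not the wp-favicons code; `example_resolve_url()` is a hypothetical, deliberately simplified resolver (no `../` handling, and the base is assumed to be an absolute http(s) URL).

```php
<?php
// Hypothetical helper: resolve a link found in the HTML against the URL the
// request finally landed on. Simplified on purpose: no "../" handling and no
// validation of $base, which is assumed to be an absolute http(s) URL.
function example_resolve_url( $base, $link ) {
	// Already absolute (has a scheme): nothing to do.
	if ( null !== parse_url( $link, PHP_URL_SCHEME ) ) {
		return $link;
	}

	$parts  = parse_url( $base );
	$origin = $parts['scheme'] . '://' . $parts['host'];
	if ( isset( $parts['port'] ) ) {
		$origin .= ':' . $parts['port'];
	}

	// Protocol-relative link: reuse the base scheme.
	if ( 0 === strpos( $link, '//' ) ) {
		return $parts['scheme'] . ':' . $link;
	}
	// Root-relative link.
	if ( 0 === strpos( $link, '/' ) ) {
		return $origin . $link;
	}

	// Relative link: keep the base path up to and including its last slash.
	$path = isset( $parts['path'] ) ? $parts['path'] : '/';
	$dir  = substr( $path, 0, strrpos( $path, '/' ) + 1 );

	return $origin . $dir . $link;
}
```

With this, a page that redirected from `http://example.com/?p=1` to `http://example.com/blog/post/` and links to `favicon.ico` would resolve against the second URL rather than the first.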
On Fri, Mar 11, 2011 at 3:06 PM, Edward de Leau <e...@leau.net> wrote:
> Thanks for the comments, need to do work on it :)
> 1) ..... I DO prefix: see line 1: namespace leau\co\wp_favicons; (!) > that is why namespaces have been created :)
> (makes it easy to create the plugins)
Ugh. Using namespaces in PHP is bad mojo. Mostly because PHP defined them to be as ugly as possible (seriously, backslashes?), but mainly because PHP versions under 5.3.0 don't support them. WordPress is only upping the requirement to PHP 5.2, and as you can see from the stats, 85% of people are running 5.2: http://wordpress.org/about/stats/
1. Backslashes are ugly: yes, I posted the same blog comments on it (grin), but at least they are here now. And they provide really needed functionality despite the look. Prefixes exist because PHP did not have namespaces like other programming languages. Seriously: ask plugin writers to put a unique prefix in front of EVERY class and global? That is probably even uglier than the backslash :). A C# or Java developer would think wtf. It is a workaround for simply writing namespace com\google\pluginA; at the top of every file. Compare phpdoc @package WordPress and @package whatever in plugins. That is the same principle.
Without namespaces, the thousands of plugins and themes around WordPress, times the number of classes, functions, globals, etc. they use, add up to tens of thousands of objects (!!) in potential danger of naming conflicts. With namespaces there are none (if people use their domain as part of the namespace).
2. Only supported in 5.3 and upwards: I think that is a host problem. (Many hosts do support 5.3, but it requires a line in the .htaccess to use that bin dir.) PHP 5.3 is, uhm, what, 2 or 3 years old? IF 85% of the hosts are now on 5.2, then probably the majority of them have a switch to try 5.3 from another bin. If people don't use that switch, it's probably because there was no reason to. If they find a plugin that needs it, they turn on the switch, so I think that is a bit chicken and egg. But you are right, the plugin will have a smaller audience. Then again, by the time it is out of beta we will be on PHP 7.5 :)
(Reply to Otto <o...@ottodestruct.com>, Fri, Mar 11, 2011 at 10:12 PM.)
On Fri, Mar 11, 2011 at 3:49 PM, Edward de Leau <e...@leau.net> wrote:
> 2. only supported in 5.3 and upwards: I think that is a host problem. > { many hosts do support 5.3 but it requires a line in the .htaccess to use > that bin dir} > PHP 5.3 ... uhm what 2/3 years old? IF 85% of the hosts are now on 5.2 then > probably the majority of them have a > switch to try 5.3 from another bin.
Brother, most of them are only now switching away from PHP 4. Seriously, on most hosting systems I've used, PHP 4 is the default. There's usually a switch to enable PHP 5, but that's the best you can find.
At this rate, you'll start seeing PHP 6 support on hosts in about 2023.
Now we are half a year later, and I see support at most of the big shared hosts: mediatemple, dreamhost, A1, lunarpages, host gator, etc.
So I think the move from EOL for anything earlier than 5.2 to the same discussion about EOL for anything earlier than 5.3 will come sooner than 2032 :)
(Reply to Otto <o...@ottodestruct.com>, Fri, Mar 11, 2011 at 10:57 PM.)
1. Check the Content-Type header, if it exists. If it is "text/html", then run the filter to get the favicon.ico.
2. Oh my god, who would have thought a use case like this would have come up?
3. You need to look for the "Refresh" header as well. Some web servers (IIS) will send Refresh instead of Location, as will web sites that show a redirect message for systems that do not support redirects.
Jacob Santos
(Reply to Edward de Leau <e...@leau.net>, Fri, Mar 11, 2011 at 2:09 PM.)
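Jacob Santos's two practical points (check Content-Type before running the HTML filter, and look at Refresh as well as Location) could be sketched as follows. Both functions are hypothetical helpers, not WordPress API. They assume `$headers` uses lowercased header names; treating a missing Content-Type as "not HTML" is an assumption of this sketch, as is the `N; url=...` form of the Refresh value.

```php
<?php
// Hypothetical helpers. $headers is assumed to use lowercased header names,
// as WordPress HTTP responses provide.

// True when the Content-Type header says the body is HTML.
function example_is_html( $headers ) {
	if ( ! isset( $headers['content-type'] ) ) {
		return false; // Assumption: no Content-Type means "do not parse as HTML".
	}
	// "text/html; charset=UTF-8" -> "text/html"
	$parts = explode( ';', $headers['content-type'] );
	return 'text/html' === strtolower( trim( $parts[0] ) );
}

// Return the redirect target from Location or, failing that, from Refresh.
function example_next_hop( $headers ) {
	if ( isset( $headers['location'] ) ) {
		return $headers['location'];
	}
	// Refresh usually looks like "5; url=http://example.com/".
	if ( isset( $headers['refresh'] )
		&& preg_match( '/^\s*\d+\s*[;,]\s*url\s*=\s*(.+)$/i', $headers['refresh'], $m ) ) {
		return trim( $m[1], " \t\"'" );
	}
	return null;
}
```

A Refresh value without a `url=` part (a plain reload interval) yields `null`, since it does not point anywhere else.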
As an update for those who find this via Google and are also looking for how to work with redirects and find out where you were redirected to:
2) There is one open issue: https://core.trac.wordpress.org/ticket/16855#comment:29 (for cURL, FOLLOWLOCATION must be set to false, and likewise a new parameter for the others). In my test set, 500 out of 1500 URLs get me the message "Too many redirects" (!). (I don't think anyone ever used redirect=0.)
3) But the only reason one would want to set redirect to 0 is to know where we are heading first (at least that is what I think, and since the feature in #2 is missing, apparently no one could have used it with any success anyway). https://core.trac.wordpress.org/ticket/16950, which would work independently of (2), makes the code of (1) for a great part unnecessary and simply requests the URL that is forwarded to, with redirects handled as normal.
One addition: I'm thinking of dropping the HEAD-requests-first approach suggested earlier and only doing GETs. There are too many cases where we first get a 200 and then a 301/302, or a 405 Method Not Allowed (for the HEAD), etc. I have to run that through some more tests.
On Sat, Mar 12, 2011 at 5:07 AM, Edward de Leau <e...@leau.net> wrote:
> On Sat, Mar 12, 2011 at 4:50 AM, Jacob Santos <wordpr...@santosj.name> wrote:
>> 1. Check content-type, if exists. If it is "text/html" then run the filter to get the favicon.ico.
>> 2. Oh my god, who would have thought an use case like this would have come up?
>> 3. You need to look for "Refresh" header as well. Some web servers (IIS) will send Refresh instead of Location as well as web sites with a redirect message for systems that do not support redirects.
>> Jacob Santos
>> On Fri, Mar 11, 2011 at 2:09 PM, Edward de Leau <e...@leau.net> wrote:
>> > I have implemented manual redirection for the wp-favicons plugin here:
>> > I needed the manual redirection because I needed the base_href when no base_href is given in the HTML source.
>> > I then need the redirected URI to use that as base_href
>> > Code is not completely done since a use case like:
>> > does not work yet since this site gives 4 as redirect url while (4) is actually a page. So i need to add another check for binary content in the beginning.
>> > But for all none favicon self-redirection this should work.
>> > On Tue, Mar 1, 2011 at 12:31 AM, Scott Kingsley Clark <sc...@skcdev.com> wrote:
>> > > The spidering process can really take a lot of time for a large site, and can end up eating resources and adding time to the infamous php max_execution_time so I was looking to cut corners. If I've gotta do two requests to do this, I'll do it. Thanks for the advice and attention.
>> > > -Scott
>> > > On Monday, February 28, 2011 5:28:54 PM UTC-6, Jacob Santos wrote:
>> > > > Not really. The wp_remote_request simply defaults to GET, you can change it to be HEAD, which is what it seems like you are wanting anyway. You can check to see if it is a redirect and then send another request. It does not sound like speed is a concern (albeit one factor since many sites can quite frankly get up there with the amount of redirects given Canonical URLs might give you (Hint: Should be at most 2 requests, one for the redirect and one for the actual page).
>> > > > You'll probably want to use wp_remote_head() instead, since wp_remote_request() is a generic function made to accommodated the rest of the HTTP and HTTP extensions (there isn't any built-in calls support for Subversion or webdav).
>> > > > Jacob Santos
>> > > > On Mon, Feb 28, 2011 at 5:22 PM, Scott Kingsley Clark <sc...@skcdev.com> wrote:
>> > > > > Actually, this is in regards to a plugin I'm currently developing. It's in Beta right now but it's available on WP.org. It's called Search Engine and it's like a mini-Google on your site. It spiders your site (or other sites too) and indexes content into the DB.
>> > > > > <http://wordpress.org/extend/plugins/search-engine/> The use-case is that I want to be able to tell whether a page that's linked to on a site, is really redirected elsewhere. Right now, since I switched to wp_remote_request, I only get the content of the final destination page, without any knowledge of the path it's taken. So the best my script (or any script) can tell is that when you get content using wp_remote_request and it's redirected, there page exists at the URL requested -- oblivious to the real redirect happening. Previously I was using a home-brewed version similar to wp_remote_request but calling cURL and others manually).
>> > > > > So it looks like right now I'll need to do a little extra code to make my own wp_remote_request like function which does both the 301/302 redirect headers check and the body content return.
>> > > > > -Scott
>> > > > > > On Monday, February 28, 2011 5:11:22 PM UTC-6, Dion Hulse (dd32) wrote:
>> > > > > > 2 separate requests will be 2 separate requests.
>> > > > > > What's the use-case you're working on here?
>> > > > > > Personally, I'd do a normal fetch, followed by a head if it was a exceeded-redirects error if you want the body, otherwise, the url..
>> > > > > > But i cant think of a case where you'd want one or the other..
>> > > > > > On 1 March 2011 04:06, Scott Kingsley Clark <sc...@skcdev.com> wrote:
>> > > > > > > Not sure if anyone knows this, but does the page get loaded twice or is the second time getting loaded from some sort of cache? I'm specifically calling to the idea of using wp_remote_head on a URL to check for a redirect, and then using wp_remote_request on the same URL to get the content / etc.
>> > > > > > > _______________________________________________
>> > > > > > > wp-hackers mailing list
>> > > > > > > wp-h...@lists.automattic.com
>> > > > > > > http://lists.automattic.com/mailman/listinfo/wp-hackers