HttpRequest : problem with multiple redirections ? [message #43460] Thu, 07 August 2014 11:12
jibe
Messages: 294
Registered: February 2007
Location: France
Experienced Member
Hi,

I want to get some data about books from various websites: ISBNdb, GoogleBooks, Worldcat... This works well with all of them, but not always with Amazon.fr: using the same URL, I do not always get the same content! Sometimes it is the page I get with Firefox, and sometimes it is a different one. This seems to be random, and whatever content I get, there is never any error (HttpRequest::GetError() returns 0).

Trying with wget (I am using Linux), I see that this URL goes through 3 redirections. Could that be the problem? And if not, what could it be?

My code is very simple: I am supposed to know the URL of the book (if not, I search for the book's ISBN, and I always find the right URL, even on Amazon.fr, provided the site knows the book, of course!)
String content;
HttpRequest http;
...
http.Url(url);
content = http.Execute();

A URL showing this problem:
http://www.amazon.fr/14-Jean-Echenoz/dp/2707322571/ref=sr_1_1/278-1397759-3160153?ie=UTF8&qid=1372075436&sr=8-1&keywords=9782707322579


Any ideas?
Re: HttpRequest : problem with multiple redirections ? [message #43465 is a reply to message #43460] Thu, 07 August 2014 18:02
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
Hi,

I have tried it and it seems OK. It is hard to say more without knowing the "BAD" content.

Generally, the problem might be on the server side; perhaps they are doing A/B testing or something.

Another possibility is that the redirections go wrong: many sites use cookies in the redirection process. U++ HttpRequest tries to emulate the browser as much as possible, but of course it does not store cookies persistently, only across redirections.

In any case, I recommend putting HttpRequest::Trace into the code to trace all HTTP communication in the log; then perhaps you can compare it to Firefox logs (headers and such) or perhaps Chrome logs...

Mirek
Re: HttpRequest : problem with multiple redirections ? [message #43470 is a reply to message #43460] Fri, 08 August 2014 15:02
jibe
Hi Mirek,

Thanks for your reply. The "bad" content is not so bad: it seems to be another similar (outdated?) page about the same book. The problem is that it is not organized the same way, so I cannot retrieve the data I need unless I parse it differently.

I will try tracing the requests and see what can be done.

What is surprising is that I (sometimes) get this bad content only when I fetch the URL directly. The first time I look for a book, I run a search on the site, obtain a list of books, select the right one and follow the link. This link is the URL that I store and use the next times, but curiously, I always get the right page when I search for the book first rather than using the stored URL!

I just wanted to have others' opinions about this. Anyway, I can work around the problem either by parsing the "bad" content when I get it, or by searching for the book first rather than using the direct URL. I'll let you know if I find the reason for this bad content.

Thanks for your advice.
Re: HttpRequest : problem with multiple redirections ? [message #43476 is a reply to message #43470] Sat, 09 August 2014 09:51
mirek
Are you using the same HttpRequest for both? In that case, it would mean cookies are responsible... HttpRequest preserves cookies even across successive calls. You can also check whether that is the issue by using "CopyCookies" (it copies cookies from one HttpRequest to another).

Mirek
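
The cookie explanation above can be illustrated with a toy model. The sketch below does not use U++ at all: FakeServer, Fetch, the paths and the "session" cookie are hypothetical names invented for illustration, and real cookies also carry a domain, a path and an expiry date that are ignored here. It only shows why "search first, then follow the link" and "fetch the stored URL directly" can return different pages from the same address.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy cookie jar: name -> value (domain, path and expiry are ignored).
using CookieJar = std::map<std::string, std::string>;

struct Response {
    std::string body;       // which page variant the server chose
    std::string set_name;   // cookie set by this response (empty if none)
    std::string set_value;
};

// Hypothetical server: the search page hands out a "session" cookie, and the
// book page picks its layout depending on whether that cookie comes back.
Response FakeServer(const std::string& path, const CookieJar& sent)
{
    if (path == "/search")
        return {"result list", "session", "abc123"};
    if (path == "/book")
        return {sent.count("session") ? "layout A" : "layout B", "", ""};
    return {"not found", "", ""};
}

// Client side: send the jar with the request, then record any cookie the
// response sets so that the next request made with the same jar carries it.
std::string Fetch(const std::string& path, CookieJar& jar)
{
    assert(!path.empty() && path[0] == '/');   // this model only knows paths
    Response r = FakeServer(path, jar);
    if (!r.set_name.empty())
        jar[r.set_name] = r.set_value;
    return r.body;
}
```

With one jar reused for "/search" and then "/book", the model returns "layout A"; with a fresh jar and a direct request to "/book", it returns "layout B". Reusing one HttpRequest for both calls plays the role of the shared jar here.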

Re: HttpRequest : problem with multiple redirections ? [message #43488 is a reply to message #43460] Mon, 11 August 2014 09:34
jibe
Hi, Mirek,

Yes, that's it :)

I removed the cookies in my browser, and I got the "bad" page (a curious site, serving an almost identical page with very different code depending on the cookies: all CSS classes and ids are different!).

What my application does is this: the first time, it looks up the book by its ISBN, obtains a list of the matching books (normally only one, as two different books cannot have the same ISBN), then follows the link to get the page. I keep this URL in the database. It is only some time later that, if we use the link, we get the "bad" page. But in that case, I think the cookie is no longer available, as the application has been stopped in the meantime...

Perhaps I should keep the cookie in the database? Well, I will see: a workaround will probably be simpler in the end.

Thank you for your help !
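
If the cookies were to be kept in the database next to the URL, one possible approach is to flatten them into a single text value and parse that value back on the next run. The sketch below is plain standard C++ and only an assumption about how this could be done: SerializeCookies, ParseCookies and the cookie names are hypothetical, and reading the cookies from HttpRequest and handing them back to it is deliberately left out. A stored cookie may also have expired on the server in the meantime, so the fallback of searching by ISBN first would still be needed.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy cookie jar: name -> value (domain, path and expiry are ignored).
using CookieJar = std::map<std::string, std::string>;

// Flatten the jar into one "name=value; name=value" string, the same shape as
// an HTTP Cookie header value, so that it fits in a single text column.
std::string SerializeCookies(const CookieJar& jar)
{
    std::string out;
    for (const auto& kv : jar) {
        // Names must not contain the separators used below.
        assert(kv.first.find_first_of("=;") == std::string::npos);
        if (!out.empty())
            out += "; ";
        out += kv.first + "=" + kv.second;
    }
    return out;
}

// Inverse of SerializeCookies. Pairs without a name or without '=' are skipped.
CookieJar ParseCookies(const std::string& text)
{
    CookieJar jar;
    std::istringstream in(text);
    std::string pair;
    while (std::getline(in, pair, ';')) {
        std::string::size_type start = pair.find_first_not_of(' ');
        std::string::size_type eq = pair.find('=');
        if (start == std::string::npos || eq == std::string::npos || eq <= start)
            continue;
        jar[pair.substr(start, eq - start)] = pair.substr(eq + 1);
    }
    return jar;
}
```

The value is split at the first '=' only, so cookie values that themselves contain '=' survive the round trip; values containing ';' are not supported by this format.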