ghost of delete key Posted December 3, 2009

So now I'm really into PHP, and I'm doing something swift with cURL to transfer files and such. I need to use cURL as my host does not have allow_url_fopen enabled (for security).

The problem is that while other URLs work fine, downloading images from ImageShack does not work. cURL seems to fail at locating the resource. I think it may have to do with headers, as of course the browser can fetch and display pics, as can these and other forums, but I don't really know, and I haven't been able to find any reference to this on the web.

Anybody have any ideas?

Link to comment https://gtaforums.com/topic/434454-php-curl-imageshack/
K^2 Posted December 3, 2009

When a browser fetches a picture, it makes a direct connection from your computer to the server hosting the image. It does not go through the host running the PHP, so the fact that your browser can display the picture really means nothing. If your host does not allow PHP scripts to fetch data from other servers, you are pretty much out of luck.
fred Posted December 3, 2009

Is cURL failing at the transfer or are you just not getting the response you're expecting? If the former, you can get a hint at what's wrong by checking curl_error().

If the latter, it kinda depends on what you are receiving. If it's a redirect, make sure cURL is set to follow them automatically. A 400 Bad Request means cURL is sending some weird headers so that's probably an error in your script. It could also be the server refusing to serve the image because it doesn't think it's a legitimate request. If you're lucky, you might just need to record your browser request headers and force cURL to send the same ones but I'd guess it's more likely to be refusing your IP, especially if you're with a bigger host sharing an IP with lots of other sites.

It could be anything really. You haven't given much to go on but cURL has plenty of options for debugging (setting it to log to a file with the verbose option can be very helpful). Or if you think cURL is getting in the way, you could always use fsockopen() which doesn't need allow_url_fopen.
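The debugging options mentioned above (curl_error(), following redirects, verbose logging to a file) can be combined roughly as follows. This is only a sketch: the helper names debug_curl_options() and debug_head_request() and the log path are made up for illustration, and it assumes the cURL extension is loaded and the log file is writable.

```php
<?php
// Hypothetical helper: options for a verbose, redirect-following HEAD request.
// $logHandle is an open, writable file handle that receives cURL's verbose trace.
function debug_curl_options($logHandle)
{
    return array(
        CURLOPT_NOBODY         => true,  // HEAD request, headers only
        CURLOPT_HEADER         => true,  // include response headers in the result
        CURLOPT_RETURNTRANSFER => true,  // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects automatically
        CURLOPT_MAXREDIRS      => 5,     // but not forever
        CURLOPT_TIMEOUT        => 5,     // seconds
        CURLOPT_VERBOSE        => true,  // write the connection trace...
        CURLOPT_STDERR         => $logHandle, // ...to this handle
    );
}

// Hypothetical helper: run the request and report what cURL itself says.
function debug_head_request($url, $logPath)
{
    $log = fopen($logPath, 'a');
    if ($log === false) {
        return array('ok' => false, 'errno' => -1, 'error' => 'cannot open log file', 'http_code' => 0);
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, debug_curl_options($log));
    $response = curl_exec($ch);

    $result = array(
        'ok'        => $response !== false,
        'errno'     => curl_errno($ch),  // 6 means the host name could not be resolved
        'error'     => curl_error($ch),
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    );

    curl_close($ch);
    fclose($log);
    return $result;
}
```

Calling debug_head_request() with the image URL and something like '/tmp/curl_debug.log' (an assumed path) and then reading the log should show whether the request fails at name resolution, at connect, or only once the server answers.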
ghost of delete key Posted December 4, 2009 Author

Thanx for the replies.

K^2, I think you missed what I'm saying: I'm fetching an image from a URL with the script, doing many fancy things to it server-side, then feeding it on to its destination. Think "image rotator on steroids".

fred: curl_error() tells me error 6: "Curl error: Couldn't resolve host 'img266.imageshack.us'"

Here's the snippet of where I crash: (this func is used to compare the remote file against a cached version to see if I need to blahblah...)

    function curl_last_mod($remote_file) {
        // return unix timestamp (last_modified) from a remote URL file
        $last_modified = $ch = $resultString = $headers = '';
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $remote_file);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);        // 5 sec timeout
        curl_setopt($ch, CURLOPT_HEADER, 1);         // make sure we get the header
        curl_setopt($ch, CURLOPT_NOBODY, 1);         // make it a http HEAD request
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // write the response to a variable
        curl_setopt($ch, CURLOPT_FILETIME, 1);
        curl_setopt($ch, CURLOPT_FAILONERROR, 1);
        // MODIFICATION:
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');
        $i = 1;
        while ($i++ <= 2) {
            if (curl_exec($ch) === false) {
                echo 'Curl error: ' . curl_error($ch); // <<HERE'S WHERE IT DIES!! - godk
                exit;
            }
            $headers = curl_getinfo($ch);
            if ($headers['http_code'] != 200) {
                sleep(3); // Let's wait 3 seconds to see if its a temporary network issue.
            } else if ($headers['http_code'] == 200) {
                // we got a good response, drop out of loop.
                break;
            }
        }
        $last_modified = $headers['filetime'];
        if ($headers['http_code'] != 200) echo 'Curl error: ' . curl_error($ch);
        curl_close($ch);
        return $last_modified;
    }

Where it says "// MODIFICATION:"... This is not there in my original version. The script dies with the error 6. When I add the line to try to spoof a real browser, there is no more error, however there is no image either. I simply get a perfectly empty page.

As I've mentioned before, the original script is successful with EVERY other image host I've tested it on, it is only ImageShack that borks the thing. My digging indicates that they seem to be doing some stuff to prevent automated scraping. I just found ImageShack's API, and although it's for uploading, I see there is a special user-agent being used, along with other junk.

I have phpBB up and running on the host, and it can successfully fetch ImageShack, so I know it's not a server capability issue; but the app is friggin huge, and I haven't yet found the lines of code that do the dirtywork so I can see how to do it right.

I need more coffee, and time to tear some hair out of my noggin. Any thoughts?
K^2 Posted December 4, 2009

> K^2, I think you missed what I'm saying: I'm fetching an image from a URL with the script, doing many fancy things to it server-side, then feeding it on to its destination. Think "image rotator on steroids".

No, no. I got that part. If you run this script without modification, what do you get? Does image shack reply with anything?

And where did all these come from? Because they shouldn't be there.
fred Posted December 4, 2009

> curl_error() tells me error 6: "Curl error: Couldn't resolve host 'img266.imageshack.us'"

Sounds like a DNS issue. You could try gethostbyname('img266.imageshack.us') to see if it's a problem with cURL or a more general issue with the resolver.
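A quick way to run the gethostbyname() check suggested above is sketched below. gethostbyname() returns the host name unchanged when the lookup fails, so the result has to be tested rather than trusted. The helper names resolve_host() and describe_resolution() are invented for this example.

```php
<?php
// Hypothetical helper: returns the IPv4 address for $host, or null if the
// system resolver could not resolve it.
function resolve_host($host)
{
    $result = gethostbyname($host);

    // On failure gethostbyname() hands back its input unmodified, so check
    // that what came back is actually an IP address.
    if (filter_var($result, FILTER_VALIDATE_IP) === false) {
        return null;
    }
    return $result;
}

// Hypothetical helper: one-line summary suitable for echoing from a test page.
function describe_resolution($host)
{
    $ip = resolve_host($host);
    return $ip === null
        ? "resolver failed for $host"
        : "$host resolves to $ip";
}
```

If describe_resolution('img266.imageshack.us') reports a failure on the web host while the same call succeeds elsewhere, the problem sits with the host's resolver rather than with cURL or the request headers.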
ghost of delete key Posted December 5, 2009 Author

> If you run this script without modification, what do you get? Does image shack reply with anything? And where did all these come from? Because they shouldn't be there.

heh, yeah I see that... that was a quick cut & paste from Wordpad. The actual script on the server is correct:

...found a typo, fixed it:

    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');

... and now setting CURLOPT_USERAGENT (or not!) the script dies with the aforementioned error. (zero output should have told me there was an error in the line; I've seen this already)

So anyway, calling gethostbyname('img266.imageshack.us') only returns the input string, not the IP address, indicating failure. This only reaffirms that the connection fails, but still doesn't tell me why only ImageShack does this, and no others.

I've been scouring the web for answers for a couple of weeks, and either haven't found a suitable answer to the problem, or I have, and just don't recognize it yet. I've collected a few hundred bookmarks on this, so maybe I just need to slow down and sleep.

I've been looking at fsockopen(), and now I'm bleeding from the eyes and ears. I think I'll try to resolve this issue first before stabbing myself with yet another knife. One step at a time.

Imma fire off an email to ImageShack, and see if they're willing to spill any beans. I'm not expecting much, but it couldn't hurt. I'll also apply for a developer key to use the upload API; they promise tech support if you can promise 500 or more daily visits to your site. I'm wondering if that can mean hits through this proposed service... If that's the case, I can expect to generate maybe a thousand or more daily. Otherwise, it would only be a few per day through the user CP, if that much.

In the meantime, any other ideas?
fred Posted December 5, 2009

> So anyway, calling gethostbyname('img266.imageshack.us') only returns the input string, not the IP address, indicating failure.

You should be able to take that to your host and get them to have a look. There's not much you can do about a failing DNS resolution if you're on shared hosting.
ghost of delete key Posted December 6, 2009 Author

Well, thanks for the effort, but I gotta know it's not my host's fault. (I did mention that phpBB does it just fine from my site using Imageshack as a source, right?)

Imageshack, like World of Warcraft and Google, is doing something fancy to prevent scraping. The hack I tried came from one that works on WoW, but Imageshack seems to be a bit more sophisticated. Like I said before, the script works just fine for 99.9% of the URLs I feed it, including dozens of other image hosting services with which I tested it.

Lemme ask y'all a question: If you were to use a site for image preprocessing which allowed you to simply enter the links to pics already at your favorite host (NOT requiring you to freshly upload somewhere), would you bother using it knowing that your usual host wasn't supported, or would you go through the effort of re-uploading your pics to another supported host? If your answer would be "screw it", that's why I have to crack this egg. And it WILL be done, I'm sure.

btw, I found that I did indeed stumble upon info on what I need to do, and it is quite convoluted. I'll need to trap all the headers and cookies passed in a normal browser session, and duplicate that verbatim in my script. Certainly not graceful, and tedious. I'm off now to figure out Charles and/or Etherial, or something like that.

I'm hoping Imageshack will get back to me with something more useful than a big fat "nope".

I also found the image-getter in phpBB, and it's doing roughly what I just mentioned. The thing is, taking a function out of phpBB is like removing a malignant brain tumor; bits of it reach all over and are connected to everything. At least I see what they did there, and they're not using cURL.

More later...
K^2 Posted December 6, 2009

You can try to go ahead and obtain the IP for img266.imageshack.us manually, and use that in your path. That will circumvent the DNS problems. However, make sure that the HTTP request header contains img266.imageshack.us as the host name. Otherwise, you are likely to get an error in reply.
fred Posted December 6, 2009

> btw, I found that I did indeed stumble upon info on what I need to do, and it is quite convoluted. I'll need to trap all the headers and cookies passed in a normal browser session, and duplicate that verbatim in my script. Certainly not graceful, and tedious. I'm off now to figure out Charles and/or Etherial, or something like that.

That's actually very quick and straightforward. You can just copy the headers from the LiveHttpHeaders FF extension and plug them straight into the CURLOPT_HTTPHEADER option.

It's also completely pointless. The error message Couldn't resolve host tells you exactly what the problem is and since it's failing at the DNS stage, it's not even getting as far as sending the headers.

You've reduced the code to a single function call that doesn't return what it should. Unless you think it's a bug in PHP, it's a matter for your host. If you had a dedicated server, you could try changing the DNS resolvers or installing your own. On shared hosting, there's nothing you can do.

If you're still not convinced, try running your code on a few different servers. I just gave it a quick try and it resolves fine for me.
ghost of delete key Posted December 6, 2009 Author

> I just gave it a quick try and it resolves fine for me.

Not that I don't believe you, because I do, but... srsly? Then why in hell would that not work only for that one IP, where the rest of the world appears fine? If their server expects certain headers, wouldn't it refuse the connection, producing a failure to resolve, or at least look like it to the php engine on my host?

Do you think it could have something to do with the fact that I'm driving a .co.cc domain parked on my freehost, trying to access xxx.imageshack.us? I've tried straight from the host IP as well, with the same results.

Anyway, at the risk of beating a dead horse, the phpBB script on my site IS reading Imageshack just fine, dishing out avatars and sigs; it's MY script that sucks. I can't imagine that the nameservers are conspiring against me, unless I'm doing something wrong in my script, and Imageshack gives me the hand to talk to. I feel like I'm knocking on the door of an Ex who won't return my calls...

I don't relish the thought of poring over host profiles to find the next "right" one, but if I must, so be it. It took me forever to find one with filesize limits large enough to hold phpBB AND have all the right scripting whizzbangs installed and whatnot. (the forum is mostly for live experimentation, but will eventually be the support room for the service.)

I wish I could afford paid hosting. 'Cause ya gets what ya pays for.

But thanks again for the pointers, guys. Sure I'm a webscript noob, but I guess every nail through my foot will make me walk stronger, huh? Just don't laugh too loud while I limp. I are sensitive.

K^2 will be happy to know that I'm doing all this with only Notepad++ and a prayer, and not succumbing to the lazy temptations of NetBeans or Aptana Studio. (they won't run well on my box anyway.) I have to force myself to read this stuff like a novel, instead of treating it like the instructions on a box of mac & cheese. I refuse to be a script kiddie, and I won't settle for not knowing this crap inside and out.

Back to the drawing board.
K^2 Posted December 7, 2009

You seem to have some misconceptions on how HTTP protocol works.

1) Client connects to DNS server.
2) Client sends host name to DNS server.
3) DNS server sends back IP address of the host.
4) Client connects to host using IP address provided by DNS.
5) Client sends GET or POST request header to the host server.
6) Host server sends back reply header and body.

Your header is only checked by host after step 5. DNS resolve failure happens somewhere between steps 1 and 3. You see how it cannot have anything to do with the header?

Why just that particular address? I don't know. Your web host might have imageshack blocked for some reason. It does sound too specific to be a random failure.

To try and go around the DNS, try opening command prompt, and typing in "ping img266.imageshack.us". That will give you an IP address. You can use address instead of the hosts name in the URL you feed to cURL. Just make sure that imageshack still thinks you are using host name. To do this, add another curl_setopt:

    curl_setopt($ch, CURLOPT_HOST, "img266.imageshack.us");

This way, you don't use DNS to obtain IP, but imageshack still thinks you did.

Edited December 7, 2009 by K^2
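A sketch of the IP-in-the-URL idea described above, with one caveat: as far as I can tell, PHP's cURL extension does not define a CURLOPT_HOST option, and the usual way to send a specific Host header is through CURLOPT_HTTPHEADER. The helper names build_ip_request() and fetch_via_ip() and the image path in the usage note are made up for illustration; the IP would come from a manual lookup such as the ping described above.

```php
<?php
// Hypothetical helper: swap the host name in $url for $ip and produce the
// Host header that keeps the original name visible to the remote server.
function build_ip_request($url, $ip)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not an absolute URL we can rewrite
    }

    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $port   = isset($parts['port'])   ? ':' . $parts['port']  : '';
    $path   = isset($parts['path'])   ? $parts['path']        : '/';
    $query  = isset($parts['query'])  ? '?' . $parts['query'] : '';

    return array(
        'url'     => $scheme . '://' . $ip . $port . $path . $query,
        'headers' => array('Host: ' . $parts['host']),
    );
}

// Hypothetical helper: fetch $url by connecting to $ip directly, so no DNS
// lookup is needed on the web host. Returns the body, or false on failure.
function fetch_via_ip($url, $ip)
{
    $request = build_ip_request($url, $ip);
    if ($request === null) {
        return false;
    }

    $ch = curl_init($request['url']);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $request['headers']); // Host: original name
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $body = curl_exec($ch);
    curl_close($ch);

    return $body;
}
```

For example, build_ip_request('http://img266.imageshack.us/some/pic.png', '38.99.77.34') yields the URL http://38.99.77.34/some/pic.png together with the header "Host: img266.imageshack.us". Servers that host several sites on one address rely on that header to pick the right one, which is why it is worth sending even if a particular server appears to ignore it.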
ghost of delete key Posted December 7, 2009 Author

> You seem to have some misconceptions on how HTTP protocol works.
> 1) Client connects to DNS server.
> 2) Client sends host name to DNS server.
> 3) DNS server sends back IP address of the host.
> 4) Client connects to host using IP address provided by DNS.
> 5) Client sends GET or POST request header to the host server.
> 6) Host server sends back reply header and body.
> Your header is only checked by host after step 5. DNS resolve failure happens somewhere between steps 1 and 3. You see how it cannot have anything to do with the header?

Absolutely. Thanks, that clears that up for me. Like I said, I don't know all that much about the fine details of what's going on at "the other side" of my modem. I guess I should take the time to study the RFC specs, etc. huh?

> Why just that particular address? I don't know. Your web host might have imageshack blocked for some reason. It does sound too specific to be a random failure.

Exactly! But it can't be blocked, the bb script works fine from there...

> To try and go around the DNS, try opening command prompt, and typing in "ping img266.imageshack.us". That will give you an IP address. You can use address instead of the hosts name in the URL you feed to cURL. Just make sure that imageshack still thinks you are using host name. To do this, add another curl_setopt: curl_setopt($ch, CURLOPT_HOST, "img266.imageshack.us"); This way, you don't use DNS to obtain IP, but imageshack still thinks you did.

See, if I knew what you mentioned at the top, I'd have thought of doing such an end-run myself.

So, I just did this, and... IT WORKED. No problem. The image gets cached in my cubbyhole server-side and shows up in the browser.

The funny thing is, adding

    curl_setopt($ch, CURLOPT_HOST, "img266.imageshack.us");

OR

    curl_setopt($ch, CURLOPT_HOST, "38.99.77.34");

works fine in this case either way, as does omitting it entirely! I don't know what to make of that.

It doesn't matter what order each curl_setopt is added, does it? I haven't seen any reference to that notion, and I get the same result no matter where in the cURL session I place it.

I now have the problem of actually resolving Imageshack URLs, as img266.imageshack.us is but one of many, and as we found before, calling gethostbyname('img266.imageshack.us') only returns an error, so can't be used to inject the IP. I can't expect the user to go to the trouble of resolving the server of each image and manually correcting the URL at the input...

I see I have to rethink this script from the ground up. Thanks again, you got me in the right direction now.

More later.
K^2 Posted December 8, 2009

> works fine in this case either way, as does omitting it entirely! I don't know what to make of that.

That just means that imageshack doesn't really check the host field. Some servers do. Some actually use it to differentiate between different "sites" on one server. But since imageshack doesn't care about it, just ignore it.

As a hacky workaround, you can try using some 3rd party server to resolve the IP, and then call your script with host name replaced by IP. There are plenty of sites out there that can give you IP if you enter a host name. Problem is, you'd probably have to parse an HTML page that the server returns to fetch the IP. Don't know if it would be worth the trouble for you.

Another possibility is building your own database of IP addresses for img###.imageshack.us addresses.
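The "own database" option could start as nothing more than a hosts-style text file. The sketch below assumes a made-up format of one "hostname IP" pair per line with # comments; parse_host_table() and resolve_with_table() are hypothetical names, and the only address shown is the one that came up earlier in this thread.

```php
<?php
// Hypothetical helper: parse "hostname ip" lines into a lookup table.
// Blank lines, # comments and lines without a valid IP are skipped.
function parse_host_table($text)
{
    $table = array();
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $fields = preg_split('/\s+/', $line);
        if (count($fields) >= 2 && filter_var($fields[1], FILTER_VALIDATE_IP) !== false) {
            $table[strtolower($fields[0])] = $fields[1]; // keys stored lowercase
        }
    }
    return $table;
}

// Hypothetical helper: consult the local table first, then fall back to the
// system resolver. Returns an IP string, or null if neither knows the host.
function resolve_with_table($host, array $table)
{
    $key = strtolower($host);
    if (isset($table[$key])) {
        return $table[$key];
    }

    $ip = gethostbyname($host); // returns $host unchanged on failure
    return filter_var($ip, FILTER_VALIDATE_IP) === false ? null : $ip;
}
```

A table like this goes stale whenever the image servers change address, so entries would need refreshing from a machine whose resolver does work.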
ghost of delete key Posted December 8, 2009 Author

> works fine in this case either way, as does omitting it entirely! I don't know what to make of that.

> That just means that imageshack doesn't really check the host field. Some servers do. Some actually use it to differentiate between different "sites" on one server. But since imageshack doesn't care about it, just ignore it.

Ah, ok - that's good to know.

> As a hacky workaround, you can try using some 3rd party server to resolve the IP, and then call your script with host name replaced by IP. There are plenty of sites out there that can give you IP if you enter a host name. Problem is, you'd probably have to parse an HTML page that the server returns to fetch the IP. Don't know if it would be worth the trouble for you. Another possibility is building your own database of IP addresses for img###.imageshack.us addresses.

Hmm... these are both notable options. Thanks. It might make sense to use the first tactic to build the second, and run from there.

I'm taking a head break looking for more flexible php hosting, but virtually every other free host has some caveat which disqualifies it for my use. I may have to make some harsh compromises, or totally rework my script. I just want to get something running without too much blood loss.

*** *** ***

On a quasi-off-topic note, were you aware of this?

http://www.gtaforums.com/index.php?=PHPE9568F36-D428-11d2-A769-00AA001ACF42

That blew my mind. There are others too.