Best way to parse HTTP response body
Hello all, in each timed challenge we need to keep the HTTP response body, search for a string in it, do some operations, and make a new request with the resulting value. I already did real 11 and had no problems parsing the response to find the number between the two tags <h1></h1>. Now the challenge is a bit different, so we have to use a regular expression to find the right string. I'm here to ask for suggestions on the best/easiest way to parse the response body and search for that string. I'm using a cURL PHP script that saves the HTTP response headers in an array, then a regular expression in a preg_match call to find the string. I'm having a lot of problems parsing the response body and finding the needed string.
Any help/suggestion will be appreciated, thanks.
Hi,
With cURL/PHP, do it like this:
// Fetch the challenge page and keep the body in $result.
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://www.hellboundhackers.org/challenges/timed/timed3/index.php");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($curl, CURLOPT_VERBOSE, 1); // log transfer details to stderr
curl_setopt($curl, CURLOPT_COOKIE, "PHPSESSID=5b1sXXXXo5niv5p0t24ntbh56X;fusion_user=13XXX.cXXX282138afbe9066b8be1cb426841d");
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; it; rv:1.8.1.5) Gecko/20070713 Firefox/2.0.0.5");
$result = curl_exec($curl);
curl_close($curl);
print $result;
The variable $result will hold the HTML body of the page. Remember, you must spoof the User-Agent because HBH doesn't allow cURL traffic!
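For anyone who prefers Python, the same idea (send the session cookie and a browser-like User-Agent, then read the body) could be sketched with the standard library roughly like this. It is only a sketch: build_request and fetch_body are made-up helper names, and the cookie you pass in has to be your own session value.

```python
import urllib.request

# Browser-like User-Agent string, same value as in the PHP example.
USER_AGENT = ("Mozilla/5.0 (X11; U; Linux i686; it; rv:1.8.1.5) "
              "Gecko/20070713 Firefox/2.0.0.5")


def build_request(url, cookie, user_agent=USER_AGENT):
    """Return a Request carrying the session cookie and a spoofed User-Agent."""
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie, "User-Agent": user_agent},
    )


def fetch_body(request, encoding="utf-8"):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode(encoding, errors="replace")
```

Calling fetch_body(build_request(url, "PHPSESSID=...")) with your own cookie should give you the page as one string, ready for searching.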
You can use Python (the way I do it):
url = 'http://www.hellboundhackers.org/challenges/timed/timed4/index.php'
footer = 'Your word is: <strong>'
trailer = '</strong><br /><br /><form action='
… some urllib code here….
f = data.find(footer)
t = data.find(trailer)
data = data[f:t]
data = data.split(':')
data = data[1].replace(' ', '')
data = data.replace('<strong>', '')
simple, not very elegant, but it works! :-)
please feel free to PM me.
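The find-and-slice approach above can be wrapped in a small helper so it also copes with a missing marker. This is just a sketch: extract_between is a made-up name, and the sample string is invented for illustration, so the real page may look different.

```python
def extract_between(data, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = data.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the opening marker itself
    end = data.find(end_marker, start)
    if end == -1:
        return None
    return data[start:end].strip()


# Made-up sample body; the real response may differ.
sample = "<p>Your word is: <strong>h4x0r</strong><br /><br /><form action='x'>"
word = extract_between(
    sample,
    "Your word is: <strong>",
    "</strong><br /><br /><form action=",
)
print(word)
```

Slicing from the end of the opening marker means there is no need to split on ':' or strip the tag afterwards, and a word containing spaces survives intact.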
contmp wrote: you can use python (the way I do it)
import re
# The raw string avoids invalid escape sequences; the group captures only the text inside the tags.
word = re.findall(r'<strong>([A-Za-z0-9 ]+)</strong>', html)[0]
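If the page might not contain a match, indexing [0] raises an IndexError. A slightly more defensive take on the regex idea could look like the sketch below; find_word is a made-up name and the sample HTML is invented, so adjust the pattern to whatever the challenge actually returns.

```python
import re

# Capture letters, digits, and spaces sitting directly inside <strong> tags.
WORD_PATTERN = re.compile(r"<strong>([A-Za-z0-9 ]+)</strong>")


def find_word(html):
    """Return the first word found inside <strong> tags, or None."""
    match = WORD_PATTERN.search(html)
    return match.group(1) if match else None


# Made-up sample body; the real response may differ.
sample = "Your word is: <strong>h4x0r</strong><br /><br />"
print(find_word(sample))
```

Returning None instead of raising lets the calling script decide whether to retry the request or bail out.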
Zephyr_Pure wrote: [quote]The_Gman wrote: It was still on the front page, shithead
The post before yours was from a year ago, fucktard. Pay attention and shut the hell up if you don't know what you're talking about.[/quote]
I wasn't denying that it was a year old; I was saying it was on the front page. Does it really bother you that a topic was moved up twenty lines?
Especially when I contributed something that can benefit others.
I'm glad you wasted your time posting :angry:
Dude, SHUT UP. You bumped a year-old thread… that's a no-no. Doesn't matter if what you said would help, because the OP has already forgotten about the topic. When the topic is long dead, the OP obviously isn't interested anymore. When the OP leaves a thread for dead, leave it dead. It's not fucking rocket science.