Regular Expressions


ghost's Avatar

Hello ladies and gentlemen. School has started this year, and I am on an advanced team (I arrived late). We have been given perhaps more than 200 words to define by Monday, and instead of doing that by hand, I decided to write a Python script that lets me type in the words and rips the definitions from http://www.dictionary.com. I noticed that lookups basically follow this pattern: http://dictionary.com/search?q=(word). I wrote a script that fetches the page, but now I need to get the actual definitions and not the page source. The definitions that I need are located between

<td valign="top"> 

and

</table>

tags

Here is my script: http://pastebin.com/m2c908524

The source code for an actual page of dictionary.com http://pastebin.com/m161549b2
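
The URL pattern described above can be sketched in Python as follows. This is only an illustration, not the script from the pastebin link: the names `BASE_URL`, `build_lookup_url`, and `build_lookup_urls` are hypothetical, Python 3 is assumed, and the site may no longer answer at this address.

```python
from urllib.parse import urlencode

# Base URL taken from the pattern quoted in the post; whether the site
# still serves this path is an assumption.
BASE_URL = "http://dictionary.com/search"


def build_lookup_url(word):
    """Return the lookup URL for a single word, e.g. .../search?q=word."""
    # urlencode escapes spaces and punctuation so multi-word entries stay valid.
    return BASE_URL + "?" + urlencode({"q": word.strip()})


def build_lookup_urls(words):
    """Build one URL per non-empty word, keeping the input order."""
    return [build_lookup_url(w) for w in words if w.strip()]
```

Each resulting URL could then be fetched with something like `urllib.request.urlopen`, and the returned HTML handed to whatever extraction step is chosen.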


spyware's Avatar
Banned

Select everything between and including those two tags, then strip the tags.
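
A minimal sketch of that suggestion, assuming Python 3 and plain string searching. The helper names `slice_between` and `strip_tags` and the sample HTML are made up for illustration, and the tag stripping is deliberately naive (it just drops anything that looks like `<...>`).

```python
import re

# The two delimiters named in the original question.
START_TAG = '<td valign="top">'
END_TAG = "</table>"


def slice_between(html, start=START_TAG, end=END_TAG):
    """Return the text from the first start tag through the next end tag,
    both included, or None if either tag is missing."""
    begin = html.find(start)
    if begin == -1:
        return None
    stop = html.find(end, begin + len(start))
    if stop == -1:
        return None
    return html[begin:stop + len(end)]


def strip_tags(fragment):
    """Naive tag removal: replace anything shaped like <...> with a space,
    then collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(text.split())


# Made-up sample, not taken from a real dictionary.com page.
sample = '<table><tr><td valign="top">a short <i>example</i> definition</td></tr></table>'
print(strip_tags(slice_between(sample)))
```

This only grabs the first matching block on the page; a loop that keeps searching from `stop` onward would be needed to collect several definitions.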


ghost's Avatar

Perhaps this:

<table class="luna-Ent"><tr><td valign="top" class="dn">[0-9].</td><td valign="top">(.*)</td></tr></table>

It's eregi, not preg.

Get match 2 (counting starts at 0), then strip the tags as well.
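
Since the original script is in Python rather than PHP, here is a rough translation of that pattern to Python's `re` module. It is a sketch under assumptions: the dot after the digit is escaped and the capture is made non-greedy (both are changes from the pattern as posted), `re.IGNORECASE` stands in for eregi's case-insensitive matching, and the names `DEFINITION_RE` and `extract_definitions` are hypothetical.

```python
import re

# Adapted from the pattern in the post. Changes (assumptions, not the
# poster's exact regex): "[0-9]+\." matches a literal dot after one or more
# digits, and "(.*?)" is non-greedy so each table yields its own match.
DEFINITION_RE = re.compile(
    r'<table class="luna-Ent"><tr>'
    r'<td valign="top" class="dn">[0-9]+\.</td>'
    r'<td valign="top">(.*?)</td></tr></table>',
    re.IGNORECASE | re.DOTALL,
)


def extract_definitions(html):
    """Return every captured definition cell with its inner tags removed."""
    cells = DEFINITION_RE.findall(html)
    return [" ".join(re.sub(r"<[^>]+>", " ", cell).split()) for cell in cells]
```

With a single capture group, `findall` returns the captured text directly, so there is no need to pick a match index by hand. The pattern is tied to the exact markup quoted in the thread and will silently return an empty list if the page layout differs.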


ghost's Avatar

Thanks guys, I managed to get it working when I stumbled on an old IBM article. I didn't even have to use regular expressions. I'm going to put this in the code bank and credit you guys if it is up.


fashizzlepop's Avatar
Member

That sounds like a really helpful script that I would use during school. Can you post the complete source if the code bank doesn't work?


ghost's Avatar

fashizzlepop wrote: That sounds like a really helpful script that I would use during school. Can you post the complete source if the code bank doesn't work?

The link is here: http://pastebin.com/f7e56f5a3 and it is in the code bank as well B) I'll be doing some more tweaking throughout the weekend so that it can read definitions from files and be more efficient at removing HTML.
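
For the two planned tweaks, one possible direction is sketched below. It is not the code from the pastebin link or the IBM article. It assumes Python 3, uses the standard library's `html.parser` to drop markup without regular expressions, and assumes the input file holds one word per line. `TextExtractor`, `html_to_text`, and `read_words` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self.parts).split())


def html_to_text(fragment):
    """Return the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


def read_words(path):
    """Read one word per line from a text file, skipping blank lines.
    The one-word-per-line layout is an assumption."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

A parser-based approach like this tends to cope better with attributes and entities than a hand-written tag regex, at the cost of a little more code.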