webspider in python


ghost's Avatar

Does anyone know any good tutorials on how to make a web spider in Python?


richohealey's Avatar
Python Ninja

You just need the urllib module and the re module, to search for <a> tags in the source of each page.
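To make that a bit more concrete, here is a rough sketch of the idea, assuming Python 3 (where the download function lives in urllib.request rather than plain urllib). The names extract_links, fetch and crawl, the 10-page limit and the regex itself are made up for illustration and are not from anyone's actual bot. A regex is only a naive way to find <a> tags, so treat this as a starting point rather than a finished spider.

```python
import re
from collections import deque
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen

# Naive pattern for the href value inside an <a> tag.
# Assumption: quoted href attributes only; a regex is not a real HTML parser.
LINK_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html, base_url):
    """Return absolute http(s) URLs found in <a href="..."> tags, in page order."""
    links = []
    for href in LINK_RE.findall(html):
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links


def fetch(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=10, fetch_page=fetch):
    """Breadth-first crawl from start_url, visiting at most max_pages pages."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("http://example.com/") would fetch the start page, pull out its links, and keep following them until it has visited max_pages pages. The fetch_page parameter is only there so the download step can be swapped out, for example to try the link extraction on saved pages without hitting a real site.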


ghost's Avatar

You make that sound pretty easy.

Problem is, I don't know what those are, so that's not much help, heh.

I'm not very good at Python. I'm alright in PHP, but I think it would make for a better engine if I use Python.

This means a lot of reading and trial and error, I'm guessing, but yeah, I need a pointer on where to start.


ghost's Avatar

So does anyone know where I can find any good Python tutorials on making a web spider?

And if not, I may as well look at any PHP ones?


ghost's Avatar

WarpedSkittle wrote: And if not, I may as well look at any PHP ones?

I'll show you the source to mine, although it might be hard to understand because it is not commented and it might be a bit messy.


ghost's Avatar

That would be helpful.

Also, a quick question along with that: are spiders made in PHP much more limited in functionality?


ghost's Avatar

WarpedSkittle wrote: That would be helpful. Also, a quick question along with that: are spiders made in PHP much more limited in functionality?

Functionality: no. Speed: somewhat.

My bot can do anything the Googlebot (which is written in C) can do. However, I think I read that Googlebot can collect about 100 pages a second, while mine can collect about 1 page a second :p

PM me when you want to see the source.