Using Google Search AJAX API in Python
In this article, I will attempt to teach you how to use the “Google AJAX Search API”. Let’s break this down and look at each part.
- Google -> Well, it’s obvious… duh
- AJAX -> It stands for “Asynchronous JavaScript and XML”. It is not a programming language but a technique that lets a web page exchange data with a server in the background. It is used to create better, faster and more interactive web applications.
- Search -> It means that the user “queries” for something and gets the result. Much like Google Search.
- API -> It stands for “Application Programming Interface”. It specifies how software components must interact with each other. In this case, we can say that it is a set of programming instructions for accessing a web tool (web-based software).
So what we want to do is write a program that interacts with the Google AJAX Search API. That way we can get the results of a Google search from a program. This will be helpful in Timed 6 (I think; I haven’t completed that yet).
As the title suggests, I will be explaining a Python program that does this. One can obviously use other programming languages too, but that’s a topic for some other article.
Before you continue, it is essential that you know basic Python syntax and certain modules (urllib, urllib2). For anything that is used in the article but not explained here, I suggest reading up on it side by side for easier understanding.
SO LET’S GET CODING !! :D
The first line is normally the import line. I know that I omitted the “#!/usr/bin/python” line, but that is understood. The import line is where we import the modules we are about to use, which in this case are:
import urllib, simplejson
I will explain the simplejson module later in the article. For now, you can think of it as a translator between Python and JavaScript Object Notation (JSON). I would suggest going through this: http://pymotw.com/2/json/
Now, we need something to search for. So we can ask the user to input a query.
query = raw_input("Please enter your query: ")
Now comes something that is probably new. We have the query, the thing we want to search for. But before sending it to the site we have to convert it into something the search engine will understand. This is called URL encoding. It’s a pretty simple concept and you can google it if you want. To do this in Python we use a function in the urllib module called “urlencode”.
query = urllib.urlencode({'q': query})
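To see what URL encoding actually does to a query, here is a minimal sketch. The sample query is made up, and the try/except import is my own addition so that the same snippet runs on Python 2 (urllib.urlencode, as in this article) and on Python 3 (where the function lives in urllib.parse).

```python
try:
    from urllib import urlencode        # Python 2, as used in this article
except ImportError:
    from urllib.parse import urlencode  # Python 3 location of the same function

# Hypothetical sample query containing spaces and an ampersand.
raw_query = "tom & jerry"
encoded = urlencode({'q': raw_query})

# Spaces become '+' and '&' becomes '%26', so the query text cannot be
# mistaken for a separator between URL variables.
print(encoded)
```

The encoded string already contains the “q=” part, which is why it can be dropped straight into the URL.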
The url to the Google Search AJAX API is given below.
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % (query)
The above line stores the URL we want to open to search for the user-defined query. What it actually does is set the variable url to “http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=url_encoded_query”. Here we can see that there are two variables in the URL, i.e. “v” and “q”. “v” holds the version number of the search API, which at the time of writing has only one value, “1.0”. “q” holds the user-defined query. There are a few more variables, which I will cover at the end.
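As a rough illustration of how the two variables end up in the URL, the sketch below builds the URL and then splits it apart again. The helper name build_search_url and the sample query are my own inventions, and the compatibility imports are only there so the snippet runs on both Python 2 and Python 3. Nothing is sent over the network.

```python
try:
    from urllib import urlencode                      # Python 2
    from urlparse import urlparse, parse_qs
except ImportError:
    from urllib.parse import urlencode, urlparse, parse_qs  # Python 3

BASE_URL = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s'

def build_search_url(query):
    """Return the search URL for a plain-text query (hypothetical helper)."""
    return BASE_URL % urlencode({'q': query})

url = build_search_url('python tutorial')
print(url)

# Split the URL back apart to confirm which variables it carries.
params = parse_qs(urlparse(url).query)
print(params['v'])   # the API version
print(params['q'])   # the decoded query
```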
Now we want the Python program to open the URL so that it returns the search results. To do this we will make use of the “urlopen” function in the urllib module.
search_results = urllib.urlopen(url)
Now the contents of the results page are stored in search_results. But they are not in a directly readable form: the response is in JSON format. To extract information from it we will use the simplejson module, loading the information into another variable.
json = simplejson.loads(search_results.read())
results = json['responseData']['results']
Now the search results are stored in the variable results. The results variable is a list of dictionaries, meaning results[0] is a dictionary, results[1] is a dictionary and so on. One thing to remember is that this returns ONLY the first four search results.
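If the “list of dictionaries” idea is hard to picture, this small sketch may help. It parses a made-up response body instead of calling the API: the titles and URLs are invented, and only the responseData -> results nesting is taken from the article. I use the standard-library json module here as an assumption for convenience; simplejson.loads is called the same way.

```python
import json  # standard library; simplejson.loads is used the same way

# Hypothetical, trimmed-down response body (invented titles and URLs).
sample_body = '''
{"responseData": {"results": [
    {"title": "First hit",  "url": "http://example.com/1"},
    {"title": "Second hit", "url": "http://example.com/2"}
]}}
'''

data = json.loads(sample_body)
results = data['responseData']['results']

print(type(results).__name__)     # the outer container is a list
print(type(results[0]).__name__)  # each element is a dict
print(results[0]['title'])        # look up one key of the first result
print(len(results))
```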
Now all we have left to do is print those results on screen. Each element of the variable “results” is a dictionary with the following keys:
1. GsearchResultClass
2. visibleUrl
3. titleNoFormatting
4. title
5. url
6. cacheUrl
7. unescapedUrl
8. content
We will be using only the keys “title” and “url” and printing those to the screen, but you can print out any of the above. To print the info on the screen:
for item in results:
    print item['title'] + ": " + item['url']
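The same loop can be tried without touching the API at all. This is only a sketch: format_results is a hypothetical helper of mine, and the sample data is invented to match the list-of-dictionaries shape described above. It uses print(...) with a single argument so that it behaves the same on Python 2 and Python 3.

```python
def format_results(results):
    """Return one 'title: url' line per search result (hypothetical helper)."""
    lines = []
    for item in results:
        lines.append(item['title'] + ': ' + item['url'])
    return lines

# Invented results, shaped like the list of dictionaries the API returns.
sample_results = [
    {'title': 'First hit', 'url': 'http://example.com/1'},
    {'title': 'Second hit', 'url': 'http://example.com/2'},
]

for line in format_results(sample_results):
    print(line)
```

Swapping 'title' for 'titleNoFormatting' or 'url' for 'visibleUrl' prints the other keys in the same way.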
And that’s it. You have just used a Python program to do a Google search. As I said before, this only returns 4 search results, which means the maximum index of results is 3 (because computers count from zero, viz. 0, 1, 2, 3). To increase the number of results we have to make a few changes to the URL. The thing to realise is that the API will return only 4 results at a time. I had read about a method for making the API return 8 results, but that did not work for me. So if the API returns only 4 results at a time, we can first ask for 4 results, then ask for another 4, and so on. The change to the URL (NOT the variable… the actual URL) is:
http://ajax.googleapis.com/ajax/services/search/web?v=1.0&start=(start_index)&q=(url_encoded_query)
Here start_index must be a multiple of 4. It tells the API which index to start returning results from: 0 gives results 1 to 4, 4 gives results 5 to 8, and so on. I know it’s put very bluntly, but you will understand pretty soon.
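One way to picture the paging is to generate the URLs for the first few pages. This is a sketch under my own assumptions: page_urls and PAGE_SIZE are names I made up, the query is a sample, and the code only builds the URLs without opening them.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

PAGE_SIZE = 4  # the API returns four results per request, per the article

def page_urls(query, pages):
    """Return one search URL per page, with start = 0, 4, 8, ... (hypothetical helper)."""
    encoded = urlencode({'q': query})
    urls = []
    for start in range(0, pages * PAGE_SIZE, PAGE_SIZE):
        urls.append(
            'http://ajax.googleapis.com/ajax/services/search/web'
            '?v=1.0&start=%d&%s' % (start, encoded)
        )
    return urls

for url in page_urls('python tutorial', 3):
    print(url)
```

Each of these URLs would then be opened and parsed exactly like the single URL earlier, and the four results from each page appended to one combined list.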
The complete code can be found here: http://pastebin.com/0Gba0hNC
Hope you learnt something. Don’t forget to rate the article. I am open to criticism, so feel free to comment below or send me a PM if there are any corrections to be made to the article.