
 

6.3 BING SEARCH, DICTIONARY

 

 

INTRODUCTION

 

Well-known search engines such as Google, Bing or Yahoo can also be used to perform a web search programmatically. To do so, you append additional parameters containing the search string to the provider's URL and send an HTTP GET request to this address. The data is evaluated by a web application, i.e. a program that runs on the web server, and the results are returned to you as an HTTP response [more... The server program can be written in different programming languages; PHP, Java, Ruby and Python are widely used. In Python, web applications can be written using the Django framework].
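
As a minimal sketch of how such a request URL is put together, the following lines append a search string as a query parameter. The address www.example.com/search and the parameter name q are placeholders chosen for illustration, not the interface of a real provider. The print() call with parentheses and a single string works in Python 2 as well as in Python 3.

```python
# Build the URL for an HTTP GET request that carries a search string.
# The base address and the parameter name "q" are illustrative assumptions.
try:
    from urllib import urlencode        # Python 2 / TigerJython
except ImportError:
    from urllib.parse import urlencode  # Python 3

base = "http://www.example.com/search"
params = {"q": "python dictionary"}

# urlencode() escapes characters that are not allowed in a URL,
# for example it replaces the space by a plus sign
url = base + "?" + urlencode(params)
print(url)
```

The part after the question mark is called the query string. A web application on the server reads the parameters from it.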

Moreover, some web search providers offer search services that can be used via a programming interface (API, Application Programming Interface). Although these services usually cost money, there is sometimes a limited but free version for training and development purposes. For example, using the Bing Search API, you can create your own search engine with an individualized layout.

Search APIs mostly return their results in a special format, namely the JavaScript Object Notation (JSON). With the Python module json, it is easy to convert this format into a Python dictionary. But to extract the data you are interested in, you first have to learn what a Python dictionary is.

PROGRAMMING CONCEPTS: Web application, Python dictionary

 

 

UNDERSTANDING A DICTIONARY

 

As the name suggests, a dictionary is a data structure similar to a dictionary book. You can imagine word pairs with the words on the left in one language and the ones on the right in another language (we disregard any ambiguities). The example below shows some names of colors translated from German to English:

  German   English
  blau     blue
  rot      red
  grün     green
  gelb     yellow

(In a real-world dictionary words are arranged alphabetically so that finding a specific word is simplified.)

The word on the left is the key and the word on the right is the value. A dictionary thus consists of key-value pairs [more... A key-value-structured data type is also called an associative array, map, hash, Hashtable or HashMap]. Values can have any data type, and keys almost any [more... Keys must have an immutable data type. This excludes lists as keys, but allows numbers, strings and tuples].
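
The restriction on keys can be tried out with a short experiment. The sketch below uses an invented dictionary named points. It stores values under a tuple, a number and a string, and then attempts to use a list as a key, which Python rejects with a TypeError.

```python
# Keys must be immutable: strings, numbers and tuples work, lists do not.
points = {}
points[(3, 4)] = "tuple as key"
points[7] = "number as key"
points["seven"] = "string as key"

try:
    points[[3, 4]] = "list as key"
    list_key_accepted = True
except TypeError:
    # A list can be changed after its creation, so it cannot serve as a key
    list_key_accepted = False

print("Number of entries: " + str(len(points)))
print("List accepted as key: " + str(list_key_accepted))
```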

Your program translates the above colors from German to English. If the input is not in the dictionary, the program detects this with the in operator and writes out a message instead of the translation.

lexicon = {"blau":"blue", "rot":"red", "grün":"green", "gelb":"yellow"}

print "All entries:"
for key in lexicon:
    print key + " -> " + lexicon[key]

while True:
    farbe = input("color (deutsch)?")
    if farbe in lexicon:
        print farbe + " -> " + lexicon[farbe]
    else:
        print farbe + " -> " + "(not translatable)" 

 

 

MEMO

 

A dictionary consists of key-value pairs. In contrast to a list, these pairs are not ordered. In the definition, you use a pair of curly brackets, separate the respective pairs with commas, and key and value with a colon.

Important operations:


dictionary[key]  returns the value for the key
dictionary[key] = value  adds a new key-value pair or replaces the value of an existing key
len(dictionary)  returns the number of key-value pairs
del dictionary[key]  deletes the pair (key and value) with the key
key in dictionary  returns True if the key exists
dictionary.clear()  deletes all entries; what remains is an empty dictionary

A dictionary can be iterated through with a for loop

for key in dictionary: 
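
The operations listed above can be tried out on a small dictionary. The following lines are only a sketch with invented entries. The variable names size, has_blau and remaining are chosen for illustration.

```python
lexicon = {"blau": "blue", "rot": "red"}

lexicon["gelb"] = "yellow"        # new key: a pair is added
lexicon["rot"] = "crimson"        # existing key: the value is replaced
size = len(lexicon)               # number of pairs, here 3

del lexicon["blau"]               # remove the pair with the key "blau"
has_blau = "blau" in lexicon      # False after the deletion

for key in sorted(lexicon):       # sorted() gives a fixed output order
    print(key + " -> " + lexicon[key])

remaining = dict(lexicon)         # copy taken before clearing
lexicon.clear()                   # lexicon is now an empty dictionary
print("Entries left: " + str(len(lexicon)))
```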

 

 

DICTIONARIES ARE EFFICIENT DATA STRUCTURES

 

You are right to object that paired information could also be saved in a list. The obvious approach would be to save each pair as a short list and to make all of these short lists elements of a parent list. Why then is there a dictionary as a separate data structure?
The big advantage of a dictionary is that you can easily and quickly access a value by specifying its key in the bracket notation. In other words, dictionaries can be searched efficiently. Of course, efficient retrieval only really matters when large amounts of data are involved, for example hundreds or even thousands of key-value pairs.
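
To make the difference tangible, the following sketch stores the same invented data once as a list of short lists and once as a dictionary. The function find_in_list() is a hypothetical helper that counts how many pairs it has to compare. The dictionary needs no loop at all.

```python
# The same 10000 pairs, once as a list of short lists and once as a dictionary
pairs = [["key" + str(i), i] for i in range(10000)]
table = dict(pairs)

def find_in_list(pairs, wanted):
    # Linear search: compare one pair after the other
    steps = 0
    for key, value in pairs:
        steps += 1
        if key == wanted:
            return value, steps
    return None, steps

value_from_list, steps = find_in_list(pairs, "key9999")
value_from_dict = table["key9999"]    # direct access with the bracket notation

print("List search needed " + str(steps) + " comparisons")
print("Both results equal: " + str(value_from_list == value_from_dict))
```

In the worst case, the list search has to look at every pair, whereas the access time of the dictionary hardly depends on the number of entries.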

As an interesting and useful application, your program should find the postal code of any city in Switzerland. For this, use the text file chplz.txt, which you can download by clicking on the hyperlink. Copy it into the directory where your program is located. The file is structured line by line as follows (and has no blank line, not even at the end):

Aarau:5000
Aarburg:4663
Aarwangen:4912
Aathal Seegraeben:8607
...

Your first task is to convert this text file into a dictionary. To do this, first load it as a string with read() and then split it into individual lines using split("\n") [more... The lines of the file can be separated Windows-style with <CR><LF> or Unix-style with <LF>. Once read in as a string, the line separator is the newline character <LF> = \n].
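
If a string still contains Windows line separators, split("\n") leaves a <CR> at the end of each line. As a hedged alternative, the string method splitlines() accepts both kinds of separators. The sketch below uses a short in-memory string with three entries from the file excerpt instead of the real file.

```python
# Sample text with Windows line separators <CR><LF>
text = "Aarau:5000\r\nAarburg:4663\r\nAarwangen:4912"

# split("\n") removes only <LF>, so a <CR> remains at the end of the lines
with_split = text.split("\n")

# splitlines() handles <LF> as well as <CR><LF>
lines = text.splitlines()

print(repr(with_split[0]))
print(repr(lines[0]))
```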

To create the dictionary, split each line once more at the colon to separate key and value, and add the new pair to the (originally empty) dictionary using the bracket notation. Just like before with the colors example, you can now access the postal codes using the bracket notation.

file = open("chplz.txt")
plzStr = file.read()
file.close()

pairs = plzStr.split("\n")
print str(len(pairs)) + " pairs loaded"
plz = {}

for pair in pairs:
    element = pair.split(":")
    plz[element[0]] = element[1]

while True:
    town = input("City?")
    if town in plz:
        print "The postal code of " + town + " is " + str(plz[town])
    else:
        print "The city " + town + " was not found."

 

 

MEMO

 

It is very easy and quick to access the value for a given key in a dictionary [more... Internally, a hash algorithm is used for this].

 

 

USING BING FOR YOUR OWN PURPOSES

 

Your program uses the Bing search engine to search for websites with a search string entered by the user and writes out the information returned. In order to access the Bing search engine, you need a personal authentication key. To acquire it, proceed as follows:

Visit the site https://www.microsoft.com/cognitive-services/en-us/apis and choose "Get started for free". You will be prompted to use your existing Microsoft account or to create a new one. On the page titled Microsoft Cognitive Services, choose "APIs" and "Bing Web Search" and click on "Request new trials". Scroll down and select "Search Bing-Free". After confirming with "Subscribe" you get two key values. Save one of them with copy & paste in a local text file for further use. You can retrieve the keys at any time under your Microsoft account.

In your program you send a GET request supplemented with the search string. The response from Bing is a string in which the information is structured with curly brackets. The format conforms to the JavaScript Object Notation (JSON). Using the function json.load(), it can be converted into a nested Python dictionary, which is then easy to evaluate. During a test phase, you can analyze the nesting by writing out the appropriate information to the console. You can comment out or remove these lines later. What does Bing find for the search string "Hillary Clinton"?

import urllib2
import json

def bing_search(query):
    key = 'xxxxxxxxxxxxxxxxxxxxx' # use your personal key
    url = 'https://api.cognitive.microsoft.com/bing/v5.0/search?q=' + query
    urlrequest = urllib2.Request(url)
    urlrequest.add_header('Ocp-Apim-Subscription-Key', key)
    responseStr = urllib2.urlopen(urlrequest)
    response = json.load(responseStr)
    return response

query = input("Enter a search string(AND-connect with +):")
results = bing_search(query)
#print "results:\n" + str(results)
webPages = results['webPages']
print "Number of hits:", webPages["totalEstimatedMatches"]
print "Found URLs:"
values = webPages.get('value')
for item in values:
    print item["displayUrl"]

 

 

MEMO

 

As you can see, a dictionary can in turn contain other dictionaries as values. Thus, hierarchical information structures can be created, similar to XML.
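
The nesting can also be studied without an authentication key. The following sketch converts an invented JSON text, which merely imitates the structure used in the program above, with json.loads() (the variant of json.load() for strings). The entries and the field "name" are made-up sample data, not a real response from Bing.

```python
import json

# Invented sample text that only imitates the nesting of a search response
sample = '''{"webPages": {"totalEstimatedMatches": 2,
  "value": [{"name": "First page", "displayUrl": "www.example.com/a"},
            {"name": "Second page", "displayUrl": "www.example.com/b"}]}}'''

results = json.loads(sample)          # outer dictionary
web_pages = results["webPages"]       # a dictionary as value of a dictionary
hits = web_pages["totalEstimatedMatches"]

# "value" holds a list whose elements are dictionaries again
urls = [item["displayUrl"] for item in web_pages["value"]]

print("Number of hits: " + str(hits))
for url in urls:
    print(url)
```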

The authentication key is sent in an additional header entry of your GET request. You can modify the Bing search with additional query parameters. For example, if you append "&count=20" to the URL, you get up to 20 results. For more information, consult the API reference.
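
As a sketch of how the header entry and an additional query parameter fit together, the hypothetical helper build_request() below only constructs the request object and does not send it. The URL and the header name are taken from the program above. The use of urlencode() to assemble the parameters is an assumption made for this example.

```python
try:
    from urllib2 import Request           # Python 2 / TigerJython
    from urllib import urlencode
except ImportError:
    from urllib.request import Request    # Python 3
    from urllib.parse import urlencode

def build_request(query, key, count=20):
    # Hypothetical helper: prepares the GET request without sending it
    base = "https://api.cognitive.microsoft.com/bing/v5.0/search"
    # A list of tuples keeps the parameters in the given order
    url = base + "?" + urlencode([("q", query), ("count", count)])
    request = Request(url)
    # The authentication key travels in a header entry, not in the URL
    request.add_header("Ocp-Apim-Subscription-Key", key)
    return request

request = build_request("tigerjython", "xxxxxxxxxxxxxxxxxxxxx")
print(request.get_full_url())
```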

 

 

EXERCISES

 

1.


Improve the postal code program step by step so that a city will be found even if you:
a. input spaces before or after the name of the city
b. do not consistently adhere to the use of upper and lower case letters
c. write umlauts as Ae, Oe, Ue, ae, oe, and ue
d. omit accents (note: there is a conflict with ö)
Some places are ambiguous, but have additional information. How will you deal with this?

2.

Use Bing Search to write out the title and the content of the search results with the highest ranking in an HtmlPane. For example, something similar to the following should appear for the search string “tigerjython”: