27. Searching¶
Searching is an important and very common operation that computers do all the time. Searches are used every time someone does a ctrl-f for “find”, when a user uses “type-to” to quickly select an item, or when a web server pulls information about a customer to present a customized web page with the customer’s order.

There are a lot of ways to search for data. Google has based an entire multi-billion dollar company on this fact. This chapter introduces the two simplest methods for searching, the linear search and the binary search.
27.1. Reading From a File¶
Before discussing how to search we need to learn how to read data from a file. Reading in a data set from a file is way more fun than typing it in by hand each time.
Let’s say we need to create a program that will allow us to quickly find the name of a super-villain. To start with, our program needs a database of super-villains. To download this data set, download and save this file:
These are random names generated by the nine.frenchboys.net website, although last I checked they no longer have a super-villain generator. They have other cool random name generators though.
Save this file and remember which directory you saved it to.
In the same directory as super_villains.txt
, create, save, and run the
following Python program:
1 2 3 4 5 6 7 8 9 10 11 12 13 | def main():
""" Read in lines from a file """
# Open the file for reading, and store a pointer to it in the new
# variable "file"
my_file = open("super_villains.txt")
# Loop through each line in the file like a list
for line in my_file:
print(line)
main()
|
There is only one new command in this code: open
. Because it is a built-in
function like print
, there is no need for an import
. Full details on this
function can be found in the Python documentation but at this point the
documentation for that command is so technical it might not even be worth
looking at.
The above program has two problems with it, but it provides a simple example
of reading in a file. Line 6 opens a file and gets it ready to be read. The
name of the file is in between the quotes. The new variable my_file
is an object
that represents the file being read. Line 9 shows how a normal for
loop may be
used to read through a file line by line. Think of the file as a list of lines,
and the new variable line will be set to each of those lines as the program
runs through the loop.
Try running the program. One of the problems with the it is that the text is
printed double-spaced. The reason for this is that each line pulled out of the
file and stored in the variable line includes the carriage return as part of
the string. Remember the carriage return and line feed introduced back in
Chapter 1? The print
statement adds yet another carriage return and the result
is double-spaced output.
The second problem is that the file is opened, but not closed. This problem isn’t as obvious as the double-spacing issue, but it is important. The Windows operating system can only open so many files at once. A file can normally only be opened by one program at a time. Leaving a file open will limit what other programs can do with the file and take up system resources. It is necessary to close the file to let Windows know the program is no longer working with that file. In this case it is not too important because once any program is done running, the Windows will automatically close any files left open. But since it is a bad habit to program like that, let’s update the code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | def main():
""" Read in lines from a file """
# Open the file for reading, and store a pointer to it in the new
# variable "file"
my_file = open("super_villains.txt")
# Loop through each line in the file like a list
for line in my_file:
# Remove any line feed, carriage returns or spaces at the end of the line
line = line.strip()
print(line)
my_file.close()
main()
|
The listing above works better. It has two new additions. On line 11 is a call
to the strip
method built into every String
class. This function returns a new
string without the trailing spaces and carriage returns of the original string.
The method does not alter the original string but instead creates a new one.
This line of code would not work:
line.strip()
If the programmer wants the original variable to reference the new string, she must assign it to the new returned string as shown on line 11.
The second addition is on line 14. This closes the file so that the operating system doesn’t have to go around later and clean up open files after the program ends.
27.2. Reading Into an Array¶
It is useful to read in the contents of a file to an array so that the program can do processing on it later. This can easily be done in python with the following code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | def main():
""" Read in lines from a file """
# Open the file for reading, and store a pointer to it in the new
# variable "file"
my_file = open("super_villains.txt")
# Create an empty list to store our names
name_list = []
# Loop through each line in the file like a list
for line in my_file:
# Remove any line feed, carriage returns or spaces at the end of the line
line = line.strip()
# Add the name to the list
name_list.append(line)
my_file.close()
print( "There were", len(name_list), "names in the file.")
main()
|
This combines the new pattern of how to read a file, along with the previously learned pattern of how to create an empty array and append to it as new data comes in, which was shown back in Adding to a List. To verify the file was read into the array correctly a programmer could print the length of the array:
print( "There were",len(name_list),"names in the file.")
Or the programmer could print the entire contents of the array:
for name in name_list:
print(name)
Go ahead and make sure you can read in the file before continuing on to the different searches.
27.3. Linear Search¶
If a program has a set of data in an array, how can it go about finding where a specific element is? This can be done one of two ways. The first method is to use a linear search. This starts at the first element, and keeps comparing elements until it finds the desired element (or runs out of elements.)
27.4. Linear Search Algorithm¶
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | # --- Linear search
key = "Morgiana the Shrew"
# Start at the beginning of the list
current_list_position = 0
# Loop until you reach the end of the list, or the value at the
# current position is equal to the key
while current_list_position < len(name_list) and name_list[current_list_position] != key:
# Advance to the next item in the list
current_list_position += 1
if current_list_position < len(name_list):
print("The name is at position", current_list_position)
else:
print("The name was not in the list.")
|
The linear search is rather simple. Line 5 sets up an increment variable that
will keep track of exactly where in the list the program needs to check next.
The first element that needs to be checked is zero, so current_list_position
is set to zero.
The next line is a bit more complex. The computer needs to keep looping until one of two things happens. It finds the element, or it runs out of elements. The first comparison sees if the current element we are checking is less than the length of the list. If so, we can keep looping. The second comparison sees if the current element in the name list is equal to the name we are searching for.
This check to see if the program has run out of elements must occur first. Otherwise the program will check against a non-existent element which will cause an error.
Line 12 simply moves to the next element if the conditions to keep searching are met in line 9.
At the end of the loop, the program checks to see if the end of the list was reached on line 14. Remember, a list of n elements is numbered 0 to n-1. Therefore if i is equal to the length of the list, the end has been reached. If it is less, we found the element.
The full example with both the reading in the file and the search is below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | def main():
""" Read in lines from a file """
# Open the file for reading, and store a pointer to it in the new
# variable "file"
my_file = open("super_villains.txt")
# Create an empty list to store our names
name_list = []
# Loop through each line in the file like a list
for line in my_file:
# Remove any line feed, carriage returns or spaces at the end of the line
line = line.strip()
# Add the name to the list
name_list.append(line)
my_file.close()
print("There were", len(name_list), "names in the file.")
# --- Linear search
key = "Morgiana the Shrew"
# Start at the beginning of the list
current_list_position = 0
# Loop until you reach the end of the list, or the value at the
# current position is equal to the key
while current_list_position < len(name_list) and name_list[current_list_position] != key:
# Advance to the next item in the list
current_list_position += 1
if current_list_position < len(name_list):
print("The name is at position", current_list_position)
else:
print("The name was not in the list.")
main()
|
We can improve on this example by moving both the reading of the file and the search into their own functions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | def read_in_file(file_name):
""" Read in lines from a file """
# Open the file for reading, and store a pointer to it in the new
# variable "file"
my_file = open(file_name)
# Create an empty list to store our names
name_list = []
# Loop through each line in the file like a list
for line in my_file:
# Remove any line feed, carriage returns or spaces at the end of the line
line = line.strip()
# Add the name to the list
name_list.append(line)
my_file.close()
return name_list
def linear_search(key, name_list):
""" Linear search """
# Start at the beginning of the list
current_list_position = 0
# Loop until you reach the end of the list, or the value at the
# current position is equal to the key
while current_list_position < len(name_list) and name_list[current_list_position] != key:
# Advance to the next item in the list
current_list_position += 1
return current_list_position
def main():
key = "Morgiana the Shrew"
name_list = read_in_file("super_villains.txt")
list_position = linear_search(key, name_list)
if list_position < len(name_list):
print("The name", key, "is at position", list_position)
else:
print("The name", key, "was not in the list.")
main()
|
27.5. Variations On The Linear Search¶
Variations on the linear search can be used to create several common algorithms. Specifically, you can use it to see if any items in a list match a property, or if all items match a property. You can also use it to pull all matching items out of a list.
27.5.1. Does At Least One Item Have a Property?¶
For example, here is a function that uses the linear search to see if there
is an item in my_list
that matches the value in key
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | def check_if_one_item_has_property_v1(my_list, key):
"""
Return true if at least one item has a
property.
"""
list_position = 0
while list_position < len(my_list) and my_list[list_position] != key:
list_position += 1
if list_position < len(my_list):
# Found an item with the property
return True
else:
# There is no item with the property
return False
|
Using the break
statement, which exits a loop early, we can simplify the
code:
1 2 3 4 5 6 7 8 9 10 11 12 | def check_if_one_item_has_property_v2(my_list, key):
"""
Return true if at least one item has a
property.
"""
for item in my_list:
if item == key:
# Found an item that matched. Return True
return True
# Went through the whole list. Return False.
return False
|
27.5.2. Do All Items Have a Property?¶
How would you test to see if all items in a list match a property? We just need to reverse a few things in the algorithm above.
1 2 3 4 5 6 7 8 9 10 11 | def check_if_all_items_have_property(my_list, key):
"""
Return true if at ALL items have a property.
"""
for item in my_list:
if item != key:
# Found an item that didn't match. Return False.
return False
# Got through the entire list. There were no mis-matches.
return True
|
27.5.3. Create a List With All Items Matching a Property¶
Another common operation is to grab all the items out of a list that match:
1 2 3 4 5 6 7 8 9 10 | def get_matching_items(my_list, key):
"""
Build a brand new list that holds all the items
that match our property.
"""
matching_list = []
for item in my_list:
if item == key:
matching_list.append(item)
return matching_list
|
27.5.4. Full Example¶
For a full example, see below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 | import random
def create_list(list_size):
""" Create a list of random numbers """
my_list = []
for i in range(list_size):
my_list.append(random.randrange(100))
return my_list
def check_if_one_item_has_property_v1(my_list, key):
"""
Return true if at least one item has a
property.
"""
list_position = 0
while list_position < len(my_list) and my_list[list_position] != key:
list_position += 1
if list_position < len(my_list):
# Found an item with the property
return True
else:
# There is no item with the property
return False
def check_if_one_item_has_property_v2(my_list, key):
"""
Return true if at least one item has a
property.
"""
for item in my_list:
if item == key:
# Found an item that matched. Return True
return True
# Went through the whole list. Return False.
return False
def check_if_all_items_have_property(my_list, key):
"""
Return true if at ALL items have a property.
"""
for item in my_list:
if item != key:
# Found an item that didn't match. Return False.
return False
# Got through the entire list. There were no mis-matches.
return True
def get_matching_items(my_list, key):
"""
Build a brand new list that holds all the items
that match our property.
"""
matching_list = []
for item in my_list:
if item == key:
matching_list.append(item)
return matching_list
def main():
# Create a list of 50 numbers
my_list = create_list(50)
print(my_list)
# Is at least one item zero?
key = 0
result = check_if_one_item_has_property_v1(my_list, 0)
if result:
print("At least one item in the list is", key)
else:
print("No item in the list is", key)
# Get items that match the key
matching_list = get_matching_items(my_list, key)
print("Matching items:", matching_list)
# Are all items matching?
result = check_if_all_items_have_property(my_list, key)
print("All items in random list matching?", result)
other_list = [0, 0, 0, 0, 0]
result = check_if_all_items_have_property(other_list, key)
print("All items in other list matching?", result)
main()
|
27.6. Variations On The Linear Search With Objects¶
For example, say we had a list of objects for our text adventure. We might want to check that list and see if any of the items are in the same room as our player. Or if all the items are. Or we might want to build a list of items that the user is carrying if they are all in a “special” room that represents the player’s inventory.
To begin with, we’d need to define our adventure object:
1 2 3 4 5 6 7 8 9 10 11 | class AdventureObject:
""" Class that defines an object in a text adventure game """
def __init__(self, description, room):
""" Constructor."""
# Description of the object
self.description = description
# The number of the room that the object is in
self.room = room
|
27.6.1. Does At Least One Item Have a Property?¶
Is at least one object in the specified room? We can check.
while
loop¶1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | def check_if_one_item_is_in_room_v1(my_list, room):
"""
Return true if at least one item has a
property.
"""
i = 0
while i < len(my_list) and my_list[i].room != room:
i += 1
if i < len(my_list):
# Found an item with the property
return True
else:
# There is no item with the property
return False
|
This could also be done with a for
loop. In this case, the loop will exit
early by using a return
once the item has been found. The code is shorter,
but not every programmer would prefer it. Some programmers feel that loops
should not be prematurely ended with a return
or break
statement. It all goes
to personal preference, or the personal preference of the person that is
footing the bill.
for
loop¶1 2 3 4 5 6 7 8 9 | def check_if_one_item_is_in_room_v2(my_list, room):
"""
Return true if at least one item has a
property. Works the same as v1, but less code.
"""
for item in my_list:
if item.room == room:
return True
return False
|
27.6.2. Do All Items Have a Property?¶
Are all the adventure objects in the same room? This code is very similar to the prior example. Spot the difference and see if you can figure out the reason behind the change.
1 2 3 4 5 6 7 8 | def check_if_all_items_are_in_room(my_list, room):
"""
Return true if at ALL items have a property.
"""
for item in my_list:
if item.room != room:
return False
return True
|
27.6.3. Create a List With All Items Matching a Property¶
What if you wanted a list of objects that are in room 5? This is a combination of our prior code, and the code to append items to a list that we learned about back in Introduction to Lists.
1 2 3 4 5 6 7 8 9 10 | def get_items_in_room(my_list, room):
"""
Build a brand new list that holds all the items
that match our property.
"""
matching_list = []
for item in my_list:
if item.room == room:
matching_list.append(item)
return matching_list
|
How would you run all these in a test? The code above can be combined with this code to run:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | def main():
object_list = []
object_list.append(AdventureObject("Key", 5))
object_list.append(AdventureObject("Bear", 5))
object_list.append(AdventureObject("Food", 8))
object_list.append(AdventureObject("Sword", 2))
object_list.append(AdventureObject("Wand", 10))
result = check_if_one_item_has_property_v1(object_list, 5)
print("Result of test check_if_one_item_has_property_v1:", result)
result = check_if_one_item_has_property_v2(object_list, 5)
print("Result of test check_if_one_item_has_property_v2:", result)
result = check_if_all_items_have_property(object_list, 5)
print("Result of test check_if_all_items_have_property:", result)
result = get_matching_items(object_list, 5)
print("Number of items returned from test get_matching_items:", len(result))
main()
|
For a full working example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 | class AdventureObject:
""" Class that defines an alien"""
def __init__(self, description, room):
""" Constructor. Set name and color"""
self.description = description
self.room = room
def check_if_one_item_is_in_room_v1(my_list, room):
"""
Return true if at least one item has a
property.
"""
i = 0
while i < len(my_list) and my_list[i].room != room:
i += 1
if i < len(my_list):
# Found an item with the property
return True
else:
# There is no item with the property
return False
def check_if_one_item_is_in_room_v2(my_list, room):
"""
Return true if at least one item has a
property. Works the same as v1, but less code.
"""
for item in my_list:
if item.room == room:
return True
return False
def check_if_all_items_are_in_room(my_list, room):
"""
Return true if at ALL items have a property.
"""
for item in my_list:
if item.room != room:
return False
return True
def get_items_in_room(my_list, room):
"""
Build a brand new list that holds all the items
that match our property.
"""
matching_list = []
for item in my_list:
if item.room == room:
matching_list.append(item)
return matching_list
def main():
object_list = []
object_list.append(AdventureObject("Key", 5))
object_list.append(AdventureObject("Bear", 5))
object_list.append(AdventureObject("Food", 8))
object_list.append(AdventureObject("Sword", 2))
object_list.append(AdventureObject("Wand", 10))
result = check_if_one_item_is_in_room_v1(object_list, 5)
print("Result of test check_if_one_item_is_in_room_v1:", result)
result = check_if_one_item_is_in_room_v2(object_list, 5)
print("Result of test check_if_one_item_is_in_room_v2:", result)
result = check_if_all_items_are_in_room(object_list, 5)
print("Result of test check_if_all_items_are_in_room:", result)
result = get_items_in_room(object_list, 5)
print("Number of items returned from test get_items_in_room:", len(result))
main()
|
These common algorithms can be used as part of a solution to a larger problem, such as find all the addresses in a list of customers that aren’t valid.
27.7. Binary Search¶
A faster way to search a list is possible with the binary search. The process of a binary search can be described by using the classic number guessing game “guess a number between 1 and 100” as an example. To make it easier to understand the process, let’s modify the game to be “guess a number between 1 and 128.” The number range is inclusive, meaning both 1 and 128 are possibilities.
If a person were to use the linear search as a method to guess the secret number, the game would be rather long and boring.
Guess a number 1 to 128: 1
Too low.
Guess a number 1 to 128: 2
Too low.
Guess a number 1 to 128: 3
Too low.
....
Guess a number 1 to 128: 93
Too low.
Guess a number 1 to 128: 94
Correct!
Most people will use a binary search to find the number. Here is an example of playing the game using a binary search:
Guess a number 1 to 128: 64
Too low.
Guess a number 1 to 128: 96
Too high.
Guess a number 1 to 128: 80
Too low.
Guess a number 1 to 128: 88
Too low.
Guess a number 1 to 128: 92
Too low.
Guess a number 1 to 128: 94
Correct!
Each time through the rounds of the number guessing game, the guesser is able to eliminate one half of the problem space by getting a “high” or “low” as a result of the guess.
In a binary search, it is necessary to track an upper and a lower bound of the list that the answer can be in. The computer or number-guessing human picks the midpoint of those elements. Revisiting the example:
A lower bound of 1, upper bound of 128, mid point of \(\frac{128+1}{2}=64.5\).
Guess a number 1 to 128: 64
Too low.
A lower bound of 65, upper bound of 128, mid point of \(\frac{65+128}{2}=96.5\).
Guess a number 1 to 128: 96
Too high.
A lower bound of 65, upper bound of 95, mid point of \(\frac{65+95}{2}=80\).
Guess a number 1 to 128: 80
Too low.
A lower bound of 81, upper bound of 95, mid point of \(\frac{81+95}{2}=88\).
Guess a number 1 to 128: 88
Too low.
A lower bound of 89, upper bound of 95, mid point of \(\frac{89+95}{2}=92\).
Guess a number 1 to 128: 92
Too low.
A lower bound of 93, upper bound of 95, mid point of \(\frac{93+95}{2}=94\).
Guess a number 1 to 128: 94
Correct!
A binary search requires significantly fewer guesses. Worst case, it can guess a number between 1 and 128 in 7 guesses. One more guess raises the limit to 256. 9 guesses can get a number between 1 and 512. With just 32 guesses, a person can get a number between 1 and 4.2 billion.
To figure out how large the list can be given a certain number of guesses, the formula works out like \(n=x^{g}\) where \(n\) is the size of the list and \(g\) is the number of guesses. For example:
If you have the problem size, we can figure out the number of guesses using the log function. Specifically, log base 2. If you don’t specify a base, most people will assume you mean the natural log with a base of \(e \approx 2.71828\) which is not what we want. For example, using log base 2 to find how many guesses:
Enough math! Where is the code? The code to do a binary search is more complex than a linear search:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | # --- Binary search
key = "Morgiana the Shrew"
lower_bound = 0
upper_bound = len(name_list)-1
found = False
# Loop until we find the item, or our upper/lower bounds meet
while lower_bound <= upper_bound and not found:
# Find the middle position
middle_pos = (lower_bound + upper_bound) // 2
# Figure out if we:
# move up the lower bound, or
# move down the upper bound, or
# we found what we are looking for
if name_list[middle_pos] < key:
lower_bound = middle_pos + 1
elif name_list[middle_pos] > key:
upper_bound = middle_pos - 1
else:
found = True
if found:
print( "The name is at position", middle_pos)
else:
print( "The name was not in the list." )
|
Since lists start at element zero, line 3 sets the lower bound to zero. Line 4 sets the upper bound to the length of the list minus one. So for a list of 100 elements the lower bound will be 0 and the upper bound 99.
The Boolean variable on line 5 will be used to let the while loop know that the element has been found.
Line 8 checks to see if the element has been found or if we’ve run out of elements. If we’ve run out of elements the lower bound will end up equaling the upper bound.
Line 11 finds the middle position. It is possible to get a middle position
of something like 64.5. It isn’t possible to look up position 64.5.
(Although J.K. Rowling was rather clever in enough coming up with Platform
\(9\frac{3}{4}\), that doesn’t work here.) The best way of handling
this is to use the //
operator first introduced way back in Chapter 1.
This is similar to the /
operator, but will only return integer results.
For example, 11 // 2
would give 5 as an answer, rather than 5.5.
Starting at line 17, the program checks to see if the guess is high, low, or
correct. If the guess is low, the lower bound is moved up to just past the
guess. If the guess is too high, the upper bound is moved just below
the guess. If the answer has been found, found
is set to True
ending the search.
With the a list of 100 elements, a person can reasonably guess that on average with the linear search, a program will have to check 50 of them before finding the element. With the binary search, on average you’ll still need to do about seven guesses. In an advanced algorithms course you can find the exact formula. For this course, just assume average and worst cases are the same.