voronoi

Let's go out to eat!
Show me places I would like
By learning my tastes.

Introduction

In this project, you will create a visualization of restaurant ratings using machine learning and the Yelp academic dataset. In this sample visualization (the map image above), a local map of the Peninsula is segmented into regions, where each region is shaded by the predicted rating of the closest restaurant (yellow shading is 5 stars, blue shading is 1 star). Specifically, the visualization you will be constructing is a Voronoi diagram.

In the map image above, each dot represents a restaurant. The color of each dot is determined by the restaurant's location. For example, restaurants in downtown San Mateo are colored red. The user preferences that generated this map reflect a strong preference for Menlo Park and Redwood City area restaurants, as those southeastern Peninsula regions on the visualization are shaded yellow.

This project uses concepts from Sections 2.1, 2.2, 2.3, and 2.4.3 of Composing Programs. It also introduces techniques and concepts from machine learning, a growing field at the intersection of computer science and statistics that analyzes data to find patterns and make predictions.

Don't worry: you do not need to master Voronoi diagrams, machine learning, or statistics to complete this project. Each step is broken down in enough detail to help you complete it, one problem at a time. Put everything together, and you will see a statistical analysis of real data being visualized. How fun!

Starter files

Do not use cloud9 for this project. At some point, this project loads a graphical interface (map data and images through a web portal), which does not function correctly in cloud9.

Download the maps.zip archive (35.2 MB), which contains all the starter code and data sets. The project uses several files, but all of your changes will be made in only three files: utils.py, abstractions.py, and recommend.py.

  • utils.py: Utility functions for data processing
  • abstractions.py: Data abstractions used in the project
  • recommend.py: Machine learning algorithms and data processing
  • ok: The autograder
  • chs.py: Generic utility functions useful for any course project or lab
  • proj2.ok: The ok configuration file
  • data/: A directory of Yelp users, restaurants, and reviews
  • tests/: A directory of tests used by ok
  • users/: A directory of user files
  • visualize/: A directory of tools for drawing the final visualization

For best organization, place your unzipped maps/ directory inside your projects/ directory, and only keep one copy per machine! Trash duplicate copies to avoid the confusion of editing one version while expecting changes in another.

Logistics

This is a 2-week project. You may work with one partner. However, you may not share code with students who are not your partner or copy from anyone else's code (including solutions that may exist online). Software will be used to detect violations, and the penalty for anyone involved is a project score of zero, along with additional school and district academic integrity consequences.

Groups (partnerships) are set up in okpy.org. One partner can invite the other when clicking into the Project 2 assignment. The other partner must accept the partnership while they are logged into okpy.org. Once collaboration is set up as a group, both partners will have access to view backups and submissions for this assignment.

In the end, you will submit one project for both partners. The project is worth 220 mastery points. 200 points are assigned for correctness, and 20 points for the overall composition and readability of your program.

You will turn in the following files:

  • utils.py
  • abstractions.py
  • recommend.py

You do not need to modify or turn in any other files to complete the project. To submit the project, run the following command:

python3 ok --submit

You will be able to view your submissions on the OK dashboard.

For the functions that you are asked to complete, there may be some initial code that is provided. If you would rather not use that code, feel free to delete it and start from scratch, as long as you keep and use existing function names. You may also add new function definitions as you see fit. However, do not modify any other functions. Doing so may result in your code failing autograder tests. Also, please do not change any function signatures (names, argument order, or number of arguments) even if changing the body of a function.

Testing

Throughout this project, you should be testing the correctness of your code. It is good practice to test often, so that it is easy to isolate any problems. However, do not test so often that you and your partner lose the time to think problems through.

You are provided an autograder called ok to help you with testing your code and tracking your progress. The first time you run the autograder, you will be asked to log in with your OK account using your web browser. Please do so. Each time you run ok, it will back up your work and progress on online course servers.

The primary purpose of ok is to test your implementations, but there are a few things you should be aware of.

First, some of the test cases are locked until you unlock them.

The purpose of unlocking test cases is to understand conceptually what your program should do before you start writing any code.

Once you have unlocked some tests and written some code, you can check the correctness of your program using the tests that you have unlocked.

We recommend that you repeat the submission process after you finish each problem. Only your last submission will be graded. It is also useful to have more backups of your code in case you run into a submission issue.

The tests folder is used to store autograder tests, so do not modify it. You may lose all your unlocking progress if you do. If you need to get a fresh copy, you can download the zip archive and copy it over, but you will need to start unlocking from scratch.

For reference, here are copies of the test cases provided: Project 2: Test Cases

Phase 0: Utilities

All changes in this phase will be made to utils.py.

Problem 0 (30 pt)

Before starting the core project, familiarize yourself with some Python features by completing utils.py. It's possible, but not required, for each function described in this problem to be implemented in only one line.

Problem 0.1: Using list comprehensions

A list comprehension constructs a new list from an existing sequence by first filtering the given sequence, and then computing an element of the result for each remaining element that is not filtered out. A list comprehension has the following syntax:

[<map expression> for <name> in <sequence expression> if <filter expression>]

For example, if you wanted to square every even integer in range(10), you could write:

>>> [x * x for x in range(10) if x % 2 == 0]
[0, 4, 16, 36, 64]
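
As a rough illustration of how the three parts of a comprehension fit together, the sketch below rewrites the example above as an explicit loop. The variable names are chosen only for this illustration.

```python
# Explicit-loop equivalent of [x * x for x in range(10) if x % 2 == 0].
squares_of_evens = []
for x in range(10):                      # <sequence expression>
    if x % 2 == 0:                       # <filter expression>
        squares_of_evens.append(x * x)   # <map expression>

comprehension = [x * x for x in range(10) if x % 2 == 0]
print(squares_of_evens == comprehension)  # True
```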

In utils.py, implement map_and_filter. This function takes in a sequence s, a one-argument function map_fn, and a one-argument function filter_fn. It returns a new list containing the result of calling map_fn on each element of s for which filter_fn returns a true value.

def map_and_filter(s, map_fn, filter_fn):
    """Returns a new list containing the results of calling map_fn on each
    element of sequence s for which filter_fn returns a true value.

    >>> square = lambda x: x * x
    >>> is_odd = lambda x: x % 2 == 1
    >>> map_and_filter([1, 2, 3, 4, 5], square, is_odd)
    [1, 9, 25]
    """
    "*** YOUR CODE HERE ***"

Problem 0.2: Using min

The built-in min function takes a collection of elements (such as a list or a dictionary) and returns its smallest element. The min function can also take a keyword argument called key, which must be a one-argument function. The key function is called with each element of the collection, and the return values are used for comparison. For example:

>>> min([-1, 0, 1]) # no key argument; smallest input
-1
>>> min([-1, 0, 1], key=lambda x: x*x) # input with the smallest square
0
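
The key function affects only the comparison, not the returned value, and when several elements tie for the minimum, min returns the first one it encounters. A minimal sketch, using made-up [name, score] pairs that are not part of the project files:

```python
# Hypothetical data for illustration: [name, score] pairs.
scores = [['ann', 3], ['bob', 1], ['cat', 1]]

# key extracts the score, so min compares scores but returns the whole pair.
lowest = min(scores, key=lambda pair: pair[1])
print(lowest)  # ['bob', 1] (the first of the two tied minimums)
```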

In utils.py, implement key_of_min_value, which takes in a dictionary d and returns the key that corresponds to the minimum value in d. This behavior differs from just calling min on a dictionary, which would return the smallest key. Make sure your solution uses the min function.

def key_of_min_value(d):
    """Returns the key in a dict d that corresponds to the minimum value of d.

    >>> letters = {'a': 6, 'b': 5, 'c': 4, 'd': 5}
    >>> min(letters)
    'a'
    >>> key_of_min_value(letters)
    'c'
    """
    "*** YOUR CODE HERE ***"

Problem 0.3: Using zip

The zip function defined in utils.py takes multiple sequences as arguments and returns a list of lists, where the i-th list contains the i-th element of each original list. For example:

>>> zip([1, 2, 3], [4, 5, 6])
[[1, 4], [2, 5], [3, 6]]
>>> for triple in zip(['a', 'b', 'c'], [1, 2, 3], ['do', 're', 'mi']):
...     print(triple)
['a', 1, 'do']
['b', 2, 're']
['c', 3, 'mi']
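
To make the "list of lists" behavior concrete, here is one way such a function could be written on top of Python's built-in zip. This is only a sketch: the name zip_lists is hypothetical, and the actual definition in utils.py may differ.

```python
def zip_lists(*sequences):
    """Hypothetical stand-in for the project's zip: returns a list of lists."""
    # The built-in zip yields tuples lazily; convert each one to a list.
    return [list(group) for group in zip(*sequences)]

print(zip_lists([1, 2, 3], [4, 5, 6]))  # [[1, 4], [2, 5], [3, 6]]
```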

In utils.py, use the zip function to implement enumerate, which takes a sequence s and a starting index start. It returns a list of pairs, in which the i-th element is i+start paired with the i-th element of s. Make sure your solution uses the zip function and a range.

Note: zip and enumerate are also built-in Python functions, but their behavior is slightly different from the versions provided in utils.py. The behavior of the built-in variants may be described later in the course if our class gets there this year.

def enumerate(s, start=0):
    """Returns a list of lists, where the i-th list contains i+start and
    the i-th element of s.

    >>> enumerate([6, 1, 'a'])
    [[0, 6], [1, 1], [2, 'a']]
    >>> enumerate('five', 5)
    [[5, 'f'], [6, 'i'], [7, 'v'], [8, 'e']]
    """
    "*** YOUR CODE HERE ***"

Problem 0.4: distance and mean

Read over the distance and mean functions provided, and be able to explain what each does. If you have questions, please ask before moving on.

def distance(pos1, pos2):
    """Returns the Euclidean distance between pos1 and pos2, which are pairs.

    >>> distance([1, 2], [4, 6])
    5.0
    """
    return sqrt((pos1[0] - pos2[0]) ** 2 + (pos1[1] - pos2[1]) ** 2)

def mean(s):
    """Returns the arithmetic mean, or average, of a sequence of numbers s.

    >>> mean([-1, 3])
    1.0
    >>> mean([0, -3, 2, -1])
    -0.5
    """
    assert len(s) > 0, 'cannot find mean of empty sequence'
    return sum(s) / len(s)

As you work through this phase, you can unlock the test cases for these exercises and check your solutions by running ok:

python3 ok -q 00 -u
python3 ok -q 00

Problem A (20 pt)

20 of the 220 project 2 mastery points are earned from the overall composition of your program. You should go back and review your code in Phase 0. Your goal now should be readability. After making any change(s), make sure to retest your code to ensure you haven't introduced bugs.

Phase 1: Data Abstraction

All changes in this phase will be made to abstractions.py.

Problem 1 (10 pt)

Complete the implementations of the constructor and selectors for the restaurant data abstraction in abstractions.py. Two of the data abstractions have already been completed for you: the review data abstraction and the user data abstraction. Make sure that you spend some time reading over the existing code to understand how the review data abstraction and the user data abstraction work.

For the restaurant data abstraction, you can use any implementation you choose, but the constructor and selectors must be defined together such that the restaurant selectors return the correct field from the constructed restaurant. A starter implementation using a dictionary is provided, but it is incomplete as it only takes care of the name and location fields.

  • make_restaurant: return a restaurant constructed from five arguments:

    • name (a string)
    • location (a list containing latitude and longitude)
    • categories (a list of strings)
    • price (a number)
    • reviews (a list of review data abstractions created by make_review)
  • restaurant_name: return the name of a restaurant
  • restaurant_location: return the location of a restaurant
  • restaurant_categories: return the categories of a restaurant
  • restaurant_price: return the price of a restaurant
  • restaurant_ratings: return a list of ratings (numbers)

def make_restaurant(name, location, categories, price, reviews):
    """Return a restaurant data abstraction containing the name, location,
    categories, price, and reviews for that restaurant."""
    "*** YOUR CODE HERE ***"
    return {
        'name': name,
        'location': location,   # ADD LINES OF CODE WITH YOUR SOLUTION
    }

def restaurant_name(restaurant):
    """Return the name of the restaurant, which is a string."""
    return restaurant['name']

def restaurant_location(restaurant):
    """Return the location of the restaurant, which is a list containing
    latitude and longitude."""
    return restaurant['location']

def restaurant_categories(restaurant):
    """Return the categories of the restaurant, which is a list of strings."""
    "*** YOUR CODE HERE ***"

def restaurant_price(restaurant):
    """Return the price of the restaurant, which is a number."""
    "*** YOUR CODE HERE ***"

def restaurant_ratings(restaurant):
    """Return a list of ratings, which are numbers from 1 to 5, of the
    restaurant based on the reviews of the restaurant."""
    "*** YOUR CODE HERE ***"

Use OK to unlock and test your code:

python3 ok -q 01 -u
python3 ok -q 01

Problem 2 (10 pt)

Implement the restaurant_num_ratings and restaurant_mean_rating functions without assuming any particular implementation of a restaurant. These functions lie above the abstraction barrier, so use only the restaurant constructor and selectors.

### === +++ RESTAURANT ABSTRACTION BARRIER +++ === ###

def restaurant_num_ratings(restaurant):
    """Return the number of ratings for the restaurant."""
    "*** YOUR CODE HERE ***"

def restaurant_mean_rating(restaurant):
    """Return the average rating for the restaurant."""
    "*** YOUR CODE HERE ***"

Hint: Use the len function and restaurant_ratings function to code restaurant_num_ratings; use the mean function and restaurant_ratings function to code restaurant_mean_rating.
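
To illustrate what respecting an abstraction barrier looks like, here is a small sketch using a made-up point abstraction. None of these names appear in the project. The function above the barrier keeps working even if the representation inside make_point changes.

```python
# Hypothetical 'point' abstraction, used only to illustrate the barrier.
def make_point(x, y):
    return {'x': x, 'y': y}   # the representation could later become a list

def point_x(point):
    return point['x']

def point_y(point):
    return point['y']

# Above the barrier: relies only on the constructor and selectors,
# never on the fact that a point happens to be a dictionary.
def point_sum(point):
    return point_x(point) + point_y(point)

print(point_sum(make_point(3, 4)))  # 7
```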

Be sure not to violate abstraction barriers! Test your implementation before moving on:

python3 ok -q 02 -u
python3 ok -q 02

When you finish, you should be able to generate a visualization of all restaurants rated by a user. Use -u to select a user from the users/ directory (such as user one_cluster). As you will see later in this project, you can even create your own users.

python3 recommend.py
python3 recommend.py -u one_cluster

Note: You may have to refresh your browser to update the visualization.

Problem B

Problem B is a continuation of problem A. 20 of the 220 project 2 mastery points are earned from the overall composition of your program. You should go back and review your code in Phase 1. Your goal now should be readability. After making any change(s), make sure to retest your code to ensure you haven't introduced bugs.

Phase 2: Unsupervised Learning

All changes in this phase will be made to recommend.py.

Restaurants tend to appear in clusters (e.g. downtown San Mateo or downtown Redwood City). In this phase, you will devise a way to group together restaurants that are close to each other.

The k-means algorithm is a method for discovering the centers of clusters. It is called an unsupervised learning method because the algorithm is not told what the correct clusters are; it must infer the clusters from the data alone.

The k-means algorithm finds k centroids within a dataset that each correspond to a cluster of inputs. To do so, k-means begins by choosing k centroids at random, then alternates between the following two steps:

  1. Group the restaurants into clusters, where each cluster contains all restaurants that are closest to the same centroid.
  2. Compute a new centroid (average position) for each new cluster.

This visualization (http://tech.nitoyon.com/en/blog/2013/11/07/k-means/) is a good way to understand how the algorithm works.
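
The alternation between the two steps can also be sketched on plain (x, y) tuples rather than restaurants. This is a simplified illustration, not the project's implementation: the function name is hypothetical, the starting centroids are passed in instead of being chosen at random, and math.dist requires Python 3.8 or later.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means_points(points, centroids, max_updates=100):
    """Sketch of k-means on (x, y) tuples, starting from the given centroids."""
    for _ in range(max_updates):
        # Step 1: group each point with its closest centroid.
        clusters = {}
        for p in points:
            nearest = min(centroids, key=lambda c: dist(p, c))
            clusters.setdefault(nearest, []).append(p)
        # Step 2: compute the average position of each cluster.
        # (A centroid that attracts no points is simply dropped here.)
        new_centroids = [
            (sum(x for x, _ in cluster) / len(cluster),
             sum(y for _, y in cluster) / len(cluster))
            for cluster in clusters.values()
        ]
        if new_centroids == centroids:   # nothing moved, so stop early
            break
        centroids = new_centroids
    return centroids

points = [(0, 0), (0, 2), (10, 10), (10, 12)]
print(k_means_points(points, [(0, 0), (10, 10)]))  # [(0.0, 1.0), (10.0, 11.0)]
```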

Glossary

As you complete the remaining questions, you will encounter the following terminology. Be sure to refer back here if you're ever confused about what a question is asking.

  • location: A pair containing latitude and longitude. Note that this is not a data abstraction, so you can assume its implementation is a two-element sequence.
  • centroid: A location (see above) that represents the center of a cluster
  • restaurant: A restaurant data abstraction, as defined in abstractions.py
  • cluster: A list of restaurants
  • user: A user data abstraction, as defined in abstractions.py
  • review: A review data abstraction, as defined in abstractions.py
  • feature function: A single-argument function that takes a restaurant and returns a number, such as its mean rating or price

Problem 3 (10 pt)

Implement find_closest, which takes a location and a sequence of centroids (locations). It returns the element of centroids closest to location.

You should use the distance function from utils.py to measure distance between locations. The distance function calculates the Euclidean distance between two locations. It has been imported for you.

If two centroids are equally close, return the one that occurs first in the sequence of centroids.

Hint: Use the min function. You can approach this problem similarly to how you approached Problem 0.2 key_of_min_value.

def find_closest(location, centroids):
    """Return the centroid in centroids that is closest to location.
    If multiple centroids are equally close, return the first one.

    >>> find_closest([3.0, 4.0], [[0.0, 0.0], [2.0, 3.0], [4.0, 3.0], [5.0, 5.0]])
    [2.0, 3.0]
    """
    "*** YOUR CODE HERE ***"

Use OK to unlock and test your code:

python3 ok -q 03 -u
python3 ok -q 03

Problem 4 (20 pt)

Implement group_by_centroid, which takes a sequence of restaurants and a sequence of centroids (locations) and returns a list of clusters. Each cluster of the result is a list of restaurants that are closer to a specific centroid in centroids than any other centroid. The order of the list of clusters returned does not matter.

If a restaurant is equally close to two centroids, it is associated with the centroid that appears first in the sequence of centroids.

Hint: Use the provided group_by_first function to group together all values for the same key in a list of [key, value] pairs. You can look at the doctests to see how to use it.
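
For intuition, grouping [key, value] pairs by key might look like the sketch below. The function name is hypothetical and the provided group_by_first may be written differently, so rely on its doctests for the exact behavior.

```python
def group_pairs_by_key(pairs):
    """Hypothetical helper: group the values of [key, value] pairs by key."""
    keys = []
    for key, _ in pairs:
        if key not in keys:      # remember keys in first-seen order
            keys.append(key)
    # Build one list of values per distinct key.
    return [[value for k, value in pairs if k == key] for key in keys]

example = [[1, 'a'], [2, 'b'], [1, 'c']]
print(group_pairs_by_key(example))  # [['a', 'c'], ['b']]
```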

def group_by_centroid(restaurants, centroids):
    """Return a list of clusters, where each cluster contains all restaurants
    nearest to a corresponding centroid in centroids. Each item in
    restaurants should appear once in the result, along with the other
    restaurants closest to the same centroid.
    """
    "*** YOUR CODE HERE ***"

Be sure not to violate abstraction barriers! Test your implementation before moving on:

python3 ok -q 04 -u
python3 ok -q 04

Problem 5 (20 pt)

Implement find_centroid, which finds the centroid of a cluster (a list of restaurants) based on the locations of the restaurants. The centroid latitude is computed by averaging the latitudes of the restaurant locations. The centroid longitude is computed by averaging the longitudes.

Hint: Use the mean function from utils.py to compute the average value of a sequence of numbers.

def find_centroid(cluster):
    """Return the centroid of the locations of the restaurants in cluster."""
    "*** YOUR CODE HERE ***"

Be sure not to violate abstraction barriers! Test your implementation before moving on:

python3 ok -q 05 -u
python3 ok -q 05

Problem 6 (20 pt)

Complete the implementation of k_means. In each iteration of the while statement,

  1. Group restaurants into clusters, where each cluster contains all restaurants closest to the same centroid. (Hint: Use group_by_centroid)
  2. Bind centroids to a new list of the centroids of all the clusters. (Hint: Use find_centroid)

def k_means(restaurants, k, max_updates=100):
    """Use k-means to group restaurants by location into k clusters."""
    assert len(restaurants) >= k, 'Not enough restaurants to cluster'
    old_centroids, n = [], 0
    # Select initial centroids randomly by choosing k different restaurants
    centroids = [restaurant_location(r) for r in sample(restaurants, k)]

    while old_centroids != centroids and n < max_updates:
        old_centroids = centroids
        "*** YOUR CODE HERE ***"
        # 1. clusters = list of clusters using restaurants and centroids
        # 2. centroids = updated to a new list of centroids from every cluster
        "*** YOUR CODE ENDS HERE ***"        
        n += 1
    return centroids

Use OK to unlock and test your code:

python3 ok -q 06 -u
python3 ok -q 06

Your visualization can indicate which restaurants are close to each other (e.g. downtown San Mateo restaurants, downtown Redwood City restaurants). Dots that have the same color on your map belong to the same cluster of restaurants. You can get more fine-grained groupings by increasing the number of clusters with the -k option. You can also combine a specific user's ratings with the -u option with the clustering -k option.

python3 recommend.py -k 2
python3 recommend.py -u likes_everything -k 3

Congratulations! You've now implemented an unsupervised learning algorithm.

Problem C

Problem C is a continuation of problems A and B. 20 of the 220 project 2 mastery points are earned from the overall composition of your program. You should go back and review your code in Phase 2. Your goal now should be readability. After making any change(s), make sure to retest your code to ensure you haven't introduced bugs.

Phase 3: Supervised Learning

All changes in this phase will be made to recommend.py.

In this phase, you will predict what rating a user would give for a restaurant. You will implement a supervised learning algorithm that attempts to generalize from examples for which the correct rating is known, which are all of the restaurants that the user has already rated. By analyzing a user's past ratings, you can then try to predict what rating the user might give to a new restaurant. When you complete this phase, your visualization will include all restaurants, not just the restaurants that were rated by a user.

To predict ratings, you will implement simple least-squares linear regression, a widely used statistical method that approximates a relationship between some input feature (such as price) and an output value (the rating) with a line. The algorithm takes a sequence of input-output pairs and computes the slope and intercept of the line that minimizes the mean of the squared difference between the line and the outputs.

Problem 7 (30 pt)

Implement the find_predictor function, which takes in a user, a sequence of restaurants, and a feature function called feature_fn. find_predictor returns two values: a predictor function and an r_squared value.

Use least-squares linear regression to compute the predictor and r_squared. This method, described below, computes the coefficients a and b for the predictor line y = a + bx. The r_squared value measures how accurately this line describes the original data.

One method of computing these values is by calculating the sums of squares, S_xx, S_yy, and S_xy:

  • $S_{xx} = \sum_i (x_i - \mathrm{mean}(x))^2$
    • Assign S_xx to the sum, over every x in xs, of the squared difference between x and the mean (average) of xs
      • S_xx = sum([(x - mean(xs))**2 for x in xs])
  • $S_{yy} = \sum_i (y_i - \mathrm{mean}(y))^2$
    • Compare S_xx above with S_yy to infer how to code this
  • $S_{xy} = \sum_i (x_i - \mathrm{mean}(x)) (y_i - \mathrm{mean}(y))$
    • Assign S_xy to the sum, over every index i, of the product of (xs[i] minus the mean of xs) and (ys[i] minus the mean of ys) (code shown below using multiple lines, without zip)
      • S_xy = 0
      • for i in range(len(xs)):
      •     S_xy = S_xy + (xs[i] - mean(xs)) * (ys[i] - mean(ys))

After calculating the sums of squares, the regression coefficients (a and b) and r_squared are defined as follows:

  • $b = S_{xy} / S_{xx}$
    • b = S_xy / S_xx
  • $a = \mathrm{mean}(y) - b \cdot \mathrm{mean}(x)$
    • a = mean(ys) - b * mean(xs)
  • $R^2 = S_{xy}^2 / (S_{xx} S_{yy})$
    • r_squared = S_xy ** 2 / (S_xx * S_yy)
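
A small worked example can help when checking an implementation by hand. The sketch below applies the formulas above to made-up numbers (not taken from the Yelp data). It defines its own mean so that it runs on its own, and it uses the built-in zip.

```python
# Made-up data for illustration only.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

def mean(s):
    return sum(s) / len(s)

x_bar, y_bar = mean(xs), mean(ys)   # both 2.5
S_xx = sum((x - x_bar) ** 2 for x in xs)                       # 5.0
S_yy = sum((y - y_bar) ** 2 for y in ys)                       # 5.0
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 4.0

b = S_xy / S_xx                        # slope, about 0.8
a = y_bar - b * x_bar                  # intercept, about 0.5
r_squared = S_xy ** 2 / (S_xx * S_yy)  # about 0.64

print(round(a, 2), round(b, 2), round(r_squared, 2))  # 0.5 0.8 0.64
```

Here the fitted line is y = 0.5 + 0.8x, and an r_squared of 0.64 indicates a moderate fit: the closer the value is to 1, the better the line describes the data.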

Hint: The mean and zip functions can be helpful here.

def find_predictor(user, restaurants, feature_fn):
    """Return a rating predictor (a function from restaurants to ratings),
    for a user by performing least-squares linear regression using feature_fn
    on the items in restaurants. Also, return the R^2 value of this model.

    Arguments:
    user -- A user
    restaurants -- A sequence of restaurants
    feature_fn -- A function that takes a restaurant and returns a number
    """
    reviews_by_user = {review_restaurant_name(review): review_rating(review)
                       for review in user_reviews(user).values()}

    xs = [feature_fn(r) for r in restaurants]
    ys = [reviews_by_user[restaurant_name(r)] for r in restaurants]

    "*** YOUR CODE HERE ***"
    b = 0         # REMOVE/REPLACE THIS LINE WITH YOUR SOLUTION
    a = 0         # REMOVE/REPLACE THIS LINE WITH YOUR SOLUTION
    r_squared = 0 # REMOVE/REPLACE THIS LINE WITH YOUR SOLUTION

    def predictor(restaurant):
        return b * feature_fn(restaurant) + a

    return predictor, r_squared

Use OK to unlock and test your code:

python3 ok -q 07 -u
python3 ok -q 07

Problem 8 (20 pt)

Implement best_predictor, which takes a user, a list of restaurants, and a sequence of feature_fns. It uses each feature function to compute a predictor function, then returns the predictor that has the highest r_squared value. All predictors are learned from the subset of restaurants reviewed by the user (called reviewed in the starter implementation).

Hint: The max function can also take a key argument, just like min.
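
As a quick illustration of the hint, max with a key returns the original element whose key value is largest. The [label, score] pairs below are made up for this sketch.

```python
# Hypothetical data: [label, score] pairs.
candidates = [['price', 0.12], ['mean rating', 0.47], ['num ratings', 0.05]]

# Compare by score, but get back the whole pair.
best = max(candidates, key=lambda pair: pair[1])
print(best)  # ['mean rating', 0.47]
```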

def best_predictor(user, restaurants, feature_fns):
    """Find the feature within feature_fns that gives the highest R^2 value
    for predicting ratings by the user; return a predictor using that feature.

    Arguments:
    user -- A user
    restaurants -- A list of restaurants
    feature_fns -- A sequence of functions that each takes a restaurant
    """
    reviewed = user_reviewed_restaurants(user, restaurants)
    "*** YOUR CODE HERE ***"

Use OK to unlock and test your code:

python3 ok -q 08 -u
python3 ok -q 08

Problem 9 (20 pt)

Implement rate_all, which takes a user, a list of restaurants, and a sequence of feature_fns. It returns a dictionary whose keys are the names of the restaurants in restaurants and whose values are ratings (numbers).

If a restaurant was already rated by the user, rate_all will assign the restaurant the user's rating. Otherwise, rate_all will assign the restaurant the rating computed by the best predictor for the user. The best predictor is chosen using a sequence of feature_fns.

Hint: You may find the user_rating function in abstractions.py useful.

def rate_all(user, restaurants, feature_fns):
    """Return the predicted ratings of restaurants by user using the best
    predictor based on a function from feature_fns.

    Arguments:
    user -- A user
    restaurants -- A list of restaurants
    feature_fns -- A sequence of feature functions
    """
    predictor = best_predictor(user, ALL_RESTAURANTS, feature_fns)
    reviewed = user_reviewed_restaurants(user, restaurants)
    "*** YOUR CODE HERE ***"

Be sure not to violate abstraction barriers! Test your implementation before moving on:

python3 ok -q 09 -u
python3 ok -q 09

In your visualization, you can now predict what rating a user would give a restaurant, even if they haven't rated the restaurant before. To do this, add the -p option:

python3 recommend.py -u likes_rwc -k 5 -p

If you hover over each dot (a restaurant) in the visualization, you'll see a rating in parentheses next to the restaurant name.

Problem 10 (10 pt)

To focus the visualization on a particular restaurant category, implement search. The search function takes a category query and a sequence of restaurants. It returns all restaurants that have query as a category.

Hint: you might find a list comprehension useful here.

def search(query, restaurants):
    """Return each restaurant in restaurants that has query as a category.

    Arguments:
    query -- A string
    restaurants -- A sequence of restaurants
    """
    "*** YOUR CODE HERE ***"

Be sure not to violate abstraction barriers! Test your implementation:

python3 ok -q 10 -u
python3 ok -q 10

Congratulations, you've completed the project! The -q option allows you to filter based on a category. For example, the following command visualizes all sandwich restaurants and their predicted ratings for the user who likes_expensive restaurants:

python3 recommend.py -u likes_expensive -k 2 -p -q Sandwiches

Problem D

Problem D is a continuation of problems A, B, and C. 20 of the 220 project 2 mastery points are earned from the overall composition of your program. You should go back and review your code in Phase 3. Your goal now should be readability. After making any change(s), make sure to retest your code to ensure you haven't introduced bugs.

Predicting your own ratings

Once you have reached this point, use your project to predict your own ratings. As you have probably found while working on this project, completing a later task sometimes exposes bugs in earlier code. Passing all of the test cases does not guarantee that your code is correct: the tests are only there to help you check it, and making sure it works is up to you. Using your project for real predictions may help you find more bugs. Here's how:

  1. In the users/ directory, you'll see a couple of .dat files. Copy one of them and rename the new file to yourname.dat (for example, lai.dat).
  2. In the new file (e.g. lai.dat), you'll see something like the following:

    make_user(
        'Mr. Lai',     # name
        [              # reviews
            make_review('I Dumpling', 4.5),
            ...
        ]

    Replace the second line with your name (as a string).

  3. Replace the existing reviews with reviews of your own! You can get a list of known local restaurants with the following command:

    python3 recommend.py -r

    Rate a couple of your favorite (or least favorite) restaurants.

  4. Use recommend.py to predict ratings for you:

    python3 recommend.py -u lai -k 2 -p -q Sandwiches

    (Replace lai with the user name you created.) Play around with the number of clusters (the -k option) and try different queries (with the -q option)!

How accurate is your predictor? If needed, the Excel list of our restaurants is available for download here: restaurant list