previous next Title Contents

Sample Application


In this section we sketch out how to use the APIs in an application consisting of a predictor process and a solver process. The predictor process gathers and records votes and makes predictions. The solver process reads votes and computes successively more accurate models, periodically writing the models back to the database so they can be used by the predictor.

We start with the solver process, whose organization changes little from application to application. We then show a typical predictor process; here, the specifics will vary more between applications. Code that needs to be adapted to a particular environment (database, OS, etc.) is typeset <like this>.

Solver process

#include "eetypes.h"
#include "eesolve.h"
We create a single instance of the solver class:
ee_solve ee;
A real solver process might take command-line parameters, or even use a "GUI" to accept control parameters. Here we're keeping it simple:
main() {
When the process first starts, we read in the entire Votes, Persons, and Items tables.
    <SELECT * FROM Votes>
    while (<records remain>) {
	ee_person p;
	ee_item i;
	ee_vote v;
	<get p, i, v from current record>
	ee.set_vote(p, i, v);
	<move to next record>
    }
    <SELECT * FROM Persons>
    while (<records remain>) {
	ee_person p;
	ee_model m;
	<get p, m from current record>
	ee.set_person_model(p, m);
	<move to next record>
    }
    <SELECT * FROM Items>
    while (<records remain>) {
	ee_item i;
	ee_model m;
	<get i, m from current record>
	ee.set_item_model(i, m);
	<move to next record>
    }
Everything from here to the end of the program is a loop that is repeated until shutdown:
    while (1) {
First, process the VotesLog table to find out about recent updates to the Votes table:
	<SELECT * FROM VotesLog ORDER BY time>
	while (<records remain>) {
	    ee_person p;
	    ee_item i;
	    ee_vote v;
	    <get p, i, v from current record>
	    ee.set_vote(p, i, v); /* deletes if v is ee_vote_null */
	    <delete current record>
	    <move to next record>
	}
Now run the solver algorithm. It is based on iterative refinement, so running it longer gives increased accuracy. In determining the number of steps to run it, we need to balance response time, accuracy, and the cost of writing the models back to the database.
#define RESIDUE_THRESHOLD <tune for application>

	do ee.solve_step();
	while (RESIDUE_THRESHOLD
		 <= ee.get_residue() / ee.get_vote_count()
	       && <time limit not exceeded>);
After running the solver algorithm, write back the updated models to the Persons and Items table. This serves two purposes: checkpointing our state across shutdown/restart, and making the new item models available to the predictor process.
	<SELECT * FROM Persons>
	while (<records remain>) {
	    ee_person p;
	    ee_model m;
	    <get p from current record>
	    ee.get_person_model(p, m);
	    <update p's model to m in current record>
	    <move to next record>
	}
	<SELECT * FROM Items>
	while (<records remain>) {
	    ee_item i;
	    ee_model m;
	    <get i from record>
	    ee.get_item_model(i, m);
	    <update i's model to m in current record>
	    <move to next record>
	}
The code above depends on the Persons and Items tables having been initialized with an entry for each person and item, respectively. To avoid this dependency, we could have taken this alternate approach using iterators:
	ee_solve::person_iterator persons;
	for ( ; ; ) {
	    ee_person p = persons.get_next();
	    if (p==ee_person_null) break;
	    ee_model m;
	    ee.get_person_model(p, m);
	    <update p's model to m in current record>
	}
	ee_solve::item_iterator items;
	for ( ; ; ) {
	    ee_item i = items.get_next();
	    if (i==ee_item_null) break;
	    ee_model m;
	    ee.get_item_model(i, m);
	    <update i's model to m in current record>
	}
Note that this approach using iterators performs the database updates in an arbitrary order, which may not perform as well as performing the updates in index order.

The final step is to signal the predictor process that it should reset its cache, causing it to reload the updated item models from the database:

	<SELECT * FROM Generations WHERE id==1>
	<update generation = generation+1 in current record>
    }
}

Predictor process

From the point of view of the Each to Each APIs, there are three parts of the predictor process:

1. Providing database access to Each to Each functions.

2. Gathering votes and recording them in the database.

3. Making predictions.

Here we package all three parts in a single C++ program file.

Our first task is to provide the code that connects the Each to Each prediction functions with the votes and item models stored in the application-provided database. We do this by writing a new class, my_ee_predict, that is derived from ee_predict. In the new class, we will implement the virtual functions of the ee_predict class, which in C++ terminology is an "abstract" class. This means that it contains one or more "pure virtual functions" which aren't defined by the class itself, but which are called by other member functions of the class. We will also implement two additional functions: one to record a vote, and one to check for new data from the solver process. (In a real application, you might want to put my_ee_predict in a separate file.)

#include "eetypes.h"
#include "eepredict.h"
class my_ee_predict : public ee_predict {
public:
    virtual void get_person_votes(
	const ee_person x,
	void (*setvote)(
	    const ee_person x,
	    const ee_item y,
	    const ee_vote& v,
	    void* arg),
	void* arg) = 0;
    virtual void get_item_model(const ee_item y, ee_model& m);
    void record_vote(ee_person x, ee_item y, ee_vote v);
    /* Update stable storage and the cache to indicate that v is person x's
       new or revised vote for item y, or delete person x's vote for item y if v
       is null. */
    void test_generation();
    /* Test whether the solver process has recently written updated item models
       into stable storage, and if so flush all item models from the cache. */
}
First we provide the implementations of the two virtual functions:
void my_ee_predict::get_person_votes(
    const ee_person x,
    void (*setvote)(
	const ee_person x,
	const ee_item y,
	const ee_vote& v,
	void* arg),
    void* arg)
{
    ee_item item;
    ee_vote vote;
    <SELECT * FROM Votes WHERE person==x>
    while (<records remain>) {
	<get item, vote from record>
       (*setvote)(x, item, vote, arg);
    }
}
void my_ee_predict::get_item_model(const ee_item y, ee_model& m)
{
    ee_model model;
    <SELECT * FROM Items WHERE item==y>
    <get model from record>
    m = model;
}
Now we implement the record_vote function, which needs to update the Votes and VotesLog tables and also flush any previous vote that may have been cached. This function can be used for the case when a person wants to erase his vote, by passing ee_vote_null for v:
void my_ee_predict::record_vote(ee_person x, ee_item y, ee_vote v)
{
    timestamp t = now();
    <begin transaction>
    reset_vote(x, y, v);
    <SELECT * FROM Votes WHERE person==x AND item==y>
    if (<record exists>)
	if (v==ee_vote_null)
	    <delete current record from database>
	else
	    <update vote=v in current record>
    else
	if (v==ee_vote_null)
	    ; // nothing to do
	else
	    <insert new record in Votes Table with x, y, v>
    <insert new record in VotesLog table with x, y, v, t>
    <commit transaction>
}
Since these virtual functions access the database, calling them is relatively expensive. Thus the ee_predict implementation caches information in main memory to speed up subsequent predictions for the same persons and/or items. In the mean time, the solver process is computing more accurate models (based on recent votes), which it periodically writes back to the database. At that point, we'd like the predictor process to flush the cached information and once again call the virtual functions to load the latest information. The way we arrange to do this is to periodically (say, once an hour) poll the Generations table in the database. So the predictor process needs to call this function every so often, say by forking a thread that loops forever, sleeping for a half hour and calling this:
void ee_predict::test_generation()
{ static int last_generation = 0; int generation; <SELECT * FROM Generations WHERE id==1> <get generation from current record> if (generation < last_generation) { reset_all_models(); last_generation = generation; } }
That ends the implementation of my_ee_predict. Now we create a single instance of it to use throughout the predictor process:
my_ee_predict ee;
We're ready to do predictions. Here's an example of how to recommend up to five new Chinese restaurants in Paris:
#define WEIGHT_THRESHOLD <tune for application>
#define SCORE_THRESHOLD <tune for application>
ee_person p;      // input: person desiring prediction
<query database for Chinese restaurants in Paris that p has 
    not already rated>
while (<records remain>) {
    ee_item i;
    <get i from current record>
    ee_vote v = ee.predict_item(p, i);
    if (v.w < WEIGHT_THRESHOLD) continue;
    if (v.s < SCORE_THRESHOLD) continue;
    <add [i, v] to list of possibilities>
    <move to next record>
}
<sort list of possibilities in descending order of v>
return <first five restaurants from list>;
Here's an example of how to present up to three reviews of a hotel. You might consider treating user-written reviews about items as additional "first-class items". However, we recommend using person-person correlation (ee_predict::predict_person) to rank order a set of reviews in terms of the similarity of interests between the current user and the authors of the reviews.
ee_person p;      // input: current user
ee_item i;        // input: subject of review
<query database for reviews of hotel i>
while (<records remain>) {
    ee_person q;  // author
    <get q, review from current record>
    if (q == p) continue;
    ee_vote v = ee.predict_person(p, q);
    if (v.w < WEIGHT_THRESHOLD) continue;
    if (v.s < SCORE_THRESHOLD) continue;
   <add [review, v] to list of possibilities>
   <move to next record>
}
<sort list of possibilities in descending order of v>
return <first three reviews from list>;


previous next Title Contents