jasonadamyoung edited this page Jan 11, 2012 · 13 revisions

How Recommendations Work

The Algorithm

Learn currently uses a simplified algorithm that looks for learner connections to similar events to make a recommendation.

The list of events to recommend defaults to the "upcoming events this week" plus the "recent events from last week". Recommendations are scheduled to go out Monday morning at 4am in the recipient's timezone, and the recommendations themselves are created at 3am UTC on Monday mornings; there is code in place to handle the exception of people whose timezone is east of UTC.
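The scheduling logic might be sketched like this (a hypothetical sketch, not the actual code; in particular, pushing east-of-UTC sends to the following Monday is an assumption about how the exception is handled):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Recommendations are generated at 3am UTC on Monday (one example week shown)
CREATION_UTC = datetime(2012, 1, 9, 3, 0, tzinfo=timezone.utc)

def send_time_utc(tz_name, monday=(2012, 1, 9)):
    """Return 4am Monday in the recipient's timezone, expressed in UTC.

    For timezones east of UTC, 4am local Monday falls before the 3am UTC
    creation run; here we assume the send is pushed to the next Monday.
    """
    y, m, d = monday
    local = datetime(y, m, d, 4, 0, tzinfo=ZoneInfo(tz_name))
    as_utc = local.astimezone(timezone.utc)
    if as_utc < CREATION_UTC:  # recommendations don't exist yet at send time
        as_utc += timedelta(days=7)
    return as_utc
```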

For each of those upcoming and recent events, a SOLR/Lucene "More Like This" (MLT) query is performed to pull the 4 events that are most similar according to the MLT algorithm. This is essentially TF/IDF scoring with a term vector produced from the title and description of the given event. The returned score is only meaningful within that set of search results (MLT scores cannot be compared accurately across searches, because every change to the events changes the term frequencies in the data).
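The MLT scoring is, at heart, TF/IDF term-vector similarity. A toy sketch of the idea (not Solr's actual implementation; the corpus and tokenization here are invented):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "title + description" texts for three hypothetical events
events = [
    "whats new online learning tools for extension".split(),
    "google docs online tools for workflow management".split(),
    "rural design creating a digital meeting place".split(),
]
vecs = tfidf_vectors(events)
# MLT-style ranking of events most similar to events[0]
ranked = sorted(range(1, len(events)),
                key=lambda i: cosine(vecs[0], vecs[i]), reverse=True)
```

Note that the scores depend on document frequencies across the whole corpus, which is exactly why they shift whenever the event data changes.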

For each of those four similar events, the activity scores for the learners are calculated. Every activity has a "score" associated with it. For each similar event, the learner's activity scores are summed and multiplied by that event's MLT score; those products are then summed across the four events and divided by the MLT score of the "most similar event". A final score of 3.0 (equal to a single bookmark, attended, watched, or presented connection) will create a recommendation, provided the learner is not already connected to the event.

What this means for this iteration of the algorithm is that we assume the top-scoring similar event is "good enough" to make a recommendation: it is treated as a 1-to-1 match, so an additive activity score of 3.0 on the top-scoring similar event alone will generate a recommendation. The additional events contribute extra scoring, weighted relative to that top MLT score.

In summary it's this:

[ score = \sum_{n=1}^{4}{\frac{\left( \sum_{i=1}^{activities}{activity\; score_i\; for\; event\; n} \right)\cdot mlt\; score\; for\; event\; n}{\max\; mlt\; score\; across\; the\; events}} ]
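In code, the same calculation looks like this (hypothetical function names; each entry pairs a learner's activity scores on a similar event with that event's MLT score):

```python
RECOMMEND_THRESHOLD = 3.0  # one bookmark/attended/watched/presented connection

def recommendation_score(similar_events):
    """similar_events: list of (activity_scores, mlt_score) tuples for the
    (up to 4) most-similar events found by MLT. Returns the learner's
    aggregate score for the candidate event."""
    max_mlt = max(mlt for _, mlt in similar_events)
    return sum(sum(scores) * mlt for scores, mlt in similar_events) / max_mlt

def should_recommend(similar_events, already_connected):
    """Recommend when the score reaches the threshold and the learner has
    no existing connection to the candidate event."""
    return (not already_connected and
            recommendation_score(similar_events) >= RECOMMEND_THRESHOLD)
```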

The Example

Let's do the math for an example event and for a single learner.

The event "What's new at Learn.extension.org?" has the following similar events and MLT scores for a given similarity search:

| Event | MoreLikeThis Score |
| ----- | ------------------ |
| Extension's Learning Space | 0.235 |
| Introduction to Google Docs for Workflow Management | 0.130 |
| Creating a Digital Meeting Place to Foster Rural Design | 0.122 |
| Twitter #Hashtags for conferences, causes, chats, and connecting with people of similar interests. | 0.118 |

And Bob, our example learner, has the following connections to these similar events:

| Event | Activities | Activity Score |
| ----- | ---------- | -------------- |
| Extension's Learning Space | Presented | 3.0 |
| Introduction to Google Docs for Workflow Management | Bookmarked | 3.0 |
| Creating a Digital Meeting Place to Foster Rural Design | Commented, Rated | 3.0 (2.0 + 1.0) |
| Twitter #Hashtags for conferences, causes, chats, and connecting with people of similar interests. | none | 0.0 |

Bob's score would be:

(3.0 * .235)/.235 + (3.0 * .130)/.235 + (3.0 * .122)/.235 + (0.0 * .118)/.235 = 6.217
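That arithmetic can be checked directly:

```python
mlt      = [0.235, 0.130, 0.122, 0.118]  # MLT scores, most similar first
activity = [3.0,   3.0,   3.0,   0.0]    # Bob's summed activity score per event
score = sum(a * m for a, m in zip(activity, mlt)) / max(mlt)
# score rounds to 6.217, above the 3.0 threshold
```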

Bob would get this session recommended to him. Bob's recommendation email will include the top 3 upcoming events and the top 3 recent events (if there are that many events and scored recommendations).

The Problems

Basing similarity on Solr MoreLikeThis matching creates a problem condition for unique (or uniquely worded) events. Because MLT searches always return something, the top-scoring similar event may not be similar at all - and the score really has no meaning across searches. It may be possible to do a run-time search across all events to determine top similarity scores across everything currently in the system at that moment, but that's likely to take too long, and all the MLT docs carry a kind of King Lear-esque "that way madness lies" caution.

So for unique events, maybe we'll just consider the recommendations that arise from them to be "serendipitous".

What's Next?

What would be better is to run the connections through a machine-learning algorithm, a collaborative filtering algorithm of some kind, or something simpler like Slope One. Most of these assume an existing item with rankings from others against which you make a similar match; they don't really account for future incoming events. There are strategies (pdf) for accommodating future items that may work.
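For reference, Slope One is small enough to sketch in a few lines (a minimal weighted variant; the ratings structure here is invented for illustration):

```python
from collections import defaultdict

def slope_one_predict(ratings, user, target):
    """Weighted Slope One: predict `user`'s rating for item `target`.
    ratings: {user: {item: rating}}"""
    diffs = defaultdict(float)  # summed (target - other) rating differences
    counts = defaultdict(int)   # users who rated both target and other item
    for r in ratings.values():
        if target in r:
            for item, val in r.items():
                if item != target:
                    diffs[item] += r[target] - val
                    counts[item] += 1
    num = den = 0.0
    for item, val in ratings[user].items():
        if counts[item]:
            # average pairwise deviation, weighted by co-rating count
            num += (val + diffs[item] / counts[item]) * counts[item]
            den += counts[item]
    return num / den if den else None
```

Predictions come from average rating differences between item pairs, which avoids heavy model training - but it still assumes the target item already has ratings, the same cold-start issue as with future incoming events.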

However, for all of these, I'm not sure we have enough data yet. LearnV2 just introduced additional ways of making connections to events; these likely provide additional data points for indicating some kind of interest in an event and may support a better learning algorithm (e.g. if I rate and comment, am I likely to attend a similar event? Could all the events I comment on be grouped together as a set of similar events? etc.)
