Predicting Tags for StackOverflow Posts

Abstract

Hashtags created by authors of online content provide a view of a user’s goals and interests. Predicting users’ interests can lead to improved, more user-centered human-computer systems. Large-scale behavioral datasets from sources such as Twitter, Facebook, LinkedIn, StackOverflow, and Yelp can be mined and explored to study these hashtags and how they relate to users’ interests. We explored the StackOverflow dataset and developed an ACT-R-inspired Bayesian probabilistic model that predicts the hashtags used by the author of a post. When tasked with predicting one tag per post on average, the model is 65% accurate. It achieves this by choosing the tag with the highest log odds of being correct: the tag’s prior log odds of occurrence, adjusted by the log likelihood ratio of each word in the post being associated with that tag. The model is a successful case showing that ACT-R’s declarative memory retrieval equations scale and are relevant to task domains that require large-scale knowledge stores.
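
The scoring rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train`, `score`, `predict`), the add-one smoothing, and the specific form of the word-tag association, log(P(tag | word) / P(tag)), are assumptions chosen to mirror ACT-R's base-level-plus-association structure on a toy corpus.

```python
import math
from collections import Counter, defaultdict


def train(posts):
    """Count tag and word-tag co-occurrences over (words, tags) pairs."""
    n_posts = 0
    tag_counts = Counter()                  # posts containing each tag
    word_counts = Counter()                 # posts containing each word
    word_tag_counts = defaultdict(Counter)  # word -> tag -> co-occurrence count
    for words, tags in posts:
        n_posts += 1
        tag_counts.update(set(tags))
        for w in set(words):
            word_counts[w] += 1
            word_tag_counts[w].update(set(tags))
    return n_posts, tag_counts, word_counts, word_tag_counts


def score(tag, words, model, smoothing=1.0):
    """Prior log odds of the tag plus a log likelihood ratio per known word.

    The smoothing constant is an assumption of this sketch.
    """
    n_posts, tag_counts, word_counts, word_tag_counts = model
    p_tag = (tag_counts[tag] + smoothing) / (n_posts + 2 * smoothing)
    total = math.log(p_tag / (1.0 - p_tag))  # prior log odds
    for w in set(words):
        if w not in word_counts:
            continue  # unseen words contribute no evidence
        p_tag_given_w = (word_tag_counts[w][tag] + smoothing) / (
            word_counts[w] + 2 * smoothing
        )
        total += math.log(p_tag_given_w / p_tag)  # association adjustment
    return total


def predict(words, model):
    """Return the tag with the highest score for the given words."""
    _, tag_counts, _, _ = model
    return max(tag_counts, key=lambda t: score(t, words, model))
```

With no known words in a post, `predict` falls back to the tag with the highest prior log odds; each known word then shifts the score toward the tags it has co-occurred with.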

