For each learner s we track her skill and for each item (lexeme) i we track its difficulty. To predict the probability of a correct answer, we subtract the item difficulty from the learner's skill and transform the result by a logistic function.
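A minimal sketch of this prediction (the function name `predict` is illustrative):

```python
import math

def predict(skill, difficulty):
    """Rasch-model prediction: probability of a correct answer is the
    logistic transform of the skill-difficulty gap."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty)))
```

When skill equals difficulty the predicted probability is 0.5; a larger gap pushes the prediction toward 1 or 0.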
This model is known as the one-parameter IRT model (or Rasch model). We can use joint maximum likelihood to fit skills and difficulties, but this method is not applicable online, because it needs all the data for its computation. To estimate the parameters online, we can take inspiration from the Elo rating system, originally designed to rate chess players. This method can easily work with a stream of answers. After a learner s solves an item i, we update the learner's skill and the item difficulty as follows (the constant K stands for the sensitivity of the estimate to the last update and $correct_{si}\in[0, 1]$):
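The update referenced above can be sketched in the standard Elo form: both parameters move by K times the prediction error, the skill up and the difficulty down on a correct answer. The value of K here is illustrative.

```python
import math

def elo_update(skill, difficulty, correct, k=0.8):
    """One Elo-style update after a learner answers an item.
    `correct` is in [0, 1]; k (illustrative value) is the sensitivity
    of the estimate to the last answer."""
    predicted = 1.0 / (1.0 + math.exp(-(skill - difficulty)))
    error = correct - predicted
    return skill + k * error, difficulty - k * error
```

A correct answer raises the estimated skill and lowers the estimated difficulty, and vice versa; the size of the change shrinks as the prediction gets closer to the observed outcome.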
Initially, the values of skills and difficulties are set to 0. To make the estimates stable and convergent, we need to replace the sensitivity constant K by an uncertainty function ensuring that later changes have less weight. Without that, we could easily lose information from past updates.
The variable n stands for the number of past updates, and a, b are meta-parameters fitted to data.
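One common choice for such an uncertainty function is the hyperbolic form a/(1 + b·n); this particular form and the default values below are assumptions for illustration, not necessarily the exact function used here.

```python
def uncertainty(n, a=1.0, b=0.05):
    """Uncertainty function replacing the constant K: the weight of an
    update decays with the number of past updates n. The hyperbolic
    form a / (1 + b*n) and the values of a, b are illustrative; in
    practice a and b are meta-parameters fitted to data."""
    return a / (1.0 + b * n)
```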
Important note: Since this model does not handle learning, it is necessary to update skills and difficulties only once per learner-item pair (we ignore repeated answers).
To fit the meta-parameters we use a simple grid search.
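A grid search over (a, b) can be sketched as follows; `evaluate` is a placeholder for a function that replays the answer stream with the Elo updates and returns a predictive loss (e.g. log-loss):

```python
import itertools

def grid_search(evaluate, a_grid, b_grid):
    """Return the (a, b) pair from the candidate grids that minimizes
    evaluate(a, b), e.g. the log-loss of predictions obtained by
    replaying the stream of answers with the given meta-parameters."""
    return min(itertools.product(a_grid, b_grid),
               key=lambda ab: evaluate(*ab))
```

For example, with a quadratic toy loss `lambda a, b: (a - 1)**2 + (b - 2)**2` and grids `[0, 1, 2]` and `[1, 2, 3]`, the search returns `(1, 2)`.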