Rating mechanics
The mechanics are subject to change as more data is collected and further variations and tuning are applied.
The variables involved
EloGrade uses a variation of the Elo rating system, so the elements involved are the same:
Arbitrary starting rating: 1500. On EloGrade, when we know nothing about a photo, we set its rating to 1500.
Curve function used: a cumulative distribution function (CDF), in particular cdf(ratings_difference / (200 * SQRT(2)))
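As a rough illustration, the curve can be sketched in Ruby as below. This assumes that cdf means the standard normal CDF, which is not stated above; the method names normal_cdf and expected_score are ours, not EloGrade's.

```ruby
# Standard normal CDF via the error function.
# Assumption: "cdf" in the formula above means the standard normal CDF.
def normal_cdf(x)
  0.5 * (1.0 + Math.erf(x / Math.sqrt(2)))
end

# Expected score (between 0 and 1) of a photo against an opponent,
# using cdf(ratings_difference / (200 * SQRT(2))).
def expected_score(rating, opponent_rating)
  ratings_difference = rating - opponent_rating
  normal_cdf(ratings_difference / (200 * Math.sqrt(2)))
end

puts expected_score(1500, 1500) # equal ratings give 0.5
puts expected_score(1700, 1500) # a 200 point lead gives roughly 0.76
```

Under this assumption the curve is symmetric: the expected scores of the two photos in a comparison always add up to 1.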
Expected win/loss ratios vs ratings based on the function:
Actual win/loss ratios vs ratings after trial runs:
The other important but somewhat arbitrary variable is the K factor. EloGrade uses a variable K factor: ((1100.0 / (n*1.5 + 6)).to_f + 6.0).to_i, where n is the number of comparisons done so far. The K factor therefore starts off very large (189 at n = 0), because we know nothing about the photo, then decreases rapidly and eventually converges to a very low value of 6. This is based on the belief that the rating of a photo is stable, unlike a player's or athlete's skill, which may change quickly and needs to be reflected in the rating.
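To see how quickly the K factor falls, here is a small sketch. The k_factor method wraps the expression above unchanged. The updated_rating method is an assumption: it applies the textbook Elo update (rating + K * (actual - expected)), which may differ from what EloGrade actually does.

```ruby
# K factor as a function of n, the number of comparisons done so far.
# The expression is copied as-is from the formula above.
def k_factor(n)
  ((1100.0 / (n * 1.5 + 6)).to_f + 6.0).to_i
end

# Assumed textbook Elo update; not confirmed as EloGrade's exact rule.
# expected: expected score in 0..1; actual: 1.0 for a win, 0.0 for a loss.
def updated_rating(rating, expected, actual, n)
  rating + k_factor(n) * (actual - expected)
end

[0, 10, 100, 1000].each do |n|
  puts "n=#{n} K=#{k_factor(n)}"
end
# n=0 K=189
# n=10 K=58
# n=100 K=13
# n=1000 K=6

puts updated_rating(1500, 0.5, 1.0, 0) # first win against an equal photo: 1594.5
```

With this formula the K factor first reaches its floor of 6 at n = 730.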
Other EloGrade variations
- We are currently testing an additional factor on the K factor, which we call stability. We randomly assign each comparison to one of two subgroups (call them SubA and SubB) and keep a separate rating for each. The two ratings are entirely independent of each other, and each has roughly half the comparison count of the real rating. The difference between the two ratings may correlate with how stable the real rating is.
- While the Elo rating system can be applied to any pairing (and is in fact not limited to pairs: as long as expected values can be computed, any number of items can be compared at the same time), at EloGrade every comparison has a minimum of one identical comparison done by a different user. This is in place to mitigate the immediate effects of users trying to game the system or voting randomly.
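The stability experiment from the first bullet above could look roughly like the following sketch. Everything here is hypothetical: the SplitRating class, its method names, the use of the standard normal CDF, and the textbook Elo update are our assumptions, and the gap is only the raw difference between the two sub-ratings, not a finished stability measure.

```ruby
# Hypothetical sketch: two independent sub-ratings (SubA and SubB) per photo.
class SplitRating
  START = 1500.0

  attr_reader :ratings, :counts

  def initialize
    @ratings = { a: START, b: START }
    @counts  = { a: 0, b: 0 }
  end

  # Same K factor expression as the real rating, driven by the
  # subgroup's own comparison count.
  def k_factor(n)
    ((1100.0 / (n * 1.5 + 6)).to_f + 6.0).to_i
  end

  # Assumes cdf is the standard normal CDF, so that
  # cdf(d / (200 * sqrt(2))) simplifies to 0.5 * (1 + erf(d / 400)).
  def expected_score(rating, opponent_rating)
    0.5 * (1.0 + Math.erf((rating - opponent_rating) / 400.0))
  end

  # Record one comparison in a randomly chosen subgroup.
  # Pass group: explicitly to make runs reproducible.
  def record(opponent_rating, won, group: %i[a b].sample)
    actual = won ? 1.0 : 0.0
    expected = expected_score(@ratings[group], opponent_rating)
    @ratings[group] += k_factor(@counts[group]) * (actual - expected)
    @counts[group] += 1
    group
  end

  # Absolute difference between the two sub-ratings.
  def gap
    (@ratings[:a] - @ratings[:b]).abs
  end
end

photo = SplitRating.new
photo.record(1500, true, group: :a)  # a win lands in SubA
photo.record(1500, false, group: :b) # a loss lands in SubB
puts photo.gap                       # 189.0
```

A photo whose two sub-ratings stay far apart as comparisons accumulate would be a candidate for a less stable real rating; how that gap should feed back into the K factor is left open here.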