Theory and Philosophy
This document explains the details of EloGrade and presents the statistical background. For a simple explanation see the document How It Works.
Problems that exist with most subjective rating systems
If existing rating systems met all our needs, we would not need EloGrade. In practice, current rating systems fall short for photographs. They range from fixed score based (like photo.net, which roughly averages the 1 to 7 ratings people give a photo for aesthetics and originality) to open ended score based (like Flickr's interestingness, which combines how many favorites, comments, and views a photo gets).
The problems with both kinds of system are apparent: people do not count on them to be objective or consistent.
For the fixed score based systems the main reasons are:
- It is hard for raters to maintain consistency. If you rate a photo 7 today, your scale may change tomorrow or a week later and you would rate the same photo a 6 or 5. The burden of maintaining a fixed scale is on each user and is not enforced. In the typical 'average a score given by viewers' system, people tend to only review the extremes and give full stars or none.
- There is generally not a lot of granularity in the scoring. This is in tension with the first problem: if you offer a lot of granularity (e.g. a score from 1 to 1000), it becomes harder for people to remember their scale and be consistent.
- There can be heavy bias because the people who choose to rate are not a random sample.
- There is no meaningful distance between scores. If you have a photo at a rating of 7, a photo at 6 and a photo at 5, what does that mean exactly?
- Sparseness of ratings can mean many photos go unrated.
For the open ended score based systems, the main reasons are:
- The intention is different. It is more popularity based. These systems are great at measuring popularity and determining photos that may be worth viewing, but they are not exactly measuring what is better inherently.
- Data is even more sparse, as most photos do not receive any comments or favorites.
Great implementations and communities obviously mitigate these problems in various ways. But EloGrade is built to overcome those problems and offer a system that is highly consistent and objective.
Subjectivity to Objectivity
"How good a photo is" is entirely subjective. In order to have ratings that have meaningful distance from each other, you need something objectively measurable. The way EloGrade does this is by using a rating that is a statistical prediction of how a random person will consider a photo.
Imagine a scenario where people are presented with two photos, A and B, and must pick the one they consider better. For each person doing a comparison the act is entirely subjective. But what if you had all people in the world participate? The percentage of all people in the world preferring A over B is a measurable quantity and exists as a truth.
Now further imagine that we had all people in the world compare every pair of photos in the world (in fact, you wouldn't need every pair). You would be able to map out, in measurable distances, all photos from each other. If you then compressed that map onto a line (ignoring circular relationships, which do present a problem), you'd have a linear scale on which every photo resided at an objectively measurable distance from every other. You pick an arbitrary point, for instance the exact center between the maximum and minimum, and give it a numerical value of 1500. Then you give an arbitrary value to the unit of distance on that scale. The result is a true objective rating for every photo, with meaningful distances between ratings. For instance, under the standard Elo curve, a photo A that is 100 rating points above photo B is considered the better photo by about 64% of the world, and a photo C that is 300 points above photo B is preferred over B by about 85% of the world. These ratings are objective.
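To make the link between rating distance and preference share concrete, here is a minimal sketch of the conventional Elo expectation curve. It assumes the usual logistic form with a 400-point scale; the function names and the SCALE constant are illustrative assumptions, and EloGrade's tuned variant may use a different curve.

```python
import math

SCALE = 400.0  # assumed: the conventional Elo scale, not confirmed for EloGrade


def preference_probability(rating_a, rating_b, scale=SCALE):
    """Predicted share of viewers who prefer photo A over photo B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def rating_gap(share, scale=SCALE):
    """Rating difference implied by a preference share (0 < share < 1)."""
    return scale * math.log10(share / (1.0 - share))


# Print the predicted preference share for a few rating gaps above 1500.
for gap in (0, 100, 300):
    print(gap, round(preference_probability(1500 + gap, 1500), 3))
```

Under these assumed constants a 100-point gap gives about 64% and a 300-point gap about 85%; a differently tuned variant would give different figures. The inverse function shows why the distances are meaningful: any observed preference share maps back to exactly one rating gap.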
Granted, it is impossible to play out that scenario in practice. What EloGrade does instead is calculate a statistical approximation to that true rating, treating the comparisons done on the site as a sample of the real world population of people and photos (with all the failings and pitfalls of statistical bias that any sample brings).
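As an illustration of how ratings can be approximated from a sample of comparisons, the sketch below applies the textbook Elo update to a short list of votes. This is the standard method, not EloGrade's tuned variant; the starting value, the scale, the step size K, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

START = 1500.0  # starting rating, matching the 1500 reference point used above
SCALE = 400.0   # assumed conventional Elo scale
K = 32.0        # assumed step size (K-factor); hypothetical, not EloGrade's value


def expected(r_a, r_b):
    """Predicted probability that the photo rated r_a is preferred over r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / SCALE))


def apply_comparisons(comparisons, ratings=None):
    """Update ratings from (winner, loser) pairs, one comparison at a time."""
    ratings = defaultdict(lambda: START, ratings or {})
    for winner, loser in comparisons:
        # The less expected the outcome, the larger the adjustment.
        surprise = 1.0 - expected(ratings[winner], ratings[loser])
        ratings[winner] += K * surprise
        ratings[loser] -= K * surprise
    return dict(ratings)


# Each tuple is one viewer's choice: (preferred photo, other photo).
votes = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A"), ("C", "B")]
print(apply_comparisons(votes))
```

Each vote moves the two photos by equal and opposite amounts, so the total number of rating points stays fixed and only the distances between photos change. With more comparisons, the ratings settle toward values whose gaps predict the observed preference shares.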
Thanks, Professor Arpad Elo
In 1960, Professor Arpad Elo solved a similar problem for the ranking of chess players, using a statistical method that has since become known as the Elo rating system. Variations of the original system have since become widely used for rankings and ladders in sports and video games. In a way, photography is even better suited to the Elo rating system, because the relative positions of photos are more fixed than the skills of chess players, gamers, or athletes.
EloGrade uses a variation of the Elo rating system that was tuned on initial data from Mechanical Turk runs. The algorithms will continue to be tuned for better accuracy. For a detailed description, see the document on rating mechanics.