Methodology of Testing Icons

Prompted by the large-scale usability test of the LibreOffice Writer icons, this article explains how subjective and objective measurements can help us understand the quality of an icon-term relationship.

A picture is worth a thousand words. On top of that, icons have a fixed size, no matter which language the software runs in. This is part of the reason why they are widely used to represent functions or actions, for example in menu bars.

Over the last few weeks we asked users to help us assess the quality of the icons used in the LibreOffice Writer menu bar. We conducted the tests on the Open Usability Platform UserWeave.net. The platform presents a single term together with a selection of icons, and the participant's task is to quickly select the icon that best fits the term.

Combining objective and subjective measurements

Besides recording which icon a participant picks, the platform measures how quickly the choice is made. It therefore provides two distinct indicators:

  1. The subjective choice of an icon
  2. The objective time it takes to choose the icon

Fig. 1: Screenshot of the Icontest

The platform calculates an overall score for each icon-term relationship, ranging from 10 for a perfect icon down to a theoretical minimum of 0 when the icon was never associated with the term.

Further indicators can be derived from the data, such as the number of missing assignments and the ratio of icons that are assigned to more than one term. Values for each of them are listed in the results.
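To make these indicators concrete, here is a minimal sketch of how they could be computed from raw answers. It assumes a hypothetical export format (one record per answer with participant, term, chosen icon or None, and response time in seconds) and a hypothetical mapping from each term to its target icon; the icon IDs are made up. It interprets "ratio of icons assigned to multiple terms" as the share of chosen icons that were picked for more than one term. It does not reproduce UserWeave's overall score, whose formula is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Response:
    """One answer in a hypothetical export format: a participant saw `term`
    and picked `chosen_icon` (None if skipped) after `seconds`."""
    participant: str
    term: str
    chosen_icon: Optional[str]
    seconds: float


def term_indicators(responses, target_icon):
    """Per term: hit rate (subjective choice) and median time of correct
    picks (objective measurement). Not UserWeave's overall score."""
    by_term = defaultdict(list)
    for r in responses:
        by_term[r.term].append(r)
    result = {}
    for term, rs in by_term.items():
        hits = [r for r in rs if r.chosen_icon == target_icon[term]]
        result[term] = {
            "hit_rate": len(hits) / len(rs),
            "median_seconds": median(r.seconds for r in hits) if hits else None,
        }
    return result


def missing_assignments(responses):
    """Number of answers in which no icon was chosen."""
    return sum(1 for r in responses if r.chosen_icon is None)


def multi_assigned_ratio(responses):
    """Share of chosen icons that were picked for more than one term."""
    terms_per_icon = defaultdict(set)
    for r in responses:
        if r.chosen_icon is not None:
            terms_per_icon[r.chosen_icon].add(r.term)
    if not terms_per_icon:
        return 0.0
    multi = sum(1 for terms in terms_per_icon.values() if len(terms) > 1)
    return multi / len(terms_per_icon)


# Hypothetical sample data: the spellcheck icons get mixed up, as in Fig. 2.
responses = [
    Response("p1", "Auto Spellcheck", "icon_autospell", 1.2),
    Response("p2", "Auto Spellcheck", "icon_spelling", 1.0),
    Response("p1", "Spelling and Grammar", "icon_spelling", 1.5),
    Response("p2", "Spelling and Grammar", None, 4.0),
]
targets = {
    "Auto Spellcheck": "icon_autospell",
    "Spelling and Grammar": "icon_spelling",
}

print(term_indicators(responses, targets))
print(missing_assignments(responses))   # 1
print(multi_assigned_ratio(responses))  # 0.5
```

Keeping choice and time as separate numbers makes it easy to spot the pattern described below: an icon can be picked quickly and still be confused with another one.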


Fig. 2: Result for a single term

In the example above, the icon for ‘Auto Spellcheck’ was chosen quickly but was often confused with the icon for ‘Spelling and Grammar’, resulting in a poor overall score of 6.0.

Note that this does not necessarily mean that the icon is unsuitable. The term it is supposed to be associated with may also be problematic. What the test really measures is the strength of the relationship between icon and term.

In the LibreOffice tests, for example, we used as terms the tooltips that appear when you hover the mouse over the icons. This way we test the strength of the relationship between each icon and the term that is actually used to explain it to the user.

Evaluate the quality of your existing icons

There are two ways to test icons. In the first, you present a set of icons together with the corresponding tooltip for each icon, so there is exactly one target icon for every term.

Ideally, every icon-term relationship would score a perfect 10.0. In practice, the results give you a sense of how the quality of your icons varies. As a rule of thumb for this kind of test, good icons score well above 9.0.

Find the best icon for your term

In the second use case, you try to find the best icon for a given term by presenting several alternative icons for it. With this approach, be aware that icons that are similar, e.g. because they use the same metaphor or differ only in colour, will most likely yield similar results.

With the LibreOffice study we have taken the first of these two approaches. We now know how well (some of) the icon-term relationships in LibreOffice work, and we will publish the results in the coming weeks. As a next step, we will use the second approach to find better metaphors for the icons that do not work so well.

Join the discussion!

You are invited to look at the raw results of the studies in the LibreOffice project on UserWeave.net and to discuss them with us in the comments.