Quo vadis, Dolphin? The relation between ISO 9241-110 and the rating of features.

We conducted a large study on the strengths and weaknesses of file managers in May 2013. In this article we present the second part of the statistical analysis, regarding the criteria of ISO 9241-110.

Introduction

ISO 9241-110 defines seven principles for usable interaction design:

  • Suitability (the dialog should be suitable for the user’s task and skill level)
  • Self-descriptiveness (the dialog should make it clear what the user should do next)
  • Controllability (the user should be able to control the pace and sequence of the interaction)
  • Familiarity (the dialog should be consistent with the user’s expectations)
  • Robustness (the dialog should be forgiving)
  • Individualization (the dialog should be able to be customized by users)
  • Learnability (the dialog should support learning)

Applications should be developed based on these principles, which obviously is… challenging. If you decide to push controllability, for instance, it might harm individualization. Of course, all principles are relevant for development, but an important part of the usability engineering process is to prioritize them in relation to the target users and use scenarios. The best, if not the only, valid approach to this prioritization is to interview users.

In our study on file managers we asked users to rate features of, and their satisfaction with, their preferred file manager. We expect the feature set to have an impact on the rating of the specific ISO usability principles. For example, when users confirm that their file manager provides good customization, the rating of the corresponding usability principle, ‘individualization’, should be highly correlated. Additionally, our interest was to explore the room for improvement regarding the different usability criteria, especially for the standard KDE file manager, ‘Dolphin’.

Results

The average ratings shown in figure 1, along with the statistical results in table 1, support previous results: in general, the Microsoft Explorer is rated worst, followed by Nautilus and Thunar. The command line interface receives pretty good values, as do Dolphin, Konqueror, and Krusader. It is remarkable that the command line seems to be treated as least robust (“Are you able to achieve objectives in case of wrong input?”), but this observation has no statistical support, i.e. there is no significant difference between file managers regarding robustness.

Figure 1: Rating on usability criteria of ISO 9241-110 by file managers with at least 20 responses.


Table 1: Statistical results; effect size (eta) is used for cell shading: small effects >= 0.01 (light yellow), medium >= 0.06 (dark yellow), and large >= 0.14 (ocher).

Criterion              F        df    eta      p
Suitability            47.41    588   0.072    <0.001
Self-descriptiveness   18.39    574   0.028    <0.001
Controllability        59.46    537   0.096    <0.001
Familiarity            11.82    574   0.017    <0.001
Robustness             0.62     460   -0.003   0.433
Individualization      231.21   562   0.289    <0.001
Learnability           3.59     401   0.004    0.059
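For readers who want to retrace how such table values arise, eta can be computed from the between- and within-group sums of squares of a one-way ANOVA. The following is a minimal pure-Python sketch using invented ratings, not the survey data (the actual analysis was done with the R scripts linked in the appendix):

```python
# Hypothetical ratings (1-5 scale) of one ISO criterion, grouped by
# file manager -- illustrative data, not the survey's actual numbers.
groups = {
    "Dolphin":  [4, 5, 4, 4, 5, 3],
    "Nautilus": [3, 2, 3, 4, 2, 3],
    "Explorer": [2, 3, 2, 1, 2, 3],
}

def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a dict of groups."""
    all_vals = [x for g in groups.values() for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups.values() for x in g)
    df_b = len(groups) - 1
    df_w = len(all_vals) - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    # Eta squared: share of total variance explained by group membership.
    eta_sq = ss_between / (ss_between + ss_within)
    return f, df_b, df_w, eta_sq

f, df_b, df_w, eta_sq = one_way_anova(groups)
print(f"F({df_b},{df_w}) = {f:.2f}, eta^2 = {eta_sq:.3f}")
```

The shading thresholds in table 1 (0.01, 0.06, 0.14) are the conventional cutoffs for small, medium, and large variance-explained effect sizes.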

Our second analysis compares feature ratings with the usability ratings of an existing application. Because it makes no sense to mix the between-subject variance of different file managers with the within-subject variance of the usability ratings, the following analysis was calculated for Dolphin only.

Figure 2 shows the correlation between feature ratings and usability criteria as a heatmap colored from cyan (low correlation) to magenta (high values). The most outstanding result is the relation between the criterion individualization and the question about customization of the interface (which also yields the most distinct differences between file managers, as shown in figure 1). Additionally, the criterion suitability has more relevance for the rating than other aspects, while learnability has the lowest impact. With respect to Dolphin’s features, data handling, browsing, and customization are more strongly related to usability than the other features.
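Each heatmap cell is a plain Pearson correlation between two rating variables. A small sketch with invented per-respondent ratings (the variable names are illustrative, not the survey’s actual items):

```python
# Hypothetical per-respondent ratings for Dolphin (1-5 scale); purely
# illustrative data, not taken from the survey.
customization     = [5, 4, 5, 3, 4, 5, 2, 4]
individualization = [5, 4, 4, 3, 4, 5, 2, 5]
learnability      = [3, 4, 2, 4, 5, 3, 3, 4]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# A strong feature/criterion link shows up as a high coefficient ...
print(f"{pearson(customization, individualization):.2f}")
# ... while a weakly related criterion stays much closer to zero.
print(f"{pearson(customization, learnability):.2f}")
```

In the actual heatmap, one such coefficient is computed for every feature/criterion pair and mapped onto the cyan-to-magenta color scale.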

Figure 2: Heatmap of correlations between rating of features and rating on ISO criteria (only Dolphin data is included).


Discussion

While the usability principles are reasonable for experts and, with some reflection, the questions can be answered for existing dialogs, the average user seems to face difficulties. We received a lot of complaints about the usability-related questions while conducting the survey. Although the seven ISO principles are widely accepted, it is not easy to answer the corresponding questions. This is especially reflected in the criterion learnability, which has high variance and low correlations, but in most of the other variables too. So, what could be meant by the principles?

Suitability
The dialog should suit the user’s tasks and skills, and support the user without unnecessary strain imposed by attributes of the dialog system. This could be (and apparently was) understood in terms of feature richness: if all the core requirements are fulfilled, the dialog suits the users’ needs. Unnecessary prompts or confirmations (‘Do you really want to exit?’) may cut back suitability. For example, commands on the shell are highly specialized and therefore perfectly suit the (limited) requirements.

Self-descriptiveness
Each single step of processing has to be directly intelligible, and the user should always be informed about the scope of performance. Self-descriptiveness should therefore be highly related to familiarity. The use of common controls, standard shortcuts, or introductory text supports comprehension and facilitates self-descriptiveness. The CLI is one example of low self-descriptiveness: it cannot be learned without man pages.

Controllability
The dialog should be managed by the user; he or she controls the pace and sequence of the interaction. High controllability is reached by unconstrained input, by avoiding wizards, or by means of a sophisticated undo feature; all of these contrast with robustness (see below). Because complex tasks are split into single, simple operations, usually with richer parametrization, the controllability of the CLI is hard to surpass compared to GUIs.

Familiarity
The dialog has to conform to the user’s expectations arising from experience with previous workflows or from user training. Originally, familiarity was meant as congruence with handling in everyday life. But once software takes on a life of its own, it is often rated as familiar when it conforms with legacy products; big changes are not welcome. Unix commands haven’t changed in years, so they are very familiar to experts. On the other hand, people used to operating GUIs might be confused and give low ratings to the CLI’s familiarity.

Robustness
The intended result should be reached with no or only minimal correction in case of faulty entries. Constrained input, confirmation dialogs, and elaborate exception handling lead to robust dialogs. Faulty input on the shell is neither checked nor handled by default, leading to low values of robustness against user errors.

Individualization
The dialog should be adaptable to the individual needs and preferences of users. Individualization is provided by any means of configuration, from switching features on and off, through modification of the background, colors, or other design aspects, to complex interaction patterns such as scripts. The CLI provides parametrization not only on a functional level but to some extent also for appearance, and should hence be rated high.

Learnability
The dialog itself should support learning; the user has to be able to operate the system without reading extensive documentation. Learnability might be attributed to the presence of tooltips, the availability of some kind of novice vs. expert mode, or a preference for wizard-style control over free interaction. Obviously, there is no inherent support for learning how to use CLI tools, which leads to low values.

Conclusion

This list is just one idea of how to interpret the norm; at the very least, the exemplary rating of the CLI might be seen differently by others. In conclusion, users should not be asked directly about usability. Rather, an extensive list of aspects that denote the principles is required. This will increase the time needed to answer a survey significantly (which we wanted to avoid with the plain questions). Psychologists put much effort into the construction of such questionnaires; one of the best-known concepts is the ‘Big Five’. But in that case the factors were derived from a huge stock of items by sophisticated statistical methods. In contrast, the seven principles are predefined, and arranging a questionnaire post hoc is methodologically dubious. We will try to introduce a better solution in our next surveys, based on scenarios, tasks, and goals.

Regarding Dolphin, the results are pleasing: all usability principles are met, users are quite satisfied with the tool, and its rating outperforms that of similar tools. Looking at the features in detail, searching for files, filtering of data, previewing media, setting properties (but not viewing them), and connectivity receive more heterogeneous replies, which could be interpreted as room for future improvement.

In our next blog post we will report results on who uses which file manager. So stay tuned.

Appendices

If you want to follow our analysis step-by-step you can download the raw data (QuoVadis Results.tar.gz) as well as R scripts (QuoVadis R-Scripts2.tar.gz).

And if you want to discuss the results in person: