WHY (and how) I BELIEVE in BELIV >10

It works!

It's a bit of a generalization, but early information visualization was very much focussed on systems. We tended to show that the new and exciting techniques we were able to produce had validity by demonstrating that the systems in which they were implemented worked. There was thus a focus on testing systems with people against specified tasks. In this context, in which systems were:

  • well defined;
  • the focus of our activity;
  • and the best way to validate our contributions;

... metrics that captured errors and efficiency seemed adequate.

It's complicated?

But things have moved on. As tasks have become more specialized, less well defined and more complex, the techniques that we use, the systems in which we implement them and the ways in which those systems evolve through the design process have all become more sophisticated. This has been in no small part due to the BELIV conference and all of the thinking and activity that is associated with its broadening community. The community strives to establish ways in which we can advance our knowledge by testing, designing, developing and using visualization techniques and systems.

It's improving!

We seem to be making some headway - broadening the ways in which we evaluate and understand visualization, asking difficult questions and developing approaches that may help us answer them. In his excellent and inspiring keynote at BELIV 2016, Enrico Bertini reminded us that great studies generate great questions - not necessarily great answers.

How do we know that?

This is why the BELIV meeting in Baltimore this autumn was so excellent, and why I hope it will be influential. There was plenty of good debate, and many important questions were raised. Sure, the presentations were so short that a good number of the nuances and important ideas and findings contained within the papers were missed. And the panel discussions didn't work too well, because the panelists didn't know each other's work sufficiently well and that work was often only loosely related. But the discussion was exciting - focusing broadly on methodology (there are some tough issues here), due in no small part to the star turn provided by @eager_eyes, whose @gentle_grilling (Kosara, 2016) asked us calmly, but importantly, "how do we know that?".

How (much, and why) Do we BELIV that?

Or to rephrase this slightly, in a way that is less snappy but that frames the question more directly in the context both of the workshop itself and of the direction in which I would like to see it continue to move as a forum for methodological discussion in visualization research:
"to what extent do the approaches used in this work allow us to BELIV the findings?"

Trust me, it's all there in the paper

Back in the day, great ideas in InfoVis were presented with authority, supported by strong arguments and implemented in working systems, but it's fair to say that in many cases their presentation tended to be short on empirical evidence. These contributions were often influential, and remain important, but readers were asked to BELIV in the ideas that were communicated largely through trust. I was lucky enough to chair the committee that made the InfoVis Test-of-Time awards this year. This required a group of us to re-read papers from the 1995 and 1996 InfoVis symposia. There were some fantastic, influential ideas that informed practice and opened up new areas for research. Many of these remain important to our discipline and most of the papers are well worth reading again. Really, it's worth checking these papers through to see both what was done and how it was reported. The awards were deservedly won by Wise et al. (1995) for Galaxies and Themescapes and by Roth et al. (1996) for Visage. Neither paper mentions evaluation. Have a look.

A different kind of evidence ...

These are influential and important papers, and yet the style is very different to the way in which we write today. The lack of empirical evidence is palpable. And it's not just these guys who were producing award-winning research - it's also true of the papers I was writing during the late 1990s on Exploring Spatial Data Representation with Dynamic Graphics (Dykes, 1997) and Cartographic Visualization (Dykes, 1998). The style and expectations were different back then.

Do I BELIV this?

But this question - "to what extent do the approaches used in this work allow us to BELIV the findings?" - is absolutely fundamental to visualization research. And the kinds of skeptical, critical reviewers and readers and listeners and bloggers and researchers and designers that Robert Kosara calls for in his excellent BELIV paper (Kosara, 2016) should be asking it continually - as well as the other key question: "How important is this?" And yet, as the community evaluates the research that is produced, review forms rarely ask "Do you BELIV the findings?".

Asking these questions routinely and with some skepticism is crucial to the integrity of our discipline as well as the dynamism and utility of our conferences. And it's important to others way beyond BELIV, Baltimore or InfoVis as we address the aim that Enrico expressed so eloquently in his inspiring keynote - to "develop research that is useful and used".

  • Do I BELIV this?
    • Why?
    • When?
    • How much?
  • Is it important?

How is knowledge acquired?

Fuelled by the various ideas presented during the course of the workshop, the final session at BELIV involved participants embarking upon an exciting debate about epistemology. This focus on the validity and scope of knowledge, the methods through which it is reliably acquired, and the relationships and differences between justified belief and opinion continued very usefully into the panels at IEEE VIS several days later. Important open debate on Improving Empirical Research, Application Papers and Pathways for Theoretical Advances seemed to me to be heavily informed by the energy, thinking and level of skeptical reflection that had occurred at BELIV.

A broad and developing perspective on evaluation ...

That's why I think it's essential that when considering evaluation, writing papers for BELIV, or deciding the future of the workshop - whether through the Call for Papers and other formal descriptions of expectation or the less official, more personal ways in which we as a community of authors, reviewers and participants engage with the workshop and develop its content and culture - we must resist the strong temptation to implicitly or explicitly preface the term evaluation with ...

  • summative, or
  • experimental, or
  • controlled, or
  • quantitative.

That was what evaluation meant, but, thanks largely to a decade of BELIV, we seem to be past that now. The term evaluate means 'associate a value with'. Values might be numeric, but they can also be personal or emotional, involve different combinations of qualitative and quantitative information, and be established in many ways. Informative evaluation can occur throughout the design process, and some of our most revealing experiences do not occur in controlled laboratory conditions.

Ultimately we associate values with our various outputs and experiences to give others an indication of their utility, and we must do so in ways that enable users of research to answer the question:
"to what extent do the approaches used in this work allow us to BELIV the findings?"

We require reliable ways to evaluate that fit our needs, and so must think as broadly about evaluation as those involved in the discussions on methodology and epistemology at BELIV 2016. Developing effective and appropriate means of giving those who use research the confidence to BELIV in research contributions requires us to establish methods that produce BELIVable results. We need to be clear about how much we know, how we know it and why this is the case.

In an off-the-cuff remark Enrico claimed that ... "Any approach is fine as long as it is rigorous and well done ..."

This inclusive perspective is helpful. But for research to be reliable in ways that allow us to trust the results we must take this a little further. As Ana Maria Crisan said during a BELIV panel, the key question is ... "What's the right kind of evaluation for what you want to do?"

Evaluating information visualizations ...

There are some valuable and accessible ideas on all of this in Sheelagh Carpendale's excellent piece on "Evaluating Information Visualizations" (Carpendale, 2008). It's an informative read, a well-known chapter that seems to me to have stood the test of a time when things move quickly, and it is highly recommended. In the chapter Sheelagh outlines a number of challenges, benefits and risks associated with the trade-off between generalizability, precision and realism. She calls for "a broader appreciation of the variety of and importance of many different types of empirical methodologies" and describes and discusses a range of techniques that can help those evaluating information visualization to choose the right kind of evaluation for what they want to do. And, hopefully, to do it in ways that are sufficiently reliable to produce results that are BELIVable.

Sheelagh also offers a series of "important factors" to consider when publishing research. I would extend these to the evaluation of research - by reviewers, for instance, but also by others in the community who use research findings - and recommend that these issues be considered by those planning research. This will help us answer Robert's question: "how do we know that?".

To paraphrase, and with my notes in [brackets], we should be asking ...

  1. Is an empirical methodology sensitively chosen?
    It should be a good fit to the research question, the situation and the research goals;
  2. Is the study conducted with appropriate rigor?
    Different methodologies have different requirements and these should be followed.
    Trying to apply the rigor used in one methodology to another is not appropriate.
    [See below on reproducibility]
  3. Are sufficient details published so that the reader can fully understand the processes?
    If appropriate, can the study be reproduced?
    [Note 'if appropriate'. Reproducibility is a means of establishing reliability in some (controlled) contexts and not others (which are less controlled)]
  4. Are the claims that are made appropriate given the strengths of the chosen methodology?
    For instance, if a given methodology does not generalize well, then generalizations should not be drawn from the results.

As we evaluate information visualization, we need reliable and usable knowledge and these important factors may help us achieve this. At the end of her piece, Sheelagh calls for "thoughtful application of a greater variety of evaluative research methodologies in information visualization".

A pre-conference methods festival

The evidence after 10 years of BELIV is that this seems to be happening, and the workshop is a great means of achieving it. (It would have been better still had it not clashed with Sheelagh's tutorial on Qualitative Methods.) But, given our collective history of dependence on evaluation that is based on metrics, errors and efficiency, and the way in which we have learned to use and critique this form of evaluation to achieve reliable results, we must be particularly wary of "trying to apply the [means through which we achieve] rigor used in one methodology to another". What we know about these particular approaches that enable us to establish the extent to which we BELIV the findings is not applicable to other valid and reliable ways of associating values with systems, techniques and experiences. And as a community, we need to know more about this.

BELIV seems to me to be just the forum for the discussion of these issues and the development of appropriate methods and community knowledge in VIS. As an IEEE VIS workshop it happens at precisely the right time to influence the community. What better way to address methodological and epistemological issues in a discipline than by getting researchers together to discuss them one day before their major annual conference - at a pre-conference methods festival!

And the current focus of BELIV allows us to do this, as long as we remember to ignore the legacy temptation to preface the term "evaluation" with summative, or experimental, or controlled, or quantitative when we see or hear or think it. This is tough, and it's important to do it when we are considering, writing, discussing and evaluating BELIV contributions. My BELIV 2016 co-authors and I had to routinely remind ourselves that "this really is about evaluation" as we developed ideas and wrote our paper on Action Design Research and Visualization Design (McCurdy et al., 2016), which supports continuous approaches to evaluation in design research throughout the design process. This is neither summative, nor quantitative, nor controlled, nor experimental.

That's how I know that

I think we need to do this in defining the workshop too - to take advantage of, and indeed actively encourage, the kind of debate that was stimulated by the reflection and forward thinking that occurred at the decennial BELIV meeting. The 2016 BELIV Call for Papers still harks back a little to the kind of SYSTEM evaluation that was the focus of InfoVis research when the conference began. We have moved beyond time and errors, but we have also moved beyond system evaluation and "metrics". The workshop seems to me to already be scanning and marking out the broad landscape of epistemology as we try to establish reliable ways of doing visualization research that produces results that the consumers of research are prepared to BELIV - "That's how I know that".

A call for papers for BELIV >10

I think we can embrace this change explicitly, and so, to contribute to the debate about the future of the conference, and in the hope that colleagues will have the energy to organise and contribute to another workshop, I suggest a few minor changes to the CfP for BELIV>10.

We invite contributions to BELIV 2018, the international workshop on evaluation in visualization.

Established scientific methods like controlled experiments and standard metrics like time and error continue to be the workhorse of evaluation in visualization. Yet, there is a growing need for the visualization community to develop dedicated and reliable approaches and metrics for evaluation at all stages of visualization design and development that address specific needs in visualization.

The goal of the workshop is to continue the discussion and spread the word as we develop, adopt, adapt and broaden the range of evaluation methods and methodologies for visualization, to establish reliable means through which we can produce research findings in which we and the communities we serve can BELIV.

I hope that this slightly modified call rejects the systems legacy and broadens the methodological landscape in ways that reflect the discussion in Baltimore. This would give us the opportunity to develop what Laura McNamara described as "epistemic cultures" in ways that encourage us, as a community, to learn about and apply appropriate methods for establishing and interpreting knowledge. This would enable us, and those who use our knowledge, to both know more and BELIV more realistically in what we know.

Developing epistemic culture

The ongoing readjustment of quantitative data analysis to establish reliable approaches in light of the NHST (null hypothesis significance testing) controversy is one example. In public discussion with Laura at BELIV, Matthew Kay identified his hugely appreciated efforts to improve quantitative methodology in our discipline in light of this thinking as a concerted effort to develop "epistemic culture". This issue was the focus of Pierre Dragicevic's informative and influential keynote at BELIV 2014 - 'Bad Stats are Miscommunicated Stats'. These kinds of contributions (e.g. Kay & Heer, 2016) seem highly likely to make visualization research more reliable.
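To give a concrete flavour of what this readjustment can look like in practice, here is a minimal sketch of estimation-style reporting - my own illustration, not code from any of the work cited, with invented data and condition names: rather than reducing a comparison of two hypothetical visualization conditions to a lone p-value, we report the effect size together with a bootstrap confidence interval.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical task-completion times (seconds) for two made-up visualization conditions.
times_a = np.array([12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1])
times_b = np.array([10.2, 11.9, 10.8, 12.4, 11.1, 10.5, 12.0, 11.4])

def bootstrap_mean_diff_ci(a, b, n_boot=10_000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the difference in means (a - b)."""
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each condition with replacement and record the mean difference.
        diffs[i] = (rng.choice(a, size=a.size, replace=True).mean()
                    - rng.choice(b, size=b.size, replace=True).mean())
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return a.mean() - b.mean(), lo, hi

diff, lo, hi = bootstrap_mean_diff_ci(times_a, times_b)
print(f"Condition A was slower by {diff:.1f}s on average "
      f"(95% bootstrap CI: {lo:.1f}s to {hi:.1f}s)")
```

Reporting the interval, rather than a binary significant/not-significant verdict, is one small practical expression of the kind of epistemic culture discussed above.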

BELIV - legitimize, enhance and fuel debate on methods

After an impressive and influential decade of BELIV activity, I hope that moving BELIV>10 explicitly in the direction that proved so interesting at IEEE VIS 2016 might legitimize, enhance and fuel the excellent debate we had in Baltimore. Considering the broader context of visualization research, and the appropriateness and reliability of the methods that we employ to allow us to BELIV the findings, will help the community mature and should result in better research and better knowledge. It may move us further from a legacy view of evaluation and a focus on summative system evaluation as the primary means of generating knowledge about visualization.

I look forward to more of this discussion and think that the BELIV workshop has a key role to play in facilitating and communicating it. BELIV could, and perhaps should, be the venue that explores what it means to evaluate VIS research contributions as we skeptically ask:
"to what extent do the approaches used in visualization research allow us to BELIV the findings?"

ACRONYM

Of course, the ways of adapting the BELIV acronym in light of this change are manifold - particularly if one is as flexible as those who came up with the original. Equally predictably, none of them works perfectly.

Some of the less lame candidates include ...

  • BELIVable Evaluation - Legitimizing Information Visualization
    I am only half serious here
  • Beyond Evaluation - Learning about Information Visualization
  • Beyond Evaluation - methodoLogy in Information Visualization

... you can probably do significantly better.

But actually, why not just stick with BELIV - as a reminder of the workshop's great heritage and the expanding remit it has established. And as a statement of the continuing focus on the community's efforts to generate knowledge in ways that give us confidence that we can BELIV the outputs.

A methods workshop in which the focus is on establishing research findings in which we BELIV?

Acknowledgments

These are personal views, but have been developed and indeed inspired by the work, discussion and culture established at the three BELIV workshops I have attended. So thanks to the organisers and participants.

Many of the ideas are impossibly intertwined with the complex but developing perspective on Design Study Research that my colleagues and BELIV co-authors from the University of Utah - Nina McCurdy and Miriah Meyer - and I have been working on. Views developed in discussion with Nina and Miriah have greatly influenced my perspective on visualization methods and BELIV has also contributed to our ideas.

giCentre colleagues at City, University of London, have also been the source of discussion that has informed my views as we have developed our research using a range of qualitative and quantitative methods.

I take the blame for any inconsistency or misrepresentation - I hope I have avoided both and have made best efforts to do so. Please shout if I have messed something up.

References and Links

Carpendale, S. (2008). Evaluating information visualizations. In Information Visualization (pp. 19-45). Springer Berlin Heidelberg.

Dykes, J. A. (1997). Exploring spatial data representation with dynamic graphics. Computers & Geosciences, 23(4), 345-370.

Dykes, J. (1998). Cartographic visualization: exploratory spatial data analysis with local indicators of spatial association using Tcl/Tk and cdv. The Statistician, 485-497.

Kay, M., & Heer, J. (2016). Beyond Weber's Law: A Second Look at Ranking Visualizations of Correlation. IEEE Transactions on Visualization and Computer Graphics, 22(1), 469-478.

Kosara, R. (2016). An Empire Built On Sand: Reexamining What We Think We Know About Visualization. In Proceedings of the Beyond Time and Errors: Novel Evaluation Methods for Visualization (BELIV '16) Workshop (pp. 162-168). ACM.

McCurdy, N., Dykes, J., & Meyer, M. (2016, October). Action Design Research and Visualization Design. In Proceedings of the Beyond Time and Errors: Novel Evaluation Methods for Visualization (BELIV '16) Workshop (pp. 10-18). ACM.

Roth, S. F., Lucas, P., Senn, J. A., Gomberg, C. C., Burks, M. B., Stroffolino, P. J., ... & Dunmire, C. (1996, October). Visage: a user interface environment for exploring information. In Proceedings of the IEEE Symposium on Information Visualization '96 (pp. 3-12). IEEE.

Wise, J. A., Thomas, J. J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., & Crow, V. (1995, October). Visualizing the non-visual: spatial analysis and interaction with information from text documents. In Proceedings of the IEEE Symposium on Information Visualization 1995 (pp. 51-58). IEEE.