Prof. dr. Alexandru C. Telea
Department of Information and Computing Sciences
Faculty of Science, Utrecht University, the Netherlands
Data visualization has become an indispensable instrument in the toolset of the modern data scientist. In the scientific discovery process, visualization serves various goals and addresses various tasks, ranging from forming, refining, and validating hypotheses to presenting scientific results to both specialized and general audiences. In the last decade, many new visualization techniques and tools have emerged in the scientific arena. At the same time, the diversity of data (in terms of provenance, size, and heterogeneity) and of the questions to be answered has increased considerably. This makes it challenging for practitioners to choose the optimal combination of tools and techniques for the problem at hand.
This course approaches data visualization from a practical perspective. We start by discussing the different types of data involved in the visualization process, outlining the particular challenges they pose to presentation and visual exploration. Next, we present visualization techniques that address these data types and discuss the advantages, limitations, and way of use of each. Finally, we discuss the overall process of designing effective visualizations, which entails mapping the exploration questions at hand to suitable techniques and choosing optimal parameter values for these techniques. We illustrate the end-to-end visualization design process with several examples involving real-world datasets and use cases from both science and industry.
After completing this course, participants should be able to complete the following tasks and answer the following questions:
- Understand the visualization-related challenges and constraints imposed by the data they work with (size, dimensionality, heterogeneity, attribute types, and time dependence): What kind of data do I work with?
- Understand the different types of visualization techniques (scientific visualization, information visualization, infographics) and their applicability to specific use cases: Which visualization subfield best addresses my problem?
- Choose a suitable family of visualization techniques that fits their data and exploration questions well (charts, table lenses, parallel coordinates, scatterplots, projections, timelines, treemaps): What technique should I use for my problem?
- Parameterize the chosen visualization techniques to account for the constraints implied by their data and exploration tasks (color mapping, axis encoding, clutter reduction, usage of linked views, annotations, graphical presentation): How should I set all the parameters of the visualization technique I chose to get the best results?
- Choose a suitable visualization design for the communication or exploration task at hand (data exploration, scientific presentation, dissemination to non-specialists, dissemination to the general public): How should I structure the entire visual presentation of my data/results to optimally convey my message?
- Be familiar with a number of generic open-source visualization toolsets that implement common visualization techniques: Where can I find easy-to-use software to create my visualizations?
Participants should have a general background in science and should have worked in a data-intensive research context involving the collection, analysis, and presentation of scientific datasets. A background in statistics and/or data science is an advantage but is not mandatory.