Automatic Generation of Visual Presentations for Software Understanding
Abstract
It is commonly assumed that visual presentations can facilitate the understanding of software. However, in order for a presentation to be effective, it should be designed to take into account the information needs and background of the person viewing the presentation. Yet it is impractical to manually create and maintain different presentations for different classes of users, and the principles underlying the design of such presentations need to be better understood. This paper describes work on PESCE [Presentation Engine for Software Comprehension and Explanation], a system that addresses the problem of automatically generating visual explanations of software. The system uses a model of what the user knows about the system, the user's task, and a set of visualization rules to build consistent visual presentations about software objects. We illustrate how PESCE works, showing the interaction between visual rules, constraints, and presentation methods in a software explanation example.
Keywords
software visualization, software understanding, knowledge-based user interfaces, automated graphics generation, computer-human interaction
The importance of visual representations in understanding complex systems has been well established [24]. In the particular case of software systems, conceptual visualization becomes critical due to the absence of physical parts. As such systems grow in complexity, textual explanations get more difficult to understand; here is where graphical representations prove their worth.
Static, predefined diagrams have been used as documentation for complex software systems. The dynamic nature of software, and the different characteristics of users trying to perform software understanding conspire to lessen the utility of static diagrams. Dynamically generated presentations are therefore highly desirable.
The task of generating presentations can roughly be broken down into two steps: select what information to show (content selection), and how to show it (presentation generation) [25]. In this paper, we will focus on the presentation generation problem.
Dynamically generating presentations for software artifacts represents a difficult challenge. First of all, software artifacts are complex objects of arbitrary dimension [18], and software understanding requires the ability to understand these objects from different views and the ability to map between these views (multiple dimensionality) [12, 16]. Furthermore, any selected information about these artifacts needs to be tailored to fit different user levels of expertise and tasks [10]. The visual component adds extra complexity to the problem since the mechanisms for conceptual comprehension of graphical depictions are not well understood [21]. In addition, visually displaying the information introduces extra implementation constraints due to the limited amount of graphical resources available at any given time[11].
To address these problems, we have identified certain key components to automatically generate visual explanations of software systems [1]. These are:
In order to generate a presentation for a particular user, the problem has been divided in two main stages:
Spatial layout of the presentation (the diagram itself)
This can be treated as a constraint-satisfaction problem; diagrams are constructed and combined constrained by diagram generation rules, visual heuristics, and knowledge about the user.
Temporal layout (the animation and diagram view transitions)
The resulting diagram is reformatted to show the information relevant to the user in a series of steps in correct logical order (a presentation). This is a planning problem that needs to comply with pedagogical prerequisites and user model restrictions, to achieve a series of communicative goals.
The integration of these stages is not trivial, since changes in the temporal layout may involve modification of the solution provided in the spatial layout stage, therefore requiring a reformulation of the constraint satisfaction problem.
We are currently working on PESCE (Presentation Engine for Software Comprehension and Explanation) to provide a solution for the presentation generation problem. PESCE is a component of MediaDoc [7], a software engineering tool being developed at ISI that uses both textual and graphical presentations for software explanation. MediaDoc is a continuation of I-Doc [10], a previous project at ISI involving the automatic generation of user-tailored, text-based explanations of complex software systems. I-Docs hypertext presentations were appropriate for user queries, but lacked the power to clearly show high-level interactions between software objects.
Several systems have been developed that address the problem of automatically generating visual presentations. Some of these systems deal with the visualization of mainly quantitative information in the form of tables, graphs, etc. (e.g. SAGE [13], BOZ [6]). These scientific visualization systems do not provide the adequate techniques to represent abstract relationships between concepts, a feature that proves critical in software understanding.
Other systems generate planned multimodal presentations from some underlying representation (e.g. WIP [3, 26], COMET [8]). These systems have been successfully used to generate instructions for technical devices, a task that is similar to the explanation of software artifacts. On the other hand, they have not been used in the domain of software engineering, and it is not clear that they could provide the multiple integrated views needed for a clear understanding of conceptual relations between software objects.
Staskos work to visualize program execution through automated animations provides tools form understanding and debugging programs (LENS[15], GROOVE[9]). Nonetheless, the presentations generated are designed towards program debugging; their focus has not been to provide high-level, abstract visualization of the different components of a complex software system.
Another example is the RIGI system [16], a visual software understanding tool that provides different conceptual views of complex software systems. It does not, however, allow different conceptual views of a complex system to be shown at the same time and, more importantly, does not provide the display over time of multiple diagrams and animated presentations.
Zhou and Feiners IMPROVISE [27, 28] is a knowledge-based system that can automatically generate coherent visual discourse using a top-down, hierarchical-decomposition, partial-order planer. Their approach seems to be suitable to be applied to the software understanding problem, even though they have not tried such an application.
More information about related work in the field of software visualization can be found in [17, 19]. For a more formal discussion on knowledge representation in multimedia presentations see [4].
PESCE implements visual realizations of textual explanations generated by MediaDoc (see figure 1). MediaDoc is a tool for user-tailored software explanation; its main goal is the automated generation of multimedia animated presentations for the visualization of software systems by different users.
The main components of PESCE are a repository of visualization rules for software objects and relationships, a presentation engine that applies those rules to generate visual directives to display some given information about a software system, and a diagram generator that realizes those directives on the users screen. Presentation heuristics are used to make color and shape coding coherent with the users intended focus of attention throughout the presentation, since choosing the right visual presentation method(s) is key in conveying useful and precise information to the user and in avoiding sidetracks that may add confusion.

Figure 1: The MediaDoc Architecture
3.1. The Presentation Engine
The main component of PESCE is the presentation engine; it is implemented as a Perl script. This module receives relevant content information from MediaDocs explanation engine in response to a particular user query. That information is in a SGML-like format (figure 4) that can be easily parsed into individual objects and relationships; it also has the advantage of making it easier to interface PESCE to other software engineering systems besides MediaDoc.
From the information given, PESCE builds a data structure that will be searched to solve the presentation problem, i.e. how to coherently present that information. For each object and each relationship, an element is added to the structure containing descriptive information about the object or relationship and a link to the list of rules that may be used to visualize it. At the same time, each visualization rule points to a particular presentation method, and a series of visual constraints inherent to the method. These constraints may involve style, and may also be spatial (size, position), or temporal (order, duration).
Besides constraints and rules linked to elements on this data structure, PESCE relies on several global constraints. In particular, MediaDocs user model is accessed to generate the appropriate global constraints to properly tailor a visual presentation to the current user (e.g., some details are hidden from novice users and highlighted to expert users, color-based coding is avoided when generating diagrams for color-blind people, etc.). The information from the user model is represented in an SGML-like format; this feature gives PESCE the potential to be easily interfaced to different user models outside of MediaDoc.
Once this linked data structure is completed, it has to be traversed to generate the visualization of all its components, paying special attention to their constraints. This is done through a forward-chaining mechanism. Every time we get to an element where all its visualization rules lead to conflicts with existing constraints, we have to backtrack to the nearest element setting a conflicting constraint, and try an alternate rule. If the constraints being violated are global, then there is no solution for the inconsistency, and some of the constraints may have to be dropped, or the presentation will not be realizable. To choose what constraint should be discarded, each constraint has an associated importance value from 0 (irrelevant constraint) to 1 (mandatory constraint); the conflicting constraint with the least importance is dropped.
Once all the elements in the structure are realized successfully, the resulting set of visual directives is specially formatted and sent to the diagram generator for graphical display (see section 3.3).
In choosing a traversal sequence, a heuristic is needed to try to minimize backtracking. We have chosen a heuristic based on the complexity of the data elements and their connections; in our experience, more densely connected objects will lead to more spatial constraints, which are difficult to resolve once other constraints related to several simpler objects have been instantiated. By instantiating the more complex objects first, we reduce the level of backtracking up to the simpler ones. Also, when choosing a rule for a specific data element in the structure, a flexibility value from the individual constraints is used.
3.2. The Visualization Rules
Visual rules are used by the presentation engine to generate graphical representations of an object or relationship (figure 2). Each rule has 3 main components:
It is important to note that the rule/graphical method link allows us to show the abstract type of relationship/object for any particular screen object and the presentation method used to visualize it. This is useful if the user requests clarification about why some graphical information on the screen it is laid out in a particular way.
Rules and constraints have values related to them to help in the traversal and backtracking process. Rules have preference values associated to them to guide the forward chaining; they are used to choose what rule to try first for a given object or relationship, and range in value from 0 (never choose) to 1 (always choose). Constraints have an associated importance value to select the least important constraint to discard; it also ranges from 0 (irrelevant constraint) to 1 (mandatory constraint). The conflicting constraint with the least importance is dropped.
In the figure 2 example, we can distinguish 2 rules to be applied when the software artifact to be displayed is a relationship of type Part-of: show the components in a nested fashion (a inside b), or show them at different levels in a hierarchical diagram (b above a). The nested presentation will be usually selected first since the preference value for the rule is higher (0.8) than the one for the top-down approach (0.5). For the nested presentation, there are location constraints that are mandatory (a inside b - importance = 1.0), and a style constraint that is important, but not mandatory (color of a is different than color of b - importance = 0.7). For the top-down hierarchical diagram, only one of the 4 constraints is mandatory (location b is above a).

Figure 2: PESCEs internal representation of the Part-of relationship and two of its visualization rules
To define the rules for visual presentation, we investigated what characteristics make a visual presentation clear and useful; this is a difficult task since visual representation has been traditionally more of an art than a science. Nevertheless, we have been testing different rules for proper presentation generation based on the work of Tufte [22, 23, 24], Albers [2], Bertin [5], and Marks [20]. PESCEs current repository only contains a few rules, but we are working to incorporate more in the future. We are building a taxonomy for the application of presentation methods, both the ones already implemented in PESCE (MAP), and others that will need to be added in the future.
3.3. The Diagram Generator
The Diagram Generator, or Diagen, is a Java applet that represents objects and relationships through a graphical layout on a web page. Diagen is used to provide a graphical element to MediaDoc through PESCE, but has also been interfaced to several other packages.
The diagram is generated from an SGML-like description (MAP Markup language for Authoring Presentations-) provided by the applet server's machine. The applet requests the file from the server (PESCE in MediaDoc) and interprets the MAP description to create and display the diagram at the client site (see figure 3 for an example output screen). Foreground objects and actions can be specified on top of the graph to create time-sequenced animations; the user can interactively control these animations. For a complete description of the MAP specification see http://www.isi.edu/isd/I-DOC/diagen/map.html..

Figure 3: Snapshot of screen generated by Diagen
4. An Annotated Example
A simple example will help visualize the way PESCE works. The input corresponds to a partial, oversimplified high-level description of ModSAF. ModSAF (Modular Semi-Autonomous Forces) is part of a distributed simulation system for military training; its complexity (over 250 libraries and 1.5 million lines of code) makes it a good test for the usability of a software understanding system.
As a first step, MediaDocs explanation engine selects information about ModSAF to be shown to the user (part of it is shown in figure 4), and sends it to PESCE. The presentation engine parses the information about software objects and their relationships; a data structure is generated and linked for every object and relationship found (figure 5).
<OBJECT>SOFTWARE <TYPE> class </TYPE> <DESCRIPTION> Object is a software artifact </DESCRIPTION> </OBJECT> <SOFTWARE> ModSAF <DESCRIPTION> Modular Semi-Automated Forces </DESCRIPTION> </SOFTWARE> <SOFTWARE> SAFStation <DESCRIPTION> ModSAFs User Front End </DESCRIPTION> </SOFTWARE> <SOFTWARE> SAFSim <DESCRIPTION> ModSAFs Simulation Back End </DESCRIPTION> </SOFTWARE> <RELATIONSHIP> PART-OF <ATTRIBUTE> MORPHISM = 1-to-n </ATTRIBUTE> </RELATIONSHIP> <PART-OF> ModSAF <RANGE> SAFSim, SAFStation </RANGE> </PART-OF> |
|
Figure 4: Example content information to be visualized (partial) |
Once the internal representation of the presentation information is constructed, PESCE begins its traversal. In this example, the 7-children part-of relationship node is selected since it is the most complex one (figure 6). For the part-of relationship, the rule "nested layout" is instantiated. The intrinsic constraints for that rule (see figure 2) are checked and, since no conflicts exist, they are added to the active set of constraints. The presentation method is also added to a list of active methods.
Figure 5: Internal representation for the textual information in the example
When the next node is visited, the rule selection and verification of constraints is performed, including backtracking if conflicts arise. If backtracking is needed, the corresponding constraints and presentation methods are eliminated from the current list.
Figure 6: First step of the traversal; the most complex node is chosen. Note the node type and presentation method chosen at top right (the darkest node is the current one)
The rest of the nodes are visited in turn, repeating the verification of constraints and backtracking if conflicts exist, until all elements in the structure are instantiated (figure 7).

Figure 7: Last step of the traversal (instantiation of the "Creates" relationship note clear nodes that have been visited already)
As a last step, the presentation methods in the final list are performed, yielding a set of visual directives that is sent to Diagen to generate the visual presentation on the screen (figure 8).
In this example, we have shown an example entailing simple spatial rules to describe the functionality of PESCE. While the current PESCE implementation also uses temporal rules and constraints, we think including examples with temporal rules would add too much complexity to our description.
Figure 8: Final presentation generated for the example input data
5. Future Work and Conclusions
We are building a system that automatically generates a series of visual representations to form a coherent explanation about the components of a software system and their underlying relationships. For that purpose, we are integrating into our system a user model and a diagram visualization tool, both developed for MediaDoc, a set of visual rules derived from current literature, and an algorithm to instantiate these rules and check their inherent constraints. We have been testing several examples of visual presentations in response of user queries about a software system.
Currently, we are working on defining a larger, more general set of visual rules to address a wider range of software visualization cases, and on testing different heuristics to efficiently instantiate those presentation rules and their corresponding constraints for every object and relationship. A first working version of PESCE will be demonstrated at the EDCS 1998 meeting in Baltimore.
Acknowledgements
The research described in this paper is sponsored by DARPA under DARPA order number D880.
References
[1] R. Adobbati, Towards the Automated Generation of User-Tailored Visual Representation of Complex Software Systems. Poster Presented at the California Software Symposium, University of California, Irvine, CA (1997).
[2] E J. Albers, Interaction of Color. Yale University Press, New Haven, CT (1975).
[3] E. Andre, W. Finkler, W. Graf,, T. Rist, A. Schauder and W. Wahlster, WIP: The Automatic Synthesis of Multimodal Presentations. M. Maybury, ed., Intelligent Multimedia Interfaces, AAAI Press, Cambridge, MA (1993) 75-93.
[4] Y. Arens, E. Hovy and M. Vossers, On the Knowledge Underlying Multimedia Presentations. M. Maybury, ed., Intelligent Multimedia Interfaces, AAAI Press, Cambridge, MA (1993) 280-306.
[5] J. Bertin, Semiology of Graphics. University of Wisconsin Press, Madison, WI (1983).
[6] S.M. Casner, A Task-analytic Approach to the Automated Design of Graphic Presentations. ACM Transactions on Graphics 10(2), (1991) 111-151.
[7] A.Erdem, W.L.Johnson and Stacy Marsella, Task Oriented Software Understanding. Submitted to ASE98, Hawaii (1998).
[8] S. Feiner and K. McKeown, Automating the Generation of Coordinated Multimedia Explanations. IEEE Computer 24(10), (1991) 33-41.
[9] D. Jerding, J. Stasko, and T. Ball, Visualizing Interactions in Program Executions. Proceedings of the 1997 International Conference on Software Engineering (ICSE-97), Boston, MA, May 1997, pp. 360-370.
[10] W.L.Johnson and A.Erdem, Interactive Explanation of Software Systems. Automated Software Engineering 4, (1997) 53-75.
[11] Z. Kulpa, Diagrammatic Representation and Reasoning. Machine GRAPHICS & VISION 3(1-2), (1994) 77-103.
[12] A. Lakhotia, Understanding Someone Elses Code: Analysis of experiences. Journal of Systems and Software 20, (1993) 93-100.
[13] V. Mittal, S. Roth, J. Moore, J. Mattis and G. Carenini, Generating Explanatory Captions for Information Graphics. Proc. of the Fourteenth IJCAI, Montreal, Canada (1995).
[15] Mukherjea, Sougata and J. Stasko, Toward Visual Debugging: Integrating Algorithm Animation Capabilities within a Source Level Debugger. ACM Transactions on Computer-Human Interaction 1(3), (1994) 215-244.
[16] H. Muller, K. Wong And S. Tilley, Understanding Software Systems Using Reverse Engineering Technology. Colloquium on Object Orientation in Databases and Software engineering, The 62nd Congress of "LAssociation Canadienne Francaise pour lAvancement des Sciences", Montreal, Canada (1994).
[17] B.A. Myers, Taxonomies of Visual Programming and Program Visualization. Journal of Visual Languages and Computing 1(1), (1993) 97-123.
[18] M. Petre, A.F. Blackwell and T.R.G. Green, Cognitive Questions in Software Visualization. J.Stasko, J. Dominguez, B. Price and M. Brown, eds., Software Visualization: Programming as a Multi-Media Experience, MIT Press (1997).
[19] B.A. Price, R.M. Baecker and I.S. Small, A Principled Taxonomy of Software Visualization. Journal of Visual Languages and Computing 4(3), (1993) 211-266.
[20] K. Ryall, J. Marks, and S. Shieber, An Interactive Constraint-Based System for Drawing Graphs. Proc. of UIST 97, Banff, Alberta (1997) 97-104.
[21] M. Scaife and Y. Rogers, External Cognition: How Do Graphical Representations Work?. Int. Journal on Human-Computer Studies 45, Academic Press Limited (1996) 185-213.
[22] E.R. Tufte, The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT (1983).
[23] E.R. Tufte, Envisioning Information. Graphics Press, Cheshire, CT (1990).
[24] E.R. Tufte, Visual Explanations. Graphics Press, Cheshire, CT (1997).
[25] M. Vossers, Automatic Generation of Formatted Text and Line Drawings. Masters Thesis, University of Nijmegen, The Netherlands (1991).
[26] W. Wahlster, E. Andre, W. Finkler, H. Profitlich and T. Rist, Plan-based Integration of Natural Language and Graphics Generation. Artificial Intelligence 63, (1993) 387-427.
[27] M.X. Zhou, and S.K. Feiner, Visual Task Characterization for Automated Visual Discourse Synthesis. Proc. ACM CHI '98, Los Angeles (1998).
[28] M.X. Zhou, and S.K. Feiner, The Representation and Usage of a Visual Lexicon for Automated Graphics Generation. Proc. IJCAI '97 (1997 Int. Joint Conf. on AI), Nagoya, Japan, (1997) 1056-1062.