Patrick J. FitzGerald is an Assistant Professor in the
School of Design & Technology
North Carolina State University
pat_fitzgerald@ncsu.edu
Tim Buie is a Graduate Student in the
Michael Cuales is a Graduate Student in the
Abstract
As designers of interfaces for the multimedia learning technologies of the future, we are charged with the immense task of building systems that most effectively leverage emerging interface paradigms in education and game design, diverse approaches to how the brain learns, and the rapid developments in digital technology. These future systems will have tremendous power, featuring new forms of input/output devices, artificial intelligence, interactive 3-D real-time immersive environments, animated intelligent agents, and continuously updating, contextually customized libraries of text, images, and animations that present themselves in a "just in time" manner (FitzGerald & Lester, 1997). What will this educational software of the future look like? How will it function? This paper introduces a vision for how one of these systems might work and the design parameters involved in developing the EyeCue System, an interdisciplinary project under development at North Carolina State University.
Overview
The overarching goal of any interface is elegance. Great interfaces are understated and well crafted (Tufte, 1983). Well-conceived interfaces don't call attention to themselves. They are transparent. Interface is not put on as window dressing after the software has been built. Rather, it grows from the initial vision of the project, concurrently and intertwined with the software development of the application. Flexing and stretching, the design bends between the constraints of the software and computer (not to mention more pragmatic issues of time and money) and the promise of amplifying the innate learning propensities of the user. The design refines itself as the development process continues.
In the past, effective computer use relied on the specialized ability of some humans to memorize and recall sequences of highly abstract syntax in a very specific order to interact with the computer (the command line interface). It was not until the emergence of a more intuitive interface, the windows metaphor, that a much greater number of people were able to incorporate computers into their lives as helpful and time-saving devices (Laurel, 1990). What advances, technical or metaphor-based, will constitute the next leap forward in intuitive interface design? Does the grammar of visual storytelling presented in film offer a standard that we can borrow? Would it be possible to have some form of conversational communication integrated with advances in graphics to increase the efficiency of learning and make information collection more manageable? What if your digital library could build specific multimedia presentations to answer your specific questions?
Let's have a look at some of the technologies that are very likely to be an integral part of learning technologies of the future. One picture of current research in educational technology resides in our labs at Intellimedia. Representing cutting-edge research in numerous areas, these projects mark the beginning of a concentrated effort to integrate powerful new technologies (namely animated agents, 3-D real-time environments, and an area of artificial intelligence called natural language processing) toward the creation of the next generation of learning tools. The following is a description of the research group itself and the projects currently underway.
IntelliMedia Initiative is a large-scale multi-disciplinary R&D effort being conducted at the School of Design and the College of Engineering at North Carolina State University. It has been established to create state-of-the-art intelligent multimedia educational technologies. Focusing on animated pedagogical agents, 3-D real-time environments, and interface design for education, this group undertakes a broad range of issues in basic research that examines the computer science, design, cognitive science, and educational problems in designing, developing, and empirically evaluating pedagogical environments and intelligent animated agents.
Animated intelligent agents represent a new generation of human computer interface design. Utilizing state-of-the-art technology, designers of agent technology integrate artificial intelligence with believable animated personas to produce a new level of interactivity and customized response. 3-D environments with "smart" camera planning and natural language narration offer new educational opportunities for vivid, customized multimedia explanations.
Intellimedia has five projects currently underway. Design-A-Plant is a design-centered learning environment teaching plant physiology. Design-A-Plant features a bug-like creature called Herman. This animated agent offers customized advice and support to students as they construct plants in various environments. The Internet Advisor is the second project currently under construction. Featuring the intelligent agent Cosmo, the Internet Advisor helps explain the fundamentals of the internet (TCP/IP) protocol through an interactive packet-matching game. Cosmo offers customized advice as the user guides his packet across the internet. The user can interact with this agent through an interface that allows the user to ask the agent to repeat, skip over, or reexplain what has just been presented. The third project underway at Intellimedia is 3-D SEE. The 3-D Self-Explaining Environment project promises to be a significant leap forward in learning technology. A combination of animated intelligent agent technology and 3-D real-time environments (complete with an updating roving camera), this system is linked to a customized explanation generator to produce a highly interactive learning situation. The fourth project is the EyeCue System. A design prototype, the EyeCue System is a vision for the next generation of educational technology, integrating Intellimedia's latest technology through new interface concepts to produce highly interactive, easy-to-use desktop educational software. The most recent project underway at Intellimedia is PhysViz. A project conducted in collaboration with the North Carolina School of Science and Math, PhysViz will be an intelligent, real-time, 3-D application in the domain of physics.
Please visit our web site at http://multimedia.ncsu.edu/imedia for more detailed descriptions.
Design Strategy
The EyeCue System is a conceptual educational software prototype that leverages Intellimedia's latest technical advances. The technology presented in this interface is either currently part of one of our projects or is technically possible with a large team of artificial intelligence specialists. We need to take a long look toward the horizon (not really so far away) and design systems that are technically feasible but not constrained by current (and temporary) limitations such as polygon count and today's computer processor speeds. Systems powerful enough to lift these limitations will arrive in the very short term. Rather than design around today's constraints, we take for granted that technical progress will continue at current rates and focus our designs explicitly on the top priority: the end user.
The design strategy, at its highest level, is clear: to build a system that makes learning easy and is very simple to use. Much like the experience of viewing a successful film, the viewer should not be aware of the parts that make up the whole of the software experience (Boorstin, 1995). The experience, be it information presentation or information gathering, should be integrated. The media and the way they are presented should not call attention to themselves. Effective media is in service of the information it carries (Laurel, 1990; Tufte, 1990; Wurman, 1990).
But what about the media's relationship to the information? How can we take the best features of each medium and combine them to make a more effective system? What information works best with what media? What are the digital environment's strengths and weaknesses?
Media Overview
Various kinds of information are most effectively carried by specific media (Wurman, 1990; Schacter, 1996; Boorstin, 1995). It might be much more efficient, for example, to show someone the text of a telephone number than to show them the visual sequence of the number being dialed. Likewise, the emotion of a certain character would be more accurately displayed by seeing a film of the emotion on the individual's face than to merely say that he was disappointed. Still, most information can be more vividly remembered if it is presented through multiple media, either simultaneously or sequentially (Schacter, 1996).
Text is a powerful medium for carrying abstract information. The ability of the viewer to review what is currently being read (reading at one's own pace) and the ability of the viewer to stop reading altogether (to digest information or problem solve) are powerful features of text. People can read much more than they can hear in the same amount of time (Schacter, 1996). Text is not, however, effective at explaining complex spatial relationships.
Besides the obvious power it has in carrying information about 3-D space, video offers nuance of emotion. No other medium can carry this sort of overlapping communication so vividly. Its subtleties of gesture, expression, intonation, and timing are unequaled by other media (Boorstin, 1995; Katz, 1991). Although video can carry the details of real-world information, it has a significant drawback when used to explain information. It does not highlight information (Thomas & Johnston, 1981). It's noisy. Video shows us more than we need to see.
In addition, video doesn't offer us the ability to change the rate at which information is being delivered. It is linear and can only be crudely controlled with scrolling devices. According to Boorstin (1995) at top retention we can only cover about 10 pages of information in a 25 minute film. A more pragmatic assessment would be less than that.
Linear video, like text, offers little chance for adjusted detail on a specific topic. One must sit through the video, hoping for a more detailed description of the topic of interest. In the case of text, one must search out and find (in physical space) related material. Used in isolation, either approach is less than satisfactory as a preeminent learning tool, and this situation is avoidable in digital environments.
Animation offers many promises for clear communication with a diminishing downside (Lasseter, 1987; Jones, 1989; Thomas & Johnston, 1981). Digital animation software (both 2-D and 3-D) offers clear advantages in scene and character animation. Savings in time and effort, and the ability to revise, serve as powerful pragmatic reasons to adopt digital animation over the traditional approaches to explaining information. These are not the primary reasons to use animation as the main visual carrier of information, however. The primary reason is simple: animation has the ability to exaggerate and sublimate information, enabling the designer to emphasize the salient features through exaggeration while sublimating the unimportant information by not presenting it at all. Furthermore, animation can compress or expand time and space in a more believable manner than film or video (Thomas & Johnston, 1981). It can even change in appearance (show more, less, or even different detail) over time to aid the user in the attempt to recognize relationships and capture knowledge about a particular subject matter.
Another particularly important feature of animation is anticipation. The same is true for all efficient time sensitive information delivery systems (think highway signage). The user must have a good idea of what happens next (getting bearings on the situation) in order to appreciate the how of that particular action (Lasseter, 1987; Jones, 1989; Boorstin, 1995; Thomas & Johnston, 1981). In other words, if the user can easily guess what is about to happen, he can more readily concentrate on other aspects of the situation at hand. Good design is transparent.
Character animation (agents and avatars) is a great place to see some of these principles in action (Lasseter, 1987; Thomas & Johnston, 1981). Gesture, intonation, voice, expression, and even posture aid in mutually supportive and clear communication. Notice how the eyes lead all action, even the head turn, telegraphing the movements and making the overall action smooth and understandable. Cosmo's visual appearance was designed with communication in mind: big hands, thin and less emphasized arms, large eyes and eyebrows (for expression), and little emphasis given to less important parts of the body such as the midsection and legs. Character animation is the top of the animation ladder, but it is worth climbing (Thomas & Johnston, 1981). Clear communication with low noise awaits the diligent animator/designer.
Sample of character traits and personality in avatar and/or agent (QuickTime Movie)
Agents must also have a personality and appearance that will help the user infer what information he should and shouldn't expect from these digital guides (Laurel, 1990). It is a poor idea to create a character that looks omnipotent but delivers much less. Furthermore, creating audience empathy with the agent through manifestations of personality can create motivation in the student learner that would not otherwise exist.
The Multimedia Approach
The drawbacks to using the computer as the main instrument of learning have traditionally been lack of access (a critical issue but statistically diminishing with cheaper computing and internet access), the ever present interface learning curve, and less than robust educational applications. Software (including the system software) needs to be designed better.
Of course, the advantages of using interactive systems for learning are many. In the EyeCue project, our attempts to explain how a computer works are significantly aided by our ability to explode space and time in the computer. We are able to build a model of the micro-electronics of the motherboard on a more human scale so it is easier to relate to spatially. Likewise, time can be slowed to more readily understand the complex sequences of processes that occur. These are extremely useful forms of exaggeration that can make learning easier.
The technologies of 3-D real-time worlds are becoming commonplace. Software in which users can navigate avatars to solve problems (or kill anything that moves) has emerged to dominate the gaming market. Current speech synthesis technology allows us to hear verbal articulation (admittedly less than perfect) based on a text file. Interactivity allows the user to ask questions or proceed in a direction of their choice in a time-based manner. All this is potentially very powerful technology if integrated effectively. The real punch, however, comes from using artificial intelligence to build customized text-based explanations in a "just in time" manner. These natural language systems produce highly customized English language explanations based on the question you ask, the situation you are in, and what the system itself knows about the subject matter. (Please see the Intellimedia technical papers for more detailed explanations of natural language generation and knowledge-based learning environments.) These articulated, customized explanations can be tied to camera planning and animations to provide vivid, information-rich, real-time explanations.
Imagine a system that could re-explain 3-D information from other angles. Interpreted literally, the camera could re-present the information from another sequence of choreographed camera angles to give the viewer a redundant (this is not bad!) assessment of the information (Katz, 1991). A more powerful approach would include having the system revise the textual description (based on the user's request), causing the associated camera views to change in angle, sequence, and focus. Additionally, the environment itself could reconfigure, highlighting certain features to isolate the particular points of emphasis. This approach would offer both repetition and novelty in the learning experience, a powerful combination for long-term retention (Schacter, 1996).
An Interface for Learning
The goal of an educational system is to help the student construct an accurate and rich mental model, thereby establishing deep, elaborate encoding of the subject at hand. How is this done? How can our interface support this task? We set forth the following guidelines to help direct us in the iterative development of the EyeCue System:
A Powerful Classroom Tool
How might middle school teachers use this new technology? As we will see in the following description, the systems of tomorrow will offer immense amounts of information in a multimedia interactive format. It might well be that a student's ability to collect information in a highly selective manner will emerge as the most important skill-set students could acquire at this age.
Imagine students collecting and organizing bits and pieces of animation, sound, and text in digital notebooks which could later be evaluated by their instructor for quality of perspective and presentation. Students might prepare for testing by exploring an environment and building a selective portfolio of information to which they could refer during teacher-designed tests. In these tests, students would be led through environments by agents who ask questions, offer advice, and encourage students on toward a successful completion of their learning tour. Teachers preparing for a lecture could travel through this information landscape to select pertinent information for a very specific learning module that the software would then build into a vivid, interactive animation.
Sample of Interactive Animation in QuickTime Format
As with the radio and the television, guesses of the ultimate uses of these new technologies will probably seem humorous in retrospect. In the end, classroom use, evaluation, teaching innovation and technological evolution will bend and flex these learning tools toward their most effective use.
System Description
The following is a walkthrough of the EyeCue System. This system has three major overlapping functionalities: the teacher or lesson plan mode, the problem solving mode, and the information gathering mode. This description will center primarily on the information gathering mode of the system.
Information Gathering
There are five key components to the information gathering mode of the system: a modifiable index of the lesson sits to the upper left, a user-centered hypertext system is located on the right hand side, a "go to" map sits at the bottom right, and finally, two animated agents, EyeCue and Whizlo, reside in the lower left and center (respectively) of the interface.
Whizlo is an energetic, uniwheeled, digital helper who has the double function of serving as both an agent and avatar. The user may guide Whizlo (the avatar) to complete tasks or explore aspects of the computer, but, when control is returned to the system, Whizlo (the agent) explores and asks questions of EyeCue by himself. This second type of interaction creates a sort of information movie that the audience can choose to watch (passively) or interact with (actively). The instructor can easily preprogram (literally show) the system what areas to cover (or pass over).
EyeCue is the character that represents the natural language generated explanations of the system. As the customized text is produced, voice synthesis software articulates it, creating a narration of the unfolding animation that is synched with the accompanying gestures and expressions of EyeCue. The visual appearance of this animated intelligent agent is designed to rhyme with the less than natural sound of current text-to-voice technology. Imagine the same system with a cute animal character but with the unemotional voice of a computer!
A modifiable index of the lesson
As the lesson begins, the index offers the user a syllabus-like functionality. Users can see that they are currently at lesson 3: the CD ROM. By holding down on the 3, the user may go to any of the other chapters listed here. Check marks indicate what lessons have been completed. The purple dot indicates the current lesson. Furthermore, a user or teacher may amend the lesson by clicking on and off major subjects or minor areas of focus. This information is then saved into the lesson planner for future use. Likewise, a preprogrammed lesson can be loaded into the current lesson plan. Teachers could purchase, create, or customize lessons for student use. These lessons could serve as auto-building linear presentations if the teacher so desired.
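The index behavior described above can be pictured as a small data structure. The following is only an illustrative sketch in Python, not the EyeCue implementation: the class names, the method names, and every lesson title other than the CD ROM are hypothetical, chosen to mirror the check marks, the current-lesson marker, the topic toggles, and the saving of an amended lesson into the lesson planner.

```python
from dataclasses import dataclass, field


@dataclass
class Lesson:
    number: int
    title: str
    completed: bool = False  # shown as a check mark in the index
    # topic name -> True if the topic is switched on for this lesson
    topics: dict[str, bool] = field(default_factory=dict)


@dataclass
class LessonIndex:
    lessons: list[Lesson]
    current: int  # number of the lesson marked as current

    def _find(self, number: int) -> Lesson:
        for lesson in self.lessons:
            if lesson.number == number:
                return lesson
        raise KeyError(f"no lesson numbered {number}")

    def go_to(self, number: int) -> Lesson:
        """Jump to another chapter and mark it as the current lesson."""
        lesson = self._find(number)
        self.current = number
        return lesson

    def toggle_topic(self, number: int, topic: str) -> bool:
        """Click a subject on or off; returns the new state."""
        lesson = self._find(number)
        lesson.topics[topic] = not lesson.topics.get(topic, False)
        return lesson.topics[topic]

    def to_plan(self) -> list[dict]:
        """Export the amended lessons in a form a lesson planner could store."""
        return [
            {
                "number": lesson.number,
                "title": lesson.title,
                "topics": [name for name, on in lesson.topics.items() if on],
            }
            for lesson in self.lessons
        ]
```

A teacher's customization then amounts to a few `toggle_topic` calls followed by `to_plan()`, and loading a preprogrammed lesson would be the reverse operation of building a `LessonIndex` from a stored plan.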
Customizable Intelligence
EyeCue is customizable. The user can customize his personality to be more talkative and friendly, or more to the point with less humorous interjection. Certainly, agent appearance itself could be customized by the user to switch out characters altogether. This functionality has only a minor effect on the underlying AI structure itself.
The AI software has a component called the user (or student) modeler. It is constantly updating itself to infer what it believes the student knows about the subject matter. To make the system more efficient immediately, the user can see and amend the student modeler. The experience button enables the user to update and modify any assumptions that the user modeler has incorrectly made. The user modeler could have (for example) incorrectly inferred that the user knows more about binary numbers than is actually the case. As the user detects this incorrect assumption through receiving overly complex descriptions and animations concerning binary numbers, the user (student) profile can be opened and adjusted. The AI software will then take this new information into account when new presentations are constructed and presented.
Likewise, detail and pace can be adjusted on a global level. If the user is generally interested in a more cursory overview of information or a more relational perspective, this can be adjusted. The general pace (rate) of explanations can be customized. More detailed animations for specific types of information can also be requested.
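To make the idea of an inspectable, amendable student modeler concrete, here is a minimal sketch in Python. It is not the modeler used in the Intellimedia projects: the `StudentModel` class, its method names, the mastery scale from 0 to 1, the simple moving-average update, and the three explanation levels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StudentModel:
    # topic -> estimated mastery between 0.0 (none) and 1.0 (full)
    mastery: dict[str, float] = field(default_factory=dict)
    detail: float = 0.5  # global detail setting, 0.0 = cursory, 1.0 = exhaustive
    pace: float = 1.0    # global rate multiplier for explanations

    def observe(self, topic: str, correct: bool, rate: float = 0.3) -> None:
        """Nudge the estimate toward 1 after a success, toward 0 after a miss."""
        old = self.mastery.get(topic, 0.0)
        target = 1.0 if correct else 0.0
        self.mastery[topic] = old + rate * (target - old)

    def amend(self, topic: str, level: float) -> None:
        """Let the user overwrite an inference the modeler got wrong."""
        self.mastery[topic] = min(1.0, max(0.0, level))

    def explanation_level(self, topic: str) -> str:
        """Pick how complex the next explanation of this topic should be."""
        estimate = self.mastery.get(topic, 0.0)
        if estimate < 0.34:
            return "introductory"
        if estimate < 0.67:
            return "intermediate"
        return "advanced"
```

In this sketch the binary numbers scenario plays out as follows: a run of correct answers pushes the estimate into the "advanced" band, the student notices that the explanations have become too complex, and a single `amend("binary numbers", 0.2)` returns the topic to "introductory" explanations while later observations continue to update the corrected value.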
The GO TO Map
The "go to" map serves numerous functions. Its primary job is to orient the viewer. In a complex micro world, understanding your position and scale helps minimize confusion while exploring new areas and relationships [3]. As the user rolls over the second level, the first level becomes smaller and less significant. Rolling over the bottom area results in similar functionality. By clicking on the arrows at the bottom of the map, the user can snap to the next most significant location. Rolling over any location with the cursor will call up text labels. The user may click on a location or use the scrolling titles to go to the next area of interest.
Learning-Centered Hypertext
The upper right section of the interface holds the text-based delivery system. It is designed to be cross-referential with convenient "just in time" functionality. The top part of this information bar has cross-referencing terms (object and action) that produce customized summaries in the text box below. If a user is interested in how the action (DATA) affects the object (HARDDRIVE), he would simply hold down on the object label (currently the CD ROM) and scroll to the term HARDDRIVE. A new description would be produced and displayed in the text box below. The user-centered hypertext bar sits below the object and action terms. It offers a universal system for conveniently accessing various types of information tools for the term or topic you are currently interested in.
A task-centered interface, this system offers an overview functionality that displays customized textual information. A detail scroller enables the user to get more (or less) information on demand. Page numbers let the viewer know how much information exists and limited bookmarking offers the chance to review your steps. The user may click on any hypertext word to receive the same functionality. To eliminate the current text box and return to the original underlying text, click on the red dot in the corner.
To ask a question about the topic at hand, the user pulls his cursor over the closest FAQ button. The frequently asked question functionality offers the user two main choices. One can pick from a list of software-customized questions based on the user's profile and current situation, or a question can be built from the template questions located in the QUERY section. To ask a question, press the question mark to receive a textual and animated response. When the animation is complete, the system returns to the textual overview section.
The notes function is a storage mechanism which allows users to store selected concepts, terms, and animations for later use. Notes can be used by a teacher to build a customized lesson. These notes would then be loaded into the lesson planner and used as the pedagogical guide for the next day's lesson.
The most exciting functionality in notes is the summary function. By clicking on summary, the student asks the AI to rewire all collected notes into a coherent, pedagogically sound animation. The user may watch this animation for review or perhaps submit it as a thoughtfully constructed overview of the subject matter as a means of evaluation.
Finally, the animation function allows the user to see an animation of the term or topic he is currently interested in. Perhaps the most powerful feature of the system, the animation function allows for a customized animation to be played. This animation would be dictated by the natural language description produced. Tags for camera movement and animation sequences would be tied to this description, and the description itself would be narrated by EyeCue. The user would see a dedicated animation of the topic at hand. For a more detailed animation/explanation, the user can pull down on the detail bar. The system takes this new information into account and builds a more in-depth description with associated tags, ready for presentation.
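One way to picture a description with tags tied to it is as a list of narrated sentences, each carrying a camera tag, an animation tag, and the detail level at which it should appear. The Python sketch below is a hypothetical illustration only: the `Segment` fields, the tag strings, the `build_presentation` function, and the fixed speaking rate are assumptions, not the format used by the EyeCue prototype or by Intellimedia's explanation generator.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str       # sentence to be narrated by the agent
    camera: str     # hypothetical camera tag, e.g. "close_up:laser"
    animation: str  # hypothetical animation tag, e.g. "laser_sweep"
    detail: int     # lowest detail-bar setting at which this segment is shown


def build_presentation(segments, detail_level, words_per_second=2.5):
    """Return (start_time_in_seconds, segment) pairs for the chosen detail level.

    Segments above the requested detail level are skipped. Each kept segment
    starts when the narration of the previous one is estimated to finish, so
    camera moves and animations stay in step with the spoken description.
    """
    schedule = []
    clock = 0.0
    for segment in segments:
        if segment.detail > detail_level:
            continue
        schedule.append((clock, segment))
        clock += len(segment.text.split()) / words_per_second
    return schedule
```

Pulling down on the detail bar corresponds to calling `build_presentation` again with a higher `detail_level`: more segments pass the filter, and the start times of the later camera and animation tags shift to make room for the longer narration.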
Conclusion
The combination of artificial intelligence and 3-D environment technologies with innovative interface design offers great opportunities for efficient, customized learning. These integrated technologies will produce new learning tools which will make understanding and retaining many types of complex information much easier. As we know from the evolution of desktop computing, these new technological innovations need to be coupled with innovative design concepts to fully leverage their impressive potential. The next generation of educational technologies will promote curiosity-driven learning. These systems will blur the line between exploring and learning, entertainment and scholarship.
References
Boorstin, J. (1995). Making Movies Work, Silman-James: Beverly Hills, CA.
Calvin, W. (1994). How Brains Think, BasicBooks: New York, New York.
FitzGerald, P. J. and Lester, J. C. (1997). Knowledge-based Learning Environments: A vision for the 21st century. In Peter H. Martorella, editor, Interactive Technologies and the Social Sciences: Emerging Issues and Applications, 111--127. SUNY Press: New York.
Jones, C. (1989). Chuck Amuck, Warner Brothers: New York, New York.
Katz, S. (1991). Film Directing Shot by Shot, Michael Weiss Publications: San Diego, California.
Kosslyn, S. (1994). The Image and the Brain, MIT Press: Boston, Massachusetts.
Lasseter, J. (1987). Principles of traditional animation applied to 3D computer animation. In Proceedings of SIGGRAPH '87, pages 35--44.
Laurel, B. (1990). The Art of Human-Computer Interface Design, Addison-Wesley: Boston, Massachusetts.
Lester, J. C., FitzGerald, P. J., & Stone, B. A. (in press). The pedagogical design studio: Exploiting artifact-based task models for constructivist learning. Proceedings of the Third International Conference on Intelligent User Interfaces.
Schacter, D. (1996). Searching for Memory, BasicBooks: New York, New York.
Thomas, F. & Johnston, O. (1981). The Illusion of Life: Disney Animation, Walt Disney Productions: New York, New York.
Tufte, E. R. (1983). The Visual Display of Quantitative Information.
Tufte, E. R. (1990). Envisioning Information, Visualizing Information, 1997. Graphics Press: Cheshire, Connecticut.
Wright, R. (1994). The Moral Animal, Vintage Books: New York, New York.
Wurman, R. S. (1990). Information Anxiety, Doubleday: New York, New York.
* Support for Intellimedia was provided by the Office of the Provost at North Carolina State University, the National Science Foundation, and donations from Apple, IBM, and Novell.