
Transcript of AR paper 4


An interaction approach to computer animation

Benjamin Walther-Franks, Rainer Malaka

Research Group Digital Media, Universität Bremen, Fb3, Bibliothekstr. 1, 28359 Bremen, Germany

Article info

    Article history:

    Received 31 March 2014

    Revised 15 July 2014

    Accepted 19 August 2014

    Available online 2 September 2014

    Keywords:

    Motion design interfaces

    Performance animation

Human–computer interaction

    Design space

Abstract

Design of and research on animation interfaces rarely use methods and theory of human–computer

interaction (HCI). Graphical motion design interfaces are based on dated interaction paradigms, and novel

procedures for capturing, processing and mapping motion are preoccupied with aspects of modeling and computation. Yet research in HCI has come far in understanding human cognition and motor skills and

    how to apply this understanding to interaction design. We propose an HCI perspective on computer ani-

    mation that relates the state-of-the-art in motion design interfaces to the concepts and terminology of

    this field. The main contribution is a design space of animation interfaces. This conceptual framework

    aids relating strengths and weaknesses of established animation methods and techniques. We demon-

    strate how this interaction-centric approach can be put into practice in the development of a multi-touch

    animation system.

© 2014 Elsevier B.V. All rights reserved.

    1. Introduction

    Moving images are omnipresent in cinema, television, com-

    puter games and online entertainment. Digital media such as text,

    images and film are nowadays produced by a diverse crowd of

    authors, ranging from beginners and laymen to professionals. Yet

    animation is still seen by most people as a highly sophisticated

    process that only experts can master, using complex interfaces

    and expensive equipment. However, consumer motion capture

    technology has recently enabled and created a mass-market for

    easy-to-use animation tools: computer games. In contrast to most

    professional animation tools, recent games employ full-body inter-

    action for instance via Kinect, allowing users to control a virtual

    character instantaneously through their body. This trend is feeding

    back into the area of the experts, with researchers investigating

    time-efficient interfaces for computer puppetry using the Kinect

(e.g. [61,55]). Computer animation is currently seeing an influx of

ideas coming from the world of easy-to-use game interfaces made

    for players with no prior training. Game designers in turn are

    informed by design knowledge and methods developed over

decades of research in human–computer interaction (HCI).

    It is thus time that computer animation be approached from an

    HCI perspective. This could aid describing and analyzing the vast

spectrum of animation techniques ranging from very intuitive pup-

    petry interfaces for computer games to highly sophisticated con-

    trol in advanced animation tools. Our goal is to understand

principles that underlie human–machine interactions in computer

    animation. With new ways of thinking about interactions with

    continuous visual media and a thorough investigation of new ani-

    mation interfaces on a theoretical foundation, motion design inter-

    faces can be made more beginner and expert friendly.

    This can be achieved by embedding computer animation meth-

    ods and interfaces in an HCI context. Trends in motion design

    interfaces can be connected with discussions on next generation

    interfaces in HCI. Theoretical frameworks can aid us in tackling

    the concrete user interface issues by a profound analysis, which

    can aid the process of designing new mechanisms for more natural

    and intuitive means of motion creation and editing.

    This article approaches this goal in three main steps. We will

first review related work from computer graphics, human–computer

    interaction and entertainment computing from a user- and inter-

    face-centric perspective with a focus on methods, mappings and

metaphors. In the second step we construct a design space for inter-

    faces that deal with spatiotemporal media. In the third step, the

    utility of this conceptual framework is illustrated by applying it in

    designing a multi-touch interactive animation system.

    2. Animation techniques: an interaction view

    Computer-based frame animation is the direct successor of tra-

    ditional hand-drawn animation, and still the main method.

    Advances in sensing hardware and processing power have brought


    entirely new possibilities. Motion capture records the live perfor-

    mance of actors, introducing a new form of animation more akin

    to puppetry than traditional animation. Programmed animation

    enables realistic simulations to provide interesting secondary

    motion and create more believable worlds.

    Traditionally, in computer-based keyframe animation, only

    extreme poses or key frames need to be manually established by

    the animator. Each keyframe is edited using manipulation tools,

    which can be specialized for the target domain, e.g. character

    poses. Some manipulation tools allow influencing dynamics

    directly in the scene view. The most common means of specifying

    dynamics is by using global descriptions, such as time plots or

    motion paths. Spatial editing between keyframes can be achieved

    indirectly by editing interpolation functions or by defining a new

    key pose.

    Motion timing is usually done via global descriptions of dynam-

    ics. However, some temporal control techniques directly operate

on the target. Snibbe [58] suggests timing techniques that do not

    require time plots but can be administered by directly manipulat-

    ing the target or its motion path in the scene view. As with spatial

    editing, the practicality of temporal editing with displacement

    functions depends heavily on the underlying keyframe distribu-

    tion. Timing by direct manipulation in the scene view is also sup-

    ported by the latest animation software packages. Tweaking

    motion trail handles allows for temporal instead of spatial transla-

    tion; visual feedback can be given by changing frame numbers

    adjacent to the handle. Spatial control of time has also been pro-

posed for video navigation [15].

    Motion graphs are two-dimensional plots that map transforma-

    tion values (vertical axis) against time (horizontal axis). With a

    2DOF input device, such a graph thus allows integrated, simulta-

    neous spatiotemporal control. In keyframe animation the motion

    editor is the standard way to manage keyframe value interpolation,

typically by means of Bézier curve handles.
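
To make the motion editor concept concrete, the following sketch shows how a single animation channel is evaluated between keyframes. It is an illustration only, using a fixed cubic ease in place of the editable Bézier handles found in production packages; the function names and values are ours.

# Minimal sketch of per-channel keyframe interpolation, as used in motion editors.
# Hypothetical example; production tools expose editable Bezier handles instead of
# the fixed smoothstep easing used here.

from bisect import bisect_right

def smoothstep(t):
    """Cubic ease-in/ease-out on [0, 1]."""
    return t * t * (3.0 - 2.0 * t)

def evaluate(keyframes, time):
    """Evaluate one animation channel at 'time'.
    keyframes: sorted list of (time, value) pairs."""
    times = [k[0] for k in keyframes]
    if time <= times[0]:
        return keyframes[0][1]
    if time >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, time) - 1
    (t0, v0), (t1, v1) = keyframes[i], keyframes[i + 1]
    u = smoothstep((time - t0) / (t1 - t0))
    return v0 + u * (v1 - v0)

# Example: an x-translation channel with three keys.
channel = [(0.0, 0.0), (1.0, 10.0), (2.5, 4.0)]
print(evaluate(channel, 0.5))   # eased value between the first two keys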

    In contrast to keyframe animation, performance animation uses

    motion capturing of live performance of an actor or puppeteer by

tracking a number of key points in space over time and combining them to obtain a representation of the performance. The recorded

    data then drives the motion of a digital character. The entire proce-

    dure of applying motion capture data to drive an animation is

    referred to as performance animation [44]. In a typical setup, an

actor's motion is first recorded, then the data is cleaned, processed

    and applied to a digital character. Since the digital character can

    have quite different proportions than the performer, retargeting

    the motion data is a non-trivial task [24]. In this form of perfor-

    mance animation, capture and application of motion data to an

    animation are two separate processes, data handling is done off-

    line. Online performance animation immediately applies captured

    data to a digital character, creating animation instantly, allowing

    the performer to react immediately to the results or to interact

with an audience [59,24]. Processing limitations sometimes entail that performers can only see a low-fidelity pre-visualization

of the final rendering [44].

    Many performance animation efforts aim to represent human

motion accurately and limit the abstraction to a minimum; the

motion capture performers use only the senses with which

    they have learned to act (e.g. kinaesthetic and proprioceptive feed-

    back). For performance animation of stylized or non-humanoid

    characters it is desirable to control them in a less literal fashion.

    Such a style of performance control is often referred to as computer

    or digital puppetry [3,59]. Just as traditional puppeteers would rely

    on mirrors or camera feeds to adjust their performance, computer

    puppetry requires instant renderings of the applied input to allow

    performers to adjust their motions. Real-time mappings either use

high-bandwidth devices for coordinated control of all character DOF, or employ models based on example data or a physical

    simulation. One challenge is to control a high number of degrees

    of freedom (DOF) at the same time.

Real-time control of humanoid characters suggests literal map-

pings from the puppeteer's physique to the character's skeleton.

    Non-humanoid characters such as animals, monsters or animate

objects are difficult since they have a vastly different morphology

and motion style from humans. Seol et al. [55] address this by learn-

    ing mappings through users mimicking creature motion during a

    design phase. These learnt mappings can then be used and com-

    bined during online puppetry. In similar work, Yamane et al. [66]

    propose matching human motion data to non-humanoid charac-

ters with a statistical model created on the basis of a small set of

manually selected and created human–character pose pairs; how-

    ever, this process is conducted offline. The technique for optimal

    mapping of a human input skeleton onto an arbitrary character

skeleton proposed by Sanna et al. [67] manages without any man-

    ual examples and finds the best match between the two based

    solely on structural similarities.

    For animation techniques on desktop input devices, however,

typically fewer DOF are available. Recently this has been addressed

    by multi-touch input devices, which enable techniques for simulta-

    neous rotation, scaling and translation (RST) for 4DOF control of a

2D target [26]. Reisman et al. [52] developed a technique for inte-

grated rotation and translation of 3D content using an arbitrary

number of contact points on an interactive surface.
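
As an illustration of such integrated control, the sketch below derives rotation, uniform scale and translation from the motion of two contact points. It is a schematic reconstruction of the general RST idea, not the code of the cited techniques.

# Sketch of two-finger rotate-scale-translate (RST) control of a 2D target.
# Illustrative only; real surface frameworks track contact IDs and handle
# more than two fingers.

import math

def rst_from_touches(p0_old, p1_old, p0_new, p1_new):
    """Derive rotation (radians), uniform scale and translation (dx, dy)
    from the old and new positions of two contact points."""
    ox, oy = (p0_old[0] + p1_old[0]) / 2.0, (p0_old[1] + p1_old[1]) / 2.0
    nx, ny = (p0_new[0] + p1_new[0]) / 2.0, (p0_new[1] + p1_new[1]) / 2.0
    vx_old, vy_old = p1_old[0] - p0_old[0], p1_old[1] - p0_old[1]
    vx_new, vy_new = p1_new[0] - p0_new[0], p1_new[1] - p0_new[1]
    rotation = math.atan2(vy_new, vx_new) - math.atan2(vy_old, vx_old)
    scale = math.hypot(vx_new, vy_new) / math.hypot(vx_old, vy_old)
    translation = (nx - ox, ny - oy)
    return rotation, scale, translation

# Example: the two fingers move apart and rotate the pinch axis.
print(rst_from_touches((0, 0), (1, 0), (0, 0), (0, 2)))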

When input devices with fewer DOF than the object parameters

    are used, integrated control is not possible. This is a common prob-

    lem in desktop interaction for navigating and editing 3D media,

    since most desktop input and display devices only have two DOF.

    Interface designers thus often face the problem of mapping two

    control DOF to a higher-dimensional target parameter space. A

    solution is to separate the degrees of control, i.e. splitting object

    DOF into manageable subsets [4]. With single-pointer input

    devices, this necessitates a sequential control of such subsets, e.g.

    through displays of multiple orthographic projections of the scene

    in one split screen or through spatial handles that are overlaid on

top of the target object [4].

If high-DOF devices are not available and temporal multiplexing

    is not desired, interface designers can choose to constrain the

    interaction to reduce required control DOF. A challenge for design-

    ers is that the model behind the constraint must be understood by

the user, for instance by basing it on mechanisms already

    known from other contexts.

    Yamane and Nakamura [64] present a pin-and-drag interface

    for posing articulated figures. By pinning down parts of the figure,

    such as the end-effectors (feet or hands) and dragging others, the

    whole character can be controlled with relative ease. Joint motion

    ranges, the current joint configuration and the user-set joint con-

    straints (pins) thus allow constrained control of several character

    DOF with as few as two position input DOF for a 2D character.

The various constraints are prioritized so that dragging constraints are always fulfilled and solved by differential kinematics that give

    a linear relationship between the constraints and the joint

    velocities.
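
The pin-and-drag idea rests on differential kinematics: small changes in joint angles relate linearly to small changes in constraint positions via a Jacobian. The sketch below shows one damped least-squares step for a planar two-joint arm whose end-effector is dragged toward a target. It is a generic illustration; the cited solver additionally prioritizes pin constraints over drag constraints.

# Illustrative damped least-squares step of differential kinematics for a
# planar two-joint arm: dtheta = J^T (J J^T + lambda^2 I)^-1 dx.
# Generic sketch, not the prioritized pin-and-drag solver itself.

import numpy as np

L1, L2 = 1.0, 1.0             # link lengths
theta = np.array([0.3, 0.6])  # current joint angles (radians)

def end_effector(th):
    x = L1 * np.cos(th[0]) + L2 * np.cos(th[0] + th[1])
    y = L1 * np.sin(th[0]) + L2 * np.sin(th[0] + th[1])
    return np.array([x, y])

def jacobian(th):
    s1, c1 = np.sin(th[0]), np.cos(th[0])
    s12, c12 = np.sin(th[0] + th[1]), np.cos(th[0] + th[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

target = np.array([1.2, 1.1])     # where the user drags the end-effector
for _ in range(20):               # a few small solver steps per frame
    dx = target - end_effector(theta)
    J = jacobian(theta)
    lam = 0.1                     # damping avoids blow-up near singularities
    dtheta = J.T @ np.linalg.solve(J @ J.T + lam ** 2 * np.eye(2), dx)
    theta += 0.5 * dtheta         # step size below 1 keeps the update stable

print(end_effector(theta))        # close to the drag target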

    Several research projects have attempted to leave the world of

    explicit mappings and enable low-to-high-dimensional control,

    bimanual interaction and multi-user interaction implicitly by sim-

    ulating real-world physics. Frohlich et al. [20]let users kinemati-

    cally control intermediate objects that are attached to target

    objects by springs. The spring attachment is also used by Agrawala

and Balakrishnan [1] to enable interaction with a physically simu-

lated virtual desktop, the BumpTop.
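
A minimal version of such spring coupling between a kinematically controlled proxy and a simulated target could look as follows. The parameters are invented for illustration and do not come from the cited systems.

# Sketch of controlling a simulated object through a proxy attached by a
# damped spring (kinematic input -> physical response). Parameter values
# are illustrative only.

def simulate(proxy_positions, k=50.0, c=8.0, mass=1.0, dt=1.0 / 60.0):
    """proxy_positions: user-controlled proxy position per frame (1D here)."""
    x, v = proxy_positions[0], 0.0     # target starts at the proxy
    trajectory = []
    for p in proxy_positions:
        force = k * (p - x) - c * v    # spring toward proxy plus damping
        v += (force / mass) * dt       # semi-implicit Euler integration
        x += v * dt
        trajectory.append(x)
    return trajectory

# Example: the user snaps the proxy from 0 to 1; the target follows smoothly.
path = simulate([0.0] * 10 + [1.0] * 110)
print(round(path[30], 3), round(path[-1], 3))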

Limitations of the motion capture system or the performer's

physiology in producing certain desired motions can be overcome

by simulating parts of the body and their interaction with the environment. Ishigaki et al. [31] combine real-time full-body motion


    capture data, physical simulation and a set of motion examples to

    create character movement that a user cannot easily perform, such

    as climbing or swimming. The virtual environment contains prede-

    fined interaction points such as the handles of a monkey bar or a

rope. Once the character's end-effectors are brought into proximity

    of an interaction point, control changes so that the character

    motion is no longer fully controlled by the motion capture. A sim-

    plified simulation that treats the intentional contact as a universal

joint connected to the character's centre of mass by a linear damp-

    ened spring enables the calculation of the overall dynamics of the

    character.

    Even when input and output degrees of freedom match, physi-

    cal interdependencies of input DOF can still limit a mapping. In full

    body tracking, the joint locations are dependent on actor size and

body proportions. If the performer's proportions significantly differ

    from character proportions, this can lead to problems with the

    character interacting with objects in the scene, such as the floor

    or props. For this problem of retargeting of motion capture data

to a new character, Shin et al. [57] propose an approach that maps

    input based on a few simple heuristics, e.g., considering the dis-

    tance of end-effectors to an object in the scene.
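
The flavor of such heuristics can be sketched as a distance-based blend: the closer an end-effector comes to a scene object, the more weight is given to preserving its absolute position (to be enforced by an IK pass) over preserving the performer's joint angles. This is a schematic illustration of the idea, not Shin et al.'s algorithm.

# Schematic sketch of a distance-based retargeting heuristic: blend between
# preserving joint angles and preserving end-effector position, weighted by
# how close the end-effector is to a scene object. Illustrative only.

def importance(distance_to_object, falloff=0.5):
    """Weight in [0, 1]; 1 means position matters most (e.g. hand on a prop)."""
    return max(0.0, 1.0 - distance_to_object / falloff)

def retarget_end_effector(pos_from_angles, pos_from_performer, distance):
    """Blend the position implied by copied joint angles with the performer's
    absolute end-effector position (which an IK pass would then enforce)."""
    w = importance(distance)
    return tuple(w * p + (1.0 - w) * a
                 for a, p in zip(pos_from_angles, pos_from_performer))

# Example: hand far from any prop (angles dominate) vs. touching a prop.
print(retarget_end_effector((0.4, 1.1, 0.2), (0.5, 1.0, 0.3), distance=1.0))
print(retarget_end_effector((0.4, 1.1, 0.2), (0.5, 1.0, 0.3), distance=0.0))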

    For live performances, control needs to be addressed with high-

    bandwidth input devices or performers acting in parallel. With

    recorded performances the puppeteer has more options. Capture

    sequences or just parts of them can be retaken, or slightly modi-

    fied, and complex motion can be built up in passes. Layered or

    multi-track animation allows the performer to concentrate on only

    a small amount of action at a time and create articulated motions

step by step. Oore et al. [50] employ layered motion recording for

controlling subsets of a character's DOF. For the animation of a

    humanoid, they divide the character DOF into three parts and ani-

    mate these sequentially: Two 6DOF devices are used to control the

    motion of both legs, both arms, and torso and head in three passes.

Dontcheva et al. [13] make motion layering the principle of their

    live animation system.

    Video games have a strong connection to animation. Most mod-

ern video games make heavy use of animation in order to breathe life into the game world. In this sense, games are one application

    area amongst many others, such as film, television, or education.

    But animation is also created with and in video games. The actions

taken by players and the responses of game elements constitute a form

    of motion design, often conveying a certain story. This is most evi-

    dent in game genres where players control characters in a virtual

    world, like a puppeteer controls a puppet. However, animating

for video games differs significantly from animating for film or televi-

    sion. While in film characters and objects are only viewed from a

    specific camera angle, in interactive media such as video games,

    the behavior and the view are spontaneously defined by the player.

    The animator cannot foresee the decisions of the player, which is

    why he must create animations for all possible player actions that

must meet certain criteria of completeness and realism. Such motion libraries contain elementary animation sequences that can then

    be looped, blended and combined in real-time by the game engine

    [37]. By interactively directing pre-defined animations, players

    thus essentially perform a kind of digital puppetry with indirect

    control.
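
In code, real-time combination of clips from such a motion library amounts to little more than a per-joint weighted interpolation between sampled poses. The sketch below illustrates a crossfade with Euler-angle poses; actual game engines blend quaternions and synchronize gait phases.

# Minimal sketch of blending two animation clips from a motion library.
# Simplified: poses are per-joint Euler angles; real engines blend
# quaternions and align foot phases.

def sample(clip, t):
    """Nearest-frame sampling of a clip given as a list of poses (30 fps)."""
    frame = min(int(t * 30), len(clip) - 1)
    return clip[frame]

def blend(pose_a, pose_b, w):
    """Linear blend of two poses; w = 0 gives pose_a, w = 1 gives pose_b."""
    return {joint: tuple((1 - w) * a + w * b
                         for a, b in zip(pose_a[joint], pose_b[joint]))
            for joint in pose_a}

# Example: crossfade from 'walk' to 'run' over half a second.
walk = [{"hip": (0.0, 0.0, 0.0), "knee": (10.0, 0.0, 0.0)}] * 60
run  = [{"hip": (5.0, 0.0, 0.0), "knee": (40.0, 0.0, 0.0)}] * 60
t, fade_start, fade_len = 0.7, 0.5, 0.5
w = min(max((t - fade_start) / fade_len, 0.0), 1.0)
print(blend(sample(walk, t), sample(run, t), w))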

    Motion control through high-DOF input devices extends the

    degree of control, further blurring the lines between gaming and

    puppetry: as players are able to influence more character DOF,

    their possibilities for expression are increased. However, while all

    games use some form of motion capture, few offer motion editing

    required in animation practice: if a player is not satisfied with his

    performance, he will have to do it again. Most games lack tech-

    niques for even the basic task of time control, with notable excep-

tions such as Prince of Persia: Sands of Time [60], Zeit2 [5] and Braid [49], in which the player must navigate time as well as space.

    Yet while these games incorporate time control in innovative ways,

    they do not provide the degree of control and editing required for

    professional animation.

    In Machinima, the art of 3D game-based filmmaking, animation

    and video games ultimately come together to form a novel means

of creating animated movies [37]. Using game engines for anima-

    tion or virtual filming has benefits as well as limitations. Modern

    3D games provide a complete game world with physics, animated

    models, and special effects while offering comparatively simple

    controls for puppeteering game characters. This gives authors a

    lot to build upon, as opposed to other methods where animations

    must be created from scratch. The limitations lie in the depen-

    dency on the game developer with their short product cycles, their

    game engine and assets, and the legal issues involved in using

    these. Computer puppetry in games remains limited, as is any per-

    formance control interface that merely activates and blends pre-

    defined animations.

    Viewing the state-of-the-art in animation with a coherent focus

    on the user, mappings and control DOF is a first step in analyzing

    the current generation and developing for the next generation of

    interfaces. The next step is to further structure this treatment: a

    theoretical framework identifies explicit aspects of interaction in

    computer animation tools.

    3. A design space for computer animation interfaces

    Even though there is an increasing trend in computer graphics

research to consider the needs of the artist (e.g. [51,54]), most work

    on animation interfaces does not consider aspects of HCI. An inter-

    action perspective on computer animation can help to construct a

    design space of user interfaces for spatiotemporal media. Such a

    design space can structure the designers options and aid research-

    ers in analyzing the state of the art.

    Existing interface design frameworks cannot be readily used for

    animation interfaces, as they are either too general or too specific.

General frameworks [21,48] span too large a space or only analyze certain aspects of interaction, like input devices but not their map-

    ping to output [9], while domain-specific frameworks [8,14] are

    too focused.

Jacob et al. [33] present a framework for reality-based interac-

    tion (RBI) that includes four themes: Naive physics (NP) reflects

    the innate human understanding of basic concepts of physics, such

    as gravity, forces and friction; body awareness and skills (BAS)

    describes our sense of our own body and what we can do with

it; environment awareness and skills (EAS) covers how humans per-

ceive and mentally model their environment and place themselves

in relation to it; social awareness and skills (SAS) stands for humans

    as social animals, who generate meaning by relating to other

    human beings. Considering the four RBI themes for computer ani-

mation, many techniques aim to tap the artist's innate understanding of space–time processes, relating to the theme of naive physics

    (NP). The environment awareness and skills theme (EAS) comes

into play as soon as humans interact with these real-world

space–time processes. For instance, multi-finger deformation tech-

niques for 2D puppetry on interactive surfaces [45] rely on our nat-

    ural sense of timing and real-world experience with objects (NP,

    EAS). In fact any technique based on motion capture for defining

dynamics relies on users' intuitive sense of space and time (NP,

EAS). Performance controls for digital puppetry use the performer's

understanding of their body (BAS). As Kipp and Nguyen [39] illus-

    trate, a puppeteer uses complex coordinated hand movements to

    bring a wooden puppet to life. Even the technique for low-fidelity

input via mouse and keyboard of Laszlo et al. [40] exploits both an

animator's motor learning skills and their ability to reason about motion planning (BAS). Collaboration in computer animation is


    common, as large productions require teams to work together, but

    does not usually involve close coordination during a single task.

    Multi-user puppetry interfaces are different in that they tap the

    ability of humans to relate to other human beings (SAS). The four

themes must be traded off against other desirable qualities, such

    as expressive power, efficiency, versatility, ergonomics, accessibil-

ity and practicality, on a case-by-case basis.

    While these themes are relevant for designing any kind of novel

    interactive systems aiming at reality-based interaction, they are

    rather general. For a conceptual framework specific to animation

    it is thus necessary to define a new design space. In the following

    we discuss the aspects we have identified in our work as relevant

    to such a framework. We motivate their inclusion and relate them

    to each other. We will also relate our framework to the RBI

    framework.

    3.1. Aspects of design

Analogous to general models of human–computer interaction,

    computer animation involves a dialog between a human artist

    (animator, actor or puppeteer) and the application, a virtual artifact

(the animation). This occurs through a hardware–software machine (the animation software and the hardware running it, including

    input and output periphery). A design framework should consider

aspects of these entities and their relations. Fig. 1 shows this basic

    triangular structure that describes two views of this human-arti-

    fact dialog, one that takes the machine as a mediator into account

    (left and lower edge: artist-machine-artifact) and one that

    abstracts from it (right edge: artist-artifact). Seven aspects charac-

    terize these entities and their relations: task, integration, corre-

spondence, metaphor, directness, orchestration and space–time.

    In the following we will discuss these seven design aspects and

    their relevance for HCI and animation tools.

    Animation tools for productive use are designed around the

task for which they are intended. Decomposition breaks down tasks

into further subtasks, which can be, in turn, repeatedly broken down until one arrives at basic tasks at the desired level of decom-

position, which is frequently used to structure interaction tech-

    niques [17,4,29]. At the top level, the main tasks in animation

    design are motion creation (generating from scratch), motion edit-

    ing (adapting an existing design) and viewing (for visual feedback

    on spatial and temporal design). At a lower level, task decomposi-

    tion structure varies highly with the type of animation artifact, i.e.

character animation or environment effects. Tool generality [53] or

    versatility [33] characterizes the variety of interaction tasks that

    can be performed with an interface. This can range from support-

    ing a large amount of tasks from varied application domains to

    only supporting a single, domain-specific task. Tasks are the goal

    of interaction and aim at creating the animation. Therefore, our

    design space links the aspects of tasks to the virtual artifact (Fig. 1).

An input device defines the integration of control: how many

    DOF can be changed at the same time from the same input source

    [2]. Performance controls are traditionally very specialized, e.g.

    using full-body motion capture suits or special hand-puppet input

devices [59,34]. Yet research has also brought forward more gen-

    eral controls, such as the 2D multi-point deformation technique

    of Igarashi et al. [30]. Since computer animation often involves

    domain objects with large amounts of degrees of freedom (even

    a simple 3D articulated biped will have around 30 DOF), special-

    ized high-DOF input devices allow for a high level of integration.

Ideally the input device should match the structure of the task

(Jacob et al. [32]). In most situations the DOF of the input device

    are not sufficient and solutions like artificial separation or con-

    straining mappings based on a certain model have to be found. If

    other considerations lead to using lower-DOF input devices, tasks

    should be adapted accordingly, e.g. by separating translation and

    orientation [43]. The aspect of integration is mostly construed from

    the set-up of the input device. We thus locate the aspect of integra-

    tion next to the machine in the design space (Fig. 1).

    Correspondence describes how the morphology of the physical

    input through the input device and the resulting response of the

artifact relate [29]. Bodenheimer et al. [3] distinguish performance

    animation controls by the degree of abstraction in the sense of cor-

    respondence. At the one end of the spectrum, mappings are pri-

    marily concerned with the character or style of the motion rather

    than literal mappings between performer and target. Such map-

    pings are more commonly used in computer puppetry. At the other

    end of the spectrum are efforts to accurately represent motion that

    strive to limit the degree of abstraction to a minimum. A high spa-

    tial correspondence between input and output requires less mental

effort since it draws on our experience in using our own body and encountering real-world objects (BAS, EAS). UI designers must face

    the tradeoffs between better learnability through high correspon-

    dence and the range of motions that can be expressed. The aspect

    of correspondence bridges the virtual artifact and the machine

characteristics (machine-artifact edge in Fig. 1).

The metaphor is a notion for describing the mapping of cogni-

    tive intentions to physical device interaction using concepts

    known from other domains [47,4]. In the conversation metaphor

    the user engages in a written or spoken dialogue with the machine.

Conversational interfaces are well suited for high-level operations, but less suited for

    spatial precision and expression. Today graphical user interfaces

represent the dominant manipulation metaphor, where the user

acts upon a virtual world rather than using language as an interme-

    diary. Manipulation interfaces tap our naive understanding of thelaws of physics (NP), our motor memories (BAS) and how we per-

    ceive and interact with our surroundings (EAS). Manipulation

    using instruments requires more learning and mental resources,

    as well as introducing indirection [65,22]. Sensors tracking the

user's body promote an embodiment metaphor where the user

    identifies with parts of a virtual world in a more literal way. For

    avatar control, embodied interaction builds on our proprioceptive

    and kinaesthetic senses (BAS), and can aid our feeling of presence

    in virtual environments (EAS). Embodiment has been picked up in

current trends in computer animation that criticize the complex

    and abstract nature of motion design tools based on the WIMP par-

adigm. Since the aspect of metaphors is central to the artist's cog-

nitive understanding of his or her activity, our design space links it

to the artist in Fig. 1.

    Fig. 1. The design space of animation interfaces characterizes the entities involved

    in the interaction and their relations.


    Directness characterizes the physical distance between user

    and the target. This includes both the spatial and the temporal

offset from input to output [2]. In our understanding of directness

we consider the relation between user (artist) and the physical rep-

    resentation of the animation through the machine (as illustrated

on the triangular design space in Fig. 1). Cognitive facets of direct-

    ness have also been considered in other definitions [22,65], but

    these can be covered in interaction metaphors.

    Since computer animation interfaces deal with continuous or

    time-based media with multiple spatial and one temporal dimen-

    sion, interfaces need to support viewing and modeling not only of

    static spaces but of their dynamics as well. As humans inhabit a

space–time continuum, and all our actions always have a temporal

dimension, any kind of interaction between a human and a com-

puter to create, edit or view dynamic content relates the human's

space–time to the medium's space–time. User time is generally

referred to as real time, which is continuous, and the data time as

    virtual or stream time, which is discrete [42,12]. Depending on

    animation method and technique, the real time of user input can

affect the virtual time or not, and only the spatial or only the temporal

parameters of the animation may be changed. This suggests that

there are different ways in which real space–time can be mapped

to virtual space–time. So far the literature lacks a structured

    approach to characterizing the relations of user and artifact space

    and time. We will therefore propose a taxonomy in the next sec-

tion that sorts interaction techniques based on which components

of real and virtual space–time are involved. This space–time aspect

    abstracts the relation of user and application from the device level,

    which is why it is located on the artist-artifact edge of our design

    space diagram (Fig. 1).

As a central element of our design space, orchestration

describes in which order which parts of the user's body perform

    which sub-task through which input device. Since humans are

most adept at crafting with their hands, and for a long time

human–computer interfaces were optimized for manual control,

    orchestration has been best studied for hand-based interaction.

Findings from behavioral psychology show that the dominant and non-dominant hands are optimized for distinct roles in most

    tasks. For instance, in the task of writing the non-dominant hand

    first establishes a reference frame relative to which the dominant

    hand then operates. Using this knowledge in devising bimanual

interaction techniques can have benefits for efficiency ([6], Hinckley

    et al. [68], Balakrishnan and Kurtenbach [69]) and cognition, by

    changing how users think about a task [35,41]. Many every-day

    activities also show complex orchestrations of more than just the

    hands, such as driving a car where feet control speed, hands the

    steering, and fingers additional controls such as lights. Since

    orchestration considers human, application and the mediating

    device to an equal degree, it is situated at the center of the triangle

relation diagram representing the design space (Fig. 1).

3.2. Space–time: a new design aspect

The concept of space–time control mappings considers any nav-

igation, creation or editing operation on a continuous visual med-

ium as a mapping from real space–time of the input device (the

control dimensions) to virtual space–time of the presentation med-

ium (the presentation dimensions). The output medium's presen-

    tation dimensions can be viewed and edited integrally or

    separately regarding space and time. For instance, while frame-

    based animation edits poses and the time instants at which they

    occur separately, performance-based or procedural approaches

    usually define motion in an integrated fashion. Both real space

    and time can control either or both virtual space and time. A first

step in structuring these relations is to collapse the individual spatial dimensions to a single abstract space dimension, so that

we need only consider the two dimensions space and time on user

    and medium side. The next step is to consider how these two

    abstract input dimensions (control) affect the output dimensions

    (presentation). The central idea underlying the construction of cat-

    egories is that one or both control dimensions can affect one or

    both presentation dimensions.

Four basic space–time categories of mappings can be con-

structed from the possible combinations of the two sets (control

space, control time) and (presentation space, presentation time):

space → space

space → time

time → space

time → time

    Often presentation space and time will be modified in an inte-

    grated fashion, or spatial and temporal control will both figure into

the input–output relation. For this we introduce two control-integrated space–time categories that cover input–output mappings in

which both control dimensions contribute to the relation:

space–time → space (i.e., space → space and time → space)

space–time → time (i.e., space → time and time → time)

and two presentation-integrated space–time categories in which

both presentation dimensions are affected by the interaction:

space → space–time (space → space and space → time)

time → space–time (time → space and time → time)

The final cases are the fully integrated space–time categories

space–time → space–time (space → space and time → time)

space–time → time–space (space → time and time → space)

    which reflect that integrated control dimensions affecting presenta-

    tion domains in an integrated way can be matched in two ways.

These ten space–time categories cover all variants of mapping user

space–time to medium space–time. A simple means of visualizing

this is a 3 × 3 matrix, where the central cell is compartmented into

    two, since relating both control and presentation space and time is

    ambiguous (Fig. 2).
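
As a data structure, the taxonomy can be encoded directly: a technique is classified by which control dimensions it reads and which presentation dimensions it affects. The encoding below is our own illustrative sketch, not part of the framework itself; note that it cannot distinguish the two fully integrated variants that share the central cell.

# Illustrative encoding of the space-time taxonomy: a technique is classified
# by the control dimensions it reads and the presentation dimensions it
# affects. A sketch for this discussion, not code from the original framework.

def category(control_space, control_time, pres_space, pres_time):
    def side(space, time):
        if space and time:
            return "space-time"
        return "space" if space else "time"
    # The fully integrated case is ambiguous (space-time -> space-time vs.
    # space-time -> time-space); this simple encoding does not distinguish them.
    return f"{side(control_space, control_time)} -> {side(pres_space, pres_time)}"

# Examples from the text:
print(category(True, False, True, False))   # manipulators: space -> space
print(category(False, True, False, True))   # video playback: time -> time
print(category(True, True, True, True))     # performance animation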

    The first row of the matrix describes control mappings that only

    look at the spatial component of the input and do not consider the

timing of the user's input. The third row describes control

    mappings where input has no spatial component, and the user only

    administers state changes with temporal triggers via controls

such as buttons. The second row describes control mappings where spatial input stands in a temporal context.

Fig. 2. The taxonomy of space–time mappings is structured based on how user

input in real space–time controls medium output in virtual space–time. Fig. 3 gives

examples of these categories.


There are borderline cases between temporal and spatiotemporal control: if trigger

controls exert spatial changes (such as moving a step in a certain

direction), we speak of spatial control.

    While some mappings can be easily sorted into these categories,

    for others it may appear less clear. In the following we consider

    each category individually and show that it is possible to find

examples of actual interfaces for all of them (see also Fig. 3).

Controls in the space → space category use the spatial compo-

    nent of user actions to affect the spatial dimensions of the medium.

    Most kinds of interactive editing techniques in computer-aided

    design fall into this category. A straightforward one-to-one

mapping of viewer time to medium time (time → time) is video

playback. Examples of space → time mappings are timelines that

employ a linear spatial representation of time for navigating or

altering time-dependent media. Software packages for frame-based

animation make heavy use of linear time plots for temporal naviga-

tion and timing transformations. Less common are examples for the

time → space category. Passive navigation techniques for virtual

    environments make use of such mappings [4]. After choosing a

    target or route either automatically or with the user in the loop,

the system navigates the user along the route or to the target, map-

    ping user time to medium space. Editing operations are rare in this

    category, since the single input DOF is insufficient for most editing

    tasks.

In mapping input space–time for manipulating space only, the

    redundant DOFs can be used either for enhanced robustness or

    for controlling further parameters. For editing a static image, the

    temporal component of the user input can, for instance, be used

to control the stroke type of the virtual brush (space–time →

    space). Velocity-based spatial navigation techniques include input

    space and time in the traversal of virtual space. The presentation

    time can also be steered: interactive continuous adjustment of

    playback speed (e.g. via a slider or wheel) changes video or anima-

tion playback during playback: spatiotemporal input affects the

viewing of medium time (space–time → time).

The category space → space–time can be found in time plots

    that are a common means of graphically representing a variable

    changing over time. Animation packages usually feature a graph

    editor that enables integrated shifting of key positions and the val-

    ues they represent in time and one (spatial) dimension. Three-

    dimensional representations of a video stream, video streamers,

even allow space–time video editing [56]. The mapping category

time → space–time is realized in automated navigation through

    a dynamic medium: scripted camera movement through animated

    scenes navigates both the time and the space of the target medium.

    It is often used for cut-scenes in video games, so-called cinematics,

    when interactive control is taken from the player for a short time

    in favor of progressing the narrative with pre-defined camera

    movement. This is different from video playback, where the spatial

    component of the medium (the video frame) is not navigated dur-

    ing playback. While the result is essentially the same, this distinc-

    tion is down to the fundamental difference in the medium data:

    For video, the projection from 3D to 2D is already integrated into

    the visual data (the video frames), while in 3D the projection is

    determined at run-time.

The space–time → space–time mappings can be found in many

    examples of user interfaces for virtual worlds. Spatial actions

    browse or alter the mediums space, and user and medium time

    are linearly related. Such mappings are common for interfaces that

    require high user immersion. Most performance controls for inte-

    grated motion creation also fall into this category, e.g. in interactive

    video games or in performance animation. The remaining inverse

mapping of user space–time to virtual time–space does not seem

    to be used for practical implementations. They could, however, be

    related to temporal triggers of a user (such as releasing some event)

that influences some graphical representation where the user's spa-

    tial input controls temporal parameters of the event.

The space–time view of operations on continuous visual media

gives a new perspective on the types of such operations: whether

    they are invasive (editing) or non-invasive (viewing) and whether

space → space. Application: Spatial Manipulation; technique: Manipulators/Gizmos/Handles; scenario: Posing a character

space → space–time. Application: Motion Editing; technique: Graph Editor; scenario: Adjusting ease-in/ease-out

space → time. Application: Time Control; technique: Timeline Bar; scenario: Browsing a video

space–time → space. Application: Interactive Travel in Static Virtual Environments; technique: Steering; scenario: Browsing a 3D information space

space–time → space–time. Application: Performance Animation, Video Games; technique: Computer Puppetry; scenario: Animating a character

space–time → time. Application: Time Control; technique: Jog Shuttle; scenario: Browsing a video

time → space. Application: Passive Travel in Static Virtual Environments; technique: Target-based Navigation/Fly-Throughs; scenario: Exploring architectural models

time → space–time. Application: Passive Travel in Dynamic Virtual Environments; technique: Target-based Navigation/Fly-Throughs; scenario: Watching a cut-scene in a 3D video game

time → time. Application: Playback; technique: Triggers/Buttons; scenario: Watching a video

Fig. 3. Nine categories of space–time mappings with example applications, techniques and scenarios of use. (Figure contains cropped stills of third-party material licensed under CC BY 3.0. Top left, top right and bottom left images attributed to the Durian Blender Open Movie Project; bottom left image attributed to Frontop Technology Co., Ltd; bottom center image attributed to Apricot Blender Open Game Project.)


    they involve creating new designs from scratch or refining existing

    designs. Firstly, collapsing all spatial parameters into one abstract

    space dimension hides the fact that, as a rule, both control and

    medium space involve multiple spatial parameters, while time

    only constitutes a single quantity on each side. This has an impact

    on the distribution of invasive versus non-invasive operations in

    the matrix: techniques employing time as input (third row) are

    mainly used for passive navigation, rather than for spatial manip-

    ulation. This is because space offers more input dimensions and we

    can navigate space easier than time. This asymmetry has shaped

    how we mentally model the abstract dimension of time: we rather

think of time in terms of space than vice versa [10]. Secondly, the

columns sort mappings into refinement through spatial editing and

    temporal editing (left and right column), and creation through inte-

grated influence on medium space–time (center column). Thirdly,

    in many cases the distinction between non-invasive and invasive

    operations is a theoretical one. A fly-through of a 3D scene can

    either be seen as a navigation that does not change the dataset

    or as a camera animation that does. The criteria for distinction

    should come from the application: is the camera animation being

    created a part of the medium or is it an ephemeral product of

    the viewing operation? This distinction has an effect on categoriza-

    tion, too.

    3.3. Limitations

    The aspects characterizing the design space of animation inter-

    faces constitute a high-level framework. As such they provide a

    structure and cues for design reasoning and analysis, rather than

    concrete guidelines. In the following we will illustrate its utility

    by showing how we used the design space in developing novel ani-

    mation techniques. More case studies and examples are required to

    illustrate its application in the multitude of animation-related

    issues.

    The design space does not offer a set of orthogonal dimensions,

rather its aspects are interrelated. For example, the nature of the task is linked to the type of space–time mapping: automation

can take control away from the user up to the point that spatiotem-

    poral input (e.g. continuous control of a puppets legs) can be

    reduced to temporal input (e.g. triggering puppet walk cycles with

    a button). Another example of such dependencies is that the choice

    of metaphor determines the magnitude of directness: from indirect

manipulation through direct manipulation to embodiment. The inter-

    relation between the seven design aspects may be not surprising,

as each can be seen as a perspective on the same issue: designing

    user interfaces for controlling spatiotemporal phenomena.

    The design space presented in this section is a conceptual

    framework for analyzing and designing animation interfaces. It

    uses established design aspects identified in the HCI literature.

For describing relations of input and animation space–time, which

    are central to this class of interface, we could not rely on any prior

    work. For this aspect we developed a taxonomy for sorting map-

    pings into categories based on how they relate input and output

space–time. Next we will show how we have used these design

    aids in practice, both evaluating them as design tools and using

    them to propose novel animation interfaces.

    4. A multi-touch animation system

    In order to illustrate the utility of the design space as an aid for

designing animation interfaces, we explain how it was employed in

    the development of a novel animation system that we have pre-

    sented in prior work (Walther-Franks et al. [70]). We go beyond

the original work by explicating the design approach underlying it. The design space-driven approach was chosen in lieu of the first

    iterations of a human-centered design process. In our experience

    with proposing novel interaction paradigms these stages of an iter-

    ative design approach have the issue that users are unfamiliar with

    the possibilities of novel technologies and are strongly biased by

    existing solutions. The design space can help to guide the first

    phase of design until users can be provided with artifacts to

    experience.

    Even though free-space 3D input devices have recently become

highly popular, in particular in combination with game consoles,

they still lack the accurate and precise control

    needed for serious animation editing. Systems like the Kinect are

    good for high-level avatar control, with predefined animations.

    For more accurate editing, these systems are not yet feasible.

    Direct-touch interactive surfaces provide better precision for ani-

mation tasks, and offer the best potential for high directness and

    correspondence of interaction. The potential of interactive surfaces

    has been explored for various applications but only a few consider

    animation [45,39]. Most surface-based 3D manipulation tech-

    niques are not developed and evaluated for motion capture. Fur-

    thermore, most projects only look at individual techniques and

    lack a system perspective. However, this is necessary to shed light

    on real-world problems such as integrating tools into whole work-

    flows or dealing with the realities of software engineering.

    4.1. Design approach

    Going through the design aspects of our framework, we con-

    sider options and make decisions, building up a design approach

    to follow for the implementation.

    4.1.1. Task

    As a typical animation task we decided for performance anima-

    tion of 3D rigid body models. Working with three-dimensional

    content poses the challenge of a discrepancy between input space

    (2D) and output space (3D). In recent years researchers have

    started investigating 3D manipulation on interactive surfaces, from

shallow-depth manipulation [27] to full 6DOF control [28,52]. The problem for surface-based motion capture is to design spatial map-

    pings that allow expressive, direct performance control by taking

    into account the unique characteristics of multi-touch displays.

    Many performance control interfaces are designed to optimally

    suit a specific task, such as walk animation or head motion. This

    means that for each type of task the performer must learn a new

    control mapping. This is somewhat supported by specialized

    devices that afford a certain type of control. For 2DOF input devices

    like the mouse this is transferred to digital affordances like handles

    of a rig. These map more complex changes in character parameters

    to the translation of manipulators. The specialization is designed

    into the rig, equalizing control operations to general translation

    tasks. Since interactive surfaces have a 2DOF integrated input

structure, we copy this approach for our system.

An important secondary task is defining the view on the scene.

    Since direct-touch performance controls are defined by the current

    projection, this puts a high demand on view controls regarding

flexibility, efficiency and precision. With few exceptions [16,23],

    research on surface-based 3D interaction has not dealt much with

    view control. Yet 3D navigation is essential for editing complex

    scenes in order to acquire multiple perspectives on the target or

    zoom in on details. Some surface-based virtual reality setups use

    implicit scene navigation by tracking user head position and orien-

    tation. However, this limits the range of control. For unconstrained

    access to all camera degrees of freedom a manual approach offers

    the highest degree of control. A common solution is to introduce

    different modes for object transformation and view transformation

(camera panning, zooming, rotation/orbiting). This is prevalent in desktop 3D interaction, where virtual buttons, mouse buttons or


    modifier keys change between object and view transformations.

While zooming and panning cover the camera's three translational

    DOF, the third rotational DOF, camera roll, is less essential since the

    camera up vector usually stays orthogonal to a scene ground plane.

    While in desktop environments this DOF separation is mainly

    owed to low-DOF input devices it can also be employed on devices

    that allow more integrated transformation techniques, in order to

    allow more precise control [46]. We opt for separated control of

    camera parameters to enable precise view adjustments.

    4.1.2. Integration

    Multi-touch interactive surfaces provide two control DOF per

    contact. The combination of multiple points can be used to create

    integrated controls for 2D and 3D rotation and translation. Yet

    Martinet et al. [43] point out that multi-touch-based surface

    interaction cannot truly support integrated 6DOF control. They

    propose the depth-separated screen-space (DS3) technique which

    allows translation separate from orientation. Like the Sticky Tools

technique of Hancock et al. [28], the number of fingers and where

    they touch the target (direct) or not (indirect) determines the

    control mode. Full 3D control can also be achieved by additive

    motion layering: changing the control-display mapping (e.g. by

    navigating the view) between takes allows control of further

    target DOF.

    Other important factors for efficiency are easy switching

    between capture and view operations and dedicating hands to

    tasks. This requires that a single hand be able to activate different

    input modes with as little effort as possible. Widgets as an obvious

    solution produce clutter and interfere with performance controls

    that already require visual handles. Modal distinction by on- or

    off-target hit testing can be problematic if the target has unusual

    shape or dimensions. In order to separate between capture and

    view control, we employ multi-finger chording in which the num-

ber of fingers switches between modes.
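
Such chord-based mode switching reduces to a simple dispatch on the number of contacts in a touch frame. The thresholds in the sketch below are illustrative and not necessarily the exact mapping used in our system.

# Sketch of multi-finger chording for mode selection: the number of contacts
# selects capture vs. view control. Thresholds are illustrative only.

def select_mode(num_contacts):
    if num_contacts == 0:
        return "idle"
    if num_contacts <= 2:
        return "capture"        # one or two fingers drive performance capture
    return "view"               # three or more fingers switch to camera control

def handle_touch_frame(contacts):
    mode = select_mode(len(contacts))
    # In a real system the contact list would be routed to the active tool here.
    return mode, list(contacts)

print(handle_touch_frame([(120, 340), (150, 360), (180, 355)]))  # -> view mode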

    4.1.3. Correspondence

Interactive surfaces promote motor and perceptual correspondence between input and output. However, this correspondence

    is difficult to maintain when planar input space and higher-dimen-

    sional parameter space have to be matched. For a start, users only

    interact with two-dimensional projections of three-dimensional

    data. For instance, to translate a handle in the screen z-dimension,

    one cannot perform the equivalent motion with standard sensing

    hardware. The problem with the third dimension on interactive

    surfaces is that barring above-the-surface input, manipulations in

    the screen z dimension cannot maintain this correspondence, since

    input motions can only occur in a plane. Following the integrality

    of touch input, this means that the 2 input DOF need to be mapped

    to 2 translation parameters of the target (e.g. the handle of a char-

    acter rig) so that they follow the same trajectory.
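
One way to realize this correspondence is to map the 2D touch displacement onto the two world-space axes that span the current view plane, so the handle tracks the finger on screen. The sketch below assumes an orthographic view with known camera basis vectors; it is an illustration, not the system's actual code.

# Sketch of mapping a 2D touch displacement to a 3D handle translation in the
# current view plane, so the handle follows the finger trajectory on screen.
# Illustrative; assumes an orthographic view with known camera basis vectors.

import numpy as np

def translate_in_view_plane(handle_pos, drag_px, cam_right, cam_up,
                            pixels_per_unit=100.0):
    """handle_pos: 3D position; drag_px: (dx, dy) finger motion in pixels;
    cam_right, cam_up: unit vectors spanning the view plane."""
    dx, dy = drag_px
    world_offset = dx * np.asarray(cam_right) + dy * np.asarray(cam_up)
    return np.asarray(handle_pos) + world_offset / pixels_per_unit

# Example: front view (right = +x, up = +y); a 50 px drag to the right.
print(translate_in_view_plane((0.0, 1.0, 0.0), (50, 0),
                              (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))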

    4.1.4. Metaphor

    The congruent input and output space of direct input devices

    promotes a manipulation style of interaction. Most manipulation

    techniques for interactive surfaces are kinematic mappings, where

    individual surface contacts exert a pseudo friction force by sticking

    to objects or pinning them down. As an alternative to kinematic

control, Cao et al. [7] and Wilson et al. [63] propose surface-based

    manipulation through virtual forces. This offers a more compre-

    hensive and realistic simulation of physical forces and is also used

    in desktop-based and immersive virtual environments. Different

    metaphors in the same system can enhance the distinction

    between controls that otherwise have much in common. For

    instance, in the example of desktop 3D interaction, editing usually

    employs the direct or instrumented interaction metaphors, whileview controls bear more resemblance to steering. This could also

    support the mental distinction between phenomenologically simi-

    lar spatial editing and navigation operations on interactive

    surfaces.

Manipulation is the most general metaphor for puppet control. Through manipulation the puppeteer can flexibly create and release mappings with a drag-and-drop style of interaction; directness minimizes mediation between user and target domain. For complex transformations, as are often necessary in character animation, rigs should be designed so that handles promote as direct a manipulation as possible, meaning that handles should be co-located with the features they influence and the handle-feature mapping designed to support maximal correspondence. Regarding kinematic versus physics-based manipulation mappings, realism and emergent control styles stand against precision, predictability and reliability. In animation, full control has a higher priority than realism, which is why we opt for purely kinematic controls.

    4.1.5. Directness

Interactive surfaces can reduce the distance between the user and the target to a minimum. However, touch input also has potential disadvantages such as imprecision (when mapping the finger contact area to a single point) and occlusion of on-screen content through the user's fingers, hands and arms [62]. Re-introducing indirection can alleviate the occlusion problem. Since absolute input techniques require the user to reach every part of the screen, which may become difficult when the display exceeds a certain size, limiting the interaction area to a part of the screen or introducing indirection mechanisms can help [18]. The spatial distance between input

    and target can also be used as a parameter for interaction design.

    For instance, fingers or pens touching the target can control differ-

    ent DOF than off-target contacts (mode change). Layered motion

    recording can involve manipulating moving targets after the initial

    capture pass. Relative mapping applies transformation relative to

    the initial input state. This allows arbitrary input location, and

clutching can increase the comfort of use. Both absolute and relative input can be applied locally and globally, which makes a significant difference when controlling the behavior of a feature that inherits motion from its parents. Local mapping allows the user

    to ignore motion of parent features and concentrate on local trans-

    formations. By default, performance control of a feature overwrites

    any previous recordings made for it. In this way, performers can

    practice and test a motion until they get it right. They might how-

    ever want to keep aspects of an original recording and change oth-

    ers. Blending a performance with a previous recording expands the

    possibilities for control. It allows performance-based editing of

    existing animations.
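The difference between absolute and relative mapping can be sketched as follows (hypothetical target and position types, not code from the system); the relative variant is what makes clutching possible:

    # Sketch of absolute vs. relative input mapping. Absolute mapping binds
    # the target to the contact position; relative mapping applies only the
    # delta since the last sample, so the finger can be lifted and re-placed
    # anywhere (clutching) without the target jumping.
    class AbsoluteMapping:
        def apply(self, target, pos):
            target.position = pos                     # target snaps to the contact

    class RelativeMapping:
        def __init__(self):
            self.last = None
        def begin(self, pos):
            self.last = pos                           # clutch engages at any location
        def apply(self, target, pos):
            dx, dy = pos[0] - self.last[0], pos[1] - self.last[1]
            target.position = (target.position[0] + dx, target.position[1] + dy)
            self.last = pos
        def end(self):
            self.last = None                          # lifting the finger clutches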

    4.1.6. Orchestration

Studies by Forlines et al. [19] and Kin et al. [38] demonstrated that the benefits of two-handed (symmetric) input also transfer to interactive surfaces for basic selection and dragging tasks. The difficulty is to get users to use both hands, since single-handed controls in typical UIs can prime them. To maximize the options, our system should allow one-handed as well as symmetrical and asymmetrical bimanual input. The 2D capture approach implies that no single spatial manipulation requires more than one hand. Consequently, two single-handed operations can easily be combined to enable parallel operation, for instance one hand per character limb, allowing emergent asymmetric and symmetric control (cf. [11]).

    If individual sets of camera parameters are controlled with a

    single hand, this allows emergent styles of interaction. Combining

    two different camera operations, one with each hand, allows

    asymmetric view control. For instance, left hand panning and right

hand zooming can be combined into simultaneous 3DOF view control. A combination of left-handed view control with right-handed


    performance control even enables interaction styles that follow

    principles of asymmetric bimanual behavior [25]: the left hand

can operate the view, acting at a lower spatial and temporal frequency and preceding the right hand, which acts in the

    reference frame provided by the left. This approach can be used to

    simplify view attaching for editing in dynamic reference frames:

    attaching the camera to the current reference frame for all camera

    operations provides the benefits of kinaesthetic reference frames

    and solves the issue of direct control with dynamic targets.

    4.1.7. Spacetime

Direct-touch spatial editing is almost exclusively evaluated in the scope of basic object editing in static environments (space → space). Non-spatial trigger input by tapping the screen (time → time) is commonly employed for discrete navigation of image sequences or videos, e.g. TV sports presenters reviewing video recordings of a game. With the exception of Moscovich et al. [45] and Kipp and Nguyen [39], the potential of direct touch for motion capture (spacetime → spacetime) has received little attention in prior

    research. Surface-specific techniques thus seem mainly aligned

    along symmetric spacetime categories. The absence of passive,

    time-based mappings or graphical depictions of time might be just

    because the coupling of input and output so strongly affords direct,

    continuous manipulation as opposed to tool use or automation.

    While it is still pure conjecture, it is possible that direct-touch

    promotes symmetric spacetime mappings which couple user and

    medium space and time more literally, while indirect input might

    be better suited for more mediated spacetime controls.

    4.2. Prototype system

    We implemented the design approach in a working prototype of

    a multi-touch animation system (Walther-Franks et al. [70]). We

    decided to build upon the existing 3D modelling and animation

    software Blender. The animation system is built around a core of

    performance controls. View controls and a time control interface

complete the basic functionality. Each control can be operated with a single hand. This allows the user to freely combine two operations, e.g. capturing the motion of two features at once or wielding

    the view and the puppet at the same time. Since Blender neither

    supports multi-touch input nor concurrent operations, changes

    were necessary to its user interface module, especially the event

    system. We established a TUIO-based multi-touch interface. TUIO

    is an open, platform independent framework that defines a com-

    mon protocol and API for tangible interfaces and multi-touch sur-

    faces [36]. It is based on the Open Sound Control (OSC) protocol, an

    emerging standard for interactive environments. We implemented

    chording techniques for mouse emulation by mapping multiple

    finger cursors to single 2-DOF input events. This suffices for sin-

gle-hand input. For bimanual interaction the contacts are clustered using a spatial as well as a temporal threshold. Fingers are only added to a gesture if they are within a certain distance of the centroid of the gesture's cursor cluster; otherwise they create a

    new multi-finger gesture. After initial registration the gesture can

    be relaxed, i.e. the finger constellation required for detection need

    not be maintained during the rest of the continuous gesture. This

means that adding a finger to or removing one from the cluster will not

    change the gesture, making continuous gestures resistant to track-

    ing interruptions or touch pressure relaxation. This multi-touch

    integration already enables the use of tools via multi-touch ges-

    tures with one hand at a time. For two-handed control it was nec-

    essary to extend the single pointer UI paradigm implemented in

    Blender such that two input sources (two mice or two hands)

    can operate independently and in parallel.
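The clustering rule can be sketched as follows (thresholds are illustrative values, not those used in the prototype):

    # Sketch of contact-to-gesture clustering: a new contact joins an existing
    # gesture only if it arrives shortly after the gesture was registered and
    # lies close to the centroid of that gesture's contacts; otherwise it
    # starts a new gesture. Threshold values are illustrative assumptions.
    import math, time

    SPATIAL_THRESHOLD = 120.0    # pixels
    TEMPORAL_THRESHOLD = 0.25    # seconds

    class Gesture:
        def __init__(self, contact, t):
            self.contacts = [contact]
            self.registered = t
        def centroid(self):
            xs = [c[0] for c in self.contacts]
            ys = [c[1] for c in self.contacts]
            return sum(xs) / len(xs), sum(ys) / len(ys)

    def assign_contact(gestures, contact, t=None):
        t = time.time() if t is None else t
        for g in gestures:
            cx, cy = g.centroid()
            if (math.hypot(contact[0] - cx, contact[1] - cy) < SPATIAL_THRESHOLD
                    and t - g.registered < TEMPORAL_THRESHOLD):
                g.contacts.append(contact)   # finger joins the chord
                return g
        g = Gesture(contact, t)              # otherwise start a new gesture
        gestures.append(g)
        return g

Relaxing the gesture after registration would then simply mean ignoring later changes to the contact set when deciding which chord was recognized.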

Performance controls use selection and translation operators (Fig. 4). The translation operator works along the two axes defined by the view plane. Single-finger input maps to selection (tap) and translation (drag). In linked feature hierarchies such as skeleton rigs, the translation is applied to the distal bone end, rotating the bone around the screen z axis. Dragging directly on a target enables

    selection and translation in a single fluid motion. Alternatively,

    the drag gesture can be performed anywhere on screen, also allow-

ing indirect control of a previously selected target. Indirect dragging thus

    requires prior selection to determine the input target. Selection is

    the only context-dependent operator, as it determines the target

    by ray casting from the tapped screen coordinates.
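For skeleton bones, the drag-to-rotation behavior described above can be sketched like this (the screen-space projections and the bone attribute are hypothetical, not Blender API calls):

    # Sketch: dragging the distal end of a bone in the view plane becomes a
    # rotation of the bone around the screen z axis. head_2d and the tip
    # positions are screen-space projections; bone.screen_z_rotation is a
    # hypothetical attribute used only for illustration.
    import math

    def angle_to(head_2d, tip_2d):
        return math.atan2(tip_2d[1] - head_2d[1], tip_2d[0] - head_2d[0])

    def rotate_bone_by_drag(bone, head_2d, tip_before, tip_after):
        delta = angle_to(head_2d, tip_after) - angle_to(head_2d, tip_before)
        bone.screen_z_rotation += delta      # apply the incremental rotation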

    Layered animation is supported via absolute and additive map-

    pings. Absolute mode is the standard mapping, additive mode

    must be activated via the GUI. The standard absolute mapping

    overwrites any previous transformation at the current time. In

    the absence of parent motion this ensures 1:1 correspondence

    between input and output. With parent motion, control becomes

    relative to the parent frame of reference (local). Additive layering

    preserves existing motion and adds the current relative transfor-

    mation to it. By changing the view between takes so that the

input-output mapping affects degrees of freedom that could not

    be affected in previous takes (e.g. by orbiting the view 90 degrees

    around screen y), this enables the animator to add depth and thus

    create more three-dimensional motion.
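The two layering modes can be summarized in a few lines (the per-frame dictionary used as a motion track is an assumption for illustration only):

    # Sketch of absolute vs. additive layering for one animation channel.
    # Absolute capture overwrites whatever was recorded at the current frame;
    # additive capture keeps the existing curve and adds the performer's
    # relative transformation on top of it.
    def record_absolute(track, frame, value):
        track[frame] = value                             # overwrite the previous take

    def record_additive(track, frame, delta):
        track[frame] = track.get(frame, 0.0) + delta     # preserve existing motion

    # Hypothetical usage during playback-and-record:
    # record_additive(x_translation_track, current_frame, drag_dx * gain)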

    The three camera operators pan, orbit and zoom map to

    two-, three-, and four-finger gestures (Fig. 5). Assigning chorded

    multi-finger gestures to view operators does not have any prece-

    dent in the real world or prior work, and there are good arguments

    for different choices. A sensible measure is the frequency of use of a

    certain view control, and thus one could argue that the more com-

    monly used functions should be mapped to the gestures with less

    footprint, i.e. fewer fingers. Camera dolly move or zoom is probably

    the least used view control, which is why we decided to map it to

    the four finger gesture: users can zoom in and out by moving four

    fingers up or down screen y. Three fingers allow camera orbit by

    the turntable metaphor: movement along the screen x axis controls

    turntable azimuth, while motion along screen y controls camera

altitude. Two fingers pan the view along the view-plane x and y axes. Like the transformation controls, camera controls are context-free, meaning they can be activated anywhere in the camera view.
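A sketch of this dispatch (gains and the camera interface are assumptions, not the prototype's code):

    # Sketch of the chorded view controls: two fingers pan in the view plane,
    # three fingers orbit via the turntable metaphor (x -> azimuth,
    # y -> altitude), four fingers dolly/zoom along screen y.
    def apply_view_gesture(camera, finger_count, dx, dy,
                           pan_gain=0.01, orbit_gain=0.005, zoom_gain=0.02):
        if finger_count == 2:
            camera.pan(dx * pan_gain, dy * pan_gain)     # view-plane x/y
        elif finger_count == 3:
            camera.azimuth += dx * orbit_gain            # turntable yaw
            camera.altitude += dy * orbit_gain           # camera elevation
        elif finger_count == 4:
            camera.dolly(dy * zoom_gain)                 # move along the view axis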

    A view attachment mode, when active, fixes the view camera to

    the currently selected feature during all camera operations, mov-

    ing the camera along with dynamic targets (Fig. 6). The camera-

    feature offset is maintained and can be continuously altered

depending on the camera operator, as described above. After establish-

    ing the attachment by starting a view control gesture, new targets

    can be selected and manipulated. Releasing the camera control

    immediately ends the attachment, rendering the camera static.

By combining one-handed view control and capture in an asymmetric manner, this approach can resolve the indirection otherwise involved in controlling dynamic targets.
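The attachment itself can be sketched as follows (vector-valued positions and the feature interface are assumptions):

    # Sketch of view attachment: while a view gesture is held, the camera is
    # re-anchored every frame to the selected feature, so the camera-feature
    # offset survives motion the feature inherits from earlier layers.
    import numpy as np

    class ViewAttachment:
        def __init__(self, camera, feature):
            self.camera, self.feature = camera, feature
            self.offset = (np.asarray(camera.position, dtype=float)
                           - np.asarray(feature.world_position(), dtype=float))
        def adjust(self, delta):
            self.offset += np.asarray(delta, dtype=float)   # view controls alter the offset
        def update(self):
            # called once per frame while the view control gesture is active
            self.camera.position = (np.asarray(self.feature.world_position(),
                                               dtype=float) + self.offset)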

The time control interface features several buttons and a timeline. Simple play/pause toggle buttons start and stop the playback

    within a specified time range. A timeline gives the animator visual

    feedback on the remaining loop length in multi-track capture, sup-

    porting anticipation. It also enables efficient temporal navigation:

    with a one-finger tap the animator can set the playhead to a spe-

cific frame. A continuous horizontal gesture enables interactive playback, giving the animator direct control of playback speed.
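A sketch of the timeline mapping (screen coordinates and frame range are illustrative values):

    # Sketch of the timeline interaction: a tap positions the playhead
    # absolutely, and a continuous horizontal drag scrubs, so playback speed
    # follows finger speed. Timeline geometry and frame range are assumptions.
    def x_to_frame(x, timeline_x=0.0, timeline_width=1280.0,
                   frame_start=1, frame_end=250):
        t = min(max((x - timeline_x) / timeline_width, 0.0), 1.0)
        return round(frame_start + t * (frame_end - frame_start))

    def on_timeline_tap(scene, x):
        scene.current_frame = x_to_frame(x)        # jump to the tapped frame

    def on_timeline_drag(scene, x):
        scene.current_frame = x_to_frame(x)        # continuous scrubbing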

    5. Evaluation

    The design framework was a powerful aid for structuring design

    options for the novel multi-touch animation system presented

above. We have also used it in the design of a performance-based animation timing technique (Walther-Franks et al. [71]) and are employing it in ongoing projects.

Fig. 4. Direct and indirect performance control.

Fig. 5. Basic view transformations with continuous multi-finger gestures.

A design framework as presented

    in this paper cannot be directly evaluated. Its usefulness and

    appropriateness is rather proven indirectly through evaluations

    of prototypical systems built on its theoretical foundation. For this

    reason we will next summarize the evaluation of the multi-touch

    animation system.

    We evaluated the resulting system in an informal user study.

    Aspects of interest were the reception and use of single- and

multi-track capture and camera controls, specifically to what extent two-handed interaction strategies would be employed. Since the direct animation system is highly novel and still at the prototype stage, a formative evaluation was chosen in order to guide further

    research. Formative evaluations are common in research and

    development of 3D user interfaces [4]. Six right-handed individuals

    aged between 23 and 31 years, four male, two female, took part in

    our study. All came from a computer science and/or media produc-

    tion background. Two of these judged their skill level as frequent

    users of animation software, one as an occasional user and three

as rarely using such software. In sessions of about 30 min, the users created free animations of a stylized human puppet. An articulated mannequin was rigged with seven handles that provided puppetry controls (three bones for control of the body and four inverse kinematic handles for the hand and foot end effectors). The inverse kinematic handles allowed expressive control of the multi-joint limbs while keeping complexity at a minimum. The goal was to explore what animation goals users would come up with on their own, given the digital puppet. The study ran the prototype on a rear-projected horizontal interactive tabletop employing the diffuse illumination technique, with a height of 90 cm, a screen diagonal of 52 inches and a resolution of 1280 × 800 pixels.

    The results of the study revealed that participants took to the

    controls easily. Most stated that they enjoyed using our system.

    The performance control interface was straightforward for initial

    animations. Multi-track animation was mainly used to animate

    separate features in multiple passes, less to adjust existing anima-

    tion. The more complex additive mapping was hardly used and

    met with initial confusion, although explanation and experiment-

ing usually solved this. The view controls were quickly understood and were used without difficulty. The most commonly used cam-

    era operation was orbit. As all participants were familiar with

    the timeline metaphor they had no problems understanding it.

    Most subjects easily employed the absolute positioning of the

    playhead to jump to a frame and to scrub along the timeline to

    review the animation they had created. One participant used the

    timeline for a method of animation somewhere between perfor-

    mance and frame-based animation: using the left hand for play-

head and the right for pose control, he achieved a fast, efficient

    pose-to-pose animation style. Five out of six participants mani-

    fested asymmetric bimanual styles of interaction. An emergent

strategy of half of our study's participants was to dedicate the left hand to view or time controls and the right to capture. Further, one participant controlled two puppet features simultaneously. Three used their left hand to attach the view to the mannequin

    for animating its limbs once they had created animation for the

    root bone. The benefit of locking the view to a frame of reference

    in this way seemed immediately apparent to them, and was

    greeted with enthusiasm in two cases.

Given the short session time and their lack of experience in performance animation, participants were able to create surprisingly refined character motion. Four were able to create expressive character animations within the 10 min allotted to the free animation task. These included walk, jump and squat motions as well as

    dance moves.

Inexperienced users had a harder time comprehending spatial relationships, while those with more experience in 3D animation noticeably picked up the controls more fluently. This comes as no surprise, as using and controlling software takes time and practice, regardless of the interface. For novice and casual users, our 2DOF strategy seems appropriate, since it constrains manipulation to the view plane and excludes the depth dimension. However, the interface might need improvements in visualizing these constraints and in giving more depth cues.

    6. Conclusion and discussion

Current animation systems are too complex and inefficient for the high demand for animated content today. In order to make them more efficient and accessible to a broad range of users, we have to

    look at such tools from an HCI perspective. Our work has taken

    steps in this direction. A review summarized related work in com-

    puter animation interfaces regarding issues of control and use. A

    design space characterized important aspects of animation inter-

    faces on varying levels of abstraction. A taxonomy for spacetime

    interactions with spatiotemporal media described how user and

    medium space and dynamics relate in animation interfaces. The

    use of this conceptual framework was demonstrated in the design

    of a multi-touch animation system. For this proof-of-concept proto-

    type we used interactive surfaces as high-bandwidth direct input

    devices. It features robust, easy to understand, and conflict-free

    unimanual mappings for performance and view control that can

be combined for efficient bimanual interaction. A user study verified the design approach by showing largely positive user reactions.

    The majority of users employed both hands in emergent asymmet-

    ric and symmetric bimanual interaction.

    Animations are created by people for people in order to inform,

    educate or entertain. Striving for higher usability by applying

    knowledge on physiological and psychological human factors is

the foundation of human-computer interaction, and one of the

    main points of our work. However, animation is primarily still an

    art and a craft. Just as good animations have always been created

    by artists with capability and skill, next generation animation

interfaces will still require talent and training on the part of the user.

    But in contrast to current mainstream tools they can help to ease

    the effort in training and allow animators to express their creativ-

ity more efficiently.

Fig. 6. The view attaching technique. Features can inherit motion from parents animated in previous motion layers. In such cases direct control is not possible. By attaching the view to the feature's frame of reference, direct control is reintroduced.

While animation tools cannot enable completely uninitiated people to create stunning motion designs without significantly constraining creativity, they can do a lot more

    to make the learning curve less steep. We believe that next gener-

    ation tools should incorporate everyone from beginners to experi-

    enced professionals, by being easy to learn, but hard to master. In

this we side with voices in the community that, rather than making systems easy to use, intend to accelerate the progress from novices to experts [35], by letting users feel like naturals [62].

    Acknowledgement

    This work was funded in part by the Klaus Tschira Stiftung.

    Appendix A. Supplementary data

    Supplementary data associated with this article can be found, in

    the online version, at http://dx.doi.org/10.1016/j.entcom.2014.

    08.007.

    References

[1] Anand Agrawala, Ravin Balakrishnan, Keepin' it real: pushing the desktop metaphor with physics, piles and the pen, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '06, ACM, New York, NY, USA, 2006, pp. 1283-1292.
[2] Michel Beaudouin-Lafon, Instrumental interaction: an interaction model for designing post-WIMP user interfaces, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '00, ACM, New York, NY, USA, 2000, pp. 446-453.
[3] B. Bodenheimer, C. Rose, S. Rosenthal, J. Pella, The process of motion capture: dealing with the data, in: D. Thalmann, M. van de Panne (Eds.), Computer Animation and Simulation '97, Eurographics/ACM SIGGRAPH, 1997.
[4] Doug A. Bowman, Ernst Kruijff, Joseph J. LaViola, Ivan Poupyrev, 3D User Interfaces: Theory and Practice, Addison-Wesley, 2004.
[5] Brightside Games, Zeit2, Ubisoft, 2011.
[6] W. Buxton, B. Myers, A study in two-handed input, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '86, ACM, New York, NY, USA, 1986, pp. 321-326.
[7] Xiang Cao, Andrew D. Wilson, Ravin Balakrishnan, Ken Hinckley, Scott E. Hudson, ShapeTouch: leveraging contact shape on interactive surfaces, in: 2008 IEEE International Workshop on Horizontal Interactive Human Computer Systems (TABLETOP), IEEE, October 2008, pp. 129-136.
[8] S.K. Card, J. Mackinlay, The structure of the information visualization design space, in: Proceedings of the IEEE Symposium on Information Visualization 1997, IEEE, Los Alamitos, CA, USA, October 1997, pp. 92-99.
[9] Stuart K. Card, Jock D. Mackinlay, George G. Robertson, A morphological analysis of the design space of input devices, ACM Trans. Inf. Syst. 9 (2) (April 1991) 99-122.
[10] Daniel Casasanto, Lera Boroditsky, Time in the mind: using space to think about time, Cognition 106 (2) (February 2008) 579-593.
[11] Lawrence D. Cutler, Bernd Fröhlich, Pat Hanrahan, Two-handed direct manipulation on the responsive workbench, in: SI3D '97: Proceedings of the 1997 Symposium on Interactive 3D Graphics, ACM, New York, NY, USA, 1997, pp. 107-114.
[12] J.D.N. Dionisio, A.F. Cardenas, A unified data model for representing multimedia, timeline, and simulation data, IEEE Trans. Knowledge Data Eng. 10 (5) (September 1998) 746-767.
[13] Mira Dontcheva, Gary Yngve, Zoran Popović, Layered acting for character animation, ACM Trans. Graph. 22 (3) (July 2003) 409-416.
[14] Tanja Döring, Axel Sylvester, Albrecht Schmidt, A design space for ephemeral user interfaces, in: Proceedings of the 7th International Conference on Tangible, Embedded and Embodied Interaction, TEI '13, ACM, New York, NY, USA, 2013, pp. 75-82.
[15] Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowitcz, Derek Nowrouzezahrai, Ravin Balakrishnan, Karan Singh, Video browsing by direct manipulation, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '08, ACM, New York, NY, USA, 2008, pp. 237-246.
[16] J. Edelmann, A. Schilling, S. Fleck, The DabR - a multitouch system for intuitive 3D scene navigation, in: 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, 2009, IEEE, May 2009, pp. 1-4.
[17] James D. Foley, Andries van Dam, Steven K. Feiner, John F. Hughes, Computer Graphics: Principles and Practice, Addison-Wesley, 1996.
[18] Clifton Forlines, Daniel Vogel, Ravin Balakrishnan, Hybrid Pointing: fluid switching between absolute and relative pointing with a direct input device, in: UIST '06: Proceedings of the 19th Annual ACM Symposium on User Interface Software and Technology, ACM, New York, NY, USA, 2006, pp. 211-220.
[19] Clifton Forlines, Daniel Wigdor, Chia Shen, Ravin Balakrishnan, Direct-touch vs. mouse input for tabletop displays, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, ACM, New York, NY, USA, 2007, pp. 647-656.
[20] B. Frohlich, H. Tramberend, A. Beers, M. Agrawala, D. Baraff, Physically-based manipulation on the responsive workbench, in: IEEE Virtual Reality 2000, IEEE Comput. Soc., Los Alamitos, CA, USA, 2000, pp. 5-11.
[21] David M. Frohlich, The design space of interfaces, in: Lars Kjelldahl (Ed.), Multimedia, Eurographic Seminars, Springer, Berlin Heidelberg, 1992, pp. 53-69.
[22] David M. Frohlich, Direct manipulation and other lessons, in: Martin G. Helander, Thomas K. Landauer, Prasad V. Prabhu (Eds.), Handbook of Human-Computer Interaction, Elsevier, North-Holland, 1997, pp. 463-488.
[23] Chi W. Fu, Wooi B. Goh, Junxiang A. Ng, Multi-touch techniques for exploring large-scale 3D astrophysical simulations, in: Proceedings of the 28th International Conference on Human Factors in Computing Systems, CHI '10, ACM, New York, NY, USA, 2010, pp. 2213-2222.
[24] Michael Gleicher, Animation from observation: motion capture and motion editing, SIGGRAPH Comput. Graph. 33 (4) (November 1999) 51-54.
[25] Y. Guiard, Asymmetric division of labor in human skilled bimanual action: the kinematic chain as a model, J. Motor Behav. 19 (4) (December 1987) 486-517.
[26] Marc S. Hancock, F.D. Vernier, Daniel Wigdor, Sheelagh Carpendale, Chia Shen, Rotation and translation mechanisms for tabletop interaction, in: First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TableTop 2006), IEEE, January 2006, 8 pp.
[27] Mark Hancock, Sheelagh Carpendale, Andy Cockburn, Shallow-depth 3D interaction: design and evaluation of one-, two- and three-touch techniques, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, ACM, New York, NY, USA, 2007, pp. 1147-1156.
[28] Mark Hancock, Thomas T. Cate, Sheelagh Carpendale, Sticky tools: full 6DOF force-based interaction for multi-touch tables, in: Proceedings of Interactive Tabletops and Surfaces 2009, 2009.
[29] Ken Hinckley, Daniel Wigdor, Input Technologies and Techniques, Taylor & Francis, 2012, Chapter 9.
[30] Takeo Igarashi, Tomer Moscovich, John F. Hughes, As-rigid-as-possible shape manipulation, ACM Trans. Graph. 24 (3) (2005) 1134-1141.
[31] Satoru Ishigaki, Timothy White, Victor B. Zordan, C. Karen Liu, Performance-based control interface for character animation, ACM Trans. Graph. 28 (3) (July 2009) 1-8.
[32] Robert J.K. Jacob, Linda E. Sibert, Daniel C. McFarlane, M. Preston Mullen, Integrality and separability of input devices, ACM Trans. Comput. Hum. Interact. 1 (1) (March 1994) 3-26.
[33] Robert J.K. Jacob, Audrey Girouard, Leanne M. Hirshfield, Michael S. Horn, Orit Shaer, Erin T. Solovey, Jamie Zigelbaum, Reality-based interaction: a framework for post-WIMP interfaces, in: Proceedings of the Twenty-sixth Annual SIGCHI Conference on Human Factors in Computing Systems, CHI '08, ACM, New York, NY, USA, 2008, pp. 201-210.
[34] John Jurgensen, From muppets to digital puppets, August 2008. URL http://www.youtube.com/watch?v=GN8WbHomQJg.
[35] Paul Kabbash, William Buxton, Abigail Sellen, Two-handed input in a compound task, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '94, ACM, New York, NY, USA, 1994, pp. 417-423.
[36] Martin Kaltenbrunner, Till Bovermann, Ross Bencina, Enrico Costanza, TUIO - a protocol for table based tangible user interfaces, in: Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation (GW 2005), Vannes, France, 2005.
[37] Matt Kelland, Dave Morris, Dave Lloyd, Machinima: Making Animated Movies in 3D Virtual Environments, Ilex, Lewes, 2005.
[38] Kenrick Kin, Maneesh Agrawala, Tony DeRose, Determining the benefits of direct-touch, bimanual, and multifinger input on a multitouch workstation, in: Proceedings of Graphics Interface 2009, GI '09, Canadian Information Processing Society, Toronto, Ontario, Canada, 2009, pp. 119-124.
[39] Michael Kipp, Quan Nguyen, Multitouch puppetry: creating coordinated 3D motion for an articulated arm, in: ACM International Conference on Interactive Tabletops and Surfaces, ITS '10, ACM, New York, NY, USA, 2010, pp. 147-156.
[40] Joseph Laszlo, Michiel van de Panne, Eugene Fiume, Interactive control for physically-based animation, in: SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000, pp. 201-208.
[41] Andrea Leganchuk, Shumin Zhai, William Buxton, Manual and cognitive benefits of two-handed input: an experimental study, ACM Trans. Comput. Hum. Interact. 5 (4) (December 1998) 326-359.
[42] Thomas D.C. Little, Time-based Media Representation and Delivery, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1994, pp. 175-200.
[43] A. Martinet, G. Casiez, L. Grisoni, Integrality and separability of multitouch interaction techniques in 3D manipulation tasks, IEEE Trans. Vis. Comput. Graph. 18 (3) (March 2012) 369-380.
[44] Alberto Menache, Understanding Motion Capture for Computer Animation, 2011.
[45] T. Moscovich, T. Igarashi, J. Rekimoto, K. Fukuchi, J.F. Hughes, A multi-finger interface for performance animation of deformable drawings, in: UIST 2005 Symposium on User Interface Software and Technology, October 2005.
[46] Miguel A. Nacenta, Patrick Baudisch, Hrvoje Benko, Andrew D. Wilson, Separability of spatial manipulations in multi-touch interfaces, in: GI '09: Proceedings of Graphics Interface 2009, Canadian Information Processing Society, Toronto, Ontario, Canada, Canada, 20