
Transcript of AR paper 4


An interaction approach to computer animation

Benjamin Walther-Franks, Rainer Malaka

Research Group Digital Media, Universität Bremen, Fb3, Bibliothekstr. 1, 28359 Bremen, Germany

Article info

    Article history:

    Received 31 March 2014

    Revised 15 July 2014

    Accepted 19 August 2014

    Available online 2 September 2014

    Keywords:

    Motion design interfaces

    Performance animation

Human–computer interaction

    Design space

Abstract

Design of and research on animation interfaces rarely use methods and theory of human–computer

interaction (HCI). Graphical motion design interfaces are based on dated interaction paradigms, and novel

procedures for capturing, processing and mapping motion are preoccupied with aspects of modeling and computation. Yet research in HCI has come far in understanding human cognition and motor skills and

    how to apply this understanding to interaction design. We propose an HCI perspective on computer ani-

    mation that relates the state-of-the-art in motion design interfaces to the concepts and terminology of

    this field. The main contribution is a design space of animation interfaces. This conceptual framework

    aids relating strengths and weaknesses of established animation methods and techniques. We demon-

    strate how this interaction-centric approach can be put into practice in the development of a multi-touch

    animation system.

© 2014 Elsevier B.V. All rights reserved.

    1. Introduction

    Moving images are omnipresent in cinema, television, com-

    puter games and online entertainment. Digital media such as text,

    images and film are nowadays produced by a diverse crowd of

    authors, ranging from beginners and laymen to professionals. Yet

    animation is still seen by most people as a highly sophisticated

    process that only experts can master, using complex interfaces

    and expensive equipment. However, consumer motion capture

    technology has recently enabled and created a mass-market for

    easy-to-use animation tools: computer games. In contrast to most

    professional animation tools, recent games employ full-body inter-

    action for instance via Kinect, allowing users to control a virtual

    character instantaneously through their body. This trend is feeding

    back into the area of the experts, with researchers investigating

    time-efficient interfaces for computer puppetry using the Kinect

(e.g. [61,55]). Computer animation is currently seeing an influx of

ideas coming from the world of easy-to-use game interfaces made

    for players with no prior training. Game designers in turn are

    informed by design knowledge and methods developed over

decades of research in human–computer interaction (HCI).

    It is thus time that computer animation be approached from an

    HCI perspective. This could aid describing and analyzing the vast

spectrum of animation techniques ranging from very intuitive pup-

    petry interfaces for computer games to highly sophisticated con-

    trol in advanced animation tools. Our goal is to understand

principles that underlie human–machine interactions in computer

    animation. With new ways of thinking about interactions with

    continuous visual media and a thorough investigation of new ani-

    mation interfaces on a theoretical foundation, motion design inter-

    faces can be made more beginner and expert friendly.

    This can be achieved by embedding computer animation meth-

    ods and interfaces in an HCI context. Trends in motion design

    interfaces can be connected with discussions on next generation

    interfaces in HCI. Theoretical frameworks can aid us in tackling

    the concrete user interface issues by a profound analysis, which

    can aid the process of designing new mechanisms for more natural

    and intuitive means of motion creation and editing.

    This article approaches this goal in three main steps. We will

first review related work from computer graphics, human–computer

    interaction and entertainment computing from a user- and inter-

    face-centric perspective with a focus on methods, mappings and

metaphors. In the second step we construct a design space for inter-

    faces that deal with spatiotemporal media. In the third step, the

    utility of this conceptual framework is illustrated by applying it in

    designing a multi-touch interactive animation system.

    2. Animation techniques: an interaction view

    Computer-based frame animation is the direct successor of tra-

    ditional hand-drawn animation, and still the main method.

    Advances in sensing hardware and processing power have brought


    entirely new possibilities. Motion capture records the live perfor-

    mance of actors, introducing a new form of animation more akin

    to puppetry than traditional animation. Programmed animation

    enables realistic simulations to provide interesting secondary

    motion and create more believable worlds.

    Traditionally, in computer-based keyframe animation, only

    extreme poses or key frames need to be manually established by

    the animator. Each keyframe is edited using manipulation tools,

    which can be specialized for the target domain, e.g. character

    poses. Some manipulation tools allow influencing dynamics

    directly in the scene view. The most common means of specifying

    dynamics is by using global descriptions, such as time plots or

    motion paths. Spatial editing between keyframes can be achieved

    indirectly by editing interpolation functions or by defining a new

    key pose.

    Motion timing is usually done via global descriptions of dynam-

    ics. However, some temporal control techniques directly operate

on the target. Snibbe [58] suggests timing techniques that do not

    require time plots but can be administered by directly manipulat-

    ing the target or its motion path in the scene view. As with spatial

    editing, the practicality of temporal editing with displacement

    functions depends heavily on the underlying keyframe distribu-

    tion. Timing by direct manipulation in the scene view is also sup-

    ported by the latest animation software packages. Tweaking

    motion trail handles allows for temporal instead of spatial transla-

    tion; visual feedback can be given by changing frame numbers

    adjacent to the handle. Spatial control of time has also been pro-

posed for video navigation [15].

    Motion graphs are two-dimensional plots that map transforma-

    tion values (vertical axis) against time (horizontal axis). With a

    2DOF input device, such a graph thus allows integrated, simulta-

    neous spatiotemporal control. In keyframe animation the motion

    editor is the standard way to manage keyframe value interpolation,

typically by means of Bézier curve handles.
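
To make the motion editor concept concrete, the following sketch shows how a single animation channel is evaluated between keyframes. It is an illustration only, using a fixed cubic ease in place of the editable Bézier handles found in production packages; the function names and values are ours.

# Minimal sketch of per-channel keyframe interpolation, as used in motion editors.
# Hypothetical example; production tools expose editable Bezier handles instead of
# the fixed smoothstep easing used here.

from bisect import bisect_right

def smoothstep(t):
    """Cubic ease-in/ease-out on [0, 1]."""
    return t * t * (3.0 - 2.0 * t)

def evaluate(keyframes, time):
    """Evaluate one animation channel at 'time'.
    keyframes: sorted list of (time, value) pairs."""
    times = [k[0] for k in keyframes]
    if time <= times[0]:
        return keyframes[0][1]
    if time >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, time) - 1
    (t0, v0), (t1, v1) = keyframes[i], keyframes[i + 1]
    u = smoothstep((time - t0) / (t1 - t0))
    return v0 + u * (v1 - v0)

# Example: an x-translation channel with three keys.
channel = [(0.0, 0.0), (1.0, 10.0), (2.5, 4.0)]
print(evaluate(channel, 0.5))   # eased value between the first two keys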

    In contrast to keyframe animation, performance animation uses

    motion capturing of live performance of an actor or puppeteer by

tracking a number of key points in space over time and combining them to obtain a representation of the performance. The recorded

    data then drives the motion of a digital character. The entire proce-

    dure of applying motion capture data to drive an animation is

    referred to as performance animation [44]. In a typical setup, an

actor's motion is first recorded, then the data is cleaned, processed

    and applied to a digital character. Since the digital character can

    have quite different proportions than the performer, retargeting

    the motion data is a non-trivial task [24]. In this form of perfor-

    mance animation, capture and application of motion data to an

    animation are two separate processes, data handling is done off-

    line. Online performance animation immediately applies captured

    data to a digital character, creating animation instantly, allowing

    the performer to react immediately to the results or to interact

with an audience [59,24]. Processing limitations sometimes entail that performers can only see a low-fidelity pre-visualization

of the final rendering [44].

    Many performance animation efforts aim to represent human

motion accurately and limit the abstraction to a minimum; the

motion capture performers use only the senses with which

    they have learned to act (e.g. kinaesthetic and proprioceptive feed-

    back). For performance animation of stylized or non-humanoid

    characters it is desirable to control them in a less literal fashion.

    Such a style of performance control is often referred to as computer

    or digital puppetry [3,59]. Just as traditional puppeteers would rely

    on mirrors or camera feeds to adjust their performance, computer

    puppetry requires instant renderings of the applied input to allow

    performers to adjust their motions. Real-time mappings either use

high-bandwidth devices for coordinated control of all character DOF, or employ models based on example data or a physical

    simulation. One challenge is to control a high number of degrees

    of freedom (DOF) at the same time.

Real-time control of humanoid characters suggests literal map-

pings from the puppeteer's physique to the character's skeleton.

    Non-humanoid characters such as animals, monsters or animate

objects are difficult since they have a vastly different morphology

and motion style from humans. Seol et al. [55] address this by learn-

    ing mappings through users mimicking creature motion during a

    design phase. These learnt mappings can then be used and com-

    bined during online puppetry. In similar work, Yamane et al. [66]

    propose matching human motion data to non-humanoid charac-

ters with a statistical model created on the basis of a small set of

manually selected and created human–character pose pairs; how-

    ever, this process is conducted offline. The technique for optimal

    mapping of a human input skeleton onto an arbitrary character

skeleton proposed by Sanna et al. [67] manages without any man-

    ual examples and finds the best match between the two based

    solely on structural similarities.

    For animation techniques on desktop input devices, however,

typically fewer DOF are available. Recently this has been addressed

    by multi-touch input devices, which enable techniques for simulta-

    neous rotation, scaling and translation (RST) for 4DOF control of a

2D target [26]. Reisman et al. [52] developed a technique for inte-

grated rotation and translation of 3D content using an arbitrary

number of contact points on an interactive surface.
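
As an illustration of such integrated control, the sketch below derives rotation, uniform scale and translation from the motion of two contact points. It is a schematic reconstruction of the general RST idea, not the code of the cited techniques.

# Sketch of two-finger rotate-scale-translate (RST) control of a 2D target.
# Illustrative only; real surface frameworks track contact IDs and handle
# more than two fingers.

import math

def rst_from_touches(p0_old, p1_old, p0_new, p1_new):
    """Derive rotation (radians), uniform scale and translation (dx, dy)
    from the old and new positions of two contact points."""
    ox, oy = (p0_old[0] + p1_old[0]) / 2.0, (p0_old[1] + p1_old[1]) / 2.0
    nx, ny = (p0_new[0] + p1_new[0]) / 2.0, (p0_new[1] + p1_new[1]) / 2.0
    vx_old, vy_old = p1_old[0] - p0_old[0], p1_old[1] - p0_old[1]
    vx_new, vy_new = p1_new[0] - p0_new[0], p1_new[1] - p0_new[1]
    rotation = math.atan2(vy_new, vx_new) - math.atan2(vy_old, vx_old)
    scale = math.hypot(vx_new, vy_new) / math.hypot(vx_old, vy_old)
    translation = (nx - ox, ny - oy)
    return rotation, scale, translation

# Example: the two fingers move apart and rotate the pinch axis.
print(rst_from_touches((0, 0), (1, 0), (0, 0), (0, 2)))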

When input devices with fewer DOF than the object parameters

    are used, integrated control is not possible. This is a common prob-

    lem in desktop interaction for navigating and editing 3D media,

    since most desktop input and display devices only have two DOF.

    Interface designers thus often face the problem of mapping two

    control DOF to a higher-dimensional target parameter space. A

    solution is to separate the degrees of control, i.e. splitting object

    DOF into manageable subsets [4]. With single-pointer input

    devices, this necessitates a sequential control of such subsets, e.g.

    through displays of multiple orthographic projections of the scene

    in one split screen or through spatial handles that are overlaid on

top of the target object [4].

If high-DOF devices are not available and temporal multiplexing

    is not desired, interface designers can choose to constrain the

    interaction to reduce required control DOF. A challenge for design-

    ers is that the model behind the constraint must be understood by

the user, for instance by basing it on mechanisms already

    known from other contexts.

    Yamane and Nakamura [64] present a pin-and-drag interface

    for posing articulated figures. By pinning down parts of the figure,

    such as the end-effectors (feet or hands) and dragging others, the

    whole character can be controlled with relative ease. Joint motion

    ranges, the current joint configuration and the user-set joint con-

    straints (pins) thus allow constrained control of several character

    DOF with as few as two position input DOF for a 2D character.

The various constraints are prioritized so that dragging constraints are always fulfilled and solved by differential kinematics that give

    a linear relationship between the constraints and the joint

    velocities.
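
The pin-and-drag idea rests on differential kinematics: small changes in joint angles relate linearly to small changes in constraint positions via a Jacobian. The sketch below shows one damped least-squares step for a planar two-joint arm whose end-effector is dragged toward a target. It is a generic illustration; the cited solver additionally prioritizes pin constraints over drag constraints.

# Illustrative damped least-squares step of differential kinematics for a
# planar two-joint arm: dtheta = J^T (J J^T + lambda^2 I)^-1 dx.
# Generic sketch, not the prioritized pin-and-drag solver itself.

import numpy as np

L1, L2 = 1.0, 1.0             # link lengths
theta = np.array([0.3, 0.6])  # current joint angles (radians)

def end_effector(th):
    x = L1 * np.cos(th[0]) + L2 * np.cos(th[0] + th[1])
    y = L1 * np.sin(th[0]) + L2 * np.sin(th[0] + th[1])
    return np.array([x, y])

def jacobian(th):
    s1, c1 = np.sin(th[0]), np.cos(th[0])
    s12, c12 = np.sin(th[0] + th[1]), np.cos(th[0] + th[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

target = np.array([1.2, 1.1])     # where the user drags the end-effector
for _ in range(20):               # a few small solver steps per frame
    dx = target - end_effector(theta)
    J = jacobian(theta)
    lam = 0.1                     # damping avoids blow-up near singularities
    dtheta = J.T @ np.linalg.solve(J @ J.T + lam ** 2 * np.eye(2), dx)
    theta += 0.5 * dtheta         # step size below 1 keeps the update stable

print(end_effector(theta))        # close to the drag target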

    Several research projects have attempted to leave the world of

    explicit mappings and enable low-to-high-dimensional control,

    bimanual interaction and multi-user interaction implicitly by sim-

    ulating real-world physics. Frohlich et al. [20]let users kinemati-

    cally control intermediate objects that are attached to target

    objects by springs. The spring attachment is also used by Agrawala

and Balakrishnan [1] to enable interaction with a physically simu-

lated virtual desktop, the BumpTop.
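
A minimal version of such spring coupling between a kinematically controlled proxy and a simulated target could look as follows. The parameters are invented for illustration and do not come from the cited systems.

# Sketch of controlling a simulated object through a proxy attached by a
# damped spring (kinematic input -> physical response). Parameter values
# are illustrative only.

def simulate(proxy_positions, k=50.0, c=8.0, mass=1.0, dt=1.0 / 60.0):
    """proxy_positions: user-controlled proxy position per frame (1D here)."""
    x, v = proxy_positions[0], 0.0     # target starts at the proxy
    trajectory = []
    for p in proxy_positions:
        force = k * (p - x) - c * v    # spring toward proxy plus damping
        v += (force / mass) * dt       # semi-implicit Euler integration
        x += v * dt
        trajectory.append(x)
    return trajectory

# Example: the user snaps the proxy from 0 to 1; the target follows smoothly.
path = simulate([0.0] * 10 + [1.0] * 110)
print(round(path[30], 3), round(path[-1], 3))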

Limitations of the motion capture system or the performer's

physiology in producing certain desired motions can be overcome

by simulating parts of the body and their interaction with the environment. Ishigaki et al. [31] combine real-time full-body motion


    capture data, physical simulation and a set of motion examples to

    create character movement that a user cannot easily perform, such

    as climbing or swimming. The virtual environment contains prede-

    fined interaction points such as the handles of a monkey bar or a

rope. Once the character's end-effectors are brought into proximity

    of an interaction point, control changes so that the character

    motion is no longer fully controlled by the motion capture. A sim-

    plified simulation that treats the intentional contact as a universal

joint connected to the character's centre of mass by a linear damp-

    ened spring enables the calculation of the overall dynamics of the

    character.

    Even when input and output degrees of freedom match, physi-

    cal interdependencies of input DOF can still limit a mapping. In full

    body tracking, the joint locations are dependent on actor size and

body proportions. If the performer's proportions significantly differ

    from character proportions, this can lead to problems with the

    character interacting with objects in the scene, such as the floor

    or props. For this problem of retargeting of motion capture data

to a new character, Shin et al. [57] propose an approach that maps

    input based on a few simple heuristics, e.g., considering the dis-

    tance of end-effectors to an object in the scene.
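
The flavor of such heuristics can be sketched as a distance-based blend: the closer an end-effector comes to a scene object, the more weight is given to preserving its absolute position (to be enforced by an IK pass) over preserving the performer's joint angles. This is a schematic illustration of the idea, not Shin et al.'s algorithm.

# Schematic sketch of a distance-based retargeting heuristic: blend between
# preserving joint angles and preserving end-effector position, weighted by
# how close the end-effector is to a scene object. Illustrative only.

def importance(distance_to_object, falloff=0.5):
    """Weight in [0, 1]; 1 means position matters most (e.g. hand on a prop)."""
    return max(0.0, 1.0 - distance_to_object / falloff)

def retarget_end_effector(pos_from_angles, pos_from_performer, distance):
    """Blend the position implied by copied joint angles with the performer's
    absolute end-effector position (which an IK pass would then enforce)."""
    w = importance(distance)
    return tuple(w * p + (1.0 - w) * a
                 for a, p in zip(pos_from_angles, pos_from_performer))

# Example: hand far from any prop (angles dominate) vs. touching a prop.
print(retarget_end_effector((0.4, 1.1, 0.2), (0.5, 1.0, 0.3), distance=1.0))
print(retarget_end_effector((0.4, 1.1, 0.2), (0.5, 1.0, 0.3), distance=0.0))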

    For live performances, control needs to be addressed with high-

    bandwidth input devices or performers acting in parallel. With

    recorded performances the puppeteer has more options. Capture

    sequences or just parts of them can be retaken, or slightly modi-

    fied, and complex motion can be built up in passes. Layered or

    multi-track animation allows the performer to concentrate on only

    a small amount of action at a time and create articulated motions

step by step. Oore et al. [50] employ layered motion recording for

controlling subsets of a character's DOF. For the animation of a

    humanoid, they divide the character DOF into three parts and ani-

    mate these sequentially: Two 6DOF devices are used to control the

    motion of both legs, both arms, and torso and head in three passes.

Dontcheva et al. [13] make motion layering the principle of their

    live animation system.

    Video games have a strong connection to animation. Most mod-

ern video games make heavy use of animation in order to breathe life into the game world. In this sense, games are one application

    area amongst many others, such as film, television, or education.

    But animation is also created with and in video games. The actions

taken by players and the responses of game elements constitute a form

    of motion design, often conveying a certain story. This is most evi-

    dent in game genres where players control characters in a virtual

    world, like a puppeteer controls a puppet. However, animating

for video games differs significantly from animating for film or televi-

    sion. While in film characters and objects are only viewed from a

    specific camera angle, in interactive media such as video games,

    the behavior and the view are spontaneously defined by the player.

    The animator cannot foresee the decisions of the player, which is

    why he must create animations for all possible player actions that

must meet certain criteria of completeness and realism. Such motion libraries contain elementary animation sequences that can then

    be looped, blended and combined in real-time by the game engine

    [37]. By interactively directing pre-defined animations, players

    thus essentially perform a kind of digital puppetry with indirect

    control.
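
In code, real-time combination of clips from such a motion library amounts to little more than a per-joint weighted interpolation between sampled poses. The sketch below illustrates a crossfade with Euler-angle poses; actual game engines blend quaternions and synchronize gait phases.

# Minimal sketch of blending two animation clips from a motion library.
# Simplified: poses are per-joint Euler angles; real engines blend
# quaternions and align foot phases.

def sample(clip, t):
    """Nearest-frame sampling of a clip given as a list of poses (30 fps)."""
    frame = min(int(t * 30), len(clip) - 1)
    return clip[frame]

def blend(pose_a, pose_b, w):
    """Linear blend of two poses; w = 0 gives pose_a, w = 1 gives pose_b."""
    return {joint: tuple((1 - w) * a + w * b
                         for a, b in zip(pose_a[joint], pose_b[joint]))
            for joint in pose_a}

# Example: crossfade from 'walk' to 'run' over half a second.
walk = [{"hip": (0.0, 0.0, 0.0), "knee": (10.0, 0.0, 0.0)}] * 60
run  = [{"hip": (5.0, 0.0, 0.0), "knee": (40.0, 0.0, 0.0)}] * 60
t, fade_start, fade_len = 0.7, 0.5, 0.5
w = min(max((t - fade_start) / fade_len, 0.0), 1.0)
print(blend(sample(walk, t), sample(run, t), w))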

    Motion control through high-DOF input devices extends the

    degree of control, further blurring the lines between gaming and

    puppetry: as players are able to influence more character DOF,

    their possibilities for expression are increased. However, while all

    games use some form of motion capture, few offer motion editing

    required in animation practice: if a player is not satisfied with his

    performance, he will have to do it again. Most games lack tech-

    niques for even the basic task of time control, with notable excep-

tions such as Prince of Persia: Sands of Time [60], Zeit2 [5] and Braid [49], in which the player must navigate time as well as space.

    Yet while these games incorporate time control in innovative ways,

    they do not provide the degree of control and editing required for

    professional animation.

    In Machinima, the art of 3D game-based filmmaking, animation

    and video games ultimately come together to form a novel means

of creating animated movies [37]. Using game engines for anima-

    tion or virtual filming has benefits as well as limitations. Modern

    3D games provide a complete game world with physics, animated

    models, and special effects while offering comparatively simple

    controls for puppeteering game characters. This gives authors a

    lot to build upon, as opposed to other methods where animations

    must be created from scratch. The limitations lie in the depen-

    dency on the game developer with their short product cycles, their

    game engine and assets, and the legal issues involved in using

    these. Computer puppetry in games remains limited, as is any per-

    formance control interface that merely activates and blends pre-

    defined animations.

    Viewing the state-of-the-art in animation with a coherent focus

    on the user, mappings and control DOF is a first step in analyzing

    the current generation and developing for the next generation of

    interfaces. The next step is to further structure this treatment: a

    theoretical framework identifies explicit aspects of interaction in

    computer animation tools.

    3. A design space for computer animation interfaces

    Even though there is an increasing trend in computer graphics

research to consider the needs of the artist (e.g. [51,54]), most work

    on animation interfaces does not consider aspects of HCI. An inter-

    action perspective on computer animation can help to construct a

    design space of user interfaces for spatiotemporal media. Such a

    design space can structure the designers options and aid research-

    ers in analyzing the state of the art.

    Existing interface design frameworks cannot be readily used for

    animation interfaces, as they are either too general or too specific.

General frameworks [21,48] span too large a space or only analyze certain aspects of interaction, like input devices but not their map-

    ping to output [9], while domain-specific frameworks [8,14] are

    too focused.

Jacob et al. [33] present a framework for reality-based interac-

    tion (RBI) that includes four themes: Naive physics (NP) reflects

    the innate human understanding of basic concepts of physics, such

    as gravity, forces and friction; body awareness and skills (BAS)

    describes our sense of our own body and what we can do with

it; environment awareness and skills (EAS) covers how humans per-

ceive and mentally model their environment and place themselves

in relation to it; social awareness and skills (SAS) stands for humans

    as social animals, who generate meaning by relating to other

    human beings. Considering the four RBI themes for computer ani-

mation, many techniques aim to tap the artist's innate understanding of space–time processes, relating to the theme of naive physics

    (NP). The environment awareness and skills theme (EAS) comes

into play as soon as humans interact with these real-world

space–time processes. For instance, multi-finger deformation tech-

niques for 2D puppetry on interactive surfaces [45] rely on our nat-

    ural sense of timing and real-world experience with objects (NP,

    EAS). In fact any technique based on motion capture for defining

dynamics relies on users' intuitive sense of space and time (NP,

EAS). Performance controls for digital puppetry use the performer's

understanding of their body (BAS). As Kipp and Nguyen [39] illus-

    trate, a puppeteer uses complex coordinated hand movements to

    bring a wooden puppet to life. Even the technique for low-fidelity

input via mouse and keyboard of Laszlo et al. [40] exploits both an

animator's motor learning skills and their ability to reason about motion planning (BAS). Collaboration in computer animation is


    common, as large productions require teams to work together, but

    does not usually involve close coordination during a single task.

    Multi-user puppetry interfaces are different in that they tap the

    ability of humans to relate to other human beings (SAS). The four

themes must be traded off against other desirable qualities, such

    as expressive power, efficiency, versatility, ergonomics, accessibil-

ity and practicality, on a case-by-case basis.

    While these themes are relevant for designing any kind of novel

    interactive systems aiming at reality-based interaction, they are

    rather general. For a conceptual framework specific to animation

    it is thus necessary to define a new design space. In the following

    we discuss the aspects we have identified in our work as relevant

    to such a framework. We motivate their inclusion and relate them

    to each other. We will also relate our framework to the RBI

    framework.

    3.1. Aspects of design

Analogous to general models of human–computer interaction,

    computer animation involves a dialog between a human artist

    (animator, actor or puppeteer) and the application, a virtual artifact

(the animation). This occurs through a hardware–software machine (the animation software and the hardware running it, including

    input and output periphery). A design framework should consider

aspects of these entities and their relations. Fig. 1 shows this basic

    triangular structure that describes two views of this human-arti-

    fact dialog, one that takes the machine as a mediator into account

    (left and lower edge: artist-machine-artifact) and one that

    abstracts from it (right edge: artist-artifact). Seven aspects charac-

    terize these entities and their relations: task, integration, corre-

spondence, metaphor, directness, orchestration and space–time.

    In the following we will discuss these seven design aspects and

    their relevance for HCI and animation tools.

    Animation tools for productive use are designed around the

task for which they are intended. Decomposition breaks down tasks

into further subtasks, which can be, in turn, repeatedly broken down until one arrives at basic tasks at the desired level of decom-

position, which is frequently used to structure interaction tech-

    niques [17,4,29]. At the top level, the main tasks in animation

    design are motion creation (generating from scratch), motion edit-

    ing (adapting an existing design) and viewing (for visual feedback

    on spatial and temporal design). At a lower level, task decomposi-

    tion structure varies highly with the type of animation artifact, i.e.

character animation or environment effects. Tool generality [53] or

    versatility [33] characterizes the variety of interaction tasks that

    can be performed with an interface. This can range from support-

    ing a large amount of tasks from varied application domains to

    only supporting a single, domain-specific task. Tasks are the goal

    of interaction and aim at creating the animation. Therefore, our

    design space links the aspects of tasks to the virtual artifact (Fig. 1).

An input device defines the integration of control: how many

    DOF can be changed at the same time from the same input source

    [2]. Performance controls are traditionally very specialized, e.g.

    using full-body motion capture suits or special hand-puppet input

devices [59,34]. Yet research has also brought forward more gen-

    eral controls, such as the 2D multi-point deformation technique

    of Igarashi et al. [30]. Since computer animation often involves

    domain objects with large amounts of degrees of freedom (even

    a simple 3D articulated biped will have around 30 DOF), special-

    ized high-DOF input devices allow for a high level of integration.

Ideally the input device should match the structure of the task

(Jacob et al. [32]). In most situations the DOF of the input device

    are not sufficient and solutions like artificial separation or con-

    straining mappings based on a certain model have to be found. If

    other considerations lead to using lower-DOF input devices, tasks

    should be adapted accordingly, e.g. by separating translation and

    orientation [43]. The aspect of integration is mostly construed from

    the set-up of the input device. We thus locate the aspect of integra-

    tion next to the machine in the design space (Fig. 1).

    Correspondence describes how the morphology of the physical

    input through the input device and the resulting response of the

artifact relate [29]. Bodenheimer et al. [3] distinguish performance

    animation controls by the degree of abstraction in the sense of cor-

    respondence. At the one end of the spectrum, mappings are pri-

    marily concerned with the character or style of the motion rather

    than literal mappings between performer and target. Such map-

    pings are more commonly used in computer puppetry. At the other

    end of the spectrum are efforts to accurately represent motion that

    strive to limit the degree of abstraction to a minimum. A high spa-

    tial correspondence between input and output requires less mental

effort since it draws on our experience in using our own body and encountering real-world objects (BAS, EAS). UI designers must face

    the tradeoffs between better learnability through high correspon-

    dence and the range of motions that can be expressed. The aspect

    of correspondence bridges the virtual artifact and the machine

characteristics (machine-artifact edge in Fig. 1).

The metaphor is a notion for describing the mapping of cogni-

    tive intentions to physical device interaction using concepts

    known from other domains [47,4]. In the conversation metaphor

    the user engages in a written or spoken dialogue with the machine.

Conversational interfaces are well suited for high-level operations, but less suited for

    spatial precision and expression. Today graphical user interfaces

represent the dominant manipulation metaphor, where the user

acts upon a virtual world rather than using language as an interme-

    diary. Manipulation interfaces tap our naive understanding of thelaws of physics (NP), our motor memories (BAS) and how we per-

    ceive and interact with our surroundings (EAS). Manipulation

    using instruments requires more learning and mental resources,

    as well as introducing indirection [65,22]. Sensors tracking the

user's body promote an embodiment metaphor where the user

    identifies with parts of a virtual world in a more literal way. For

    avatar control, embodied interaction builds on our proprioceptive

    and kinaesthetic senses (BAS), and can aid our feeling of presence

    in virtual environments (EAS). Embodiment has been picked up in

current trends in computer animation that criticize the complex

    and abstract nature of motion design tools based on the WIMP par-

adigm. Since the aspect of metaphors is central to the artist's cog-

nitive understanding of his or her activity, our design space links it

to the artist in Fig. 1.

    Fig. 1. The design space of animation interfaces characterizes the entities involved

    in the interaction and their relations.


    Directness characterizes the physical distance between user

    and the target. This includes both the spatial and the temporal

offset from input to output [2]. In our understanding of directness

we consider the relation between user (artist) and the physical rep-

    resentation of the animation through the machine (as illustrated

on the triangular design space in Fig. 1). Cognitive facets of direct-

    ness have also been considered in other definitions [22,65], but

    these can be covered in interaction metaphors.

    Since computer animation interfaces deal with continuous or

    time-based media with multiple spatial and one temporal dimen-

    sion, interfaces need to support viewing and modeling not only of

    static spaces but of their dynamics as well. As humans inhabit a

space–time continuum, and all our actions always have a temporal

dimension, any kind of interaction between a human and a com-

puter to create, edit or view dynamic content relates the human's

space–time to the medium's space–time. User time is generally

referred to as real time, which is continuous, and the data time as

    virtual or stream time, which is discrete [42,12]. Depending on

    animation method and technique, the real time of user input can

affect the virtual time or not, and only the spatial or only the temporal

parameters of the animation may be changed. This suggests that

there are different ways in which real space–time can be mapped

to virtual space–time. So far the literature lacks a structured

    approach to characterizing the relations of user and artifact space

    and time. We will therefore propose a taxonomy in the next sec-

tion that sorts interaction techniques based on which components

of real and virtual space–time are involved. This space–time aspect

    abstracts the relation of user and application from the device level,

    which is why it is located on the artist-artifact edge of our design

    space diagram (Fig. 1).

As a central element of our design space, orchestration

describes in which order which parts of the user's body perform

    which sub-task through which input device. Since humans are

most adept at crafting with their hands, and for a long time

human–computer interfaces were optimized for manual control,

    orchestration has been best studied for hand-based interaction.

Findings from behavioral psychology show that the dominant and non-dominant hands are optimized for distinct roles in most

    tasks. For instance, in the task of writing the non-dominant hand

    first establishes a reference frame relative to which the dominant

    hand then operates. Using this knowledge in devising bimanual

interaction techniques can have benefits for efficiency ([6], Hinckley

    et al. [68], Balakrishnan and Kurtenbach [69]) and cognition, by

    changing how users think about a task [35,41]. Many every-day

    activities also show complex orchestrations of more than just the

    hands, such as driving a car where feet control speed, hands the

    steering, and fingers additional controls such as lights. Since

    orchestration considers human, application and the mediating

    device to an equal degree, it is situated at the center of the triangle

relation diagram representing the design space (Fig. 1).

3.2. Space–time: a new design aspect

The concept of space–time control mappings considers any nav-

igation, creation or editing operation on a continuous visual med-

ium as a mapping from real space–time of the input device (the

control dimensions) to virtual space–time of the presentation med-

ium (the presentation dimensions). The output medium's presen-

    tation dimensions can be viewed and edited integrally or

    separately regarding space and time. For instance, while frame-

    based animation edits poses and the time instants at which they

    occur separately, performance-based or procedural approaches

    usually define motion in an integrated fashion. Both real space

    and time can control either or both virtual space and time. A first

step in structuring these relations is to collapse the individual spatial dimensions to a single abstract space dimension, so that

we need only consider the two dimensions space and time on user

    and medium side. The next step is to consider how these two

    abstract input dimensions (control) affect the output dimensions

    (presentation). The central idea underlying the construction of cat-

    egories is that one or both control dimensions can affect one or

    both presentation dimensions.

Four basic space–time categories of mappings can be con-

structed from the possible combinations of the two sets (control

space, control time) and (presentation space, presentation time):

space → space

space → time

time → space

time → time

    Often presentation space and time will be modified in an inte-

    grated fashion, or spatial and temporal control will both figure into

the input–output relation. For this we introduce two control-integrated space–time categories that cover input–output mappings in

which both control dimensions contribute to the relation:

space–time → space (i.e., space → space and time → space)

space–time → time (i.e., space → time and time → time)

and two presentation-integrated space–time categories in which

both presentation dimensions are affected by the interaction:

space → space–time (space → space and space → time)

time → space–time (time → space and time → time)

The final cases are the fully integrated space–time categories

space–time → space–time (space → space and time → time)

space–time → time–space (space → time and time → space)

    which reflect that integrated control dimensions affecting presenta-

    tion domains in an integrated way can be matched in two ways.

These ten space–time categories cover all variants of mapping user

space–time to medium space–time. A simple means of visualizing

this is a 3 × 3 matrix, where the central cell is compartmented into

    two, since relating both control and presentation space and time is

    ambiguous (Fig. 2).
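
As a data structure, the taxonomy can be encoded directly: a technique is classified by which control dimensions it reads and which presentation dimensions it affects. The encoding below is our own illustrative sketch, not part of the framework itself; note that it cannot distinguish the two fully integrated variants that share the central cell.

# Illustrative encoding of the space-time taxonomy: a technique is classified
# by the control dimensions it reads and the presentation dimensions it
# affects. A sketch for this discussion, not code from the original framework.

def category(control_space, control_time, pres_space, pres_time):
    def side(space, time):
        if space and time:
            return "space-time"
        return "space" if space else "time"
    # The fully integrated case is ambiguous (space-time -> space-time vs.
    # space-time -> time-space); this simple encoding does not distinguish them.
    return f"{side(control_space, control_time)} -> {side(pres_space, pres_time)}"

# Examples from the text:
print(category(True, False, True, False))   # manipulators: space -> space
print(category(False, True, False, True))   # video playback: time -> time
print(category(True, True, True, True))     # performance animation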

    The first row of the matrix describes control mappings that only

    look at the spatial component of the input and do not consider the

timing of the user's input. The third row describes control

    mappings where input has no spatial component, and the user only

    administers state changes with temporal triggers via controls

such as buttons. The second row describes control mappings where spatial input stands in a temporal context.

Fig. 2. The taxonomy of space–time mappings is structured based on how user

input in real space–time controls medium output in virtual space–time. Fig. 3 gives

examples of these categories.


There are borderline cases between temporal and spatiotemporal control: if trigger

controls exert spatial changes (such as moving a step in a certain

direction), we speak of spatial control.

    While some mappings can be easily sorted into these categories,

    for others it may appear less clear. In the following we consider

    each category individually and show that it is possible to find

examples of actual interfaces for all of them (see also Fig. 3).

Controls in the space → space category use the spatial compo-

    nent of user actions to affect the spatial dimensions of the medium.

    Most kinds of interactive editing techniques in computer-aided

    design fall into this category. A straightforward one-to-one

mapping of viewer time to medium time (time → time) is video

playback. Examples of space → time mappings are timelines that

employ a linear spatial representation of time for navigating or

altering time-dependent media. Software packages for frame-based

animation make heavy use of linear time plots for temporal naviga-

tion and timing transformations. Less common are examples for the

time → space category. Passive navigation techniques for virtual

    environments make use of such mappings [4]. After choosing a

    target or route either automatically or with the user in the loop,

the system navigates the user along the route or to the target, map-

    ping user time to medium space. Editing operations are rare in this

    category, since the single input DOF is insufficient for most editing

    tasks.

In mapping input space–time for manipulating space only, the

    redundant DOFs can be used either for enhanced robustness or

    for controlling further parameters. For editing a static image, the

    temporal component of the user input can, for instance, be used

to control the stroke type of the virtual brush (space–time →

    space). Velocity-based spatial navigation techniques include input

    space and time in the traversal of virtual space. The presentation

    time can also be steered: interactive continuous adjustment of

    playback speed (e.g. via a slider or wheel) changes video or anima-

tion playback during playback: spatiotemporal input affects the

viewing of medium time (space–time → time).

The category space → space–time can be found in time plots

    that are a common means of graphically representing a variable

    changing over time. Animation packages usually feature a graph

    editor that enables integrated shifting of key positions and the val-

    ues they represent in time and one (spatial) dimension. Three-

    dimensional representations of a video stream, video streamers,

even allow space–time video editing [56]. The mapping category

time → space–time is realized in automated navigation through

    a dynamic medium: scripted camera movement through animated

    scenes navigates both the time and the space of the target medium.

    It is often used for cut-scenes in video games, so-called cinematics,

    when interactive control is taken from the player for a short time

    in favor of progressing the narrative with pre-defined camera

    movement. This is different from video playback, where the spatial

    component of the medium (the video frame) is not navigated dur-

    ing playback. While the result is essentially the same, this distinc-

    tion is down to the fundamental difference in the medium data:

    For video, the projection from 3D to 2D is already integrated into

    the visual data (the video frames), while in 3D the projection is

    determined at run-time.

The space–time → space–time mappings can be found in many

    examples of user interfaces for virtual worlds. Spatial actions

    browse or alter the mediums space, and user and medium time

    are linearly related. Such mappings are common for interfaces that

    require high user immersion. Most performance controls for inte-

    grated motion creation also fall into this category, e.g. in interactive

    video games or in performance animation. The remaining inverse

mapping of user space–time to virtual time–space does not seem

    to be used for practical implementations. They could, however, be

    related to temporal triggers of a user (such as releasing some event)

that influences some graphical representation where the user's spa-

    tial input controls temporal parameters of the event.

The space–time view of operations on continuous visual media

gives a new perspective on the types of such operations: whether

    they are invasive (editing) or non-invasive (viewing) and whether

space → space. Application: Spatial Manipulation; technique: Manipulators/Gizmos/Handles; scenario: Posing a character

space → space–time. Application: Motion Editing; technique: Graph Editor; scenario: Adjusting ease-in/ease-out

space → time. Application: Time Control; technique: Timeline Bar; scenario: Browsing a video

space–time → space. Application: Interactive Travel in Static Virtual Environments; technique: Steering; scenario: Browsing a 3D information space

space–time → space–time. Application: Performance Animation, Video Games; technique: Computer Puppetry; scenario: Animating a character

space–time → time. Application: Time Control; technique: Jog Shuttle; scenario: Browsing a video

time → space. Application: Passive Travel in Static Virtual Environments; technique: Target-based Navigation/Fly-Throughs; scenario: Exploring architectural models

time → space–time. Application: Passive Travel in Dynamic Virtual Environments; technique: Target-based Navigation/Fly-Throughs; scenario: Watching a cut-scene in a 3D video game

time → time. Application: Playback; technique: Triggers/Buttons; scenario: Watching a video

Fig. 3. Nine categories of space–time mappings with example applications, techniques and scenarios of use. (Figure contains cropped stills of third-party material licensed under CC BY 3.0. Top left, top right and bottom left images attributed to the Durian Blender Open Movie Project; bottom left image attributed to Frontop Technology Co., Ltd; bottom center image attributed to Apricot Blender Open Game Project.)


    they involve creating new designs from scratch or refining existing

    designs. Firstly, collapsing all spatial parameters into one abstract

    space dimension hides the fact that, as a rule, both control and

    medium space involve multiple spatial parameters, while time

    only constitutes a single quantity on each side. This has an impact

    on the distribution of invasive versus non-invasive operations in

    the matrix: techniques employing time as input (third row) are

    mainly used for passive navigation, rather than for spatial manip-

    ulation. This is because space offers more input dimensions and we

    can navigate space easier than time. This asymmetry has shaped

    how we mentally model the abstract dimension of time: we rather

think of time in terms of space than vice versa [10]. Secondly, the

columns sort mappings into refinement through spatial editing and

    temporal editing (left and right column), and creation through inte-

grated influence on medium space–time (center column). Thirdly,

    in many cases the distinction between non-invasive and invasive

    operations is a theoretical one. A fly-through of a 3D scene can

    either be seen as a navigation that does not change the dataset

    or as a camera animation that does. The criteria for distinction

    should come from the application: is the camera animation being

    created a part of the medium or is it an ephemeral product of

    the viewing operation? This distinction has an effect on categoriza-

    tion, too.

    3.3. Limitations

    The aspects characterizing the design space of animation inter-

    faces constitute a high-level framework. As such they provide a

    structure and cues for design reasoning and analysis, rather than

    concrete guidelines. In the following we will illustrate its utility

    by showing how we used the design space in developing novel ani-

    mation techniques. More case studies and examples are required to

    illustrate its application in the multitude of animation-related

    issues.

    The design space does not offer a set of orthogonal dimensions,

rather its aspects are interrelated. For example, the nature of the task is linked to the type of space–time mapping: automation

can take control away from the user up to the point that spatiotem-

    poral input (e.g. continuous control of a puppets legs) can be

    reduced to temporal input (e.g. triggering puppet walk cycles with

    a button). Another example of such dependencies is that the choice

    of metaphor determines the magnitude of directness: from indirect

manipulation through direct manipulation to embodiment. The inter-

    relation between the seven design aspects may be not surprising,

as each can be seen as a perspective on the same issue: designing

    user interfaces for controlling spatiotemporal phenomena.

    The design space presented in this section is a conceptual

    framework for analyzing and designing animation interfaces. It

    uses established design aspects identified in the HCI literature.

For describing relations of input and animation space–time, which

    are central to this class of interface, we could not rely on any prior

    work. For this aspect we developed a taxonomy for sorting map-

    pings into categories based on how they relate input and output

space–time. Next we will show how we have used these design

    aids in practice, both evaluating them as design tools and using

    them to propose novel animation interfaces.

    4. A multi-touch animation system

    In order to illustrate the utility of the design space as an aid for

designing animation interfaces, we explain how it was employed in

    the development of a novel animation system that we have pre-

    sented in prior work (Walther-Franks et al. [70]). We go beyond

the original work by explicating the design approach underlying it. The design space-driven approach was chosen in lieu of the first

    iterations of a human-centered design process. In our experience

    with proposing novel interaction paradigms these stages of an iter-

    ative design approach have the issue that users are unfamiliar with

    the possibilities of novel technologies and are strongly biased by

    existing solutions. The design space can help to guide the first

    phase of design until users can be provided with artifacts to

    experience.

    Even though free-space 3D input devices have recently become

highly popular, in particular in combination with game consoles,

they still lack the accurate and precise control

    needed for serious animation editing. Systems like the Kinect are

    good for high-level avatar control, with predefined animations.

    For more accurate editing, these systems are not yet feasible.

    Direct-touch interactive surfaces provide better precision for ani-

mation tasks, and offer the best potential for high directness and

    correspondence of interaction. The potential of interactive surfaces

    has been explored for various applications but only a few consider

    animation [45,39]. Most surface-based 3D manipulation tech-

    niques are not developed and evaluated for motion capture. Fur-

    thermore, most projects only look at individual techniques and

    lack a system perspective. However, this is necessary to shed light

    on real-world problems such as integrating tools into whole work-

    flows or dealing with the realities of software engineering.

    4.1. Design approach

    Going through the design aspects of our framework, we con-

    sider options and make decisions, building up a design approach

    to follow for the implementation.

    4.1.1. Task

    As a typical animation task we decided for performance anima-

    tion of 3D rigid body models. Working with three-dimensional

    content poses the challenge of a discrepancy between input space

    (2D) and output space (3D). In recent years researchers have

    started investigating 3D manipulation on interactive surfaces, from

shallow-depth manipulation [27] to full 6DOF control [28,52]. The problem for surface-based motion capture is to design spatial map-

    pings that allow expressive, direct performance control by taking

    into account the unique characteristics of multi-touch displays.

    Many performance control interfaces are designed to optimally

    suit a specific task, such as walk animation or head motion. This

    means that for each type of task the performer must learn a new

    control mapping. This is somewhat supported by specialized

    devices that afford a certain type of control. For 2DOF input devices

    like the mouse this is transferred to digital affordances like handles

    of a rig. These map more complex changes in character parameters

    to the translation of manipulators. The specialization is designed

    into the rig, equalizing control operations to general translation

    tasks. Since interactive surfaces have a 2DOF integrated input

structure, we copy this approach for our system.

An important secondary task is defining the view on the scene.

    Since direct-touch performance controls are defined by the current

    projection, this puts a high demand on view controls regarding

flexibility, efficiency and precision. With few exceptions [16,23],

    research on surface-based 3D interaction has not dealt much with

    view control. Yet 3D navigation is essential for editing complex

    scenes in order to acquire multiple perspectives on the target or

    zoom in on details. Some surface-based virtual reality setups use

    implicit scene navigation by tracking user head position and orien-

    tation. However, this limits the range of control. For unconstrained

    access to all camera degrees of freedom a manual approach offers

    the highest degree of control. A common solution is to introduce

    different modes for object transformation and view transformation

(camera panning, zooming, rotation/orbiting). This is prevalent in desktop 3D interaction, where virtual buttons, mouse buttons or


    modifier keys change between object and view transformations.

While zooming and panning cover the camera's three translational

    DOF, the third rotational DOF, camera roll, is less essential since the

    camera up vector usually stays orthogonal to a scene ground plane.

    While in desktop environments this DOF separation is mainly

    owed to low-DOF input devices it can also be employed on devices

    that allow more integrated transformation techniques, in order to

    allow more precise control [46]. We opt for separated control of

    camera parameters to enable precise view adjustments.

    4.1.2. Integration

    Multi-touch interactive surfaces provide two control DOF per

    contact. The combination of multiple points can be used to create

    integrated controls for 2D and 3D rotation and translation. Yet

    Martinet et al. [43] point out that multi-touch-based surface

    interaction cannot truly support integrated 6DOF control. They

    propose the depth-separated screen-space (DS3) technique which

    allows translation separate from orientation. Like the Sticky Tools

technique of Hancock et al. [28], the number of fingers and where

    they touch the target (direct) or not (indirect) determines the

    control mode. Full 3D control can also be achieved by additive

    motion layering: changing the control-display mapping (e.g. by

    navigating the view) between takes allows control of further

    target DOF.

    Other important factors for efficiency are easy switching

    between capture and view operations and dedicating hands to

    tasks. This requires that a single hand be able to activate different

    input modes with as little effort as possible. Widgets as an obvious

    solution produce clutter and interfere with performance controls

    that already require visual handles. Modal distinction by on- or

    off-target hit testing can be problematic if the target has unusual

    shape or dimensions. In order to separate between capture and

    view control, we employ multi-finger chording in which the num-

ber of fingers switches between modes.
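
Such chord-based mode switching reduces to a simple dispatch on the number of contacts in a touch frame. The thresholds in the sketch below are illustrative and not necessarily the exact mapping used in our system.

# Sketch of multi-finger chording for mode selection: the number of contacts
# selects capture vs. view control. Thresholds are illustrative only.

def select_mode(num_contacts):
    if num_contacts == 0:
        return "idle"
    if num_contacts <= 2:
        return "capture"        # one or two fingers drive performance capture
    return "view"               # three or more fingers switch to camera control

def handle_touch_frame(contacts):
    mode = select_mode(len(contacts))
    # In a real system the contact list would be routed to the active tool here.
    return mode, list(contacts)

print(handle_touch_frame([(120, 340), (150, 360), (180, 355)]))  # -> view mode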

    4.1.3. Correspondence

Interactive surfaces promote motor and perceptual correspondence between input and output. However, this correspondence

    is difficult to maintain when planar input space and higher-dimen-

    sional parameter space have to be matched. For a start, users only

    interact with two-dimensional projections of three-dimensional

    data. For instance, to translate a handle in the screen z-dimension,

    one cannot perform the equivalent motion with standard sensing

    hardware. The problem with the third dimension on interactive

    surfaces is that barring above-the-surface input, manipulations in

    the screen z dimension cannot maintain this correspondence, since

    input motions can only occur in a plane. Following the integrality

    of touch input, this means that the 2 input DOF need to be mapped

    to 2 translation parameters of the target (e.g. the handle of a char-

    acter rig) so that they follow the same trajectory.
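
One way to realize this correspondence is to map the 2D touch displacement onto the two world-space axes that span the current view plane, so the handle tracks the finger on screen. The sketch below assumes an orthographic view with known camera basis vectors; it is an illustration, not the system's actual code.

# Sketch of mapping a 2D touch displacement to a 3D handle translation in the
# current view plane, so the handle follows the finger trajectory on screen.
# Illustrative; assumes an orthographic view with known camera basis vectors.

import numpy as np

def translate_in_view_plane(handle_pos, drag_px, cam_right, cam_up,
                            pixels_per_unit=100.0):
    """handle_pos: 3D position; drag_px: (dx, dy) finger motion in pixels;
    cam_right, cam_up: unit vectors spanning the view plane."""
    dx, dy = drag_px
    world_offset = dx * np.asarray(cam_right) + dy * np.asarray(cam_up)
    return np.asarray(handle_pos) + world_offset / pixels_per_unit

# Example: front view (right = +x, up = +y); a 50 px drag to the right.
print(translate_in_view_plane((0.0, 1.0, 0.0), (50, 0),
                              (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))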

    4.1.4. Metaphor

    The congruent input and output space of direct input devices

    promotes a manipulation style of interaction. Most manipulation

    techniques for interactive surfaces are kinematic mappings, where

    individual surface contacts exert a pseudo friction force by sticking

    to objects or pinning them down. As an alternative to kinematic

control, Cao et al. [7] and Wilson et al. [63] propose surface-based

    manipulation through virtual forces. This offers a more compre-

    hensive and realistic simulation of physical forces and is also used

    in desktop-based and immersive virtual environments. Different

    metaphors in the same system can enhance the distinction

    between controls that otherwise have much in common. For

    instance, in the example of desktop 3D interaction, editing usually

    employs the direct or instrumented interaction metaphors, whileview controls bear more resemblance to steering. This could also

    support the mental distinction between phenomenologically simi-

    lar spatial editing and navigation operations on interactive

    surfaces.

Manipulation is the most general metaphor for puppet control. Through manipulation the puppeteer can flexibly create and release mappings with a drag-and-drop style of interaction; directness minimizes mediation between user and target domain. For complex transformations, as are often necessary in character animation, rigs should be designed so that handles promote as direct a manipulation as possible, meaning that handles should be co-located with the features they influence and the handle-feature mapping designed to support maximal correspondence. Regarding kinematic versus physics-based manipulation mappings, realism and emergent control styles stand against precision, predictability and reliability. In animation, full control has a higher priority than realism, which is why we opt for purely kinematic controls.

    4.1.5. Directness

Interactive surfaces can reduce the distance between the user and the target to a minimum. However, touch input also has potential disadvantages such as imprecision (when mapping the finger contact area to a single point) and occlusion of on-screen content through the user's fingers, hands and arms [62]. Re-introducing indirection can alleviate the occlusion problem. Since absolute input techniques require the user to reach every part of the screen, which may become difficult when the display exceeds a certain size, limiting the interaction area to a part of the screen or introducing indirection mechanisms can help [18]. The spatial distance between input

    and target can also be used as a parameter for interaction design.

    For instance, fingers or pens touching the target can control differ-

    ent DOF than off-target contacts (mode change). Layered motion

    recording can involve manipulating moving targets after the initial

    capture pass. Relative mapping applies transformation relative to

    the initial input state. This allows arbitrary input location, and

clutching can increase the comfort of use. Both absolute and relative input can be applied locally and globally, which makes a significant difference when controlling the behavior of a feature that inherits motion from its parents. Local mapping allows the user

    to ignore motion of parent features and concentrate on local trans-

    formations. By default, performance control of a feature overwrites

    any previous recordings made for it. In this way, performers can

    practice and test a motion until they get it right. They might how-

    ever want to keep aspects of an original recording and change oth-

    ers. Blending a performance with a previous recording expands the

    possibilities for control. It allows performance-based editing of

    existing animations.
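The difference between absolute and relative mapping can be sketched as follows (hypothetical target and position types, not code from the system); the relative variant is what makes clutching possible:

    # Sketch of absolute vs. relative input mapping. Absolute mapping binds
    # the target to the contact position; relative mapping applies only the
    # delta since the last sample, so the finger can be lifted and re-placed
    # anywhere (clutching) without the target jumping.
    class AbsoluteMapping:
        def apply(self, target, pos):
            target.position = pos                     # target snaps to the contact

    class RelativeMapping:
        def __init__(self):
            self.last = None
        def begin(self, pos):
            self.last = pos                           # clutch engages at any location
        def apply(self, target, pos):
            dx, dy = pos[0] - self.last[0], pos[1] - self.last[1]
            target.position = (target.position[0] + dx, target.position[1] + dy)
            self.last = pos
        def end(self):
            self.last = None                          # lifting the finger clutches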

    4.1.6. Orchestration

Studies by Forlines et al. [19] and Kin et al. [38] demonstrated that the benefits of two-handed (symmetric) input also transfer to interactive surfaces for basic selection and dragging tasks. The difficulty is to get users to use both hands, since single-handed controls in typical UIs can prime them. To maximize the options, our system should allow one-handed as well as symmetrical and asymmetrical bimanual input. The 2D capture approach implies that no single spatial manipulation requires more than one hand. Consequently, two single-handed operations can easily be combined to enable parallel operation, for instance one hand per character limb, allowing emergent asymmetric and symmetric control (cf. [11]).

    If individual sets of camera parameters are controlled with a

    single hand, this allows emergent styles of interaction. Combining

    two different camera operations, one with each hand, allows

    asymmetric view control. For instance, left hand panning and right

hand zooming can be combined into simultaneous 3DOF view control. A combination of left-handed view control with right-handed


    performance control even enables interaction styles that follow

    principles of asymmetric bimanual behavior [25]: the left hand

can operate the view, acting at a lower spatial and temporal frequency and preceding the right hand, which acts in the

    reference frame provided by the left. This approach can be used to

    simplify view attaching for editing in dynamic reference frames:

    attaching the camera to the current reference frame for all camera

    operations provides the benefits of kinaesthetic reference frames

    and solves the issue of direct control with dynamic targets.

    4.1.7. Spacetime

Direct-touch spatial editing is almost exclusively evaluated in the scope of basic object editing in static environments (space → space). Non-spatial trigger input by tapping the screen (time → time) is commonly employed for discrete navigation of image sequences or videos, e.g. TV sports presenters reviewing video recordings of a game. With the exception of Moscovich et al. [45] and Kipp and Nguyen [39], the potential of direct touch for motion capture (spacetime → spacetime) has received little attention in prior

    research. Surface-specific techniques thus seem mainly aligned

    along symmetric spacetime categories. The absence of passive,

    time-based mappings or graphical depictions of time might be just

    because the coupling of input and output so strongly affords direct,

    continuous manipulation as opposed to tool use or automation.

    While it is still pure conjecture, it is possible that direct-touch

    promotes symmetric spacetime mappings which couple user and

    medium space and time more literally, while indirect input might

    be better suited for more mediated spacetime controls.

    4.2. Prototype system

    We implemented the design approach in a working prototype of

    a multi-touch animation system (Walther-Franks et al. [70]). We

    decided to build upon the existing 3D modelling and animation

    software Blender. The animation system is built around a core of

    performance controls. View controls and a time control interface

complete the basic functionality. Each control can be operated with a single hand. This allows the user to freely combine two operations, e.g. capturing the motion of two features at once or wielding

    the view and the puppet at the same time. Since Blender neither

    supports multi-touch input nor concurrent operations, changes

    were necessary to its user interface module, especially the event

    system. We established a TUIO-based multi-touch interface. TUIO

    is an open, platform independent framework that defines a com-

    mon protocol and API for tangible interfaces and multi-touch sur-

    faces [36]. It is based on the Open Sound Control (OSC) protocol, an

    emerging standard for interactive environments. We implemented

    chording techniques for mouse emulation by mapping multiple

    finger cursors to single 2-DOF input events. This suffices for sin-

gle-hand input. For bimanual interaction the contacts are clustered using a spatial as well as a temporal threshold. Fingers are only added to a gesture if they are within a certain distance of the centroid of the gesture's cursor cluster; otherwise they create a

    new multi-finger gesture. After initial registration the gesture can

    be relaxed, i.e. the finger constellation required for detection need

    not be maintained during the rest of the continuous gesture. This

means that adding a finger to or removing one from the cluster will not

    change the gesture, making continuous gestures resistant to track-

    ing interruptions or touch pressure relaxation. This multi-touch

    integration already enables the use of tools via multi-touch ges-

    tures with one hand at a time. For two-handed control it was nec-

    essary to extend the single pointer UI paradigm implemented in

    Blender such that two input sources (two mice or two hands)

    can operate independently and in parallel.
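The clustering rule can be sketched as follows (thresholds are illustrative values, not those used in the prototype):

    # Sketch of contact-to-gesture clustering: a new contact joins an existing
    # gesture only if it arrives shortly after the gesture was registered and
    # lies close to the centroid of that gesture's contacts; otherwise it
    # starts a new gesture. Threshold values are illustrative assumptions.
    import math, time

    SPATIAL_THRESHOLD = 120.0    # pixels
    TEMPORAL_THRESHOLD = 0.25    # seconds

    class Gesture:
        def __init__(self, contact, t):
            self.contacts = [contact]
            self.registered = t
        def centroid(self):
            xs = [c[0] for c in self.contacts]
            ys = [c[1] for c in self.contacts]
            return sum(xs) / len(xs), sum(ys) / len(ys)

    def assign_contact(gestures, contact, t=None):
        t = time.time() if t is None else t
        for g in gestures:
            cx, cy = g.centroid()
            if (math.hypot(contact[0] - cx, contact[1] - cy) < SPATIAL_THRESHOLD
                    and t - g.registered < TEMPORAL_THRESHOLD):
                g.contacts.append(contact)   # finger joins the chord
                return g
        g = Gesture(contact, t)              # otherwise start a new gesture
        gestures.append(g)
        return g

Relaxing the gesture after registration would then simply mean ignoring later changes to the contact set when deciding which chord was recognized.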

Performance controls use selection and translation operators (Fig. 4). The translation operator works along the two axes defined by the view plane. Single-finger input maps to selection (tap) and translation (drag). In linked feature hierarchies such as skeleton rigs, the translation is applied to the distal bone end, rotating the bone around the screen z axis. Dragging directly on a target enables

    selection and translation in a single fluid motion. Alternatively,

    the drag gesture can be performed anywhere on screen, also allow-

ing indirect control of a previously selected target. Indirect dragging thus

    requires prior selection to determine the input target. Selection is

    the only context-dependent operator, as it determines the target

    by ray casting from the tapped screen coordinates.
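For skeleton bones, the drag-to-rotation behavior described above can be sketched like this (the screen-space projections and the bone attribute are hypothetical, not Blender API calls):

    # Sketch: dragging the distal end of a bone in the view plane becomes a
    # rotation of the bone around the screen z axis. head_2d and the tip
    # positions are screen-space projections; bone.screen_z_rotation is a
    # hypothetical attribute used only for illustration.
    import math

    def angle_to(head_2d, tip_2d):
        return math.atan2(tip_2d[1] - head_2d[1], tip_2d[0] - head_2d[0])

    def rotate_bone_by_drag(bone, head_2d, tip_before, tip_after):
        delta = angle_to(head_2d, tip_after) - angle_to(head_2d, tip_before)
        bone.screen_z_rotation += delta      # apply the incremental rotation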

    Layered animation is supported via absolute and additive map-

    pings. Absolute mode is the standard mapping, additive mode

    must be activated via the GUI. The standard absolute mapping

    overwrites any previous transformation at the current time. In

    the absence of parent motion this ensures 1:1 correspondence

    between input and output. With parent motion, control becomes

    relative to the parent frame of reference (local). Additive layering

    preserves existing motion and adds the current relative transfor-

    mation to it. By changing the view between takes so that the

input-output mapping affects degrees of freedom that could not

    be affected in previous takes (e.g. by orbiting the view 90 degrees

    around screen y), this enables the animator to add depth and thus

    create more three-dimensional motion.
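The two layering modes can be summarized in a few lines (the per-frame dictionary used as a motion track is an assumption for illustration only):

    # Sketch of absolute vs. additive layering for one animation channel.
    # Absolute capture overwrites whatever was recorded at the current frame;
    # additive capture keeps the existing curve and adds the performer's
    # relative transformation on top of it.
    def record_absolute(track, frame, value):
        track[frame] = value                             # overwrite the previous take

    def record_additive(track, frame, delta):
        track[frame] = track.get(frame, 0.0) + delta     # preserve existing motion

    # Hypothetical usage during playback-and-record:
    # record_additive(x_translation_track, current_frame, drag_dx * gain)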

    The three camera operators pan, orbit and zoom map to

    two-, three-, and four-finger gestures (Fig. 5). Assigning chorded

    multi-finger gestures to view operators does not have any prece-

    dent in the real world or prior work, and there are good arguments

    for different choices. A sensible measure is the frequency of use of a

    certain view control, and thus one could argue that the more com-

    monly used functions should be mapped to the gestures with less

    footprint, i.e. fewer fingers. Camera dolly move or zoom is probably

    the least used view control, which is why we decided to map it to

    the four finger gesture: users can zoom in and out by moving four

    fingers up or down screen y. Three fingers allow camera orbit by

    the turntable metaphor: movement along the screen x axis controls

    turntable azimuth, while motion along screen y controls camera

altitude. Two fingers pan the view along the view-plane x and y axes. Like the transformation controls, camera controls are context-free, meaning they can be activated anywhere in the camera view.
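A sketch of this dispatch (gains and the camera interface are assumptions, not the prototype's code):

    # Sketch of the chorded view controls: two fingers pan in the view plane,
    # three fingers orbit via the turntable metaphor (x -> azimuth,
    # y -> altitude), four fingers dolly/zoom along screen y.
    def apply_view_gesture(camera, finger_count, dx, dy,
                           pan_gain=0.01, orbit_gain=0.005, zoom_gain=0.02):
        if finger_count == 2:
            camera.pan(dx * pan_gain, dy * pan_gain)     # view-plane x/y
        elif finger_count == 3:
            camera.azimuth += dx * orbit_gain            # turntable yaw
            camera.altitude += dy * orbit_gain           # camera elevation
        elif finger_count == 4:
            camera.dolly(dy * zoom_gain)                 # move along the view axis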

    A view attachment mode, when active, fixes the view camera to

    the currently selected feature during all camera operations, mov-

    ing the camera along with dynamic targets (Fig. 6). The camera-

    feature offset is maintained and can be continuously altered

depending on the camera operator, as described above. After establish-

    ing the attachment by starting a view control gesture, new targets

    can be selected and manipulated. Releasing the camera control

    immediately ends the attachment, rendering the camera static.

By combining one-handed view control and capture in an asymmetric manner, this approach can resolve the indirection otherwise involved in controlling dynamic targets.
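The attachment itself can be sketched as follows (vector-valued positions and the feature interface are assumptions):

    # Sketch of view attachment: while a view gesture is held, the camera is
    # re-anchored every frame to the selected feature, so the camera-feature
    # offset survives motion the feature inherits from earlier layers.
    import numpy as np

    class ViewAttachment:
        def __init__(self, camera, feature):
            self.camera, self.feature = camera, feature
            self.offset = (np.asarray(camera.position, dtype=float)
                           - np.asarray(feature.world_position(), dtype=float))
        def adjust(self, delta):
            self.offset += np.asarray(delta, dtype=float)   # view controls alter the offset
        def update(self):
            # called once per frame while the view control gesture is active
            self.camera.position = (np.asarray(self.feature.world_position(),
                                               dtype=float) + self.offset)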

The time control interface features several buttons and a timeline. Simple play/pause toggle buttons start and stop the playback

    within a specified time range. A timeline gives the animator visual

    feedback on the remaining loop length in multi-track capture, sup-

    porting anticipation. It also enables efficient temporal navigation:

    with a one-finger tap the animator can set the playhead to a spe-

cific frame. A continuous horizontal gesture enables interactive playback, giving the animator direct control of playback speed.
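A sketch of the timeline mapping (screen coordinates and frame range are illustrative values):

    # Sketch of the timeline interaction: a tap positions the playhead
    # absolutely, and a continuous horizontal drag scrubs, so playback speed
    # follows finger speed. Timeline geometry and frame range are assumptions.
    def x_to_frame(x, timeline_x=0.0, timeline_width=1280.0,
                   frame_start=1, frame_end=250):
        t = min(max((x - timeline_x) / timeline_width, 0.0), 1.0)
        return round(frame_start + t * (frame_end - frame_start))

    def on_timeline_tap(scene, x):
        scene.current_frame = x_to_frame(x)        # jump to the tapped frame

    def on_timeline_drag(scene, x):
        scene.current_frame = x_to_frame(x)        # continuous scrubbing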

    5. Evaluation

    The design framework was a powerful aid for structuring design

    options for the novel multi-touch animation system presented

above. We have also used it in the design of a performance-based animation timing technique (Walther-Franks et al. [71]) and are employing it in ongoing projects.

Fig. 4. Direct and indirect performance control.

Fig. 5. Basic view transformations with continuous multi-finger gestures.

A design framework as presented

    in this paper cannot be directly evaluated. Its usefulness and

    appropriateness is rather proven indirectly through evaluations

    of prototypical systems built on its theoretical foundation. For this

    reason we will next summarize the evaluation of the multi-touch

    animation system.

    We evaluated the resulting system in an informal user study.

    Aspects of interest were the reception and use of single- and

multi-track capture and camera controls, specifically to what extent two-handed interaction strategies would be employed. Since the direct animation system is highly novel and still at the prototype stage, a formative evaluation was chosen in order to guide further

    research. Formative evaluations are common in research and

    development of 3D user interfaces [4]. Six right-handed individuals

    aged between 23 and 31 years, four male, two female, took part in

    our study. All came from a computer science and/or media produc-

    tion background. Two of these judged their skill level as frequent

    users of animation software, one as an occasional user and three

as rarely using such software. In sessions of about 30 min, the users created free animations of a stylized human puppet. An articulated mannequin was rigged with seven handles that provided puppetry controls (three bones for control of the body and four inverse kinematic handles for the hand and foot end effectors). The inverse kinematic handles allowed expressive control of the multi-joint limbs while keeping complexity at a minimum. The goal was to explore what animation goals users would come up with on their own, given the digital puppet. The study ran the prototype on a rear-projected horizontal interactive tabletop employing the diffuse illumination technique, with a height of 90 cm, a screen diagonal of 52 inches and a resolution of 1280 × 800 pixels.

    The results of the study revealed that participants took to the

    controls easily. Most stated that they enjoyed using our system.

    The performance control interface was straightforward for initial

    animations. Multi-track animation was mainly used to animate

    separate features in multiple passes, less to adjust existing anima-

    tion. The more complex additive mapping was hardly used and

    met with initial confusion, although explanation and experiment-

ing usually solved this. The view controls were quickly understood and were used without difficulty. The most commonly used cam-

    era operation was orbit. As all participants were familiar with

    the timeline metaphor they had no problems understanding it.

    Most subjects easily employed the absolute positioning of the

    playhead to jump to a frame and to scrub along the timeline to

    review the animation they had created. One participant used the

    timeline for a method of animation somewhere between perfor-

    mance and frame-based animation: using the left hand for play-

head and the right for pose control, he achieved a fast, efficient

    pose-to-pose animation style. Five out of six participants mani-

    fested asymmetric bimanual styles of interaction. An emergent

strategy of half of our study's participants was to dedicate the left hand to view or time controls and the right to capture. Further, one participant controlled two puppet features simultaneously. Three used their left hand to attach the view to the mannequin

    for animating its limbs once they had created animation for the

    root bone. The benefit of locking the view to a frame of reference

    in this way seemed immediately apparent to them, and was

    greeted with enthusiasm in two cases.

Given the short session time and their lack of experience in performance animation, participants were able to create surprisingly refined character motion. Four were able to create expressive character animations within the 10 min allotted to the free animation task. These included walk, jump and squat motions as well as

    dance moves.

Inexperienced users had a harder time comprehending spatial relationships, while those with more experience in 3D animation noticeably picked up the controls more fluently. This comes as no surprise, as using and controlling software takes time and practice, regardless of the interface. For novice and casual users, our 2DOF strategy seems appropriate, since it constrains manipulation to the view plane and excludes the depth dimension. However, the interface might need improvements in visualizing these constraints and in giving more depth cues.

    6. Conclusion and discussion

Current animation systems are too complex and inefficient for the high demand for animated content today. In order to make them more efficient and accessible to a broad range of users, we have to

    look at such tools from an HCI perspective. Our work has taken

    steps in this direction. A review summarized related work in com-

    puter animation interfaces regarding issues of control and use. A

    design space characterized important aspects of animation inter-

    faces on varying levels of abstraction. A taxonomy for spacetime

    interactions with spatiotemporal media described how user and

    medium space and dynamics relate in animation interfaces. The

    use of this conceptual framework was demonstrated in the design

    of a multi-touch animation system. For this proof-of-concept proto-

    type we used interactive surfaces as high-bandwidth direct input

    devices. It features robust, easy to understand, and conflict-free

    unimanual mappings for performance and view control that can

be combined for efficient bimanual interaction. A user study verified the design approach by showing largely positive user reactions.

    The majority of users employed both hands in emergent asymmet-

    ric and symmetric bimanual interaction.

    Animations are created by people for people in order to inform,

    educate or entertain. Striving for higher usability by applying

    knowledge on physiological and psychological human factors is

the foundation of human-computer interaction, and one of the

    main points of our work. However, animation is primarily still an

    art and a craft. Just as good animations have always been created

    by artists with capability and skill, next generation animation

interfaces will still require talent and training on the part of the user.

    But in contrast to current mainstream tools they can help to ease

    the effort in training and allow animators to express their creativ-

ity more efficiently.

Fig. 6. The view attaching technique. Features can inherit motion from parents animated in previous motion layers. In such cases direct control is not possible. By attaching the view to the feature's frame of reference, direct control is reintroduced.

While animation tools cannot enable completely uninitiated people to create stunning motion designs without significantly constraining creativity, they can do a lot more

    to make the learning curve less steep. We believe that next gener-

    ation tools should incorporate everyone from beginners to experi-

    enced professionals, by being easy to learn, but hard to master. In

this we side with voices in the community that, rather than making systems easy to use, intend to accelerate the progress from novices to experts [35], by letting users feel like naturals [62].

    Acknowledgement

    This work was funded in part by the Klaus Tschira Stiftung.

    Appendix A. Supplementary data

    Supplementary data associated with this article can be found, in

    the online version, at http://dx.doi.org/10.1016/j.entcom.2014.

    08.007.

    References

[1] Anand Agrawala, Ravin Balakrishnan, Keepin' it real: pushing the desktop metaphor with physics, piles and the pen, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '06, ACM, New York, NY, USA, 2006, pp. 1283-1292.
[2] Michel Beaudouin-Lafon, Instrumental interaction: an interaction model for designing post-WIMP user interfaces, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '00, ACM, New York, NY, USA, 2000, pp. 446-453.
[3] B. Bodenheimer, C. Rose, S. Rosenthal, J. Pella, The process of motion capture: dealing with the data, in: D. Thalmann, M. van de Panne (Eds.), Computer Animation and Simulation '97, Eurographics/ACM SIGGRAPH, 1997.
[4] Doug A. Bowman, Ernst Kruijff, Joseph J. LaViola, Ivan Poupyrev, 3D User Interfaces: Theory and Practice, Addison-Wesley, 2004.
[5] Brightside Games, Zeit2, Ubisoft, 2011.
[6] W. Buxton, B. Myers, A study in two-handed input, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '86, ACM, New York, NY, USA, 1986, pp. 321-326.
[7] Xiang Cao, Andrew D. Wilson, Ravin Balakrishnan, Ken Hinckley, Scott E. Hudson, ShapeTouch: leveraging contact shape on interactive surfaces, in: 2008 IEEE International Workshop on Horizontal Interactive Human Computer Systems (TABLETOP), IEEE, October 2008, pp. 129-136.
[8] S.K. Card, J. Mackinlay, The structure of the information visualization design space, in: Proceedings of the IEEE Symposium on Information Visualization 1997, IEEE, Los Alamitos, CA, USA, October 1997, pp. 92-99.
[9] Stuart K. Card, Jock D. Mackinlay, George G. Robertson, A morphological analysis of the design space of input devices, ACM Trans. Inf. Syst. 9 (2) (April 1991) 99-122.
[10] Daniel Casasanto, Lera Boroditsky, Time in the mind: using space to think about time, Cognition 106 (2) (February 2008) 579-593.
[11] Lawrence D. Cutler, Bernd Fröhlich, Pat Hanrahan, Two-handed direct manipulation on the responsive workbench, in: SI3D '97: Proceedings of the 1997 Symposium on Interactive 3D Graphics, ACM, New York, NY, USA, 1997, pp. 107-114.
[12] J.D.N. Dionisio, A.F. Cardenas, A unified data model for representing multimedia, timeline, and simulation data, IEEE Trans. Knowledge Data Eng. 10 (5) (September 1998) 746-767.
[13] Mira Dontcheva, Gary Yngve, Zoran Popović, Layered acting for character animation, ACM Trans. Graph. 22 (3) (July 2003) 409-416.
[14] Tanja Döring, Axel Sylvester, Albrecht Schmidt, A design space for ephemeral user interfaces, in: Proceedings of the 7th International Conference on Tangible, Embedded and Embodied Interaction, TEI '13, ACM, New York, NY, USA, 2013, pp. 75-82.
[15] Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowitcz, Derek Nowrouzezahrai, Ravin Balakrishnan, Karan Singh, Video browsing by direct manipulation, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '08, ACM, New York, NY, USA, 2008, pp. 237-246.
[16] J. Edelmann, A. Schilling, S. Fleck, The DabR - a multitouch system for intuitive 3D scene navigation, in: 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, 2009, IEEE, May 2009, pp. 1-4.
[17] James D. Foley, Andries van Dam, Steven K. Feiner, John F. Hughes, Computer Graphics: Principles and Practice, Addison-Wesley, 1996.
[18] Clifton Forlines, Daniel Vogel, Ravin Balakrishnan, Hybrid Pointing: fluid switching between absolute and relative pointing with a direct input device, in: UIST '06: Proceedings of the 19th Annual ACM Symposium on User Interface Software and Technology, ACM, New York, NY, USA, 2006, pp. 211-220.
[19] Clifton Forlines, Daniel Wigdor, Chia Shen, Ravin Balakrishnan, Direct-touch vs. mouse input for tabletop displays, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, ACM, New York, NY, USA, 2007, pp. 647-656.
[20] B. Frohlich, H. Tramberend, A. Beers, M. Agrawala, D. Baraff, Physically-based manipulation on the responsive workbench, in: IEEE Virtual Reality 2000, IEEE Comput. Soc., Los Alamitos, CA, USA, 2000, pp. 5-11.
[21] David M. Frohlich, The design space of interfaces, in: Lars Kjelldahl (Ed.), Multimedia, Eurographic Seminars, Springer, Berlin Heidelberg, 1992, pp. 53-69.
[22] David M. Frohlich, Direct manipulation and other lessons, in: Martin G. Helander, Thomas K. Landauer, Prasad V. Prabhu (Eds.), Handbook of Human-Computer Interaction, Elsevier, North-Holland, 1997, pp. 463-488.
[23] Chi W. Fu, Wooi B. Goh, Junxiang A. Ng, Multi-touch techniques for exploring large-scale 3D astrophysical simulations, in: Proceedings of the 28th International Conference on Human Factors in Computing Systems, CHI '10, ACM, New York, NY, USA, 2010, pp. 2213-2222.
[24] Michael Gleicher, Animation from observation: motion capture and motion editing, SIGGRAPH Comput. Graph. 33 (4) (November 1999) 51-54.
[25] Y. Guiard, Asymmetric division of labor in human skilled bimanual action: the kinematic chain as a model, J. Motor Behav. 19 (4) (December 1987) 486-517.
[26] Marc S. Hancock, F.D. Vernier, Daniel Wigdor, Sheelagh Carpendale, Chia Shen, Rotation and translation mechanisms for tabletop interaction, in: First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TableTop 2006), IEEE, January 2006, 8 pp.
[27] Mark Hancock, Sheelagh Carpendale, Andy Cockburn, Shallow-depth 3D interaction: design and evaluation of one-, two- and three-touch techniques, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, ACM, New York, NY, USA, 2007, pp. 1147-1156.
[28] Mark Hancock, Thomas T. Cate, Sheelagh Carpendale, Sticky tools: full 6DOF force-based interaction for multi-touch tables, in: Proceedings of Interactive Tabletops and Surfaces 2009, 2009.
[29] Ken Hinckley, Daniel Wigdor, Input Technologies and Techniques, Taylor & Francis, 2012, Chapter 9.
[30] Takeo Igarashi, Tomer Moscovich, John F. Hughes, As-rigid-as-possible shape manipulation, ACM Trans. Graph. 24 (3) (2005) 1134-1141.
[31] Satoru Ishigaki, Timothy White, Victor B. Zordan, C. Karen Liu, Performance-based control interface for character animation, ACM Trans. Graph. 28 (3) (July 2009) 1-8.
[32] Robert J.K. Jacob, Linda E. Sibert, Daniel C. McFarlane, M. Preston Mullen, Integrality and separability of input devices, ACM Trans. Comput. Hum. Interact. 1 (1) (March 1994) 3-26.
[33] Robert J.K. Jacob, Audrey Girouard, Leanne M. Hirshfield, Michael S. Horn, Orit Shaer, Erin T. Solovey, Jamie Zigelbaum, Reality-based interaction: a framework for post-WIMP interfaces, in: Proceedings of the Twenty-sixth Annual SIGCHI Conference on Human Factors in Computing Systems, CHI '08, ACM, New York, NY, USA, 2008, pp. 201-210.
[34] John Jurgensen, From muppets to digital puppets, August 2008. URL http://www.youtube.com/watch?v=GN8WbHomQJg.
[35] Paul Kabbash, William Buxton, Abigail Sellen, Two-handed input in a compound task, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '94, ACM, New York, NY, USA, 1994, pp. 417-423.
[36] Martin Kaltenbrunner, Till Bovermann, Ross Bencina, Enrico Costanza, TUIO - a protocol for table based tangible user interfaces, in: Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation (GW 2005), Vannes, France, 2005.
[37] Matt Kelland, Dave Morris, Dave Lloyd, Machinima: Making Animated Movies in 3D Virtual Environments, Ilex, Lewes, 2005.
[38] Kenrick Kin, Maneesh Agrawala, Tony DeRose, Determining the benefits of direct-touch, bimanual, and multifinger input on a multitouch workstation, in: Proceedings of Graphics Interface 2009, GI '09, Canadian Information Processing Society, Toronto, Ontario, Canada, 2009, pp. 119-124.
[39] Michael Kipp, Quan Nguyen, Multitouch puppetry: creating coordinated 3D motion for an articulated arm, in: ACM International Conference on Interactive Tabletops and Surfaces, ITS '10, ACM, New York, NY, USA, 2010, pp. 147-156.
[40] Joseph Laszlo, Michiel van de Panne, Eugene Fiume, Interactive control for physically-based animation, in: SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000, pp. 201-208.
[41] Andrea Leganchuk, Shumin Zhai, William Buxton, Manual and cognitive benefits of two-handed input: an experimental study, ACM Trans. Comput. Hum. Interact. 5 (4) (December 1998) 326-359.
[42] Thomas D.C. Little, Time-based Media Representation and Delivery, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1994, pp. 175-200.
[43] A. Martinet, G. Casiez, L. Grisoni, Integrality and separability of multitouch interaction techniques in 3D manipulation tasks, IEEE Trans. Vis. Comput. Graph. 18 (3) (March 2012) 369-380.
[44] Alberto Menache, Understanding Motion Capture for Computer Animation, 2011.
[45] T. Moscovich, T. Igarashi, J. Rekimoto, K. Fukuchi, J.F. Hughes, A multi-finger interface for performance animation of deformable drawings, in: UIST 2005 Symposium on User Interface Software and Technology, October 2005.
[46] Miguel A. Nacenta, Patrick Baudisch, Hrvoje Benko, Andrew D. Wilson, Separability of spatial manipulations in multi-touch interfaces, in: GI '09: Proceedings of Graphics Interface 2009, Canadian Information Processing Society, Toronto, Ontario, Canada, Canada, 20