Monday 1 August 2011

The Relationship Between Audio and Immersion in Video Games

One of the things I attempted to demonstrate in a recent work-in-progress presentation was how flawed approaches to speech in video games can disrupt player immersion. I used the following videos to provide examples of how poor recreation of the human voice can intrude upon the player's experience.



Examples of Speech & Voice Issues in Video Games from Liam Donnellan on Vimeo.


Examples of In-Game Speech Issues in Sports Video Games from Liam Donnellan on Vimeo.

What I failed to communicate during the presentation is why player immersion is so important in the first place. This post intends to briefly outline the importance of immersion in games and discuss the contribution of audio to the phenomenon.


What is Immersion?


Before discussing the importance of immersion in video games, it is crucial to define the term. Garneau (2001) summarises: 'Immersion covers the pleasures of being in a different environment than usual... by physical means or by use of one's imagination.' Concordantly, Huiberts (2010) notes: 'Most people who have ever played a computer game are likely to have experienced certain feelings of being absorbed by it... most commonly called "immersion"...'. For the purposes of this blog post, then, immersion can be defined as the psychological state a player enters while playing a game, in which they imagine themselves to be part of a game's environment.


Why is Immersion Important?

Woyach (2008) notes that immersion '[lowers] a person's need to suspend their disbelief* by removing the text, the seat, or the keyboard, placing the person in the scene itself... ...until [the game environment] is as real to us as everyday life'. In role-playing games, where stories are often set in other-worldly environments, it is easy to see why a suspension of disbelief on the player's part could be useful. In the case of sports games I would argue that a suspension of disbelief is required for the player to attach any meaning to his/her actions within the game. This would be particularly true in PDC...'s 'Career Mode', in which the player can create a custom-made professional darts player and compete with the real-world professionals in various competitions.

* A semi-conscious decision in which a player puts aside their disbelief and accepts a premise as being real for the duration of a given story. (Definition from MediaCollege.com)


Of interest: Results of a user survey by Huiberts (2010) show that most players (at least 72% in this case) like to be immersed in games.


The Role of Audio

Taylor (2002) notes 'Diegetic immersion requires that the game have a consistent world, so that the player is not forced from immersion by inconsistencies of the game space'. A game's audio must therefore help create and maintain a consistent reality if a player is to become effectively immersed. Before attempting to aid immersion with game audio, however, it is fruitful to analyse scenarios where audio can disrupt the phenomenon. Huiberts (2010) identifies three dimensions of immersion which can be disrupted by audio.

Sensory Immersion: '...concerns engagement with the sensory rewarding aspects of a game... ...often stimulate the feeling of being [in the game world]' See discussion below.

Challenge-based Immersion: '...concerns the engagement with a competitive process, problem solving, interacting with the game and competing or cooperating with others.' Disruption of this dimension is often caused by music [video example pending] (Huiberts, 2010).

Imaginative Immersion: '...concerns the engagement with the "imaginary world and fantasy, game characters, worlds and story line"...' This is demonstrated above in God of War, Duke Nukem and Lord of the Rings Online, as these issues specifically concern characters within an imaginary world.

In the Rugby League Live and PDC... examples above, the most prevalent disruption is of sensory immersion, specifically when the commentary systems in each game output unnatural speech and inappropriately selected speech topics respectively. These inadequacies make the player aware of the system, and result in technological listening* from the player, rather than the desired semantic listening*.

* Semantic listening and technological listening (coined by Chion and Smalley respectively) are terms used in the field of electroacoustic composition. Semantic listening is used for the purpose of gaining information about what is communicated in the sound (Chion, 1994), whereas technological listening occurs when a listener perceives the technology behind a sound rather than the sound itself (Smalley, 1997). I believe these terms are applicable to my area of research, where disruption of sensory immersion causes a (brief) switch from semantic listening (listening to the commentary for information) to technological listening (hearing the system behind the commentary, rather than the commentary itself).



Summary

Thus, it would appear that a game world's consistency is of key importance to aiding immersion. That is not to say a game's reality needs to be consistent with our own reality (Taylor, 2002). Demonstrating this, Nintendo franchises such as Pokémon bear little resemblance to the world we live in, yet keep young players playing for hours on end (Thomsen, 2010). Game audio can aid immersion by helping to maintain a game world's internal consistency. In the case of sports simulations, where the aim is to recreate the scenario of a given sport, audio also needs to be consistent with the real world, as players who are fans of the sport are likely to notice any unnatural or unbelievable output from audio systems.


References

Chion, M. (1994). AudioVision: Sound on Screen. Columbia University Press.  

Garneau, P.A. (2001). Fourteen Forms of Fun. Gamasutra. [Online] Available from: http://www.gamasutra.com/features/20011012/garneau_01.htm [Last Accessed 1/8/2011]

Huiberts, S. (2010). The Role of Audio for Immersion in Computer Games. PhD thesis, Utrecht School of the Arts, University of Portsmouth. [Online] Available from: http://download.captivatingsound.com/Sander_Huiberts_CaptivatingSound.pdf [Last Accessed 1/8/2011]

Media College (2011). Suspension of Disbelief. MediaCollege.com. [Online] Available from: http://www.mediacollege.com/glossary/s/suspension-of-disbelief.html [Last Accessed 1/8/2011]

Smalley, D. (1997). Spectromorphology: Explaining sound-shapes, Cambridge: Cambridge University Press.

Taylor, L. (2002). Video Games: Perspective, Point-of-View, and Immersion. PhD thesis, University of Florida. [Online] Available from: http://www.laurientaylor.org/research/taylor_l.pdf [Last Accessed 1/8/2011]

Thomsen, M. (2010). Designing for Immersion: Recreating Physical Games. Gamasutra. [Online] Available from: http://www.gamasutra.com/view/feature/4236/designing_for_immersion_.php [Last Accessed 1/8/2011]

Woyach, S. (2008). Immersion Through Video Games. illumin. [Online] Available from: http://illumin.usc.edu/article.php?articleID=103&page=1 [Last Accessed 1/8/2011]



Bibliography

3D Realms, Gearbox Software. (2011). Duke Nukem Forever. 2K Games.

Big Ant Studios, (2010). Rugby League Live. Tru Blue Entertainment.

Ready at Dawn, Sony Computer Entertainment Santa Monica Studio (2010). God of War: Ghost of Sparta. Sony Computer Entertainment.

Redoubt, Rebellion Developments (2010). PDC World Championship Darts Pro Tour. O-Games.

Turbine, Inc. (2007). Lord of the Rings Online: Shadows of Angmar. Midway Games.

Thursday 14 July 2011

Introduction to Final Project - Contextually Driven Speech

For the final project of the Sound and Music for Interactive Games course I shall be developing a contextually driven darts commentary system for a darts game I have previously made in Max/MSP.


What is meant by contextually driven speech?

Contextually driven speech uses a list of contexts (each represented by a boolean variable) to select the most effective line of dialogue to output. For instance, if in the dying minutes of a football game a player who has previously missed 10 shots scores a goal, the commentator might choose to say 'and finally he gets the goal he's been chasing all game!'.

The list of contexts leading to the selection of this line might include:

Goal has been scored = TRUE
Player has scored previously = FALSE
Player has >=10 misses = TRUE
Match time >85 minutes = TRUE

These 4 contexts hold the information needed for the example line of commentary to be selected.
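As a rough illustration of the idea (not the method used by EA or any shipped game), the selection step could be sketched in Python as follows. The context names, the CommentaryLine class and the 'most specific line wins' rule are all assumptions made purely for this example.

```python
from dataclasses import dataclass


@dataclass
class CommentaryLine:
    text: str
    required: dict  # context name -> boolean value the line needs


def select_line(contexts, lines):
    """Return the matching line that checks the most contexts, or None."""
    matches = [
        line for line in lines
        if all(contexts.get(name) == value for name, value in line.required.items())
    ]
    # Assumed tie-break: prefer the most specific line (most required contexts).
    return max(matches, key=lambda line: len(line.required), default=None)


# Hypothetical context names mirroring the football example above.
contexts = {
    "goal_scored": True,
    "player_scored_previously": False,
    "player_has_10_or_more_misses": True,
    "match_time_over_85_minutes": True,
}

lines = [
    CommentaryLine("Goal!", {"goal_scored": True}),
    CommentaryLine(
        "And finally he gets the goal he's been chasing all game!",
        {
            "goal_scored": True,
            "player_scored_previously": False,
            "player_has_10_or_more_misses": True,
            "match_time_over_85_minutes": True,
        },
    ),
]

chosen = select_line(contexts, lines)
print(chosen.text)
```

Both lines match the four contexts here, but the longer line is chosen because it accounts for more of them. A generic 'Goal!' remains available as a fallback when the richer contexts are not all satisfied.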

As demonstrated by Durity and Piltz at the 41st International AES Conference in 2011, commentary systems that are triggered on a play-by-play basis are diminishing in popularity in favour of contextually driven speech. In Durity's demonstration, it was shown how the commentary in EA Canada's FIFA10 makes decisions not only on in-game contexts but also events from the real world. For example, an impressive victory by Manchester United last weekend might be acknowledged by the commentary team during a game in 'career mode'.

Why a darts game?

As noted in a previous post, the audio systems in PDC World Championship Darts 2009 ultimately output unbelievable commentary, focusing on play-by-play (per dart) hits and misses from the player and rarely (if at all) commenting on the wider context of the developing match. In the more recent PDC World Championship Darts Pro Tour, while the issue has been partly tackled with both the introduction of a second commentator and recording of more 'colour' commentary (i.e. not solely play-by-play), analysis reveals that there is still a large difference between the phrase content output of the video game commentary and that of real life commentary.

As a darts fan and game audio enthusiast, I feel that with today's technology it would be perfectly possible to create a more authentic commentary system for darts using contextually driven speech. Further, a general conceptual design of a sports commentary system would benefit other games that suffer from similar issues. Ultimately, these two products are the main goals of my project.

References



Durity, G. & Piltz, D. (2011) Production and Implementation Methodologies of Contextually Driven Speech. Electronic Arts Canada. Proceedings of the AES 41st International Conference, London, UK, 2011 February 2-4


Rebellion Developments (developers) PDC World Championship Darts 2009, Oxygen (2009)

Rebellion Developments (developers) PDC World Championship Darts Pro Tour, O-Games (2010)

Friday 19 November 2010

Accompanying Videos for 'What's So Special About Interactive Audio?' Report

Monkey Island 2 with iMUSE - Seamless Transition Demonstration



PDC World Championship Darts 2009 - Poor Audio Ruins Game

Super Mario Galaxy - Interactive Music Example #1


Super Mario Galaxy - Interactive Music Example #2


Spore Cell Stage - Procedural Music Generation Example

Wednesday 20 October 2010

Interactive Music - Super Mario Galaxy

Super Mario Galaxy was released in late 2007 for Nintendo Wii and is the third 3D platformer in the Mario franchise. The official soundtrack for the game was released in early 2008 and was proclaimed to have the 'Best Design in Audio' in Edge Magazine's 'Edge Awards 2007'. The game is great fun to play, and the sound team have made a sound job (no pun intended) of fusing interactive music with full pre-recorded composition.

The linear (non-interactive) elements of the Super Mario Galaxy score are fully orchestrated. This was a well-thought-out choice on the part of chief composer/arranger Mahito Yokota and 'audio adviser' Koji Kondo to emphasise the grandeur of the game's environment. Interestingly, Yokota and Kondo hold the view that high-quality, sweeping orchestral scores are not always appropriate to Nintendo titles.

"It almost seems like while you’re playing the game, the music is coming from a CD player, and not from the game console and it feels like you are obligated to play the game in time to the music. For that reason, Nintendo has only used a live orchestral soundtrack on a few occasions in the past." - Koji Kondo in an interview for wii.com's 'Iwata Asks: Super Mario Galaxy' segment.

An important aspect of the Super Mario Galaxy ethos is rhythm. In an interview with Music4Games (now archived at originalsoundversion.com), Yokota comments:

"...we were making music in order to make them match well with [the] game tempo of Mario Galaxy and [imagined] that people [would want to] explore the magnificent universe. I think you will notice when you play the game that tempo is very much constant, although rhythm of the music may be epic, because we prepared orchestrated tunes that will well suit tempo of the game play."

It's clear from associated interviews that the team at Nintendo EAD (Entertainment Analysis and Development) spend a lot of time thinking about how best their music can be implemented to complement game-play and don't just throw in the first piece they come up with as a constant bed. Further proof of this ideal is sound designer/programmer Masafumi Kawamura's MIDI-based triggering system which, using MIDI data, only allows certain sound effects to trigger in time with the streaming orchestral score. To do this, the team had the live orchestra record all 28 pieces in time with a metronome.
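The general principle of holding a sound effect until the next musical grid point can be sketched as below. This is not Kawamura's system, whose internals are not described in the interviews; it is a minimal Python illustration that assumes a fixed tempo (plausible for a score recorded to a metronome), and the function name and parameters are hypothetical.

```python
import math


def next_beat_time(request_time, bpm, subdivision=1):
    """Return the time in seconds of the first grid point at or after request_time.

    subdivision=1 quantises to whole beats, 2 to half-beats, and so on.
    """
    grid = 60.0 / bpm / subdivision  # length of one grid step in seconds
    return math.ceil(request_time / grid) * grid


# A sound effect requested 1.23 s into a 120 bpm piece is delayed to the next beat.
print(next_beat_time(1.23, 120))
```

A finer subdivision shortens the worst-case delay between the player's action and the sound, at the cost of a looser rhythmic fit.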

Currently playing through the game myself, I've come across a few of the game's different interactive music systems so far. The musical note collection mini-game plays an arrangement of the classic 'Underworld' theme from the original Super Mario Bros (NES). Each small segment of the music is played back at a varied rate depending on how quickly Mario approaches the note.

The ball-rolling mini-game is another element of game-play in which Mario balances on a large ball and works past obstacles to reach the end of the level. In this particular level the speed and pitch of the music is varied based on how fast Mario is moving. 
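When speed and pitch change together, as they do when a recording is simply played back faster or slower, the two are linked by a fixed relationship: a playback rate r shifts the pitch by 12 x log2(r) semitones. The sketch below shows one way a movement speed could be mapped to such a rate. The linear mapping and the 0.5 to 1.5 rate range are assumptions for illustration, not values taken from the game.

```python
import math


def playback_rate(speed, max_speed, min_rate=0.5, max_rate=1.5):
    """Map a movement speed linearly onto a playback rate (assumed range)."""
    clamped = max(0.0, min(speed, max_speed))
    return min_rate + (max_rate - min_rate) * (clamped / max_speed)


def semitone_shift(rate):
    """Pitch change in semitones caused by playing audio at the given rate."""
    return 12.0 * math.log2(rate)


rate = playback_rate(speed=7.5, max_speed=10.0)
print(rate, semitone_shift(rate))
```

With these assumed values, a stationary ball halves the playback rate (an octave down) and half of full speed leaves the music at its recorded tempo and pitch.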

While simple and perhaps not highlights of the game, these solid examples of interactive music systems add to the overall game-play experience in more subtle ways and help make it incredibly enjoyable for the player.

Tuesday 12 October 2010

Non-Repetitive Design: Spore's Procedural Music System

Spore (released for PC and Mac in 2008) is a multi-genre, artificial life game by EA, in which the player creates an initially single-celled organism to be guided through 5 stages of evolution. While this genre may not attract all gamers, Spore's procedural music system is certainly something to be examined by anyone with even a remote interest in game audio.


"...the music a player hears will develop and mutate along with their style of play." - Chris Steffen of Rolling Stone regarding Spore. [1]


This is the fundamental concept behind the game's musical system. Co-developed by Brian Eno, the system takes variables from user input and uses mathematical algorithms to create control data which subtly changes certain aspects of game-play, thus enriching the player's experience. The following clip is a demonstration of this principle in action. (2:55 onwards)

Creature Stage:


It can be seen in the clip that as the player changes the appearance of his creature in the editor the music responds appropriately. The initial, conspicuous sound effect is used as a reward for the player's action. Meanwhile, the musical structure (instrumentation and themes) is inconspicuously manipulated based on the new data. A similar example can be seen at the start of Stage 5 when building your first spaceship:

Space Stage


As previously mentioned, Spore spans 5 stages of evolution, with each Stage incorporating entirely different game-play elements to the previous (hence the game's 'multi-genre' tag). The audio team (including Kent Jolly and Aaron McLeran) managed to effectively adapt different parameters in each Stage, giving each its own identity and character. One major selling point for Spore is its re-playability, and the following clips demonstrate just how unique an experience each iteration of play can manufacture.

Tribal Stage: Primitive Creatures Hunting

Tribal Stage: Scurrying, Vicious Creatures Fighting

The clips show how the music reacts to intensity states (heightened by danger, proximity thereto, etc.) and the creature make-up to provide a reactive soundtrack to the game.


So exactly how is the music generated? I was interested to find the answer to this question. The following sped-up clip of a player creating a hi-tech car in the Civilisation Stage emphasises certain points of the playback process.


Civilisation Stage - Fast-Forwarded Car Creation (4:00 in)


As the car is built, you can hear the drum loop go around in the background. The delayed synths which constantly play are played back in random order with certain notes assigned to different player actions. The other synths have a low percentage chance of playing at a certain bar (adhering to the previously mentioned player action variables) which is why sometimes we hear them but mostly do not. This basis is the core function for all of the creation opportunities within Spore. A similar system is used to create the music in the Cell Stage, but the exact variables affecting the musical direction are unclear.
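The 'low percentage chance per bar' behaviour described above can be illustrated with a short Python sketch. It is based only on what can be heard in the clip and is not Spore's actual implementation; the layer names and probabilities are invented for the example.

```python
import random


def build_bars(num_bars, layer_chances, rng):
    """Return, for each bar, the list of layers that sound in that bar."""
    bars = []
    for _ in range(num_bars):
        active = ["drum_loop"]  # the loop is assumed to play in every bar
        for layer, chance in layer_chances.items():
            # Each optional layer gets an independent roll at the start of the bar.
            if rng.random() < chance:
                active.append(layer)
        bars.append(active)
    return bars


# Hypothetical layers with low per-bar probabilities.
layer_chances = {"synth_pad": 0.15, "synth_lead": 0.10}
rng = random.Random(2008)  # seeded so a run can be repeated
bars = build_bars(16, layer_chances, rng)

for number, active in enumerate(bars, start=1):
    print(number, ", ".join(active))
```

In a fuller version the probabilities would be adjusted by the player action variables, so that the player's choices make some layers more or less likely rather than switching them on directly.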

Cell Stage:



Procedurally based music is regarded by many game audio enthusiasts as the future of sound and music in interactive environments. A variety of academic papers written on the subject can be found here:- http://www.procedural-audio.com/papers.htm

General articles can be found at the following address*:- http://blog.lostchocolatelab.com/2010/09/procedural-sound-now-links.html





References

Wednesday 6 October 2010

A Brief Analysis of the Use of Audio in PDC World Championship Darts 2009 for Nintendo Wii







Overview
PDC World Championship Darts 2009 (herein referred to as WCD 2009) is, as the name suggests, a darts game featuring the players and competitions of the Professional Darts Corporation. It was released on May 29th 2009 and is the second instalment of the series for Nintendo Wii. The game features commentary from Sky Sports commentator Sid Waddell and referee audio from Bruce Spendley and Russ "The Voice" Bray.

A Technical Glimpse
While the Wii boasts a more powerful CPU (729 MHz) and GPU (243 MHz) than the PlayStation 2 (CPU: 294 MHz, GPU: 147 MHz), its 'optical disc' medium is still the size of a DVD. This means that although the Wii will run more smoothly than the PS2 under demanding conditions, it has the same storage limitations of the 4.7 GB (single layer) or 8.54 GB (dual layer) disc. With this in mind, the difference should lie not in the amount of audio stored on the disc but in the complexity of the systems devised to play back the audio effectively.

The Breakdown
Once a game is loaded, there are four major components to WCD 2009's audio. The following is a short analysis of each.

1) Entrance Music

After a game is customized and loaded, the two players are introduced by an announcer as they make their entrance. Watching Player 1 walk toward the oche may be entertaining (particularly with the crowd chanting along to the PDC theme tune) but the second entrance reveals the first obvious flaw in the game's audio: every single player in the game has the same entrance music. While not exactly a technical fault, fans of the PDC will be somewhat taken out of the game by this as they are used to hearing each pro player's theme tune.

2) The Crowd

As expected, the crowd provides feedback based on how well the players are throwing. They cheer when a double is hit or when a player achieves a maximum score, and will gasp if a player fails to hit a treble. More often than not, however, the crowd bed is fairly constant. While there is crowd feedback, I would say there is not enough of it. When comparing a game of WCD 2009 to a real match one can see that there is a complete lack of atmosphere in the former.

The crowd is dropped out of the mix to accommodate a heartbeat sound effect when a player has hit two treble 20s with his first two darts or when going for a leg-winning double. I feel this was a mistake on the developers' part, as the crowd feedback at these pivotal moments could really add to the game-play. Trying to imagine no crowd sound when Phil Taylor is attempting a nine-dart finish in the following video is difficult.




The audiences at PDC events are well known for their interaction and outright madness, and I think Oxygen Games really missed a trick in not taking the time to further develop this aspect of the game.

3) The Commentary

This is arguably the most important (or at least most prominent) aspect of the game-play. Sid Waddell calls the action dart-for-dart, quite literally at times. It's easy to hear that Mr. Waddell has put his own personal spin on the script given to him, commentating in his unique style. The in-game commentary is occasionally sparse but when it does become busier it still sounds as if Sid Waddell is reacting to each individual dart and not the match as a whole.

A few vital aspects of a good commentary system are missing. For example, there doesn't appear to be any acknowledgement of how close the darts are; a miss is a miss in the game's eyes, no matter how close. There is no reaction to or comment on any of the statistics of the game, which, along with the repetitive nature of darts, fails to increase the player's interest. There are some points in the game where the commentary requires an interruption as the topic is made redundant by the game-play. For example, whilst I played against Phil Taylor, Sid Waddell commented 'The kind of checkout he may be expected to take, if he's to win this match'. By the time Sid had reached the word 'checkout', Phil Taylor had already missed his first treble, meaning he couldn't win on the current throw. The commentary system needs to acknowledge this and interrupt accordingly. Finally, there appears to be some trouble with the game's scorekeeping, as Sid Waddell often tells me I'm in a winning position when I am clearly not.
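One way the interruption could work is for each playing line to carry a condition that must remain true for the line to stay relevant, checked every time the game state changes. The following Python sketch shows the idea only; the ActiveLine class, the 'checkout_possible' flag and the update function are hypothetical and do not describe how WCD 2009 is built.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActiveLine:
    text: str
    still_valid: Callable[[dict], bool]  # returns False once the topic is redundant


def update_commentary(active: Optional[ActiveLine], state: dict) -> Optional[ActiveLine]:
    """Return the line to keep playing, or None if it should be cut off."""
    if active is not None and not active.still_valid(state):
        return None  # a real system would fade out or bridge with a short phrase
    return active


line = ActiveLine(
    "The kind of checkout he may be expected to take, if he's to win this match",
    still_valid=lambda state: state["checkout_possible"],
)

# The first dart misses the treble, so the checkout is off for this throw.
playing = update_commentary(line, {"checkout_possible": False})
print(playing)
```

In the Phil Taylor example above, the check after the first dart would return None, signalling that the checkout line should be cut short rather than allowed to finish.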

4) Number Calling

A self-explanatory aspect, but important nonetheless, as without the gravelly tones of Russ Bray shouting 'One hundred and eighty!' a vital part of the game's environment would be missing. One thing to note about the referee's number calling is that it is the only sound effect other than ongoing commentary that is present during the heartbeat effect. This was most likely deemed effective by the development team and it can work quite well.

Why do these problems exist?
It is difficult to explain why there is no variation in entrance music; I can certainly recall old wrestling games on the PlayStation (PSX, let alone PS2) having different entrance music for each wrestler. For a platform-appropriate comparison, this clip from WWE Smackdown vs. Raw 2009 Wii edition shows just how much variety in entrances a Wii game released six months prior to WCD 2009 could achieve in a game that arguably has a lot more going on.



Perhaps Oxygen Games could not acquire the licenses for the players’ theme songs; if that is the case, the game could have been improved by pieces composed specifically for it. Whatever the case, I find it hard to believe there was not enough disc space, especially when compared with the vast amount of audio files on other Wii games (unless Oxygen had a particularly poor audio codec).

The 'crowd' effect is evidently quite difficult to achieve. I would tackle this problem by incorporating two or three cross-fading crowd beds of a large crowd (perhaps even a genuine PDC audience) and recording one-off whoops and jeers to layer at opportune moments. I feel that the WCD 2009 crowd is just not large enough and this takes away from the game-play. Great audience beds can be achieved on Wii as demonstrated by the WWE, NBA 2K10 and FIFA09 clips.
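A common way to cross-fade two beds without an audible dip in loudness is an equal-power curve, where the two gains follow cosine and sine so that the sum of their squares stays at 1. The Python sketch below shows the gain calculation only. Driving the fade position from an 'excitement' value is my assumption about how it might be hooked up to the game.

```python
import math


def crossfade_gains(position):
    """Equal-power gains for two beds: 0.0 = bed A only, 1.0 = bed B only."""
    position = max(0.0, min(1.0, position))  # keep the fade position in range
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


# Hypothetical use: a calm bed (A) and a rowdy bed (B), faded by match excitement.
for excitement in (0.0, 0.5, 1.0):
    calm_gain, rowdy_gain = crossfade_gains(excitement)
    print(excitement, round(calm_gain, 3), round(rowdy_gain, 3))
```

With three beds, the same function could be applied between neighbouring pairs, and the one-off whoops and jeers layered on top as separate triggered samples.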

As you may have noticed while watching the real darts clip, while the in-game commentary is satisfactory as an audio system, it is not very true to real life. I feel a big difference made in commentary systems recently is the use of two commentators rather than one. For instance, in the following clip of 2K Sports’ NBA 2K10 (Wii edition, released November 2009), the two commentators can be heard to react to each other, as well as the game-play, giving the user a more authentic experience.




The commentary system is made using the concept of ‘Stitched Speech’. This is the idea that a commentated sentence can be made up of pre-recorded phrases played back in different orders, thus adding variation on a linear sentence. For instance, the name of a player can be recorded just once or twice, but used in a large number of different situations, depending on how their game is going. This used to be very important due to the limited storage capacity of CDs, but since games have begun to use DVDs and particularly Blu-ray discs, it has become common for games to have lengthy, non-stitched audio from commentators (as heard in the 2K10 clip).
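To make the stitching idea concrete, here is a minimal Python sketch in which a sentence is an ordered list of clip names and the player's name is a single recording slotted into different templates. The clip names, situations and templates are invented for illustration and are not taken from any of the games discussed.

```python
def stitch(player, situation, phrase_bank):
    """Return the ordered clip names that make up one commentary sentence."""
    template = phrase_bank[situation]
    # Replace the placeholder slot with the player's (single) name recording.
    return [f"name_{player}" if slot == "{player}" else slot for slot in template]


# Hypothetical phrase bank: each situation is a sequence of pre-recorded clips.
phrase_bank = {
    "leading": ["{player}", "is_in_control_here"],
    "trailing": ["its_slipping_away_from", "{player}"],
}

print(stitch("taylor", "leading", phrase_bank))
print(stitch("taylor", "trailing", phrase_bank))
```

The same name recording serves both sentences, which is where the storage saving comes from. The cost is that the joins between clips can sound unnatural if the intonation of the recordings does not match.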

This brings me to my next point: there is very little lengthy commentary present in WCD 2009. One would think that, at the very least, Phil Taylor would have some commentary cues about his historic darting victories, but the most I’ve ever heard (having played the game regularly for 2 months) is a short quip about how many times he’s played Van Barneveld. The commentary needs more of these segments to flow. Listen to how Andy Gray makes Manchester United specific comments in this clip of FIFA09 (Wii Edition, released 3rd October, 2008).




http://www.youtube.com/watch?v=T9cfgEjWS9Q&feature=related

An interesting part of Sid Waddell’s commentary in real life is the way his voice rises in pitch and volume as he gets excited. It would be an interesting idea in the coming instalments to experiment with how digital pitch and volume alteration could make his voice sound more excited during the bigger moments of the game.

I was quite interested to discover exactly how the number calling system would work. So, using Max/MSP I have created a system which will keep score of a darts match consisting of two players and announce the total amount a player hits on his/her throw, using the concept of stitched speech. It currently has no audio, but the fundamental theory is there and the array of buttons indicates that the audio would be played by the system once loaded.
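The core of such a number caller, turning a three-dart total into the sequence of recorded words to play, can be expressed in a few lines of Python. This is not a port of the Max/MSP patch; it is a sketch of the same idea, and the token names (including 'no_score' for a total of zero) are assumptions standing in for audio file names.

```python
ONES = [
    "", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def call_tokens(total):
    """Return the clip names a referee call would be stitched from."""
    if not 0 <= total <= 180:
        raise ValueError("a three-dart total must be between 0 and 180")
    if total == 0:
        return ["no_score"]
    tokens = []
    hundreds, rest = divmod(total, 100)
    if hundreds:
        tokens += [ONES[hundreds], "hundred"]
        if rest:
            tokens.append("and")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        tokens.append(TENS[tens])
        if ones:
            tokens.append(ONES[ones])
    elif rest:
        tokens.append(ONES[rest])
    return tokens


print(call_tokens(180))
print(call_tokens(26))
```

Built this way, every possible total from 0 to 180 is covered by around thirty short recordings rather than one recording per total, although a signature call such as 'One hundred and eighty!' would probably still deserve its own dedicated take.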

 
Summary
Having gone into administration in October 2009, Oxygen Games were probably struggling for resources when creating this game. They are still around today, however, under the new name of O-Games, and are due to release the latest PDC game for Wii, 'PDC World Championship Darts: Pro Tour', in November of this year. It will be interesting to see just how much the game's audio engine is improved under their new guise.