At the September 2016 Seattle VR hackathon, 6 hackers combined forces to create a VR musical sandbox with an 8-speaker, cube-shaped ambisonics sound system and a mocap suit. Using 2 HTC Vives, we created a "multiplayer" VR experience that put one user in the middle of a musical sandbox and surround sound system. The main player was able to trigger "sound toys" and move them through 360° and in elevation. The second player wore a full mocap suit and could virtually share space with player one.
We all had different roles, and each of us made major contributions.
Andrew Luck - Audio Engineer, Max Patcher / Team Organizer / Concept
Evie Powell - Lead Software Engineer and UX Designer / Unity Developer
Lou Ward - Chief Visual Effects Designer/Mocap Master
Wil Bown - Unity OSC Implementation
Gus McManus - Sound Designer
Vida Powell - Playtester, Production Assistant
Sound and Music
I was responsible for the 3D diffusion and worked on the sound effects and music with Gus McManus. Unity 3D was chosen for the visual component and communicated via OSC with Cycling '74's Max, which served as the audio engine for the experience. Most of my programming involved taking messages from Unity3D and turning them into spatial information for sound diffusion.
Pictured below is a mixer labeled with its respective emitters. The row of meters beneath it shows the diffusion of the signal to the loudspeakers. On the right is a top-down graph of each emitter's location in space. This assumes the listener is static, in the middle of the speaker cube. This is the front-end of the Max patch that communicates with Unity3D.
Communicating with Unity3D
The OSC protocol was used to send coordinates, triggers, and interaction values between Unity and MaxMSP via UDP. Wil Bown integrated OSC into Unity3D with the help of "OSCSharp".
Here’s an example OSC message array:
/bubble/pop/aed 1 2.293 0.235 3.532 1
/bubble/pop/ is the namespace of the OSC message, and the trailing aed symbol indicates that the coordinates are spherical. The first value, the 1, indicates which sound emitter this message is for. The three floats that follow are azimuth, elevation, and distance, respectively. Max receives these from Unity and uses them to encode the spatial information. The final value is the group. All of our sounds remained in group 1 for this project.
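To make the message layout above concrete, here is a minimal Python sketch (not our Unity or Max code) that parses the text form of the example message and packs it into the binary layout the OSC 1.0 spec describes. The names EmitterMessage, parse_emitter_message, and encode_osc are made up for illustration, and the int/float argument types are an assumption based on how the example is written.

```python
import struct
from dataclasses import dataclass


@dataclass
class EmitterMessage:
    """Hypothetical container for one spherical (aed) emitter update."""
    address: str
    emitter: int
    azimuth: float
    elevation: float
    distance: float
    group: int


def parse_emitter_message(line: str) -> EmitterMessage:
    """Split 'address emitter azimuth elevation distance group' into fields."""
    parts = line.split()
    if len(parts) != 6 or not parts[0].endswith("/aed"):
        raise ValueError(f"not an aed emitter message: {line!r}")
    return EmitterMessage(parts[0], int(parts[1]), float(parts[2]),
                          float(parts[3]), float(parts[4]), int(parts[5]))


def osc_string(text: str) -> bytes:
    """OSC strings are null-terminated and padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc(address: str, args: list) -> bytes:
    """Pack ints as big-endian int32 ('i') and floats as float32 ('f')."""
    tags, payload = ",", b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool arguments are not handled in this sketch")
        if isinstance(arg, int):
            tags, payload = tags + "i", payload + struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags, payload = tags + "f", payload + struct.pack(">f", arg)
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return osc_string(address) + osc_string(tags) + payload


msg = parse_emitter_message("/bubble/pop/aed 1 2.293 0.235 3.532 1")
packet = encode_osc(msg.address, [msg.emitter, msg.azimuth, msg.elevation,
                                  msg.distance, msg.group])
```

A packet like this could then be sent with a plain UDP socket (socket.SOCK_DGRAM and sendto) to whatever host and port the Max patch listens on.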
Diffusion of Sound
The ICST Ambisonics Tools package from the Zurich University of the Arts was chosen for spatial encoding/decoding. Audio was mostly playback of recorded WAV files or synths, fed through absorption, Doppler, and the ambisonics encoder. Next, the ambisonics signal is decoded and discretely distributed over the speaker configuration.
The loudspeaker system consisted of a variety of 5” studio monitors. We ran a microphone test and attenuated the monitors so that their volumes matched, then arranged the system in the most perfect cube shape we could achieve.
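For readers new to ambisonics, the encoding step can be sketched with the textbook first-order (B-format) equations. This is only an illustration, not what the ICST externals do internally: they have their own angle conventions and options, and the function name encode_first_order is made up. The sketch assumes angles in radians, azimuth measured counterclockwise from straight ahead, and the traditional 1/sqrt(2) weighting on W.

```python
import math


def encode_first_order(sample: float, azimuth: float, elevation: float):
    """Encode one mono sample into traditional first-order B-format (W, X, Y, Z).

    Assumptions for this sketch: angles are in radians, azimuth 0 is straight
    ahead and increases counterclockwise, elevation 0 is ear level.
    """
    w = sample / math.sqrt(2.0)                          # omnidirectional part
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front/back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left/right
    z = sample * math.sin(elevation)                      # up/down
    return w, x, y, z


# A source straight ahead lands entirely in W and X.
front = encode_first_order(1.0, 0.0, 0.0)
# A source directly overhead lands entirely in W and Z.
above = encode_first_order(1.0, 0.0, math.pi / 2)
```

Decoding then amounts to mixing these four channels into each loudspeaker feed according to that speaker's direction.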
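The level matching can be thought of as a small calculation. The sketch below is not our actual procedure; it assumes you have one measured level per monitor from the test microphone and that you only attenuate (never boost), so every monitor is trimmed down to the quietest one. The speaker names and dB readings are invented for the example.

```python
def trims_db(measured_db: dict) -> dict:
    """Return the attenuation (in dB, zero or negative) that brings each
    monitor down to the level of the quietest one."""
    target = min(measured_db.values())
    return {name: target - level for name, level in measured_db.items()}


def db_to_gain(db: float) -> float:
    """Convert a dB trim to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


# Hypothetical readings from the test microphone at the listening position.
readings = {"front_left_low": 80.0, "front_right_low": 83.0, "rear_left_high": 81.5}
trims = trims_db(readings)
gains = {name: db_to_gain(trim) for name, trim in trims.items()}
```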
Gus McManus and I wrote the theme in our late night delirium at the hackathon.
Environment & 3D Assets
Written by Lou Ward
Lou Ward and Alexander Moed worked on creating 3D assets and the atmosphere, and on making sure the mocap was up and running correctly. Lou was primarily involved in creating the aesthetic, look, and feel of the environment. Alexander created 3D assets and helped with any mocap needs. After deciding to build a Dalí & Kandinsky styled atmosphere, Alex and Lou made a low-poly environment to allow the human form to be the focal point. Alex and Lou experimented with physics to create more organic shapes and animations. Real-time physics could have been problematic and GPU-intensive. Alexander experimented with blendshapes in Maya to create more organic-looking instruments/objects. We also created an array of objects for future requests, e.g. avatars for a Samsung Gear VR spectator mode.
The mocap system was the Perception Neuron Motion Capture Suit. This suit has IMU sensors and is very sensitive to magnetic fields (3 ft plus). “The IMU is a single unit in the electronics module which collects angular velocity and linear acceleration data which is sent to the main processor.” The system has 32 sensors and differentiates itself by having very accurate finger tracking.
When a dancer got into the suit, we calibrated using the required poses, putting the character's body in steady, A, T, and S shapes. We then verified that the Axis software was broadcasting over TCP using BVH, which is one of the standard outputs for mocap. Evie Powell and Wil Bown did the networking, duplicating the mesh so that it showed in the new scene. For the design of the character, we originally had a mesh with a bunch of squares. Evie Powell created a shader that used the mesh and put squares on top of the vertices. This allowed the user to pass through them without tons of clipping errors, and it made the human form more abstract.
About the Shader
The shader was a basic geometry shader that procedurally encases each original vertex in a cube. Geometry shading happens between the vertex shader pass and the fragment shader pass, where additional triangles are added on the GPU. This allows for much faster render passes using much simpler geometry and smaller model file sizes. The shader opens up a lot of opportunity for incorporating musical feedback into our visual style. We can use the cube shader to tie geometry size, shape, and color to variables like volume, BPM, and FFT in a musical experience.
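The idea of encasing each vertex in a cube can be illustrated on the CPU. The Python sketch below is not the shader itself (that runs on the GPU inside Unity); it only shows the geometry a shader like this would emit for each input vertex: 8 corners and 12 triangles. The function name expand_to_cubes and the half_size parameter are made up for illustration.

```python
from itertools import product

# The 8 corner offsets of a unit cube centered on the origin.
CORNERS = list(product((-1.0, 1.0), repeat=3))

# Each face as 4 corner indices, ordered counterclockwise seen from outside.
FACES = [
    (0, 1, 3, 2), (4, 6, 7, 5),   # -x, +x
    (0, 4, 5, 1), (2, 3, 7, 6),   # -y, +y
    (0, 2, 6, 4), (1, 5, 7, 3),   # -z, +z
]


def expand_to_cubes(vertices, half_size):
    """Replace every input vertex with a small cube around it.

    Returns (cube_vertices, triangles), where triangles index into
    cube_vertices.
    """
    cube_vertices, triangles = [], []
    for vx, vy, vz in vertices:
        base = len(cube_vertices)
        for cx, cy, cz in CORNERS:
            cube_vertices.append((vx + cx * half_size,
                                  vy + cy * half_size,
                                  vz + cz * half_size))
        for a, b, c, d in FACES:
            # Split each quad face into two triangles.
            triangles.append((base + a, base + b, base + c))
            triangles.append((base + a, base + c, base + d))
    return cube_vertices, triangles


# Two mesh vertices become two cubes.
cube_vertices, triangles = expand_to_cubes([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)], 0.05)
```

Driving half_size from something like the current volume is one way the cube size could be tied to the music.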
Written by Evie Powell
The user interactions with visual effects were designed by Evie Powell. The primary interaction for the “main player” is to pick up and move musical tracks or musical effects that take the form of several abstract objects in the scene. The music responds to the positional placement of these objects in real time with 1:1 spatial accuracy. Because the player hears all of this over the Ambisonics loudspeaker system, spectators get to experience the real-time mixing live as well.
The dancer can also influence sound and visuals by touching things in the level. In this build, the dancer cannot directly place music. Together this makes for a novel collaborative musical / artistic experience that is unlike anything you’ve ever experienced before, either as a VR user or as a spectator.
NewtonVR was the basis for the interactions, so that all of them were VR-friendly and physics-based. Picking up objects, placing them, and throwing them feel very natural in pop rocks. The player had abstract, gender-neutral hands that were responsive and easily communicated their function without pesky tutorial sessions or UI text.
When a player squeezes a trigger, the hands respond by making a small “halfway gripping” gesture, which communicates that picking up an item is possible. The hands will fully grip and latch on to an interactive track when a player presses the trigger on an interactable object.
A player may also trigger a point interaction by naturally reaching out significantly from their body. If the player's hand is significantly far from their body center while pressing the trigger, the hand switches to a pointing state, which allows different types of interactions with the environment, like shooting sound particles or interacting with an object that is not within one's reach. These controls were designed to feel natural and to inspire player experimentation.
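The hand behavior described above boils down to a small state decision per frame. Here is a rough Python sketch of that logic, not the actual Unity/NewtonVR code. The 0.5 m reach threshold, the names HandState and hand_state, and the choice to check gripping before pointing are all assumptions made for the example.

```python
import math
from enum import Enum


class HandState(Enum):
    OPEN = "open"
    HALF_GRIP = "half_grip"   # trigger squeezed, nothing to grab
    GRIP = "grip"             # trigger squeezed on an interactable object
    POINT = "point"           # trigger squeezed with the arm reached out


# Hypothetical reach threshold in meters; the real value was tuned by feel.
POINT_DISTANCE = 0.5


def hand_state(trigger_pressed, over_interactable, hand_pos, body_center):
    """Pick the hand pose for this frame from the trigger and hand position."""
    if not trigger_pressed:
        return HandState.OPEN
    if over_interactable:
        return HandState.GRIP
    if math.dist(hand_pos, body_center) > POINT_DISTANCE:
        return HandState.POINT
    return HandState.HALF_GRIP


body = (0.0, 1.2, 0.0)
near_hand = (0.2, 1.2, 0.1)
far_hand = (0.0, 1.2, 0.8)
```

For example, hand_state(True, False, far_hand, body) gives the pointing state, while the same call with near_hand gives the halfway grip.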
Given more time, here's a short list of improvements and expansions:
1. Less "backtrack" and more improvisation.
2. The experience itself could have a "story"
3. The triggers and overall Max project can be cleaned up a bit
4. More synthesis and sound sculpting, visual feedback
5. More speakers, bigger space
6. Integrate elevation into the ambisonics monitor
Without the CNMAT Max/MSP externals and the ICST Ambisonics package, this would not have been possible.