Case Study: Sound For A Netflix Comedy Special
David A. Arnold’s ‘Fat Ballerina’
Overseeing the production sound & audio post for David A. Arnold’s Netflix Comedy Special, ‘Fat Ballerina’, posed a number of niche challenges that aren’t seen on most productions. If you haven’t seen ‘Fat Ballerina’, it was released on Netflix in spring of 2020.
Recording stand up comedy isn’t just about capturing the comedian’s performance; it’s also about capturing the room full of people they are interacting with.
Check out this teaser for the special:
As a producer, you are likely less interested in the technical challenges (other than them being met), and more interested in what a project like this costs and how to appropriately budget for it. Feel free to skip to the cost breakdown.
Sound For Comedy: Technical Challenges & Solutions
The venue for a comedy special has a big impact on the final product not just visually, but for how the audio turns out as well. Venue size & shape, room acoustics, and the placement of microphones and the PA system in the venue all drastically impact how audio will sound in the final product. The venue for ‘Fat Ballerina’ was the Cleveland Improv, which has a somewhat unique layout. It isn’t a very deep venue but instead very wide with the stage surrounded by audience seating on three of the four sides of the stage.
Fortunately, the venue was very acoustically dry, which is a great place to start. However, the venue PA system used multiple speakers to create a very wide dispersion pattern. The venue space, the audience, the comedian’s dialog, and the venue’s PA are all incredibly intertwined. The PA system excites the room the audience is in, which complicates capturing clean audience reactions while also getting clean dialog from the comedian without too much room sound.
Recording The Comedian
Since recording the comedian isn’t just about capturing their performance, but also about isolating them as much as possible from the environment they are in (that includes the audience and a PA system), a wireless handheld dynamic mic is a great option. Dynamic mics by nature have heavier, much less sensitive capsules, so unless a sound source is especially loud or quite close to the capsule, they do an excellent job of rejecting the world around them. That rejection doesn’t only increase the signal to noise ratio in the comedian’s dialog; it also helps prevent accidental feedback from the PA system the mic is being pushed through. While dynamic mics generally aren’t quite as transparent or natural sounding, and tend to show more pronounced artifacts from proximity effect, ultimately their advantages outweigh their cons. These shortcomings can be improved relatively easily in post-production with some light corrective EQ & multiband dynamics processors.
Recording The Audience
Audience reactions are an integral part of the stand up comedy special experience, but also the most challenging part of the special to record. The audience is a large area of people that needs coverage, and on top of that the PA system is filling the venue, making it harder to record clean audience sound that doesn’t destroy the quality of the comedian’s dialog when the two are mixed together.
Since the audience was spread out over a wide area, I opted for a ‘zoned’ approach with multiple stereo pairs capturing different parts of the audience. The first stereo pair was placed close to the PA speakers, positioned so the speakers fell in the null of the bidirectional pickup pattern to give as much rejection of that adjacent undesired sound source as possible. Positioning and polar pattern choice can dramatically reduce the amount of bleed, and because of the mics’ proximity to the speakers, the bleed that does remain is very dry, which means more of that source can be used in the final mix. The second stereo pair was a bit further into the audience in front of the stage and used a hypercardioid pickup pattern, with the speakers sitting directly in the mics’ null point. Additionally, after speaking with the house sound guy, we opted to lower the PA volume and shut off the PA speakers that were filling the balcony, which was where video village and staging were.
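To make the idea of a mic’s null point concrete, here is a minimal sketch using the textbook first-order polar pattern model, where sensitivity is a + (1 - a)·cos(θ). The function names are made up for illustration, and the coefficients (a = 0 for bidirectional, a = 0.25 for hypercardioid) are idealized values, not measurements of the mics used on this shoot.

```python
import math

def polar_response(a: float, theta_deg: float) -> float:
    """Idealized first-order mic sensitivity at theta degrees off-axis.

    a = 0.0 is bidirectional (figure-8), a = 0.25 is hypercardioid,
    a = 0.5 is cardioid. Real mics deviate from this, especially
    at high frequencies.
    """
    return a + (1.0 - a) * math.cos(math.radians(theta_deg))

def null_angle_deg(a: float) -> float:
    """Off-axis angle where the idealized pattern rejects sound completely."""
    if not 0.0 <= a <= 0.5:
        raise ValueError("this pattern has no true null")
    return math.degrees(math.acos(-a / (1.0 - a)))

for name, a in (("bidirectional", 0.0), ("hypercardioid", 0.25)):
    print(f"{name}: null at about {null_angle_deg(a):.1f} degrees off-axis")
```

In this model the bidirectional null sits at the sides of the mic and the hypercardioid null sits toward the rear, which is why the two pairs would be aimed differently relative to the PA speakers.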
Sync for 5 Cameras
Contrary to popular belief, sync and timecode are not the same thing. This misconception can lead to sync drift issues in productions that require long continuous takes, such as comedy specials that are recording a 1.5-2 hour performance. Here is an article if you’d like to learn more about the differences between sync & timecode. Long story short, timecode is metadata, but once you hit record, each camera relies on its own clock to control the frame rate of the sensor capture. Minute differences between the timing of the timecode & the camera’s clock can add up over the course of a long take. The solution here is to use a master clock that provides a central source for timecode & genlock, which replaces the camera’s internal clock source.
On ‘Fat Ballerina’ there were a total of 5 cameras: 4 Sony FS-7s & 1 Sony F55. Since all of these cameras accept timecode & genlock inputs, each of them had a Timecode Systems UltraSync One on it, RF synced to the master clock in my audio bag, ensuring subframe accurate sync all day long.
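As a rough illustration of how small clock differences add up, the sketch below estimates drift for a free-running camera clock over a long take. The ppm (parts per million) offsets and the 24 fps frame rate are hypothetical values chosen for the example, not measured specs of the cameras on this shoot, and the function names are made up.

```python
def drift_seconds(duration_s: float, ppm_offset: float) -> float:
    """Time error accumulated by a clock running ppm_offset parts per million fast or slow."""
    return duration_s * ppm_offset / 1_000_000

def drift_frames(duration_s: float, ppm_offset: float, fps: float) -> float:
    """The same error expressed in video frames at the given frame rate."""
    return drift_seconds(duration_s, ppm_offset) * fps

TWO_HOURS_S = 2 * 60 * 60
FPS = 24.0  # assumed frame rate for the example

for ppm in (1, 10, 30):  # hypothetical clock offsets
    frames = drift_frames(TWO_HOURS_S, ppm, FPS)
    print(f"{ppm:>2} ppm over 2 hours: {frames:.2f} frames of drift")
```

Even an offset of a few parts per million reaches a visible fraction of a frame over a two hour take, which is the problem genlock avoids by having every camera follow the same clock.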
Mix & Timecode For Reference Edit
As part of the workflow to speed up the post-production process, a live switched edit was fed to a Convergent Design Odyssey7Q+ recorder. That way the client could receive a copy of the two performances with burnt-in timecode and write a paper edit of which performances to use in the actual editing process. To make this workflow work, though, the recorder needed timecode matching the camera video files & audio files, as well as a reference mix providing audio to the recorded output. Since video village, DIT, VTR & the sound cart were set up right next to one another, it was as easy as a hardwired timecode feed to the Odyssey recorder & a hardwired mix output from my recorder to the Blackmagic ATEM switcher, which embedded the audio into its SDI output that the Odyssey was recording.
Large Dynamic Changes
The comedian’s voice & the audience are incredibly dynamic sources with huge differences in their volume range. The comedian could go from a whisper to a scream; likewise, the audience can range from a few people chuckling to full applause & cheering, all of which needs to come in clearly. To ensure everything came in consistently, input gains were ridden throughout the show. This keeps the especially loud peaks from distorting, and the lulls from being buried in the digital noise floor.
In audio post-production, keeping David’s performance & the audience at a consistent level across the entire comedy special was important. Each clip was leveled using clip gain adjustments. Doing this before the signal feeds a compressor keeps the average peak level within a consistent range prior to hitting a dynamics processor. The big advantage of leveling volume before a compressor is that the compressor never has to work very hard to reduce the dynamic range. This yields much more subtle results than pushing harder into a compressor to control a larger starting dynamic range. Audience sounds were also subtly compressed, and from there the ratio of David’s voice & the audience could be automated to control the mix balance throughout the show. These constant ratio changes brought in another set of challenges though: dealing with room resonances.
Every room has resonances, and even though this venue was fairly dry, it was no exception. Furthermore, as David’s voice was also fed through the venue’s PA system, his voice excited resonances in the room that were then captured by the crowd mics. These resonances are far from flattering, and while they cannot be removed, they can definitely be downplayed. As the levels & ratio between his dialog & the audience reactions were constantly changing, so would the tonality of his voice throughout the special. That constantly changing tonality can be very distracting to the viewer, going from fairly dry to very roomy sounding in an instant.
To minimize this problem & make sure his dialog had a consistent and acceptable level of room resonance throughout the entire special, multiband dynamics came to the rescue. While EQs can help control resonances to a degree, they’re ultimately not the best tool in this case. Instead, by using two different techniques with multiband processors, the end product had a consistency that was pretty amazing considering the constantly open room mics that were changing in ratio to his dialog throughout the entire show. Since the room resonances are mostly only an issue when excited by the venue PA, you really just want to pull those resonances out while leaving the rest of the audience sound as untouched as possible. Doing this required a multiband compressor on the audience mics focused on the frequency band where those room resonances sit, but sidechained to the comedian’s handheld mic. This functionally attenuates problematic frequencies only when triggered by David’s voice. Second, using multiband expansion & compression, I can further control room resonances by attenuating that band further when it falls below a certain threshold and, as the audience gets louder, keeping that band from exceeding another threshold. While this technique won’t remove the reverb and resonant frequencies of the room, it does ensure that they are at a consistent level throughout the length of the performance.
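To show why leveling first lets the compressor work less, here is a minimal sketch of static compressor math with and without clip gain applied beforehand. The clip peak levels, target, threshold, and ratio are all hypothetical numbers picked for the example, not settings from the actual mix.

```python
def clip_gain_db(clip_peak_db: float, target_peak_db: float) -> float:
    """Clip gain needed to move a clip's peak to the target level."""
    return target_peak_db - clip_peak_db

def gain_reduction_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static gain reduction of a hard-knee compressor for a given input level."""
    over = max(0.0, level_db - threshold_db)
    return over * (1.0 - 1.0 / ratio)

clip_peaks_db = [-20.0, -6.0, -12.0]  # hypothetical whisper, scream, normal delivery
TARGET_DB = -12.0                     # hypothetical leveling target
THRESHOLD_DB, RATIO = -14.0, 3.0      # hypothetical compressor settings

# Gain reduction if the raw clips hit the compressor directly.
unleveled = [gain_reduction_db(p, THRESHOLD_DB, RATIO) for p in clip_peaks_db]

# Gain reduction after each clip has been moved to the target with clip gain.
leveled = [
    gain_reduction_db(p + clip_gain_db(p, TARGET_DB), THRESHOLD_DB, RATIO)
    for p in clip_peaks_db
]

print("gain reduction without leveling:", [round(g, 2) for g in unleveled])
print("gain reduction with leveling:   ", [round(g, 2) for g in leveled])
```

With these numbers the unleveled clips swing between no compression and several dB of reduction, while the leveled clips all receive the same small amount, which is the more subtle result described above.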
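The two multiband techniques above can be sketched as static gain calculations for the resonance band of the audience mics. This is only an illustration of the logic: the function names, thresholds, and ratios are hypothetical, and a real multiband processor also involves band-splitting filters plus attack and release smoothing, which are left out here.

```python
def sidechain_cut_db(voice_level_db: float, threshold_db: float, ratio: float) -> float:
    """Gain (0 dB or lower) for the audience mics' resonance band,
    keyed from the level of the comedian's handheld mic."""
    over = max(0.0, voice_level_db - threshold_db)
    return -over * (1.0 - 1.0 / ratio)

def band_leveler_gain_db(
    band_level_db: float,
    floor_db: float,
    ceiling_db: float,
    expand_ratio: float,
    compress_ratio: float,
) -> float:
    """Gain for the resonance band based on its own level:
    downward expansion below floor_db, compression above ceiling_db."""
    if band_level_db < floor_db:
        # Each dB below the floor is pushed expand_ratio times as far down.
        return (band_level_db - floor_db) * (expand_ratio - 1.0)
    if band_level_db > ceiling_db:
        # Each dB above the ceiling is reduced according to the ratio.
        return -(band_level_db - ceiling_db) * (1.0 - 1.0 / compress_ratio)
    return 0.0

# Hypothetical settings: the voice at -10 dB keys a cut, silence does not.
print(sidechain_cut_db(-10.0, threshold_db=-30.0, ratio=4.0))
print(sidechain_cut_db(-40.0, threshold_db=-30.0, ratio=4.0))
# Hypothetical band levels: quiet, in range, and loud.
for level in (-50.0, -30.0, -14.0):
    print(level, band_leveler_gain_db(level, -40.0, -20.0, 2.0, 3.0))
```

The first function only acts when the handheld mic is active, so audience reactions between lines pass through untouched. The second keeps the band inside a window regardless of what triggered it.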
Check out my video on how I approach recording comedy.
Probably the biggest question on most producers’ minds is “What does this all cost?” Now that we know the details of what this project involved, we can start to break down what it costs to accomplish a project like this. There are a few different pricing factors that can be simplified into three categories: Labor, Equipment, & Post-Production Mixing.
Labor – $1755
One complicating factor in this project was not having access to the venue for a tech prep day until the day of the shoot, which meant a lot of work needed to be done in a short time. Making this happen on time required an early load in coupled with a late wrap, since we were shooting two shows in one night. In production, labor for sound is generally billed at $650 for a 10 hour day, but going beyond 10 hours pushes into overtime at a 1.5x hourly rate until 12 hours, and a 2x hourly rate beyond 12 hours. Looking at a 19 hour long day, from 7am to 2am, that works out to 9 hours of overtime, totalling $1105 for overtime beyond the day rate of $650.
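The overtime math can be sketched as follows. It assumes the base hourly rate is the day rate divided by 10 hours ($65/hour), which is not stated outright above but does reproduce the $1105 overtime figure. The function name is made up for the example.

```python
def labor_cost(hours_worked: float, day_rate: float = 650.0, base_hours: float = 10.0):
    """Return (day_rate, overtime, total) for one shoot day.

    Overtime is 1.5x the hourly rate from hour 10 to hour 12,
    and 2x the hourly rate beyond hour 12.
    """
    hourly = day_rate / base_hours                          # assumed: $650 / 10 = $65
    ot_time_and_half = min(max(hours_worked - base_hours, 0.0), 2.0)
    ot_double = max(hours_worked - 12.0, 0.0)
    overtime = ot_time_and_half * hourly * 1.5 + ot_double * hourly * 2.0
    return day_rate, overtime, day_rate + overtime

day, overtime, total = labor_cost(19)   # 7am to 2am
print(f"day rate ${day:.0f} + overtime ${overtime:.0f} = ${total:.0f}")
```

For the 19 hour day this gives 2 hours at $97.50 and 7 hours at $130, for $1105 in overtime and $1755 in total.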
Equipment – $925
Since this is a fairly unique production as far as equipment goes, the simplest way to price it is to break it down by what equipment was used for this shoot. While there is a lot more gear that goes into any equipment package than is listed, it’s generally accepted that you rent the core pieces of equipment, and they come with the cabling, stands, power solutions, and any other accessories needed to make them functional. Here is a breakdown of the equipment package used:
1 Field Mixer with Integrated Multitrack Timecode Recorder @ $150
2 Stereo Microphone Kits @ $100/each
2 Wireless Handheld Dynamic Microphones @ $75/each
1 Wireless Lavalier Kit @ $75
5 RF Timecode & Genlock Sync Boxes @ $50/each
2 Wireless IFB Headsets @ $50/each
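As a quick check, the line items above can be summed like this. The quantities and rates come straight from the list, while the variable names are just for the example.

```python
# (quantity, item, rate in dollars) taken from the equipment list above
equipment = [
    (1, "Field Mixer with Integrated Multitrack Timecode Recorder", 150),
    (2, "Stereo Microphone Kit", 100),
    (2, "Wireless Handheld Dynamic Microphone", 75),
    (1, "Wireless Lavalier Kit", 75),
    (5, "RF Timecode & Genlock Sync Box", 50),
    (2, "Wireless IFB Headset", 50),
]

equipment_total = sum(qty * rate for qty, _, rate in equipment)

for qty, item, rate in equipment:
    print(f"{qty} x {item} @ ${rate} = ${qty * rate}")
print(f"Equipment total: ${equipment_total}")
```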
Audio Post-Production – $4750
Between the dialog edit, audience edit, music edit, clean up, sound design, mixing (and revisions), and printing the final mixes for delivery, it ended up totalling 38 hours of work. Between labor and the cost of the studio rental, that totals $4750 for the audio post-production on ‘Fat Ballerina’. For post, labor of the mixer comes at $75/hour with the rental of an acoustically accurate mixing stage costing $50/hour.
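The post-production figure follows directly from the hours and the two hourly rates, as this short sketch shows. The variable names are made up for the example.

```python
POST_HOURS = 38        # total hours of audio post work
MIXER_RATE = 75        # dollars per hour for the mixer's labor
STAGE_RATE = 50        # dollars per hour for the mixing stage rental

post_labor = POST_HOURS * MIXER_RATE
post_stage = POST_HOURS * STAGE_RATE
post_total = post_labor + post_stage

print(f"labor ${post_labor} + stage ${post_stage} = ${post_total}")
```

That splits the $4750 into $2850 of labor and $1900 of studio rental.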
Total Cost – $7430