Figure 1: Example of a boosted video chat setup with a blurry bokeh background, RGB lighting, a well lit subject, and no visible AV equipment.
This guide was originally posted on Notion (here) on June 26, 2020.
If you’re interested in learning more, the On Deck Podcaster fellowship starts soon. Check out their gear guide here.
Overview
In this document, I’ll go over the elements required to make being on a video chat with you an experience. As some may have noticed, it can make you stand out (I’ve seen people get distracted by the video & stare way too long…), it can speak for you (by reflecting your personality & interests), and it can leave quite an impression. You may be surprised to know that even a few relatively inexpensive, non-technical tweaks can greatly enhance your own experience. However, all of the top of the line upgrades will unfortunately require 💸 money.
Now, there are three primary issues to contend with in order to create a high quality video experience. I will go through each aspect in detail and how to address them all as well as potential upgrades for multiple budgets.
The 3 issues, sorted from the easiest (and cheapest) to fix to the hardest (and most expensive) to fix, are:
- ⏳ Latency
- 🎙️ Audio Quality
- 🎬 Image Quality
Then, at the end, I’ll share lists of different configurations, again, at different budgets. And, we’ll open up to Q&A and consultation. Feel free to interrupt at any time; all of this information will be available online after the session.
Latency
⏲️ Baseline
Regardless of your internet connection bandwidth, Zoom adds a baseline latency of about 500ms. This means that after you talk, it takes 500ms PLUS travel time to the other attendees. For a call from San Francisco to NYC, this travel time is about 80ms.
This means… with no other latencies involved, after you start talking, the other attendees won’t hear you for another half second and change. Let’s calculate what happens when you add more latency.
📡 Wifi
You may be surprised to learn that wireless internet is a terrible choice for video calls. Wired ethernet adds only .3ms delay even over long cable runs. That delay is imperceptible.
At best, a typical wifi connection will add an extra 3ms hop for every router or repeater you have. At worst, it will cause your visual frames to freeze and audio to be interrupted every few hundred milliseconds causing an overall delay of 500ms. This delay is incredibly perceptible.
500ms + 80ms + 500ms = 1080ms = 1.08 seconds
Using wifi for video chat is like trying to talk to your date at a restaurant where everyone around you is yelling at the top of their lungs. For you to be heard, you have to keep yelling at the top of your lungs repeatedly until your date receives your message. Try to have a conversation like that and see how you like it.
So, switch to a wired network connection (.3ms latency).
📶 Bluetooth
Similarly, Bluetooth is a terrible choice for video calls. One of the more popular options is the Apple AirPods Pro. This has an at-best latency of 144 ms. Assuming you’re both on wifi and both use AirPods Pro:
144ms + 500ms + 80ms + 500ms + 144ms = 1368ms ~ 1.4seconds
That is just shy of 1.4 seconds each way.
So, switch to a wired audio connection (<.1ms latency).
🥇 Best case scenario
Switching to all wired connections gives you a latency of:
500ms + 80ms + .3ms + .1ms = 580.4ms
This can mean the difference between having a relatively comfortable, but not ideal, conversation (just under a .6 second delay) and having one where you are constantly interrupting each other (1.4 second delay).
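To make the arithmetic in this section easy to rerun with your own numbers, here is a minimal sketch of the one-way latency budget. The figures are the ones quoted above; the constant and function names are my own, and treating the delays as simply additive is a simplifying assumption.

```python
# One-way latency budget using the figures quoted in this guide (milliseconds).
ZOOM_BASELINE_MS = 500.0  # baseline latency added by Zoom
SF_TO_NYC_MS = 80.0       # travel time, San Francisco to NYC
WIFI_WORST_MS = 500.0     # worst-case delay from a struggling wifi link
AIRPODS_PRO_MS = 144.0    # at-best Bluetooth latency per AirPods Pro user
ETHERNET_MS = 0.3         # wired network connection
WIRED_AUDIO_MS = 0.1      # wired audio connection


def one_way_latency_ms(*components_ms: float) -> float:
    """Add up every delay between the speaker's mouth and the listener's ear."""
    return sum(components_ms)


scenarios = {
    "wifi": one_way_latency_ms(ZOOM_BASELINE_MS, SF_TO_NYC_MS, WIFI_WORST_MS),
    "wifi + Bluetooth on both ends": one_way_latency_ms(
        AIRPODS_PRO_MS, ZOOM_BASELINE_MS, SF_TO_NYC_MS, WIFI_WORST_MS, AIRPODS_PRO_MS
    ),
    "all wired": one_way_latency_ms(
        ZOOM_BASELINE_MS, SF_TO_NYC_MS, ETHERNET_MS, WIRED_AUDIO_MS
    ),
}

for name, total_ms in scenarios.items():
    print(f"{name}: {total_ms:.1f} ms")
```

Swap in your own travel time or headset latency to see how far you are from the all-wired best case.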
🎙️ Audio Quality
🎧 Audio Processing
If you weren’t already convinced about the value of switching to wired audio, ponder this: if you and your attendees sit in noisy rooms, using speakers and built-in microphones, you are all going to send a noisy signal. You’ll be sending more than JUST the sound of your voice over the internet. Your computer will do the best it can to filter out frequencies outside the normal vocal range, but there are many noises that overlap with that range. Then, as the signal reaches another speaker’s computer, that computer will also try its best to filter out noise. But it won’t be perfect, and as they speak, they will also be including your noise with their noise (which may also include your voice too). And as the number of guests grows, this problem becomes untenable.
So, use headphones.
I recommend in-ear monitors (IEMs) which provide studio grade audio with sound isolation (passive noise reduction) and if you buy clear ones, they give a very subtle, nondescript appearance.
Shot illustrating the subtle nature of in-ear monitor headphones.
If you can, it would be advisable to use a dedicated microphone connected to an audio interface that supports studio-grade audio processing. This includes using compressors (which ensure your voice is never too loud), noise gates (which keep a lot of noise from ever reaching Zoom or another tool), and boosts (which amplify certain frequencies to increase audible clarity).
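If you are curious what a noise gate and a compressor actually do to the signal, here is a deliberately simplified sketch. It works sample by sample on levels between -1.0 and 1.0; real audio interfaces track a smoothed envelope with attack and release times, so treat the function names, thresholds, and ratio below as illustrative assumptions rather than how any particular device works.

```python
def noise_gate(samples, threshold=0.05):
    """Silence any sample whose level is below the threshold (quiet room noise)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def compress(samples, threshold=0.5, ratio=4.0):
    """Shrink the part of each sample's level that exceeds the threshold by `ratio`."""
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # Only the overshoot above the threshold is reduced.
            level = threshold + (level - threshold) / ratio
        out.append(level if s >= 0 else -level)
    return out


# Hypothetical signal: faint hiss, normal speech, then a loud peak.
signal = [0.01, -0.02, 0.3, -0.3, 1.0, -1.0]
processed = compress(noise_gate(signal))
print(processed)
```

The gate runs first so that the hiss is removed before the compressor evens out the loud peaks.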
🎶 Acoustic Treatment
You can eliminate a lot of noise before it even hits your microphone by acoustically treating your room. This can mean acoustic foam on walls, wood/fiberglass sound diffusers, or acoustic blankets. These are meant to prevent noise from bouncing around in your room, prevent your voice from echoing and reverberating through the room, and to muffle stray sounds.
🎙️ Cardioid Microphones
If you are in a room that may have loud noise intermittently, use a dedicated microphone with a super-cardioid pickup pattern. This just means that the microphone picks up audio from only a narrow range of angles and therefore rejects most sound that comes in at the wrong angle. This will dramatically reduce the amount of noise introduced before it even reaches the audio interface.
If you are in a soundproofed studio, you are free to use a cardioid pickup microphone which ignores sound from the rear but picks up everything with high sensitivity in the front.
I don’t recommend a hyper-cardioid shotgun microphone or an omnidirectional microphone for indoor use, as either is very likely to pick up reflections in even a perfectly acoustically treated room.
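To get a feel for how these pickup patterns differ, here is a small sketch using the textbook idealized first-order model, where sensitivity at an angle is |a + (1 - a)·cos(angle)|. The values of `a` are the commonly cited idealizations, not measurements of any real microphone, and real shotgun microphones are more directional than this model suggests.

```python
import math

# Idealized first-order polar patterns: sensitivity = |a + (1 - a) * cos(angle)|.
# The `a` values are textbook approximations, not specs for a specific microphone.
PATTERNS = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "supercardioid": 0.37,
    "hypercardioid": 0.25,
}


def relative_sensitivity(pattern: str, angle_deg: float) -> float:
    """Sensitivity relative to on-axis (0 degrees = directly in front of the mic)."""
    a = PATTERNS[pattern]
    return abs(a + (1.0 - a) * math.cos(math.radians(angle_deg)))


for pattern in PATTERNS:
    front, side, rear = (relative_sensitivity(pattern, angle) for angle in (0, 90, 180))
    print(f"{pattern}: front={front:.2f} side={side:.2f} rear={rear:.2f}")
```

In this model the super-cardioid rejects more sound from the sides than the cardioid does, at the cost of a small lobe of sensitivity directly behind the microphone.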
🎬 Image Quality
A high image quality experience consists of three things:
- composition & style
- sufficient network bandwidth for your desired resolution
- proper exposure & lighting
⛰️ Composition & Style
This is where you can take things up a notch. Well-framed composition and visual styling can be a game changer.
🖼️ Composition & Framing
Ideally, you’ll want to position yourself in the center of the frame. Your face should be close enough to clearly be the focus or most important part of the frame yet not be so close that your guests can see your nose hairs. And not so far away that the guests can’t make out your facial expressions & body language.
Additionally, if you place yourself off to one side, you’ll create a large area of negative space which can draw viewers’ eyes away from the intended focal target: you. I won’t suggest that you should never do this, but you should only do it if your intent IS to draw your viewers’ eyes to something in that negative space like a brand logo, informative picture, or comic element.
To create the best possible effect, I suggest investing in a teleprompter for your camera. With my setup, I have my camera behind the teleprompter and an iPad mirroring my computer’s screen. The iPad’s screen is therefore reflected through the half-reflective teleprompter into my eyes. As I am looking directly into the camera lens, I am seeing the other person’s face giving the impression of eye contact from the guest’s point of view. This way, the guest isn’t staring at my forehead the entire time watching me constantly look down below the camera.
⬅️ Don’t Neglect the Background
The background can be incredibly useful when video chatting with someone. It provides a place for viewers’ eyes to rest from focusing on you and your facial expressions and a place for them to explore. For my setup, I have deliberately arranged the camera to showcase my actual home office setup as well as interesting bits and pieces that might be on my desk for that week. Some have noticed the oscilloscope, others notice the additional camera, but most notice the massive screens with a weekly changing wallpaper.
🎨 Color Correction & Color Grading
Whatever kind of camera you decide to use, its image sensor is not perfect. Sensors approximate what the human eye can see, but not closely; the color spectrum overlap is often… poor. Sony alpha cameras in particular are known to emphasize yellow too much and improperly handle reds.
Example of color grading using a LUT. Right is before grading, left is after grading. Performed using an Atomos Ninja V with custom LUT.
Therefore, it is helpful to color correct the image. Some color correction is possible in-camera. If you take a look at some popular camera-specific LUT packages like Leeming LUT Pro, they come with camera-specific settings to change to compensate for sensor idiosyncrasies.
🚄 Resolution & Bandwidth
It is very simple. More resolution packs more pixels into the screen which means more details are visible. Zoom has a maximum supported streaming resolution of 1080p; Hangouts, 720p. With those constraints, it may seem like overkill to use a higher resolution camera, but your chat guests will notice the difference between 1080p and 720p. However, unless your chat system supports 2160p or higher, your guests are unlikely to notice the difference between an expensive 8K camera and a 4K camera.
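For a rough sense of scale, this sketch counts the pixels in each resolution mentioned above. The width and height values are the standard 16:9 frame sizes; the dictionary and function names are just for illustration.

```python
# Standard 16:9 frame sizes for the resolutions discussed above.
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "2160p (4K)": (3840, 2160),
    "4320p (8K)": (7680, 4320),
}


def pixel_count(name: str) -> int:
    """Total number of pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


for name in RESOLUTIONS:
    ratio = pixel_count(name) / pixel_count("720p")
    print(f"{name}: {pixel_count(name):,} pixels ({ratio:.2f}x 720p)")
```

Going from 720p to 1080p more than doubles the pixel count, which is why guests can see that jump even on a chat tool capped at 1080p.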
Many cameras, like webcams built into laptops and the ones in smartphones, are built to be incredibly small. They are built to be so small, in fact, that they hit the limits of optical physics a long time ago. Therefore, to get good imagery, they must use tricks in software to compensate.
You may notice that all dedicated cameras like DSLRs & mirrorless cameras (as well as professional film & ENG cameras) are quite large by comparison for what may seem like the same job. This is on purpose. With that, you get: high resolution directly out of the camera (no processing, no tricks) as well as high creative control of the optics themselves (and therefore control over the resulting video).
🗒️ Technical Note: When you take a high quality photo with, say, an iPhone 11, it is actually taking 16 photos in series, doing a ton of computational processing on them individually, and then finally stitching them all together into a single image. This allows for large windows of depth of focus (and therefore sharper images), better color rendition, and better resolution than would otherwise be possible with the physical camera. Unfortunately, that trick does not really apply well to video: you can’t (currently anyway) take 16 photos for every frame of video and expect to process all of that in real time, let alone get direct creative control of the process.
Unfortunately, laptops, even recently released ones, will typically come with a non-removable webcam with a maximum resolution of 720p. This is true of the latest 2019 MacBook Pro. Recently released phones will frequently have a 1080p camera (or even 4K like the latest iPhone 11), so if that is an option for you and it is higher than your laptop’s camera resolution, you may want to consider using that as a webcam.
You also happen to need enough bandwidth to send your video (and to receive others’). Uploading 1080p video requires ~5Mbps; 4K video requires >24Mbps. If your connection does not support this, uninterrupted, you will not be able to stream your video regardless of how amazing your video quality is.
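Here is a minimal sketch for checking which resolutions your upload speed can sustain, using the two figures quoted above. It treats them as hard minimums, which is a simplification; the function name is hypothetical and you would supply the upload speed from your own speed test.

```python
# Upload requirements quoted in this guide, in Mbps (treated here as minimums).
REQUIRED_UPLOAD_MBPS = {
    "1080p": 5.0,
    "4K": 24.0,
}


def supported_resolutions(upload_mbps: float) -> list[str]:
    """Return the resolutions whose upload requirement fits within upload_mbps."""
    return [
        resolution
        for resolution, required in REQUIRED_UPLOAD_MBPS.items()
        if upload_mbps >= required
    ]


# Hypothetical speed test results.
for measured in (2.0, 10.0, 30.0):
    print(f"{measured} Mbps upload supports: {supported_resolutions(measured) or 'neither'}")
```

Remember that the connection has to hold that speed uninterrupted, so leave yourself some headroom over the minimum.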
💡 Exposure & Lighting
This, by itself, will probably give you the most bang for your buck visually — lighting! Cameras and lenses, given a specific set of aperture + shutter speed + ISO, will operate best at specific exposure or brightness ranges. Even a 720p laptop webcam will benefit greatly from being properly exposed to the sensor’s ideal brightness range. So, if you’re using a webcam and see a lot of pixelation (a lot of grainy noise or blocky patches of different colors that show up especially in darker regions of your video), you need to add more light to your scene and that will dramatically boost your image quality.
For our purposes, we want a well-lit, properly exposed scene and so there are three things you need to worry about: the background, the foreground, and whether you wear glasses.
🗒️ Technical Note: An important thing to note is that different lights can have different color temperatures. A warm yellow light may be 3000K, while a daylight fluorescent will be around a whitish 5000K, and sunlight is bluish at 6500K. Except for certain rare scenarios, it is normally not a good idea to mix lights of different color temperatures in the same scene because they change the colors of objects in different ways. Additionally, your camera has to be correctly set to balance based on the color temperature of the lights in the scene so that the sensor understands what is considered pure “white”. This is difficult to fix in post-production, let alone live in a stream.
THE BACKGROUND
So, some people here will typically work with their back facing a window. On even an overcast day (let alone a sunny one), the sunlight from the window will almost always overpower anything else in your scene — causing your face to be, at best, a dark silhouette. To compensate for this, you would need an equally bright light focused on your face. High quality lights bright enough to match the sun “properly” are upwards of $4000 each. But even your basic desk lamp will technically be an improvement over the silhouette. Or you could, you know, move your desk. Facing the window would balance the light on you with the room behind you for upwards of $0 and you get the benefit of a typically soft, flattering light on your face. And you probably get a nicer view. Win-win.
THE FOREGROUND
Once you have your background set to not be overpowering to the foreground, you’ll want to make sure your foreground is well lit. Ideally, this will be with a soft, diffuse light which casts equally soft, flattering shadows. This light is referred to as the “key” light. A hard light, like your typical desk lamp or flashlight, will cast very sharp shadows on your face which can give a dramatic look but is typically distracting for video calls because the lit and shadowed portions of your face will be exposed differently.
To separate the background and the foreground, one of the best ways is to use a backlight, also referred to as a rim light. This should be a bright light, but dimmer than the key light, placed behind you at approximately a 45 degree angle to either your left or your right. It should ideally be 45 degrees up above you as well. Backlights add specular highlights to your hair and shoulders which create a distinct separation between you and whatever is behind you. This is especially useful for green screen chroma keying.
If you can improve the lighting of your scene using any of the above techniques, you can greatly improve your output even with a weak 720p camera.
DEALING WITH EYEGLASSES
Lighting angles and placement change drastically if you wear glasses. You’ll want to take advantage of the angle of incidence to ensure the light is not visibly reflected in your glasses. This usually means placing it in front of you but above your head. This will, however, unavoidably create some harsher shadows.