Lesson 1: General Audio Basics

Welcome to the other side of the glass, a place where music lovers gather and hone their craft to create and shape sonic art. This lesson focuses mainly on the fundamentals of audio engineering. There are many skills to master, concepts to understand, fulfillment to be had, and many career paths to consider. With a complete understanding of the material laid out in this lesson and throughout our program, you will have the skills to perform a multitude of jobs in the audio field.


What makes a good audio engineer?

Audio engineering is a unique profession that utilizes a combination of creativity, resourcefulness, technical skill, interpersonal relations, and intuition. A capable engineer has an understanding of three main concepts. They are:

  • Technical skill
  • Musicality
  • Psychology (interpersonal relationships, emotion and empathy)

While the text and lesson portion of the program will focus mainly on technical understanding and application, the hands-on portion (i.e., sessions) will allow you to learn some of the more “intangible” types of knowledge that can only be gained through experience and observation. The interpersonal skills an engineer uses to coax the best possible performance out of an artist, and to help them feel comfortable and confident, can be some of the most valuable tools at your disposal. With some artists, it’s as simple as positive reinforcement, but everyone has unique needs and tastes. The way you deal with your clients is so important that it can make or break you as an engineer.

What is audio?

To properly understand how to manipulate audio, first we have to know what it actually is. Before we have any audio, we need sound. Sound, also known as acoustic energy, is a wave caused by vibrations in air. A sound’s source moves back and forth, causing the air particles around it to move back and forth, which causes the air particles next to them to move, and so on until the vibrations reach our ears. To make an audio signal out of this acoustic energy, we need a microphone. Microphones come in all types, shapes, and sizes, but they all perform the same task using the same principle: transduction. Transduction is the conversion of energy from one form to another. The microphone, or transducer, converts acoustic energy into electrical energy. This brings us to the answer to our original question: What is audio?

Audio is the electrical representation of acoustic energy. This is our most basic understanding of what audio really is: electricity, or alternating current. Using tools to measure this electrical signal, we can learn much about acoustic energy and its electrical counterpart, audio: how it behaves, how it moves, and how it can be manipulated. Let’s start with the simplest sound wave: the sine wave. Below is a graphical representation of a sine wave. It consists of a series of rises and dips in amplitude over time. The rises are called compressions and the dips are called rarefactions. Together, one compression and one rarefaction make up one cycle of a wave. The length of this cycle is called the wavelength. Counting how many cycles occur in one second tells us the frequency of a particular wave. This is a very important term and concept in the audio world.

  • Frequency is measured in hertz, more commonly written as Hz (or kHz, which stands for kilohertz).

  • Frequency, or Hz, is a measurement of how many cycles of a sound wave happen in one second.

  • Frequency relates directly to pitch:

  • The lower the frequency/pitch, the longer the wavelength; the higher the frequency/pitch, the shorter the wavelength.

  • There are many more frequencies than there are named pitches. For example, the 'A' pitch most often used for tuning acoustic instruments is 440 Hz. 439 Hz and 441 Hz are both frequencies, but neither is technically an 'A.'
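The relationship between frequency and wavelength in the bullets above can be computed directly, since wavelength is just the speed of sound divided by frequency. A minimal sketch in Python, assuming sound traveling through air at roughly 343 m/s:

```python
# Sketch: relating frequency to wavelength with wavelength = speed / frequency.
# 343 m/s is the approximate speed of sound in air at room temperature.
SPEED_OF_SOUND = 343.0  # meters per second

def wavelength_m(frequency_hz):
    """Wavelength in meters for a given frequency in air."""
    return SPEED_OF_SOUND / frequency_hz

# Lower pitch -> longer wavelength, higher pitch -> shorter wavelength.
low_a = wavelength_m(55.0)      # A1: a wave over 6 meters long
tuning_a = wavelength_m(440.0)  # A4: a wave under 1 meter long
print(round(low_a, 2), round(tuning_a, 2))
```

Notice how dramatically the wavelength shrinks as pitch rises; this size difference is part of why low frequencies behave so differently in rooms than high ones.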

While the speed at which these compressions and rarefactions happen is referred to as frequency, the measure of how high and low these rises and dips go is called amplitude. Amplitude can be measured in several different ways, but in audio what is being measured is voltage. For anyone listening to the waveform, that voltage directly relates to its volume. So, for all intents and purposes, amplitude is an indicator of a waveform’s volume: the higher the rises and the lower the dips, the greater the voltage and volume of the waveform.


The sine wave is the simplest of waveforms, but there are many, many more types of waveforms out there, each with its own unique patterns and complexities. Other basic waveforms are the square wave, the triangle wave, and the sawtooth wave. These waves are most commonly found in synthesizers, but have other purposes as well. Most naturally occurring waveforms are complex waveforms, or combinations of many different simple waveforms. For example, see the waveform below, a picture of a recorded snare drum. As you can see, it looks very different from a sine wave. It has many changes in amplitude (volume) and a lot of very fast peaks and dips. It has a very loud beginning, a quick tapering off in volume, and a relatively short time before it returns to nothing. These characteristics comprise a waveform's envelope, or how a waveform unfolds over time. Every waveform’s envelope can be described with four terms: attack, decay, sustain, and release, commonly referred to as ADSR curves.

ADSR Curves

Let’s take a minute to dive into each one of these terms.

  • Attack: the speed at which a waveform moves from silence to its maximum volume. In the instance of the snare, it has a very fast attack with a high maximum volume.

  • Decay: the amount of time it takes a waveform to move from its maximum peak to its sustained fundamental tone. In the instance of the snare, the decay happens almost as quickly as the attack.

  • Sustain: how long the fundamental tone of the waveform lasts before it starts returning to a state of rest. In the instance of the snare, it is much longer than the attack and the decay times, but is still relatively short when compared to waveforms made by other instruments.

  • Release: how long it takes a waveform to return to a state of rest after its fundamental tone. In the instance of the snare, the sustain and the release are happening almost simultaneously, slowly returning to a state of rest while maintaining the drum’s resonance, or "tone."
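As a rough sketch of how these four stages fit together, the snippet below generates a simple linear ADSR envelope. The segment lengths, sustain level, and sample rate are illustrative values chosen to mimic the snare example, not constants from any particular instrument or synth:

```python
# Sketch: a linear ADSR envelope generator (illustrative values only).
def adsr_envelope(attack, decay, sustain_level, release, hold_time, sample_rate=1000):
    """Return a list of amplitude values (0.0 to 1.0) over time."""
    env = []
    n_attack = round(attack * sample_rate)
    n_decay = round(decay * sample_rate)
    n_hold = round(hold_time * sample_rate)
    n_release = round(release * sample_rate)
    # Attack: climb from silence up to maximum volume (1.0)
    for i in range(n_attack):
        env.append(i / n_attack)
    # Decay: fall from maximum down to the sustain level
    for i in range(n_decay):
        env.append(1.0 - (i / n_decay) * (1.0 - sustain_level))
    # Sustain: hold at the sustain level
    env.extend([sustain_level] * n_hold)
    # Release: fall from the sustain level back to silence
    for i in range(n_release):
        env.append(sustain_level * (1.0 - i / n_release))
    return env

# A snare-like shape: very fast attack and decay, short sustain and release.
snare = adsr_envelope(attack=0.002, decay=0.01, sustain_level=0.3,
                      release=0.15, hold_time=0.05)
```

Changing those four numbers reshapes the envelope into a pad, a pluck, or a swell; the curve itself is the same idea you see drawn on any synthesizer's envelope section.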

Every waveform has a different combination of these four characteristics. Another word that relates directly to envelope, frequency, and other harmonic content is timbre. Timbre (pronounced TAM-ber) is the unique sound of any particular instrument. It’s the combination of all of these factors together that gives every sound we hear its own identity. It's what makes my voice different from yours, one drum sound different from another, and the reverberation from a plate unique from that of a spring or hall. Timbre can be described as a particular sonic footprint for any sound or instrument.

As stated above, timbre relates not only to envelope and frequency, but also to other harmonic content. Let’s take a closer look at this.

Frequencies, Instruments, and Harmonics

Now we know that every instrument has its own timbre; in addition, each instrument also has its own fundamental frequency and set of harmonics. The fundamental frequency of an instrument is the frequency at which it resonates most strongly. It will be the loudest frequency, above all other harmonic frequencies. It varies from instrument to instrument, but it usually correlates with the instrument’s size: the bigger the instrument, the lower its fundamental frequency, and the smaller the instrument, the higher its fundamental. For example, a kick drum usually has a fundamental of about 50 Hz, while a snare drum’s is usually around 200 Hz. It might sound strange, but every physical object has its own particular frequency at which it resonates.

Now, let’s talk harmonics. Harmonics, like many of these terms and concepts, can lead us deeper and deeper into this fascinating aspect of sound and audio; so with that in mind, let’s keep this as simple and to the point as possible. Along with their fundamental frequencies, instruments also resonate at other frequencies that have a certain relationship with the fundamental. That relationship is described by the harmonic series. While mostly taught in music studies, this concept is very enlightening and important in the world of acoustic sound and audio. The harmonic series tells us which frequencies will resonate along with any given fundamental. Here’s a picture to help explain:


Here, the fundamental frequency of 55 Hz (or A1) is causing all of the other shown frequencies to resonate as well. As a rule of thumb, the closer in frequency a harmonic is to the fundamental, the louder it resonates. These pitches are as follows:

1. Fundamental
2. One octave above
3. One octave and a fifth above
4. Two octaves above
5. Two octaves and a third above
6. Two octaves and a fifth above
7. Two octaves and a flat seventh above
8. Three octaves above
9. Three octaves and a second above
10. Three octaves and a third above
11. Three octaves and an augmented fourth above
12. No longer a discernible pitch in relation to the fundamental

In reality, the 5th harmonic is normally the highest you’ll be able to pick out by ear. Also, if you’re wondering what happens around the 12th harmonic: that harmonic frequency lands in between actual named notes in the Western music scale, so it has no pitch name. There are also some instances where a fundamental can cause harmonics to sound below its pitch. These are called sub-harmonics. While not as audible as your basic upper harmonics, they happen quite often and are important when mixing low end, like your kick drum and bass guitar. Interesting, huh?
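Because each harmonic is simply a whole-number multiple of the fundamental, the series above is easy to compute. A quick sketch using the same 55 Hz (A1) fundamental:

```python
# Sketch: the harmonic series over a 55 Hz (A1) fundamental.
# Each harmonic is a whole-number multiple of the fundamental frequency.
fundamental = 55.0  # Hz (A1)

harmonics = [fundamental * n for n in range(1, 13)]
print(harmonics[:6])  # the first six harmonics: 55, 110, 165, 220, 275, 330 Hz

# Doubling a frequency raises the pitch by one octave, so the 2nd, 4th,
# and 8th harmonics land one, two, and three octaves above the fundamental.
octaves_above = [harmonics[1], harmonics[3], harmonics[7]]
print(octaves_above)  # 110, 220, 440 Hz
```

Note how the harmonics are evenly spaced in frequency but get closer and closer together musically, which is why the upper harmonics start landing between named pitches.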

Given that every instrument ever created has a different fundamental frequency, as does every singing and talking human, it can be overwhelming (and impossible) to memorize all of this information. Thankfully, the illustration below provides approximate but reliable fundamental and harmonic ranges for common instruments, along with some nifty lingo that audio engineers like to throw around instead of using specific numbers. Check it out:


Hopefully, you can refer to this in times of need. There’s one more basic topic about sound that needs to be addressed. This next attribute of audio is something engineers have to think about every day in recording, mixing, and sound sculpting. Its importance cannot be emphasized enough. I’m talking about phase.


The idea of phase is actually very simple, but in audio engineering, phase is relevant in a variety of situations. Phase is essentially the relationship between two or more waveforms in time. Depending on how these waveforms align with each other, their relationship can be either beneficial or detrimental to the audio. This is because waveforms are additive, meaning that whatever their values are at a given point in time add, or "sum," together at the audio output. Let’s look at an example below.


Below, the two waveforms on the right are of the same signal and in phase; they add together to make the third waveform at the bottom, doubling their original output value. In the middle, we have two sine waves happening simultaneously but moving in exact opposite directions (i.e., while one is going up, the other is going down). When this happens, the two waves cancel each other out, leaving silence at the output. In a realistic environment, though, the phase relationship between two or more signals is almost never perfectly in or out of phase. What does this mean for the actual recording of audio? How does it affect how you should record instruments, vocals, and effects?
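The two extremes just described can be sketched numerically: summing a sine wave with an exact copy of itself doubles the amplitude, while summing it with a polarity-inverted copy cancels to silence. (The sample count and cycle count below are arbitrary illustration values.)

```python
# Sketch: summing two sine waves in phase vs. 180 degrees out of phase.
import math

samples = 100
cycles = 4  # how many cycles fit in our window

in_phase = [math.sin(2 * math.pi * cycles * i / samples) for i in range(samples)]
out_of_phase = [-x for x in in_phase]  # polarity inverted: up becomes down

# In phase: the waves reinforce, doubling the amplitude.
doubled = [a + b for a, b in zip(in_phase, in_phase)]
# Out of phase: the waves cancel, leaving silence at the output.
silence = [a + b for a, b in zip(in_phase, out_of_phase)]

print(max(doubled))                   # roughly 2.0
print(max(abs(s) for s in silence))   # exactly 0.0
```

Real signals from two mics are rarely this tidy, but the same sample-by-sample addition is what happens every time two channels meet at a mix bus.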


One simple fact can immediately tell you which situations might end up having phase issues: phase only becomes a problem when recording one source with multiple microphones. First thing that comes to mind? Drums. Phase alignment is an incredibly important part of recording drums, but the principle applies to anything recorded with multiple mics. Thankfully, most recording equipment available today includes a very helpful little button with a symbol that looks like this: ∅. This button signifies “phase flip” or "polarity reverse," and it inverts the signal’s polarity: whatever was going up is now going down and vice versa. While this usually helps, sometimes the audio may not sound “in phase” either in its original state or with the phase flip button engaged. In that situation, the only fix is to move the mic either closer to or further from the source.

More often than not, two complex waveforms slightly out of phase with each other combine to create comb filtering: phase cancellations happening at different frequencies, usually related to the harmonic series. The frequency response chart ends up resembling something like this:
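As a rough sketch of where those cancellations land: when a delayed copy of a signal sums with the original, notches appear at every frequency where the delay equals half a cycle, and at odd multiples of that frequency. The 1 ms delay below is just an illustrative value:

```python
# Sketch: comb-filter notch frequencies for a signal summed with a
# delayed copy of itself. Cancellation happens where the delay equals
# an odd number of half-cycles: f = (2k + 1) / (2 * delay).
def notch_frequencies(delay_seconds, count=5):
    """First few frequencies that cancel for a given delay."""
    return [(2 * k + 1) / (2 * delay_seconds) for k in range(count)]

# A 1 ms offset (about 34 cm of extra mic distance at 343 m/s):
print(notch_frequencies(0.001))
# notches land at 500 Hz, 1500 Hz, 2500 Hz, 3500 Hz, ...
```

The evenly spaced notches are what give the frequency response chart its comb-like teeth; a longer delay packs the teeth closer together.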

While not usually desirable, interesting-sounding effects can happen with comb filters like these. We will learn more about that in a later chapter; for now, let’s discuss how all of this information relates to us humans when it hits our ears.

Human Hearing and the Frequency Spectrum

Our ears are another marvel of nature: complex machines made up of many small parts, all working together seamlessly to transfer auditory information to the brain. Inside each ear is the ear canal, a passage from the visible outer ear to the inner ear, where the eardrum is located. The canal is actually a resonant passage, tuned to a specific frequency range to which all humans are especially sensitive. Humans have evolved to hear mid-range frequencies better than others, as this is the range where we distinguish speech and hear babies cry. As an audio engineer, this means that while getting sounds and recording music, you have to keep in mind that we are more sensitive to some frequencies than others, and in turn much less sensitive to some frequencies that are very important in music. It's important to recognize that your ears naturally have a bias that won't always be reflected in visual information, like the graph on an EQ or the meter on a compressor. Luckily, a duo of scientists pioneered the study of human hearing long ago, conducting an experiment to test the frequency response of the human ear.

These two scientists, Harvey Fletcher and Wilden Munson, created what were then called the “Fletcher-Munson Equal Loudness Curves.” Nowadays, their experiments have been further refined and are simply named the “Equal Loudness Curves.” Essentially, they took a sample group of people, played them a tone, then played a different tone at the same voltage level and asked whether it sounded louder or softer. They would then adjust the volume until the subject judged the second tone to be exactly as loud as the first, and note how much more or less volume was necessary. After recording all of this data and comparing it across subjects, they came up with these charts.

The curves shown in blue are the original Fletcher-Munson curves, while the ones shown in red are the curves as we know them today, a more accurate depiction of the ear’s frequency response. This graph shows that our ears have a different frequency response depending on relative loudness. All of these curves show that our ears are extremely sensitive to frequencies between 2 kHz and 6 kHz, and that we lack sensitivity below 200 Hz and above 8 kHz. The differences between the curves show that our sensitivity to the 2-6 kHz range remains at low volumes, while our sensitivity to the extreme frequencies gets worse. With this in mind, it is important for audio engineers to realize that our ears have a more even (flat) response at louder volumes; therefore, it is better to make decisions about sound at those higher volumes. For example:

If an engineer were to mix a song at low volume (60 dB or less) and balance all of the sounds at that volume, then when the mix is turned up to normal listening level (about 85 dB), the low frequencies and extreme high frequencies will be too loud and will most likely overwhelm the rest of the mix. On the other hand, if a song is mixed at normal listening level and then turned down, the low end and extreme high end will seem to disappear, but all of the important sonic information in the 2-6 kHz range will still be heard in balance.

So now that we know that, why mix at 85 dB? Why not 100 dB? Why not 120 dB? At levels above 85 dB, prolonged exposure can lead to hearing loss, and the louder you listen, the quicker you can cause permanent damage! The next illustration shows many common sounds and how they relate to dB and the dynamic range of human hearing. Notice how everything above 85 dB is orange, shading toward red. At 90 dB, it takes about two continuous hours of exposure to cause permanent hearing loss, while at 120 dB, it may take only 8 seconds. Keep in mind that decibels are measured on a logarithmic scale, so there is a drastic difference between 85 dB and 120 dB. This is why we don’t mix or listen above 85 dB. More on this later.
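A quick sketch shows just how drastic that logarithmic difference is: every 10 dB step represents ten times the sound power, so the jump from 85 dB to 120 dB is far bigger than the numbers suggest.

```python
# Sketch: converting a dB level difference into a sound-power ratio.
# Decibels are logarithmic: +10 dB means 10x the sound power
# (and roughly twice the perceived loudness).
def power_ratio(db_difference):
    """How many times more sound power a level difference represents."""
    return 10 ** (db_difference / 10)

print(power_ratio(10))        # a 10 dB step is 10x the power
print(power_ratio(120 - 85))  # 120 dB carries over 3000x the power of 85 dB
```

This is why "only" 35 dB more is enough to shrink safe exposure from hours to seconds.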

Audio In Use

All of these terms are very important to know and understand, but consciously thinking about them while recording can actually hinder your performance as an engineer. This brings up an interesting point: a good engineer should know the fundamentals of audio so well that the knowledge becomes second nature through deep understanding. That's easier said than done. The only way to achieve it is through constant review and application of the material, but don’t worry. This is why you're here: to learn all of this material, and to make mistakes while learning so that you don’t have to make them later, when your reputation is on the line.