DTMF


Project status: working encoder, but unfinished
Source code: dtmf.c

DTMF is an acronym for Dual-Tone Multi-Frequency, which is the technical name for the telephone signaling scheme trademarked as "Touch-Tone" by AT&T. The telephone keypad (0-9, *, #, A-D) is arranged in the familiar pattern of rows and columns. There are four rows and four columns; the fourth column contains the keys A, B, C, and D, which are not usually present on home telephones. (These "extra" keys are used in the military phone system, on some private branch exchange (PBX) systems, and of course you can use them for whatever you please.) Each row and each column is assigned a unique frequency. When a key is pressed, the tones for its row and its column are generated simultaneously.

          1209 Hz  1336 Hz  1477 Hz  1633 Hz
697 Hz       1        2        3        A
770 Hz       4        5        6        B
852 Hz       7        8        9        C
941 Hz       *        0        #        D
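The table above translates directly into a lookup routine. The following is a minimal sketch, not taken from dtmf.c; the name dtmf_lookup and the array names are made up for illustration.

```c
#include <string.h>

/* Row and column frequencies in Hz, as listed in the table above. */
static const int dtmf_row_hz[4] = { 697, 770, 852, 941 };
static const int dtmf_col_hz[4] = { 1209, 1336, 1477, 1633 };

/* Keypad layout, one string per row, in the same order as the table. */
static const char *dtmf_keys[4] = { "123A", "456B", "789C", "*0#D" };

/*
 * dtmf_lookup is a hypothetical helper (not part of dtmf.c).
 * Stores the row and column frequencies for `key` and returns 0,
 * or returns -1 if `key` is not on the keypad.
 */
int dtmf_lookup(char key, int *row_hz, int *col_hz)
{
    int r;

    if (key == '\0')
        return -1;              /* strchr would match the string terminator */

    for (r = 0; r < 4; r++) {
        const char *p = strchr(dtmf_keys[r], key);
        if (p != NULL) {
            *row_hz = dtmf_row_hz[r];
            *col_hz = dtmf_col_hz[p - dtmf_keys[r]];
            return 0;
        }
    }
    return -1;
}
```

Calling dtmf_lookup('2', &row, &col) leaves 697 in row and 1336 in col. Lowercase letters are rejected in this sketch; a caller that wants to accept them could pass the key through toupper first.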

Furthermore, DIALTONE, RING, and BUSY signals are also each a composition of two distinct frequencies:

            350 Hz  440 Hz  480 Hz  620 Hz
DIALTONE      X       X
RING                  X       X
BUSY                          X       X

For example, a dialtone is a 350 Hz tone combined with a 440 Hz tone. The "2" button is a 697 Hz tone combined with a 1336 Hz tone. Note that pressing multiple buttons simultaneously can be ambiguous. If 1 and 5 are pressed together, the frequencies 697 Hz, 770 Hz, 1209 Hz, and 1336 Hz are all present. Exactly the same set of tones is produced when 2 and 4 are pressed together.
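The ambiguity can be checked mechanically by recording which tones are present as bits in a mask. This is only an illustrative sketch; tone_mask and keypad_rows are hypothetical names, and the bit assignment is an arbitrary choice.

```c
#include <string.h>

/* Keypad rows, top to bottom; the column index is the position in the string. */
static const char *keypad_rows[4] = { "123A", "456B", "789C", "*0#D" };

/*
 * tone_mask is a hypothetical helper. It returns a bitmask of the tones
 * present when every key in `keys` is held down at once.
 * Bits 0-3 stand for the row tones (697, 770, 852, 941 Hz) and
 * bits 4-7 for the column tones (1209, 1336, 1477, 1633 Hz).
 * Characters that are not on the keypad are ignored.
 */
unsigned tone_mask(const char *keys)
{
    unsigned mask = 0;

    for (; *keys != '\0'; keys++) {
        int r;
        for (r = 0; r < 4; r++) {
            const char *p = strchr(keypad_rows[r], *keys);
            if (p != NULL)
                mask |= (1u << r) | (1u << (4 + (int)(p - keypad_rows[r])));
        }
    }
    return mask;
}
```

With this encoding, tone_mask("15") and tone_mask("24") both come out as 0x33, so a decoder that only sees the tones cannot tell the two key pairs apart.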

Encoding DTMF

Encoding DTMF is easy. You just generate the appropriate frequencies.

To do this in Linux, we make use of the audio device /dev/audio. Data written to this device is interpreted as sample data and rendered as audio. Pertinent parameters (sample rate, bits per sample, etc.) are set using the ioctl ("I/O control") system call. Exactly how this is done is described in the Programmer's Guide to OSS, the Open Sound System for Unix. The process is extremely straightforward:

#include <linux/soundcard.h>
#include <sys/types.h>
#include <sys/ioctl.h>                 /* declares ioctl() */
#include <fcntl.h>
#include <unistd.h>

int fd, foo;                           /* file descriptor   */
fd = open("/dev/audio",O_WRONLY,0);    /* open audio device */ 

foo = AFMT_U8;                         /* unsigned, 8-bit samples */
ioctl(fd,SNDCTL_DSP_SETFMT, &foo);     /* set the sample format */ 

foo = 0;                               /* false */ 
ioctl(fd,SNDCTL_DSP_STEREO, &foo);     /* mono mode (not stereo) */

foo =  8000;                           /* 8000 samples per second */ 
ioctl(fd,SNDCTL_DSP_SPEED, &foo);      /* set the sample rate */ 

Of course, we really have to wrap all the system calls (e.g., open, ioctl) in if statements to check them for success. Each system call ends up looking something like this:

/* needs <stdio.h>, <stdlib.h>, <string.h>, and <errno.h> */
if ((fd = open("/dev/audio",O_WRONLY,0)) == -1 ) {
  fprintf(stderr,"Couldn't open audio device: %s\n",strerror(errno));
  exit(1);
}
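Since every ioctl call needs the same check, the pattern can be folded into one small function. The sketch below is an assumption about how one might do that, not code from dtmf.c; set_audio_param is a made-up name. It also compares the value handed back by the driver with the value requested, on the understanding that OSS drivers write the setting they actually selected back into the argument.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * set_audio_param is a hypothetical helper (not part of dtmf.c).
 * Issues one ioctl with an int argument and reports problems on stderr.
 * Returns 0 on success, or -1 if the call failed or the driver
 * handed back a different value from the one requested.
 */
int set_audio_param(int fd, unsigned long request, int wanted, const char *what)
{
    int value = wanted;

    if (ioctl(fd, request, &value) == -1) {
        fprintf(stderr, "Couldn't set %s: %s\n", what, strerror(errno));
        return -1;
    }
    if (value != wanted) {
        fprintf(stderr, "Device changed %s from %d to %d\n", what, wanted, value);
        return -1;
    }
    return 0;
}
```

A call would look like set_audio_param(fd, SNDCTL_DSP_SPEED, 8000, "sample rate"). Treating a substituted value as a failure is a strict choice; a program that can live with a slightly different sample rate might prefer to accept the returned value instead.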

After this initial set-up has been accomplished, we can simply write audio samples to the device. We've set up the port so that each sample is eight bits (unsigned) and represents 1/8000 of a second. If we want to generate a dialtone (350 Hz | 440 Hz), we can do this:

#include <math.h>

double t = 0;                  /* elapsed time, in seconds */
static double pi = 3.14159265358979;
double freq[]  = { 350, 440 }; 
double samprate = 8000.0;      /* 8000 samples per second */

while (t < 2.0) {              /* e.g., play two seconds of tone */
 unsigned char sample;
 double v = 128.0 + 64.0*cos(2*pi*freq[0]*t)
                  + 64.0*cos(2*pi*freq[1]*t);
 if (v > 255.0) v = 255.0;     /* both cosines peak at t = 0: 128+64+64 = 256 */
 sample = (unsigned char)v;
 write(fd, &sample, sizeof(sample));
 t += 1.0/samprate; /* increment time */
}

You might be wondering where the 128 and 64 come from in the sample = expression. Each audio sample, when in the AFMT_U8 (unsigned eight-bit) mode, is a number between 0 and 255 (inclusive). The extreme values (0 and 255) represent extreme opposite displacements of the speaker (audio waveform), and the center value (128) represents no displacement (neutral position). The cosine function, however, varies between -1 and 1 (inclusive). Multiplying by 64 lets one tone use up half of the dynamic range of the output device -- i.e., it will vary between -64 and 64 out of a possible range of -128 to 127. The second 64*cosine term uses up the rest of the dynamic range. Thus, this snippet of code will have maximum volume. (Strictly speaking, when both cosines peak together the sum is 128 + 64 + 64 = 256, one more than an unsigned char can hold, so that value should be clamped to 255 or the coefficients lowered slightly.) If you want to decrease the volume, scale down the coefficients of the cosine functions.

Why the + 128? Well, because we're using an unsigned sample format, we want the value to be positive -- we want it to vary from 0 to 255, not from -128 to 127. Adding 128 accomplishes this.

Why the factor of 2*Pi inside the cosine argument? Frequency is given in Hertz == Cycles/Second. We multiply this by Time, and we get (Cycles/Second)*(Seconds) == Cycles. The argument of cosine must be in radians, however, so we multiply by (Radians/Cycle). Thus we get: (Cycles/Second)*Seconds*(Radians/Cycle)==Radians. There are 2*Pi radians in one cycle.
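The same unit bookkeeping gives the amount by which the cosine argument grows from one sample to the next: (Radians/Cycle)*(Cycles/Second)/(Samples/Second) == Radians/Sample. As a worked case, a 440 Hz tone at 8000 samples per second advances 440/8000 = 0.055 cycles per sample, or about 0.3456 radians. A small sketch, with phase_step as a made-up name:

```c
/*
 * phase_step is a hypothetical helper (not part of dtmf.c).
 * Returns how many radians the cosine argument advances per sample
 * for a tone of freq_hz sampled at samples_per_sec.
 */
double phase_step(double freq_hz, double samples_per_sec)
{
    const double two_pi = 6.283185307179586;  /* radians per cycle */
    return two_pi * freq_hz / samples_per_sec;
}
```

A 1000 Hz tone at 8000 samples per second works out to Pi/4 radians per sample, i.e., eight samples for each full cycle.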

That's pretty much all there is to it.

Decoding DTMF