ViDe // www.ViDe.net
Videoconferencing Cookbook
Version 4.1
Video Development Initiative

Basic Requirements for Successful Videoconferencing


Basic Components

As discussed in our introduction, "What is Videoconferencing?", any videoconferencing terminal must have a few basic components to "get the job done": a camera (to capture local video), a video display (to display remote video), a microphone (to capture local audio), and speakers (to play remote audio). In addition to these more obvious components, a videoconferencing terminal also includes a codec ("COmpressor/DECompressor"), a user interface, a computer system to run on, and a network connection. Each of these components plays a key role in determining the quality, reliability, and user-friendliness of the videoconferencing experience, as well as a given terminal's suitability for particular purposes. A basic understanding of each component's role will help you map videoconferencing technology capabilities to your specific application needs.

The Main Camera

By the general definition of videoconferencing, at least one video source is typically present at each endpoint. The most common video source is a single main camera that captures live movement at one end so that it can be sent to the other end in near real-time. ("Near real-time" is an important concept in the success of a videoconference and is covered further in the following section on the codec and in our later full section on the network connection.)

The most important component of the camera is the image sensor, which captures snapshots of the view at regular intervals (25 to 30 times per second). This sensor is characterized by the number of pixels, or dots, that it can distinguish in an image, called the resolution. Typical resolutions are 640x480 pixels for webcams, 720x480 for NTSC cameras, and 720x576 for PAL cameras. (The width, or horizontal resolution, is listed first; the height, or vertical resolution, is listed second.) The technology that the sensor is based on is also important. Inexpensive videoconferencing cameras, usually termed "webcams," come with a CMOS sensor, which gives adequate image quality but does not render color and brightness signals very well. This causes colors to appear dull or slightly distorted, and it also makes contrast adjustments difficult. CMOS cameras are also strongly affected by the quality of lighting in the room. Better cameras, at a multiple of the price, incorporate CCD sensors similar to those used in professional cameras. They capture a much better image and offer many more adjustments for adapting to a particular room.
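The resolution figures above translate directly into how much image data the sensor produces. The short Python sketch below multiplies width by height by frame rate for the three formats named in the paragraph. It treats 30 frames per second for webcams and NTSC as a nominal, rounded figure (NTSC actually runs at 29.97). That rounding is an assumption made for simplicity.

```python
# Width x height in pixels (width first), plus a nominal frame rate, for the
# formats named above. The 30 fps figures are rounded nominal values.
FORMATS = {
    "webcam": (640, 480, 30),
    "NTSC": (720, 480, 30),
    "PAL": (720, 576, 25),
}


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Number of pixels the image sensor captures each second."""
    return width * height * fps


for name, (width, height, fps) in FORMATS.items():
    per_frame = width * height
    per_second = pixels_per_second(width, height, fps)
    print(f"{name}: {width}x{height} = {per_frame:,} pixels/frame, "
          f"{per_second:,} pixels/s at {fps} fps")
```

NTSC and PAL come out at the same 10,368,000 pixels per second. PAL trades a lower frame rate for more lines per frame.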

When selecting a camera for videoconferencing, it is important to understand that the quality of your camera largely determines how your video will appear at the receiving end. Our first reaction is often to attribute video quality to the receiving system (e.g., "Why does their picture look so bad when we spent $XXXX on this system?!"). Yet if you cannot see the other site clearly, their camera is quite often the culprit. It follows that, when evaluating camera quality, you need to be shown how your image will appear to others. In addition to image quality, cameras vary in other features that affect both their usefulness and their cost. Among these are:

  • the ability to pan, tilt, and zoom (a camera with these features is often called a PTZ camera)
  • a wide-angle versus a narrow-angle lens
  • manual focus versus auto-focus
  • manual iris versus auto-iris
  • auto-tracking
  • remote control and/or RS-232 control

Naturally, as features are added, cost goes up. Considering the impact of the main camera on the success of a videoconference, it is extremely important to imagine ahead of time how the camera will actually be used (e.g., room setup, number of participants, user temperament). Then ensure that the selected camera can support those uses.

The Video Display

In addition to capturing local video, a videoconferencing solution must be able to display the remote video it receives. This incoming video is shown on a monitor, most often a computer monitor. The monitor influences how clearly the remote site can be seen and how many people at the receiving site can easily see it. "Typical" display considerations such as screen size and resolution affect the size and clarity of the incoming video window, and how well that window integrates with the surrounding application interface. The quality of the image within the video window, however, depends more directly on two things: the performance and capabilities of the codec, and the quality and bandwidth of the network connection.

Most desktop videoconferencing terminals offer a scalable video window that shares the PC desktop with other application windows. In such cases, the computer monitor most heavily influences two aspects of the conference: the appearance of the video window itself (not what is inside it), and the ability to manipulate that window within the larger display. In some cases, an entire monitor can be dedicated to incoming video (a "full screen" conference), while a second monitor is added for call control and data sharing.

Note that the video resolutions most commonly used with the popular H.323 videoconferencing standard are CIF (352x288 pixels) and QCIF (176x144 pixels). Because these resolutions are fixed, increasing the network bandwidth of a call beyond a certain point will not produce an appreciable improvement within any single video frame. However, additional bandwidth allows higher frame rates (i.e., more video frames sent per second), which can dramatically improve the smoothness of motion.
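CIF and QCIF fix the number of pixels per frame. As a result, a codec can spend extra bandwidth in only two ways: more bits per frame, or more frames per second. The Python sketch below shows that trade-off as simple arithmetic. The call rates are illustrative, and for simplicity the sketch assumes the whole call bandwidth goes to video. In a real call, audio and protocol overhead take a share.

```python
CIF = (352, 288)  # width, height in pixels


def bits_per_frame(bitrate_bps: float, fps: float) -> float:
    """Average size of one compressed frame, in bits."""
    return bitrate_bps / fps


def bits_per_pixel(bitrate_bps: float, width: int, height: int, fps: float) -> float:
    """Average number of bits the encoder can spend on each pixel of each frame."""
    return bitrate_bps / (width * height * fps)


width, height = CIF
for kbps in (128, 384, 768):  # illustrative call rates
    for fps in (15, 30):
        rate = kbps * 1000
        print(f"{kbps} kbit/s, CIF at {fps} fps: "
              f"{bits_per_frame(rate, fps):,.0f} bits/frame, "
              f"{bits_per_pixel(rate, width, height, fps):.3f} bits/pixel")
```

At a given bandwidth, doubling the frame rate halves the bits available for each frame. Once each CIF frame already looks sharp, extra bandwidth is therefore best spent on a higher frame rate.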

One thing to bear in mind regarding video display is that the resolutions mentioned above are rather small compared to a typical 1024x768 resolution of a computer screen. If you enlarge the video window on a PC, not every pixel displayed will be "real". For example, if you double the width and height of the window to 704x576 pixels, only a quarter of the pixels will be the actual image information:

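[Figure: the same image shown at 352x288 and enlarged to 704x576; in the enlarged version, the pixels that had to be filled in are shown in white]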

The system fills in the white pixels in the second image automatically, making them look similar to their neighbors. They will not match the original picture exactly, however, so video quality is degraded. This is the same effect you see when you enlarge the window of a media player (Windows Media Player, QuickTime, RealPlayer) playing a video clip. Therefore, enlarge the picture only when several people are sitting around a monitor and need to see the image.
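The point that only a quarter of the enlarged pixels carry real image information can be checked with a small Python sketch. The sketch doubles an image's width and height and places the original samples on even rows and columns. It fills every other position by averaging the nearest original pixels. This is a simple bilinear scheme chosen for illustration; actual media players and videoconferencing applications may interpolate differently.

```python
from typing import List

Image = List[List[float]]


def upscale_2x(img: Image) -> Image:
    """Double width and height. Original samples land on even (row, col)
    positions; the other three quarters of the output are interpolated."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(2 * h):
        for x in range(2 * w):
            y0, x0 = y // 2, x // 2
            # Odd output rows/columns sit between two source pixels;
            # clamp at the bottom/right edge where no next pixel exists.
            y1 = min(y0 + y % 2, h - 1)
            x1 = min(x0 + x % 2, w - 1)
            out[y][x] = (img[y0][x0] + img[y0][x1] + img[y1][x0] + img[y1][x1]) / 4
    return out


def original_fraction(width: int, height: int, scale: int = 2) -> float:
    """Share of output pixels that carry original image information."""
    return (width * height) / (width * scale * height * scale)


print(original_fraction(352, 288))  # 0.25: 101,376 real pixels out of 405,504
```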

In room videoconferencing, on the other hand, display devices are normally larger, and a TV monitor can be used in most cases. A rule of thumb for selecting monitor size is that viewers should sit at a distance of between 2 and 6 times the diagonal size of the screen. For example, if the participants in the room are sitting between 4 and 12 feet from the monitor, a 24" TV would be sufficient. In bigger rooms, where an LCD or DLP projector might be installed, screen diagonal and viewing distance should still follow this ratio.
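The rule of thumb can be turned around to give a range of screen sizes for a given seating plan. The nearest viewer must be at least 2 diagonals away, which caps the diagonal at (nearest distance) / 2. The farthest viewer must be at most 6 diagonals away, which sets a floor of (farthest distance) / 6. The Python sketch below is a minimal illustration of that reasoning; the function name is hypothetical.

```python
from typing import Tuple

INCHES_PER_FOOT = 12


def diagonal_range_inches(nearest_ft: float, farthest_ft: float) -> Tuple[float, float]:
    """Screen diagonals (inches) satisfying the 2x-6x rule of thumb for
    viewers seated between nearest_ft and farthest_ft from the screen.

    Nearest viewer:  nearest  >= 2 * diagonal  ->  diagonal <= nearest / 2
    Farthest viewer: farthest <= 6 * diagonal  ->  diagonal >= farthest / 6
    """
    smallest = farthest_ft * INCHES_PER_FOOT / 6
    largest = nearest_ft * INCHES_PER_FOOT / 2
    if smallest > largest:
        raise ValueError("seating range too deep for one screen under this rule")
    return smallest, largest


print(diagonal_range_inches(4, 12))  # (24.0, 24.0): the 24" example above
```

When the seating range is narrower than the full 2x to 6x spread, the sketch returns a range of acceptable sizes. For example, viewers between 6 and 9 feet away could use any screen from 18 to 36 inches.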

Audio Components

Within a videoconference, audio is as important as video, and it is often considered more important. If we lose video or experience poor video quality but audio remains intact, we can still accomplish many of our communication objectives; the conference simply becomes a teleconference rather than a videoconference. In contrast, poor or disrupted audio effectively shuts down a videoconference, often sending participants scrambling to find a "native audio" telephone to complete the meeting. The devices that capture local audio (microphones) and reproduce remote audio (speakers) are therefore critical conference components.

Equally important are the features that keep full-duplex (simultaneous two-way) audio comprehensible, such as echo cancellation, noise suppression, and audio mixing. These features depend on a combination of the microphones, speakers, and codecs. As with cameras, it would be impossible to cover every aspect of audio performance here. One key to audio that meets conference requirements and expectations, however, is to examine the location, quantity, and quality of your microphones and speakers. Again, as features are added, cost goes up, though the cost differences may not be as pronounced as they are with cameras. Since hearing is often the best test, you may want to speak and listen before you buy!

One of the catches in adjusting audio quality is that you can never be completely sure what sound quality you are transmitting. The best judge is a colleague who is in a videoconference with you and can tell you how you sound. If they hear reverberation or echoes, this indicates a problem at your end. Make the necessary adjustments to your audio system, then check with your colleague to see whether they worked. Videoconferencing systems also usually provide "loopback" tests that let you hear the signal you are sending, but these are less reliable. Conversely, if you hear echo effects when you speak with another site, you can do them a favor by letting them know that they need to adjust their settings. They won't know unless you tell them!

For situations other than personal videoconferencing, an audio mixer, whether small or large, is a very useful tool. It lets you adjust gains and volumes more accurately than the built-in controls of most videoconferencing systems, and it makes separating and mixing signals easier. A particularly useful mixer feature is the ability to produce different mixes of the same signals. For example, you will want to send the audio from the remote site to your amplifier or speakers. At the same time, you will want to exclude it from the signal that is sent back to that site.
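The different-mixes feature is often called a "mix-minus." Each output bus receives its own weighted sum of the inputs. The bus that feeds the codec leaves out the remote site's audio, so the far end never hears itself returned. The Python sketch below models a mixer as per-bus gain tables. The input and bus names are hypothetical, and a real mixer works on continuous audio rather than single level values.

```python
from typing import Dict

# Hypothetical mixer inputs and output buses; each bus maps input -> gain.
BUSES: Dict[str, Dict[str, float]] = {
    # Local loudspeakers: far-end audio and local playback, but not the room
    # microphones (which would otherwise feed back through the speakers).
    "room_speakers": {"mic_1": 0.0, "mic_2": 0.0, "vcr": 1.0, "remote_site": 1.0},
    # Codec audio input, i.e. what the far end hears: everything except the
    # far end's own audio (the "minus" in mix-minus).
    "codec_send": {"mic_1": 1.0, "mic_2": 1.0, "vcr": 1.0, "remote_site": 0.0},
}


def mix(levels: Dict[str, float], bus: str) -> float:
    """Weighted sum of the current input levels for one output bus."""
    gains = BUSES[bus]
    return sum(gains.get(name, 0.0) * level for name, level in levels.items())


levels = {"mic_1": 0.25, "mic_2": 0.25, "vcr": 0.0, "remote_site": 0.8}
print(mix(levels, "room_speakers"))  # 0.8: only the far end is heard locally
print(mix(levels, "codec_send"))     # 0.5: the two local mics, no far-end audio
```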

Microphones suitable for videoconferencing cover a very wide range, from a headset mic to an array of room microphones. The inexpensive $2-3 tabletop computer microphones found in many stores often produce unacceptable audio or do not support full-duplex operation. At the other extreme, expensive professional microphones offer little benefit. The audio frequencies carried in videoconferencing typically do not exceed 7 kHz, which is more than enough for voice, so an extended frequency response will make little difference to audio quality. For personal videoconferencing, a headset is often the preferred choice because it isolates the incoming signal from the outgoing one and therefore eliminates echo and reverberation. Some people prefer a speakerphone, however, and these are available as well. For room videoconferencing, a high-quality omnidirectional microphone is often used, or several smaller directional microphones are placed throughout the room. In all situations, avoid placing a microphone within the active range of a speaker. Doing so can cause an echo effect that is very distracting and difficult to counteract.
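The 7 kHz figure follows from sampling theory: a digital audio channel can only represent frequencies below half its sampling rate (the Nyquist limit). The Python sketch below applies that rule to two sampling rates commonly used in H.323 audio. G.711 samples telephone-band audio at 8 kHz, and G.722 samples wideband audio at 16 kHz, band-limiting it to about 7 kHz to leave room for filtering below the 8 kHz limit. The uncompressed rates assume 16-bit samples; that is an assumption about typical capture hardware, not a requirement of either standard.

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a channel sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed (linear PCM) data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


for name, rate in (("G.711 (telephone band)", 8_000), ("G.722 (wideband)", 16_000)):
    print(f"{name}: sampled at {rate} Hz, Nyquist limit {nyquist_limit_hz(rate):.0f} Hz, "
          f"16-bit PCM = {pcm_bitrate_bps(rate, 16) // 1000} kbit/s before compression")
```

Once audio passes through a codec sampled at 16 kHz, everything above 8 kHz is filtered out before encoding. A microphone whose response reaches 20 kHz therefore adds nothing. Both G.711 and G.722 then carry the result in a 64 kbit/s channel (G.722 can also run at 56 or 48 kbit/s).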

Regarding speakers, it is usually easier to select these than it is to select microphones. As above, for personal videoconferencing a headset or a set of plain computer speakers is often enough. For a bigger room and more people, the speakers of a TV monitor can be used. For even larger rooms a separate sound system might be required comprising an amplifier, an optional equalizer and speakers.

The Codec

The codec has been mentioned above as affecting both the video and audio within a videoconference. Indeed, the codec forms the heart of any videoconferencing terminal. The word "codec" is a shortened form of "COmpressor/DECompressor." It applies specifically to the wide variety of algorithms used to compress or decompress audio and/or video information. This compression has historically been necessary to make audio/video data "small enough" to be practical to send over expensive network connections. In this sense, there are many audio and video "codecs" (particular compression/decompression methods) supported by most videoconferencing technologies and standards. For the purposes of this section, we use a broader meaning: the codec is the portion of the videoconferencing terminal responsible for whatever compression/decompression of the audio/video signals takes place. The processes of compression and decompression are also referred to as encoding and decoding, respectively.

This latter, broader definition allows the codec to be either a software or a hardware component, and it places great responsibility on the codec for the success of the videoconference. The amount of data required to "describe" audio and video in digital format is very large by today's data networking standards. Without some form of codec, transmitting a videoconference would require extremely high network bandwidth. The codec takes the sights and sounds captured by the local camera and microphone and compresses them, so they can cross the network fast enough for near real-time communication. When the compressed information arrives, the codec in the remote site's videoconferencing terminal decompresses it and enables "playback" through the speakers and display.

Though we think of the conference as a real-time conversation, the real-time feeling depends on two things. The first is how fast each codec compresses and decompresses the data. The second is how quickly and reliably the compressed data travels back and forth across the network. Video compression is much more demanding than audio compression, and it is what sets the limits on codec capabilities. In light of this, some factors to consider when evaluating codecs are:

  • Is the codec a software or hardware component?
    Hardware codecs are generally faster at compressing and decompressing, making near real-time communication more likely. Hardware codecs also often carry their own processing power "on-board," so they do not rely on the resources of the underlying system. In a desktop system, for instance, a hardware codec may mean that you don't need a "souped-up" PC. It may also let you keep running other applications while you participate in a videoconference. Software codecs, on the other hand, are generally less expensive and easier to install (no special hardware is required), but they tend to produce lower-quality ("casual") conferencing with very low frame rates. In desktop videoconferencing systems, the codec sometimes resides on an interface board or, more typically, in a software application. In group conferencing systems, the codec is most likely an interface board (you buy the PC) or part of a turnkey system, which may be proprietary but is most likely PC-based. With computer processing power increasing over time, it has become possible in the last two years to run a software-based codec on a computer with a CPU faster than 2 GHz and achieve fairly good video quality.
  • What actual audio and video codecs (compression/decompression methodologies) does the more broadly defined "codec" support?
    For a successful videoconference to take place, endpoints must be able to negotiate a common method for both audio and video exchange. Any given video terminal/codec (in the broader sense) may support a number of audio/video codecs (in the narrower sense). To be considered "standards-compliant" (as with H.323), a device must support a minimum set of audio/video codecs that enables basic communication with other devices using the same standard. A video terminal/codec may also support proprietary audio or video codecs of the developer's own design. When two such terminals meet in the same videoconference, both can understand and use the proprietary features, which may give them improved functionality, quality, or reliability. When selecting a videoconferencing terminal, be aware of its range of support for various types of audio/video compression. Then consider whether this range covers what you are most likely to encounter in your videoconferences. More information about H.323 codecs can be found in the H.323 Specification listed in the Appendices.
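The idea of two endpoints settling on a common codec can be illustrated with a simplified Python sketch. Each side lists the codecs it supports in order of preference, and the call uses the first local preference that the remote side also supports. This is an illustration only, not H.245 capability exchange, which is considerably more involved. The codec lists are hypothetical, and "VendorX-Video" stands in for a proprietary codec. H.261 and G.711 appear as the baseline that H.323 terminals are expected to share.

```python
from typing import List, Optional


def choose_common(local: List[str], remote: List[str]) -> Optional[str]:
    """Return the first codec in the local preference order that the remote
    endpoint also supports, or None if the two lists have nothing in common."""
    remote_set = set(remote)
    for codec in local:
        if codec in remote_set:
            return codec
    return None


# Hypothetical capability lists, most preferred first.
vendor_x_room = ["VendorX-Video", "H.264", "H.263", "H.261"]
vendor_x_desk = ["VendorX-Video", "H.263", "H.261"]
other_vendor = ["H.263", "H.261"]

print(choose_common(vendor_x_room, vendor_x_desk))  # VendorX-Video (proprietary)
print(choose_common(vendor_x_room, other_vendor))   # H.263 (standard)
print(choose_common(["G.722", "G.728", "G.711"], ["G.711"]))  # G.711 baseline
```

Two terminals from the same developer can use their proprietary codec. A terminal from another vendor falls back to the best standard codec both sides understand.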

Recently, a new generation of hardware codecs supporting H.264 has been introduced into the market, promising much better video quality at limited network bandwidth. H.264 is very demanding in terms of processing power. It therefore typically requires specialized hardware, because real-time software implementations are not yet practical on most systems. H.264 capability is being included with increasing frequency in current and emerging videoconferencing solutions.
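The Python sketch below puts the bandwidth figures in this section in perspective. It estimates the uncompressed data rate of CIF video and compares it with a typical call rate. It assumes 8-bit samples with 4:2:0 chroma subsampling, an average of 12 bits per pixel, which is the sampling structure used by H.261 and H.263. It also assumes a nominal 30 frames per second. The 384 kbit/s call rate is an illustrative value.

```python
def raw_video_bps(width: int, height: int, fps: float, bits_per_pixel: float = 12) -> float:
    """Uncompressed video data rate; 12 bits/pixel assumes 8-bit 4:2:0 sampling."""
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps: float, link_bps: float) -> float:
    """How many times smaller the encoder must make the stream to fit the link."""
    return raw_bps / link_bps


raw = raw_video_bps(352, 288, 30)  # CIF
print(f"Uncompressed CIF: {raw / 1e6:.1f} Mbit/s")
print(f"Needed to fit 384 kbit/s: about {compression_ratio(raw, 384_000):.0f}:1")
```

Even before audio is added, the codec has to discard about 99 percent of the raw video data. This is why the efficiency of the compression method matters so much at limited bandwidth.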

The User Interface

Every system meant for people to use has a user interface. The friendliness of that interface largely determines whether end users embrace the system or approach it grudgingly, on an "only-if-I-have-to" basis. The importance of the user interface is easily overlooked when the system's main functionality is complex or interesting enough to distract from everything else. That may well be the case with videoconferencing. We often compare videoconferencing terminals solely on video and audio quality (what it looks and feels like when we are actually in a conference) and do not stop to consider other features of the system. These other features may determine how we get into and out of conferences, and what we can do in conjunction with a videoconference. They may even determine what we know about how the call is going, or what we have documented about it once it is over. A sampling of specific features and considerations is listed below; some have already been touched upon and others are addressed in greater detail in the sections that follow:

  • How the video terminal application "works and plays" with others.
    Is the system easy to install and uninstall? How much system capacity does the videoconferencing application use? Can other applications run comfortably and reliably while the videoconferencing application is running and in use? Is a wide range of system performance acceptable, or are the system requirements stringent? Has the videoconferencing application been tested for interoperability with other terminals of the same standard?
  • The means of initiating and accepting communication.
    Is there an easy way to access a phonebook or directory of some kind for keeping track of frequently contacted people or places? Is there an automatic log for call history and/or error tracking? Can the data rate (bandwidth) be selected for particular calls in a way that is easily understood?
  • Application sharing and data collaboration.
    Are these features fully integrated into the videoconferencing application or are they provided using a "helper" application (e.g., NetMeeting) or perhaps not available at all?
  • Interaction with audio/videoconferencing devices.
    Can a wide variety of audio and video devices be used with the terminal application, or are only certain devices supported? Are inputs and outputs other than cameras and monitors supported (e.g., VCR in or out)? To what degree can audio/video features (e.g., volume, echo, color, brightness) be controlled from within the application? Is there support for alternate or enhanced devices (e.g., Far End Camera Control, dual monitors, telephone handsets for privacy)?
  • Support for the particular standards.
    How compliant is the video terminal with the current H.323 standard? With SIP? With the equipment of the people you know you will want to communicate with? How prepared is the terminal/developer/vendor to support future standards and directions? Does the video terminal make any concessions now to cover functionality gaps in current standards (e.g., user authentication, secure gatekeeper or call server registration)?

Though this checklist only provides a partial glimpse into the very volatile area of videoconferencing terminal development, it should prove useful as a starting point for the very important task of evaluating the user interface.

The Supporting System and the Network Connection
Though the supporting system and the network connection are not technically among the basic components of a videoconferencing terminal, they have a definite effect on the terminal's perceived performance. To understand more about the influence of each, please see the sections "Network Matters" and "Selecting and Tuning Your PC".

Videoconferencing system categories — putting everything together

When we look at current videoconferencing systems and how vendors implement all of the above, it can be useful to divide them into four broad categories:

  1. Software only codecs
    These are the simplest systems. They frequently work with any webcam and a headset, and they are very fast to set up. They provide adequate quality and ease of use for many personal videoconferencing applications. Examples: NetMeeting, VCON vPoint.
  2. Desktop/Laptop USB or PCI hardware codecs
    These are the next step up in quality and price. They include special hardware to assist the encoding of the audio and video signals, but decoding is still done by the computer. They might offer more options for connecting external audio and video devices. Even so, they almost always come with their own camera (of webcam quality) and some form of headset or handset. Examples: Polycom ViaVideo, VCON ViGo.
  3. Set-top devices
    As the name implies, these are meant to be installed on top of a TV set. They are very easy to install, operate, maintain, and support. They usually offer many options for connecting other audio and video devices for input or output. Their mix of portability, quality, and price often makes them the preferred choice for setting up a videoconference room. Examples: Polycom ViewStation, Tandberg 880.
  4. PC-based integrated codecs
    These combine features of the previous two categories. They are usually an industrial-grade computer with specialized hardware cards. Like set-top devices, they offer many connectivity options; like desktop systems, they can still run PC applications. They are very useful when group videoconferencing is combined with collaboration applications. Example: Polycom iPower.

The chart below shows how each of the basic components is commonly implemented in these categories, and how the categories compare on various features.

Feature              Software codecs      Desktop systems      Set-top devices      PC-based integrated
-------------------  -------------------  -------------------  -------------------  -------------------
Camera               Webcam               Webcam               PTZ CCD camera       PTZ CCD camera
Display              Computer screen      Computer screen      TV, projector or     TV, projector or
                                                               flat display         flat display
Microphone           Headset              Headset/handset      Table-top            Table-top
                                                               microphone           microphone
Speakers             Headset              Headset/             TV speakers          External speakers
                                          mini-speakers        or external
Encoding/decoding    Software/software    Hardware/software    Hardware/hardware    Hardware/hardware
User interface       Computer             Computer             On-screen menu       On-screen menu plus
                     application          application                               computer application
Control device       Mouse/keyboard       Mouse/keyboard       Remote control       Remote control plus
                                                                                    wireless mouse/keyboard
Portability          High                 High for USB,        Medium               Medium
                                          low for PCI
Audio/video          Low                  Low                  Medium/high          Medium/high
input/output
connectivity
Audio/video quality  Decent               Fairly good          Very good            Very good
Ease of use          Medium               Medium               High                 Medium
Collaboration        High                 High                 Medium               High
capabilities
Reliability/ease of  Medium               Medium               High                 High
maintenance
Price                Very low             Low                  Medium               High

Examples

The VCON ViGo and Polycom ViaVideo are examples of hardware solutions for desktop PCs. They connect to the PC via USB and have their own onboard video codecs that help reduce CPU usage on the PC. Such systems are often deployed as a desktop solution.

The Polycom ViewStation and VCON Falcon IP are examples of purely hardware-based set-top solutions. The only additional hardware required to use one of these systems is a television. Because they are completely self-contained, they are excellent choices for conference rooms and classrooms.

 
© 2004-6, Video Development Initiative.
Updated March, 2005.