Basic Requirements for Successful Videoconferencing
ViDe Videoconferencing Cookbook
Basic Components
As discussed in our introduction, "What is Videoconferencing?",
any videoconferencing terminal must have a few basic components to "get
the job done": a camera (to capture local video), a video display (to
display remote video), a microphone (to capture local audio), and speakers
(to play remote audio.) In addition to these more obvious components, a videoconferencing
terminal also includes a codec ("COmpressor/DECompressor"), a user
interface, a computer system to run on, and a network connection. Each of
these components plays a key role in determining the quality, reliability,
and user-friendliness of the videoconferencing experience as well as any
given videoconferencing terminal's suitability to particular purposes. A
basic understanding of each component's role will help you map
videoconferencing technology capabilities to your specific application needs.
The Main Camera
By nature of the general definition of videoconferencing, at least one video
source is typically present at each endpoint. The most common video source
is a single main camera that captures live movement occurring at one end
so that it may be sent to the other end in near real-time. ("Near real-time" is
an important concept in the success of a videoconference and is covered more
in the following sections on the codec and in our later full section on the network
connection.)
The most important component of the camera is the image sensor, which captures
snapshots of the view at regular intervals (25-30 times per second.) This
sensor is characterized by the number of pixels or dots that it can distinguish
in an image, called the resolution. Typical resolutions
are 640x480 pixels for webcams, 720x480 for NTSC cameras and 720x576 for
PAL cameras. (The width, or horizontal resolution, is listed first; the
height, or vertical resolution, is listed second.) The technology that
the sensor is based on is also important. Inexpensive
videoconference cameras, usually termed "webcams", come with a CMOS sensor,
which gives adequate image quality but does not render color and brightness
signals very well. This causes colors to appear dull or slightly distorted,
and it also makes contrast adjustments difficult. CMOS cameras are also strongly
affected by the quality of lighting in the room. Better cameras, at several times
the price, incorporate CCD sensors, similar to those used in professional
cameras, and can therefore capture a much better image and offer more
adjustments for adapting to any room.
When selecting a camera for videoconferencing, it is important to understand
that the quality of your camera heavily determines how your video will appear
to the receiving end. It is often our first reaction to attribute video quality
to the receiving system (i.e., "Why does their picture look so bad
when we spent $XXXX on this system?!"), yet if you cannot see the other
site clearly, their camera is quite often the culprit. It follows that,
when evaluating camera quality, you need to be sure you are shown how your
image will appear to others. In addition to image quality, cameras vary in
terms of other features that will affect both their usefulness and their
cost. Among these are: the ability to pan, tilt, and zoom (often abbreviated
as a PTZ camera), wide angle versus narrow angle lens, manual focus versus
auto-focus, manual iris versus auto-iris, auto-tracking, remote control,
and/or RS-232 control. Naturally, as features are added, cost goes up. Considering
the impact of the main camera on the success of a videoconference, it is
extremely important to imagine ahead of time how the camera will actually
be used (e.g., room setup, number of participants, user temperament)
and then ensure that the selected camera can support those uses.
The Video Display
In addition to capturing local video, a videoconferencing solution must
include the ability to display the remote video that is being received. This
incoming video is displayed on a monitor, most often a computer monitor,
which influences how clearly the remote site can be seen and also how many
people at the receiving site can easily see it. "Typical" display
monitor quality considerations such as screen size and resolution affect
the size and clarity of the incoming video window and also the integration
of the incoming video window with the application interface that surrounds
it. The quality of the image within the video window itself is, however,
more directly related to the performance and capabilities of the codec and
to the quality and bandwidth of the network connection. In the case of a
desktop videoconferencing terminal, most offer a scalable video window that
shares space on a PC desktop with other program/application windows. In such
cases, the conference aspects most heavily influenced by the capabilities
of the computer monitor are the appearance of the video window itself (not
what is inside it) and the ability to manipulate that window within the larger
display. In some cases, an entire display monitor can be dedicated to displaying
incoming video (a "full screen" conference) while a second monitor
is added for call control and data sharing. A note: the video resolutions
commonly supported under the popular videoconferencing standard H.323 are
CIF (352 x 288 pixels) and QCIF (176 x 144 pixels.)
Since these resolutions are fixed, increasing the network bandwidth of a
call beyond a certain point will not show an appreciable difference in video
quality within any given video frame. However, additional bandwidth enables
higher frame rates (i.e., the sending of additional video frames per second),
which can dramatically improve the smoothness and perceived quality
of motion.
One thing to bear in mind regarding video display is that the resolutions
mentioned above are rather small compared to a typical 1024x768 resolution
of a computer screen. If you enlarge the video
window on a PC, not every pixel displayed will be "real". For example, if
you double the width and height of the window to 704x576 pixels, only a quarter
of the pixels will be the actual image information:
[Figure: the same image shown at 352x288 and enlarged to 704x576.]
The white pixels in the second image will be automatically filled in by
the system to look similar to their neighbors, but they will not be exactly
what the original picture was and video quality will be degraded.
This is the same effect that you can see when you enlarge the window of a
media player (Windows Media Player, QuickTime, RealPlayer) playing a video clip. Therefore,
a picture should only be enlarged when several people are sitting around
a monitor and need to see the image.
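The "quarter of the pixels" figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the simplest filler strategy, in which each original pixel is repeated to fill the enlarged window, whereas real players may use smoother interpolation. The function names are hypothetical.

```python
def real_pixel_fraction(src_w, src_h, dst_w, dst_h):
    """Fraction of displayed pixels that carry original image information."""
    return (src_w * src_h) / (dst_w * dst_h)


def upscale_by_repetition(frame, factor):
    """Enlarge a frame (a list of pixel rows) by repeating every pixel
    `factor` times horizontally and vertically (assumed filler strategy)."""
    enlarged = []
    for row in frame:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


# CIF enlarged to twice its width and height
print(real_pixel_fraction(352, 288, 704, 576))  # 0.25

# A tiny 2x2 "image" doubled in size: every value now appears four times
print(upscale_by_repetition([[1, 2], [3, 4]], 2))
```

The repeated values in the second output are the "filled in" pixels: they add screen area but no new picture detail.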
On the other hand, in room videoconferencing, display devices are normally
larger. A TV monitor can be used in most cases. A rule of thumb for selecting
a monitor size is that the viewers should be at a distance of between
2 and 6 times the diagonal size of the screen. As an example, if the participants
in the room are sitting between 4 and 12 feet from the monitor, then a 24"
TV would be sufficient. In bigger rooms, where an LCD or DLP projector might
be installed, diagonal size and viewing distance should still adhere to this
ratio.
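The rule of thumb above reduces to simple arithmetic. The following sketch, with hypothetical function names, computes the recommended seating range for a given screen and the smallest screen that still serves the farthest viewer. It assumes distances in feet and diagonals in inches, as in the example.

```python
def viewing_range_feet(diagonal_inches, near_factor=2, far_factor=6):
    """Return (nearest, farthest) recommended viewing distance in feet."""
    diagonal_feet = diagonal_inches / 12
    return near_factor * diagonal_feet, far_factor * diagonal_feet


def min_diagonal_inches(farthest_viewer_feet, far_factor=6):
    """Smallest screen diagonal (inches) that keeps the farthest viewer in range."""
    return farthest_viewer_feet * 12 / far_factor


print(viewing_range_feet(24))   # (4.0, 12.0)
print(min_diagonal_inches(12))  # 24.0
```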
Audio Components
Within a videoconference, audio is as important as, and often considered more
important than, video. If we lose video or experience poor video quality
in a conference but audio remains intact, we can still accomplish many of
our communication objectives. The conference would simply become a teleconference
rather than a videoconference. In contrast, poor or disrupted audio quality
effectively shuts down a videoconference, often sending participants scrambling
to find a "native audio" telephone to complete the meeting. In
light of this, the devices that capture local audio (microphones) and those
that reproduce remote audio (speakers) are critical conference components.
Coupled with this are characteristics associated with comprehensible full
duplex (simultaneous two-way) transmission of audio, such as echo cancellation,
noise suppression, and audio mixing. These features are influenced by a combination
of the microphones, speakers, and codecs. Similar to the camera discussion,
it would be impossible to cover all features of audio performance here. However,
one key to ensuring audio that supports conference requirements and expectations
is to examine the location, quantity, and quality of your microphones and
speakers. Again, as features are added, cost goes up, though the cost differences
may not be as pronounced as they are in camera selection. Since hearing is
often the best test, you may want to speak and listen before you buy!
One of the catches of adjusting audio quality is that you can never be
entirely sure what quality of sound you are transmitting. This is best
determined by a colleague who is engaged in a videoconference with you and
tells you how you sound. If they hear reverberation or echoes, this indicates a
problem at your end, and you
should make the necessary adjustments in your audio system to correct it,
then check with your colleague to see whether that worked. Videoconference
systems usually also offer "loopback" test commands that allow you to hear the signal
you are sending, but these are less reliable. On the other hand, if you are hearing
echo effects when you are speaking with another site, you can do them a favor
by letting them know that they need to adjust their settings. They won't
know unless you tell them!
For situations other than personal videoconferencing, an audio
mixer, whether small or large, is a very useful tool. It enables you to adjust gains and volumes
more accurately than the embedded adjustments of most videoconference systems.
It also enables you to separate and mix signals more easily. A feature that is
particularly useful in mixers is the ability to produce different mixes of
the same signals. As an example, you will want to send the audio signal from
the remote site to your amplifier or speakers, but you will also want to
cut it off from any signal that is fed back to them.
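The idea of producing different mixes from the same signals (often called a "mix-minus") can be sketched in a few lines. This is an illustration only, not a description of any particular mixer: the source names and sample values are hypothetical, and signals are modeled as short lists of already aligned samples.

```python
def mix(signals, exclude=()):
    """Sum sample-aligned signals, skipping any source named in `exclude`."""
    included = [samples for name, samples in signals.items() if name not in exclude]
    return [sum(frame) for frame in zip(*included)]


# Hypothetical sources, three samples each
signals = {
    "local_mic": [1, 2, 3],
    "remote_site": [10, 20, 30],
    "vcr": [100, 100, 100],
}

# Mix for the room speakers: the remote site is heard, the local mic is left out
speaker_mix = mix(signals, exclude=("local_mic",))

# Mix sent to the remote site: everything except their own audio
send_mix = mix(signals, exclude=("remote_site",))

print(speaker_mix)  # [110, 120, 130]
print(send_mix)     # [101, 102, 103]
```

Leaving the remote site's own signal out of the mix sent back to it is what prevents them from hearing themselves as an echo.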
Regarding the microphones, there is a very wide range that can be used for
videoconferencing, from a headset mic to an array of room microphones. The
inexpensive $2-3 tabletop computer microphones found in many stores
often produce unacceptable audio or are not full duplex. At the other
extreme, the audio frequencies used in videoconferencing do not exceed 7 kHz, which
is more than enough for voice signals, so expensive professional microphones with extended
frequency response won't make any difference in the audio quality. For personal
videoconferencing, a headset is often the preferred choice because it can isolate the
incoming from the outgoing signal, and therefore eliminate any echo or reverb
effect. Some people prefer a speakerphone, however, and these are available
as well. For room videoconferencing, a high quality omni-directional microphone
is often used, or several smaller directional microphones are placed throughout
the room. In all situations, it is important to avoid placing a microphone
in the active range of a speaker. This can cause an echo effect which
is very distracting and difficult to counterbalance.
Regarding speakers, it is usually easier to select these than it is to select
microphones. As above, for personal videoconferencing a headset or a set
of plain computer speakers is often enough. For a bigger room and more people,
the speakers of a TV monitor can be used. For even larger rooms a separate
sound system might be required comprising an amplifier, an optional equalizer
and speakers.
The Codec
The codec has been mentioned above as affecting both the video and audio
within a videoconference. Indeed, the codec actually forms the heart of any
videoconferencing terminal.
The word "codec" is a shortened version of "COmpressor/DECompressor" and
is specifically applied to the wide variety of algorithms used for actually
compressing or decompressing audio and/or video information. This compression
has historically been necessary to make the audio/video data "small
enough" to be practical for sending over expensive network connections.
In this sense, there are many audio and video "codecs" (particular
compression/decompression methodologies) that are supported as part of
most videoconferencing technologies and standards.
For the purposes of this section, we are considering a broader meaning for codec:
the codec as the portion of the videoconferencing terminal that is responsible
for whatever compression/decompression of the audio/video signals is taking
place. The processes of compression and decompression are also referred to
as encoding and decoding respectively.
This latter and broader definition allows for the codec to be either a software
or hardware component, and confers great responsibility upon the codec for
the success of the videoconference. The amount of data required to "describe" audio
and video in a digital format is very large by today's data networking standards.
Without some form of codec, the transmission of a videoconference requires
extremely high amounts of network bandwidth. It is the codec that takes the
sights and sounds captured by the local camera and microphone, and then compresses
that information such that it may be transmitted across a network fast enough
to enable near real-time communication. When the compressed information is
received at the remote site, the codec within the remote site's videoconferencing
terminal decompresses it and enables "play back" through the speakers
and display. Though we think of the conference as a real-time conversation,
the real-time feeling is a function of how fast each of the codecs is compressing/decompressing
the data, and how fast and reliably the compressed data is traveling back
and forth across the network. The video compression is much more demanding
than the audio compression, and this is what sets the limits on codec capabilities.
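To give a rough sense of why compression is needed, the sketch below estimates the bit rate of uncompressed CIF video. The figures are assumptions for illustration: 30 frames per second, 12 bits per pixel (a common figure for video with 4:2:0 chroma subsampling), and a hypothetical 384 kbit/s call rate. None of these values is prescribed by the text above.

```python
def raw_video_bps(width, height, fps, bits_per_pixel=12):
    """Bit rate of uncompressed video in bits per second.
    bits_per_pixel=12 is an assumed value (4:2:0 chroma subsampling)."""
    return width * height * bits_per_pixel * fps


raw = raw_video_bps(352, 288, 30)   # CIF at 30 frames per second
call_rate = 384_000                 # hypothetical call bandwidth in bits/s

print(raw)                     # 36495360
print(round(raw / call_rate))  # 95
```

Under these assumptions the codec would have to shrink the video by a factor of roughly 95, before any bandwidth is set aside for audio.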
In light of this, some factors to consider when evaluating codecs are:
- Is the codec a software or hardware component?
Hardware codecs are generally faster in completing their compression/decompression
task, making near real-time communication more likely. Hardware codecs
also often carry their own processing power "on-board" such
that they do not rely on the resources of the underlying system. For
instance, in the case of a desktop system, using a hardware codec may
mean that you don't need a "souped-up" PC, or that you will
be able to run other applications on your PC while simultaneously participating
in a videoconference. On the other hand, software codecs are generally
less expensive and easier to install (no special hardware required),
but they tend to produce lower quality ("casual") conferencing
with very low frame rates. In desktop videoconferencing systems,
the codec sometimes resides on an interface board or, more typically, in a software
application. In group conferencing systems, the codec is most
likely an interface board (you buy the PC) or part of a turn-key
system that is possibly proprietary but most likely PC-based. With
computer processor power increasing over time, it has become possible
in the last two years to use a software-based codec on a computer with
a CPU faster than 2 GHz and achieve fairly good video transmission
quality.
- What actual audio and video codecs (compression/decompression methodologies)
does the more broadly defined "codec" support?
In order for a successful videoconference to take place, endpoints must
be able to negotiate a common methodology for both audio and video exchange.
Any given video terminal/codec (using the broader definition) may support
a number of audio/video codecs (the narrower definition.) For a device to be
considered "standards-compliant" (such as with H.323), a subset of audio/video
support that enables basic communication with other devices of the same standard
must be supported.
A video terminal/codec may also support proprietary audio
or video codecs of the system developer's own design. When two of these
video terminals are in the same videoconference, they may have access to
improved functionality, quality, or reliability between them because they
can each understand and use the proprietary features. When selecting a
videoconferencing terminal, you should be aware of its range of support
for various types of audio/video compression. You then need to consider
whether or not this range covers the range you are most likely to encounter
in your videoconferences. More information about H.323 codecs specifically can be found in
the H.323 Specification listed in the
Appendices.
Recently, a new generation of hardware codecs has been introduced into the
market, promising much better video transmission quality at limited
network bandwidth. These codecs are based on the H.264 video compression standard.
H.264 is very demanding in terms of processing power, and therefore
currently requires specialized hardware rather than a software-only
implementation. H.264 capability is being included with increasing frequency
in current and emerging videoconferencing solutions.
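The negotiation of a common audio and video methodology described above can be pictured as finding the first entry in one terminal's preference list that the other terminal also supports. The sketch below is a simplified illustration, not the actual capability exchange procedure of H.323. The terminal capability lists are hypothetical, and "VendorX-HQ" stands in for a proprietary codec.

```python
def negotiate(local_preferences, remote_supported):
    """Return the first codec in local preference order that the remote
    terminal also supports, or None if there is no common codec."""
    remote = set(remote_supported)
    for codec in local_preferences:
        if codec in remote:
            return codec
    return None


# Hypothetical capability lists, most preferred first
terminal_a = {"audio": ["G.722", "G.711"], "video": ["VendorX-HQ", "H.263", "H.261"]}
terminal_b = {"audio": ["G.711", "G.723.1"], "video": ["H.261", "H.263"]}

agreed = {media: negotiate(terminal_a[media], terminal_b[media]) for media in terminal_a}
print(agreed)  # {'audio': 'G.711', 'video': 'H.263'}
```

The proprietary codec is skipped because only one side supports it. Had both terminals come from the same vendor, it would have been chosen first.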
The User Interface
All systems that are meant for use have a user interface. The friendliness
of the user interface largely determines whether the system is embraced by
end users, or left to be grudgingly approached on an "only-if-I-have-to" basis.
The implications and importance of the user interface may easily be overlooked
or taken for granted if the main functionality of the system is complex or
interesting to the point of distraction. That may be the case with videoconferencing.
Often we consider and compare videoconferencing terminals based solely on
video and audio quality -- what it looks and feels like when we are actually
in a conference -- and we don't necessarily stop to consider other features
of the system. These other features may determine how we get into and out
of conferences, what we can do in conjunction with a videoconference, and
even what we know about how the call is going or what we have documented
about the call once it's over. A sampling of specific features and considerations
is listed below, some of which have already been touched upon and others
that are addressed in greater detail in sections that follow:
- How the video terminal application "works and plays" with
others.
Is the system easy to install and uninstall? How much system capacity
does the videoconferencing application use? Can other applications run comfortably
and reliably when the videoconferencing application is running and in use?
Is a wide range of system performance acceptable, or are system requirements
stringent? Has the videoconferencing application been tested for interoperability
with other terminals of the same standard?
- The means of initiating and accepting communication.
Is there an easy way to access a phonebook or directory of some type
for keeping track of frequently used contacts (i.e., people or places)
in a user-friendly way? Is there an automatic log available
for call history and/or error tracking? Can the data rate (bandwidth)
be selected for particular communications in a way that is easily understood?
- Application sharing and data collaboration.
Are these features fully integrated into the videoconferencing application
or are they provided using a "helper" application (e.g., NetMeeting)
or perhaps not available at all?
- Interaction with audio/videoconferencing devices.
Can a wide variety of audio and video devices be used with the terminal application
or are only certain devices supported? Are inputs and outputs other than
cameras and monitors supported (e.g., VCR in or out?) To what degree can
audio/video features (e.g., volume, echo, color, brightness) be controlled
from within the application? Is there support for the use of alternate or
enhanced devices (e.g., Far End Camera Control, dual monitors, telephone
handsets for privacy?)
- Support for the particular standards.
How compliant is the video terminal with the current H.323 standard? With SIP?
With the equipment of those with whom you know you will want to communicate? How
prepared is the terminal/developer/vendor to support future standards
and directions? Does the video terminal make any concessions now to cover
functionality gaps in current standards? (e.g. user authentication,
secure gatekeeper or call server registration?)
Though this checklist only provides a partial glimpse into the very volatile
area of videoconferencing terminal development, it should prove useful
as a starting point for the very important task of evaluating the user interface.
The Supporting System and the Network Connection
Though the supporting system and the network connection are not technically
part of the basic components of a videoconferencing terminal, they have
a definite effect on the terminal's perceived performance. To understand
more about the influence of each of these, please see the sections
"Network Matters" and "Selecting and Tuning Your PC".
Videoconferencing system categories — putting everything together
Looking at current videoconferencing systems and how vendors implement
all of the above, it can be useful to divide systems into four broad categories:
- Software only codecs
These are the simplest systems. They frequently work with any webcam and a headset,
and are very fast to set up. They provide adequate quality and ease
of use for many applications of personal videoconferencing. Examples: NetMeeting,
VCON vPoint.
- Desktop/Laptop USB or PCI hardware codecs
These are the next step up in quality and price. They include special hardware
to assist the encoding of the audio and video signals, but
the decoding is still done by the computer. They might offer more options
for connecting external audio and video, but they almost always come with
their own camera (webcam quality) and some form of a headset or handset.
Examples: Polycom ViaVideo, VCON ViGo
- Set-top devices
As the name implies, these are meant
to be installed on top of a TV set. They are very easy to install,
operate, maintain and support. These usually offer a lot of options for
connecting other audio and video devices for input or output, and are often the
preferred choice for setting up a videoconference room due to their mix of portability,
quality, and price. Examples: Polycom
ViewStation, Tandberg 880
- PC-based integrated codecs
These are a combination of the previous
two. They are usually an industrial grade computer with specialized hardware
cards which offer a lot of connectivity options like the set-top category,
but still can run PC applications like the desktop category. They are very
useful when group videoconferencing is combined with collaboration applications.
Example: Polycom iPower
Here is a chart showing how each of the basic components is commonly implemented
in these categories, and how the categories compare on various features.
| Feature | Software codecs | Desktop systems | Set-top devices | PC-based integrated |
|---|---|---|---|---|
| Camera | Webcam | Webcam | PTZ CCD camera | PTZ CCD camera |
| Display | Computer screen | Computer screen | TV, projector or flat display | TV, projector or flat display |
| Microphone | Headset | Headset/Handset | Table-top microphone | Table-top microphone |
| Speakers | Headset | Headset/mini-speakers | TV speakers or external | External speakers |
| Encoding/Decoding | Software/Software | Hardware/Software | Hardware/Hardware | Hardware/Hardware |
| User interface | Computer application | Computer application | On screen menu | On screen menu plus computer application |
| Control device | Mouse/keyboard | Mouse/keyboard | Remote control | Remote control plus wireless mouse/keyboard |
| Portability | High | High for USB, Low for PCI | Medium | Medium |
| Audio/video input/output connectivity | Low | Low | Medium/High | Medium/High |
| Audio/video quality | Decent | Fairly Good | Very good | Very good |
| Ease of use | Medium | Medium | High | Medium |
| Collaboration capabilities | High | High | Medium | High |
| Reliability/ease of maintenance | Medium | Medium | High | High |
| Price | Very low | Low | Medium | High |
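When comparing categories against your own requirements, it can help to treat a few rows of the chart as data and filter on them. The sketch below is one possible way to do this. The dictionary keys and the `shortlist` helper are hypothetical names, and only rows with a single rating per category are included.

```python
# Ratings copied from the chart above (selected rows only)
CATEGORIES = {
    "Software codecs": {
        "av_quality": "Decent", "ease_of_use": "Medium",
        "collaboration": "High", "reliability": "Medium", "price": "Very low",
    },
    "Desktop systems": {
        "av_quality": "Fairly Good", "ease_of_use": "Medium",
        "collaboration": "High", "reliability": "Medium", "price": "Low",
    },
    "Set-top devices": {
        "av_quality": "Very good", "ease_of_use": "High",
        "collaboration": "Medium", "reliability": "High", "price": "Medium",
    },
    "PC-based integrated": {
        "av_quality": "Very good", "ease_of_use": "Medium",
        "collaboration": "High", "reliability": "High", "price": "High",
    },
}


def shortlist(**required):
    """Return the categories whose ratings match every required value."""
    return [
        name for name, ratings in CATEGORIES.items()
        if all(ratings.get(key) == value for key, value in required.items())
    ]


print(shortlist(collaboration="High", av_quality="Very good"))  # ['PC-based integrated']
print(shortlist(ease_of_use="High"))                            # ['Set-top devices']
```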
Examples
The VCON ViGo and Polycom ViaVideo are examples of hardware solutions
for desktop PCs. They connect to the PC via USB and have their own
onboard video codecs that help reduce CPU usage on the PC. Such systems
are often deployed as a desktop solution.

The Polycom ViewStation and VCON Falcon IP are examples of purely
hardware-based set-top solutions. The only additional hardware required
to use one of these systems is a television. Completely self contained,
they are excellent choices for conference rooms and classrooms.