Network Matters
ViDe Videoconferencing Cookbook
Network Fundamentals for the Videoconference
Videoconferencing was originally deployed over networks that could provide
some guarantees about the level of service that would be delivered to the application.
The ISDN and/or dedicated T1 circuits of the H.320 standards-based world provided
predictable delays over dedicated paths. This allowed videoconferencing vendors
to create products to work within these parameters. However, dedicated circuits
are also expensive circuits.
IP standards-based videoconferencing was engineered to run on data networks
that offer no quality-of-service guarantees, such as the Internet. Such networks
were not designed to deliver delay-sensitive, near real-time applications. The
data network is used for multiple purposes: e-mail, web browsing, and other
activities are intermixed with IP videoconferencing.
The audio/video information within a videoconference is segmented into chunks
by the application, encoded and compressed, put into a series of data packets
and sent over the network to the remote end at basically constant intervals.
The data packets arrive at their destination at varying times,
if at all, and often out of order. To keep the "real time" impression of
an interactive videoconference, the packets must arrive in time to be
re-ordered and played out by the videoconferencing terminal.
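The re-ordering step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name ReorderBuffer is hypothetical, each packet is assumed to carry an integer sequence number starting at 0, and sequence-number wraparound and timeouts for lost packets (which a real terminal must handle) are ignored.

```python
class ReorderBuffer:
    """Release packets in sequence order, holding early arrivals until the gap fills.

    Hypothetical sketch: no wraparound handling and no timeout for lost packets.
    """

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number owed to the decoder
        self.pending = {}          # seq -> payload for packets that arrived early

    def push(self, seq, payload):
        """Accept one packet; return the payloads that are now ready for playout."""
        if seq < self.next_seq:
            return []              # duplicate or too late to be useful: discard
        self.pending[seq] = payload
        ready = []
        # Drain every packet that is now contiguous with what was already played.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
for seq, payload in [(1, "B"), (0, "A"), (3, "D"), (2, "C")]:
    print(seq, "->", buf.push(seq, payload))
```

Note that a packet held in the buffer adds delay, so a terminal has to trade smoother playout against responsiveness.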
There are five fundamental network problems for videoconferencing over networks
such as the Internet. They are bandwidth, packet loss, latency, jitter and
policies.
Bandwidth is the fundamental requirement: there must be enough capacity
in a network path for all of your packets to get through unimpeded. For a
rough idea of scale, a typical ISDN videoconference uses around 128-384Kbps
(kilobits per second). IP-based H.323 video systems can use the same bandwidth,
although in general they tend to go higher since the network is cheaper, so
bandwidth of around 384-768Kbps is very common. The bandwidth required for a
given videoconferencing speed is higher on IP networks than on ISDN networks
because of the packetizing overhead of IP. The overhead is about 20%, so for
example a 384Kbps IP conference actually uses about 450Kbps of bandwidth.
Higher-quality H.323 and SIP videoconferences can go to 1.5-3.0Mbps, and if
you want to go to broadcast quality with alternate codecs including MPEG-1/2/4
and MJPEG, the sky is the limit: 6-20Mbps for NTSC/PAL transmission, 20-50Mbps
for prerecorded HDTV, and higher for ‘live’ content.
This bandwidth should be symmetric, meaning each end should be able
to both send and receive the same amount of data for the call. The connection
speed (384Kbps plus overhead, for example) is the maximum bandwidth the call
will use in either direction.
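As a rough planning aid, the overhead arithmetic can be written out. This is a minimal sketch that assumes a flat overhead of about 20% as quoted above; the function name ip_bandwidth_kbps is hypothetical, and the real overhead depends on packet sizes and the protocols in use. With exactly 20%, a 384Kbps call comes out near 460Kbps, the same ballpark as the rounded figure in the text.

```python
def ip_bandwidth_kbps(call_rate_kbps, overhead_pct=20):
    """Estimate on-the-wire bandwidth for a call, per direction.

    overhead_pct is an assumed flat packetizing overhead (about 20% in the text).
    """
    return call_rate_kbps * (100 + overhead_pct) / 100


for rate in (128, 384, 768):
    wire = ip_bandwidth_kbps(rate)
    print(f"{rate}Kbps call: plan for roughly {wire:.0f}Kbps in each direction")
```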
The actual speed varies depending on how active or static the video image is.
When the call is initiated, there will be a spike in bandwidth use as the entire
video frame is sent; then usage will fall off as only image updates are sent.
A graph showing bandwidth usage during the initialization of a 384Kbps call is
below. It shows only the outbound traffic, with essentially still video after
initiation, hence the significant drop in bandwidth use after the call was
started.
[Figure: outbound bandwidth usage during the initialization of a 384Kbps call]
Different clients are sensitive to discrepancies in bandwidth symmetry in
different ways, depending on whether the bandwidth restriction is incoming or
outgoing. In most cases, only the video frame rate is affected, though some
clients may drop the video or even the call altogether.
If you are in a multi-point videoconference, keep in mind that the MCU/bridge
sees all of the streams at the same time, even if it is not forwarding them
on. So in an 8-site videoconference running at 384Kbps, every site sends and
receives up to 384Kbps to and from the MCU, and the MCU receives and forwards
up to 8 x 384Kbps, or roughly 3Mbps.
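The same multiplication is handy when sizing the network connection for an MCU. The sketch below assumes the worst case of one full-rate stream per site in each direction; the function name mcu_load_kbps and the optional overhead_pct parameter are hypothetical, and the default leaves IP packetizing overhead out, as the 8-site example does.

```python
def mcu_load_kbps(sites, call_rate_kbps, overhead_pct=0):
    """Worst-case load at the MCU in each direction: one full-rate stream per site.

    overhead_pct is an optional assumed packetizing overhead (0 by default).
    """
    return sites * call_rate_kbps * (100 + overhead_pct) / 100


total = mcu_load_kbps(8, 384)
print(f"8 sites at 384Kbps: {total:.0f}Kbps, or about {total / 1000:.1f}Mbps each way")
```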
Packet loss is when packets fail to arrive correctly. This can
be due to insufficient bandwidth along the path (when congestion occurs,
routers will drop packets), or perhaps errors in transmission. Errors
occur most commonly on wireless links such as microwave, satellite or
local wireless Ethernet. They can however also occur on copper and even
fiber links. Packet loss results in effects such as "tiling" within the
video window, missing pieces or blank areas within the video window,
and/or disruptions in audio.
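If your packets carry sequence numbers, you can estimate loss by counting the gaps. This is a minimal sketch under assumptions: the function name loss_fraction is hypothetical, sequence numbers are plain integers that do not wrap, and any packets lost before the first or after the last one received go unnoticed.

```python
def loss_fraction(received_seqs):
    """Fraction of packets missing between the lowest and highest sequence seen.

    Duplicates are counted once; out-of-order arrival does not matter.
    """
    unique = set(received_seqs)
    if not unique:
        return 0.0
    expected = max(unique) - min(unique) + 1
    return (expected - len(unique)) / expected


# Packets 3 and 6 never arrived; packet 5 arrived before packet 4.
seen = [0, 1, 2, 5, 4, 7, 8, 9]
print(f"estimated loss: {loss_fraction(seen):.0%}")
```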
Latency is the time delay between an event occurring and the remote
end seeing it. Latency is introduced both by the encoding/decoding process,
and hence depends on the equipment used, and also by the time it takes packets
to traverse the network. There is usually little you can do to change the
network latency on any large scale, beyond getting directly involved with a
carrier or a research network. Optimally, a connection from the United States
to Europe on a fiber optic network will have roughly 90ms (milliseconds) of
latency. The same connection routed over a satellite can have 200ms.
Excessive latency increases the chances of people "talking over one another"
because they don't realize that the person at the other end has started
speaking too. This is less significant in calls with less than 50ms
of network latency. It can become very troublesome in calls with more
than 150ms. Another problem is that the latency for the audio and video
may be different, so lip movements don't appear synchronized with the audio.
This is a function of both the terminal and the network, and can vary
dramatically; some products try to compensate for it. You should experiment
to see if it is an issue for your applications.
Jitter is the random variation in latency due to things like competing
processes running on the terminal (for example on your desktop PC), other
traffic temporarily blocking the path through routers along the way, or even
the network path changing during a videoconference. This random variation is
one of several things that cause packets to arrive out of order from their
transmitted order. Jitter results in uneven and unpredictable quality within
a videoconference, and the end-station client will try to compensate for it by
buffering the traffic up to some finite time before playing it out to you.
This increases the latency even further.
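One common way to put a number on jitter is the running interarrival estimate that RTP defines in RFC 3550 (section 6.4.1). The sketch below follows that formula but makes assumptions of its own: the function name interarrival_jitter is hypothetical, times are in milliseconds rather than RTP timestamp units, and sender and receiver clocks only need to tick at the same rate, since any constant offset cancels out.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    For each consecutive pair of packets, D is the change in transit time;
    the estimate moves 1/16 of the way toward |D| on every packet.
    """
    jitter = 0.0
    prev_transit = None
    for sent, received in zip(send_times_ms, recv_times_ms):
        transit = received - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


# Packets sent every 20ms; network delay wanders between 90ms and 120ms.
sent = [0, 20, 40, 60, 80]
received = [90, 125, 130, 180, 172]
print(f"jitter estimate: {interarrival_jitter(sent, received):.2f}ms")
```

A client might size its playout buffer from an estimate like this, which is why more jitter usually means more added latency.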
Policies are introduced by things like firewalls and network
address translation (NAT) devices that are generally used to try to hide
or protect network elements from the wider Internet. H.323 uses dynamically
allocated ports and is thus not very firewall-friendly.