An Annotated Bibliography

CS 598, Internet Telephony


Beyond VoIP Protocols by Olivier Hersent, Jean-Pierre Petit, and David Gurle, Wiley, 2005.

This book covers what needs to be done to advance beyond a basic, end-to-end Internet telephone system, particularly with respect to protocols.

Digital Telephony, third edition by John Bellamy, Wiley, 2000.

By the early 90s, about the same time the Internet was taking off as a public utility, the American public telephone system completed its change-over from analog to digital within the circle of central offices. This book describes what the change-over from analog to digital entailed.

Digital Telephony Over Cable by D. R. Evans, Addison-Wesley, 2001.

Covers PacketCable, CableLabs’ set of standards specifying a two-way digital communications system for cable TV systems.

Internet Telephony edited by Lee McKnight, William Lehr and David Clark, MIT Press, 2001.

A somewhat risky book that tries to think its way into the future of Internet-based telephony and communication systems more generally. “What goes around comes around” is probably the most useful thing to be thinking while reading this book.

Signaling and Switching for Packet Telephony by Matthew Stafford, Artech House, 2004.

What can be done once the bearer and control planes are separated into independent devices.

Voice over IP by Uyless Black, Prentice Hall, 2002.

A good introductory book, reasonably complete and occasionally deep. It will get you oriented in the VoIP landscape and set you up to explore further.

Voice over IP Fundamentals by Jonathan Davidson and James Peters, Cisco Press, 2000.

A book published, you will have noticed, by Cisco, designed to make technical managers comfortable and adept at constantly shoveling out more budget for Cisco boxes one size bigger than the ones they’ve already got.

VoIP Hacks by Ted Wallingford, O’Reilly, 2006.

A hodgepodge of tips & tools for Internet telephony.


Some of these papers are freely available; others require registration, which you get automatically if you access the link from within the domain. If you're outside the domain and can't get in, you have to be a member of the ACM or IEEE digital library (depending on the paper).

An Architecture for Residential Internet Telephony Service by Christian Huitema, Jane Cameron, Petros Mouchtaris and Darek Smyk in IEEE Internet Computing, May-June 1999 (v. 3, n. 3).

An Internet-telephony architecture should be able to handle millions of end-points, integrate seamlessly with the public telephone network (PTN), including SS7 support, and be as reliable as the PTN. Given the dissimilarities between the Internet and the PTN, the architecture should be gateway-based, including a residential gateway, a trunking gateway, user agents, and the usual media gateways.

An Architecture for Secure VoIP and Collaboration Applications by Dimitris Zisiadis, Spyros Kopsidas and Leandros Tassiulas in the Third International Workshop on Security, Privacy and Trust in Pervasive and Ubiquitous Computing, 19 July 2007.

VoIP and collaboration Internet applications usually require registration in a central user database, and they either use two bridged client-server connections between the end users and the server or allow direct client connections. Biometric-based procedures followed by the VoIPSec (voice interactive personalized security) protocol can provide end-to-end security for such applications. This approach doesn’t need a trusted third-party authentication authority.

Anti-Vamming Trust Enforcement in Peer-to-Peer VoIP Networks by Nilanjan Banerjee, Samir Saklikar and Subir Saha in Proceedings of the 2006 International Conference on Wireless Communications and Mobile Computing.

I send you a letter and seal it with a wax imprint. You trust the letter came from me because the name and wax imprint match. Let my name be a bit string n and the wax imprint be another bit string w with the property that prefix(h(w), t) = prefix(n, t), where prefix(b, k) is the first (leftmost) k bits of the bit string b, h() is a secure hash function, and t is a non-negative integer. Because h() is computationally infeasible to invert, finding a wax imprint for which t is large is expensive (about 2^t tries on average), so wax imprints with large t values are more trustworthy (in some sense) than wax imprints with small t values. Using a public key from a public-key cryptosystem as my name provides authentication when I sign the wax imprint with my private key.
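
The wax-imprint search can be sketched as a brute-force loop. This is a minimal illustration, assuming SHA-256 stands in for h() and that candidate imprints are the name followed by a counter; the helper names (prefix_bits, verify_imprint, find_imprint) are hypothetical, not the paper's.

```python
import hashlib
import itertools


def prefix_bits(data: bytes, k: int) -> str:
    """Return the leftmost k bits of data as a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in data)[:k]


def h(w: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the secure hash h().
    return hashlib.sha256(w).digest()


def verify_imprint(name: bytes, w: bytes, t: int) -> bool:
    """Check prefix(h(w), t) == prefix(n, t); checking costs a single hash."""
    return prefix_bits(h(w), t) == prefix_bits(name, t)


def find_imprint(name: bytes, t: int) -> tuple[bytes, int]:
    """Try candidate imprints until one matches; expect about 2**t attempts."""
    if t > 8 * len(name):
        raise ValueError("t cannot exceed the length of the name in bits")
    for attempts in itertools.count(1):
        # Hypothetical candidate scheme: the name followed by an 8-byte counter.
        w = name + attempts.to_bytes(8, "big")
        if verify_imprint(name, w, t):
            return w, attempts
    raise AssertionError("unreachable")  # itertools.count never stops


name = hashlib.sha256(b"alice's public key").digest()
w, attempts = find_imprint(name, 12)
print(verify_imprint(name, w, 12), attempts)
```

Verification costs one hash while the search costs roughly 2^t hashes; that asymmetry is what makes an imprint with a large t worth trusting.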

Building Trustworthy Systems: Lessons from the PTN and Internet by Fred Schneider, Steven Bellovin and Alan Inouye in IEEE Internet Computing, November-December 1999 (v. 3, n. 6).

The Internet and the public telephone network (PTN) have different ways of being attacked; skills learned on one network don’t transfer to the other. However, their increasing integration makes each an ingress for attacks on the other. The PTN’s eroding monopoly status and the Internet’s increasing commercialization give rise to a cloud of diverse, minimally cooperative agents whose actions make matters worse. What can go wrong is well known; what is to be done isn’t clear.

Critical VPN Security Analysis and New Approach for Securing VoIP Communications over VPN Networks by Wafaa Diab, Samir Tohme and Carole Bassil in Proceedings of the 3rd ACM Workshop on Wireless Multimedia Networking and Performance Modeling.

Many VoIP security attacks can be frustrated using encryption. VPNs are a standard mechanism for encrypting Internet traffic, but they are oriented toward non-real-time data streams. VPN encryption for VoIP should support real-time traffic using IP Security (IPsec) mechanisms and guarantee performance and quality of service without reducing the effective bandwidth.

Decentralizing SIP by David Bryan and Bruce Lowekamp in ACM Queue, March 2007 (v. 5, n. 2).

A peer-to-peer (p2p) overlay network responds naturally to network connectivity and membership changes at the cost of introducing uncertainty about network state. Hybrid p2p networks impose some structure (a distributed hash table, for example) to reduce the uncertainty, at the cost of increasing the effort required to maintain the network. Session Initiation Protocol (SIP) overlay networks are mostly distributed except for a few centralized services such as registration. Moving a SIP network to a pure p2p network would make the formerly centralized services unacceptably expensive, but a hybrid p2p network may provide an appropriate trade-off between the ability to react naturally to network-configuration changes and the cost of providing formerly centralized services.
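
The hybrid idea of replacing a central registrar with a distributed hash table can be sketched as consistent hashing of SIP addresses-of-record onto a ring of peers. This is a minimal sketch under stated assumptions: identifiers are SHA-1 hashes truncated to 32 bits, every peer knows the whole ring, and the class and peer names are hypothetical. Hop-by-hop routing, joins, leaves, and replication (where the maintenance effort actually goes) are omitted.

```python
import bisect
import hashlib
from typing import Optional


def ring_id(name: str, bits: int = 32) -> int:
    """Map a name onto a ring of 2**bits identifiers (SHA-1, truncated)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class DhtRegistrar:
    """Hypothetical hybrid-p2p registrar: a peer stores the registrations whose
    keys fall after its predecessor's ID, up to and including its own ID."""

    def __init__(self, peers: list[str]):
        self.ring = sorted((ring_id(p), p) for p in peers)
        self.ids = [peer_id for peer_id, _ in self.ring]
        self.store: dict[str, dict[str, str]] = {p: {} for p in peers}

    def responsible_peer(self, aor: str) -> str:
        # Successor of the key: first peer ID >= key, wrapping around to the start.
        idx = bisect.bisect_left(self.ids, ring_id(aor)) % len(self.ids)
        return self.ring[idx][1]

    def register(self, aor: str, contact: str) -> None:
        """Store an address-of-record -> contact binding at the responsible peer."""
        self.store[self.responsible_peer(aor)][aor] = contact

    def lookup(self, aor: str) -> Optional[str]:
        """Ask the responsible peer for the current contact, if any."""
        return self.store[self.responsible_peer(aor)].get(aor)


peers = ["peer-a.example.net", "peer-b.example.net", "peer-c.example.net"]
registrar = DhtRegistrar(peers)
registrar.register("sip:alice@example.com", "sip:alice@192.0.2.10")
print(registrar.responsible_peer("sip:alice@example.com"))
print(registrar.lookup("sip:alice@example.com"))
```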

The Delay-Friendliness of TCP by Eli Brosh, Salman Abdul Baset, Dan Rubenstein and Henning Schulzrinne in Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems.

Despite admonitions not to, many real-time Internet applications use TCP for data transport. How does that work out for them? A Markov-chain model validated by network simulations shows that low packet-loss rates produce small (< 1 sec.) TCP delays, that as the loss rate increases the RTT should decrease to compensate, and that large streams (500 kb/s video) are more affected than small streams (64 kb/s audio). Also, apart from the usual parameter games (big window size, no Nagle, no byte counting, use SACK and so on), splitting large packets into small ones may help the stream but hurt the network, and using parallel streams helps a lot.
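
The per-socket part of those parameter games looks roughly like this with Python's standard `socket` module. SACK and byte counting are usually system-wide kernel settings rather than socket options, so they are not shown; the 256 KiB buffer size and the function name are illustrative assumptions.

```python
import socket


def make_realtime_tcp_socket(buffer_bytes: int = 256 * 1024) -> socket.socket:
    """Create a TCP socket tuned for latency-sensitive media (a sketch)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm so small voice frames go out immediately
    # instead of being held back while earlier data is unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Request larger buffers; the receive buffer bounds the advertised window.
    # Set these before connect(), since window scaling is negotiated in the
    # handshake. The kernel may round, double, or cap the requested values.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


sock = make_realtime_tcp_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
sock.close()
```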

The Economics of the Internet: Utility, Utilization, Pricing and Quality of Service by Andrew Odlyzko, AT&T Research, 7 July 1998.

Can throwing bandwidth at the Internet solve congestion problems? Can it solve congestion problems as efficiently and effectively as other approaches, such as various quality of service (QoS) regimes? Many people say no, but it’s not clear why that’s the correct answer.

The Effect of Packet Dispersion on Voice Applications in IP networks by Hanoch Levy and Haim Zlatokrilov in IEEE/ACM Transactions on Networking, April 2006 (v. 14, n. 2).

Defines the noticeable packet loss (NPL) metric, which weights packet losses occurring close together more heavily than dispersed losses (that is, bursty over Bernoulli loss), and then models how dispersing packets over multiple routes affects NPL. Packets are distributed among routes randomly, cyclically, or round-robin. Route diversity does improve NPL, but the assumptions used to carry out the analysis (particularly about independence and receive-side packet handling) give one pause.

Enabling SIP-Based Sessions in Ad Hoc Networks by Nilanjan Banerjee, Arup Acharya and Sajal Das in Wireless Networks, August 2007 (v. 13, n. 4).

Session Initiation Protocol (SIP) servers running in the Internet have a relatively stable infrastructure on which to build an overlay network for endpoint discovery and session establishment. Ad hoc networks do not provide a stable infrastructure and require extra techniques to support SIP-based overlay networks. One technique, the loosely coupled approach, relies on the underlying ad hoc routing and provides endpoint discovery. Another technique, the tightly coupled approach, includes session establishment by defining a virtual topology among clusters of end-points. Simulations show that tight coupling is better (has lower latency) in stable networks while loose coupling is better in dynamic networks. In all cases, the extra structure provided by tight coupling results in less control overhead than loose coupling incurs.

End-To-End Arguments in System Design by Jerome Saltzer, David Reed and David Clark in ACM Transactions on Computer Systems, November 1984 (v. 2, n. 4).

What services should a network provide? The end-to-end argument answers this question by assuming each service added to the network is enormously expensive and requires showing that the enormous expense will be amortized over all network users. If that totalizing amortization can’t be carried out, the feature doesn’t belong in the network.

From POTS to PANS: A Commentary on the Evolution to Internet Telephony by Christos Polyzois, Hal Purdy, Ping-Fai Yang, David Shrader, Henry Sinnreich, François Ménard and Henning Schulzrinne in IEEE Internet Computing, May-June 1999 (v. 3, n. 3).

The Internet has a structure significantly different from that of the public telephone network (PTN), both in the network and at the end-points. At least initially, Internet phone services will echo those of the PTN, raising the question of what should be brought over from the PTN and what should be reconsidered. The PTN’s Intelligent Network infrastructure is the most likely contact point for IP networks, both as a way to use existing PTN services and functions and as a way to hook in new Internet-based services.

Guaranteeing Multiple QoSs in Differentiated Services Internet by Hoon Lee, Hyejin Kwon and Yoshiaki Nemoto in Proceedings of the Seventh International Conference on Parallel and Distributed Systems.

An architecture to guarantee multiple Qualities of Service (QoSs) that accommodates both the IETF’s Differentiated Services (DiffServ) architecture and user applications’ requirements. A prioritized packet service scheme using weighted round-robin in the core router supports weighted priority services for the three IETF service classes: EF (Expedited Forwarding), AF (Assured Forwarding) and DF (Default Forwarding).
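
A weighted round-robin scheduler over the three classes can be sketched as follows. The weights (2 for EF, 1 each for AF and DF) and the queue contents are illustrative assumptions, not the paper's values, and a real core router would serve its queues continuously as packets arrive rather than draining a fixed batch.

```python
from collections import deque


def weighted_round_robin(queues: dict[str, deque], weights: dict[str, int]) -> list:
    """Each round, serve up to weights[c] packets from class c (in dict order)
    until every queue is empty. A zero weight would starve its class forever."""
    if any(weights[c] < 1 for c in queues):
        raise ValueError("every class needs a weight of at least 1")
    served = []
    while any(queues.values()):
        for cls, queue in queues.items():
            for _ in range(weights[cls]):
                if not queue:
                    break
                served.append(queue.popleft())
    return served


queues = {
    "EF": deque(["ef1", "ef2", "ef3"]),  # e.g. voice
    "AF": deque(["af1", "af2"]),
    "DF": deque(["df1", "df2"]),         # best effort
}
print(weighted_round_robin(queues, {"EF": 2, "AF": 1, "DF": 1}))
# ['ef1', 'ef2', 'af1', 'df1', 'ef3', 'af2', 'df2']
```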

Holistic VoIP Intrusion Detection and Prevention System by Mohamed Nassar, Saverio Niccolini, Radu State and Thilo Ewald in Proceedings of the First International Conference on Principles, Systems and Applications of IP Telecommunications, 2007.

Bruce Schneier often points out that several flexible, lightweight security layers can combine to provide better overall security than a single, heavily armored bastion. Holistic VoIP security illustrates Schneier’s point by using two layers to provide VoIP security. The first layer is a VoIP honeypot that collects and analyzes data on attacks. The second layer is an event correlator that observes a working VoIP system and flags operation sequences that seem suspicious.
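
One kind of rule an event correlator might apply is a sliding-window count of failed registrations per source, a rough signature of password guessing. The rule, its thresholds, and the class name below are hypothetical illustrations, not the paper's rule set.

```python
from collections import defaultdict, deque


class RegisterFailureCorrelator:
    """Flag a source that racks up too many failed REGISTERs within a sliding
    time window. A hypothetical rule, not the paper's rule set."""

    def __init__(self, max_failures: int = 5, window_s: float = 10.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self.failures = defaultdict(deque)  # source address -> failure timestamps

    def observe(self, timestamp: float, source: str, method: str, status: int) -> bool:
        """Feed one SIP transaction; return True if the source now looks suspicious."""
        # A single 401 is a normal digest-authentication challenge, which is
        # why the rule only fires once failures exceed a threshold.
        if method != "REGISTER" or status not in (401, 403):
            return False
        window = self.failures[source]
        window.append(timestamp)
        # Drop failures that have slid out of the time window.
        while timestamp - window[0] > self.window_s:
            window.popleft()
        return len(window) > self.max_failures


correlator = RegisterFailureCorrelator(max_failures=3, window_s=10.0)
alerts = [correlator.observe(t, "198.51.100.7", "REGISTER", 401) for t in range(5)]
print(alerts)  # [False, False, False, True, True]
```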

Integrating Internet Telephony Services by Wenyu Jiang, Jonathan Lennox, Sankaran Narayanan, Henning Schulzrinne, Kundan Singh and Xiaotao Wu in IEEE Internet Computing, May-June 2002 (v. 6, n. 3).

Cinema (Columbia Internet extensible multimedia architecture) is a SIP-based subsystem that hosts various multimedia facilities such as conferencing (bridging), streaming media, unified voice messaging, and address resolution. Cinema integrates with existing voice networks and end-points via SIP proxies and gateways.

Integration of Call Signaling and Resource Management for IP Telephony by Pawan Goyal, Albert Greenberg, Charles Kalmanek, William Marshall, Partho Mishra, Doug Nortz and K. Ramakrishnan in IEEE Internet Computing, May-June 1999 (v. 3, n. 3).

An IP network usually has computing devices of varying power serving as end-points and network nodes. A signaling architecture for such a network should be distributed, so work can be performed at the most appropriate location, and open, so new services and reimplementations of old services can easily be added. Distribution requires scheduling to determine which locations are appropriate and to dispatch work to those locations; QoS issues (such as packet loss, delay, and jitter) can be a first-cut driver for making scheduling decisions.

A Modular Architecture for Providing Carrier-Grade SIP Telephony Services by Hechmi Khlifi and Jean-Charles Grégoire in the Third IEEE International Conference on Wireless and Mobile Computing.

A modular, flexible and scalable architecture to provide mass-market telephony services in SIP environments. The architecture uses Parlay, a standard, object-oriented, signaling-protocol-neutral API, together with SIP to separate application logic from network functions and, at the network level, signaling from media processing.

Peer-to-Peer Internet Telephony Using SIP by Kundan Singh and Henning Schulzrinne in Proceedings of the International Workshop on Network and Operating Systems Support for Digital Audio and Video, 13–14 June 2005, pages 63–68.

Internet telephony (IT) networks embedded in the Internet have the usual tree-shaped hierarchy. An alternative structure flattens IT subtrees (domains) into peer sets with no hierarchy. A flat domain should improve reliability and accommodate change more easily, while making resources harder to find. Session Initiation Protocol servers in a flat domain can run a peer-to-peer (P2P) network protocol, such as Chord or Content-Addressable Network, to organize themselves. However, typical P2P services are latency tolerant and exploit resource replication, while IT services are latency intolerant and can’t easily replicate many resources (end users and databases, for example). P2P security and economics models also match poorly with the equivalent IT models.
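
Chord's per-node routing state can be sketched in a few lines. This assumes small integer node identifiers on a ring of 2^m IDs (real deployments derive identifiers by hashing, for example SHA-1 of a server's address); each node keeps m "fingers", where finger i points at the first node at or after n + 2^i, which lets a lookup roughly halve the remaining ring distance per hop and finish in O(log N) hops. The function names are illustrative.

```python
def successor(node_ids: list[int], key: int, m: int) -> int:
    """First node ID at or clockwise after key on a ring of 2**m identifiers."""
    key %= 2 ** m
    at_or_after = [n for n in node_ids if n >= key]
    return min(at_or_after) if at_or_after else min(node_ids)  # wrap past zero


def finger_table(node_ids: list[int], n: int, m: int) -> list[int]:
    """Finger i of node n points at successor(n + 2**i), for i = 0 .. m-1."""
    return [successor(node_ids, n + 2 ** i, m) for i in range(m)]


nodes = [0, 1, 3]  # a 3-bit ring (identifiers 0..7) with three SIP servers
for n in nodes:
    print(n, finger_table(nodes, n, 3))
# 0 [1, 3, 0]
# 1 [3, 3, 0]
# 3 [0, 0, 0]
```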

Programming Internet Telephony Services by Jonathan Rosenberg, Jonathan Lennox and Henning Schulzrinne in IEEE Internet Computing, May-June 1999 (v. 3, n. 3).

A control plane full of SIP servers can be induced to provide new services using a CGI-like mechanism. New services are implemented as programs independent of SIP servers and then invoked as independent processes by SIP servers when the service is needed. A call-processing language, circumscribed in its abilities to limit dangerous operations and to make it statically checkable, makes it possible for end-users to implement custom services.
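
The idea of a circumscribed, statically checkable call-processing language can be sketched with a tiny tree-shaped script format. This is a hypothetical mini-language for illustration only, not the syntax of the actual Call Processing Language (an XML format specified in RFC 3880): scripts may only switch on a caller or callee address and end in one of three actions, and a checker rejects anything else before the script ever runs.

```python
ALLOWED_ACTIONS = {"proxy", "redirect", "reject"}
ALLOWED_FIELDS = {"from", "to"}


def check(node: dict, depth: int = 0, max_depth: int = 16) -> None:
    """Statically validate a script tree before it is ever run."""
    if depth > max_depth:
        raise ValueError("script too deep (or cyclic)")
    if "action" in node:
        if node["action"] not in ALLOWED_ACTIONS:
            raise ValueError(f"forbidden action: {node['action']!r}")
        return
    if node.get("switch") not in ALLOWED_FIELDS:
        raise ValueError(f"cannot switch on: {node.get('switch')!r}")
    for child in [*node["cases"].values(), node["otherwise"]]:
        check(child, depth + 1, max_depth)


def run(node: dict, call: dict) -> tuple:
    """Walk from the root to a leaf action. Once check() has passed, every
    path is bounded by max_depth, so this loop always halts."""
    while "action" not in node:
        node = node["cases"].get(call[node["switch"]], node["otherwise"])
    return node["action"], node.get("to")


script = {
    "switch": "from",
    "cases": {"sip:boss@example.com": {"action": "proxy", "to": "sip:me@desk.example.com"}},
    "otherwise": {"action": "redirect", "to": "sip:me@voicemail.example.com"},
}
check(script)
print(run(script, {"from": "sip:boss@example.com", "to": "sip:me@example.com"}))
```

Keeping the language loop-free and free of arbitrary operations is what makes the checker possible: it can visit every node and reject the script outright, rather than having to watch it run.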

Providing Emergency Services in Internet Telephony by Henning Schulzrinne and Knarig Arabshian in IEEE Internet Computing, May-June 2002 (v. 6, n. 3).

Emergency communications systems impose new requirements, such as universal numbering, call routing, and caller number and location identification, as well as the usual performance and reliability requirements on IP-based voice-service networks. Replicating the emergency PSTN architecture is (relatively) straightforward, but an IP network’s modular, service-based structure allows for new architectures with better flexibility and scalability.

Real-Time Voice Communication over the Internet Using Packet Path Diversity by Yi Liang, Eckehard Steinbach and Bernd Girod in Proceedings of the Ninth ACM International Conference on Multimedia, pages 431–440.

The quality of real-time voice communication over best-effort networks is mainly determined by the delay and loss characteristics observed along the network path. Excessive playout buffering at the receiver is prohibitive and significantly delayed packets have to be discarded and considered as late loss. We propose to improve the tradeoff among delay, late loss rate, and speech quality using multi-stream transmission of real-time voice over the Internet, where multiple redundant descriptions of the voice stream are sent over independent network paths. Scheduling the playout of the received voice packets is based on a novel multi-stream adaptive playout scheduling technique that uses a Lagrangian cost function to trade delay versus loss. Experiments over the Internet suggest largely uncorrelated packet erasure and delay jitter characteristics for different network paths which leads to a noticeable path diversity gain. We observe significant reductions in mean end-to-end latency and loss rates as well as improved speech quality when compared to FEC protected single-path transmission at the same data rate. In addition to our Internet measurements, we analyze the performance of the proposed multi-path voice communication scheme using the ns network simulator for different network topologies, including shared network links.
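
The delay-versus-loss trade-off behind multi-stream playout can be sketched with a simplified Lagrangian cost, D + λ·L(D), where D is a candidate playout delay and L(D) is the fraction of packets not available by then. This assumes per-packet network delays (in ms) are known for each path, counts a packet as on time if any copy arrives by D, and uses a plain grid search over candidates; the paper's actual cost function and scheduler are more elaborate, and the function names and delay values are hypothetical.

```python
from typing import Optional, Sequence

Delay = Optional[float]  # network delay in ms, or None if the packet was lost


def effective_delays(paths: Sequence[Sequence[Delay]]) -> list[Delay]:
    """For each packet, the delay of its earliest-arriving copy over all paths."""
    result: list[Delay] = []
    for copies in zip(*paths):
        arrived = [d for d in copies if d is not None]
        result.append(min(arrived) if arrived else None)
    return result


def late_loss_rate(delays: Sequence[Delay], playout_delay: float) -> float:
    """Fraction of packets lost outright or arriving after their playout time."""
    late = sum(1 for d in delays if d is None or d > playout_delay)
    return late / len(delays)


def choose_playout_delay(paths, candidates, lam):
    """Pick the candidate delay minimizing the Lagrangian cost D + lam * L(D)."""
    delays = effective_delays(paths)
    return min(candidates, key=lambda d: d + lam * late_loss_rate(delays, d))


path_a = [40, 45, None, 120, 42]   # hypothetical measured delays (ms)
path_b = [60, None, 55, 58, None]
candidates = [40, 50, 60, 80]
print(choose_playout_delay([path_a, path_b], candidates, lam=100))  # 60
print(choose_playout_delay([path_a], candidates, lam=100))          # 50
```

With both paths, a 60 ms playout delay loses nothing; path A alone would still lose 40% of its packets at 60 ms, so the optimizer settles for 50 ms and accepts the losses.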

SCTP: A Proposed Standard for Robust Internet Data Transport by Armando Caro, Jr., Janardhan Iyengar, Paul Amer, Sourabh Ladha, Gerard Heinz, II and Keyur Shah in IEEE Computer, November 2003 (v. 36, n. 11).

The Stream Control Transmission Protocol (SCTP) provides associations between processes on hosts; each association contains one or more unidirectional streams. SCTP provides flow- and congestion-controlled, reliable, message-oriented transport; each packet is a mixture of control and data chunks. SCTP end-points can be multihomed, straddling several IP addresses on each host; set-up uses a four-way handshake to resist SYN-flooding attacks, and tear-down uses a three-way exchange for speed (eliminating TCP’s half-close semantics).
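
The SYN-flood resistance of SCTP's four-way handshake (INIT, INIT ACK, COOKIE ECHO, COOKIE ACK) rests on a signed State Cookie: the listener packs what it needs into the cookie it returns in INIT ACK, signs it, and allocates no association state until the cookie comes back in COOKIE ECHO. A minimal sketch, assuming JSON encoding and HMAC-SHA-256 as stand-ins; the real cookie is an opaque, implementation-specific binary blob, and the function names here are hypothetical.

```python
import hashlib
import hmac
import json
import os
import time
from typing import Optional

SECRET_KEY = os.urandom(32)              # known only to the listening endpoint
MAC_LEN = hashlib.sha256().digest_size   # 32 bytes


def make_state_cookie(peer: str, peer_tag: int, now: Optional[float] = None,
                      lifetime_s: float = 60.0) -> bytes:
    """Build the State Cookie sent in INIT ACK; the listener keeps no state."""
    now = time.time() if now is None else now
    state = json.dumps({"peer": peer, "tag": peer_tag, "expires": now + lifetime_s},
                       sort_keys=True).encode()
    return state + hmac.new(SECRET_KEY, state, hashlib.sha256).digest()


def accept_cookie_echo(cookie: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Validate a COOKIE ECHO; only now would association state be created."""
    now = time.time() if now is None else now
    state, mac = cookie[:-MAC_LEN], cookie[-MAC_LEN:]
    expected = hmac.new(SECRET_KEY, state, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return None  # forged or corrupted cookie
    fields = json.loads(state)
    if now > fields["expires"]:
        return None  # stale cookie
    return fields


cookie = make_state_cookie("192.0.2.1:5060", 1234, now=1000.0)
print(accept_cookie_echo(cookie, now=1010.0))
```

A flood of forged INITs costs the listener only an HMAC computation per packet, with no memory held for half-open associations.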

Security Issues with the IP Multimedia Subsystem (IMS) by Michael Hunter, Russ Clark and Frank Park in Workshop on Middleware for Next-generation Converged Networks and Applications, Newport Beach, California, 26–30 November 2007.

The IP Multimedia Subsystem (IMS) is designed to support convergent services comprising voice and data. The discussion of IMS security and related issues covers all the usual suspects (QoS, billing, services, regulation, security) from the providers’ and users’ perspectives. Apart from a new, more complex architecture, IMS-relevant consideration of these areas will be familiar to those with experience in other areas of Internet-based subsystem design.

Security Patterns for Voice over IP Networks by Eduardo Fernandez, Juan Pelaez and Maria Larrondo-Petrie in Proceedings of the International Multi-Conference on Computing in the Global Information Technology, 4–9 March 2007.

The grand convergence of voice, video and data on VoIP networks is a source of great hope, but also a source of security concerns due to the lack of isolation between the bit streams. Various system structures, described as software patterns, can re-establish isolation to improve security. The patterns involve encryption, network segmentation, tunneling, and authentication.

The Session Initiation Protocol: Internet-Centric Signaling by Henning Schulzrinne and Jonathan Rosenberg in IEEE Communications Magazine, October 2000 (v. 38, n. 10).

The Session Initiation Protocol (SIP) provides signaling and control for multimedia services. SIP locates resources based on a location-independent name and negotiates session characteristics. It can be used for Internet telephony and conferencing, instant messaging, event notification, and the control of networked devices. SIP is a typical IETF protocol: text-based, line-oriented, request-response. Designed to be extensible, SIP has been extended in several ways to define new services (instant messaging, for example) and features (authentication, for example).
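
What "text-based, line-oriented, request-response" means in practice can be seen by splitting a SIP request into its start line, headers, and body. This is a deliberately simplified sketch, assuming well-formed CRLF-delimited input; it ignores compact header names, header line folding, and responses. The message is adapted from the Alice-to-Bob INVITE example in RFC 3261, with the SDP body dropped, and the function name is hypothetical.

```python
def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into start line, headers, and body (simplified)."""
    head, _, body = raw.partition("\r\n\r\n")   # a blank line ends the headers
    start_line, *header_lines = head.split("\r\n")
    method, request_uri, version = start_line.split(" ", 2)
    headers: dict[str, list[str]] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lower case.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return {"method": method, "uri": request_uri, "version": version,
            "headers": headers, "body": body}


INVITE = "\r\n".join([
    "INVITE sip:bob@biloxi.com SIP/2.0",
    "Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "To: Bob <sip:bob@biloxi.com>",
    "From: Alice <sip:alice@atlanta.com>;tag=1928301774",
    "Call-ID: a84b4c76e66710@pc33.atlanta.com",
    "CSeq: 314159 INVITE",
    "Contact: <sip:alice@pc33.atlanta.com>",
    "Content-Length: 0",
]) + "\r\n\r\n"

msg = parse_sip_request(INVITE)
print(msg["method"], msg["uri"], msg["headers"]["cseq"])
```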

A SIP-Based Conference Control Framework by Petri Koskelainen, Henning Schulzrinne and Xiaotao Wu in Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video.

Conference services in Internet-telephony (IT) systems should be implemented in a way consistent with IT to reap the benefits of such systems. SIP-based coordination using SOAP provides the mechanisms for conference and floor control. Central SIP servers and unicast should be good enough for small conferences, but larger conferences probably require distributed servers or multicast or both.

SOVoIP: Middleware for Universal VoIP Connectivity by M. J. Arif, S. Karunasekera and S. Kulkarni in 8th ACM/IFIP/USENIX International Conference on Middleware.

VoIP has a number of protocols that don’t interoperate, but instead are coordinated by protocols such as SIP or H.323. For some reason, SIP and H.323 don’t look enough like middleware, so maybe they can be replaced (or supplemented, it isn’t clear) by CORBA or web services. Naturally CORBA is right out, because of its difficulties with firewalls and its performance, leaving web services in the form of Service Oriented VoIP (SOVoIP). Just to make sure, SOVoIP is shown to perform better than CORBA, but no comparisons are made with SIP or H.323.

Terminating Telephony Services on the Internet by Vijay Gurbani and Xian-He Sun in IEEE/ACM Transactions on Networking, August 2004 (v. 12, n. 4).

How to originate a service in the telephone network and terminate it in an Internet-based network using standard protocols (SIP, HTTP, XML) and a publish-subscribe architecture. The desire to avoid middleware is admirable, but requiring direct access to signaling is troubling. It’s also unclear whether the same architecture can apply in the Internet-to-telephone direction.

Time Synchronization for VoIP Quality of Service by Hugh Melvin and Liam Murphy in IEEE Internet Computing, May-June 2002 (v. 6, n. 3).

Effectively handling time-sensitive voice playout over the Internet requires good and stable information about end-to-end delays. Relatively simple estimation at the receiver’s end works well as long as the estimates don’t drift too rapidly. Time synchronization via GPS provides a uniform, stable time signal end-points can use to produce accurate, stable delay measurements.
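
Receiver-side delay estimation is often done with an exponentially weighted moving average plus a variation term. The sketch below follows the classic Ramjee-style estimator (smoothing factor α = 0.998002, playout delay = estimate + 4 × variation); that choice, the class name, and the timestamps in the usage lines are assumptions for illustration, not necessarily the method this paper uses. The comment on `n` marks where clock synchronization matters.

```python
from typing import Optional


class PlayoutDelayEstimator:
    """EWMA one-way-delay estimator with a variation term (Ramjee-style sketch)."""

    def __init__(self, alpha: float = 0.998002):
        self.alpha = alpha
        self.d_hat: Optional[float] = None  # smoothed one-way delay (seconds)
        self.v_hat = 0.0                    # smoothed delay variation (seconds)

    def update(self, send_ts: float, recv_ts: float) -> float:
        """Feed one packet's timestamps; return the suggested playout delay."""
        # This is the true one-way delay only if sender and receiver clocks are
        # synchronized (e.g. via GPS); otherwise it also contains the clock
        # offset, and any relative clock drift shows up as a drifting delay.
        n = recv_ts - send_ts
        if self.d_hat is None:
            self.d_hat = n
        else:
            self.d_hat = self.alpha * self.d_hat + (1 - self.alpha) * n
            self.v_hat = self.alpha * self.v_hat + (1 - self.alpha) * abs(self.d_hat - n)
        return self.d_hat + 4 * self.v_hat


est = PlayoutDelayEstimator()
for i in range(5):
    # Hypothetical stream: one packet every 20 ms, each taking 50 ms to arrive.
    playout = est.update(send_ts=i * 0.020, recv_ts=i * 0.020 + 0.050)
print(round(playout, 6))
```

Because α is close to 1, the estimate reacts slowly, which is fine for stable delays but lags behind a steadily drifting clock; that is the case where an external time reference such as GPS helps.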

Towards a new Security Architecture for Telephony by Carole Bassil, Ahmed Serhrouchni and Nicolas Rouhana in Proceedings of the International Conference on Networking, International Conference on Systems and International Conference on Mobile Communications and Learning Technologies (ICNICONSMCL ’06).

The telephone and VoIP networks place different emphases in their security policies and use different mechanisms to achieve them. This difference is yet another gap that has to be bridged in the networks’ grand convergence. However, rather than using gateways to translate between the security mechanisms, a shim layer in each network's protocol stack would allow each security mechanism to be translated to a common mechanism providing secure end-to-end voice communication.

Tussle in Cyberspace: Defining Tomorrow’s Internet by David Clark, John Wroclawski, Karen Sollins and Robert Braden in IEEE/ACM Transactions on Networking, June 2005 (v. 13, n. 3).

A tussle is a clash of interests among competing parties in a system. The Internet was designed and implemented in a relatively tussle-free environment; however, the Internet’s current popularity and importance have increased the number and diversity of competing parties and greatly increased the number of tussles, making the original design principles less useful than they once were. New design principles should recognize and identify places where tussles may occur and support late binding to allow a range of possible resolutions.

Ubiquitous Computing using SIP by Stefan Berger, Henning Schulzrinne, Stylianos Sidiroglou and Xiaotao Wu in Proceedings of the 13th International Workshop on Network and Operating Systems Support for Digital Audio and Video.

The Session Initiation Protocol (SIP) is an open, extensible, distributed, request-response infrastructure. Extending a SIP-based communication system with user-location information allows for services that follow you around and customize themselves to your location. Such an extension requires a subsystem for discovering user location, a subsystem for managing location information, and a subsystem for reacting to location state.

Unified Communications with SIP by Martin Steinmann in ACM Queue, March 2007 (v. 5, n. 2).

Proprietary PBXs are disappearing because standards-based and open-source Internet-telephony software, such as SIP-based systems, can provide similar services with more flexibility and at less cost, and is easy to extend to provide new services.

A Voice Over IP Service Architecture for Integrated Communications by Daniele Rizzetto and Claudio Catania in IEEE Internet Computing, May-June 1999 (v. 3, n. 3).

The unification of voice and data traffic in the Internet overshadows an increasing separation between the control and data parts of the network. Emphasizing the control-data separation can make it simpler to implement advanced services efficiently, as well as isolate each part from technological change in the other. A service architecture providing an abstract API to control the network preserves the advantages of separated control and data.

VoIP Security and Privacy Threat Taxonomy, VOIPSA, 24 October 2005.

All (most? some? a few?) of the goblins that could get you if you don’t watch out.

VoIP Security: Not an Afterthought by Douglas Sicker and Tom Lookabaugh in ACM Queue, September 2004 (v. 2, n. 6).

The things that make VoIP interesting and important (distributed operation, flexibility, openness) also make it hard to secure. One advantage is its Internet base, which comes with existing relevant security work and research.

VoIP: What is it Good For? by Sudhir Ahuja and Robert Ensor in ACM Queue, September 2004 (v. 2, n. 6).

A brief, high-level comparison between service implementation in the PSTN and over VoIP networks, mostly to the favor of VoIP networks. Recognizes the importance of service development to the future of VoIP networks, but then presents lame examples (click-to-dial web page links, persistent chat).

You Don’t Know Jack About VoIP by Phil Sherburne and Cary Fitzgerald in ACM Queue, September 2004 (v. 2, n. 6).

Voice over Internet shows great promise due to network flexibility and openness, but also presents great challenges given the service requirements for good-quality voice traffic, as well as management and security requirements.

This page last modified on 14 August 2008.