The Behavior Gap: Three Persistent Problems for Internet Technologies

By Andy Oram
July 16, 2008 | Comments: 1

Behind the competing technologies for Internet application development--which impinge directly on the plans of Internet providers and dot-com businesses--lie some basic problems with Internet standards and protocols. I often sense that the communities that develop Internet applications, as well as those who try to build businesses on them, are stuck. These problems go deep. Knowing the reasons for them, we might find ways to do better within their limitations. And we could at least avoid a lot of false paths and recriminations over failures.

Each technical problem is also a metaphor for difficulties in the way people interact, both online and off-line:

Too many to many

ISPs, customers, and major Internet sites are screaming at each other about large downloads. As many people have pointed out, the Internet's packet-switching design and best-effort delivery policy make it unsuitable for real-time streaming media. I covered these problems in two older articles, A Nice Way to Get Network Quality of Service? and Network Neutrality and an Internet with Vision.

But there's a more fundamental problem.

The broadcast model dominated the first century of electronic media because the model is so simple. The air waves create a wide, permanent, fixed channel between a central broadcaster and his grateful customers. The circuit-switched telephone was also simple, and immediately successful. We understand one-to-many and one-to-one communication models.

But the YouTube phenomenon and Web 2.0 assume a many-to-many model. We just don't have efficient techniques for handling that model, particularly for streaming media. It requires ad-hoc channels that can be erected quickly between people who don't have a pre-existing relationship. Packet switching has taken us amazingly far toward solutions, but current user activity is showing up its limitations.

It's worth noting that many-to-many models are hard for other computer technologies to handle, too. Relational databases offer one-to-one and one-to-many relationships, but have to cobble them together, typically through a junction table, to fake a many-to-many relationship.
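
To make that concrete, here is a minimal sketch of the usual workaround: a junction table that stitches two one-to-many relationships together. The students/courses schema is invented for the example, and it uses Python's sqlite3 module only because that makes a self-contained demonstration.

    import sqlite3

    # Throwaway in-memory database, just for illustration.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # The two "one" sides of the relationship.
    cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT)")

    # The junction table: a many-to-many relationship faked as two
    # one-to-many relationships that meet in the middle.
    cur.execute("""
        CREATE TABLE enrollments (
            student_id INTEGER REFERENCES students(id),
            course_id  INTEGER REFERENCES courses(id),
            PRIMARY KEY (student_id, course_id)
        )
    """)

    cur.execute("INSERT INTO students VALUES (1, 'alice'), (2, 'bob')")
    cur.execute("INSERT INTO courses VALUES (1, 'networks'), (2, 'databases')")
    cur.execute("INSERT INTO enrollments VALUES (1, 1), (1, 2), (2, 1)")

    # Every query that crosses the relationship has to route through
    # the extra table.
    cur.execute("""
        SELECT students.name, courses.title
        FROM students
        JOIN enrollments ON enrollments.student_id = students.id
        JOIN courses ON courses.id = enrollments.course_id
    """)
    print(cur.fetchall())

The many-to-many relationship exists only by convention; the database itself still sees nothing but pairs of one-to-many links.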

The many-to-many model doesn't scale in social terms either. It can be applied metaphorically to real life, where we're used to one-to-many relationships (with centralized government and business institutions) as well as one-to-one relationships. We build up many-to-many relationships in our schools, churches, and neighborhoods, but we don't really treat them as such because we rarely try to manage all the complex interrelationships in these institutions.

As many anthropologists have pointed out, groups of humans lose a sense of connection if they grow too large (although the studies disagree on an optimal size). I suspect that, if online groups behave differently from geographically close groups, their maximum effective size will prove to be even lower, unless the environment is radically enhanced with advanced visualization and virtualization.

We've made progress in tackling the many-to-many problem on the Internet through an investment in faster connections and a sophisticated use of caches and information sharing. But these are imperfect solutions to a problem we're reluctant to call by name.

No time like the present

The congestion and connection-handling features of the TCP protocol have been tweaked a lot over the decades and do a pretty good job. But they still have to react crudely to a very limited set of information.

The basic approach TCP takes to congestion is to slow down when the sending host notices time-outs indicating dropped packets. More recent versions of TCP also draw hints from the timing of acknowledgments over a series of packets. This is still a very localized form of information. TCP can't learn as much as routing protocols can learn (which is still not very much) about the state of the network beyond the hosts immediately connected to the local one.
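
To see just how little the sender has to go on, here is a toy sketch of the additive-increase, multiplicative-decrease logic at the heart of classic TCP congestion control. The window sizes and the loss pattern are invented for illustration; real implementations add fast retransmit, fast recovery, and many other refinements.

    # Toy model of TCP congestion control. The only signal the sender
    # gets about the whole path is whether its own packets timed out.

    def next_window(cwnd, ssthresh, timed_out):
        """Return (new_cwnd, new_ssthresh) after one round trip."""
        if timed_out:
            # A timeout is the only hint of congestion: back off sharply.
            return 1, max(cwnd // 2, 2)
        if cwnd < ssthresh:
            return cwnd * 2, ssthresh   # slow start: exponential growth
        return cwnd + 1, ssthresh       # congestion avoidance: linear growth

    cwnd, ssthresh = 1, 64
    losses = [False] * 8 + [True] + [False] * 5
    for rtt, lost in enumerate(losses):
        cwnd, ssthresh = next_window(cwnd, ssthresh, lost)
        print("RTT %2d: cwnd=%-3d ssthresh=%d" % (rtt, cwnd, ssthresh))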

Similarly, wireless mesh networks use packet loss to make routing decisions. Mesh networks resemble LANs in some ways and WANs in others. As I reported last November, algorithms that try to take the larger network structure into account have been tried and found lacking; they're less reliable at finding the best route than algorithms that just rely on packet loss.
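
The loss-based approach can be as simple as the ETX (expected transmission count) metric: probe each link, estimate its delivery ratio in both directions, and pick the route that needs the fewest expected transmissions. The sketch below uses invented delivery ratios, not the output of any real mesh implementation.

    # Sketch of the ETX idea: a link's cost is the expected number of
    # transmissions needed to get one packet delivered and acknowledged.

    def etx(forward_delivery, reverse_delivery):
        return 1.0 / (forward_delivery * reverse_delivery)

    def route_cost(links):
        # A route's cost is the sum of its links' ETX values.
        return sum(etx(df, dr) for df, dr in links)

    # Two candidate routes, each a list of (forward, reverse) delivery ratios.
    one_lossy_hop = [(0.6, 0.7)]
    two_clean_hops = [(0.95, 0.9), (0.9, 0.95)]

    print(route_cost(one_lossy_hop))    # about 2.38 expected transmissions
    print(route_cost(two_clean_hops))   # about 2.34 -- the longer route wins

Notice that nothing in the calculation looks beyond the links a node can probe directly.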

In short, decisions on bandwidth use and routing affect large parts of the network but are based only on a sliver of locally available information. This is like driving a hundred kilometers an hour in fog.

It's not surprising that similar problems turn up at the application layer. Theo Schlossnagle recently sounded a bit resigned about the difficulty of predicting traffic spikes on web sites.

There's a trivial observation behind all this: we don't know what will happen next. There could easily be a ragged discontinuity between the current state of the network (including, of course, our role in it) and the state of the network one minute from now.

The challenges of trying to design technical protocols around what will happen in the future can also be a metaphor for what life is like for all of us in the Internet age. The business consultants all say, "Look toward the next big trend." But this is chasing a mirage. We can't know the next big trend in an environment where new tools get posted to the Internet and new hardware gadgets get imported from other continents faster than we can read press releases about them.

The best individuals can do, it seems, is make local decisions, like the TCP protocol. They find the tools that meet their needs and make alliances with others who show similar interests.

The technologies' ignorance of the context they're operating in could also be a metaphor for the lack of context that hampers all our interactions on the Internet. A conference, a church function, a work environment, and even a summer resort all provide structures that allow us to make some assumptions about the people we meet there, as well as providing rules of engagement and etiquette. As is well known, such structures and rules are harder to establish online, and a community that succeeds in establishing them finds it much harder to convey them to newcomers. Newcomers are left trying to make the right moves based on witnessing a few exchanges.

It's ironic that we have to base decisions on local information in a medium that has extended our social settings further than anything in history--and that punishes any missteps by preserving them forever and making them accessible to everyone.

Splitting the atomic operation

The hot environment for application development these days is web services, and they're a pain in the ass. Even with APIs provided by scripting languages, the ramp-up curve is daunting. Logins, session and error handling, and version changes are more complex because you're working with a remote system.

Historically, it didn't take long after the introduction of networks for programmers to start designing systems for remote procedure calls. The transition from local to remote procedure calls shouldn't have been hard. Unlike physical atoms, computer programs had natural fault lines. Modularity was supposed to make all programs easy to break into unrelated pieces, and moving some pieces to other hosts was expected to be relatively transparent to both the programmer and the end user.

But modularization is rarely clean. Many applications still depend on shared context, which has to be embodied in the form of sessions when the applications are split onto different hosts. Dropped connections and time-outs can't be hidden from the programmer, and the process of discovering hosts and entry points leads to the networking equivalent of what Windows programmers used to call DLL hell. The complexity of the solutions to discovery caused CORBA to fail in the marketplace, and has probably held back SOAP as well.

The Web was not designed for web services. In fact, despite Berners-Lee's talk of the read/write Web and later the Semantic Web, it was really designed for the exchange of static files. The basic HTTP commands are PUT and GET, the same as FTP.

Although HTTP runs over TCP, its application-level behavior is more like UDP's: send a request, wait a bit, and then either retry or give up and do something else. (I'm indebted to my friend and author Karl Fogel for this observation.)
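
That pattern is easy to state in code. Here's a minimal sketch with an invented URL, retry count, and back-off policy; it isn't drawn from any particular framework.

    # The Web's working assumption: send a request, wait a bit, then
    # either retry or give up and do something else.
    import time
    import urllib.request

    def fetch_with_retries(url, attempts=3, timeout=5):
        for attempt in range(attempts):
            try:
                with urllib.request.urlopen(url, timeout=timeout) as response:
                    return response.read()
            except OSError:
                # URLError, timeouts, and other socket failures all land
                # here. We can't tell whether the request was lost, delayed,
                # or processed but never acknowledged -- just wait and retry.
                time.sleep(2 ** attempt)
        return None  # give up and do something else

    page = fetch_with_retries("http://example.com/")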

Around 2000 or 2001, new peer-to-peer protocols figured out how to get around firewalls by using HTTP's port 80. System administrators called this work-around "port 80 pollution"; now it's called web services.

Over time, the Web got spruced up for interprogram communications with CGI, the SOAP and REST technologies that enabled web services, and the quite powerful XMLHttpRequest object that underlies Ajax. Frameworks hide the distinctions between browsers and reduce the number of languages programmers need to learn in order to write web applications.

But with web service interfaces available in every programming language, these applications will probably end up moving off the Web. The same APIs could be implemented over some other protocol.

The problems of discovery, session handling, and security will remain. They might be lightened by the development of cleaner programming implementations--but then they'll get harder again as we pile on identification systems, reputation, and ways of associating classes of data with individuals (either openly or covertly, as companies try to find out more about whom they're dealing with).

The difficulties of distributed processing and distributed data will probably interact to make our tasks geometrically more difficult. So it doesn't matter so much if we fix the problems of packet-switching, congestion handling, or Web sessions. We'll have bigger headaches at higher layers to deal with anyway. Maybe it's not such a bad industry to be in--we know there will always be plenty of work to do.


1 Comment

Andy -
There are some interesting thoughts here, but you are really talking about three separate and only slightly related topics. Multi-cast network traffic has been problematic for quite a while. Try figuring out how to set up a video conference between multiple sites and you quickly run into major difficulties.

The discussion of port 80 pollution -- which I heartily agree is a big problem -- and the evolution of something that was originally designed essentially as a rudimentary document management system into all things to all people is interesting, but I think merits more discussion.
