Bootstrap Institute logo Doug Engelbart's
   Colloquium at Stanford
An In-Depth Look at "The Unfinished Revolution"
Session 1
State of software technology: Security and reliability
Peter Neumann
video clip. 1.*

I've been involved for most of my professional life over the past 46 years in systems that are very reliable, very secure, in some sense highly survivable in the face of arbitrary adversities, whether it's security attacks, hacker attacks, or hardware malfunctions, or software failures, or cosmic radiation, or any arbitrary type of difficulty, including squirrels eating through cables and animals behaving in strange ways. And the challenge here has always been to design things that are better than what you get out of commercial products. Even in my doctoral work I was worried about communications systems that were able to withstand arbitrary adversities. So, for many years now, beginning in '65 when I was working on Multics, and throughout the seventies, eighties and nineties at SRI, I've been concerned with very reliable, very secure, complex systems. We built probably the world's first fly-by-wire computer -- a seven-processor, self-diagnosing, self-reconfiguring system, with voting on critical tasks* -- that ran for years and years in the NASA laboratories on the ground. And we designed some very secure systems. 2
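The two-out-of-three voting mentioned here (see footnote 9A) can be sketched in a few lines of C. This is an illustrative sketch only, not the actual SRI/NASA code; the function name `vote2of3` and its interface are invented for the example.

```c
/* Illustrative sketch of two-out-of-three majority voting on the
 * outputs of redundant processors. If at least two replicas agree,
 * their value wins and a disagreeing replica can be flagged as
 * suspect and reconfigured out. (Hypothetical code, not SIFT.) */
int vote2of3(int a, int b, int c, int *ok)
{
    *ok = 1;
    if (a == b || a == c)
        return a;              /* a is in the majority */
    if (b == c)
        return b;              /* a is the odd one out */
    *ok = 0;                   /* no two replicas agree: escalate
                                  to higher-level recovery logic */
    return a;
}
```

A real system would vote on every critical task and use the disagreement pattern over time to diagnose and reconfigure out a failing processor.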

Even back in '65, we recognized the relevance of the Y2K problem, thirty-five years out, and developed a system that did not have the Y2K problem. This is sort of symptomatic of the difficulties in dealing with complexity: the incredible lack of foresight. The Y2K problem is one symptomatic example of that. A lot of the modern operating systems are other examples of it, particularly the PC software where you get enormous bloatware -- millions and millions of lines of code, most of which is inherently untrustworthy. So the challenge is: how do we build systems that, in some sense, are very reliable, very secure, very robust, highly dependable, interoperable with other systems, capable of robust networking, despite the fact that over and over again we see all these hidden flaws, hidden fault modes, hidden security problems? 3
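The failure mode behind the Y2K problem is easy to show: storing only the last two digits of the year (a common space-saving choice in the 1960s) makes any interval computed across 2000 go wrong. This is a hypothetical illustration with invented function names, not code from any system discussed here.

```c
/* Two-digit years: in the year 2000 ("00"), someone born in '65
 * appears to be 0 - 65 = -65 years old -- the Y2K failure mode. */
int age_two_digit(int birth_yy, int now_yy)
{
    return now_yy - birth_yy;
}

/* The fix is to carry the full four-digit year all along. */
int age_four_digit(int birth_yyyy, int now_yyyy)
{
    return now_yyyy - birth_yyyy;
}
```

The point of the anecdote is that the fix costs almost nothing if the representation is chosen with foresight in the first place, and a great deal if it has to be retrofitted thirty-five years later.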

The Y2K thing is one example where for years and years we've recognized the problem, and yet, nobody's done anything about it. The security of operating systems is another example. The typical vendor software is so riddled with security holes that one wonders why anyone would ever want to buy it. And the answer is that most people don't seem to care. Now, as we get into the world of internet commerce, for example, it does matter all of a sudden. We've had the eBay outages: three different outages that caused them a great deal of business loss and loss of credibility. We've seen the AT&T collapse of 1990, where you basically couldn't make a long-distance call for half a day. Ten years before that was the ARPANET collapse of 1980, where, essentially, an isolated fault in one node contaminated every node in the entire network. The same thing happened ten years later in the case of the AT&T systems. So here we have the problem of building very robust, secure, reliable network systems despite the inherent complexity of what we're doing. 4

One of the normal challenges is: keep it simple. This doesn't work when you're talking about missile defense systems and enormous financial systems and things of that nature. So we have to learn how to deal with complexity. And what is required is decent requirements in the first place, which we normally don't have in any system development; a good system design and a good system architecture that are inherently capable of robust behavior; and decent software engineering practice, which we very seldom find. The buffer overflow problem, for example, has been around for decades and continues to haunt us. Every year there is another crop of major security vulnerabilities that result from buffer overflows, bad programming practice, bad operations, bad maintenance. Systems administrators are under tremendous pressure to keep systems up-to-date with all of the thousands of patches that are being released constantly. 5
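The buffer overflow pattern mentioned above can be illustrated in a few lines of C. This is a minimal sketch with invented function names: an unchecked copy of the kind that has caused decades of vulnerabilities, next to a bounded copy that truncates instead of overflowing.

```c
#include <string.h>

/* The classic flaw: strcpy writes past `buf` whenever the input
 * exceeds 15 characters plus the terminating NUL, corrupting
 * adjacent memory -- possibly a saved return address. */
void copy_unsafe(const char *input)
{
    char buf[16];
    strcpy(buf, input);            /* no bounds check */
    (void)buf;
}

/* A bounded copy: never writes more than dstlen bytes. */
void copy_safe(char *dst, size_t dstlen, const char *src)
{
    if (dstlen == 0)
        return;
    strncpy(dst, src, dstlen - 1); /* copies at most dstlen-1 bytes */
    dst[dstlen - 1] = '\0';        /* strncpy may omit the terminator */
}
```

The fix has been known as long as the flaw; Neumann's point is that the bad pattern keeps being written anyway, which is why he argues for new programming paradigms rather than ever more patches.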

This is not the way to run a business. You'd like operating systems that are intrinsically more secure. You'd like application software that is inherently more robust. Now, we tend to always blame it on the operator, or the user, or the systems administrator. Look at some of the cases of pilots who get blamed for aviation problems where the interface to the computer was a disaster: the immediate knee-jerk reaction is blame it on the operator, blame it on the pilot, don't blame it on the computer system. And this is a joke. If it weren't so sad and so serious, it would, in fact, be a joke. But it's not a joke; it's very serious. So, we're living in a world where we'd like to be able to deal with complexity. 6

Fred Brooks* suggests that you build one to throw it away. This never happens. What happens is you build a prototype, which has no security, no reliability, no availability, and no fault tolerance in it, and then you start growing it. And what you wind up with is a monolithic bloatware system with millions and millions of lines of code that has no security, no reliability, no availability, nor any of the other things that we need. The bottom line here is that we need new paradigms for system requirements, new paradigms for system development, new paradigms for programming that don't inherently result in buffer overflows and all of the other reliability problems and security flaws that we see over and over again. And one of my favorite architectures is one in which we have very thin client-user systems -- basically no operating system that is overwritable or changeable, and very little that can be affected in the client system -- with a lot of trustworthy servers around, and trustworthy distribution paths, and perhaps even a lot of open-source code, rather than the proprietary bloatware that we continually run into. 7

A course that I've been teaching in Maryland for the past year has explored all of these issues at some length. All of the notes are online* and I think you might want to take a look at some of that. 8



Footnotes. 9

Re voting: two-out-of-three voting on critical tasks. 9A

Re Fred Brooks. Author of "The Mythical Man-Month: Essays on Software Engineering." Addison-Wesley, 1995. 9B

Re course notes. Dr. Neumann taught a course on survivable systems and networks at the University of Maryland Department of Electrical Engineering in the Fall of 1999. The lecture notes are available as copyleft documents. A final set of lecture notes is available in PostScript form. 9C


Copyright 2000 Bootstrap Institute. All rights reserved.