The Benefits of Modular Programming

By Tim Boudreau, Jaroslav (Yarda) Tulach and Geertjan Wielenga

2.1 Distributed Development

Nobody writes software entirely in-house anymore. Outside the world of embedded systems, almost everyone relies upon libraries and frameworks written by someone else. By using them, it is possible to concentrate on the actual logic of the application while reusing the infrastructure, frameworks, and libraries written and provided by others. Doing so shortens the time needed to develop software.

The rise of open source software over the past decade makes library reuse doubly compelling. For many kinds of programs there are existing solutions for various problems, and those solutions are available at zero monetary cost. The set of open source offerings starts with UNIX kernels, base C libraries, and command-line utilities, and continues through Web servers and Web browsers to Java utilities such as Ant, Tomcat, JUnit, and JavaCC, ad infinitum. Writing modern software is as much a process of assembly as it is creation. Picking available pieces and composing them together is a large part of modern application development. Instead of writing everything from scratch, people who need an HTTP server for their application select Apache or Tomcat. Those who need a database could choose MySQL or PostgreSQL. The application glues these pieces together and adds its own logic. The result is a fully functional, performant application developed in remarkably little time.

Consider how Linux distributions work. Red Hat's Fedora, Mandriva, SUSE, and Debian all contain largely the same applications, written by the same people. The distributor simply packages them and provides the "glue" to install them together. Distribution vendors often write only central management and installation software and provide some quality assurance to make sure all the selected components work well together. This process works well enough that Linux has grown considerably in popularity. As further evidence that the model works, consider that Mac OS X is built on a UNIX core derived largely from BSD, with Apple's own additions on top. The key thing to note is that the software in question is created through a distributed development model. The developers and distributors of the software may not even know or communicate with each other, and are usually not even in the same place geographically.

Such distributed development has specific characteristics. The first thing to notice is that the source code for the application (or operating system) is no longer under a developer's complete control. It is spread all over the world. Building such software is unquestionably different from building an application whose source code is entirely in your in-house repository.

The other thing to realize is that no one fully controls the schedule of the whole product. Not only the source code, but also the developers are spread all over the world and are working on their own schedules. Such a situation is not actually as unusual or dangerous as it sounds. Anyone who has tried to schedule a project with a team of more than fifty people knows that the idea of ever having "full control" over the process is at best a comforting illusion. You always have to be prepared to drop a feature or release an older version of one or another component. The same model works with distributed development.

The basic right everyone has is the freedom to use a newer or older version of a library.

The ability to use external libraries and compose applications out of them results in an ability to create more complex software with less time and work. The trade-off is the need to manage those libraries and ensure their compatibility. That is not a simple task. But there is no other practical, cost-efficient way to assemble systems of today's complexity.

2.2 Modular Applications

The technological solution to the challenges of distributed development is modularization. A modular application, in contrast to one monolithic chunk of tightly coupled code in which every unit may interface directly with any other, is composed of smaller, separated chunks of code that are well isolated. Those chunks can then be developed by separate teams with their own life cycles and their own schedules. The results can then be assembled together by a separate entity—the distributor.

It has long been possible to put a bunch of libraries on the Java classpath and run an application. The NetBeans Platform takes the management of libraries further: it actively takes part in the loading of libraries and verifies that each library is present in at least the minimum version that its dependents require. Such libraries are what we call modules. The NetBeans Module System is a runtime container that ensures the integrity of the system at runtime.

2.2.1 Versioning

Breaking an application into distinct libraries creates a new challenge—one needs to ensure that those independent parts really work together. There are many possible ways to do so. The most popular is versioning. Each piece of a modular application has a version number—usually a set of numbers in Dewey decimal format, such as 1.34.8. When a new version is released, it has an increased version number, for example 1.34.10, 1.35.1, or 2.0. If you think about it, the idea that an incremented version number can encode the difference between two versions of a complex piece of software is patently absurd. But it is simple to explain, and it works well enough that the practice is popular.
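
Comparing such version numbers has one practical pitfall: they must be compared segment by segment as numbers, not as strings. The following is a minimal sketch of that comparison. The class name `DottedVersion` and its methods are hypothetical and are not part of any NetBeans API.

```java
/**
 * Hypothetical helper (not part of any NetBeans API) that orders dotted
 * version numbers such as "1.34.8" segment by segment.
 */
final class DottedVersion {
    private DottedVersion() {}

    /** Splits "1.34.8" into {1, 34, 8}; rejects empty, negative, or non-numeric segments. */
    static int[] parse(String text) {
        String[] tokens = text.trim().split("\\.", -1);
        int[] parts = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            parts[i] = Integer.parseInt(tokens[i]); // NumberFormatException on bad input
            if (parts[i] < 0) {
                throw new NumberFormatException("negative segment in " + text);
            }
        }
        return parts;
    }

    /**
     * Returns a negative, zero, or positive number as a is older than,
     * equal to, or newer than b. Missing trailing segments count as 0,
     * so "2" and "2.0" compare as equal (an assumption of this sketch).
     */
    static int compare(String a, String b) {
        int[] left = parse(a);
        int[] right = parse(b);
        int length = Math.max(left.length, right.length);
        for (int i = 0; i < length; i++) {
            int l = i < left.length ? left[i] : 0;
            int r = i < right.length ? right[i] : 0;
            if (l != r) {
                return Integer.compare(l, r);
            }
        }
        return 0;
    }
}
```

With this sketch, `DottedVersion.compare("1.34.10", "1.34.8")` is positive, whereas a plain string comparison would wrongly place 1.34.10 before 1.34.8.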

The other parts of a modular system can then declare their external dependencies. Most components will have some external requirements. For example, a component in a modular system might rely on an XML parser being present, or on some database driver being installed, or on a text editor or Web browser being present. For each of these, another module can request a specific minimum version of its interfaces. Even if the dependencies on external libraries are minimized, every program in Java depends on a version of Java itself. A true modular system should make it possible to specify the desired minimum JDK version. A module could require JDK >= 1.5, xmlparser >= 3.0, and webbrowser >= 1.5.

At runtime, the code responsible for starting the application must ensure that the requested dependencies are satisfied: that the XML parser is available in version 3.0 or newer, the Web browser is in version 1.5 or higher, and so forth. The NetBeans Module System does that.

Using such dependency schemas to maintain dependencies between components in a modular system can work only if certain rules are obeyed. The first rule is backward compatibility: if a new version is released, all contracts that worked in the previous version will work with the new one as well. This is easier to say than to achieve.

Rule number two is that components of the system need to accurately say what they need. When a module's set of dependencies changes, it needs to say so, so that the system can accurately determine if they are satisfied. So if a piece of a modular system starts to rely on new functionality, such as an HTML editor, it needs to add a new dependency (e.g., htmleditor >= 1.0). And if you start to use a new interface to the HTML editor component, one which was only added in version 1.7 of the component, the dependency needs to be updated to require htmleditor >= 1.7. The NetBeans Module System makes this second part relatively simple in practice, since a module's compile-time classpath will only include modules it declares a dependency on. So unless the module's list of dependencies is updated, it will not compile.
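
The start-up check described above can be sketched in a few lines. This is only an illustration and does not show how the NetBeans Module System is implemented. The class `DependencyCheck`, its method names, and the textual "name >= version" syntax are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical start-up check; names and the "name >= version" syntax are illustrative only. */
final class DependencyCheck {
    private DependencyCheck() {}

    /** Parses "xmlparser >= 3.0, webbrowser >= 1.5" into {xmlparser=3.0, webbrowser=1.5}. */
    static Map<String, String> parseRequirements(String declaration) {
        Map<String, String> required = new LinkedHashMap<>();
        for (String clause : declaration.split(",")) {
            String[] sides = clause.split(">=");
            if (sides.length != 2) {
                throw new IllegalArgumentException("expected 'name >= version': " + clause);
            }
            required.put(sides[0].trim(), sides[1].trim());
        }
        return required;
    }

    /** Segment-wise numeric comparison; missing segments count as 0. */
    static int compareVersions(String a, String b) {
        String[] left = a.split("\\.");
        String[] right = b.split("\\.");
        for (int i = 0; i < Math.max(left.length, right.length); i++) {
            int l = i < left.length ? Integer.parseInt(left[i]) : 0;
            int r = i < right.length ? Integer.parseInt(right[i]) : 0;
            if (l != r) {
                return Integer.compare(l, r);
            }
        }
        return 0;
    }

    /** Returns one message per requirement that the installed components fail to meet. */
    static List<String> unsatisfied(Map<String, String> required, Map<String, String> installed) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> need : required.entrySet()) {
            String have = installed.get(need.getKey());
            if (have == null) {
                problems.add(need.getKey() + " is not installed");
            } else if (compareVersions(have, need.getValue()) < 0) {
                problems.add(need.getKey() + " " + have + " is older than required " + need.getValue());
            }
        }
        return problems;
    }
}
```

An empty result means the module may start. Otherwise the messages can be shown to the user, which is one reason to collect every problem rather than stop at the first.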

2.2.2 Secondary Versioning Information

The versioning scheme just discussed refers to the specification version of a library. It describes a specific snapshot of the public APIs in that library.

It is a fact of life that some versions of libraries can contain bugs which must be worked around. For this reason, a secondary version identifier—an implementation version—should be associated with a component. In contrast to the specification version, this is usually a string like "Build20050611" which can only be tested for equality. This provides a secondary identifier that can be used to determine if a specific piece of code to work around a given bug is needed. The fact that a bug is present in (specification) version 3.1 does not mean it will also be in version 3.2 or even in a different build of 3.1. So, for reasons of bugfixing or special treatment of certain versions, associating an implementation version with a library can be useful.
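
A workaround keyed on an implementation version might look like the sketch below. The class name `BuildWorkaround` and the idea that "Build20050611" is an affected build are assumptions made purely for illustration. The point is that the identifier is matched exactly and never compared with "greater than" or "less than".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: decide whether a bug workaround applies to a given build. */
final class BuildWorkaround {
    // Assumed for illustration: the builds known to contain the bug.
    private static final Set<String> AFFECTED_BUILDS =
            new HashSet<>(Arrays.asList("Build20050611"));

    private BuildWorkaround() {}

    /** Implementation versions are opaque strings, so only an exact match counts. */
    static boolean needsWorkaround(String implementationVersion) {
        return AFFECTED_BUILDS.contains(implementationVersion);
    }
}
```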

2.2.3 Dependency Management

The system of versions and dependencies needs a manager that makes sure all requirements of every piece in the system are satisfied. Such a manager can check at each piece's install time that everything in the system remains consistent—this is how RPMs or Debian packages work in Linux distributions. Metadata about such dependencies is also useful at runtime. Such metadata makes it possible for an application to dynamically update its libraries without shutting down. It can also determine if the dependencies of a module it is asked to dynamically load can be satisfied—and if not, it can describe the problem to the user.
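
One way to picture such a manager is as a fixed-point computation over the installed modules: any module with a missing or too-old dependency is disabled, and the check repeats because disabling one module can break others. The sketch below assumes a hypothetical `InstallCheck` class with a nested `Module` descriptor. It does not reflect how RPM, Debian packages, or NetBeans implement the check.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical consistency check over a whole set of modules. */
final class InstallCheck {
    /** Illustrative module descriptor: a name, a version, and minimum-version requirements. */
    static final class Module {
        final String name;
        final String version;
        final Map<String, String> requires = new LinkedHashMap<>(); // dependency -> minimum version

        Module(String name, String version) {
            this.name = name;
            this.version = version;
        }

        Module requires(String dependency, String minimumVersion) {
            requires.put(dependency, minimumVersion);
            return this;
        }
    }

    private InstallCheck() {}

    /** Segment-wise numeric comparison; missing segments count as 0. */
    static int compareVersions(String a, String b) {
        String[] left = a.split("\\.");
        String[] right = b.split("\\.");
        for (int i = 0; i < Math.max(left.length, right.length); i++) {
            int l = i < left.length ? Integer.parseInt(left[i]) : 0;
            int r = i < right.length ? Integer.parseInt(right[i]) : 0;
            if (l != r) {
                return Integer.compare(l, r);
            }
        }
        return 0;
    }

    private static boolean satisfied(Module module, Map<String, Module> available) {
        for (Map.Entry<String, String> need : module.requires.entrySet()) {
            Module provider = available.get(need.getKey());
            if (provider == null || compareVersions(provider.version, need.getValue()) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the names of the modules that can be enabled without breaking any requirement. */
    static Set<String> enabled(Collection<Module> modules) {
        Map<String, Module> candidates = new LinkedHashMap<>();
        for (Module module : modules) {
            candidates.put(module.name, module);
        }
        boolean changed = true;
        while (changed) { // repeat: dropping one module may invalidate its dependents
            List<String> broken = new ArrayList<>();
            for (Module module : candidates.values()) {
                if (!satisfied(module, candidates)) {
                    broken.add(module.name);
                }
            }
            changed = candidates.keySet().removeAll(broken);
        }
        return new LinkedHashSet<>(candidates.keySet());
    }
}
```

In this sketch, a module that needs htmleditor >= 1.7 is disabled whenever htmleditor itself is disabled, even if a suitable htmleditor version is installed.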

NetBeans IDE is a modular application. Its modules (its constituent libraries) are discovered and loaded at runtime. They can install various bits of functionality, such as components, menu items, or services; or they can run code during startup to initialize programmatically; or they can take advantage of declarative registration mechanisms that various parts of the platform and IDE offer to register services and initialize them on demand. The NetBeans Module System uses the declared dependencies of the installed components to set up the parent classloaders for each module's own classloader, determining what JARs will be searched when a module tries to load a class. This ensures that any one module's classpath excludes any module JARs which are not above it in its dependency tree, and it enforces the declared dependencies of each component: a module cannot call code in a foreign module unless it declares a dependency on that foreign module. A module will not be loaded at all if some of its dependencies cannot be satisfied.
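
The idea of wiring a classloader only to the loaders of declared dependencies can be sketched with the standard `java.lang.ClassLoader` delegation hooks. The class `DeclaredDependencyLoader` below is hypothetical and much simpler than a real module loader. In particular, it does not load classes from the module's own JAR, and it is not the NetBeans implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: a class loader that can see only JDK core classes and
 * the loaders of the modules it declared a dependency on.
 */
final class DeclaredDependencyLoader extends ClassLoader {
    private final List<ClassLoader> dependencyLoaders;

    DeclaredDependencyLoader(ClassLoader... dependencyLoaders) {
        super((ClassLoader) null); // null parent: only bootstrap (JDK core) classes are inherited
        this.dependencyLoaders = new ArrayList<>(Arrays.asList(dependencyLoaders));
    }

    /** Called by loadClass() only after the bootstrap loader has failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        for (ClassLoader loader : dependencyLoaders) {
            try {
                return loader.loadClass(name);
            } catch (ClassNotFoundException ignored) {
                // not provided by this dependency; try the next declared one
            }
        }
        throw new ClassNotFoundException(name + " is not visible through declared dependencies");
    }

    /** Convenience for callers that only want to know whether a class is visible. */
    static boolean canLoad(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

A loader created with no dependencies can still resolve `java.lang.String`, but it cannot see application classes. Passing another module's loader to the constructor makes that module's classes visible, which mirrors the rule that code in a foreign module is reachable only through a declared dependency.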
