A cluster is a group of computers that have been interconnected to share processing. Clustering technology, which coordinates the computers as they work in parallel on computing tasks, comes in a variety of flavors. Clusters can be built from custom hardware and software that implement a fast interconnect and some kind of shared memory or cache. Clustering technology can also be completely implemented in software, making use of the power and low prices of today’s PCs, inexpensive high-speed networking like Fast Ethernet, and a robust open operating system like Linux. An example of this kind of cluster is the Linux Beowulf system, which consists of a server node connected to multiple client nodes via some type of private high-speed network.
In this article, we describe the Java Application Server Pseudo-cluster (JASPer), a simple cluster built with commodity hardware and free software from the Apache Foundation. The master server runs the Apache HTTP daemon and the cluster nodes run the Apache JServ Java application server. Using this combination of open software and inexpensive equipment, we have been able to reap the benefits of clustering while avoiding some of the disadvantages.
Advantages and Disadvantages of Clusters
There are three primary benefits of clustering, depending on the computing tasks to be shared by a cluster.
1) The ability to scale capacity. Additional computers can be easily added to a properly constructed cluster to gain additional capacity for processing. This works only until the shared resources in the cluster (the master node and cluster interconnect) become saturated.
2) Fault tolerance. Assuming that the cluster is smart enough to stop trying to use any nodes that are no longer available, it will continue to run as long as the master node and at least one client node can communicate.
3) Higher performance. This is often achieved by distributing work across multiple servers, although the shared resources in a cluster will again place an upper limit on how far this benefit can extend.
There are, however, some problems with clusters, which have limited their use. One is the difficulty of setting up and administering a cluster, which has prevented some from taking advantage of them. IT shops without the appropriate expertise find the administration of a cluster very complex and, depending on the type of clustering technology used, support services can be expensive or unavailable.
Another limitation is the kind of applications that can benefit from a clustered platform. To take advantage of the performance of a cluster, computing tasks must be partitioned in some way to distribute parts of the tasks across multiple processors. In parallel computing, this partitioning and distribution is usually performed in the application program, inserted by intelligent optimizing compilers or based on directives included by the programmer. For example, on a Beowulf system, the programmer decides which parts of the program can be run concurrently on different client nodes and uses a messaging software API or remote shell commands to run those procedures on client nodes.
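To make the idea of programmer-directed partitioning concrete, here is a minimal sketch in Java. It is only an analogy: worker threads inside one JVM stand in for client nodes, whereas a real Beowulf program would use a message-passing API or remote shell commands. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: worker threads stand in for cluster client nodes.
class PartitionedSum {

    // Sums the integers in [from, to) by splitting the range into one chunk per worker.
    static long sum(long from, long to, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Ceiling division so the chunks cover the whole range.
            long chunk = Math.max(1, (to - from + workers - 1) / workers);
            List<Future<Long>> partials = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                final long lo = Math.min(to, from + i * chunk);
                final long hi = Math.min(to, lo + chunk);
                // Each task is an independent part of the work.
                Callable<Long> task = () -> {
                    long s = 0;
                    for (long n = lo; n < hi; n++) s += n;
                    return s;
                };
                partials.add(pool.submit(task));
            }
            // Combine the partial results once every worker has finished.
            long total = 0;
            for (Future<Long> partial : partials) total += partial.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(0, 1_000_001, 4)); // sum of 0..1,000,000 on 4 workers
    }
}
```

The programmer, not the platform, decides where the range is cut and how the partial results are combined, which is the extra work that limits the kinds of applications that benefit from a general-purpose cluster.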
Load Balancing Clusters
However, certain computing tasks are inherently concurrent and can be partitioned on a cluster without special programming. A high-volume transaction processing system, in which multiple transaction requests are submitted concurrently, can be naturally partitioned on the transactions. A common example of such a system is a Web server.
Load balancing software can be used in a similar way to create effective clusters for Web applications. Our cluster uses Apache JServ, which provides load balancing across a set of Java servlet engines. Methods similar to those described here can be used to cluster any Web application server that supports load balancing. This technique can minimize the primary disadvantages seen with other kinds of clusters. Because you use your existing application server software, administration of the cluster is only incrementally more complex than administering a single Web server. And the transaction-oriented nature of Web applications means that no special programming is required to partition the computing tasks.
On the other hand, a significant limitation of this kind of cluster is that it only runs applications written for the application server. It is not a general purpose cluster and that is one reason we named our implementation a “pseudo-cluster” (although the main reason was that it sounded better than “Cluster” in the acronym JASPer).
As described in the article “Building a Web-Based Java Application Server with Apache JServ,” load balancing is performed by the Apache HTTP server module mod_jserv. When an initial request is received by mod_jserv on the master node, it randomly selects a JServ cluster node from the defined set of servers. The request is passed to that cluster node using the Apache JServ Protocol (AJP), a network protocol for communicating between the master and cluster nodes. mod_jserv adds a server id to the JServ session id so that subsequent requests for that session are recognized and routed to the same cluster node.
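The routing behavior described above can be sketched in a few lines of Java. This is not mod_jserv's code (mod_jserv is an Apache module, and its internals are not shown here). The class name, method names, and the "sessionId.nodeId" format are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of random node selection with session affinity.
// Not mod_jserv's implementation; the "sessionId.nodeId" format is assumed.
class StickyRouter {
    private final List<String> nodeIds;
    private final Random random;

    StickyRouter(List<String> nodeIds, Random random) {
        if (nodeIds.isEmpty()) throw new IllegalArgumentException("no nodes defined");
        this.nodeIds = List.copyOf(nodeIds);
        this.random = random;
    }

    // Picks the node for a request. A session id that already carries a known
    // node suffix goes back to that node; any other request gets a random node.
    String selectNode(String sessionId) {
        if (sessionId != null) {
            int dot = sessionId.lastIndexOf('.');
            if (dot >= 0) {
                String suffix = sessionId.substring(dot + 1);
                if (nodeIds.contains(suffix)) return suffix;
            }
        }
        return nodeIds.get(random.nextInt(nodeIds.size()));
    }

    // Tags a bare session id with the id of the node that owns the session.
    static String tagSession(String bareSessionId, String nodeId) {
        return bareSessionId + "." + nodeId;
    }

    public static void main(String[] args) {
        StickyRouter router =
                new StickyRouter(List.of("node1", "node2", "node3"), new Random());
        String node = router.selectNode(null);        // initial request: random node
        String session = tagSession("abc123", node);  // session id now names its node
        System.out.println(session + " -> " + router.selectNode(session));
    }
}
```

Because the node id travels with the session id, the master node needs no session table of its own to keep a session on the node that created it.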
In practice, we have found the JServ load balancing mechanism to be quite effective. The load has been very closely balanced as measured by the number of sessions on each server. During moderately busy times, each JServ node in our cluster handles between 65 and 75 sessions. Actual load may not always correspond to the number of sessions being handled, but in our portal application, it is close because there is not a wide discrepancy in the processing required to handle different kinds of requests.
A load balancing cluster has three parts: the master node, which performs the load balancing; the cluster nodes, which process the transactions; and the cluster interconnect, which allows the master and cluster nodes to communicate. The components used for JASPer are illustrated in Figure 1.
The master node can be the weak link in a cluster since all transactions must go through it. A master node failure will bring down the whole cluster, so the platform must be very robust. To enhance availability and performance, a Web redirecting switch could be used to route HTTP requests to redundant master nodes, which are configured to use the same set of cluster nodes in a many-to-many cluster configuration. However, there is still a potential single point of failure. It has just been moved from the master node to the (presumably more robust) Web redirecting switch.
For our cluster, we used a Sun Enterprise 4000 with six 250-MHz processors and 3.5 GB of RAM as the master node. This is more processing power than needed to receive HTTP requests and route them to the cluster nodes, but this server hosts several Web sites and an online library catalog application besides being the JASPer master node. This platform was selected for its reliability more than its performance.
Cluster nodes can be any combination of servers that run Apache JServ, including Linux, Solaris, and Windows NT platforms. If a combination of different hardware or software platforms is used, you should make some guesses about their relative performance so you can weight them in the load balancing set defined for mod_jserv. Benchmarking of the individual nodes, as described below, can be used to make better estimates of their relative capacity. For example, if you have one new high-speed node that can handle three times as many requests per second as an old slower node, you might ask mod_jserv to route three quarters of the requests to the faster node with these directives:
ApJServBalance setname FASTNODE 3
ApJServBalance setname SLOWNODE 1
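With weights of 3 and 1, the faster node's expected share is 3 / (3 + 1), or three quarters of the initial requests. The sketch below shows one way weights can translate into shares, using weighted random selection. It is an assumption-laden illustration rather than mod_jserv's actual algorithm, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of weighted random selection; not mod_jserv's code.
class WeightedPicker {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private int totalWeight = 0;

    // Registers a node, or replaces the weight of a node already registered.
    void add(String node, int weight) {
        if (weight <= 0) throw new IllegalArgumentException("weight must be positive");
        Integer old = weights.put(node, weight);
        totalWeight += weight - (old == null ? 0 : old);
    }

    // Long-run fraction of requests a node should receive: weight / total weight.
    double expectedShare(String node) {
        return weights.get(node) / (double) totalWeight;
    }

    // Draws a ticket in [0, totalWeight) and walks the nodes until it is used up.
    // Requires at least one registered node.
    String pick(Random random) {
        int ticket = random.nextInt(totalWeight);
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            ticket -= entry.getValue();
            if (ticket < 0) return entry.getKey();
        }
        throw new IllegalStateException("unreachable when weights are positive");
    }

    public static void main(String[] args) {
        WeightedPicker picker = new WeightedPicker();
        picker.add("FASTNODE", 3);
        picker.add("SLOWNODE", 1);
        System.out.println("Expected FASTNODE share: " + picker.expectedShare("FASTNODE"));

        Random random = new Random();
        int fast = 0;
        int trials = 10_000;
        for (int i = 0; i < trials; i++) {
            if (picker.pick(random).equals("FASTNODE")) fast++;
        }
        System.out.println("Observed FASTNODE share: " + (double) fast / trials);
    }
}
```

The observed share approaches the expected share only over many requests, so short measurement windows can look less balanced than the weights suggest.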
Since JASPer supports our primary enterprise Web portal, we were able to get funding to buy four new 600-MHz Pentium III Linux servers to serve as the JServ cluster nodes. Memory requirements for the cluster nodes depend on the Java applications that run on them. In our case, we use lightweight sessions, saving only the minimum information we need about each session, but we load a lot of configuration information in each servlet at startup to avoid having to read the configuration database and files for each request. 256 MB of RAM appears to be sufficient to avoid swapping. Memory requirements of the Java Virtual Machine can be determined at runtime by a Java monitoring agent class that is presented below.
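As a rough standalone illustration of how JVM memory use can be read at runtime, the following sketch uses the standard java.lang.Runtime methods. It is not the monitoring agent class itself, and the class name and output format are hypothetical.

```java
// Hypothetical sketch: a point-in-time reading of JVM heap usage.
class JvmMemorySnapshot {
    final long usedBytes;   // heap currently in use
    final long totalBytes;  // heap currently reserved by the JVM
    final long maxBytes;    // upper limit the JVM will try to use

    private JvmMemorySnapshot(long usedBytes, long totalBytes, long maxBytes) {
        this.usedBytes = usedBytes;
        this.totalBytes = totalBytes;
        this.maxBytes = maxBytes;
    }

    // Reads the current figures from the running JVM.
    static JvmMemorySnapshot take() {
        Runtime runtime = Runtime.getRuntime();
        long total = runtime.totalMemory();
        long free = runtime.freeMemory();
        return new JvmMemorySnapshot(total - free, total, runtime.maxMemory());
    }

    @Override
    public String toString() {
        long mb = 1024L * 1024L;
        return "used=" + (usedBytes / mb) + " MB, total=" + (totalBytes / mb)
                + " MB, max=" + (maxBytes / mb) + " MB";
    }

    public static void main(String[] args) {
        System.out.println(JvmMemorySnapshot.take());
    }
}
```

Sampling these figures periodically under realistic load gives a better basis for sizing node memory than a single reading, since heap use varies with garbage collection.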
We also purchased a dedicated 100-Mbps network switch with a Gigabit Ethernet uplink module to connect to the master node. We initially purchased more bandwidth than necessary to make sure we could scale the cluster as usage increased. Like the master node, the cluster interconnect is a shared resource and potential single point of failure. For performance and fault tolerance, redundant interconnect networks could be implemented, but in our experience network switches are more reliable than servers, so this redundancy is usually not necessary.
Although virtually all the interconnect traffic is between the master node and a cluster node, this switch is also connected to external networks via a router to support occasional connections required for services such as DNS lookup and remote administration with telnet. Alternatively, we could have had the master node act as a router for the cluster’s external network traffic.
The total cost for the four cluster nodes, network switch, and Gigabit Ethernet adapters for JASPer was about $10,000. We can scale up to 10 cluster nodes for just the cost of additional nodes before reaching the capacity of our cluster interconnect. The cost of the cluster could have been reduced significantly if we had used our existing LAN for the interconnect and recycled old PCs as the cluster nodes. However, we think the cost is justified when you factor in the increased reliability of the new equipment along with the performance.