Abstract
We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure. Distributed computing became a field of intense study as the result of computer hardware miniaturization and advances in networking technologies. Distributed computing aims to unify multiple networked machines to let them share information or other resources, and encompasses multimedia systems, client-server systems, parallel computing, Web programming, mobile agents, and so on.
We look at a number of distributed systems that have attempted to paper over the distinction between local and remote objects, and show that such systems fail to support basic requirements of robustness and reliability. These failures have been masked in the past by the small size of the distributed systems that have been built. In the enterprise-wide distributed systems foreseen in the near future, however, such a masking will be impossible.
We conclude by discussing what is required of both systems-level and application-level programmers, and then survey the following topics:
What are distributed computer systems?
Distributed Computer Systems – 1
Distributed Computer Systems – 2
Distributed Computer Systems – 3
How It Works
Distributed Computing Management Server
Motivation for Distributed Computer Systems
More complex distributed computing examples – 1
More complex distributed computing examples – 2
Distributed Computer System Metrics
Distributed Computer System Architectures
Conclusion
Introduction to Distributed Systems
What are distributed computer systems
Architectures
Transparency and design issues
Distributed computing paradigms
Distributed operating systems
Parallel and concurrent programming concepts
What are distributed computer systems?
Compare: Centralised Systems
One system with non-autonomous parts
System shared by users all the time
All resources accessible
Software runs in a single process
(Often) single physical location
Single point of control (manager)
Single point of failure
Distributed Computer Systems - 1
Multiple autonomous components
Components shared by users
Resources may not be accessible
Software can run in concurrent processes on different processors
(Often) multiple physical locations
Multiple points of control
Multiple points of failure
No global time
No shared memory
Distributed Computer Systems - 2
Networked computers (closely or loosely coupled) that provide a degree of operational transparency
Distributed computer system = independent processors + networking infrastructure
Communication between processes (on the same or different computer) using message passing technologies is the basis of distributed computing
Virtual computing
Two issues:
How do computers communicate?
How do processes on different computers interact?
Distributed Computer Systems - 3
A distributed system is:
A collection of independent computers that appears to its users as a single coherent system.
(Idea of a virtual computer)
. . Autonomous computers
. . Connected by a network
. . Specifically designed to provide an integrated computing environment
How It Works
In most cases today, a distributed computing architecture consists of very lightweight software agents installed on a number of client systems, and one or more dedicated distributed computing management servers. There may also be requesting clients with software that allows them to submit jobs along with lists of their required resources.
An agent running on a processing client detects when the system is idle, notifies the management server that the system is available for processing, and usually requests an application package. The client then receives an application package from the server and runs the software when it has spare CPU cycles, and sends the results back to the server. The application may run as a screen saver, or simply in the background, without impacting normal use of the computer. If the user of the client system needs to run his own applications at any time, control is immediately returned, and processing of the distributed application package ends. This must be essentially instantaneous, as any delay in returning control will probably be unacceptable to the user.
Distributed Computing Management Server
The servers have several roles. They take distributed computing requests and divide their large processing tasks into smaller tasks that can run on individual desktop systems (though sometimes this is done by a requesting system). They send application packages and some client management software to the idle client machines that request them. They monitor the status of the jobs being run by the clients. After the client machines run those packages, they assemble the results sent back by the client and structure them for presentation, usually with the help of a database.
If the server doesn't hear from a processing client for a certain period of time, possibly because the user has disconnected his system and gone on a business trip, or simply because he's using his system heavily for long periods, it may send the same application package to another idle system. Alternatively, it may have already sent out the package to several systems at once, assuming that one or more sets of results will be returned quickly.
Motivation for Distributed Computer Systems
High cost of powerful single processors – it is cheaper (£/MIP) to buy many small machines and network them than to buy a single large machine. Since 1980 computer performance has increased by a factor of about 1.5 per year
Share resources
Distributed applications and mobility of users
Efficient low cost networks
Availability and Reliability – if one component fails the system will continue
Scalability – easier to upgrade the system by adding more machines than to replace the only one
Computational speedup
Service provision – need for resource and data sharing and remote services
Need for communication
More complex distributed computing examples – 1
Computing dominated problems
(Distributed processing)
Computational Fluid Dynamics (CFD) and Structural Dynamics (using Finite Element Method)
Environmental and Biological Modeling – human genome project, pollution and disease control, traffic simulation, weather and climate modeling
Economic and Financial modeling
Graphics rendering for visualization
Network Simulation – telecommunications, power grid
More complex distributed computing examples – 2
Storage dominated problems
(Distributed data)
Data Mining
Image Processing
Seismic data analysis
Insurance Analysis
Distributed Computer System Metrics
Latency – network delay before any data is sent
Bandwidth – maximum channel capacity (analogue communication Hz, digital communication bps)
Granularity – relative size of the units of processing required. Distributed systems generally operate best with coarse granularity, because communication is slow relative to processing speed
Processor speed – MIPS, FLOPS
Reliability – ability to continue operating correctly for a given time
Fault tolerance – resilience to partial system failure
Security – policy to deal with threats to the communication or processing of data in a system
Administrative/management domains – issues concerning the ownership and access to distributed systems components
Distributed Computer System Architectures
Flynn's classification (1966, 1972) of computer systems in terms of instruction- and data-stream organization
Based on the von Neumann model (separate processor and memory units)
4 machine organizations
SISD - Single Instruction, Single Data
SIMD - Single Instruction, Multiple Data
MISD - Multiple Instruction, Single Data
MIMD - Multiple Instruction, Multiple Data
Distributed Computers are essentially all MIMD machines
SM – shared memory multiprocessor, e.g. Sun SPARC
DM – distributed memory multicomputer, e.g. LAN cluster
Distributed Computing Application Characteristics
Obviously not all applications are suitable for distributed computing. The closer an application gets to running in real time, the less appropriate it is. Even processing tasks that normally take an hour or two may not derive much benefit if the communications among distributed systems and the constantly changing availability of processing clients become a bottleneck. Instead, think in terms of tasks that take hours, days, weeks, and months. Generally the most appropriate applications, according to Entropia, consist of "loosely coupled, non-sequential tasks in batch processes with a high compute-to-data ratio." A high compute-to-data ratio goes hand in hand with a high compute-to-communications ratio, as you don't want to bog down the network by sending large amounts of data to each client, though in some cases you can do so during off hours. Programs with large databases that can be easily parsed for distribution are very appropriate.
Clearly, any application with individual tasks that need access to huge data sets will be more appropriate for larger systems than individual PCs. If terabytes of data are involved, a supercomputer makes sense, as communications can take place across the system's very high speed backplane without bogging down the network. Server and other dedicated system clusters will be more appropriate for other, slightly less data-intensive applications. For a distributed application using numerous PCs, the required data should fit very comfortably in the PC's memory, with lots of room to spare.
What About Peer-to-Peer Features?
Though distributed computing has recently been subsumed by the peer-to-peer craze, the structure described above is not really one of peer-to-peer communication, as the clients don't necessarily talk to each other. Current vendors of distributed computing solutions include Entropia, DataSynapse, Sun, Parabon, Avaki, and United Devices. Sun's open-source Grid Engine platform is geared more to larger systems, while the others focus on PCs, with DataSynapse somewhere in the middle. With DataSynapse's Live Cluster, however, client PCs can work in parallel with other client PCs and share results with each other in bursts as short as 20 ms. The advantage of Live Cluster's architecture is that applications can be divided into tasks that have mutual dependencies and require interprocess communication, while those running on Entropia cannot. But while Entropia and other platforms can work very well across an Internet of modem-connected PCs, DataSynapse's Live Cluster makes more sense on a corporate network or among broadband users across the Net.
Conclusion
Distributed computing has proved to be a very attractive, cost-effective method of computing. Current concepts of clustering must be expanded to include mainframes, MPPs, and other special-purpose hardware to provide best-fit scheduling between jobs and hardware. Higher-speed networking capabilities, global file systems, and improved security mechanisms will move distributed computing beyond the confines of local area networks into a single transparent global network.
A great challenge in this global network scheme is the effective management of resources. Without effective resource management, the cost effectiveness dwindles as the size of the distributed pool grows.
DQS attempts to provide a single coherent allocation and management tool for this environment, incorporating not only dedicated compute servers but also idle interactive workstations when possible. By providing support for single- and multiple-node interactive and batch jobs, using the best-fit concept, DQS increases the efficiency of matching available resources to user needs.
Saturday, December 6, 2008