Saturday, December 6, 2008

paper-2

Abstract
We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure. Distributed computing became a field of intense study as a result of computer hardware miniaturization and advances in networking technologies. Distributed computing aims to unify multiple networked machines so that they can share information and other resources, and it encompasses multimedia systems, client-server systems, parallel computing, Web programming, mobile agents, and so on.
We look at a number of distributed systems that have attempted to paper over the distinction between local and remote objects, and show that such systems fail to support basic requirements of robustness and reliability. These failures have been masked in the past by the small size of the distributed systems that have been built. In the enterprise-wide distributed systems foreseen in the near future, however, such a masking will be impossible.
We conclude by discussing what is required of both systems-level and application-level programmers, and then turn to the following topics:
- What are distributed computer systems?
- Distributed Computer Systems – 1
- Distributed Computer Systems – 2
- Distributed Computer Systems – 3
- How It Works
- Distributed Computing Management Server
- Motivation for Distributed Computer Systems
- More complex distributed computing examples – 1
- More complex distributed computing examples – 2
- Distributed Computer System Metrics
- Distributed Computer System Architectures
- Conclusion



Introduction to Distributed Systems
- What are distributed computer systems?
- Architectures
- Transparency and design issues
- Distributed computing paradigms
- Distributed operating systems
- Parallel and concurrent programming concepts

What are distributed computer systems?
Compare: centralised systems
- One system with non-autonomous parts
- System shared by users all the time
- All resources accessible
- Software runs in a single process
- (Often) a single physical location
- Single point of control (manager)
- Single point of failure

Distributed Computer Systems - 1
- Multiple autonomous components
- Components shared by users
- Resources may not be accessible
- Software can run in concurrent processes on different processors
- (Often) multiple physical locations
- Multiple points of control
- Multiple points of failure
- No global time
- No shared memory

Distributed Computer Systems - 2

- Networked computers (closely or loosely coupled) that provide a degree of transparency of operation
- Distributed computer system = independent processors + networking infrastructure
- Communication between processes (on the same or different computers) using message-passing technologies is the basis of distributed computing (a minimal sketch follows this list)
- Virtual computing

Two issues:
- How do the computers communicate?
- How do processes on different computers interact?

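To make the message-passing idea concrete, here is a minimal sketch using Python's standard socket module; the port number and hostnames are placeholders, not anything prescribed by the text above:

import socket

PORT = 9099  # illustrative port; any free port would do

def receive_one_message():
    # One process waits for a single message from a remote peer.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("", PORT))
        srv.listen(1)
        conn, addr = srv.accept()
        with conn:
            data = conn.recv(1024)
            print("received from", addr, ":", data.decode())

def send_message(host, text):
    # Another process (on the same or a different computer) sends to it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, PORT))
        s.sendall(text.encode())

The same two primitives, send and receive, underlie everything from remote procedure calls to the job-distribution servers described below.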



Distributed Computer Systems - 3

A distributed system is:
"A collection of independent computers that appears to its users as a single coherent system."

(The idea of a virtual computer)
- Autonomous computers
- Connected by a network
- Specifically designed to provide an integrated computing environment
How It Works
In most cases today, a distributed computing architecture consists of very lightweight software agents installed on a number of client systems, and one or more dedicated distributed computing management servers. There may also be requesting clients with software that allows them to submit jobs along with lists of their required resources.
An agent running on a processing client detects when the system is idle, notifies the management server that the system is available for processing, and usually requests an application package. The client then receives an application package from the server and runs the software when it has spare CPU cycles, and sends the results back to the server. The application may run as a screen saver, or simply in the background, without impacting normal use of the computer. If the user of the client system needs to run his own applications at any time, control is immediately returned, and processing of the distributed application package ends. This must be essentially instantaneous, as any delay in returning control will probably be unacceptable to the user.
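As a rough sketch of that cycle (the function and method names here are invented for illustration; a real platform would supply its own API and idleness detection):

import time

def system_is_idle():
    # Stand-in for a real idleness check (keyboard/mouse activity, CPU load).
    return True

def agent_loop(server):
    # Hypothetical processing-client agent: runs distributed work only while
    # the machine is idle, and yields immediately when the user returns.
    while True:
        if system_is_idle():
            server.notify_available()
            package = server.fetch_package()   # application code + input data
            result = package.run()             # uses spare CPU cycles only
            server.submit_results(result)
        else:
            time.sleep(5)                      # user is active: stay out of the way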
Distributed Computing Management Server
The servers have several roles. They take distributed computing requests and divide the large processing tasks into smaller tasks that can run on individual desktop systems (though sometimes this is done by a requesting system). They send application packages, along with some client-management software, to the idle client machines that request them. They monitor the status of the jobs being run by the clients. And after the client machines run those packages, the servers assemble the results sent back and structure them for presentation, usually with the help of a database.
If the server doesn't hear from a processing client for a certain period of time, possibly because the user has disconnected his system and gone on a business trip, or simply because he's using his system heavily for long periods, it may send the same application package to another idle system. Alternatively, it may have already sent out the package to several systems at once, assuming that one or more sets of results will be returned quickly.
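A sketch of that bookkeeping, with a made-up timeout and simple in-memory structures standing in for the server's database (work units are assumed to be hashable values such as task ids):

import time

TIMEOUT = 3600  # seconds of silence before a work unit is reassigned (assumed)

class Dispatcher:
    def __init__(self, work_units):
        self.pending = list(work_units)  # not yet handed out
        self.assigned = {}               # work unit -> time it was sent
        self.results = {}

    def next_unit(self):
        # Hand the next pending unit to an idle client that asks for work.
        if not self.pending:
            return None
        unit = self.pending.pop()
        self.assigned[unit] = time.time()
        return unit

    def receive_result(self, unit, result):
        self.results[unit] = result
        self.assigned.pop(unit, None)

    def reassign_stale(self):
        # Units whose clients have gone quiet go back in the queue,
        # so another idle machine will eventually pick them up.
        now = time.time()
        for unit, sent in list(self.assigned.items()):
            if now - sent > TIMEOUT:
                del self.assigned[unit]
                self.pending.append(unit)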
Motivation for Distributed Computer Systems
- High cost of a powerful single processor – it is cheaper (£/MIP) to buy many small machines and network them than to buy one large machine; since 1980, computer performance has increased by a factor of roughly 1.5 per year
- Resource sharing
- Distributed applications and mobility of users
- Efficient, low-cost networks
- Availability and reliability – if one component fails, the system can continue operating
- Scalability – it is easier to upgrade the system by adding more machines than to replace the only system
- Computational speedup (a rough illustration follows this list)
- Service provision – the need for resource and data sharing and for remote services
- The need for communication

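A rough illustration of why computational speedup has limits: Amdahl's law says that if only a fraction p of a job can be distributed, n machines give a speedup of 1 / ((1 - p) + p/n). The figures below are illustrative:

def speedup(p, n):
    # Amdahl's law: p = distributable fraction, n = number of machines.
    return 1.0 / ((1.0 - p) + p / n)

print(round(speedup(0.95, 100), 1))  # ~16.8x, not 100x: the serial 5% dominates

So adding machines helps only as far as the application can actually be divided, which is why granularity (see the metrics below) matters.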


More complex distributed computing examples – 1
Computing-dominated problems
(Distributed processing)
- Computational Fluid Dynamics (CFD) and structural dynamics (using the Finite Element Method)
- Environmental and biological modeling – the human genome project, pollution and disease control, traffic simulation, weather and climate modeling
- Economic and financial modeling
- Graphics rendering for visualization
- Network simulation – telecommunications, power grids



More complex distributed computing examples – 2

Storage-dominated problems
(Distributed data)
- Data mining
- Image processing
- Seismic data analysis
- Insurance analysis


Distributed Computer System Metrics
- Latency – the network delay before data begins to arrive
- Bandwidth – maximum channel capacity (Hz for analogue communication, bps for digital)
- Granularity – the relative size of the units of processing required. Distributed systems operate best with coarse-grained tasks, because communication is generally slow compared with processing speed (a back-of-the-envelope check follows this list)
- Processor speed – MIPS, FLOPS
- Reliability – the ability to continue operating correctly for a given time
- Fault tolerance – resilience to partial system failure
- Security – the policy for dealing with threats to the communication or processing of data in the system
- Administrative/management domains – issues concerning the ownership of, and access to, distributed system components

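As a back-of-the-envelope check of the granularity point: the time to ship a work unit is roughly the latency plus the data size divided by the bandwidth, and distribution pays off when compute time dwarfs that. All figures below are made up for illustration:

latency = 0.05           # s, one-way network delay (assumed)
bandwidth = 10e6 / 8     # bytes/s on a 10 Mbps link (assumed)
unit_bytes = 2e6         # data shipped per work unit (assumed)
compute_s = 600          # processing time per unit on one machine (assumed)

comm_s = 2 * latency + 2 * unit_bytes / bandwidth   # send input + return output
print(f"communication {comm_s:.1f}s, compute/comm ratio {compute_s / comm_s:.0f}")
# A large ratio means coarse granularity, so distribution is worthwhile;
# a ratio near 1 means the network, not the processors, is the bottleneck.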
Distributed Computer System Architectures

- Flynn's (1966, 1972) classification of computer systems in terms of their instruction- and data-stream organization
- Based on the von Neumann model (separate processor and memory units)
- Four machine organizations:
  - SISD - Single Instruction, Single Data
  - SIMD - Single Instruction, Multiple Data
  - MISD - Multiple Instruction, Single Data
  - MIMD - Multiple Instruction, Multiple Data

Distributed computers are essentially all MIMD machines (a loose code illustration follows):
- SM – shared memory: a multiprocessor, e.g. a SUN Sparc machine
- DM – distributed memory: a multicomputer, e.g. a LAN cluster
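As a loose illustration of the SIMD/MIMD distinction in code (an analogy, not Flynn's original formulation): a SIMD machine applies one instruction stream across many data elements, while a MIMD machine runs independent instruction streams on independent data, much as separate processes do:

import numpy as np
from multiprocessing import Pool

data = list(range(8))

# SIMD-style: a single vectorised operation over many data elements at once.
simd_result = np.array(data) * 2

# MIMD-style: independent workers, each free to take its own control path.
def worker(x):
    return x * 2 if x % 2 == 0 else x + 100

if __name__ == "__main__":
    with Pool(4) as pool:
        mimd_result = pool.map(worker, data)
        print(simd_result, mimd_result)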

Distributed Computing Application Characteristics
Obviously not all applications are suitable for distributed computing. The closer an application gets to running in real time, the less appropriate it is. Even processing tasks that normally take an hour or two may not derive much benefit if communication among the distributed systems and the constantly changing availability of processing clients become a bottleneck. Instead, think in terms of tasks that take hours, days, weeks, and months. Generally, the most appropriate applications, according to Entropia, consist of "loosely coupled, non-sequential tasks in batch processes with a high compute-to-data ratio." A high compute-to-data ratio goes hand in hand with a high compute-to-communications ratio, as you don't want to bog down the network by sending large amounts of data to each client, though in some cases you can do so during off hours. Programs with large databases that can be easily partitioned for distribution are very appropriate.
Clearly, any application with individual tasks that need access to huge data sets will be more appropriate for larger systems than for individual PCs. If terabytes of data are involved, a supercomputer makes sense, as communication can take place across the system's very high-speed backplane without bogging down the network. Clusters of servers and other dedicated systems are more appropriate for slightly less data-intensive applications. For a distributed application using numerous PCs, the required data should fit comfortably in each PC's memory, with plenty of room to spare.
What About Peer-to-Peer Features?
Though distributed computing has recently been subsumed by the peer-to-peer craze, the structure described above is not really one of peer-to-peer communication, as the clients don't necessarily talk to each other. Current vendors of distributed computing solutions include Entropia, Data Synapse, Sun, Parabon, Avaki, and United Devices. Sun's open-source grid engine platform is more geared to larger systems, while the others focus on PCs, with Data Synapse somewhere in the middle: on its Live Cluster platform, client PCs can work in parallel with other client PCs and share results with each other in 20 ms bursts. The advantage of Live Cluster's architecture is that applications can be divided into tasks with mutual dependencies that require interprocess communication, something applications running on Entropia cannot do. But while Entropia and the other platforms can work well across an Internet of modem-connected PCs, Data Synapse's Live Cluster makes more sense on a corporate network or among broadband users across the Net.

Conclusion
Distributed computing has proved to be an attractive, cost-effective method of computing. Current concepts of clustering must be expanded to include mainframes, MPPs, and other special-purpose hardware, to provide best-fit scheduling between jobs and hardware. Higher-speed networking, global file systems, and improved security mechanisms will move distributed computing beyond the confines of local area networks into a single transparent global network.
A great challenge in this global network scheme is the effective management of resources. Without effective resource management, cost-effectiveness dwindles as the size of the distributed pool grows.
DQS (the Distributed Queueing System) attempts to provide a single coherent allocation and management tool for this environment, incorporating not only dedicated compute servers but also idle interactive workstations when possible. By providing support for single- and multiple-node interactive and batch jobs using the best-fit concept, DQS increases the efficiency with which available resources are matched to user needs.
