Friday, December 5, 2008

paper-1

INTRODUCTION:
For the past decade, "distributed computing" has been one of the biggest buzz phrases in the computer industry. At this point in the information age, we know how to build networks; we use thousands of engineering workstations and personal computers to do our work, instead of huge behemoths in glass-walled rooms. Surely we ought to be able to use our networks of smaller computers to work together on larger tasks. And we do--an act as simple as reading a web page requires the cooperation of two computers (a client and a server) plus other computers that make sure the data gets from one location to the other. However, simple browsing (i.e., a largely one-way data exchange) isn't what we usually mean when we talk about distributed computing. We usually mean something where there's more interaction between the systems involved.
You can think about distributed computing in terms of breaking down an application into individual computing agents that can be distributed on a network of computers, yet still work together to do cooperative tasks. The motivations for distributing an application this way are many. Here are a few of the more common ones:
• Computing things in parallel by breaking a problem into smaller pieces enables you to solve larger problems without resorting to larger computers. Instead, you can use smaller, cheaper, easier-to-find computers.
• Large data sets are typically difficult to relocate, or are easier to control and administer where they reside, so users have to rely on remote data servers to provide needed information.
• Redundant processing agents on multiple networked computers can be used by systems that need fault tolerance. If a machine or agent process goes down, the job can still carry on.
There are many other motivations, and plenty of subtle variations on the ones listed here.
Assorted tools and standards for assembling distributed computing applications have been developed over the years. These started as low-level data transmission APIs and protocols, such as RPC and DCE, and have more recently evolved into object-based distribution schemes, such as CORBA, RMI, and OpenDoc. These programming tools essentially provide a protocol for transmitting structured data (and, in some cases, actual runnable code) over a network connection. Java offers a language and an environment that encompass various levels of distributed computing development, from low-level network communication to distributed objects and agents, while also having built-in support for secure applications, multiple threads of control, and integration with other Internet-based protocols and services.
It is worth asking whether the result would be better or worse if a different set of tools were chosen to build something similar.


1.1. Anatomy of a Distributed Application
A distributed application is built upon several layers. At the lowest level, a network connects a group of host computers together so that they can talk to each other. Network protocols like TCP/IP let the computers send data to each other over the network by providing the ability to package and address data for delivery to another machine. Higher-level services can be defined on top of the network protocol, such as directory services and security protocols. Finally, the distributed application itself runs on top of these layers, using the mid-level services and network protocols as well as the computer operating systems to perform coordinated tasks across the network.
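As a minimal sketch of the lowest layer described above, the following Java fragment opens a TCP connection to another machine and sends a line of text; TCP/IP handles packaging and addressing the data. The host name and port number here are placeholders, not values from any real system.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class SimpleClient {
    public static void main(String[] args) throws Exception {
        // "data.example.com" and port 4000 are hypothetical placeholder values.
        Socket socket = new Socket("data.example.com", 4000);
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));

        out.println("HELLO");                 // TCP/IP packages and delivers the bytes
        String reply = in.readLine();         // wait for the server's one-line response
        System.out.println("Server said: " + reply);

        socket.close();
    }
}

Everything higher in the stack, such as directory services, security, and the application itself, ultimately rides on exchanges like this one.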
At the application level, a distributed application can be broken down into the following parts:
Processes
A typical computer operating system on a computer host can run several processes at once. A process is created by describing a sequence of steps in a programming language, compiling the program into an executable form, and running the executable in the operating system. While it's running, a process has access to the resources of the computer (such as CPU time and I/O devices) through the operating system. A process can be completely devoted to a particular application, or several applications can use a single process to perform tasks.
Threads
Every process has at least one thread of control. Some operating systems support the creation of multiple threads of control within a single process. Each thread in a process can run independently from the other threads, although there is usually some synchronization between them. One thread might monitor input from a socket connection, for example, while another might listen for user events (keystrokes, mouse movements, etc.) and provide feedback to the user through output devices (monitor, speakers, etc.). At some point, input from the input stream may require feedback from the user. At this point, the two threads will need to coordinate the transfer of input data to the user's attention.
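A hedged sketch of the two-thread scenario just described: one thread stands in for the socket reader and hands data to a second, user-facing thread through a shared, thread-safe queue. The class and queue names are hypothetical, and the "network" input is simulated.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class TwoThreadsDemo {
    public static void main(String[] args) {
        // Shared queue used to coordinate the two threads.
        final BlockingQueue<String> inbox = new LinkedBlockingQueue<String>();

        // Thread 1: stands in for the thread monitoring a socket connection.
        Thread reader = new Thread(new Runnable() {
            public void run() {
                try {
                    inbox.put("message from the network"); // hand the data to the UI thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        // Thread 2: stands in for the thread handling user events and output.
        Thread ui = new Thread(new Runnable() {
            public void run() {
                try {
                    String msg = inbox.take();             // blocks until data arrives
                    System.out.println("Display to user: " + msg);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        reader.start();
        ui.start();
    }
}

The blocking queue is the synchronization point between the two threads: the reader never touches the display, and the user-facing thread never touches the socket.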
Objects
Programs written in object-oriented languages are made up of cooperating objects. One simple definition of an object is a group of related data, with methods available for querying or altering the data (getName(), setName()), or for taking some action based on the data (sendName(OutputStream)). A process can be made up of one or more objects, and these objects can be accessed by one or more threads within the process. And with the introduction of distributed object technology like RMI and CORBA, an object can also be logically spread across multiple processes, on multiple computers.
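A minimal sketch of such an object in Java, under the assumption that the class name and the OutputStream-based sendName() method are purely illustrative rather than part of any particular library:

import java.io.IOException;
import java.io.OutputStream;

public class Customer {
    private String name;

    // Query the data.
    public String getName() {
        return name;
    }

    // Alter the data.
    public void setName(String name) {
        this.name = name;
    }

    // Take some action based on the data: write it to an output stream.
    public void sendName(OutputStream out) throws IOException {
        out.write(name.getBytes());
    }
}

With RMI or CORBA, methods like these can be declared in a remote interface so that another process, possibly on another computer, can invoke them.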

Agents
For the sake of this book, we will use the term "agent" as a general way to refer to significant functional elements of a distributed application. While a process, a thread, and an object are pretty well-defined entities, an agent is a higher-level system component, defined around a particular function, utility, or role in the overall system. A remote banking application, for example, might be broken down into a customer agent, a transaction agent, and an information brokerage agent. Agents can be distributed across multiple processes, and can be made up of multiple objects and threads in these processes. Our customer agent might be made up of an object in a process running on a client desktop that's listening for data and updating the local display, along with an object in a process running on the bank server, issuing queries and sending the data back to the client. There are two objects running in distinct processes on separate machines, but together we can consider them to make up one customer agent, with client-side elements and server-side elements.
The term "agent" is overused in the technology community. In the more formal sense of the word, an agent is a computing entity that is a bit more intelligent and autonomous than an object. An agent is supposed to be capable of having goals that it needs to accomplish, such as retrieving information of a certain type from a large database or remote data sources. Some agents can monitor their progress towards achieving their goals at a higher level than just successful execution of methods, like an object. o a distributed application can be thought of as a coordinated group of agents working to accomplish some goal. Each of these agents can be distributed across multiple processes on remote hosts, and can consist of multiple objects or threads of control. Agents can also belong to more than one application at once. You may be developing an automated teller machine application, for example, which consists of an account database server, with customer request agents distributed across the network submitting requests.
1.1.1 Distributed Computing Applied to Computation
Distributed computing offers researchers the potential of solving complex problems using many dispersed machines. The result is faster computation at potentially lower cost than the use of dedicated resources. The term "distributed computation" has been used to describe the use of distributed computing for the sake of raw computation rather than, say, remote file sharing, storage, or information retrieval.
This paper will focus mainly on distributed computation; however, many of the concepts and tools apply to other distributed computing projects as well.
Distributed Computation: A Community Approach to Problem Solving
Distributed computation offers researchers an opportunity to distribute the task of solving complex problems onto hundreds, and in many cases thousands, of Internet-connected machines. Although the network is itself distributed, the research and end-user participants form a loosely bound partnership. The resulting partnership is not unlike a team or community. People band together to create and connect the resources required for the achievement of a common goal. A fascinating aspect of this continues to be humanity's willingness to transcend cultural barriers.
Distributed Computing and Open Source Tools
Distributed Computing projects are successfully utilizing the idle processing power of millions of computers. Open Source tools can be used to harness the vast computing potential of Internet connected machines.

Cluster Computing
As machines became more powerful, researchers began exploring ways to connect collections (clusters) of smaller machines to build less expensive substitutes for the larger, more costly systems.
In 1994, NASA researchers Thomas Sterling and Don Becker connected 16 computers to create a single cluster. They named their new cluster Beowulf, and it quickly became a success. Sterling and Becker demonstrated the use of commodity off-the-shelf (COTS) computers as a means of aggregating computing resources and effectively solving problems that typically required larger, dedicated systems.
DISTRIBUTED COMPUTING APPLIED TO DATABASES
Distributed databases bring the advantages of distributed computing to the database management domain. We can define a distributed database (DDB) as a collection of multiple logically interrelated databases distributed over a computer network, and a distributed database management system (DDBMS) as software that manages a distributed database while making the distribution transparent to the user. A collection of files stored at different nodes of a network, with the interrelationships among them maintained via hyperlinks, has become a common organization on the Internet, in the form of web pages.
COMPARISON BETWEEN PARALLEL & DISTRIBUTED TECHNOLOGY

a> SHARED MEMORY ARCHITECTURE: Multiple processors share secondary storage and also share primary memory.
b> SHARED DISK ARCHITECTURE (loosely coupled): Multiple processors share secondary storage, but each has its own primary memory.
These architectures enable processors to communicate without the overhead of exchanging messages over a network; both describe parallel system architectures. Another approach is the distributed database approach, in which each system has its own memory and the systems communicate over a communication network.

ADVANTAGES OF DISTRIBUTED COMPUTING
Distributed computing has been proposed for various reasons, ranging from organizational decentralization and more economical processing to greater autonomy.
a> MANAGEMENT OF DISTRIBUTED DATA WITH DIFFERENT LEVELS OF TRANSPARENCY
This means hiding the details of where each file is physically stored within the system.
[Figure: a network architecture with centralized data access (Site-1)]
b> DISTRIBUTION OR NETWORK TRANSPARENCY
This refers to freeing the user from the operational details of the network. It has two aspects: location transparency and naming transparency. Location transparency refers to the fact that the command used to perform a task is independent of both the location of the data and the location of the system where the command was issued. Naming transparency implies that once a name is specified, the named object can be accessed unambiguously without any additional specification.
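A hedged sketch of the idea in Java, with a hypothetical directory class and placeholder site names: the caller asks for data by its logical name only, and a lookup layer decides which site actually holds it, so the command is independent of the data's location.

import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup layer: maps logical names to the sites that store the data.
public class Directory {
    private final Map<String, String> locations = new HashMap<String, String>();

    public Directory() {
        locations.put("EMPLOYEE", "site1.example.com"); // placeholder site names
        locations.put("PROJECT", "site2.example.com");
    }

    // The caller names the data; the directory resolves where it physically lives.
    public String siteFor(String logicalName) {
        return locations.get(logicalName);
    }

    public static void main(String[] args) {
        Directory dir = new Directory();
        // The user's request names only "EMPLOYEE" (naming transparency);
        // where it is stored is resolved behind the scenes (location transparency).
        System.out.println("EMPLOYEE is stored at: " + dir.siteFor("EMPLOYEE"));
    }
}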
c> REPLICATION TRANSPARENCY
Copies of data can be stored at multiple sites for better availability, performance, and reliability. Replication transparency makes the user unaware of the existence of these copies.

[Figure: copies of a table with attributes Name, Project, and Age replicated at multiple sites]
d> INCREASED RELIABILITY AND AVAILABILITY
These are two of the most commonly cited potential advantages of distributed computing. Reliability is broadly defined as the probability that the system is running at a given point in time. When data is distributed over multiple sites and one site fails, another can carry on the job, which increases reliability.
e> INCREASED PERFORMANCE
Since data is kept closer to the sites where it is most heavily used (data localization), contention for CPU and I/O services is reduced, as are the access delays involved in wide area networks. Local queries can be executed at each site to extract the required data, which may improve the overall performance of the system.
