XML-RPC Agents for Distributed Scientific Computing

Robert van Engelen
Kyle Gallivan
Gunjan Gupta
Department of Computer Science
and School of Computational Science and Information Technology
Florida State University

George Cybenko
Thayer School of Engineering
Dartmouth College
 


Overview

  • Goal: distributed computing using XML-RPC and mobile agents in a component-based PSE
  • Background and related work
  • SubProject 1: Implementation of a SOAP (XML-RPC) stub compiler for C
  • SubProject 2: Application server registry and indexing
  • SubProject 3: Mobile agents for connecting applications
  • Conclusions

Goals

  • Integration of scientific applications in a component-based PSE
    • Component systems offer framework for rapid construction, experimentation, and development of distributed applications

    • Component project [Gannon], Globus [Foster], the Grid [Foster], Nasa IPG, WebFlow [Fox]

  • Automatic wrapper compilation for the integration of "legacy applications" written in Fortran/C
    • Less time spent on developing new wrappers
    • Less time spent on modifying existing wrappers
    • Handle large volumes of data more efficiently
    • Component system should operate across departmental boundaries (firewalls)
    • E.g. Meteorology - Physics - Computer Science

Goals (cont'd)

  • Searching and indexing of scientific applications on the Web using semantic descriptions of services
    • No centralized component system manager and registry
  • Problem solving and modeling using mobile agents
    • Client-server model is not suitable for many distributed scientific applications
    • Mobile agents are responsible for setting up connections between applications
  • Keep it simple using existing lightweight technology

Component-Based Systems

Well-known desktop component systems are CORBA, DCOM, ActiveX, Java Studio, Java Voyager
  • Rely on object-oriented design principles for rapid application development
    • Classes and behaviors define component objects and an infrastructure that allows components to be composed
  • Often utilize remote procedure calling (RPC) / remote method invocation (RMI) communication style
  • Application interfaces are defined with an interface definition language (IDL) which defines types, messages, and exceptions
  • Distribute workload
  • Access to specialized servers (e.g. databases)
  • Distributed component system enables real-time data extraction, e.g. from a networked instrument

Object-Registry Service

Component systems offer simple object-registry services
  • A server creates an instance of an object and passes a reference to the registry, which is maintained by the object request broker (ORB)
  • A client can request a reference to the object by sending a request to the local ORB
  • With the reference to the object the client can make remote calls to the object's functions


Problems with Component Systems in Scientific Computing

  • Completely Java-based component systems are not suitable for scientific applications
    • Numerical software is written in FORTRAN/C, and a scripting language is useful for setting up experiments
    • Mixed-language approach requires wrappers to encapsulate the application in e.g. Java
  • Client-server model of computation is not suitable: collaborating applications are peers
    • Components are required to communicate in groups, not through a single control point, to maximize concurrency and to minimize bandwidth requirements
  • Communication protocols should adapt to multiple, dynamic transport layers

Encapsulating Applications using Wrappers

  • Unless everything is built in Java, building a PSE with "legacy" applications using component technology requires application wrappers to be developed as application programming interfaces (APIs)
  • Wrapping a legacy application with APIs written in a different language (e.g. Java)
    • requires significant programming effort
    • incurs run-time overhead
    • is prone to errors related to data type conversions
  • Use of an IDL compiler helps to create class definitions in C++ or Java for data types, but still requires data translation if the application representation differs from the interface representation
  • The data duplication involved in using a Java wrapper puts a burden on memory resources or even makes it impossible to exchange large volumes of numerical data

SOAP: Simple Object Access Protocol

SOAP is a remote procedure calling protocol for the Internet.

SOAP is an XML/HTTP-based protocol for accessing services, objects and servers in a platform-independent manner

SOAP v1.1 was recently submitted to the W3C for standardization

SOAP is available as a package for Java, Perl, and C(++)/COM

SOAP's main transport is HTTP, but it is adaptable:

  • Operates through firewalls (selectively)
  • Transport issues left to HTTP (e.g. security, encryption, and compression)

SOAP data marshaling is XML:

  • XML is a platform-independent data representation
  • XML parsers/generators are widely available
  • Viewing of data by Web browsers is possible
  • XML schemas are used to define data structures (like data types) and serve as an IDL

Problem: SOAP for C/C++ Consists of I/O Libraries Only

C/C++ library routines for I/O of SOAP client and server applications

Routines to translate SOAP payload to/from internal XML tree representation (DOM)

Translation between DOM and internal data structures is necessary for many applications, e.g. applications dealing with graphs, grids, and sparse matrices

Thus, developing a SOAP-enabled application requires significant programming effort, because the XML serialization must be implemented by hand
 


Project 1: Automatic Wrapper Compilation

For legacy codes written in Fortran and C, our solution is to automatically generate wrapper routines for direct SOAP data exchange and RPC without resorting to Java

The compiler takes native data type definitions as source and produces routines to serialize and deserialize the actual data in C

Serialization traverses the data structures in place and produces a flattened representation; nodes in a data structure graph that can be reached along several paths are not traversed multiple times

The FORTRAN version of the wrapper compiler will rely on compiled C interfaces that are linked with the FORTRAN program
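
The idea of flattening a pointer structure without visiting a node twice can be sketched in C as follows. This is only an illustration under stated assumptions, not the code the wrapper compiler generates: the `struct cell` type, the `seen_table` helper, the function names, and the `<cell>`/`<value>` element names are all hypothetical. The sketch records every visited address in a small table, so a second encounter of the same cell is written as an `href` reference instead of being traversed again. For brevity this one-pass variant gives every cell an `id`; a two-pass scheme could restrict ids to cells that are actually referenced more than once.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical linked data type; the list may be cyclic or shared. */
struct cell { int value; struct cell *next; };

#define MAX_SEEN 64

/* Addresses already serialized; the index + 1 serves as the XML id. */
struct seen_table {
    const struct cell *ptr[MAX_SEEN];
    int count;
};

/* Return the 1-based id of p if it was recorded before, else 0. */
static int seen_lookup(const struct seen_table *t, const struct cell *p)
{
    for (int i = 0; i < t->count; i++)
        if (t->ptr[i] == p)
            return i + 1;
    return 0;
}

/* Append text to the NUL-terminated buffer buf without overflowing cap. */
static void emit(char *buf, size_t cap, const char *text)
{
    size_t len = strlen(buf);
    if (len + 1 < cap)
        strncat(buf, text, cap - len - 1);
}

/* Serialize the structure reachable from p into buf.
   Each cell is written in full once; later encounters become hrefs. */
void serialize_cell(const struct cell *p, struct seen_table *t,
                    char *buf, size_t cap)
{
    char tmp[64];
    int id;

    if (p == NULL)
        return;                      /* NULL pointers emit nothing here */
    id = seen_lookup(t, p);
    if (id != 0) {                   /* already written: reference only */
        snprintf(tmp, sizeof tmp, "<cell href=\"#%d\"/>", id);
        emit(buf, cap, tmp);
        return;
    }
    if (t->count == MAX_SEEN)
        return;                      /* sketch only: table is full */
    t->ptr[t->count++] = p;
    id = t->count;
    snprintf(tmp, sizeof tmp, "<cell id=\"%d\"><value>%d</value>", id, p->value);
    emit(buf, cap, tmp);
    serialize_cell(p->next, t, buf, cap);
    emit(buf, cap, "</cell>");
}
```

For a two-cell cycle (a points to b, b points back to a) the traversal terminates, and the second visit of a is written as a reference to id 1. A production version would likely replace the linear table with a hash table.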
 


SOAP v1 Stub Compiler for C

  • Takes C data type declarations and C function prototypes as input
  • Produces SOAP XML data structure serialization/deserialization routines for message passing
  • Produces SOAP RPC stubs for client and server programs
  • SOAP v1 compliant, so routines can interact with existing SOAP servers

Supported Data Types

  • All base types (int, float, etc)
  • Strings
  • Pointers
  • Structs (records)
  • Fixed size arrays
  • Work on dynamic arrays in progress
  • Work on exploiting SOAP sparse matrix representation in progress
  • No unions (variant records), but can be mimicked using structs
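
How a union might be mimicked with a struct can be sketched as follows. The example is hypothetical: `struct shape`, its members, and `shape_area` are illustrative names and not part of the stub compiler. The assumption is that each variant becomes an ordinary struct member and an integer tag records which member is in use, so only supported types (base types and structs) appear in the declaration.

```c
/* Hypothetical variant record: a shape is either a circle or a rectangle.
   Instead of a union, each variant gets its own struct member and an
   integer tag selects the member that is currently meaningful. */
#define SHAPE_CIRCLE 0
#define SHAPE_RECT   1

struct circle { double radius; };
struct rect   { double width, height; };

struct shape {
    int kind;              /* SHAPE_CIRCLE or SHAPE_RECT */
    struct circle circle;  /* meaningful only if kind == SHAPE_CIRCLE */
    struct rect rect;      /* meaningful only if kind == SHAPE_RECT */
};

/* Example consumer: dispatch on the tag, as code using a union would. */
double shape_area(const struct shape *s)
{
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->circle.radius * s->circle.radius;
    case SHAPE_RECT:
        return s->rect.width * s->rect.height;
    default:
        return 0.0;        /* unknown tag */
    }
}
```

The price of this encoding is space: every variant is stored (and would be serialized) even though only one is in use.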

Example Compiler Input

struct matrix { ... };
struct vector { ... };
int solve(struct matrix A, struct vector b, struct vector *x);

Example Compiler Output

int soap_call_solve(char *URL, struct matrix A, struct vector b, struct vector *x)
     { ... }
int soap_serve() { ... }

User calls soap_call_solve(...) in client application

To set up the service, user writes main program and the solver:

int main() { return soap_serve(); }
int solve(struct matrix A, struct vector b, struct vector *x) { ... }

The user then compiles the program and installs the executable as a CGI application

The generated programs allow client-server model (shown), message-passing model, or Gannon's component system model with containers


Example SOAP Request and Response

Client request:
POST /~engelen/solve.cgi HTTP/1.1
Host: www.cs.fsu.edu
Content-Type: text/plain
Content-Length: 192
SOAPMethodName: solve


<SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1">
<SOAP:Body>
<solve>
<matrix>...</matrix>
<vector>...</vector>
</solve>
</SOAP:Body>
</SOAP:Envelope>
Server response:
<SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1">
<SOAP:Body>
<vector>...</vector>
</SOAP:Body>
</SOAP:Envelope>
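
The request above is an ordinary HTTP POST, so a client stub has to assemble the header lines and make Content-Length match the byte count of the XML payload. The following C sketch shows one way this could be done. It is not the generated stub code: `build_soap_request` and its parameters are hypothetical, and the content type is passed in by the caller rather than fixed by the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a SOAP-over-HTTP POST request into out.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 if the request does not fit into cap bytes. */
int build_soap_request(char *out, size_t cap,
                       const char *host, const char *path,
                       const char *content_type, const char *method,
                       const char *body)
{
    int n = snprintf(out, cap,
                     "POST %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %lu\r\n"   /* payload size in bytes */
                     "SOAPMethodName: %s\r\n"
                     "\r\n"                       /* end of HTTP headers */
                     "%s",
                     path, host, content_type,
                     (unsigned long)strlen(body), method, body);
    if (n < 0 || (size_t)n >= cap)
        return -1;                                /* error or truncated */
    return n;
}
```

Computing the length from the payload itself, instead of hard-coding it, keeps the header consistent when the serialized data changes size.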

Some Implementation Issues

Minimal space and time overhead for serializing and deserializing the C data structures
  • Serialize: two-pass in-situ data structure traversal (analyze - output)
  • Deserialize: one-pass with backpatching of forward references (pointers)
  • Deserialization by allocating on heap or storing in existing data structure (if possible)
  • Only storage for pointers is duplicated (in internal hash tables) for alias analysis
  • Can handle any run-time pointer structure
    • Example (pointer p refers to an element of array a):
        <s><p href="#7"/>
          <a><int>5</int>...<int id="7">3</int>...
          </a>
        </s>
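
One-pass deserialization with backpatching can be sketched in C as follows. In the example above, the reference href="#7" is read before the element with id="7", so the pointer cannot be set immediately. The sketch keeps a table of known ids and a list of pointer slots that still wait for their target. All names (`struct backpatch`, `bp_reference`, `bp_define`) are hypothetical, the table sizes are arbitrary, and the sketch is restricted to pointers to int to keep the types simple. It is an illustration of the technique, not the generated deserializer.

```c
#include <stddef.h>

#define MAX_IDS     32
#define MAX_PENDING 32

/* Bookkeeping for one deserialization pass. Must start zero-initialized. */
struct backpatch {
    int *addr[MAX_IDS];              /* addr[id]: target once decoded */
    struct { int id; int **slot; } pending[MAX_PENDING];
    int npending;                    /* number of unresolved references */
};

/* Called when href="#id" is read for the pointer stored at *slot.
   Resolves at once if the target is known, otherwise queues the slot. */
int bp_reference(struct backpatch *bp, int id, int **slot)
{
    if (id < 0 || id >= MAX_IDS)
        return -1;
    if (bp->addr[id] != NULL) {      /* backward reference */
        *slot = bp->addr[id];
        return 0;
    }
    if (bp->npending == MAX_PENDING)
        return -1;
    *slot = NULL;                    /* forward reference: patch later */
    bp->pending[bp->npending].id = id;
    bp->pending[bp->npending].slot = slot;
    bp->npending++;
    return 0;
}

/* Called when the element with id="id" has been decoded into *addr.
   Patches every queued slot that was waiting for this id. */
int bp_define(struct backpatch *bp, int id, int *addr)
{
    int i, kept = 0;

    if (id < 0 || id >= MAX_IDS || addr == NULL)
        return -1;
    bp->addr[id] = addr;
    for (i = 0; i < bp->npending; i++) {
        if (bp->pending[i].id == id)
            *bp->pending[i].slot = addr;           /* backpatch */
        else
            bp->pending[kept++] = bp->pending[i];  /* still waiting */
    }
    bp->npending = kept;
    return 0;
}
```

Applied to the example, reading p would call bp_reference with id 7 and leave p as NULL; decoding the array element with id 7 would then call bp_define, which sets p to the address of that element. Any slot still pending at the end of the message indicates a dangling reference.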

Resolving Name Conflicts

The XML namespace mechanism allows identification of data structures that share the same name but are conceptually different

Electronic address versus physical address:

struct e__address { char *email; ... };
struct p__address { char *street; ... };

Define namespaces (run-time adaptable):

struct Namespace namespaces[] =
{ { "e", "urn:my-electronic-address" },
  { "p", "urn:my-physical-address" }
};

SOAP payload:
<e:address xmlns:e="urn:my-electronic-address">
  <email>engelen@cs.fsu.edu</email>
  ...
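
The mapping from a C name such as e__address to the qualified element e:address can be sketched as follows. The sketch assumes that the text before the double underscore is the namespace prefix and that `struct Namespace` holds a prefix and a URI. The field names `prefix` and `uri` and the function `qualified_open_tag` are hypothetical, and the sketch is not the stub compiler's actual routine.

```c
#include <stdio.h>
#include <string.h>

/* Assumed layout of a namespace table entry (field names hypothetical). */
struct Namespace { const char *prefix; const char *uri; };

/* Write the opening tag for a C type name like "e__address" into out,
   e.g. <e:address xmlns:e="urn:my-electronic-address">.
   Names without a "__" separator are written unqualified.
   Returns 0 on success, -1 if the prefix is unknown or out is too small. */
int qualified_open_tag(const char *c_name,
                       const struct Namespace *table, size_t ntable,
                       char *out, size_t cap)
{
    const char *sep = strstr(c_name, "__");
    size_t plen, i;
    int n;

    if (sep == NULL || sep == c_name) {          /* no namespace prefix */
        n = snprintf(out, cap, "<%s>", c_name);
        return (n < 0 || (size_t)n >= cap) ? -1 : 0;
    }
    plen = (size_t)(sep - c_name);               /* length of the prefix */
    for (i = 0; i < ntable; i++) {
        if (strlen(table[i].prefix) == plen &&
            strncmp(table[i].prefix, c_name, plen) == 0) {
            n = snprintf(out, cap, "<%s:%s xmlns:%s=\"%s\">",
                         table[i].prefix, sep + 2,
                         table[i].prefix, table[i].uri);
            return (n < 0 || (size_t)n >= cap) ? -1 : 0;
        }
    }
    return -1;                                   /* prefix not in table */
}
```

Because the table is ordinary data, the URI bound to a prefix can be changed at run time without regenerating the serialization routines.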


Project 2: Registry and Indexing of Scientific Services

Registry and indexing methods for non-scientific applications are well developed, e.g. CORBA, Jini location service

However, scientific applications are very often characterized by their functional behavior (e.g. solve linear system)

We need a generic semantic description of behavior, e.g. using lambda calculus notation with OpenMath

Example: solve = lambda (A,b).(A^-1*b)

OpenMath technology includes "content dictionaries" for sparse matrix forms, definitions of common mathematical operations, and a type system

OpenMath design is "compositional": function composition is central for component composition

The XML namespace mechanism can be used to distinguish specific services by name, e.g. a satellite image provider
 


Project 3: Connecting Applications using Mobile Agents

In general, mobile agents can move to large data sources to perform a local computation on the data and to bring back the data products
  • Avoid centralized point of control by using mobile agents to establish SOAP RPC connections between applications
  • Mobile agents can enhance scalability of the component system by moving to servers that have lower process loads
  • Enhance flexibility in dynamic environment (e.g. Web) by letting agents perform search and lookup of services on remote machines (avoids continuously downloading registry updates on local machines)
  • Local proxy servers can speed up the lookup process by locally caching indexing information

Example Scenario

Predicting the impact of the pollution of a river on the environment

Components:

  • PSE user interface to control the session
  • Geographical information system
  • Simulator (transport model solver)
  • Visualization package
A mobile agent is sent from the PSE to look for the services and to establish the connections between the component servers


Conclusions

  • SOAP protocol allows mixed-language component system
    • Operates through firewalls
    • Platform independent
    • Independent of transport layer
  • Wrapper compiler eases encapsulation problem of legacy software
  • Service indexing and lookup using
    • OpenMath (functional behavior)
    • XML namespaces and XML schemas (data types)
  • Mobile agents for service lookup and to connect applications