The Florida State University
College of Arts and Sciences
A Simple Object Access Protocol (SOAP) stub compiler
for C
By
Gunjan Gupta
Dec 8th, 2000
A project submitted to the
Department of Computer Science
For the degree of Master of
Science
The members of the committee approve the Master's Project of Gunjan Gupta
defended on Dec 8th, 2000
________________________
Dr. Robert van Engelen
Supervising Professor
_________________________
Dr. Kyle Gallivan
Committee Member
_________________________
Dr. Gregory A. Riccardi
Committee Member
I'd like to thank Dr. Robert van Engelen for the original idea and the design
of the project, and also for his constant guidance and support in the
implementation.
I thank Dr. Gallivan for his valuable insight on various issues regarding the
project. I'd also like to thank Dr. Greg Riccardi for agreeing to be on the
committee.
Table of Contents
Abstract
1. Introduction
1.1 Background Information
1.2 SOAP
1.3 Data Representation in XML
1.4 Stubs
1.5 Serialization
2. User Guide
2.1 Input and Output Files
2.2 Mapping of C structures to XML data format
3. Examples
3.1 Query Example
3.2 System Example
4. Implementation
4.1 Functions Generated for each type
4.2 How the functions are generated
4.3 Calls to the functions in the soapClient.c and soapServer.c
5. Conclusion
6. References
Abstract
The project presents the use of SOAP to achieve data interoperability
between applications in a distributed environment. Remote procedure calling
with SOAP is programming-language independent and operates across different
platforms. We have designed and implemented a tool for the automatic generation
of SOAP stub routines and serialization converters to support application data
interoperability in distributed computing environments. SOAP defines a simple
mechanism for expressing application semantics by providing a modular packaging
model and encoding mechanisms for encoding data within modules. Thus it
provides a simple and lightweight mechanism for exchanging structured and
typed information between peers in a decentralized, distributed environment
using XML. Combining HTTP and XML into a single solution gives us a whole new
level of interoperability. Since SOAP relies on HTTP as the transport
mechanism, and most firewalls allow HTTP to pass through, we have no problem
invoking SOAP endpoints from either side of a firewall, which has been one of
the problems in distributed computing solutions.
1.1 Background Information
XML (extensible Markup Language) is a markup language for structured
documents (a document that contains both content and some indication of what
role that content plays). A markup language is a mechanism to identify
structures in a document. XML specifications define a standard way of adding
markup to structured documents.
XML is a subset of the Standard Generalized Markup Language (SGML), defined in
ISO standard 8879:1986, that is designed to make it easy to interchange
structured documents over the Internet [1].
A Document Type Definition (DTD) is a standardization of the XML object
representation. Using a DTD, applications are able to exchange data in XML
without the risk of failing to recognize objects.
In HTML both the tag semantics and the tag set are fixed. XML specifies
neither semantics nor a tag set. XML provides a facility to define tags and
the structural relationships between them. It really is a meta-language for
describing markup languages. Semantics are defined by the applications that
process them or by stylesheets.
XML is a much restricted form of SGML. XML does not require a DTD; the user
can assign a default definition for undeclared components of markup.
Document Object Model provides an API for
HTML and XML documents. DOMs are hierarchical representations of XML. An
instance of XML is purely textual and lends itself to be portable between
applications while the corresponding DOM of an XML instance is a tree data
structure that is used internally by an application with the DOM. A programmer
can build documents, navigate their structure and add, modify or delete
elements and content.
DOM interfaces are an abstraction in that they are a means of
specifying a way to access and manipulate an application's internal
representation of a document. The DOM is a platform- and language-neutral
interface that allows programs and scripts to dynamically access and update the
content, structure and style of documents. A "binding" of the DOM to a
particular programming language provides a concrete API. Currently several
libraries that handle XML DOMs are being developed for a number of programming
languages. This enables applications to manipulate XML as data structures.
However, in designing an API for XML and HTML documents using the DOM,
programmers still need to implement the concrete methods to access and
manipulate document content [2].
XML is defined by four specifications.
XML: Extensible Markup Language
Defines the syntax of XML.
XLL: Extensible Linking Language
Defines a standard way to represent links between multiple resources
and links between read-only resources.
XSL: Extensible Style Language
Will define a standard style sheet language for XML.
XUA: Extensible User Agent
Will help standardize user agents.
The process to translate arbitrary data structure to XML can be loosely
referred to as XML serialization. XML serialization converts graph like data
structures into XML by recursively traversing the graphs converting nodes to
XML and marking the nodes that are converted.
HTTP
(Hypertext Transfer Protocol) The communications protocol used to connect to servers on the World Wide Web. Its primary function is to establish a connection with a Web server and transmit HTML pages to the client browser. An HTTP exchange takes place over a TCP/IP socket. The client opens a socket, connects to the HTTP server via the port the server is listening on, and issues a command. The command is routed to the server via the Internet. The server receives the command and acts on it, typically by looking up a file.
Types of requests
1. Simple Request
GET http://websrv.cs.fsu.edu/<CR><LF>
where <CR> is a carriage return character and <LF> is a line feed character. Here the object is simply sent back to the user.
2. Full Request
Method URI ProtocolVersion <CR><LF> [*<HTRQ Header>] [<CR><LF> <data>]
Example:
GET http://websrv.cs.fsu.edu/ HTTP/1.0<CR><LF>
where <CR> is a carriage return character and <LF> is a line feed character.
In the event of a full request, the object is encapsulated using the MIME protocol, and a descriptive header precedes it on its way to the client. Some examples of methods are GET, PUT, POST and DELETE.
CGI
(Common Gateway Interface script) A small program written in a language such as Perl, Tcl, C or C++ that functions as the glue between HTML pages and other programs on the Web server. For example, a CGI script would allow search data entered on a Web page to be sent to the DBMS (database management system) for lookup. It would also format the results of that search as an HTML page and send it back to the user. The CGI script resides in the server and obtains the data from the user via environment variables that the Web server makes available to it.
RPC
(Remote Procedure Call) A programming interface that allows one program to use the services of another program on a remote machine. The calling program sends a message and data to the remote program, which is executed, and the results are passed back to the calling program.
A remote procedure call is a simple extension of the procedure idea. It
basically creates a connection between procedures that are on different
machines. The arguments of remote procedures are marshalled into a format that
can be understood on the other side of the connection. An almost infinite
number of formats is possible; one possible format is XML. XML-RPC uses XML as
the marshalling format.
Client-Server
An architecture in which the
client (personal computer or workstation) is the requesting machine and the
server is the supplying machine, both of which are connected via a local area
network (LAN) or wide area network (WAN). The client contains the user
interface and may perform some or all of the application processing.
Servers can be high-speed microcomputers, minicomputers or even mainframes.
SOAP is a lightweight protocol for
exchange of information in a decentralized, distributed environment. It is an
XML based protocol that consists of three parts: an envelope that defines a
framework for describing what is in a message and how to process it, a set of
encoding rules for expressing instances of application-defined datatypes, and a
convention for representing remote procedure calls and responses. SOAP can
potentially be used in combination with a variety of other protocols; however,
the only bindings defined in this document describe how to use SOAP in
combination with HTTP and HTTP Extension Framework.[W3c SOAP 1.1
specification]
Description and use
The use of SOAP is best described by considering a situation we encounter often these days: building an Internet application, for example a simple database query in which the client submits a request and the server looks up the database and returns the rows that match the request.
This application would have a simple solution as a custom client/server application (see the definition in the introduction), which would work perfectly if all the clients used the same platform. But that is not the case: with the different platforms in use (Windows, Unix, Linux, etc.) this would not work. Excluding SOAP, there are two possible solutions to this problem:
1. Have a different client application written for each platform. First, this is a lot of work; second, it is not very scalable.
2. Use the web. But you are still tied to browser implementations, and you still have to build an infrastructure to send and receive input and output and to format and package that data for transmission. For a complicated application, you may opt for Java or ActiveX code, but then you start losing users to bandwidth and security issues.
SOAP provides a solution to this. It is a simple packaging protocol that packages the data the client sends, and packages the result from the server on the return trip, in a format that is commonly understood (XML). All the arguments from the client required to complete the remote procedure are converted to XML, and the result from the server is also converted to XML. SOAP acts as glue between client and server. The protocol used to transfer the data is HTTP (see the definition in the introduction).
SOAP structure
"SOAP is a protocol specification for invoking methods on servers, services, components and objects. SOAP codifies the existing practice of using XML and HTTP as a method invocation mechanism. The SOAP requires a small number of HTTP headers that facilitate firewall/proxy filtering. The SOAP specification also mandates an XML vocabulary that is used for representing method parameters, return values, and exceptions." [DevelopMentor]
Following is the structure of the SOAP protocol according to the W3C SOAP1.1 Specification.
SOAP consists of three parts: the envelope, the encoding rules, and the RPC representation, as described above.
A SOAP message is an XML document that consists of a mandatory SOAP envelope, an optional SOAP header, and a mandatory SOAP body.
A SOAP message contains the following:
Header: SOAP provides a flexible mechanism for extending a message in a decentralized and modular way without prior knowledge between the communicating parties. Typical examples of extensions that can be implemented as header entries are authentication, transaction management, payment etc.
Envelope: The Envelope is the top element of the XML document representing the message. The element MAY contain namespace declarations as well as additional attributes. If present, such additional attributes MUST be namespace-qualified. Similarly, the element MAY contain additional sub elements. If present these elements MUST be namespace-qualified and MUST follow the SOAP Body element.
Body: The SOAP Body element provides a simple mechanism for exchanging mandatory information intended for the ultimate recipient of the message. Typical uses of the Body element include marshalling RPC calls and error reporting.
1. Interoperability
Combining HTTP and XML into a single solution gives us a whole new level of interoperability. For example, lathered with SOAP, clients written in Microsoft Visual Basic can easily invoke CORBA services running on UNIX boxes, JavaScript clients can easily invoke code running on the mainframe, and Macintosh clients can start invoking Perl objects running on Linux. The list goes on. While some interoperability is achieved today through cross-platform bridges for specific technologies, once SOAP becomes standard, bridges will no longer be necessary.
2. Firewall Problems
Currently, developers struggle to make their
distributed applications work across
the Internet when firewalls get in the way. Since most firewalls block
all but a few ports, such as the standard HTTP port 80, all of today's
distributed object protocols like DCOM suffer because they rely on dynamically
assigned ports for remote method invocations. If you can persuade your system
administrator to open a range of ports through the firewall, you may be able to
get around this problem as long as the ports used by the distributed object
protocol are included.
To make
matters worse, clients of your distributed application that lie behind another
corporate firewall suffer the same problems. If they don't configure their
firewall to open the same port, they won't be able to use your application.
Making clients reconfigure their firewalls to accommodate your application is
just not practical.
Since
SOAP relies on HTTP as the transport mechanism, and most firewalls allow HTTP
to pass through, we do not have a problem invoking SOAP endpoints from either
side of a firewall. Don't forget that SOAP makes it possible for system
administrators to configure firewalls to selectively block out SOAP requests
using SOAP-specific HTTP headers.
Example of SOAP Request and Response
Request:
POST /StockQuote HTTP/1.1
Host:
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
SOAPAction: "Some-URI"

<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Response:
HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn

<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
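The request above can be assembled programmatically. The following is a minimal sketch in C, not the code our tool generates: the function names build_envelope and build_request are hypothetical, the host name passed in is up to the caller, and the symbol is assumed to need no XML escaping. It shows that Content-Length has to be computed from the finished envelope.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the SOAP envelope for GetLastTradePrice into buf.
 * Assumption: symbol contains no characters that need XML escaping.
 * Returns the number of characters written, or -1 if buf is too small. */
int build_envelope(char *buf, size_t size, const char *symbol)
{
    int n = snprintf(buf, size,
        "<SOAP-ENV:Envelope"
        " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
        " SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">"
        "<SOAP-ENV:Body>"
        "<m:GetLastTradePrice xmlns:m=\"Some-URI\">"
        "<symbol>%s</symbol>"
        "</m:GetLastTradePrice>"
        "</SOAP-ENV:Body>"
        "</SOAP-ENV:Envelope>", symbol);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Prefixes the envelope with the HTTP POST headers. Content-Length is
 * derived from the envelope instead of being typed by hand.
 * Returns the number of characters written, or -1 if buf is too small. */
int build_request(char *buf, size_t size, const char *host,
                  const char *path, const char *envelope)
{
    assert(buf && host && path && envelope);
    int n = snprintf(buf, size,
        "POST %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Type: text/xml; charset=\"utf-8\"\r\n"
        "Content-Length: %zu\r\n"
        "SOAPAction: \"Some-URI\"\r\n"
        "\r\n"
        "%s", path, host, strlen(envelope), envelope);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A caller would first fill one buffer with build_envelope(env, sizeof env, "DIS") and then pass it to build_request together with the path /StockQuote.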
1.3. Data representation in XML
XML parsers and generators are available for most programming
languages. An application that adopts XML parsers and generators builds an XML
document object Model (DOM), which is a tree-like data structure that closely
resembles the structure of a hierarchical XML object. Internal data structures
of the application have to be translated into the DOM. For hierarchical data
structures, such as lists and trees, this translation is almost one to one.
Many Web applications providing services can be characterized by operating on
such hierarchical data structures. Consider for example Web Pages and database
records. The translation of these types of objects is simple. However,
applications are not necessarily constrained to hierarchical data structures.
Care has to be taken for representing graph like data structures in
XML. More specifically, pointers can lead to co-referenced objects, and a
co-referenced part of the data structure must be translated and presented in
XML only once. A producer of XML is responsible for generating a data structure
in XML that can be translated by a consumer into a true copy of the original
data structure. A naive use of XML and the DOM could lead to trees with
replicated data, because of the hierarchical layout in the DOM. Pointers pose
another problem when pointers are allowed to refer to elements within arrays
and records in C and C++.
The example illustrates the tedious task of implementing data
conversions between an application's internal data structures and XML in the
presence of indirection (pointers). The problem is exacerbated by the
well-known problem that the inspection of data structure declarations alone
cannot reveal the exact usage of the data structures in question. Consider for
example the C data structure shown in Fig. 1(a). Although the declaration
suggests that it is a tree(because of the use of left and right field names),
there is no limitation for using this data structure as a graph in which a node
is referred to by more than one left or right pointer from another node. Every
pointer must be treated as potentially co-referencing.
The process of translating arbitrary data structures to XML can be
loosely referred to as XML serialization. XML serialization converts graph-like
data structures into XML by recursively traversing the graphs. If an object is
co-referenced, the object that is pointed to is assigned an id in the XML, and
every other node that refers to it uses an href with that id. So, effectively,
pointers are visible in the XML produced only when they are co-referenced.
However, the data part of the XML-RPC protocol is restrictive, because it does
not support graphs. The XML-RPC protocol defines records and arrays and some
primitive data types. A severe restriction of the protocol is the lack of
constructs for indirection with pointer types and constructs for passing
methods (procedures) as parameters.
Pointer analysis is important for RPC even when the data structures do not
share common nodes (as in a DAG), for the following reason. When you call a
remote procedure with, e.g., array arguments that are aliases, the SOAP
send/receive routines will not duplicate that data. For example, when we call
matrix multiply (matmul) to multiply an array A by itself, array A is sent only
once. Here, array A is a pointer to some matrix.
matmul(A, A, A2)
This is an additional motivation for the pointer analysis.
struct Node {
    int val;
    struct Node *left;
    struct Node *right;
};
[Figure: example tree whose root node holds the value 5; diagram not reproduced]
Fig. 1(a)
<node1>
  <val>5</val>
  <left>
    <val>3</val>
    <left xsd:null="1" xsd:type="u:PointerToStruct_Node"/>
    <right xsd:null="1" xsd:type="u:PointerToStruct_Node"/>
  </left>
  <right>
    <val>8</val>
    <left>
      <val>7</val>
      <left xsd:null="1" xsd:type="u:PointerToStruct_Node"/>
      <right xsd:null="1" xsd:type="u:PointerToStruct_Node"/>
    </left>
    <right>
      <val>9</val>
      <left xsd:null="1" xsd:type="u:PointerToStruct_Node"/>
      <right xsd:null="1" xsd:type="u:PointerToStruct_Node"/>
    </right>
  </right>
</node1>
Fig. 1(b): SOAP generated for Fig. 1(a)
void soap_serialize_Struct_Node(struct Node *p)
{
soap_reference(p,SOAP_Struct_Node);
soap_mark_Struct_Node(p);
}
void soap_mark_Struct_Node(struct Node *p)
{
if(p != 0){
soap_embedded(&p->val,SOAP_int);
soap_mark_int(&p->val);
soap_embedded(&p->left,SOAP_PointerToStruct_Node);
soap_mark_PointerToStruct_Node(&p->left);
soap_embedded(&p->right,SOAP_PointerToStruct_Node);
soap_mark_PointerToStruct_Node(&p->right);
}
}
void soap_default_Struct_Node(struct Node *p)
{
soap_default_int(&p->val);
soap_default_PointerToStruct_Node(&p->left);
soap_default_PointerToStruct_Node(&p->right);
}
void soap_put_Struct_Node(struct Node *p)
{
    int i;
    struct nlist *np;
    if ((i = soap_pointer_lookup(p, SOAP_Struct_Node, &np)))
    {
        if (soap_is_embedded(p, i))
            soap_element_ref("Node", i);
        else if (soap_is_single(p, i))
            soap_out_Struct_Node("Node", 0, p);
        else
        {
            soap_out_Struct_Node("Node", i, p);
            soap_set_embedded(p, i);
        }
    }
    else
        soap_out_Struct_Node("Node", 0, p);
}
void soap_out_Struct_Node(char *tag,int id,struct Node *p)
{
soap_element_begin_out(tag,
soap_embedded_id(id, p, SOAP_Struct_Node),NULL);
soap_out_int("val",-1,&p->val);
soap_out_PointerToStruct_Node("left",-1,&p->left);
soap_out_PointerToStruct_Node("right",-1,&p->right);
soap_element_end_out(tag);
}
struct Node * soap_get_Struct_Node(struct Node *a)
{
    struct Node *q;
    soap_independent("Node", NULL);
    if (!soap_error && (q = soap_in_Struct_Node("Node", a)))
        soap_independent(NULL, NULL);
    return q;
}
struct Node * soap_in_Struct_Node(char * tag, struct Node *p)
{
    if (soap_element_begin_in(tag))
        return NULL;
    if (soap_null
        || (*soap_type != '\0' && strcasecmp(soap_type, "Node") != 0))
    {
        soap_error = SOAP_TYPE_MISMATCH;
        return NULL;
    }
    if (soap_body)
    {
        p = soap_id_enter(soap_id, p, sizeof(struct Node));
        if (soap_alloced)
            soap_default_Struct_Node(p);
        for (;;)
        {
            if (!soap_in_int("val", &p->val))
                if (soap_error == SOAP_TAG_MISMATCH
                    && !soap_in_PointerToStruct_Node("left", &p->left))
                    if (soap_error == SOAP_TAG_MISMATCH
                        && !soap_in_PointerToStruct_Node("right", &p->right))
                        if (soap_error == SOAP_TAG_MISMATCH)
                            soap_error = soap_ignore_element();
            if (soap_error == SOAP_NO_TAG)
                break;
            if (soap_error)
                return NULL;
        }
        if (soap_element_end_in(tag))
            return NULL;
    }
    else
    {
        p = soap_id_forward(soap_id, p, sizeof(struct Node));
        if (soap_alloced)
            soap_default_Struct_Node(p);
    }
    return p;
}
1.4. RPC Stubs
[Figure: the client stub marshals the call and unmarshals the reply; the server stub unmarshals the call and marshals the reply; diagram not reproduced]
The stub performs basic support functions for remote procedure calls.
For instance, stubs prepare input and output arguments for transmission between
systems with different forms of data representation. The stubs use the RPC
runtime to send and receive remote procedure calls. The client stub can also
use the runtime to find servers for the client.
When a client application calls a remote procedure, the client stub first
prepares the input arguments for transmission. The process of preparing
arguments for transmission is known as marshalling. Marshalling converts call
arguments into a form suitable for transmission over the network. When a stub
receives call arguments, it unmarshals them. Unmarshalling is the process by
which a stub disassembles incoming network data and converts it into
application data using a format that the local system understands. Marshalling
and unmarshalling both occur twice for each remote procedure call; that is, the
client stub marshals input arguments and unmarshals output arguments, and the
server stub unmarshals input arguments and marshals output arguments.
Marshalling and unmarshalling permit client and server systems to use different
data representations for equivalent data. The stub compiler we use generates
stubs by compiling an RPC interface definition written by application
developers. The compiler generates marshalling and unmarshalling routines for
the C data types.
To build the client for an RPC application, a developer links client
application code to the client stubs of all the RPC interfaces the application
uses. To build the server, the developer links the server application code to
the corresponding server stubs.
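As an illustration of what marshalling and unmarshalling mean for a single argument, the following is a minimal sketch in C. The names marshal_int and unmarshal_int are hypothetical and are not the routines our compiler generates; the sketch handles only one int wrapped in one XML element.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sending side: marshal one int argument as <tag>value</tag>.
 * Returns the number of characters written, or -1 if buf is too small. */
int marshal_int(char *buf, size_t size, const char *tag, int value)
{
    int n = snprintf(buf, size, "<%s>%d</%s>", tag, value, tag);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Receiving side: unmarshal <tag>value</tag> back into a native int.
 * Returns 0 on success, -1 if the element is missing or malformed. */
int unmarshal_int(const char *xml, const char *tag, int *out)
{
    assert(xml && tag && out);
    char open[64];
    int n = snprintf(open, sizeof open, "<%s>", tag);
    if (n < 0 || (size_t)n >= sizeof open)
        return -1;

    const char *start = strstr(xml, open);
    if (!start)
        return -1;
    start += n;

    char *end;
    long v = strtol(start, &end, 10);
    size_t taglen = strlen(tag);
    /* The digits must be followed by the matching closing tag. */
    if (end == start
        || strncmp(end, "</", 2) != 0
        || strncmp(end + 2, tag, taglen) != 0
        || end[2 + taglen] != '>')
        return -1;

    *out = (int)v;
    return 0;
}
```

The text form is what lets the two sides differ in byte order or word size: each side converts between the XML text and its own native representation.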
1.5. Serialization
OBJECT SERIALIZATION
Object Serialization is the ability of an object to write its complete state and the complete state of any objects that it references to an output stream; and then, at some later time, to recreate itself by reading its serialized state from an input stream. The stream may be a file, a byte array or a stream associated with a TCP/IP socket.
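[Figures 2a and 2b: an object graph before serialization (2a) and its replica after deserialization (2b); diagrams not reproduced]
Fig. 2a    Fig. 2b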
Let's say we need to serialize the structure in Fig. 2a and deserialize it back into the one in Fig. 2b. We need to make sure that 2b is a true copy (replica) of 2a. Object a may have a different memory address in 2b than in 2a, but the structure needs to be preserved; that is, d should not be reproduced twice. For this to happen we need to make sure that every object has one and only one entry in the hash table.
[Figures 3a and 3b: object graphs containing the object a; diagrams not reproduced]
Fig. 3a    Fig. 3b
Two Phase Algorithm
Now let's consider the same case as 2a with the variation that c points back to a (Fig. 3a). If we analyze the pointers and output them simultaneously, we will not know that c has a pointer to a until we reach c, and by then we will already have output a. Only co-referenced objects can have an id in SOAP, so we first need to analyze the pointers to find out which objects are co-referenced, and then, when we output, emit ids only for the co-referenced objects. That is why two phases are required.
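The two phases can be sketched in C as follows. This is an illustration under simplifying assumptions, not the generated code: the names mark, put and serialize are hypothetical, a small linear table stands in for the hash table, and type attributes are omitted from the output.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct Node { int val; struct Node *left; struct Node *right; };

#define MAX_OBJECTS 64

/* One entry per distinct node reached during the marking phase. */
struct Entry { const struct Node *ptr; int count; int id; int emitted; };
struct Table { struct Entry e[MAX_OBJECTS]; int n; int next_id; };
struct Out { char *buf; size_t size; };

/* Appends formatted text to the output buffer (truncates when full). */
static void emit(struct Out *o, const char *fmt, ...)
{
    size_t len = strlen(o->buf);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(o->buf + len, o->size - len, fmt, ap);
    va_end(ap);
}

static struct Entry *lookup(struct Table *t, const struct Node *p)
{
    for (int i = 0; i < t->n; i++)
        if (t->e[i].ptr == p)
            return &t->e[i];
    return NULL;
}

/* Phase 1: count how many pointers reach each node. */
void mark(struct Table *t, const struct Node *p)
{
    if (!p)
        return;
    struct Entry *e = lookup(t, p);
    if (e) {                 /* seen before: co-referenced, do not recurse */
        e->count++;
        return;
    }
    assert(t->n < MAX_OBJECTS);
    t->e[t->n++] = (struct Entry){ p, 1, 0, 0 };
    mark(t, p->left);
    mark(t, p->right);
}

/* Phase 2: emit XML; only co-referenced nodes get an id, repeats an href. */
void put(struct Table *t, struct Out *o, const char *tag, const struct Node *p)
{
    if (!p) {
        emit(o, "<%s xsd:null=\"1\"/>", tag);
        return;
    }
    struct Entry *e = lookup(t, p);
    assert(e != NULL);       /* mark() must have visited every node */
    if (e->count > 1 && e->emitted) {
        emit(o, "<%s href=\"#%d\"/>", tag, e->id);
        return;
    }
    if (e->count > 1) {
        e->id = ++t->next_id;
        e->emitted = 1;      /* set before recursing so cycles terminate */
        emit(o, "<%s id=\"%d\">", tag, e->id);
    } else {
        emit(o, "<%s>", tag);
    }
    emit(o, "<val>%d</val>", p->val);
    put(t, o, "left", p->left);
    put(t, o, "right", p->right);
    emit(o, "</%s>", tag);
}

/* Runs both phases; returns the length of the XML written to buf. */
size_t serialize(const struct Node *root, char *buf, size_t size)
{
    assert(buf && size > 0);
    struct Table t = { .n = 0, .next_id = 0 };
    struct Out o = { buf, size };
    buf[0] = '\0';
    mark(&t, root);
    put(&t, &o, "Node", root);
    return strlen(buf);
}
```

Because the reference counts are complete before any output is produced, a node that is pointed to from deeper in the graph already carries its id when it is first written.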
Embedded objects
In Fig. 3b, c points to x, which is embedded in a and is the first element of the struct. So x and a both have the same starting memory location, but c points to x and not to a. The distinction is that x and a should be recognized as two distinct objects in the hash table. For this reason we also need to include the type information in the hash table, so that two different types at the same address have different entries.
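A pointer table keyed on both address and type can be sketched in C as below. The type codes, the struct A layout, the hash function and the name ptr_enter are all assumptions made for the illustration; they are not taken from the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum soap_type { TYPE_INT, TYPE_STRUCT_A };   /* hypothetical type codes */

/* x is the first member, so &a.x and &a are the same address. */
struct A { int x; int y; };

#define TABLE_SIZE 64

struct Slot { const void *addr; enum soap_type type; int used; };
struct PtrTable { struct Slot slot[TABLE_SIZE]; };

/* The type takes part in the hash, so (addr, int) and (addr, struct A)
 * are different keys even though the address is identical. */
static size_t hash(const void *addr, enum soap_type type)
{
    return (((size_t)(uintptr_t)addr >> 2) * 31u + (size_t)type) % TABLE_SIZE;
}

/* Returns the slot index for (addr, type), inserting the pair if it is
 * not present yet; returns -1 if the table is full. */
int ptr_enter(struct PtrTable *t, const void *addr, enum soap_type type)
{
    assert(t && addr);
    size_t h = hash(addr, type);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t k = (h + i) % TABLE_SIZE;      /* linear probing */
        struct Slot *s = &t->slot[k];
        if (!s->used) {
            s->addr = addr;
            s->type = type;
            s->used = 1;
            return (int)k;
        }
        if (s->addr == addr && s->type == type)
            return (int)k;
    }
    return -1;
}
```

Entering &a with TYPE_STRUCT_A and &a.x with TYPE_INT yields two different slots, while entering either pair a second time returns the slot it already has.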
[Figure: object graph with the pointer Px and the objects b and d; diagram not reproduced]
Fig. 4
Forward Pointers
In Figure 4, Px has a pointer to an embedded object x, which should be output when the object d is output, but d is output only after c is output. In this case we use forward pointers. When an unresolved pointer (an href) is encountered, a hash table entry is created for its id, together with a linked list containing all the unresolved references to that id. Later, when the object actually arrives on the stream, all the references in the linked list are replaced with the pointer to the object that arrived.
An example of a forward pointer is
<x href="#123"/>
<x id="123"> ..</x>
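The bookkeeping for forward pointers can be sketched in C as follows. The names id_reference and id_define are hypothetical, and a small fixed-size table stands in for the hash table; the linked list of unresolved references is as described above.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_IDS 32

/* One unresolved reference: the address of a pointer waiting for its target. */
struct Fwd { void **location; struct Fwd *next; };
struct IdEntry { int id; void *obj; struct Fwd *pending; int used; };
struct IdTable { struct IdEntry e[MAX_IDS]; };

/* Finds the entry for id, creating an empty one if needed; NULL if full. */
static struct IdEntry *id_find(struct IdTable *t, int id)
{
    struct IdEntry *free_slot = NULL;
    for (int i = 0; i < MAX_IDS; i++) {
        if (t->e[i].used && t->e[i].id == id)
            return &t->e[i];
        if (!t->e[i].used && !free_slot)
            free_slot = &t->e[i];
    }
    if (free_slot) {
        free_slot->used = 1;
        free_slot->id = id;
        free_slot->obj = NULL;
        free_slot->pending = NULL;
    }
    return free_slot;
}

/* Called for <x href="#id"/>: fill *location now if the object is already
 * known, otherwise queue the location. Returns 0 on success, -1 on failure. */
int id_reference(struct IdTable *t, int id, void **location)
{
    struct IdEntry *e = id_find(t, id);
    if (!e)
        return -1;
    if (e->obj) {
        *location = e->obj;
        return 0;
    }
    struct Fwd *f = malloc(sizeof *f);
    if (!f)
        return -1;
    f->location = location;
    f->next = e->pending;
    e->pending = f;
    *location = NULL;
    return 0;
}

/* Called for <x id="id">...</x>: record the object and patch every queued
 * reference. Returns 0 on success, -1 if the table is full. */
int id_define(struct IdTable *t, int id, void *obj)
{
    assert(obj != NULL);
    struct IdEntry *e = id_find(t, id);
    if (!e)
        return -1;
    e->obj = obj;
    while (e->pending) {
        struct Fwd *f = e->pending;
        *f->location = obj;
        e->pending = f->next;
        free(f);
    }
    return 0;
}
```

References that arrive after the object is defined are resolved immediately, so the linked list only ever holds the references that were seen too early.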
To generate the SOAP request for a C function, a programmer would have to construct the request by hand: the header (with the content length and other fields), the SOAP envelope and the SOAP body, which we discussed earlier.
This step requires a number of routines for pointer table maintenance and lookup, which basically maintain the state of each field in the struct as having been referred to once, referred to multiple times, or being embedded.
The above steps involve a lot of intricate programming. Every programmer who wants to use SOAP to create and deserialize a SOAP request would have to write all these routines for their own functions. Writing these wrapper/stub routines is a tedious task.
All the information required to generate these routines is already stored in
the type tables and symbol tables of a C compiler. Our C compiler tool accesses
the type information and generates the required routines at compile time. This
automates the whole process of generating the stub routines and saves the
developer a lot of work.
Tool support for XML API Synthesis
We have developed a compiler as part of a problem-solving environment for XML
API synthesis. The compiler translates source C data structure declarations
into serializing XML input and output routines for instances of the data
structures. The output routines serialize internal data structures into XML,
and the input routines read XML back into the internal data structures. The
compiler supports all data types except unions (see the note at the end).
Pointers are constrained to point to single objects (except for character
pointers, which are considered strings).
We are currently extending the compiler to generate SOAP-RPC routines
for C functions. These routines implement RPC for arbitrary C programs by
mapping XML-RPC to calls to C procedures. The example is a C function operating
on a tree data structure with the type struct Node defined in Fig. 1(a).
Fig. 1(b) shows the generated stub routine that implements XML-RPC.
A client uses the stub for a remote call to the routine on the server.
After the RPC completes on the server side, the internal copies of the data
structures are removed and an XML reply integer is sent back to the caller.
2. USER GUIDE
2.1. Input and output files
[Figure: Stub.h is the input to the COMPILER, which generates SoapC.c, runserver.c, runclient.c and soapH.h; the BUILD step combines these with Client.c, Server.c and Stdsoap.h to produce the Client and the Server; diagram not reproduced]
Fig. 3
Input file
A C file containing the prototype of the function and the structure of the
return parameters is given as input to the compiler. For example, the
prototype may be
int lookup(double SSN, struct result *r);
Notice that the struct result is always the last parameter passed; it is
ignored when the XML output for the function is being created.
This file should also declare struct result, which may be as simple as
struct result { char *name; char *SSN; };
The struct result is what will be returned from the server.
Files generated
The following files will be generated. A brief description is included here;
they are explained in detail later.
1. soapC.c
Contains the serialization routines and the routines for generating XML from
the given structure and vice versa.
2. soapClient.c
Creates the XML output with the header and the XML representation of the
parameters of the function (note that the result struct is not sent), sends
the XML output, waits for the result, and populates the result struct with the
elements of the received XML.
3. soapServer.c
Looks in the list of functions for the function that was called and calls the
appropriate function. It gets the parameters for the function from the XML and
calls the function with them and a reference to a struct result, so that the
struct result is populated with the required values. This struct is first
serialized, and then an XML file is created with the XML representation of
this struct and sent back to the client.
4. soapH.h
Contains the declarations/prototypes of all the functions created in the
soapC.c file. It also declares the user-defined types and assigns a number to
each of them.
REQUISITES FOR USING THE SYSTEM
As we saw above, the user does not need any knowledge of SOAP or XML in order
to use the serializer. The translation of C data structures to SOAP/XML is
transparent to the user. But a user who wants to use it with some other
SOAP-enabled system needs to know the XML data format (as defined by XML
Schemas). So for interoperability purposes the user needs to know how the C
data structures are mapped to XML.
2.2 MAPPING OF C DATA STRUCTURES TO XML DATA FORMAT
The C data structures are mapped in the following manner.
1. Structures
struct Record
{
    Type1 field1;
    Type2 field2;
    Type3 field3;
    ...
    TypeN fieldN;
};
will be mapped to
<Record>
  <field1> ... </field1>
  <field2> ... </field2>
  <field3> ... </field3>
  ...
  <fieldN> ... </fieldN>
</Record>
Arrays
Type array[size];
will be mapped to
<array xsd:type="Type[size]">
  <type>array[0]</type>
  <type>array[1]</type>
  <type>array[2]</type>
  ...
  <type>array[size-1]</type>
</array>
Basic types
All the basic types (int, short, long, char, etc.) are implemented by mapping
to XML with types defined by the xsi schema types (this is described in the
SOAP document).
SOAP Request and Response
Co-referenced Objects
struct X
{
    char *name1;
    char *name2;
} x;
x.name1 = "Bob";
x.name2 = x.name1;
The XML output will be
<X>
  <name1 id="45">Bob</name1>
  <name2 href="#45"/>
</X>
Namespaces
An XML namespace is defined as follows by the Namespaces in XML specification:
[Definition:] An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. XML namespaces differ from the "namespaces" conventionally used in computing disciplines in that the XML version has internal structure and is not, mathematically speaking, a set.
We use namespaces as follows. The user defines a namespace table in the main function that lists the namespaces the user may use. An ID and a URN are associated with each namespace and stored in the table. A SOAP namespace stack is maintained, onto which all the namespaces provided by the user are pushed, so that whenever an ID is used in the XML the corresponding URN is referred to. All the namespaces are included in the SOAP envelope, and the ID is used when the SOAP begin tag is output. At deserialization time, the server first populates the table with all the entries in the SOAP envelope; then, whenever a particular namespace is referenced, the struct defined by the URI specified in the table is used.
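The namespace table described above might look like the following sketch. The struct ns_entry, the NULL-terminated table layout, and the functions append, ns_lookup and envelope_begin are assumptions made for this illustration; the actual table format used by the compiler is not shown in this report.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: a namespace ID and the URN it stands for. */
struct ns_entry {
    const char *id;
    const char *urn;
};

/* Append formatted text at buf + *used; returns 0 on success, -1 on overflow. */
int append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int w;

    va_start(ap, fmt);
    w = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= size - *used)
        return -1;
    *used += (size_t)w;
    return 0;
}

/* Resolve an ID to its URN; the table ends with a {NULL, NULL} entry. */
const char *ns_lookup(const struct ns_entry *table, const char *id)
{
    for (; table->id != NULL; table++)
        if (strcmp(table->id, id) == 0)
            return table->urn;
    return NULL;
}

/* Write the envelope begin tag with one xmlns attribute per table entry. */
int envelope_begin(const struct ns_entry *table, char *buf, size_t size)
{
    size_t used = 0;

    assert(table != NULL && buf != NULL);
    if (append(buf, size, &used, "<SOAP:Envelope") != 0)
        return -1;
    for (; table->id != NULL; table++)
        if (append(buf, size, &used, " xmlns:%s=\"%s\"",
                   table->id, table->urn) != 0)
            return -1;
    if (append(buf, size, &used, ">") != 0)
        return -1;
    return (int)used;
}
```

On the receiving side, the same table can be filled from the xmlns attributes of the incoming envelope and then queried with ns_lookup.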
Note About Unions
We found earlier that unions can be replaced by structs. However, this comes at a cost in memory, which is especially high when the number of alternatives is large and each alternative can be a large data structure (e.g. an array).
Instead, one can use a struct with pointer fields. The SOAP XML representation will be the same, because the pointers are invisible in the XML.
For unions in C code, a user can translate
union U {
    Tx x;
    Ty y;
    Tz z;
};
into
struct U {
    Tx *x;
    Ty *y;
    Tz *z;
};
(assuming types Tx, Ty, and Tz are not already pointers). However, the user then has to modify existing code that uses union U.
Reading a SOAP form for this struct works fine, because the fields are initially set to NULL, so only the field that was defined in the incoming SOAP stream is non-NULL.
Currently, the compiler outputs a struct of the form above with null pointers for all the fields except the one that points to the actual member of the union. This is not very elegant, but I do not know how to do this differently right now: the null pointers should be omitted ONLY when dealing with fields in a struct, not in general.
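The following sketch shows the behaviour wished for above, namely skipping NULL pointer fields when writing a struct that stands in for a union. It is not what the compiler currently does. The field types int and double, and the functions append and u_to_xml, are hypothetical choices for the illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Struct replacing a union: at most one field is expected to be non-NULL. */
struct U {
    int    *x;
    double *y;
};

/* Append formatted text at buf + *used; returns 0 on success, -1 on overflow. */
int append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int w;

    va_start(ap, fmt);
    w = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= size - *used)
        return -1;
    *used += (size_t)w;
    return 0;
}

/* Write only the fields that are set; NULL fields produce no element. */
int u_to_xml(const struct U *u, char *buf, size_t size)
{
    size_t used = 0;

    assert(u != NULL && buf != NULL);
    if (append(buf, size, &used, "<U>") != 0)
        return -1;
    if (u->x != NULL && append(buf, size, &used, "<x>%d</x>", *u->x) != 0)
        return -1;
    if (u->y != NULL && append(buf, size, &used, "<y>%g</y>", *u->y) != 0)
        return -1;
    if (append(buf, size, &used, "</U>") != 0)
        return -1;
    return (int)used;
}
```

A reader that starts with all fields set to NULL and fills in only the elements it encounters then recovers which alternative was sent.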
3. EXAMPLES
3.1 Query Example
This example uses a simple text file that stores a list of SSNs and the corresponding names. The file resides on the server, and the client queries it using SOAP. The server has a function lookup_SSN, which searches the file for the required SSN and returns the corresponding name.
The stub.h file looks like this:
struct result
{
    char * name;
    char * SSN;
};
int lookup_SSN(double SSN, struct result *r);
It contains the declaration of the result struct and the prototype of the function. Notice that the last parameter in the function prototype is a pointer to struct result.
The result comes back as an (SSN, name) pair.
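The server-side lookup is not listed in this report. A minimal sketch of what it might do is given below, assuming each line of the text file has the form "SSN name" and that the lines have already been read into memory. The function lookup_in_lines and the line format are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct result {
    char *name;
    char *SSN;
};

/*
 * Search n lines of the assumed form "SSN name" for the given SSN.
 * On success fills *r and returns 0; returns -1 if the SSN is not found.
 * The strings in *r point to static buffers, so the result is only
 * valid until the next call.
 */
int lookup_in_lines(const char *const *lines, int n, double SSN,
                    struct result *r)
{
    static char name[64], ssn[32];
    char wanted[32];
    int i;

    assert(lines != NULL && r != NULL);
    /* The stub passes the SSN as a double; compare it as digits. */
    snprintf(wanted, sizeof wanted, "%.0f", SSN);
    for (i = 0; i < n; i++) {
        if (sscanf(lines[i], "%31s %63s", ssn, name) == 2 &&
            strcmp(ssn, wanted) == 0) {
            r->name = name;
            r->SSN = ssn;
            return 0;
        }
    }
    return -1;
}
```

A lookup_SSN function with the prototype above could read the file line by line and apply the same comparison.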
Test case: we query for the SSN 59222222.
The SOAP request that is sent to the server:
POST /~gupta/server.cgi HTTP/1.1
Host: www.cs.fsu.edu
Content-Type: text/plain
Content-Length: 212
SOAPMethodName: lookup_SSN

<SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1" xmlns:m="urn:mynamespace" xmlns:n="urn:hisnamespace">
<SOAP:Body>
<lookup_SSN>
<SSN>59222222</SSN>
</lookup_SSN>
</SOAP:Body>
</SOAP:Envelope>
The SOAP response received from the server is
HTTP/1.1 200 OK
Date: Sat, 18 Nov 2000 21:07:43 GMT
Server: Apache/1.3.4 (Unix)
Transfer-Encoding: chunked
Content-Type: application/x-httpd-cgi

104
<SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1" xmlns:m="urn:mynamespace" xmlns:n="urn:hisnamespace">
<SOAP:Body>
<result><name xsd:type="u:string">Koustubh </name>
<SSN xsd:type="u:string">592222222</SSN>
</result>
</SOAP:Body>
</SOAP:Envelope>
3.2 System Example
In this example we query who is logged on to the system the server is running on. The client passes just an integer, which is not actually used. The server returns an array of strings (because of space limitations, in this example the server returns only the first 10 users logged on).
The stub file looks like this:
struct result
{
    char * a[100];
    int ain;
};
int who_here(int i, struct result *r);
The SOAP request that is generated looks as follows:
POST /~gupta/server.cgi HTTP/1.1
Host: www.cs.fsu.edu
Content-Type: text/plain
Content-Length: 189
SOAPMethodName: who_here

<SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1" xmlns:m="urn:mynamespace" xmlns:n="urn:hisnamespace">
<SOAP:Body>
<who_here><i>3</i>
</who_here>
</SOAP:Body>
</SOAP:Envelope>
The server runs the who system command and returns an array of strings (only the first 10 are returned):
HTTP/1.1 200 OK
Date: Sat, 18 Nov 2000 21:03:16 GMT
Server: Apache/1.3.4 (Unix)
Transfer-Encoding: chunked
Content-Type: application/x-httpd-cgi

4ac
<SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1" xmlns:m="urn:mynamespace" xmlns:n="urn:hisnamespace">
<SOAP:Body>
<result><a xsd:type="u:string[100]"><string xsd:type="u:string">curci pts/48 Nov 3 21:45&#x;(dial972.acns.fsu.edu)&#x;</string>
<string xsd:type="u:string">whalley pts/0 Oct 12 06:43&#x;(protoss)&#x;</string>
<string xsd:type="u:string">whalley pts/1 Oct 12 06:43&#x;(protoss)&#x;</string>
<string xsd:type="u:string">pant pts/2 Nov 16 16:43&#x;(128.186.111.178)&#x;</string>
<string xsd:type="u:string">whalley pts/3 Oct 12 06:44&#x;(protoss)&#x;</string>
<string xsd:type="u:string">engelen pts/4 Nov 7 09:25&#x;(taz)&#x;</string>
<string xsd:type="u:string">whalley pts/5 Oct 12 06:50&#x;(protoss)&#x;</string>
<string xsd:type="u:string">whalley pts/6 Oct 12 06:50&#x;(protoss)&#x;</string>
<string xsd:type="u:string">whalley pts/7 Oct 12 06:50&#x;(protoss)&#x;</string>
<string xsd:type="u:string">schwartz pts/9 Oct 14 08:42&#x;(du1)&#x;</string>
</a>
<ain>0</ain>
</result>
</SOAP:Body>
</SOAP:Envelope>
4. Implementation
4.1 Functions generated for each type
For a type such as struct lookup_SSN, the following functions are generated:
void soap_serialize_Struct_lookup_SSN(struct lookup_SSN *p)
void soap_mark_Struct_lookup_SSN(struct lookup_SSN *p)
void soap_default_Struct_lookup_SSN(struct lookup_SSN *p)
void soap_put_Struct_lookup_SSN(struct lookup_SSN *p)
void soap_out_Struct_lookup_SSN(char *tag, int id, struct lookup_SSN *p)
struct lookup_SSN * soap_get_Struct_lookup_SSN(struct lookup_SSN *a)
struct lookup_SSN * soap_in_Struct_lookup_SSN(char *tag, struct lookup_SSN *p)
soap_serialize: serializes the type and resolves the pointer references, both forward references and back references. It does so using the soap_mark routine, which marks an element as being referred to or as referring to another element.
soap_default?
soap_put: decides how to output the element in XML, depending on whether it is referred to or has a back reference, and calls the appropriate output method accordingly: soap_element_ref, soap_out with an id, or soap_ref without an id.
soap_out: constructs the actual XML output with the tag name, id, and value. Note that this involves calling the xml_out functions for the primitive types. Whether the XML is actually output or only counted is determined by a flag COUNT: if it is 0, the counter is incremented for each character that would be printed; otherwise the characters are actually written to the stream. We need this to determine the length of the XML that will be generated, because the length is required to generate the header when creating a request.
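The counting idea can be sketched as follows. The names begin_count and begin_print are taken from this report, but their bodies here, the flag soap_counting, the output buffer soap_buf (standing in for the stream), and the functions soap_send and put_body are assumptions made for the illustration. The polarity of the flag in this sketch is chosen for readability and is not meant to mirror the COUNT flag exactly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

int    soap_counting;     /* 1: only count characters, 0: write them */
size_t soap_count;        /* characters counted in counting mode */
char   soap_buf[256];     /* stands in for the output stream */
size_t soap_buf_len;

void begin_count(void)
{
    soap_counting = 1;
    soap_count = 0;
}

void begin_print(void)
{
    soap_counting = 0;
    soap_buf_len = 0;
    soap_buf[0] = '\0';
}

/* Every output routine funnels its text through this one function. */
void soap_send(const char *s)
{
    size_t n = strlen(s);

    if (soap_counting) {
        soap_count += n;
        return;
    }
    assert(soap_buf_len + n < sizeof soap_buf);
    memcpy(soap_buf + soap_buf_len, s, n + 1);
    soap_buf_len += n;
}

/* The same output code runs in both phases. */
void put_body(void)
{
    soap_send("<lookup_SSN>");
    soap_send("<SSN>59222222</SSN>");
    soap_send("</lookup_SSN>");
}
```

Because both phases execute identical output code, the counted length always matches what is later written, which is what the Content-Length header requires.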
4.2 How the functions are generated
The compiler loops through the type table, and when it comes across a type that is not a primitive type and for which the functions have not already been generated, it generates the functions for it.
It does so recursively: if the type in turn uses a user-defined type, e.g. a pointer to a type T, where T is itself a non-primitive type for which the functions have not been generated, then it first generates the functions for T and then the functions for pointer to T. So the functions for all composite types are generated first.
For each function that is written to soapC.c, a prototype is written to soapH.h. In addition, each type is assigned a new int, which is added to soapH.h.
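The recursive traversal can be sketched as below. The struct type, its fields, and the function generate are hypothetical; the sketch only records the order in which types would be handled instead of writing any functions. Marking a type before descending into it is an added assumption that keeps the recursion finite for self-referential types.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum kind { PRIMITIVE, POINTER, STRUCT };

/* Hypothetical type-table entry. */
struct type {
    const char  *name;
    enum kind    kind;
    struct type *uses[4];   /* types this type is built from */
    int          nuses;
    int          generated; /* set once functions have been generated */
};

/*
 * "Generate" functions for t after generating those of the types it uses.
 * Instead of emitting code, the type name is appended to order.
 */
void generate(struct type *t, char *order, size_t size)
{
    int i;

    assert(t != NULL && order != NULL);
    if (t->kind == PRIMITIVE || t->generated)
        return;
    t->generated = 1;  /* mark first so recursive types terminate */
    for (i = 0; i < t->nuses; i++)
        generate(t->uses[i], order, size);
    if (strlen(order) + strlen(t->name) + 2 <= size) {
        strcat(order, t->name);
        strcat(order, ";");
    }
}
```

Calling generate on "pointer to T" therefore handles T first, and a second call on an already generated type does nothing.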
4.3 Calls to the functions in the soapClient.c and soapServer.c
soapClient.c has a soap_call_lookup_SSN function that takes the parameters of the remote function and also the URL of the server.
int soap_call_lookup_SSN(char * URL, double SSN, struct result * _R)
It creates a struct (s) to send the parameters to the server and populates its fields with the arguments passed to the function (s.SSN = SSN).
It then serializes the struct:
soap_begin();
soap_serialize_Struct_lookup_SSN(&s);
It then creates the XML in two steps. The header must be created first, and it needs the length of the XML body. This is handled by a two-phase algorithm in which a flag decides whether the output is only counted or actually sent to the stream; the flag is set with the functions begin_print() and begin_count(). To create the header, begin_count is called first and then the SOAP envelope is generated:
begin_count();
soap_envelope_begin_out();
soap_put_Struct_lookup_SSN(&s);
soap_envelope_end_out();
This calculates the length.
Then the header is generated:
begin_print();
generate_header(URL,"lookup_SSN");
This is followed by sending the actual XML body for the struct:
soap_envelope_begin_out();
soap_put_Struct_lookup_SSN(&s);
soap_envelope_end_out();
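The header generation step above might be sketched as follows. The function format_header and its parameters are hypothetical: unlike generate_header, it takes the host and path separately rather than a URL, and it writes into a buffer. The header fields mirror the request shown in the Query Example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the HTTP request header for a SOAP call.
 * length is the body length obtained in the counting phase.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_header(char *buf, size_t size, const char *host, const char *path,
                  const char *method, unsigned long length)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL && method != NULL);
    n = snprintf(buf, size,
                 "POST %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Content-Type: text/plain\r\n"
                 "Content-Length: %lu\r\n"
                 "SOAPMethodName: %s\r\n"
                 "\r\n",
                 path, host, length, method);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The Content-Length value is the reason the body has to be counted before anything is sent.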
It then waits for the reply from the server. Once the reply arrives, it parses the reply and populates the struct result that it took as a parameter.
soap_element_begin_in("SOAP:Envelope");
soap_element_begin_in("SOAP:Body");
soap_get_Struct_result(_R);
soap_element_end_in("SOAP:Envelope");
soap_element_end_in("SOAP:Body");
soapServer.c
soapServer.c has a function int soap_serve_F() for each function F, plus a function soap_serve().
soap_serve()
It picks the right function, using the int returned by each soap_serve_F function in nested if statements. If the returned value is 0 for any of the functions, that function succeeded and soap_serve ends; if control drops down to the last statement, the requested function was not found.
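The dispatch logic can be sketched as below. Passing the method name as a parameter, the variable last_served, and the stub bodies of the two soap_serve_F functions are assumptions for this illustration; the generated functions read the request themselves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

const char *last_served = "";  /* records which handler accepted the call */

/* Each soap_serve_F returns 0 if the request was for F, nonzero otherwise. */
int soap_serve_lookup_SSN(const char *method)
{
    if (strcmp(method, "lookup_SSN") != 0)
        return -1;
    last_served = "lookup_SSN";
    return 0;
}

int soap_serve_who_here(const char *method)
{
    if (strcmp(method, "who_here") != 0)
        return -1;
    last_served = "who_here";
    return 0;
}

/* Try each handler in turn; -1 means no function matched the request. */
int soap_serve(const char *method)
{
    if (soap_serve_lookup_SSN(method) != 0)
        if (soap_serve_who_here(method) != 0)
            return -1;
    return 0;
}
```

Adding another remote function only adds one more level to the chain of if statements.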
soap_serve_F()
This function performs the same two steps as the client function soap_call_F, but in the opposite order: it first waits for the request, then parses the request and populates a local struct with the parameters from the XML.
soap_element_begin_in("SOAP:Envelope");
soap_element_begin_in("SOAP:Body");
soap_get_Struct_lookup_SSN(&s);
soap_element_end_in("SOAP:Envelope");
soap_element_end_in("SOAP:Body");
It then calls the function with the parameters from the request and a reference to a locally created struct result.
lookup_SSN(s.SSN,&r);
After the call, the struct result holds the result that needs to be sent to the client. The first step is to serialize the struct result and then to output it:
soap_begin();
soap_serialize_Struct_result(&r);
This again involves sending the length of the body, the envelope, and the body itself. This is done by the two-step method of first counting the length by calling begin_count:
begin_count();
soap_envelope_begin_out();
soap_put_Struct_result(&r);
soap_envelope_end_out();
and then sending the output:
begin_print();
soap_envelope_begin_out();
soap_put_Struct_result(&r);
soap_envelope_end_out();
5. Concluding Remarks
Our XML serialization and XML-RPC stub generation work well to achieve effective data interoperability between disparate applications. Semantic data interoperability, however, may require the modification of data to meet specific constraints of an application that handles the data. The group will further investigate methods to automatically (re)map XML into modified forms given a specification of constraints.
6. References
Don Box, David Ehnebuske, Gopal Kakivaya, Andrew Layman, Noah Mendelsohn, Henrik Frystyk Nielsen, Satish Thatte, and Dave Winer, Simple Object Access Protocol (SOAP) 1.1, W3C Note 08 May 2000.
5. R. van Engelen, K. Gallivan, G. Gupta, and G. Cybenko, XML-RPC Agents for Distributed Scientific Computing in IMACS'2000 Conference, Lausanne, Switzerland, August 2000.