One of my projects in the past few weeks has been to put together a SOAP server for a client. So suddenly I've had to learn a lot of the nitty gritty details about what works and what doesn't...
While they're fresh, let me jot them down here. WARNING: Extremely technical content ahead.
First of all, SOAP is supposed to stand for "Simple Object Access Protocol." It's anything but simple. There is a lot of SOAP software out there, but there are also subtle implementation gotchas that can be quite difficult to figure out.
We chose the native PHP SoapServer in PHP 5.2 to implement the project, mainly because we're a PHP shop, and a little smoke testing revealed it was quite quick to get set up and going. It turns out that it's quite hard to debug. For its good points, it can read in a WSDL and automatically map methods to methods on a class, and it converts arrays, simple objects, or complex objects to a valid response object, and request objects into simple or complex objects on the incoming side.
Problems with PHP's SOAP Server:
- No validation of incoming or outgoing documents.
- No warnings, exceptions, or errors if it can't convert a document to fit the schema--it just dies.
- No debugging information about what it's doing.
- No ability to manage namespaces, especially if they need to be copied from the SOAP envelope into the payload.
- Difficult to test.
- No access to the raw XML of either the request or the response.
Using PHP's SoapServer is quite simple, except when things aren't perfect...
Here's what the code looks like for the simple case:
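$xml = file_get_contents('php://input'); // or $GLOBALS['HTTP_RAW_POST_DATA'] on older PHP 5 setups
// make sure you have something to process, throw an error if $xml is empty
$soap = new SoapServer('http://path/to/your.wsdl');
$soap->setClass('mySoapClass'); // map SOAP operations to methods on this class
$soap->handle($xml);            // parse the request, call the handler, send the response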
That's basically it. You declare methods on 'mySoapClass' that correspond to the SOAP methods. These handler methods receive a simple object as a parameter, and you can do whatever you need to do with that data. Then it needs to return some data structure that can be serialized to the expected type defined in the WSDL. The return data structure can be an array, a simple object, or an object of a class you define that can serialize appropriately.
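To make the handler shape concrete, here is a minimal sketch of what such a class might look like. The method name submitPO and the VendorID field are borrowed from the request discussed later in this post; the response fields are made up for illustration and would really be dictated by your WSDL.

```php
<?php
// Hypothetical handler class: the response fields are illustrative only.
class mySoapClass
{
    // SoapServer hands the decoded request to the method as a plain object.
    public function submitPO($request)
    {
        $vendorId = isset($request->VendorID) ? (string) $request->VendorID : '';

        // Return any structure that can be serialized to the WSDL response type.
        return array(
            'VendorID' => $vendorId,
            'accepted' => $vendorId !== '',
        );
    }
}

// Calling the handler directly, without SoapServer, is a cheap way to test it.
$request = new stdClass();
$request->VendorID = 'ACME';

$handler  = new mySoapClass();
$response = $handler->submitPO($request);
```

Because the handler is just a method on a plain class, you can exercise it like this without going through SOAP at all, which helps a little with the "difficult to test" complaint above.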
Great. With this much, it took me about a day to have a working web service with 8 methods and a bunch of complex data objects. The problems started when people connected with different SOAP software.
The web service I was implementing defines a specific SOAP Fault document, so if I did run across a problem, I could simply throw an exception of that type. My wrapper object kindly passed the custom fields defined in the WSDL.
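As a rough sketch of that idea, assuming the soap extension is loaded: SoapFault accepts a detail argument, and an object passed there can carry the custom fields. The helper name failWithFault and the fields errorCode and errorMessage are placeholders I made up; a real service would use whatever names its WSDL declares for the fault.

```php
<?php
// Hypothetical helper and detail fields: use the names your WSDL declares.
function failWithFault($errorCode, $message)
{
    $detail = new stdClass();
    $detail->errorCode    = $errorCode;
    $detail->errorMessage = $message;

    // Arguments: fault code, fault string, fault actor, detail.
    throw new SoapFault('Server', $message, null, $detail);
}

// Outside of SoapServer the fault behaves like any other exception.
try {
    failWithFault(42, 'Vendor not found');
} catch (SoapFault $fault) {
    $caughtMessage = $fault->getMessage();
    $caughtDetail  = $fault->detail;
}
```

When a handler throws this from inside SoapServer::handle(), the server should turn it into a SOAP Fault response instead of a normal one.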
Problem #1: Validation
As I said before, there is none. If the SoapServer gets anything it doesn't like, it doesn't send any response at all. And since all you get inside your method handlers is an already-converted object, you don't have any way to validate the raw request XML without using a global variable or a call to a singleton.
In our case, the project specified that for several methods we strip the payload out of the SOAP envelope and store the payload XML on disk as a document, and that we do actual processing for the other methods. Processing a SOAP request was not a problem. Storing a valid XML document was. Several methods just stored the XML on disk, with another method retrieving it and returning it to the caller. The problem was that the SOAP response was pickier than the SOAP request--so documents that we loaded from disk and returned as the response would fail with no explanation.
Our solution was to load the raw XML into a DOM Document, and validate it against the schema. This mostly worked, until we had to deal with a request generated from Jitterbit. More on that later.
The question is, what to validate? We weren't supposed to store the entire SOAP envelope--just the payload. So how to extract it? The simple way was to grab the first child of the Body element, append it to the DOMDocument itself, remove the original root, and call the normalize() method. This did generate a warning, but not a fatal error, and did the right thing. Furthermore, we could also access the raw libxml validation specifics by calling libxml_use_internal_errors(true), and then, when a document fails to validate, using libxml_get_errors() and libxml_get_last_error() to grab the details.
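Here is a minimal sketch of the extract-and-validate step. It is not our production code: the request, the urn:example:po namespace and the one-element schema are all invented so the example is self-contained, and instead of re-rooting the original document it copies the payload into a second DOMDocument with importNode(), which avoids the warning mentioned above.

```php
<?php
// Hypothetical request and schema, inlined so the sketch is self-contained.
$soapXml = <<<XML
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <submitPO xmlns="urn:example:po">
      <VendorID>ACME</VendorID>
    </submitPO>
  </soapenv:Body>
</soapenv:Envelope>
XML;

$schemaXml = <<<XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:po"
           elementFormDefault="qualified">
  <xs:element name="submitPO">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="VendorID" type="xs:token"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Collect libxml errors in a buffer instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$envelope = new DOMDocument();
$envelope->loadXML($soapXml);

// Find the first element child of the SOAP Body.
$xpath = new DOMXPath($envelope);
$xpath->registerNamespace('s', 'http://schemas.xmlsoap.org/soap/envelope/');
$payloadNode = $xpath->query('/s:Envelope/s:Body/*[1]')->item(0);

// Copy the payload into its own document.
$payload = new DOMDocument();
$payload->appendChild($payload->importNode($payloadNode, true));

// Validate the standalone payload against the custom schema only.
$isValid = $payload->schemaValidateSource($schemaXml);

// Gather whatever libxml complained about, then reset the buffer.
$errors = array();
foreach (libxml_get_errors() as $error) {
    $errors[] = trim($error->message) . ' (line ' . $error->line . ')';
}
libxml_clear_errors();
```

If $isValid comes back false, $errors holds the libxml messages, which is about the only detailed feedback you will get anywhere in this stack.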
Problem #2: Returning valid XML responses
The root of all of our problems in this project has to do with where namespaces are defined. One limitation of libxml appears to be that you can only point it to one schema for validation at a time. We can validate against the SOAP Schema, or our custom schema, but not both at the same time, unless one includes the other. So our validation options consist of:
- Include the SOAP Schema in the custom schema, and validate the entire SOAP body, or
- Extract the payload from the SOAP body, and validate only that against our schema.
The second option is clearly the correct way--we really don't care about the SOAP envelope once we have the message. But the problem is, many SOAP clients put the namespace declarations on the SOAP Envelope, and not the payload root element. In fact, the XML generated by the PHP SoapServer class does this itself.
So our first task was to generate the proper XML Namespace declarations on our generated payloads.
To do this, we could no longer rely on the PHP SoapServer's automatic conversion of simple objects or arrays to XML--we had to generate our own XML, and tell the SOAP server to use that instead. This turned out to be difficult to track down, so here's the answer:
// e.g. inside the toXML() method of your data class:
/* Serialize XML as desired here.
   Omit the XML declaration, start with the root element.
   You can also simply build the XML as a string from object properties. */
$xml = $this->myDom->saveXML();
preg_match('%^\s*(?:<\?xml[^>]*\?>)?\s*(.*)$%s', $xml, $match); // strip XML declaration
return $match[1];
// snip to end of actual SOAP handler method:
$out = new SoapVar($data->toXML(), XSD_ANYXML);
return $out;
The SoapVar object allows you to control the output of the SoapServer a bit better, with support for namespaces, raw XML, or casting objects in a certain way.
Problem #3: Jitterbit
Validating a clean payload without namespace prefixes seems to work fine. Our technique of moving the nodes around with the DOM seemed to keep most of the namespaces intact, so even if there was no xmlns declaration on the payload root, we could still validate just the payload effectively, and serialize/de-serialize without issue.
Except for Jitterbit.
Now, I loaded up Jitterbit yesterday, and it didn't do this--this might be a problem with an earlier version. The problem is, Jitterbit is extremely verbose, specifying not just a namespace prefix on every element, but also an xsi:type. And even that's not enough to break it--except that the value of its xsi:type also contained a namespace prefix. And if that prefix was not declared on the payload root, suddenly our validation broke.
It broke for us on types declared as ns:token, ns:string, ns:integer -- the simple types specified by XSD itself, which Jitterbit put into a namespace prefixed with ns: and declared on the SOAP Envelope.
For example, in one problem document the first validation error was on the VendorID element, which carried xsi:type="ns:token". If I copied xmlns:ns="http://www.w3.org/2001/XMLSchema" into the tns1:submitPO element, it validated fine. The PHP DOMDocument seems to be able to keep track of namespaces on element and attribute names even after the envelope is gone, but not namespace prefixes used inside attribute values.
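To make the failure concrete, here is a small sketch built around a made-up envelope. Only tns1:submitPO, VendorID, xsi:type="ns:token" and the XML Schema namespace come from the real request; the urn:example:po namespace and the ACME value are placeholders. It copies the payload out of the envelope and checks which namespace declarations came along.

```php
<?php
// Hypothetical envelope: the "ns" prefix is declared only on the Envelope
// and used only inside an xsi:type attribute value.
$soapXml = <<<XML
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xmlns:ns="http://www.w3.org/2001/XMLSchema"
                  xmlns:tns1="urn:example:po">
  <soapenv:Body>
    <tns1:submitPO>
      <tns1:VendorID xsi:type="ns:token">ACME</tns1:VendorID>
    </tns1:submitPO>
  </soapenv:Body>
</soapenv:Envelope>
XML;

$envelope = new DOMDocument();
$envelope->loadXML($soapXml);

$body = $envelope->getElementsByTagNameNS(
    'http://schemas.xmlsoap.org/soap/envelope/',
    'Body'
)->item(0);

// First element child of Body, skipping whitespace text nodes.
$payloadNode = null;
foreach ($body->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $payloadNode = $child;
        break;
    }
}

// Copy the payload into a standalone document and serialize it.
$payload = new DOMDocument();
$payload->appendChild($payload->importNode($payloadNode, true));
$payloadXml = $payload->saveXML();

// Check which prefixes are declared in the standalone payload.
$hasTns1 = strpos($payloadXml, 'xmlns:tns1=') !== false;
$hasXsi  = strpos($payloadXml, 'xmlns:xsi=') !== false;
$hasNs   = strpos($payloadXml, 'xmlns:ns=') !== false;
```

As far as I can tell, libxml re-declares the prefixes it needs for element and attribute names (tns1 and xsi here), but it has no idea that the string "ns:token" refers to a namespace, so the ns declaration stays behind on the envelope.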
After hours of banging on this, we came up with 3 workarounds for this:
- Completely regenerate the XML, after processing. To do this, we would need to create a custom data class for each incoming object, provide a classmap to the SoapServer, and then generate brand new XML out of the data object. This is perhaps the best approach, but I didn't think of it until the project was over--I was thinking about writing out the data to the database and then loading our custom objects and serializing them as we do for our responses. The biggest drawback here is that we need to model the entire complexity of the request, as allowed in the schema. And this was a really complex object... lots of work to implement, when we're only going to store this XML for passing to other systems.
- Hack the XML to get the offending namespace into the stored document. This turned out to be easy to program, but uses lots of CPU resources--DOMDocuments are expensive to use. It's also the most brittle approach, only catching this single case--if the namespace prefix changes, or a different namespace is required, it'll break. To do this, we created a new DOMDocument, imported the root node of the payload, appended it to the document, and used setAttribute to set an "xmlns:ns" attribute on the root. This did not actually get the namespace recognized for validation, and normalizeDocument did not fix it--but creating a third DOMDocument, and doing $doc->loadXML($doc2->saveXML()) did make the namespace recognized by the object so we could successfully validate.
- Hack the XSD to validate the entire SOAP request. By including the SOAP schema in our custom schema (using xs:import), we could validate the raw SOAP request, then extract the payload and save it. The saved XML does not validate on its own, but we know it validated as part of the full request. So we can remove our validation check on the outgoing document, and as long as the other system does not explicitly validate the standalone XML, we're okay.
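The second workaround is the easiest one to show in code. This is only a sketch with a made-up payload (the urn:example:po namespace and the ACME value are placeholders), and it hard-codes the ns prefix, which is exactly why the approach is brittle.

```php
<?php
// Hypothetical stored payload: the "ns" prefix appears only inside the
// xsi:type value, and its declaration was left behind on the SOAP Envelope.
$payloadXml = <<<XML
<tns1:submitPO xmlns:tns1="urn:example:po"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <tns1:VendorID xsi:type="ns:token">ACME</tns1:VendorID>
</tns1:submitPO>
XML;

$doc2 = new DOMDocument();
$doc2->loadXML($payloadXml);

// Put the missing declaration on the root element with setAttribute().
$doc2->documentElement->setAttribute('xmlns:ns', 'http://www.w3.org/2001/XMLSchema');

// Serialize and re-parse so the parser registers it as a real namespace.
$doc = new DOMDocument();
$doc->loadXML($doc2->saveXML());

// After the round trip the prefix can be resolved from the root element.
$resolved = $doc->documentElement->lookupNamespaceURI('ns');
```

The serialize-and-reload step is the expensive part, but without it the declaration never became a real namespace for us, as described in the second workaround above.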
Whew. Hope this helps somebody...