Recovery of REST interfaces is an interesting research topic that came up in conversation with one of our industrial partners and that I would like to explore. It is also something Maleshkova et al. mention in their work “Investigating Web APIs on the World Wide Web” (pre-print, IEEE citable). The authors claim that “two thirds of the APIs do not state the data-type of the input and 40% of the APIs do not state the HTTP method. If a standard interface description language, such as WSDL, [was] used […] this would be unthinkable”.

This is very true. Read on to know more.

While SOAP services are always backed by a WSDL file that describes their interface, including all methods and data types, the much more popular REST services have no such feature. Instead, developers are left with manually written web pages that describe in natural language how to invoke the web API and what the endpoints are; case in point: Facebook, Twitter and Netflix. But what happens when web API developers forget to update the documentation, or worse, make a mistake when specifying the interface? This is where automatic recovery of the REST interface would prove useful.

The possibilities are endless. Recovery can be done on either the client or the server side (or both), and each side adds a different view and understanding of the communication. Whereas with a SOAP-generated WSDL file we know that whatever is in the file is exactly what’s available in the system, no more, no less, by recovering a REST interface we can learn several things:

  1. Client-side recovered interface: what is in the recovered interface may or may not be all that’s available on the server side. At any rate, we know it represents the features of that particular web API that our client actually uses (because if we recovered it from runtime data, our client has at some point used those features).
  2. Server-side recovered interface: once again, because we’re recovering the interface from usage data (even though on the server side a full recovery based on static analysis would probably be possible), whatever appears in the recovered interface is what is actually being used by ALL our clients. This means that if you recover the interface from long periods of runtime data, you can detect potentially dead code. It’s much the same principle as code coverage, except we’re measuring interface coverage by client code.
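As a rough illustration of both points, here is a minimal Python sketch of what recovery from runtime data might look like. Everything here is an assumption for illustration: the log format (method, path, parameters), the naive type inference, and the function names are all hypothetical, not part of any existing tool.

```python
from collections import defaultdict

def infer_type(value):
    # Naive type inference for observed parameter values (illustration only).
    if value.lstrip("-").isdigit():
        return "integer"
    try:
        float(value)
        return "number"
    except ValueError:
        return "string"

def recover_interface(requests):
    """Aggregate observed requests into a minimal interface description.

    `requests` is an iterable of (method, path, params) tuples, e.g. as
    parsed from client- or server-side logs (the log format is assumed).
    """
    interface = defaultdict(lambda: {"methods": set(), "params": {}})
    for method, path, params in requests:
        entry = interface[path]
        entry["methods"].add(method.upper())
        for name, value in params.items():
            entry["params"][name] = infer_type(value)
    return dict(interface)

def unused_endpoints(interface, documented_paths):
    # "Interface coverage": documented endpoints never seen in the logs
    # are candidates for dead code on the server side.
    return sorted(set(documented_paths) - set(interface))

# Toy request log standing in for real runtime data:
log = [
    ("GET", "/users", {"page": "2"}),
    ("POST", "/users", {"name": "alice", "age": "30"}),
    ("GET", "/users", {"page": "3"}),
]
iface = recover_interface(log)                      # recovered description
dead = unused_endpoints(iface, ["/users", "/groups"])  # → ["/groups"]
```

On the client side the recovered `iface` describes exactly the features this client exercises; on the server side, aggregating logs from all clients and diffing against the documented endpoint list flags `/groups` as never used, i.e. a dead-code candidate.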

This is, of course, at a very early stage and exists only in my mind, but it’s a thought I wanted to put out there to maybe stir some discussion around the topic. If you have any thoughts on this, feel free to drop me a comment!