Blogxter: A Cloud Tools Manifesto
Oct 2, 2009 6:43:16 PM, steve

Dedicated to the belief that tooling that works reliably can only be achieved by good designs and adequate testing

There's been lots of discussion about what people who deploy their applications in "the cloud" want, and it gets fairly political, as the Open Cloud Manifesto shows. That manifesto contains portability requirements that are anathema to the companies hosting applications on their own infrastructure, such as Google and Microsoft, and I worry about it too. I also worry about the HTML source of the manifesto: have you seen it? Scary. Someone has gone to a lot of effort to make a web page look like a document.

Having read the manifesto, I cannot help but smirk at the bit about re-using existing standards and the judicious creation of new ones. This manifesto came from the same company that gave the world WS-ResourceFramework? Either they have learned the error of their ways, or WS-RF is one of the existing standards they have in mind.

Anyway, I'm not going to comment on it in detail, except to say that I'm working at a different level: the problem of getting client code to work with different clouds that have different machine management APIs. Deciding what you want to do with the machines is something to worry about elsewhere; my concern is what can be done to work with the infrastructures, and what tool authors need from the service providers.

Here, then, is my Cloud Tools Manifesto. It has no signatories yet, as I have only just written it. I did post a draft to the Typica list, where there has been recent talk of a mock EC2 stack. One of the responses was an invitation to get involved in the Open Cloud Computing Interface (OCCI) group, which is part of the Open Grid Forum. I was most amused, for reasons some people (Savas) should recognise. Time spent with Savas and Jim was the best part of the GGF process.

Cloud Tools Manifesto

API requirements

  1. Provide enough information about library/protocol calls that anyone can implement tools to work with the infrastructure; any vendor-specific tooling is the start, not the finish.
  2. Licensing of the header files, or of any other part of the specification, should not prevent open source (under any license) or closed source implementations.
  3. Do not require that the tooling is implemented in a specific language, using a specific SOAP library, on a specific OS. There may be restrictions on what the infrastructure can run, but that does not need to affect the tools used to get that code working.
  4. Where possible, use underlying protocols/specifications that are, by virtue of their stability in the field or rigorous specification and test suite, highly interoperable.
  5. When XML is used, generate well-formed XML 1.0 in responses and error messages.
  6. Parse XML-formatted requests with a proper XML parser, and do not be brittle to different XML encodings.
  7. If you add a new authentication method to HTTP, provide the relevant patches and tests for popular libraries such as Apache HttpClient.
  8. Have a structured form for error messages. SOAP faults, ugly as they are, are something you can chuck back in HTTP responses.
  9. Provide stable constants for some failure modes (no auth, no credit, not enough machines) and document them online, ideally with a URL rule that lets us take an error name such as "e_no_auth" and map it to some documentation such as "". Better yet, make the URL the fault constant, as it includes explicit namespace information.
  10. List the constants in machine-parseable XML as well as HTML, so that XSL transforms can generate language-specific error constants.
  11. If you adopt someone else's cloud machine management API, retain their error response structure and the error codes. You may need to add new faults, in which case they should be your own URLs. If you do not provide faults that parse the same way as the original API, you have not implemented the API.
  12. Include API/build version data in the error. This is very useful for fielding bug reports against rapidly evolving implementations.
  13. Don't assume the caller's clock is accurate; that assumption makes testing under VMware really tricky.
  14. If the service is somehow sensitive to clock times, provide a documented means for a caller to easily determine your endpoint's view of system time; this can be used to calculate the offset the client needs to apply to its own clock.
  15. Provide some means of contact with people who can help debug interoperability problems. Good: Forums, email, issue trackers. Not acceptable: requiring the developers to travel round the world for meetings and conferences.
  16. Where possible, engage us in discussions about API futures. NDAs and conflicts of interest complicate this, but it would still be useful.
  17. Listen to our feedback. If something is hard for us to test, it probably doesn't work right in our code.
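
Items 9 and 10 together describe a small code-generation pipeline. The sketch below shows the idea in Python, using `xml.etree.ElementTree` in place of the XSL transform suggested above. The catalogue format, the `faults.example.org` URLs and the fault names other than `e_no_auth` are invented for illustration; no provider is known to publish this exact file.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue format: one <fault> per constant, with the
# documentation URL doubling as the fault identifier (item 9).
CATALOGUE = b"""<?xml version="1.0" encoding="UTF-8"?>
<faults>
  <fault name="e_no_auth" uri="http://faults.example.org/e_no_auth"/>
  <fault name="e_no_credit" uri="http://faults.example.org/e_no_credit"/>
  <fault name="e_no_capacity" uri="http://faults.example.org/e_no_capacity"/>
</faults>"""


def generate_constants(catalogue: bytes) -> str:
    """Emit Python source declaring one constant per fault in the catalogue."""
    lines = ["# Generated from the provider's fault catalogue; do not edit."]
    for fault in ET.fromstring(catalogue).iter("fault"):
        name = fault.get("name")
        uri = fault.get("uri")
        if not name or not uri:
            raise ValueError("fault entry is missing a name or uri")
        # e_no_auth -> E_NO_AUTH = 'http://faults.example.org/e_no_auth'
        lines.append(f"{name.upper()} = {uri!r}")
    return "\n".join(lines) + "\n"
```

Because each constant's value is the documentation URL, a client that catches a fault can both compare it against a constant and print a link a developer can follow.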

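Items 13 and 14 reduce to a small calculation on the client. A minimal sketch, assuming the endpoint exposes its view of the time through the standard HTTP `Date` header (one plausible "documented means", not something any provider is known to promise): take the server's timestamp, compare it with the midpoint of the request's round trip, and apply the resulting offset to later timestamps.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def clock_offset(server_date: str, sent: datetime, received: datetime) -> timedelta:
    """Estimate (server clock - local clock) from an HTTP Date header.

    The server stamped the header somewhere between `sent` and `received`
    (both timezone-aware UTC times), so compare it with the midpoint.
    """
    server_time = parsedate_to_datetime(server_date)
    if server_time.tzinfo is None:
        # Dates with a "-0000" zone parse as naive; treat them as UTC.
        server_time = server_time.replace(tzinfo=timezone.utc)
    midpoint = sent + (received - sent) / 2
    return server_time - midpoint


def server_now(offset: timedelta, now: Optional[datetime] = None) -> datetime:
    """The local clock shifted onto the server's view of time."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + offset
```

In practice the header might come from `urllib.request.urlopen(endpoint).headers["Date"]`, with `sent` and `received` recorded around that call. Since HTTP dates have one-second resolution, the offset is only good to about a second, which is likely enough for services that only need the clocks to be roughly similar.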

Mock Endpoint

Provide a mock endpoint that:

  • Has the same API and error responses as the production endpoint
  • Simulates the allocation/release of VMs and other assets, validates all requests
  • Can be set up by a caller to fail the next request from a specific account with a specific failure.
  • Is free for everyone with an account to use.
  • Can be used by test accounts whose authentication details don't need to be kept secret. This would let us embed the tests in open source releases, run them on Hudson, etc.
  • If the mock endpoint can be redistributed as a program, a library or a VM image, provide a means of downloading or hosting it for independent testing.

Note that while we can create our own mock endpoints, and often do, those mock endpoints will contain our assumptions about the API and our beliefs about what the failure messages will be. A mock endpoint provided by the production team would fail in the ways the production team expects things to go wrong, and would be more rigorous.
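
A minimal sketch of the kind of mock endpoint described above, in Python. It keeps allocations in memory, validates the caller, and lets a test arm a one-shot failure for a specific account. The method names (`allocate`, `release`, `fail_next`) and the fault codes are hypothetical; a real mock would also speak the production wire protocol and error format, which this sketch leaves out.

```python
import itertools
from typing import Dict


class MockFault(Exception):
    """Fault raised by the mock; `code` mirrors a production error constant."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


class MockEndpoint:
    """In-memory stand-in for a machine-management endpoint (hypothetical API)."""

    def __init__(self, capacity: int = 2):
        self.capacity = capacity
        self.machines: Dict[str, str] = {}  # machine id -> owning account
        self._pending: Dict[str, str] = {}  # account -> fault code to inject
        self._ids = itertools.count(1)

    def fail_next(self, account: str, code: str) -> None:
        """Make the next request from `account` fail with `code`, once."""
        self._pending[account] = code

    def _begin(self, account: str) -> None:
        # Validate the caller first, then fire any injected failure.
        if not account:
            raise MockFault("e_no_auth", "no account supplied")
        code = self._pending.pop(account, None)
        if code is not None:
            raise MockFault(code, "injected failure")

    def allocate(self, account: str) -> str:
        self._begin(account)
        if len(self.machines) >= self.capacity:
            raise MockFault("e_no_capacity", "not enough machines")
        machine_id = f"i-{next(self._ids):04d}"
        self.machines[machine_id] = account
        return machine_id

    def release(self, account: str, machine_id: str) -> None:
        self._begin(account)
        if self.machines.get(machine_id) != account:
            raise MockFault("e_unknown_machine", f"{machine_id} is not owned by {account}")
        del self.machines[machine_id]
```

A test would call `fail_next("alice", "e_no_credit")`, check that the next `allocate("alice")` raises a `MockFault` with that code, and then confirm the following call succeeds. Keeping the injected fault one-shot means a test exercises exactly one failure path without leaving the mock broken for the next test.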

Production Endpoint

On the production endpoint:

  • Provide discounted or free machines to the test tool teams. These can be massively underpowered VMs, as we are normally simulating complex systems, not doing real work. Real work we can pay for; it's the unit tests that run up our bills, creating and destroying machines all the time.
  • Offer access to forthcoming features/API versions, NDAs permitting

Nightly builds

If the infrastructure team has an automated build process with a staging cluster, consider:

  • Offering the tooling developers access to this endpoint, so that they can report problems sooner rather than later.
  • Running a local copy of the tooling against the development branch's endpoint, as part of the CI process.
  • Adding builds and test runs of the open source tools to the CI server's own build and test process. This helps find interop problems with the trunk versions of everyone's code.

Our Obligations

In exchange we agree to:

  1. Read your documentation and look at your examples before getting into trouble.
  2. Write code that usually appears to work.
  3. When XML is needed, generate well-formed XML.
  4. Parse XML formatted responses in a proper XML parser.
  5. Document our client for others to use.
  6. Provide our client identification/version info in an HTTP header.
  7. Write functional tests.
  8. Test our code against your endpoints.
  9. Test our code against your endpoints with a proxy in the way.
  10. Write code that fails in some vaguely useful way when things go wrong.
  11. Write code that provides diagnostic information when things go wrong, to help with blame assignment when something does not work. For example, list the endpoint, proxy settings and client code, and dump the error response.
  12. Have an option to log interactions with the server in more detail.
  13. Write client applications that can be switched to different endpoints, such as the test clusters or third-party implementations.
  14. Where the far end requires the caller's clock to be roughly similar, get the system time from the far end and use the calculated offset to drive the timestamps.
  15. Not cache DNS values indefinitely, and assume that hostnames move around.
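
Obligation 4, together with the structured errors asked for in API items 8 and 12, suggests client code along these lines. It is a sketch, assuming an EC2-style error document (`Response/Errors/Error` with `Code` and `Message` children, plus a `RequestID`); the `Version` element is hypothetical, standing in for the build information item 12 requests. The raw bytes go straight to the XML parser, which honours the document's own encoding declaration instead of a guessed charset.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloudFault:
    """A structured fault extracted from an error response."""
    code: str
    message: str
    request_id: Optional[str]
    version: Optional[str]  # hypothetical API/build version element


def parse_fault(body: bytes) -> CloudFault:
    """Parse an EC2-style error document from the undecoded response bytes.

    Passing bytes (not a pre-decoded string) lets the parser read the
    encoding declaration itself, so ISO-8859-1 or UTF-16 replies still work.
    """
    root = ET.fromstring(body)
    error = root.find("./Errors/Error")
    if error is None:
        raise ValueError("not an error response: no Errors/Error element")
    return CloudFault(
        code=error.findtext("Code", default=""),
        message=error.findtext("Message", default=""),
        request_id=root.findtext("RequestID"),
        version=root.findtext("Version"),
    )
```

Returning a structured value rather than a string means the caller can branch on `code`, quote `request_id` in a bug report, and still log the full `message` for diagnostics.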

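Obligations 6, 11 and 13 mostly come down to plumbing in the client. A minimal sketch using only the Python standard library; the client name, the `CLOUD_ENDPOINT` environment variable and the default URL are hypothetical.

```python
import os
import urllib.request
from typing import Optional

CLIENT_ID = "example-cloud-tool/0.1"  # hypothetical client name/version
DEFAULT_ENDPOINT = "https://cloud.example.org/api"  # hypothetical production URL


def endpoint_url() -> str:
    """Allow a test cluster, mock or third-party implementation to be swapped in."""
    return os.environ.get("CLOUD_ENDPOINT", DEFAULT_ENDPOINT)


def build_request(path: str, body: Optional[bytes] = None) -> urllib.request.Request:
    """Build a request that identifies this client in the User-Agent header."""
    url = endpoint_url().rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=body, headers={"User-Agent": CLIENT_ID})


def diagnostics(request: urllib.request.Request, error_body: bytes) -> str:
    """Summarise what was attempted, to help with blame assignment."""
    proxies = urllib.request.getproxies()  # proxy settings taken from the environment
    return "\n".join([
        f"client:   {CLIENT_ID}",
        f"endpoint: {request.full_url}",
        f"method:   {request.get_method()}",
        f"proxies:  {proxies or 'none'}",
        "response:",
        # Lossy decode is acceptable here: this text is only for humans to read.
        error_body.decode("utf-8", errors="replace"),
    ])
```

Pointing `CLOUD_ENDPOINT` at a mock or staging cluster switches every request without code changes, and the diagnostics text is the sort of thing a bug report against the provider could start from.
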
There, that isn't asking for too much, is it?
