All my resources are also DB requests and performance suffers

nk4um Moderator
Posts: 901
February 8, 2012 09:28

Hi Gary,

Posted by gary.sole
I suppose my question is - what is the appropriate ROC type approach? Based on performance issues, we tend to start leaning back towards RPC type approaches that say - for a given resource request, this is your result optimized for the request. So in essence: don't compose resources that are expensive to fetch. Instead build resources according to what your client needs and satisfy the performance needs.

I'm afraid there's no magic bullet for this scenario. Just good old-fashioned engineering compromise. In an ideal world going out to the system-of-record would be instantaneous - in which case you would pay no penalty for a completely atomic, fine-grained design - every resource could be an atom of state, and you would then build these up into a composite.

But the real world isn't instantaneous. Therefore you have to look at your data as a set of state. What you really try to aim for with an outbound request, is to source a set of state that is not too big and not too small.

That is, it may not be the normalized state for one resource channel - it may be a superset that will satisfy several channels independently.

The ultimate logical resource channels can be implemented to use tree processing operations on the results to slice and dice the set to provide the required state. The RDBMS tools bring back HDS trees, which can be efficiently chopped up into pieces using a combination of XPaths.
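As a rough illustration of the slicing idea, here is a minimal sketch in Python. It uses plain XML and the standard library's limited XPath support as a stand-in for HDS trees; it is not NetKernel's own tooling, and the document shape and function names are hypothetical. One superset is fetched once, and each logical channel takes only its own slice.

```python
import xml.etree.ElementTree as ET

# Hypothetical superset fetched in ONE trip to the database: a customer
# together with all of its orders and line items.
SUPERSET = ET.fromstring("""
<customer id="12345">
  <name>Acme</name>
  <order id="1"><lineitem sku="A" qty="2"/><lineitem sku="B" qty="1"/></order>
  <order id="2"><lineitem sku="C" qty="5"/></order>
</customer>
""")

def order_ids(tree):
    """Channel 1: only the list of order ids."""
    return [order.get("id") for order in tree.findall("./order")]

def line_items(tree, order_id):
    """Channel 2: the line items of one order, picked out with an XPath predicate."""
    path = f"./order[@id='{order_id}']/lineitem"
    return [(li.get("sku"), int(li.get("qty"))) for li in tree.findall(path)]
```

Both channels are served from the same in-memory tree, so neither needs its own round trip to the system-of-record.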

As Jeff says - you can generally find a certain amount of common state that will cache nicely. In your case you say you have rapidly changing data - in which case maybe you can introduce transient cacheability (with time-based expiry) to provide a semi-stable balance.
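To make time-based expiry concrete, here is a minimal sketch in Python. It is a generic illustration, not NetKernel's cache; the class name, the `fetch` callback and the injectable `clock` are all assumptions made for the example.

```python
import time

class TTLCache:
    """Keep each value for at most ttl_seconds, then fetch it again."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested without sleeping
        self._entries = {}          # key -> (expiry_time, value)

    def get(self, key, fetch):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: no trip to the database
        value = fetch(key)          # missing or expired: go to the source
        self._entries[key] = (now + self.ttl, value)
        return value
```

With a short TTL, rapidly changing customer data is at most that many seconds stale, while bursts of requests for the same customer within the window share one fetch.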

One further point - you seem to have a system where parallelising the requests to the DB layer will help a lot. FYI Tony is, as we speak, working on an update to XRL to add the same async requests as we added to TRL. So you'll be able to still use XRL for composition but do it with an async fanout pattern. Of course this means you will definitely need to have a lot of spare threads in your kernel pool.
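The fanout pattern itself can be sketched outside NetKernel. The following is a minimal Python illustration using a thread pool, not XRL or TRL; `fetch_state` is a hypothetical stand-in that sleeps for roughly 100 ms instead of querying a database, and `max_workers` plays the role of the spare threads mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_state(abbrev):
    """Stand-in for one ~100 ms sub-request (sleeps instead of querying)."""
    time.sleep(0.1)
    return {"abbrev": abbrev}

def fetch_all(abbrevs, max_workers=10):
    # map() runs the sub-requests concurrently but yields results in input order,
    # so the composite can be assembled exactly as in the sequential version.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_state, abbrevs))
```

With enough workers the elapsed time approaches that of the slowest sub-request rather than the sum of all of them, provided the database layer can actually serve the requests in parallel.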

P.

nk4um User
Posts: 112
February 7, 2012 23:31
All your resources are belong to cache

Hey Gary,

We have a similar problem with our protege exporter - there is one resource to get a list of all instances, and another resource that gets the data for a particular instance. To get a whole class you just loop over the list and make a subrequest for each item. It's a nice clean approach, and it works pretty well for small classes. But some of our classes have a lot of instances (10,000 in one case) and even fairly quick requests add up when you have that many. And the exports need to be kept up to date. So what we did is a lot of caching. We not only cache the final export, but also the list of instances and the individual instances. Then when we get an update, we cut the appropriate golden threads so that most of the cached data is still valid, and only a handful of subrequests need to go beyond retrieving from cache. So an initial export might take 30s, but subsequent requests for the same data without any updates get a complete version out of cache, and if there was an update then the big result is expired but most of the subrequests are still valid so it takes maybe a second to rebuild it. This (mostly) isn't for a webpage, so we can get away with these long times generally.
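The invalidation scheme can be sketched roughly as follows. This is a simplified Python illustration of the idea, not NetKernel's golden thread implementation; the class, method names and keys are hypothetical. Each cached entry is attached to one or more named threads, and cutting a thread expires every entry attached to it.

```python
class GoldenThreadCache:
    def __init__(self):
        self._values = {}    # key -> cached value
        self._threads = {}   # thread id -> set of keys attached to that thread

    def put(self, key, value, threads):
        self._values[key] = value
        for thread in threads:
            self._threads.setdefault(thread, set()).add(key)

    def get(self, key):
        return self._values.get(key)   # None means "not cached, rebuild it"

    def cut(self, thread):
        # Expire everything attached to this thread; other entries stay valid.
        for key in self._threads.pop(thread, set()):
            self._values.pop(key, None)

cache = GoldenThreadCache()
cache.put("instance/1", "data-1", ["gt:instance/1"])
cache.put("instance/2", "data-2", ["gt:instance/2"])
# The full export depends on every instance, so it is attached to all threads.
cache.put("export", "data-1+data-2", ["gt:instance/1", "gt:instance/2"])

cache.cut("gt:instance/1")   # an update to instance 1 arrives
```

After the cut, the export and instance 1 are gone but instance 2 is still cached, so rebuilding the export only needs one real sub-request.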

However, even this can get to be a bit much (generating a big XML tree takes lots of memory), so we've been discussing whether we could export smaller items - the instances - separately, and rely on the consumer to aggregate them as needed.

So in that vein, one possible approach is to expose the subrequests as part of the API as well. Rather than have the application request '/customer/12345' and get back all possible information about the customer, have it return the list of orders, which can then be requested separately, as needed (e.g., if the webpage is making ajax calls to retrieve the details for a particular order). You could have your '/customer' accessor fork off subrequests for all the orders and lineitems to pre-warm the cache, but that doesn't work as well as it could (Hey Tony, where's my Super-ultra-mega-cache overlay?). Otherwise, only call the sub-requests lazily.

nk4um User
Posts: 131
February 7, 2012 21:06
it depends

Greetings Gary,

Of course, the answer in the subject doesn't really help (although it is the truth). From your explanation I deduce you're working with a relational database? Then I'd add database paging at every level. Check the indexes while you are at it.
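A minimal sketch of what paging plus an index might look like, using Python's built-in sqlite3 as a stand-in for your database. The `orders` table, its columns and the page size are hypothetical, and keyset paging is only one of several ways to page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(12345, float(i)) for i in range(1, 96)])   # 95 sample orders
# Index matching the WHERE and ORDER BY of the paged query below.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id, id)")

def orders_page(customer_id, after_id=0, page_size=25):
    """Keyset paging: up to page_size orders whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, total FROM orders "
        "WHERE customer_id = ? AND id > ? ORDER BY id LIMIT ?",
        (customer_id, after_id, page_size),
    ).fetchall()

first = orders_page(12345)                            # orders 1..25
second = orders_page(12345, after_id=first[-1][0])    # orders 26..50
```

Each screen then asks for one bounded page rather than the customer's whole order history.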

Also, while small is beautiful, it won't do to have to make a couple of thousand requests (I'm probably exaggerating for your case, but I've seen people code that way, blindly using a service that they were told to use) for a single "screen" with orderlines. You do need the single-line "services", but not for your overviews (only for inserts/updates/deletes). Rule of thumb: 10 to 30 sub-second requests for one screen is plenty.

It is a bit like walking a tightrope. Check closely where caching is possible/acceptable and use it.

Those are all "general" tips, forgive me if I'm stating the obvious. I would need more detail to help you better.

Regards, Tom

nk4um User
Posts: 92
February 7, 2012 17:02
All my resources are also DB requests and performance suffers

So we have gone down the route of making lots of small resources. Each resource has a database query and is styled into the logical result we wish to present, and each performs sub-second. Then we start composition. The final logical result is a composition of a bunch of smaller items. Because we are dealing with sets of data, we make so many small requests that their cost adds up and we get performance issues.

Take a simple example: res:/state/MN returns a representation of Minnesota very quickly (100ms). But then I want a list of all states with res:/states - I query my list of all state abbreviations and then do an XRL call into each res:/state/<xsl:value-of ... etc. This now costs 50*100ms and I end up with 5 seconds. This is a very simple example, and I fixed it by having my res:/states not call into res:/state, but rather just use the styling from res:/state to style its data.

As we move into more complex structures of both data and structure, things get more complicated. I can have a res:/topLevelResource that has a set of res:/secondLevel resources, which might in turn have a set of res:/thirdLevelResources, e.g. Customer->Orders->LineItems. With state, we could say that we can suffer the 5 seconds the first time into cache and it will perform well after that, but with customer the data changes frequently and there are a lot of different customers, meaning that caching is not really an option.
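The res:/states case, and the fix of reusing the styling instead of calling res:/state per item, can be sketched like this. It is a minimal Python illustration using an in-memory sqlite3 table; the table, the three sample rows and all function names are hypothetical, and the query counter stands in for the 100ms cost of each database trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (abbrev TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO state VALUES (?, ?)",
                 [("MN", "Minnesota"), ("WI", "Wisconsin"), ("IA", "Iowa")])

stats = {"queries": 0}   # counts trips to the database

def query(sql, params=()):
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

def style(row):
    """The shared 'styling' step: one row in, one presented item out."""
    abbrev, name = row
    return f"{name} ({abbrev})"

def state(abbrev):
    """Like res:/state/MN: one query per state."""
    return style(query("SELECT abbrev, name FROM state WHERE abbrev = ?", (abbrev,))[0])

def states_composed():
    """Like composing res:/states from res:/state: 1 + N queries."""
    return [state(a) for (a,) in query("SELECT abbrev FROM state ORDER BY abbrev")]

def states_set_based():
    """The fix: one query for the whole set, reusing the same styling."""
    return [style(row) for row in query("SELECT abbrev, name FROM state ORDER BY abbrev")]
```

Both functions return the same list, but the composed version costs one query per state plus one for the list, while the set-based version costs a single query however many states there are.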

I suppose my question is - what is the appropriate ROC type approach? Based on performance issues, we tend to start leaning back towards RPC type approaches that say - for a given resource request, this is your result optimized for the request. So in essence: don't compose resources that are expensive to fetch. Instead build resources according to what your client needs and satisfy the performance needs.

Any other alternatives?
