|
nk4um Moderator
Posts: 770
|
February 8, 2012 09:28
Hi Gary,
Posted by gary.sole
I suppose my question is - what is the appropriate ROC type approach? Based on performance issues, we tend to start leaning
back towards RPC type approaches that say - for a given resource request this is your result optimized for the request. So
in essence: don't compose resources that are expensive to fetch. Instead build resources according to what your client needs
and satisfy the performance needs.
I'm afraid there's no magic bullet for this scenario. Just good old-fashioned engineering compromise. In an ideal world going out to the system-of-record would be instantaneous - in which case you would pay no penalty for
a completely atomic fine-grained design - every resource could be an atom of state and you could then build these up into composites.
But the real world isn't instantaneous. Therefore you have to look at your data as a set of state. What you really try
to aim for with an outbound request is to source a set of state that is not too big and not too small.
That is, it may not be the normalized state for one resource channel - it may be a superset that will satisfy several channels
independently.
The ultimate logical resource channels can be implemented to use tree processing operations on the results to slice and dice
the set to provide the required state. The RDBMS tools bring back HDS trees, which can be efficiently chopped up
into pieces using a combination of XPaths.
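As a rough illustration of slicing one fetched superset into several channels, here is a minimal Python sketch. It is not NetKernel code: an XML tree parsed with the standard library stands in for the HDS tree, and the document shape and the function names `order_ids` and `line_items` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical result of ONE outbound query: a superset of state covering
# the customer, its orders and their line items.
SUPERSET = ET.fromstring("""
<customer id="12345">
  <name>Example Co</name>
  <order id="1"><lineitem sku="A" qty="2"/><lineitem sku="B" qty="1"/></order>
  <order id="2"><lineitem sku="C" qty="5"/></order>
</customer>
""")

def order_ids(tree):
    """Channel 1: only the list of order ids."""
    return [order.get("id") for order in tree.findall("./order")]

def line_items(tree, order_id):
    """Channel 2: the line items of one order, cut from the same tree."""
    path = f"./order[@id='{order_id}']/lineitem"
    return [(li.get("sku"), int(li.get("qty"))) for li in tree.findall(path)]
```

Both channels are served from the same fetched tree, so the database is visited once, however many logical resources are cut from the result.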
As Jeff says - you can generally find a certain amount of common state that will cache nicely. In your case you say you
have rapidly changing data - in which case maybe you can introduce transient cacheability (with time based expiry) to provide
a semi-stable balance.
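To make the time-based expiry idea concrete, here is a small Python sketch of a cache whose entries are only trusted for a fixed number of seconds. It is an assumption-laden stand-in, not the NetKernel cache: the class name `TransientCache`, the `fetch` callback and the injectable `clock` are all invented for the example.

```python
import time

class TransientCache:
    """Cache whose entries expire a fixed number of seconds after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expiry_time, value)

    def get(self, key, fetch):
        """Return the cached value, calling fetch(key) only if it is missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

The trade-off is explicit: a longer expiry means fewer trips to the database but staler customer data, so the value has to be chosen per resource.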
One further point - you seem to have a system where parallelising the requests to the DB layer will help a lot. FYI Tony
is, as we speak, working on an update to XRL to add the same async requests as we added to TRL. So you'll still be able to
use XRL for composition but do it with an async fanout pattern. Of course this means you will definitely need to have a lot
of spare threads in your kernel pool.
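The fanout pattern itself can be sketched outside NetKernel. The following Python uses a plain thread pool, not XRL or TRL; `fan_out` and its `max_workers` parameter are hypothetical names chosen for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(fetch, keys, max_workers=8):
    """Issue fetch(key) for every key in parallel.

    Results come back in the same order as keys. At most max_workers
    requests are in flight at once, which is why the pool needs spare threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, keys))
```

With enough workers, the elapsed time of a composite approaches that of its slowest sub-request instead of the sum of all of them; with too few, requests queue up and the benefit shrinks.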
P.
|
|
nk4um User
Posts: 111
|
Hey Gary,
We have a similar problem with our protege exporter - there is one resource to get a list of all instances, and another resource
that gets the data for a particular instance. To get a whole class you just loop over the list and make a subrequest for
each item. It's a nice clean approach, and it works pretty well for small classes. But some of our classes have a lot of
instances (10,000 in one case) and even fairly quick requests add up when you have that many. And the exports need to be
kept up to date. So what we did is a lot of caching. We not only cache the final export, but also the list of instances
and the individual instances. Then when we get an update, we cut the appropriate golden threads so that most of the cached
data is still valid, and only a handful of subrequests need to go beyond retrieving from cache. So an initial export might
take 30s, but subsequent requests for the same data without any updates get a complete version out of cache, and if there
was an update then the big result is expired but most of the subrequests are still valid so it takes maybe a second to rebuild
it. This (mostly) isn't for a webpage, so we can get away with these long times generally.
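For readers who have not used golden threads, the idea can be sketched in a few lines of Python. This is only a toy model of the concept, not the NetKernel API: `GoldenThreadCache`, `put`, `get` and `cut` are names invented here, and the keys and thread ids in the usage note are made up.

```python
class GoldenThreadCache:
    """Toy cache where each entry is attached to one or more named threads.

    Cutting a thread expires every entry attached to it and nothing else.
    """

    def __init__(self):
        self._values = {}    # key -> cached value
        self._threads = {}   # thread id -> set of keys attached to it

    def put(self, key, value, threads):
        self._values[key] = value
        for thread in threads:
            self._threads.setdefault(thread, set()).add(key)

    def get(self, key):
        """Return the cached value, or None if it is absent or was expired."""
        return self._values.get(key)

    def cut(self, thread):
        """Expire everything attached to the given thread."""
        for key in self._threads.pop(thread, set()):
            self._values.pop(key, None)
```

If each instance is attached to its own thread and the full export is attached to all of them, cutting the thread of one updated instance expires that instance and the export, while every other cached instance stays valid for the rebuild.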
However, even this can get to be a bit much (generating a big XML tree takes lots of memory), so we've been discussing whether
we could export smaller items - the instances - separately, and rely on the consumer to aggregate them as needed.
So in that vein, one possible approach is to expose the subrequests as part of the API also, and rather than have the application
request '/customer/12345' and get back all possible information about the customer, have it return the list of orders, which
can then be requested separately, as needed (e.g., if the webpage is making ajax calls to retrieve the details for a particular
order). You could have your '/customer' accessor fork off subrequests for all the orders and lineitems to pre-warm the cache,
but that doesn't work as well as it could (Hey Tony, where's my Super-ultra-mega-cache overlay?) Or otherwise only call the sub-requests lazily.
|
|
nk4um User
Posts: 131
|
February 7, 2012 21:06
it depends
Greetings Gary,
Of course, the answer in the subject doesn't really help (although it is the truth). From your explanation I deduce you're
working with a relational database? Then I'd add database paging at every level. Check the indexes while you are at it.
Also, while small is beautiful, it won't do to have to make a couple of thousand requests (I'm probably exaggerating for your
case, but I've seen people code that way, blindly using services that they were told to use) for a single "screen" with
orderlines. You do need the single line "services", but not for your overviews (only for inserts/updates/deletes). Rule of
thumb: 10 to 30 sub-second requests for one screen is plenty.
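As a rough sketch of what database paging could look like, here is a Python example against an in-memory SQLite database. The `orderline` table, its columns, the index name and the helper names `make_demo_db` and `fetch_page` are all assumptions for the illustration; the real schema is not given in this thread.

```python
import sqlite3

def make_demo_db():
    """Build a small in-memory database with 25 hypothetical order lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orderline ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, sku TEXT, qty INTEGER)"
    )
    # Index matching the WHERE and ORDER BY of the paging query below.
    conn.execute("CREATE INDEX idx_orderline_customer ON orderline (customer_id, id)")
    conn.executemany(
        "INSERT INTO orderline (customer_id, sku, qty) VALUES (?, ?, ?)",
        [(1, f"SKU-{n}", n) for n in range(1, 26)],
    )
    return conn

def fetch_page(conn, customer_id, page, page_size=20):
    """Return one page (zero-based) of order lines for a customer."""
    return conn.execute(
        "SELECT id, sku, qty FROM orderline "
        "WHERE customer_id = ? ORDER BY id LIMIT ? OFFSET ?",
        (customer_id, page_size, page * page_size),
    ).fetchall()
```

An overview screen then costs one bounded query per page instead of one request per order line. For very deep pages, filtering on the last seen `id` usually scales better than a large `OFFSET`.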
It is a bit like walking a tightrope. Check closely where caching is possible/acceptable and use it.
Those are all "general" tips, forgive me if I'm stating the obvious. I would need more detail to help you better.
Regards,
Tom
|
|
nk4um User
Posts: 76
|
So we have gone down the route of making lots of small resources. Each of the resources has a database query and is styled
into the logical result we wish to present. Each of these resources responds in under a second.
Then we start composition. The final logical result is a composition of a bunch of smaller items. Because we are dealing
with sets of data, we make so many small requests that the latency adds up and we run into performance issues.
Take a simple example: res:/state/MN returns a representation of Minnesota very quickly (100 ms). But then I want a list
of all states with res:/states - I query my list of all state abbreviations and then do an XRL call into each res:/state/<xsl:value-of
... etc. This now takes 50 x 100 ms and I end up with 5 seconds. This is a very simple example and I fixed it by having my res:/states
not call into res:/state, but rather just use the styling from res:/state to style its data.
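The difference between the two shapes of res:/states can be sketched by counting round trips. This Python is purely illustrative: `CountingDb`, `style_state` and the three-state data set are hypothetical stand-ins for the real database and styling.

```python
STATES = {"MN": "Minnesota", "WI": "Wisconsin", "IA": "Iowa"}  # hypothetical data

class CountingDb:
    """Stand-in for the database that counts how often it is visited."""

    def __init__(self, data):
        self._data = data
        self.round_trips = 0

    def one(self, abbr):
        self.round_trips += 1
        return self._data[abbr]

    def all(self):
        self.round_trips += 1
        return dict(self._data)

def style_state(abbr, name):
    """The shared styling step, reused by both approaches."""
    return f"{name} ({abbr})"

def states_composed(db, abbrs):
    """One sub-request, and so one round trip, per state."""
    return [style_state(a, db.one(a)) for a in abbrs]

def states_set_based(db):
    """One round trip for the whole set, styled with the same function."""
    return [style_state(a, n) for a, n in sorted(db.all().items())]
```

Both functions produce the same list, but the composed version costs one round trip per state while the set-based version costs one in total, which is where the 50 x 100 ms goes.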
As we start to move into more complex data and structure, things get more complicated. I can have a res:/topLevelResource
that has a set of res:/secondLevel resources which might in turn have a set of res:/thirdLevelResources. E.g. Customer->Orders->LineItems.
With state, we could say that we can accept the 5 seconds on the first request, which populates the cache, and then it will perform well after
that, but with customer the data changes frequently and there are a lot of different customers, meaning that caching is not
really an option.
I suppose my question is - what is the appropriate ROC type approach? Based on performance issues, we tend to start leaning
back towards RPC type approaches that say - for a given resource request this is your result optimized for the request. So
in essence: don't compose resources that are expensive to fetch. Instead build resources according to what your client needs
and satisfy the performance needs.
Any other alternatives?
|