TDIing out loud, ok SDIing as well

Ramblings on the paradigm-shift that is TDI.

Thursday, February 5, 2015

Easily implementing a monitoring API to your TDI solution

All you have to do is make AssemblyLines with the names of the operations you want to expose.

For example, I have a solution that catches incoming events over TCP and dispatches the payload data to one or more 'event triggered' AssemblyLines. Like all server-mode ALs, this one uses multiple threads to deal with client traffic (the AL Pool), so the shared log output interleaves messages from several threads and can get a little convoluted. This makes monitoring and troubleshooting tougher.

So I leveraged a web server feature first found in SDI 7.2.0.2 and TDI 7.1.1.4 that lets me run an AL named 'status' in a Config named 'MyRestAPI' by dialing up its URL in a browser.


The MyRestAPI.xml file needs to be in the <tdi soldir>/configs folder.

My 'status' AL has a single script that grabs some objects shared between the AL threads.

      // Get the shared objects - both are Javascript objects
      metricsObj = java.lang.System.getProperties().get("metricsObj");
      errorsObj = java.lang.System.getProperties().get("errorsObj");

      // Make the return payload
      returnPayload = {
            // metricsObj is null until the listener AL has published it
            status: (metricsObj == null) ? "Not running" : "Ok",
            metrics: metricsObj,
            errors: errorsObj
      };

      // Set up HTTP attributes for the reply to the client
      work["http.body"] = toJson(returnPayload);
      work["http.content-type"] = "application/json";
      work["http.responseCode"] = "200";

Whenever an event is serviced by the TCP Listener AL, the metricsObj object is updated by calling metrics.gather(endmsecs - startmsecs), with the msec variables having been set before and after dispatching an event to its handler AL. A scripted logmsg() function keeps track of "ERROR", "FATAL" and "WARN" level log messages in the shared 'errorsObj' object.
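To make the bookkeeping above more concrete, here is a minimal sketch of what such a metrics object might look like. It is not the code from my solution: the makeMetrics() and report() names and the two-argument gather(eventName, msecs) signature are assumptions made for illustration, and the sketch leaves out any locking you would want when several AL threads update the same object.

```javascript
// Hypothetical sketch of a shared metrics collector, written in plain
// ES5 so it does not depend on newer JavaScript syntax.
// The two-argument gather() signature is an assumption for illustration.
function makeMetrics() {
    var byEvent = {};   // eventName -> running totals for that event

    return {
        // Record one serviced event and how long it took, in msecs
        gather: function (eventName, msecs) {
            var m = byEvent[eventName];
            if (!m) {
                // First time this event is seen: seed max/min with msecs
                m = byEvent[eventName] = {
                    eventName: eventName,
                    duration: { max: msecs, min: msecs, avg: 0, total: 0 },
                    responses: 0
                };
            }
            var d = m.duration;
            d.max = Math.max(d.max, msecs);
            d.min = Math.min(d.min, msecs);
            d.total += msecs;
            m.responses += 1;
            d.avg = d.total / m.responses;
        },

        // Return the per-event entries as an array, ready to serialize
        report: function () {
            var out = [];
            for (var name in byEvent) {
                if (byEvent.hasOwnProperty(name)) {
                    out.push(byEvent[name]);
                }
            }
            return out;
        }
    };
}

// Usage: time one dispatch and record it
var metrics = makeMetrics();
var startmsecs = new Date().getTime();
// ... dispatch the event to its handler AL here ...
var endmsecs = new Date().getTime();
metrics.gather("Health Metric", endmsecs - startmsecs);
```

Keeping only running totals (max, min, total, count) means the object stays small no matter how many events pass through, and the average can be recomputed on every update.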

The result message looks like this:

{
  "status": "Ok",
  "metrics": [
    {
      "eventName": "Auth Voilation",
      "duration": {
        "max": 1,
        "min": 1,
        "avg": 1,
        "total": 1
      },
      "responses": 1
    },
    {
      "eventName": "Health Metric",
      "duration": {
        "max": 16,
        "min": 0,
        "avg": 0.3785425101214575,
        "total": 187
      },
      "responses": 494
    }
  ],
  "errors": []
}
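Because the reply is plain JSON, a monitoring client can pick it apart with very little code. The sketch below is only an illustration of that idea: slowestEvent() is a hypothetical helper, and the sample payload is made up to match the shape of the message above rather than taken from a real run.

```javascript
// Hypothetical helper for a monitoring client: given the parsed status
// payload, return the name of the event with the highest max duration,
// or null if no metrics have been reported yet.
function slowestEvent(payload) {
    var slowest = null;
    var list = payload.metrics || [];
    for (var i = 0; i < list.length; i++) {
        if (slowest === null || list[i].duration.max > slowest.duration.max) {
            slowest = list[i];
        }
    }
    return slowest === null ? null : slowest.eventName;
}

// Made-up sample payload with the same shape as the status reply
var sample = JSON.parse('{"status":"Ok","metrics":[' +
    '{"eventName":"A","duration":{"max":1,"min":1,"avg":1,"total":1},"responses":1},' +
    '{"eventName":"B","duration":{"max":16,"min":0,"avg":0.5,"total":8},"responses":16}' +
    '],"errors":[]}');

var worst = slowestEvent(sample);
```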

I have also added an 'admin' AL that looks for the query string parameter 'pause', and if the value is 'true' it sets a flag in metrics that causes the capture() function to hang until its value is 'false' again. Your imagination is the limit here. That, of course, and time.
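The flag handling itself can be tiny. Here is a rough sketch, assuming the query string has already been turned into a plain object; the applyPause() and isPaused() names, the 'paused' property and the 'params' object are all hypothetical, and how your AL actually reads the query string is not shown.

```javascript
// Hypothetical sketch: turn the 'pause' query-string value into a flag
// on the shared metrics object. 'params' stands in for the parsed
// query string, e.g. { pause: "true" }.
function applyPause(metrics, params) {
    if (params.pause === "true") {
        metrics.paused = true;
    } else if (params.pause === "false") {
        metrics.paused = false;
    }
    // Any other value (or no 'pause' parameter) leaves the flag untouched
    return metrics.paused === true;
}

// The event-servicing side would call this before doing its work and
// wait (for example by sleeping in a loop) for as long as it is true.
function isPaused(metrics) {
    return metrics.paused === true;
}
```

Treating anything other than 'true' or 'false' as a no-op means a mistyped URL cannot accidentally pause or resume the listener.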

Note that the port used by SDI web services, along with the other comms settings, is set in the solution.properties file.

## Web container
web.server.port=1098
web.server.ssl.on=true
web.server.ssl.client.auth.on=false

Below these are the properties for the SDI dashboard (https://localhost:1098/dashboard).
