cjmx: A command-line version of JConsole

JConsole is a nice tool when it comes to monitoring a running Java application. But when it is not possible to connect to a JVM with JConsole directly (due to network restrictions, for example) and SSH tunneling is not an option, it would be great to have a command-line version of JConsole.

cjmx is such a command-line version of JConsole. After having downloaded the single jar file cjmx_2.10-2.1.0-app.jar you can start it by adding the tools.jar to the classpath:

java -cp $JAVA_HOME/lib/tools.jar:cjmx_2.10-2.1.0-app.jar cjmx.Main

This will open a “JMX shell” with the following basic commands:

  • help: This shows a basic help screen that explains the available commands.
  • jps/list: Like the jps tool from the JDK this command prints out all Java processes with their process id.
  • connect: You can use this command to connect to a running JVM process.
  • format: Lets you specify whether you want your output in a simple text format or as a JSON string.
  • exit: Quits the application.

To learn more about cjmx let us start a session and connect to the JVM that is running cjmx itself:

> jps
13198 cjmx.Main
> connect 13198
Connected to local virtual machine 13198
Connection id: rmi://0:0:0:0:0:0:0:1  2
Default domain: DefaultDomain
5 domains registered consisting of 19 total MBeans
>
describe     disconnect   exit         format       help         invoke       mbeans       names        sample       select       status

After the last appearance of > you see a great feature of cjmx: auto-completion. Every time you do not know which commands are available, you can just type [TAB] and cjmx will list them. This even works for MBean names as we will see.

Now that we are connected to our JVM we can let cjmx describe an available MBean. With auto-completion we can just start typing describe '[TAB] to retrieve a list of all available domains:

> describe '
:                     JMImplementation:     com.sun.management:   java.lang:            java.nio:             java.util.logging:

This way we can dig through the MBean names until we have found what we are looking for. In this example we are interested in the MBean ‘java.lang:type=OperatingSystem’:

> describe 'java.lang:type=OperatingSystem'
Object name: java.lang:type=OperatingSystem
-------------------------------------------
Description: Information on the management interface of the MBean

Attributes:
  MaxFileDescriptorCount: long
  OpenFileDescriptorCount: long
  FreePhysicalMemorySize: long
  CommittedVirtualMemorySize: long
  FreeSwapSpaceSize: long
  ProcessCpuLoad: double
  ProcessCpuTime: long
  SystemCpuLoad: double
  TotalPhysicalMemorySize: long
  TotalSwapSpaceSize: long
  AvailableProcessors: int
  Arch: String
  SystemLoadAverage: double
  Name: String
  Version: String
  ObjectName: ObjectName

As we can see, the MBean ‘java.lang:type=OperatingSystem’ provides information about the number of open files, the current CPU load, etc. So let’s query the number of open files by invoking the command mbeans with the name of the MBean, the sub-command select and the MBean’s attribute:

> mbeans 'java.lang:type=OperatingSystem' select OpenFileDescriptorCount
java.lang:type=OperatingSystem
------------------------------
  OpenFileDescriptorCount: 35

We can even query all available attributes by using the star instead of the concrete name of an attribute. Please note that using the cursor up key recalls the last issued command, hence we do not have to type it again. Instead we just replace the attribute’s name with the star:

> mbeans 'java.lang:type=OperatingSystem' select *
java.lang:type=OperatingSystem
------------------------------
  MaxFileDescriptorCount: 10240
  OpenFileDescriptorCount: 36
...

By using the sub-command invoke we can even invoke MBean methods like in the following example:

> mbeans 'java.lang:type=Memory' invoke gc()
java.lang:type=Memory: null

Now that we know how to query attributes and invoke methods, we can start to script this functionality in order to monitor the application. To support this kind of scripting, cjmx also accepts all “commands” as arguments to the application itself; hence you can invoke cjmx in the following way (where <PID> has to be replaced by the process id of a running JVM):

java -cp $JAVA_HOME/lib/tools.jar:cjmx_2.10-2.1.0-app.jar cjmx.Main <PID> "mbeans 'java.lang:type=OperatingSystem' select OpenFileDescriptorCount"
java.lang:type=OperatingSystem
------------------------------
  OpenFileDescriptorCount: 630

With this knowledge we can write a simple bash script that queries the JVM each second for the number of open files:

#!/bin/bash
while true ; do
        echo `date` | tr -d '\n'
        java -cp /usr/java/default/lib/tools.jar:cjmx_2.10-2.1.0-app.jar cjmx.Main $1 "mbeans 'java.lang:type=OperatingSystem' select OpenFileDescriptorCount" | grep OpenFileDescriptorCount | cut -f 2 -d :
        sleep 1
done

This produces every second a new line with a timestamp and the current number of open files. When redirected into a file, we have a simple log file that we can evaluate later on.

Conclusion: cjmx is a great alternative to JConsole when the latter cannot be used due to network restrictions on a server machine. The ability to even issue commands by passing them on the command line makes it suitable for small monitoring scripts.

Writing an interpreter in Ceylon

Ceylon is a statically typed language for the Java Virtual Machine (JVM) that comes with a powerful set of features that the Java language does not have or is missing (depending on your point of view). Now, about three years after the first release and about one year after the 1.0.0 release, it is time to take a closer look.

A good way to get in touch with a new language is to start with a simple project, just for fun. One thing that came to my mind was to implement an interpreter for the programming language Brainfuck. The language consists of only eight simple commands and one instruction pointer. More details about Brainfuck can be found here.

The first surprising point about Ceylon is that we do not have the typical “main” method we know from other C-like languages. Instead we can just implement a “global” method (here we call it “run”):

"Run the module `brainfuck`."
shared void run() {
	String? programFile = process.namedArgumentValue("program");
	if (exists programFile) {
		value bytes = readFile(programFile);
		BrainFuckInterpreter interpreter = BrainFuckInterpreter(bytes);
		interpreter.run();
	} else {
		print("Please provide the argument 'program'.");
	}
}

The string before the method definition is just the short form of the doc(“…”) annotation, the counterpart of the javadoc feature. To format the comments we can use Markdown syntax.

Another surprising fact is the access modifier “shared” that we see in this example. Instead of using public in order to make the method accessible to other code, we use the keyword shared. In Ceylon the visibility hierarchy starts with modules that consist of one or more packages. A package consists of one or more source files, which again can share their elements. Sharing an element from one level of the hierarchy makes it visible to the next higher level. This way even packages can be invisible to other modules, as we see in the following snippet from the package.ceylon file, which is placed within each package and which in our case declares the package brainfuck as shared:

shared package brainfuck;

By the way: the package declarations that all Java classes start with are not necessary in Ceylon. Instead the package name is derived from the names of the folders the source file resides in, hence the redundant information can be left out. But let’s go back to the run() method above. As we do not have a special method which gets the command line arguments from the runtime environment passed in, we ask the global process instance for the named argument “program”. The returned String is an “optional” String, i.e. the value may or may not be set. In Java that would be modelled by a String reference that may be null. But in order to circumvent possible NullPointerExceptions, the compiler ensures that you can only access the value of an “optional” reference after you have checked it using the exists keyword.
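For comparison: in Java the same situation would be modelled with a nullable String and a plain null check, but nothing would force us to perform it. A sketch (reading the argument from a system property just for the sake of the example):

String programFile = System.getProperty("program"); // may be null
if (programFile != null) {
	// the Java compiler does not force this check; forgetting it compiles fine
	System.out.println("Reading file '" + programFile + "'.");
} else {
	System.out.println("Please provide the argument 'program'.");
}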

Now that we have retrieved the path to our brainfuck “code”, we can read the bytes from the file using the following method:

ArrayList<Byte> readFile(String programFile) {
	value bytes = ArrayList<Byte>();
	print("Reading file '" + programFile + "'.");
	Path path = parsePath(programFile);
	OpenFile openfile = newOpenFile(path.resource);
	ByteBuffer buffer = newByteBuffer(1024);
	variable Integer bytesRead = openfile.read(buffer);
	print("bytesRead=" + bytesRead.string);
	while (bytesRead != -1) {
		buffer.flip();
		for (Byte byte in buffer) {
			bytes.add(byte);
		}
		buffer.clear();
		bytesRead = openfile.read(buffer);
	}
	openfile.close();
	return bytes;
}

As a storage for the bytes read from the program file, we utilize an ArrayList from the Java SDK. In order to do that, we have to import it in our module definition, which resides in the module.ceylon file:

module brainfuck "1.0.0" {
	shared import ceylon.io "1.1.0";
	shared import java.base "8";
}

Next to the name of the module and its version we can import other modules. The third line imports the Java “base” module, i.e. the Java packages java.lang, java.util, java.io, java.net, java.text as well as NIO and security. Now that we have imported the java.base module, we can import the ArrayList at the beginning of our source file:

import java.util {
	ArrayList
}

Interestingly, Ceylon emphasizes immutability: by default you can assign a value to a reference only once. If you want to change its value, you have to mark the reference as mutable using the keyword variable. You can see this in the code of the readFile() method, as the references path, openfile and buffer are not marked as variable; in these cases the value is assigned only once. In contrast, the number of bytes read from the file is of course variable, hence we mark it so.

After having read the code of the brainfuck program into memory, we can start its execution. This is done by creating an instance of the Brainfuck interpreter and calling its run() method:

BrainFuckInterpreter interpreter = BrainFuckInterpreter(bytes);
interpreter.run();

The beginning of this class is shown in the following:

shared class BrainFuckInterpreter(ArrayList<Byte> buffer) {
	ArrayList<Integer> field = ArrayList<Integer>();
	variable Integer ptr = 0;
	
	shared void run() {
		variable Integer bufferPos = 0;
		while (bufferPos >= 0 && bufferPos < buffer.size() - 1) {
			value oldBufferPos = bufferPos;
			value byte = buffer.get(bufferPos);
			switch (byte.unsigned)
			case (43) {
				ensureFieldCapacity();
				value oldValue = field.get(ptr);
				field.set(ptr, oldValue + 1);
			}
			...
			else {
				print("Ignoring char " + byte.string);
			}
			bufferPos++;
		}
		...
	}
	...
}

If you are used to Java, you might be missing the constructor. Where do we assign the passed-in argument buffer to the corresponding member variable? The Ceylon feature at work here is that we can define some of our member variables within the parentheses after the class name. In our case we define a member with the name buffer of type ArrayList. The constructor we would have to write in Java to assign the given value to the member variable is obsolete in Ceylon: the compiler creates some kind of “default” constructor that does this job for us. Additional members of the class that are not passed to the constructor can be defined as we would in Java (see field and ptr in our example), and as long as we do not declare them to be shared, they are only visible within the class. Defining the “signature” of the constructor within the class name also means that we can have only one constructor per class. In case we want to execute additional code during “construction”, we can put that code into the class body.
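For comparison, in Java we would have to write the constructor boilerplate that Ceylon generates for us by hand, roughly like this:

import java.util.ArrayList;

public class BrainFuckInterpreter {
	private final ArrayList<Byte> buffer;                       // in Ceylon: the class parameter
	private final ArrayList<Integer> field = new ArrayList<>(); // additional member
	private int ptr = 0;                                        // additional member

	public BrainFuckInterpreter(ArrayList<Byte> buffer) {
		this.buffer = buffer; // the Ceylon compiler generates this assignment for us
	}
}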

Our run() method now iterates over the buffer with the instructions that we have read from the file. Depending on the value of the current byte, we execute the appropriate operation. This is implemented by a switch statement. In contrast to Java we can omit the curly braces around all cases, but we cannot omit the “default” case, which is modelled in Ceylon as an “else” clause of the switch statement. In this “else” case we just print the input byte that we are ignoring. Interesting for Java developers is the fact that we cannot just “add” the byte to the preceding String, as we would do in Java. Instead we have to access the String representation of the byte by writing byte.string. An implicit invocation of a method like Java’s toString() does not exist in Ceylon.

Conclusion: The interpreter does not utilize all language features, but at least it covers a few interesting points. To someone who has worked with Java for years, the Ceylon features seem to tackle many well-known issues of the Java language (versioned modules, package visibility, immutability, the optional pattern, etc.). Although a feature like the absence of multiple constructors takes some getting used to, I must admit that I liked working with Ceylon.

PS: The sources can be found here.

Analysing the performance degradation/improvements of a Java EE application with interceptors

When you are developing a Java EE application with certain performance requirements, you have to verify that these requirements are fulfilled before each release. A Hudson job that nightly executes a bunch of test measurements on a specific hardware platform is what you may think of.

You can check the achieved timings and compare them with the given requirements. If the measured values deviate from the requirements too much, you can either break the build or at least send an email to the team.

But how do you measure the execution time of your code? The very first thought might be to insert time-measuring code in thousands of places in your code base. But this is not only a lot of work; it also has an impact on the performance of your code, as the time measurements are now also executed in production. To get rid of the many insertions you might want to leverage an aspect-oriented programming (AOP) framework that introduces the code for the time measurements at compile time. This way you have at least two versions of your application: the one with and the one without the additional overhead. Measuring performance at some production site still requires a redeployment of your code, and you have to decide at compile time which methods you want to observe.

Java EE therefore comes with an easy-to-use alternative: interceptors. This is where the inversion of control pattern plays out its advantages. As the application server invokes your bean methods/web service calls, it is easy for it to intercept these invocations and provide you a way of adding code before and after each invocation.

Using interceptors is then fairly easy. You can either add an annotation to your target method or class that references your interceptor implementation or you can add the interceptor using the deployment descriptor:

@Interceptors(PerformanceInterceptor.class)
public class CustomerService {
...
}

The same information supplied in the deployment descriptor looks like this:

<interceptor-binding>
    <target-name>myapp.CustomerService</target-name>
    <interceptor-class>myapp.PerformanceInterceptor</interceptor-class>
</interceptor-binding>

The interceptor itself can be a simple POJO class with a method that is annotated with @AroundInvoke and takes one argument:

@AroundInvoke
public Object measureExecutionTime(InvocationContext ctx) throws Exception {
	long start = System.currentTimeMillis();
	try {
		return ctx.proceed();
	} finally {
		long time = System.currentTimeMillis() - start;
		Method method = ctx.getMethod();
		RingStorage ringStorage = RingStorageFactory.getRingStorage(method);
		ringStorage.addMeasurement(time);
	}
}

Before the try block and in the finally block we add our code for the time measurement. As can be seen from the code above, we also need some in-memory location where we can store the last measurement values in order to compute, for example, a mean value and the deviation from it. In this example we have a simple ring storage implementation that overwrites old values after some time.
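The measurement part of such a ring storage could look like the following sketch (the MBean interface and registration shown below are left out here; a fixed capacity and simple synchronization are assumptions):

public class RingStorage {
	private final long[] measurements;
	private int nextSlot = 0;
	private int count = 0;

	public RingStorage(int capacity) {
		this.measurements = new long[capacity];
	}

	public synchronized void addMeasurement(long time) {
		measurements[nextSlot] = time; // overwrites the oldest value once the array is full
		nextSlot = (nextSlot + 1) % measurements.length;
		if (count < measurements.length) {
			count++;
		}
	}

	public synchronized long getMeanValue() {
		long sum = 0;
		for (int i = 0; i < count; i++) {
			sum += measurements[i];
		}
		return count == 0 ? 0 : sum / count;
	}
}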

But how do we expose these values to the outside world? As many other values of the application server are exposed over the JMX interface, we can implement a simple MXBean interface as shown in the following code snippet:

public interface PerformanceResourceMXBean {
    long getMeanValue();
}

public class RingStorage implements PerformanceResourceMXBean {
	private String id;

	public RingStorage(String id) {
		this.id = id;
		registerMBean();
		...
	}
	
	private void registerMBean() {
        try {
            ObjectName objectName = new ObjectName("performance" + id + ":type=" + this.getClass().getName());
            MBeanServer platformMBeanServer = ManagementFactory.getPlatformMBeanServer();
            try {
                platformMBeanServer.unregisterMBean(objectName);
            } catch (Exception e) {
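                // ignore: the MBean was simply not registered before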
            }
            platformMBeanServer.registerMBean(this, objectName);
        } catch (Exception e) {
            throw new IllegalStateException("Problem during registration:" + e);
        }
    }
	
	@Override
    public long getMeanValue() {
        ...
    }
	...
}

Now we can start JConsole and query the exposed MXBean for the mean value.

Writing a small JMX client application that writes the sampled values, for example, into a CSV file enables you to process these values later on and compare them with subsequent measurements. This gives you an overview of the evolution of your application’s performance.
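Such a client could look like the following sketch (the service URL, port, object name and CSV location are assumptions that depend on your server configuration):

import java.io.FileWriter;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PerformanceCsvClient {
	public static void main(String[] args) throws Exception {
		// connect to the remote JMX endpoint of the application server
		JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
		try (JMXConnector connector = JMXConnectorFactory.connect(url);
				FileWriter csv = new FileWriter("measurements.csv", true)) {
			MBeanServerConnection connection = connector.getMBeanServerConnection();
			// the object name must match the one used during registration
			ObjectName name = new ObjectName("performanceCustomerService:type=myapp.RingStorage");
			Long mean = (Long) connection.getAttribute(name, "MeanValue");
			csv.write(System.currentTimeMillis() + ";" + mean + "\n");
		}
	}
}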

Conclusion: With the use of interceptors it is easy to dynamically add performance measurement capabilities to an existing Java EE application through the deployment descriptor. If you expose the measured values over JMX, you can apply further processing to the values afterwards.

Using @NamedEntityGraph to load JPA entities more selectively in N+1 scenarios

The N+1 problem is a common issue when working with ORM solutions. It happens when you set the fetchType for some @OneToMany relation to lazy, in order to load the child entities only when the Set/List is accessed. Let’s assume we have a Customer entity with two relations: a set of orders and a set of addresses for each customer.

@OneToMany(mappedBy = "customer", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
private Set<OrderEntity> orders;

@OneToMany(mappedBy = "customer", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
private Set<AddressEntity> addresses;

To load all customers, we can issue the following JPQL statement and afterwards load all orders for each customer:

List<CustomerEntity> resultList = entityManager.createQuery("SELECT c FROM CustomerEntity AS c", CustomerEntity.class).getResultList();
for(CustomerEntity customerEntity : resultList) {
    Set<OrderEntity> orders = customerEntity.getOrders();
    for(OrderEntity orderEntity : orders) {
	...
    }
}

Hibernate 4.3.5 (as shipped with WildFly 8.1.0.CR2) will generate the following series of SQL statements from it for only two(!) customers in the database:

Hibernate: 
     select
         customeren0_.id as id1_1_,
         customeren0_.name as name2_1_,
         customeren0_.numberOfPurchases as numberOf3_1_ 
     from
         CustomerEntity customeren0_
Hibernate: 
     select
         orders0_.CUSTOMER_ID as CUSTOMER4_1_0_,
         orders0_.id as id1_2_0_,
         orders0_.id as id1_2_1_,
         orders0_.campaignId as campaign2_2_1_,
         orders0_.CUSTOMER_ID as CUSTOMER4_2_1_,
         orders0_.timestamp as timestam3_2_1_ 
     from
         OrderEntity orders0_ 
     where
         orders0_.CUSTOMER_ID=?
Hibernate: 
     select
         orders0_.CUSTOMER_ID as CUSTOMER4_1_0_,
         orders0_.id as id1_2_0_,
         orders0_.id as id1_2_1_,
         orders0_.campaignId as campaign2_2_1_,
         orders0_.CUSTOMER_ID as CUSTOMER4_2_1_,
         orders0_.timestamp as timestam3_2_1_ 
     from
         OrderEntity orders0_ 
     where
         orders0_.CUSTOMER_ID=?

As we can see, the first query selects all customers from the table CustomerEntity. The following two selects then fetch the orders for each customer we have loaded in the first query. When we have 100 customers instead of two, we will get 101 queries: one initial query to load all customers and then one additional query for the orders of each of the 100 customers. That is the reason why this problem is called N+1.

A common idiom to solve this problem is to force the ORM to generate an inner join query. In JPQL this can be done by using the JOIN FETCH clause, as demonstrated in the following code snippet:

entityManager.createQuery("SELECT c FROM CustomerEntity AS c JOIN FETCH c.orders AS o", CustomerEntity.class).getResultList();

As expected, the ORM now generates an inner join with the OrderEntity table and thus needs only one SQL statement to load all the data:

select
    customeren0_.id as id1_0_0_,
    orders1_.id as id1_1_1_,
    customeren0_.name as name2_0_0_,
    orders1_.campaignId as campaign2_1_1_,
    orders1_.CUSTOMER_ID as CUSTOMER4_1_1_,
    orders1_.timestamp as timestam3_1_1_,
    orders1_.CUSTOMER_ID as CUSTOMER4_0_0__,
    orders1_.id as id1_1_0__
from
    CustomerEntity customeren0_
inner join
    OrderEntity orders1_
        on customeren0_.id=orders1_.CUSTOMER_ID

In situations where you know that you will have to load all orders for each customer, the JOIN FETCH clause minimizes the number of SQL statements from N+1 to 1. This of course comes with the drawback that the customer data is now transferred again and again for every order of a customer (due to the additional customer columns in each result row).

Version 2.1 of the JPA specification introduces so-called NamedEntityGraphs. This annotation lets you describe the graph a JPQL query should load in more detail than a JOIN FETCH clause can, and is thus another solution to the N+1 problem. The following example demonstrates a NamedEntityGraph for our customer entity that is supposed to load only the name of the customer and its orders. The orders are described in more detail in the subgraph ordersGraph. Here we see that we only want to load the fields id and campaignId of the order.

@NamedEntityGraph(
        name = "CustomersWithOrderId",
        attributeNodes = {
                @NamedAttributeNode(value = "name"),
                @NamedAttributeNode(value = "orders", subgraph = "ordersGraph")
        },
        subgraphs = {
                @NamedSubgraph(
                        name = "ordersGraph",
                        attributeNodes = {
                                @NamedAttributeNode(value = "id"),
                                @NamedAttributeNode(value = "campaignId")
                        }
                )
        }
)

The NamedEntityGraph is passed as a hint to the JPQL query after it has been loaded via the EntityManager using its name:

EntityGraph entityGraph = entityManager.getEntityGraph("CustomersWithOrderId");
entityManager.createQuery("SELECT c FROM CustomerEntity AS c", CustomerEntity.class).setHint("javax.persistence.fetchgraph", entityGraph).getResultList();

Hibernate supports the @NamedEntityGraph annotation since version 4.3.0.CR1 and creates the following SQL statement for the JPQL query shown above:

Hibernate: 
    select
        customeren0_.id as id1_1_0_,
        orders1_.id as id1_2_1_,
        customeren0_.name as name2_1_0_,
        customeren0_.numberOfPurchases as numberOf3_1_0_,
        orders1_.campaignId as campaign2_2_1_,
        orders1_.CUSTOMER_ID as CUSTOMER4_2_1_,
        orders1_.timestamp as timestam3_2_1_,
        orders1_.CUSTOMER_ID as CUSTOMER4_1_0__,
        orders1_.id as id1_2_0__ 
    from
        CustomerEntity customeren0_ 
    left outer join
        OrderEntity orders1_ 
            on customeren0_.id=orders1_.CUSTOMER_ID

We see that Hibernate does not issue N+1 queries; instead the @NamedEntityGraph annotation has forced Hibernate to load the orders via a left outer join. This is of course a subtle difference to the JOIN FETCH clause, where Hibernate created an inner join: the left outer join also loads customers for which no order exists, whereas with the JOIN FETCH clause we would only load customers that have at least one order.
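As a side note: if we wanted the join-fetch approach itself to include customers without orders, JPQL also supports a LEFT JOIN FETCH clause; the DISTINCT keyword filters out the duplicate customer instances that the join would otherwise produce:

List<CustomerEntity> resultList = entityManager.createQuery("SELECT DISTINCT c FROM CustomerEntity AS c LEFT JOIN FETCH c.orders", CustomerEntity.class).getResultList();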

It is also interesting that Hibernate loads more than the specified attributes for the tables CustomerEntity and OrderEntity. As this conflicts with the specification of @NamedEntityGraph (section 3.7.4), I have created a JIRA issue for it.

Conclusion: We have seen that JPA 2.1 offers two solutions for the N+1 problem: we can either use the JOIN FETCH clause to eagerly fetch a @OneToMany relation, which results in an inner join, or we can use the @NamedEntityGraph feature, which lets us specify which @OneToMany relations to load via a left outer join.

Tracing SQL statements in JBoss AS 7 using a custom logging handler

Using an ORM to abstract from your specific database and to let it create and issue all the SQL statements you would have to write by hand yourself seems handy. This is what made ORM solutions popular.

But it also comes with a downside: as the ORM does a lot of work for you, you lose control over the generated SQL to some degree, and you have to rely on the ORM to create high-performance statements for you. But it can happen that the SQL generated by the ORM is not what you would have written by hand and expected the ORM to do for you. In this case you have to get back control over the SQL and put your hands on the code again.

In huge applications this task is not trivial, as there might be hundreds of statements issued to the database, stemming from hundreds of lines of Java code that make heavy use of JPA features. Tracing the SQL statement that your database profiling tool has identified as problematic down to the actual line of code becomes tedious.

We know that we can enable SQL statement logging for Hibernate with the following two lines in our persistence.xml:

<property name="hibernate.show_sql" value="true"/>
<property name="hibernate.format_sql" value="true"/>

But this will only output the already generated SQL; the actual Java code line is still not visible. For smaller applications it might be feasible to attach a debugger to the application server and debug through the code until you have found the line that logs the problematic SQL statement, but for bigger applications this is time-consuming.

As Hibernate itself does not provide any means of intercepting the logging and enhancing it with more information, we will have to do this on our own. The JBoss documentation indicates that it is possible to write your own custom logging handler. As this logging handler receives all the logging messages, and therewith also the messages produced by Hibernate when SQL logging is enabled, we can try to find the line we are looking for and then output a stack trace to our own log file.

Writing a custom logging handler turns out to be very simple. All you have to do is set up a small project with a class that extends the class Handler from the JDK package java.util.logging:

package mypackage;

import java.util.logging.Handler;
import java.util.logging.LogRecord;

public class MyJBossLogger extends Handler {

	@Override
	public void publish(LogRecord record) {
	
	}
	
	@Override
	public void flush() {
	}
	
	@Override
	public void close() throws SecurityException {

	}
}

The publish() method receives all logging output in the form of an instance of LogRecord. Its method getMessage() lets us access the output directly. Hence we can match this message against some keywords we have loaded from a configuration file:

@Override
public void publish(LogRecord record) {
	String message = record.getMessage();
	buffer.add(message + "\n");
	if (keywords == null) {
		keywords = loadKeywords();
	}
	if (matches(message, keywords)) {
		String stacktrace = "\nStacktrace:\n";
		StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace();
		for (StackTraceElement element : stackTrace) {
			stacktrace += element.toString() + "\n";
		}
		buffer.add(stacktrace);
		flush();
	}
}

The variable buffer is here some simple data structure (e.g. Guava’s EvictingQueue) that buffers the last lines, as the method publish() is called for each line(!) of output. Since a complete SQL statement spans more than one line, we have to remember a couple of them. Next to the buffered lines and the current line we also output a String representation of the current stack trace. This tells us later in the log file from where we were called and therewith which line of Java code in our project caused the current statement.
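The remaining pieces of the handler could look like the following sketch, assuming Guava’s com.google.common.collect.EvictingQueue is available to the module; the buffer capacity and log file location are assumptions:

// the last lines of output; EvictingQueue drops the oldest entry when full
private final Queue<String> buffer = EvictingQueue.create(10);

@Override
public void flush() {
	try (FileWriter writer = new FileWriter("/tmp/sql-trace.log", true)) {
		for (String line : buffer) {
			writer.write(line); // the buffered SQL lines followed by the stack trace
		}
		buffer.clear();
	} catch (IOException e) {
		// do not log from within the handler itself to avoid endless recursion
		e.printStackTrace();
	}
}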

Once we have compiled the project, we can copy the resulting jar file to the newly created folder structure under $JBOSS_HOME/modules/system/layers/base/com/mydomain/mymodule/main (for JBoss AS 7.2). In order to tell JBoss AS about our new module, we have to create an XML file called module.xml with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<module xmlns="urn:jboss:module:1.1" name="com.mydomain.mymodule">
	<resources>
		<resource-root path="MyJBossLogger-0.0.1-SNAPSHOT.jar"/>
	</resources>
</module>

The name of the module corresponds to the path within the JBoss modules folder. It will also be used in the configuration file to configure our custom logging handler:

...
<subsystem xmlns="urn:jboss:domain:logging:1.2">
	<custom-handler name="CUSTOM" module="com.mydomain.mymodule" class="com.mydomain.mymodule.MyJBossLogger">
		<level name="DEBUG"/>
	</custom-handler>
	...

When we implement the flush() method of our logging handler to write the output to some log file, we will see something like the following (of course in condensed form):

Hibernate:     select ... from customer ...
Stacktrace:
java.lang.Thread.getStackTrace(Thread.java:1568)
com.mydomain.mymodule.MyJBossLogger.publish(MyJBossLogger.java:20)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:292)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:300)
org.jboss.logmanager.Logger.logRaw(Logger.java:721)
org.jboss.logmanager.Logger.log(Logger.java:506)
...
com.mydomain.myapp.ArticleEntity.getCustomers(ArticleRepository.java:234)
...

Here we can see clearly which OneToMany relation causes the problematic select statement we were looking for.

Conclusion: Using a custom logging handler to inject the current stack trace into the logging of the SQL statements may help you find the exact location in the source code where a concrete query is issued. It also turned out that writing your own custom logging handler for JBoss AS is a straightforward task.

API design and performance

When you design a new API you have to take a lot of decisions. These decisions are based on a number of design principles. Joshua Bloch has summarized some of them in his presentation “How to Design a Good API and Why it Matters”. The main principles he mentions are:

  • Easy to learn
  • Easy to use
  • Hard to misuse
  • Easy to read and maintain code that uses it
  • Sufficiently powerful to satisfy requirements
  • Easy to extend
  • Appropriate to audience

As we see from the list above, Joshua Bloch puts his emphasis on readability and usage. A point that is completely missing from this listing is performance. But can performance impact your design decisions at all?

To answer this question let’s try to design a simple use case in the form of an API and measure its performance. Then we can have a look at the results and decide whether performance considerations have an impact on the API or not. As an example we take the classic use case of loading a list of customers from some service/storage. What we also want to consider is the fact that not all users are allowed to perform this operation, hence we will have to implement some kind of permission check. To implement this check and return this information to the caller, we have multiple options. A first try could look like this:

	List<Customer> loadCustomersWithException() throws PermissionDeniedException

Here we model an explicit exception for the case that the caller does not have the right to retrieve the list of customers. The method returns a list of Customer objects, while we assume that the current user can be retrieved from some container or ThreadLocal implementation and does not have to be passed to each method.
The method signature above is easy to use and hard to misuse. Code that uses this method is also easy to read:

	try {
		List<Customer> customerList = api.loadCustomersWithException();
		doSomething(customerList);
	} catch (PermissionDeniedException e) {
		handleException();
	}

The reader immediately sees that a list of Customers is loaded and that we perform some follow-up action only in case we don’t get a PermissionDeniedException. But in terms of performance, exceptions do cost some CPU time, as the JVM has to stop the normal code execution and walk up the stack to find the position where the execution has to be continued. This is especially costly considering the architecture of modern processors with their speculative execution of code sequences in pipelines. So would it be better in terms of performance to introduce another way of informing the caller about the missing permission?

The first idea would be to create another method in order to check the permission before calling the method that eventually throws an exception. The caller code would then look like this:

	if(api.hasPermissionToLoadCustomers()) {
		List<Customer> customerList = api.loadCustomers();
		doSomething(customerList);
	}

The code is still readable, but we have introduced another method call that also costs runtime. Now we are sure, however, that the exception won’t be thrown, hence we can omit the try/catch block. This code violates the principle “easy to use”, as we have to invoke two methods for one use case instead of one. You have to pay attention not to forget the additional call for each retrieval operation, and with regard to the whole project, your code will be cluttered with hundreds of permission checks.

Another idea to overcome the exception is to pass an empty list to the API call and let the implementation fill it. The return value is then a boolean indicating whether the user had the permission to execute the operation, so an empty list really means that no customers have been found. This sounds like C or C++ programming, where the caller manages the memory of the structures the callee uses, and the approach costs the construction of an empty list even if you don’t have the permission to retrieve the list at all:

	List<Customer> customerList = new ArrayList<Customer>();
	boolean hasPermission = api.loadCustomersWithListAsParameter(customerList);
	if(hasPermission) {
		doSomething(customerList);
	}

One last approach to returning two pieces of information to the caller would be the introduction of a new class that holds, next to the returned list of Customers, a boolean flag indicating whether the user had the permission to perform this operation:

	CustomerList customerList = api.loadCustomersWithReturnClass();
	if(customerList.isUserHadPermission()) {
		doSomething(customerList.getCustomerList());
	}

Again we have to create additional objects that cost memory and performance, and we also have to deal with an additional class that does nothing more than serve as a simple data holder for the two pieces of information. Although this approach is again easy to use and leads to readable code, it creates additional overhead for maintaining the separate class and uses a somewhat awkward means of indicating that a list is empty because of a missing permission.

After having introduced these different approaches, it is now time to measure their performance: once for the case that the caller has the permission and once for the case that the caller does not have the necessary permission. The following table shows the results for the first case with 1.000.000 repetitions:

Measurement                                            Time [ms]
testLoadCustomersWithExceptionWithPermission                  33
testLoadCustomersWithExceptionAndCheckWithPermission          34
testLoadCustomersWithReturnClassWithPermission                41
testLoadCustomersWithListAsParameterWithPermission            66

As expected, the two approaches that introduce an additional class or pass an empty list cost more runtime than the approaches that use an exception. Even the approach with a dedicated method call to check for the permission is not much slower than the one without it.
The following table now shows the results for the case where the caller does not have the permission to retrieve the list:

Measurement                                            Time [ms]
testLoadCustomersWithExceptionNoPermission                  1187
testLoadCustomersWithExceptionAndCheckNoPermission             5
testLoadCustomersWithReturnClassNoPermission                   4
testLoadCustomersWithListAsParameterNoPermission               5

Not surprisingly, the approach where a dedicated exception is thrown is much slower than the other approaches; the magnitude of this impact is much higher than one might expect. But from the table above we already know the solution for this case: just introduce another method that can be used to check for the permission beforehand, in case you expect a lot of permission-denied use cases. The huge difference in runtime between the use cases with and without permission can be explained by the fact that I returned an ArrayList with one Customer object when the caller was in possession of the permission; hence the load methods were a bit more expensive to execute than when the user did not possess this permission.
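For reference, a measurement loop along the following lines produces such numbers (a sketch; the API methods and helpers are the ones introduced above):

	long start = System.currentTimeMillis();
	for (int i = 0; i < 1000000; i++) {
		try {
			List<Customer> customerList = api.loadCustomersWithException();
			doSomething(customerList);
		} catch (PermissionDeniedException e) {
			handleException();
		}
	}
	System.out.println("Time [ms]: " + (System.currentTimeMillis() - start));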

Conclusion: When performance is a critical factor, you also have to consider it when designing a new API. As we have seen from the measurements above, this may lead to solutions that violate common principles of API design like “easy to use” and “hard to misuse”.

JSON Processing (JSON-P)

JSON (JavaScript Object Notation) is a compact text file format that can be used to store and transfer data. It has become very popular over the years as it is very simple to read and parse. Beyond that each JSON construct should be valid JavaScript and it should be possible to evaluate it using JavaScript’s eval() function. The latter has made it popular in the web developer community as JSON data received from the server can be easily evaluated on the client side.
The popularity of JSON has created the demand to use it on the server side as well. Over time, different Java libraries have been developed that support reading and writing JSON data (e.g. google-gson, flexjson or Jackson). Hence it was only a question of time until JSON processing became part of the Java EE specification.
In the last article I shed some light on the new concurrency features of Java EE 7. In this article we want to explore the new JSON Processing feature.
After these introductory words it is time to put our hands on some code. The following code shows how to write a very simple JSON file:

JsonObject model = Json.createObjectBuilder()
		.add("firstName", "Martin")
		.add("phoneNumbers", Json.createArrayBuilder()
				.add(Json.createObjectBuilder()
						.add("mobile", "1234 56789"))
				.add(Json.createObjectBuilder()
						.add("home", "2345 67890")))
		.build();
try (JsonWriter jsonWriter = Json.createWriter(new FileWriter(Paths.get(System.getProperty("user.dir"), "target/myData.json").toString()))) {
	jsonWriter.write(model);
} catch (IOException e) {
	LOGGER.severe("Failed to create file: " + e.getMessage());
}

The builder API makes it easy to append the next call to the previous one. I still need to get used to the fact that you have to call createArrayBuilder() or createObjectBuilder() every time you want to create an array or an object.
When reading a JSON file you have to determine the type you are currently reading by calling getValueType():

try (JsonReader jsonReader = Json.createReader(new FileReader(Paths.get(System.getProperty("user.dir"), "target/myData.json").toString()))) {
	JsonStructure jsonStructure = jsonReader.read();
	JsonValue.ValueType valueType = jsonStructure.getValueType();
	if (valueType == JsonValue.ValueType.OBJECT) {
		JsonObject jsonObject = (JsonObject) jsonStructure;
		JsonValue firstName = jsonObject.get("firstName");
		...

Functionality to map a complete JSON file to an existing structure of Java classes (something like JAXB for XML files) is not part of the standard. Instead, JSON-P offers a complete streaming API for processing huge amounts of JSON data:

try (FileWriter writer = new FileWriter(Paths.get(System.getProperty("user.dir"), "target/myStream.json").toString())) {
	JsonGenerator gen = Json.createGenerator(writer);
	gen
	.writeStartObject()
		.write("firstName", "Martin")
		.writeStartArray("phoneNumbers")
			.writeStartObject()
				.write("mobile", 123456789)
				.write("home", "2345 67890")
			.writeEnd()
		.writeEnd()
	.writeEnd();
	gen.close();
} catch (IOException e) {
	LOGGER.severe("Failed to write to file: " + e.getMessage());
}

The code for the streaming API reads very similarly to that of the normal API. The main difference is that you have to mark the end of an array or object with writeEnd().
Reading a JSON file looks very similar to using the Streaming API for XML (StAX, http://stax.codehaus.org/Home):

try (FileReader fileReader = new FileReader(Paths.get(System.getProperty("user.dir"), "target/myStream.json").toString())) {
	JsonParser parser = Json.createParser(fileReader);
	while (parser.hasNext()) {
		JsonParser.Event event = parser.next();
		switch (event) {
			case START_OBJECT:
				LOGGER.info("{");
				break;
			case END_OBJECT:
				LOGGER.info("}");
				break;
...

Having just said that the above code looks very similar to StAX code, let’s try to compare both implementations in terms of file size and runtime. The following diagram shows the average execution times for writing and reading a JSON/XML file with 100.000 entries:
[Diagram: average execution times for writing and reading 100.000 entries with JSON-P and StAX]
One entry consists of a simple object with a name and an array of two telephone numbers:

{"firstName":"Martin","phoneNumbers":[{"mobile":123456789,"home":"2345 67890"}]}

This is how it looks in XML:

<firstName>Martin</firstName><phoneNumbers><mobile>123456789</mobile><home>234567890</home></phoneNumbers>

It is surprising that the StAX implementation is that much slower when writing big files. When we take a closer look at the file size, we see that the XML file is about 1.31 times bigger than the JSON file, although it contains the same data.

Conclusion: JSON-P provides a standardized API for JSON processing together with a StAX-like streaming API. Although it still lacks a binding to a Java class structure, it performs even better than the StAX implementation and produces a more compact file.

Source code can be found in my GitHub repository.

PS: The implementations used in the comparison are listed below:

	<dependency>
		<groupId>org.glassfish</groupId>
		<artifactId>javax.json</artifactId>
		<version>1.0</version>
	</dependency>
	<dependency>
		<groupId>stax</groupId>
		<artifactId>stax</artifactId>
		<version>1.2.0</version>
	</dependency>