Verify HTML documents in junit tests with jsoup

Assume that you are developing an application that creates some kind of fancy HTML report for its users. When it comes down to writing your unit tests, you have two choices:

  • You test the generated report against a complete report prepared beforehand.
  • You parse the HTML document and test parts of it separately.

The first choice seems to be simple at first glance, because you have manually validated that the prepared report is correct. Writing such kind of tests is also easy as it boils down to the following pattern:

String preparedReport = loadReportFromSomeWhere();
assertThat(generatedReport, is(preparedReport));

But what happens when you change a small part of the report generating code? You will have to change probably some or even all of the prepared reports. Hence the second choice is in these cases the better one, as you only have to adjust the test cases that are affected (and that you would have to change anyhow).

Here is the part where jsoup comes in handy. jsoup is a Java library developed for parsing HTML documents, but in contrast to other options for parsing XML like structures it supports CSS selectors like those used in JavaScript libraries like jquery. This way you don’t have to write tons of code in order to verify exactly the part of the report that your current unit test is concerned with.

To demonstrate how jsoup can be used, we assume that our application has a simple HtmlReport class that can be used to create a valid HTML document using the builder pattern (https://en.wikipedia.org/wiki/Builder_pattern):

String html = HtmlReport.create()
	.addHeader1("title", "Testing HTML documents with jsoup")
	.addSection("intro", "This section explains what the text is all about.")
	.addHeader2("jsoup", "jsoup in a nutshell")
	.addSection("pjsopu", "This section explains jsoup in detail.")
	.addList("jsoup_adv", Arrays.asList("find data using CSS selectors", "manipulate HTML elements"))
	.build();

To keep it simple, the report just consists of a header element (h1) followed by a section (p) and a paragraph with a header h2 that contains an HTML list (ul). The first argument to each method is the id of the HTML element. This way we can use it later on to address exactly the element we want and beyond that support the formatting of all elements (the CSS designer will love us).

The first thing we want to test is that the document contains an h2 element with id “title”:

<h1 id="title">Testing HTML documents with jsoup</h1>

Using jsoup this verification becomes a two liner:

Document doc = Jsoup.parse(html);
assertThat(doc.select("h1#title").text(), is("Testing HTML documents with jsoup"));

While we let jsoup parse the document in the first line, we can use the provided method select() to query for the element using the selector h1#title, i.e. we are asking for an h1 element with id title. The same way we can assure that we have a paragraph with the correct content:

assertThat(doc.select("p#intro").text(), is("This section explains what the text is all about."));

A little bit more tricky is to verify that the list with id jsoup_adv is written in the correct order. For that we have to use the pseudo selector :eq(n) that allows use to query for a specific index position of a sibling:

assertThat(doc.select("ul#jsoup_adv > li:eq(0)").text(), is("find data using CSS selectors"));
assertThat(doc.select("ul#jsoup_adv > li:eq(1)").text(), is("manipulate HTML elements"));

The selector ul#jsoup_adv > li:eq(0) asks for the first (:eq(0)) li elements that is a direct child of an ul element with id jsoup_adv.

Beyond that one can even use regular expression to find for example all h2 elements whose text ends with the string “nutshell”:

assertThat(doc.select("h2:matches(.*nutshell$)").size(), is(1));

Conclusion: Using jsoup for parsing HTML documents in junit tests makes the verification of HTML documents much easier and robust. If one is used to and likes CSS selectors like they are used by jquery, then jsoup is worth a look.

Advertisements

Tags: , , , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: