Verify HTML documents in JUnit tests with jsoup
Assume that you are developing an application that creates some kind of fancy HTML report for its users. When it comes down to writing your unit tests, you have two choices:
- You test the generated report against a complete report prepared beforehand.
- You parse the HTML document and test parts of it separately.
The first choice seems simple at first glance, because you have manually validated that the prepared report is correct. Writing this kind of test is also easy, as it boils down to the following pattern:
String preparedReport = loadReportFromSomeWhere();
assertThat(generatedReport, is(preparedReport));
But what happens when you change a small part of the report-generating code? You will probably have to change some or even all of the prepared reports. In these cases the second choice is the better one, as you only have to adjust the test cases that are affected (and that you would have to change anyhow).
To demonstrate how jsoup can be used, we assume that our application has a simple HtmlReport class that can be used to create a valid HTML document using the builder pattern (https://en.wikipedia.org/wiki/Builder_pattern):
String html = HtmlReport.create()
    .addHeader1("title", "Testing HTML documents with jsoup")
    .addSection("intro", "This section explains what the text is all about.")
    .addHeader2("jsoup", "jsoup in a nutshell")
    .addSection("pjsopu", "This section explains jsoup in detail.")
    .addList("jsoup_adv", Arrays.asList("find data using CSS selectors", "manipulate HTML elements"))
    .build();
To keep it simple, the report just consists of a header element (h1) followed by a section (p), and a paragraph with a header (h2) that contains an HTML list (ul). The first argument to each method is the id of the HTML element. This lets us address exactly the element we want later on, and it also helps with styling individual elements (the CSS designer will love us).
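The HtmlReport class itself is not shown here. A minimal sketch of how such a builder might look, assuming each method simply appends one element with the given id to the document body. The method names follow the snippet above; the flat structure, the escape() helper and the document skeleton are assumptions made for illustration only:

```java
import java.util.List;

// Hypothetical sketch of the HtmlReport builder; the real class may look different.
final class HtmlReport {

    private final StringBuilder body = new StringBuilder();

    private HtmlReport() {
    }

    static HtmlReport create() {
        return new HtmlReport();
    }

    HtmlReport addHeader1(String id, String text) {
        return addElement("h1", id, escape(text));
    }

    HtmlReport addHeader2(String id, String text) {
        return addElement("h2", id, escape(text));
    }

    HtmlReport addSection(String id, String text) {
        return addElement("p", id, escape(text));
    }

    HtmlReport addList(String id, List<String> items) {
        StringBuilder listItems = new StringBuilder();
        for (String item : items) {
            listItems.append("<li>").append(escape(item)).append("</li>");
        }
        return addElement("ul", id, listItems.toString());
    }

    String build() {
        return "<!DOCTYPE html><html><head><title>Report</title></head><body>"
                + body + "</body></html>";
    }

    // Appends <tag id="...">content</tag>; the content must already be escaped.
    private HtmlReport addElement(String tag, String id, String content) {
        body.append('<').append(tag).append(" id=\"").append(escape(id)).append("\">")
            .append(content)
            .append("</").append(tag).append('>');
        return this;
    }

    // Minimal escaping of characters that are special in HTML text and attributes.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Because every method returns the builder itself, the calls can be chained, and build() returns the finished document as a String that can be handed straight to a parser.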
The first thing we want to test is that the document contains an h1 element with id “title”:
<h1 id="title">Testing HTML documents with jsoup</h1>
Using jsoup, this verification becomes a two-liner:
Document doc = Jsoup.parse(html);
assertThat(doc.select("h1#title").text(), is("Testing HTML documents with jsoup"));
The first line lets jsoup parse the document. In the second line we use the provided method select() to query for the element using the selector h1#title, i.e. we are asking for an h1 element with id title. In the same way we can make sure that we have a paragraph with the correct content:
assertThat(doc.select("p#intro").text(), is("This section explains what the text is all about."));
A little trickier is verifying that the list with id jsoup_adv is written in the correct order. For that we have to use the pseudo selector :eq(n), which allows us to query for a specific index position of a sibling:
assertThat(doc.select("ul#jsoup_adv > li:eq(0)").text(), is("find data using CSS selectors"));
assertThat(doc.select("ul#jsoup_adv > li:eq(1)").text(), is("manipulate HTML elements"));
ul#jsoup_adv > li:eq(0) asks for the first li element (index 0) that is a direct child of a ul element with id jsoup_adv.
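As an alternative to one :eq(n) assertion per item, the order of the whole list can be checked in a single comparison. The following is a sketch, assuming jsoup 1.10.3 or later is on the classpath (Elements.eachText() was added in that release); the class and method names are made up for this example:

```java
import java.util.Arrays;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Sketch: compare all list items, in document order, with the expected values.
final class ListOrderCheck {

    // Returns the text of every li that is a direct child of the ul with the given id.
    static List<String> listItems(String html, String listId) {
        Document doc = Jsoup.parse(html);
        return doc.select("ul#" + listId + " > li").eachText();
    }

    public static void main(String[] args) {
        String html = "<ul id=\"jsoup_adv\">"
                + "<li>find data using CSS selectors</li>"
                + "<li>manipulate HTML elements</li>"
                + "</ul>";
        List<String> expected = Arrays.asList(
                "find data using CSS selectors", "manipulate HTML elements");
        if (!expected.equals(listItems(html, "jsoup_adv"))) {
            throw new AssertionError("list items differ or are out of order");
        }
    }
}
```

In a JUnit test the returned list could be compared with assertThat(listItems(html, "jsoup_adv"), is(expected)). Note that eachText() leaves out elements without any text, so an empty li would not show up in the result.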
Beyond that, one can even use regular expressions, for example to find all h2 elements whose text ends with the string “nutshell”.
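One way to write such a query is jsoup's :matches(regex) pseudo selector, which tests the regular expression against the text of the element. The following is a sketch, assuming jsoup is on the classpath; the class and method names are made up for this example:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Sketch: select elements by matching their text against a regular expression.
final class RegexSelectorCheck {

    // Selects all h2 elements whose text ends with "nutshell".
    static Elements headersEndingWithNutshell(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("h2:matches(nutshell$)");
    }

    public static void main(String[] args) {
        String html = "<h1 id=\"title\">Testing HTML documents with jsoup</h1>"
                + "<h2 id=\"jsoup\">jsoup in a nutshell</h2>";
        Elements headers = headersEndingWithNutshell(html);
        if (headers.size() != 1 || !"jsoup in a nutshell".equals(headers.text())) {
            throw new AssertionError("unexpected match: " + headers);
        }
    }
}
```

The $ anchors the expression at the end of the text, so a header such as "nutshell overview" would not be selected.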
Conclusion: Using jsoup for parsing HTML documents in JUnit tests makes the verification of HTML documents much easier and more robust. If you are used to CSS selectors as they are used by jQuery, and like them, then jsoup is worth a look.