
Unit testing the data access layer

September 30th, 2007

Unit testing can sometimes be difficult. In some cases, you need to refactor your code under test to make it more testable; in other cases, you need to apply special techniques in your test code so that it can do what you want; and in the worst cases, you can't think of any such technique, can't refactor the production code, and thus don't write any test at all. In my experience working with many teams that practice unit testing, they often have trouble writing tests for the data access layer (DAL), UI layer, service interface layer, and multi-threaded code, mostly because they are not aware of the techniques for doing so. That is not hard to understand though: business logic is much more intuitive and straightforward to test, and nearly every article or book out there starts with examples that test business logic first.

In this blog entry, I will explore some techniques for testing the data access layer. While it won't be comprehensive (I think an entire book could be written just to explore all facets of database unit testing in detail), I hope enough topics are covered here for you to explore further on your own.



But first, what is DAL?
Everyone has his or her own idea of what the DAL is, but for this entry, I will stick with the following definition: the DAL is a software module which encapsulates data access logic and publishes interfaces for other software modules to interact with the underlying data source(s) at a high level of abstraction (programming-language objects and attributes) instead of through data source concepts like rows, columns, tables, etc.

There are many ways of implementing a DAL, including the Active Record pattern, the Data Mapper (or DAO) pattern, etc. (see more in [1]), and while the techniques described in this entry apply to all DAL implementations, the examples assume an implementation of the Data Mapper pattern because this is arguably the most frequently used data access pattern in the Java and .NET spaces.
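In Data Mapper terms, the examples in this entry revolve around a Person domain object and a PersonDao mapper interface along these lines (a minimal sketch; the exact fields and method names are my assumptions):

```java
import java.util.List;

// Domain object: plain Java, with no knowledge of the data source
class Person {
	private final String name;
	private final int age;

	Person(String name, int age) {
		this.name = name;
		this.age = age;
	}

	String getName() { return name; }
	int getAge() { return age; }
}

// The mapper interface: callers work with Person objects,
// never with rows, columns, or SQL
interface PersonDao {
	int insert(Person person);
	List<Person> findAll();
}
```

The point of the pattern is that everything below this interface (SQL, JDBC, ORM mappings) is invisible to the callers, which is exactly the part we need to decide how to test.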



What to test
Before writing any tests for the DAL, we should consider the following things:

  • Do we want to hit the database at all or should we mock out the database communication API?
  • If we have to hit the database, do we want to hit the real production version of the database or do we want to hit an in-memory lightweight database?

Some people advocate against writing unit tests which hit the database [2] because:

  • A test is not really a unit test if it depends on the data logic of the application, the behavior of the database system, and the underlying communication API (JDBC or ADO.NET). (That's too much for a unit test.)
  • Hitting the database slows down the test cycle and thus is not recommended.

According to this camp, in order to test the DAL, we should mock out the interaction with the real database system and aim only to test how the data access class responds to the caller and interacts with the database communication API (like ADO.NET or JDBC), regardless of how the database system really behaves. Before discussing whether this is a good idea or not, let's look at some code. (Note: to run this code, you need the EasyMock library [3].) We basically have a DAO class, JdbcPersonDaoImpl, publishing an interface to persist objects of type Person and using JDBC to communicate with the database system. Our test class, JdbcPersonDaoImplTest, will verify that the DAO class interacts properly with JDBC.


public class JdbcPersonDaoImpl implements PersonDao {

	private DataSource dataSource;

	public void setDataSource(DataSource dataSource) {
		this.dataSource = dataSource;
	}

	public int insert(Person person) {
		try {
			Connection conn = dataSource.getConnection();
			String sql = "INSERT INTO people (name, age) VALUES (?, ?)";

			// Creates a statement and asks it to return generated ids
			PreparedStatement stmt = conn.prepareStatement(sql,
				Statement.RETURN_GENERATED_KEYS);
			stmt.setString(1, person.getName());
			stmt.setInt(2, person.getAge());

			// Executes the statement, then moves the cursor to the
			// first row of generated keys and reads the new id
			stmt.executeUpdate();
			ResultSet rs = stmt.getGeneratedKeys();
			rs.next();
			int personId = rs.getBigDecimal(1).intValue();

			rs.close();
			stmt.close();
			conn.close();
			return personId;
		} catch (SQLException e) {
			throw new RuntimeException(e);
		}
	}
}


public class JdbcPersonDaoImplTest {

	@Test public void testInsertOnePerson() throws Exception {
		final int expectedId = 100;
		final Person person = new Person("John Doe", 30);
		final String sql = "INSERT INTO people (name, age) VALUES (?, ?)";

		// Creates a bunch of mock objects
		DataSource dataSource = createMock(DataSource.class);
		Connection connection = createMock(Connection.class);
		PreparedStatement stmt = createMock(PreparedStatement.class);
		ResultSet rs = createMock(ResultSet.class);

		// Sets expectations
		expect(dataSource.getConnection()).andReturn(connection);
		expect(connection.prepareStatement(sql,
			Statement.RETURN_GENERATED_KEYS)).andReturn(stmt);
		stmt.setString(1, person.getName());
		stmt.setInt(2, person.getAge());
		expect(stmt.executeUpdate()).andReturn(1);
		expect(stmt.getGeneratedKeys()).andReturn(rs);
		expect(rs.next()).andReturn(true);
		expect(rs.getBigDecimal(1)).andReturn(new BigDecimal(expectedId));
		rs.close();
		stmt.close();
		connection.close();

		// Turns to "action" mode
		replay(dataSource, connection, stmt, rs);

		// Creates the test fixture and feeds it the mock data source
		PersonDao dao = new JdbcPersonDaoImpl();
		dao.setDataSource(dataSource);

		// Executes and verifies results
		int personId = dao.insert(person);
		assertEquals(expectedId, personId);
		verify(dataSource, connection, stmt, rs);
	}
}

How’s that? The test code is far more complex than the production code, and yet all it does is assure that the production code interacts with the JDBC API properly. (And if you try to write a test for a search operation that returns a large result set, it is much worse.) I don’t think it is fun or worth the effort. The complexity of the data layer has little to do with the interaction with the database communication API, but rather with the database schema and the SQL statements or stored procedures executed as part of an interaction (and if you use an ORM framework like Hibernate, the complexity of the mapping rules adds to that).

In fact, I’d rather let my tests hit the real database so that the SQL or stored procedures are actually executed, or I would not write any test at all. (Of course, you can combine testing the interactions with testing the real database modification, but as I said, the interaction part is too simple to be wrong while costing so much code.) Those in the real-unit-tests-don’t-touch-the-database camp may call this an ‘integration’ test, not a unit test, but as long as my test cases are reliable, independent, repeatable, automated, and fast (enough), I don’t care much about the terminology.



The in-memory database
To have your tests run fast, you might not want to test against your production database system (like Oracle or MS SQL Server). Instead, you may consider an in-memory lightweight database like HSQLDB [4] or Derby [5] (see [6] for a list of in-memory databases). The nice thing about these two databases is that they have both JDBC and Hibernate support, so it may not take a lot of effort to run your production code against them. (That’s not 100% certain though; it depends on what your production database is. You’d better check out the documentation of these databases.) There are a few other things you should consider when choosing this approach:

  • You must make sure your schema is compatible between the test and production databases. You will not want to maintain two separate sets of SQL script files. (DRY, everyone.)
  • You should have your tests populate the test database’s schema on the fly to preserve the automation and repeatability characteristics of good tests.

The good news is that a tool like Hibernate can populate the database schema automatically and on the fly based on the mapping files. The bad news is that sometimes you have to modify the default Hibernate configuration settings to make sure the generated schema is compatible with your production database’s schema; otherwise, you may run into problems upon deployment.

If you use Hibernate, then testing against the in-memory database can be as simple as creating a Hibernate configuration file and a simple DAO test class which basically invokes the DAO’s interface and verifies the results (much simpler than our mock-based version), like below:

hibernate.dialect=org.hibernate.dialect.HSQLDialect
hibernate.connection.driver_class=org.hsqldb.jdbcDriver
hibernate.connection.url=jdbc:hsqldb:testdb/hsqldb
hibernate.connection.username=sa
hibernate.connection.password=forgetit
hibernate.hbm2ddl.auto=create-drop


@Test public void testFindAllPeople()
		throws Exception {
	// Prepares the test data
	List<Person> peopleList = new ArrayList<Person>();
	peopleList.add(new Person("John Doe", 30));
	peopleList.add(new Person("Jane Doe", 28));

	// Populates the database and flushes
	Session session = HibernateUtil.getSessionFactory().getCurrentSession();
	for (Person p : peopleList) {
		session.save(p);
	}
	session.flush();

	// Verifies the result (dataSource points at the same HSQLDB
	// instance; assumes Person implements equals on its fields)
	PersonDao dao = new JdbcPersonDaoImpl();
	dao.setDataSource(dataSource);
	assertEquals(peopleList, dao.findAll());
}
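The DRY point above boils down to keeping one set of mapping files and swapping only the connection-level properties between test and production. A sketch of the production-side counterpart of the configuration above, assuming Oracle (the exact dialect, driver, and URL are illustrative assumptions):

```
hibernate.dialect=org.hibernate.dialect.Oracle9Dialect
hibernate.connection.driver_class=oracle.jdbc.driver.OracleDriver
hibernate.connection.url=jdbc:oracle:thin:@localhost:1521:proddb
hibernate.connection.username=app
hibernate.connection.password=secret
# Never create-drop against production; only validate the mappings
hibernate.hbm2ddl.auto=validate
```

Only these properties differ; the mapping files and DAO code stay identical, which is what keeps the in-memory and production schemas from drifting apart.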



The real database
If speed is not a big deal to you and you do not want to mess with another database and get into compatibility headaches, you can always go directly to your production database system. In that case, keep a few things in mind:

  • Each developer should have her own database server on her development machine to avoid messing up other developers’ tests.
  • If software licensing is a problem, create a separate database instance for each developer instead.
  • Separate the DAL tests into their own suite so that you can run them as you like and skip them when you are not changing any data-related stuff (DAL code, SQL, stored procedures, database schema, etc.). You don’t want the possibly time-consuming unit testing process of the DAL to run when the only thing you changed is a piece of application logic.
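With an Ant build, that suite separation can be driven by a simple naming convention. A sketch (the DaoTest suffix, directory layout, and missing classpath/formatter details are assumptions to keep it short):

```xml
<target name="test-unit">
  <junit haltonfailure="true">
    <batchtest todir="reports">
      <fileset dir="test" includes="**/*Test.java" excludes="**/*DaoTest.java"/>
    </batchtest>
  </junit>
</target>

<target name="test-dal">
  <junit haltonfailure="true">
    <batchtest todir="reports">
      <fileset dir="test" includes="**/*DaoTest.java"/>
    </batchtest>
  </junit>
</target>
```

Developers run test-unit constantly and test-dal only when data-related code changes; the continuous build can run both.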

Another thing you should consider whenever your tests hit the database, be it an in-memory or production database, is that your tests must start each run with a known database state. In other words, either clean up everything before or after a test run, and make sure the same data is populated in advance if necessary. For the data population part, consider a tool like DbUnit [7].
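If you prefer not to pull in another library, a tiny hand-rolled helper can do the same clean-and-populate dance before each run. A minimal sketch (the people table and its columns follow the earlier examples; in real use you would execute the returned statements through a plain JDBC Statement in your setUp):

```java
import java.util.ArrayList;
import java.util.List;

// Builds the SQL statements that reset the people table to a known
// state: one DELETE to wipe it, then one INSERT per fixture row
class PeopleFixture {

	static List<String> buildResetSql(List<String[]> rows) {
		List<String> statements = new ArrayList<String>();
		statements.add("DELETE FROM people");
		for (String[] row : rows) {
			// Escapes single quotes; parseInt guards the numeric column
			statements.add("INSERT INTO people (name, age) VALUES ('"
				+ row[0].replace("'", "''") + "', "
				+ Integer.parseInt(row[1]) + ")");
		}
		return statements;
	}
}
```

This is essentially what DbUnit’s CLEAN_INSERT operation does for you, driven by an XML dataset instead of hand-written rows, so once the fixtures grow beyond a handful of tables the tool pays for itself.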



So, you ask “which techniques should I use?”
I don’t think it is worth testing the interaction with the database communication API using mock objects when the interaction is simple (and most of the time, it is). On the other hand, I would rather test against the real production database system than an in-memory database, to maximize the reliability of the tests while staying away from compatibility issues. Besides, there are a few practices, as described above, to alleviate the performance and repeatability problems of hitting the real database. But of course, that’s my choice given the kinds of applications I’m working on. You need to pick the techniques that work best in your environment, and I hope I have given enough of an introduction for you to explore further and make your call.



References
[1]: Patterns of Enterprise Application Architecture, Martin Fowler, Addison-Wesley, 2002
[2]: http://www.artima.com/weblogs/viewpost.jsp?thread=126923
[3]: http://www.easymock.org
[4]: http://hsqldb.org
[5]: http://db.apache.org/derby
[6]: http://en.wikipedia.org/wiki/In-memory_database
[7]: http://dbunit.sourceforge.net

  1. Robert Murray
    September 30th, 2007 at 22:41 | #1

    You mention DbUnit for data population purposes, but DbUnit is also quite capable for results testing. Example code below (from the docs) – very easy to code and understand, though setup does require some XML tinkering beforehand.

    // Fetch database data after executing your code
    IDataSet databaseDataSet = getConnection().createDataSet();
    ITable actualTable = databaseDataSet.getTable("TABLE_NAME");
    
    // Load expected data from an XML dataset
    IDataSet expectedDataSet = new FlatXmlDataSet(
                           new File("expectedDataSet.xml"));
    ITable expectedTable = expectedDataSet.getTable("TABLE_NAME");
    
    // Assert that the actual database table matches the expected table
    Assertion.assertEquals(expectedTable, actualTable);
    
  2. October 1st, 2007 at 11:06 | #2

    Nicely written. I agree with it all 100%. I think the hardest part (with Hibernate and HSQLDB) is getting data into the db to start with. I’ve looked at DbUnit a few times but have never ventured into setting it up. It looks like more work than it’s worth. Although, I must say, the comment above does look simple. Robert, I’m going to give it another whirl. Thanks all.

  3. October 1st, 2007 at 19:13 | #3

    Good article. My thoughts:

    – I greatly prefer using an in-memory database to the mock database. Using the mock database can be a real pain to debug and maintain. With the in-memory database option, it is much easier to replicate both the DDL and the data used by your application. With the mock data set, you sometimes “think” you have it set up the same as your database, but subtle bugs may make that not so.

    – If you are using the Java Persistence API / EJB 3, there are some really cool things that you can do using embedded containers to help simulate the whole transaction. It makes for very easy setup, configuration and execution. Having used these techniques, I could never imagine developing a J2EE app without an embedded container unit test. The presentations are available at http://www.innovationontherun.com/presentations/ and I would be happy to go into more detail if anyone is interested.

  4. James
    October 2nd, 2007 at 01:49 | #4

    I largely agree. I mainly work in an Oracle shop. When I test DAO classes, I use a local database. Each dev uses Oracle XE (which takes no more than 10 minutes to install), and every project maintains an Ant sql task to build/drop the database schemas. Scripts and schema are managed by a DM tool like ER Studio. I would rather use the REAL database vendor since any non-trivial application will find itself using proprietary SQL at some point. Hence moving from in-memory to another database would be too much trouble, since Oracle SQL can be non-standard.

    For test setup, on a JPA/Hibernate project, I find using Hibernate itself as a row populator is a nice way to make sure the mappings work; besides, it’s an incredible payback for taking the time to map properly.

    I also use the Spring mock test cases, since they make it easy to start a transaction and roll back at the end of a test, in addition to caching the JPA configuration, which saves a lot of time.

  5. October 2nd, 2007 at 11:25 | #5

    @Robert: thanks for noticing, it’s indeed another good use of DbUnit.

    @Rob: I agree, hitting the database, either in-memory or production, is much better than the mock technique. Also, I have accessed the links but don’t seem to understand the approach very well. It would be great if you could provide some more information (or links). Thanks!

    @James: good approach with Oracle XE. If Hibernate is used for the DAL, then leveraging it as a data populator is a good and efficient approach. Otherwise, DbUnit is good to use.

  6. October 24th, 2007 at 18:57 | #6

    just a link to a book (maybe the one you were thinking of at the start of the post..):

    Test Driven: TDD and Acceptance TDD for Java Developers, by Lasse Koskela
    http://www.amazon.com/Test-Driven-Acceptance-Java-Developers/dp/1932394850

    its chapter on testing a DAL follows the same ideas you pointed out. It’s for Java, but still useful for any other platform (I’m a .NET guy)

    bye
    -papo-

  7. Jaime
    March 13th, 2010 at 06:08 | #7

    mock dataSource, hard stuff!!
