Archive

Posts Tagged ‘design patterns’

Unit testing the data access layer

September 30th, 2007 7 comments

Unit testing can be difficult. In some cases, you need to refactor the code under test to make it more testable; in others, you need to apply special techniques in your test code so that it can do what you want; and in the worst cases, you can neither think of such a technique nor refactor the production code, and so you write no tests at all. In my experience working with many teams that apply unit testing, they often have trouble writing tests for the data access layer (DAL), the UI layer, the service interface layer, and multi-threaded code, mostly because they are not aware of the techniques for doing so. That is not hard to understand, though: business logic is much more intuitive and straightforward to test, and nearly every article or book out there starts with examples of testing business logic.

In this blog entry, I will explore some techniques for testing the data access layer. While it won’t be comprehensive (I think an entire book could be written to explore in detail all the facets of database unit testing), I hope it covers enough topics for you to explore further.

Some comments about Jeff Atwood's thoughts on design patterns

May 9th, 2007 5 comments

I’ve just come across Jeff Atwood’s post about design patterns. It is a little dated, and I hope he has changed his mind since then, because what he said in the post is ridiculous.

Categories: OOAD, Read List

The Legend of Data Persistence – Part 1

February 11th, 2007 5 comments

1. Abstract

Have you ever felt frustrated at having to develop applications whose back-end uses a Relational Database Management System (RDBMS), such as MS SQL Server or Oracle?  Do you find it painful to write SQL (or stored procedures) to query some data and then manually map the result set to your object model and back?  Granted, you have Hibernate, EJB, iBATIS, and Active Record, but do they really make object-relational mapping (O-R/M) simple and completely transparent without compromising the richness and expressiveness of the object model?  If O-R/M is such a big problem, why do we not use an Object Database Management System (ODBMS) instead?  And if an ODBMS is feasible for the applications we are developing, which ODBMS implementation can we start with?

In this three-part article, I will attempt to answer all of the above questions.  Please note that most of the concepts and tools described here would take far more than one or two pages to present fully (in fact, books of 500+ pages have been written about several of them), so I will not discuss any particular concept or tool in depth.  Instead, the aim is to provide a high-level overview of the key points; interested readers are encouraged to learn the specifics through their own research (the References section can serve as a start).

Okay, with that in mind, the contents of the article are organized as follows:

  • In Part 1, I will discuss the object-relational (O-R) impedance mismatch, its consequences, and ORM tools as a remedy
  • In Part 2, I will introduce readers to the ODBMS, its benefits, and the reasons why it still cannot replace the RDBMS
  • In Part 3, I will introduce readers to DB4O, one of today’s most popular ODBMS implementations

2. The O-R Impedance Mismatch

As OO languages such as Java and C# have become mainstream programming languages, the O-R impedance mismatch has become one of the biggest problems that application developers face today.  Of all the troubles caused by the O-R mismatch, the following are the most notorious:

  • Representation: in the OO world, classes are inherently represented in a nested, hierarchical structure (e.g. a Customer object consists of many Order objects, which in turn consist of many OrderLine objects, and so on).  In the RDBMS world, things can only be represented flatly in tables (relations), which consist of rows (records, or tuples) and columns (attributes).  In other words, while classes can be represented at any level of granularity, a relational schema is limited to only four primitives: the table, the record, the column, and the cell (the intersection of a row and a column).  As a result, the richness of the object model is often compromised (inheritance trees are flattened out, associations are simplified or even removed) so that it can be mapped easily to the relational model.  The representational difference between the object and relational worlds is at the core of all the problems
  • Object Identity: two objects, despite having exactly the same attributes (and even referencing the same nested objects), can be separate entities in the OO world because objects are identified by their location in memory.  On the other hand, there is no way for the RDBMS to distinguish between two records with exactly the same data.  Imagine that two identical records in the DB are loaded into a result set and mapped to two distinct objects; when these objects are updated and persisted back to the database, the database cannot tell which record each update should go to.  To resolve this, the concept of a primary key, while not necessary in the object world, is introduced in the relational world to help distinguish records within a table
  • Association: while associations can easily be traversed in the OO world using the built-in object referencing mechanism of the host programming language, they are less straightforward in the flat RDBMS world, in which tables can only be linked together using foreign keys.  To retrieve the record in one table that is associated with a record in another table, one must use SQL “join” statements instead of “object.attribute”.  (To retrieve a representation of deeply nested objects, multiple levels of joins are required.)  Finally, while a many-to-many relationship (e.g. Singer and Song) can easily be represented in the OO world, you need a link table to represent it in the RDBMS world
  • Inheritance: although inheritance can easily be modeled in the OO world (e.g. using the extends keyword in Java), it is much harder to represent in the RDBMS world, which does not have the concept of “table inheritance”.  Thus, several work-arounds are required to represent inheritance, ranging from complete normalization (aka table-per-concrete-class, which has separate, independent tables for all sub-classes of an inheritance hierarchy), through hybrid solutions (such as table-per-class, which represents each class in an inheritance tree by a table, with the child tables linking to the parent tables via foreign keys), to complete denormalization (aka table-per-class-family, which has one big table containing all the possible attributes of all types in an inheritance tree, as well as a “discriminator” column to distinguish among the types)
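
The representation and identity points can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the output of any ORM tool: Customer, Order, and OrderLine follow the example in the first bullet, while the flatten helper, the row format, and the generated ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ImpedanceMismatchSketch {

    // Object side: a nested graph, traversed with plain references.
    static class OrderLine {
        final String product;
        final int quantity;

        OrderLine(String product, int quantity) {
            this.product = product;
            this.quantity = quantity;
        }
    }

    static class Order {
        final List<OrderLine> lines = new ArrayList<>();
    }

    static class Customer {
        final String name;
        final List<Order> orders = new ArrayList<>();

        Customer(String name) {
            this.name = name;
        }
    }

    // Relational side (hypothetical row format): the same data flattened into
    // rows, with generated keys standing in for object references.
    static List<String> flatten(long customerId, Customer customer) {
        List<String> rows = new ArrayList<>();
        rows.add("customer(id=" + customerId + ", name=" + customer.name + ")");
        long orderId = 0;
        for (Order order : customer.orders) {
            orderId++;
            rows.add("order(id=" + orderId + ", customer_id=" + customerId + ")");
            for (OrderLine line : order.lines) {
                rows.add("order_line(order_id=" + orderId + ", product=" + line.product
                        + ", quantity=" + line.quantity + ")");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Customer first = new Customer("Ann");
        Customer second = new Customer("Ann");

        // Identity: identical attributes, yet two distinct objects in memory.
        System.out.println(first == second);                // false
        System.out.println(first.name.equals(second.name)); // true

        // Representation: one nested object graph becomes several flat rows.
        Order order = new Order();
        order.lines.add(new OrderLine("book", 2));
        first.orders.add(order);
        flatten(1L, first).forEach(System.out::println);
    }
}
```

Note that the two "Ann" customers would flatten to identical customer rows were it not for the id passed in, which is exactly the role the primary key plays on the relational side.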

3. The O-R Impedance Mismatch’s Consequences and ORM Tools

The most obvious consequence of the O-R impedance mismatch is that developers are tempted to create a simplistic object model so that mapping relational data sets to objects (and vice versa) is straightforward and less error-prone.  In fact, it is not hard to find projects in which domain classes and their attributes are simply one-to-one mappings of the database tables and columns, respectively.  And while that does make the data mapping task less painful, it greatly sacrifices the richness and expressiveness of the domain model, which in turn hurts the maintainability and extensibility of the system.  (Why a simplistic object model negatively affects a system’s ability to evolve [especially a complex system’s] will be one of the main topics of my future post[s] about Domain-Driven Design.)

As object-oriented developers cry out for rich domain models, numerous ORM tools have been born to address that need.  Ideally, an ORM tool is expected to:

  1. Make the mapping between the relational database and the object model as simple and transparent as possible
  2. Minimize the constraints imposed on the object model and the relational database schema and allow them to evolve as independently as possible

Unfortunately, these two goals often contradict each other: the simpler and more transparent the mapping is, the more constraints it places on the object model and the schema, and vice versa.  For example, Hibernate takes the Data Mapper approach [Fowler, 2002] and relies on mapping rules defined by developers to dynamically generate the SQL statements required for the mapping.  While this makes for simple usage and an almost transparent mapping, it imposes many constraints on the object model (e.g. it requires certain collection interfaces to be used for object associations so that dynamic proxies can be injected at runtime) and on the database schema (e.g. to represent inheritance).  Like Hibernate, an implementation of the JDO specification for an RDBMS would impose similar constraints on the object model and relational schema.  On the other hand, iBATIS takes a hybrid SQL-map approach, which offers a configurable layer of indirection (expressed in SQL and XML) to “map the parameters and results (i.e., the inputs and outputs) of a SQL statement to a class” [Begin et al, 2007].  While iBATIS is very flexible in terms of the constraints placed on the model and schema (because developers retain ownership of the SQL), it requires more work from application developers than O-R/M solutions like Hibernate.  Next, Active Record, which builds on the power of the Ruby programming language and implements the Active Record pattern [Fowler, 2002], requires application developers to write the least data persistence code (in comparison with full-scale O-R/M solutions such as Hibernate), but it imposes many constraints on the domain model and database schema, especially through the many conventions that serve as an implicit contract between application developers and the framework (so that no XML configuration file or annotation is necessary).  Finally, it is worth mentioning EJB 2.x, once considered a silver bullet, which is not only hard to use but also significantly pollutes the domain model with all kinds of interfaces and conventions.  As a result, there is still no O-R/M tool that completely reconciles these two contradictory goals and makes developers’ lives as easy as they should be.
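
The difference between the Data Mapper and Active Record patterns mentioned above can be sketched in plain Java. This is a minimal illustration under stated assumptions and is not the API of Hibernate, iBATIS, or the Ruby Active Record library: an in-memory map stands in for a database table, and the names Singer, SingerMapper, SingerRecord, and SINGER_TABLE are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PersistenceStyles {

    // Stand-in for a database table: primary key -> single "name" column.
    static final Map<Long, String> SINGER_TABLE = new HashMap<>();

    // Data Mapper style: the domain class knows nothing about storage...
    static class Singer {
        final long id;
        final String name;

        Singer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // ...and a separate mapper moves data between objects and the "table".
    static class SingerMapper {
        void save(Singer singer) {
            SINGER_TABLE.put(singer.id, singer.name);
        }

        Singer find(long id) {
            String name = SINGER_TABLE.get(id);
            return name == null ? null : new Singer(id, name);
        }
    }

    // Active Record style: the domain class carries its own persistence
    // methods, so less code is needed but the class is tied to the storage layout.
    static class SingerRecord {
        final long id;
        final String name;

        SingerRecord(long id, String name) {
            this.id = id;
            this.name = name;
        }

        void save() {
            SINGER_TABLE.put(id, name);
        }

        static SingerRecord find(long id) {
            String name = SINGER_TABLE.get(id);
            return name == null ? null : new SingerRecord(id, name);
        }
    }

    public static void main(String[] args) {
        new SingerMapper().save(new Singer(1L, "Ella"));
        new SingerRecord(2L, "Nina").save();

        System.out.println(new SingerMapper().find(1L).name); // Ella
        System.out.println(SingerRecord.find(2L).name);       // Nina
    }
}
```

In the mapper variant, Singer could change shape without touching the storage code path in the class itself; in the record variant, persistence is one method call away but the class and the table layout evolve together, which mirrors the trade-off between the two goals listed earlier.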

That brings us back to the question: if O-R/M is such a big problem, why do we not use an ODBMS instead?  That will be the topic of the second part of this article, in which I will introduce readers to the concept of the ODBMS, its benefits, and the reasons why the RDBMS, despite all the problems it causes for the object world, is here to stay.

Design patterns: signs of languages' weaknesses?

January 31st, 2007 8 comments

A nice post by Mark Dominus about design patterns. I include the post here in case the link is modified. You should also read the response by Ralph Johnson and Mark’s follow-up.