The Legend of Data Persistence - Part 1

2. The O-R Impedance Mismatch

As OO languages such as Java and C# have become mainstream, the O-R impedance mismatch has become one of the biggest problems that application developers face today. Of all the troubles caused by the O-R mismatch, the following are the most notorious:

  • Representation: in the OO world, data is inherently represented as a nested, hierarchical structure (e.g. a Customer object consists of many Order objects, which in turn consist of many OrderLine objects, and so on). In the RDBMS world, things can only be represented flatly in tables (relations), which consist of rows (records, or tuples) and columns (attributes). In other words, while classes can model data at any level of granularity, a relational schema is limited to only four primitives: the table, the row, the column, and the cell (the intersection of a row and a column). As a result, the richness of the object model is often compromised (inheritance trees are flattened, associations are simplified or even removed) so that it can be mapped easily to the relational model. This representational difference between the object and relational worlds is at the core of all the other problems.
  • Object Identity: two objects with exactly the same attribute values (even referencing the same nested objects) can still be separate entities in the OO world, because objects are identified by their location in memory. An RDBMS, on the other hand, has no way to distinguish between two records containing exactly the same data. Imagine that two identical records are loaded into a result set and mapped to two distinct objects; when these objects are updated and persisted back, the database cannot tell which record each update should go to. To resolve this, the relational world introduces the concept of a primary key, which is not necessary in the object world, to distinguish records within a table.
  • Association: in the OO world, associations are easily traversed using the built-in object reference mechanism of the host programming language. They are less straightforward in the flat RDBMS world, where tables can only be linked together through foreign keys. To retrieve the record in one table that is associated with a record in another, one must write a SQL “join” instead of “object.attribute”, and retrieving a deeply nested object requires multiple levels of joins. Finally, while a many-to-many relationship (e.g. Singer and Song) can be represented directly in the OO world, the RDBMS world needs a link table to represent it.
  • Inheritance: although inheritance is easily modeled in the OO world (e.g. using the extends keyword in Java), it is much harder to represent in the RDBMS world, where the relational model has no native concept of “table inheritance”. Several workarounds are therefore used. They range from table-per-concrete-class (a separate, independent table for each concrete sub-class of an inheritance hierarchy, each repeating the inherited attributes), through hybrid solutions such as table-per-class (each class in the inheritance tree gets its own table, with child tables linking to parent tables via foreign keys; also known as table-per-subclass), to complete denormalization with table-per-class-family (one big table containing all the possible attributes of all types in the inheritance tree, plus a “discriminator” column to distinguish among the types; also known as table-per-class-hierarchy).
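
The contrast drawn in the Representation, Object Identity, and Association points can be sketched in Java. This is only an illustration, not code from the article: the Customer, Order, and OrderLine names follow the example above, while the row classes, their key fields, the helper methods, and the sample data are all assumptions made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class ImpedanceMismatchSketch {

    // --- Object view: a nested graph navigated through references ---
    static class OrderLine {
        final String product;
        final int quantity;
        OrderLine(String product, int quantity) {
            this.product = product;
            this.quantity = quantity;
        }
    }

    static class Order {
        final List<OrderLine> lines = new ArrayList<>();
    }

    static class Customer {
        final String name;
        final List<Order> orders = new ArrayList<>();
        Customer(String name) { this.name = name; }
    }

    // Traversal is plain "object.attribute" navigation; no joins are involved.
    static int totalQuantity(Customer customer) {
        int total = 0;
        for (Order order : customer.orders) {
            for (OrderLine line : order.lines) {
                total += line.quantity;
            }
        }
        return total;
    }

    // --- Relational view (hypothetical schema): flat rows linked by keys ---
    static class OrderRow {
        final long id;          // primary key
        final long customerId;  // foreign key to the customer table
        OrderRow(long id, long customerId) {
            this.id = id;
            this.customerId = customerId;
        }
    }

    static class OrderLineRow {
        final long orderId;     // foreign key to the order table
        final int quantity;
        OrderLineRow(long orderId, int quantity) {
            this.orderId = orderId;
            this.quantity = quantity;
        }
    }

    // Hand-written stand-in for a two-level join: orders are matched to the
    // customer by foreign key, then order lines are matched to each order.
    static int totalQuantityFromRows(long customerId,
                                     List<OrderRow> orders,
                                     List<OrderLineRow> lines) {
        int total = 0;
        for (OrderRow order : orders) {
            if (order.customerId != customerId) continue;
            for (OrderLineRow line : lines) {
                if (line.orderId == order.id) total += line.quantity;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Customer alice = new Customer("Alice");
        Order order = new Order();
        order.lines.add(new OrderLine("pen", 2));
        order.lines.add(new OrderLine("ink", 3));
        alice.orders.add(order);
        System.out.println(totalQuantity(alice)); // prints 5

        List<OrderRow> orderRows = List.of(new OrderRow(10, 1));
        List<OrderLineRow> lineRows =
                List.of(new OrderLineRow(10, 2), new OrderLineRow(10, 3));
        System.out.println(totalQuantityFromRows(1, orderRows, lineRows)); // prints 5

        // Same attribute values, yet two distinct objects by reference identity.
        Customer twin = new Customer("Alice");
        System.out.println(alice == twin); // prints false
    }
}
```

The row classes need explicit id and foreign-key fields to express what the object graph gets for free from references, which is the extra bookkeeping an O-R mapping layer has to manage.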

 


7 Comments

German Viscuso, February 11th, 2007 at 8:07 pm

Hi!

Very good article! In your intro you say:
“In Part 2, I will introduce the readers to ODBMS, its benefits, and the reasons why it still cannot replace RDBMS”
I think that’s not really accurate, since an ODBMS such as db4o is usually used where an RDBMS doesn’t make sense (it’s a matter of context).
So I think that perhaps it would be better to use something like this:
“In Part 2, I will introduce the readers to ODBMS, its benefits, and the context in which it still cannot replace RDBMS”
What do you think?

Thanks a lot!

German

Buu Nguyen, February 12th, 2007 at 6:34 pm

Thanks for your comments, German. You’re right that the decision as to whether to use an ODBMS or not is context-sensitive. However, I am currently thinking of discussing this in a general sense (i.e. despite its benefits, why ODBMS cannot replace RDBMS [and that would include the disadvantages of ODBMS]) and letting the readers decide, based on this information, whether an ODBMS is the right choice for their specific context. But I may adjust the statement you mentioned above when Part 2 actually comes out :-).

Nam Le, February 14th, 2007 at 12:08 am

Nice post. You have done a very good job of summarizing what a developer/architect should be aware of when choosing an ORM approach.

However, data persistence has a much broader scope. It always surprises me that people think of RDBMS as the only storage mechanism for data persistence. Sometimes, flat files are far better off in terms of performance, development speed, and flexibility.

Actually, I was so glad that I had made some brilliant decisions to use simple object serialization/deserialization (either binary files or XML files) as a persistence layer instead of an RDBMS. Many years in this industry have NOT taught me how to solve the ORM problems effectively and completely, but I did learn something useful: sometimes we’re far better off getting out of the problem than struggling with it.

Surely an RDBMS is not always the right answer; however, I am looking forward to your later posts about ODBMS.

Buu Nguyen, February 14th, 2007 at 2:27 pm

Thanks, Nam. I agree with you that some people tend to think that a DB is the only way to handle persistence in applications and throw out all kinds of excuses (e.g. not scalable, not secure, lack of concurrency control, etc.) for not using other mechanisms such as XML/binary serialization or even flat files (even for very small projects). The point is to use the tool that is most appropriate to your problem (but this is often easier said than done :-))

“sometimes we’re far better off getting out of the problem than struggling with it.”

Nice saying, I like it.


Wahoo, October 7th, 2007 at 12:24 am

Thank you for sharing!
