I have worked with many architects in my career, both those who have the word “Architect” on their business cards and those who simply play the architect role in their projects. And while I have had the good fortune to meet very talented people, I am frequently disappointed by poor architects who put their ego, arrogance, fanaticism (and sometimes, ignorance) before anything else. Recalling my memories of these poor architects, I have come up with the following grouping. Read more…
Unit testing can be difficult sometimes. In some cases, you need to refactor your code-under-test to make it more testable; in other cases, you need to apply special techniques to your test code so that it can do what you want; and in the worst cases, you can’t think of any such technique, aren’t able to refactor the production code, and thus don’t write any tests at all. In my experience working with many teams applying unit testing, they often have trouble writing tests for the data access layer (DAL), UI layer, service interface layer, and multi-threaded code, mostly because they are not aware of the techniques for doing so. That is not surprising, though: business logic is much more intuitive and straightforward to test, and every single article or book out there starts with examples of writing tests for the business logic.
In this blog entry, I will explore some techniques for testing the data access layer. While it won’t be very comprehensive (I think an entire book could be written just to explore all facets of database unit testing in detail), I hope enough topics are covered for you to explore more. Read more…
I can’t tell you how surprised I was when some developers applying for senior .NET development positions, while being interviewed by me, could not answer very fundamental questions about a specific technology or programming language and were not aware of any trends in the field. What I found out was that usually this had something to do with their attitude towards “learning on the job”. Read more…
In Part 1 and Part 2 of the article entitled The Legend of Data Persistence, I introduced you to the O-R impedance mismatch, O-R/M tools, and ODBMS with its advantages and disadvantages in comparison with RDBMS. Now, once you have decided that you would use an ODBMS for your project, which ODBMS should you use? This article, as a follow-up to The Legend of Data Persistence, aims to introduce you to one of today’s most popular ODBMS implementations, db4objects (db4o).
In Part 1 of this article, I discussed the Object-Relational (O-R) impedance mismatch, the problems it causes, and the pitfalls of some O-R/M tools. In this part, I will examine the Object Database Management System (ODBMS) and compare it with the Relational Database Management System (RDBMS).
1. What Is an ODBMS?
Basically, an ODBMS is a DBMS which stores objects, as opposed to the rows (or tuples) stored by an RDBMS [Wikipedia, Object Database]. It was born out of the need for transparent and non-intrusive persistence of complex object models, a task which could not easily be addressed by an RDBMS because of the O-R impedance mismatch. As a DBMS, besides being a data repository for storing object graphs (together with their identities, attributes, associations, and inheritance information), an ODBMS would, at the very least, include a query engine, a concurrency management system, and a data recovery mechanism [1].
Before examining the types of ODBMS, it’s worth learning about the ODMG (Object Data Management Group), a standardization committee established in 1991 with the goal of promoting the adoption of ODBMS via the creation of standardized ODBMS specifications. In 1999, the latest version of the ODMG specification (3.0) was released, with four major components:
- Object Model: defines the common data model (which is a common denominator for OO database systems and programming languages) to be supported by all ODMG-compliant ODBMSs. With this common data model, object definitions within object databases can be portable among different applications, programming languages, and platforms
- Object Specification Languages: include the Object Definition Language (ODL) and the Object Interchange Format (OIF). ODL is used to define the database’s object schema and is equivalent to the Data Definition Language (DDL) in the relational world. OIF, on the other hand, is a means to dump and load an object database’s state to and from files (e.g. XML files), for example to support the exchange of objects between different object databases
- Object Query Language: a query language based on SQL-92 and equivalent to SQL in the relational world. OQL supports the querying of complex objects, polymorphism, and late-binding calls, and is interoperable with the specific language bindings
- Language Bindings: written for C++, Smalltalk, and Java and expose a persistence API so that these languages can interact with ODMG-compliant object databases
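To make these components more concrete, here is a minimal sketch of what an ODMG-style ODL class definition and an OQL query over it might look like. The `Person` class, its `persons` extent, and all attribute and relationship names are hypothetical examples, and the exact syntax varies between implementations:

```
// ODL: the schema definition (the relational world's DDL equivalent)
class Person (extent persons) {
    attribute string name;
    attribute unsigned short age;
    relationship set<Person> children inverse Person::parents;
    relationship list<Person> parents inverse Person::children;
};

// OQL: an SQL-92-flavoured query over the extent,
// navigating an association instead of performing a join
select p.name
from p in persons
where p.age > 30
```

Note how the OQL query traverses `p.age` and would traverse `p.children` directly via the object model, where SQL would need foreign keys and joins.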
Despite this standardization effort, as of 2001 there was no ODBMS fully compliant with all the ODMG standards [Barry, 2001]. In that same year, the ODMG disbanded as the member companies decided to concentrate their efforts on the Java Data Objects specification, which resulted from the ODMG Java Language Binding being submitted to the Java Community Process. In 2006, the Object Management Group (OMG) announced that it would develop a new specification based on the ODMG 3.0 specification, but it has yet to release anything since then. As a result, while many standards (including SQL and the mathematics-based relational model) have been consistently adopted by virtually all RDBMS vendors, widely adopted ODBMS standards simply do not exist yet.
2. Types of ODBMS
Depending on how an ODBMS implementation chooses to persist objects, there are two types of ODBMS: non-native and native.
a. Non-native ODBMS
In a non-native object database, there are two separate object models: that of the application and that of the database itself. ODMG-compliant ODBMSs are examples of non-native ODBMSs since they require a separate data schema to be defined, regardless of the existence of the application object model. In order to query or persist objects from and to a non-native object database, a mapping between these two distinct models must be performed. For ODMG-compliant databases, the schema is defined in ODL, and the application object model can either be generated from that schema or written manually by developers and then modified by a source-code or bytecode/CIL enhancer (part of the persistence API, such as JDO, for that particular database implementation) to add persistence behaviors (e.g. to make the class an Active Record) and information (e.g. mapping information).
While the separation of the application object model from the database object model gives a non-native ODBMS the advantage of having its databases portable across applications, programming languages, and platforms, it is also a source of problems, because application developers have to maintain both of these models as the application evolves.
b. Native ODBMS
In a native object database, objects are stored exactly as they are, without the need to map them to and from a different object model supported by the database. In other words, in the world of native object databases, there is just one object model: the application object model. Thus, unlike with a non-native ODBMS, no ODL and no common object model are necessary. (Note that while no new object model is required, this does not mean that a native ODBMS cannot have its own proprietary data format to represent the application object model in the data store.)
The interesting thing is that one can easily implement a simple native ODBMS in Java, Ruby, or a .NET language using the built-in serialization mechanism, which can serialize objects into a byte stream (which can then be stored in a file or sent over the network) and deserialize objects from the same byte stream. With the serialization infrastructure, no extra work is necessary for storing objects’ attributes, associations, and inheritance information. Thus one only needs to add an object identification mechanism (e.g. assign an OID field to each object, either by hand-coding or, more sophisticatedly, via bytecode/CIL enhancement [not needed in Ruby thanks to its “open-class” feature]), a query API (e.g. query-by-example), and a simple concurrency system (assuming the database is shared by just one application at a time, the built-in thread locking mechanism is sufficient) in order to have an ODBMS implementation [2].
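The recipe above can be sketched in a few dozen lines of Java. This is a toy illustration only, not a real ODBMS: the names (`TinyObjectStore`, `Person`) are made up for this post, the store is in-memory rather than file-backed, and query-by-example uses exactly the costly deserialize-everything approach that the footnote below warns about.

```java
import java.io.*;
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

// A toy object store built on Java's built-in serialization:
// objects go in as byte streams, an OID is assigned on store,
// and queries work by example matching.
public class TinyObjectStore {
    private static final AtomicLong NEXT_OID = new AtomicLong(1);
    private final Map<Long, byte[]> records = new HashMap<>(); // OID -> serialized bytes

    // Assign an OID and store the serialized form of the object.
    // 'synchronized' is the "simple concurrency system": one writer at a time.
    public synchronized long store(Serializable obj) throws IOException {
        long oid = NEXT_OID.getAndIncrement();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        records.put(oid, bos.toByteArray());
        return oid;
    }

    // Query-by-example: deserialize every record (the costly part!)
    // and keep those that equals() the example object.
    public synchronized <T extends Serializable> List<T> queryByExample(T example)
            throws IOException, ClassNotFoundException {
        List<T> matches = new ArrayList<>();
        for (byte[] bytes : records.values()) {
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                Object candidate = in.readObject();
                if (example.equals(candidate)) {
                    @SuppressWarnings("unchecked")
                    T match = (T) candidate;
                    matches.add(match);
                }
            }
        }
        return matches;
    }

    // A trivial persistent class; its attributes, associations, and
    // inheritance information would be handled for free by serialization.
    static class Person implements Serializable {
        final String name;
        Person(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Person && ((Person) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    public static void main(String[] args) throws Exception {
        TinyObjectStore db = new TinyObjectStore();
        db.store(new Person("Alice"));
        db.store(new Person("Bob"));
        List<Person> found = db.queryByExample(new Person("Alice"));
        System.out.println(found.size()); // prints 1
    }
}
```

Notice that nothing here knows anything about tables, columns, or mapping rules; the application object model is the only model, which is precisely what makes this a "native" approach.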
In contrast with its non-native counterpart, a native ODBMS, while reducing the work of querying and persisting the application object model to the minimum, produces databases that are not easily portable across applications, programming languages, and platforms. In fact, for two or more applications to make use of the same database, they must bundle the exact same persistence classes (same names, same packages/namespaces, and the same attributes with the same types). It is even harder for applications written in different languages to share the same database file, because of the differences in naming conventions and base types (framework classes) [3].
3. ODBMS Versus RDBMS
Having looked at the basic features and types of ODBMS, let’s examine its advantages and disadvantages in comparison with RDBMS. The first four points below are advantages; the last three are disadvantages.
- Rich domain model: since an ODBMS can store objects at any level of granularity and has built-in support for identity, association, and inheritance, OO developers can model their domain classes as richly and expressively as they want without being constrained as in the relational world
- Maintainability: since the application object model and the database object model are closely related to each other (in non-native object databases) or even are the exact same model (in native object databases), it takes less effort to maintain these models as the application evolves
- Development effort: the ability of developers to implement a rich domain model with the least maintenance effort results in a significant reduction in development time and cost
- Performance: an ODBMS is expected to perform much better than its RDBMS counterpart, regardless of whether O-R/M tools are used, in systems with highly complex object models, since no complex queries (e.g. joins) and no mapping are required
- Portability: data in an RDBMS can be shared by applications written in any paradigm and on any platform, while an ODBMS is tied to the OO world. The situation is even worse for native object databases, since a database cannot be used by multiple applications with different domain classes, even when they need the same data. As a matter of fact, the clear separation between the relational model and the object model further helps the portability of an RDBMS, since the two can evolve independently of each other
- Legacy applications: there are so many applications built with an RDBMS back-end that it is impractical to migrate all that data into an ODBMS. In addition, not only would the data have to be migrated, but the applications which consume the data would also need to be modified to make use of an OO data access mechanism
- Maturity: the ODBMS industry is young (it emerged in the ’90s) and thus is far behind the RDBMS world (which emerged in the ’70s) in terms of available system vendors (including compatibility among database systems from different vendors) and tool support (such as reporting, OLAP, data transformation, and clustering services)
Given the above analysis, ODBMS is not the silver bullet that some people hope for, and thus the decision whether or not to use an ODBMS in a project must be considered very carefully. However, once we have done our homework and decided that an object database can be used for our application, we can sit down and enjoy a huge productivity gain which cannot be achieved if we stick with relational databases. In the final part of this article, we will look at the DB4O object database system. Stay tuned!
[1] The very first effort to specify the features of ODBMS was the work of Malcolm Atkinson and others in 1989 with the ODBMS Manifesto
[2] This simplistic implementation, besides its lack of functionality, has several major drawbacks. First, searching the serialized objects requires all of them to be deserialized into memory before an in-memory search can occur, and deserialization is an extremely costly operation. Next, databases created by this implementation are not portable across platforms (e.g. Java to .NET), since the proprietary serialization mechanism of the host programming language is used. Finally, the serialization infrastructure will break down as soon as the object model evolves with new attributes, associations, and data types. An alternative could serialize objects into a custom XML format, so that searching can be performed quickly using XPath, and schema evolution and portability can be handled via the XML binding layer; but then this is not simple anymore, and it is usually better to make use of an existing ODBMS system, like DB4O, instead
[3] While these tasks are hard, they are not impossible in a native ODBMS such as DB4O, which allows developers to change the default mapping rules in code (although this is awkward and should be avoided as much as possible), as we will see in Part 3 of the article
- Database Systems, Paul Beynon-Davies, Palgrave Macmillan, 2004
- Java Data Objects, David Jordan and Craig Russell, O’Reilly, 2003
- [Barry, 2001], ODMG Compliance, Barry et al, Barry & Associates, Inc., 2001
- ODMG 2.0: A Standard for Object Storage, Doug Barry, Component Strategies, 1998
- ODBMS Manifesto, Malcolm Atkinson et al, University of Glasgow, 1995
- ODMG’s website: http://www.odmg.org/
- DB4O’s website: http://www.db4o.com/
- JDO’s website: http://java.sun.com/products/jdo/
Have you ever felt frustrated at having to develop applications whose back-end makes use of a Relational Database Management System (RDBMS), such as MS SQL Server or Oracle? Do you think it is a pain to write SQL (or stored procedures) to query some data and then manually map the result set to your object model and back? Great, you have Hibernate, EJB, iBATIS, and Active Record, but do they really make the work of object-relational mapping (O-R/M) simple enough and completely transparent while imposing no compromises on the richness and expressiveness of the object model? If O-R/M is such a big problem, why do we not use an Object Database Management System (ODBMS) instead? And if an ODBMS is viable for a certain application that we are developing, which ODBMS implementation can we start with?
In this three-part article, I will attempt to provide answers to all the above questions. Please note that most of the concepts and tools described in this article would certainly take more than one or two pages to present fully (in fact, books of 500+ pages have been written about several of them), so I will not discuss any particular concept or tool in depth; instead, the aim is to provide a high-level overview of the key points, and interested readers are encouraged to learn the specifics via their own research (the References section can serve as a start).
Okay, with that in mind, the contents of the article are organized as follows:
- In Part 1, I will discuss the object-relational (O-R) impedance mismatch, its consequences, and O-R/M tools as a rescue
- In Part 2, I will introduce the readers to ODBMS, its benefits, and the reasons why it still cannot replace RDBMS
- In Part 3, I will introduce the readers to DB4O, one of today’s most popular ODBMS implementations
2. The O-R Impedance Mismatch
As OO languages such as Java and C# have become the mainstream programming languages, the O-R impedance mismatch has become one of the biggest problems that application developers face today. Of all the troubles caused by the O-R mismatch, the following are the most notorious:
- Representation: in the OO world, classes are inherently represented in a nested, hierarchical structure (i.e. a Customer object consists of many Order objects, which in turn consist of many OrderLine objects, and so on), whereas in the RDBMS world, things can only be represented flatly in tables (relations) which consist of multiple rows (records, or tuples) and columns (attributes). In other words, while classes can be represented at any level of granularity, a relational schema is limited to only four primitives: the table, the record, the column, and the cell (the intersection of a row and a column). As a result, the richness of the object model is often compromised (inheritance trees are flattened out, associations are simplified or even removed) for the sake of having it easily mapped to the relational model. This representational difference between the object and relational worlds is the core of all the problems
- Object Identity: two objects, despite having the exact same attributes (and even referencing the same nested objects), can be separate entities in the OO world, because objects are identified by their location in memory. On the other hand, there is no way for an RDBMS to distinguish between two records with the exact same data. Imagine two identical records in the DB being loaded into a result set and mapped to two distinct objects: when those objects are updated and persisted back to the database, the database cannot tell which record each update should go to. To resolve this, the concept of a primary key, while not necessary in the object world, is introduced in the relational world to distinguish records within a table
- Association: while associations can easily be traversed in the OO world using the built-in object referencing mechanism of the host programming language, they are not very straightforward in the flat RDBMS world, in which tables can only be linked together via foreign keys. To retrieve a record in one table associated with a record in another table, one must use SQL “join” statements instead of “object.attribute”. (To retrieve the representation of deeply nested objects, multiple levels of joins are required.) Finally, while a many-to-many relationship (e.g. Singer and Song) can easily be represented in the OO world, you need a link table to represent this relationship in the RDBMS world
- Inheritance: although inheritance can easily be modeled in the OO world (e.g. using the extends keyword in Java), it is much harder to represent in the RDBMS world, which does not have the concept of “table inheritance”. Thus, several workarounds are required to represent inheritance, ranging from complete normalization (aka table-per-concrete-class, which has separate, independent tables for all sub-classes of an inheritance hierarchy), to complete denormalization (aka table-per-class-family, which has one big table containing all the possible attributes of all types in an inheritance tree as well as a “discriminator” column to distinguish among the types), to hybrid solutions (such as table-per-class, which represents each class in an inheritance tree by a table, with the child tables linking to the parent tables via the foreign-key mechanism)
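The identity mismatch in the list above can be made concrete in a few lines of Java. The `Song` class here is a hypothetical example for this post, not from any real schema:

```java
import java.util.Objects;

// Two value-identical objects are still distinct entities in the OO world:
// identity (==) is based on memory location, while equality (equals)
// compares state.
public class IdentityDemo {
    static class Song {
        final String title;
        Song(String title) { this.title = title; }
        @Override public boolean equals(Object o) {
            return o instanceof Song && ((Song) o).title.equals(title);
        }
        @Override public int hashCode() { return Objects.hash(title); }
    }

    public static void main(String[] args) {
        Song a = new Song("Yesterday");
        Song b = new Song("Yesterday");
        System.out.println(a.equals(b)); // true  -- same state
        System.out.println(a == b);      // false -- distinct identities
        // An RDBMS has no analogue of '==': two identical rows are
        // indistinguishable without a primary key.
    }
}
```

The primary key plays the role in the relational world that the object reference plays here: it restores a notion of identity that value equality alone cannot provide.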
3. The O-R Impedance Mismatch’s Consequences and ORM Tools
The most obvious consequence of the O-R impedance mismatch is that developers tend to create simplistic object models so that the mapping from relational result sets to objects (and vice versa) can be done in a straightforward and less error-prone manner. In fact, it is not hard to find projects in which domain classes and their attributes are simply one-to-one mappings of the database tables and columns. And while that does make the data mapping task less painful, it comes at a huge sacrifice of the richness and expressiveness of the domain model, which in turn affects the maintainability and extensibility of the system. (The discussion of why a simplistic object model negatively affects a system’s ability to evolve [esp. in complex systems] will be one of the main topics of my future post[s] about Domain-Driven Design.)
As object-oriented developers cry out for rich domain models, numerous O-R/M tools have been born to address that need. Ideally, an O-R/M tool is expected to [1]:
- Make the mapping between the relational database and the object model as simple and transparent as possible
- Minimize the constraints imposed on the object model and the relational database schema and allow them to evolve as independently as possible
Unfortunately, these two goals, in many cases, contradict each other: the simpler and more transparent the mapping is, the more constraints are required on the object model and the schema, and vice versa. For example, Hibernate takes the Data Mapper approach [Fowler, 2002] and relies on mapping rules defined by developers to dynamically generate the SQL statements required for the mapping. While this means simple usage and an almost transparent mapping, it does impose many constraints on the object model (e.g. it requires certain collection interfaces to be used for object associations so that dynamic proxies can be injected at runtime) and the database schema (e.g. to represent inheritance). Like Hibernate, a particular implementation of the JDO specification [2] for RDBMS would impose similar constraints on the object model and relational schema. On the other hand, iBATIS [3] takes a hybrid SQL-map approach which offers a configurable layer of indirection (expressed in SQL and XML) to “map the parameters and results (i.e., the inputs and outputs) of a SQL statement to a class” [Begin et al, 2007]. While iBATIS is very flexible in terms of the constraints placed on the model and schema (because developers still take ownership of writing SQL), it requires more work from application developers than O-R/M solutions like Hibernate do. Next, Active Record, which builds on the power of the Ruby programming language and implements the Active Record pattern [Fowler, 2002], requires application developers to write the least amount of data persistence code (in comparison with other full-scale O-R/M solutions such as Hibernate), but it imposes a lot of constraints on the domain model and database schema, especially through the many conventions serving as an implicit contract between application developers and the framework (so that no XML configuration file or annotation is necessary).
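To give a feel for the kind of constraints a mapper such as Hibernate places on the model, here is a sketch of a mappable entity using the standard JPA (javax.persistence) annotations. The `Customer`/`Order` classes are hypothetical, and this is mapping metadata only, a fragment rather than a complete runnable program:

```java
import java.util.HashSet;
import java.util.Set;
import javax.persistence.*;

@Entity
public class Customer {
    // A surrogate identifier demanded by the mapper, even if the
    // domain model itself has no use for it.
    @Id @GeneratedValue
    private Long id;

    private String name;

    // The association must be declared against a collection *interface*
    // (Set, not HashSet) so that the framework can substitute its own
    // proxy implementation for lazy loading at runtime.
    @OneToMany(mappedBy = "customer", fetch = FetchType.LAZY)
    private Set<Order> orders = new HashSet<>();

    // A no-argument constructor is required by the framework.
    protected Customer() {}

    public Customer(String name) { this.name = name; }

    public Set<Order> getOrders() { return orders; }
}
```

Each of the commented lines is a concession the object model makes to the mapping tool, which is exactly the tension between the two goals listed above.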
And finally, it’s worth mentioning the once-considered-a-silver-bullet EJB 2.x, which is not only hard to use but also significantly pollutes the domain model with all kinds of interfaces and conventions. As a result, to this day there is still no O-R/M tool which can completely resolve those two contradictory goals and really make developers’ lives as easy as they should be…
That brings us back to the question: if O-R/M is such a big problem, why do we not use an ODBMS instead? That will be the topic of the second part of this article, in which I will introduce the readers to the concept of ODBMS, its benefits, as well as the reasons why RDBMS, despite all of the problems it causes for the object world, is still here to stay.
1. The Code is the Design
At university, most of us are taught that the development of software should go through the following phases: requirement specification, design, construction (or coding), and testing. By gathering system requirements (e.g. from clients, market research, etc.), analysts come up with a bunch of functional and supplementary requirement documents, use-case models, and specifications during the requirement specification phase. Read more…