Tuesday, September 18, 2012

Using RavenDB (Part 01 of x)


I've decided I want to learn about RavenDB, a NoSQL (document) database built on .NET for .NET. I've been noodling on this for some time now, and have finally decided to do it. I'll post about my progress and what I learn as I go along, and will ultimately make the source code/solution available for free. Since I haven't planned out exactly how many posts there will be, I'm using "n of x" in my titles.
 
Having worked with SQL databases for the last 18 years or so (long before MS SQL Server was a gleam in Uncle Bill's eye), I'm pretty well ensconced in relational design, normalization, and referential integrity. The real sticking point for me is that you essentially throw all that out when dealing with a document store. I know that you can implement your own integrity constraints in your application, but that's exactly the sort of crappy, bug-producing code you avoid writing when you use a modern SQL product. I wasn't willing to give up all that free stuff.
 
What eventually pushed me past the starting line was my own response to a LinkedIn post, in which I gave a good example of how using a document store like Raven, rather than a relational store, could provide real benefits (i.e. potential cost reductions). So the next question is, what to use as a model application? For a while, I thought about getting a copy of Actya, which is a CMS product built using Raven. However, looking at the code, it's more complex than I really want for example purposes, and it seems to have too little activity for me to believe it is still being supported.
 
So I punted... I'm going to do a Blog application. Sigh ... I know. It's not exactly original. It is, however, simple enough to serve as an example. And pondering over what application would be appropriate was actually keeping me from getting started. I just didn't want to spend any more time trying to come up with a better example that was small enough.
 
Raven has pretty useful documentation on its site, so for once I can recommend that you start there to learn about it. The site has some good examples as well.
 
For my purposes, since I'll be doing/testing this in a shared hosting environment, I'll be using the embedded server version.
 
One mistake I DON'T want to make is to ignore architecture issues in my "simple" example. The biggest issue I have with a lot of examples (especially MS examples) is that the architecture of the samples doesn't match the architecture later promoted as "best practices". What happens is that people use the examples as starting points, so they start out with a bad architecture that prevents them from scaling the example up into a larger system.
 
That said, one of my first questions to myself was, do I need a repository? After some "soul" searching, I decided the answer is no. When you're working with a relational database, the entities, i.e. objects, are typically composed of many parts taken from multiple tables. If for some reason the storage model changes, it has a direct effect on the entities and their behaviors - along with the concomitant requirement to test everything again. So it makes practical sense to have a "helper" that is responsible for persisting the entities - it's a useful level of abstraction and separation of concerns. But with a document data store, those concerns don't seem to apply. The object isn't broken up and stored in different parts of the model. It's persisted and restored as a unit. To me, it makes more sense to make the entity itself responsible for persisting and restoring itself. So that's what I'll try.
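
To make the "entity persists itself" idea concrete, here is a minimal sketch in Python rather than .NET. It does not use the RavenDB API at all: InMemoryDocumentStore, BlogPost, and their methods are hypothetical names invented for illustration, and the store is just a dictionary of JSON strings standing in for a document database.

```python
import json
import uuid


class InMemoryDocumentStore:
    """Hypothetical stand-in for a document database (NOT the RavenDB API)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        # The whole document is serialized and stored as one unit.
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id):
        return json.loads(self._docs[doc_id])


class BlogPost:
    """An entity that saves and restores itself; there is no repository class."""

    def __init__(self, title, body, tags=None, post_id=None):
        self.post_id = post_id or "posts/" + uuid.uuid4().hex
        self.title = title
        self.body = body
        self.tags = list(tags or [])

    def save(self, store):
        # The entity decides what its stored document looks like.
        store.put(self.post_id, {
            "title": self.title,
            "body": self.body,
            "tags": self.tags,
        })
        return self.post_id

    @classmethod
    def load(cls, store, post_id):
        doc = store.get(post_id)
        return cls(doc["title"], doc["body"], doc["tags"], post_id=post_id)


store = InMemoryDocumentStore()
post_id = BlogPost("Hello", "First post", ["ravendb"]).save(store)
restored = BlogPost.load(store, post_id)
```

The point of the sketch is only that save and load live on the entity and move the object in one piece. Whether that holds up against a real document database is exactly what I intend to find out.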
 
Of course, there's a proviso here. Not all objects should be persisted on their own - primarily because they represent parts (properties) of a document (object). Take a Word document as an example. The document is composed of many different parts: paragraphs, clauses, characters, etc. Each of these can certainly be an object in its own right. However, it's not very practical to store each of them separately. In a relational storage model there would be a HUGE explosion of tables and rows to store/restore a Word document. So it makes sense to draw a distinction between "persistable" objects and "non-persistable" or "associative" objects. That distinction will be reflected in my demo application.
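
As a rough illustration of the persistable/associative split, here is a small Python sketch (again, not .NET and not the RavenDB API). Post, Comment, and the plain dictionary used as a store are all hypothetical names for this example. The assumption is that a comment only makes sense as part of a post, so it gets no save or load of its own and simply travels inside the post's document.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Comment:
    """Associative object: stored only as part of a Post, never by itself."""
    author: str
    text: str


@dataclass
class Post:
    """Persistable object: the unit that is saved and loaded."""
    post_id: str
    title: str
    comments: list = field(default_factory=list)

    def save(self, store):
        # asdict() converts the nested Comment objects too, so the whole
        # post, comments included, ends up as a single document.
        store[self.post_id] = asdict(self)

    @classmethod
    def load(cls, store, post_id):
        doc = store[post_id]
        comments = [Comment(**c) for c in doc["comments"]]
        return cls(doc["post_id"], doc["title"], comments)


store = {}  # hypothetical document store: one entry per persistable object
post = Post("posts/1", "Using RavenDB")
post.comments.append(Comment("Ann", "Looking forward to part 2"))
post.save(store)
restored = Post.load(store, "posts/1")
```

However many comments the post has, the store holds a single entry for it, which is the opposite of the table-and-row explosion described above.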
 
Incidentally, after making this decision, I found this article: http://novuscraft.com/blog/ravendb-and-the-repository-pattern
The author came to a similar decision, albeit for slightly different reasons. I did enjoy reading some other articles of his, so you might want to peruse his site.

So, on to developing ...