Views: 6926
Number of votes: 3
Average rating:

Dynamic Data Store LINQ implementation – from the inside

Here I go again, blogging about the dynamic data store. This time I will try to describe how LINQ can be implemented, and how we have implemented it. I think this will make you write better queries against the dynamic data store in the feature.

To start with some basics, dynamic data store has support for making LINQ queries against it, or some anyway. It’s important to know that we do not support all query methods in the .Net framework, but we support the once we think you will get the most use of. Which once we support can you find out here.

LINQ execution methods

When you are creating a LINQ query against a provider, you will work with query data (most often the IQuery interface) and nothing will we executed against the LINQ provider before you call one of the execution methods. This is very good thing if you think about it, because you can play around with your query without going against the LINQ provider (which in our case would include creating a SQL query, execution it against the database, and creating objects from the result).

At the moment, we support two types of executers, Count() and ToList() implementations. This is probably the executors you would like to use in 90% of the cases, but if you only want to receive one object from the store, the executor “FirstOrDefault()” would have been a better method to call. This method will probably be supported in a near feature, but you can not use it right now (an “NotSupportedException” will be thrown). In the end of the article, I will show you workaround that gives you the same results as you would have received by executors like “FirstOrDefault”.

When an execute method gets called, the query will be executed into expression trees. This makes it possible to receive all necessary data from the query by recursion of the expression trees.

Creating SQL string from Expression Tree

One LINQ query will be executed into one or more expression trees. By reading the expression from the top to the bottom, we can create a SQL string that corresponds to the LINQ query. This sounds pretty easy, but if you look into the expression tree model, you will find out that we have to go through the tree a lot of times to find all the small pieces of the LINQ query.

Pre-process an expression

Before we can go through the expression tree and write an SQL query, we have to go through the tree backwards to find parts of the LINQ query that needs to be evaluated and then compile the parts using the Lambda compiler.

To take an example. If we want to query a “Person” store for people which lives in Stockholm, we can write it like:

   1: var query = from person in store.Items<Person>() where person.Address.City == _person[0].Address.City select person;
   2: List<Person> persons = query.ToList();

If you think about it, _person[0].Address.City isn’t something we can put in a SQL query, so we need to get the value from the variable by some form of reflection. When the query has been executed, this variable is included inside an expression, which makes the Lambda compiler a good choose for getting the value from the variable and creating a constant of it instead.

This might sound like a painless operation, but this is actually the most costly operation in the whole process of creating a SQL query from the LINQ query. This is an important thing to have in mind when creating a query.

Process the expression

After we have pre-processed the expression, we can do the actual process and create a SQL query from it. The Expression Tree model can actual be very clean, even if it sometimes give you headache when it by default not handles states of the data (for example, you do not know what’s on the left side of an equal operation when your looking on the right side).

I will not spend you’re time on writing about all methods in the process, but if you're more interested in what we are doing in the code, start the reflector and dig in to it.

Joining stores

One good thing to know about, is how it’s possible to query on complex object in a class. If you have a class called “Person” that contains a property of type “Address”, that has a property of type string called “City”, then you might want to ask for all people that lives in Stockholm.

If you in the “Address” class have an Id property, the “Address” will be created as an separate store, and we will make an connection between the “Person” store and the “Address” store. Now when you are making a query we will find out that you are using a complex object in your query, and therefore we are looking in the database for a store of that type. When we find the type in the database, we will create a join in the SQL query and the database query will be valid.

Query the database

Another thing that can be good to know is that we use parameters for constants, which will make SQL injection a whole lot harder. Anyway, when we have an SQL query generated, it’s time to send it to the database. This is in most cases the most costly operation of the execution, and if you find a query very slow we recommend you to use the SQL Profiler to look at the generated query. Maybe you find something there that we can optimize in the next version.

Creating objects from the result

In most cases we will only receive the Id values from the database (that’s how the query are designed) and from that Id load the objects. This might sound like an expensive way of doing things, but you should also know that we have an cache layer that makes the load operation extremely fast in many cases.

As I wrote, we only receive the Id value in most cases, that means not always. When you are using the “Count” executor we of course uses the Count(*) instead, but there is one more exception.

When you are creating anonym types, you will receive the properties directly from the database and therefore go around the loading part.

Workaround for not supported executors

Now it’s time to show the workarounds for not supported executors. If you think this is good code, you really have to read the next lines carefully:

   1: var result = (from person in store.Items<Person>() 
   2:             select person).ToList().FirstOrDefault();

This peace of code will receive all persons from the database, read them into your memory, and if you are lucky enough to not getting an “OutOfMemoryException”, “LinqToObject” will get you the first object in the memory.

I have already seen some people doing this against the DDS, and someone else will probably do it soon, but not you. So, how should you do it? By using the method “Take” you can make sure to only receive X objects from the database.

First

   1: var result = (from person in store.Items<Person>() 
   2:             select person).Take(1).ToList()

Last

   1: var result = (from person in store.Items<Person>() 
   2:             select person).Reverse.Take(1).ToList()

Single

   1: var result = (from person in store.Items<Person>() 
   2:             select person).Take(2).ToList()
   3:  
   4: if(result.Count > 1)
   5: {
   6:     throw new InvalidOperationException();
   7: }

Summarize

I hope this blog post will make you write good queries against the DDS in the feature. Earlier I posted an extension class that gives you intellisense for the only supported methods in our LINQ provider, so if you still think this is confusing, download it and use it.

Next time I think I will focus on some “don’ts” in the LINQ provider.

Mar 26, 2010

pb
(By pb, 9/21/2010 12:33:13 PM)

Very well written and interesting articles Jonas, looking forward to the next one in your DDS series.

Magnus Rahl
(By Magnus Rahl, 9/21/2010 12:33:13 PM)

Very nice and informative post! I suspect too many developers are unaware of the inner workings of LINQ implementations and how they (if implemented correctly like in DDS) don't necessarily execute step by step "condensating" a collection in each step, but rather build an execution tree which enable them to enumerate over and pick exactly the data requested by the whole query.

Please login to comment.