An Introduction to LINQ to SQL

February 24th, 2009

So we already know that LINQ gives us a way to compose queries against IEnumerable<T> objects. LINQ to SQL gives us a way to describe our database so that we can use LINQ queries against the data and have LINQ to SQL generate the necessary SQL statements to get the results. With LINQ to SQL you can query, insert and delete data all with fully compiled LINQ queries. Although LINQ to SQL will take a back seat to the Entity Framework (I’ll cover that in a future post) according to the ADO.Net team blog’s “Update on LINQ to SQL and LINQ to Entities Roadmap“, LINQ to SQL remains a good simple way to set up a database.

I’ll just be covering the core concepts here. Each of these core concepts will still apply to LINQ to Entities although there are sure to be small differences.

Basic structure

In Visual Studio LINQ to SQL comes with a full designer experience. You’ll find “LINQ to SQL Classes” as an item under “Data” in the Add New Item dialog. Keep in mind that “DataContext” will be added to the end of the name you give. I’ve created many TestDataDataContext classes!

Once this item has been added you will be presented with a designer window. Assuming you are using a SQL Server or SQL Express database you can simply drag tables from the Server Explorer to the designer window (I’ll be discussing the tool sqlmetal.exe in a future post which can be used to get LINQ to SQL working with SQL Server Compact Edition). As you add more tables, the relationships will begin to appear. Dragging all these tables simply generates extra classes under the scenes. There are four important files that are part of the LINQ to SQL classes:

  • TestDataContext.dbml is the core file. It is an XML formatted file that describes the tables and relationships in your Data Context.
  • TestDataContext.dbml.layout is purely for the designer experience so that you can lay out your tables in a way that makes sense to you.
  • TestDataContext.designer.cs is where things get interesting. This is the file that contains all the classes that are generated from your .dbml file. These are partial classes so you can add extra methods and properties in a separate file.
  • TestDataContext.cs is the file where you put your extra methods and properties.

Each of the tables is represented as a class in your object model. The fields of the tables become the properties and relationships are also represented as properties. It is important to remember that the classes represent a single row from the database.

In addition to the classes representing your data model you also get a DataContext class (in this example TestDataContext) which you use to interact with your database. It has a property for each of your tables which are of type System.Data.Linq.Table<T> where T is the name of your class. So if you have a table of Users, you might call the class User (as it represents one user) and your DataContext would have a property Users of type System.Data.Linq.Table<User>.

Initialize your DataContext

All access to your data is done through the DataContext object. Before you can begin you’ll need to create an instance of the class. If you’ve used Visual Studio to connect to SQL Server or SQL Express database you probably have access to the default constructor which will use the connection string from your app.config file. If you’ve used sqlmetal.exe or if you want to manually specify a connection string you can use a simple constructor that just takes the connection string.

In the next examples, we’ll assume we have already initialized the data context using the connection string from app.config like so:

var db = new TestDataContext();

Query the database (SELECT)

Let’s start with a simple query to pull data from the database. Let’s say that we have a really simple blog with the following tables: Entries, Comments and Users. We’ll be using this same basic database through all of the following examples.

Let’s start with the front page. Because this is a simple blog all the posts will stay on the front page until we archive them. So we want to start by finding all the blog posts that haven’t been archived. Let’s start by looking at the raw SQL.

SELECT *
FROM Entries
WHERE e.Archived = 0

That’s a pretty simple query and in LINQ it is also simple:

var frontPage = from e in db.Entries
                where !e.Archived
                select e;

Again, this is just our query definition, this line of code will not execute the query against the database. To actually execute the query we need to do something like this:

foreach (var entry in frontPage)
{
   WriteBlogTitle(entry.Title);
   WriteBlogBody(entry.Body);
   // TODO: Show number of comments
}

Here WriteBlogTitle() and WriteBlogBody() methods are responsible for sending the output to the web browser. Both methods just take a string and are completely unaware of where the data is coming from.

I’ve included a TODO comment suggesting it might be good to show the number of comments for each entry. First I’ll show you how to do this by running an extra query for each entry:

foreach (var entry in frontPage)
{
   WriteBlogTitle(entry.Title);
   WriteBlogBody(entry.Body);
   WriteBlogCommentCount(entry.Comments.Count);
}

Because comments are associated with the entry we can directly access its count. This line will cause another database query so has obvious performance implications (as it needs to hit the database server for every single post). Instead we’ll rewrite the query so we pull everything we need from the database in one query.

var frontPage = from e in db.Entries
                where !e.Archived
                select new {
                              Entry = e,
                              CommentCount = e.Comments.Count
                           };

foreach (var post in frontPage)
{
   WriteBlogTitle(post.Entry.Title);
   WriteBlogBody(post.Entry.Body);
   WriteBlogCommentCount(post.CommentCount)
}

By creating a new anonymous type in the query we send one complex query to the database instead of lots of simple queries.

So what if a user actually wants to view those comments. This time we’ll accept that two database queries is acceptable (it’s a constant). So, let’s see how that might work for a page that displays a specific post.

var post = (from e in db.Entries
            where e.Id == id
            select e).FirstOrDefault();

if (post == null)
{
   // Post could not be found
   throw new ArgumentException();
}

WriteBlogTitle(post.Title);
WriteBlogBody(post.Body);

foreach (var comment in post.Comments)
{
   WriteBlogComment(comment.Comment);
}

It almost seems to easy to believe. Because the relationships are already defined LINQ to SQL is able to generate the second SQL statement to get each of the comments for that post. Most importantly we did all this without any strings!

Inserting data (INSERT)

Right now the blog is pretty boring, it can display posts and comments, but how do the posts and comments get there?  Let’s start with the simple example of creating a post.

var post = new Entry();

post.Title = "My first post";
post.Body = "I love LINQ. It's the best.";

db.Entries.InsertOnSubmit(post);
db.SubmitChanges();

That’s it. The post is now in the database. What about a comment?

var comment = new Comment();

comment.Comment = "I love LINQ more";
comment.Entry = post;

db.Comments.InsertOnSubmit(comment);
db.SubmitChanges();

We’ve been able to link the comment to the post by using the Entry property directly. We haven’t had to worry about how our tables are related in our code.

Deleting data (DELETE)

Did you get a spam comment on your blog? Don’t worry, LINQ to SQL can help you delete it. It follows a similar model to that of inserting data.

db.Comments.DeleteOnSubmit(comment);
db.SubmitChanges();

Updating data (UPDATE)

Maybe you want to start cleaning up the front page because it is starting to get cluttered. We need to update the Archived field on each of the entries we want to archive. To keep things really simple, we’ll archive all the posts that aren’t already archived.

var toArchive = from e in db.Entries
                where !e.Archived
                select e;

foreach (var entry in toArchive)
{
   entry.Archived = true;
}

db.SubmitChanges();

Remember to submit your changes

The key thing to remember when manipulating data with LINQ to SQL is to call SubmitChanges() on your DataContext object. Only then will the SQL statements be generated and executed on the server.

What’s next?

In my next LINQ post I’ll be exploring LINQ to XML.

Visual Studio, .NET and Developer Productivity

February 22nd, 2009

Recently I’ve had to work on porting a .NET app to Ruby. The .NET version consumed WCF web services, exposed WCF web services, parsed XML and relied on multiple threads. The experience reminded me of one thing: the awesomeness of .NET and Visual Studio.

.NET’s documentation is pervasive

.NET seems to have been designed from the start to encourage excellent documentation. With XML documentation all the important parts of a method and a class are able to be encapsulated. Using a tool like GhostDoc (which is free) a skeleton for the documentation can be written for you.

In Visual Studio as you code you are constantly presented with relevant documentation as you code. If you need more help you can press F1 to be taken to the relevant page from the MSDN library.

The .NET Framework is generally well laid out

I’m sure someone can come up with an exception, but generally the .NET framework is well laid out. IO functions can be found in System.IO, Windows form controls sit in System.Windows.Forms. Knowing where to look for things goes a long way to improving the discovery of methods and types leading to a less steep learning curve.

It also serves well for discussions. If you want to working with XML you can be told to check the classes in System.Xml. Want to check out the new LINQ to XML classes, they are sitting in System.Xml.Linq, right under the System.Xml namespace.

The .NET Framework has consistent naming

Classes and method names have consistent case rules that make it easy to work in a case-sensitive environment. Consistent use of prefixes (like ‘I’ for Interface) and suffixes (like ‘Exception’ for exceptions) helps developers identify the purpose of a class without looking deeper. Fortunately Visual Studio makes looking deeper easy. Design guidelines for developing class libraries encourage developers to stick with this consistent approach. Furthermore, tools like StyleCop and FxCop can help make sure your code is consistent.

Even the basic documentation is good

The class documentation on MSDN is generally excellent, but even at its core, just listing the members, constructor overloads and object hierarchy goes a long way to understanding a complex framework. It is well organised (using the namespaces we’ve already discussed) and easy to navigate.

Working with Ruby’s documentation really made this apparent. Whilst the ruby docs can have this information I have seen members not listed in documentation, and the three top panes are almost impossible to navigate (from left to right, a list of source files, a list of all classes and modules, and then a list of all members). These lists get so long that it is difficult to scroll to the right spot and even then it can be hard to pick the item you want.

Visual Studio makes it easy to navigate your code

Enterprise applications are large, so being able to navigate through your own code needs to be as quick and easy as possible. In Visual Studio it is easy to navigate to the declaration of a type or a method. In addition to Visual Studio’s navigation features I also make use of ReSharper (not free, but so worth it) to navigate to a member, a particular file, references, etc. This ease of navigation improves my productivity greatly.

Did I mention how much I love IntelliSense?

Seriously, IntelliSense is the greatest IDE feature ever. It improves the discoverability of classes and methods and reduces errors in code. ReSharper has a handy feature which let’s me take advantage of camel casing too. I can type ‘ArgN’ and still be presented with the option of an ArgumentNullException. This has saved me a lot of time I would normally spend using the dreaded cursor keys to select the type I want.

Simple refactoring made easy

Changing the name of a method or type in Visual Studio is simple. References can be updated to reflect the change. Making a change like this by hand is time consuming and prone to error. Again, tools like ReSharper can take this further.

The joy of compilation

Compilers are awesome. Not only do they now do a lot of type inference magic, but they can help identify many of the common little problems that arise in code such as minor typos, invalid syntax and undeclared variables. The C# compiler generally returns good error messages that help identify problems quickly. Some of the errors I saw while running my Ruby port were reasonable and certainly allowed me to solve the problem, but often I would need to go deep into the code before these problems become evident. I needed to make thorough use of unit tests just to be confident that the code was syntactically valid.

Too much hand holding?

My experience with Ruby has certainly highlighted my reliance on tools like Visual Studio to help me write my code. At the same time I recognised that I was making fewer errors over time writing all my code in a text editor. Perhaps it is beneficial to code in a text editor from time to time, but for anything with more than a few methods or classes you won’t see me giving up Visual Studio (and ReSharper) any time soon.

An Introduction to Lambda Expressions in LINQ

February 21st, 2009

Lambda expressions are a great way to write simple anonymous delegates in a concise way. Of course you aren’t limited to simple functions, you can write a full blown method in lambda syntax.

I’ve already shown some lambda expressions in use when I discussed extensions methods. Here’s the example:

items.Where(item => item.Price < 1).Select(item => item.Name)

There are two lambda expressions in the above example. They are:

  1. item => item.Price < 1
  2. item => item.Name

These are very simple lambda expressions that take one parameter (item) and return a result. The type of the parameter and the the type of the result are inferred by the compiler allowing us to express what clearly without having to decorate it with types. So each of the expressions really means the following:

  1. Take an item and return whether the item’s price is less than one.
  2. Take and item and return the item’s name.

Hopefully you can see the basic pattern here. Take what’s on the left of the lambda operator (=>), use it in the expression on the right and return the result of the expression.

The important thing to remember with lambdas is that they only declare the function. In the example above the lambda expression is executed within the Where and Select methods and is executed once for each item in the enumeration. The Where method uses the result of the lambda expression to determine if the item should be in the resultant enumeration and the Select method returns the result on the lambda expression as the member of the enumeration.

Invoke() made easy

Lambdas aren’t restricted to being used just with LINQ, they can be used anywhere that anonymous delegates can be found. One area I’ve found lambdas increasingly useful is in multi-threaded applications. For example, my Tweet demo uses multiple threads to perform the animation. Consequently I often needed to update the UI from the background thread. Because this isn’t directly allowed I needed to send the code to the UI thread. Before anonymous delegates I would need to create a full blown method to perform a single task. That’s a lot of extra work for something that is unlikely to be re-used elsewhere. With anonymous delegates I can define the method inline, which is great, but still uses a lot of extra decoration. Now with lambdas I can finally get to the work of just having my code. Here’s an example straight from that demo.

Dispatcher.BeginInvoke(() =>
                           {
                              info.Text = title;
                              infoContainer.Visibility =
                                 Visibility.Visible;
                              _mutex.Unlock();
                           });

Perhaps the most interesting part of the code is the use of the title variable within the lambda expression. In this instance, title is a local variable within the method that is calling BeginInvoke(). The anonymous delegate will use this local reference when it is called. You can’t always get away with this.  Fortunately strings are immutable in .NET, so we can be confident that the value will not change. If title was mutable (can be changed) its value could be modified after BeginInvoke() is called, but before it is used in the lambda expression. This may lead to unexpected results.

This problem isn’t just isolated to multi-threaded applications (although multi-threaded applications are inherently more unpredictable). Because LINQ queries are not executed until they are enumerated (LINQ and Deferred Execution) they are susceptible to the same problems, but fortunately in a more consistent way. So remember to always be wary when using a local variable in a LINQ query.

Generic Delegates in .NET 3.5

Version 3.5 of the .NET Framework introduced some new generic delegates designed to cover most cases. In fact, it is unlikely that you will need to define your own delegates unless you need more than four parameters.

The Action delegates

Action delegates refer to a method that does not return a value (a void method).

  • Action is non-generic delegate that takes no parameters and does not return a value.
  • Action<T> was originally introduced in .NET 2.0. This delegate takes one parameter of type T.
  • Action<T1, T2>, Action<T1, T2, T3> and Action<T1, T2, T3, T4> are generic delegates that take two, three and four parameters respectively and do not return a value.

The Func delegates

Func delegates are similar to the Action delegates except that they also return a value. The type of the value is always the last type parameter of the generic delegate.

  • Func<TResult> is a generic delegate that takes no parameters and returns a value of type TResult.
  • Func<T, TResult>, Func<T1, T2, TResult>, Func<T1, T2, T3, TResult> and Func<T1, T2, T3, T4, TResult> are generic delegates that take one, two, three and four parameters respectively and return a value of type TResult.

What’s next?

Next up we’ll be looking at LINQ to SQL and how it can make accessing and using a database a joy.

LINQ and Deferred Execution

February 18th, 2009

One of the stumbling blocks on the road to understanding LINQ is deferred execution. The key to getting past this is being able to identify that a query is a definition of what you want, rather than the results themselves.

Here’s an example of how this works:

var itemsInStock = from item in warehouse.Items
                   where item.Quantity > 0;
                   select item;

// Display how many items are in stock
Console.WriteLine("Items in stock: {0}", itemsInStock.Count());

// Add a new item to the warehouse
warehouse.Items.Add(new Item("A new item", 50);

// Display how many items are in stock
Console.WriteLine("Items in stock: {0}", itemsInStock.Count());

The second time itemsInStock.Count() is called it returns the updated count that includes our new item. Instead of executing the query when it is defined, execution is deferred until a result is needed (such as iterating over the collection with a foreach loop, using ToList() to store the results in a List<T> or one of the many LINQ extension methods that force an actual result (such as Count() in this example). This has the added benefit of allowing a query to be extended like so:

var lowStock = from item in itemsInStock
               where item.Quantity < 5;
               select item;

This query can now be used to return items that are in stock, but have less than 5 available units.

Quite often you’ll want to work with a snapshot of the results from a query. Maybe you are writing a method that returns a particular set of items. In this scenario it may be better to return a list rather than the query itself. By returning a list, the calling code is able to iterate over the result multiple times without the result changing. For example you might implement your method like this:

private IEnumerable<Item> GetItemsInStockQuery()
{
   return from item in warehouse.Items
          where item.Quantity > 0
          select item;
}

public List<Item> GetItemsInStock()
{
   return GetItemsInStockQuery().ToList();
}

Calling code is able to get the information it needs and internally you can directly get access to the query.

Another important thing to remember is that because a query is executed every time you iterate it with a foreach loop you should use ToList() if you are repeatedly calling the query and don’t need the results to be recalculated each time.

More LINQ to come

In my next post I’ll explore lambda expressions.

LINQ and Extension methods

February 16th, 2009

Have you ever wished that a base class had a particular method? What about interfaces? Wouldn’t it be great to define a method on an interface along with its implementation? Any class that then implemented the interface would get this implementation for free.

In the past this was achieved with static utility classes. Unfortunately this leads to cluttering your code with the names of these utility classes and dilute the expressiveness of your code. Let’s say we have a utility class the gets the words and word count from a string. Don’t worry too much about the implementation, just the general structure.

public static class StringUtilities
{
   private static readonly Regex wordsRegex = new Regex(@"\w+");

   public static IEnumerable<string> GetWords(string source)
   {
      return from word in wordsRegex.Matches(source).Cast<Match>()
             select word.Value;
   }

   public static int WordCount(string source)
   {
      return GetWords(source).Count();
   }
}

To use this in our code we would have to do something like this:

var sentence = "The quick brown fox jumps over the lazy dog";

// Display each of the words
foreach (var word in StringUtilities.GetWords(sentence))
{
   Console.WriteLine(word);
}

// Display the word count
Console.Write("Total Words: ")
Console.WriteLine(StringUtilities.WordCount(sentence));

Look at all that clutter. The truth in this context is that we are really performing an action on the sentence. Wouldn’t it be better if we could just call sentence.GetWords() or sentence.WordCount() instead? It would certainly be more readable. Extension methods make this all possible. Here’s our updated StringUtilities class that creates the extension methods:

public static class StringUtilities
{
   private static readonly Regex wordsRegex = new Regex(@"\w+");

   public static IEnumerable<string> GetWords(this string source)
   {
      return from word in wordsRegex.Matches(source).Cast<Match>()
             select word.Value;
   }

   public static int WordCount(this string source)
   {
      return GetWords(source).Count();
   }
}

We’ve added this before the variable type. The rest of the code has been left untouched. So now we can use the extension methods like so:

var sentence = "The quick brown fox jumps over the lazy dog";

// Display each of the words
foreach (var word in sentence.GetWords())
{
   Console.WriteLine(word);
}

// Display the word count
Console.WriteLine("Total Words: {0}", sentence.WordCount());

Doesn’t that read better? We have been able to push the implementation details (the name of the static utility class) out of our code.

How to enable an extension method

In order to use an extension method it must be part of the local namespace or imported with a using statement. Once that’s done you can call extension methods just as you would any normal method.

What does this have to do with LINQ?

LINQ is all about extension methods. When you import the System.Linq namespace it comes with a whole bundle of extension methods. Most of them act on IEnumerable<T> and can be used to write your LINQ queries in method syntax. Let’s look at this query:

from item in items
where item.Price < 1
select item.Name

This query finds the items that are under one dollar and returns their names. We can write this query in method syntax like so:

items.Where(item => item.Price < 1).Select(item => item.Name)

It’s not quite as readable (although that is a matter of opinion), but it gives a good indication of what is going on (and further demonstrates why select is at the end). These methods also take advantage of Lambda expressions (which I’ll discuss in a future post).

There are other useful extension functions that work with queries. Some of the ones you’ll use most often are:

  • ToList() executes the query and returns the results in a list. You will probably use this method a lot. I’ll cover this method an its consequences in more depth in a future post on deferred execution.
  • Count() executes the query and returns the number of results. When used with LINQ to SQL it will execute SQL code to get the database server to return the count.
  • Any() returns true if there are any results in the query. Use this instead of Count() > 0 to abstract out the implementation detail.
  • First() returns the first result from the query. This is particularly useful when you have a query that will only return one result (such as looking up an entry based on its primary key). This method will throw an exception (InvalidOperationException) if the query yields no results.
  • FirstOrDefault() returns the first result from the query, much like First(). If there are no results it will return the default for the type (e.g. 0 for an int, null for reference types).

Fortunately you aren’t limited to using these extension methods on LINQ queries. They are designed to work on any class that implements IEnumerable<T>. This means you can use them directly on a lot of the classes already in the .NET base class library.

What about old non-generic IEnumerable?

There are a lot of classes in the .NET framework that don’t implement IEnumerable<T> but instead implement the non-generic interface IEnumerable. A perfect example is MatchCollection used by Regular expressions. When we enumerate over a MatchCollection we are given the base object which we then need to cast to a Match object. Until we do this cast we can’t access any of the properties of Match. Fortunately there are a couple of LINQ extension methods designed to help out when dealing with IEnumerable.

  • Cast<T>() returns a strongly typed IEnumerable<T> object. Each object is cast to the type T. If an object can’t be cast an exception is thrown (InvalidCastException). In the case of a MatchCollection I am confident that every object is a Match object and an exception won’t be thrown.
  • OfType<T>() also returns a strongly typed IEnumerable<T> object. It goes further than Cast<T>() by only including objects of that type in the enumeration. In other words it filters out any class that isn’t of the desired type (without throwing exceptions). This is the method to use when you are unsure of what the type will be or if you are dealing with an enumeration that contains different typed objects.

If you want to see OfType<T>() in action, copy and paste the following example into LINQPad. (You’ll need to select C# Statement(s) from the language drop down).

var items = new object[]{"a string", 22, Math.PI};

items.OfType<string>().Dump("OfType<string>");
items.OfType<int>().Dump("OfType<int>");
items.OfType<double>().Dump("OfType<double>");

LINQPad has its own extension method Dump() which is used to output results to the LINQPad window. You’ll see that each individual dump returns a strongly typed IEnumerable<T> object. In this example items actually implemented IEnumerable<object>. Fortunately these methods don’t discriminate and happily work their magic on any IEnumerable<T> as well.

Still more to come

There is still plenty of more that I will post about LINQ. In my next post I’ll look at deferred execution, what it means and how you can take advantage of it.