Posts Tagged ‘LINQ’

An Introduction to Lambda Expressions in LINQ

Saturday, February 21st, 2009

Lambda expressions are a great way to write simple anonymous delegates concisely. Of course, you aren’t limited to simple functions; you can write a full-blown method in lambda syntax.

I’ve already shown some lambda expressions in use when I discussed extension methods. Here’s the example:

items.Where(item => item.Price < 1).Select(item => item.Name)

There are two lambda expressions in the above example. They are:

  1. item => item.Price < 1
  2. item => item.Name

These are very simple lambda expressions that take one parameter (item) and return a result. The type of the parameter and the type of the result are inferred by the compiler, allowing us to express what we want clearly without having to decorate it with types. So each of the expressions really means the following:

  1. Take an item and return whether the item’s price is less than one.
  2. Take an item and return the item’s name.

Hopefully you can see the basic pattern here. Take what’s on the left of the lambda operator (=>), use it in the expression on the right and return the result of the expression.

The important thing to remember with lambdas is that they only declare the function. In the example above the lambda expressions are executed within the Where and Select methods, once for each item in the enumeration. The Where method uses the result of its lambda expression to determine whether the item should be in the resultant enumeration, and the Select method returns the result of its lambda expression as the member of the enumeration.
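To make the type inference concrete, here is a minimal sketch that writes the same predicate three ways: with inferred types, with an explicit parameter type, and as a C# 2.0 anonymous delegate. The Item class, the LambdaForms class and the sample data are assumptions made up for illustration; the original post does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Item type; only the two properties used by the query.
public class Item
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static class LambdaForms
{
    public static List<string> CheapNames(IEnumerable<Item> items)
    {
        // Parameter and result types are inferred from Func<Item, bool>.
        Func<Item, bool> inferred = item => item.Price < 1;

        // The same lambda with the parameter type spelled out.
        Func<Item, bool> explicitType = (Item item) => item.Price < 1;

        // The same logic as an anonymous delegate.
        Func<Item, bool> anonymous = delegate(Item item) { return item.Price < 1; };

        // The three forms are interchangeable, so chaining them
        // filters on the same condition three times.
        return items.Where(inferred)
                    .Where(explicitType)
                    .Where(anonymous)
                    .Select(item => item.Name)
                    .ToList();
    }

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Name = "Gum", Price = 0.50m },
            new Item { Name = "Coffee", Price = 3.00m }
        };

        foreach (var name in CheapNames(items))
        {
            Console.WriteLine(name); // prints "Gum"
        }
    }
}
```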

Invoke() made easy

Lambdas aren’t restricted to LINQ; they can be used anywhere that anonymous delegates can be used. One area where I’ve found lambdas increasingly useful is multi-threaded applications. For example, my Tweet demo uses multiple threads to perform the animation. Consequently I often needed to update the UI from a background thread. Because this isn’t directly allowed, I needed to send the code to the UI thread. Before anonymous delegates I would have needed to create a full-blown method to perform a single task. That’s a lot of extra work for something that is unlikely to be re-used elsewhere. With anonymous delegates I can define the method inline, which is great, but it still needs a lot of extra decoration. Now with lambdas I can finally get down to just writing my code. Here’s an example straight from that demo.

Dispatcher.BeginInvoke(() =>
                           {
                              info.Text = title;
                              infoContainer.Visibility =
                                 Visibility.Visible;
                              _mutex.Unlock();
                           });

Perhaps the most interesting part of the code is the use of the title variable within the lambda expression. In this instance, title is a local variable within the method that is calling BeginInvoke(). The lambda captures the variable itself rather than a copy of its value, and reads it when it is finally called. You can’t always get away with this. Strings are immutable in .NET, so the string that title refers to cannot be altered, and as long as the calling method doesn’t assign a new string to title we can be confident the lambda will see the value we expect. If title were reassigned, or if it referred to a mutable object that was modified, after BeginInvoke() is called but before the lambda runs, the lambda would see the changed value. This may lead to unexpected results.

This problem isn’t isolated to multi-threaded applications (although multi-threaded applications are inherently more unpredictable). Because LINQ queries are not executed until they are enumerated (see LINQ and Deferred Execution), they are susceptible to the same problems, but fortunately in a more consistent way. So always be wary when using a local variable in a LINQ query.
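As a rough illustration of why a local variable in a LINQ query deserves care, the sketch below changes a captured variable after the query is defined but before it is enumerated. The class name, the limit variable and the price list are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CapturedVariableDemo
{
    public static int CountAfterChange()
    {
        var prices = new List<decimal> { 0.50m, 0.75m, 2.00m, 5.00m };
        decimal limit = 1m;

        // The lambda captures the variable 'limit', not its current value.
        IEnumerable<decimal> cheap = prices.Where(price => price < limit);

        // Changing the variable before the query runs changes the result.
        limit = 3m;

        return cheap.Count(); // counts prices below 3, not below 1
    }

    public static int CountWithSnapshot()
    {
        var prices = new List<decimal> { 0.50m, 0.75m, 2.00m, 5.00m };
        decimal limit = 1m;

        // ToList() runs the query now, while 'limit' is still 1.
        List<decimal> cheap = prices.Where(price => price < limit).ToList();

        limit = 3m; // too late to affect the stored results

        return cheap.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterChange());  // prints "3"
        Console.WriteLine(CountWithSnapshot()); // prints "2"
    }
}
```

Taking a snapshot with ToList() is one way to pin the result down; copying the value into a variable that is never reassigned is another.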

Generic Delegates in .NET 3.5

Version 3.5 of the .NET Framework introduced some new generic delegates designed to cover most cases. In fact, it is unlikely that you will need to define your own delegates unless you need more than four parameters.

The Action delegates

Action delegates refer to a method that does not return a value (a void method).

  • Action is a non-generic delegate that takes no parameters and does not return a value.
  • Action<T> was originally introduced in .NET 2.0. This delegate takes one parameter of type T.
  • Action<T1, T2>, Action<T1, T2, T3> and Action<T1, T2, T3, T4> are generic delegates that take two, three and four parameters respectively and do not return a value.

The Func delegates

Func delegates are similar to the Action delegates except that they also return a value. The type of the value is always the last type parameter of the generic delegate.

  • Func<TResult> is a generic delegate that takes no parameters and returns a value of type TResult.
  • Func<T, TResult>, Func<T1, T2, TResult>, Func<T1, T2, T3, TResult> and Func<T1, T2, T3, T4, TResult> are generic delegates that take one, two, three and four parameters respectively and return a value of type TResult.
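Here is a small sketch of these delegates in use, assigning lambdas to Func and Action variables instead of declaring custom delegate types. The DelegateDemo class and its variable names are invented for this example.

```csharp
using System;

public static class DelegateDemo
{
    public static string Describe(int a, int b)
    {
        // Func<T1, T2, TResult>: two int parameters, int result (last type parameter).
        Func<int, int, int> add = (x, y) => x + y;

        // Func<TResult>: no parameters, returns a string.
        Func<string> label = () => "sum";

        string result = null;

        // Action<T1, T2>: two parameters, no return value.
        Action<string, int> store = (name, value) => result = name + "=" + value;

        store(label(), add(a, b));
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(2, 3)); // prints "sum=5"
    }
}
```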

What’s next?

Next up we’ll be looking at LINQ to SQL and how it can make accessing and using a database a joy.

LINQ and Deferred Execution

Wednesday, February 18th, 2009

One of the stumbling blocks on the road to understanding LINQ is deferred execution. The key to getting past this is being able to identify that a query is a definition of what you want, rather than the results themselves.

Here’s an example of how this works:

var itemsInStock = from item in warehouse.Items
                   where item.Quantity > 0
                   select item;

// Display how many items are in stock
Console.WriteLine("Items in stock: {0}", itemsInStock.Count());

// Add a new item to the warehouse
warehouse.Items.Add(new Item("A new item", 50));

// Display how many items are in stock
Console.WriteLine("Items in stock: {0}", itemsInStock.Count());

The second time itemsInStock.Count() is called it returns the updated count that includes our new item. Instead of executing the query when it is defined, execution is deferred until a result is needed, such as when iterating over the collection with a foreach loop, using ToList() to store the results in a List<T>, or calling one of the many LINQ extension methods that force an actual result (such as Count() in this example). This has the added benefit of allowing a query to be extended like so:

var lowStock = from item in itemsInStock
               where item.Quantity < 5
               select item;

This query can now be used to return items that are in stock but have fewer than 5 units available.

Quite often you’ll want to work with a snapshot of the results from a query. Maybe you are writing a method that returns a particular set of items. In this scenario it may be better to return a list rather than the query itself. By returning a list, the calling code is able to iterate over the result multiple times without the result changing. For example you might implement your method like this:

private IEnumerable<Item> GetItemsInStockQuery()
{
   return from item in warehouse.Items
          where item.Quantity > 0
          select item;
}

public List<Item> GetItemsInStock()
{
   return GetItemsInStockQuery().ToList();
}

Calling code is able to get the information it needs and internally you can directly get access to the query.

Another important thing to remember is that a query is executed every time you iterate it with a foreach loop, so you should use ToList() if you are repeatedly using the query and don’t need the results to be recalculated each time.
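The cost of re-running a query can be made visible by counting how often its filter is called. The following is only a sketch; the ReEvaluationDemo class and the counter are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEvaluationDemo
{
    // Returns { filter calls when using the query twice,
    //           filter calls when using a ToList() snapshot twice }.
    public static int[] CountFilterCalls()
    {
        var numbers = new[] { 1, 2, 3, 4 };
        int calls = 0;

        // The filter increments the counter every time it is evaluated.
        IEnumerable<int> evens = numbers.Where(n => { calls++; return n % 2 == 0; });

        // Each Count() runs the whole query again.
        evens.Count();
        evens.Count();
        int deferredCalls = calls;

        calls = 0;

        // ToList() runs the query once; reading the list does not run it again.
        List<int> snapshot = evens.ToList();
        int first = snapshot.Count;
        int second = snapshot.Count;
        int snapshotCalls = calls;

        return new[] { deferredCalls, snapshotCalls };
    }

    public static void Main()
    {
        int[] result = CountFilterCalls();
        Console.WriteLine("{0} {1}", result[0], result[1]); // prints "8 4"
    }
}
```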

More LINQ to come

In my next post I’ll explore lambda expressions.

LINQ and Extension methods

Monday, February 16th, 2009

Have you ever wished that a base class had a particular method? What about interfaces? Wouldn’t it be great to define a method on an interface along with its implementation? Any class that then implemented the interface would get this implementation for free.

In the past this was achieved with static utility classes. Unfortunately this clutters your code with the names of these utility classes and dilutes its expressiveness. Let’s say we have a utility class that gets the words and word count from a string. Don’t worry too much about the implementation, just the general structure.

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringUtilities
{
   private static readonly Regex wordsRegex = new Regex(@"\w+");

   public static IEnumerable<string> GetWords(string source)
   {
      return from word in wordsRegex.Matches(source).Cast<Match>()
             select word.Value;
   }

   public static int WordCount(string source)
   {
      return GetWords(source).Count();
   }
}

To use this in our code we would have to do something like this:

var sentence = "The quick brown fox jumps over the lazy dog";

// Display each of the words
foreach (var word in StringUtilities.GetWords(sentence))
{
   Console.WriteLine(word);
}

// Display the word count
Console.Write("Total Words: ");
Console.WriteLine(StringUtilities.WordCount(sentence));

Look at all that clutter. The truth in this context is that we are really performing an action on the sentence. Wouldn’t it be better if we could just call sentence.GetWords() or sentence.WordCount() instead? It would certainly be more readable. Extension methods make this all possible. Here’s our updated StringUtilities class that creates the extension methods:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringUtilities
{
   private static readonly Regex wordsRegex = new Regex(@"\w+");

   public static IEnumerable<string> GetWords(this string source)
   {
      return from word in wordsRegex.Matches(source).Cast<Match>()
             select word.Value;
   }

   public static int WordCount(this string source)
   {
      return GetWords(source).Count();
   }
}

We’ve added the this keyword before the type of the first parameter. The rest of the code has been left untouched. So now we can use the extension methods like so:

var sentence = "The quick brown fox jumps over the lazy dog";

// Display each of the words
foreach (var word in sentence.GetWords())
{
   Console.WriteLine(word);
}

// Display the word count
Console.WriteLine("Total Words: {0}", sentence.WordCount());

Doesn’t that read better? We have been able to push the implementation details (the name of the static utility class) out of our code.

How to enable an extension method

In order to use an extension method, the static class that defines it must be in the current namespace or in a namespace imported with a using directive. Once that’s done you can call extension methods just as you would any normal method.
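A minimal sketch of that rule, using two made-up namespaces (TextHelpers and App) and a hypothetical Shout() extension method:

```csharp
using System;
using TextHelpers; // without this directive, "hello".Shout() would not compile

namespace TextHelpers
{
    public static class StringExtensions
    {
        // Hypothetical extension method used only for this illustration.
        public static string Shout(this string source)
        {
            return source.ToUpperInvariant() + "!";
        }
    }
}

namespace App
{
    public static class Program
    {
        public static void Main()
        {
            // Called like an instance method thanks to the using directive.
            Console.WriteLine("hello".Shout()); // prints "HELLO!"

            // It is still an ordinary static method underneath.
            Console.WriteLine(StringExtensions.Shout("hello")); // prints "HELLO!"
        }
    }
}
```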

What does this have to do with LINQ?

LINQ is all about extension methods. When you import the System.Linq namespace it comes with a whole bundle of extension methods. Most of them act on IEnumerable<T> and can be used to write your LINQ queries in method syntax. Let’s look at this query:

from item in items
where item.Price < 1
select item.Name

This query finds the items that are under one dollar and returns their names. We can write this query in method syntax like so:

items.Where(item => item.Price < 1).Select(item => item.Name)

It’s not quite as readable (although that is a matter of opinion), but it gives a good indication of what is going on (and further demonstrates why select is at the end). These methods also take advantage of Lambda expressions (which I’ll discuss in a future post).
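If you want to convince yourself that the two forms are equivalent, a sketch along these lines compares their results. The Item class, the SyntaxComparison class and the sample data are assumptions made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Item type; only the two properties used by the query.
public class Item
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static class SyntaxComparison
{
    public static IEnumerable<string> QuerySyntax(IEnumerable<Item> items)
    {
        return from item in items
               where item.Price < 1
               select item.Name;
    }

    public static IEnumerable<string> MethodSyntax(IEnumerable<Item> items)
    {
        return items.Where(item => item.Price < 1).Select(item => item.Name);
    }

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Name = "Gum", Price = 0.50m },
            new Item { Name = "Coffee", Price = 3.00m },
            new Item { Name = "Mint", Price = 0.25m }
        };

        // Both forms yield the same names in the same order.
        Console.WriteLine(QuerySyntax(items).SequenceEqual(MethodSyntax(items))); // prints "True"
    }
}
```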

There are other useful extension functions that work with queries. Some of the ones you’ll use most often are:

  • ToList() executes the query and returns the results in a list. You will probably use this method a lot. I’ll cover this method and its consequences in more depth in a future post on deferred execution.
  • Count() executes the query and returns the number of results. When used with LINQ to SQL it will execute SQL code to get the database server to return the count.
  • Any() returns true if there are any results in the query. Use this instead of Count() > 0 to abstract out the implementation detail.
  • First() returns the first result from the query. This is particularly useful when you have a query that will only return one result (such as looking up an entry based on its primary key). This method will throw an exception (InvalidOperationException) if the query yields no results.
  • FirstOrDefault() returns the first result from the query, much like First(). If there are no results it will return the default for the type (e.g. 0 for an int, null for reference types).
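The behaviour of these methods on empty and non-empty results can be sketched as follows. The QueryOperatorsDemo class and the even-number query are assumptions chosen to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryOperatorsDemo
{
    public static string Summarise(int[] numbers)
    {
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

        bool hasEvens = evens.Any();

        // FirstOrDefault() falls back to default(int), which is 0.
        int firstOrDefault = evens.FirstOrDefault();

        // First() throws when the query yields no results.
        string first;
        try
        {
            first = evens.First().ToString();
        }
        catch (InvalidOperationException)
        {
            first = "none";
        }

        return string.Format("any={0}, first={1}, firstOrDefault={2}, count={3}",
                             hasEvens, first, firstOrDefault, evens.Count());
    }

    public static void Main()
    {
        // prints "any=True, first=4, firstOrDefault=4, count=2"
        Console.WriteLine(Summarise(new[] { 1, 4, 6 }));

        // prints "any=False, first=none, firstOrDefault=0, count=0"
        Console.WriteLine(Summarise(new[] { 1, 3 }));
    }
}
```

Note that a FirstOrDefault() result of 0 is ambiguous for value types: it could be a real first element or the fallback, which is one reason to check Any() when that distinction matters.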

Fortunately you aren’t limited to using these extension methods on LINQ queries. They are designed to work on any class that implements IEnumerable<T>. This means you can use them directly on a lot of the classes already in the .NET base class library.

What about old non-generic IEnumerable?

There are a lot of classes in the .NET Framework that don’t implement IEnumerable<T> but instead implement the non-generic interface IEnumerable. A perfect example is MatchCollection, used by regular expressions. When we enumerate over a MatchCollection we are given plain object references, which we then need to cast to Match. Until we do this cast we can’t access any of the properties of Match. Fortunately there are a couple of LINQ extension methods designed to help out when dealing with IEnumerable.

  • Cast<T>() returns a strongly typed IEnumerable<T> object. Each object is cast to the type T. If an object can’t be cast an exception is thrown (InvalidCastException). In the case of a MatchCollection I am confident that every object is a Match object and an exception won’t be thrown.
  • OfType<T>() also returns a strongly typed IEnumerable<T> object. It goes further than Cast<T>() by only including objects of that type in the enumeration. In other words it filters out any object that isn’t of the desired type (without throwing exceptions). This is the method to use when you are unsure of what the type will be or if you are dealing with an enumeration that contains objects of different types.

If you want to see OfType<T>() in action, copy and paste the following example into LINQPad. (You’ll need to select C# Statement(s) from the language drop-down.)

var items = new object[]{"a string", 22, Math.PI};

items.OfType<string>().Dump("OfType<string>");
items.OfType<int>().Dump("OfType<int>");
items.OfType<double>().Dump("OfType<double>");

LINQPad has its own extension method Dump() which is used to output results to the LINQPad window. You’ll see that each individual dump returns a strongly typed IEnumerable<T> object. In this example items actually implemented IEnumerable<object>. Fortunately these methods don’t discriminate and happily work their magic on any IEnumerable<T> as well.
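For contrast, here is a sketch of how Cast<T>() and OfType<T>() differ on a mixed, non-generic collection. It runs as a console program rather than in LINQPad; the CastVersusOfType class and the ArrayList contents are assumptions for the example.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class CastVersusOfType
{
    // OfType<T>() silently skips anything that is not a T.
    public static int CountStrings(IEnumerable source)
    {
        return source.OfType<string>().Count();
    }

    // Cast<T>() throws once enumeration reaches an element that is not a T.
    public static bool CastToStringFails(IEnumerable source)
    {
        try
        {
            source.Cast<string>().ToList();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // ArrayList only implements the non-generic IEnumerable.
        var mixed = new ArrayList { "a string", 22, Math.PI };

        Console.WriteLine(CountStrings(mixed));      // prints "1"
        Console.WriteLine(CastToStringFails(mixed)); // prints "True"
    }
}
```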

Still more to come

There is still plenty more that I will post about LINQ. In my next post I’ll look at deferred execution, what it means and how you can take advantage of it.

Getting Started with LINQ

Saturday, February 14th, 2009

I really like LINQ. It’s one of my favourite .NET features. When I first heard about it I was doing most of my programming in Visual Basic 6 (or worse, Visual Basic for Applications). Working now with C# and the .NET Framework has blessed me with full O-O, strong types, an excellent base class library, Visual Studio 2008 (and IntelliSense), Generics (I love generics) and LINQ.

So what is LINQ and why is it so important to add to your arsenal of .NET skills?

LINQ is so many things

At its core LINQ is exactly what its acronym suggests: Language Integrated Query. But what does this actually mean? Is it just some marketing hype designed to confuse the masses and look good on your resume? Probably. But the value of expressing a query concisely in the language of your choice becomes more apparent with each LINQ query you write. (Yes, I know LINQ Query would stand for Language Integrated Query query. Just go with it, it reads better.)

Importantly a LINQ query separates defining what you are looking for from how to find it. This means that a LINQ query could potentially be executed across multiple CPU cores and in the case of LINQ to SQL can be turned into an efficient SQL query so the hard work can be done by your database server.

But when are you going to actually use LINQ? Chances are good that you already have some code that could benefit from a bit of LINQ.

Take this example:

var itemsUnderOneDollar = new List<Item>();
foreach (var item in items)
{
   if (item.Price < 1)
   {
      itemsUnderOneDollar.Add(item);
   }
}

In this case we want to find all items that are under one dollar. The same in LINQ would be:

var itemsUnderOneDollar = (from item in items
                           where item.Price < 1
                           select item).ToList();

We can ignore the variable declaration for now (and the call to ToList()) so let’s break it down to just the core LINQ query.

from item in items
where item.Price < 1
select item

The LINQ query describes exactly what you want and nothing more. When we used the foreach construct we were resigned to the fact that we had to look at each and every item. We are also doing all this in a single thread. In fact, we spend more time describing how we want to find the items than saying what it is that we want. By describing what we want using LINQ we don’t bother with the implementation details resulting in cleaner code and improved flexibility for how the query should be implemented. In the case of a database query, the ideal implementation would be to generate a SQL query, execute the SQL query against the database and return the results. LINQ to SQL does just that with essentially the same code (I’ll be discussing LINQ to SQL in depth in another post).

From Where Select vs. Select From Where

If you are already familiar with SQL you may be a little confused by the syntax of a LINQ query. Indeed this is a major stumbling block most people encounter when they start to use LINQ. In SQL we have the ‘select’ statement upfront but in LINQ we save it for the end. Why? The primary reason for this choice was to enable great IntelliSense support in Visual Studio.

I’d like to argue that the syntax in LINQ actually makes more sense. Rather than starting with what we want at the end, we start with the subject of our query. The reason this seems so foreign is that SQL has made us so used to putting select first. When you write code you typically say where you want to look before you say what you want to do when you’ve found it. In natural language it is like saying “From the store find a computer with 2GiB RAM and get me the price of the computer”. In SQL speak that would be “Get the price of the computer in the store where that computer has 2GiB RAM”. You tell me which form you’d be more likely to use.

The magic of type inference

Another convenient way to remember that from comes first is to think of the old foreach implementation. You’ll find that they have a lot in common. The biggest difference is that in the foreach loop we have to declare the loop variable with a type, although in the example above I’ve used var to let the compiler infer it. In the LINQ query the type is inferred automatically unless you specify it explicitly.

Type inference is used throughout LINQ to simplify code and improve maintainability. Queries return an object that implements the IEnumerable<T> interface. More advanced queries can return objects that implement a more complex interface (which is also an IEnumerable<T>). Using var to let the compiler infer the type of object returned by the query saves the programmer from having to work it out explicitly. The full significance of this will become apparent in future posts.
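As a small sketch of what the compiler infers, the example below declares the same query once with an explicit type and once with var, then checks that the inferred compile-time type matches. The InferenceDemo class and its StaticTypeOf helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InferenceDemo
{
    // Returns the compile-time type the compiler inferred for the argument.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static bool VarMatchesExplicitType()
    {
        var words = new[] { "The", "quick", "brown", "fox" };

        // Type written out by hand.
        IEnumerable<string> explicitQuery = from word in words
                                            where word.Length > 3
                                            select word;

        // Type inferred by the compiler.
        var inferredQuery = from word in words
                            where word.Length > 3
                            select word;

        return StaticTypeOf(inferredQuery) == typeof(IEnumerable<string>)
            && inferredQuery.SequenceEqual(explicitQuery);
    }

    public static void Main()
    {
        Console.WriteLine(VarMatchesExplicitType()); // prints "True"
    }
}
```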

How to get started

The best way to get started working with LINQ is to read up about it on MSDN. Then download the great tool LINQPad. LINQPad has some great sample LINQ queries and lets you play with LINQ outside of Visual Studio. It’s great for writing short snippets of code and is an ideal sandbox to try out bits of code. LINQPad is free, although Auto Completion is a paid feature (and well worth it). It also lets you run LINQ to SQL queries on a SQL database (and now SQL Compact Edition).

Once you have started familiarising yourself with LINQ you should start using it in your projects. There are two key requirements to using LINQ:

  1. Your project must target version 3.5 of the .NET Framework.
  2. You must include using System.Linq; to reference the LINQ namespace in all code files where you want to use LINQ.

If you don’t have Visual Studio 2008 you can download one of the free express editions from http://www.microsoft.com/express/. Once installed you might also want to turn on line numbers in Visual C# Express.

More to come

There’s plenty of stuff to talk about with LINQ. In my next few posts I’ll cover Extension methods, Lambda expressions, LINQ over objects and much more.