C# LINQ from beginner to expert - Part 2

Quick Recap

Back in Part 1 we started to explore LINQ in C#. You should have learnt the following:

  • LINQ is a framework for manipulating collections of data.
  • It has two distinct syntax styles, but they are actually the same. The query syntax is just sugar layered on the lambda syntax.
  • Queries are not executed until you enumerate the elements in the results.
  • Select() is used to transform data in a collection by applying an operation to each element.
  • Where() is used to filter out elements you are not interested in.

For the whole of Part 2 we will focus on one LINQ operation, SelectMany(). It is a vital part of LINQ, but too many people do not understand its full potential; they just see it as a way to navigate nested data.

SelectMany (Lazy)

At first glance SelectMany() appears to be another mapping operation like Select(), and on the surface this is true. The key difference is that the mapping is one-to-many instead of one-to-one as in Select(). With SelectMany() each element in the source collection is mapped to zero or more values in the results. These results are then concatenated together to flatten them into a single collection.
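To make the zero-or-more mapping concrete, here is a minimal sketch. The input numbers and the Enumerable.Repeat() mapper are illustrative choices, not part of the original examples: each n is mapped to n copies of itself, so 0 contributes nothing to the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative input; the values are an assumption chosen for the demo.
var numbers = new[] { 0, 1, 2, 3 };

// Each n maps to a sequence holding n copies of n (an empty sequence for 0).
// SelectMany() concatenates those sequences into one flat sequence.
List<int> flattened = numbers
    .SelectMany(n => Enumerable.Repeat(n, n))
    .ToList();

Console.WriteLine(string.Join(", ", flattened)); // prints "1, 2, 2, 3, 3, 3"
```

Four source elements went in and six came out, which a one-to-one Select() could never do.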

Common names for this operation in other languages are flatMap, concatMap and bind.

First, a quick example of a common use case: flattening data in a nested class structure. It is similar to a table JOIN in SQL:

// Here we have a collection of people where each person can have
// more than one phone number. We then flatten them into a single
// collection of all the phone numbers.

// Lambda syntax
var lambdaPhoneNumbers = people.SelectMany(x => x.PhoneNumbers);

// Query syntax
var queryPhoneNumbers =
    from person in people
    from phoneNumber in person.PhoneNumbers
    select phoneNumber;
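For anyone who wants to run the snippet above, here is a self-contained sketch. The names and phone numbers are made-up sample data, and the anonymous type stands in for whatever person class you actually have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; an anonymous type stands in for a real Person class.
var people = new[]
{
    new { Name = "Ann", PhoneNumbers = new[] { "555-0100", "555-0101" } },
    new { Name = "Bob", PhoneNumbers = new string[0] }, // no phone numbers at all
    new { Name = "Cat", PhoneNumbers = new[] { "555-0102" } },
};

// Lambda syntax
List<string> lambdaPhoneNumbers = people.SelectMany(x => x.PhoneNumbers).ToList();

// Query syntax
List<string> queryPhoneNumbers =
    (from person in people
     from phoneNumber in person.PhoneNumbers
     select phoneNumber).ToList();

Console.WriteLine(string.Join(", ", lambdaPhoneNumbers)); // prints "555-0100, 555-0101, 555-0102"
Console.WriteLine(lambdaPhoneNumbers.SequenceEqual(queryPhoneNumbers)); // prints "True"
```

Bob has no phone numbers, so he simply contributes nothing to the flattened result.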

As with Select(), you should be able to figure out why this is another lazy operation: it is another mapper, so we only need to map over source elements when they are required.
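One way to see the laziness for yourself is to count how often the mapper runs. This is only a sketch; the counter and the sample values are assumptions added for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var calls = 0; // how many times the mapper has been invoked so far
var source = new[] { 1, 2, 3 };

// Building the query does not run the mapper.
IEnumerable<int> query = source.SelectMany(n =>
{
    calls++;
    return new[] { n, n * 10 };
});

int callsBefore = calls;

// Only three results are requested, so only the first two source
// elements need to be mapped: 1 -> { 1, 10 } and 2 -> { 2, 20 }.
List<int> firstThree = query.Take(3).ToList();

int callsAfter = calls;

Console.WriteLine(callsBefore);                   // prints "0"
Console.WriteLine(string.Join(", ", firstThree)); // prints "1, 10, 2"
Console.WriteLine(callsAfter);                    // prints "2"
```

The third source element is never mapped, because nothing ever asked for a result that depends on it.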

If this were the only use case for SelectMany() we could wrap up here and move along to the next operation, but that is not the case. To hint at its power and flexibility I will implement Select() and Where() using SelectMany(). There are more efficient ways to implement these, but this is to get you thinking.

public static IEnumerable<TR> Select<T, TR>(
    this IEnumerable<T> source, Func<T, TR> mapper)
    => source.SelectMany(x => new[] {mapper(x)});

public static IEnumerable<T> Where<T>(
    this IEnumerable<T> source, Func<T, bool> predicate)
    => source.SelectMany(x => predicate(x) ? new[] {x} : new T[0]);

So what is going on here?

I will start with Select(). All it does is take the result from the mapper and put it in a single-element array. Because SelectMany() flattens the results from the mapper into a single sequence, it takes all the one-element arrays and combines them to produce the same results as Select().

Where() works on the same principle: if we do not want the element it returns an empty array, otherwise it wraps the value in a single-element array. When SelectMany() flattens the results, the empty arrays contribute nothing, so only the elements that satisfied the predicate remain.
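If you want to convince yourself that these behave like the real thing, a quick check along these lines should do. SelectVia and WhereVia are hypothetical names; I have written them as plain local functions rather than extension methods so they cannot clash with the built-in Select() and Where().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new[] { 1, 2, 3, 4, 5, 6 };

// Square every number, then keep the even squares, using only SelectMany().
List<int> viaSelectMany =
    WhereVia(SelectVia(numbers, n => n * n), n => n % 2 == 0).ToList();

// The same pipeline with the built-in operators, for comparison.
List<int> builtIn = numbers.Select(n => n * n).Where(n => n % 2 == 0).ToList();

Console.WriteLine(string.Join(", ", viaSelectMany));     // prints "4, 16, 36"
Console.WriteLine(viaSelectMany.SequenceEqual(builtIn)); // prints "True"

// Hypothetical helper: Select() expressed through SelectMany().
static IEnumerable<TR> SelectVia<T, TR>(IEnumerable<T> source, Func<T, TR> mapper)
    => source.SelectMany(x => new[] { mapper(x) });

// Hypothetical helper: Where() expressed through SelectMany().
static IEnumerable<T> WhereVia<T>(IEnumerable<T> source, Func<T, bool> predicate)
    => source.SelectMany(x => predicate(x) ? new[] { x } : Array.Empty<T>());
```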

As you can see, with some lateral thinking, you can use SelectMany() in some interesting ways. Note that the mapper does not have to use its input value to generate its results. I will show this in both syntaxes; this is a case where the query syntax is better.

var queryIndexPairs =
    from x in Enumerable.Range(0, grid.Width)
    from y in Enumerable.Range(0, grid.Height)
    select Tuple.Create(x, y);

var lambdaIndexPairs = 
    Enumerable.Range(0, grid.Width).SelectMany(
        x => Enumerable.Range(0, grid.Height).Select(
            y => Tuple.Create(x, y)));

The query syntax is far cleaner. The lambda syntax has to make use of nested queries and closures to capture all the data it requires. Many functional programmers will see nothing unusual here, as this kind of nesting is standard practice for them.

The people that created LINQ realized this and provided a second version of SelectMany to clean this up a bit. It handles nested composition for you. The projection gets called for all the combinations of x and y.

public static IEnumerable<TR> SelectMany<T1, T2, TR>(
    this IEnumerable<T1> source, Func<T1, IEnumerable<T2>> mapper,
    Func<T1, T2, TR> projection);

// So this
var lambdaIndexPairs = 
    Enumerable.Range(0, grid.Width).SelectMany(
        x => Enumerable.Range(0, grid.Height).Select(
            y => Tuple.Create(x, y)));

// Becomes this
var lambdaIndexPairs = 
    Enumerable.Range(0, grid.Width).SelectMany(
        _ => Enumerable.Range(0, grid.Height),
        (x, y) => Tuple.Create(x, y));

I used _ for the parameter passed into the mapper to show that the mapper ignores it; the inner sequence does not depend on it. The projection still receives that same value as x.
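Here is a runnable sketch comparing the nested form with the projection overload. The grid object is not defined in this post, so the width and height constants below are stand-ins I have assumed for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed dimensions standing in for grid.Width and grid.Height.
const int width = 2;
const int height = 3;

// Nested form: the inner lambda closes over x.
List<Tuple<int, int>> nested = Enumerable.Range(0, width)
    .SelectMany(x => Enumerable.Range(0, height).Select(y => Tuple.Create(x, y)))
    .ToList();

// Projection overload: the mapper ignores its input, and the
// projection is called once for every (x, y) combination.
List<Tuple<int, int>> projected = Enumerable.Range(0, width)
    .SelectMany(_ => Enumerable.Range(0, height), (x, y) => Tuple.Create(x, y))
    .ToList();

Console.WriteLine(string.Join(" ", projected));
// prints "(0, 0) (0, 1) (0, 2) (1, 0) (1, 1) (1, 2)"
Console.WriteLine(nested.SequenceEqual(projected)); // prints "True"
```

Both forms walk the pairs in the same order: all the y values for the first x, then all the y values for the next x.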

Part 3 is now available.

That is a fair chunk of information to take in for this part. As always if you have any questions feel free to ask :)

Woz
