Monthly Archives: January 2013

My First GitHub Pull Request with LinqToExcel

I recently found myself in a situation dealing with a web application that  needed the ability to allow an administrator-type user upload an Excel list of records and then process that list.  What I mean by process the list is to add records from the Excel file that weren’t already in the database, update records that were different in the Excel file than what was in the database, and mark for deletion any records that were not in the Excel but were in the database.

Letting a user upload an Excel file that in turn modifies data records has always been a task that I’ve had reservations about.  There are so many variables to account for, not the least of which is hoping the user uploaded the correct Excel file in the correct format.  From a technical standpoint, performing an Excel import and the resulting data augmentation has always been sort of dicey too.  However, in order to help me out I thought to use a library that I had used before in another project named LinqToExcel.

As stated succinctly on the project’s GitHub page:

Linq to Excel is a .Net library that allows you to query Excel spreadsheets using the LINQ syntax.

To get started, I reviewed LinqToExcel’s documentation.  It’s actually quite good; so good that I’m not going to bother rehashing it but will provide some resources:

Clicking on those links it becomes obvious that LinqToExcel is simple to use with hardly any learning curve.  Basically, if you know how to use Linq, you can pick up LinqToExcel in a few minutes.

Now to the meat of this blog post: If LinqToExcel is such a great and easy to use tool to query Excel files, why would I find a need to change it?  LinqToExcel has a query property named StrictMapping.  The original implementation of StrictMapping was a boolean property that enforced the Excel worksheet’s columns to match the properties of the target class and vice versa.  If there was a mismatch, an exception is thrown.  For instance, the following situation would pass a StrictMapping enforcement check:

Excel Column Class Property StrictMapping Result
FirstName FirstName ok
LastName LastName ok
Title Title ok

For my situation, the StrictMapping check proved to be an excellent way to verify the user was uploading an Excel worksheet that contained the correct columns.  It’s one of the ways to validate the consumer’s data in the Excel worksheet.  However, in my scenario there were extra columns in the Excel worksheet that I wanted to ignore because they were not properties of my target class.  Because StrictMapping saw a mismatch, an exception was thrown.  For instance, here is a depiction of an Excel worksheet with the added column HireDate and the target class without that property:

Excel Column Class Property StrictMapping Result
FirstName FirstName ok
LastName LastName ok
Title Title ok
HireDate [none] Exception

By not being able to set StrictMapping in this situation, I was losing out on the enforcement it provided to verify the properties in my target class had Excel worksheet data matching it.  I determined that I needed to modify this property in order to make it work for me and luckily LinqToExcel is open source!

I began by forking the project on GitHub and changing the StrictMapping property from a boolean to a nullable enumeration:

public StrictMappingType? StrictMapping { get; set; }

The enumeration has values for multiple new conditions because I figured if I was going to expand StrictMapping, I may as well cover all possible scenarios:

namespace LinqToExcel.Query
{
    /// <summary>
    /// Class property and worksheet mapping enforcemment type.
    /// </summary>
    public enum StrictMappingType
    {
        /// <summary>
        /// All worksheet columns must map to a class property; all class properties must map to a worksheet columm.
        /// </summary>
        Both,

        /// <summary>
        /// All class properties must map to a worksheet column; other worksheet columns are ignored.
        /// </summary>
        ClassStrict,

        /// <summary>
        /// No checks are made to enforce worksheet column or class property mappings.
        /// </summary>
        None,

        /// <summary>
        /// All worksheet columns must map to a class property; other class properties are ignored.
        /// </summary>
        WorksheetStrict
    }
}

I chose to have the StrictMapping property a nullable enumeration by design.  Being the property was previously a boolean, there was logic in place to default the value to False if the property wasn’t explicitly set.  I modified that logic to similarly set the value to StrictMapping.None by default (see line 7 below):

internal ExcelQueryArgs(ExcelQueryConstructorArgs args)
{
	FileName = args.FileName;
	DatabaseEngine = args.DatabaseEngine;
	ColumnMappings = args.ColumnMappings ?? new Dictionary<string, string>();
	Transformations = args.Transformations ?? new Dictionary<string, Func<string, object>>();
	StrictMapping = args.StrictMapping ?? StrictMappingType.None;
}

Finally, I had to update the logic that determines if the StrictMapping condition was met by using the value set to the StrictMapping enum:

private void ConfirmStrictMapping(IEnumerable<string> columns, PropertyInfo[] properties, StrictMappingType strictMappingType)
{
	var propertyNames = properties.Select(x => x.Name);
	if (strictMappingType == StrictMappingType.ClassStrict || strictMappingType == StrictMappingType.Both)
	{
		foreach (var propertyName in propertyNames)
		{
			if (!columns.Contains(propertyName) && PropertyIsNotMapped(propertyName))
				throw new StrictMappingException("'{0}' property is not mapped to a column", propertyName);
		}
	}

	if (strictMappingType == StrictMappingType.WorksheetStrict || strictMappingType == StrictMappingType.Both)
	{
		foreach (var column in columns)
		{
			if (!propertyNames.Contains(column) && ColumnIsNotMapped(column))
				throw new StrictMappingException("'{0}' column is not mapped to a property", column);
		}
	}
}

private bool PropertyIsNotMapped(string propertyName)
{
	return !_args.ColumnMappings.Keys.Contains(propertyName);
}

private bool ColumnIsNotMapped(string columnName)
{
	return !_args.ColumnMappings.Values.Contains(columnName);
}

This was followed by updating and creating supporting positive and negative unit tests.  Once my changes were complete I was able to commit my fork and issue a pull request to the project owner where it was accepted several days later.

I am very excited to say the least.  The change I made to LinqToExcel is really not significant when taking in to account the library as a whole.  However, this was my first open source software contribution and it feels great to have been able to contribute something useful.  It is a true testament to the community of open source software that I was able to use this very helpful (and free) library and then contribute to it by solving a specific business need that I encountered.

Tags: , Filled Under: Programming Posted on: January 7, 2013