Posts Tagged ‘xml’

Writing XML with XElement

Friday, February 27th, 2009

In my last post we looked at how you can use LINQ to XML and XElement to parse XML. But what if you want to create XML files programmatically? Or modify an existing XML document?

Let’s start by looking at how we might add a new entry to our blog. Here is the XML file again:

<?xml version="1.0" encoding="UTF-8"?>
<Blog>
   <Entries>
      <Entry Archived="false">
         <Title>My First Post</Title>
         <Body>I love LINQ. It's the best</Body>
         <Comments>
            <!-- TODO: Shouldn't comments have authors? -->
            <Comment>I love LINQ more</Comment>
            <Comment>LINQ is the way of the future.</Comment>
         </Comments>
      </Entry>
   </Entries>
</Blog>

So we want to add a new Entry under the Entries element. We’ll also assume that our XML file has been parsed into an XElement variable blog.

We’ll start by creating our entry first:

var entry = new XElement("Entry");
entry.SetAttributeValue("Archived", false);
entry.Add(new XElement("Title", "My Second Post"));
entry.Add(new XElement("Body", "Just a quick post."));
entry.Add(new XElement("Comments"));

We started by creating the element, then set the “Archived” attribute and added the other necessary elements. I’ve still added the Comments element even though it will be empty. Depending on the rules that have been set for how the XML should be laid out, it might be optional.
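The same entry can also be built in a single expression, because the XElement constructor accepts child content after the name. The sketch below is one way to do that; the EntryBuilder class and BuildEntry method are names I’ve made up for illustration, not part of LINQ to XML.

```csharp
using System.Xml.Linq;

public static class EntryBuilder
{
    // Hypothetical helper: builds an <Entry> element in one expression.
    public static XElement BuildEntry(string title, string body)
    {
        return new XElement("Entry",
            new XAttribute("Archived", false),  // the Boolean is stored as the string "false"
            new XElement("Title", title),
            new XElement("Body", body),
            new XElement("Comments"));          // left empty on purpose
    }
}
```

Calling EntryBuilder.BuildEntry("My Second Post", "Just a quick post.") should give an element equivalent to the one built step by step above. Which style you prefer is largely a matter of taste; the nested form tends to mirror the shape of the XML it produces.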

To check that my code worked I plugged it into LINQPad and dumped the value of entry like so:

entry.ToString().Dump();

The results showed me the following:

<Entry Archived="false">
   <Title>My Second Post</Title>
   <Body>Just a quick post.</Body>
   <Comments />
</Entry>

Wow, that’s exactly what we want. Even though we used a Boolean value instead of a String for the attribute, XElement was smart enough to write it out as the lowercase string “false” (rather than the “False” that Boolean.ToString() would give). The XML is also nicely formatted and readable. I added the call to ToString() to emphasise that it wasn’t LINQPad that was responsible for the improved formatting.

What we have done here is generate an XML fragment. Sometimes it is easier to think of large XML files as smaller fragments that can be handled independently.

So now all we have to do is find the Entries element and add our entry XElement to it like so.

blog.Element("Entries").Add(entry);

This will leave us with the final XML looking like this:

<Blog>
   <Entries>
      <Entry Archived="false">
         <Title>My First Post</Title>
         <Body>I love LINQ. It's the best</Body>
         <Comments>
            <!-- TODO: Shouldn't comments have authors? -->
            <Comment>I love LINQ more</Comment>
            <Comment>LINQ is the way of the future.</Comment>
         </Comments>
      </Entry>
      <Entry Archived="false">
         <Title>My Second Post</Title>
         <Body>Just a quick post.</Body>
         <Comments />
      </Entry>
   </Entries>
</Blog>
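One thing to watch with blog.Element("Entries").Add(entry) is that Element() returns null when there is no Entries element, so the call would throw a NullReferenceException. Here is a minimal sketch of a more defensive version. BlogEditor and AddEntry are hypothetical names, and creating the missing container (rather than throwing) is just an assumption about what you might want.

```csharp
using System.Xml.Linq;

public static class BlogEditor
{
    // Hypothetical helper: appends an entry under <Entries>,
    // creating the <Entries> container first if the blog has none.
    public static void AddEntry(XElement blog, XElement entry)
    {
        var entries = blog.Element("Entries");
        if (entries == null)
        {
            entries = new XElement("Entries");
            blog.Add(entries);
        }

        entries.Add(entry);  // Add() appends after any existing children
    }
}
```

Because Add() appends to the end of the parent’s children, the new entry lands after “My First Post”, as in the XML above.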

What about our XML declaration?

You might be wondering why the ToString() method of XElement doesn’t include the XML declaration. The reason is that an XElement represents a fragment of XML which could appear anywhere in an XML document. If it included the XML declaration it would lose this flexibility. However, there is a workaround if you are outputting to a final file.

var blogDump = new StringBuilder();
blog.Save(new StringWriter(blogDump));

The Save() method on XElement automatically adds an appropriate XML declaration, which is probably a good idea as it sorts out the complicated things like the encoding and XML version (which I’ve never seen as anything other than 1.0 to date). The Save() method can take the name of a file (as a String), an XmlWriter or a TextWriter. In the example above I’ve used a StringWriter (which is a subclass of TextWriter) to save the XML into a StringBuilder, which I can then turn into a string by calling blogDump.ToString(). Be aware that a StringWriter reports its encoding as UTF-16, so the declaration produced this way says encoding="utf-16" rather than the UTF-8 of our original file. The file name and TextWriter versions of Save() also take a second parameter, SaveOptions. Passing SaveOptions.DisableFormatting saves your XML without the extra indentation whitespace that I’ve shown above. If you want to save those bytes it might be worth looking at this option.
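To make the Save() options a little more concrete, here is a small sketch that wraps the StringWriter approach in a helper. BlogSerializer and ToXmlWithDeclaration are names I’ve invented for the example; Save(TextWriter, SaveOptions) and the SaveOptions values are the real LINQ to XML API.

```csharp
using System.IO;
using System.Xml.Linq;

public static class BlogSerializer
{
    // Hypothetical helper: returns the XML, including its declaration, as a string.
    // Pass SaveOptions.None for indented output or
    // SaveOptions.DisableFormatting to skip the indentation whitespace.
    public static string ToXmlWithDeclaration(XElement root, SaveOptions options)
    {
        using (var writer = new StringWriter())
        {
            // A StringWriter reports UTF-16, so the declaration
            // will read encoding="utf-16".
            root.Save(writer, options);
            return writer.ToString();
        }
    }
}
```

If you need the declaration to say UTF-8, saving straight to a file name is probably the simplest route, since the string-based approach always reflects the StringWriter’s own encoding.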

Where do we go from here?

I haven’t yet decided what my next LINQ post will cover (although LINQ to Entities is high on the agenda), so I won’t promise anything here now. I have much more to say still about LINQ, so feel free to post in the comments suggestions for areas to cover in future posts and the areas you would like to see covered in more detail. So far this has been fairly introductory and we’ll be building towards more advanced topics over the coming weeks.

XML Made Easy with LINQ to XML

Wednesday, February 25th, 2009

XML is a fantastic way to structure information. Here are the two things I like most about XML.

  1. Its fundamental concepts are simple, making many XML files readable by regular humans.
  2. The formalised structure enables re-use of a more generalised XML parser.

Projects can certainly suffer from too much XML, or from using XML where a better option exists. Once your XML files become too difficult to read in a text editor it may be better to look at another option (or to design your XML schema better).

A lightning fast introduction to XML

Skip this section if you already know XML, but take time to look at this XML sample as it will be used throughout the article.

<?xml version="1.0" encoding="UTF-8"?>
<Blog>
   <Entries>
      <Entry Archived="false">
         <Title>My First Post</Title>
         <Body>I love LINQ. It's the best</Body>
         <Comments>
            <!-- TODO: Shouldn't comments have authors? -->
            <Comment>I love LINQ more</Comment>
            <Comment>LINQ is the way of the future.</Comment>
         </Comments>
      </Entry>
   </Entries>
</Blog>

Above is an example of a simple XML file. XML files follow a structured pattern called a schema. The schema defines the rules for what is allowed where and generally defines the structure of your file. Fortunately you don’t need to write a formal schema to get started with XML. Instead you can just start laying out your data. That’s where the “X” in XML comes from, because it is eXtensible.

So the sample XML above is being used to store the contents of a simple blog. XML isn’t the best way to do this, but a blog is a simple well understood concept. If you read my article on LINQ to SQL you might notice that this is very similar to the database example I used there.

Every XML document should start with what is known as an XML declaration. It’s in the first line of the XML and defines the version of the XML as well as the encoding of the file. If you are using Notepad you can select the encoding when you save the file. The topic of encodings is outside the scope of this article.

The next important element that all XML files need is a root node. In this example our root node is called “Blog” and it holds all of our other elements. There can only be one root node in an XML document so if we wanted another blog we would have to put it in another XML file or redesign our XML to have a new root node (such as BlogCollection).

From there we can see that our XML document is made up of two key parts, elements and attributes. Elements are the things in angle brackets (called tags) and an element continues until it is closed with a matching closing tag. Closing tags are different from regular tags as they have a forward slash (/) before the name of the tag. We will use the term element to describe everything from the opening tag (a regular tag) to the closing tag, and a tag as the bit with the angle brackets.

There is also a special kind of tag called a self-closing tag that is both an opening tag and a closing tag. These tags have a forward slash before the closing angle bracket. For example:

<SelfClosingTag />

The space before the forward slash is optional (and stems back to compatibility with HTML). Personally I like keeping the space there, but your project may have different rules.

The other important concept is attributes. Attributes go inside the opening tag to provide more information about an element. Each attribute name can only be used once per element (but one element can have multiple, differently named attributes). In the example above, we have given the Entry element the Archived attribute.

Sometimes it can be difficult to determine whether data should be expressed as an attribute or as a child element (an element inside another element). Typically the rule of thumb is that an attribute should be describing metadata, that is extra information about the element itself and how it might be interpreted. Occasionally this doesn’t clear things up at all. If you are still confused, consider the complexity of the data and whether multiple instances of the data will be required. Complex and repeating data is a sure sign that you want to use an element.

Importantly elements can contain other elements which can in turn contain more elements (and so on). XML follows a very strict hierarchy (which makes it easy to navigate) so an element must be closed inside the element that it was opened in. This means that any element (except the root node of course) has one and only one parent element. If you are modelling structured data it is unlikely you’ll run into troubles.

Finally, I’ve also added a comment to remind me to add authors to the comments. We won’t actually be doing this; it is merely there to demonstrate how you can include comments in your XML documents. Comments should be ignored when parsing an XML file as they are unrelated to the data. Comments begin with <!-- and end with -->.

Ok, so by now you should know enough about XML to understand how we can parse this XML file and pull the necessary elements.

Now for the exciting stuff

LINQ to XML is a set of classes designed to work well with LINQ. It provides a very simple API that allows XML to be read and written with ease.

The centre of your LINQ to XML world is XElement. Through XElement we can access all of the important information in the sample above. Let’s start by writing a query that can help us get the Blog entries to display on the front page. We’ll assume I’ve loaded the XML as a string into a variable called blogXml.

var blog = XElement.Parse(blogXml);

var frontPage = from e in blog.Descendants("Entry")
                where e.Attribute("Archived").Value == "false"
                select e;

foreach (var entry in frontPage)
{
   WriteBlogTitle(entry.Element("Title").Value);
   WriteBlogBody(entry.Element("Body").Value);
   WriteBlogCommentCount(entry.Descendants("Comment").Count());
}

This example does absolutely no error checking (something you’ll definitely want to do if you are working with real XML) but demonstrates how simple it is to find particular elements inside XML. Additionally you can use XElement objects to pass XML fragments around your application. We could have made our LINQ query return an anonymous type that pulled out the Title, Body and Comment count for each entry, but instead we just pulled out the XElement itself. From there we were able to count the comments inside our loop.
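For comparison, here is roughly what the anonymous type version might look like. It is only a sketch: the FrontPage class, the Summaries method and the “Title (n comments)” format are my own inventions, and like the original query it does no error checking.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FrontPage
{
    // Hypothetical helper: summarises each non-archived entry as "Title (n comments)".
    public static List<string> Summaries(XElement blog)
    {
        // Project each entry into an anonymous type instead of returning the XElement.
        var frontPage = from e in blog.Descendants("Entry")
                        where e.Attribute("Archived").Value == "false"
                        select new
                        {
                            Title = e.Element("Title").Value,
                            Body = e.Element("Body").Value,
                            CommentCount = e.Descendants("Comment").Count()
                        };

        return frontPage
            .Select(p => string.Format("{0} ({1} comments)", p.Title, p.CommentCount))
            .ToList();
    }
}
```

The trade-off is that the anonymous type only carries the values chosen in the query, whereas returning the XElement leaves the rest of the entry available to whoever consumes it.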

There is nothing preventing you from using these fantastic classes without LINQ queries. In fact, most of the XML parsing code I’ve written lately doesn’t use LINQ queries at all to find elements, just the methods of the XElement class. Let’s look at the ones you’ll likely use most. Don’t worry that these methods take an XName as their parameter; strings are implicitly converted to an XName. You’ll need to work with XName more directly if you are dealing with namespaces (which I’ll discuss in a future post).

  • Element(XName name) returns the first immediate child element with the given name. If the element does not exist it returns null.
  • Elements() returns an IEnumerable<XElement> of all the immediate child elements. So against Blog the enumeration would yield a single “Entries” XElement. If there are no child elements the enumeration will be empty.
  • Elements(XName name) returns an IEnumerable<XElement> of all the immediate child elements with the given name. If no elements with the name exist it will return an empty enumeration.
  • Attribute(XName name) returns an XAttribute that is the attribute with the specified name. If the attribute does not exist it returns null.

To match the Elements() methods there are also Descendants() and Descendants(XName name) methods (there is no singular Descendant() method). These work in the same way, except that they return matching elements at any depth under the node rather than just the immediate children. We used Descendants() when we were finding the Entry elements as we didn’t care about the rest of the document’s hierarchy.
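The difference between immediate children and descendants is easiest to see by counting. The following is a small sketch with made-up names (NavigationDemo and Describe); it only uses the methods described above.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NavigationDemo
{
    // Hypothetical helper: reports how many elements each navigation method finds.
    public static string Describe(XElement blog)
    {
        int children = blog.Elements().Count();                 // immediate children only
        int directComments = blog.Elements("Comment").Count();  // Comment elements directly under the root
        int allComments = blog.Descendants("Comment").Count();  // Comment elements at any depth
        int allElements = blog.Descendants().Count();           // every element below the root

        return string.Format(
            "children={0}, directComments={1}, allComments={2}, allElements={3}",
            children, directComments, allComments, allElements);
    }
}
```

Run against the sample blog, this should report children=1, directComments=0, allComments=2, allElements=7. The TODO comment in the sample is not counted because it is an XML comment, not an element.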

Because Element() and Attribute() return null if the element (or attribute) is not found, it is important to check that the result is not null before using it. The methods that return an IEnumerable<T> never return null; they return an empty enumeration instead.
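Here is a minimal sketch of what that checking might look like. SafeAccess and its two methods are hypothetical names, and treating a missing Archived attribute as “not archived” is an assumption you would want to confirm against your own rules. The explicit (bool?) conversion on XAttribute is part of LINQ to XML and returns null when the attribute itself is null.

```csharp
using System.Xml.Linq;

public static class SafeAccess
{
    // Returns the title text, or the fallback when <Title> is missing.
    public static string TitleOrDefault(XElement entry, string fallback)
    {
        var title = entry.Element("Title");
        return title != null ? title.Value : fallback;
    }

    // Assumption: an entry without an Archived attribute is not archived.
    public static bool IsArchived(XElement entry)
    {
        // The (bool?) conversion yields null for a missing attribute;
        // it still throws a FormatException if the value is not a valid Boolean.
        return (bool?)entry.Attribute("Archived") ?? false;
    }
}
```

The conversion operators save a few null checks, but they don’t remove the need to decide what a missing value should mean for your application.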

Where to from here?

You now know all the important classes needed to parse XML files (perhaps to load up some strongly typed objects). In my next post I’ll be discussing how you can use this same class to build complex XML structures. In the meantime, check out the MSDN documentation for XElement.