Subversion is Xmlish

You know that it is a bad day when you start parsing XML using regular expressions.

When I started working on SvnBridge, I expected to have a lot of issues with TFS. What I didn't expect was to get hit by a Subversion WTF of gigantic magnitude.

Take a look at the following XML:

<?xml version="1.0" encoding="utf-8" ?>
<D:propertyupdate xmlns:D="DAV:" 
    xmlns:V="http://subversion.tigris.org/xmlns/dav/" 
    xmlns:C="http://subversion.tigris.org/xmlns/custom/" 
    xmlns:S="http://subversion.tigris.org/xmlns/svn/">
    <D:set>
        <D:prop>
            <C:bugtraq:label>Work Item:</C:bugtraq:label>
            <C:bugtraq:url>http://www.codeplex.com/SvnBridge/WorkItem/View.aspx?WorkItemId=%BUGID%</C:bugtraq:url>
            <C:bugtraq:message> Work Item: %BUGID%</C:bugtraq:message>
            <C:bugtraq:number>true</C:bugtraq:number>
            <C:bugtraq:warnifnoissue>true</C:bugtraq:warnifnoissue>
        </D:prop>
    </D:set>
</D:propertyupdate> 

Check the properties, and look closely. Despite the so-called XML header, this is not valid XML, yet it is produced (and consumed) by Subversion. This raises some interesting questions about what parser they are using, but that is beside the point.

This is wrong, period.
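
The trouble is the second colon in the property element names. Under Namespaces in XML, an element name must be a QName: either a single colon-free name, or a prefix and a local part joined by exactly one colon. The Python sketch below only illustrates that rule. The function name `is_qname` is made up for this example, and the pattern is an ASCII-only approximation of the real NCName production. It is not how Subversion or any particular parser implements the check.

```python
import re

# ASCII-only approximation of NCName (an assumption to keep the sketch short;
# the real production also allows many non-ASCII characters).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_qname(name: str) -> bool:
    """Return True if name is NCName or NCName ':' NCName."""
    parts = name.split(":")
    if len(parts) > 2:
        # More than one colon can never form a prefix:local pair.
        return False
    return all(NCNAME.match(part) is not None for part in parts)


if __name__ == "__main__":
    for name in ("D:prop", "propertyupdate", "C:bugtraq:label"):
        print(name, is_qname(name))
```

With this check, `D:prop` and `propertyupdate` pass, while `C:bugtraq:label` fails because it splits into three parts.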

Posted on Tuesday, March 04, 2008 7:03 AM

Feedback


# re: Subversion is Xmlish 3/4/2008 9:21 AM Grimace of Despair

Moreover, it has long been easy to make the svn command line spit out invalid XML. Failing operations can leave open tags behind. This is especially frustrating in automated builds that consume the svn output to get status information.


# re: Subversion is Xmlish 3/4/2008 11:34 AM Rik Hemsley

Did you find any documentation on the svn wire protocol? I wanted to work with it a while back but couldn't find anything useful, even after talking to several svn developers.


# re: Subversion is Xmlish 3/4/2008 3:18 PM Ayende Rahien

Rik,
What SvnBridge did is reverse engineer the protocol.
You can take a look at TestsProtocol to see how it was done.
Basically, it started from a TCP-level sniffer, and the tests were built from there.


# re: Subversion is Xmlish 3/4/2008 3:26 PM Rik Hemsley

I was tempted to do this myself, but it worried me that I'd make too many assumptions - or that the protocol would change 'under' me too rapidly.


# re: Subversion is Xmlish 3/4/2008 3:33 PM Ayende Rahien

Rik,
Why are you trying to simulate the protocol?

You don't have to worry about it changing, I would say. Too much relies on it.

You can also take the SvnBridge source code and use that as a base. You would need to supply an implementation of ISourceCodeProvider, but that's about it.


# re: Subversion is Xmlish 3/4/2008 5:23 PM Rik Hemsley

I was going to write a managed library for talking to svn servers, then use svn as the backend for a website - the idea of revision history being part of the storage was attractive.


# re: Subversion is Xmlish 3/4/2008 5:25 PM Ayende Rahien

Why not use SVN itself for that?


# re: Subversion is Xmlish 3/4/2008 5:28 PM Rik Hemsley

You mean the command line program? Because its output is not good enough to parse.


# re: Subversion is Xmlish 3/4/2008 5:40 PM Ayende Rahien

The output of svn.exe is explicitly designed to be parsed by machines.
But I meant using the SVN server


# re: Subversion is Xmlish 3/4/2008 7:41 PM Rik Hemsley

The svn developers I talked to told me that parsing the output of svn.exe is painful. Perhaps it has similar problems to the one you are seeing.

I'm not sure what you mean by 'using' the SVN server. If I'm not checking in / checking out / diffing etc. using svn.exe or the wire protocol, what else is there?


# re: Subversion is Xmlish 3/4/2008 7:58 PM Ayende Rahien

Storing the information in Subversion itself.


# re: Subversion is Xmlish 3/4/2008 8:42 PM Rik Hemsley

By using svn.exe or talking to the server over the wire, I _am_ storing the information in Subversion itself.


# re: Subversion is Xmlish 3/4/2008 8:59 PM orcmid

If it is the namespace peculiarity that bothers you (DAV: not exactly being a URI), it goes back into WebDAV history, where the not-yet-solid namespace specifications were misunderstood.

I didn't look for other problems, but I suspect they have an origin in a misunderstanding of WebDAV (and any misunderstandings that still lurk in WebDAV).


# re: Subversion is Xmlish 3/4/2008 9:03 PM Ayende Rahien

Dennis,
Take a look at the element name:
<C:bugtraq:label>


# re: Subversion is Xmlish 3/5/2008 2:21 AM Jan Limpens

SQL Server 2005 has something similar for its thesaurus:

<thesaurus xmlns="x-schema:tsSchema.xml">
<diacritics = false/>
</thesaurus>

It does not make life any easier.


# re: Subversion is Xmlish 3/5/2008 4:31 PM Paul Hatcher

That last one is a perfectly valid uri; there's only one colon and both the left and right are valid names.


# re: Subversion is Xmlish 3/5/2008 8:49 PM Rob Levine

The <C:bugtraq:label> certainly looks nasty, but I do believe it is syntactically allowable as an element name, as the XML specification seems to permit it.
It does note that colons have a reserved meaning in the Namespaces in XML specification, so authors shouldn't use them in element names, but it also says that parsers *must* accept them.

A very quick look at the relevant parts of the specification is here:
http://blog.roblevine.co.uk/?p=11

That is not to say I think this sample of XML is actually *nice*...


# re: Subversion is Xmlish 3/5/2008 9:16 PM Ayende Rahien

Rob,
No XML parser that I tried could handle them, and I tried 3 different ones.
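
For illustration only, the regular-expression fallback the post opens with might look roughly like the Python sketch below. This is not the SvnBridge code. The function name `extract_custom_props` is hypothetical, the `C` prefix is hard-coded even though a client could bind the custom namespace to a different prefix, and property values are assumed to be plain text with no nested elements or CDATA sections.

```python
import re
from xml.sax.saxutils import unescape

# Matches <C:name>value</C:name>, where name may itself contain colons.
# Assumption: the custom namespace is always bound to the prefix "C".
PROP = re.compile(
    r"<C:(?P<name>[\w:.-]+)>(?P<value>.*?)</C:(?P=name)>",
    re.DOTALL,
)

# Predefined entities that unescape() does not handle by default.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def extract_custom_props(body: str) -> dict:
    """Return {property name: text value} for every C:-prefixed element."""
    return {
        match.group("name"): unescape(match.group("value"), EXTRA_ENTITIES)
        for match in PROP.finditer(body)
    }


SAMPLE = """<D:prop>
    <C:bugtraq:label>Work Item:</C:bugtraq:label>
    <C:bugtraq:number>true</C:bugtraq:number>
</D:prop>"""

if __name__ == "__main__":
    for name, value in extract_custom_props(SAMPLE).items():
        print(name, "=", value)
```

The obvious cost is that this ignores everything a real parser gives you, such as prefix resolution, comments, and CDATA, which is exactly why reaching for regular expressions makes it a bad day.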


# re: Subversion is Xmlish 3/6/2008 7:04 PM Rob Levine

Interestingly enough, it turns out that you can get the .NET XmlTextReader to accept this format (not that I knew this until about an hour ago).
More info here:
http://blog.roblevine.co.uk/?p=12
I don't know if that is any help in your quest to read Subversion's XML, but maybe...
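
The linked post is not reproduced here, so the following is a sketch of one general approach rather than a description of it: a parser that is not doing namespace processing treats the colon as an ordinary name character. The Python example below uses the standard `xml.parsers.expat` module, which leaves namespace processing off unless a `namespace_separator` is passed to `ParserCreate`. The helper name `element_names` and the trimmed sample document are made up for this illustration.

```python
import xml.parsers.expat

# Trimmed version of the request body, kept on the same structure.
SAMPLE = (
    '<D:propertyupdate xmlns:D="DAV:" '
    'xmlns:C="http://subversion.tigris.org/xmlns/custom/">'
    "<D:set><D:prop>"
    "<C:bugtraq:label>Work Item:</C:bugtraq:label>"
    "</D:prop></D:set>"
    "</D:propertyupdate>"
)


def element_names(document: str) -> list:
    """Parse document without namespace processing and list element names."""
    names = []
    # No namespace_separator argument, so expat does no namespace processing
    # and reports "C:bugtraq:label" as a single opaque name.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)
    return names


if __name__ == "__main__":
    print(element_names(SAMPLE))
```

The trade-off is that the caller now has to map prefixes such as `D` and `C` to their namespace URIs by hand, using the `xmlns:` attributes.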

Comments have been closed on this topic.