Ayende @ Rahien

It's a girl

Rhino ETL: Importing Data into MS CRM

Okay, so this is the "coding in anger" part for Rhino ETL. I need to import files into MS CRM entities. The files are standard CSV files, with the usual corruption of values that such files have. The CRM is accessed through the web services, although I am keeping aside the option of direct DB access, if I can't get the Web Services to perform any faster.

The first problem that I had was that the MS CRM Web Services are not simple services. They accept entities that are defined in the WSDL for them, not simple values. That put me in a complexity spin for a while, until I remembered that I am not working in my own little language, I am working on .NET. A quick trip to Visual Studio and an Add Web Reference + Compile later, I had integrated accessing the MS CRM into Rhino ETL.

Here is how it was done:

import CrmProxy.Crm from CrmProxy

Basically it means that I now had a dll that contains the proxy definitions for the web service, and I imported it. So it is incredibly easy to use.

Then, it was the matter of reading the file. Rhino ETL has integrated with the FileHelpers library, and I couldn't really be happier about it. There are several reasons for that, but the main one is that I run into something that the library can't handle, and I fixed that in 10 minutes, without changing the library code. Speaking of software that I like, this is one of the main criteria that I use to evaluate a piece of software. What happens when I step off the ledge? With FileHelpers, I can extend it so easily, that I really don't care about that.

Anyway, here is a part of the class definition for our file: 

[DelimitedRecord(","), IgnoreFirst]
class Customer:
      [FieldConverter(ConverterKind.Date, "dd/MM/yyyy")] 
      UpdateDate as date
      Id as int
      Name as string
      ResponsibleEmployee as Nullable of int
      [FieldConverter(Rhino.ETL.FileHelpersExtensions.DateTimeConverterWithNullValue, "dd/MM/yyyy","00/00/0000")] 
      ReceptionDate as Nullable of date

As you can see, there isn't much to it except defining the fields, types, etc.

source CustomersFile:
     execute:
            file = Read(typeof(Customer)).From(Configuration.CustomerFile)
            file.OnError(ErrorMode.SaveAndContinue)
            for customer in file:
                  print "Source ${customer.Id}"
                  SendRow( Row.FromObject(customer) ) 
            if file.HasErrors:
                  file.OutputErrors(Configuration.CustomerErrorsFile)
                  AddError("Errors have been written to ${Configuration.CustomerErrorsFile}")

Here I read from the file, use the Row.FromObject() to translate an entity into a row, and then send it forward. One amazing thing here is that FileHelpers will generate an errors file for me on demand. And that one is clear and concise and actually useful. Comparing to the amount of effort that I know are required to pull reasonable errors from SSIS file input, that is a great pleasure.

Anyway, if you missed that, I am very happy about FileHelpers.

Another thing to point out is the Configuration.CustomerFile, etc. The Configuration object is dynamically populated from a config file that you can pass to Rhino ETL (command line arg), which is a simple xml file in the format:

<configuration>
	<CustomerErrorsFile>D:\customers_errors.txt</CustomerErrorsFile>
</configuration>

Why XML? Because this seems like a place where I would want to touch with stuff like xmlpoke, etc. So it is easier to work with. It is also a flat configuration scheme, that doesn't have any semantics other than the simple key/value pair.

So, now that I have the data, I can send it to the destination:

destination Crm:
      initialize:
            Parameters.Srv = CrmService(
                  Url: Configuration.Url,
                  Credentials: NetworkCredential(
                            Configuration.Username,
                           
Configuration.Password,
                           
Configuration.Domain),
                  CallerIdValue: CallerId(CallerGuid: Guid(Configuration.CallerId)),
                  UnsafeAuthenticatedConnectionSharing: true,
                  PreAuthenticate: true
                  )

      onRow:
            theAccount = account(
                  accountnumber: Row.Id.ToString(),
                  name: Row.Name,
                  telephone1: Row.Phone,
                  telephone2: Row.Cellular,
                  telephone3: Row.AdditionalPhone,
                  fax: Row.Fax,
                  accountreceptiondate: CrmDateTime(Value: Row.ReceptionDate.ToString("yyyy-MM-ddT00:00:00")),
                  address1_city: Row.City,
                  )
            result = Parameters.Srv.Create(theAccount)
            print "Created account ${Row.Id} -> ${result}"

      cleanUp:
            Parameters.Srv.Dispose()

As you can see, we have the initialize method, which creates the service, then we instansiate an account instance, fill it with the required parameters, and go to town. It is also notable the easy translation of types from CLR types to CRM types, such as in the case of accountreceptiondate.

All in all, the only difficulities that I had during this were to make heads or tails from the inforamtion in the file, which is where I want the difficulity to lie when I am dealing with ETL processes.

Comments

Mark Monster
10/16/2007 01:41 PM by
Mark Monster

Hi Oren,

I like it, but two things.

1) Never touch the database directly.

2) Does it support settings import ? ;-) I guess not, because settings cannot be changed through the webservices. I'm still waiting for a Titan VPC on my desk, so I can see if CRM 4.0 has changed this.

Are you setting the id's on your own side? I thought it was generated serverside. Because I've heard something that CRM has done somethings to make sure the Global not-so Unique ID is Unique. Also there is some logic in the ID's generated (look in the settings for more information about this).

Ayende Rahien
10/16/2007 01:55 PM by
Ayende Rahien

1/ I am getting ~700 records per minute, which is ridiculously low. I can get tens of thousands using the DB directly.

2/ I am not even sure what settings import is, so the answer is no.

Note that I am setting the account number, not the id.

Guid is GUID, there is a guarantee that they will be unique no matter what you do to them.

I have yet to hear of a real world guid collision case.

Glyn
10/17/2007 08:44 AM by
Glyn

I love the Idea of your ETL framework, and have been playing about with it as a way of introducing myslef to Boo. I was running some tests importing over 31900 sql db rows from one DB into another, but after the first 15000 or so rows the software just hangs.

Is there a debug switch that can be triggered to see what is happening?

Comments have been closed on this topic.