Tobin Harris has asked some questions about how Rhino.ETL will handle transformations. As you can see, I consider this something as trivial as a FizzBuzz test, which is a Good Thing, since it really should be so simple. Tobin's questions really show the current pain points in ETL processes.
- Remove commas from numbers
transform RemoveCommas: for column in row.Columns: if row[column] isa string: row[column] = row[column].Replace(",","")
transform TrimEmptyStringToNull: for column in row.Columns: val = row[column] if val isa string: row[column] = null if val.Trim().Length == 0
transform IntroduceSpace: row.PostalCode = row.PostalCode.Substring(0,4) +' ' + row.PostalCode.Substring(4)
transform MakeTitleCase: row.Title = row.Name.Substring(0,1).ToUpper() + row.Name.Substring(1)
transform RemoveRowsWithoutId: RemoveRow() if not row.Id
transform TranslateDate: row.Date = date.Parse(row.Date).ToString("yyyy-MM-dd")
transform RemoveBadDate: tmp as date row.Date = null if not date.TryParse(row.Date, tmp)
Things that I don't have an implementation of are:
- Remove repeated column headers in data - I don't understand the requirement.
- Unpivot repeated groups onto new rows, Unpivot( startCol, colsPerGroup, numberOfGroups) - I have two problems here, I never groked pivot/unpviot fully, so this require more research, but I have a more serious issue, and that is that this is a transformation over a set of rows, and I can't thing of a good syntax for that, or the semantics it should have.
I am opened for ideas...