Wednesday, March 19, 2008

Twining: Iron Python DSL for DB Update V

{

I'm still at it with Twining. A few projects and events have distracted me temporarily but I had a few moments last night to do a few things that I've intended to for some time.  First a summary of functionality with what's new in red.

cn = "Data Source=.\\sqlexpress;Initial Catalog=MyDB1;Integrated Security=SSPI;"
cn2 = "Data Source=.\\sqlexpress;Initial Catalog=MyDB2;Integrated Security=SSPI;"

# dest, source (you can use the same source column over and over)
# dest, source can be field names, example below uses ordinal position
colmap = {"DestinationField1":1, "DestinationField2":2, "DestinationField3":1}
database(cn).table("MyTable").mapcolumns(colmap).importfrom.csv("C:\\foo.csv")

#old stuff
mytrans = {"Field1":"upper","Field2": lambda x: x.upper(),"Field3": lambda x: x[::-1]}
database(cn).table("MyTable").transform(mytrans).copyto.database(cn2)
database(cn).table("MyTable").transform(mytrans).copyto.xmlFieldsAsAttributes("c:\\here.xml")
database(cn).table("MyTable").transform(mytrans).copyto.xmlFieldsAsElements("c:\\there.xml")
database(cn).table("MyTable").transform(mytrans).copyto.delimited("\t", "c:\\foaf.tsv")
database(cn).table("MyTable").copyto.delimited("\t", "c:\\foaf_no_trans.tsv")
database(cn).table("MyTable").transform(mytrans).copyto.csv("c:\\foo.csv")
database(cn).table("MyTable").copyto.xmlFieldsAsAttributes("c:\\here.xml")
database(cn).table("MyTable").copyto.xmlFieldsAsElements("c:\\there.xml")
database(cn).backupto("C:\\temp\\loungin.bak")

Footnotes: I'm working on changing my implementation of all things CSV/delimiter to using ADO.NET providers - the manually concatenated strings were just proof of concept. Back in 2004 in a thread that none other than Joel Spolsky responded to I learned not to try to reinvent a very hard CSV wheel.

The other bigger change I made was breaking the project apart. What was originally in multiple files as a "sketch" was becoming a nuisance besides poorly structured. I've deconstructed into the following files:

registrar.py - holds global, static data.
sqlhelper.py - runs dynamic sql statements
errorhandle.py - exception handling
database.py - reference to databases, constituent tables, and holds the copyto and importfrom objects
copyto.py - object for exporting as CSV, XML, and database to database
importfrom.py - object for importing data (currently on rough sketch for CSV import)

Most remains unchanged from the last upload but here is the code for the importfrom functionality:

import clr
clr.AddReference("System.Data")
from System import *
from System.Data import *
from System.Data.SqlClient import *
from System.Data.OleDb import *
from System.IO import *

from registrar import *
from sqlhelper import *


class importfrom:
"""Model data import process"""

def __init__(self):
self.columnmap = {}

def build_destination(self):
""" If destination does not exist, construct table"""
pass

def csv(self, path):
""" Import data from a csv file """
fileName = path.split("\\")[-1]
fileDir = path.replace(fileName, "")
connect = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='text;HDR=No;FMT=Delimited'"
csvConnection = OleDbConnection(String.Format(connect, fileDir))
csvConnection.Open()
csvCommand = OleDbCommand("select * from " + fileName, csvConnection)
sourceReader = csvCommand.ExecuteReader(CommandBehavior.CloseConnection)
masterInsert = ""

while sourceReader.Read():
cols, vals = "", ""
for k in self.column_map.keys():
cols, vals = cols + "[" + k + "],", vals + "'" + str(sourceReader[self.column_map[k]]) + "',"
masterInsert += "insert into " + registrar.tableRegister + "(" + cols[:-1] + ")VALUES(" + vals[:-1] + ")\n"
sourceReader.Close()

try:
sqlhelper().run_sql(registrar.databaseRegister, masterInsert)
except Exception, inst:
errorhandle().print_error("Error occured during import", inst)

Download everything here as a zip file. I've hopefully carved a little space in the schedule amongst various projects to spend more time going back and polishing things. I've just been tasked with migrating about 120,000 records out of Excel so this should provide a good opportunity to add functionality and test it's breakability for the type of thing I often have to do.


}

2 comments:

robert said...

Wow. That looks like a hard way to solve your problem.

If you absolutely have to mangle Excel data through Python and into a database... can I recommend Resolver? It won't cost you anything and it might just solve your problem.

Cheers, Robert

David Seruyange said...

Hey Robert,

The story I'm hoping for this will be:

database('excel').tables('sheet1').copyto.database('mydb')

Since I find myself having to do this often, I'm willing to deal with some pain and nuances for an easy one liner in the future. I've seen a lot of "copy/paste" approaches to importing data from excel and because it's something one can't automate I tend to avoid it.

Hopefully no mangling will take place but I do need to check out Resolver.