One of the more daunting tasks when installing a new solution is migrating data from the old system. Years of work and data collection cannot be wasted. And since I’ve always found this kind of exploration of old application databases fun, this article describes something that should not even be considered legacy migration: migrating data from Access (.mdb files) to PostgreSQL.

The solution

I thought this would be a very simple task using the mdb-tools package, which is also available as an Ubuntu package:

apt-get install mdbtools
mdb-schema file.mdb > schema.sql           # dump the table definitions
mdb-export -I file.mdb Table1 > Table1.sql # export one table as INSERT statements

Pretty easy, huh? Just some type conversion and we are all set. Or not? After closer examination I found out that all the Numeric columns were badly exported, losing both the negative sign and the decimal part: -23.56 became 23. Not exactly an acceptable precision for prices.
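
To illustrate with a made-up Prices table (table and column names are hypothetical), the generated INSERT statements came out roughly like this:

INSERT INTO Prices (id, amount) VALUES (1, 23);     -- sign and decimals lost
INSERT INTO Prices (id, amount) VALUES (1, -23.56); -- what it should have been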

Since the mdb-tools package seems to be discontinued, finding a newer version was not an option, so I decided to fetch the sources:

apt-get remove mdbtools   # make room for the patched build
apt-get install libtool
apt-get install dpkg-dev
apt-get source mdbtools
cd mdbtools-0.5.99.0.6pre1.0.20051109
./autogen.sh
make # to test the compilation

OK, everything compiled fine, which was not the case for the source package downloadable from SourceForge; the CVS repository did not seem accessible anymore either.

After some more searching I found this thread containing a patch for my problem. It did not apply cleanly, but applying it by hand to ./mdbtools-0.5.99.0.6pre1.0.20051109/src/libmdb/data.c solved the issue.

So after compiling the modifications into mdb-export, the numeric data was exported correctly.
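
To be explicit about the rebuild: I re-ran make and used the freshly built binary instead of the packaged one (the path below is what I recall of this source tree's layout; adjust it if yours differs):

make
./src/util/mdb-export -I file.mdb Table1 > Table1.sql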

The next step was to convert the data types to their PostgreSQL equivalents. Here are the conversions I needed (a scripted version follows the list):

Long Integer => integer
Integer => integer (could also be smallint)
Text (xx) => character varying (xx)
Memo/Hyperlink => text
Boolean => character(1) (could also be bool with some conversion)
Byte => smallint
DateTime => timestamp
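
Since these are simple textual substitutions, they can be applied to the schema with a quick sed pass. This is only a rough sketch: the exact type names mdb-schema emits may differ between versions, and the schema-pg.sql output name is just an example, so verify the result by hand:

sed -e 's/Long Integer/integer/g' \
    -e 's/Integer/integer/g' \
    -e 's/Text (\([0-9]*\))/character varying (\1)/g' \
    -e 's/Memo\/Hyperlink/text/g' \
    -e 's/Boolean/character(1)/g' \
    -e 's/Byte/smallint/g' \
    -e 's/DateTime/timestamp/g' \
    schema.sql > schema-pg.sql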

After the schema was modified, all I needed to do was import the data into my PostgreSQL database and convert it to my target schema using SQL statements.
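
The import itself is just a couple of psql calls. A minimal sketch, assuming the converted schema was saved as schema-pg.sql as above and that mydb is the target database (both names are illustrative):

createdb mydb
psql -f schema-pg.sql mydb  # create the tables
psql -f Table1.sql mydb     # load the exported rows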

As an alternative, the CSV export could be used, from Python for example, for more complex operations.
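
For reference, mdb-export produces CSV by default when the -I flag is omitted, which is easy to post-process with any scripting language (the file and table names are again just examples):

mdb-export file.mdb Table1 > Table1.csv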

Alternate solution

I should say that there is an alternate solution to this which might be friendlier to some people. It consists of the following steps:

  • install a PostgreSQL ODBC driver on Windows
  • configure a DSN for your PostgreSQL database
  • from Access, export each table to the previously created DSN

The export works perfectly, with the one caveat that all table and column names end up double-quoted: table Products is in fact "Products", for example (more on this after the list). This method, however, has several disadvantages for me:

  • requires Windows and Microsoft Access, which means more tools and licenses; the first method works with just the database file
  • it is more complicated to automate if the migration has to be performed on a regular basis
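
Regarding the quoting caveat: in PostgreSQL, quoted identifiers are case-sensitive, so the exported tables either have to be referenced with quotes everywhere or renamed once. A sketch using the Products table from above and an illustrative database name:

psql -c 'SELECT count(*) FROM "Products";' mydb            # quotes are required
psql -c 'ALTER TABLE "Products" RENAME TO products;' mydb  # or rename once and forget about it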