
I recently discovered freedb.org while looking for a way to check the metadata for my CD collection. As with so many others before me, it did not take long for freedb itself to capture my imagination. The project began in the early days of the internet and has touched the lives of almost everyone who enjoys music with their computer, whether they realise it or not.

The rest of this page is about the work that I have been doing with the data from freedb, and what I hope to accomplish with it for the benefit of everyone. To find out more about freedb itself, I urge you to visit the main site and immerse yourself in the resources to be found there.

The freedb data is stored as more than a million short text files spread across eleven genre directories. The most convenient way to obtain a copy is to download one of the compressed archives available from the main site. Being a Linux nut, I have a strong preference for the bzip2 tar files and have developed methods for using them efficiently.
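To give a feel for what working directly from a bzip2 tar file can look like, here is a minimal Python sketch. It is not necessarily the method used on the following pages. It assumes that each archive member is named `<genre>/<disc id>`, and the function names `iter_entries` and `count_by_genre` are my own inventions for illustration. The archive is read as a forward-only stream, so nothing has to be unpacked to disk first.

```python
import tarfile
from collections import Counter


def iter_entries(archive_path=None, fileobj=None):
    """Yield (genre, disc_id, raw_bytes) for every regular file in the archive.

    Assumes members are named '<genre>/<disc id>' (optionally with a leading
    './'), which is an assumption about the archive layout.
    """
    # "r|bz2" treats the archive as a stream: members are visited strictly
    # in order and the whole archive is never held in memory or extracted.
    with tarfile.open(name=archive_path, fileobj=fileobj, mode="r|bz2") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip the genre directories themselves
            parts = member.name.strip("/").split("/")
            if len(parts) < 2:
                continue  # not inside a genre directory
            genre, disc_id = parts[-2], parts[-1]
            handle = tar.extractfile(member)
            if handle is None:
                continue
            # Bytes are returned undecoded; character encoding is a
            # separate problem and is left to the caller.
            yield genre, disc_id, handle.read()


def count_by_genre(entries):
    """Count how many entries each genre directory contains."""
    return Counter(genre for genre, _disc_id, _data in entries)
```

With a downloaded archive (the file name here is hypothetical), `count_by_genre(iter_entries("freedb-complete.tar.bz2"))` would make one pass over the whole archive and report how many entries each genre directory holds.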

Before much that is useful can be done with the content of these files, it is a good idea to load them into a relational database. Owing to the sheer size of the data, this is not an easy thing to do. If you plan to attempt it yourself, you will need a fast computer with several gigabytes of free hard drive space and a robust database management system such as PostgreSQL. Whatever other skills you may have, a great deal of patience will prove most valuable during the course of this project.

First things first: what is required is a means of extracting the data from one or more of those text files and formatting it as tab-delimited text for bulk loading into SQL tables. The next few pages describe increasingly sophisticated ways of doing this.


Simple

Better

Normal

Unicode
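As a rough illustration of the extraction step described above, and not a substitute for the programs on those pages, the following Python sketch turns the text of one freedb entry into tab-delimited rows. It assumes the usual `KEY=value` layout with comment lines starting with `#`, that a repeated key continues the previous value, and that `DTITLE` holds `Artist / Title`. The function names and the column order are hypothetical choices of mine, and the escaping follows the text format of PostgreSQL's `COPY`.

```python
import re

# A field line looks like KEY=value; track fields carry a number (TTITLE0, ...).
FIELD_RE = re.compile(r"^([A-Z]+\d*)=(.*)$")


def parse_entry(text):
    """Return a dict of fields from one entry, joining repeated keys."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment lines (header, track offsets, and so on)
        match = FIELD_RE.match(line)
        if match:
            key, value = match.groups()
            # Long values are split over several lines that repeat the key.
            fields[key] = fields.get(key, "") + value
    return fields


def clean(value):
    """Escape the characters that are special in COPY's text format."""
    return (value.replace("\\", "\\\\").replace("\t", "\\t")
                 .replace("\n", "\\n").replace("\r", "\\r"))


def disc_row(genre, fields):
    """One row: genre, disc id, artist, title, year (hypothetical layout)."""
    artist, sep, title = fields.get("DTITLE", "").partition(" / ")
    if not sep:
        title = artist  # assumption: no separator means artist equals title
    columns = [genre, fields.get("DISCID", ""), artist.strip(),
               title.strip(), fields.get("DYEAR", "")]
    return "\t".join(clean(column) for column in columns)


def track_rows(fields):
    """One row per track: disc id, track number, track title."""
    disc_id = clean(fields.get("DISCID", ""))
    tracks = [(int(key[6:]), value) for key, value in fields.items()
              if key.startswith("TTITLE") and key[6:].isdigit()]
    return ["\t".join([disc_id, str(number), clean(title)])
            for number, title in sorted(tracks)]
```

Writing the rows from `disc_row` and `track_rows` to two separate files gives one bulk-load file per table, which keeps the disc and track data apart from the very first step.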

These are just the first steps in a very long and complex process of data analysis, correction, and validation. I will post more comments and program code at a later date, possibly sooner if encouraged to do so by interested third parties.