It was not too difficult to write sed scripts to do the job, but they were slow, so I've written a much faster equivalent in the C language: extract-records.c.

The most convenient way to use it is to stream the contents of the archive directly into the program and avoid extracting the files to disk at all, saving quite a bit of time and disk space.


make extract-records

tar --extract --bzip2 --to-stdout --file freedb-complete-20040501.tar.bz2 |\
./extract-records > freedb-complete-20040501.tab


This produces a tab delimited file with the disc and track records merged together. They can be differentiated because the track records begin with an integer field, while disc records begin with a blank field. The two record types have the following formats, expressed as table creation statements for PostgreSQL:


CREATE TABLE iDisc
(
    fTrack TEXT NOT NULL, -- always blank for discs
    fDiscNumber INTEGER NOT NULL,
    fDiscLength INTEGER NOT NULL,
    fRevision INTEGER NOT NULL,
    fProcessor TEXT NOT NULL,
    fSubmitter TEXT NOT NULL,
    fDiscId TEXT NOT NULL,
    fDTitle TEXT NOT NULL,
    fDYear TEXT NOT NULL,
    fDGenre TEXT NOT NULL,
    fExtD TEXT NOT NULL
);

CREATE TABLE iTrack
(
    fTrack INTEGER NOT NULL,
    fDiscNumber INTEGER NOT NULL,
    fFrame INTEGER NOT NULL,
    fTTitle TEXT NOT NULL,
    fExtT TEXT NOT NULL
);


Note that the extraction program ensures that all special characters (especially tabs and newlines which may have been embedded in the freedb data files) have been properly escaped so they will load into the database without incident. Note also that since freedb data fields may extend across multiple lines, these occurrences are detected and concatenated automatically, allowing a simpler and cleaner database design to contain all the data correctly.

Create a database named freedb and add the disc and track tables to it. The merged data from the tab delimited file can then be loaded quite simply with the following commands which separate the record types and import them to each table. Note once again that these commands are PostgreSQL specific so you may have to refer to your manual to find out how to do this using the relational database technology of your choice.


grep '^[^0-9]' < freedb-complete-20040501.tab |\
psql --command "COPY iDisc FROM STDIN" freedb

grep '^[0-9]' < freedb-complete-20040501.tab |\
psql --command "COPY iTrack FROM STDIN" freedb

vacuumdb --analyze freedb


These steps are the minimum required to produce a relational version of the freedb database. It is deficient in a number of important ways which will be addressed in the following pages.