Saturday 26 March 2016

Paradox database.db recycled into Libre Office ready for mailing label generation

Problem

A neighbour asked for some help moving over from a couple of turn of the century windows XP boxes onto a new PC running Windows 10. Most of this was easy as the printer was quite up-to-date and most of the activity on the machine was for email using Thunderbird and document creation using Word 2003. The sting in the tail was the "Mailings" system used to handle a small business mailing list. As can be seen the standard address data was supplemented with some flags marked a..z which are used to record various attributes about the entry.  The main output from the system was mailing labels used for posting catalogs. There are about 5700+ entries in the database.



Research and process

The first task was to find which database system is used to hold the data, then see if the data can be extracted easily. Step two would be to reformat the extracted data into a modern usable format and finally figure out how to create mailing labels from that data.

Finding the brand of database was achieved by online search of files ending in .DB and .PX. Doing a strings command on the .EXE file and looking on the web for hints from the contents.

The strings output gave these names that looked like sub-routines or procedure names near the start of the file.

DBEDIT
DBGARROW
DBINSERT
DBN_CANCEL
DBN_DELETE
DBN_EDIT
DBN_FIRST
DBN_INSERT
DBN_LAST
DBN_NEXT
DBN_POST
DBN_PRIOR 

Search on these names gave hints that Delphi development environment had been used. The searches for .DB and .PX had results that indicated a Paradox database was being used. This matched with the known technology from the age of the system from around 2004. 

There are four main ways to read and refactor database files :
  1. Start the existing database - connect to it and extract all the data,
  2. Read the database files using an extractor tool, reformat or reuse the output,
  3. Read the database files and figure out the low level format to extract the records and rebuild into an input format,
  4. Use the import feature of a new database to read an older one.

The major advantage of using some kind of extraction tool  2) over a forensic bit twiddling approach 3) are time to solution, accuracy and ease of use.  Extraction from database using encryption mandate methods 1 & 4 above. For this project method 2 was to be used.

Knowing that the database was in Paradox format allowed the research to be more focused on obtaining a read and extract tool. Luckily the Wikipedia page for Paradox database listed PxTools & Pxlib combination that could be used to read an existing database file. The PxTools project is partly hosted on Sourceforge and partly on private web pages of  Jan Kneschke. The download links were a bit mixed up but just knowing that such a tool existed was enough to believe that clean extraction would be possible.

After some struggles to get a clean compile of the pxlib and link-load with pxtool ( -lm was needed ) a viable binary was obtained. 

Many database files build in the format of the data into a header record. Once that header is understood the rest of the data extraction is quite straight forward. The pxtool was then able to pull the record format and then provide the data in a variety of formats.

$ more ADDRESS_sceema.txt
File Version:            5.0
File Type:               indexed .DB data file
Tablename:               resttemp.DB
Num. of Records:         5716
Theor. Num. of Rec.:     6980
Num. of Fields:          36
Header size:             2048 (0x800)
Max. Table size:         2 (0x800)
Num. of Data Blocks:     1396
Num. of 1st Data Block:  1
Num. of last Data Block: 1241
Num. of prim. Key fields: 1
Next auto inc. value:    0
Write protected:         0
Code Page:               437 (0x1B5)
Encryption:              0x0
Update time:             27.8.2001 23:00:00 (998949600)

Fieldname          | Type
------------------------------------
         Client No | int(4)
           Surname | char(50)
              Name | char(50)
           Address | char(100)
              Town | char(30)
            County | char(30)
          Postcode | char(10)
         Telephone | char(40)
              Date | int(2)
                 A | boolean(1)
                 B | boolean(1)
                 C | boolean(1)
                 D | boolean(1)
                 E | boolean(1)
                 F | boolean(1)
                 G | boolean(1)
                 H | boolean(1)
                 I | boolean(1)
                 J | boolean(1)
                 K | boolean(1)
                 L | boolean(1)
                 M | boolean(1)
                 N | boolean(1)
                 O | boolean(1)
                 P | boolean(1)
                 Q | boolean(1)
                 R | boolean(1)
                 S | boolean(1)
                 T | boolean(1)
                 U | boolean(1)
                 V | boolean(1)
                 W | boolean(1)
                 X | boolean(1)
                 Y | boolean(1)
                 Z | boolean(1)
     Mailsort Code | int(4)
------------------------------------
     Record length | 346 (0x15A)

Extracts in .cvs and .html and .sql were obtained using the various options of PxTool. The database originally had some more involved functions to enable the use of mail sort codes but this was no longer needed. 

With the data extracted into .csv format, the import into LibreOffice Calc was straightforward. The 5777 address records appear as a row each with the fields separated out into columns.  The standard data filters were applied to the headings row to allow for sorting and searching. Conditional formatting was applied to the A..Z fields to highlight the set values. Some of the Address fields had line breaks which were replaced with, using a find/replace command.

Creating output

The last stage was to figure out how to generate the mailing labels, now only needed for Xmas cards, as the whole catalog process has moved on-line to a website. The lettered column flags are used to filter the data into various categories one of which was to send a Christmas card.

Libre Office has all the features needed to generate mail merge and labels project.  The steps required to do this are all.
  1. Review the data and check the column alignment and data.
  2. Register the spreadsheet as a data source in the Libre office Base program,
  3. Create a template label document with fields from the data source set up in 2)
  4. Generate the actual labels document using a filtered set of the data.
Step 2 is the magic, a new database is created that links back into the spreadsheet. After this step this "bridge" database can be used as a data source for later merges.


Once this whole process has been done for the first time just step 4) is repeated for other output selections. There are a couple of ways to achieve this all of which are demonstrated in YouTube. The most useful YouTube demos coming from TheFrugalComputerGuy who goes over many of the basic and advanced feature of Libre office.

Reimplementation of a database is the sort of project that can be a real time sink. Using the resources already existing this transformation was achieved in a couple of afternoons work.