|   Home   |   Back   |

Tips and Tricks using the Database-Link

 

Finding databases on a hard disk

To find a Microsoft Access database on your hard disk without the file open dialog you can use the Mathematica FileNames command. Searching the harddisk might take some time depending on the size of the hard disk and the speed of your computer.

To find the NWind.mdb example database which ships with Microsoft Access use

[Graphics:Images/index_gr_1.gif]
[Graphics:Images/index_gr_2.gif]

If no valid file name is returned asked your system administrator for the location of this example database. It might be on a network. Replacing "\\" with the drive name will find the path to the sample database.

To open a connection to the database use

[Graphics:Images/index_gr_3.gif]
[Graphics:Images/index_gr_4.gif]

top

Finding the same fieldnames in different tables

In practical data analysis situations it is helpful to find all tables which include the same field names. This is a simple task in some databases by searching the system tables. But even there it is less comfortable then doing this analysis with Mathematica. For Microsoft Access we do not know of a possiblity to search the system tables.

The Database-Link and the Mathematica Add-on Statistics`DataManipulation` is loaded

[Graphics:Images/index_gr_5.gif]

Connecting to the database. In this case to the NWind database (see Finding databases on a hard disk).

[Graphics:Images/index_gr_6.gif]

Generating a list of tables from a database.

[Graphics:Images/index_gr_7.gif]

Now the fieldnames for each table is retrieved

[Graphics:Images/index_gr_8.gif]
[Graphics:Images/index_gr_9.gif]

Calculating the frequencies on the field names and selecting all elements from the list which appear at least twice and which have a field type which is regularly used for id´s.

[Graphics:Images/index_gr_10.gif]
[Graphics:Images/index_gr_11.gif]

Using the Cases command together with the "___#___" pattern to select the tables with the field name we are looking for somewhere in the list of fieldnames

[Graphics:Images/index_gr_12.gif]
CategoryID
Categories
Products
EmployeeID
Employees
Orders
OrderID
Order Details
Orders
ProductID
Order Details
Products
SupplierID
Products
Suppliers

top

Seeing the data while working with it

Very often it is helpful to see the data your are working with. To achieve this you can open the MS Access or MS Excel application in parallel to working with Mathematica and the Database-Link. If you get an error from MS Access or MS Excel then change the sequence in which you connect to the database from Mathematica and open MS Access or MS Excel. If you work with different databases then linking the tables to MS Access gives you the same result.

top

Generating PIVOT tables

The Database-Link just passes through the SQL commands to the database or the ODBC driver. Therefore it is possible to use proprietary extensions to the SQL command set from the vendor of the database if the ODBC driver support them. One example of a proprietary extension is the TRANSFORM/PIVOT command in MS Access.

Load the Database-Link if it is not loaded yet

[Graphics:Images/index_gr_13.gif]

Connecting to the NWind.mdb database if you are not connected yet. (To find  the NWind database see Finding databases on a hard disk).

[Graphics:Images/index_gr_14.gif]

Defining the SQL query to retrieve a pivot table of the quarterly sales for each salesperson

[Graphics:Images/index_gr_15.gif]
LastName 1 2 3 4
Buchanan 3 4 6 6
Callahan 18 9 14 13
Davolio 8 11 19 15
Dodsworth 2 7 3 7
Fuller 8 11 11 12
King 6 13 12 6
Leverling 21 14 10 26
Peacock 19 16 24 21
Suyama 6 8 5 14

For the explenation of Transform., Pivot and DatePart see your MS Access online help.

top

Using the SQL functionality to speed up analysis

To do simply data tasks like selecting, sorting or aggregating data use the functionality which is provided from SQL. Very often this will speed up the data analysis by a multiple factor. To handle data is what SQL server are build for. Using aggregation functions from SQL also allows to save system memory, reduces network traffic and simplifies the code. To demonstrate the effect see the following example.

Load the Database-Link if it is not loaded yet

[Graphics:Images/index_gr_16.gif]

Connecting to the NWind.mdb database if you are not connected yet. (To find  the NWind database see Finding databases on a hard disk).

[Graphics:Images/index_gr_17.gif]

The table with the biggest number of records is the table [Order Details]

[Graphics:Images/index_gr_18.gif]
[Graphics:Images/index_gr_19.gif]

To count the number of records in the table [Order Details] can be done using the SQL function count

[Graphics:Images/index_gr_20.gif]
[Graphics:Images/index_gr_21.gif]

To calculate the turnover since the first record in the database is also simply arithmetic and is quite fast. Only one number gets transfered.

[Graphics:Images/index_gr_22.gif]
[Graphics:Images/index_gr_23.gif]

To do the same calculation with all the data transfered to Mathematica is also a straighforward task and needs only a few lines of code but it takes considerable longer (and eats up memory)

[Graphics:Images/index_gr_24.gif]
[Graphics:Images/index_gr_25.gif]

Other usefull SQL functions are e.g. Distinct, Avg, Max, Min. See your SQL data server handbook

top

Simple generation of  nested lists

A common Mathematica construct to structure data are nested lists. Often it is necessary to organize your data from databases in such a way. To achieve this see the following example on collecting the orders for each sales person

Load the Database-Link if it is not loaded yet

[Graphics:Images/index_gr_26.gif]

Connecting to the NWind.mdb database if you are not connected yet. (To find  the NWind database see Finding databases on a hard disk).

[Graphics:Images/index_gr_27.gif]

The standard SQL command to get the answer is using a order by in the query and do some post processing on the result

[Graphics:Images/index_gr_28.gif]
[Graphics:Images/index_gr_29.gif]

An alternative was is to retrieve the different sales persons who had sold orders by querying the table Orders using the SQL function Distinct.

[Graphics:Images/index_gr_30.gif]
[Graphics:Images/index_gr_31.gif]

Using the Mathematica command StringForm  which is accepted from the Mathematica Sql command it is easy to set up

[Graphics:Images/index_gr_32.gif]
[Graphics:Images/index_gr_33.gif]

The suggested approach runs longer then the first version because it fires a SQL statement for each employee. But the second version is in the end very simple for a task which quite common but difficult to achieve using only SQL.

top

Using the Mathematica commands Rule and Dispatch for data mapping

A very convient way to join data is by using the Mathematica Rule command. This is especially helpful if you work with different databases where the same information is coded in different ways. Matching different country codes is a is a typical example, e.g.. bringing "DE" and "GER" in sync. An example for using Rule and Dispatch is given below. You will recognize that the result could also have been achieved using join but the proposed approach is much more powerfull if you use all the possibilities you have with specifying Rules in Mathematica

Load the Database-Link if it is not loaded yet

[Graphics:Images/index_gr_34.gif]

Connecting to the NWind.mdb database if you are not connected yet. (To find  the NWind database see Finding databases on a hard disk).

[Graphics:Images/index_gr_35.gif]

Define a SQL query to retrieve the number of sales to customers during July, 1993

[Graphics:Images/index_gr_36.gif]
[Graphics:Images/index_gr_37.gif]

Define a SQL to get a mapping list from CustomerID to CompanyName and use Apply to each record which is returned. Dispatch will speed up the later replacement by some orders.

[Graphics:Images/index_gr_38.gif]

Now use your CustmerID replacement rules to the sales

[Graphics:Images/index_gr_39.gif]
Blondel père et fils 1
Centro comercial Moctezuma 1
Chop-suey Chinese 1
Ernst Handel 2
Folk och fä HB 1
Frankenversand 1
GROSELLA-Restaurante 1
Hanari Carnes 2
HILARIÓN-Abastos 1
Ottilies Käseladen 1
Que Delícia 1
Rattlesnake Canyon Grocery 2
Richter Supermarkt 1
Split Rail Beer & Ale 1
Suprêmes délices 1
Toms Spezialitäten 1
Victuailles en stock 1
Vins et alcools Chevalier 1
Wartian Herkku 2
Wellington Importadora 1
White Clover Markets 1

This approach will also work with large data sets when the Dispatch command is used. So far we did not run into limitations yet.

top

Visually generating SQL commands

Using the query generator from MS Access you can generate SQLs. Changing the view on a query to SQL allows a copy and paste of the generated SQL string to Mathematica. It is only necessary to remove the semikolon on the end of the string. This copy/paste approach does not always work right away with handwritten SQL commands when carriage returns where use when setting up the SQL command. Remove them and it works fine most of the time.

top