Saturday, April 17, 2010

The problems of Data Mining

Data mining is an approach to solving learning problems. It differs from the classic statistical approach in that the underlying model is specified in an automated way. It has applications in every branch of knowledge where structured data is available (and, in some cases, also where the data is unstructured).

The kinds of problems where data mining techniques can be applied include:

1. Predict whether a patient who has had a thrombosis incident will have another one; the prediction can take into account the patient's clinical and physical data.

2. Determine whether a subscriber to a web-based magazine will renew their subscription, based on the subscriber's traffic to the magazine's website and other demographic data about the subscriber.

3. Recognize the characters on a handwritten form.

Friday, April 16, 2010

Which tool should I choose to learn Data Mining?

That is a typical question I receive from friends who want to start learning data mining. Since most of them already have experience with databases, I usually recommend SQL Server: the Developer Edition includes all the algorithms that come with the Enterprise Edition, at a really attractive price of less than 50 USD.

By comparison, with other tools such as SPSS Clementine and SAS it is easy to see that the entry-level price is prohibitive.

Thursday, April 15, 2010

Connect to SQL Server from Matlab

Connecting to SQL Server from MATLAB can be a very tricky task, especially because MATLAB implements ODBC connectivity through a JDBC-ODBC bridge. I spent almost 3 hours trying to get the ODBC connectivity working before I moved to the JDBC driver. Its installation process is also a little tricky; the steps are the following:

1. Download Microsoft's JDBC driver for SQL Server from:


2. Run it; it will be decompressed to the selected location.

3. Start Notepad as administrator and open the file:

C:\Program Files\MATLAB\R2010a\toolbox\local\classpath.txt

4. After the last line of that file, add a reference to the JDBC driver jar in the location where you decompressed it, like this:

c:/SQLJDBC/sqljdbc_2.0/enu/sqljdbc4.jar

5. Restart MATLAB so it picks up the new class path, and then create the connection:

dbConn = database('master', 'user', 'password', 'com.microsoft.sqlserver.jdbc.SQLServerDriver', 'jdbc:sqlserver://localhost:1433;databaseName=master;');

6. To test the connection, execute:

ping(dbConn)

7. Please note that the local instance of SQL Server must have the TCP/IP network protocol enabled (for example, in SQL Server Configuration Manager), because the JDBC driver connects over TCP.
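The steps above can be wrapped into one script that checks the two usual failure points (the class path from step 4 and TCP/IP from step 7) before connecting. This is a minimal sketch under stated assumptions, not the only way to do it: it assumes a default local instance listening on port 1433, the jar name sqljdbc4.jar from step 4, and the placeholder credentials 'user' and 'password'. The query SELECT @@VERSION is only an illustrative round-trip test. It uses the Database Toolbox functions database, isconnection, exec, fetch and close, plus MATLAB's javaclasspath and its ability to call Java classes such as java.net.Socket directly.

```matlab
% Preflight checks plus a first query against SQL Server over JDBC.
% Assumptions (illustrative, not from the steps above): local default
% instance on port 1433, the 'master' database, placeholder credentials.
host     = 'localhost';
port     = 1433;
dbName   = 'master';
user     = 'user';       % placeholder
password = 'password';   % placeholder
driver   = 'com.microsoft.sqlserver.jdbc.SQLServerDriver';
jdbcUrl  = sprintf('jdbc:sqlserver://%s:%d;databaseName=%s;', host, port, dbName);

% Check 1: is the driver jar on the static Java class path (classpath.txt)?
staticPath = javaclasspath('-static');
jarFound = any(~cellfun(@isempty, strfind(lower(staticPath), 'sqljdbc4.jar')));
if ~jarFound
    warning('sqljdbc4.jar is not on the static class path; check classpath.txt and restart MATLAB.');
end

% Check 2: is anything listening on the TCP port (TCP/IP protocol enabled)?
portOpen = false;
try
    sock = java.net.Socket(host, port);   % throws if the connection is refused
    sock.close();
    portOpen = true;
catch
    warning('Could not open TCP port %d on %s; is TCP/IP enabled for the instance?', port, host);
end

% Connect, run a trivial query, and release the cursor and the connection.
if jarFound && portOpen
    dbConn = database(dbName, user, password, driver, jdbcUrl);
    if isconnection(dbConn)
        curs = exec(dbConn, 'SELECT @@VERSION');
        curs = fetch(curs);
        disp(curs.Data);      % cell array with the server version string
        close(curs);
        close(dbConn);
    else
        disp(dbConn.Message); % error text reported by the driver
    end
end
```

If either check fails, the script only prints a warning and skips the connection attempt, which points you back to step 4 (class path) or step 7 (TCP/IP) instead of a generic driver error.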

Wednesday, April 14, 2010

SQL Server 2008 Data Mining Functionalities

After using the SQL Server 2008 data mining functionality, I have to say that I'm impressed by the range of models it can run. Out of the box, it includes the following algorithms:
  • Classification algorithms
  • Regression algorithms
  • Segmentation algorithms
  • Association algorithms
  • Sequence analysis
So far I have used the segmentation and classification algorithms very effectively on big data sets, with over 5 million observations.