Saturday, February 16, 2013

Connecting to SQL Server and Summarizing Data with R

I’ve been interrogating a lot of new data lately, and I’ve found a cool way to get basic summary info on a table without using much code. This assumes you have an available table in SQL Server and an installed copy of R (downloadable from http://www.r-project.org/).

Begin by making sure you have the necessary library in R, RODBC. Open R and pull down the PACKAGES>INSTALL PACKAGE(S)... menu.

One

Choose a CRAN mirror.

Two

Choose RODBC.

Three

Now load the library. At the R command prompt, type:

> library(RODBC)

The connection string should be familiar to anyone who has connected to SQL Server from vb or c#. Again at the R command prompt, type:

> conn <- odbcDriverConnect("driver=SQL Server;server=<SERVERNAME>;database=<DB>;uid=<USERNAME>;pwd=<PASSWORD>;")

Remember to use your own server name, database name, user name and password.

Now that you have a connection, you can fire a SQL statement.

> sqlResult <- sqlQuery(conn,"select orderDate, orderDollars, division from orders")

Now you have a record set to interrigate. Try this for some record samples.

> head(sqlResult)
  orderDate orderDollars division
1 2010-01-06        29.99 NATIONAL
2 2010-01-06        26.99 NATIONAL
3 2010-01-06        84.99 NATIONAL
4 2010-01-06        59.99 NATIONAL
5 2010-01-06      197.99 NATIONAL
6 2010-02-02         7.99 NATIONAL

And this will get you summary data.

> summary(sqlResult)
     orderDate            orderDollars                        division      
2010-12-14:  12883    Min.      :  -199.99    INTERNATIONAL: 7888682  
2010-11-29:  11576    1st Qu.:      34.99    NATIONAL         :9598134  
2010-12-15:  11311    Median:      71.99                          
2010-11-26:  11146    Mean   :    116.65                          
2010-06-14:    9825    3rd Qu.:   124.99                          
(Other)   :17430053    Max.    : 840.00                          
NA's      :             22                                             

Note the median and quartile information on orderDollars, metrics that aren’t nearly so easy to get in TSQL.

 

No comments:

Post a Comment