Before diving into this, if you are doing things that aren’t dependent on speed (e.g., it doesn’t matter if it takes 1 second to connect to the database, grab your data, and close the connection), then you can easily skip this tip. That said, if you have multiple connections, that connect time can add up.
For example, I recently had an issue where it was taking 4.5+ seconds to connect to a database, run analysis and spit out the results. That’s not terrible if it’s something for you only, but if it’s a production system and speed is a requirement, that might be too long (and it IS too long).
When I did some analysis using Python’s timer() I found that more than 50% of that 4.5 seconds was spent establishing database connections, so I grabbed my trusty SQLAlchemy toolkit and went to work.
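As an aside, the timing itself is straightforward. Here’s a minimal sketch of that kind of check using timeit’s default_timer; the connection parameters are placeholders, not the values from this project:

from timeit import default_timer as timer
import MySQLdb

start = timer()
# time only the connection step, since that's the part that turned out to be expensive
db = MySQLdb.connect(host="localhost", user="user", passwd="password", db="mydb")
connect_time = timer() - start
db.close()

print("connection took %.3f seconds" % connect_time)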
For those of you who don’t know, SQLAlchemy is a ‘python SQL toolkit and Object Relational Mapper’ (ORM) that is supposed to make things easier when working with SQL databases. For me, the ORM aspect tends to make things more difficult, so I tend to stick with plain SQL queries, but the SQL toolkit aspect of SQLAlchemy makes a lot of sense and adds some time savings when connecting to a SQL database.
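To make that distinction concrete, here’s a rough sketch of the ‘SQL toolkit’ style of using SQLAlchemy: plain SQL text, no ORM models. The connection string values are placeholders, and this isn’t the exact code used later in this post.

from sqlalchemy import create_engine, text

# a single engine manages connections behind the scenes
engine = create_engine("mysql+mysqldb://USER:PW@HOST/DBNAME")

with engine.connect() as conn:
    # run a plain SQL statement, no ORM classes involved
    result = conn.execute(text("select * from TABLENAME"))
    rows = result.fetchall()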
Before we get into the SQLAlchemy aspects, let’s take a second to look at how to connect to a SQL database with the MySQL-python connector (or at least take a look at how I do it).
First, let’s set up our import statements. For this, we will import MySQLdb, pandas and pandas.io.sql in order to read SQL data directly into a pandas dataframe.
import pandas as pd
import MySQLdb
import pandas.io.sql as psql
Next, let’s create a database connection, create a query, execute that query and close the database connection.
# set up the database connection. There's no need to set up cursors with pandas psql.
db = MySQLdb.connect(host=HOST, user=USER, passwd=PW, db=DBNAME)

# create the query
query = "select * from TABLENAME"
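The remaining steps (executing the query into a dataframe and closing the connection) would look roughly like the lines below; the exact pandas call names have shifted between versions, so treat this as a sketch.

# read the query results directly into a dataframe, then close the connection
df = psql.read_sql(query, con=db)
db.close()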