Comments:" Excel and SQL "
URL:https://datanitro.com/blog/2013/5/23/Excel_and_sql/
Data can be very valuable, and storing it in Excel is a great way to risk making it worthless. Your file can get deleted, become unusably slow, or subtly accumulate errors.
If you decide to keep a lot of important data in a spreadsheet, there's a good chance you'll come to regret it.
SQL vs. Excel
If Excel isn't a good database, why is it so frequently used as one?
Spreadsheets are effective at storing a small amount of
data (a few hundred or a few thousand rows), and it's not obvious
what goes wrong when you scale past that - at least until you
experience it firsthand. Here's a short list.
- Data Quality
Databases are easy to backup. You can restore your data if you make a mistake, and if you need to roll back changes, you can do it without hunting down last Tuesday's manually backed-up copy of your spreadsheet.
Just as importantly, if you need to work with a lot of data, you can do it with one or two commands. You don't need to worry about skipping the last row or having one incorrect formula out of 1,000.
- Querying Power
Do you want to know how many extra-large widgets were produced between 12 and 3 pm on every even-numbered day in March? Or find every country with government debt above 90 percent of GDP?
You can do it with one query, instead of using a bunch of filters and sorts or a manual selection.
- Multiple users.
If you have two people editing data in Excel, you can expect three copies of the final spreadsheet. This is manageable with a little data, but it becomes time-consuming and error prone with more. Databases are designed to handle multiple users.
- Speed and Storage limits.
When you have hundreds of rows in a spreadsheet, you can see most of your data in a glance and manipulate it on the fly. By the time you get to ten thousand rows, your spreadsheet is unwieldy. You need to maintain pivot tables or other summaries to keep track of everything, and running calculations on the dataset can easily freeze your computer.
Even if you're just storing data, Excel has a hard limit of one million rows per sheet. A good database will store and process as much data as your harddrive can hold.
There are a lot of other advantages to using a proper database, but these should be enough to get started.
SQL in Excel
One of the easiest ways to connect Excel to a
database is with Python. There are Python libraries for just about any type of
database - from SQL Server to MySQL, to NoSQL, Salesforce and Redis.
You can use SQLite if you're just getting started; it's
easy to use and comes installed with Python. This post usesDataNitro
to connect Excel with Python.
Here's a spreadsheet that keeps track of a stock portfolio. Let's move it into SQL.
A stock portfolioWe'll add a cell specifying the database name.
Loading the portfolio into a databaseNow we'll write a short script to create the database:
import sqlite3 conn = sqlite3.connect(Cell("B1").value + '.db') c = conn.cursor() c.execute('''CREATE TABLE ''' + Cell("B1").value + '''(ticker text, position integer, purchase_price real)''') portfolio = [] for cell in Cell("A5").vertical_range: portfolio.append(tuple(cell.horizontal[:3])) c.executemany('INSERT INTO portfolio VALUES (?,?,?)', portfolio) conn.commit() conn.close()
Here, we're reading in the portfolio and adding it to a new table in a new database. For simplicity, we're using Cell B1 ("portfolio") as the name of both the database and the table.
Now we need to read the data back. We'll do this by writing a query in Excel, and executing it with Python:
import sqlite3 conn = sqlite3.connect(Cell("B1").value + '.db') c = conn.cursor() c.execute(Cell("B2").value) # fetch data portfolio = c.fetchall() CellRange("A5:C300").clear() # cleanup existing data for i, position in enumerate(portfolio): Cell(5+i, 1).horizontal = position conn.close()This just cleans up the current data, runs our query, and writes the results to the sheet. Reading it back
As we can see, running it gives us back our data. We can run other queries too:
Getting our position in Google. Getting all long positionsSQL lets us run sophisticated queries without much work. For a big dataset, this is faster, more flexible, and more reliable than using Excel tables.
One issue with SQL is that many people aren't comfortable working with database queries. We can build out a user-friendly interface to SQL.
A graphical SQL interfaceWe'll add dropdowns and blank cells for users to run queries. Here's a code snippet that'll generate a SQL statement from these dropdowns:
query = "select * from " + Cell("B1").value where = [] if Cell("B2").value != "*": where.append("ticker='" + Cell("B2").value + "'") if Cell("E2").value != "*": where.append("position" + Cell("D2").value + str(Cell("E2").value)) if Cell("H2").value != "*": where.append("purchase_price" + Cell("G2").value + str(Cell("H2").value)) if where: # add a where statement query += (" where " + ' and '.join(where)) c.execute(query) # fetch data
We can replace the line c.execute(Cell("B2").value)
above with
this bit of code to generate queries.
Finally, we probably want to be able to add data to our database. We'll add a script that updates the db with any new shares we put in:
import sqlite3 conn = sqlite3.connect(Cell("B1").value + '.db') c = conn.cursor() c.execute("select * from " + Cell("B1").value) # fetch data portfolio = c.fetchall() updates = [] row = 5 # starting row while True: data = tuple(CellRange((row, 1), (row, 3)).value) if data[0]: if data not in portfolio: # new record updates.append(data) row += 1 else: # end of table break c.executemany('INSERT INTO portfolio VALUES (?,?,?)', updates) conn.commit() conn.close()This checks the table for any position that's not already in the database, and adds it. We can use it to add shares in Facebook: Adding more data
If we run a search, the database returns our updated portfolio.
Reading the updated dataAnd of course, we're not limited to using Excel just to read and write data. For example, we can build a dashboard to show how our portfolio is doing:
Calculating PnL
Here, we're using Python to get current stock prices from Yahoo, and
adding some basic calculations and conditional formatting to show the
status of our positions and our portfolio.
This is a very basic example of what you can do once Excel is
connected to a database. Most importantly, it shows that you can
offload data storage and querying to a system that's built for
it, and use Excel at what it's best it: analyzing and reporting data.
(You can download the portfolio spreadsheet, and all the code in this post, here.)
Tweet
Do you need an Excel add-in? Do you want to move your spreadsheet data to a database?
We can do it for you.
Contact us at code@datanitro.com.