As part of our export functions for adjust.com, we generate CSV reports for our clients, which are typically used to create pivot tables in Office. Sometimes those reports can be 100’s of megabytes in size and Office will not be able to open them. And even those files that can be opened often take minutes to load.
Being asked one too many times to “quickly” check one of those reports I came across a very helpful little tool to transform CSVs into SQL without all the hustle of creating complex table structures.
So my basic requirement was to dynamically create the table for a CSV file and then import it into an SQL database.
The “dynamically” part is rather important as our CSVs can easily have 50 or more columns and typing all that
CREATE TABLE statement isn’t exactly fun.
After some searching I found APSW. It does exactly what I was looking for and is super simple to use.
Since we work with OS X at adjust I’ll only describe how to get going there, but APSW is available for almost any OS incl. Windows.
Go to the download page and pick your OS appropriate file - OS X users pick the source .zip
Install dependencies via brew
- Unzip the file, and run
setup.py- with the arguments pointing to the SQLite brew
- Confirm your setup worked by typing
into your console, this should take you into a SQLite shell.
1 2 3 4
Once you got the setup out of the way APSW is quite easy to use.
Let’s say we have a report like this.
1 2 3
Open an ASPW console and use the
autoimport command to read the CSV.
1 2 3 4 5 6 7 8 9 10 11 12
As you can see ASPW correctly detected the file format and the columns. Now you can use simple SQL instead of wrangling with pivot tables.
1 2 3 4 5
The best part is, that SQLite is a lot faster than Excel and you can handle millions of lines without any problem.
Quirks and recommendations
Working with APSW a bit it turned out to be quite robust. The only real issue I encountered is that it tries to parse
dates as US format
yyyy-dd-mm and thus has problems figuring out the correct format if you have
standard ISO format
yyyy-mm-dd with no days > 12 like in the data range
My solution was to simply add a line with one day that clarifies it for APSW, so for example one with the date
Now if you are reading this and are not familiar enough with SQL to get started right away I can recommend a couple of good places to start: