Easy CSV Based Analytics with Metabase

May 17 2018

Metabase has turned up being my every day ready-to-go simple analytics. You can plug in your SQL/File/BigData datastore within seconds, and start creating graphical tables based reports or diagrams through a point and click UI, or through some soft core SQL.

At its root, metabase wants to be a data sharing portal, where users create questions (and thus answers) on datasets.

I sometimes need to quickly run queries on csv files, and metabase seemed like the perfect place to do this. I integrated the csvjdbc driver that can read csv files just as JDBC so metabase can simply talk to the underlying files like it would do for regular relational databases.

what can this custom version of metabase does for you.

Basically it simply sets up a custom datasource that can you use to access and query CSV files.

First you can select it from the drop down list of metabase database:

Then run query on the newly defined csv files based database.

A sample setup is on github for you to play, and below are the instruction to reproduce the resutls on your machine.

SETUP

The jar are slightly too big to be uploaded on github, so you need to download the jar files manually:

START METABASE

starting metabase is simply a matter of running the batch file or the shell script file.

# start.bat 
java -cp csvjdbc.jar:;metabase.jar metabase.core
# or
# start.sh
java -cp csvjdbc.jar:metabase.jar metabase.core

The cusomt sample on github contains sample csv files for burgers. you can review the settings below.

BURGERS CSV DATABASE SETTINGS

Name: burgers Path to csv file: BURGERS fileExtension: .csv separator , quotechar “ indexedFiles YES fileTailPattern .(.*) fileTailParts date fileTailPrepend NO headerline columnTypes String,Double,Double,Double,Double,Double,Double,Double,Double,Double,Double,Date suppressHeaders NO commentChar
ignoreNonParseableLines NO missingValue
trimValues YES Enable Logging burgers.out trimHeaders YES timestampFormat timeFormat dateFormat yyyyMM

sql query

Then run a query on the newly created source.

select * from BigMac where Country in ('Japan','France', 'Argentina');

And you should be able to enjoy the data.

screenshot

Built with Hugo

© Nicolas Modrzyk 2017 - hellonico @ gmail dot com