For this little project, we are going to use the Happybase Python package. Happybase uses HBase’s Thrift API.
For our test, we are going to create a namespace and a table in HBase. We will do this in the HBase shell. To make things simple, our table is going to have only one column family -
data, and we are going to accept all defaults.
% hbase shell hbase> create_namespace "sample_data" hbase> create "sample_data:rfic", "data"
The data for this project was downloaded from data.indy.gov on 10 February 2016.
You can insert directly into a table with the
Table#put() function. However, I recommend using
Batch#put() instead. When the number of records reaches the
Batch#send() will be called. See the section on benchmarks for timing data.
When you’re done, be sure to call
Batch#send() manually, to flush any remaining records to the database.
I ran the program for several batch sizes and averaged the results. As you can see from the time results, we can only go so fast - a little above 2 seconds - before we cannot insert any faster.
|Batch (n)||Time (s)|