Inserting Data into HBase with Python
For this little project, we are going to use the Happybase Python package. Happybase uses HBase’s Thrift API.
For our test, we are going to create a namespace and a table in HBase, using the HBase shell. To keep things simple, our table is going to have only one column family, data, and we are going to accept all defaults.
% hbase shell
hbase> create_namespace "sample_data"
hbase> create "sample_data:rfic", "data"
The data for this project was downloaded from data.indy.gov on 10 February 2016.
You can insert directly into a table with the Table#put() function. However, I recommend using Batch#put() instead. When the number of records reaches the batch size, Batch#send() will be called automatically. See the section on benchmarks for timing data.
When you’re done, be sure to call Batch#send() manually to flush any remaining records to the database.
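Here is a minimal sketch of that batch pattern, not the exact program used for the timings. The helper names (row_to_mutation, insert_records, load_csv), the "id" key column, the CSV path and the localhost Thrift host are all assumptions for illustration. The Happybase calls themselves (Connection, table, batch, put, send, close) are the library's own.

```python
import csv


def row_to_mutation(record, key_field, family=b"data"):
    """Turn one CSV record (a dict of strings) into (row_key, columns).

    Every field except the key becomes a column in the single
    "data" column family, e.g. b"data:name".
    """
    row_key = record[key_field].encode("utf-8")
    columns = {
        family + b":" + name.encode("utf-8"): value.encode("utf-8")
        for name, value in record.items()
        if name != key_field
    }
    return row_key, columns


def insert_records(table, records, key_field, batch_size=1000):
    """Queue every record on a batch and return how many rows were queued."""
    batch = table.batch(batch_size=batch_size)
    count = 0
    for record in records:
        row_key, columns = row_to_mutation(record, key_field)
        # Happybase calls send() on its own once the batch size is reached.
        batch.put(row_key, columns)
        count += 1
    # Flush whatever is left over after the last automatic send.
    batch.send()
    return count


def load_csv(path, key_field="id", host="localhost", batch_size=1000):
    """Hypothetical driver: path, key_field and host are assumed values."""
    import happybase  # third-party; needs a running HBase Thrift server

    connection = happybase.Connection(host)
    try:
        table = connection.table("sample_data:rfic")
        with open(path, newline="") as handle:
            return insert_records(
                table, csv.DictReader(handle), key_field, batch_size
            )
    finally:
        connection.close()
```

Calling load_csv("rfic.csv", key_field="id") would load one file, where both the file name and the key column are placeholders for whatever the downloaded data actually uses. Keeping insert_records separate from the connection code makes it easy to rerun with different batch_size values when timing.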
I ran the program for several batch sizes and averaged the results. As you can see from the timings, a larger batch size only helps up to a point: the run time levels off a little above 2 seconds, and beyond that we cannot insert any faster.
|Batch (n)||Time (s)|