Tao Chen
2015-11-03 20:23:17 UTC
Hi Sir/Madam,
Recently I've been trying to use LMDB to store and randomly access a
large number of features. Each feature blob is 16 kB.
Before trying LMDB, I simply stacked all the features together into one
huge binary file and used the seek function in C++ to access each
feature. Since the feature size is fixed, I can easily compute the
address of each feature in the file.
Then I tried LMDB. The value is the feature as it is. The keys are "1",
"2", "3", .... Since 16 kB is exactly 4 x page_size, adding the key and
header means each feature occupies 5 x page_size, so the db file on disk
is about 1.25 times the size of the previous binary file. This is
already a disadvantage for LMDB, but I still hoped there would be some
efficiency trade-off. I use the lmdb++ C++ wrapper to access features.
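The page arithmetic behind the 1.25x figure can be written out as below.
This is only a sketch under assumptions: a 4096-byte page and a 16-byte
per-value header are assumed values, and pages_needed and size_ratio are
hypothetical helper names, not part of LMDB.

```cpp
#include <cstddef>

constexpr std::size_t kPageSize = 4096;   // assumed page size in bytes
constexpr std::size_t kValueHeader = 16;  // assumed per-value header in bytes

// Number of whole pages needed for a value plus its header.
constexpr std::size_t pages_needed(std::size_t value_bytes,
                                   std::size_t header = kValueHeader,
                                   std::size_t page = kPageSize) {
    return (value_bytes + header + page - 1) / page;  // round up
}

// On-disk bytes used for the value divided by the raw value size.
constexpr double size_ratio(std::size_t value_bytes) {
    return static_cast<double>(pages_needed(value_bytes) * kPageSize) /
           static_cast<double>(value_bytes);
}
```

Under these assumptions a 16384-byte value needs 5 pages, which gives the
ratio 5 / 4 = 1.25, while a value 16 bytes shorter would fit in 4 pages.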
Next, I compared the two approaches by accessing the same random 1% of
the features, out of about 300k features in total. Before the test, I
used vmtouch to evict both files from the memory cache. The result was
surprising: the LMDB version is 1.5 times slower than the raw binary
file (30s vs 20s).
Is this because of the feature size (exactly 4 pages)? Or am I
misunderstanding how LMDB should be used?
Thank you for your time!
Best Regards,
Tao Chen