[LMDB] What pointer is returned with combination of MDB_WRITEMAP and MDB_RESERVE?

Victor Baybekov

2015-10-01 17:17:12 UTC

Hi,

Docs for MDB_RESERVE say that a returned pointer to the reserved space is
valid "before the next update operation or the transaction ends." Docs
for MDB_WRITEMAP say that it "writes directly to the mmap instead of using
malloc for pages." Does combining the two options return a pointer directly
to a place in the mmap, so that this pointer could be used after a transaction
ends or after the next update?

I have a use case where I want to somewhat abuse LMDB's safety for
convenience. If I could get a pointer into the mmap, I could treat an LMDB
value as an opaque blob, or as a region inside the single big mmap. This
could be more convenient than creating and opening hundreds of temporary
memory-mapped files and keeping open handles to them. For example, Aeron
terms could be stored like this: one LMDB db per stream id, with the term id
as the key in that db.

Thanks!
Victor

Howard Chu

2015-10-02 13:38:50 UTC

Post by Victor Baybekov
Docs for MDB_RESERVE say that a returned pointer to the reserved space is
valid "before the next update operation or the transaction ends." Docs
for MDB_WRITEMAP say that it "writes directly to the mmap instead of using
malloc for pages." Does combining the two options return a pointer directly
to a place in the mmap

Yes.

Post by Victor Baybekov
so that this pointer could be used after a transaction ends
or after the next update?

No.

Longer answer: maybe.

Full answer: LMDB is copy-on-write. If you update another record on the same
page, in a later transaction, the contents of that page will be copied to a
new page and the original page will go onto the freelist. In that case, the
pointer you got must not be used again.

If you don't directly update that page and cause it to be copied, then you
might get lucky and be able to use the pointer for a while. It all depends on
what other modifications you do and how they affect that node or neighboring
nodes.
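The copy-on-write hazard can be illustrated with a toy model in C. This is not LMDB's code: the page layout and the names toy_map, toy_alloc and toy_update are hypothetical. Each page holds two records; updating either record copies the whole page to a new page number and puts the old page on a free list. A pointer held into the old page keeps reading the old bytes ("getting lucky") until a later allocation reuses that freed page for something else.

```c
#include <string.h>

/* Toy copy-on-write store (hypothetical, NOT LMDB's implementation). */
enum { PAGES = 8, RECS_PER_PAGE = 2, REC_SIZE = 16 };

struct toy_map {
    char page[PAGES][RECS_PER_PAGE][REC_SIZE];
    int next_unused;          /* next never-used page number */
    int free_list[PAGES];     /* pages released by copy-on-write */
    int nfree;
};

/* Hand out a page, preferring one from the free list; -1 if exhausted. */
static int toy_alloc(struct toy_map *m)
{
    if (m->nfree > 0)
        return m->free_list[--m->nfree];
    if (m->next_unused >= PAGES)
        return -1;
    return m->next_unused++;
}

/* Copy-on-write update of record `rec` on page `pg`: copy the whole page,
 * rewrite the record in the copy, free the old page, return the new page. */
static int toy_update(struct toy_map *m, int pg, int rec, const char *val)
{
    int np = toy_alloc(m);
    if (np < 0)
        return -1;
    memcpy(m->page[np], m->page[pg], sizeof m->page[pg]);
    strncpy(m->page[np][rec], val, REC_SIZE - 1);
    m->free_list[m->nfree++] = pg;
    return np;
}
```

For example: hold a pointer to record 0 on page 0, then update record 1 on page 0. The held pointer still reads the old value, because page 0 is merely on the free list. Then update a record on page 1: the allocator hands page 0 back out, copies page 1 into it, and the held pointer now shows page 1's data.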

--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/

Victor Baybekov

2015-10-02 15:04:40 UTC

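Thank you! I understand this copy-on-write behavior, but am interested in whether
I could control it a little. What if I use records that are always much bigger
than a single page, e.g. 100 KB with 4 KB pages, and make sure that a record is
never updated (via LMDB) during the lifetime of the environment: is there
any scenario in which the location of such a big record could change during
the lifetime of the environment, without the record being updated?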

Howard Chu

2015-10-02 22:22:17 UTC

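At this point in time, no: if you don't update a large record, there is no
reason for it to move. That is not to say that this won't change in the
future. The documentation tells you what promises we are willing to make.
Relying on any non-documented behavior is your own responsibility.

Note that the relocation functions in LMDB are intended to accommodate
blocks being moved around. The actual guts of that API haven't been
implemented, but probably in 1.x we'll flesh them out. Given that support,
you could at least set a callback to notify you that a block has moved. But
currently, overflow pages don't move if they're not modified.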
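Howard's statement that unmodified large records don't move can be put in numbers for Victor's example. As we read mdb.c, a value too big for a leaf page is stored on its own run of overflow pages, and the leaf node keeps only the key and the first overflow page number; updating a neighboring record copies that leaf page, not the overflow run. The helper below estimates the run length. It assumes the formula of the OVPAGES() macro in mdb.c and a 16-byte page header (the 64-bit layout); both are assumptions, and overflow_pages is a hypothetical name.

```c
#include <stddef.h>

/* Assumed page-header size (PAGEHDRSZ on 64-bit builds, as we read mdb.c). */
enum { TOY_PAGEHDRSZ = 16 };

/* Number of overflow pages needed for a value of `data_size` bytes,
 * mirroring (by assumption) LMDB's OVPAGES(size, psize) macro:
 * the header and the data share the first page of the run. */
static size_t overflow_pages(size_t data_size, size_t page_size)
{
    return (TOY_PAGEHDRSZ - 1 + data_size) / page_size + 1;
}
```

Under those assumptions a 100 KB (102400-byte) value on 4 KB pages takes 26 overflow pages: 25 pages hold exactly 102400 bytes, and the 16-byte header pushes it into a 26th. Only a write to the record itself would allocate a new run.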

Victor Baybekov

2015-10-30 17:54:01 UTC

Thanks a lot! My proof-of-concept code works OK.

I do not understand all the subtle details of mmap reliability; could you
please help with these two questions:

If I write data through a pointer to an opaque blob as discussed above, and my
process crashes before mdb_env_sync but the OS doesn't crash, will that data
be safe in the mmap file?

Also, am I correct that mdb_env_sync synchronizes all dirty pages in the
mmap file as seen by the file system, regardless of how they were modified,
either via the LMDB API or via direct pointer writes?

As for "you could at least set a callback to notify you that a block has
moved": if that is implemented, it would be nice to have a notification
*before* a block is moved (with the old and new addresses, so that right after
the callback it is OK to use the new address); otherwise this unintended
but convenient use of LMDB won't work anymore.

Best regards,
Victor

Post by Howard Chu

Post by Victor Baybekov
Thank you! I understand this copy-on-write behavior, but am interested in whether
I could control it a little. What if I use records that are always much bigger
than a single page, e.g. 100 KB with 4 KB pages, and make sure that a record is
never updated (via LMDB) during the lifetime of the environment: is there
any scenario in which the location of such a big record could change during
the lifetime of the environment, without the record being updated?

At this point in time, no: if you don't update a large record, there is no
reason for it to move. That is not to say that this won't change in the
future. The documentation tells you what promises we are willing to make.
Relying on any non-documented behavior is your own responsibility.

Note that the relocation functions in LMDB are intended to accommodate
blocks being moved around. The actual guts of that API haven't been
implemented, but probably in 1.x we'll flesh them out. Given that support,
you could at least set a callback to notify you that a block has moved. But
currently, overflow pages don't move if they're not modified.

Howard Chu

2015-10-30 18:26:59 UTC

Post by Victor Baybekov
Thanks a lot! My proof-of-concept code works OK.
I do not understand all the subtle details of mmap reliability; could you please
help with these two questions:
If I write data through a pointer to an opaque blob as discussed above, and my
process crashes before mdb_env_sync but the OS doesn't crash, will that data be
safe in the mmap file?

Of course. The OS owns the memory; it doesn't matter if your process crashes.
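The same guarantee can be seen with plain POSIX calls, no LMDB involved: a child process writes through a MAP_SHARED mapping and is killed with SIGKILL before any msync(), and the parent then reads the bytes back with pread(). This is a sketch assuming Linux or a similar POSIX system where shared mappings and read() see the same page cache; the function name survives_process_crash and the file path are illustrative. It covers a process crash only; surviving an OS crash or power loss still needs a sync, which is the job of mdb_env_sync.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if bytes written through a shared mapping by a process that was
 * then killed are still in the file, 0 if not, -1 on setup error. */
int survives_process_crash(const char *path)
{
    static const char msg[] = "written via pointer";
    char buf[sizeof msg];
    int status, ok = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) == 0) {
        pid_t pid = fork();
        if (pid == 0) {                   /* child: write, then "crash" */
            char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
            if (map == MAP_FAILED)
                _exit(2);
            memcpy(map, msg, sizeof msg); /* raw pointer write, no msync() */
            raise(SIGKILL);               /* killed: no cleanup, no sync */
            _exit(3);                     /* not reached */
        }
        if (pid > 0 && waitpid(pid, &status, 0) == pid &&
            WIFSIGNALED(status)) {
            /* The parent never mapped the file; read it the ordinary way. */
            ok = pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf &&
                 memcmp(buf, msg, sizeof msg) == 0;
        }
    }
    close(fd);
    unlink(path);
    return ok;
}
```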

Post by Victor Baybekov
Also, am I correct that mdb_env_sync synchronizes all dirty pages in the mmap
file as seen by the file system, regardless of how they were modified, either via
the LMDB API or via direct pointer writes?

Yes.
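Why the answer can be yes, at the OS level: msync() flushes the dirty pages in the range it is given, whatever code dirtied them. With MDB_WRITEMAP, mdb_env_sync appears (from our reading of mdb.c, not from the documentation) to call msync() over the map, so LMDB's own writes and raw pointer writes are flushed alike. The sketch below uses plain POSIX calls, not LMDB; the function name dirty_two_pages_and_sync and the file path are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Dirty two different pages of a MAP_SHARED mapping through raw pointers
 * (standing in for "LMDB's writes" and "my own writes"), then flush both
 * with a single msync(MS_SYNC) over the whole mapping.
 * Returns 0 on success, -1 on error. The file is left in place. */
int dirty_two_pages_and_sync(const char *path)
{
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    const size_t len = 2 * page;
    int rc = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) == 0) {
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, "first", 6);          /* dirties page 0 */
            memcpy(map + page, "second", 7);  /* dirties page 1 */
            /* One call covers every dirty page in [map, map + len). */
            rc = msync(map, len, MS_SYNC) == 0 ? 0 : -1;
            munmap(map, len);
        }
    }
    close(fd);
    return rc;
}
```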

Post by Victor Baybekov
As for "you could at least set a callback to notify you that a block has
moved": if that is implemented, it would be nice to have a notification
*before* a block is moved (with the old and new addresses, so that right after the
callback it is OK to use the new address); otherwise this unintended but
convenient use of LMDB won't work anymore.

"right after the callback it is OK to use the new address": that's the point
of the callback; its job is to make the new address valid. So yes, when it
returns, you use the new address.
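A toy model of the callback protocol described here. lmdb.h declares a relocation hook of this general shape (MDB_rel_func, installed with mdb_set_relfunc and mdb_set_relctx, intended for MDB_FIXEDMAP databases), but as Howard says it was not implemented at the time of this thread. So the types toy_val and toy_rel_func, the handle table, and toy_move below are hypothetical stand-ins that compile without LMDB. The "database" copies the item, calls the callback while both copies still exist, and only then frees the old block; the callback rewrites the application's saved pointers, so once it returns the new address is the one to use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical types, shaped like MDB_val and MDB_rel_func in lmdb.h. */
struct toy_val { size_t size; void *data; };
typedef void toy_rel_func(struct toy_val *item, void *oldptr, void *newptr,
                          void *relctx);

/* Application-side state: raw pointers the app keeps into the store. */
struct handle_table { void *ptr[4]; int n; };

/* The callback's job: make the new address valid for the application by
 * rewriting every saved handle that still points at the old location. */
static void update_handles(struct toy_val *item, void *oldptr, void *newptr,
                           void *relctx)
{
    struct handle_table *t = relctx;
    (void)item;
    for (int i = 0; i < t->n; i++)
        if (t->ptr[i] == oldptr)
            t->ptr[i] = newptr;
}

/* Simulated store side: copy the item, notify while the old copy is still
 * intact, and only then release the old block. Returns the new address. */
static void *toy_move(struct toy_val *item, toy_rel_func *rel, void *relctx)
{
    void *newp = malloc(item->size);
    if (newp == NULL)
        return NULL;
    memcpy(newp, item->data, item->size);
    rel(item, item->data, newp, relctx);
    free(item->data);
    item->data = newp;
    return newp;
}
```

Calling the callback before the old block is released gives the "notification before the move completes" Victor asks for: no saved pointer ever refers to freed memory while the application can still observe it.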
