wc_db performance (was: wc_db API discussion)

Discussion:

(too old to reply)

Paul Burba

2011-03-14 03:31:38 UTC

...
For the second task, I think the first order of business is to change
the wc-db tree crawler to do one query instead of zillions, or at least,
where several queries are required, to do them all in one transaction.

stsp has been working this recently. Killing the node walker, and
moving to table scans.

Yes. So far, I've been working in the revision status code.
- There are API layering issues (wc_db.c calls into node.c).
This is related to the API discussions in the other thread
so I'll follow up there.
- The revision status code issues about 5 separate queries,
which aren't combined via a transaction and don't use temporary tables.
This is no worse than the previous code using the node walker,
obviously :) But I'll look at fixing this so that the results
returned correspond to the state of the DB as of the time the
svn_wc__db_revision_status() call was made.
For others who want to jump in and help, here is a list of places
where the node walker is still being used. I'm not sure if we can
eliminate it everywhere before release, but each of these should
be looked at to see whether we can use an alternative approach to
subversion/libsvn_client/changelist.c
subversion/libsvn_client/commit_util.c
subversion/libsvn_client/info.c
subversion/libsvn_client/merge.c
subversion/libsvn_client/mergeinfo.c

I'll dive in Monday morning and see what I can do with merge.c and mergeinfo.c.

Paul

subversion/libsvn_client/prop_commands.c
(This should be propget and propset. Proplist is already using
queries involving temporary tables. Rewriting propget on top
of the proplist code would be easy. Propset needs more work.)
subversion/libsvn_client/ra.c
subversion/libsvn_wc/update_editor.c

C. Michael Pilato

2011-03-14 14:53:48 UTC

Permalink

For others who want to jump in and help, here is a list of places
where the node walker is still being used. I'm not sure if we can
eliminate it everywhere before release, but each of these should
be looked at to see whether we can use an alternative approach to
subversion/libsvn_client/changelist.c
subversion/libsvn_client/commit_util.c
subversion/libsvn_client/info.c
subversion/libsvn_client/merge.c
subversion/libsvn_client/mergeinfo.c
subversion/libsvn_client/prop_commands.c
(This should be propget and propset. Proplist is already using
queries involving temporary tables. Rewriting propget on top
of the proplist code would be easy. Propset needs more work.)
subversion/libsvn_client/ra.c
subversion/libsvn_wc/update_editor.c

I want to jump in and help. But (as is common for me, it seems) the first
place I go to jump in stirs up a question. What is the motif we seek to
apply where we need have per-path notification events that need to happen?
For example, I'm familiar with the changelist code, so I started looking at
making this conversion there. But we do per-path notification in that code.
How do we approach this? Do we query old state, make the change, query new
state, diff the states and notify on the diffs? That seems ... wrong. Is
there any pattern established upon which I can model these changes?

--
C. Michael Pilato <***@collab.net>
CollabNet <> www.collab.net <> Distributed Development On Demand

Stefan Sperling

2011-03-14 16:05:39 UTC

Permalink

Post by C. Michael Pilato

The temporary table trick we're using with proplist works well if all
you're doing is reading data from the DB. It sounds like you want to make
modifications, too.

The concern seems to be that invoking a notification callback could
attempt to modify DB state, correct? Which would deadlock if you invoke
the callback after starting a transaction to tweak changelist information
but before this transaction has been committed (i.e. the write lock has
been released)?

I guess you could:

- Read all information you need to determine how the changelist needs
to be changed, using at most a handful of queries.
- Perform sanity checks using this information, and error out if you can
already determine that something won't work out for a specific path.
- Make all the changes in a transaction for all affected paths.
Try to commit the transaction.
- If the commit succeeded, notify for all paths. Else, print some error.

Would that work? We're notifying after the fact in case of success.
However, we can assume that the DB operations finish quickly, so for
the user everything will be nearly instantaneous anyway.

Stefan Sperling

2011-03-14 16:13:21 UTC

Permalink

Post by Stefan Sperling
The concern seems to be that invoking a notification callback could
attempt to modify DB state, correct? Which would deadlock if you invoke
the callback after starting a transaction to tweak changelist information
but before this transaction has been committed (i.e. the write lock has
been released)?

Of course, the same problem happens when the notification callback
does a *read* operation on the DB (because writers block readers --
somehow this restriction is kinda hard to getting used to...)

Branko Čibej

2011-03-14 21:41:37 UTC

Permalink

Post by Stefan Sperling

Of course, the same problem happens when the notification callback
does a *read* operation on the DB (because writers block readers --
somehow this restriction is kinda hard to getting used to...)

The restriction that you may not invoke a callback from within a sqlite
transaction remains. That's what the temporary results tables are for --
they live outside transactions on the main wc-db.

Still, wherever possible, we shouldn't allow wc-db modifications from
within callbacks, except in a controlled manner through the wc-db API
itself (e.g., modifications could also be queued to a temporary table
whilst holding a read lock on wc-db; block writers, but lets other
readers continue, and writers need exclusive locks to the WC anyway).

-- Brane

Philip Martin

2011-03-15 10:26:15 UTC

Permalink

Post by Branko Äibej
The restriction that you may not invoke a callback from within a sqlite
transaction remains. That's what the temporary results tables are for --
they live outside transactions on the main wc-db.

I don't understand the temporary table approach. Taking the
temp__node_props_cache table as an example: it gets populated by a
transaction and lives beyond that transaction. The table then gets
queried by a second transaction and the callbacks are made while that
second transaction is in progress. So callbacks are still being invoked
from within a transaction.

As far as I can see using a temporary table has made the overall "read
properties" operation into one that modifies the database, to create the
temporary table. I guess that by the time the callbacks are made the
database write-lock has been dropped, but there would have been no
write-lock if the temporary table were not used.

--
Philip

Stefan Sperling

2011-03-15 12:02:11 UTC

Permalink

Post by Philip Martin

It works around the infinite loop problem that happens when the
callback inserts rows that end up being picked up by the proplist query.
Which is silly for this callback to do, but then again we somehow agreed
that the callbacks are free to do virtually anything. (I still think
putting newly documented restrictions on them now is fine and would make
our life easier... *shrug*).

Post by Philip Martin
As far as I can see using a temporary table has made the overall "read
properties" operation into one that modifies the database, to create the
temporary table. I guess that by the time the callbacks are made the
database write-lock has been dropped, but there would have been no
write-lock if the temporary table were not used.

I think the temporary table gets put into a separate database file
(or memory, depending on the temp_store pragma or whatsitcalled).
If so, a read lock on the primary db should suffice, shouldn't it?

Philip Martin

2011-03-15 12:52:42 UTC

Permalink

Post by Stefan Sperling

Post by Philip Martin

I'm still confused. I thought the infinite loop problem affected the
old multiple transaction code. If we simply ran a single transaction
and made callbacks then that problem would not exist. We would have the
same limitations that apply to the temporary table approach: the
callback cannot write to the database because a transaction is in
progress.

Post by Stefan Sperling

I think the temporary table gets put into a separate database file
(or memory, depending on the temp_store pragma or whatsitcalled).
If so, a read lock on the primary db should suffice, shouldn't it?

Yes, by the time we invoke the callbacks we have a read-lock. If we
simply ran a single transaction and made callbacks within the
transaction then it would have the same effect. I don't see what the
temporary table achieves.

--
Philip

Hyrum K Wright

2011-03-14 18:36:58 UTC

Permalink

stsp has been working this recently. Killing the node walker, and
moving to table scans.

Before reading this thread, I saw the call into node.c, and have
subsequently removed it.

- The revision status code issues about 5 separate queries,
which aren't combined via a transaction and don't use temporary tables.
This is no worse than the previous code using the node walker,
obviously :) But I'll look at fixing this so that the results
returned correspond to the state of the DB as of the time the
svn_wc__db_revision_status() call was made.

I wrapped this API in a txn in r1081510.

I'll take a gander at some of these, too. (But I'm not entirely sure
what I' gandering at or for...)

-Hyrum