mm_regenerate_vgroup() frequency & lock collisions

Adam Franco
Administrator
Hi all,

I've been tracking down a timing/locking-related issue we've hit a few times and am wondering what the intended frequency of mm_regenerate_vgroup() is and why this function is triggered by cron and not directly as part of a permission-save event.
  • Is mm_regenerate_vgroup() more of a sanity check that can run nightly, or is it something that must be run more frequently?

  • What is the effect if it is run less often?

Background:
We've encountered an issue where table locks seem to collide and cause database connections to start piling up, taking the site offline until an Apache timeout kills the MySQL connection with the offending table lock.

The collision occurs when a cron job runs mm_regenerate_vgroup() at the same time that somewhat long-running permission changes are being made (either cascading permission changes to a subtree or copying permissions to a subtree). A table lock created in mm_regenerate_vgroup() (we think by its use of OPTIMIZE TABLE) ends up conflicting with another query. Because mm_regenerate_vgroup() is a pretty quick operation, it is quite difficult to replicate this collision, but we've seen it at least once in production and I've been able to trigger it twice in development.

Any info you can provide on the purpose of mm_regenerate_vgroup() would be extremely helpful -- I'm trying to determine whether simply running it less often or reworking it (or the conflicting processes) to avoid table locks would be preferable.

Thanks!
Adam

--

Adam Franco
Senior Software Developer
Information Technology Services
Middlebury College
Middlebury, VT 05753
[hidden email]
802.443.2244

---

You are currently subscribed to monster_menus as: [hidden email].

To unsubscribe click here: http://lists.middlebury.edu/u?id=685503.6b071f880fe6a965a128164e6d09ea81&n=T&l=monster_menus&o=724606

(It may be necessary to cut and paste the above URL if the line is broken)

or send a blank email to [hidden email]


Re: mm_regenerate_vgroup() frequency & lock collisions

Dan Wilga-2
Hi Adam,

The purpose of mm_regenerate_vgroup() is to regenerate any groups that are marked as "dirty". This will only happen if either the group is manually edited by an administrator using a /groups URL or if some external bit of code sets the flag that way. We do this, for instance, when we know that the list of overall users or the list of students in a course has changed. Changing a page's permissions does not set this flag.

We have not run into table lock problems since adding mm_retry_query() to the code in question. That's despite the fact that we routinely update several thousand virtual groups per day. But, then again, we don't use the cascade permissions stuff, either.
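The code of mm_retry_query() isn't shown in this thread, so the following is only a hypothetical Python sketch of the general retry-on-lock-error pattern its name suggests: rerun a statement when it fails with a lock-related error, backing off between attempts. The `retry_query` helper and the `LockWaitError` class are illustrative assumptions (the latter standing in for a driver error such as MySQL error 1205, lock wait timeout, or 1213, deadlock), not Monster Menus APIs.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LockWaitError(Exception):
    """Hypothetical stand-in for a driver error such as MySQL 1205 or 1213."""


def retry_query(run: Callable[[], T], attempts: int = 5, base_delay: float = 0.1) -> T:
    """Call run(), retrying with exponential backoff when it raises LockWaitError.

    Hypothetical helper for illustration; not the real mm_retry_query().
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return run()
        except LockWaitError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the original error
            # With the default base_delay: wait 0.1 s, 0.2 s, 0.4 s, ...
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_update() -> str:
        # Simulated query that hits a lock error on its first two attempts.
        calls["n"] += 1
        if calls["n"] < 3:
            raise LockWaitError("Lock wait timeout exceeded")
        return "ok"

    print(retry_query(flaky_update, base_delay=0.01), calls["n"])  # prints: ok 3
```

Retrying only helps with statements that actually fail with an error; a session that is simply blocked in a lock wait stays blocked until the lock is released or the server's lock timeout fires.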

We run cron once an hour. You could certainly run it less often if you like, especially if you don't use virtual groups. The only drawback is, obviously, that any pages which rely on vgroups won't have the correct permissions until the next run.

Perhaps moving the OPTIMIZE TABLE to just after the unset($txn) would help. That way, the OPTIMIZE wouldn't happen unless there were dirty groups in the first place. I suspect most users would be in this camp. (There's no sense optimizing a table if it never changes.)
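A minimal Python sketch of that reordering, assuming a simplified flow: the `FakeDB` class, the `regenerate_dirty_vgroups` function, and the table name are hypothetical stand-ins for the PHP code, and the COMMIT marks roughly where `unset($txn)` ends the Drupal transaction. OPTIMIZE TABLE is issued only inside the dirty-groups branch and only after the transaction is finished, so a site with no dirty groups never issues it.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDB:
    """Hypothetical stand-in that records SQL statements instead of running them."""
    dirty_vgids: list[int]
    executed: list[str] = field(default_factory=list)

    def execute(self, sql: str) -> None:
        self.executed.append(sql)


def regenerate_dirty_vgroups(db: FakeDB) -> None:
    """Sketch of the proposed ordering: OPTIMIZE only when something changed."""
    vgids = list(db.dirty_vgids)
    if vgids:
        db.execute("START TRANSACTION")
        for vgid in vgids:
            # Placeholder for the real per-group regeneration queries.
            db.execute(f"-- regenerate vgroup {vgid}")
        db.execute("COMMIT")  # roughly where unset($txn) ends the transaction
        db.dirty_vgids.clear()
        # Moved here: runs only after dirty groups were rebuilt, and outside the
        # transaction (OPTIMIZE TABLE causes an implicit commit in MySQL).
        db.execute("OPTIMIZE TABLE example_vgroup_table")  # hypothetical table name
    # No dirty groups: nothing is written and no OPTIMIZE is issued.


if __name__ == "__main__":
    idle = FakeDB(dirty_vgids=[])
    regenerate_dirty_vgroups(idle)
    print(idle.executed)  # prints: []

    busy = FakeDB(dirty_vgids=[7, 9])
    regenerate_dirty_vgroups(busy)
    print(busy.executed[-1])  # prints: OPTIMIZE TABLE example_vgroup_table
```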

In reply to Adam Franco's message of 1/21/16, 12:13 PM.


Re: mm_regenerate_vgroup() frequency & lock collisions

Adam Franco
Administrator
In reply to this post by Adam Franco
Thanks Dan,

Your background was quite helpful. We don't actually use virtual groups at all; I had mistakenly thought that the list of per-page individuals was a virtual group. Still, I was just able to replicate the issue in a test, as shown in this process list:

[Image 1: MySQL process list]

Once in this state, additional page loads by other clients begin backing up as well:

[Image 2: process list with additional page loads backing up]

This is intermittent enough (maybe 1 in 10-20 tests) that I can't conclusively say that the OPTIMIZE TABLE call is the cause of the problem, or that the proposed fix will actually stop the lock wait. But if moving it to the end of the if ($vgids) { ... } statement, after unset($txn);, wouldn't cause problems for those who do use virtual groups, it might be a good bet just to rule it out as a cause. We'll try running with this patch in place for a while and will report back if we notice any "Waiting for table metadata lock" issues caused by other statements.
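To watch for that symptom while testing, one option is to poll the process list and flag sessions whose State is "Waiting for table metadata lock". The Python sketch below assumes the rows have already been fetched from SHOW FULL PROCESSLIST as dicts keyed by its column names (Id, Command, Time, State, Info); the helper name, the `min_seconds` threshold, and the sample rows are hypothetical.

```python
from typing import Any, Iterable, Mapping

MDL_WAIT = "Waiting for table metadata lock"


def metadata_lock_waiters(
    rows: Iterable[Mapping[str, Any]], min_seconds: int = 0
) -> list[Mapping[str, Any]]:
    """Return process-list rows blocked on a metadata lock, longest wait first.

    `rows` are assumed to be dicts keyed by SHOW FULL PROCESSLIST column names.
    """
    waiting = [
        row
        for row in rows
        if row.get("State") == MDL_WAIT and (row.get("Time") or 0) >= min_seconds
    ]
    return sorted(waiting, key=lambda row: row.get("Time") or 0, reverse=True)


if __name__ == "__main__":
    sample = [  # made-up rows for illustration only
        {"Id": 1, "Command": "Query", "Time": 12, "State": "executing", "Info": "..."},
        {"Id": 2, "Command": "Query", "Time": 10, "State": MDL_WAIT, "Info": "SELECT ..."},
        {"Id": 3, "Command": "Query", "Time": 30, "State": MDL_WAIT, "Info": "UPDATE ..."},
        {"Id": 4, "Command": "Sleep", "Time": 100, "State": "", "Info": None},
    ]
    print([row["Id"] for row in metadata_lock_waiters(sample)])  # prints: [3, 2]
```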

Thanks again!
Adam


In reply to Dan Wilga's message of Thu, Jan 21, 2016, 1:14 PM.


Re: mm_regenerate_vgroup() frequency & lock collisions

grahamtk
In reply to this post by Dan Wilga-2
A question related to vgroups and mm_regenerate_vgroup():

I want to use the "All logged in users" group permission to create an intranet, where only logged-in users can see part of the content on the site.

What I'm uncertain about is whether users created by the LDAP module on their first login will be in this vgroup, or whether the cron job mentioned in this thread has to run before such users are added to the group.

Thanks.
Øyvind



--
View this message in context: http://monster-menus.2910260.n2.nabble.com/mm-regenerate-vgroup-frequency-lock-collisions-tp7573163p7573172.html
Sent from the Monster Menus mailing list archive at Nabble.com.
