"Hey, since the upgrade, I can't use the power edit feature anymore!" came the request. Ok. There have been several different upgrades over the past few months. The menu editor module has been updated. The server has been upgraded. The site is in heavy use, so there are lots of content changes.
First up: any new permissions? Why yes, I see a bunch of permissions for this module; those must be new. Add them and you're good to go! Wait, I guess not: these permissions can't easily be set across your 25 domains. Ok, fine, we'll create our own permission check for this so the right people can do it. Hey, whaddya know? We already did that! And... that's not the problem.
Ok. Next up. Drupal, especially older Drupal, doesn't like putting files on menus. I remember we had some issues around that, involving custom code and links starting with ':'. Aha, no links with ':' anymore! And these affected menus do look completely normal. But wait, when we edit these exact same menus on the test site, they work fine. Argh!
Error logs. Let's see if there's something useful in there. Dang, there are dozens of log entries for every request, and it's a very busy site. Upgrading PHP has made a lot of formerly acceptable coding practices no longer kosher. That doesn't break the site (other than the current issue), but it makes the logs virtually useless.
Ok. Maybe it's something new in the menu that the user doesn't have permission to access. Let's get a fresh copy of the database and step through. 1.8GB of data migrated and sanitized. Great, the dev site is up to date! But... it still works completely fine there. No problem to debug! Ok, this looks like an environment issue, then. Our main dev environment is still running the older version. Create a new site on an up-to-date dev workstation. Ha! Finally, I can reproduce the problem.
First, let's clear out some of that log garbage so we can see if it's actually logging the issue. Ok, got that down to 3 messages per request -- nothing to do with the menu editor. Continue debugging. Ok, there's this big recursive access-checking tree it walks... man, with 138 entries, NetBeans debugging is coming up short -- I can't set a conditional breakpoint that only stops on a particular value. Scrap that, getting nowhere.
Hey, why did it just skip the form's validation step? That's not right -- I submitted the form, so it should go through form validation and then submit handling. Why did it skip that entirely? Ok, let's look at the $_POST variable. Hmm, I'm seeing a whole bunch of nested arrays, but no form_id? That's what it checks for, and without it, it skips validation. Ok, what's going on here?
Open up the LiveHTTPHeaders browser extension and capture the POST. Yup, no issue there: form_build_id and form_id are right there in the POST body. But wait, why aren't they available in PHP? The POST body size limit, perhaps? This is a pretty long form submission, over 28K. But the server limit is 2M, so that shouldn't be a problem.
Search Google. Nothing. Try again, and again, with different words. Aha! max_input_vars: added in ~PHP 5.4 to prevent a hash-collision denial-of-service attack against the way PHP builds $_POST and the other request arrays. New default value: 1000. Let's see, how many variables are we trying to post? 1228. And form_id and form_build_id are right at the end. Bingo!
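To make the failure mode concrete, here is a minimal Python sketch of what happened to this form (Python purely for illustration; the real truncation happens inside PHP's request parsing). It assumes a simplified model in which PHP keeps the first max_input_vars variables in the order they appear in the body and silently discards the rest; how nested array keys are counted has varied between PHP versions, so treat this as a model, not a reimplementation. The field names, the item count, and the `example_menu_form` ID are hypothetical, chosen only so the form posts 1228 variables with form_build_id and form_id last.

```python
from urllib.parse import parse_qsl, urlencode

# PHP's documented default for max_input_vars.
DEFAULT_MAX_INPUT_VARS = 1000


def parse_post_like_php(body, max_input_vars=DEFAULT_MAX_INPUT_VARS):
    """Simplified model of PHP truncating request variables.

    Keeps only the first `max_input_vars` key=value pairs, in body order,
    and drops the rest. Keys stay flat (e.g. "mlid:7[weight]") instead of
    being expanded into nested arrays.
    Returns (kept_vars, total_pairs, dropped_pairs).
    """
    pairs = parse_qsl(body, keep_blank_values=True)
    kept_pairs = pairs[:max_input_vars]
    dropped = len(pairs) - len(kept_pairs)
    return dict(kept_pairs), len(pairs), dropped


def build_menu_form_body(item_count):
    """Hypothetical menu form: two fields per menu item, followed by the
    form_build_id and form_id that Drupal's Form API looks for."""
    fields = []
    for i in range(item_count):
        fields.append((f"mlid:{i}[weight]", str(i)))
        fields.append((f"mlid:{i}[hidden]", "0"))
    fields.append(("form_build_id", "form-EXAMPLE"))
    fields.append(("form_id", "example_menu_form"))
    return urlencode(fields)


if __name__ == "__main__":
    body = build_menu_form_body(613)  # 613 * 2 + 2 = 1228 variables
    post, total, dropped = parse_post_like_php(body)
    print(total, dropped, "form_id" in post)  # 1228 228 False
    post, total, dropped = parse_post_like_php(body, max_input_vars=2000)
    print(total, dropped, "form_id" in post)  # 1228 0 True
```

With the default limit, the last 228 variables vanish, and form_id and form_build_id go with them, so the form code never sees a submission to validate. Raising max_input_vars in php.ini above the number of fields the form actually posts lets both IDs survive, which is what the second call models.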
Total time to diagnose problem: > 4 hours
Total time to fix problem: 30 seconds
That right there is why it's so hard to provide accurate estimates -- you don't know what you don't know. And this stuff is constantly changing -- no matter how experienced you are, and how diligent, there is always stuff you don't know.
It was added in PHP 5.3.9 and was even backported to older minor versions as a security fix in certain distros.
Not only is your conclusion correct and important (it's often almost impossible to know in advance how long analyzing a problem will take), but the max_input_vars problem could easily bite others, too. It seems really hard to debug.
Maybe there should have been a general announcement on drupal.org to tell sites with big forms to increase that setting.
Where possible, I tell clients that I can't provide an accurate estimate to fix a bug with an unknown cause, so I suggest that I spend up to x hours attempting to identify the issue and discuss with them again after I've done that.
If they refuse to accept that, then maybe just roll a die for your estimate, and then bill however long it took you ;)
Max input vars will also break the menu links edit page on a large menu. Had that last month.
Most people outside the realm of technology/support aren't aware that the vast majority of the time spent solving a unique technical problem goes into first learning how to fix it. Thanks for the invaluable insight.
This is a problem we frequently encounter in support when working with clients. We bill by the hour, especially for bugs that have a workaround. And as you've mentioned, it can take hours to track down a problem but a mere 30 seconds to fix it. So this issue usually comes up when we send a one-file patch with a simple change. And it's hard to reason with some clients, too.
P.S.: Love your captcha
It's always like that: diagnosing an issue takes significantly longer than fixing it, especially since we so rarely have the full picture. The user complains that a feature is broken, but that doesn't necessarily point to the actual break, just to what they are unable to use. We get this all the time!
Great article, spot on. So many times things take only a few seconds to fix but an age to track down. Like they say on Who Wants to Be a Millionaire: "it's only easy if you know the answer".
One of the reasons why our industry needs agile, not fixed-price, pricing.