Book Review – Force.com Enterprise Architecture

As I headed out from Dreamforce, one of my last stops was the developer library where I saw Andrew Fawcett signing his new book, “Force.com Enterprise Architecture”. It took me a while to get around to reading it, and I thought I’d share a few comments since he was kind enough to give me a copy.

As I discuss in my Pluralsight course “Learning Technology in the Information Age”, I feel that books provide a unique value proposition – along with taking a course, they are the best way to gain domain knowledge that is curated and organized in a way that is easy to learn. So that’s how I measure the value of a book beyond the obvious standards of clarity and accuracy – by choice of content and organization.

The first thing you should know is that this is not a book for beginners. This is not the book for an admin to read who wants to learn Apex. It’s also not the book to read if your goal is to obtain one of the innumerable certifications that Salesforce offers. This book is intended for intermediate to expert level Force.com developers.

The title, “Force.com Enterprise Architecture” is a rather generic title that is accurate enough, but as you will see, tends to obscure the real value of the book. This is a good book for any Force.com developer who wants to learn to how to architect solutions on the platform. The exact approaches in the book aren’t necessarily applicable or necessary for every solution, but they demonstrate the right way to think about architecture on the platform.

That said, if you are a developer who is thinking about creating a managed package or application to distribute on the AppExchange, this book isn’t just good – it’s indispensable. It is a “drop everything you are doing and buy a copy for every member of your team before you do anything else” kind of book.

There are many books on Salesforce and Force.com, including many books published by Salesforce itself, but what almost all of them have in common is that they are written by in-house developers and consultants. As far as I know there are just two books in existence written by developers who have shipped major managed packages on the AppExchange and this is one of them (mine is the other). Andrew Fawcett is CTO at FinancialForce, and he may know more than anyone in the world on what it takes to ship a Force.com application (myself included) – so if you’re even thinking about doing that, you’d be a fool not to buy this book and study it carefully. It’s full of the kinds of hints, tricks and suggestions that you won’t find anywhere else (including the books published by Salesforce – most of their authors haven’t shipped managed packages either).

And it’s a great complement for Advanced Apex Programming – you’ll find there is little overlap between them.

New Course Teaches Data Visualization Using the Salesforce Platform

I’m pleased to announce my latest Pluralsight course “Data Visualization for Developers”. This is not a course on Force.com – but in some ways it’s even better. It teaches the principles and practice of data visualization using Force.com as an underlying technology.

The course is published on Pluralsight.com. Free trials are available if you are not already a subscriber.

Find out more in this preview video:

 

Goodbye Script Limits, Hello what?

Perhaps the most surprising change for Winter ’14 is the elimination of script limits, to be replaced with a single CPU time limit for each transaction.

This is an extraordinary change, and it’s worth taking a few minutes to explore the consequence, both long term and short term, of this decision. Keep in mind, that what follows are my preliminary thoughts – I’m still somewhat in a state of shock 🙂

In the immediate future, I don’t expect this change to have any impact. I believe SFDC when they say they’ve analyzed the situation and that no current code will exceed the CPU time limits.

To understand the long term impacts, let’s consider the real meaning of this change.

  • First, managed packages will no longer have their own set of script limits, or their own CPU time – CPU time will be shared among all managed packages interacting with a transaction and code native to the organization.
  • Second, my understanding is that time spent within Salesforce code counts as CPU time. Up until now, script limits only impact your code – a long running built-in operation such as a sort or RegEx would only count as a single script line.

This will obviously have an immediate impact on how one might code for efficiency. Your code can be more verbose – there will be less need to build complex conditional statements that are barely readable in order to cram everything into one line of code. Not having to trade-off readability for efficiency will be very nice.

For the first time Apex developers will need to care about the efficiency of the built-in Apex class code. This will be a whole new topic for discussion, as the community gradually discovers which classes and methods are good, and which should be avoided, and when.

The real question comes down to what happens going forward – say, six to twelve months from now. Without the script limits, the pressure to optimize code will be reduced, and I’m sure we’ll see code appear on orgs that would never have survived in the current system.

As an ISV partner, this brings up an interesting question. What happens when some of that bad code, either on an org or in another package, uses up most of the CPU time, and when it becomes time for my package to run, limits are exceeded? Running a debug log with profile information should presumably allow identification of the greedy piece of code, but how many sys admins will take the time or trouble to actually figure this out? It’s so much easier to blame a package – possibly the one unfortunate enough to have tipped the CPU limit.  As this occurs more and more often, one can envision a case where customers gradually lose trust in applications in general, never knowing if one can be safely run. Ultimately this could impact trust in the platform overall.

Arguments that the proposed CPU time limits are generous (and they are are), don’t (so far) address the well known fact that software inevitably expands to use available CPU time (often because it’s expensive to optimize code and therefore often not done unless it’s necessary).

There seem to me three possibilities going forward.

  1. There is a real commitment within SFDC to build infrastructure to support inefficient code, so the performance will increase faster than the spread of inefficient code. (And don’t try to convince me that people won’t write inefficient code ).
  2. The amount of headroom in the current CPU limits really is so great that it pretty much takes an infinite loop to exceed it. (I’m sure I won’t be the only one experimenting with this in days and weeks to come).
  3. The engineers who made this choice are deluding themselves that all Apex developers will continue to write efficient code even when they don’t have to.

As an ISV partner who ships a very large application, I confess that the relaxed script limits are definitely going to make life easier. At the same time, I really hope that when CPU time limits are exceeded, they don’t just post an error blaming the application that tripped the limit, but rather more detailed information that explains to users where the CPU time went – so that it is easy for clients and vendors alike to quickly focus on the code or package that deserves the blame.

Death of a Platform Bug

In my previous post, I walked through the process of discovering, diagnosing and reporting a legitimate platform bug. As I mentioned previously, on any platform as large and complex as Force.com, bugs are inevitable. Every OS has them. Every framework has them.

One of the biggest considerations when evaluating a platform bug is when it a appears. For example: if a bug appears on a new API version, and the platform is versioned – you can avoid the bug by either working around it, or by staying with the old API version until it is fixed.

If a bug is just there – and has been there for a while, you can either come up with a workaround, or just not use that particular feature – because the bug has always existed, there’s little or no risk the bug will impact code that you ship.

But, if a bug appears on the platform and breaks existing code – that’s a big problem. That’s why Salesforce puts in such a huge effort to test new releases, running every unit test (including customer unit tests and package tests) on the new version to detect any possible breaking change. Unfortunately, the DataDotComEntitySetting bug was this type of bug.

As it turns out, the problem related to a security setting on that particular object – one that I presume is used by Data.com Clean when enabled. It’s also not a common problem – it impacted our application and that of one other ISV (who started seeing sudden errors appearing with customers who enabled Data.com clean).

The good news is, that once we were able to reach the right people at Data.com to convey the impact that the problem was causing, they were phenomenal. They provided us with access to a sandbox with data.com to verify both the error and confirm the fix, they kept us updated as to the progress, and today – confirmed that a patch has been pushed out to production.

So – it’s a happy ending.

But, happy ending notwithstanding, it did point out one area that I hope Salesforce will work to improve. You see, the application I’m working is large and complex – and makes use of many platform features. So I’ve probably run into (and helped discover) more than my share of platform issues. Over the past few years I’ve noticed a dramatic improvement in the ability of the Salesforce frontline support to confirm, prioritize and address platform bugs. I’ve noticed a marked improvement in the Known Issues site – and the quick identification of workarounds where possible (and remember, for a developer, a workaround is as usually almost good as a fix). I’ve seen rapid and accurate responses on StackExchange.

I don’t know how Salesforce is organized internally, but from where I sit, the Data.com support group hasn’t quite gotten the message yet. Yes, they were great at confirming that a platform bug existed, but after that – things got… difficult. I won’t go into details, but it took some pretty extraordinary efforts on our part to finally reach the right people where we were able to have a good discussion and get real feedback that we could work with and convey to our customers. Anyway, I’m confident that they’ve learned as much from the experience as we have, and I am thrilled to see this particular platform bug dead and buried.

 

 

 

 

Anatomy of a Platform Bug

Update 5/20/13 – See “Death of a Platform Bug

Platforms and frameworks have bugs.

Nobody really likes to discuss it – especially platform and framework vendors. But it’s like Murphy’s law of computer programming: Every non-trivial program has at least one bug. In fact, one of the signs that you have become an “expert” on a platform or framework is that a high percentage of the problems that you run into and can’t solve are, in fact, platform bugs rather than your own code.

I’ve found bugs in Windows, MFC, ATL and the .NET Framework. Nowadays I find them in Force.com. The experience is pretty similar on all of the platforms. First you have to be very sure that it’s really not your bug – this can be harder than you might think. There’s a lot of detective work involved – unlike your own code, you can’t necessarily know what is going on with the platform – I once found a VB bug where I actually had to disassemble a part of the VB control interface code in order to demonstrate to the developers where their mistake was. Which brings us to one of the biggest challenges – getting past the first-line support team to someone who can actually solve the problem (or convince them that you really know what you’re talking about and that they should forward the information).

I thought it might be interesting to walk through what the process is like with an example that I am currently dealing with. This is a story in-progress – I will add more information as it becomes available.

It began with our latest release – where on some systems we started seeting many of our unit tests fail with the following error:

FATAL_ERROR|System.DmlException: Insert failed.
First exception on row 0; first error:
UNKNOWN_EXCEPTION, INVALID_TYPE:
sObject type 'DataDotComEntitySetting' is not supported.: []

This was perplexing. After all, we don’t access an object called DataDotComEntitySetting. In fact, we don’t reference anything related to Data.com.

As a software vendor, you really don’t want to see most of your unit tests start failing. So this became a top priority issue.

Our first concern was whether we could install the software at all. The answer is, of course – yes. If you’ve read “Advanced Apex Programming”, you’ve seen unit test design patterns that allow you to dynamically enable or disable individual unit tests before or after deployment – so we’re not dead in the water. However, not being able to run unit tests means we can’t validate the operation of the application on those systems – which is definitely not good.

Because we could disable tests for installation and then reenable them after the software was installed, we were able to eliminate one theory – that the problem was purely related to software installation – perhaps some security issue related to the user context used during unit tests on installation.

Another early step was, of course, to search for other instances of this problem. Unfortunately, this was one of those cases where we clearly were innovators. There was only one reference to a similar problem, and our scenario did not match the one described.

This left us with a number of questions.

Was this really related to Data.com?

Yes, the error message referenced an object called ‘DataDotComEntitySetting’, but I’ve seen cases where an error message has nothing even remotely related to do with real source of the error. This is especially true in a complex framework, where internal error handling attempts to internally recover from a problem and only after a cascade of errors do you finally see an unrecoverable error – that has nothing to do with the original problem. In this case, there are a number of factors that suggested it really related to data.com aside from the object name. First, both systems on which we saw the problem did have Data.com enabled – too small a sample for a firm conclusion, but an indicator nonetheless. Second, the StackExchange issue was seemingly related to a jigsaw package, that later seems to have been integrated into Data.com. Later in this article you’ll see how we obtained further proof.

What changed?

Our new software release had dozens of unit test errors – most of them on code that had not changed from the previous version (as a reasonably agile organization, we have frequent releases). But there was one change that impacted the entire codebase – we upgraded from API 25 to API 27, mostly in order to take advantage the new string library and some other new Apex features. When code breaks from one API version to another, that can be an indicator of a platform bug as compared to a bug in your own code.

Looking for a Workaround

At this point we had already submitted an initial case. But when dealing with potential platform bugs, you can’t just sit around and wait for support. You need information – the more the better. Fortunately, we have some great customers who are ok with us using the license management system to log in to their sandboxes – when you do so, you can see detailed debug logs for your managed packages. The push upgrades system also provides better information than a regular package install. This allowed us to see where the failure was occurring.

The code, in a nutshell, was like this.

// Code that creates some test lead
// objects but doesn’t insert them
List<Lead> newleads = initTestLeads();
InsertTestObjects(newleads);

The InsertTestObjects function is a public method that we use to insert test objects and perform some additional tasks. In this case, it sets a static variable so that our trigger framework will know to ignore these test objects.

public static void InsertTestObjects(List<SObject> objs)
{
   DisableExternalUpdates = true;
   insert objs;
   DisableExternalUpdates = false;
}

The error was occurring during the insert. We saw it occur on Leads, Contacts and Accounts – a fact that again pointed towards Data.com as the culprit, as it uses those objects.

One thing we found in the debug logs was that when the problem occurred, no object triggers were being called (at least in our application, or in user code). This provided additional evidence that the problem was not in our code or other user code, though it theoretically could have been in a different managed package.

This code is extremely simple. So we looked for ways to reproduce the problem.

  • We built some unit test classes in the sandbox that contained similar code. They worked perfectly.
  • We created another test package that contained similar code and tried to install it. It worked perfectly.

Things are so much easier when you can reproduce a problem. When you can’t….

What this did tell us however, is that whatever it took to cause this problem, it was not obvious. We had some test functions that failed, and others with almost identical code that succeeded. The problem was not intermittent – tests that failed did so consistently, those that passed also did so consistently. But there was no clear pattern.

So our next step was to create some patch versions of the application and see if we could change things to get the test to pass.

And we found something. If instead of calling the InsertTestObjects function we called a new strongly typed InsertTestLeads function, most of the tests passed.

public static void InsertTestLeads(List<Lead> objs)
{
   DisableExternalUpdates = true;
   insert objs;
   DisableExternalUpdates = false;
}

This would suggest that it was perhaps a language issue, except for one problem: there were other places in the code where a direct strongly types insertion would fail. For example:

Account act = new Account(…..);
insert act;

This would fail with the same error. Not everywhere, just in some test functions.

Presenting the Case

We were very fortunate to be assigned a really good support person, but we’d also done our homework. While the original case was filed as a “application won’t install” problem, by the time we were on a GotoMeeting with support we could demonstrate failing tests, had log files showing the problem, and could demonstrate code changes that could in some cases resolve the problem. In short, we had overwhelming evidence that we were dealing with a platform bug.

The support person, who was familiar with data.com, then walked us through some experiments. One of them involved turning off the “Clean” feature in data.com. That did it – the tests stopped failing.

So now we were in as ideal a situation as one could ask for under the circumstances. Salesforce support agreed that it was a platform bug, and we knew for sure that it related to data.com.

You may think I’m glad it’s a platform issue, and while in some sense there is relief that it’s not our code, the truth is that it would be much better if it were our code – we can fix our code. Now we have to hope that Salesforce will commit the resources to resolve the issue, and be able to figure it out – the inconsistent nature of the problem suggests that it may be hard to track down.

This is the “dark side” of modern software development – where we build applications based on packages, platforms, frameworks and services, many of which are outside of our control. It’s certainly not unique to Force.com. The best thing you can do is to be proactive – work with the platform and framework vendors to resolve issues, but be prepared to work with them on solving the issues, and where possible, develop workarounds.

I’ll add updates to this post as new information becomes available.

Meanwhile, if you have any insight to share, feel free to leave a comment (note, comments are moderated to limit spam so you won’t see them immediately)