Death of a Platform Bug

In my previous post, I walked through the process of discovering, diagnosing and reporting a legitimate platform bug. As I mentioned previously, on any platform as large and complex as Force.com, bugs are inevitable. Every OS has them. Every framework has them.

One of the biggest considerations when evaluating a platform bug is when it a appears. For example: if a bug appears on a new API version, and the platform is versioned – you can avoid the bug by either working around it, or by staying with the old API version until it is fixed.

If a bug is just there – and has been there for a while, you can either come up with a workaround, or just not use that particular feature – because the bug has always existed, there’s little or no risk the bug will impact code that you ship.

But, if a bug appears on the platform and breaks existing code – that’s a big problem. That’s why Salesforce puts in such a huge effort to test new releases, running every unit test (including customer unit tests and package tests) on the new version to detect any possible breaking change. Unfortunately, the DataDotComEntitySetting bug was this type of bug.

As it turns out, the problem related to a security setting on that particular object – one that I presume is used by Data.com Clean when enabled. It’s also not a common problem – it impacted our application and that of one other ISV (who started seeing sudden errors appearing with customers who enabled Data.com clean).

The good news is, that once we were able to reach the right people at Data.com to convey the impact that the problem was causing, they were phenomenal. They provided us with access to a sandbox with data.com to verify both the error and confirm the fix, they kept us updated as to the progress, and today – confirmed that a patch has been pushed out to production.

So – it’s a happy ending.

But, happy ending notwithstanding, it did point out one area that I hope Salesforce will work to improve. You see, the application I’m working is large and complex – and makes use of many platform features. So I’ve probably run into (and helped discover) more than my share of platform issues. Over the past few years I’ve noticed a dramatic improvement in the ability of the Salesforce frontline support to confirm, prioritize and address platform bugs. I’ve noticed a marked improvement in the Known Issues site – and the quick identification of workarounds where possible (and remember, for a developer, a workaround is as usually almost good as a fix). I’ve seen rapid and accurate responses on StackExchange.

I don’t know how Salesforce is organized internally, but from where I sit, the Data.com support group hasn’t quite gotten the message yet. Yes, they were great at confirming that a platform bug existed, but after that – things got… difficult. I won’t go into details, but it took some pretty extraordinary efforts on our part to finally reach the right people where we were able to have a good discussion and get real feedback that we could work with and convey to our customers. Anyway, I’m confident that they’ve learned as much from the experience as we have, and I am thrilled to see this particular platform bug dead and buried.

 

 

 

 

Anatomy of a Platform Bug

Update 5/20/13 – See “Death of a Platform Bug

Platforms and frameworks have bugs.

Nobody really likes to discuss it – especially platform and framework vendors. But it’s like Murphy’s law of computer programming: Every non-trivial program has at least one bug. In fact, one of the signs that you have become an “expert” on a platform or framework is that a high percentage of the problems that you run into and can’t solve are, in fact, platform bugs rather than your own code.

I’ve found bugs in Windows, MFC, ATL and the .NET Framework. Nowadays I find them in Force.com. The experience is pretty similar on all of the platforms. First you have to be very sure that it’s really not your bug – this can be harder than you might think. There’s a lot of detective work involved – unlike your own code, you can’t necessarily know what is going on with the platform – I once found a VB bug where I actually had to disassemble a part of the VB control interface code in order to demonstrate to the developers where their mistake was. Which brings us to one of the biggest challenges – getting past the first-line support team to someone who can actually solve the problem (or convince them that you really know what you’re talking about and that they should forward the information).

I thought it might be interesting to walk through what the process is like with an example that I am currently dealing with. This is a story in-progress – I will add more information as it becomes available.

It began with our latest release – where on some systems we started seeting many of our unit tests fail with the following error:

FATAL_ERROR|System.DmlException: Insert failed.
First exception on row 0; first error:
UNKNOWN_EXCEPTION, INVALID_TYPE:
sObject type 'DataDotComEntitySetting' is not supported.: []

This was perplexing. After all, we don’t access an object called DataDotComEntitySetting. In fact, we don’t reference anything related to Data.com.

As a software vendor, you really don’t want to see most of your unit tests start failing. So this became a top priority issue.

Our first concern was whether we could install the software at all. The answer is, of course – yes. If you’ve read “Advanced Apex Programming”, you’ve seen unit test design patterns that allow you to dynamically enable or disable individual unit tests before or after deployment – so we’re not dead in the water. However, not being able to run unit tests means we can’t validate the operation of the application on those systems – which is definitely not good.

Because we could disable tests for installation and then reenable them after the software was installed, we were able to eliminate one theory – that the problem was purely related to software installation – perhaps some security issue related to the user context used during unit tests on installation.

Another early step was, of course, to search for other instances of this problem. Unfortunately, this was one of those cases where we clearly were innovators. There was only one reference to a similar problem, and our scenario did not match the one described.

This left us with a number of questions.

Was this really related to Data.com?

Yes, the error message referenced an object called ‘DataDotComEntitySetting’, but I’ve seen cases where an error message has nothing even remotely related to do with real source of the error. This is especially true in a complex framework, where internal error handling attempts to internally recover from a problem and only after a cascade of errors do you finally see an unrecoverable error – that has nothing to do with the original problem. In this case, there are a number of factors that suggested it really related to data.com aside from the object name. First, both systems on which we saw the problem did have Data.com enabled – too small a sample for a firm conclusion, but an indicator nonetheless. Second, the StackExchange issue was seemingly related to a jigsaw package, that later seems to have been integrated into Data.com. Later in this article you’ll see how we obtained further proof.

What changed?

Our new software release had dozens of unit test errors – most of them on code that had not changed from the previous version (as a reasonably agile organization, we have frequent releases). But there was one change that impacted the entire codebase – we upgraded from API 25 to API 27, mostly in order to take advantage the new string library and some other new Apex features. When code breaks from one API version to another, that can be an indicator of a platform bug as compared to a bug in your own code.

Looking for a Workaround

At this point we had already submitted an initial case. But when dealing with potential platform bugs, you can’t just sit around and wait for support. You need information – the more the better. Fortunately, we have some great customers who are ok with us using the license management system to log in to their sandboxes – when you do so, you can see detailed debug logs for your managed packages. The push upgrades system also provides better information than a regular package install. This allowed us to see where the failure was occurring.

The code, in a nutshell, was like this.

// Code that creates some test lead
// objects but doesn’t insert them
List<Lead> newleads = initTestLeads();
InsertTestObjects(newleads);

The InsertTestObjects function is a public method that we use to insert test objects and perform some additional tasks. In this case, it sets a static variable so that our trigger framework will know to ignore these test objects.

public static void InsertTestObjects(List<SObject> objs)
{
   DisableExternalUpdates = true;
   insert objs;
   DisableExternalUpdates = false;
}

The error was occurring during the insert. We saw it occur on Leads, Contacts and Accounts – a fact that again pointed towards Data.com as the culprit, as it uses those objects.

One thing we found in the debug logs was that when the problem occurred, no object triggers were being called (at least in our application, or in user code). This provided additional evidence that the problem was not in our code or other user code, though it theoretically could have been in a different managed package.

This code is extremely simple. So we looked for ways to reproduce the problem.

  • We built some unit test classes in the sandbox that contained similar code. They worked perfectly.
  • We created another test package that contained similar code and tried to install it. It worked perfectly.

Things are so much easier when you can reproduce a problem. When you can’t….

What this did tell us however, is that whatever it took to cause this problem, it was not obvious. We had some test functions that failed, and others with almost identical code that succeeded. The problem was not intermittent – tests that failed did so consistently, those that passed also did so consistently. But there was no clear pattern.

So our next step was to create some patch versions of the application and see if we could change things to get the test to pass.

And we found something. If instead of calling the InsertTestObjects function we called a new strongly typed InsertTestLeads function, most of the tests passed.

public static void InsertTestLeads(List<Lead> objs)
{
   DisableExternalUpdates = true;
   insert objs;
   DisableExternalUpdates = false;
}

This would suggest that it was perhaps a language issue, except for one problem: there were other places in the code where a direct strongly types insertion would fail. For example:

Account act = new Account(…..);
insert act;

This would fail with the same error. Not everywhere, just in some test functions.

Presenting the Case

We were very fortunate to be assigned a really good support person, but we’d also done our homework. While the original case was filed as a “application won’t install” problem, by the time we were on a GotoMeeting with support we could demonstrate failing tests, had log files showing the problem, and could demonstrate code changes that could in some cases resolve the problem. In short, we had overwhelming evidence that we were dealing with a platform bug.

The support person, who was familiar with data.com, then walked us through some experiments. One of them involved turning off the “Clean” feature in data.com. That did it – the tests stopped failing.

So now we were in as ideal a situation as one could ask for under the circumstances. Salesforce support agreed that it was a platform bug, and we knew for sure that it related to data.com.

You may think I’m glad it’s a platform issue, and while in some sense there is relief that it’s not our code, the truth is that it would be much better if it were our code – we can fix our code. Now we have to hope that Salesforce will commit the resources to resolve the issue, and be able to figure it out – the inconsistent nature of the problem suggests that it may be hard to track down.

This is the “dark side” of modern software development – where we build applications based on packages, platforms, frameworks and services, many of which are outside of our control. It’s certainly not unique to Force.com. The best thing you can do is to be proactive – work with the platform and framework vendors to resolve issues, but be prepared to work with them on solving the issues, and where possible, develop workarounds.

I’ll add updates to this post as new information becomes available.

Meanwhile, if you have any insight to share, feel free to leave a comment (note, comments are moderated to limit spam so you won’t see them immediately)

New course: Force.com for .NET Developers

I’m pleased to announce my latest Pluralsight course “Force.com for .NET Developers”. This course is a prequel to my course “Force.com and Apex Fundamentals for Developers” intended specifically for .NET developers who are curious about Force.com

Here’s how I describe the course:

Force.com is a unique cloud development platform that is in many ways different from traditional software development platforms – even those based on cloud technologies. This short course is designed specifically for .NET developers to understand the nature of Force.com by comparing and translating .NET concepts to their Force.com equivalents.

If you are a .NET developer, I encourage you to check it out – if you’re not already a Pluralsight subscriber, they have a free trial available (see right sidebar for link).

Intriguing Design Pattern for Scheduled APEX

One of the disadvantages of Scheduled APEX is that a scheduled class can’t be updated. Force.com creates an instance of the APEX class when it is scheduled, preventing it from being updated. You can’t edit a scheduled class, or update it via a ChangeSet, the Force.com IDE or a package update.

What’s more, Force.com prevents updates to any dependent classes as well. Thus it is quite easy for a scheduled class to “poison” an application – preventing many, if not all of its components, from being updated. As a result, updates to applications that use Scheduled Apex often require any scheduled jobs to be manually aborted before an update can take place.

It turns out, however, that a recent API update allows use of a design pattern that can help you avoid most of these problems.

Here’s how it works.

You still create a class that implements the Schedulable interface, but this class will be a simple wrapper that defines it’s own interface – call it IScheduleTest. This interface is identical to Schedulable. The execute method of the Schedulable global class creates an instance of a second class that implements the new IScheduleTest interface using the new Type class instantiation method. It then calls the execute method on that interface. It looks like this:

global class ScheduleTest Implements Schedulable
{
  public Interface IScheduleTest
  {
    void execute(SchedulableContext sc);
  }

  global void execute(SchedulableContext sc)
  {
    Type targettype = Type.forName('CalledByScheduleTest');
    if(targettype!=null)
    {
      IScheduleTest obj = 
      (IScheduleTest)targettype.NewInstance();
      obj.execute(sc);
    }
  }
}

The second class looks something like this:

public class CalledByScheduleTest 
  implements ScheduleTest.IScheduleTest
{
  public void Execute(SchedulableContext sc)
  {
    System.debug('called in schedule');
  }
}

What does this accomplish?

You still can’t modify the ScheduleTest class once it’s scheduled, but this class is so simple, you may never need to update it. You can update the CalledByScheduleTest class. Using the Type.NewInstance method to create the class dynamically prevents the platform from seeing it as a dependent class.

I’ve been able to successfully update the CalledByScheduleTest class even during a managed package update as long as the scheduled ScheduleTest class remains unchanged. Though this design pattern is not officially documented (to my knowledge), I see no reason why it should not work reliably going forwards.

This design pattern eliminates one of the major impediments to using Scheduled Apex and is worth not only considering for new designs, but as a possible retrofit to existing applications.

 

New Course on Migrating to Apex

I’m pleased to announce the immediate availability of my first ever online course – Force.com and Apex Fundamentals for Developers on Pluralsight.com.

Think of this course as a prequel to “Advanced Apex Programming”. The book was designed for intermediate and experienced Apex developers, but isn’t a perfect fit for experienced software developers who are moving from other languages to Apex and Force.com. This course fills that gap.

It’s designed for experienced programmers who are beginning or intermediate Force.com developers. It does not teach computer programming – I assume that those viewing the cost know how to program in a modern block structured language, and that they know how to read documentation.

Here’s how I describe the course:

Apex is the native language of the Force.com platform, and there is a huge demand for skilled developers in this space. The Java/C# like Apex language looks familiar enough that experienced developers often expect a short learning curve, but the platform is actually radically different, and requires use of a unique set of set of design patterns. In this course, you’ll learn the core concepts that are essential for every Apex programmer to learn, and a roadmap to further resources to help you quickly become an expert in this rapidly growing space.

This course, like Advanced Apex Programming, is not an overview or comprehensive introduction to the Force.com platform. It is a book on developing software, and the emphasis is on programming, design patterns and best practices.

I invite everyone to check it out – if you’re not already a Pluralsight subscriber, they have free trials available (see right sidebar). You’ll probably find other content there that you’ll like as well – I chose Pluralsight because I know quite a few of their course authors (both personally and by reputation), and they are the best (honestly, I feel honored to be counted in their company).

eBook Edition Available

I’m pleased to announce immediate availability of Kindle and Nook editions of Advanced Apex programming. The Kindle edition should also be available on Amazon.co.uk, Amazon.de, Amazon.it, Amazon.es and, as I understand it, to visitors from India on the main Amazon.com site.

This is the first technical book I published as an eBook, and I found the conversion process to be … interesting. The challenge related to code listings – how do you render listings in a format that can display in multiple resolutions on everything from a desktop to a smartphone? eBooks do support fixed fonts, but many do not support horizontal scrolling, and arbitrary wrapping of code can make it completely unreadable.

After some experimentation, I ended up including the listings as screen captured images from the Force.com IDE. This guaranteed consistent formatting, and allowed me to use a smaller font. It also allowed use of syntax coloring. There are tradeoffs to this approach – for one thing, the code is not text searchable. Use of color also makes the code a bit harder to read on older ePaper devices (like the original Kindles).

If you have a color Kindle or Nook, or one of the newer high-contrast e-paper readers, it should look very good. For older e-paper devices or smaller devices (like Smartphones), you’ll probably want to download the sample code to refer to as you read the book.

For the moment, the Kindle editions are not yet linked to the entries on the print edition on Amazon (that usually takes a few days).

You can find the latest links to purchase here.

Force.com is the next Visual Basic

I just came back from the Dreamforce conference with an epiphany – Force.com is the next Visual Basic. Some less experienced software developers might think that’s an insult, but those of us who have been around know that it’s not merely a compliment – it’s an observation that, if true, represents a potential tectonic shift to our industry.

To understand why, I need to take you back over 20 years.

When Visual Basic 1.0 came out (I participated in the original beta program), the reactions to the product fell into three categories:

  • Most professional C++ programmers dismissed it. VB was a “toy language” or a “glue language” for components – not for serious software development.
  • Increasing number of software engineers embraced the language because, to put it simply, when it came to desktop applications you could be an order of magnitude more productive in VB than in C++. It may not have had the stature and features of a “real” professional language, but it sure was profitable to work in it.
  • VB was easy enough for anyone to use, so everyone did. Doctors, lawyers, students – millions of VB developers sprang up out of nowhere and wrote a lot of code. Much of it was very bad code, but that’s what happens when a bunch of amateurs get in the game. Entire book, magazine and training industries grew up to help them get better, and many of them did and built entire careers around the platform.

By the time VB6 came around, it was the most popular software development language and platform in the world. Simply because it was easy, and it was productive.

Why was it productive? Because VB put an abstraction layer over the Windows API that was infinitely easier to use than coding to the native API or other available frameworks such as MFC or ATL. You couldn’t do everything in VB6, but you could do most of what you needed, and could call the API directly if you really needed to. Having a rich set of available components to purchase didn’t hurt either.

Microsoft did a lot of things right building the VB community. They had great developer and ISV relations. They supported several conferences. There were books, documentation, whitepapers and so on. They really set the standard on how to build a platform.

Then they created the .NET framework.

There was a lot of negative reaction from the original VB6 community towards VB .NET, some calling it “VB .NOT” or VB.Fred (coined by Bill Vaughn). Some programmers made the transition. Some switched to C#. But two things were clear. First, VB .NET was indeed a powerful, serious, professional language and platform for software developers. Personally, I love it, and still use it all the time. But it was equally clear that VB .NET is not easy. In fact, the entire .NET framework is robust, powerful, sophisticated and complex. It’s a great platform for software developers, but is it a platform that makes it easy for non-programmers to write line of business applications? Not even close.

Both VB .NET and C# are native languages to the .NET framework – the Windows API of today’s software. Missing was the magic of the original VB – that layer of abstraction that made it easy for anyone to write software.

I’ve been searching for that magic for a long time. I kept waiting for it to appear out of nowhere the way VB 1.0 did. I sure didn’t expect it to sneak up on me from behind.

(more…)

Books at Dreamforce

Going to Dreamforce? Be sure to go to the Developer Keynote on Thursday at 3:00pm where Salesforce will be giving everyone who attends a free copy of “Advanced Apex Programming”!

Ok, it’s not free – you did pay for your Dreamforce pass, so technically it’s included in the conference cost, but you know what I mean.

You’re also welcome to come by any of my sessions, or developer theater talk.

… or stop by the Full Circle CRM booth (#125) where I’ll be most of the time when I’m not presenting or attending sessions.

Developer Theater Talk at Dreamforce

I’ll be doing a presentation at the Developer Theater at Dreamforce at 10:00am on Thursday

Here’s the official description:

Real programmers use Apex

Salesforce has always been known for its powerful point-and-click customization tools. The “Clicks before Code” mantra expressed the idea that a wide variety of business processes and use cases could be solved using workflows, formulas and other non-programming techniques. But in truth, Force.com is a software development platform – as robust and powerful as frameworks like .NET or J2EE. As such, it’s no longer enough to know some language syntax and specific techniques. You need to think as a software architect, to be able to draw on multiple design patterns based on requirements and limits, and to consider lifecycle costs (and how to mitigate them in Force.com through good design, testing and diagnostics).
In this session you’ll meet Dan Appleman, the author of the new book “Advanced Apex Programming for Salesforce.com and Force.com” that builds on the existing Apex and Force.com documentation to help developers reach that next level. You’ll discover how (and why) a past Microsoft MVP and expert .NET developer ended up writing a major AppExchange package, and some of the surprising things he discovered along the way.

When is a Boolean not a Boolean?

The other day I saw a question on a forum in which it was noted that the following Apex code fails with a null exception error:

Boolean testVal = null;
if ( testVal )
System.debug( ‘Testval is true.’ );
else
System.debug( ‘Testval is false.’ );

This is somewhat unexpected in that in almost every other language, a null value in a Boolean variable is considered False.

In Apex, primitive variables always support a null value in addition to their expected values. Thus if you enter the following code into the developer console:

Integer i;
Boolean b;
Date d;
DateTime dt;
Decimal dl;
Double dbl;
Long l;
String s;
Time t;
system.debug(i==null);
system.debug(b==null);
system.debug(d==null);
system.debug(dt==null);
system.debug(dl==null);
system.debug(dbl==null);
system.debug(l==null);
system.debug(s==null);
system.debug(t==null);

You’ll see that they all come out as true. And if you try to access any method on the object, you’ll get a null reference exception.

Boolean variables in Apex are thus consistent with other primitive data types.

Where it gets interesting is when you look at how you might use a Boolean value in code.

Let’s say you have an uninitialized Boolean variable b;

if(b) … will fail with a null reference exception

if(!b)… will fail with a null reference exception

Fair enough, but now consider the following two statements:

Boolean b;
if(b!=true) system.debug(‘b is not true’);
if(b!=false) system.debug(‘b is not false’);

The result will be:

b is not true
b is not false

Well now, everyone knows that a real Boolean variable can’t be ‘not true’ (which is false) and ‘not false’ (which is true) at the same time.

Which means, to put it bluntly, that a Boolean variable in Apex is not really a true Boolean variable – at least as far as purists would be concerned.

Fortunately, I’m not much of a purist, so the real concern is what this means in practice.

First – always initialize your variables! This is especially true for those of you coming from .NET, which kindly initializes value type variables for you.

Second, test for null values where appropriate, even if you are using primitive data types.

Third, the safest comparison for Booleans is always against true – i.e. b == true or b != true. if you do this consistently, any null values that do sneak in will be treated as false. But does this mean you should always compare Booleans against True? Not necessarily – because you might want to catch those null reference exceptions to detect uninitialized variables or failures to check against null values.

It does mean you should never compare against false –
because in Apex b!= false doesn’t mean b is always true.