Process Builder Optimization: Don’t Evaluate the Next Criteria!

As Process Builder has grown more capable, it has also grown in its ability to consume CPU time. Most of the time, this won’t matter – however, if you aren’t careful, a carefully built process that works just fine with individual records will blow up spectacularly when you start doing bulk operations. Using the Evaluate Next Criteria option can be a major contributor.

The reason is simple – at this time, one of the greatest consumers of CPU time in processes is the criteria – the rule.

Let’s look at a simple example:

This process has four criteria nodes that divide up states by the size of the state. Each of the criteria has a simple formula rule such as this one, which identifies the largest states: CONTAINS('CA,TX,FL,NY,PA,IL,OH,GA,NC,MI,NJ', [Lead].State). The action calls out to an Apex method that logs consumed CPU time (trust me when I say this action costs very little CPU time).
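The logging action itself is not shown in this post. As a rough sketch only, an invocable method along these lines could be called from a process action; the class name, label and parameter are assumptions for illustration, not the code used for these measurements.

```apex
public class CpuTimeLogger {
    // Hypothetical invocable action: Process Builder passes the Ids of the
    // records that matched the criteria node, in bulk.
    @InvocableMethod(label='Log CPU Time')
    public static void logCpuTime(List<Id> recordIds) {
        // Limits.getCpuTime() returns the CPU milliseconds used so far
        // in the current transaction.
        System.debug(LoggingLevel.ERROR,
            'CPU ms so far: ' + Limits.getCpuTime() +
            ' (' + recordIds.size() + ' records in this call)');
    }
}
```

Reading the last logged value in the debug log gives an approximate running total of CPU time at the point where the action fired.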

Now imagine inserting 200 leads distributed evenly across all 50 states. With the process as is, all of the leads are evaluated by each node, regardless of the fact that it is impossible for a lead to qualify under more than one criteria (a lead can’t be in two states at once). The average time consumed was 303ms.
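A bulk insert like the one described can be approximated from anonymous Apex. This is a minimal sketch, assuming the org accepts two-letter codes in Lead.State and has no required fields beyond LastName and Company; it is a single run, not the averaged benchmark used for the numbers in this post.

```apex
// All 50 state abbreviations, so 200 leads gives 4 leads per state.
List<String> states = ('AL,AK,AZ,AR,CA,CO,CT,DE,FL,GA,HI,ID,IL,IN,IA,KS,KY,' +
    'LA,ME,MD,MA,MI,MN,MS,MO,MT,NE,NV,NH,NJ,NM,NY,NC,ND,OH,OK,OR,PA,RI,SC,' +
    'SD,TN,TX,UT,VT,VA,WA,WV,WI,WY').split(',');

List<Lead> leads = new List<Lead>();
for (Integer i = 0; i < 200; i++) {
    leads.add(new Lead(
        LastName = 'Benchmark ' + i,
        Company = 'Benchmark Co',
        State = states[Math.mod(i, states.size())]));
}

// Measure CPU time consumed by the insert, including any processes it fires.
Integer cpuBefore = Limits.getCpuTime();
insert leads;
Integer cpuUsed = Limits.getCpuTime() - cpuBefore;
System.debug(LoggingLevel.ERROR, 'CPU ms consumed by insert: ' + cpuUsed);
```

Individual runs vary quite a bit, so repeat the measurement several times and average the results before drawing conclusions.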

Running the same lead insertion after modifying the process to stop once a criteria is met results in fewer leads being evaluated on each successive criteria node. In this example, since we distributed the leads equally by state, 75% will be evaluated on the second node, 50% on the third, and the remaining 25% on the last. This reduces the average CPU time consumption to 250ms.

What if all of the records belong to California – in other words, all of them match the first criteria node? In the first process example, where “Evaluate the Next Criteria” is set, there is no difference in CPU time! It still consumes 303ms. This makes sense, as all of the leads are evaluated by each rule.

However, when using the second process that does not evaluate the next criteria, in this scenario only the first criteria is evaluated and the average CPU time drops to 185ms!

Minimizing Rule Execution is an Essential Part of Process Builder Optimization

Executing criteria nodes is costly – by my estimates it runs 75-100ms per criteria against 200 records. When considered in terms of overall CPU time usage, this means that if you execute 100 criteria across multiple processes and sub-processes, that alone may be enough to exceed CPU limits even if nothing else is present in your org (workflows, actions, Apex, validation rules, etc.) – and who has that? Hopefully this will be an area where future optimization work will be done, but for now, it’s essential to design your processes to minimize the number of rules that execute – and stopping evaluation as soon as possible is a great start.
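To put that estimate next to the limit, here is the arithmetic as a small anonymous Apex sketch. The 75-100ms figures are the estimates from this post; the limit is read from the platform, and is 10,000ms in a synchronous transaction.

```apex
// Estimated cost per criteria node against a 200-record batch (from this post).
Integer criteriaExecuted = 100;
Integer lowEstimateMs = criteriaExecuted * 75;    // 7,500 ms
Integer highEstimateMs = criteriaExecuted * 100;  // 10,000 ms

// CPU limit for the current transaction type.
Integer limitMs = Limits.getLimitCpuTime();

System.debug('Estimated criteria cost: ' + lowEstimateMs + '-' + highEstimateMs +
    ' ms against a limit of ' + limitMs + ' ms');
```

Even at the low end of the estimate, the criteria alone would use three quarters of the synchronous budget before any actions, triggers or validation rules run.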

For more ideas on improving overall org performance, especially regarding interactions between Apex and declarative, check out my Pluralsight course “Adopting Trigger Design Patterns in Existing Salesforce Orgs”.

I was asked if this advice contradicts the “One Process to Rule Them All” design pattern of combining multiple processes on an object into one and using the “Evaluate Next Criteria” option in that case. It does not. There are two reasons why:
1. Combining criteria into one process will cost the same or less CPU time than having it in separate processes.
2. Combining processes into one makes it possible to add an initial negative criteria node – one that stops execution for any record that does not meet any of the subsequent criteria – thus improving efficiency over having multiple processes where each criteria is evaluated for every record.

Benchmarking CPU time usage is tricky. I have an entire section in my book “Advanced Apex Programming” about how it’s done, and you can find how those techniques can be applied to declarative constructs in a talk Robert Watson and I did at Dreamforce back in 2016 called “It’s About (CPU) Time – The Dark Art of Benchmarking”. The specific numbers in that talk are long obsolete, but the methodology is sound, and I’ve continued to use it to examine Process Builder. The numbers here were generated on an API 47 scratch org on 2/4/2020. Preliminary testing shows comparable numbers on an API 48 Spring 2020 preview org. Numbers were averaged across 5 tests.

New Course: Trigger Design Patterns Without Frameworks

Patterns are more important than frameworks.

There. I said it. Blasphemy, right? When it comes to Salesforce triggers, it’s common to think about frameworks. What’s the best framework to use? Should I find one or create my own? A framework can offer efficiency – code reuse and code you don’t have to write yourself. It can provide discipline – everybody has to use the framework. It can improve reliability and maintainability, and make problems easier to debug.

And in a brand new org with a single development team, you can make it work – 100% enthusiastic adoption.

But we live in the real world. And real orgs are often... well, a mess. Or, to adopt a more Orwellian term – a “Happy Soup”. There are multiple development teams, some of whom don’t talk to each other. There’s no way they would all agree on a single framework. And the cost and risk of rewriting existing code into a framework is prohibitive. Nobody knows what half the code actually does anyway.

That’s right, in the real world, trigger frameworks are not nearly as useful as they sound. But that doesn’t mean you should give up. As it turns out, it’s possible to apply the same design patterns that trigger frameworks use when working in “Happy Soup” orgs. It’s possible to make small incremental changes that have massive benefit. In fact, you can achieve most of the benefits of a trigger framework at almost no cost and no risk.

You can find out how in my new Pluralsight course “Adopting Trigger Design Patterns in Existing Salesforce Orgs”.


The Case of the Missing Exception

Some bugs are hard.

Last week I had one of the hardest. It only happened occasionally, after a row lock error, in very specific scenarios, on a customer production org. It was, of course, impossible to reproduce. And given that it only occurred now and then for random users, capturing a debug log was out of the question.

So what do you do? You go old-school. Search the code for any execution path that could possibly lead to the results we saw in the data. And after many hours of research, I found nothing. There was no scenario that could lead to the results we were seeing. And there were no workflows, processes or flows that could do it either. We started wondering if maybe some outside integration was involved, but that seemed unlikely.

Well, there’s that old saying “When you’ve eliminated the impossible, whatever remains, however improbable, must be the truth”. There was one “impossible” code path that could theoretically lead to what I was seeing, but it could only happen in one case – if you could somehow read a field from an SObject that was not included in a query, having it return null instead of throwing an exception.

You’ve all seen this exception. Imagine a custom object Soql_Query_Test__c that has two fields, Test_Field_1__c and Test_Field_2__c, and you execute the following code:

Soql_Query_Test__c sqt = 
[Select ID, Test_Field_1__c 
from Soql_Query_Test__c Limit 1];
String s = sqt.Test_Field_2__c;

The result is the notorious SObjectException “SObject row was retrieved via SOQL without querying the requested field: Soql_Query_Test__c.Test_field_2__c”

I’ve seen that exception many times. It’s invaluable during development and testing when it comes to making sure that all of the fields we use are in a query. But the only way our bug was possible was if I could read an unqueried field without raising that exception.

I tried everything I could think of – converting the object into a generic SObject, passing it to functions and accessing the field there. The exception always appeared. Was I on the wrong track? Was this actually happening? What could our code be doing?

Fortunately, we have unit tests – good unit tests. We even have unit tests that simulate row locking exceptions, so I was able to run that code path, though not for the exact scenario that would reproduce the bug. Still, I could set some fields in a source record, add some debug statements and see exactly what happened.

And sure enough, the improbable was true. I had a record. It had a field that had a value in the database but was not included in the query. I confirmed it was not included in the query using the wonderful SObject.getPopulatedFieldsAsMap function. But when my code accessed the field, the value was null. No exception. Null. I was floored.

I started trying other things in the org where I was experimenting – different field types, dynamic vs. static DML, dynamic vs. static queries, and finally had a breakthrough. I set the other field to a random value, and the exception vanished.

Soql_Query_Test__c sqt = 
[Select ID, Test_Field_1__c 
from Soql_Query_Test__c Limit 1];
sqt.Test_Field_1__c = 'Changed value';
String s = sqt.Test_Field_2__c;

This results in no exception, and the string is set to null.

If you set any field in a record, reading any unqueried field on the record will return null instead of raising an exception.
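One defensive option is to check whether a field is actually populated before trusting a null. The helper below is a minimal sketch built on getPopulatedFieldsAsMap; the class, method and exception names are hypothetical, and it is not the fix that was applied to the bug described here. Two assumptions to verify on your own org: the map keys use the exact API-name casing of the field, and a queried field whose stored value is null may be omitted from the map.

```apex
public class QueriedFieldGuard {
    // Hypothetical exception type for this sketch.
    public class UnqueriedFieldException extends Exception {}

    // Returns the field value, or throws if the field was neither queried
    // nor explicitly set on this in-memory record.
    public static Object getChecked(SObject record, String fieldName) {
        Map<String, Object> populated = record.getPopulatedFieldsAsMap();
        // Map keys are case sensitive, so fieldName must match the API name.
        if (!populated.containsKey(fieldName)) {
            throw new UnqueriedFieldException(
                'Field was not queried or set: ' + fieldName);
        }
        return populated.get(fieldName);
    }
}
```

With the second snippet above, QueriedFieldGuard.getChecked(sqt, 'Test_Field_2__c') would throw instead of quietly returning null, because Test_Field_2__c was neither queried nor set.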

I had my answer, and was able to implement a solution so we could patch the bug. But I’ll be honest, at this point my biggest question was – how could I not have known about this?

Is it a Bug, or a Feature?

The next day I reached out to Don Robins, who is an expert trainer. He knew about this, and his view, and that of another trainer he spoke to, was that this was known and expected behavior. The reasoning: once you set any field in a retrieved record, further missing field SObjectExceptions are disabled under the assumption that you (the developer) know what you are doing at this point.

Robert Watson, a co-worker and expert Apex developer, hadn’t seen this, but found a StackExchange post about it.

This post suggests that it was a bug that was introduced in late 2017. But I knew our code dated to mid 2016. Fortunately, it’s possible to set the API version for an Apex class, so I set the class I was experimenting with to API 24 – which dates from about 6 years ago – and saw the same behavior. This leads me to conclude that either this behavior has always existed, possibly by design, or that it was an unversioned change.

You may wonder how could an unversioned change this significant occur and not be detected? What about the infamous Hammer test?

Well, think about what would have actually happened when this change was introduced. Existing code would continue to work. The lack of an exception would only break a test that was checking to verify that a missing field exception occurs – and what would be the point of such a test? In truth – this is not going to be a breaking change, and while it might have been caught by an internal Salesforce validation test, it’s highly unlikely any customer orgs, functionality or unit tests would be impacted.

A friend of mine at Salesforce brought a known issue to my attention that suggests it is a bug. And yes, I did miss this when searching for existing issues before doing my own research. Oops – lesson learned (again).

So this brings us to the big question: is this a feature? Is it an unversioned change? Or is it a bug? And ultimately, should this behavior be changed?

It’s not an easy question to answer.

Does it make sense to ignore unqueried fields once you’ve set any field value? I can see the logic in that argument, but let’s rephrase it.

When updating a record, do you ever read fields on that record? Of course you do. And is there any scenario where, on reading an existing field, you would intentionally leave it out of the query string in order to return null instead of the existing value? Probably not.

Yes, you can make the argument that the developers should know what they’re doing and make sure to query all fields, but we developers make mistakes. And the earlier we find a mistake, the better. Which scenario is more likely to help discover a missing query term earlier – an exception, or returning null? Obviously, the exception. The only way you’d detect the incorrect null field value is if you looked for it, or saw the consequences later in the data – as I did. So while it makes sense to me to allow writing fields that were not queried, I think it would be better for developers to have the exception always occur when accessing unqueried fields that have not been explicitly set.

So I’m leaning towards the “it’s a bug” camp… but is this a bug worth fixing?

The nature of this “bug” is the lack of an exception. How much code exists out there where someone queries a record, writes a field, and then inadvertently reads an unqueried field? Especially considering that this behavior may have existed from the earliest days of Apex? I’m afraid to even ask – the number could be enormous.

Sure, they would version this fix. But then you’ll have a new version of Apex where an exception might be thrown that wasn’t thrown before. Everyone will have to test their code. Unit tests will help, but only for those who have good unit tests, and even then, there can easily exist code paths where the bug was missed – which could lead to the sudden appearance of intermittent and occasional exceptions in code that is currently working for anyone who wants to upgrade their code to a new API version. For some orgs this could present a costly and risky obstacle to upgrading to a new API version – at exactly the time where the new Apex compiler promises to bring new enhancements to the language.

So yes, it may be a bug, but this may be a bug where the cure costs more than it’s worth. In which case, there’s only one thing left to do – turn it into a feature and document it.

Whichever approach they choose, this has been a fascinating case – I hope you found it as interesting as I did. And please spread the word – this behavior is something every Apex developer should know about and consider both at design time and when creating unit tests and QA plans.

A Most Interesting Apex Trigger Framework

In my book Advanced Apex Programming, I spend quite a bit of time discussing trigger design patterns. But I’m going to let you in on a little secret – what you find in the book isn’t really a “design pattern”, so much as a design concept.

And despite the chapter name “One trigger to rule them all”, I didn’t originate the idea that it was a good idea to control execution sequence by using just one trigger – experienced Apex developers already knew this. What I think I brought to the table was the idea that we could take advantage of the Apex language’s object-oriented features to implement that concept in some really good, supportable and reliable ways.

Here’s a secret – the examples I used in the book do not, in fact, accurately reflect the framework I used in our own products. The framework we use is considerably more sophisticated. But the examples do reflect the concepts that our framework uses.

I did this because I do not believe there is any one “right” trigger design pattern or framework for everyone and every situation. So my goal in the book was to demonstrate the concepts involved, in the hope that others would build on it – come up with variations of different design patterns and frameworks based on those concepts.

I was thrilled to see the other day a blog post by Hari Krishnan called “An architecture framework to handle triggers in the platform”. It’s a beautiful piece of work (and I do appreciate the shout out). As with our own framework, I don’t think it’s a solution for every scenario, but it does present a very elegant object-oriented implementation to the problem. What really struck me was the innovative use of dynamic typing to instantiate objects based on the object type and name. Our own framework doesn’t use that approach, for the obvious reason that it was built before Apex supported dynamic object creation by type, but it’s definitely worth considering for any design going forward.
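To give a feel for the general idea of instantiating a handler by name, here is a minimal sketch using Type.forName. It is not Hari’s framework or ours; the dispatcher class, the Handler interface and the naming convention are all assumptions made up for illustration.

```apex
public class TriggerDispatcher {
    // Hypothetical contract that every handler class implements.
    public interface Handler {
        void handle(List<SObject> newRecords, Map<Id, SObject> oldMap);
    }

    public static void dispatch(Schema.SObjectType objectType,
                                List<SObject> newRecords,
                                Map<Id, SObject> oldMap) {
        // Assumed naming convention: Account -> AccountTriggerHandler,
        // My_Object__c -> MyObjectTriggerHandler.
        String objectName = objectType.getDescribe().getName();
        String handlerName =
            objectName.replace('__c', '').replace('_', '') + 'TriggerHandler';

        // Type.forName returns null when no class with that name exists.
        Type handlerType = Type.forName(handlerName);
        if (handlerType == null) {
            return;
        }

        // newInstance requires a no-argument constructor on the handler.
        Object candidate = handlerType.newInstance();
        if (candidate instanceof Handler) {
            ((Handler) candidate).handle(newRecords, oldMap);
        }
    }
}
```

A trigger on Account would then contain a single line such as TriggerDispatcher.dispatch(Account.SObjectType, Trigger.new, Trigger.oldMap), and adding a handler for a new object means adding one class that follows the naming convention.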

I don’t know if Hari has worked on the .NET platform (he does mention Java and C#), but the idea of dispatching by name is one we’ve seen in a number of Microsoft frameworks and languages. One can’t help but wonder if, now that we have a real tooling API, someone might come up with a client tool to generate and manage trigger handlers based on a framework like this….

Not only might this automate some of the “plumbing”, but conceivably bring us to that state of Nirvana where, with judicious use of some global interfaces, we might be able to control order of trigger execution across cooperating packages and between packages and Apex code on an organization instance.

Ah well, one can dream. Meanwhile, kudos to Hari for a fine piece of work. Definitely worth a read.

Code Coverage and Functional Testing for Optional Salesforce Features

A couple of days ago Matt Lacey posted an excellent article on developing for optional Salesforce features. He ended it with a question – how do you ensure code coverage for those orgs that have those features disabled?

For example – let’s say you have code that only runs when multi-currency is enabled on an org:

if(Schema.SObjectType.Opportunity.fields.GetMap().Get('CurrencyIsoCode') != null)
{
    // Do this on multi-currency orgs
    obj.Put('CurrencyIsoCode', o.Get('CurrencyIsoCode'));
}

How do you get code coverage for this section?

One way to do this is as follows:

First, we refactor out the currency test into its own function as follows:

// Cached result, so the describe call happens at most once per transaction
private static Boolean m_IsMultiCurrency = null;
public static Boolean IsMultiCurrencyOrg()
{
    if(m_IsMultiCurrency!=null) return m_IsMultiCurrency;
    m_IsMultiCurrency = Schema.SObjectType.Opportunity.fields.GetMap().Get('CurrencyIsoCode') != null;
    return m_IsMultiCurrency;
}

Though not necessary for this example, in any real application where you have lots of tests for whether it’s a multi-currency org, you may be calling this test fairly often, and each call to Schema.SObjectType.Opportunity.fields.GetMap().Get('CurrencyIsoCode') counts against your limit of 100 Describe calls. This function (which is written to minimize script lines even if called frequently) is a good tradeoff of script lines to reduce Describe calls for most applications.

Next, add a static variable to your application’s class called TestMode

public static Boolean TestMode = false;

Now the code block that runs on multicurrency orgs can look like this:

if(TestMode || IsMultiCurrencyOrg())
{
   // Do this on multi-currency orgs (or when TestMode forces the branch)
   String ISOField = (TestMode && !IsMultiCurrencyOrg())?
                     'FakeIsoCode' : 'CurrencyIsoCode';
   obj.Put(ISOField, o.Get(ISOField));
}

What we’ve effectively done here is allow that block of code to also run when a special TestMode static variable is set. And instead of using the CurrencyIsoCode field which would fail on non-multicurrency orgs, we substitute in any dummy Boolean field. This can be another field on the object that you define, or you can just reuse some existing field that isn’t important for the test. There may be other changes you need to avoid errors in the code, but liberal use of the TestMode variable can help you maximize the code that runs during the test.

Why use a TestMode variable instead of Test.IsRunningTest()? Because the goal here is to get at least one pass through the code, probably in one specialized unit test. You probably won’t want this code to run in every unit test.

With this approach you can achieve both code coverage and, with clever choice of fields and field initialization, functional test results, even on orgs where a feature is disabled.
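Put together, the pieces might look like the sketch below, shown as two class files in one listing. The class names, the copyCurrency method and the use of Opportunity.Description as the stand-in field are assumptions for illustration; pick whatever field suits your own object.

```apex
// File 1: CurrencySupport.cls (hypothetical class name)
public class CurrencySupport {
    public static Boolean TestMode = false;
    private static Boolean m_IsMultiCurrency = null;

    public static Boolean IsMultiCurrencyOrg() {
        if (m_IsMultiCurrency != null) return m_IsMultiCurrency;
        m_IsMultiCurrency = Schema.SObjectType.Opportunity.fields
            .getMap().get('CurrencyIsoCode') != null;
        return m_IsMultiCurrency;
    }

    // Copies the currency field, or a stand-in field when TestMode is set
    // on an org without multi-currency.
    public static void copyCurrency(SObject source, SObject target) {
        if (TestMode || IsMultiCurrencyOrg()) {
            String isoField = (TestMode && !IsMultiCurrencyOrg())
                ? 'Description' : 'CurrencyIsoCode';
            target.put(isoField, source.get(isoField));
        }
    }
}

// File 2: CurrencySupportTest.cls
@IsTest
private class CurrencySupportTest {
    @IsTest
    static void coversCurrencyBranchOnAnyOrg() {
        CurrencySupport.TestMode = true;
        Opportunity source = new Opportunity(Description = 'USD');
        Opportunity target = new Opportunity();

        CurrencySupport.copyCurrency(source, target);

        // Check whichever field the branch actually copied on this org.
        String copiedField = CurrencySupport.IsMultiCurrencyOrg()
            ? 'CurrencyIsoCode' : 'Description';
        System.assertEquals(source.get(copiedField), target.get(copiedField));
    }
}
```

Only this one test sets TestMode, so the rest of the test suite still exercises the code the way it really runs on the org.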


When is a Boolean not a Boolean?

The other day I saw a question on a forum in which it was noted that the following Apex code fails with a null exception error:

Boolean testVal = null;
if ( testVal )
    System.debug( 'Testval is true.' );
else
    System.debug( 'Testval is false.' );

This is somewhat unexpected in that in many other languages, a null value in a Boolean variable is considered false.

In Apex, primitive variables always support a null value in addition to their expected values. Thus if you enter the following code into the developer console:

Integer i;
Boolean b;
Date d;
DateTime dt;
Decimal dl;
Double dbl;
Long l;
String s;
Time t;

You’ll see that every one of them is null – comparing each variable to null comes out as true. And if you try to access any method on the object, you’ll get a null reference exception.
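A quick way to confirm this in the developer console is to compare a few of the variables against null and then call a method on one of them. This is a small sketch for anonymous Apex, using only three of the types for brevity:

```apex
Integer i;
Boolean b;
String s;

// Uninitialized primitives are null.
System.debug(i == null);  // true
System.debug(b == null);  // true
System.debug(s == null);  // true

// Calling a method on a null primitive throws.
try {
    Integer len = s.length();
} catch (NullPointerException e) {
    System.debug('Null reference: ' + e.getMessage());
}
```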

Boolean variables in Apex are thus consistent with other primitive data types.

Where it gets interesting is when you look at how you might use a Boolean value in code.

Let’s say you have an uninitialized Boolean variable b;

if(b) … will fail with a null reference exception

if(!b)… will fail with a null reference exception

Fair enough, but now consider the following two statements:

Boolean b;
if(b!=true) system.debug('b is not true');
if(b!=false) system.debug('b is not false');

The result will be:

b is not true
b is not false

Well now, everyone knows that a real Boolean variable can’t be ‘not true’ (which is false) and ‘not false’ (which is true) at the same time.

Which means, to put it bluntly, that a Boolean variable in Apex is not really a true Boolean variable – at least as far as purists would be concerned.

Fortunately, I’m not much of a purist, so the real concern is what this means in practice.

First – always initialize your variables! This is especially true for those of you coming from .NET, which kindly initializes value type variables for you.

Second, test for null values where appropriate, even if you are using primitive data types.

Third, the safest comparison for Booleans is always against true – i.e. b == true or b != true. If you do this consistently, any null values that do sneak in will be treated as false. But does this mean you should always compare Booleans against true? Not necessarily – because you might want to catch those null reference exceptions to detect uninitialized variables or failures to check against null values.

It does mean you should never compare against false – because in Apex b != false doesn’t mean b is always true.
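The four possible comparisons make the asymmetry easy to see. A short sketch for anonymous Apex, where the variable name safeFlag is just for illustration:

```apex
Boolean b;  // never initialized, so it is null

System.debug(b == true);   // false
System.debug(b != true);   // true
System.debug(b == false);  // false
System.debug(b != false);  // true, even though b is not true

// Comparing against true collapses null to false without an exception.
Boolean safeFlag = (b == true);
if (safeFlag) {
    System.debug('b is true');
} else {
    System.debug('b is false or null');
}
```

Only the comparisons against true give an answer that still makes sense when b is null, which is why they are the safer habit.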