About Job Queue vs many Companies in SaaS

Job queues and their platform façade, scheduled tasks, are the beating heart of the background execution giant.

That giant might turn out to be made of sand if its two legs, resiliency and stability, are not strong enough. Resiliency is the ability to return to a good steady state after unpredictable events. Stability is a broader concept; for Job Queue it can be roughly summarized as the ratio between successful and failed runs. Generally speaking, the higher that ratio, the more stable the Job Queue feature.

In SaaS there is a third pillar (leg) that must be respected, and it is duly written in the documentation: at most 3 scheduled tasks can run concurrently per environment. Yes, you heard that right: per environment, not per company.

Remember that in SaaS the task scheduler is a microservice that serves the requests of all tenants bound to a specific application service. In short: it processes A LOT of concurrent scheduled tasks altogether, but never more than 3 for any single environment.

As for the magic number 3 per environment, I believe it was a trade-off between cost (related to Microsoft resource governance) and reducing the chance of lock escalation within a single company (a self-inflicted performance problem).

And here comes the first piece of advice, for the telemetry champs at Microsoft: please add telemetry signals for scheduled task throttling. This is one of the last pieces of the puzzle for proficiently and proactively monitoring Job Queue usage and performance.

It goes without saying, at this stage, that the more companies you have in your environment and the more active recurring job queues, the higher the risk of overlapping, or even of the feature becoming unusable altogether.

The question now is: how far can you go with Job Queue if you have a fair number of companies? Let's do the math.

Assume a background window running from 8 PM (someone might work late) to 7 AM the next day (there are always early birds). In total, off-hours jobs can span 11 hours (I know, still a huge amount of time). This is the projected outcome under the given constraints:

| Companies / JQ per company | 10 JQ | 20 JQ | 30 JQ | 40 JQ | 50 JQ | 100 JQ | 200 JQ | 300 JQ |
|---|---|---|---|---|---|---|---|---|
| 2 Companies | 20 JQE / 99 min each | 40 JQE / 50 min each | 60 JQE / 33 min each | 80 JQE / 25 min each | 100 JQE / 20 min each | 200 JQE / 10 min each | 400 JQE / 5 min each | 600 JQE / 3 min each |
| 4 Companies | 40 JQE / 50 min each | 80 JQE / 25 min each | 120 JQE / 17 min each | 160 JQE / 12 min each | 200 JQE / 10 min each | 400 JQE / 5 min each | 800 JQE / 2 min each | 1200 JQE / 2 min each |
| 6 Companies | 60 JQE / 33 min each | 120 JQE / 17 min each | 180 JQE / 11 min each | 240 JQE / 8 min each | 300 JQE / 7 min each | 600 JQE / 3 min each | 1200 JQE / 2 min each | 1800 JQE / 1 min each |
| 8 Companies | 80 JQE / 25 min each | 160 JQE / 12 min each | 240 JQE / 8 min each | 320 JQE / 6 min each | 400 JQE / 5 min each | 800 JQE / 2 min each | 1600 JQE / 1 min each | 2400 JQE / 1 min each |
| 10 Companies | 100 JQE / 20 min each | 200 JQE / 10 min each | 300 JQE / 7 min each | 400 JQE / 5 min each | 500 JQE / 4 min each | 1000 JQE / 2 min each | 2000 JQE / 1 min each | 3000 JQE / 1 min each |
| 12 Companies | 120 JQE / 17 min each | 240 JQE / 8 min each | 360 JQE / 6 min each | 480 JQE / 4 min each | 600 JQE / 3 min each | 1200 JQE / 2 min each | 2400 JQE / 1 min each | 3600 JQE / 1 min each |
| 15 Companies | 150 JQE / 13 min each | 300 JQE / 7 min each | 450 JQE / 4 min each | 600 JQE / 3 min each | 750 JQE / 3 min each | 1500 JQE / 1 min each | 3000 JQE / 1 min each | 4500 JQE / 0 min each |
| 20 Companies | 200 JQE / 10 min each | 400 JQE / 5 min each | 600 JQE / 3 min each | 800 JQE / 2 min each | 1000 JQE / 2 min each | 2000 JQE / 1 min each | 4000 JQE / 0 min each | 6000 JQE / 0 min each |
| 30 Companies | 300 JQE / 7 min each | 600 JQE / 3 min each | 900 JQE / 2 min each | 1200 JQE / 2 min each | 1500 JQE / 1 min each | 3000 JQE / 1 min each | 6000 JQE / 0 min each | 9000 JQE / 0 min each |

Hours available: 11

In the scenario above, I am assuming that every single JQE takes the same amount of time. For example, if the environment has 10 companies and 10 Job Queues per company must run during the night shift, you have 100 Job Queues to fit into 11 hours with a max concurrency of 3. That is 11 × 60 × 3 = 1,980 minutes of execution capacity, or roughly 20 minutes per Job Queue before throttling kicks in.
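The arithmetic behind each cell can be sketched in a few lines of Python. This is only an illustration of the estimate described above, not anything taken from the product: the function name `minutes_per_jqe` and the constants are hypothetical, and the rounding (half up to the nearest minute) is an assumption chosen because it matches the published figures.

```python
HOURS_AVAILABLE = 11  # nightly window assumed in this post: 8 PM to 7 AM
MAX_CONCURRENT = 3    # concurrent scheduled tasks per environment in SaaS


def minutes_per_jqe(companies: int, jq_per_company: int,
                    hours: int = HOURS_AVAILABLE,
                    concurrency: int = MAX_CONCURRENT) -> tuple[int, int]:
    """Return (total JQE, minutes available to each JQE).

    Assumes every JQE takes the same amount of time (the same
    simplification used in the table) and rounds half up.
    """
    total_jqe = companies * jq_per_company
    # Total execution capacity of the window, in minutes, across all slots.
    capacity = hours * 60 * concurrency
    # Integer round-half-up of capacity / total_jqe. Python's built-in
    # round() rounds halves to the nearest even number, so it is avoided.
    minutes = (2 * capacity + total_jqe) // (2 * total_jqe)
    return total_jqe, minutes


if __name__ == "__main__":
    # Print a few rows for a quick comparison against the table.
    for companies in (2, 10, 30):
        cells = [minutes_per_jqe(companies, jq) for jq in (10, 50, 300)]
        print(f"{companies} companies: {cells}")
```

For instance, `minutes_per_jqe(10, 10)` returns `(100, 20)`: 100 JQE with about 20 minutes each. Changing `hours` lets you test a shorter window, for example one that leaves room for maintenance.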

In real life, of course, job queues rarely all take the same amount of time, but this rough estimate gives a few numbers on how far you can go with the Job Queue feature vs. multiple companies in SaaS.

In this whole time equation, keep in mind that Microsoft applies upgrades, updates, platform and application hotfixes, database maintenance and so on, mostly during the nightly time window.

CONCLUSION

Be sure to include in your implementation questionnaire for a potential customer the number of companies AND the expected job queue entries per company. Even though the operational limit is set to 300 companies, you might have trouble keeping Job Queues under control even with just 40 or 50 companies.

If you find yourself in stretched scenarios with many companies and job queues, several alternatives can mimic the Job Queue feature and be used in combination with it or as a substitute (decoupling): triggering calls through external APIs, Power Automate, Azure Logic Apps and/or Azure Queues. Of course, each of them requires deeper analysis and implementation tests to determine whether one, or a mix, fits.

If even these solutions are not enough, or cannot be applied to your background scheduling dilemma, the last resort is to split companies into multiple environments, which automatically spreads the load across different online task scheduler services.
