Job queues and their platform façade, scheduled tasks, are the beating heart of the background execution giant.
That giant might turn out to be made of sand if its two legs are not strong enough: resiliency and stability. Resiliency is the ability to return to a good steady state after unpredictable events. Stability is a broader concept; for Job Queue it can be roughly summed up as the ratio between successful and failed runs. Generally speaking, the higher the ratio, the more stable the Job Queue feature.
Within SaaS there is a third pillar (leg) that needs to be respected, and it is duly written in the documentation: at most 3 scheduled tasks can run concurrently per environment. Yes, you heard that right: per environment, not per company.
Please remember that in SaaS the task scheduler is a microservice that serves all tenant requests bound to a specific application service. In short: it processes A LOT of concurrent scheduled tasks overall, but never more than 3 at a time for a single environment.
As for why the magic number is 3 per environment, I believe it was a trade-off between cost (related to Microsoft resource governance) and reducing the chance of lock escalation within a single company (a self-inflicted performance problem).
And here comes my first piece of advice, addressed to the telemetry champs at Microsoft: please add telemetry signals for scheduled task throttling. This is one of the last missing pieces in the puzzle of monitoring Job Queue usage and performance proficiently and proactively.
It goes without saying, at this stage, that the more companies you have in your environment and the more active recurring job queues you run, the higher the risk of overlaps, or even of the feature becoming unusable altogether.
The question now is: how far can you go with Job Queue if you have a fair number of companies? Let’s do the math.
Assume a background execution window from 8 PM (someone might work late) to 7 AM the next day (there are always early birds). In total, non-working-hour jobs can span 11 hours (I know, still a huge amount of time). Given these constraints, this is the projected outcome, where JQ stands for Job Queues per company and JQE for total Job Queue Entries in the environment:
Companies / JQ per company | 10 JQ | 20 JQ | 30 JQ | 40 JQ | 50 JQ | 100 JQ | 200 JQ | 300 JQ |
2 Companies | 20 JQE / 99 min each | 40 JQE / 50 min each | 60 JQE / 33 min each | 80 JQE / 25 min each | 100 JQE / 20 min each | 200 JQE / 10 min each | 400 JQE / 5 min each | 600 JQE / 3 min each |
4 Companies | 40 JQE / 50 min each | 80 JQE / 25 min each | 120 JQE / 17 min each | 160 JQE / 12 min each | 200 JQE / 10 min each | 400 JQE / 5 min each | 800 JQE / 2 min each | 1200 JQE / 2 min each |
6 Companies | 60 JQE / 33 min each | 120 JQE / 17 min each | 180 JQE / 11 min each | 240 JQE / 8 min each | 300 JQE / 7 min each | 600 JQE / 3 min each | 1200 JQE / 2 min each | 1800 JQE / 1 min each |
8 Companies | 80 JQE / 25 min each | 160 JQE / 12 min each | 240 JQE / 8 min each | 320 JQE / 6 min each | 400 JQE / 5 min each | 800 JQE / 2 min each | 1600 JQE / 1 min each | 2400 JQE / 1 min each |
10 Companies | 100 JQE / 20 min each | 200 JQE / 10 min each | 300 JQE / 7 min each | 400 JQE / 5 min each | 500 JQE / 4 min each | 1000 JQE / 2 min each | 2000 JQE / 1 min each | 3000 JQE / 1 min each |
12 Companies | 120 JQE / 17 min each | 240 JQE / 8 min each | 360 JQE / 6 min each | 480 JQE / 4 min each | 600 JQE / 3 min each | 1200 JQE / 2 min each | 2400 JQE / 1 min each | 3600 JQE / 1 min each |
15 Companies | 150 JQE / 13 min each | 300 JQE / 7 min each | 450 JQE / 4 min each | 600 JQE / 3 min each | 750 JQE / 3 min each | 1500 JQE / 1 min each | 3000 JQE / 1 min each | 4500 JQE / 0 min each |
20 Companies | 200 JQE / 10 min each | 400 JQE / 5 min each | 600 JQE / 3 min each | 800 JQE / 2 min each | 1000 JQE / 2 min each | 2000 JQE / 1 min each | 4000 JQE / 0 min each | 6000 JQE / 0 min each |
30 Companies | 300 JQE / 7 min each | 600 JQE / 3 min each | 900 JQE / 2 min each | 1200 JQE / 2 min each | 1500 JQE / 1 min each | 3000 JQE / 1 min each | 6000 JQE / 0 min each | 9000 JQE / 0 min each |
Hours Available | 11 |
In the scenario above, I am assuming that every single JQE takes the same amount of time. For example, if the environment is made of 10 companies and 10 Job Queues per company must run during the night shift, you have 100 Job Queue Entries to fit into 11 hours with a maximum concurrency of 3. That is 11 × 60 × 3 = 1980 minutes of execution time in total, or roughly 20 minutes per Job Queue Entry before throttling kicks in.
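The arithmetic behind each cell of the table can be sketched in a few lines of Python. This is only an illustration of the calculation described above: the function name and parameter names are hypothetical, and the defaults (an 11-hour window, 3 concurrent tasks) are the assumptions stated in this post.

```python
def minutes_per_entry(companies, jq_per_company, window_hours=11, max_concurrent=3):
    """Return (total JQE, minutes available per JQE), assuming every entry
    takes the same time and all concurrent slots stay busy for the whole window."""
    total_entries = companies * jq_per_company
    # Total execution time on offer: window length times number of parallel slots.
    budget_minutes = window_hours * 60 * max_concurrent
    return total_entries, budget_minutes / total_entries


entries, minutes = minutes_per_entry(companies=10, jq_per_company=10)
print(f"{entries} JQE / {minutes:.1f} min each")  # 100 JQE / 19.8 min each
```

The table rounds these values to the nearest minute, which is why 19.8 appears as 20 and anything under half a minute appears as 0.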
In real-life scenarios, of course, jobs rarely all take the same amount of time, but this rough estimate gives you some numbers and an idea of how far you can go with the Job Queue feature across multiple companies in SaaS.
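To get a feeling for uneven durations, here is a minimal toy simulation. It assumes, purely for illustration, that each job starts as soon as one of the 3 slots is free, in list order; this is not a description of how the actual task scheduler dispatches work. The job mix (90 jobs of 5 minutes, 10 jobs of 2 hours) and the function name are made up.

```python
import heapq


def makespan_minutes(durations, max_concurrent=3):
    """Time at which the last job finishes when jobs are started in list
    order, each on whichever slot becomes free first (toy model)."""
    slots = [0.0] * max_concurrent  # time at which each slot becomes free
    heapq.heapify(slots)
    for duration in durations:
        start = heapq.heappop(slots)  # earliest available slot
        heapq.heappush(slots, start + duration)
    return max(slots)


window = 11 * 60  # minutes available overnight

short_first = [5] * 90 + [120] * 10  # hypothetical mix, short jobs first
long_first = [120] * 10 + [5] * 90   # same jobs, long ones first

print(makespan_minutes(short_first), "of", window)  # 630.0 of 660
print(makespan_minutes(long_first), "of", window)   # 550.0 of 660
```

Both orderings average 16.5 minutes per job, yet the finish time differs by more than an hour, so the per-entry figures in the table are best read as an optimistic bound rather than a guarantee.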
In this whole time equation, keep in mind that Microsoft also applies upgrades, updates, platform and application hotfixes, database maintenance and so on, mostly during the night-time window.
CONCLUSION
Be sure that your implementation questionnaire for a potential customer asks for the number of companies AND the expected number of job queue entries per company. Even though the operational limit is set to 300 companies, you might have trouble keeping Job Queues under control with as few as 40 or 50 companies.
If you find yourself in a stretched scenario with many companies and job queues, there are several alternatives that can mimic the Job Queue feature and be used either in combination with it or as a substitute (decoupling), such as triggering calls from external APIs, Power Automate, Azure Logic Apps and/or Azure Queues. Of course, each of them requires deeper analysis and implementation tests to determine whether one, or a mix of them, is a good fit.
If even these alternatives are not enough, or cannot be applied to your background scheduling dilemma, the last resort is to split the companies across multiple environments, which automatically spreads the load over different online task scheduler services.