For a long-running script that is also executed in many batches in Azure Automation, we recently started getting this error occasionally:
Thread failed to start. (Thread failed to start. (Exception of type 'System.OutOfMemoryException' was thrown.))
We contacted Microsoft Support about it, since the script had not changed and the error happened only very rarely.
We received the reply that Microsoft is moving the underlying Azure Automation infrastructure into Azure containers, which is probably causing the issue. It should resolve itself once the move is complete.
Reply from Microsoft Support
Currently, Azure automation is in the process of migrating from SQL Sandbox to Azure containerized environment, we see that the jobs which were running successfully are failing in container environment. More details of this process as below.
Azure Automation is currently in the process of modernizing its backend platform and bring forth a range of improvements. The new platform offers enhanced performance and reliability. Platform updates have required changes to the internal directory structure and the usage of environment variables. If your runbooks have taken dependencies on those internal implementation details, you may find that the behavior of your runbooks in the new environment may change, and your runbooks may require updating in order to adapt to the new platform. The product team has put processes in place to minimize issues arising out of this backend change and is monitoring service health continuously. We are actively working on mitigating any issues and promise to extend our complete support to resolve your queries.
As I checked from the backend, the jobs last failed are from 21st Oct which picked container as environment to run. This is a clear issue that has occurred because of migration to ACI (container). There is no action plan from your side, this will be resolved automatically.
Apart from the above, I do not see any failed jobs that picked container environment.
On 26th, I do see 10 jobs got generated out of which 4 picked container environment instead of Sandbox and completed successfully. Below are the 4 job ids:
As per the latest update from PG on 26th Oct, 70% of the deployment has been completed in West Europe (automation account region). In the coming days, you will see all the jobs using only container environment.
Kindly monitor if you see any failed jobs in the meantime and let me know so I can report them. Also, please reach out to me if you have further questions.
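A side note on the reply's warning about internal directory structure and environment variables: a runbook is less likely to break during such a migration if it does not hard-code sandbox paths. The following is only a minimal sketch, assuming a Python runbook (our post does not depend on this, and the same idea applies to PowerShell). The names `resolve_work_dir`, `read_setting`, and the `RUNBOOK_WORK_DIR` variable are hypothetical and chosen for illustration; they are not something Azure Automation defines.

```python
import os
import tempfile
from pathlib import Path


def resolve_work_dir(env_var: str = "RUNBOOK_WORK_DIR") -> Path:
    """Return a writable working directory without assuming a fixed sandbox path.

    RUNBOOK_WORK_DIR is a hypothetical, user-defined variable, not one that
    Azure Automation sets. If it is unset, fall back to the temp directory
    reported by the operating system.
    """
    override = os.environ.get(env_var)
    base = Path(override) if override else Path(tempfile.gettempdir())
    work_dir = base / "runbook-work"
    # Create the directory if needed; do nothing if it already exists.
    work_dir.mkdir(parents=True, exist_ok=True)
    return work_dir


def read_setting(name: str, default: str) -> str:
    """Read an environment variable, falling back to a default if it is unset."""
    return os.environ.get(name, default)
```

The point of the sketch is that both the working directory and any settings are looked up at run time with a fallback, so the script keeps working whether the job lands in the old sandbox or in the new container environment.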