I have set up replication in an Azure IaaS environment using SQL Server 2012. We have a large database (about 1 TB), and replication is set up with 5 publications covering around 200 tables; a few of the tables also have filters.
I see that the Log Reader Agent and Distribution Agent jobs are failing with 'Query timeout expired' and high system usage. I changed the agent profile settings to -QueryTimeout 50000 and -ReadBatchSize 1, but I still see the issue frequently whenever there are large data updates or loads on the publisher database.
Please let me know how to identify the problem and the reason for the job failures.
Please also suggest the steps and settings needed to ensure that the replication agent jobs keep working in spite of heavy load.