Hi All
We have transactional replication configured for multiple databases between two servers. The replication configuration is standard: no custom procedures, all agent profiles set to default, and so on. The only non-standard configuration is that CDC is configured on some of the destination objects. A snapshot publication is also configured, which we will get to later.
We've come across a bizarre problem with transactional replication where the distribution agent simply skips a number of replication commands. Initially this was a rare occurrence and we would manually fix the data on the subscriber. The missing data only occurred for one specific publisher database. Our initial thought was that the data was being deleted by the users of the replicated data. As the missing data became more frequent, we investigated using one of the destination tables that had CDC configured, but we could not find any evidence of the missing data being deleted by the users.
After this we went through all possible reasons why the data would not be replicated to the subscriber, following all suggestions in this article: http://technet.microsoft.com/en-us/library/ms152532(v=sql.105).aspx
After exhausting all options, we ran a trace on the subscriber, focusing on a single article publication. The trace was set up to return the events SQL:BatchStarting and RPC:Starting. When the distribution agent failed because it could not update a row that did not exist at the subscriber, the corresponding insert command was found in the distribution database (using sp_browsereplcmds) but not in the trace. We noticed that the distribution agent had in fact skipped a few xact_seqno values. This was occurring for many of the distribution agents.
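For anyone wanting to repeat this comparison, it boils down to a set difference between the sequence numbers held at the distributor and those that actually showed up in the subscriber trace. Below is a minimal Python sketch, assuming both lists have already been exported as hex strings (for example copied from the sp_browsereplcmds output and extracted from the trace). The function name find_skipped and the sample values are hypothetical and are not taken from our system.

```python
def find_skipped(distributed, applied):
    """Return xact_seqno values present at the distributor but never applied.

    Both inputs are iterables of hex strings. The comparison is
    case-insensitive and the order of the distributor list is preserved.
    """
    applied_set = {s.lower() for s in applied}
    return [s for s in distributed if s.lower() not in applied_set]


# Hypothetical sample data for illustration only.
distributed = [
    "0x0000002A000001B80001",
    "0x0000002A000001C00003",
    "0x0000002A000001D10002",
]
applied = [
    "0x0000002a000001b80001",
    "0x0000002A000001D10002",
]

print(find_skipped(distributed, applied))
```

With the sample data, the only value reported is the middle one, which appears at the distributor but not in the trace.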
Checking the missed xact_seqno values against MSrepl_transactions, we noticed that the missed replication commands always occurred at around 11 PM every day. The snapshot publication was scheduled to run at this time. We changed the schedule to run daily at 4 PM, after which the skipped replication commands occurred at around 4 PM instead. We have since disabled the snapshot replication job and should be able to tell tomorrow whether this is the cause.
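The time-of-day pattern can be checked by bucketing the entry times of the skipped transactions by hour. Here is a rough Python sketch, assuming the entry_time values from MSrepl_transactions have been exported as text in "YYYY-MM-DD HH:MM:SS" form. The helper name skipped_by_hour and the timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def skipped_by_hour(entry_times):
    """Count skipped transactions per hour of day (0-23)."""
    return Counter(
        datetime.strptime(t, "%Y-%m-%d %H:%M:%S").hour for t in entry_times
    )


# Hypothetical timestamps, not real data from the distributor.
sample = [
    "2014-03-10 23:01:12",
    "2014-03-10 23:04:40",
    "2014-03-11 23:02:05",
    "2014-03-12 16:00:31",
]

counts = skipped_by_hour(sample)
# The hour with the most skipped transactions, with its count.
print(counts.most_common(1))
```

If almost all of the skipped transactions fall into the hour in which the snapshot job is scheduled, that supports the correlation, although it does not by itself prove the cause.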
So my question is: is there a reason why a snapshot publication (presuming this is the cause) would cause a transactional replication distribution agent to skip replication commands?