Channel: SQL Server Replication forum

Snapshot Hangs - Will not complete


We have had a successful Transactional Replication topology in place for the better part of 4 years, and recently hit a problem that I have not been able to resolve and for which I cannot find a similar posted issue.

Our environment consists of 2 main SQL Servers, both SQL2005 (9.00.3042.00 X64 SP2) running Windows 2003 SP2: Production Server (A) and Reporting Server (B). The Distributor is on (B), and we have approximately 30 Publications with 700 Articles set up as Push Subscriptions executing at various times throughout the day. All agents run at the Distributor. Due to the size of a few key tables, we have set up a handful of dedicated Pubs for our larger Articles, although the term large is relative: our largest table has 5M rows and the smallest of our large tables has 1M rows. The prior snapshot size for a 1M row table was approx. 950,000KB.

Recently we needed to reinitialize a subscription and marked the sub for “Use a new snapshot”. A snapshot for this table would previously take 60 to 120 seconds to generate and could be created without issue until now. When attempting to create the snapshot (during Non-Peak or Peak Hours), the agent instantaneously jumps to 31% and hangs. The same is true for all of our larger tables. The problem is not isolated to a single Pub/Article but appears to be tied to the total size of the Articles in the Pub. During the holiday maintenance window, the agent for the Snapshot in question ran for more than 7 hours, never advanced, never completed, and was eventually canceled manually.

My first thought was that the location that stores the Snapshot might be low on disk space, but I have confirmed we have 30GB (30%) free on a 100GB drive. When watching the BCP file size, it does increase, from 4000KB to 4100KB to 4200KB, but literally at a snail’s pace, with no hope of reaching the 950,000KB that I know it will eventually need to reach.

When executing the BCP command line tool, we get decent throughput at 14K rows per second (RPS), so by that standard a 1M row table should take approximately 71 seconds, which is in line with prior successful results.
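To make that arithmetic explicit, here is a minimal Python sketch of the two estimates involved: the expected duration at the BCP throughput above, and how long the stalled BCP file would need at its current pace. The function names are hypothetical, and the growth rate in the second call (100KB every 10 seconds) is purely an assumption for illustration, since I did not time the interval between the file sizes I observed.

```python
def expected_seconds(total_rows: int, rows_per_second: float) -> float:
    """Estimate how long a bulk copy should take at a steady throughput."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second


def eta_seconds(current_kb: float, target_kb: float, kb_per_second: float) -> float:
    """Estimate the remaining time for a file growing at a steady rate."""
    if kb_per_second <= 0:
        raise ValueError("kb_per_second must be positive")
    return max(target_kb - current_kb, 0) / kb_per_second


if __name__ == "__main__":
    # Figures from the post: a 1M row table at roughly 14K rows per second.
    print(round(expected_seconds(1_000_000, 14_000)))  # 71

    # ASSUMPTION: the file grows 100KB every 10 seconds (interval not measured).
    remaining = eta_seconds(4_200, 950_000, 100 / 10)
    print(f"{remaining / 3600:.1f} hours remaining at the assumed rate")
```

Even with that fairly generous assumed rate, the second estimate comes out at many hours rather than seconds, which matches what we saw during the maintenance window.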

As a test, I took the exact same Article and created a Publication on Server B (my reporting server, which also houses the distributor). The snapshot took 48 seconds to complete, which is in line with Server A’s prior results when the system was functioning properly. I’m not 100% sure what I’ve ruled out by conducting this test; I’d like to think I ruled out the distributor as the bottleneck…

We had a similar problem 2 years ago and found over a thousand orphaned records in the MSsubscriptions, MSarticles and MSpublications tables. Apparently the SQL2005 GUI doesn’t do a very good job of cleaning up after a change is made in replication.

Over the holiday, I tried dropping all Subs and Pubs and recreating everything from scratch (using scripts). The smaller tables generate snapshots without issue, but the larger tables just sit and spin. We are running out of options with our current in-house skill set. Does anyone have any thoughts on the issue, or could anyone point me in a different direction?

Many Thanks,


