Transactional Replication:
Changes are sent to subscribers as they occur. A Log Reader Agent runs on the distributor. It connects to the publication database and scans its transaction log for committed transactions. It converts those transactions from physical log records into logical operations, which are written to the distribution database. The Distribution Agent then delivers the changes to the subscription database (as a push or a pull subscription) and applies them there. The advantage of transactional replication is that you get to see all the changes. So, if you want to consume all of them, this is the technique you have to use.
In standard transactional replication, subscribers should use their subscriptions for read-only purposes.
Latency varies with the network connectivity between the publisher and the distributor and between the distributor and the subscriber. Many people use a separate server as the distributor rather than hosting it on the same server as the publisher. Latency also depends on how many subscribers you have, how many distribution agents there are, and the hardware speed of the distributor.
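The flow from transaction log to subscriber can be sketched with a toy model. This is only an illustration, not how SQL Server implements it: the names LogRecord, read_committed, and apply_command are hypothetical, and a plain Python list stands in for the distribution database.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                      # log sequence number (position in the log)
    txn_id: int
    kind: str                     # "op", "commit", or "rollback"
    op: Optional[tuple] = None    # (verb, key, value) for "op" records

def read_committed(log):
    """Toy log reader: keep only operations of committed transactions, in commit order."""
    pending = {}                  # txn_id -> operations seen so far
    commands = []
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.kind == "op":
            pending.setdefault(rec.txn_id, []).append(rec.op)
        elif rec.kind == "commit":
            commands.extend(pending.pop(rec.txn_id, []))
        elif rec.kind == "rollback":
            pending.pop(rec.txn_id, None)   # rolled-back work is never replicated
    return commands

def apply_command(table, command):
    """Toy distribution agent step: apply one logical operation at the subscriber."""
    verb, key, value = command
    if verb == "DELETE":
        table.pop(key, None)
    else:                         # INSERT or UPDATE
        table[key] = value

log = [
    LogRecord(1, 1, "op", ("INSERT", 1, "a")),
    LogRecord(2, 2, "op", ("INSERT", 2, "b")),
    LogRecord(3, 1, "commit"),
    LogRecord(4, 2, "rollback"),
    LogRecord(5, 3, "op", ("UPDATE", 1, "c")),
    LogRecord(6, 3, "commit"),
]

distribution_db = read_committed(log)   # stand-in for the distribution database
subscriber = {}
for command in distribution_db:
    apply_command(subscriber, command)
```

Transaction 2 is rolled back, so its insert never reaches the subscriber, while both committed changes to row 1 are delivered in order.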
You can also replicate with systems that are not running SQL Server, such as Oracle.
Because the Log Reader Agent job (which moves transactions into the distribution database) scans the transaction log of the publication database, that log may have to grow. This happens if the Log Reader Agent job cannot process the log records fast enough, or if the schedule of the job has been changed. The log records in that database cannot be cleared or truncated until they have been processed by the Log Reader Agent job. This is true even if those log records have nothing to do with the transactional publication. All of this means you can run into log growth problems.
If a subscriber goes offline, its transactions are held in the distribution database until the subscriber comes back online. Data loss can occur if the publisher fails. Transactional replication protects only the data in the publication.
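The log growth effect comes down to simple arithmetic, sketched below. The function retained_log_sizes and its per-tick rates are hypothetical and chosen only for illustration; real log sizes depend on many other factors.

```python
def retained_log_sizes(writes_per_tick, reads_per_tick, ticks):
    """Return the number of unprocessed log records held after each tick.

    Records cannot be truncated until the (toy) log reader has processed them,
    so the retained size is everything written minus everything processed.
    """
    written = 0
    processed = 0
    sizes = []
    for _ in range(ticks):
        written += writes_per_tick
        # The reader can never get ahead of what has been written.
        processed = min(written, processed + reads_per_tick)
        sizes.append(written - processed)
    return sizes

slow_reader = retained_log_sizes(writes_per_tick=100, reads_per_tick=60, ticks=5)
fast_reader = retained_log_sizes(writes_per_tick=100, reads_per_tick=150, ticks=3)
```

With a reader that handles 60 records per tick against 100 written, the backlog grows by 40 every tick. With a reader that keeps up, nothing is retained.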
To track changes, transactional replication with queued updating subscriptions must be able to uniquely identify every row in every published table.
When to use Transactional replication:
(1) You want to send changes to the subscriber(s) when they occur.
(2) Low latency is desired between the publisher and subscriber.
(3) The application requires all the intermediate states of the database.
(4) The publishers use many INSERT, UPDATE, and DELETE statements.
Transactional replication with Updatable Subscriptions:
This is like transactional replication but subscribers can also update the published articles.
Transactional Replication with Database Mirroring:
This adds redundancy and availability to the replication stream, so replication can continue if a server fails.
Merge Replication:
All the subscribers can change their subscription data. You do not see the intermediate changes.
The idea is that a set of changes happens at the publication, a set of changes happens at the subscriber(s), and they connect every so often through the Merge Agent. The Merge Agent looks at all the changes, merges them together, and performs conflict detection.
This suits systems that make many changes locally and connect only occasionally to merge their changes, receive the changes everyone else has made, and then disconnect. This means you do not need permanent network connectivity. You do not send the changes as they occur.
If changes are highly volatile at the various sites, but you do not want to consume all those changes, then use merge replication. Example: a company with one data center and many branches throughout the country. Every so often the branches connect to the data center and the Merge Agent merges all the changes together.
To track changes, merge replication must be able to uniquely identify every row in every published table.
You can specify that conflicts be detected at the column-level, so that only changes to the same row and column are considered a conflict.
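The difference between row-level and column-level conflict detection can be sketched as follows. This is a toy rule, not the Merge Agent's actual logic. The function is_conflict and its arguments are hypothetical, and each argument is the set of columns one side changed in the same row since the last merge.

```python
def is_conflict(publisher_cols, subscriber_cols, column_level):
    """Decide whether two sets of changes to the same row conflict."""
    if not publisher_cols or not subscriber_cols:
        return False              # only one side changed the row
    if column_level:
        # Conflict only when both sides touched at least one common column.
        return bool(set(publisher_cols) & set(subscriber_cols))
    return True                   # row-level: any two changes to the row conflict

# Publisher changed the phone number, subscriber changed the email address.
row_level = is_conflict({"phone"}, {"email"}, column_level=False)
col_level = is_conflict({"phone"}, {"email"}, column_level=True)

# Both sides changed the phone number.
same_column = is_conflict({"phone"}, {"phone"}, column_level=True)
```

Under row-level tracking the first case is a conflict. Under column-level tracking both changes can be kept, and only the second case needs resolution.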
The drawback is the way change tracking occurs. Change tracking is done through triggers. So, when something changes on the publication it fires a trigger, and the trigger body has to process the change. This can have a detrimental performance effect on a system if there are a lot of changes going on.
When to use Merge replication:
(1) Subscribers update the same data,
(2) Each subscriber requires different data,
(3) Conflicts might occur, or
(4) Intermediate data is not required.
Snapshot Replication:
This creates a snapshot of the publication at a point in time, and each new snapshot replaces the old one. You can take a portion of the database and publish it. You choose how often the snapshot is applied to the subscription. You will not see intermediate changes. You see only the net effect of what has occurred over time. Example: a small amount of data with high volatility.
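A short sketch may help contrast "net effect" with "intermediate changes". It is a toy model with hypothetical names (apply_change, transactional_view, snapshot_view), where a dictionary stands in for a published table.

```python
def apply_change(table, change):
    """Apply one (verb, key, value) change to a dictionary standing in for a table."""
    verb, key, value = change
    if verb == "DELETE":
        table.pop(key, None)
    else:                         # INSERT or UPDATE
        table[key] = value

changes = [
    ("INSERT", 1, 10),
    ("UPDATE", 1, 20),
    ("UPDATE", 1, 30),
    ("INSERT", 2, 5),
    ("DELETE", 2, None),
]

publisher = {}
transactional_view = []           # every state a transactional subscriber passes through
for change in changes:
    apply_change(publisher, change)
    transactional_view.append(dict(publisher))

snapshot_view = dict(publisher)   # one snapshot, taken after all the changes
```

The transactional subscriber passes through five states, including the short-lived row 2. The snapshot subscriber sees only the final table, in which row 2 never appears.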
Peer-to-Peer Transactional Replication:
This is a special case of transactional replication. It lets you set up a topology of nodes in which changes from every node are replicated automatically to all other nodes. Each server (node) acts as both a publisher and a subscriber for the same data. For example, if you have three nodes and one goes down, the other two nodes can carry on processing. When the third node comes back online, it can automatically catch up on all the changes that came from the other two nodes. This adds redundancy and the ability to recover from nodes being offline. You also get scale-out for both read and write queries.
- Peer-to-peer replication was introduced in SQL Server 2005 and is available only in Enterprise Edition.
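The three-node example can be sketched as a toy simulation. The Node class and its methods are hypothetical and are not SQL Server's mechanism. The sketch assumes the nodes never write the same key, so it does no conflict handling.

```python
class Node:
    """Toy peer: publishes its own writes to every other peer and accepts theirs."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True
        self.peers = {}           # peer name -> Node
        self.outbox = {}          # peer name -> changes not yet delivered

    def connect(self, nodes):
        for node in nodes:
            if node is not self:
                self.peers[node.name] = node
                self.outbox[node.name] = []

    def write(self, key, value):
        """A local write is queued for every peer."""
        self.data[key] = value
        for queue in self.outbox.values():
            queue.append((key, value))

    def deliver(self):
        """Send queued changes to each online peer; offline peers keep their queue."""
        if not self.online:
            return
        for name, queue in self.outbox.items():
            peer = self.peers[name]
            if peer.online:
                for key, value in queue:
                    peer.data[key] = value   # applied directly, not re-published
                queue.clear()

a, b, c = Node("a"), Node("b"), Node("c")
nodes = [a, b, c]
for node in nodes:
    node.connect(nodes)

c.online = False                  # the third node goes down
a.write("x", 1)
b.write("y", 2)
for node in nodes:
    node.deliver()
state_while_c_down = (dict(a.data), dict(b.data), dict(c.data))

c.online = True                   # the third node comes back
for node in nodes:
    node.deliver()
```

While node c is down, a and b keep exchanging changes and hold c's share in their queues. Once c returns, a single delivery round brings it level with the other two.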
Metrics For Replication:
(1) Read versus read/write capability:
(2) What kind of bandwidth available: sending entire table or not
(3) Latency between servers: depends on how the agents are scheduled (how often they publish to other nodes) and on connectivity (how often the servers connect)
(4) Schema (table) changes at publisher: No plain Transactional, Yes Merge
(5) Schema changes at subscriber: Possibly
(6) Supports automatic failover: No
(7) System data transferred: No
(8) Can you select individual articles: Yes
(9) Subscriber database protected: No
(10) Subscriber server can be used as a reporting server: Yes
(11) Can we tolerate loss of data
(12) Volatility of data (how often it changes/number of transactions): if you have many transactions, it may be taxing on a transactional replication system
(13) Amount of Data published
(14) Kind of data involved
(15) Scalability of capacity and topology in a flexible way to respond to load
(16) Reliability
(17) Consistency
(18) Load balancing to use more servers to lessen load on other servers
(19) Conflict resolution: Automatic resolution
(20) With Peer to Peer replication it’s easy to do reads/writes and push to all other nodes
(21) With Transactional replication with updatable subscriptions, it’s harder (than P2P) to have all the nodes making changes and publishing them back to all the other nodes
(22) With Merge replication, you can coalesce all the changes made at the servers
Terminology:
(A) Publisher: database source
(B) Publication: collection of articles to be published
(C) Article: data that is to be replicated
(D) Subscriber: receiver of publication
(E) Subscription: publication received
(F) Agents:
(F1) Snapshot copies initial snapshot (schema and data) to subscriber
(F2) Log reader monitors transaction logs on published databases
(F3) Merge syncs all the changes together
(F4) Distribution moves transactions to subscribers
(F5) Queue reader applies queued changes from subscribers back to the publisher (used with queued updating subscriptions)
Tips:
(1) Beware of large UPDATE statements such as global updates to the entire table
(2) Use small blocks of records when you can
(3) Replicate often to avoid conflicts
(4) Use automatic conflict resolution policies (default behavior) when possible
(5) Use reliable primary keys to prevent duplicates
(6) Avoid summary tables at each site
(7) Design applications to reduce possibility of conflicts
(8) A challenge of trigger-based replication is record bouncing: a change at a site fires a trigger that propagates it to the server, where it fires another trigger that propagates it back down to the site, and the change gets into an endless loop.
(9) Design for resilience so you do not have to monitor the system constantly: (a) log errors, (b) databases may be inconsistent due to failed statements, (c) tolerate network loss, …
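The record-bouncing problem in tip (8) can be sketched with a toy model. This shows one common guard (skipping changes that arrived through replication) and is not a description of any product's implementation. The Site class, the guard flag, and MAX_HOPS are all hypothetical, and MAX_HOPS exists only so the unguarded demo terminates.

```python
MAX_HOPS = 10   # safety cap so the unguarded demo stops instead of looping forever

class Site:
    """Toy site whose 'trigger' forwards every change to its partner."""

    def __init__(self, name, guard):
        self.name = name
        self.guard = guard        # True: ignore changes that arrived via replication
        self.partner = None
        self.data = {}
        self.forwards = 0         # how many times this site's trigger sent a change

    def change(self, key, value, origin=None, hops=0):
        self.data[key] = value
        # --- trigger body ---
        arrived_via_replication = origin is not None
        if self.guard and arrived_via_replication:
            return                # do not send a replicated change back
        if hops >= MAX_HOPS:
            return                # demo cap reached
        self.forwards += 1
        self.partner.change(key, value, origin=self.name, hops=hops + 1)

def run(guard):
    """Make one local change at the branch and count trigger-driven sends."""
    branch, server = Site("branch", guard), Site("server", guard)
    branch.partner, server.partner = server, branch
    branch.change("price", 42)
    return branch.forwards + server.forwards

guarded_sends = run(guard=True)
unguarded_sends = run(guard=False)
```

With the guard, the change is sent once and stops at the server. Without it, the same change keeps bouncing between the two sites until the demo cap is reached.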