Hi all,
I am in a prestudy phase, investigating how to store large sets of parameter updates from numerous PLCs in a historical database. Typically, we expect
5000 parameter updates per second.
My question: is it possible to use transactional replication in such a way that bulk inserts in the published database are replicated
as bulk inserts in the subscriber database?
Some more background information on this question:
Currently, I have created the following setup, which comes closest to the desired application architecture (SQL Server 2019):
Buffer database [on SERVER01]:
- Holds events of the last x days (e.g. last month) – sliding window.
- Users will not run queries on this database, so that INSERT performance stays high (avoiding table locks).
- We use a buffer database to allow the historical database to be down.
- I use partitioning to be able to switch out old data easily.
- I have created a small test program in C# that uses the NuGet package EFCore.BulkExtensions to bulk insert 5000 events every second. In SQL Server Profiler,
I can verify that these records are indeed bulk inserted. I use bulk insert because it gives me better insert performance than individual inserts.
Historical database [on SERVER02]:
- Holds all events; records will never be deleted.
- Users will typically run any type of query on this database.
Replication:
- I have configured transactional replication to replicate the records from the buffer DB to the historical DB.
- I only replicate INSERT commands (no UPDATE and DELETE commands).
- The distribution DB is located on SERVER01.
Whenever I start my test program:
- Records are inserted into the buffer database at a rate of 5000 records per second, no problem.
- Publisher-to-distributor replication keeps up without problems.
- However, distributor to subscriber is VERY slow (1M records take > 30 minutes).
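To put numbers on that gap, here is a rough back-of-the-envelope sketch in Python. It uses only the figures above (5000 rows per second in, 1M rows delivered in 30 minutes) and assumes the 30 minutes is exact, although the real time was longer, so the delivery rate it computes is an upper bound. The function and variable names are just for illustration.

```python
# Figures taken from the test described above; everything else is derived.
INGEST_ROWS_PER_SEC = 5000        # insert rate into the buffer database
REPLICATED_ROWS = 1_000_000       # rows delivered to the subscriber
REPLICATION_SECONDS = 30 * 60     # "> 30 minutes", treated as exactly 30 (assumption)


def delivery_rate(rows: int, seconds: float) -> float:
    """Average rows per second applied at the subscriber."""
    return rows / seconds


def backlog_growth_per_hour(ingest_rate: float, deliver_rate: float) -> float:
    """Rows per hour that pile up undelivered in the distribution database."""
    return (ingest_rate - deliver_rate) * 3600


delivery = delivery_rate(REPLICATED_ROWS, REPLICATION_SECONDS)
shortfall = INGEST_ROWS_PER_SEC / delivery
backlog = backlog_growth_per_hour(INGEST_ROWS_PER_SEC, delivery)

print(f"Subscriber delivery rate: at most {delivery:.0f} rows/s")
print(f"Ingest is at least {shortfall:.1f}x faster than delivery")
print(f"Backlog grows by at least {backlog:,.0f} rows per hour")
```

In other words, under continuous load the subscriber never catches up; the distributor-to-subscriber step would have to become roughly an order of magnitude faster.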
I experimented with some settings of the distribution agent (BcpBatchSize, CommitBatchSize, CommitBatchThreshold), but this doesn’t
really change the performance. I’m not even sure whether they are relevant for my use case.
I think that the root of the problem is that every row of the bulk insert becomes a single command and that these commands are applied
to the subscriber database one by one.
I know that I could consider doing the bulk inserts on both databases and avoid replication for the involved articles. However, this would
require extra development to manage and monitor which records were correctly bulk inserted in which DB. If I can do it with replication, it comes ‘out of the box’.
Thanks in advance for your feedback,
Stephan