Replication Failed with Log Reader Agent Failed to Construct Command

Your rating: None Average: 3.9 (8 votes)

The development team was updating a native xml column in batches of 200 or so and attempted to speed things up by increasing the batch size. Once they hit about 500 they broke the log reader agent and we saw these errors:

2008-08-06 23:39:47.503 Status: 4, code: 20262, text: 'Delivering replicated transactions, xact count: 1,  command count: 632101.'.
2008-08-06 23:39:47.581 Status: 0, code: 20011, text: 'The process could not execute 'sp_replcmds' on 'DB1'.'.
2008-08-06 23:39:47.643 The process could not execute 'sp_replcmds' on 'DB1'.
2008-08-06 23:39:47.659 Status: 0, code: 18805, text: 'The Log Reader Agent failed to construct a replicated command from log sequence number (LSN) {000a4c8e:0006a8eb:000b}. Back up the publication database and contact Customer Support Services.'.
2008-08-06 23:39:47.722 Status: 0, code: 22037, text: 'The process could not execute 'sp_replcmds' on 'DB1'.'.
2008-08-07 00:01:23.272 The concurrent snapshot for publication 'DB1' is not available because it has not been fully generated or the Log Reader Agent is not running to activate it. If generation of the concurrent snapshot was interrupted, the Snapshot Agent for the publication.

Investigation Tips:
You can get more verbose logging messages by appending -Output to your Job Agent as noted in this kb:
http://support.microsoft.com/kb/q312292/ All it says is to append these parameters to replication "Run agent." job step.
-Output C:\ReplOutput.txt -OutputVerboseLevel 2
(Remember to remove when you are completed with your troubleshooting.)

You can run DBCC LOG (DB1,3) to try and find the LSN to see what object it was if your developers don't know what caused it and this info may help them find.

Solution:
Basically destroy the Publication and start all over, or try these commands to dump all the existing log commands for the database and do a new snapshot.

This command updates the record that identifies the last distributed transaction of the server:
EXEC sp_repldone @xactid = NULL, @xact_segno = NULL, @numtrans = 0, @time = 0, @reset = 1
This command flushes the article cache (you loose all of commands not replicated).
EXEC sp_replflush

The Log reader agent managed to start successfully this time.
Even though we managed to resolve the error and have replication funtioning again but we have had this happen three times, but it became unreliable in that even while updating just 200 rows at a time it would cause and error so we destroyed the Publication and started from scratch.