You are viewing a single comment's thread from:

RE: Hive Node Setup for the Smart, the Dumb, and the Lazy.

in Blockchain Wizardry • 6 months ago

Damn you 😡 It is still going. You were right and I remembered it wrong. I've dug out a 15 months old results of full sync and it was running over 37 hours up to 72M+. Compared to that current version appears to be slightly faster, but still couple times slower than what I thought it would be.

To be honest it smells like a bug (or more optimistically - as an optimization opportunity). There are couple of hiccups when node is not receiving blocks fast enough, but for the most part block processing is reported at close to 100% time. On the other hand computer seems to be sleeping, using around single core only, which is weird, since decomposing signatures, that used to make sync 7 times slower than replay, since HF26 is supposedly done on multiple threads and preemptively, as soon as block arrives, so I'd expect at least some bursts of higher CPU activity. Maybe I should use some config option for that?

It would be nice to have a comparison on the same machine: pure replay vs replay with full validation vs sync.

Sort:  

Signatures are checked ahead of time in separate threads, and sufficient number of threads are default allocated.

Whenever you see block processing at 100%, then the bottleneck is the single-core speed of your system (it's processing operations and updating state).

 6 months ago (edited) 

The results are in:

  • revision: 4921cb8c4abe093fa173ebfb9340a94ddf5ace7a
  • same config in both runs (no AH or other plugins that add a lot of data, just witness and APIs, including wallet bridge)
  • in both runs 87310000 blocks were processed (actually slightly more, with replay covering around 10 blocks extra that previous sync run added to block log while in live sync)
  • replay with validation (from start up to Performance report (total).) - 124225649 ms which is 34.5 hours, avg. block processing time (from Performance report at block) is 1.423 ms/block
  • sync (from start up to entering live mode) - 143988777 ms which is 40 hours, avg. block processing time (from Syncing Blockchain) is 1.649 ms/block

I'm curious how @gtg measurements will look in comparison.

Sync to replay ratio shoots up the most in areas of low blockchain activity, which is understandable, since small blocks are processed faster than they can be acquired from network, but in other areas sync is still 10-20% slower.

And the likely reason I remembered sync as faster than that is due to difference in computer speed - my home computer appears to be over 60% faster than the one I was running above experiments on, which would mean it should almost fit the sync inside 24 hours.

 6 months ago  

For now I have results for first 50M blocks:

50M blocksReal timelast 100k real timelast 100k cpu timeparallel speedup
Replay6:32:45 43.466s 61.132sx1.4064
Replay + Validate11:03:00 84.337s395.575sx4.6904
Resync14:31:33103.266s182.288sx1.7652

I just counted last 100k block times (cpu / real) so it's not a great measurement. I can have better numbers once I complete those runs. But it seems that replay with validation can somehow make a better use of multiple threads than validation during resync.

It might be the state undo logic slowing down blockchain processing in a sequential manner (this computation is probably skipped for replay+validate). But I doubt there is a way to disable it to check that, short of modifying the code for the test.

Probably we should modify the code dealing with checkpoints to skip undo logic up to the checkpoint. This would allow us to confirm if it is the bottleneck, and it would also give us a speedup when checkpoints are set if it turns out to be the bottleneck.

It should be easy to test - just cut out two lines with session in database::apply_block_extended (I'm actually assuming that out of order blocks won't reach that routine during sync, but if they do, it would be a source of slowdown).

I'd be surprised if undo sessions were the problem. They are relatively slow and worthy of optimization, but in relation to simple transactions, mostly custom_jsons, so their performance is significant when there is many of them, like during block production, reapplication of pending or in extreme stress tests with colony+queen. During sync we only have one session per block.

Yes, your assumption is correct, blocks are strictly processed in order during sync, the P2P code ensures this. If it's easy to test, let me know what you find out: I guess work with @gandalf so that the test is performed on the same machine as previous measurements.

Loading...