Deterministic Debugging of CKB’s Flaky Tests


This doc keeps track of my experience diagnosing CKB’s unstable tests. While the issue is unlikely to affect most application developers on CKB, it may be of interest to Rust developers working on complex multithreaded systems or those curious about advanced debugging techniques.
A PR has been created in CKB to fix the tests based on the findings documented here.
Every now and then, when running CKB’s tests, you might see errors like:
pthread lock: Invalid argument
terminate called without an active exception
These failures occur randomly (~5% of test runs) and are difficult to reproduce. I suspected the issue might be related to ckb-sync, and this post documents the debugging journey, tools used, and eventual fix.
Reproducing the Issue
Testing with cargo nextest
Cargo-nextest is a test runner for Rust projects. Based on the actual command executed by make test, I could piece together the following command:
$ cargo nextest run --features with_sentry --no-fail-fast \
--hide-progress-bar --success-output immediate-final \
--failure-output immediate-final -p ckb-sync
However, this command fails to build. It seems that when the ckb-sync package is built on its own, it is missing a dev-time feature dependency, namely the following line:
ckb-tx-pool = { workspace = true, features = ["internal"] }
After adding this line, the above command succeeds in building and proceeds to run the tests from the ckb-sync package.
On my machine, the test fails after ~20 iterations (5% failure rate) with either pthread lock: Invalid argument or terminate called without an active exception.
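As a rough sanity check on how many runs to budget for, the arithmetic behind a ~5% failure rate can be sketched as below. This assumes each run fails independently with the same probability, which is an idealization for a scheduling-dependent bug. The function names are made up for illustration.

```rust
/// Expected number of runs until the first failure for a per-run
/// failure probability `p` (mean of a geometric distribution: 1 / p).
fn expected_runs_until_failure(p: f64) -> f64 {
    1.0 / p
}

/// Probability of seeing at least one failure within `n` independent runs.
fn prob_failure_within(p: f64, n: u32) -> f64 {
    1.0 - (1.0 - p).powi(n as i32)
}

fn main() {
    let p = 0.05; // assumed per-run failure rate (~5%, as observed above)
    println!("expected runs: {:.0}", expected_runs_until_failure(p));
    for n in [20u32, 50, 100] {
        println!("P(failure within {} runs) = {:.3}", n, prob_failure_within(p, n));
    }
}
```

Under these assumptions, 20 runs give only about a 64% chance of hitting the failure at least once, so a loop that keeps running until a failure appears is more practical than a fixed number of attempts.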
Deterministic Debugging with rr
Multithreaded Failures
rr is a lightweight tool for recording, replaying and debugging execution of applications (trees of processes and threads). It records a program’s execution once and allows deterministic replay of that exact run multiple times. By capturing all sources of nondeterminism (including thread scheduling), rr enables reliable debugging of multithreaded failures, making it especially useful for diagnosing flaky or intermittent test issues in complex systems like CKB.
Installation
Install rr per the instructions.
Note: rr has specific requirements for the OS and CPU; refer to the docs for more details.
To limit the traces generated by rr, I will focus on one particular test: tests::sync_shared::test_insert_parent_unknown_block. In my local experiments, this test can trigger the errors mentioned above.
Set rr as the cargo runner
$ export CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER="/home/ubuntu/rr-obj/bin/rr record"
The rr binary on my machine is installed to /home/ubuntu/rr-obj/bin/rr; you might want to adjust this accordingly.
Run the test repeatedly
Then we can keep running this command until one of the errors shows up:
$ cargo nextest run --features with_sentry --no-fail-fast \
--hide-progress-bar --success-output immediate-final \
--failure-output immediate-final -p ckb-sync \
tests::sync_shared::test_insert_parent_unknown_block
As each test run accumulates rr traces and test data, you might use the following command between test runs to clean up all that data:
$ rm -rf ~/.local/share/rr/ /tmp/ckb-tmp-* /tmp/.tmp*
It’s not hard to write a script that keeps running the test until it fails:
#!/usr/bin/env bash
set -x
# This script simply runs the test repeatedly until a failure happens
unset CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER
while true; do
  cargo nextest run --features with_sentry --no-fail-fast \
    --hide-progress-bar --success-output immediate-final \
    --failure-output immediate-final -p \
    ckb-sync tests::sync_shared::test_insert_parent_unknown_block
  RETURN_CODE=$?
  # Clean up test data left behind by this run
  rm -rf /tmp/ckb-tmp-* /tmp/.tmp*
  if [[ "$RETURN_CODE" -ne 0 ]]; then
    echo "Failed with $RETURN_CODE"
    exit $RETURN_CODE
  fi
done
Or try this one that takes rr into account:
#!/usr/bin/env bash
set -x
RR="${RR:-/home/ubuntu/rr-obj/bin/rr}"
# This script runs the test wrapped with rr, so when a failure happens,
# you can replay the failed test using rr
export CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER="$RR record"
while true; do
  cargo nextest run --features with_sentry --no-fail-fast \
    --hide-progress-bar --success-output immediate-final \
    --failure-output immediate-final -p \
    ckb-sync tests::sync_shared::test_insert_parent_unknown_block
  RETURN_CODE=$?
  # Clean up test data left behind by this run
  rm -rf /tmp/ckb-tmp-* /tmp/.tmp*
  if [[ "$RETURN_CODE" -ne 0 ]]; then
    echo "Failed with $RETURN_CODE, use $RR replay to rerun the failure!"
    exit $RETURN_CODE
  fi
  # The run passed, so its rr traces are not needed
  rm -rf ~/.local/share/rr
done
For some reason, the test tends to fail more often when running under rr on my machine. My guess is that rr alters the behavior of multithreaded scheduling, or maybe rr picks a different execution path compared to what is normally exercised in our daily development.
Failures Observed
Error Case: pthread lock: Invalid argument
With enough runs, one of the test outputs might look like the following:
$ cargo nextest run --features with_sentry --no-fail-fast --hide-progress-bar --success-output immediate-final --failure-output immediate-final -p ckb-sync tests::sync_shared::test_insert_parent_unknown_block
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.24s
info: using target runner `/home/ubuntu/rr-obj/bin/rr record` defined by environment variable `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER`
────────────
Nextest run ID d8f86415-500a-4b18-9093-cd21a656013a with nextest profile: default
Starting 1 test across 1 binary (68 tests skipped)
SIGABRT [ 2.098s] ckb-sync tests::sync_shared::test_insert_parent_unknown_block
──── STDOUT: ckb-sync tests::sync_shared::test_insert_parent_unknown_block
running 1 test
test tests::sync_shared::test_insert_parent_unknown_block ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 68 filtered out; finished in 0.93s
──── STDERR: ckb-sync tests::sync_shared::test_insert_parent_unknown_block
pthread lock: Invalid argument
────────────
Summary [ 2.099s] 1 test run: 0 passed, 1 failed, 68 skipped
SIGABRT [ 2.098s] ckb-sync tests::sync_shared::test_insert_parent_unknown_block
──── STDOUT: ckb-sync tests::sync_shared::test_insert_parent_unknown_block
running 1 test
test tests::sync_shared::test_insert_parent_unknown_block ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 68 filtered out; finished in 0.93s
──── STDERR: ckb-sync tests::sync_shared::test_insert_parent_unknown_block
pthread lock: Invalid argument
error: test run failed
In this case, you can see that rr has generated traces:
$ ls -lh ~/.local/share/rr/
total 12K
drwxrwx--- 2 ubuntu ubuntu 4.0K Apr 27 05:55 ckb_sync-a645bdacec5c2bf3-0
drwxrwx--- 2 ubuntu ubuntu 4.0K Apr 27 05:55 ckb_sync-a645bdacec5c2bf3-1
drwxrwx--- 2 ubuntu ubuntu 4.0K Apr 27 05:55 ckb_sync-a645bdacec5c2bf3-2
-rw------- 1 ubuntu ubuntu 8 Apr 27 05:55 cpu_lock
lrwxrwxrwx 1 ubuntu ubuntu 27 Apr 27 05:55 latest-trace -> ckb_sync-a645bdacec5c2bf3-2
I run the clean command above between each run, so you’ll only see three traces here. The last one corresponds to the actual failing test, while the first two might be bookkeeping runs of nextest. If you didn’t run the clean command, there could be more traces.
Now you can replay the failed case:
$ ~/rr-obj/bin/rr replay
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/ubuntu/.local/share/rr/ckb_sync-a645bdacec5c2bf3-2/mmap_hardlink_4_ckb_sync-a645bdacec5c2bf3...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/ubuntu/.local/share/rr/ckb_sync-a645bdacec5c2bf3-2/mmap_hardlink_4_ckb_sync-a645bdacec5c2bf3.
Use `info auto-load python-scripts [REGEXP]' to list them.
Remote debugging using 127.0.0.1:7932
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/1c/8db5f83bba514f8fd5f1fb6d7be975be1bb855.debug...
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for system-supplied DSO at 0x6fffd000
0x00007ce375295540 in _start () from /lib64/ld-linux-x86-64.so.2
(rr)
rr starts a gdb session for the ckb-sync test. We can use c to continue running the test:
(rr) c
Continuing.
Downloading separate debug info for /lib/x86_64-linux-gnu/libstdc++.so.6
running 1 test
test tests::sync_shared::test_insert_parent_unknown_block ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 68 filtered out; finished in 0.93s
pthread lock: Invalid argument
[New Thread 72541.73153]
[New Thread 72541.72544]
[New Thread 72541.73028]
[New Thread 72541.73029]
[New Thread 72541.73082]
[New Thread 72541.73109]
[New Thread 72541.73111]
[New Thread 72541.73113]
[New Thread 72541.73114]
[New Thread 72541.73115]
[New Thread 72541.73118]
[New Thread 72541.73119]
[New Thread 72541.73152]
[New Thread 72541.73156]
[New Thread 72541.73157]
Thread 2 received signal SIGABRT, Aborted.
[Switching to Thread 72541.73153]
Download failed: Invalid argument. Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44 ./nptl/pthread_kill.c: No such file or directory
(rr)
And use bt to print the stack trace at the point where the error happens:
(rr) bt
As the stack trace is quite long, I’m posting a screenshot instead:
We can track the failure to this line:
ffi::rocksdb_optimistictransactiondb_close(self.inner);
For some reason, closing an optimistic transaction DB triggers the failure. But the cause is unclear. Just a guess: what should a proper shutdown process in RocksDB look like? Is simply closing the DB instance enough?
As long as you don’t run rr record ... or rerun the test, you can use rr replay to rerun the failed test as many times as needed. Alternatively, you can use rr replay ~/.local/share/rr/ckb_sync-a645bdacec5c2bf3-2 to explicitly pick the trace of the failing test. This way, you can continue running more rr record ... sessions.
Error Case: terminate called without an active exception
Enough runs of the test also reveal another error:
$ cargo nextest run --features with_sentry --no-fail-fast --hide-progress-bar --success-output immediate-final --failure-output immediate-final -p ckb-sync tests::sync_shared::test_insert_parent_unknown_block
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.24s
info: using target runner `/home/ubuntu/rr-obj/bin/rr record` defined by environment variable `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER`
────────────
Nextest run ID 2aa60494-7933-4b99-9f5f-cdced260411a with nextest profile: default
Starting 1 test across 1 binary (68 tests skipped)
SIGABRT [ 2.083s] ckb-sync tests::sync_shared::test_insert_parent_unknown_block
──── STDOUT: ckb-sync tests::sync_shared::test_insert_parent_unknown_block
running 1 test
test tests::sync_shared::test_insert_parent_unknown_block ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 68 filtered out; finished in 0.92s
──── STDERR: ckb-sync tests::sync_shared::test_insert_parent_unknown_block
terminate called without an active exception
────────────
Summary [ 2.084s] 1 test run: 0 passed, 1 failed, 68 skipped
SIGABRT [ 2.083s] ckb-sync tests::sync_shared::test_insert_parent_unknown_block
──── STDOUT: ckb-sync tests::sync_shared::test_insert_parent_unknown_block
running 1 test
test tests::sync_shared::test_insert_parent_unknown_block ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 68 filtered out; finished in 0.92s
──── STDERR: ckb-sync tests::sync_shared::test_insert_parent_unknown_block
terminate called without an active exception
error: test run failed
We can also use rr to jump into a gdb session:
$ ~/rr-obj/bin/rr replay
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/ubuntu/.local/share/rr/ckb_sync-a645bdacec5c2bf3-2/mmap_hardlink_4_ckb_sync-a645bdacec5c2bf3...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/ubuntu/.local/share/rr/ckb_sync-a645bdacec5c2bf3-2/mmap_hardlink_4_ckb_sync-a645bdacec5c2bf3.
Use `info auto-load python-scripts [REGEXP]' to list them.
Remote debugging using 127.0.0.1:9252
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/1c/8db5f83bba514f8fd5f1fb6d7be975be1bb855.debug...
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
0x000072be440c4540 in _start () from /lib64/ld-linux-x86-64.so.2
(rr)
We can also use c to continue with the test, and use bt to check the stack trace when the failure happens:
(rr) c
Continuing.
running 1 test
test tests::sync_shared::test_insert_parent_unknown_block ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 68 filtered out; finished in 0.92s
terminate called without an active exception
[New Thread 74722.74724]
[New Thread 74722.74725]
[New Thread 74722.74741]
[New Thread 74722.74742]
[New Thread 74722.74743]
[New Thread 74722.74744]
[New Thread 74722.74749]
[New Thread 74722.74750]
[New Thread 74722.74751]
[New Thread 74722.74752]
[New Thread 74722.74753]
[New Thread 74722.74754]
[New Thread 74722.74755]
[New Thread 74722.74771]
[New Thread 74722.74773]
[New Thread 74722.74775]
[New Thread 74722.74776]
Thread 1 received signal SIGABRT, Aborted.
Download failed: Invalid argument. Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44 ./nptl/pthread_kill.c: No such file or directory
(rr) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x000072be4384527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x000072be438288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x000072be43ca5ff5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#6 0x000072be43cbb0da in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48
#7 0x000072be43ca5a55 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:58
#8 0x00005f7ac4eab073 in std::__terminate () at /usr/include/x86_64-linux-gnu/c++/13/bits/c++config.h:322
#9 std::thread::~thread (this=0x72be3c170ea0, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_thread.h:173
#10 0x00005f7ac4eefb68 in std::default_delete<std::thread>::operator() (this=0x5f7ac7268318 <rocksdb::PeriodicTaskScheduler::Default()::timer+152>,
__ptr=0x72be3c170ea0) at /usr/include/c++/13/bits/unique_ptr.h:99
#11 0x00005f7ac4ede608 in std::unique_ptr<std::thread, std::default_delete<std::thread> >::~unique_ptr (
this=0x5f7ac7268318 <rocksdb::PeriodicTaskScheduler::Default()::timer+152>, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/unique_ptr.h:404
#12 0x00005f7ac505d876 in rocksdb::Timer::~Timer (this=0x5f7ac7268280 <rocksdb::PeriodicTaskScheduler::Default()::timer>, __in_chrg=<optimized out>)
at rocksdb/util/timer.h:48
#13 0x000072be43847a76 in __run_exit_handlers (status=0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
at ./stdlib/exit.c:108
#14 0x000072be43847bbe in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:138
#15 0x000072be4382a1d1 in __libc_start_call_main (main=main@entry=0x5f7ac3eeaf40 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffe436ca588)
at ../sysdeps/nptl/libc_start_call_main.h:74
#16 0x000072be4382a28b in __libc_start_main_impl (main=0x5f7ac3eeaf40 <main>, argc=4, argv=0x7ffe436ca588, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7ffe436ca578) at ../csu/libc-start.c:360
#17 0x00005f7ac3e1a4c5 in _start ()
(rr)
This one is actually less obvious than the previous one: the error happens when C++ code destructs a timer instance during process exit.
Just to take a guess: maybe there is a timer left unprocessed when closing a RocksDB DB instance?
Diving into the Code
Once we had a reliable way to reproduce the failure and replay it in rr, it was time to figure out why it was happening.
When diving into the code, the first thing I noticed was a redundant C++ delete call:
ffi::rocksdb_optimistictransactiondb_close_base_db(self.base_db);
ffi::rocksdb_optimistictransactiondb_close(self.inner);
If we track down the code, both Line 152 and 153 here will eventually invoke ~StackableDB (Line 34-41):
~StackableDB() override {
  if (shared_db_ptr_ == nullptr) {
    delete db_;
  } else {
    assert(shared_db_ptr_.get() == db_);
  }
  db_ = nullptr;
}
After the first invocation, db_ will become nullptr, and C++’s delete first checks if its operand is nullptr. If so, the delete call will just be a no-op.
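To illustrate why a second call of this kind is harmless, here is a minimal Rust model of the pattern. It is not the RocksDB or ckb-rocksdb code: `Wrapper` and `close` are hypothetical names, and `Option::take` stands in for setting `db_` to `nullptr` after the first `delete`.

```rust
/// Hypothetical stand-in for a wrapper that owns an inner DB handle.
struct Wrapper {
    db: Option<Box<String>>, // None plays the role of nullptr
    frees: u32,              // how many times the inner value was actually freed
}

impl Wrapper {
    /// Frees the inner value if it is still present. A second call finds
    /// `None` and does nothing, mirroring `delete` on a null pointer.
    fn close(&mut self) {
        if let Some(inner) = self.db.take() {
            drop(inner);
            self.frees += 1;
        }
    }
}

fn main() {
    let mut w = Wrapper { db: Some(Box::new("db".to_string())), frees: 0 };
    w.close();
    w.close(); // redundant, but not a double free
    println!("frees = {}", w.frees);
}
```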
So the extra C++ delete call is redundant, not a double free. In fact, even if we remove one of the two delete calls, we can still run into failures. Here’s one possible stacktrace:
And now, it’s mostly just repeating the following loop:
1. Read the code.
2. Add println! in Rust code or printf lines in C++ code.
3. Use the above bash script to compile and repeatedly run the tests until a failure occurs.
4. Use rr to replay the failure in a gdb session, setting breakpoints to peek into memory data. If required, restart rr to begin another gdb session.
Throughout this process, rr has been super reliable and helpful. As long as the bash script captures a test failure, rr can deterministically rerun it as many times as you like.
By the end, I had likely started over 100 gdb sessions using rr to replay the failures.
Failure Workflow Analysis
After enough trials, I’ve narrowed it down to the following workflow (I’m using tests::sync_shared::test_insert_parent_unknown_block as an example; I believe other tests could be fixed similarly):
1. At the very start, two shared structures and two chain_controllers are created (Line 83-94).
2. The shared1 variable created at Line 83 and its discarded chain controller work as expected.
3. The issue stems from the shared and chain variables created at Line 84.
4. As part of the block starting at Line 84, start_chain_services is invoked. It creates a bunch of threads, some of which might hold a copy of the Shared structure, meaning each of those threads also holds an Arc<RocksDB> instance.
5. A unit test might not use CKB’s full shutdown process like a normal CKB node does. When the test tests::sync_shared::test_insert_parent_unknown_block terminates, this thread (Line 59-73) is still running.
   - Note: By terminate, I mean you can see test tests::sync_shared::test_insert_parent_unknown_block ... ok generated by Rust’s test infrastructure; in other words, you can think that control flow for the main test thread has already exited the method test_insert_parent_unknown_block.
6. When Rust initiates thread termination (or pthread determines to kill all still running threads), PreloadUnverifiedBlocksChannel still holds a copy of the Shared structure, which maintains a copy of Arc<RocksDB>.
7. Then the Drop impl of the underlying OptimisticTransactionDB starts executing, which will invoke RocksDB’s destructor (Line 717):
   DBImpl::~DBImpl() {
     ThreadStatus::OperationType cur_op_type = ThreadStatusUtil::GetThreadOperation();
     ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType::OP_UNKNOWN);
   which then invokes RocksDB’s global Timer’s Cancel method (Line 86):
   InstrumentedMutexLock l(&mutex_);
8. This is when Timer::Cancel decides to acquire a mutex lock, and pthread signals an error, aborting the test program.
This leads to a crash like pthread lock: Invalid argument or terminate called without an active exception.
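The core of this workflow, a background thread keeping a shared handle alive after the test body has returned, can be modeled in a few lines of Rust. This is only a sketch: `FakeDb` and `test_body` are hypothetical stand-ins rather than CKB types, and a channel is used to hold the worker open so the behavior is deterministic.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for a DB handle whose Drop does real cleanup work.
struct FakeDb {
    dropped: Arc<AtomicBool>,
}

impl Drop for FakeDb {
    fn drop(&mut self) {
        // Runs on whichever thread releases the last Arc reference.
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Models the test body: it spawns a worker that keeps an Arc clone,
/// then returns without stopping or joining the worker.
fn test_body(dropped: Arc<AtomicBool>) -> (mpsc::Sender<()>, thread::JoinHandle<()>) {
    let db = Arc::new(FakeDb { dropped });
    let worker_db = Arc::clone(&db);
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        // Block until told to stop, keeping `worker_db` alive meanwhile.
        let _ = stop_rx.recv();
        drop(worker_db);
    });
    // `db` is dropped on return, but the worker's clone keeps FakeDb alive.
    (stop_tx, handle)
}

fn main() {
    let dropped = Arc::new(AtomicBool::new(false));
    let (stop_tx, handle) = test_body(Arc::clone(&dropped));
    // The "test" has returned, yet the DB has not been dropped.
    println!("dropped after test body: {}", dropped.load(Ordering::SeqCst));
    stop_tx.send(()).unwrap();
    handle.join().unwrap();
    println!("dropped after worker exit: {}", dropped.load(Ordering::SeqCst));
}
```

In the sketch the worker is eventually told to stop, so the drop happens at a known time. In the failing test nothing coordinates the worker with process exit, so the drop can overlap with the cleanup of global state.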
What Caused the Crash
My best guess at what happened here is:
When test_insert_parent_unknown_block finishes, Rust decides that all the requested tests (in this case, just one) have finished, and starts terminating the process. At this point, two threads are involved:
1. The thread containing PreloadUnverifiedBlocksChannel runs its Drop trait impl, which then calls RocksDB’s destructor, and eventually invokes RocksDB’s Timer::Cancel on the global Timer object.
2. A cleanup thread (most likely the main thread) runs cleanup hooks for RocksDB’s global Timer structure, which also target the same global timer.
If Thread 1 finishes before Thread 2, the termination shuts down cleanly and the test passes.
However, if Thread 2 finishes first, the thread containing PreloadUnverifiedBlocksChannel will try to acquire a lock on a pthread mutex that has already been cleaned up (in other words, this pthread mutex might no longer be properly initialized). This triggers an abort, causing the process running the test to fail even when the Rust test itself succeeds.
The Fix
Based on the above assumption, I’ve prepared a fix in this commit to ensure that all of the chain service’s threads are properly terminated before the test function exits.
With the fix applied, the test no longer fails in my runs of the following command:
$ cargo nextest run --features with_sentry --no-fail-fast \
--hide-progress-bar --success-output immediate-final \
--failure-output immediate-final -p \
ckb-sync tests::sync_shared::test_insert_parent_unknown_block
You can also run the bash script above, which wraps this command, indefinitely until a failure happens.
With this fix, I could no longer reproduce the pthread error, even after running the command for over an hour.
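The general idea of stopping and joining background threads before the test function returns can be sketched in Rust as follows. This is not the code from the commit: `Resource`, `Worker`, and `test_body` are hypothetical names, and the sketch assumes the worker can be stopped by closing a channel.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a shared resource such as a DB handle.
struct Resource {
    dropped: Arc<AtomicBool>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical guard that owns a worker thread and shuts it down on drop.
struct Worker {
    stop_tx: Option<mpsc::Sender<()>>,
    handle: Option<JoinHandle<()>>,
}

impl Worker {
    fn spawn(resource: Arc<Resource>) -> Self {
        let (stop_tx, stop_rx) = mpsc::channel::<()>();
        let handle = thread::spawn(move || {
            // Returns when a stop message arrives or the sender is dropped.
            let _ = stop_rx.recv();
            drop(resource);
        });
        Worker { stop_tx: Some(stop_tx), handle: Some(handle) }
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        // Dropping the sender wakes the worker; join waits for it to finish.
        self.stop_tx.take();
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

/// Models a test body that cleans up its worker before returning.
fn test_body(dropped: Arc<AtomicBool>) {
    let resource = Arc::new(Resource { dropped });
    let _worker = Worker::spawn(Arc::clone(&resource));
    // ... test logic would go here ...
    // Locals drop in reverse order: `_worker` first (joins the thread),
    // then `resource`, which is now the last reference.
}

fn main() {
    let dropped = Arc::new(AtomicBool::new(false));
    test_body(Arc::clone(&dropped));
    println!("resource dropped before test returned: {}", dropped.load(Ordering::SeqCst));
}
```

Because the guard joins the thread in its own Drop, the resource is released while the test function is still on the stack, well before any process-exit cleanup starts.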
Additional Notes
Googling the Error led to this discussion. Unfortunately, the proposed fix does not work for us.
Different RocksDB Versions: Currently, CKB is running on ckb-rocksdb v0.21.1 using RocksDB 8.5.4. CKB also developed a newer version, ckb-rocksdb v0.22.0, with RocksDB 9.10.0. I’ve tried both and found that the newer one has a lower failure frequency, but the issue still occurs.
✍🏻 Written by Xuejie Xiao
His previous posts include:
Against ROP Attacks: A Blockchain Architect’s Take on VM-Level Security
A Journey Optimizing CKB Smart Contract: Porting Bitcoin as an Example
Optimizing C++ Code for CKB-VM: Porting Bitcoin as an Example
Find more on his personal website, Less is More.
