
Possible Bug/Regression in SystemC 2.3.2: Race Condition


AmeyaVS


Hello everyone,

Has anyone faced issues with the Linux build of SystemC 2.3.2 configured with PThreads as the threading library back-end?

I have configured the SystemC library using the following command:

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=$HOME/apps/systemc-2.3.2 -DCMAKE_CXX_STANDARD=11 -DCMAKE_CXX_EXTENSIONS=OFF -DENABLE_PTHREADS=ON -DENABLE_PHASE_CALLBACKS=ON ..

# Then followed by
make
#
make check
# and
make install

I am having issues with the SystemC kernel getting stuck in a deadlock.

I am currently working on Ubuntu 18.04 using g++:

g++ (Ubuntu 7.3.0-16ubuntu3) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I do not experience such issues with the QuickThreads back-end, but for now I need the SystemC kernel with PThreads support.

I have attached the back-trace of a minimal example/regression test from the test-suite (systemc-regressions-2.3.2) which triggers this behavior:

TEST : systemc/1666-2011-compliance/living_dead_bug/living_dead_bug.cpp  (11/871)

Note: This behavior is also observed with SystemC 2.3.1a release.

Best Regards,

Ameya Vikram Singh

 

SystemC.RaceCondition.GDB.BackTrace.txt


Hello @Roman Popov,

I did try it on Ubuntu 16.04 with GCC 5.4.0, and there I currently do not observe this behavior.

It is probably some regression in the base system; I will try to narrow it down once I get some time.

I just wanted to give a heads-up in case there is an issue with the SystemC kernel, since most of the applications packaged on these Linux systems do get some form of regression testing.

Regards,

Ameya Vikram Singh


Hello @Roman Popov,

The issue seems to be consistent across multiple Linux distributions with recent versions of glibc.

From what I could figure out, the internal implementation of pthread mutexes and condition variables has been updated.

I will try to find the discussion on this, but for now I think it would be better if someone from the working group could also provide some insight into the issue.

Regards,

Ameya Vikram Singh


Hi Ameya,

I currently don't have access to such new Linux platforms, but I may have a suspicion about a potential root cause.
Can you please check whether it helps to change the sc_process_b::delete_process function in src/sysc/kernel/sc_process.cpp as follows:

 // if ( this != sc_get_current_process_b() )
 if ( NULL == sc_get_current_process_b() )

Thanks and Greetings from Duisburg,
  Philipp


Hi Ameya,

thanks for testing.  As said before, unfortunately I cannot reproduce this on my end.  I would need more details on the current behavior:

  • The backtrace looks like there is something broken during model teardown. Have you seen other cases?
  • Are all simulations with processes hanging in a similar way? (e.g. can you provide a full regression result?)

Greetings from Duisburg,
  Philipp


Hello @Philipp A Hartmann,

I have probably seen this behavior in other regression tests as well, but currently I do not recall all of them; this one was the first to deadlock.

I will try to run the regression test-suite with individual tests and post the results whenever I get a chance.

Thanks and Regards,

Ameya Vikram Singh

 


  • 2 years later...

I get this problem on Fedora 33.

Helgrind detects plenty of errors (see below), and the hang seems to occur when a call to pthread_cond_destroy is made whilst the object is still being held. I haven't debugged further.

 

==87323== Thread #3: lock order "0x5378948 before 0x4ACC978" violated
==87323==  
==87323== Observed (incorrect) order is: acquisition of lock at 0x4ACC978
==87323==    at 0x483DB42: mutex_lock_WRK (hg_intercepts.c:907)
==87323==    by 0x4841A3A: pthread_mutex_lock (hg_intercepts.c:923)
==87323==    by 0x49EFEA4: sc_core::sc_cor_pkg_pthread::yield(sc_core::sc_cor*) (sc_cor_pthread.cpp:250)
==87323==    by 0x4A108B9: sc_core::sc_thread_process::suspend_me() (sc_thread_process.h:223)
==87323==    by 0x4A1AB84: sc_core::sc_thread_process::wait(sc_core::sc_time const&) (sc_thread_process.h:324)
==87323==    by 0x4A1A081: sc_core::wait(sc_core::sc_time const&, sc_core::sc_simcontext*) (sc_wait.cpp:167)
==87323==    by 0x4068CB: sc_core::sc_module::wait(double, sc_core::sc_time_unit) (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x406E56: Top::control() (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x4A10504: sc_core::sc_process_b::semantics() (sc_process.h:685)
==87323==    by 0x4A16C21: sc_core::sc_thread_cor_fn(void*) (sc_thread_process.cpp:117)
==87323==    by 0x49EFBB0: sc_core::sc_cor_pthread::invoke_module_method(void*) (sc_cor_pthread.cpp:127)
==87323==    by 0x4840737: mythread_wrapper (hg_intercepts.c:387)
==87323==  
==87323==  followed by a later acquisition of lock at 0x5378948
==87323==    at 0x483DB42: mutex_lock_WRK (hg_intercepts.c:907)
==87323==    by 0x4841A3A: pthread_mutex_lock (hg_intercepts.c:923)
==87323==    by 0x49EFEC4: sc_core::sc_cor_pkg_pthread::yield(sc_core::sc_cor*) (sc_cor_pthread.cpp:252)
==87323==    by 0x4A108B9: sc_core::sc_thread_process::suspend_me() (sc_thread_process.h:223)
==87323==    by 0x4A1AB84: sc_core::sc_thread_process::wait(sc_core::sc_time const&) (sc_thread_process.h:324)
==87323==    by 0x4A1A081: sc_core::wait(sc_core::sc_time const&, sc_core::sc_simcontext*) (sc_wait.cpp:167)
==87323==    by 0x4068CB: sc_core::sc_module::wait(double, sc_core::sc_time_unit) (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x406E56: Top::control() (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x4A10504: sc_core::sc_process_b::semantics() (sc_process.h:685)
==87323==    by 0x4A16C21: sc_core::sc_thread_cor_fn(void*) (sc_thread_process.cpp:117)
==87323==    by 0x49EFBB0: sc_core::sc_cor_pthread::invoke_module_method(void*) (sc_cor_pthread.cpp:127)
==87323==    by 0x4840737: mythread_wrapper (hg_intercepts.c:387)
==87323==  
==87323== Required order was established by acquisition of lock at 0x5378948
==87323==    at 0x483DB42: mutex_lock_WRK (hg_intercepts.c:907)
==87323==    by 0x4841A3A: pthread_mutex_lock (hg_intercepts.c:923)
==87323==    by 0x49EFEA4: sc_core::sc_cor_pkg_pthread::yield(sc_core::sc_cor*) (sc_cor_pthread.cpp:250)
==87323==    by 0x4A1204B: sc_core::sc_simcontext::crunch(bool) (sc_simcontext.cpp:504)
==87323==    by 0x4A0DD19: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:887)
==87323==    by 0x4A0FB61: sc_core::sc_start(sc_core::sc_time const&, sc_core::sc_starvation_policy) (sc_simcontext.cpp:1718)
==87323==    by 0x4A0FC8C: sc_core::sc_start() (sc_simcontext.cpp:1752)
==87323==    by 0x406436: sc_main (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x49F6E8C: sc_elab_and_sim (sc_main_main.cpp:89)
==87323==    by 0x49F6CCE: main (sc_main.cpp:36)
==87323==  
==87323==  followed by a later acquisition of lock at 0x4ACC978
==87323==    at 0x483DB42: mutex_lock_WRK (hg_intercepts.c:907)
==87323==    by 0x4841A3A: pthread_mutex_lock (hg_intercepts.c:923)
==87323==    by 0x49EFEC4: sc_core::sc_cor_pkg_pthread::yield(sc_core::sc_cor*) (sc_cor_pthread.cpp:252)
==87323==    by 0x4A1204B: sc_core::sc_simcontext::crunch(bool) (sc_simcontext.cpp:504)
==87323==    by 0x4A0DD19: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:887)
==87323==    by 0x4A0FB61: sc_core::sc_start(sc_core::sc_time const&, sc_core::sc_starvation_policy) (sc_simcontext.cpp:1718)
==87323==    by 0x4A0FC8C: sc_core::sc_start() (sc_simcontext.cpp:1752)
==87323==    by 0x406436: sc_main (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x49F6E8C: sc_elab_and_sim (sc_main_main.cpp:89)
==87323==    by 0x49F6CCE: main (sc_main.cpp:36)
==87323==  
==87323==  Lock at 0x5378948 was first observed
==87323==    at 0x484199B: pthread_mutex_init (hg_intercepts.c:785)
==87323==    by 0x49EFA83: sc_core::sc_cor_pthread::sc_cor_pthread() (sc_cor_pthread.cpp:79)
==87323==    by 0x49EFD5D: sc_core::sc_cor_pkg_pthread::create(unsigned long, void (*)(void*), void*) (sc_cor_pthread.cpp:175)
==87323==    by 0x4A17350: sc_core::sc_thread_process::prepare_for_simulation() (sc_thread_process.cpp:334)
==87323==    by 0x4A0D819: sc_core::sc_simcontext::prepare_to_simulate() (sc_simcontext.cpp:712)
==87323==    by 0x4A0DB83: sc_core::sc_simcontext::initialize(bool) (sc_simcontext.cpp:819)
==87323==    by 0x4A0DBC5: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:844)
==87323==    by 0x4A0FB61: sc_core::sc_start(sc_core::sc_time const&, sc_core::sc_starvation_policy) (sc_simcontext.cpp:1718)
==87323==    by 0x4A0FC8C: sc_core::sc_start() (sc_simcontext.cpp:1752)
==87323==    by 0x406436: sc_main (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x49F6E8C: sc_elab_and_sim (sc_main_main.cpp:89)
==87323==    by 0x49F6CCE: main (sc_main.cpp:36)
==87323==  Address 0x5378948 is 24 bytes inside a block of size 128 alloc'd
==87323==    at 0x483AF1D: operator new(unsigned long) (vg_replace_malloc.c:342)
==87323==    by 0x49EFD52: sc_core::sc_cor_pkg_pthread::create(unsigned long, void (*)(void*), void*) (sc_cor_pthread.cpp:175)
==87323==    by 0x4A17350: sc_core::sc_thread_process::prepare_for_simulation() (sc_thread_process.cpp:334)
==87323==    by 0x4A0D819: sc_core::sc_simcontext::prepare_to_simulate() (sc_simcontext.cpp:712)
==87323==    by 0x4A0DB83: sc_core::sc_simcontext::initialize(bool) (sc_simcontext.cpp:819)
==87323==    by 0x4A0DBC5: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:844)
==87323==    by 0x4A0FB61: sc_core::sc_start(sc_core::sc_time const&, sc_core::sc_starvation_policy) (sc_simcontext.cpp:1718)
==87323==    by 0x4A0FC8C: sc_core::sc_start() (sc_simcontext.cpp:1752)
==87323==    by 0x406436: sc_main (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x49F6E8C: sc_elab_and_sim (sc_main_main.cpp:89)
==87323==    by 0x49F6CCE: main (sc_main.cpp:36)
==87323==  Block was alloc'd by thread #1
==87323==  
==87323==  Lock at 0x4ACC978 was first observed
==87323==    at 0x484199B: pthread_mutex_init (hg_intercepts.c:785)
==87323==    by 0x49EFA83: sc_core::sc_cor_pthread::sc_cor_pthread() (sc_cor_pthread.cpp:79)
==87323==    by 0x49EFFBC: __static_initialization_and_destruction_0(int, int) (sc_cor_pthread.cpp:63)
==87323==    by 0x49EFFEF: _GLOBAL__sub_I_sc_cor_pthread.cpp (sc_cor_pthread.cpp:285)
==87323==    by 0x40108ED: call_init.part.0 (in /usr/lib64/ld-2.32.so)
==87323==    by 0x40109D7: _dl_init (in /usr/lib64/ld-2.32.so)
==87323==    by 0x40010C9: ??? (in /usr/lib64/ld-2.32.so)
==87323==  Address 0x4acc978 is 24 bytes inside data symbol "_ZN7sc_coreL8main_corE"
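
For reference, POSIX leaves the behavior undefined when a condition variable is destroyed while threads are still blocked on it. The following is a minimal sketch of a teardown order that avoids this, assuming a hypothetical `Worker` struct that owns a mutex/condition pair; it is not the code from sc_cor_pthread.cpp, and whether this ordering would cure the hang reported here is untested.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical stand-in for an object that owns a mutex/condition pair.
// The names are illustrative only and do not come from the SystemC sources.
struct Worker {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    pthread_t       thread;
    bool            wake = false;      // predicate guarded by mtx
    bool            finished = false;  // set by the worker before it exits
};

static void* worker_main(void* arg) {
    Worker* w = static_cast<Worker*>(arg);
    pthread_mutex_lock(&w->mtx);
    while (!w->wake)                   // loop guards against spurious wakeups
        pthread_cond_wait(&w->cond, &w->mtx);
    w->finished = true;
    pthread_mutex_unlock(&w->mtx);
    return nullptr;
}

// Returns true if the worker was woken, joined, and both primitives were
// destroyed without error.
bool run_and_teardown() {
    Worker w;
    if (pthread_mutex_init(&w.mtx, nullptr) != 0) return false;
    if (pthread_cond_init(&w.cond, nullptr) != 0) return false;
    if (pthread_create(&w.thread, nullptr, worker_main, &w) != 0) return false;

    // 1. Wake the waiter by setting the predicate under the mutex.
    pthread_mutex_lock(&w.mtx);
    w.wake = true;
    pthread_cond_signal(&w.cond);
    pthread_mutex_unlock(&w.mtx);

    // 2. Wait until the thread has left pthread_cond_wait and exited.
    pthread_join(w.thread, nullptr);

    // 3. Only now destroy the primitives; no thread can still be blocked.
    int rc_cond = pthread_cond_destroy(&w.cond);
    int rc_mtx  = pthread_mutex_destroy(&w.mtx);
    return w.finished && rc_cond == 0 && rc_mtx == 0;
}
```

Calling `run_and_teardown()` should return true. The point of the sketch is only the ordering: signal, join, then destroy.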


 


Hello @Paul Floyd,

Thank you for the confirmation.

I did try out the valgrind tool with helgrind to understand the underlying threading issue.

I came to the same conclusion back then, but I have lost the references to the underlying changes and discussions about the glibc posix_thread_* constructs.

I did spend some time looking into the issue and the internal implementation in the SystemC library.

From what I could gather, the usage of the pthread_cond variable is somewhat inconsistent.

From what I understand from earlier comments, people in the working group probably don't have access to newer Linux systems to reproduce the issue.

We later switched to the QuickThreads implementation, but lost a consistent way to analyze simulation threads and synchronization issues.

 

Currently, I don't even know if there are any plans to modernize the SystemC kernel with C++11 std::thread, which would make platform and tool support a lot easier.
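
To illustrate what such a std::thread-based approach might look like, here is a minimal sketch of a hand-off between a scheduler and one process thread using only C++11 primitives. The `Baton` class and `run_handoff` function are hypothetical names invented for this example and say nothing about how the SystemC kernel is or would be implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical baton: exactly one side (scheduler or process) runs at a time.
class Baton {
public:
    // Hand control to the given side and wake whoever is waiting.
    void pass_to(int next) {
        {
            std::lock_guard<std::mutex> lk(m_);
            turn_ = next;
        }
        cv_.notify_all();
    }
    // Block until it is this side's turn.
    void wait_for(int me) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return turn_ == me; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int turn_ = 0;  // 0 = scheduler, 1 = process
};

// Alternates between scheduler ('s') and process ('p') for the given number
// of steps and returns the order in which they ran.
std::string run_handoff(int steps) {
    assert(steps >= 0);
    Baton baton;
    std::string trace;  // only touched by the side currently holding the baton
    std::thread process([&] {
        for (int i = 0; i < steps; ++i) {
            baton.wait_for(1);
            trace += 'p';
            baton.pass_to(0);
        }
    });
    for (int i = 0; i < steps; ++i) {
        trace += 's';
        baton.pass_to(1);
        baton.wait_for(0);
    }
    process.join();  // join before the baton (mutex, condition) is destroyed
    return trace;
}
```

For example, `run_handoff(3)` returns `"spspsp"`. Because the mutex and condition variable are ordinary members, they are destroyed only after `join()` has returned, so nothing can still be waiting on them.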

Regards,

Ameya Vikram Singh


  • 1 month later...

Can you tell me if your regression log file had either of these two messages?

 

Warning: (W558) disable() or dont_initialize() called on process with no static sensitivity, it will be orphaned: top.target

In file: /Users/acg/work/sysc/systemc-2.3.2/src/sysc/kernel/sc_simcontext.cpp:766

 

Warning: (W558) disable() or dont_initialize() called on process with no static sensitivity, it will be orphaned: top.control.dyn_target

In file: /Users/acg/work/sysc/systemc-2.3.2/src/sysc/kernel/sc_simcontext.cpp:1219

In process: top.control @ 0 s


Hello @Andy Goodrich,

Here is the snippet of the log that is generated in my SystemC environment setup with pthreads enabled:


        SystemC 2.3.4_pub_rev_20191203-Accellera --- Jan 26 2021 22:27:50
        Copyright (c) 1996-2019 by all Contributors,
        ALL RIGHTS RESERVED
SystemC Simulation

Warning: (W558) disable() or dont_initialize() called on process with no static sensitivity, it will be orphaned: top.target
In file: ../src/sysc/kernel/sc_simcontext.cpp:771

Warning: (W558) disable() or dont_initialize() called on process with no static sensitivity, it will be orphaned: top.control.dyn_target
In file: ../src/sysc/kernel/sc_simcontext.cpp:1233
In process: top.control @ 0 s

Success

The test executable never completes its execution, as it hangs due to a race condition.

Regards,

Ameya Vikram Singh


Interesting, the fact you got the "Success" indicates that SystemC returned to the main thread. That message is from sc_main(). The hang is occurring during tear-down of the "target" thread after exit from sc_main(). The culprit is this kill within the control thread:

target_handle.kill();

So could I ask one more favor and have you comment out that call, and let's see if things get all the way through. Thanks.

 

 



Thanks for testing, let me look at things a bit, and I'll get back to you. Looks like there is an interaction between the teardown of threads and the code that "kills" a thread within the SystemC simulator.


Hello @Andy Goodrich,

Thank you for the update.

As @Paul Floyd also pointed out in his post with the Helgrind report above, the Pthread threading constructs and their implementation in SystemC probably need to be looked into.

Around the time I first reported the issue, I also did an analysis and found that the underlying constructs used in pthread had been updated.

It had something to do with futex updates for pthread in libc and the Linux kernel. I don't have my notes now, since I have moved out.

But one can probably look at the differences in the system library versions between operating system releases, e.g. Ubuntu 16.04 vs. Ubuntu 20.04, or even Fedora releases from the same time frame.

Hope it helps.

Regards,

Ameya Vikram Singh

 


The difference between a run with the kill on thread "target" and not doing that kill is that the sc_process_b object associated with "target" is reduced to a reference count of 1 from 2 by that kill. During tear down a reference decrement occurs and as a result the reference count for "target" goes to 0 causing an actual deletion of its sc_cor_pthread instance. Something about that deletion is causing the issue. 
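
The reference-count sequence described above can be sketched as follows. This is only an illustration with hypothetical `Process` and `Coroutine` classes; it does not reproduce sc_process_b or sc_cor_pthread, and which part of the kernel holds which reference is an assumption.

```cpp
#include <cassert>

// Counts live coroutine objects so the deletion can be observed.
static int live_coroutines = 0;

// Hypothetical stand-in for the per-thread coroutine object.
struct Coroutine {
    Coroutine()  { ++live_coroutines; }
    ~Coroutine() { --live_coroutines; }
};

// Hypothetical stand-in for a reference-counted process object.
class Process {
public:
    Process() : refs_(1), cor_(new Coroutine) {}
    void reference_increment() { ++refs_; }
    // Returns true if this call dropped the last reference and deleted the object.
    bool reference_decrement() {
        if (--refs_ == 0) {
            delete this;
            return true;
        }
        return false;
    }
private:
    ~Process() { delete cor_; }  // the coroutine dies with the last reference
    int        refs_;
    Coroutine* cor_;
};

// Replays the described sequence: two references, the kill drops one,
// teardown drops the other and triggers the actual deletion.
bool kill_then_teardown_deletes_coroutine() {
    Process* target = new Process;   // count is 1
    target->reference_increment();   // count is 2 (assumed second holder)

    target->reference_decrement();   // "kill": 2 -> 1, object survives
    bool alive_after_kill = (live_coroutines == 1);

    bool deleted = target->reference_decrement();  // teardown: 1 -> 0
    return alive_after_kill && deleted && live_coroutines == 0;
}
```

Without the kill, the teardown decrement would only take the count from 2 to 1 and the coroutine object would never be deleted, which matches the observation that the run without the kill does not reach the problematic deletion.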

