AmeyaVS, posted September 10, 2018

Hello everyone,

Has anyone faced issues with a Linux build of SystemC 2.3.2 configured with PThreads as the threading library backend?

I configured the SystemC library using the following command:

    cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=$HOME/apps/systemc-2.3.2 -DCMAKE_CXX_STANDARD=11 -DCMAKE_CXX_EXTENSIONS=OFF -DENABLE_PTHREADS=ON -DENABLE_PHASE_CALLBACKS=ON ..
    # Then followed by:
    # make
    # make check
    # make install

I am having issues with the SystemC kernel getting stuck in a deadlock. I am currently working on Ubuntu 18.04 using g++:

    g++ (Ubuntu 7.3.0-16ubuntu3) 7.3.0
    Copyright (C) 2017 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I do not experience such issues with the QuickThreads backend, but for now I need the SystemC kernel with PThreads support.

I have attached the backtrace of a minimal example, a regression test from the test suite (systemc-regressions-2.3.2) that triggers this behavior:

    TEST : systemc/1666-2011-compliance/living_dead_bug/living_dead_bug.cpp (11/871)

Note: This behavior is also observed with the SystemC 2.3.1a release.

Best Regards,
Ameya Vikram Singh

Attachment: SystemC.RaceCondition.GDB.BackTrace.txt

Roman Popov, posted September 11, 2018

Can't reproduce on my machine (Ubuntu 16.04, gcc 6.3). Probably something specific to Ubuntu 18.04. Can you check with some older OS?

AmeyaVS (Author), posted September 11, 2018

Hello @Roman Popov,

I did try it on Ubuntu 16.04 with GCC 5.4.0, and I do not observe this behavior there. It is probably a regression in the base system; I will try to narrow it down once I get some time.

I just wanted to give a heads-up in case there is an issue with the SystemC kernel, since most of the applications packaged on these Linux systems do get some form of regression testing.

Regards,
Ameya Vikram Singh

AmeyaVS (Author), posted September 14, 2018

Hello @Roman Popov,

It seems the issue is consistent across multiple Linux distributions with recent versions of glibc. From what I could figure out, the internal implementation of pthread mutexes and condition variables has been updated.

I will try to find the discussion about that change, but for now I think it would be better if someone from the working group could also provide some insight into the issue.

Regards,
Ameya Vikram Singh

Philipp A Hartmann, posted September 15, 2018

Hi Ameya,

I currently don't have access to such new Linux platforms, but I have a suspicion about a potential root cause. Can you please check whether it helps to change the sc_process_b::delete_process function in src/sysc/kernel/sc_process.cpp as follows:

    // if ( this != sc_get_current_process_b() )
    if ( NULL == sc_get_current_process_b() )

Thanks and Greetings from Duisburg,
Philipp

AmeyaVS (Author), posted September 15, 2018

Hello @Philipp A Hartmann,

Thank you for your reply. Unfortunately, the issue persists even after applying the change.

In case you need more input, do let me know.

Thanks and Regards,
Ameya Vikram Singh

Philipp A Hartmann, posted September 19, 2018

Hi Ameya,

thanks for testing. As said before, unfortunately I cannot reproduce this on my end. I would need more details on the current behavior:

The backtrace looks like something is broken during model teardown.
Have you seen other cases?
Are all simulations with processes hanging in a similar way? (e.g. can you provide a full regression result?)

Greetings from Duisburg,
Philipp

AmeyaVS (Author), posted September 19, 2018

Hello @Philipp A Hartmann,

I have probably seen this behavior in other regression tests as well, but I do not currently recall all of them. This one was the first to deadlock.

I will try to run the regression test suite with individual tests and post the results whenever I get a chance.

Thanks and Regards,
Ameya Vikram Singh

AmeyaVS (Author), posted September 19, 2018

Hello @Philipp A Hartmann,

It seems to be an issue with this one test scenario (systemc/1666-2011-compliance/living_dead_bug). I ran the regression test suite on another system, and except for this test, all the other tests pass.

Regards,
Ameya Vikram Singh

Paul Floyd, posted December 13, 2020

I get this problem on Fedora 33.

Helgrind detects plenty of errors (see below), and the hang seems to be occurring when a call to pthread_cond_destroy is made whilst the object is still being held. I haven't debugged further.

==87323== Thread #3: lock order "0x5378948 before 0x4ACC978" violated
==87323==
==87323== Observed (incorrect) order is: acquisition of lock at 0x4ACC978
==87323==    at 0x483DB42: mutex_lock_WRK (hg_intercepts.c:907)
==87323==    by 0x4841A3A: pthread_mutex_lock (hg_intercepts.c:923)
==87323==    by 0x49EFEA4: sc_core::sc_cor_pkg_pthread::yield(sc_core::sc_cor*) (sc_cor_pthread.cpp:250)
==87323==    by 0x4A108B9: sc_core::sc_thread_process::suspend_me() (sc_thread_process.h:223)
==87323==    by 0x4A1AB84: sc_core::sc_thread_process::wait(sc_core::sc_time const&) (sc_thread_process.h:324)
==87323==    by 0x4A1A081: sc_core::wait(sc_core::sc_time const&, sc_core::sc_simcontext*) (sc_wait.cpp:167)
==87323==    by 0x4068CB: sc_core::sc_module::wait(double, sc_core::sc_time_unit) (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x406E56: Top::control() (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x4A10504: sc_core::sc_process_b::semantics() (sc_process.h:685)
==87323==    by 0x4A16C21: sc_core::sc_thread_cor_fn(void*) (sc_thread_process.cpp:117)
==87323==    by 0x49EFBB0: sc_core::sc_cor_pthread::invoke_module_method(void*) (sc_cor_pthread.cpp:127)
==87323==    by 0x4840737: mythread_wrapper (hg_intercepts.c:387)
==87323==
==87323== followed by a later acquisition of lock at 0x5378948
==87323==    at 0x483DB42: mutex_lock_WRK (hg_intercepts.c:907)
==87323==    by 0x4841A3A: pthread_mutex_lock (hg_intercepts.c:923)
==87323==    by 0x49EFEC4: sc_core::sc_cor_pkg_pthread::yield(sc_core::sc_cor*) (sc_cor_pthread.cpp:252)
==87323==    by 0x4A108B9: sc_core::sc_thread_process::suspend_me() (sc_thread_process.h:223)
==87323==    by 0x4A1AB84: sc_core::sc_thread_process::wait(sc_core::sc_time const&) (sc_thread_process.h:324)
==87323==    by 0x4A1A081: sc_core::wait(sc_core::sc_time const&, sc_core::sc_simcontext*) (sc_wait.cpp:167)
==87323==    by 0x4068CB: sc_core::sc_module::wait(double, sc_core::sc_time_unit) (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x406E56: Top::control() (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x4A10504: sc_core::sc_process_b::semantics() (sc_process.h:685)
==87323==    by 0x4A16C21: sc_core::sc_thread_cor_fn(void*) (sc_thread_process.cpp:117)
==87323==    by 0x49EFBB0: sc_core::sc_cor_pthread::invoke_module_method(void*) (sc_cor_pthread.cpp:127)
==87323==    by 0x4840737: mythread_wrapper (hg_intercepts.c:387)
==87323==
==87323== Required order was established by acquisition of lock at 0x5378948
==87323==    at 0x483DB42: mutex_lock_WRK (hg_intercepts.c:907)
==87323==    by 0x4841A3A: pthread_mutex_lock (hg_intercepts.c:923)
==87323==    by 0x49EFEA4: sc_core::sc_cor_pkg_pthread::yield(sc_core::sc_cor*) (sc_cor_pthread.cpp:250)
==87323==    by 0x4A1204B: sc_core::sc_simcontext::crunch(bool) (sc_simcontext.cpp:504)
==87323==    by 0x4A0DD19: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:887)
==87323==    by 0x4A0FB61: sc_core::sc_start(sc_core::sc_time const&, sc_core::sc_starvation_policy) (sc_simcontext.cpp:1718)
==87323==    by 0x4A0FC8C: sc_core::sc_start() (sc_simcontext.cpp:1752)
==87323==    by 0x406436: sc_main (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x49F6E8C: sc_elab_and_sim (sc_main_main.cpp:89)
==87323==    by 0x49F6CCE: main (sc_main.cpp:36)
==87323==
==87323== followed by a later acquisition of lock at 0x4ACC978
==87323==    at 0x483DB42: mutex_lock_WRK (hg_intercepts.c:907)
==87323==    by 0x4841A3A: pthread_mutex_lock (hg_intercepts.c:923)
==87323==    by 0x49EFEC4: sc_core::sc_cor_pkg_pthread::yield(sc_core::sc_cor*) (sc_cor_pthread.cpp:252)
==87323==    by 0x4A1204B: sc_core::sc_simcontext::crunch(bool) (sc_simcontext.cpp:504)
==87323==    by 0x4A0DD19: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:887)
==87323==    by 0x4A0FB61: sc_core::sc_start(sc_core::sc_time const&, sc_core::sc_starvation_policy) (sc_simcontext.cpp:1718)
==87323==    by 0x4A0FC8C: sc_core::sc_start() (sc_simcontext.cpp:1752)
==87323==    by 0x406436: sc_main (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x49F6E8C: sc_elab_and_sim (sc_main_main.cpp:89)
==87323==    by 0x49F6CCE: main (sc_main.cpp:36)
==87323==
==87323== Lock at 0x5378948 was first observed
==87323==    at 0x484199B: pthread_mutex_init (hg_intercepts.c:785)
==87323==    by 0x49EFA83: sc_core::sc_cor_pthread::sc_cor_pthread() (sc_cor_pthread.cpp:79)
==87323==    by 0x49EFD5D: sc_core::sc_cor_pkg_pthread::create(unsigned long, void (*)(void*), void*) (sc_cor_pthread.cpp:175)
==87323==    by 0x4A17350: sc_core::sc_thread_process::prepare_for_simulation() (sc_thread_process.cpp:334)
==87323==    by 0x4A0D819: sc_core::sc_simcontext::prepare_to_simulate() (sc_simcontext.cpp:712)
==87323==    by 0x4A0DB83: sc_core::sc_simcontext::initialize(bool) (sc_simcontext.cpp:819)
==87323==    by 0x4A0DBC5: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:844)
==87323==    by 0x4A0FB61: sc_core::sc_start(sc_core::sc_time const&, sc_core::sc_starvation_policy) (sc_simcontext.cpp:1718)
==87323==    by 0x4A0FC8C: sc_core::sc_start() (sc_simcontext.cpp:1752)
==87323==    by 0x406436: sc_main (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x49F6E8C: sc_elab_and_sim (sc_main_main.cpp:89)
==87323==    by 0x49F6CCE: main (sc_main.cpp:36)
==87323== Address 0x5378948 is 24 bytes inside a block of size 128 alloc'd
==87323==    at 0x483AF1D: operator new(unsigned long) (vg_replace_malloc.c:342)
==87323==    by 0x49EFD52: sc_core::sc_cor_pkg_pthread::create(unsigned long, void (*)(void*), void*) (sc_cor_pthread.cpp:175)
==87323==    by 0x4A17350: sc_core::sc_thread_process::prepare_for_simulation() (sc_thread_process.cpp:334)
==87323==    by 0x4A0D819: sc_core::sc_simcontext::prepare_to_simulate() (sc_simcontext.cpp:712)
==87323==    by 0x4A0DB83: sc_core::sc_simcontext::initialize(bool) (sc_simcontext.cpp:819)
==87323==    by 0x4A0DBC5: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:844)
==87323==    by 0x4A0FB61: sc_core::sc_start(sc_core::sc_time const&, sc_core::sc_starvation_policy) (sc_simcontext.cpp:1718)
==87323==    by 0x4A0FC8C: sc_core::sc_start() (sc_simcontext.cpp:1752)
==87323==    by 0x406436: sc_main (in /home/paulf/scratch/systemc/accellera_systemc/systemc-regressions-2.3.3/run/systemc/1666-2011-compliance/living_dead_bug/systemc.exe)
==87323==    by 0x49F6E8C: sc_elab_and_sim (sc_main_main.cpp:89)
==87323==    by 0x49F6CCE: main (sc_main.cpp:36)
==87323== Block was alloc'd by thread #1
==87323==
==87323== Lock at 0x4ACC978 was first observed
==87323==    at 0x484199B: pthread_mutex_init (hg_intercepts.c:785)
==87323==    by 0x49EFA83: sc_core::sc_cor_pthread::sc_cor_pthread() (sc_cor_pthread.cpp:79)
==87323==    by 0x49EFFBC: __static_initialization_and_destruction_0(int, int) (sc_cor_pthread.cpp:63)
==87323==    by 0x49EFFEF: _GLOBAL__sub_I_sc_cor_pthread.cpp (sc_cor_pthread.cpp:285)
==87323==    by 0x40108ED: call_init.part.0 (in /usr/lib64/ld-2.32.so)
==87323==    by 0x40109D7: _dl_init (in /usr/lib64/ld-2.32.so)
==87323==    by 0x40010C9: ??? (in /usr/lib64/ld-2.32.so)
==87323== Address 0x4acc978 is 24 bytes inside data symbol "_ZN7sc_coreL8main_corE"

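The Helgrind report above describes two mutexes that different threads acquire in opposite orders. Whether that order violation is the actual cause of the hang is not established in this thread. As a minimal sketch of the pattern and of one common way to avoid it, assuming plain C++11 rather than the SystemC sources (`Cor`, `switch_between`, and `run_switches` are hypothetical names), `std::lock` can take both mutexes with a deadlock-avoidance algorithm so that the order in which callers name them no longer matters:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a coroutine object that owns a mutex.
// This is NOT SystemC's sc_cor_pthread; it only mirrors the two-lock shape.
struct Cor {
    std::mutex m;
    int switches = 0;  // guarded by m
};

// Touches both objects while holding both locks.
// std::lock acquires the two mutexes without imposing a caller-visible
// order, so switch_between(a, b) and switch_between(b, a) cannot deadlock
// against each other.
void switch_between(Cor& from, Cor& to) {
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> hold_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> hold_to(to.m, std::adopt_lock);
    ++from.switches;
    ++to.switches;
}

// Two threads name the locks in opposite orders, which is the shape
// that a lock-order checker would flag if plain lock() calls were used.
int run_switches(int rounds) {
    Cor main_cor, thread_cor;
    std::thread a([&] {
        for (int i = 0; i < rounds; ++i) switch_between(main_cor, thread_cor);
    });
    std::thread b([&] {
        for (int i = 0; i < rounds; ++i) switch_between(thread_cor, main_cor);
    });
    a.join();
    b.join();
    // After join() the counters can be read without holding the locks.
    return main_cor.switches + thread_cor.switches;
}
```

Calling `run_switches(1000)` runs both threads to completion; each call to `switch_between` increments both counters once, so the sum reflects every switch from both threads.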
AmeyaVS (Author), posted December 13, 2020

Hello @Paul Floyd,

Thank you for the confirmation.

I did try out Valgrind with Helgrind to understand the underlying threading issue. I came to the same conclusion back then, but I have lost the references to the underlying changes and discussions about the glibc POSIX thread constructs.

I did spend some time looking into the issue and the internal implementation in the SystemC library. From what I could gather, the usage of the pthread_cond variable is somewhat inconsistent.

As I understand from earlier comments, people in the working groups probably don't have access to newer Linux systems to reproduce the issue.

We later switched to the QuickThreads implementation, but lost a consistent way to analyze simulation threads and synchronization issues.

Currently, I don't even know if there are any plans to modernize the SystemC kernel with C++11 std::thread, which would make platform and tool support a lot easier.

Regards,
Ameya Vikram Singh

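For readers unfamiliar with how a thread-based coroutine backend hands control back and forth, here is a minimal sketch using C++11 `std::thread`, `std::mutex`, and `std::condition_variable`. It is an illustration under stated assumptions, not the SystemC kernel's implementation and not a proposed patch: `Baton` and `run_handoff` are hypothetical names, and only one "process" thread is modelled.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical two-party baton: exactly one side may run at a time.
class Baton {
public:
    enum Owner { Scheduler, Process };

    // Hand control to the other side and wake it up.
    void pass_to(Owner next) {
        {
            std::lock_guard<std::mutex> lock(m_);
            owner_ = next;
        }
        cv_.notify_all();
    }

    // Block until this side owns the baton (predicate handles spurious wakeups).
    void wait_for(Owner me) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this, me] { return owner_ == me; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    Owner owner_ = Scheduler;
};

// Runs `steps` scheduler/process round trips and returns the order in
// which the two sides executed (even values: scheduler, odd: process).
std::vector<int> run_handoff(int steps) {
    Baton baton;
    std::vector<int> trace;  // only touched by whoever holds the baton

    std::thread process([&] {
        for (int i = 0; i < steps; ++i) {
            baton.wait_for(Baton::Process);   // suspended until resumed
            trace.push_back(2 * i + 1);       // the process's slice of work
            baton.pass_to(Baton::Scheduler);  // yield back
        }
    });

    for (int i = 0; i < steps; ++i) {
        trace.push_back(2 * i);               // scheduler work
        baton.pass_to(Baton::Process);        // resume the process
        baton.wait_for(Baton::Scheduler);     // block until it yields
    }

    process.join();  // the thread has finished before baton is destroyed
    return trace;
}
```

The point of the design is that the worker thread is joined before the mutex and condition variable go out of scope, so nothing can still be waiting on them at destruction time.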
Paul Floyd, posted December 13, 2020

I might one day have a go at getting QuickThreads to work with Valgrind, but that is another kettle of fish.

In the meantime this testcase has one potential deadlock plus an error when deallocating resources.

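On the deallocation side, POSIX states that destroying a condition variable on which other threads are currently blocked results in undefined behavior. A minimal sketch of a teardown order that stays within that rule, assuming raw pthreads and a single waiting thread (`Waiter`, `waiter_body`, and `run_and_tear_down` are hypothetical names, not SystemC code): wake the waiter, join it, and only then destroy the primitives.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for a thread that parks on a condition variable.
struct Waiter {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    bool            resume;    // predicate, guarded by mtx
    bool            finished;  // set by the worker before it exits
    pthread_t       thread;
};

static void* waiter_body(void* arg) {
    Waiter* w = static_cast<Waiter*>(arg);
    pthread_mutex_lock(&w->mtx);
    while (!w->resume)                        // loop guards against spurious wakeups
        pthread_cond_wait(&w->cond, &w->mtx);
    w->finished = true;
    pthread_mutex_unlock(&w->mtx);
    return nullptr;
}

// Returns true if the waiter ran to completion and both primitives
// were destroyed without error.
bool run_and_tear_down() {
    Waiter w;
    pthread_mutex_init(&w.mtx, nullptr);
    pthread_cond_init(&w.cond, nullptr);
    w.resume = false;
    w.finished = false;

    if (pthread_create(&w.thread, nullptr, waiter_body, &w) != 0) {
        pthread_cond_destroy(&w.cond);
        pthread_mutex_destroy(&w.mtx);
        return false;
    }

    // Step 1: wake the waiter while holding the mutex.
    pthread_mutex_lock(&w.mtx);
    w.resume = true;
    pthread_cond_signal(&w.cond);
    pthread_mutex_unlock(&w.mtx);

    // Step 2: make sure the thread has left pthread_cond_wait for good.
    pthread_join(w.thread, nullptr);
    bool done = w.finished;

    // Step 3: only now is it safe to destroy the primitives.
    int rc_cond = pthread_cond_destroy(&w.cond);
    int rc_mtx  = pthread_mutex_destroy(&w.mtx);
    return done && rc_cond == 0 && rc_mtx == 0;
}
```

Skipping steps 1 and 2 and destroying the condition variable while the thread is still parked on it is the situation POSIX leaves undefined; how a given libc behaves in that case is implementation-specific.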
maehne, posted December 16, 2020

Thanks for reporting this issue, I have forwarded it to the SystemC LWG.

Andy Goodrich, posted January 26, 2021

Can you tell me if your regression log file had either of these two messages?

    Warning: (W558) disable() or dont_initialize() called on process with no static sensitivity, it will be orphaned: top.target
    In file: /Users/acg/work/sysc/systemc-2.3.2/src/sysc/kernel/sc_simcontext.cpp:766

    Warning: (W558) disable() or dont_initialize() called on process with no static sensitivity, it will be orphaned: top.control.dyn_target
    In file: /Users/acg/work/sysc/systemc-2.3.2/src/sysc/kernel/sc_simcontext.cpp:1219
    In process: top.control @ 0 s

AmeyaVS (Author), posted January 26, 2021

Hello @Andy Goodrich,

Here is the snippet of the log that is generated in my SystemC environment setup with pthreads enabled:

    SystemC 2.3.4_pub_rev_20191203-Accellera --- Jan 26 2021 22:27:50
    Copyright (c) 1996-2019 by all Contributors,
    ALL RIGHTS RESERVED
    SystemC Simulation
    Warning: (W558) disable() or dont_initialize() called on process with no static sensitivity, it will be orphaned: top.target
    In file: ../src/sysc/kernel/sc_simcontext.cpp:771
    Warning: (W558) disable() or dont_initialize() called on process with no static sensitivity, it will be orphaned: top.control.dyn_target
    In file: ../src/sysc/kernel/sc_simcontext.cpp:1233
    In process: top.control @ 0 s
    Success

The test executable never completes its execution, as it hangs due to a race condition.

Regards,
Ameya Vikram Singh

Andy Goodrich, posted February 2, 2021

Interesting, the fact that you got the "Success" indicates that SystemC returned to the main thread. That message is from sc_main(). The hang is occurring during tear-down of the "target" thread after exit from sc_main(). The culprit is this kill within the control thread:

    target_handle.kill();

So could I ask one more favor and have you comment out that call, and let's see if things get all the way through.

Thanks.

AmeyaVS (Author), posted February 7, 2021

Hello @Andy Goodrich,

Sorry for the delayed response.

As per your suggestion, I removed the call that kills the thread. This lets the test run to completion without failure.

Regards,
Ameya Vikram Singh

Andy Goodrich, posted February 10, 2021

Thanks for testing, let me look at things a bit, and I'll get back to you. Looks like there is an interaction between the teardown of threads and the code that "kills" a thread within the SystemC simulator.

Andy Goodrich, posted February 10, 2021

My response ended up in our previous message exchange on my screen, so let me send it again just in case...

Thanks for testing, let me look at things a bit, and I'll get back to you. Looks like there is an interaction between the teardown of threads and the code that "kills" a thread within the SystemC simulator.

AmeyaVS (Author), posted February 10, 2021

Hello @Andy Goodrich,

Thank you for the update.

As @Paul Floyd also pointed out in his post with the Helgrind report, the PThreads constructs and their use in the SystemC implementation probably need to be looked into.

Around the time I first reported the issue, I also did an analysis and found that the underlying constructs used by pthreads had been updated. It had something to do with futex updates for pthreads in libc and the Linux kernel.

I don't have my notes now, since I have moved out. But one can probably look at the differences between system library versions across operating system releases, e.g. Ubuntu 16.04 vs. Ubuntu 20.04, or Fedora releases from the same time frame.

Hope it helps.

Regards,
Ameya Vikram Singh

Andy Goodrich, posted February 13, 2021

The difference between a run with the kill on thread "target" and a run without it is that the kill reduces the reference count of the sc_process_b object associated with "target" from 2 to 1. During tear-down a further reference decrement occurs, so the reference count for "target" goes to 0, which causes an actual deletion of its sc_cor_pthread instance. Something about that deletion is causing the issue.

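The reference-count sequence described above can be replayed in isolation. The following is only a sketch under stated assumptions: `CorState`, `Process`, and `replay_kill_then_teardown` are hypothetical stand-ins, not the SystemC classes, and the only thing modelled is that the final decrement is what triggers destruction of the object owning the pthread primitives.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for per-thread coroutine state that owns
// pthread primitives (loosely analogous to a coroutine object).
struct CorState {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    static int live;  // number of CorState objects currently alive

    CorState() {
        pthread_mutex_init(&mtx, nullptr);
        pthread_cond_init(&cond, nullptr);
        ++live;
    }
    ~CorState() {
        // Destruction happens here; no thread may still be blocked on cond.
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mtx);
        --live;
    }
};
int CorState::live = 0;

// Hypothetical intrusively reference-counted process object.
class Process {
public:
    Process() : refs_(1), cor_(new CorState) {}
    void reference_increment() { ++refs_; }
    // Returns true if this call dropped the count to zero and deleted the object.
    bool reference_decrement() {
        if (--refs_ == 0) {
            delete this;
            return true;
        }
        return false;
    }

private:
    ~Process() { delete cor_; }  // private: only reachable via the ref count
    int       refs_;
    CorState* cor_;
};

// Replays the 2 -> 1 -> 0 sequence: the "kill" step only lowers the count,
// and the later tear-down decrement is what actually destroys the state.
bool replay_kill_then_teardown() {
    Process* target = new Process;    // count 1
    target->reference_increment();    // count 2
    bool deleted_by_kill = target->reference_decrement();      // 2 -> 1
    bool alive_after_kill = (CorState::live == 1);
    bool deleted_by_teardown = target->reference_decrement();  // 1 -> 0, deletes
    return !deleted_by_kill && alive_after_kill &&
           deleted_by_teardown && CorState::live == 0;
}
```

In this sketch no thread is ever blocked on `cond`, so the destructor is harmless; the sketch only shows where in the sequence the destruction lands, which is the point at which a still-blocked thread would matter.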