Jump to content

Segfault on sc_process_handle.terminated_event() in sc_event_and_list


Recommended Posts

I have the following code:

sc_core::sc_event_and_list processes_complete;

for(int i = 0; i < 10; ++i){
    auto process = sc_core::sc_spawn(sc_bind(&My_Class::my_method, this, args));
    processes_complete &= process.terminated_event();
    wait(10, SC_US); //Comment me
}
                      
wait(processes_complete);  //Line 58

And I get a Segmentation Fault with the following traceback:

#0  0x00007ffff7a1439d in sc_core::sc_event_list::add_dynamic(sc_core::sc_thread_process*) const () from /opt/systemc-2.3.2/lib-linux64//libsystemc-2.3.2.so
#1  0x00007ffff7a36cf2 in sc_core::wait(sc_core::sc_event_and_list const&, sc_core::sc_simcontext*) () from /opt/systemc-2.3.2/lib-linux64//libsystemc-2.3.2.so
#2  0x00005555555801d2 in sc_core::sc_module::wait (el=..., this=0x7fffffffd8c8) at /opt/systemc-2.3.2/include/sysc/kernel/sc_module.h:189
#3  My_Class::my_other_method (this=0x7fffffffd8c8) at source/my_class.cpp:58
#4  0x00007ffff7a31caf in sc_core::sc_thread_cor_fn(void*) () from /opt/systemc-2.3.2/lib-linux64//libsystemc-2.3.2.so
#5  0x00007ffff7b135df in qt_blocki () from /opt/systemc-2.3.2/lib-linux64//libsystemc-2.3.2.so

My understanding is that the sc_process_handle should persist even if the sc_thread_process itself is destroyed, and the terminated_event is a member of the sc_process_handle, but looking at the traceback, it looks like the sc_event_and_list is doing something with the sc_thread_process, which probably doesn't exists anymore.

Any suggestions; am I doing something wrong here?  Do I need to keep the sc_process_handle in scope in order to prevent its terminated_event from being destroyed?

EDIT:  It appears keeping the sc_process_handle in scope is necessary:

If I add the sc_process_handles to a list, ie 

sc_core::sc_event_and_list processes_complete;
std::list<sc_process_handle> processes;

for(int i = 0; i < 10; ++i){
    auto process = sc_core::sc_spawn(sc_bind(&My_Class::my_method, this, args));
    processes.push_back(process);
    processes_complete &= process.terminated_event();
    wait(10, SC_US); //Comment me
}
                      
wait(processes_complete);  //Line 58

Then the sefault disappears.

Interestingly, if I remove the wait(10, SC_US) from the for loop, it also disappears, but that may just be luck as to when the handle is destroyed.

On that note, do I need to keep handles on spawned processes if I just want them to run and be destroyed when the finish?  ie, I have a lot of code that looks like

Quote

sc_core::sc_spawn(sc_bind(&My_Class::my_method, this, args));

For which I don't assign the return value of sc_spawn.

 

EDIT2:

I am sometimes getting a segfault a the end of my simulation with the following traceback:

#0  0x00007ffff7a4d724 in sc_core::sc_thread_process::next_exist (this=0x555500000000) at ../../src/sysc/kernel/sc_thread_process.h:426
#1  0x00007ffff7a4f0e3 in sc_core::sc_process_table::queue<sc_core::sc_thread_process*>::remove (this=0x5555557d5fe8, handle=0x555555933400)at kernel/sc_simcontext.cpp:184
#2  0x00007ffff7a4e32b in sc_core::sc_process_table::remove (this=0x5555557d5fe0, handle=0x555555933400) at kernel/sc_simcontext.cpp:141
#3  0x00007ffff7a49067 in sc_core::sc_simcontext::remove_process (this=0x7ffff6ff2900, handle=0x555555933400) at kernel/sc_simcontext.cpp:200
#4  0x00007ffff7a552c7 in sc_core::sc_thread_process::~sc_thread_process (this=0x555555933400, __in_chrg=<optimized out>) at kernel/sc_thread_process.cpp:484
#5  0x00007ffff7a55300 in sc_core::sc_thread_process::~sc_thread_process (this=0x555555933400, __in_chrg=<optimized out>) at kernel/sc_thread_process.cpp:486
#6  0x00007ffff7a42fb0 in sc_core::sc_process_b::delete_process (this=0x555555933400) at kernel/sc_process.cpp:177
#7  0x00007ffff7a2ecfd in sc_core::sc_process_b::reference_decrement (this=0x555555933400) at ../../src/sysc/kernel/sc_process.h:611
#8  0x00007ffff7a4ea9c in sc_core::sc_simcontext::crunch (this=0x7ffff6ff2900, once=false) at kernel/sc_simcontext.cpp:525
#9  0x00007ffff7a4a475 in sc_core::sc_simcontext::simulate (this=0x7ffff6ff2900, duration=...) at kernel/sc_simcontext.cpp:889
#10 0x00007ffff7a4c3e0 in sc_core::sc_start (duration=..., p=sc_core::SC_EXIT_ON_STARVATION) at kernel/sc_simcontext.cpp:1710
#11 0x00007ffff7a4c52e in sc_core::sc_start () at kernel/sc_simcontext.cpp:1745
#12 0x00005555555a0164 in sc_main (argc=<optimized out>, argv=<optimized out>) at source/main.cpp:42
#13 0x00007ffff7a3583f in sc_core::sc_elab_and_sim (argc=2, argv=0x7fffffffec98) at kernel/sc_main_main.cpp:87
#14 0x00007ffff7a3565b in main (argc=2, argv=0x7fffffffec98) at kernel/sc_main.cpp:36
#15 0x00007ffff7d8f954 in __libc_start_main () from /lib/ld-musl-x86_64.so.1
#16 0x0000000000000000 in ?? ()

Using GDB, I see that handle->name() is one of my statically created processes.  Not sure why next_exist() is causing a segfault, but I'm beginning to wonder, is this because I'm not keeping handles on my dynamically created processes?  

EDIT 3:  Found, what I think is the cause of this problem -- apparently you shouldn't sc_spawn threads in a module's start_of_simulation() method.  It seems this is unrelated to my previous question which still stands:  If I don't keep a process_handle on a spawned process, is it allowed to run indefinitely, or is there a change is will be destroyed.  Based on my current code, it looks like it gets to run forever (or until completion), but I'd like to confirm that.

Link to comment
Share on other sites

It is in my code:

void My_Class::start_of_simulation(){
 	sc_core::sc_spawn(sc_bind(&My_Class::my_method, this));
 }

I don't see exactly why this should cause problems, but if I do it, then approximately 50% of the time at the end of simulation, I get a segfault with the above traceback.

Because the segfault is happening when traversing the sc_process_table::queue, I'm guessing that when spawning a thread at start of simulation (which intentionally finishes, by the way), it gets added to the process_table, but for whatever reason, never removed.  The process itself is removed (I think this happens in sc_simcontext::crunch), but because the table isn't updated, the linked-list has an invalid ptr, hence the segfault.  Just a guess though.

EDIT:

Sorry, I think I misunderstood your question, you are asking where the traceback is from?  It is triggered at the end of the execution during model cleanup; you can see that it is in the destructor ( sc_core::sc_thread_process::~sc_thread_process ).  That's actually the destructor of one of my statically created (SC_THREAD) threads, not the dynamic one.  During its destruction, it traverses the sc_process_table::queue and hits a bad pointer when doing sc_core::sc_thread_process::next_exist.

Link to comment
Share on other sites

On the first question:

Yes, you do need to keep the handle to the (terminating) process alive, if you continue to reference any related object like the terminated event.  Otherwise, when the process terminates without any existing handles to it, the event will be destroyed together with the process instance itself and you're sensitive to a no-longer-existing event, causing the memory corruption.  The crash itself is fixed in the master branch of the SystemC proof-of-concept simulator, but not released yet.  This fix would then lead to removing the event from any waiting processes, though.  So in this case, you would just miss the notification.

On the second part:

Which version of SystemC are you using?  Can you confirm this with SystemC 2.3.2?

Greetings from Duisburg,
  Philipp

Link to comment
Share on other sites

Some further comments:

  • You don't need to keep handles to processes, if you don't intend to interact with them later.
  • The crash you saw was likely caused by the fact, that the process was killed before you tried to register the sensitivity to it through your sc_event_and_list. Even if you kept the handle alive, you would still miss the notification: the process has already terminated when you run your wait(processes_complete); and the event will not be triggered again.
  • It would be much safer to only store the handles and build up the event list locally to wait for the completion of still running processes:
    std::vector<sc_process_handle> processes;
    
    for(int i = 0; i < 10; ++i){
        auto process = sc_core::sc_spawn(sc_bind(&My_Class::my_method, this, args));
        processes.push_back(process);
        wait(10, SC_US);
    }
    
    // wait for running processes
    {
      sc_event_and_list processes_running;
      for( const auto& p : processes ) {
        if ( p.valid() && !p.terminated() )
          processes_running &= p.terminated_event();
      }
      wait( processes_running );
    }

     

For the second part (crash during process cleanup):

  • Do you see a difference when spawning the thread during end_of_elaboration()?
  • sc_spawn'ed threads from an start_of_simulation() callback a strictly speaking "dynamic threads" in the SystemC terminology, yes.  But internally, they are still added to the process table in order to start them in the correct order relative to the rest of the kernel initialization.  In the destructor however, only non-dynamic threads are removed from the process table.  This means, you may have indeed found a bug in SystemC 2.3.2.

I'll check this with the Language Working Group.

Thanks and Greetings from Duisburg,
  Philipp

Link to comment
Share on other sites

Hi Philipp,

Thanks for your reply. I like your suggestion of building the sc_event_and_list immediately before waiting on it and checking that processes are still alive while doing so; I'm sure I would have hit hard-to-debug problems if I had waited on processes that had already terminated, even if I kept the handles around.

With regards to the second problem, I can confirm I am using SystemC 2.3.2.  I have not tried spawning at end_of_elaboration; I've refactored my code so my spawn occurs in another location--neither end_of_elaboration, nor start_of_simulation.  If I get a chance, I will run that experiment.  In any case, I agree that it seems the current implementation has a bug by not removing the dynamic thread, created in start_of_simulation, from the process table.  The logic needed in the destructor to identify these particular dynamic threads might not be worth it since one could simply use the SC_THREAD macro to statically spawn them at the start (potentially with a helper function if dynamic arguments need to be bound).  The fix could be as simple as documenting that sc_spawn'ing in start_of_simulation is not allowed.

Link to comment
Share on other sites

Also, one small modification to your suggested code:  I noticed that if you wait on an empty sc_event_and_list, it will never progress (which kind of makes sense--with no events, there's nothing to wake the thread).  With that in mind, I would change

wait(processes_running);

to

if(processes_running.size())
  wait(processes_running);

On the other hand, it might be nice for debuggability if waiting on an empty sc_event_and_list didn't wait at all or threw an exception.

Link to comment
Share on other sites

Update:  I wrote the following in an attempt to replicate the start_of_simulation sc_spawn segfault and found that it's pretty hard to re-create; the following program appears to do it pretty reliably though:

#define SC_INCLUDE_DYNAMIC_PROCESSES
#include <systemc.h>
#include <iostream>

class Test : public sc_core::sc_module {
	SC_HAS_PROCESS(Test);

public:
	void run(){
		std::cout << "Spawned" << std::endl;
	}

	void run2(){
		for(int i = 0; i < 10; ++i){
			wait(10, SC_US);
			sc_core::sc_spawn(sc_bind(&Test::run, this));
		}
	}

	void run3(){
		for(int i = 0; i < 100; ++i)
			wait(10, SC_US);
	}

	Test(const sc_core::sc_module_name &name_){
		SC_THREAD(run2);
		SC_THREAD(run3);
	}

	void start_of_simulation(){
		for(int i = 0; i < 10; ++i)
			sc_core::sc_spawn(sc_bind(&Test::run, this));
	}
};

int sc_main(int argc, char *argv[]){
	Test test("test");

	sc_start();
}

// g++ -O3 -std=c++14 -I/opt/systemc-2.3.2/include -L/opt/systemc-2.3.2/lib-linux64 -lsystemc test.cpp

Also, if I leave everything the same but switch out start_of_simulation for end_of_elaboration, I still get the segfault.

The bug depends on the order of additions to the process table (because the static process' destructor will walk the table until it finds a match).  If I comment out either run3 or the spawn in run2, I don't see the segfault.  Hopefully this is helpful in finding a solution.

Link to comment
Share on other sites

20 hours ago, sheridp@umich.edu said:

The fix could be as simple as documenting that sc_spawn'ing in start_of_simulation is not allowed.

IEEE 1666-2011 explicitly allows spawning threads from start_of_simulation, see 4.4.3 (b).

19 hours ago, sheridp@umich.edu said:

On the other hand, it might be nice for debuggability if waiting on an empty sc_event_and_list didn't wait at all or threw an exception.

I agree.  Having some way to decide on the behavior on an empty event list would be helpful.  I'll forward your suggestions to the LWG.

19 hours ago, sheridp@umich.edu said:

Also, if I leave everything the same but switch out start_of_simulation for end_of_elaboration, I still get the segfault.

The bug depends on the order of additions to the process table (because the static process' destructor will walk the table until it finds a match).  If I comment out either run3 or the spawn in run2, I don't see the segfault.  Hopefully this is helpful in finding a solution.

Thanks for preparing the example!  I'll try to reproduce and investigate further.

Greetings from Duisburg,
  Philipp

Link to comment
Share on other sites

On 4/30/2018 at 6:30 PM, sheridp@umich.edu said:

On the other hand, it might be nice for debuggability if waiting on an empty sc_event_and_list didn't wait at all or threw an exception.

In fact, IEEE 1666-2011 already requires an error when calling wait/next_trigger with an empty event list (5.2.17, 5.2.18).  This is currently not the case for the proof-of-concept implementation and will be fixed in the next version.

Unfortunately, I have not been able to reproduce the crash using your example on any of my setups.  Can you please try if the attached patch fixes the crash for you?

Thanks,
  Philipp

0001-sc_-_process-remove-dynamic-processes-from-process_t.patch

Link to comment
Share on other sites

Hi Philipp,

Sure. Can you point me in the direction of the official SystemC repo (I currently just downloaded the source from http://accellera.org/downloads/standards/systemc).

I found the following:  https://github.com/systemc/systemc-2.3 , which looks official, but it's not version 2.3.2.  On the other hand, I found https://github.com/avidan-efody/systemc-2.3.2, but it looks unofficial.

If there is no official repo that includes 2.3.2, I can just create one from the downloaded source.

Thanks,

Patrick

Link to comment
Share on other sites

Hi Patrick,

right now, there is no official publicly accessible repository of the SystemC proof-of-concept implementation.  Both repos you mention above are not owned by Accellera.  So applying the patch manually on top of a local source tree is sufficient:

cd systemc-2.3.2
patch -p1 < 0001_sc_process_*.patch

# usual install steps
mkdir -p objdir
cd objdir
../configure
make
make install

Thanks,
  Philipp

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...