
Large model execution hitting segmentation fault - suggestions?



Posted

I have a model of a large system which I am attempting to compile / simulate but am hitting a seg fault during elaboration. The model is parameterizable, and to some extent I can cause the seg fault to occur at different times during the elaboration by changing the parameters.

I have done the obvious things I can find regarding suggestions... ulimit returns "unlimited", and I have varied my available memory (this is running in a VM) up to 32GB with no change. I have run the same model on a different system with the same result. The VM is Ubuntu while the other system is a RedHat production system. I am using SystemC version 2.3.3 on one and 2.3.4 on the other. And I'm compiling with c++17 flags set.

I launched the executable with gdb and did a backtrace. I'm not an application debugger, so the output doesn't tell me much other than that it's in the bowels of the SystemC libraries (which I expected). I'll paste the output below.

I've also added a series of cout calls in the code to determine where the failure is happening. Each time it is in the middle of a series of binds, at different points of a loop. It's not near an extreme, and I am relatively confident in my indexing (although that could be an issue). The fact that the exact fail point moves leads me to believe that indexing is not the issue.

One final thing I tried was dumping the output of the getrlimit command to see if my stack was limited. It reports an enormous number, which I believe corresponds to my ulimit report of unlimited.
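For anyone wanting to repeat that check, here is a minimal sketch of querying the stack limit with the POSIX getrlimit call. The helper names format_limit and describe_stack_limit are made up for illustration. Note that this limit covers the main program stack only; it says nothing about stacks the SystemC kernel allocates for its own processes.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK, RLIM_INFINITY (POSIX)
#include <string>

// Hypothetical helper: RLIM_INFINITY is what the shell reports as "unlimited".
std::string format_limit(rlim_t value) {
    if (value == RLIM_INFINITY) return "unlimited";
    return std::to_string(static_cast<unsigned long long>(value)) + " bytes";
}

// Hypothetical helper: describes the soft and hard limits of the main stack.
// Returns an empty string if the query itself fails.
std::string describe_stack_limit() {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_STACK, &lim) != 0) return "";
    return "soft: " + format_limit(lim.rlim_cur) +
           ", hard: " + format_limit(lim.rlim_max);
}
```

Printing describe_stack_limit() at the top of sc_main shows the limit the program actually sees, which can differ from the shell that ran ulimit.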

I'm uncertain where to go next. I accept that I'm trying to build a large model, but it's hard to believe it is that large. All of my allocation of module hierarchy is dynamic, and my calls to new are not the fail points...

Any advice on next debugging steps, or insight into potential limits of the libraries would be appreciated.

Here is the gdb dump if it is informative for anyone...

Program received signal SIGSEGV, Segmentation fault.
0x000055555557656f in sc_core::sc_port_b<sc_core::sc_signal_inout_if<bool> >::bind (this=0x555555a9a548, interface_=...)
    at /opt/systemc-2.3.4//include/sysc/communication/sc_port.h:286
286		{ base_type::bind( interface_ ); }
(gdb) bt
#0  0x000055555557656f in sc_core::sc_port_b<sc_core::sc_signal_inout_if<bool> >::bind (this=0x555555a9a548, interface_=...)
    at /opt/systemc-2.3.4//include/sysc/communication/sc_port.h:286
#1  0x0000555555578586 in sc_core::sc_port_b<sc_core::sc_signal_inout_if<bool> >::operator() (this=0x555555a9a548, interface_=...)
    at /opt/systemc-2.3.4//include/sysc/communication/sc_port.h:289
#2  0x0000555555574dfa in Dut<Payload_t, CheckerPkt_t>::Dut (this=0x7fffffe5c7f0, _name=..., _cfg_info=..., _box_id=0) at ./include/dut.h:86
#3  0x000055555556f436 in sc_main (argc=1, argv=0x55555560ed10) at src/main.cpp:25
#4  0x00007ffff7e7ab76 in sc_elab_and_sim () from /opt/systemc-2.3.4//lib-linux64/libsystemc-2.3.4.so
#5  0x00007ffff78a80b3 in __libc_start_main (main=0x7ffff7e720f0 <main>, argc=1, argv=0x7fffffffdfa8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdf98) at ../csu/libc-start.c:308
#6  0x000055555556f2ae in _start ()

 

Posted

Hello @Bryan,

You quote:

1 hour ago, Bryan said:

I have a model of a large system which I am attempting to compile / simulate but am hitting a seg fault during elaboration. The model is parameterizable, and to some extent I can cause the seg fault to occur at different times during the elaboration by changing the parameters

This seems like an issue with the stack getting corrupted.

I would look for a stack-allocated object that you pass by reference to another object, where that other object holds the reference longer than the lifetime of the object on the stack.
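As a rough illustration of that lifetime problem (not taken from the poster's code), the sketch below contrasts a holder that stores a reference with one that shares ownership. Config, RefHolder, SharedHolder and make_holder are all hypothetical names.

```cpp
#include <memory>
#include <string>

// Hypothetical configuration object, standing in for something like _cfg_info.
struct Config {
    std::string name;
    int width = 0;
};

// Risky pattern: stores a reference. If the Config was a local variable that
// has since gone out of scope, width() reads a dead stack slot.
class RefHolder {
public:
    explicit RefHolder(const Config& cfg) : cfg_(cfg) {}
    int width() const { return cfg_.width; }
private:
    const Config& cfg_;
};

// Safer pattern: shares ownership, so the Config lives as long as any holder.
class SharedHolder {
public:
    explicit SharedHolder(std::shared_ptr<const Config> cfg) : cfg_(std::move(cfg)) {}
    int width() const { return cfg_->width; }
private:
    std::shared_ptr<const Config> cfg_;
};

// The local handle 'cfg' dies when this function returns; the Config does not,
// because the returned holder still owns it.
SharedHolder make_holder() {
    auto cfg = std::make_shared<Config>();
    cfg->width = 32;
    return SharedHolder(cfg);
}
```

Had make_holder built a local Config and returned a RefHolder instead, the call to width() afterwards would be undefined behaviour, and the crash site could move around much as described above.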

Regards,

Ameya Vikram Singh

Posted

Processes in SystemC receive a stack size of only 64k by default. Your main stack may be large, but the stacks of individual processes are small. It is recommended to keep data on the heap and only use references or pointers. If the payload in an sc_signal is large, that could explain it.

Posted

Ameya and David,

Thank you both for the replies. Following up on my problem, it turns out I had a combination of issues which compounded to make this harder to find. The change to my parameters which made it appear that the issue was stack size related was a partial red-herring. I just happened to be crossing a threshold where the compile wasn't hitting a seg fault. Ultimately I had 2 issues:

  • In one of the many for loops I use to construct my design, I had a "<=" in the condition test which should have been just a "<"
  • Once resolved, I DID in fact still need to increase my stack size to avoid a seg fault in sc_main.
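The first issue can be sketched in plain C++. The example below assumes a std::vector standing in for an array of ports (the real code is not shown in this thread) and uses at(), which checks the index, so the extra iteration throws instead of silently writing past the end.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a bind loop with the faulty "<=" condition.
// The last iteration uses index == size(), which at() rejects.
std::size_t bind_with_less_equal(std::vector<int>& ports) {
    std::size_t bound = 0;
    for (std::size_t i = 0; i <= ports.size(); ++i) {  // bug: one iteration too many
        ports.at(i) = 1;  // throws std::out_of_range on the extra iteration
        ++bound;
    }
    return bound;
}

// The corrected loop: "<" visits exactly size() elements.
std::size_t bind_with_less(std::vector<int>& ports) {
    std::size_t bound = 0;
    for (std::size_t i = 0; i < ports.size(); ++i) {
        ports.at(i) = 1;
        ++bound;
    }
    return bound;
}
```

With a raw C array or operator[] there is no such check, so the overrun corrupts whatever sits next in memory and the crash can surface somewhere else entirely.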

After many iterations, I have the design elaborating cleanly now.

Unfortunately, I've hit a new problem that I'm even less equipped to resolve. Again soliciting advice on where I go from here. I have used setrlimit to set the stack size both very large and smaller, and it does not change this behavior. I did reduce the size of some of the parameters of my design, and it will go further before it hits a seg-fault:

Warning: (W518) stack setup failed: unsuccessful stack protection ignored: Cannot allocate memory, address=0x, enable=true
In file: ../../../src/sysc/kernel/sc_cor_qt.cpp:172

Error: (E518) stack setup failed: failed to allocate stack memory
In file: ../../../src/sysc/kernel/sc_cor_qt.cpp:273

 

When I shrink the parameters, I still get the Warning above but execution will continue. Then I get a seg fault after a brief period of execution:

Warning: (W518) stack setup failed: unsuccessful stack protection ignored: Cannot allocate memory, address=0x, enable=true
In file: ../../../src/sysc/kernel/sc_cor_qt.cpp:172
reset

<snip of output>

Segmentation fault (core dumped)

 

I can turn off some work to see if I get past the seg-fault of the second message, but I'm uncertain what to do to address the stack setup error of the first messages. Advice or thoughts on where to look?

 

Posted

After more research, I found some information on the default stack size for threads in the systemc kernel. I discovered the ability to override the default stack size when compiling during the install. So I bumped it up about 4x of the default size.

Now I still get the W518 warning, but my simulation actually begins! It hits a different issue which is my own fault, so I shall debug and correct. But the increased default stack size has allowed me to move forward.

Should I be concerned about the W518 warning, or can it be ignored since the sim has begun executing?

Posted

SystemC has its own stack size method. Please refer to IEEE Std 1666-2011 section 5.2.16.

Also, it is rare that a design should use 4x the stack space, which suggests to me that you are misusing automatic variables. I assume you have a good grasp of C++ memory allocation rules. Are you creating anything with vanilla C-style arrays in your structs or methods? You should convert those to std::vector or sc_vector as appropriate. This problem often occurs with video processing applications.

How many SystemC processes do you have running?

Posted

David,

You may be correct that I am not handling my variables correctly (I am not a highly experienced C++ user). I have in the past attempted to use sc_vectors to pass signals and fifo interfaces across modules of my design, but I ran into challenges with those. I abandoned the sc_vectors and replaced them with multidimensional arrays of ports. I am building a model of a design which has multiple layers of hierarchy, and each layer assembles modules with their own indexing. At the top, I have multidimensional arrays with 4 dimensions. That felt beyond my ability when it came to creating the vectors, and the math gets a bit complicated when determining index positions within the vectors for binding.
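The index math for a 4-dimensional layout can be hidden in one small helper. This is only a sketch in plain C++: Grid4 is a hypothetical class, and the element type must be default-constructible, so for real SystemC ports one would more likely store pointers or use sc_vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-D container: one contiguous std::vector plus the index math.
template <typename T>
class Grid4 {
public:
    Grid4(std::size_t d0, std::size_t d1, std::size_t d2, std::size_t d3)
        : dims_{d0, d1, d2, d3}, data_(d0 * d1 * d2 * d3) {}

    // Row-major offset: the last index varies fastest.
    std::size_t offset(std::size_t i, std::size_t j,
                       std::size_t k, std::size_t l) const {
        assert(i < dims_[0] && j < dims_[1] && k < dims_[2] && l < dims_[3]);
        return ((i * dims_[1] + j) * dims_[2] + k) * dims_[3] + l;
    }

    // Element access by four indices; storage lives on the heap.
    T& operator()(std::size_t i, std::size_t j, std::size_t k, std::size_t l) {
        return data_[offset(i, j, k, l)];
    }

    std::size_t size() const { return data_.size(); }

private:
    std::array<std::size_t, 4> dims_;
    std::vector<T> data_;
};
```

Because the elements sit in a std::vector, an object of this type takes only a few dozen bytes wherever it is declared, regardless of the dimensions.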

But to make certain I have terminology correct... I do not create new arrays within my SC_THREADs or SC_METHODs. I do define arrays in my class objects. Those consist of either ports (sc_in / sc_out), fifo interfaces, or multidimensional arrays of fifo pointers (I dynamically create the fifos when running the constructor). The total number of fifos I use within the design is quite large for data movement.

So while this is not video application, it is a very large scale machine which I am modeling.

Regarding the number of processes... I can calculate the number of SC_THREADs which I am launching. That is ~20,000. I have converted some threads to methods where it was straightforward to do so based on other suggestions. But in addition to the SC_THREADs, I also spawn a number of dynamic processes as well. That number is quite large at ~132,000. (I think my math adding those up is correct) If there is a mechanism to report the number of active threads from within my model, I'm happy to do that.
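A back-of-the-envelope estimate gives a feel for the scale. The sketch below takes the 64k per-process figure quoted earlier in this thread as an assumption and further assumes every process is a thread with its own stack, which makes it an upper bound (method processes do not need a private stack). All names are hypothetical.

```cpp
#include <cstdint>

// Total stack reservation in bytes if every process gets its own stack.
constexpr std::uint64_t total_stack_bytes(std::uint64_t processes,
                                          std::uint64_t stack_bytes_each) {
    return processes * stack_bytes_each;
}

constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kStaticThreads = 20000;     // "~20,000" SC_THREADs
constexpr std::uint64_t kSpawned = 132000;          // "~132,000" dynamic processes
constexpr std::uint64_t kAssumedStack = 64 * kKiB;  // assumption: 64k per process

// About 9.3 GiB at the assumed default size.
constexpr std::uint64_t kDefaultTotal =
    total_stack_bytes(kStaticThreads + kSpawned, kAssumedStack);

// About 37 GiB once the per-process size is raised 4x.
constexpr std::uint64_t kQuadrupledTotal =
    total_stack_bytes(kStaticThreads + kSpawned, 4 * kAssumedStack);
```

Under those assumptions the stacks alone would reserve more address space than the 32GB given to the VM, so shrinking per-process stack use or the number of thread processes looks more promising than raising limits.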

The model does continue to run now with my logic corrections, although I still see the W518 warning. It's not fast by any means, but it does generate cycles.

Posted

That is a large model indeed. It might be worth understanding how much stack each of your processes actually uses (worst-case stack analysis). I would guess that the processes are relatively uniform. If using gcc, you can have the compiler emit per-function stack usage information using a switch (-fstack-usage I think).
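To see what such a report looks like, a small file like the one below can be compiled with gcc's -fstack-usage option, for example `g++ -fstack-usage -c example.cpp` (the file name is hypothetical). The compiler then writes a .su file next to the object file with one line per function: its name, the stack bytes it needs, and a qualifier such as static or dynamic. The two functions are illustrative only.

```cpp
#include <numeric>
#include <vector>

// Large automatic array: expect a large byte count for this function,
// since the whole buffer lives in its stack frame.
int sum_with_stack_buffer() {
    int buffer[2048];
    std::iota(buffer, buffer + 2048, 0);           // fill with 0..2047
    return std::accumulate(buffer, buffer + 2048, 0);
}

// Same work with heap storage: only the small vector handle is in the frame,
// so the reported byte count should be far smaller.
int sum_with_heap_buffer() {
    std::vector<int> buffer(2048);
    std::iota(buffer.begin(), buffer.end(), 0);
    return std::accumulate(buffer.begin(), buffer.end(), 0);
}
```

Sorting the .su files of a whole build by the byte column is a quick way to find the few functions that dominate stack use.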
 

The rules for stack usage are fairly straightforward. Memory is allocated in roughly 4 different areas, based on the syntax of the declaration:

1. True constants will be stored inside the code text (.text).

2. Globals, and any variable declared static, will be stored in either the data section (.data, for initialized variables) or the "block started by symbol" section (.bss, for zero-initialized or uninitialized variables).

3. Anything created with new or various forms of malloc (dynamic memory allocation) will be created on the free store and managed with pointers. The free store is known as the heap section (.heap).

4. Finally, remaining variables are declared inside functions (i.e. methods), and will be located either on the stack (most likely) or in registers (depending upon the processor).

You can use the sizeof operator to determine the size of various data types.
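The four areas and the sizeof check can be put side by side in a short sketch. The names kLanes, g_calls, Packet, Footprint and local_footprint are invented for illustration, and the exact placement of constants is up to the compiler and linker.

```cpp
#include <cstddef>
#include <memory>

constexpr int kLanes = 4;  // 1. a true constant, may be folded into the code
static int g_calls = 0;    // 2. static storage (zero-initialized)

struct Packet {
    int payload[256];      // the array is part of the object itself
};

struct Footprint {
    std::size_t pointer_bytes;  // what a heap-allocated Packet costs locally
    std::size_t object_bytes;   // what an automatic Packet costs locally
};

Footprint local_footprint() {
    auto on_heap = std::make_unique<Packet>();  // 3. object on the heap,
                                                //    only the pointer is local
    Packet on_stack{};                          // 4. whole object on the stack
    ++g_calls;
    on_heap->payload[0] = kLanes;
    on_stack.payload[0] = kLanes;
    return {sizeof(on_heap), sizeof(on_stack)};
}
```

On a typical 64-bit build the pointer costs 8 bytes of stack while the automatic Packet costs 1024, which is the difference that adds up when a function runs inside a process with a small stack.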

 

 

Posted

David,

Your recommendation of -fstack-usage was helpful (that is the correct option). It uncovered a handful (~4) of large stack consumers which were statically allocating. One was the top-level testbench itself, as I didn't dynamically create it in my sc_main function. I reduced each of those down to more reasonable sizes. The challenge left is the sheer number of processes that I generate, as I commented previously. Even with this cleanup, I still require the larger stack size compilation of SystemC.

One example: In one module, the call to before_end_of_elaboration reports 624 bytes of static allocation. That function reported a much larger number before I improved it. The large consumer in that function is the spawning of 32 dynamic processes, specifically the spawn_options (which I now create dynamically rather than statically as before). That particular process is repeated 4096 times in my full model, so alone it accounts for ~2.55MB of static allocation (624 bytes x 4096). I believe that is my largest single static consumer, with the next one being ~850kB in total.

To your 4 types of storage:

1 - I have very few constants that I use across the model (< 50)

2 - I have no global variables (unless SystemC itself creates them), and no static variables

3 - I have converted everything which is obvious to me to convert to dynamic allocations. The pointers themselves still require storage though, and I believe that is a significant portion of my reported stack usage

4 - Plenty of these. I have 17 functions which report > 128 bytes of static allocation, most are smaller but repeated many times.

Do you, or anyone else, know of online (or in print) resources which discuss optimizing a large model like this for run-time execution? That would be a future step for me. I have converted some threads to methods, and I have used some dynamic processes as well. I pass pointers or references where I know how to do so. I cannot help but wonder if there is a way to improve my performance given the sheer number of sc_fifo instances which exist in my model. It would take me a while to summarize the number.

Thanks again for your assistance. (btw, I have taken your Doulos SystemC & TLM courses before, so we have had conversations during those - about a year ago...)

Posted

Bryan-

I don't know what you're modeling but it indeed sounds like a very large system.

There are many cases where functionality must be modeled via separate processes (and even dynamic processes).

But there are some cases where separate processes are not actually needed. Case in point: you don't necessarily need to use separate processes and sc_fifo to model complex fifo interconnect. It is possible to model the interconnect and fifos all within a single process in many cases, via procedural code. Whether this is applicable in your case depends on what you're trying to achieve in your model.

Generally speaking, the more functionality you "collapse" into a single process (rather than using separate processes), the more abstract your model becomes, which is usually a goal with SystemC models. For example, in TLM2 virtual platforms, the general idea is to only have a separate process for each system initiator.
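As a rough sketch of that idea in plain C++ (no SystemC types, all names hypothetical), a chain of bounded fifos can be advanced by one function that a single process would call once per cycle, instead of giving each hop its own sc_fifo and process.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical interconnect: a chain of bounded FIFOs advanced procedurally.
class FifoChain {
public:
    FifoChain(std::size_t stages, std::size_t depth)
        : depth_(depth), fifos_(stages) {
        assert(stages > 0 && depth > 0);
    }

    // Try to push into the first stage; returns false when it is full.
    bool push(int value) {
        if (fifos_.front().size() >= depth_) return false;
        fifos_.front().push_back(value);
        return true;
    }

    // One "cycle": move at most one item per stage toward the output.
    // Walking from the last stage backwards means an item advances
    // exactly one hop per call.
    void step() {
        for (std::size_t s = fifos_.size() - 1; s > 0; --s) {
            std::deque<int>& src = fifos_[s - 1];
            std::deque<int>& dst = fifos_[s];
            if (!src.empty() && dst.size() < depth_) {
                dst.push_back(src.front());
                src.pop_front();
            }
        }
    }

    // Pop from the last stage; returns false when it is empty.
    bool pop(int& value) {
        if (fifos_.back().empty()) return false;
        value = fifos_.back().front();
        fifos_.back().pop_front();
        return true;
    }

private:
    std::size_t depth_;
    std::vector<std::deque<int>> fifos_;
};
```

In a real model step() would be called from one clocked process, so the number of coroutine stacks no longer grows with the number of fifos. The cost is that per-hop timing has to be modeled explicitly in the procedural code.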

-Stuart Swan
