
Roman Popov

Everything posted by Roman Popov

  1. Because of dynamic dispatch. https://en.cppreference.com/w/cpp/language/virtual
  2. Roman Popov

    How can I implement SC_FORK JOIN_ANY /SC_FORK JOIN_NONE

    Ok, thanks a lot for the example. I'm not using UVM myself and have no idea what it does. But it looks like UVM does not allow you to call wait() inside run_phase, because it terminates the thread immediately. If you rewrite your code this way:

        void run_phase(uvm::uvm_phase& phase) {
            cout << "run: before fork/join" << endl;
            // cout << "run: after fork/join" << endl;
            wait(40, SC_NS);
            cout << "Current time is " << sc_time_stamp() << endl;
        }

    you will see that "Current time is..." is never printed. And because the thread is terminated, the event_list is destroyed and you get the error you see. From my perspective this is a UVM-SystemC bug: it should prevent you from calling wait() from run_phase by reporting something like "wait() is not allowed inside run_phase". I suggest you post the issue to the UVM-SystemC support forum (http://forums.accellera.org/forum/38-systemc-verification-uvm-systemc-scv/) and see if any of the UVM-SystemC developers can help or comment. If you put your code in a regular SC_THREAD you will see that everything works as expected.
  3. Roman Popov

    How can I implement SC_FORK JOIN_ANY /SC_FORK JOIN_NONE

    This is interesting. The comment in the source code says this error is reported when an event_list is destroyed while some process is still waiting on it. Can you provide a small reproducer (a complete code example) showing how you use it?
  4. SystemC has no special concept similar to SystemVerilog interfaces. Instead, you can use regular SystemC modules to do the job of SystemVerilog interfaces. SystemC has its own notion of interface, similar to the interface notion in object-oriented programming: an interface is basically a set of pure virtual functions. So, for example, you can define a simple memory interface as a set of mem_read and mem_write functions. SystemC requires that interfaces inherit virtually from sc_interface:

        struct memory_if : virtual sc_interface {
            virtual uint32_t mem_read(uint32_t address) = 0;                 // reads data from memory at given address
            virtual void     mem_write(uint32_t address, uint32_t data) = 0; // writes data to memory at given address
        };

    Next you can define a module that implements and exports this interface. For example:

        struct memory_module : sc_module, memory_if {
            sc_export<memory_if> mem_export{"mem_export"};

            SC_CTOR(memory_module) {
                mem_export.bind(*this);
            }

        private:
            std::array<uint32_t, 1024> mem;

            uint32_t mem_read(uint32_t address) override {
                return mem.at(address >> 2);
            }

            void mem_write(uint32_t address, uint32_t data) override {
                mem.at(address >> 2) = data;
            }
        };

    You can have other modules implementing memory_if. For example, you can write a module that translates mem_read and mem_write function calls into a cycle-accurate, signal-level memory protocol, which is what SystemVerilog interfaces often do. Now, the closest analog to a SystemVerilog virtual interface is the SystemC sc_port. For example, you can define a test_module that has an sc_port<memory_if> and sends memory reads and writes over this port:

        struct test_module : sc_module {
            sc_port<memory_if> mem_port{"mem_port"};

            SC_CTOR(test_module) {
                SC_THREAD(test_thread);
            }

        private:
            void test_thread() {
                cout << "test_thread\n";
                const int TEST_SIZE = 10;
                for (int i = 0; i < TEST_SIZE; ++i) {
                    cout << hex << "Write mem[" << i * 4 << "] = " << i << "\n";
                    mem_port->mem_write(i * 4, i);
                }
                for (int i = 0; i < TEST_SIZE; ++i) {
                    cout << hex << "Read mem[" << i * 4 << "] == " << mem_port->mem_read(i * 4) << "\n";
                }
            }
        };

    To create a simulation, you instantiate both memory_module and test_module and bind the port to the export:

        int sc_main(int argc, char **argv) {
            test_module   tmod{"tmod"};
            memory_module mmod{"mmod"};
            tmod.mem_port.bind(mmod.mem_export);
            sc_start();
            return 0;
        }
  5. It is explained in section 14.13 of the SystemC standard. You should read all items from a) to n) to understand all usage details. From my experience, most models that I've seen do not use byte enables. But let's consider a basic example. Suppose we have a memory target and it receives the following transaction:

        command        = TLM_WRITE_COMMAND
        address        = 0x0
        data[4]        = { 0x1, 0x2,  0x3, 0x4  }
        byte_enable[4] = { 0x0, 0xff, 0x0, 0xff }

    byte_enable[n] == 0x0 means that the n'th byte of data should be ignored by the target. Assume the memory held the following values before processing the transaction:

        memory[0] == 0xA;
        memory[1] == 0xB;
        memory[2] == 0xC;
        memory[3] == 0xD;

    Since byte_enable[0] and byte_enable[2] are equal to 0x0, the corresponding bytes are ignored while the transaction is processed. So after processing the transaction the memory will hold the following values:

        memory[0] == 0xA;
        memory[1] == 0x2;
        memory[2] == 0xC;
        memory[3] == 0x4;
  6. Roman Popov

    Concatenate in SystemC

    As expected: https://stackoverflow.com/questions/52550/what-does-the-comma-operator-do
  7. Roman Popov

    Concatenate in SystemC

    Do you want to concatenate a group of sc_signal<bool> into an sc_signal<sc_uint>? You need to create an SC_METHOD to do this, for example:

        sc_signal<bool>       bool_sig0{"bool_sig0"};
        sc_signal<bool>       bool_sig1{"bool_sig1"};
        sc_signal<sc_uint<2>> uint_sig{"uint_sig"};
        ...
        SC_METHOD(concat_method);
        sensitive << bool_sig0 << bool_sig1;
        ...
        void concat_method() {
            sc_uint<2> new_val;
            new_val[0] = bool_sig0;
            new_val[1] = bool_sig1;
            uint_sig = new_val;
        }
  8. Roman Popov

    sc_fifo with multiple readers (or writers)

    This is a common confusion, but in fact there is no real parallelism in SystemC. What is called an SC_THREAD process in SystemC is commonly known as a coroutine in the software world (or a fiber, if you are familiar with the Windows API). So you can assume a SystemC model is a single-threaded application. The SystemC standard (4.2.1.2, Evaluation phase) guarantees co-routine semantics. So:

    • "Thread-safety" issues from parallel-programming textbooks are not applicable to SystemC models, because they are single-threaded.
    • The order of process evaluation within the evaluation phase is not deterministic. In your case it means any of the consumers can read from the fifo first, so the behavior of your model is non-deterministic.

    To make your model deterministic, "access to shared storage should be explicitly synchronized". There is no recommended way; it all depends on the concept you want to model. Do you want to broadcast a message to all readers? What do you want to do when a reader is not ready to accept a message? Or do you want only a single reader to receive a message? In that case, which one has priority?
  9. Roman Popov

    Automation for probing port connectivity

    Yes, with TLM sockets this won't work well: for example, the classes implementing TLM interfaces inside the tlm_utils::* sockets do not inherit from sc_object, so you will get "not an sc_object" reported if you call the print_connections function on some generic TLM model. In general this problem is a subset of the "serialization" problem, which cannot be solved in standard C++, because C++ wants to be "zero cost" and does not provide dynamic introspection capabilities. You can solve the problem radically by utilizing C++ compiler internals: that way you can implement whatever functionality you want (usually people want the same features they have in Verilog/VHDL tools). I'm not sure, but you can probably get this connectivity-dumping feature from commercial SystemC tools.
  10. Roman Popov

    Automation for probing port connectivity

    I think you have answered your question yourself 🙂 In practice, however, most channels inherit from sc_object, so they have the name() method. So you can do something like this (see the print_connections method):

        #include <systemc.h>

        struct void_if : sc_interface { };

        struct test : sc_module, void_if {

            struct submodule : sc_module, void_if {
                sc_export<void_if> SC_NAMED(void_export);
                SC_CTOR(submodule) { void_export.bind(*this); }
            } SC_NAMED(sub);

            struct not_sc_object : void_if { } non_sc_obj;

            sc_vector<sc_port<void_if>> SC_NAMED(ports, 3);

            SC_CTOR(test) {
                ports[0].bind(*this);
                ports[1].bind(sub.void_export);
                ports[2].bind(non_sc_obj);
            }

            void print_connections(const sc_object *cur, std::string padding) const {
                cout << padding << cur->name();

                const sc_interface *binded_iface = nullptr;

                if (auto exp = dynamic_cast<const sc_export_base*>(cur)) {
                    cout << " (export) ";
                    binded_iface = exp->get_interface();
                } else if (auto port = dynamic_cast<const sc_port_base*>(cur)) {
                    cout << " (port) ";
                    binded_iface = port->get_interface();
                }

                if (binded_iface) {
                    cout << " bound to ";
                    if (auto binded_obj = dynamic_cast<const sc_object*>(binded_iface)) {
                        cout << binded_obj->name();
                    } else {
                        cout << "not an sc_object";
                    }
                }
                cout << "\n";

                for (auto *child : cur->get_child_objects()) {
                    print_connections(child, padding + "    ");
                }
            }

            void start_of_simulation() override {
                for (auto *obj : sc_get_top_level_objects()) {
                    print_connections(obj, "");
                }
            }
        };

        int sc_main(int, char**) {
            test test_inst{"test_inst"};
            sc_start();
            return 0;
        }

    This will print:

        test_inst
        test_inst.sub
        test_inst.sub.void_export (export) bound to test_inst.sub
        test_inst.ports
        test_inst.ports_0 (port) bound to test_inst
        test_inst.ports_1 (port) bound to test_inst.sub
        test_inst.ports_2 (port) bound to not an sc_object

    Sometimes you also want to print hierarchical port-to-port connections (instead of port-to-interface). That, I think, is not possible. For our internal uses we had modified the Accellera SystemC kernel to preserve information about hierarchical connections.
  11. Roman Popov

    Initial value port

    Nice, I didn't know such a method exists! Looks like it does what you want.
  12. Roman Popov

    Initial value port

    In addition to what David said: the SystemC standard guarantees that the signal value is initialized by the default constructor. So, for example, sc_signal<int> is guaranteed to be initialized with 0. Accellera SystemC also supports passing an initial value as a second constructor parameter:

        sc_signal<int> int_signal{"int_signal", 42};  // Initialized to 42

    It looks like this feature is not part of the IEEE standard yet. As David noted, ports in SystemC have no value; they behave more like pointers to channels. So the "value" of a port always equals the value of the bound channel.
  13. Hi all, for debugging purposes it may be useful to add all signals in a design to a trace file. However, sc_signal::trace(), which would allow doing this automatically, is deprecated. Why is that? In general I think it would be useful to add a trace method to sc_interface, so that all channels that support tracing can implement it. Then it would be possible to implement something like:

        sc_trace_all(sc_trace_file *tf);  // add all objects that support tracing to trace file
  14. Yes, but the direction of the binds matters. For example, if you want to bind ( in0 -> in1 -> signal ), then you will need to write:

        in0(in1);
        in1(signal);

    or:

        in1(signal);
        in0(in1);

    In your examples you bind like this:

        Ex1: port0 <- port1 -> signal   // port0 not bound
        Ex2: signal <- port0 -> signal  // port0 bound to 2 signals

    I agree that the error message in the first case is misleading.
  15. Roman Popov

    Array when declare port for model

    Do you use SystemC 2.3.3? Check RELEASENOTES file.
  16. Roman Popov

    Array when declare port for model

    This is a very old style. With modern SystemC you can get the same with sc_vector:

        // exxe.h
        class EXXE : public sc_module {
        public:
            sc_vector<sc_in<sc_dt::uint64>> SC_NAMED(clock, 3);
            sc_vector<sc_in<bool>>          SC_NAMED(input, 5);

            EXXE(sc_module_name);
            ~EXXE();
        };

    But as David mentioned, before starting to learn SystemC you should learn C++ first. Trying to understand SystemC without being proficient in C++ is a waste of time.
  17. This is not an error, it is a warning. It means sc_stop was called more than once, which is usually not a problem. I'm not aware of what HeMPS is; I suggest you ask the code authors.
  18. The problem is, when you integrate RTL IP into a Loosely-Timed VP that way, the whole simulator will have the performance of a cycle-accurate model, because the clock generator will always be on and the Verilated model will be triggered even when it is idle. So don't try to boot Linux with such a simulator. If your RTL IP supports power gating or clock gating, it is a good idea to disable clock generation while the RTL IP is turned off. In that case you don't pay for what you don't use: you can boot your OS quickly, and then enable the clock generator when you start to debug IP-specific driver code.
  19. Roman Popov

    What is SystemC library special in?

    In that case you will have to specify library paths manually.
  20. Roman Popov

    SC_THREADS not starting?

    This is a definition:

        void StimGen(void) {}

    A declaration should look like this:

        void StimGen(void);
  21. Roman Popov

    SC_THREADS not starting?

    Your code is wrong; you should get a compiler or linker error, because you have two definitions of stimuli::StimGen. My guess is that you have not added stimuli.cpp to your project file, so it is just ignored. As a result you have an empty process:

        void StimGen(void) { /* Do nothing */ }
  22. Roman Popov

    SC_THREADS not starting?

    Can you provide a complete source code?
  23. It is common knowledge that RTL simulations in SystemC are slow compared to HDLs, because SystemC is just a library while HDL simulators are optimizing compilers. I've decided to experiment a little bit to measure the difference in simulation performance quantitatively. The simulated design is a pipeline of registers, without any logic between them. Each register is modeled as a separate process (SC_METHOD, SC_CTHREAD or always_ff). This design is artificial, since there is no designer-written logic, but it should allow measuring simulator-kernel efficiency. I've used 4 simulators:

    • Accellera SystemC with registers modeled as SC_METHODs
    • Accellera SystemC with registers modeled as SC_CTHREADs
    • A commercial optimizing Verilog simulator that compiles HDL to C/C++. Afaik, this simulator is discrete-event, the same as SystemC.
    • Verilator, a cycle-based simulator that compiles HDL to optimized C/C++.

    Here are the results. Simulating 20 ms, clock is 20 MHz. All simulation runtimes are in seconds.

        Registers | SC_METHODs | SC_CTHREADs | Commercial Verilog sim | Verilator
        ----------|------------|-------------|------------------------|----------
        100       | 1.3        | 2.4         | 0.85                   | 0.2
        200       | 2.7        | 5.5         | 1.75                   | 0.28
        500       | 8.9        | 17          | 6.5                    | 0.49
        1000      | 18         | 46          | 15.6                   | 0.96
        2000      | 65         | 159         | 37                     | 1.8
        4000      | 180        | 428         | 73                     | 3.7
        8000      | 920        | -           | 133                    | 7.4

    *(I didn't run an SC_CTHREAD simulation for 8000 regs.)

    Here is the source code for the DUTs.

    SystemC:

        SC_MODULE(dff_reg) {
            sc_in<bool>      clk{"clk"};
            sc_in<bool>      rstn{"rstn"};
            sc_in<uint32_t>  data_in{"data_in"};
            sc_out<uint32_t> data_out{"data_out"};

            SC_CTOR(dff_reg) {
        #ifndef CTHREAD_DUT
                SC_METHOD(proc);
                sensitive << clk.pos() << rstn.neg();
        #else
                SC_CTHREAD(proc_thread, clk.pos());
                async_reset_signal_is(rstn, false);
        #endif
            }

            void proc() {
                if (!rstn) {
                    data_out = 0;
                } else {
                    data_out = data_in;
                }
            }

            void proc_thread() {
                data_out = 0;
                wait();
                while (1) {
                    data_out = data_in;
                    wait();
                }
            }
        };

        SC_MODULE(dut) {
            sc_in<bool>      clk{"clk"};
            sc_in<bool>      rstn{"rstn"};
            sc_in<uint32_t>  data_in{"data_in"};
            sc_out<uint32_t> data_out{"data_out"};

            SC_CTOR(dut) {
                for (size_t i = 0; i < N_REGS; ++i) {
                    dff_insts[i].clk(clk);
                    dff_insts[i].rstn(rstn);
                    dff_insts[i].data_in(data_io[i]);
                    dff_insts[i].data_out(data_io[i+1]);
                }
                SC_METHOD(in_method);
                sensitive << data_in;
                SC_METHOD(out_method);
                sensitive << data_io[N_REGS];
            }

        private:
            void in_method()  { data_io[0] = data_in; }
            void out_method() { data_out = data_io[N_REGS]; }

            sc_vector<dff_reg>             dff_insts{"dff_insts", N_REGS};
            sc_vector<sc_signal<uint32_t>> data_io{"data_io", N_REGS + 1};
        };

    Verilog:

        module dff_reg (
            input  bit clk,
            input  bit rstn,
            input  int data_in,
            output int data_out
        );
            always_ff @(posedge clk or negedge rstn) begin
                if (~rstn) begin
                    data_out <= 0;
                end else begin
                    data_out <= data_in;
                end
            end
        endmodule

        module dut (
            input  bit clk,
            input  bit rstn,
            input  int data_in,
            output int data_out
        );
            int data_io[N_REGS + 1];

            assign data_io[0] = data_in;
            assign data_out   = data_io[N_REGS];

            genvar i;
            generate
                for (i = 0; i < N_REGS; i = i + 1) begin: DFF_INST
                    dff_reg D (
                        .clk(clk),
                        .rstn(rstn),
                        .data_in(data_io[i]),
                        .data_out(data_io[i+1])
                    );
                end
            endgenerate
        endmodule
  24. Roman Popov

    Benchmarking RTL simulation with SystemC

    One experiment with locality: I've replaced the sc_vectors in the original example with contiguous storage:

        SC_MODULE(dut) {
            sc_in<bool>      clk{"clk"};
            sc_in<bool>      rstn{"rstn"};
            sc_in<uint32_t>  data_in{"data_in"};
            sc_out<uint32_t> data_out{"data_out"};

            SC_CTOR(dut) {
                dff_insts = static_cast<dff_reg *>(::operator new(sizeof(dff_reg) * N_REGS));
                for (size_t i = 0; i < N_REGS; ++i) {
                    new (dff_insts + i) dff_reg(sc_gen_unique_name("dff_reg"));
                }

                data_io = static_cast<sc_signal<uint32_t> *>(::operator new(sizeof(sc_signal<uint32_t>) * (N_REGS + 1)));
                for (size_t i = 0; i < N_REGS + 1; ++i) {
                    new (data_io + i) sc_signal<uint32_t>(sc_gen_unique_name("data_io"));
                }

                for (size_t i = 0; i < N_REGS; ++i) {
                    dff_insts[i].clk(clk);
                    dff_insts[i].rstn(rstn);
                    dff_insts[i].data_in(data_io[i]);
                    dff_insts[i].data_out(data_io[i+1]);
                }

                SC_METHOD(in_method);
                sensitive << data_in;
                SC_METHOD(out_method);
                sensitive << data_io[N_REGS];
            }

        private:
            void in_method()  { data_io[0] = data_in; }
            void out_method() { data_out = data_io[N_REGS]; }

            dff_reg             *dff_insts;
            sc_signal<uint32_t> *data_io;
            // sc_vector<dff_reg> dff_insts{"dff_insts", N_REGS};
            // sc_vector<sc_signal<uint32_t>> data_io{"data_io", N_REGS + 1};
        };

    Here are the results (runtime in seconds):

        Registers | Original (sc_vector) | Contiguous storage | Improvement
        ----------|----------------------|--------------------|------------
        100       | 1.3                  | 1.0                | 23 %
        200       | 2.7                  | 2.3                | 14 %
        500       | 8.9                  | 7.1                | 20 %
        1000      | 18                   | 15.4               | 14 %
        2000      | 65                   | 45                 | 30 %
        4000      | 180                  | 117                | 35 %

    So even without improving the primitives it is possible to gain some performance. I think this technique could be applied internally to some places inside the SystemC kernel. There are places inside the SystemC kernel that utilize arena allocators, but the technique is not applied systematically (this is my observation; probably the SystemC authors did benchmarking and implemented arenas only where it makes sense).
  25. Roman Popov

    Benchmarking RTL simulation with SystemC

    Yes, intuitively both locality and cache-line utilization should have a significant impact on performance. I think it would be possible to make a faster C++ library for RTL simulation if we were to design it from scratch today. However, in the SystemC case it would be hard to improve performance without breaking backwards compatibility. But the question is, do we really want to simulate large-scale RTL designs using a library like SystemC? I don't think there is much demand for this. If you are using SystemC for design, then you have a tool that converts it into Verilog, and after this conversion you can use a Verilog compiler or Verilator for large simulations. This flow is already utilized, for example, by Chisel and RocketChip (https://github.com/freechipsproject/rocket-chip#emulator): they generate Verilog from Scala and use Verilator for simulations. Using Verilator with SystemC is even easier, since Verilator can generate a SystemC wrapper that can be easily integrated into a SystemC-based verification environment.