
Roman Popov

Members
  • Content Count

    257
  • Days Won

    28

Roman Popov last won the day on December 10

Roman Popov had the most liked content!

4 Followers

About Roman Popov

  • Rank
    Advanced Member

Profile Information

  • Location
    Hillsboro, Oregon

Recent Profile Visitors

1,115 profile views
  1. Hi all, for debugging purposes it may be useful to add all signals in a design to a trace file. However, sc_signal::trace(), which could be used to do this automatically, is deprecated. Why is that? In general, I think it would be useful to add a trace method to sc_interface, so that all channels that support tracing can implement it. Then it would be possible to implement something like `sc_trace_all(sc_trace_file *tf)`, which adds all objects that support tracing to the trace file.
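    To make the proposal concrete, here is a minimal, SystemC-free sketch of how such a function could work, assuming a virtual trace() hook on a common interface. All names (`Object`, `Traceable`, `Signal`, `Module`, `TraceFile`, `trace_all`) are hypothetical stand-ins, not SystemC API; a real version would walk the hierarchy with `sc_core::sc_get_top_level_objects()` and `sc_object::get_child_objects()`.

    ```cpp
    #include <cassert>
    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical stand-in for sc_trace_file: it only records traced names.
    struct TraceFile {
        std::vector<std::string> names;
    };

    // Hypothetical stand-in for sc_object: a named node in the design hierarchy.
    struct Object {
        explicit Object(std::string n) : name(std::move(n)) {}
        virtual ~Object() = default;
        std::string name;
        std::vector<std::unique_ptr<Object>> children;
    };

    // The proposed hook: channels that support tracing implement trace().
    struct Traceable {
        virtual ~Traceable() = default;
        virtual void trace(TraceFile& tf) const = 0;
    };

    // A signal-like channel that supports tracing.
    struct Signal : Object, Traceable {
        using Object::Object;
        void trace(TraceFile& tf) const override { tf.names.push_back(name); }
    };

    // A module-like container that is not traceable itself.
    struct Module : Object {
        using Object::Object;
    };

    // Hypothetical trace_all: visit every object and trace those that support it.
    void trace_all(const Object& obj, TraceFile& tf) {
        if (auto* t = dynamic_cast<const Traceable*>(&obj)) t->trace(tf);
        for (const auto& child : obj.children) trace_all(*child, tf);
    }
    ```

    Objects that do not implement the hook are simply skipped, so the walk never needs to know about concrete channel types.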
  2. Yes, but the direction of binding matters. For example, if you want to bind in0 -> in1 -> signal, you need to write either `in0(in1); in1(signal);` or `in1(signal); in0(in1);`. In your examples you bind like this:
    Ex1: port0 <- port1 -> signal // port0 is not bound
    Ex2: signal <- port0 -> signal // port0 is bound to 2 signals
    I agree that the error message in the first case is misleading.
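    The rule can be illustrated with a toy model that allows a single binding per port. The names `Port`, `Channel` and `resolve()` are hypothetical and this is not the real sc_port API: `a(b)` records that port `a` is bound to `b`, and a port's channel is found by following the chain until a channel is reached. As a simplification, this model reports errors immediately.

    ```cpp
    #include <cassert>
    #include <stdexcept>

    // Hypothetical, simplified model of port binding (not the real sc_port API).
    struct Channel {};

    struct Port {
        Port* parent = nullptr;      // port this port is bound to (port-to-port)
        Channel* channel = nullptr;  // channel this port is bound to directly

        // port(other_port): this port is bound to other_port
        void operator()(Port& other) { check_unbound(); parent = &other; }
        // port(channel): this port is bound to channel
        void operator()(Channel& ch) { check_unbound(); channel = &ch; }

        // Follow port-to-port bindings until a channel is found.
        Channel* resolve() const {
            if (channel) return channel;
            if (parent) return parent->resolve();
            throw std::runtime_error("port not bound");
        }

    private:
        // In this model every port accepts exactly one binding.
        void check_unbound() const {
            if (parent || channel) throw std::runtime_error("port already bound");
        }
    };
    ```

    Both binding orders from the post resolve `in0` to `signal`, because only the direction of each individual bind matters, not the order of the statements.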
  3. Roman Popov

    Array when declare port for model

    Are you using SystemC 2.3.3? Check the RELEASENOTES file.
  4. Roman Popov

    Array when declare port for model

    This is a very old style. With modern SystemC you can get the same result with sc_vector:

    ```cpp
    // exxe.h
    #include <systemc>

    class EXXE : public sc_core::sc_module {
    public:
        sc_core::sc_vector<sc_core::sc_in<sc_dt::uint64>> SC_NAMED(clock, 3);
        sc_core::sc_vector<sc_core::sc_in<bool>> SC_NAMED(input, 5);

        EXXE(sc_core::sc_module_name);
        ~EXXE();
    };
    ```

    But as David mentioned, before starting to learn SystemC you should learn C++ first. Trying to understand SystemC without being proficient in C++ is a waste of time.
  5. This is not an error, it is a warning. It means sc_stop was called more than once. Usually this is not a problem. I'm not familiar with HeMPS; I suggest asking the authors of the code.
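    To show why a repeated call is harmless, here is a toy model in which only the first stop request takes effect and later ones merely produce a warning, plus a caller-side guard that avoids the warning. `Kernel` and `stop_once` are hypothetical names, not SystemC API.

    ```cpp
    #include <cassert>
    #include <iostream>

    // Hypothetical toy model of a kernel's stop request (not the SystemC API).
    class Kernel {
    public:
        void stop() {
            if (stop_requested_) {
                ++warnings_;  // repeated request: warn, but nothing else changes
                std::cerr << "Warning: stop already requested\n";
                return;
            }
            stop_requested_ = true;
        }
        bool stop_requested() const { return stop_requested_; }
        int warnings() const { return warnings_; }

    private:
        bool stop_requested_ = false;
        int warnings_ = 0;
    };

    // Caller-side guard: request a stop only if none is pending yet.
    void stop_once(Kernel& k) {
        if (!k.stop_requested()) k.stop();
    }
    ```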
  6. The problem is that when you integrate an RTL IP into a loosely-timed VP this way, the whole simulator will have the performance of a cycle-accurate model, because the clock generator will always be on and the Verilated model will be triggered even when it is idle. So don't try to boot Linux with such a simulator. If your RTL IP supports power gating or clock gating, it is a good idea to disable clock generation when the RTL IP is turned off. That way you don't pay for what you don't use: you can boot your OS quickly and then enable the clock generator when you start debugging the IP-specific driver code.
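    The cost difference can be shown with a toy count of model evaluations, assuming one evaluation per clock edge. `ClockGen` and `RtlModel` are hypothetical stand-ins for the clock generator and the Verilated model, not real classes.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical stand-in for a Verilated model: counts evaluations.
    struct RtlModel {
        std::uint64_t evals = 0;
        void eval() { ++evals; }
    };

    // Hypothetical clock generator that can be gated off.
    struct ClockGen {
        bool enabled = true;
        // Advance by `cycles` clock periods; when gated, no edges are produced.
        void run(RtlModel& m, std::uint64_t cycles) {
            if (!enabled) return;
            for (std::uint64_t i = 0; i < cycles; ++i) {
                m.eval();  // rising edge
                m.eval();  // falling edge
            }
        }
    };
    ```

    With 1,000,000 boot cycles followed by 1,000 debug cycles, an always-on clock costs 2,002,000 evaluations, while gating the clock during boot leaves only the 2,000 evaluations of the debug phase.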
  7. Roman Popov

    What is SystemC library special in?

    In that case you will have to specify library paths manually.
  8. Roman Popov

    SC_THREADS not starting?

    This is a definition: `void StimGen(void) {}`. A declaration should look like this: `void StimGen(void);`
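    For reference, a sketch of the usual header/source split in plain C++: the class body only declares the member function, and exactly one out-of-class definition provides the body. The `stimuli` struct and its `calls` counter are hypothetical, and SC_THREAD registration is omitted.

    ```cpp
    #include <cassert>

    // Header part (e.g. stimuli.h): declaration only, ending with ';'.
    struct stimuli {
        int calls = 0;
        void StimGen(void);        // declaration: no body
        // void StimGen(void) {}   // this would be a definition with an empty body
    };

    // Source part (e.g. stimuli.cpp): the single out-of-class definition.
    void stimuli::StimGen(void) {
        ++calls;  // a real SC_THREAD body would drive stimuli and call wait()
    }
    ```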
  9. Roman Popov

    SC_THREADS not starting?

    Your code is wrong: you should get a compiler error, because you have two definitions of stimuli::StimGen. My guess is that you have not added stimuli.cpp to your project file, so it is simply ignored. As a result, you have an empty process: `void StimGen(void) { /* Do nothing */ }`
  10. Roman Popov

    SC_THREADS not starting?

    Can you provide the complete source code?
  11. The link on the main page is wrong: it leads to the UVM downloads...
  12. Roman Popov

    Benchmarking RTL simulation with SystemC

    One experiment with locality: I've replaced the sc_vectors in the original example with contiguous storage:

    ```cpp
    // dff_reg and N_REGS are defined elsewhere in the original benchmark.
    #include <systemc.h>
    #include <new>

    SC_MODULE(dut) {
        sc_in<bool> clk{"clk"};
        sc_in<bool> rstn{"rstn"};
        sc_in<uint32_t> data_in{"data_in"};
        sc_out<uint32_t> data_out{"data_out"};

        SC_CTOR(dut) {
            // One raw allocation for all registers, constructed in place
            dff_insts = static_cast<dff_reg *>(::operator new(sizeof(dff_reg) * N_REGS));
            for (size_t i = 0; i < N_REGS; ++i) {
                new (dff_insts + i) dff_reg(sc_gen_unique_name("dff_reg"));
            }
            // One raw allocation for all N_REGS + 1 interconnect signals
            data_io = static_cast<sc_signal<uint32_t> *>(
                ::operator new(sizeof(sc_signal<uint32_t>) * (N_REGS + 1)));
            for (size_t i = 0; i < N_REGS + 1; ++i) {
                new (data_io + i) sc_signal<uint32_t>(sc_gen_unique_name("data_io"));
            }
            // Chain: data_io[0] -> reg 0 -> data_io[1] -> ... -> data_io[N_REGS]
            for (size_t i = 0; i < N_REGS; ++i) {
                dff_insts[i].clk(clk);
                dff_insts[i].rstn(rstn);
                dff_insts[i].data_in(data_io[i]);
                dff_insts[i].data_out(data_io[i + 1]);
            }
            SC_METHOD(in_method);
            sensitive << data_in;
            SC_METHOD(out_method);
            sensitive << data_io[N_REGS];
        }

        // Objects built with placement new must be destroyed and freed explicitly
        // (reverse order of construction).
        ~dut() {
            using signal_t = sc_signal<uint32_t>;
            for (size_t i = N_REGS + 1; i-- > 0;) data_io[i].~signal_t();
            ::operator delete(data_io);
            for (size_t i = N_REGS; i-- > 0;) dff_insts[i].~dff_reg();
            ::operator delete(dff_insts);
        }

    private:
        void in_method() { data_io[0] = data_in; }
        void out_method() { data_out = data_io[N_REGS]; }

        dff_reg *dff_insts;
        sc_signal<uint32_t> *data_io;
        // sc_vector<dff_reg> dff_insts{"dff_insts", N_REGS};
        // sc_vector<sc_signal<uint32_t>> data_io{"data_io", N_REGS + 1};
    };
    ```

    Here are the results (runtime in seconds):

    | Number of regs | Original (sc_vector) | Contiguous storage | Improvement over original |
    |---|---|---|---|
    | 100 | 1.3 | 1.0 | 23 % |
    | 200 | 2.7 | 2.3 | 14 % |
    | 500 | 8.9 | 7.1 | 20 % |
    | 1000 | 18 | 15.4 | 14 % |
    | 2000 | 65 | 45 | 30 % |
    | 4000 | 180 | 117 | 35 % |

    So even without improving the primitives, it is possible to gain some performance. I think this technique could be applied internally in some places inside the SystemC kernel. There are places in the SystemC kernel that use arena allocators, but the technique is not applied systematically. (This is my observation; probably the SystemC authors did benchmarking and implemented arenas only where it makes sense.)
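    The same pattern can be packaged as a small reusable container. This is a SystemC-free sketch under stated assumptions: `ContiguousArray` is a hypothetical name, and `T` must not be over-aligned, because `::operator new` only guarantees fundamental alignment. It constructs all elements in one allocation with placement new, destroys them in reverse order, and releases the storage.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <new>
    #include <utility>

    // N objects of type T in one contiguous allocation (hypothetical helper).
    template <class T>
    class ContiguousArray {
    public:
        // make(i) returns the constructor argument for element i.
        template <class MakeArg>
        ContiguousArray(std::size_t n, MakeArg make)
            : data_(static_cast<T*>(::operator new(sizeof(T) * n))), size_(0) {
            try {
                for (; size_ < n; ++size_) new (data_ + size_) T(make(size_));
            } catch (...) {
                destroy();  // undo the elements built so far, free the storage
                throw;
            }
        }
        ~ContiguousArray() { destroy(); }
        ContiguousArray(const ContiguousArray&) = delete;
        ContiguousArray& operator=(const ContiguousArray&) = delete;

        T& operator[](std::size_t i) { return data_[i]; }
        std::size_t size() const { return size_; }

    private:
        // Destroy in reverse order of construction, then release the raw memory.
        void destroy() {
            while (size_ > 0) data_[--size_].~T();
            ::operator delete(data_);
        }
        T* data_;
        std::size_t size_;
    };
    ```

    For the `dut` module, `make` could return `sc_gen_unique_name("dff_reg")` for each element, though this variant was not part of the measurements above.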
  13. Roman Popov

    Benchmarking RTL simulation with SystemC

    Yes, intuitively both locality and cache line utilization should have a significant impact on performance. I think it would be possible to make a faster C++ library for RTL simulation if we designed one from scratch today. However, in SystemC's case it would be hard to improve performance without breaking backward compatibility. But the question is: do we really want to simulate large-scale RTL designs using a library like SystemC? I don't think there is much demand for this. If you are using SystemC for design, then you have a tool that converts it into Verilog, and after this conversion you can use a Verilog compiler or Verilator for large simulations. This flow is already used, for example, by Chisel and RocketChip (https://github.com/freechipsproject/rocket-chip#emulator): they generate Verilog from Scala and use Verilator for simulation. Using Verilator with SystemC is even easier, since Verilator can generate a SystemC wrapper that can be easily integrated into a SystemC-based verification environment.
  14. Roman Popov

    Benchmarking RTL simulation with SystemC

    One more experiment: adding a combinational method to each register module. In SystemC this creates N additional events inside signals (vs. only 1 clock event in the original experiment).

    SystemC source:

    ```cpp
    SC_MODULE(dff_reg) {
        sc_in<bool> clk{"clk"};
        sc_in<bool> rstn{"rstn"};
        sc_in<uint32_t> data_in{"data_in"};
        sc_out<uint32_t> data_out{"data_out"};

        SC_CTOR(dff_reg) {
            SC_METHOD(proc);
            sensitive << clk.pos() << rstn.neg();
            SC_METHOD(preproc);
            sensitive << data_in;
        }

    private:
        sc_signal<int> data_imp{"data_imp"};

        // Combinational part
        void preproc() { data_imp = data_in * 3 + 1; }

        // Register with asynchronous active-low reset
        void proc() {
            if (!rstn) {
                data_out = 0;
            } else {
                data_out = data_imp;
            }
        }

        // Thread-style alternative; not registered in this experiment
        void proc_thread() {
            data_out = 0;
            wait();
            while (1) {
                data_out = data_imp;
                wait();
            }
        }
    };
    ```

    Verilog code:

    ```systemverilog
    module dff_reg (
        input  bit clk,
        input  bit rstn,
        input  int data_in,
        output int data_out
    );
        int data_imp;

        always_comb begin
            data_imp = data_in * 3 + 1;
        end

        always_ff @(posedge clk or negedge rstn) begin
            if (~rstn) begin
                data_out <= 0;
            end else begin
                data_out <= data_imp;
            end
        end
    endmodule
    ```

    Simulation results (runtime in seconds):

    | Number of registers | SystemC (2 SC_METHODs) | Commercial Verilog simulator |
    |---|---|---|
    | 100 | 2.8 | 1.1 |
    | 200 | 7 | 2.3 |
    | 300 | 10 | 4.5 |
    | 500 | 20 | 10.5 |
    | 1000 | 40 | 31 |
    | 2000 | 164 | 52 |

    In this case the Verilog compiler leads considerably, even on small designs. Probably, like Verilator, it does some static scheduling to reduce the number of events in an optimized build.
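    The event-count argument can be illustrated with a toy model of this register chain. `EventDriven` and `StaticScheduled` are hypothetical types, and this is not a description of how any particular simulator works: the event-driven version wakes a combinational process whenever a stage's input changes, while the statically scheduled version inlines the combinational logic into one back-to-front pass. Both compute the same register values; the static one needs fewer process activations.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Combinational logic of one stage: data_imp = data_in * 3 + 1.
    inline std::uint32_t comb(std::uint32_t x) { return x * 3u + 1u; }

    // Toy event-driven model: one clocked and one combinational process per stage.
    struct EventDriven {
        std::uint32_t x;                 // primary input (data_in of stage 0)
        std::vector<std::uint32_t> q;    // data_out of each stage
        std::vector<std::uint32_t> imp;  // data_imp of each stage
        std::uint64_t activations = 0;

        EventDriven(std::size_t n, std::uint32_t in) : x(in), q(n, 0), imp(n, comb(0)) {
            if (n > 0) imp[0] = comb(in);
        }

        void clock() {
            const std::vector<std::uint32_t> old_q = q;
            for (std::size_t i = 0; i < q.size(); ++i) { q[i] = imp[i]; ++activations; }  // proc()
            for (std::size_t i = 1; i < q.size(); ++i)
                if (q[i - 1] != old_q[i - 1]) { imp[i] = comb(q[i - 1]); ++activations; }  // preproc()
        }
    };

    // Toy statically scheduled model: combinational logic inlined into one pass.
    struct StaticScheduled {
        std::uint32_t x;
        std::vector<std::uint32_t> q;
        std::uint64_t activations = 0;

        StaticScheduled(std::size_t n, std::uint32_t in) : x(in), q(n, 0) {}

        void clock() {
            // Back to front, so each stage still sees its predecessor's pre-edge value.
            for (std::size_t i = q.size(); i-- > 0;) {
                q[i] = comb(i == 0 ? x : q[i - 1]);
                ++activations;
            }
        }
    };
    ```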
  15. Roman Popov

    Benchmarking RTL simulation with SystemC

    What is interesting here is that SystemC performance is close to the commercial HDL compiler for small to medium-sized designs. But as the design grows beyond 1000 processes, SystemC starts to lose significantly. I've profiled the 4000-register SystemC design in VTune to see what happens inside. The hotspot analysis is not surprising: 90% of CPU time is consumed by sc_event::trigger, port reads/writes through virtual sc_interface methods, and sc_prim_channel_registry::perform_update. What is interesting is the micro-architectural analysis: 90% of the time, performance is bound by data cache misses. So as a design grows large, cache efficiency is the only thing that matters. And this is a weakness of SystemC, since SystemC design primitives are very expensive: for example, sizeof(sc_signal<bool>) is 216 bytes!
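    A back-of-the-envelope check of why object size matters, assuming 64-byte cache lines (typical on x86-64, but an assumption here): a 216-byte object spans at least 4 lines, and 4000 such objects stored back to back need 13,500 lines, far more than the 512 lines of a 32 KiB L1 data cache, whereas 4000 one-byte values fit in 63 lines. The helper names are hypothetical.

    ```cpp
    #include <cassert>
    #include <cstddef>

    // Assumption: 64-byte cache lines.
    constexpr std::size_t kLine = 64;

    // Cache lines touched by one object of `size` bytes starting at byte `offset`.
    constexpr std::size_t lines_touched(std::size_t size, std::size_t offset) {
        return size == 0 ? 0 : (offset + size - 1) / kLine - offset / kLine + 1;
    }

    // Cache lines needed for `count` objects of `size` bytes stored back to back.
    constexpr std::size_t footprint_lines(std::size_t count, std::size_t size) {
        return (count * size + kLine - 1) / kLine;
    }

    static_assert(lines_touched(216, 0) == 4, "a 216-byte object spans at least 4 lines");
    ```

    A misaligned 216-byte object can even touch 5 lines, so every signal access may pull several lines into the cache just to reach one value.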