All Activity

  1. Yesterday
  2. Last week
  3. Vegh, Janos

    SystemC performance

    Yes, this is exactly what I originally wanted to do: measure how the _relative_ efficiency changes when wait()/next_trigger() event handling is used. I am not trying to measure something absolute; as with supercomputers, the important parameter is what time is needed for which benchmark. BTW: how the efficiency of the simulation depends on the size of the design is also important. The preliminary tests show that large designs eat up the computing resources, and the simulation time increases strongly.

    I have uploaded the 'benchmark system' to https://github.com/jvegh/SystemCperformance The primary goal was to introduce as little overhead as possible, and I hope I made it applicable to other 'benchmarking', too. The easy-to-change CMake setup makes it possible to play with compilers, versions, etc., and the provided .tex files enable making publication-quality diagrams quickly.

    My experience is that at low "module" counts the measurement does not necessarily provide reliable results: the resource measurement used is not designed for such utilization. It works, but with larger scatter. At first look, however, the tool seems to be sensitive to the internals of the computer. My first idea was that I could see the "memory-bound" and "computation-bound" behavior of the SystemC kernel, on top of the cache behavior, of course. In the data measured on my laptop, I see some strange effects near (I guess) the cache capacity bounds. The OS is the same (Ubuntu 18.04), but the processor belongs to a different generation.
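    A minimal sketch (illustrative; not the code in the repository) of what such an equivalent wait()/next_trigger() module pair can look like:

        #include <systemc>
        using namespace sc_core;

        // One "action" = one pass through the kernel's event handling.
        SC_MODULE(WaitStyle) {              // SC_THREAD + wait()
            SC_CTOR(WaitStyle) { SC_THREAD(run); }
            void run() {
                for (int i = 0; i < 1000; ++i)
                    wait(SC_ZERO_TIME);     // yield to the kernel each iteration
            }
        };

        SC_MODULE(TriggerStyle) {           // SC_METHOD + next_trigger()
            int count{0};
            SC_CTOR(TriggerStyle) { SC_METHOD(run); }
            void run() {
                if (++count < 1000)
                    next_trigger(SC_ZERO_TIME); // re-arm without a thread context switch
            }
        };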
  4. Roman Popov

    SystemC performance

    Real-life simulation performance usually depends a lot on modeling style. For high-level TLM-2.0 models, the share of simulation time consumed by SystemC primitives is usually much lower compared to the time consumed by the "business logic" of the models. The efficiency of the simulation kernel (context switches, channels) is much more important for low-level RTL simulations.
  5. David Black

    SystemC performance

    I would say it is definitely interesting.
  6. Mitesh Patel

    Property in assertion

    The following code will help you:

        property setup_hold_checker;
          time current_time;
          @(posedge (clk ^ CK_EDGE_SEL)) disable iff (~resetn || disable_assertion)
            (1, current_time = $time) |=>
              @(ev_data_delay) ($time - current_time) ##0
                (1, $display("\t **setup_hold_checker**= %t ",
                             (($time - current_time) > (SETUP_TIME+HOLD_TIME))));
        endproperty

    Regards, Mitesh Patel
  7. uvm_reg_hw_reset_seq performs a mirror operation if the register is enabled for the sequence. Here, you can set the volatile bit of any field you want to exclude from the comparison:
     Volatile = 1 - UVM_NO_CHECK => comparison disabled
     Volatile = 0 - UVM_CHECK => comparison enabled
     Regards, Mitesh Patel
  8. Mitesh Patel

    uvm_reg_bit_bash_seq

    Yes, uvm_reg_bit_bash_seq will get all registers assigned to the specified map and perform the bit-bash operation on them. There is the following hook to exclude a register from the bit-bash operation: 1. set NO_REG_TESTS or NO_REG_BIT_BASH_TEST through the resource db for the specified register. Regards, Mitesh Patel
  9. Hi, the following code may be helpful to you:

         // Declarations
         uvm_reg_map maps[$];
         uvm_reg regs[$];
         uvm_reg_data_t rd_val;
         uvm_status_e status;

         // Get the maps associated with the block handle
         blk.get_maps(maps);
         foreach (maps[d]) begin
           regs.delete();
           // Get the registers associated with this map
           maps[d].get_registers(regs);
           foreach (regs[k]) begin
             regs[k].read(status, rd_val, UVM_FRONTDOOR, maps[d]);
             // Do the necessary operation based on the read value
           end
         end

     Regards, Mitesh Patel
  10. Hi, I have an issue where the DUT returns an X for a register read, but reg2bus still sees it as a 0. When I was going over the uvm_reg_bus_op struct, the type for data is uvm_reg_data_t, which is only 2-state. Is there a way to override this argument to support 3/4-state, maybe uvm_reg_data_logic_t? Please let me know. Thanks in advance! Vignesh
  11. Hi Eyck, in fact, when using sem.wait() and sem.post(), the program also blocks. So I only use sem.post(), and I don't know whether the semaphore can still do the arbitration. But the program is running. What is your opinion?

          void Bus::b_transport(int id, tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
              sc_dt::uint64 global_addr = trans.get_address();
              int slaveid = this->address_to_slaveid(global_addr);
              if (slaveid < 0) {
                  trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
                  std::cout << "\e[1;31m" << this->name() << ": Invalid address " << global_addr << std::endl;
                  return;
              }
              sc_dt::uint64 slave_addr = global_addr - this->starts[slaveid];
              trans.set_address(slave_addr);
              // make sure we are allowed to get access
              sem.wait();
              this->initiator_socket[slaveid]->b_transport(trans, delay);
              // return the token so that others can access
              sem.post();
              trans.set_address(global_addr);
          }

      BR, Hai Dang
  12. I would stick with the two methods, where the second method calls enable() and disable() on the first. Why enable()/disable()? The LRM (section 5.6.6) explains the difference: with suspend() you still have the sensitivity handling in place, while with disable() you don't. Moreover, if an event occurs while the process is suspended, it becomes immediately runnable. So a falling edge of reset implies a process activation although there is no rising edge on clock. An implementation could look like:

          // class declaration:
          sc_core::sc_process_handle process_output_hndl;

          // constructor body:
          SC_METHOD(process_output);
          sensitive << clk.pos() << reset;
          SC_METHOD(process_output_enable);
          sensitive << enable;

          // function implementation:
          void fir::process_output() {
              if (!process_output_hndl.valid())
                  process_output_hndl = sc_core::sc_get_current_process_handle();
              if (reset.read() == 0) {
                  // reset sequence
              } else {
                  // normal operation
              }
          }

          void fir::process_output_enable() {
              if (enable.read()) {
                  if (process_output_hndl.valid()) process_output_hndl.enable();
              } else {
                  if (process_output_hndl.valid()) process_output_hndl.disable();
              }
          }

      BR
  13. Hi, from the snippets I see, it does not become clear what you are doing. Basically you would instantiate the ordered semaphore, scc::ordered_semaphore sem{1}, and in b_transport() you wait() and post():

          void Bus::b_transport(int id, tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
              sc_dt::uint64 global_addr = trans.get_address();
              int slaveid = this->address_to_slaveid(global_addr);
              if (slaveid < 0) {
                  trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
                  std::cout << "\e[1;31m" << this->name() << ": Invalid address " << global_addr << std::endl;
                  return;
              }
              sc_dt::uint64 slave_addr = global_addr - this->starts[slaveid];
              trans.set_address(slave_addr);
              // make sure we are allowed to get access
              sem.wait();
              this->initiator_socket[slaveid]->b_transport(trans, delay);
              // return the token so that others can access
              sem.post();
              trans.set_address(global_addr);
          }

      BR
  14. Vegh, Janos

    SystemC performance

    I surely will, but this was the first measurement I verified myself, and I wanted to know whether it could be interesting for this audience. I will clean up the code and upload it in the coming days. The absolute values of the data are not very meaningful anyway, especially since I did the measurements in debug mode.
  15. David Black

    SystemC performance

    Perhaps you would like to share your code for the measurements via GitHub? Measuring performance can be tricky, to say the least. How you compile (compiler, version, SystemC version) and what you measure can really change the results. It probably helps to specify your computer's specifications too:

    - Processor (vendor, version)
    - L1 cache size
    - L2 cache size
    - L3 cache size
    - RAM
    - OS (name, version)
    - Compiler (name, version)
    - Compiler switches (--std, -O)
    - SystemC version
    - SystemC installation switches
    - How time is measured and from what point (e.g. start_of_simulation to end_of_simulation)
    - Memory consumption information if possible

    This will help to make meaningful statements about the measurements and allow others to reproduce/verify your results. It is also important to understand how these results should be interpreted (taken advantage of) and compared. With respect to TLM, it will get a lot more challenging: for example, what style of coding is used (loosely timed, approximately timed)? Are sc_clocks involved?
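    One way to implement the start_of_simulation-to-end_of_simulation timing point mentioned above, as a sketch (the module name and output format are illustrative):

        #include <systemc>
        #include <chrono>
        #include <iostream>

        struct PerfMonitor : sc_core::sc_module {
            std::chrono::steady_clock::time_point t0;
            SC_CTOR(PerfMonitor) {}
            void start_of_simulation() override {
                t0 = std::chrono::steady_clock::now();   // simulation is about to run
            }
            void end_of_simulation() override {
                std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
                std::cout << "simulation wall time: " << dt.count() << " s\n";
            }
        };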
  16. Vegh, Janos

    SystemC performance

    Hello, I have seen the question about the performance of wait() vs. next_trigger() many times on this list, and in general I think that when speaking about system performance, it is important to know what execution speed can be expected. I decided to measure the performance; the results are in the attachment.

    The starting point was the 'next_trigger' example. I slightly modified it to make altogether 10 calls to the respective function, and added an external repetition cycle, i.e. altogether 1000 "actions" are executed by a module. After that I prepared the equivalent functionality using wait(). In the main test program I create a vector of modules with length 0, 1, 2, 5, 10, etc., and measure the execution time using Linux facilities at the beginning of elaboration, immediately before starting, and after stopping. From those data I determined the MAPS value (in analogy with MIPS: how many "actions" are executed per second). The modules (as elements of the vector) work on the same time scale, i.e. SystemC receives requests from all modules at the same time.

    The computer has limited computing capacity and memory bandwidth, so I expected to see those limitations in the execution times. I think I can see two "roofline" effects (http://doi.acm.org/10.1145/1498765.1498785). Considering the elaboration phase only, the memory subsystem limits the performance. For a very low number of elements, the memory bandwidth is not saturated, so the performance initially increases proportionally, and after a while it becomes constant (interestingly, with a slight difference between the two actions). Considering the simulation phase, the major limitation is the available computing capacity; when it is reached, the apparent execution times get longer, and the performance of the simulator starts to decrease. Correspondingly, in the total execution time the effect of both 'rooflines' can be seen, although the memory limitation is less pronounced.

    The data confirm what was said earlier on the list: the implementations are more or less the same, and so is their performance. I also see some difference in the behavior of the two methods; it even looks like next_trigger may have !two! rooflines, which may be the sign of some SW and/or HW cache. This is something where I would need the help of a core developer (for co-authorship): probably the effect of some configuration parameters can be seen from outside.

    This simple benchmark has a minimum amount of non-kernel functionality, no I/O ports, binding, etc., so in my eyes it can be for SystemC what Linpack is for supercomputers. I also want to measure the effect of tracing, logging, etc. Any measurement ideas are welcome. I think that in TLM simulation (and other, non-circuit simulations) it is an important factor to find out the proper computer size for the design. As shown in the figure, the efficiency drops for designs that are too large (compared to which HW parameters?), so it might be better to invest in a larger configuration. (BTW: notice how the number of context switches correlates with the degradation of the performance.) Best regards, Janos

    RooflineSystemC.pdf
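    A sketch of my reading of that measurement procedure (getrusage() is assumed as the 'Linux facility'; module and action counts are illustrative):

        #include <sys/resource.h>
        #include <systemc>
        #include <iostream>

        static double cpu_seconds() {
            rusage ru;
            getrusage(RUSAGE_SELF, &ru);           // user CPU time of this process
            return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec * 1e-6;
        }

        int sc_main(int, char*[]) {
            const int n_modules = 100;             // design size under test
            double t_elab = cpu_seconds();         // beginning of elaboration
            // ... create the vector of n_modules benchmark modules here ...
            double t_start = cpu_seconds();        // immediately before starting
            sc_core::sc_start();
            double t_stop = cpu_seconds();         // after stopping
            double sim = t_stop - t_start;
            std::cout << "elaboration: " << (t_start - t_elab)
                      << " s, simulation: " << sim << " s\n";
            if (sim > 0.0)                         // MAPS: mega-"actions" per second
                std::cout << "MAPS: " << n_modules * 1000.0 / sim / 1e6 << "\n";
            return 0;
        }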
  17. Earlier
  18. Hi Eyck, in fact, wait() doesn't work. Could you briefly show how these functions are meant to be used? Kind regards, Hai Dang
  19. Hi Eyck, it works with your solution. I initialized the counter value with 0, then tried with 1; it seems there is no difference (or only a small one).

          void Bus::b_transport(int id, tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
              sc_dt::uint64 global_addr = trans.get_address();
              int slaveid = this->address_to_slaveid(global_addr);
              if (slaveid < 0) {
                  trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
                  std::cout << "\e[1;31m" << this->name() << ": Invalid address " << global_addr << std::endl;
                  return;
              }
              sc_dt::uint64 slave_addr = global_addr - this->starts[slaveid];
              this->post();
              trans.set_address(slave_addr);
              this->initiator_socket[slaveid]->b_transport(trans, delay);
              trans.set_address(global_addr);
              //std::cout << "\e[1;31m MB " << id << std::endl;
              sc_core::wait(this->bus_delay);
          }

      And I have a doubt: which is better in this case, post() or wait()? Thank you a lot for your help. HD
  20. TRANG

    timer with SystemC

    Thanks for your support.
  21. Philipp A Hartmann

    timer with SystemC

    The match will occur (almost) at the "correct" point in time during the simulation. However, if you sample the value from an unrelated process, there might be a process-evaluation ordering dependency (i.e. whether the update_method has already run). It depends on your requirements whether this is an issue. If you do the checks outside of the simulation, i.e. between sc_start calls, you would need to complete the deltas (as per the loop sketched above) before every check. You cannot call sc_start during end_of_simulation.
  22. TRANG

    timer with SystemC

    I'm sorry for confusing you. Summary: if my test case is

        sc_start(2*period, time_unit);
        while (sc_pending_activity_at_current_time()) {
            sc_start(SC_ZERO_TIME);
        }

    or

        sc_start(3*period, time_unit);

    then the compare match occurs (correct). If my test case is only

        sc_start(2*period, time_unit);

    then the compare match does not occur. I want my source code to detect when the simulation time is almost over, so the pending m_update event can be executed before the end.
  23. Eyck

    timer with SystemC

    But what is the issue with running the simulation for 3*period? Best regards
  24. From the snippets you provide it looks OK. Assuming that bus_mutex is an sc_mutex, this is your problem: sc_mutex does not do arbitration. It selects randomly which waiting thread is granted the lock (actually the next activated process, based on an event notification). But what you expect is arbitration. So either you write an arbiter, or you use an ordered semaphore with an initial counter value of 1. You may find an implementation here: https://git.minres.com/SystemC/SystemC-Components/src/branch/master/incl/scc/ordered_semaphore.h The semaphore grants access based on a queue, so you get first-come-first-served (FCFS) behavior while all requestors have equal priority. Best regards
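      The idea behind such an ordered semaphore, as a minimal sketch (this is not the scc implementation; it can only be called from thread contexts such as b_transport, since wait() suspends the caller):

          #include <systemc>
          #include <deque>

          struct ordered_semaphore_sketch {
              explicit ordered_semaphore_sketch(int value) : free_(value) {}

              void wait() {
                  if (free_ > 0) { --free_; return; }  // token available, take it
                  sc_core::sc_event ev;                // waiter-owned wakeup event
                  queue_.push_back(&ev);               // enqueue in arrival order
                  sc_core::wait(ev);                   // token is handed over in post()
              }

              void post() {
                  if (!queue_.empty()) {
                      queue_.front()->notify(sc_core::SC_ZERO_TIME); // oldest waiter first
                      queue_.pop_front();
                  } else {
                      ++free_;
                  }
              }

          private:
              int free_;
              std::deque<sc_core::sc_event*> queue_;
          };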
  25. TRANG

    timer with SystemC

    I tried:

        void GTimer::end_of_simulation() {
            while (sc_pending_activity_at_current_time()) {
                sc_start(SC_ZERO_TIME);
            }
        }

    But it does not work. My problem is that the simulation time is over before the event triggers. I want to force the event to trigger before the end of the simulation.
  26. Philipp A Hartmann

    timer with SystemC

    I don't fully understand the question. My snippet above runs all remaining delta cycles at the current time without advancing the time any further. You can wrap your original sc_start call together with the loop in a small helper function (e.g. sc_start_including_deltas(...)) if you find yourself copying the snippet too often.
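    Such a helper could look like this (a sketch; sc_start_including_deltas is just the example name from the post):

        #include <systemc>

        // Run for 'duration', then finish all delta cycles pending at the
        // stopping time without advancing simulated time any further.
        void sc_start_including_deltas(const sc_core::sc_time& duration) {
            sc_core::sc_start(duration);
            while (sc_core::sc_pending_activity_at_current_time())
                sc_core::sc_start(sc_core::SC_ZERO_TIME);
        }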
  27. TRANG

    timer with SystemC

    Thanks @Philipp A Hartmann. It is OK if I use:

        sc_start(2*period, time_unit);
        while (sc_pending_activity_at_current_time())
            sc_start(SC_ZERO_TIME);

    But I want to do the pending handling inside my own code. Is that possible?
  28. David Black

    Simple program problem

    Your code is operating correctly; however, you are displaying the results at the wrong time. This is because the write() method of sc_signal is not the same as a simple store operation. In normal programming, you expect that if you do:

        variable = 5;

    then immediately after storing the value into your variable, the variable will contain the value. You might then assume that the write() of sc_signal is the same. It is not. It takes a small amount of time for the value to be stored. The amount of time is tiny and is referred to as a "delta delay". You can cause your program to wait a delta cycle before displaying the result, but it is slightly more complicated because you are using an SC_METHOD style process instead of the SC_THREAD style process. If you are using C++11, you can replace your cout line with this to see the correct answer:

        sc_spawn( [&]{
            wait( SC_ZERO_TIME ); ///< wait one delta-cycle delay
            cout << "@" << sc_time_stamp() << "(" << sc_delta_count() << ")"
                 << " || in1:" << in1.read() << " + in2:" << in2.read()
                 << " = val:" << out.read() << endl;
        });
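    An SC_METHOD-native alternative is to re-trigger the same method one delta later and print on the second pass. A sketch (the names printer and first_pass_ are illustrative, not from the original code):

        // Inside the existing SC_METHOD:
        void adder::printer() {
            if (first_pass_) {
                first_pass_ = false;
                next_trigger(sc_core::SC_ZERO_TIME); // come back one delta later
                return;
            }
            // by now the write() has settled
            std::cout << "@" << sc_core::sc_time_stamp()
                      << " out:" << out.read() << std::endl;
        }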