Vegh, Janos

Members

  • Content Count: 3
  • Joined
  • Last visited
  • Days Won: 1

Vegh, Janos last won the day on June 20

Vegh, Janos had the most liked content!

About Vegh, Janos

  • Rank: Member


  1. Yes, this is exactly what I originally wanted to do: just to measure how the _relative_ efficiency changes when wait()/next_trigger() event handling is used. I do not claim to measure something absolute; in the case of supercomputers, for example, the important parameter is the time needed for a given benchmark. BTW: how the efficiency of the simulation depends on the size of the design is also important. The preliminary tests show that large designs eat up the computing resources, and the simulation time increases strongly.

     I have uploaded the 'benchmark system' to https://github.com/jvegh/SystemCperformance The primary goal was to introduce as little measurement overhead as possible, and I hope I made it applicable to other 'benchmarking', too. The easily adjustable CMake setup makes it possible to play with compilers, versions, etc., and the provided .tex files make it quick to produce publication-quality diagrams. (An illustrative sketch of the phase timing such a benchmark can use appears after the posts below.)

     My experience is that at low "module" counts the measurement does not necessarily provide reliable results: the resource measurement facility used is not designed for such utilization. It works, but with larger scatter. At first glance, however, the tool appears to be sensitive to the internals of the computer. My first idea was that I could see the "memory-bound" and "computation-bound" behavior of the SystemC kernel, on top of the cache behavior, of course. In the data measured on my laptop, I see some strange effects near (I guess) the cache capacity bounds. The OS is the same (Ubuntu 18.04), but the processor belongs to a different generation.
  2. I surely will, but this was the first measurement verified (by me). And I wanted to know whether it could be interesting for this audience. I will clean up the code and upload it in the coming days. The absolute values of the data are not so meaningful anyway, especially since I did the measurements in debug mode.
  3. Hello, I have seen the question about the performance of wait() vs. next_trigger() many times on this list, and in general I think that when speaking about system performance it is important to know what execution speed can be expected. I decided to measure the performance; see the result in the attachment.

     The starting point was the 'next_trigger' example. I slightly modified it to make altogether 10 calls to the respective function, and added an external repetition cycle, i.e. altogether 1000 "actions" are executed by a module. After that I prepared the equivalent functionality using wait(). (A sketch of what the two variants may look like appears after the posts below.) In the main test program I create a vector of modules of length 0, 1, 2, 5, 10, etc., and measure the execution time using Linux facilities: at the beginning of elaboration, immediately before starting, and after stopping. From those data I determined the MAPS value (in analogy with MIPS: how many "actions" are executed per second). The modules (as elements of the vector) work on the same time scale, i.e. SystemC receives requests from all modules at the same time.

     The computer has limited computing capacity and memory bandwidth, so I expected to see those limitations in the execution times. I think I can see two "roofline" effects (http://doi.acm.org/10.1145/1498765.1498785). Considering the elaboration phase only, the memory subsystem limits the performance: for a very low number of elements the memory bandwidth is not saturated, so the performance initially increases proportionally, and after a while it becomes constant (interestingly, with a slight difference between the two actions). Considering the simulation phase, the major limitation is the available computing capacity; when it is reached, the apparent execution times get longer, and the performance of the simulator starts to decrease. Correspondingly, in the total execution time the effect of both 'rooflines' can be seen, although the memory limitation is less pronounced.

     The data confirm what was said earlier on the list: the implementations are more or less the same, and so is their performance. I also see some differences in the behavior of the two methods; it even looks like next_trigger() may have !two! rooflines, which may be the sign of some SW and/or HW cache. This is something where I would need the help of a core developer (for co-authorship): probably the effect of some configuration parameters can be seen from outside.

     This simple benchmark has a minimal amount of non-kernel functionality: no I/O ports, no binding, etc.; so in my eyes it can be for SystemC what Linpack is for supercomputers. I also want to measure the effect of tracing, logging, etc. Any measurement ideas are welcome.

     I think that in TLM simulation (and other non-circuit simulations) it is an important factor to find out the proper computer size for the design. As shown in the figure, the efficiency drops for designs that are too large (compared to ??? HW parameters), so it might be better to invest in a larger configuration. (BTW: notice how the number of context switches correlates with the degradation of the performance.)

     Best regards
     Janos

     RooflineSystemC.pdf
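
Below is a minimal sketch of what the two benchmark variants described in post 3 may look like. It is not the repository code; the module names (NextTriggerMod, WaitMod) and the 10 x 100 split of the 1000 "actions" are assumptions based on the description above.

    // Illustrative sketch only: each module performs
    // N_CALLS * N_REPEAT = 1000 trigger "actions" in total.
    #include <systemc>
    using namespace sc_core;

    static const int N_CALLS  = 10;   // calls per repetition (assumption)
    static const int N_REPEAT = 100;  // external repetition cycle (assumption)

    // next_trigger() variant: an SC_METHOD that re-schedules itself
    // until all actions have been performed.
    SC_MODULE(NextTriggerMod) {
        int count;
        SC_CTOR(NextTriggerMod) : count(0) { SC_METHOD(action); }
        void action() {
            if (++count < N_CALLS * N_REPEAT)
                next_trigger(1, SC_NS);          // one "action"
        }
    };

    // wait() variant: an SC_THREAD with the equivalent functionality.
    SC_MODULE(WaitMod) {
        SC_CTOR(WaitMod) { SC_THREAD(action); }
        void action() {
            for (int r = 0; r < N_REPEAT; ++r)
                for (int c = 0; c < N_CALLS; ++c)
                    wait(1, SC_NS);              // one "action"
        }
    };

Since every module always waits the same 1 ns, all modules become runnable at the same simulation times, which matches the "same time scale" condition described in the post.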
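And a minimal sketch of the timing method, again illustrative rather than the code in the repository: wall-clock timestamps are taken at the beginning of elaboration, immediately before starting, and after stopping, and a MAPS figure is derived from them. BenchMod, now_sec(), and the fixed module count are assumptions introduced for illustration.

    // Illustrative timing harness sketch: times elaboration and
    // simulation separately with clock_gettime() and derives MAPS.
    #include <systemc>
    #include <ctime>
    #include <cstdio>
    #include <vector>
    using namespace sc_core;

    static const long ACTIONS = 1000;  // "actions" per module, as in the posts

    // Trivial stand-in benchmark module: performs ACTIONS wait() calls.
    SC_MODULE(BenchMod) {
        SC_CTOR(BenchMod) { SC_THREAD(run); }
        void run() { for (long i = 0; i < ACTIONS; ++i) wait(1, SC_NS); }
    };

    static double now_sec() {
        timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int sc_main(int, char*[]) {
        const int n_modules = 100;     // one point of the sweep (assumption)

        double t0 = now_sec();         // beginning of elaboration
        std::vector<BenchMod*> mods;   // modules must outlive sc_start()
        for (int i = 0; i < n_modules; ++i)
            mods.push_back(new BenchMod(sc_gen_unique_name("mod")));

        double t1 = now_sec();         // immediately before starting
        sc_start();
        double t2 = now_sec();         // after stopping

        // MAPS: millions of "actions" per second, in analogy with MIPS.
        double maps = double(n_modules) * ACTIONS / (t2 - t1) / 1e6;
        std::printf("elab %.3f s, sim %.3f s, %.2f MAPS\n",
                    t1 - t0, t2 - t1, maps);
        return 0;
    }

In practice the module count would be swept (0, 1, 2, 5, 10, ...) and the run repeated for both variants; as noted in post 2, the build should be in release rather than debug mode for the absolute values to be meaningful.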