trev

Simulation Speed


Hi,

 

   I've been reading the forum for some time now, but this is my first post. I've been using SystemC AMS for mixed-signal modelling, mainly fractional-N PLLs, combining Verilator (to generate SystemC modules from Verilog) with SystemC AMS/SystemC modules for the analog/mixed-signal blocks.

 

I have previously been using CppSim for the same tasks. The SystemC AMS implementation is significantly slower than the CppSim one. I've hacked together a really simple VCO (voltage-controlled oscillator) model as an example:

#include <cmath>        // sin, fmod, M_PI
#include <systemc-ams.h>

SCA_TDF_MODULE(con)
{
	sca_tdf::sca_out<double> outp;

	SCA_CTOR(con) {}

	void processing()
	{
		outp = 0.0;   // constant 0 V control voltage
	}
};

SCA_TDF_MODULE(vco)   
{
	sca_tdf::sca_out<double> out;

	sca_tdf::sca_in<double> vin;
	
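	double ampl;   // output amplitude (V)
	double freq;   // centre frequency (Hz)
	double kvco;   // VCO gain (Hz/V)
	double phase;  // accumulated phase (rad)
	
	vco ( sc_core::sc_module_name m, double ampl_ = 1.0, double freq_ = 1.0e9, double kvco_ = 1.0)  :
		ampl(ampl_), freq(freq_), kvco(kvco_), phase(0.0) {} 
	
	void set_attributes(){}

	void processing()
	{
		// Integrate the instantaneous frequency over one time step and
		// keep the accumulated phase wrapped to one period.
		double ts = get_timestep().to_seconds();
		phase = fmod(phase + 2.0*M_PI*(freq+(kvco*vin.read()))*ts, 2.0*M_PI);
		out = ampl*sin(phase);
	}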

};

int sc_main(int argc,char* argv[])
{

	sca_tdf::sca_signal<double> op;	
	sca_tdf::sca_signal<double> vin;	

	sca_trace_file* tr=sca_create_tabular_trace_file("vco_tb");

	vco DUT("DUT");
	DUT.out(op);
	DUT.vin(vin);
	DUT.set_timestep(1, SC_NS);

	con DUT2("DUT2");
	DUT2.outp(vin);

	sca_trace(tr,op,"op");
	sc_core::sc_start(1,SC_MS);
	
	sca_close_tabular_trace_file(tr);

	return 0;
}

        SystemC 2.3.1-Accellera --- May 20 2015 15:16:23
        Copyright (c) 1996-2014 by all Contributors,
        ALL RIGHTS RESERVED

     SystemC AMS extensions 2.0 Version: 2.0_beta1 --- BuildRevision: 1739  20140531
    Copyright (c) 2010-2014  by Fraunhofer-Gesellschaft
          Institut Integrated Circuits / EAS
      Licensed under the Apache License, Version 2.0

Info: SystemC-AMS:
    2 SystemC-AMS modules instantiated
    1 SystemC-AMS views created
    2 SystemC-AMS synchronization objects/solvers instantiated

Info: SystemC-AMS:
    1 dataflow clusters instantiated
      cluster 0:
        2 dataflow modules/solver, contains e.g. module: DUT
        2 elements in schedule list,  
        1 ns cluster period,
        ratio to lowest:  1              e.g. module: DUT
        ratio to highest: 1 sample time  e.g. module: DUT
        0 connections to SystemC de, 0 connections from SystemC de

real    0m4.771s
user    0m3.142s
sys    0m1.629s

The equivalent implementation runs in 105 ms in CppSim.

I'm using the same 1 ns time step and 1e6 samples in both cases.

(BTW: if I run for, say, 1e7 samples, the ratio between the two run times remains roughly the same.)

 

Any ideas on how to speed up the SystemC AMS simulation, or comments on improving the code for speed, would be much appreciated.

 

Thanks in advance

 

 

 


Hello,

 

The big performance difference between SystemC AMS and CppSim may be explained by different compilation flags used to build the SystemC and CppSim libraries. For optimal performance, you have to make sure that not only your model but also the SystemC and SystemC AMS libraries are compiled in Release mode, i.e., without debug symbols (-g) and with full optimisation (-O3). Linking against libraries compiled in Debug mode (-g -O0) can lead to a slowdown of several orders of magnitude.

 

Another factor in the slower performance may be file I/O: your SystemC AMS simulation produces a text file with 1 million lines of 3 columns of double values. Does your CppSim simulation yield a similar trace?
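
To give a feel for the I/O point, here is a minimal plain C++ sketch (no SystemC AMS involved) that writes the same "time, value" rows once as formatted text and once as raw doubles. The helper names append_text_row and append_binary_row are made up for this illustration, and the text layout is only an assumption; it does not reproduce the exact column layout or precision of the SystemC AMS tabular trace or of the CppSim binary format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Append one "time value" row as text with full double precision.
// Every value has to go through number formatting, which costs CPU time
// in addition to the larger amount of data written.
void append_text_row(std::string& buf, double t, double v) {
    char line[64];
    int n = std::snprintf(line, sizeof line, "%.16e %.16e\n", t, v);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof line);
    buf.append(line, static_cast<std::size_t>(n));
}

// Append the same row as two raw doubles (no formatting at all).
void append_binary_row(std::vector<unsigned char>& buf, double t, double v) {
    const double row[2] = {t, v};
    const unsigned char* p = reinterpret_cast<const unsigned char*>(row);
    buf.insert(buf.end(), p, p + sizeof row);
}
```

With this layout a binary row is always 2 * sizeof(double) bytes, while a text row is several times larger and needs two number-to-string conversions, so a text trace of a million rows can plausibly account for a good part of the run time.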

 

Regards, Torsten


Hi Torsten,

 

    Thanks very much for the reply.

 

You are absolutely correct about the I/O: CppSim writes output in an HSPICE-compatible binary format. Switching off the output in both yields roughly 700 ms for SystemC AMS versus 60 ms for CppSim.

Compiling the SystemC AMS module with the -O3 option yields a 30% improvement in the raw performance of the simulation (with no I/O). I will take a look at recompiling the SystemC AMS source in optimised mode.

Still, that currently leaves me with nearly an order of magnitude difference in raw simulation speed, and no output data!

 

best regards

 

trev


Hi,

 

   I recompiled the Fraunhofer SystemC AMS code without the -g option, as suggested. OPT_CXXFLAGS (which I take to be the optimised CXXFLAGS) is set to "-O3 -g -Wall -pedantic -Wno-long-long", so I'm now using

 -fPIC -O3 -Wall -pedantic -Wno-long-long

It makes no difference.

 

regards

trev


Thanks for your new feedback! One performance bottleneck in SystemC AMS is the virtual function call to each module's processing function. Yours are extremely simple, so the virtual function call itself can have a noticeable impact on performance. One way to improve performance is to use the multirate features of the SystemC AMS TDF (timed data flow) MoC to calculate several samples during one activation. You just have to make sure that the system stays schedulable. Note also that multirate TDF places more constraints on the synchronization with the DE (discrete-event) MoC.
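
The idea of producing several samples per activation can be sketched in plain C++, without SystemC AMS. Block, Osc and simulate below are hypothetical stand-ins invented for this illustration, not SystemC AMS classes; in a real TDF module the equivalent would be a port rate greater than 1 together with indexed port reads and writes. The sketch only shows that the number of virtual calls drops by the rate factor while the generated samples stay the same.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dataflow module (not the SystemC AMS API).
struct Block {
    virtual ~Block() = default;
    // One activation: fill every element of `out` with a new sample.
    virtual void processing(std::vector<double>& out) = 0;
};

// Phase-accumulating sine source that can emit any number of samples per call.
struct Osc : Block {
    double dphi;                    // phase increment per sample (rad)
    double phase = 0.0;             // accumulated phase (rad)
    unsigned long activations = 0;  // number of virtual processing() calls

    Osc(double freq_hz, double ts_s)
        : dphi(2.0 * std::acos(-1.0) * freq_hz * ts_s) {}

    void processing(std::vector<double>& out) override {
        ++activations;
        const double two_pi = 2.0 * std::acos(-1.0);
        for (double& s : out) {
            s = std::sin(phase);
            phase = std::fmod(phase + dphi, two_pi);
        }
    }
};

// Generate `total` samples, `rate` samples per activation, and return the
// sum of all samples so that runs with different rates can be compared.
double simulate(Block& block, std::size_t total, std::size_t rate) {
    assert(rate > 0 && total % rate == 0);
    std::vector<double> buf(rate);
    double checksum = 0.0;
    for (std::size_t done = 0; done < total; done += rate) {
        block.processing(buf);  // one virtual call per activation
        for (double s : buf) checksum += s;
    }
    return checksum;
}
```

Running simulate with rate 1 and with rate 100 over the same number of samples gives the same checksum, but the second run makes 100 times fewer virtual calls. Whether this translates into a similar gain in SystemC AMS depends on how much of the per-activation cost sits in the scheduler rather than in processing() itself.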

 

Please note that your current benchmark does not represent the complexity of a typical SystemC AMS virtual prototype, which usually also interfaces with modules expressed in other Models of Computation (MoC). Synchronization with other MoCs comes with some performance penalty, which pure dataflow simulators don't have. This may explain the observed difference in performance with respect to CppSim. However, this is just a guess on my side, as I don't have any experience with CppSim.


Thanks again for your reply Torsten.

 

You are correct, my posted example is extremely simple!

The motivation behind my post was that I'm seeing similar results with much bigger systems, such as the PLL mentioned above.

The problem with the bigger systems is doing an apples-to-apples comparison, since I haven't rigorously implemented exactly the same functionality in the same blocks in SystemC AMS and CppSim (I've added additional complexity to the SystemC AMS modules). However, for what it's worth, for a 100 µs run with a 100 ps time step I get

real	0m28.829s
user	0m26.767s
sys	0m1.797s

for SystemC AMS (using the O3 optimisation)

real	0m19.839s
user	0m19.759s
sys	0m0.004s

for SystemC AMS (-O3) with no I/O (which isn't particularly useful, but gives a closer comparison of raw performance)

and

real	0m7.224s
user	0m7.213s
sys	0m0.012s

for CppSim (with I/O on)

 

The difference may not seem like much, but this run covers only a fraction of the required total simulation time, particularly when generating modulation on the VCO.

 

I'm going to think about how to implement your multirate suggestion.

 

regards

 

trev


Perhaps you can run a profiling tool (e.g. gprof) to examine where most of the simulation time is spent. My expectation is that, with such a simple example, you are primarily looking at the performance of the SystemC simulation kernel, not so much the AMS extensions. Profiling should give some insight into this.
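
As a complement to a sampling profiler, coarse regions of a testbench (for example elaboration versus the whole simulation run) can also be timed by hand. The following is a minimal sketch using std::chrono; ScopedTimer, RegionStats and seconds are hypothetical helpers written for this illustration and are not part of SystemC, SystemC AMS or gprof. It is only suitable for coarse regions: wrapping a processing() call that takes a few nanoseconds would mostly measure the timer itself.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulated wall-clock time and call count for one labelled region.
struct RegionStats {
    std::chrono::steady_clock::duration total{};
    unsigned long calls = 0;
};

// Measures the lifetime of the object and adds it to stats[label].
class ScopedTimer {
public:
    ScopedTimer(std::map<std::string, RegionStats>& stats, const std::string& label)
        : entry_(stats[label]), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        assert(end >= start_);  // steady_clock never goes backwards
        entry_.total += end - start_;
        ++entry_.calls;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    RegionStats& entry_;  // references into std::map stay valid on insertion
    std::chrono::steady_clock::time_point start_;
};

// Convert the accumulated duration of a region to seconds.
double seconds(const RegionStats& s) {
    return std::chrono::duration<double>(s.total).count();
}
```

A typical use would be to open a block scope around the call that starts the simulation, create a ScopedTimer with a label such as "run" at the top of that scope, and print seconds(stats["run"]) afterwards.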


   I did run the example I posted using gprof:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
 30.00      0.03     0.03                             sca_core::sca_implementation::sca_synchronization_layer_process::wait_for_next_start()
 20.00      0.05     0.02                             sca_core::sca_implementation::sca_solver_base::get_current_period()
 10.00      0.06     0.01  2000000     5.00     5.00  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double)
 10.00      0.07     0.01  1000000    10.00    25.00  vco::processing()
 10.00      0.08     0.01  1000000    10.00    10.00  sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const
 10.00      0.09     0.01                             sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run()
 10.00      0.10     0.01                             sca_core::sca_implementation::sca_synchronization_layer_process::cluster_process()
  0.00      0.10     0.00  2000000     0.00     5.00  sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double)
		     Call graph (explanation follows)

granularity: each sample hit covers 2 byte(s) for 10.00% of 0.10 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     40.0    0.01    0.03                 sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [1]
                0.01    0.02 1000000/1000000     vco::processing() [3]
                0.00    0.01 1000000/2000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [5]
                0.00    0.00 1000000/1000000     con::processing() [69]
-----------------------------------------------
                                                 <spontaneous>
[2]     30.0    0.03    0.00                 sca_core::sca_implementation::sca_synchronization_layer_process::wait_for_next_start() [2]
-----------------------------------------------
                0.01    0.02 1000000/1000000     sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [1]
[3]     25.0    0.01    0.02 1000000         vco::processing() [3]
                0.01    0.00 1000000/1000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const [7]
                0.00    0.01 1000000/2000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [5]
-----------------------------------------------
                                                 <spontaneous>
[4]     20.0    0.02    0.00                 sca_core::sca_implementation::sca_solver_base::get_current_period() [4]
-----------------------------------------------
                0.00    0.01 1000000/2000000     vco::processing() [3]
                0.00    0.01 1000000/2000000     sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [1]
[5]     10.0    0.00    0.01 2000000         sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [5]
                0.01    0.00 2000000/2000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double) [6]
-----------------------------------------------
                0.01    0.00 2000000/2000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [5]
[6]     10.0    0.01    0.00 2000000         sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double) [6]
-----------------------------------------------
                0.01    0.00 1000000/1000000     vco::processing() [3]
[7]     10.0    0.01    0.00 1000000         sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const [7]
                0.00    0.00 1000000/1000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long, unsigned long) const [70]
-----------------------------------------------
                                                 <spontaneous>
[8]     10.0    0.01    0.00                 sca_core::sca_implementation::sca_synchronization_layer_process::cluster_process() [8]
-----------------------------------------------

I haven't been able to spend the time to look at the Fraunhofer source code, though, to work out exactly what the classes taking all the CPU cycles are supposed to do.

I will recompile the larger example with the profiler on and see if there are any significant differences.


Profiling on the larger example:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
  6.20      0.67     0.67                             sys_rand::mrand()
  5.55      1.26     0.60                             sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long, unsigned long)
  4.99      1.80     0.54                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::setup_equation_system()
  4.94      2.33     0.53                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd_common(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time const&)
  4.29      2.79     0.46                             sca_tdf::sca_implementation::sca_port_attributes::get_time_internal(unsigned long) const
  3.82      3.20     0.41                             sys_rand::gasdev()
  3.68      3.59     0.40                             MA_LequSparseCodegen
  3.50      3.97     0.38                             sca_util::sca_implementation::sca_matrix_base<double>::operator=(sca_util::sca_implementation::sca_matrix_base<double> const&)
  2.99      4.29     0.32                             sca_util::sca_implementation::sca_matrix_base<sca_util::sca_vector<double> >::operator()(long)
  2.89      4.60     0.31                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::initialize()
  2.89      4.91     0.31                             sparse_get_value
  2.66      5.19     0.29                             sca_util::sca_implementation::sca_matrix_base<double>::operator()(long) const
  2.43      5.45     0.26                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::coeff_changed(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&)
  2.24      5.69     0.24                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate(double)
  2.15      5.92     0.23                             MA_GenerateSumMatrixWeighted
  2.15      6.15     0.23                             sparse_resize
  1.77      6.34     0.19                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::convert_to_double()
  1.49      6.50     0.16                             sca_tdf::sca_ct_proxy::to_double() const
  1.40      6.65     0.15  4000000    37.50    37.50  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double)
  1.40      6.80     0.15                             sca_tdf::sca_ct_proxy::operator double() const
  1.40      6.95     0.15                             sparse_get_value_ref
  1.35      7.10     0.15                             sca_util::sca_implementation::sca_matrix_base<double>::operator()(long, long)
  1.12      7.22     0.12                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_tdf::sca_de::sca_in<double> const&, double)
  1.07      7.33     0.12                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, double, double, sc_core::sc_time)
  1.03      7.44     0.11                             vco_1f::processing()
  1.03      7.55     0.11                             MA_GenerateProductValueSparse
  1.03      7.66     0.11                             MA_SortSparseColumms
  0.89      7.76     0.10                             sca_util::sca_implementation::sca_matrix_base<double>::operator[](unsigned long) const
  0.84      7.85     0.09                             MA_FreeSparse
  0.84      7.94     0.09                             sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run()
  0.75      8.02     0.08                             cp_1f::processing()
  0.75      8.10     0.08                             ana_solv
  0.70      8.17     0.08                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_util::sca_vector<double>&, double, double, sc_core::sc_time)
  0.70      8.25     0.08                             sca_util::sca_implementation::sca_matrix_base_typeless::sca_matrix_base_typeless(sca_util::sca_implementation::sca_matrix_base_typeless const&)
  0.65      8.32     0.07                             sca_util::sca_implementation::sca_matrix_base<double>::set_sparse_mode()
  0.65      8.39     0.07                             sca_tdf::sca_implementation::sca_ct_delay_buffer<double>::~sca_ct_delay_buffer()
  0.65      8.46     0.07                             sca_core::sca_implementation::sca_solver_base::get_current_time()
  0.65      8.53     0.07                             sca_util::sca_implementation::sca_matrix_base<double>::get_ref_for_write(sparse_matrix*, long, long)
  0.65      8.60     0.07                             sca_core::sca_implementation::sca_signed_time::sca_signed_time(sca_core::sca_implementation::sca_signed_time const&)
  0.56      8.66     0.06                             MA_ProductSparseVector
  0.47      8.71     0.05  4000000    12.50    50.00  sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double)
  0.47      8.76     0.05  2000000    25.00    25.00  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long) const
  0.47      8.81     0.05                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::convert_to_sca_port(sca_tdf::sca_out_base<double>&)
  0.47      8.86     0.05                             sca_core::sca_implementation::sca_solver_base::write_sc_value(sc_core::sc_time, sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&)
  0.47      8.91     0.05                             sca_util::sca_implementation::sca_matrix_base<double>::remove()
  0.47      8.96     0.05                             sca_util::sca_implementation::sca_matrix_base<double>::get_flat()
  0.47      9.01     0.05                             std::valarray<sca_util::sca_vector<double> >::operator[](unsigned long)
  0.47      9.06     0.05                             tocodedec
  0.42      9.10     0.05                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::initialize_equation_system(int, double)
  0.42      9.15     0.05  4000000    11.25    11.25  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long, unsigned long) const
  0.42      9.19     0.05                             sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long)
  0.37      9.23     0.04                             sc_core::sc_spawn_object<sc_boost::_bi::bind_t<void, sc_boost::_mfi::mf1<void, sca_core::sca_implementation::sca_solver_base, int>, sc_boost::_bi::list2<sc_boost::_bi::value<sca_core::sca_implementation::sca_solver_base*>, sc_boost::_bi::value<long> > > >::~sc_spawn_object()
  0.37      9.27     0.04                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate_timeinterval(unsigned long&, long&, long, sca_core::sca_implementation::sca_signed_time&, sca_core::sca_implementation::sca_signed_time&, sc_core::sc_time&, sc_core::sc_time&)
  0.37      9.31     0.04                             sca_core::sca_implementation::sca_solver_base::get_current_period()
  0.37      9.35     0.04                             sca_core::sca_implementation::sca_solver_base::get_sc_value_on_time(sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&)
  0.37      9.39     0.04                             sca_core::sca_implementation::sca_synchronization_layer_process::cluster_process()
  0.37      9.43     0.04                             sca_util::sca_vector<sca_util::sca_vector<double> >::operator()(unsigned long)
  0.37      9.47     0.04                             sca_core::sca_module::get_timestep() const
  0.33      9.51     0.04                             MA_LequSparseSolut
  0.33      9.54     0.04  1000000    35.00    60.00  sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double, unsigned long)
  0.33      9.58     0.04                             sca_util::sca_implementation::sca_matrix_base_typeless::dimx() const
  0.28      9.61     0.03                             MA_ConvertFullToSparse
  0.28      9.64     0.03                             vco_1f::initialize()
  0.28      9.67     0.03                             sca_tdf::sca_ltf_nd::calculate(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, double, double, sc_core::sc_time const&)
  0.28      9.70     0.03                             sca_tdf::sca_de::sca_in<bool>::read(unsigned long)
  0.28      9.73     0.03                             sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::operator=(double const&)
  0.28      9.76     0.03                             sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const
  0.28      9.79     0.03                             sca_core::sca_implementation::sca_port_base::get_if_id() const
  0.28      9.82     0.03                             sca_util::sca_implementation::sca_matrix_base<double>::write_pending() const
  0.28      9.85     0.03                             ana_init_sparse
  0.23      9.87     0.03                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::get_in_value_by_index(unsigned long)
  0.23      9.90     0.03                             sca_util::sca_implementation::sca_matrix_base<double>::sca_matrix_base(sca_util::sca_implementation::sca_matrix_base<double> const&)
  0.23      9.92     0.03                             sca_util::sca_implementation::sca_matrix_base<double>::operator[](unsigned long)
  0.19      9.94     0.02                             MA_CopySparse
  0.19      9.96     0.02                             MA_InitSparse
  0.19      9.98     0.02                             MA_SumMatrixWeighted
  0.19     10.00     0.02                             sc_dt::uint64_to_double(unsigned long long)
  0.19     10.02     0.02                             sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::set_timestep(double, sc_core::sc_time_unit)
  0.19     10.04     0.02                             sca_tdf::sca_implementation::sca_ct_delay_buffer<double>::~sca_ct_delay_buffer()
  0.19     10.06     0.02                             sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::set_timestep(double, sc_core::sc_time_unit)
  0.19     10.08     0.02                             sca_core::sca_module::get_sync_domain()
  0.19     10.10     0.02                             sca_core::sca_implementation::sca_solver_base::sc_write_value_process(int)
  0.19     10.12     0.02                             sca_core::sca_implementation::sca_synchronization_layer_process::sca_synchronization_layer_process(sca_core::sca_implementation::sca_synchronization_alg::sca_cluster_objT*)
  0.19     10.14     0.02                             sca_util::sca_implementation::sca_matrix_base<sca_util::sca_vector<double> >::resize(unsigned long, unsigned long)
  0.19     10.16     0.02                             sca_tdf::sca_implementation::sca_port_attributes::get_rate_internal() const
  0.19     10.18     0.02                             sca_tdf::sca_implementation::sca_port_attributes::get_time(unsigned long) const
  0.19     10.20     0.02                             sca_tdf::sca_implementation::sca_tdf_signal_impl_base::get_timestep_calculated_ref(unsigned long) const
  0.19     10.22     0.02                             sca_core::sca_implementation::sca_signed_time::operator>(sc_core::sc_time const&) const
  0.14     10.24     0.02  4858913     3.09     3.09  sca_core::sca_implementation::sca_sync_value_handle<bool>::read_tmp()
  0.14     10.25     0.02  1000000    15.00    15.00  sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::write(double const&, unsigned long)
  0.14     10.27     0.02  1000000    15.00    15.00  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::get_ref_for_write(unsigned long, unsigned long) const
  0.14     10.28     0.02                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::set_max_delay(sc_core::sc_time)
  0.14     10.30     0.02                             sca_util::sca_implementation::sca_matrix_base<double>::sca_matrix_base(unsigned long, unsigned long, bool)
  0.14     10.31     0.02                             sca_util::sca_implementation::sca_matrix_base_typeless::reset_access_flag()
  0.14     10.33     0.02                             sc_core::sc_in<unsigned int>::operator unsigned int const&() const
  0.09     10.34     0.01  1000000    10.00    25.00  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double, unsigned long)
  0.09     10.35     0.01  1000000    10.00    10.00  sca_tdf::sca_de::sca_out<bool>::write_sc_signal()
  0.09     10.36     0.01  1000000    10.00    10.00  constant::processing()
  0.09     10.37     0.01  1000000    10.00    10.00  sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_timestep(unsigned long) const
  0.09     10.38     0.01  1000000    10.00    10.00  sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_rate() const
  0.09     10.39     0.01                             MA_ReallocSparse
  0.09     10.40     0.01                             MA_SortSparseList
  0.09     10.41     0.01                             ps_divider::ps_div_val()
  0.09     10.42     0.01                             ps_divider::ps_div_core()
  0.09     10.43     0.01                             sc_core::sc_time::operator*=(double)
  0.09     10.44     0.01                             sc_core::sc_module::wait(sc_core::sc_time const&)
  0.09     10.45     0.01                             sc_core::sc_signal<bool, (sc_core::sc_writer_policy)0>::operator=(sc_core::sc_signal<bool, (sc_core::sc_writer_policy)0> const&)
  0.09     10.46     0.01                             sc_core::operator+(sc_core::sc_time const&, sc_core::sc_time const&)
  0.09     10.47     0.01                             sca_tdf::sca_module::register_post_method(void (sca_tdf::sca_module::*)())
  0.09     10.48     0.01                             sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::start_of_simulation()
  0.09     10.49     0.01                             sca_tdf::sca_de::sca_out<bool>::write(bool const&, unsigned long)
  0.09     10.50     0.01                             sca_core::sca_module::is_initialize_executing()
  0.09     10.51     0.01                             sca_core::sca_module::is_processing_executing()
  0.09     10.52     0.01                             sca_core::sca_module::is_change_attributes_executing()
  0.09     10.53     0.01                             sca_core::sca_module::elaborate()
  0.09     10.54     0.01                             sca_core::sca_implementation::sca_port_base::get_sc_value_on_time(sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&)
  0.09     10.55     0.01                             sca_core::sca_implementation::sca_port_base::register_sca_schedule(sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&)
  0.09     10.56     0.01                             sca_core::sca_implementation::sca_signed_time::sca_signed_time(sc_core::sc_time const&)
  0.09     10.57     0.01                             sca_core::sca_implementation::sca_solver_base::add_solver_trace(sca_util::sca_implementation::sca_trace_object_data&)
  0.09     10.58     0.01                             sca_core::sca_implementation::NOT_VALID_SCA_TIME()
  0.09     10.59     0.01                             sca_core::sca_implementation::sca_sync_value_handle<bool>::write_tmp(bool)
  0.09     10.60     0.01                             sca_core::sca_implementation::sca_synchronization_layer_process::wait_for_next_start()
  0.09     10.61     0.01                             loop_filt::processing()
  0.09     10.62     0.01                             sc_core::sc_object::simcontext() const
  0.09     10.63     0.01                             sca_tdf::sca_implementation::sca_port_attributes::get_delay_internal() const
  0.09     10.64     0.01                             sca_tdf::sca_de::sca_in<bool>::is_delay_changed() const
  0.09     10.65     0.01                             sca_tdf::sca_de::sca_in<bool>::get_rate() const
  0.09     10.66     0.01                             sca_tdf::sca_in<double>::read(unsigned long) const
  0.09     10.67     0.01                             sca_core::sca_module::get_max_timestep() const
  0.09     10.68     0.01                             sca_tdf::sca_signal_if<double>** std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<sca_tdf::sca_signal_if<double>*>(sca_tdf::sca_signal_if<double>* const*, sca_tdf::sca_signal_if<double>* const*, sca_tdf::sca_signal_if<double>**)
  0.09     10.69     0.01                             sparse_write_value
  0.05     10.69     0.01                             sca_tdf::sca_implementation::sca_ct_delay_buffer<double>::get_value(sca_core::sca_implementation::sca_signed_time, double&)
  0.05     10.70     0.01                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_util::sca_vector<double>&, sca_util::sca_vector<double> const&, double, sc_core::sc_time)
  0.05     10.70     0.01                             sca_core::sca_implementation::sca_sync_value_handle_base::get_index()
  0.05     10.71     0.01                             sca_core::sca_implementation::sca_sync_value_handle_base::set_index(long)
  0.05     10.71     0.01                             sca_util::sca_implementation::sca_matrix_base<double>::get_sparse_matrix()
  0.05     10.72     0.01                             sca_util::sca_implementation::sca_matrix_base_typeless::set_ignore_negative()
  0.05     10.72     0.01                             non-virtual thunk to sca_core::sca_implementation::sca_port_impl<sc_core::sc_signal_in_if<bool> >::sc_get_interface() const

sys_rand::mrand() and sys_rand::gasdev() are from my own code for generating random variables: mrand is a Mersenne Twister and gasdev generates a Gaussian distribution. vco_1f and cp_1f are VCO and charge-pump blocks with flicker noise. I had been working on improving these classes and blocks since I suspected that their implementation was suboptimal, and this proves it!
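
One option for the random number side, assuming a C++11 compiler is available, is to lean on the standard library's Mersenne Twister and normal distribution instead of hand-written versions. The sketch below is not the sys_rand class from the profile; GaussianSource and sample_mean are hypothetical names, and it only covers white Gaussian samples, not the flicker-noise shaping done in vco_1f and cp_1f. Whether it is actually faster than the existing code would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical wrapper around the standard Mersenne Twister engine and
// normal distribution; one instance per noise source keeps streams independent.
class GaussianSource {
public:
    explicit GaussianSource(std::uint32_t seed, double sigma = 1.0)
        : engine_(seed), dist_(0.0, sigma) {}

    // Draw the next zero-mean Gaussian sample.
    double next() { return dist_(engine_); }

private:
    std::mt19937 engine_;
    std::normal_distribution<double> dist_;
};

// Average of n draws, useful as a quick sanity check of the source.
double sample_mean(GaussianSource& src, std::size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += src.next();
    return sum / static_cast<double>(n);
}
```

Seeding two sources with the same value gives identical sequences within one build, which keeps simulation runs repeatable when comparing against another simulator.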

		     Call graph (explanation follows)


granularity: each sample hit covers 2 byte(s) for 0.09% of 10.72 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]      6.2    0.67    0.00                 sys_rand::mrand() [1]
-----------------------------------------------
                                                 <spontaneous>
[2]      5.6    0.60    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long, unsigned long) [2]
-----------------------------------------------
                                                 <spontaneous>
[3]      5.0    0.54    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::setup_equation_system() [3]
-----------------------------------------------
                                                 <spontaneous>
[4]      4.9    0.53    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd_common(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time const&) [4]
-----------------------------------------------
                                                 <spontaneous>
[5]      4.3    0.46    0.00                 sca_tdf::sca_implementation::sca_port_attributes::get_time_internal(unsigned long) const [5]
-----------------------------------------------
                                                 <spontaneous>
[6]      3.8    0.41    0.00                 sys_rand::gasdev() [6]
-----------------------------------------------
                                                 <spontaneous>
[7]      3.7    0.40    0.00                 MA_LequSparseCodegen [7]
-----------------------------------------------
                                                 <spontaneous>
[8]      3.5    0.38    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::operator=(sca_util::sca_implementation::sca_matrix_base<double> const&) [8]
-----------------------------------------------
                                                 <spontaneous>
[9]      3.0    0.32    0.00                 sca_util::sca_implementation::sca_matrix_base<sca_util::sca_vector<double> >::operator()(long) [9]
-----------------------------------------------
                                                 <spontaneous>
[10]     2.9    0.31    0.00                 sparse_get_value [10]
-----------------------------------------------
                                                 <spontaneous>
[11]     2.9    0.31    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::initialize() [11]
-----------------------------------------------
                                                 <spontaneous>
[12]     2.7    0.29    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::operator()(long) const [12]
-----------------------------------------------
                                                 <spontaneous>
[13]     2.4    0.26    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::coeff_changed(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&) [13]
-----------------------------------------------
                                                 <spontaneous>
[14]     2.2    0.24    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate(double) [14]
-----------------------------------------------
                                                 <spontaneous>
[15]     2.1    0.23    0.00                 MA_GenerateSumMatrixWeighted [15]
-----------------------------------------------
                                                 <spontaneous>
[16]     2.1    0.23    0.00                 sparse_resize [16]
-----------------------------------------------
                0.01    0.04 1000000/4000000     sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [23]
                0.04    0.11 3000000/4000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::operator=(double const&) [19]
[17]     1.9    0.05    0.15 4000000         sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [17]
                0.15    0.00 4000000/4000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double) [22]
-----------------------------------------------
                                                 <spontaneous>
[18]     1.8    0.19    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::convert_to_double() [18]
-----------------------------------------------
                                                 <spontaneous>
[19]     1.7    0.03    0.15                 sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::operator=(double const&) [19]
                0.04    0.11 3000000/4000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [17]
-----------------------------------------------
                                                 <spontaneous>
[20]     1.5    0.16    0.00                 sca_tdf::sca_ct_proxy::to_double() const [20]
-----------------------------------------------
                                                 <spontaneous>
[21]     1.4    0.15    0.00                 sparse_get_value_ref [21]
-----------------------------------------------
                0.15    0.00 4000000/4000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [17]
[22]     1.4    0.15    0.00 4000000         sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double) [22]
-----------------------------------------------
                                                 <spontaneous>
[23]     1.4    0.09    0.06                 sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [23]
                0.01    0.04 1000000/4000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [17]
                0.01    0.00 1000000/1000000     constant::processing() [102]
-----------------------------------------------
                                                 <spontaneous>
[24]     1.4    0.15    0.00                 sca_tdf::sca_ct_proxy::operator double() const [24]
-----------------------------------------------
                                                 <spontaneous>
[25]     1.4    0.15    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::operator()(long, long) [25]
-----------------------------------------------
                                                 <spontaneous>
[26]     1.1    0.12    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_tdf::sca_de::sca_in<double> const&, double) [26]
-----------------------------------------------
                                                 <spontaneous>
[27]     1.1    0.12    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, double, double, sc_core::sc_time) [27]
-----------------------------------------------
                                                 <spontaneous>
[28]     1.1    0.04    0.08                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate_timeinterval(unsigned long&, long&, long, sca_core::sca_implementation::sca_signed_time&, sca_core::sca_implementation::sca_signed_time&, sc_core::sc_time&, sc_core::sc_time&) [28]
                0.04    0.03 1000000/1000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double, unsigned long) [46]
                0.02    0.00 1000000/1000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::write(double const&, unsigned long) [92]
-----------------------------------------------
                                                 <spontaneous>
[29]     1.0    0.11    0.00                 vco_1f::processing() [29]
-----------------------------------------------
                                                 <spontaneous>
[30]     1.0    0.11    0.00                 MA_GenerateProductValueSparse [30]
-----------------------------------------------
                                                 <spontaneous>
[31]     1.0    0.11    0.00                 MA_SortSparseColumms [31]
-----------------------------------------------
                                                 <spontaneous>
[32]     0.9    0.10    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::operator[](unsigned long) const [32]
-----------------------------------------------
                                                 <spontaneous>
[33]     0.8    0.09    0.00                 MA_FreeSparse [33]
-----------------------------------------------
                                                 <spontaneous>
[34]     0.7    0.08    0.00                 ana_solv [34]
-----------------------------------------------
                                                 <spontaneous>
[35]     0.7    0.08    0.00                 cp_1f::processing() [35]
-----------------------------------------------
                                                 <spontaneous>
[36]     0.7    0.08    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_util::sca_vector<double>&, double, double, sc_core::sc_time) [36]
-----------------------------------------------
                                                 <spontaneous>
[37]     0.7    0.08    0.00                 sca_util::sca_implementation::sca_matrix_base_typeless::sca_matrix_base_typeless(sca_util::sca_implementation::sca_matrix_base_typeless const&) [37]
-----------------------------------------------
                                                 <spontaneous>
[38]     0.7    0.03    0.05                 sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const [38]
                0.05    0.00 4000000/4000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long, unsigned long) const [56]
-----------------------------------------------
                                                 <spontaneous>
[39]     0.7    0.05    0.02                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::convert_to_sca_port(sca_tdf::sca_out_base<double>&) [39]
                0.01    0.00 1000000/1000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_timestep(unsigned long) const [103]
                0.01    0.00 1000000/1000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_rate() const [104]
                0.00    0.00 1000000/1000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_time(unsigned long) const [183]
-----------------------------------------------
                                                 <spontaneous>
[40]     0.7    0.07    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::set_sparse_mode() [40]
-----------------------------------------------
                                                 <spontaneous>
[41]     0.7    0.07    0.00                 sca_tdf::sca_implementation::sca_ct_delay_buffer<double>::~sca_ct_delay_buffer() [41]
-----------------------------------------------
                                                 <spontaneous>
[42]     0.7    0.07    0.00                 sca_core::sca_implementation::sca_solver_base::get_current_time() [42]
-----------------------------------------------
                                                 <spontaneous>
[43]     0.7    0.07    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::get_ref_for_write(sparse_matrix*, long, long) [43]
-----------------------------------------------
                                                 <spontaneous>
[44]     0.7    0.07    0.00                 sca_core::sca_implementation::sca_signed_time::sca_signed_time(sca_core::sca_implementation::sca_signed_time const&) [44]
-----------------------------------------------
                                                 <spontaneous>
[45]     0.6    0.06    0.00                 MA_ProductSparseVector [45]
-----------------------------------------------
                0.04    0.03 1000000/1000000     sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate_timeinterval(unsigned long&, long&, long, sca_core::sca_implementation::sca_signed_time&, sca_core::sca_implementation::sca_signed_time&, sc_core::sc_time&, sc_core::sc_time&) [28]
[46]     0.6    0.04    0.03 1000000         sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double, unsigned long) [46]
                0.01    0.02 1000000/1000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double, unsigned long) [73]
-----------------------------------------------
                                                 <spontaneous>
[47]     0.5    0.04    0.02                 sca_core::sca_implementation::sca_solver_base::get_sc_value_on_time(sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&) [47]
                0.02    0.00 4858913/4858913     sca_core::sca_implementation::sca_sync_value_handle<bool>::read_tmp() [91]
-----------------------------------------------
                                                 <spontaneous>
[48]     0.5    0.05    0.00                 tocodedec [48]
-----------------------------------------------
                0.05    0.00 2000000/2000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read() const [53]
[49]     0.5    0.05    0.00 2000000         sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long) const [49]
-----------------------------------------------
                                                 <spontaneous>
[50]     0.5    0.05    0.00                 sca_core::sca_implementation::sca_solver_base::write_sc_value(sc_core::sc_time, sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&) [50]
                0.00    0.00 1000000/1000000     sca_core::sca_implementation::sca_sync_value_handle<bool>::store_tmp() [182]
                0.00    0.00       1/5           sca_core::sca_implementation::sca_sync_value_handle<bool>::resize(int) [205]
                0.00    0.00       1/1           sca_core::sca_implementation::sca_sync_value_handle<bool>::backup_tmp() [224]
                0.00    0.00       1/1           sca_core::sca_implementation::sca_sync_value_handle<bool>::restore_tmp() [225]
-----------------------------------------------
                                                 <spontaneous>
[51]     0.5    0.05    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::remove() [51]
-----------------------------------------------
                                                 <spontaneous>
[52]     0.5    0.05    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::get_flat() [52]
-----------------------------------------------
                                                 <spontaneous>
[53]     0.5    0.00    0.05                 sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read() const [53]
                0.05    0.00 2000000/2000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long) const [49]
-----------------------------------------------
                                                 <spontaneous>
[54]     0.5    0.05    0.00                 std::valarray<sca_util::sca_vector<double> >::operator[](unsigned long) [54]
-----------------------------------------------
                                                 <spontaneous>
[55]     0.4    0.05    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::initialize_equation_system(int, double) [55]
-----------------------------------------------
                0.05    0.00 4000000/4000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const [38]
[56]     0.4    0.05    0.00 4000000         sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long, unsigned long) const [56]
-----------------------------------------------
                                                 <spontaneous>
[57]     0.4    0.05    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long) [57]

BTW: This is just the top few lines from the profiler!

Curious to know what others have to say.

My take on this (something I've suspected for some time now, and which the profile results would seem to back up) is that the matrix solving isn't very efficient in the Fraunhofer implementation.


Hello Sir,

I am a little curious as to why you consider the simulation speed to be so important. SystemC-AMS is only for high-level behavioral modelling, so in the end, speed of simulation really does not matter. Rather, what matters is whether the simulation results match what you expect to see.

What machine are you running your simulation on, and what compiler was used to compile CppSim? Please note that proprietary Intel C++ compilers optimize the heck out of every program to be compiled, and the execution time of programs compiled with Intel compilers is very fast compared to that of those compiled with gcc.

 


Hi Dakupoto,

I'm not really sure why you would not consider simulation speed to be important.

I can only refer to my own experience using SystemC AMS for virtual prototyping of a real system to be implemented. The advantage of virtual prototyping is that a number of iterations, architectural changes, detail-level refinements etc. can be applied to the model before arriving at an adequate solution that meets the project specification. The longer the virtual prototype takes to execute, the less time can be spent iterating on and refining the design. Time is a finite commodity!

The results above are all for the same machine: an Intel i5-3570 3.4 GHz CPU (admittedly not the fastest in existence, but no sloth either) with 8 GB RAM, running Linux and using the standard g++ compiler, version 4.8.4.
BTW, the CppSim results are for a straight compilation with no optimisation, whereas the SystemC AMS model has been compiled with the -O3 option. I can't comment (or rather won't!) on the Intel compiler versus g++ versus LLVM or whatever.

best regards

trev


Hello all,

I had a quick look at the profiler results.

 

For the large example, these lines:

5.55 1.26 0.60 sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long, unsigned long)
4.99 1.80 0.54 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::setup_equation_system()
4.94 2.33 0.53 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd_common(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time const&)

indicate that the model can probably be improved. It seems that the ltf object(s) detect a coefficient change, possibly in every timestep, which results in a re-initialization of the equation system. I recommend initializing the coefficients (e.g. the num/den vectors) in the initialize callback and not touching them in the processing callback. If you switch between a restricted number of coefficient sets, use a separate ltf object for each set and switch between the objects (you can use the same state vector to hold the states). This prevents the time-consuming re-initialization of the equation system.

In the current version, the change detection is done simply by detecting a write access to the vector/matrix. This is improved in the next version, in which we check whether the value has really changed.
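To illustrate why one ltf object per coefficient set avoids the repeated setup, here is a toy model in plain C++. It is not the SystemC AMS implementation: ToyLtf, its pointer-based change check, and both helper functions are hypothetical, and the "setup" is only a counter standing in for the equation-system re-initialization described above.

```cpp
#include <cstddef>
#include <vector>

using Coeffs = std::vector<double>;

// Toy stand-in for an ltf object. It only counts how often a (hypothetical)
// expensive setup step would run; it does not solve anything.
struct ToyLtf {
    const Coeffs* last_num = nullptr;  // coefficient vectors seen on the previous call
    const Coeffs* last_den = nullptr;
    std::size_t setups = 0;            // simulated equation-system rebuilds

    double operator()(const Coeffs& num, const Coeffs& den, double in) {
        // Assumption: a coefficient set that differs from the one used on
        // the previous call forces a full re-setup.
        if (&num != last_num || &den != last_den) {
            ++setups;
            last_num = &num;
            last_den = &den;
        }
        return in * num[0] / den[0];  // placeholder arithmetic, not a real filter
    }
};

// One shared object called with every coefficient set in each timestep.
std::size_t setups_with_shared_object(std::size_t n_sets, std::size_t n_steps) {
    std::vector<Coeffs> num(n_sets, Coeffs{1.0}), den(n_sets, Coeffs{1.0});
    ToyLtf ltf;
    for (std::size_t t = 0; t < n_steps; ++t) {
        for (std::size_t i = 0; i < n_sets; ++i) {
            ltf(num[i], den[i], 0.5);
        }
    }
    return ltf.setups;
}

// One object per coefficient set: each object only ever sees its own set.
std::size_t setups_with_separate_objects(std::size_t n_sets, std::size_t n_steps) {
    std::vector<Coeffs> num(n_sets, Coeffs{1.0}), den(n_sets, Coeffs{1.0});
    std::vector<ToyLtf> ltf(n_sets);
    for (std::size_t t = 0; t < n_steps; ++t) {
        for (std::size_t i = 0; i < n_sets; ++i) {
            ltf[i](num[i], den[i], 0.5);
        }
    }
    std::size_t total = 0;
    for (const ToyLtf& f : ltf) {
        total += f.setups;
    }
    return total;
}
```

In this toy, nine coefficient sets over 1000 timesteps give 9000 setups with the shared object and 9 with separate objects. The real library's cost per setup and its exact detection rule will differ, so treat the counts as a picture of the scaling only.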

 

 

For the small example, it looks like most of the time is spent on context switching:

30.00 0.03 0.03 sca_core::sca_implementation::sca_synchronization_layer_process::wait_for_next_start()

 

In the current implementation, a SystemC-AMS cluster is embedded in a SystemC thread. At least at the end of each SystemC-AMS cluster execution, the SystemC-AMS time is synchronized with the SystemC time, so a context switch is required. The performance therefore depends on which SystemC version, with which thread implementation, on which operating system you are using. The fastest is usually the QuickThreads (qt) version under Linux, and the slowest are the pthread versions.

 

 

20.00 0.05 0.02 sca_core::sca_implementation::sca_solver_base::get_current_period()

This may be an optimization problem (depending on your platform). This function performs calculations with sc_time objects (64-bit integer values). I have seen in some examples that these calculations are surprisingly slow on some platforms without the highest optimization level.

 

Best regards

Karsten


Hi Karsten,

 

  Many thanks for the reply.

Regarding your comments on the large example: the code I profiled above has a block something like the one outlined below:

SCA_TDF_MODULE(vco_1f)
{
    sca_tdf::sca_out<double> out;
            :
            :
    sca_tdf::sca_ltf_nd  ltf_nd;
            
    sca_util::sca_vector < sca_util::sca_vector < double > > num, den;
            :
            :
};
void vco_1f::initialize()
{

    num(0)(0)    =    something;
    num(1)(0)    =    something_else;
            :
            :
    num(8)(0)    =    something_else_again;

    den(0)(1)    =     something_den;
    den(1)(1)    =     something_else_den;
            :
            :
    den(8)(1)    =     something_else_again_den;

    for ( int i = 0; i < 9; i++)
        {
            den(i)(0) = 1.0;
        }
}

void vco_1f::processing()
{


    double fnoise;

    
    fnoise = PRNG.generate();
    fnoise = fnoise + ltf_nd(num(0), den(0), PRNG.generate(), 1.0);
            :
            :
    fnoise = fnoise + ltf_nd(num(8), den(8), PRNG.generate(), 1.0);
            :
}

As you say in your message, the input to the ltf_nd changes during the processing step but the coefficients do not; they are defined during initialize. Splitting the single ltf_nd into 9 separate ltf_nd objects (one per coefficient set), however, results in a significant speed improvement!

real    0m8.105s
user    0m8.107s
sys    0m0.000s

(non-optimised), which is much, much closer to the CppSim values.

 

So I take back everything I said about the matrix solving!

 

 

best regards

 

trev
