trev

Members
  • Content Count

    8

About trev

  • Rank
    Member

Profile Information

  • Gender
    Male
  • Location
    France

  1. trev

    Simulation Speed

    Hi Karsten,

    Many thanks for the reply. Regarding your comments on the large example: the code I profiled above has a block something like the one outlined below:

        SCA_TDF_MODULE(vco_1f)
        {
            sca_tdf::sca_out<double> out;
            :
            :
            sca_tdf::sca_ltf_nd ltf_nd;
            sca_util::sca_vector < sca_util::sca_vector < double > > num, den;
            :
            :
        };

        void vco_1f::initialize()
        {
            num(0)(0) = something;
            num(1)(0) = something_else;
            :
            :
            num(8)(0) = something_else_again;
            den(0)(1) = something_den;
            den(1)(1) = something_else_den;
            :
            :
            den(8)(1) = something_else_again_den;
            for ( int i = 0; i < 9; i++)
            {
                den(i)(0) = 1.0;
            }
        }

        void vco_1f::processing()
        {
            double fnoise;
            fnoise = PRNG.generate();
            fnoise = fnoise + ltf_nd(num(0), den(0), PRNG.generate(), 1.0);
            :
            :
            fnoise = fnoise + ltf_nd(num(8), den(8), PRNG.generate(), 1.0);
            :
        }

    As you say in your message, the input to the ltf_nd changes during the processing step but the coefficients do not - they are defined during initialize. Splitting the ltf_nd into 9 different ltf_nd objects, however, results in a significant speed improvement!

        real 0m8.105s
        user 0m8.107s
        sys  0m0.000s

    (non-optimised), which is much, much closer to the CppSim values. So I take back everything I said about the matrix solving!

    best regards
    trev
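One possible explanation for the speed-up reported in this post is that a single stateful filter object, called with nine different coefficient sets in turn, has to redo its setup on every call, whereas nine separate objects each set up once. The sketch below is a toy model of that idea only. `ToyFilter`, `Coefficients`, `make_branches` and both counting functions are hypothetical names invented for illustration; none of this is the SystemC AMS API or the Fraunhofer implementation, and the arithmetic in `step` is a placeholder rather than a real transfer function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a stateful filter object (NOT sca_tdf::sca_ltf_nd).
// It caches the last coefficient set and counts how often it must "set up"
// again because a call arrived with different coefficients.
class ToyFilter {
public:
    double step(const std::vector<double>& num,
                const std::vector<double>& den,
                double input) {
        if (!configured_ || num != num_ || den != den_) {
            num_ = num;
            den_ = den;
            configured_ = true;
            ++setups_;  // stands in for an expensive re-initialisation
        }
        return input * num_[0] / den_[0];  // placeholder arithmetic only
    }
    std::size_t setups() const { return setups_; }

private:
    std::vector<double> num_, den_;
    bool configured_ = false;
    std::size_t setups_ = 0;
};

struct Coefficients {
    std::vector<double> num, den;
};

// Build n distinct, fixed coefficient sets (values are arbitrary).
std::vector<Coefficients> make_branches(std::size_t n) {
    std::vector<Coefficients> branches;
    for (std::size_t i = 0; i < n; ++i) {
        Coefficients c;
        c.num = {1.0 + static_cast<double>(i)};
        c.den = {1.0, 0.5 + static_cast<double>(i)};
        branches.push_back(c);
    }
    return branches;
}

// One shared object serves every branch: coefficients differ on each call.
std::size_t setups_with_shared_object(std::size_t branches, std::size_t steps) {
    const std::vector<Coefficients> coeffs = make_branches(branches);
    ToyFilter shared;
    double acc = 0.0;
    for (std::size_t s = 0; s < steps; ++s)
        for (const Coefficients& c : coeffs)
            acc += shared.step(c.num, c.den, 1.0);
    (void)acc;
    return shared.setups();
}

// One object per branch: each object only ever sees its own coefficients.
std::size_t setups_with_object_per_branch(std::size_t branches, std::size_t steps) {
    const std::vector<Coefficients> coeffs = make_branches(branches);
    std::vector<ToyFilter> filters(branches);
    double acc = 0.0;
    for (std::size_t s = 0; s < steps; ++s)
        for (std::size_t b = 0; b < branches; ++b)
            acc += filters[b].step(coeffs[b].num, coeffs[b].den, 1.0);
    (void)acc;
    std::size_t total = 0;
    for (const ToyFilter& f : filters) total += f.setups();
    return total;
}
```

In this toy model, nine branches over 1000 steps give 9000 setups for the shared object and 9 for the per-branch objects. Whether the real library behaves this way is an assumption; the profile in the earlier post, with coeff_changed() and setup_equation_system() near the top, is at least consistent with it.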
  2. trev

    Simulation Speed

    Hi Dakupoto,

    I'm not really sure why you would not consider simulation speed to be important. I can only refer to my own experience of using SystemC AMS for virtual prototyping of a real system that is to be implemented. The advantage of virtual prototyping is that a number of iterations, architectural changes, detail-level refinements etc. can be applied to the model before arriving at an adequate solution that meets the project specification. The longer the virtual prototype takes to execute, the less time can be spent iterating on and refining the design. Time being a finite commodity!

    The results above are all for the same machine: an Intel i5-3570 3.4 GHz CPU (admittedly not the fastest in existence, but no sloth either), 8 GB RAM, running Linux and using the standard g++ compiler, version 4.8.4.

    BTW the CppSim results are for a straight compilation with no optimisation, whereas the SystemC AMS code has been compiled with the -O3 option. I can't comment (or rather won't!) on the Intel compiler versus g++ versus llvm or whatever.

    best regards
    trev
  3. trev

    Simulation Speed

    Profiling on the larger example:

    Flat profile:

    Each sample counts as 0.01 seconds.

          %   cumulative   self              self     total
         time   seconds   seconds    calls  ns/call  ns/call  name
          6.20       0.67     0.67                             sys_rand::mrand()
          5.55       1.26     0.60                             sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long, unsigned long)
          4.99       1.80     0.54                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::setup_equation_system()
          4.94       2.33     0.53                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd_common(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time const&)
          4.29       2.79     0.46                             sca_tdf::sca_implementation::sca_port_attributes::get_time_internal(unsigned long) const
          3.82       3.20     0.41                             sys_rand::gasdev()
          3.68       3.59     0.40                             MA_LequSparseCodegen
          3.50       3.97     0.38                             sca_util::sca_implementation::sca_matrix_base<double>::operator=(sca_util::sca_implementation::sca_matrix_base<double> const&)
          2.99       4.29     0.32                             sca_util::sca_implementation::sca_matrix_base<sca_util::sca_vector<double> >::operator()(long)
          2.89       4.60     0.31                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::initialize()
          2.89       4.91     0.31                             sparse_get_value
          2.66       5.19     0.29                             sca_util::sca_implementation::sca_matrix_base<double>::operator()(long) const
          2.43       5.45     0.26                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::coeff_changed(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&)
          2.24       5.69     0.24                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate(double)
          2.15       5.92     0.23                             MA_GenerateSumMatrixWeighted
          2.15       6.15     0.23                             sparse_resize
          1.77       6.34     0.19                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::convert_to_double()
          1.49       6.50     0.16                             sca_tdf::sca_ct_proxy::to_double() const
          1.40       6.65     0.15  4000000    37.50    37.50  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double)
          1.40       6.80     0.15                             sca_tdf::sca_ct_proxy::operator double() const
          1.40       6.95     0.15                             sparse_get_value_ref
          1.35       7.10     0.15                             sca_util::sca_implementation::sca_matrix_base<double>::operator()(long, long)
          1.12       7.22     0.12                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_tdf::sca_de::sca_in<double> const&, double)
          1.07       7.33     0.12                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, double, double, sc_core::sc_time)
          1.03       7.44     0.11                             vco_1f::processing()
          1.03       7.55     0.11                             MA_GenerateProductValueSparse
          1.03       7.66     0.11                             MA_SortSparseColumms
          0.89       7.76     0.10                             sca_util::sca_implementation::sca_matrix_base<double>::operator[](unsigned long) const
          0.84       7.85     0.09                             MA_FreeSparse
          0.84       7.94     0.09                             sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run()
          0.75       8.02     0.08                             cp_1f::processing()
          0.75       8.10     0.08                             ana_solv
          0.70       8.17     0.08                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_util::sca_vector<double>&, double, double, sc_core::sc_time)
          0.70       8.25     0.08                             sca_util::sca_implementation::sca_matrix_base_typeless::sca_matrix_base_typeless(sca_util::sca_implementation::sca_matrix_base_typeless const&)
          0.65       8.32     0.07                             sca_util::sca_implementation::sca_matrix_base<double>::set_sparse_mode()
          0.65       8.39     0.07                             sca_tdf::sca_implementation::sca_ct_delay_buffer<double>::~sca_ct_delay_buffer()
          0.65       8.46     0.07                             sca_core::sca_implementation::sca_solver_base::get_current_time()
          0.65       8.53     0.07                             sca_util::sca_implementation::sca_matrix_base<double>::get_ref_for_write(sparse_matrix*, long, long)
          0.65       8.60     0.07                             sca_core::sca_implementation::sca_signed_time::sca_signed_time(sca_core::sca_implementation::sca_signed_time const&)
          0.56       8.66     0.06                             MA_ProductSparseVector
          0.47       8.71     0.05  4000000    12.50    50.00  sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double)
          0.47       8.76     0.05  2000000    25.00    25.00  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long) const
          0.47       8.81     0.05                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::convert_to_sca_port(sca_tdf::sca_out_base<double>&)
          0.47       8.86     0.05                             sca_core::sca_implementation::sca_solver_base::write_sc_value(sc_core::sc_time, sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&)
          0.47       8.91     0.05                             sca_util::sca_implementation::sca_matrix_base<double>::remove()
          0.47       8.96     0.05                             sca_util::sca_implementation::sca_matrix_base<double>::get_flat()
          0.47       9.01     0.05                             std::valarray<sca_util::sca_vector<double> >::operator[](unsigned long)
          0.47       9.06     0.05                             tocodedec
          0.42       9.10     0.05                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::initialize_equation_system(int, double)
          0.42       9.15     0.05  4000000    11.25    11.25  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long, unsigned long) const
          0.42       9.19     0.05                             sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long)
          0.37       9.23     0.04                             sc_core::sc_spawn_object<sc_boost::_bi::bind_t<void, sc_boost::_mfi::mf1<void, sca_core::sca_implementation::sca_solver_base, int>, sc_boost::_bi::list2<sc_boost::_bi::value<sca_core::sca_implementation::sca_solver_base*>, sc_boost::_bi::value<long> > > >::~sc_spawn_object()
          0.37       9.27     0.04                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate_timeinterval(unsigned long&, long&, long, sca_core::sca_implementation::sca_signed_time&, sca_core::sca_implementation::sca_signed_time&, sc_core::sc_time&, sc_core::sc_time&)
          0.37       9.31     0.04                             sca_core::sca_implementation::sca_solver_base::get_current_period()
          0.37       9.35     0.04                             sca_core::sca_implementation::sca_solver_base::get_sc_value_on_time(sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&)
          0.37       9.39     0.04                             sca_core::sca_implementation::sca_synchronization_layer_process::cluster_process()
          0.37       9.43     0.04                             sca_util::sca_vector<sca_util::sca_vector<double> >::operator()(unsigned long)
          0.37       9.47     0.04                             sca_core::sca_module::get_timestep() const
          0.33       9.51     0.04                             MA_LequSparseSolut
          0.33       9.54     0.04  1000000    35.00    60.00  sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double, unsigned long)
          0.33       9.58     0.04                             sca_util::sca_implementation::sca_matrix_base_typeless::dimx() const
          0.28       9.61     0.03                             MA_ConvertFullToSparse
          0.28       9.64     0.03                             vco_1f::initialize()
          0.28       9.67     0.03                             sca_tdf::sca_ltf_nd::calculate(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, double, double, sc_core::sc_time const&)
          0.28       9.70     0.03                             sca_tdf::sca_de::sca_in<bool>::read(unsigned long)
          0.28       9.73     0.03                             sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::operator=(double const&)
          0.28       9.76     0.03                             sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const
          0.28       9.79     0.03                             sca_core::sca_implementation::sca_port_base::get_if_id() const
          0.28       9.82     0.03                             sca_util::sca_implementation::sca_matrix_base<double>::write_pending() const
          0.28       9.85     0.03                             ana_init_sparse
          0.23       9.87     0.03                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::get_in_value_by_index(unsigned long)
          0.23       9.90     0.03                             sca_util::sca_implementation::sca_matrix_base<double>::sca_matrix_base(sca_util::sca_implementation::sca_matrix_base<double> const&)
          0.23       9.92     0.03                             sca_util::sca_implementation::sca_matrix_base<double>::operator[](unsigned long)
          0.19       9.94     0.02                             MA_CopySparse
          0.19       9.96     0.02                             MA_InitSparse
          0.19       9.98     0.02                             MA_SumMatrixWeighted
          0.19      10.00     0.02                             sc_dt::uint64_to_double(unsigned long long)
          0.19      10.02     0.02                             sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::set_timestep(double, sc_core::sc_time_unit)
          0.19      10.04     0.02                             sca_tdf::sca_implementation::sca_ct_delay_buffer<double>::~sca_ct_delay_buffer()
          0.19      10.06     0.02                             sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::set_timestep(double, sc_core::sc_time_unit)
          0.19      10.08     0.02                             sca_core::sca_module::get_sync_domain()
          0.19      10.10     0.02                             sca_core::sca_implementation::sca_solver_base::sc_write_value_process(int)
          0.19      10.12     0.02                             sca_core::sca_implementation::sca_synchronization_layer_process::sca_synchronization_layer_process(sca_core::sca_implementation::sca_synchronization_alg::sca_cluster_objT*)
          0.19      10.14     0.02                             sca_util::sca_implementation::sca_matrix_base<sca_util::sca_vector<double> >::resize(unsigned long, unsigned long)
          0.19      10.16     0.02                             sca_tdf::sca_implementation::sca_port_attributes::get_rate_internal() const
          0.19      10.18     0.02                             sca_tdf::sca_implementation::sca_port_attributes::get_time(unsigned long) const
          0.19      10.20     0.02                             sca_tdf::sca_implementation::sca_tdf_signal_impl_base::get_timestep_calculated_ref(unsigned long) const
          0.19      10.22     0.02                             sca_core::sca_implementation::sca_signed_time::operator>(sc_core::sc_time const&) const
          0.14      10.24     0.02  4858913     3.09     3.09  sca_core::sca_implementation::sca_sync_value_handle<bool>::read_tmp()
          0.14      10.25     0.02  1000000    15.00    15.00  sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::write(double const&, unsigned long)
          0.14      10.27     0.02  1000000    15.00    15.00  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::get_ref_for_write(unsigned long, unsigned long) const
          0.14      10.28     0.02                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::set_max_delay(sc_core::sc_time)
          0.14      10.30     0.02                             sca_util::sca_implementation::sca_matrix_base<double>::sca_matrix_base(unsigned long, unsigned long, bool)
          0.14      10.31     0.02                             sca_util::sca_implementation::sca_matrix_base_typeless::reset_access_flag()
          0.14      10.33     0.02                             sc_core::sc_in<unsigned int>::operator unsigned int const&() const
          0.09      10.34     0.01  1000000    10.00    25.00  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double, unsigned long)
          0.09      10.35     0.01  1000000    10.00    10.00  sca_tdf::sca_de::sca_out<bool>::write_sc_signal()
          0.09      10.36     0.01  1000000    10.00    10.00  constant::processing()
          0.09      10.37     0.01  1000000    10.00    10.00  sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_timestep(unsigned long) const
          0.09      10.38     0.01  1000000    10.00    10.00  sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_rate() const
          0.09      10.39     0.01                             MA_ReallocSparse
          0.09      10.40     0.01                             MA_SortSparseList
          0.09      10.41     0.01                             ps_divider::ps_div_val()
          0.09      10.42     0.01                             ps_divider::ps_div_core()
          0.09      10.43     0.01                             sc_core::sc_time::operator*=(double)
          0.09      10.44     0.01                             sc_core::sc_module::wait(sc_core::sc_time const&)
          0.09      10.45     0.01                             sc_core::sc_signal<bool, (sc_core::sc_writer_policy)0>::operator=(sc_core::sc_signal<bool, (sc_core::sc_writer_policy)0> const&)
          0.09      10.46     0.01                             sc_core::operator+(sc_core::sc_time const&, sc_core::sc_time const&)
          0.09      10.47     0.01                             sca_tdf::sca_module::register_post_method(void (sca_tdf::sca_module::*)())
          0.09      10.48     0.01                             sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::start_of_simulation()
          0.09      10.49     0.01                             sca_tdf::sca_de::sca_out<bool>::write(bool const&, unsigned long)
          0.09      10.50     0.01                             sca_core::sca_module::is_initialize_executing()
          0.09      10.51     0.01                             sca_core::sca_module::is_processing_executing()
          0.09      10.52     0.01                             sca_core::sca_module::is_change_attributes_executing()
          0.09      10.53     0.01                             sca_core::sca_module::elaborate()
          0.09      10.54     0.01                             sca_core::sca_implementation::sca_port_base::get_sc_value_on_time(sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&)
          0.09      10.55     0.01                             sca_core::sca_implementation::sca_port_base::register_sca_schedule(sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&)
          0.09      10.56     0.01                             sca_core::sca_implementation::sca_signed_time::sca_signed_time(sc_core::sc_time const&)
          0.09      10.57     0.01                             sca_core::sca_implementation::sca_solver_base::add_solver_trace(sca_util::sca_implementation::sca_trace_object_data&)
          0.09      10.58     0.01                             sca_core::sca_implementation::NOT_VALID_SCA_TIME()
          0.09      10.59     0.01                             sca_core::sca_implementation::sca_sync_value_handle<bool>::write_tmp(bool)
          0.09      10.60     0.01                             sca_core::sca_implementation::sca_synchronization_layer_process::wait_for_next_start()
          0.09      10.61     0.01                             loop_filt::processing()
          0.09      10.62     0.01                             sc_core::sc_object::simcontext() const
          0.09      10.63     0.01                             sca_tdf::sca_implementation::sca_port_attributes::get_delay_internal() const
          0.09      10.64     0.01                             sca_tdf::sca_de::sca_in<bool>::is_delay_changed() const
          0.09      10.65     0.01                             sca_tdf::sca_de::sca_in<bool>::get_rate() const
          0.09      10.66     0.01                             sca_tdf::sca_in<double>::read(unsigned long) const
          0.09      10.67     0.01                             sca_core::sca_module::get_max_timestep() const
          0.09      10.68     0.01                             sca_tdf::sca_signal_if<double>** std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<sca_tdf::sca_signal_if<double>*>(sca_tdf::sca_signal_if<double>* const*, sca_tdf::sca_signal_if<double>* const*, sca_tdf::sca_signal_if<double>**)
          0.09      10.69     0.01                             sparse_write_value
          0.05      10.69     0.01                             sca_tdf::sca_implementation::sca_ct_delay_buffer<double>::get_value(sca_core::sca_implementation::sca_signed_time, double&)
          0.05      10.70     0.01                             sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_util::sca_vector<double>&, sca_util::sca_vector<double> const&, double, sc_core::sc_time)
          0.05      10.70     0.01                             sca_core::sca_implementation::sca_sync_value_handle_base::get_index()
          0.05      10.71     0.01                             sca_core::sca_implementation::sca_sync_value_handle_base::set_index(long)
          0.05      10.71     0.01                             sca_util::sca_implementation::sca_matrix_base<double>::get_sparse_matrix()
          0.05      10.72     0.01                             sca_util::sca_implementation::sca_matrix_base_typeless::set_ignore_negative()
          0.05      10.72     0.01                             non-virtual thunk to sca_core::sca_implementation::sca_port_impl<sc_core::sc_signal_in_if<bool> >::sc_get_interface() const

    sys_rand::mrand() and sys_rand::gasdev() are from my own classes for generating random variables: mrand is a Mersenne Twister and gasdev generates a Gaussian distribution. vco_1f and cp_1f are VCO and charge-pump blocks with flicker noise. I had been working on improving these classes and blocks since I suspected that their implementation was suboptimal, and this proves it!

    Call graph (explanation follows)

    granularity: each sample hit covers 2 byte(s) for 0.09% of 10.72 seconds

        index % time    self  children    called     name
                                                         <spontaneous>
        [1]      6.2    0.67    0.00                 sys_rand::mrand() [1]
        -----------------------------------------------
                                                         <spontaneous>
        [2]      5.6    0.60    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long, unsigned long) [2]
        -----------------------------------------------
                                                         <spontaneous>
        [3]      5.0    0.54    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::setup_equation_system() [3]
        -----------------------------------------------
                                                         <spontaneous>
        [4]      4.9    0.53    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd_common(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time const&) [4]
        -----------------------------------------------
                                                         <spontaneous>
        [5]      4.3    0.46    0.00                 sca_tdf::sca_implementation::sca_port_attributes::get_time_internal(unsigned long) const [5]
        -----------------------------------------------
                                                         <spontaneous>
        [6]      3.8    0.41    0.00                 sys_rand::gasdev() [6]
        -----------------------------------------------
                                                         <spontaneous>
        [7]      3.7    0.40    0.00                 MA_LequSparseCodegen [7]
        -----------------------------------------------
                                                         <spontaneous>
        [8]      3.5    0.38    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::operator=(sca_util::sca_implementation::sca_matrix_base<double> const&) [8]
        -----------------------------------------------
                                                         <spontaneous>
        [9]      3.0    0.32    0.00                 sca_util::sca_implementation::sca_matrix_base<sca_util::sca_vector<double> >::operator()(long) [9]
        -----------------------------------------------
                                                         <spontaneous>
        [10]     2.9    0.31    0.00                 sparse_get_value [10]
        -----------------------------------------------
                                                         <spontaneous>
        [11]     2.9    0.31    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::initialize() [11]
        -----------------------------------------------
                                                         <spontaneous>
        [12]     2.7    0.29    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::operator()(long) const [12]
        -----------------------------------------------
                                                         <spontaneous>
        [13]     2.4    0.26    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::coeff_changed(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&) [13]
        -----------------------------------------------
                                                         <spontaneous>
        [14]     2.2    0.24    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate(double) [14]
        -----------------------------------------------
                                                         <spontaneous>
        [15]     2.1    0.23    0.00                 MA_GenerateSumMatrixWeighted [15]
        -----------------------------------------------
                                                         <spontaneous>
        [16]     2.1    0.23    0.00                 sparse_resize [16]
        -----------------------------------------------
                        0.01    0.04 1000000/4000000     sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [23]
                        0.04    0.11 3000000/4000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::operator=(double const&) [19]
        [17]     1.9    0.05    0.15 4000000         sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [17]
                        0.15    0.00 4000000/4000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double) [22]
        -----------------------------------------------
                                                         <spontaneous>
        [18]     1.8    0.19    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::convert_to_double() [18]
        -----------------------------------------------
                                                         <spontaneous>
        [19]     1.7    0.03    0.15                 sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::operator=(double const&) [19]
                        0.04    0.11 3000000/4000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [17]
        -----------------------------------------------
                                                         <spontaneous>
        [20]     1.5    0.16    0.00                 sca_tdf::sca_ct_proxy::to_double() const [20]
        -----------------------------------------------
                                                         <spontaneous>
        [21]     1.4    0.15    0.00                 sparse_get_value_ref [21]
        -----------------------------------------------
                        0.15    0.00 4000000/4000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [17]
        [22]     1.4    0.15    0.00 4000000         sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double) [22]
        -----------------------------------------------
                                                         <spontaneous>
        [23]     1.4    0.09    0.06                 sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [23]
                        0.01    0.04 1000000/4000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [17]
                        0.01    0.00 1000000/1000000     constant::processing() [102]
        -----------------------------------------------
                                                         <spontaneous>
        [24]     1.4    0.15    0.00                 sca_tdf::sca_ct_proxy::operator double() const [24]
        -----------------------------------------------
                                                         <spontaneous>
        [25]     1.4    0.15    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::operator()(long, long) [25]
        -----------------------------------------------
                                                         <spontaneous>
        [26]     1.1    0.12    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_tdf::sca_de::sca_in<double> const&, double) [26]
        -----------------------------------------------
                                                         <spontaneous>
        [27]     1.1    0.12    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, double, double, sc_core::sc_time) [27]
        -----------------------------------------------
                                                         <spontaneous>
        [28]     1.1    0.04    0.08                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate_timeinterval(unsigned long&, long&, long, sca_core::sca_implementation::sca_signed_time&, sca_core::sca_implementation::sca_signed_time&, sc_core::sc_time&, sc_core::sc_time&) [28]
                        0.04    0.03 1000000/1000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double, unsigned long) [46]
                        0.02    0.00 1000000/1000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::write(double const&, unsigned long) [92]
        -----------------------------------------------
                                                         <spontaneous>
        [29]     1.0    0.11    0.00                 vco_1f::processing() [29]
        -----------------------------------------------
                                                         <spontaneous>
        [30]     1.0    0.11    0.00                 MA_GenerateProductValueSparse [30]
        -----------------------------------------------
                                                         <spontaneous>
        [31]     1.0    0.11    0.00                 MA_SortSparseColumms [31]
        -----------------------------------------------
                                                         <spontaneous>
        [32]     0.9    0.10    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::operator[](unsigned long) const [32]
        -----------------------------------------------
                                                         <spontaneous>
        [33]     0.8    0.09    0.00                 MA_FreeSparse [33]
        -----------------------------------------------
                                                         <spontaneous>
        [34]     0.7    0.08    0.00                 ana_solv [34]
        -----------------------------------------------
                                                         <spontaneous>
        [35]     0.7    0.08    0.00                 cp_1f::processing() [35]
        -----------------------------------------------
                                                         <spontaneous>
        [36]     0.7    0.08    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::register_nd(sca_util::sca_vector<double> const&, sca_util::sca_vector<double> const&, sc_core::sc_time, sca_util::sca_vector<double>&, double, double, sc_core::sc_time) [36]
        -----------------------------------------------
                                                         <spontaneous>
        [37]     0.7    0.08    0.00                 sca_util::sca_implementation::sca_matrix_base_typeless::sca_matrix_base_typeless(sca_util::sca_implementation::sca_matrix_base_typeless const&) [37]
        -----------------------------------------------
                                                         <spontaneous>
        [38]     0.7    0.03    0.05                 sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const [38]
                        0.05    0.00 4000000/4000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long, unsigned long) const [56]
        -----------------------------------------------
                                                         <spontaneous>
        [39]     0.7    0.05    0.02                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::convert_to_sca_port(sca_tdf::sca_out_base<double>&) [39]
                        0.01    0.00 1000000/1000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_timestep(unsigned long) const [103]
                        0.01    0.00 1000000/1000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_rate() const [104]
                        0.00    0.00 1000000/1000000     sca_tdf::sca_out<double, (sca_tdf::sca_cut_policy)0, sca_tdf::sca_default_interpolator<double> >::get_time(unsigned long) const [183]
        -----------------------------------------------
                                                         <spontaneous>
        [40]     0.7    0.07    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::set_sparse_mode() [40]
        -----------------------------------------------
                                                         <spontaneous>
        [41]     0.7    0.07    0.00                 sca_tdf::sca_implementation::sca_ct_delay_buffer<double>::~sca_ct_delay_buffer() [41]
        -----------------------------------------------
                                                         <spontaneous>
        [42]     0.7    0.07    0.00                 sca_core::sca_implementation::sca_solver_base::get_current_time() [42]
        -----------------------------------------------
                                                         <spontaneous>
        [43]     0.7    0.07    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::get_ref_for_write(sparse_matrix*, long, long) [43]
        -----------------------------------------------
                                                         <spontaneous>
        [44]     0.7    0.07    0.00                 sca_core::sca_implementation::sca_signed_time::sca_signed_time(sca_core::sca_implementation::sca_signed_time const&) [44]
        -----------------------------------------------
                                                         <spontaneous>
        [45]     0.6    0.06    0.00                 MA_ProductSparseVector [45]
        -----------------------------------------------
                        0.04    0.03 1000000/1000000     sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::calculate_timeinterval(unsigned long&, long&, long, sca_core::sca_implementation::sca_signed_time&, sca_core::sca_implementation::sca_signed_time&, sc_core::sc_time&, sc_core::sc_time&) [28]
        [46]     0.6    0.04    0.03 1000000         sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double, unsigned long) [46]
                        0.01    0.02 1000000/1000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double, unsigned long) [73]
        -----------------------------------------------
                                                         <spontaneous>
        [47]     0.5    0.04    0.02                 sca_core::sca_implementation::sca_solver_base::get_sc_value_on_time(sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&) [47]
                        0.02    0.00 4858913/4858913     sca_core::sca_implementation::sca_sync_value_handle<bool>::read_tmp() [91]
        -----------------------------------------------
                                                         <spontaneous>
        [48]     0.5    0.05    0.00                 tocodedec [48]
        -----------------------------------------------
                        0.05    0.00 2000000/2000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read() const [53]
        [49]     0.5    0.05    0.00 2000000         sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long) const [49]
        -----------------------------------------------
                                                         <spontaneous>
        [50]     0.5    0.05    0.00                 sca_core::sca_implementation::sca_solver_base::write_sc_value(sc_core::sc_time, sc_core::sc_time, sca_core::sca_implementation::sca_sync_value_handle_base&) [50]
                        0.00    0.00 1000000/1000000     sca_core::sca_implementation::sca_sync_value_handle<bool>::store_tmp() [182]
                        0.00    0.00       1/5           sca_core::sca_implementation::sca_sync_value_handle<bool>::resize(int) [205]
                        0.00    0.00       1/1           sca_core::sca_implementation::sca_sync_value_handle<bool>::backup_tmp() [224]
                        0.00    0.00       1/1           sca_core::sca_implementation::sca_sync_value_handle<bool>::restore_tmp() [225]
        -----------------------------------------------
                                                         <spontaneous>
        [51]     0.5    0.05    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::remove() [51]
        -----------------------------------------------
                                                         <spontaneous>
        [52]     0.5    0.05    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::get_flat() [52]
        -----------------------------------------------
                                                         <spontaneous>
        [53]     0.5    0.00    0.05                 sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read() const [53]
                        0.05    0.00 2000000/2000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long) const [49]
        -----------------------------------------------
                                                         <spontaneous>
        [54]     0.5    0.05    0.00                 std::valarray<sca_util::sca_vector<double> >::operator[](unsigned long) [54]
        -----------------------------------------------
                                                         <spontaneous>
        [55]     0.4    0.05    0.00                 sca_tdf::sca_implementation::sca_ct_ltf_nd_proxy::initialize_equation_system(int, double) [55]
        -----------------------------------------------
                        0.05    0.00 4000000/4000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const [38]
        [56]     0.4    0.05    0.00 4000000         sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long, unsigned long) const [56]
        -----------------------------------------------
                                                         <spontaneous>
        [57]     0.4    0.05    0.00                 sca_util::sca_implementation::sca_matrix_base<double>::resize(unsigned long) [57]

    BTW: this is just the top few lines from the profiler! Curious to know what others have to say. My take on this (and it's something I've suspected for some time now, and the profile results would seem to back it up) is that the matrix solving isn't very efficient in the Fraunhofer implementation.
  4. trev

    Simulation Speed

    I did run the example I posted using gprof:

    Flat profile:

    Each sample counts as 0.01 seconds.

          %   cumulative   self              self     total
         time   seconds   seconds    calls  ns/call  ns/call  name
         30.00       0.03     0.03                             sca_core::sca_implementation::sca_synchronization_layer_process::wait_for_next_start()
         20.00       0.05     0.02                             sca_core::sca_implementation::sca_solver_base::get_current_period()
         10.00       0.06     0.01  2000000     5.00     5.00  sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double)
         10.00       0.07     0.01  1000000    10.00    25.00  vco::processing()
         10.00       0.08     0.01  1000000    10.00    10.00  sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const
         10.00       0.09     0.01                             sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run()
         10.00       0.10     0.01                             sca_core::sca_implementation::sca_synchronization_layer_process::cluster_process()
          0.00       0.10     0.00  2000000     0.00     5.00  sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double)

    Call graph (explanation follows)

    granularity: each sample hit covers 2 byte(s) for 10.00% of 0.10 seconds

        index % time    self  children    called     name
                                                         <spontaneous>
        [1]     40.0    0.01    0.03                 sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [1]
                        0.01    0.02 1000000/1000000     vco::processing() [3]
                        0.00    0.01 1000000/2000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [5]
                        0.00    0.00 1000000/1000000     con::processing() [69]
        -----------------------------------------------
                                                         <spontaneous>
        [2]     30.0    0.03    0.00                 sca_core::sca_implementation::sca_synchronization_layer_process::wait_for_next_start() [2]
        -----------------------------------------------
                        0.01    0.02 1000000/1000000     sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [1]
        [3]     25.0    0.01    0.02 1000000         vco::processing() [3]
                        0.01    0.00 1000000/1000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const [7]
                        0.00    0.01 1000000/2000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [5]
        -----------------------------------------------
                                                         <spontaneous>
        [4]     20.0    0.02    0.00                 sca_core::sca_implementation::sca_solver_base::get_current_period() [4]
        -----------------------------------------------
                        0.00    0.01 1000000/2000000     vco::processing() [3]
                        0.00    0.01 1000000/2000000     sca_core::sca_implementation::sca_synchronization_alg::schedule_element::run() [1]
        [5]     10.0    0.00    0.01 2000000         sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [5]
                        0.01    0.00 2000000/2000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double) [6]
        -----------------------------------------------
                        0.01    0.00 2000000/2000000     sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::write(double) [5]
        [6]     10.0    0.01    0.00 2000000         sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::write(unsigned long, double) [6]
        -----------------------------------------------
                        0.01    0.00 1000000/1000000     vco::processing() [3]
        [7]     10.0    0.01    0.00 1000000         sca_tdf::sca_implementation::sca_tdf_port_impl<sca_tdf::sca_signal_if<double>, double>::read(unsigned long) const [7]
                        0.00    0.00 1000000/1000000     sca_tdf::sca_implementation::sca_tdf_signal_impl<double>::read(unsigned long, unsigned long) const [70]
        -----------------------------------------------
                                                         <spontaneous>
        [8]     10.0    0.01    0.00                 sca_core::sca_implementation::sca_synchronization_layer_process::cluster_process() [8]
        -----------------------------------------------

    I haven't been able to spend the time to look at the Fraunhofer source code, though, to work out what exactly the classes taking all the CPU cycles are supposed to do. I will recompile the larger example with the profiler on and see if there are any significant differences.
  5. trev

    Simulation Speed

    Thanks again for your reply Torsten. You are correct, my posted example is extremely simple ! The motivation behind my post was that I'm seeing similar results with much bigger systems, such as the PLL mentioned above. The problem with the bigger systems is doing an apples-to-apples comparison, since I haven't been rigorously implementing exactly the same functionality in the same blocks between SystemC AMS and CppSim (I've added additional complexity to the SystemC AMS modules).

    However, and for what it's worth, for a 100 µs run with a 100 ps time step I get

    real 0m28.829s
    user 0m26.767s
    sys 0m1.797s

    for SystemC AMS (using the -O3 optimisation),

    real 0m19.839s
    user 0m19.759s
    sys 0m0.004s

    for SystemC AMS (-O3) with no I/O (which isn't particularly useful ! but gives a closer comparison of raw performance), and

    real 0m7.224s
    user 0m7.213s
    sys 0m0.012s

    for CppSim (with I/O on).

    The difference may not seem like much - but this is only for a fraction of the required total simulation time - particularly when generating modulation on the VCO. I'm going to think about how to implement your multirate suggestion.

    regards
    trev
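    To put the timings above on a common footing, it can help to normalise them to a cost per time step. The following is only a sketch of that arithmetic; the helper names `step_count`, `us_per_step` and `slowdown` are made up for illustration and belong to neither SystemC AMS nor CppSim.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: number of fixed time steps in a run.
long long step_count(double run_seconds, double step_seconds) {
    assert(step_seconds > 0.0);
    return std::llround(run_seconds / step_seconds);
}

// Hypothetical helper: wall-clock cost of one time step, in microseconds.
double us_per_step(double wall_seconds, long long steps) {
    assert(steps > 0);
    return wall_seconds * 1.0e6 / static_cast<double>(steps);
}

// Hypothetical helper: how many times slower `slow` is than `fast`.
double slowdown(double slow_wall_seconds, double fast_wall_seconds) {
    assert(fast_wall_seconds > 0.0);
    return slow_wall_seconds / fast_wall_seconds;
}
```

    With a 100 µs run and a 100 ps step, `step_count(100e-6, 100e-12)` gives one million steps, so the "real" times quoted above work out to roughly 20 µs per step for SystemC AMS without I/O against roughly 7 µs per step for CppSim, a factor of a little under 3.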
  6. trev

    Simulation Speed

    Hi, I recompiled the Fraunhofer SystemC AMS code without the -g option as suggested. OPT_CXXFLAGS (which I take to be the optimised CXXFLAGS) is set to "-O3 -g -Wall -pedantic -Wno-long-long", so I'm now using "-fPIC -O3 -Wall -pedantic -Wno-long-long". It makes no difference. regards trev
  7. trev

    Simulation Speed

    Hi Torsten, Thanks very much for the reply. You are absolutely correct about the I/O: CppSim writes its output in an HSPICE-compatible binary format. Switching off the output in both yields roughly 700 ms for SystemC AMS versus 60 ms for CppSim. Compiling the SystemC AMS module with the -O3 option yields a 30% improvement over the raw performance of the simulation (with no I/O). I will take a look at recompiling the SystemC AMS source in optimised mode. Still, that currently leaves me with nearly an order of magnitude difference in raw simulation speed and no output data ! best regards trev
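    One way to separate the raw simulation time from process start-up and trace-file output is to time only the simulation call from inside the program, rather than relying on `time` for the whole process. A minimal sketch, assuming a hypothetical helper named `seconds_per_run` built on the standard `std::chrono::steady_clock`:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls `work()` `runs` times and returns the mean
// wall-clock time of one call, in seconds.
template <typename Work>
double seconds_per_run(Work work, int runs) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

    In a test bench this would presumably wrap the single `sc_core::sc_start(...)` call with `runs` set to 1, since a simulation cannot simply be restarted; the repeat count is only useful for timing a stand-alone loop.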
  8. Hi, I've been reading the forum for some time now but this is my first post. I've been using SystemC AMS for mixed-signal modelling - mainly frac-N PLLs - mixing Verilator (to generate Verilog-based SystemC modules) with SystemC AMS / SystemC modules for the analog/mixed-signal blocks. I have been using CppSim previously for the same tasks. The SystemC AMS implementation is significantly slower than using CppSim. I've hacked together a really simple VCO (voltage controlled oscillator) model as an example:

    #include <cmath>
    #include <systemc-ams.h>

    // Constant source driving the VCO control input with 0 V.
    SCA_TDF_MODULE(con)
    {
        sca_tdf::sca_out<double> outp;

        SCA_CTOR(con) {}

        void processing()
        {
            outp = 0;
        }
    };

    SCA_TDF_MODULE(vco)
    {
        sca_tdf::sca_out<double> out;
        sca_tdf::sca_in<double> vin;

        double ampl; // output amplitude (V)
        double freq; // centre frequency (Hz)
        double kvco; // VCO gain (Hz/V)

        vco(sc_core::sc_module_name m,
            double ampl_ = 1.0,
            double freq_ = 1.0e9,
            double kvco_ = 1.0)
            : ampl(ampl_), freq(freq_), kvco(kvco_) {}

        void set_attributes() {}

        void processing()
        {
            double ts;
            ts = get_timestep().to_seconds();
            out = ampl*sin(2.0*M_PI*(freq+(kvco*vin.read()))*ts);
        }
    };

    int sc_main(int argc, char* argv[])
    {
        sca_tdf::sca_signal<double> op;
        sca_tdf::sca_signal<double> vin;

        sca_trace_file* tr = sca_create_tabular_trace_file("vco_tb");

        vco DUT("DUT");
        DUT.out(op);
        DUT.vin(vin);
        DUT.set_timestep(1, SC_NS);

        con DUT2("DUT2");
        DUT2.outp(vin);

        sca_trace(tr, op, "op");

        sc_core::sc_start(1, SC_MS); // 1 ms at a 1 ns step = 1e6 samples
        sca_close_tabular_trace_file(tr);
        return 0;
    }

    Running it gives:

    SystemC 2.3.1-Accellera --- May 20 2015 15:16:23
    Copyright (c) 1996-2014 by all Contributors,
    ALL RIGHTS RESERVED

    SystemC AMS extensions 2.0 Version: 2.0_beta1 --- BuildRevision: 1739 20140531
    Copyright (c) 2010-2014 by Fraunhofer-Gesellschaft
    Institut Integrated Circuits / EAS
    Licensed under the Apache License, Version 2.0

    Info: SystemC-AMS:
        2 SystemC-AMS modules instantiated
        1 SystemC-AMS views created
        2 SystemC-AMS synchronization objects/solvers instantiated

    Info: SystemC-AMS:
        1 dataflow clusters instantiated
          cluster 0:
            2 dataflow modules/solver, contains e.g. module: DUT
            2 elements in schedule list, 1 ns cluster period,
            ratio to lowest: 1 e.g. module: DUT
            ratio to highest: 1 sample time e.g. module: DUT
            0 connections to SystemC de, 0 connections from SystemC de

    real 0m4.771s
    user 0m3.142s
    sys 0m1.629s

    The equivalent implementation runs in 105 ms in CppSim. I'm using the same 1 ns time step and 1e6 samples in both cases. (BTW: if I run for, say, 1e7 samples the ratio of the difference remains roughly the same.)

    Any ideas on how to increase the speed of the SystemC AMS version, or comments on improving the code for speed, would be much appreciated.

    Thanks in advance
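    For a baseline with no simulator scheduling at all, the same arithmetic can be run in a plain C++ loop. This is only a sketch under my own assumptions: `PlainVco` and `run_vco` are hypothetical names, not part of SystemC AMS or CppSim. It also accumulates phase from step to step, which the model above does not do; as far as I can tell, the posted `processing()` evaluates the sine at the fixed timestep `ts`, so its output is constant, which is harmless for a speed test but not a real oscillator.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone VCO, independent of any simulator kernel.
struct PlainVco {
    double ampl;   // output amplitude (V)
    double freq;   // centre frequency (Hz)
    double kvco;   // VCO gain (Hz/V)
    double phase;  // accumulated phase (rad)

    PlainVco(double ampl_, double freq_, double kvco_)
        : ampl(ampl_), freq(freq_), kvco(kvco_), phase(0.0) {}

    // Emit one sample, then advance the phase by one timestep ts (s)
    // at the instantaneous frequency freq + kvco * vin.
    double step(double vin, double ts) {
        const double two_pi = 2.0 * std::acos(-1.0);
        const double out = ampl * std::sin(phase);
        phase += two_pi * (freq + kvco * vin) * ts;
        phase = std::fmod(phase, two_pi);  // keep the phase bounded
        return out;
    }
};

// Run n samples with a constant control voltage and a fixed timestep.
std::vector<double> run_vco(PlainVco vco, double vin, double ts, std::size_t n) {
    assert(ts > 0.0);
    std::vector<double> samples;
    samples.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        samples.push_back(vco.step(vin, ts));
    }
    return samples;
}
```

    Timing `run_vco(PlainVco(1.0, 1.0e9, 1.0), 0.0, 1.0e-9, 1000000)` gives a lower bound on what any simulator can achieve for this model, so the gap above that figure is scheduler and I/O overhead.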