whmmy Posted December 18, 2022

I am a newbie to SystemC. I'm writing a regfile module, a vector register file that outputs a whole array at a time. The code below is how I do it now. It requires a for loop to copy the data on every update, which I think will slow down the simulation. Is there a better way to implement it that avoids this for loop? I searched the web and found that an sc_module can't output a pointer.

    SC_MODULE(regfile) {
        sc_in<sc_uint<5>> rsv1_addr;
        sc_out<sc_int<32>> rsv1_data[8];

        void read_vector();

        SC_CTOR(regfile) {
            SC_THREAD(read_vector);
        }

    private:
        sc_int<32> v_regfile[32][8]; // suppose data stored in it
    };

    void regfile::read_vector() {
        for (int i = 0; i < 8; i++)
            rsv1_data[i] = v_regfile[rsv1_addr.read()][i];
        while (true) {
            wait(rsv1_addr.value_changed_event());
            for (int i = 0; i < num_thread; i++)
                rsv1_data[i] = v_regfile[rsv1_addr.read()][i];
        }
    }
Eyck Posted December 19, 2022

You should stick to modern C++. Beyond that, SystemC provides sc_vector, which handles sc_object-based instances better than C arrays do (e.g. when binding ports):

    SC_MODULE(regfile) {
        using reg_t = sc_int<32>;

        sc_in<sc_uint<5>> rsv1_addr{"rsv1_addr"};
        sc_vector<sc_out<reg_t>> rsv1_data{"rsv1_data", 8};

        void read_vector();

        SC_CTOR(regfile) {
            SC_THREAD(read_vector);
        }

    private:
        using regfile_t = std::array<reg_t, 8>;
        std::array<regfile_t, 32> v_regfile; // suppose data stored in it
    };

    void regfile::read_vector() {
        auto& elem1 = v_regfile[rsv1_addr.read()];
        for (int i = 0; i < 8; i++)
            rsv1_data[i] = elem1[i];
        while (true) {
            wait(rsv1_addr.value_changed_event());
            auto& elem2 = v_regfile[rsv1_addr.read()];
            for (int i = 0; i < num_thread; i++)
                rsv1_data[i] = elem2[i];
        }
    }

A few more notes:

- If this piece of code is part of a header file, please do not use 'using namespace sc_core;' in it. This will lead to problems when the code is later used in larger projects.
- Name your signals and ports. This greatly helps when debugging your design with waveform traces.
- Use C++ standard library data structures (like std::array or std::vector), as they allow for range checking when built in debug mode.

Other than that the code is fine. You cannot avoid the loop-based assignment, because each assignment implies a type conversion (the v_regfile elements are sc_int, while the rsv1_data elements are sc_port<...>). Even if there were a convenience function, it would boil down to a for loop internally. On the other hand, with a contiguous memory layout (which std::array guarantees), your processor's caching and branch prediction kick in and largely compensate for the loop.

BTW, num_thread is not declared. Make sure it never exceeds 8, otherwise you might run into an out-of-bounds access.
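To make the range-checking note and the num_thread remark concrete, here is a minimal plain C++ sketch (no SystemC; the output ports are replaced by a plain output array, which is an assumption made purely for illustration). It uses std::array::at(), which always throws std::out_of_range on a bad index. That is a stricter, always-on alternative to the debug-mode checks mentioned above, not the same mechanism. The names read_row, rejects, kRegs and kLanes are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical sizes mirroring the thread: 32 vector registers of 8 lanes each.
constexpr std::size_t kRegs = 32;
constexpr std::size_t kLanes = 8;

using row_t = std::array<std::int32_t, kLanes>;
using regfile_t = std::array<row_t, kRegs>;

// Copies the first `count` lanes of register `addr` into `out`.
// at() throws std::out_of_range for a bad index instead of reading past the array.
inline void read_row(const regfile_t& rf, std::size_t addr, std::size_t count, row_t& out) {
    const row_t& row = rf.at(addr);
    for (std::size_t i = 0; i < count; ++i)
        out.at(i) = row.at(i);
}

// Returns true if read_row rejects the given lane count with std::out_of_range.
inline bool rejects(const regfile_t& rf, std::size_t count) {
    row_t out{};
    try {
        read_row(rf, 0, count, out);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With these definitions, a lane count of 8 is accepted and a lane count of 9 is rejected, which is the situation the num_thread warning describes.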
whmmy Posted December 19, 2022

Great answer, it clarified a lot of syntax and coding conventions for me. Thank you very much!
whmmy Posted December 19, 2022

Hi, I have another question. Right now I'm trying to write a small processor, and I want to use a higher level of abstraction.
In order to enable event communication between modules, I defined my own interface, shown in the following code. Is this a good way to write it? Is there a similar facility in TLM? Thanks a lot again. Here's the code:

    class event_if : virtual public sc_interface {
    public:
        virtual const sc_event &obtain_event() const = 0;
        virtual void notify() = 0;
    };

    class event : public sc_module, public event_if {
    public:
        event(sc_module_name _name) : sc_module(_name) {}
        const sc_event &obtain_event() const { return self_event; }
        void notify() { self_event.notify(); }

    private:
        sc_event self_event;
    };

So if you want to bind an event to a port, just declare:

    sc_port<event_if> myevent;

To get the event and wait for a trigger:

    wait(myevent->obtain_event());

To notify an event:

    myevent->notify();
Eyck Posted December 20, 2022

Basically this would work, but usually you want to transport data along with the event, and at that point you end up with a signal. If you communicate values separately, keep in mind that the receiving side of the channel is invoked one delta cycle later, and the sending side may also be invoked in that same delta cycle due to some other notification. In that case it is left to the kernel which function is invoked first: you have a classic race condition.

In my experience, a better approach is to write the processor model (instruction set simulator, ISS) in C++ and wrap it in a SystemC module that adds the SystemC-specific parts. This enables more versatile uses and a higher simulation speed: each event and its associated context switch costs time.

One example can be found here: https://github.com/Minres/DBT-RISE-RISCV/ It contains an ISS for the RISC-V ISA (written in C++) with a single sc_module acting as the wrapper around the ISS (https://github.com/Minres/DBT-RISE-RISCV/tree/master/src/sysc).
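The "ISS in plain C++, wrapped by one SystemC module" idea can be sketched roughly as follows. This is a toy under stated assumptions, not how DBT-RISE is implemented: the two-instruction ISA, the class name ToyIss, and the convention that step() returns an assumed cycle count are all invented for illustration. The SystemC wrapper itself is only described in a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy instruction set, invented for this sketch (not RISC-V).
enum class Op : std::uint8_t { AddImm, Halt };

struct Insn {
    Op op;
    std::uint8_t rd;   // destination register index
    std::int32_t imm;  // immediate operand
};

// Plain C++ ISS: no SystemC types, so it can be unit-tested or reused elsewhere.
class ToyIss {
public:
    explicit ToyIss(std::vector<Insn> program) : program_(std::move(program)) {}

    // Executes one instruction and returns the number of cycles it is assumed to take.
    // Returns 0 once the core has halted or run off the end of the program.
    unsigned step() {
        if (halted_ || pc_ >= program_.size()) {
            halted_ = true;
            return 0;
        }
        const Insn& insn = program_[pc_++];
        switch (insn.op) {
        case Op::AddImm:
            regs_[insn.rd % 32] += insn.imm;
            return 1;
        case Op::Halt:
            halted_ = true;
            return 1;
        }
        return 0;
    }

    std::int32_t reg(std::size_t i) const { return regs_[i % 32]; }
    bool halted() const { return halted_; }

private:
    std::vector<Insn> program_;
    std::size_t pc_ = 0;
    std::int32_t regs_[32] = {};
    bool halted_ = false;
};

// A SystemC wrapper (not shown) would own a ToyIss and, in a single SC_THREAD,
// call step() and then wait() for the returned number of clock periods.
```

Because the whole core advances through one step() call, the wrapper needs only one process, so the number of events and context switches stays small.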
whmmy Posted December 27, 2022 (edited)

Hi, my last question was a bit unclear, sorry. I just got infected with Omicron, so I haven't had the energy for a while.

I read your code and have some understanding of its structure; it is indeed a good reference. arch_if is the most basic architecture interface, which is inherited by riscv_hart_msu_vp as a configurable instruction architecture, and finally encapsulated by core_wrapper and core_complex. But I couldn't find where the code for the actual CPU execution is.

I would like my simulator to reflect cycle-level timing, where an instruction is not executed in a single cycle but is spread over multiple cycles depending on its characteristics. So may I ask: what is the timing precision of DBT-RISE, and in which file can I find the code for CPU execution?

Edited January 10, 2023 by MagicLantern: More detailed question
Eyck Posted January 29, 2023

The behavior can be found in the vm_* classes, e.g. in https://github.com/Minres/DBT-RISE-RISCV/blob/master/src/vm/interp/vm_rv32imac.cpp

The idea here is that the classes derived from arch_if encapsulate the architectural state of the processor, while the classes derived from iss::interp::vm_base<> implement the behavior. The combination of the two forms a processor. This processor is then extended with functionality by using mixins (riscv_hart_msu_vp is a mixin providing the privileged ISA parts of RISC-V).
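The split described above (state in one class, behavior in another, extra functionality via mixins) can be illustrated with a small generic C++ sketch. It shows only the general pattern; ToyArch, WithPrivilege and ToyVm are hypothetical names and do not reproduce the actual DBT-RISE class definitions.

```cpp
#include <cassert>
#include <cstdint>

// Architectural state only (registers, pc); stands in for a class derived from arch_if.
struct ToyArch {
    std::uint32_t pc = 0;
    std::uint32_t x[32] = {};
};

// Mixin: extends any architecture class with extra state and functionality.
// Here a hypothetical privilege level, loosely analogous to a privileged-ISA extension.
template <typename Base>
struct WithPrivilege : Base {
    enum class Level { User, Machine };
    Level level = Level::Machine;
    void drop_to_user() { level = Level::User; }
};

// Behavior: operates on whatever architecture class it is given.
template <typename Arch>
class ToyVm {
public:
    explicit ToyVm(Arch& arch) : arch_(arch) {}

    // Hypothetical "add immediate" behavior; advances pc by 4.
    void addi(unsigned rd, unsigned rs, std::uint32_t imm) {
        arch_.x[rd % 32] = arch_.x[rs % 32] + imm;
        arch_.pc += 4;
    }

private:
    Arch& arch_;
};
```

A processor in this sketch is the combination ToyVm<WithPrivilege<ToyArch>>: the mixin layers functionality on the state class without the behavior class needing to know about it.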