How to create two stage pipeline using SystemC?


akb20

I have a Design.h and a Design.cpp as given below. I am trying to have two stages in the design, but I notice that the latency is the same as when I had only one stage. I don't understand how the number of clock cycles required for data to pass through the design can remain the same with two stages instead of one. I think the two stages are not implemented correctly.

//Design.h file
#include <systemc.h>
#include "Buffer.h"

using namespace std;

SC_MODULE(Design)
{

sc_in_clk clock;
sc_in <bool> reset;

sc_in <Flit> flit_rx[DIRECTIONS + 2];
sc_out <Flit> flit_tx[DIRECTIONS + 2];

BufferBank buffer[DIRECTIONS + 2];
BufferBank intermediate_buffer[DIRECTIONS + 2];

void process();
void rxProcess();
void second_stage_process();
void intermediateProcess();
void txProcess();

SC_CTOR(Design) {
        SC_METHOD(process);
        sensitive << reset;
        sensitive << clock.pos();

        SC_METHOD(second_stage_process);
        sensitive << reset;
        sensitive << clock.pos();
       }
};

The Design.cpp is as follows.

#include "Design.h"

void Design::process()
{
  rxProcess();
}

void Design::second_stage_process()
{
    txProcess();
    intermediateProcess();
}

void Design::rxProcess()
{
  if (!reset.read()) {
    for (int i = 0; i < DIRECTIONS + 2; i++) {
      Flit received_flit = flit_rx[i].read();
      int vc = received_flit.vc_id;

      if (!buffer[i][vc].IsFull()) {
        // Store the incoming flit in the circular buffer
        buffer[i][vc].Push(received_flit);
      }
    }
  }
}

void Design::intermediateProcess()
{
  if (!reset.read()) {
    for (int i = 0; i < DIRECTIONS + 2; i++) {
      for (int vc = 0; vc < GlobalParams::n_virtual_channels; vc++) {
        if (!buffer[i][vc].IsEmpty() && !intermediate_buffer[i][vc].IsFull()) {
          // Move the flit from the first-stage buffer into the intermediate buffer
          intermediate_buffer[i][vc].Push(buffer[i][vc].Front());
          buffer[i][vc].Pop();
        }
      }
    }
  }
}

void Design::txProcess()
{
  if (!reset.read()) {
    for (int j = 0; j < DIRECTIONS + 2; j++) {
      int i = (start_from_port + j) % (DIRECTIONS + 2);

      for (int k = 0; k < GlobalParams::n_virtual_channels; k++) {
        int vc = (start_from_vc[i] + k) % (GlobalParams::n_virtual_channels);

        if (!intermediate_buffer[i][vc].IsEmpty()) {
          Flit flit = intermediate_buffer[i][vc].Front();
          // some operations with flit
        }
      }
    }

    for (int i = 0; i < DIRECTIONS + 2; i++) {
      for (int vc = 0; vc < GlobalParams::n_virtual_channels; vc++) {
        if (!intermediate_buffer[i][vc].IsEmpty()) {
          Flit flit = intermediate_buffer[i][vc].Front();
          // some operations to get the value of o
          flit_tx[o].write(flit);
          intermediate_buffer[i][vc].Pop();
        }
      }
    }
  }
}

Please tell me what I am doing wrong. I want to create a second stage, which is why I save data from buffer into intermediate_buffer and use that buffer in txProcess. But this does not increase the latency of the data propagation (from the input receiver to the output transmitter), which suggests the second stage is not actually being created.

Your problem is the use of PODs (plain old data types) for communication between the processes. Why? As soon as one process writes to a POD, all other processes immediately see the update. So the behavior of your design depends on the scheduling order of the processes: if Design::process is scheduled before Design::second_stage_process, then second_stage_process already sees the updates of process in the same delta cycle.
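
To make this concrete, here is a minimal sketch (hypothetical module and names, a plain int instead of your Flit) of two SC_METHODs communicating through a POD member:

#include <systemc.h>
#include <iostream>

// Sketch: two SC_METHODs triggered by the same clock edge, sharing a POD.
SC_MODULE(PodRace)
{
    sc_in_clk clock;
    int shared = 0;  // POD "pipeline register" between the stages

    void stage1() { shared++; }                      // writer
    void stage2() { std::cout << shared << "\n"; }   // reader

    SC_CTOR(PodRace) {
        SC_METHOD(stage1); sensitive << clock.pos();
        SC_METHOD(stage2); sensitive << clock.pos();
        // Both methods run in the same delta cycle; whether stage2 sees
        // the old or the new value of 'shared' depends entirely on which
        // method the kernel happens to invoke first. The standard leaves
        // this order unspecified.
    }
};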

Actually there are 2 ways out:

  • you just have one thread and call the stage functions from the output to the input:
    void Design::process()
    {
      txProcess();
      intermediateProcess(); 
      rxProcess();
    }

    Although it will work in your case, this approach will not scale: as soon as you cross sc_module boundaries you cannot control the order of calling.

  • you use a primitive channel instead of a POD. In your case you might use an sc_core::sc_fifo with a depth of one, and you should use sc_vector instead of plain C arrays since the elements need proper initialization. Why does this help? Newly written values only become visible at the output of the fifo in the next delta cycle. So no matter in which order the threads and methods are invoked, they will read the 'old' values even though 'new' values have already been written (see the sketch below).
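
For example, a minimal sketch (hypothetical names, an int payload instead of Flit) of a depth-one sc_fifo acting as a pipeline register between two SC_METHODs:

#include <systemc.h>

// Sketch: a depth-one sc_fifo as pipeline register, accessed with the
// non-blocking API so it is usable from SC_METHODs.
SC_MODULE(Stage)
{
    sc_in_clk clock;
    sc_fifo<int> pipe_reg{"pipe_reg", 1};  // depth of one

    void producer() {
        pipe_reg.nb_write(42);  // returns false if the fifo is still full
    }
    void consumer() {
        int v;
        // a value written in this delta cycle only becomes readable in
        // the next one, so the consumer always sees the 'old' value
        if (pipe_reg.nb_read(v)) { /* use v */ }
    }

    SC_CTOR(Stage) {
        SC_METHOD(producer); sensitive << clock.pos();
        SC_METHOD(consumer); sensitive << clock.pos();
    }
};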

HTH

  • 2 weeks later...

I want to thank you for your reply.

I could not understand your first solution. 1) Why do I need to put txProcess first? 2) What do you mean by controlling the order of calling?

Actually, I am trying to implement two stages and want to see the latency difference compared to only one stage. I think the latency should increase, but currently there is no change in latency whether I use one stage or two. I thought that creating two separate methods for the two stages would make the data take two clock cycles, but that is not the case.

Kindly suggest how I can implement two stages and see the impact on latency.

If you set up SC_THREADs, the SystemC kernel invokes them and you have no control over the order in which SC_METHODs and SC_THREADs get invoked/resumed; SystemC gives no guarantee about the order of execution within a delta cycle. So the 2nd or 3rd stage of your pipeline could be executed before or after the 1st stage. If you use PODs (plain C++ types and classes), they change their value immediately when written. If you now have a variable V between the 1st and the 2nd stage, it depends on the order of invocation whether the 2nd stage reads the value in the same delta cycle (1st stage invoked before the 2nd stage) or at the next clock cycle (2nd stage invoked first, reading the value before the 1st stage updates it).

Calling the stage functions from output to input is widely used in C++ performance models and works as long as you don't have loops. This should answer your 1st question: txProcess should read its inputs before intermediateProcess updates them, and intermediateProcess should read its inputs before rxProcess updates them. This way you model the synchronous nature of a pipeline.

For the sake of simplicity I would model it using the means SystemC provides: primitive channels and SC_METHODs:

SC_MODULE(Design)
{

sc_in_clk clock;
sc_in <bool> reset;

sc_vector<sc_in <Flit>>  flit_rx{"flit_rx", DIRECTIONS + 2};
sc_vector<sc_out <Flit>> flit_tx{"flit_tx", DIRECTIONS + 2};

// creator function so every fifo in the vector gets the desired depth
static sc_fifo<Flit>* fifo_creator(char const* name, size_t idx){
  return new sc_fifo<Flit>(name, BUFFER_SIZE);
}
// one fifo per (port, virtual channel) pair, flattened into a single vector
sc_vector<sc_fifo<Flit>> buffer{"buffer", (DIRECTIONS + 2) * GlobalParams::n_virtual_channels, &Design::fifo_creator};
sc_vector<sc_fifo<Flit>> intermediate_buffer{"intermediate_buffer", (DIRECTIONS + 2) * GlobalParams::n_virtual_channels, &Design::fifo_creator};

void rxProcess();
void intermediateProcess();
void txProcess();

SC_CTOR(Design) {
        SC_METHOD(rxProcess);
        sensitive << reset;
        sensitive << clock.pos();
        SC_METHOD(intermediateProcess);
        sensitive << reset;
        sensitive << clock.pos();
        SC_METHOD(txProcess);
        sensitive << reset;
        sensitive << clock.pos();
       }
};

And the stage processes would look like:

void Design::intermediateProcess()
{
    if (reset.read()) {
        // put your reset stuff here
    } else if (clock.posedge()) {
        for (int i = 0; i < DIRECTIONS + 2; i++) {
            for (int vc = 0; vc < GlobalParams::n_virtual_channels; vc++) {
                // flat index into the fifo vectors: port i, virtual channel vc
                int idx = i * GlobalParams::n_virtual_channels + vc;
                if (buffer[idx].num_available() > 0 && intermediate_buffer[idx].num_free() > 0) {
                    Flit f;
                    // take the flit from the first stage, check that the read is successful
                    sc_assert(buffer[idx].nb_read(f));
                    // store it in the intermediate stage, check that the write is successful
                    sc_assert(intermediate_buffer[idx].nb_write(f));
                }
            }
        }
    }
}
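
rxProcess would follow the same pattern; a sketch under the same assumptions (flat indexing into the fifo vector, the Flit carrying its vc_id):

void Design::rxProcess()
{
    if (reset.read()) {
        // put your reset stuff here
    } else if (clock.posedge()) {
        for (int i = 0; i < DIRECTIONS + 2; i++) {
            Flit received_flit = flit_rx[i].read();
            // flat index: port i, virtual channel taken from the flit itself
            int idx = i * GlobalParams::n_virtual_channels + received_flit.vc_id;
            if (buffer[idx].num_free() > 0) {
                // store the incoming flit, check that the write is successful
                sc_assert(buffer[idx].nb_write(received_flit));
            }
        }
    }
}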

HTH

  • 4 weeks later...

The problem with your suggested solution is that it is not giving the correct latency value for the pipeline, which is my only target. Note that I already know the latency of the pipeline (in terms of number of clock cycles), which is the same as the number of stages. I also understand that a FIFO is a better way to do the handshaking, but the FIFO is not adding a delay in any pipeline stage, so I do not get the correct latency value.

I think that only wait() can pause the simulation, but it is only possible in SC_THREAD or SC_CTHREAD and not in SC_METHOD. Currently, if I change the SC_METHOD to SC_CTHREAD, it does not give the correct result. I don't know why, because just changing SC_METHOD to SC_CTHREAD without changing the code should not make the simulation fail.

First: I'm not going to solve your problems, I only propose possible solutions. So don't expect ready-to-use models.

Second: doing wait() in an SC_THREAD or SC_CTHREAD creates an implicit FSM. I guess this is not what you want, and as you already noticed, it does not solve your problem. You have a synchronous design and you should model it as such.
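
To illustrate the point: an SC_CTHREAD body executes only once unless it loops, so a thread version of one of your stages would have to look roughly like this (sketch, hypothetical name intermediateThread):

void Design::intermediateThread()
{
    while (true) {
        wait();  // suspend until the next clock edge
        if (!reset.read()) {
            // ... move flits from buffer to intermediate_buffer ...
        }
    }
}

// registered in the constructor with:
//   SC_CTHREAD(intermediateThread, clock.pos());

Simply renaming SC_METHOD to SC_CTHREAD without adding the loop and the wait() makes the body execute just once, which is why your simulation breaks.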

Third: if your latency is not what you want, you need to change your model. Maybe there is one stage too many, maybe there aren't enough. This is left to the modeler....
