tyler

Difference between blocking and non-blocking interface


Hi all,

I recently started studying SystemC TLM-2.0 for AT modeling.

The non-blocking interface is central to AT, but I am confused by the concept: the interface essentially defines four enums as different phases. The LRM explains them in a very theoretical way, and I don't understand why we need the four phases. What makes them so important for modeling? The values 0, 1, 2, 3 could be states, handshakes, sync points, whatever, but it is not clearly explained. Sync points, maybe, but why four? There is no more wait() to advance time; time is advanced across the different phases according to the protocol. But time still advances, and blocking transport can advance time as well; the only difference here is the phase. So why does adding a phase make a call non-blocking? The transaction objects are just put into a queue to wait. They are blocked in a queue in the target instead of in the initiator.

Could someone give a real example of a case where non-blocking transport is a must over blocking transport?

Many say that non-blocking is mainly used to model pipelined structures without stalling the owning process, but the blocking interface can somehow do the same.

Non-blocking transport can indeed initiate two transactions to, for example, a memory chip at the same time, say at 0 ns, while blocking transport has to wait for the first one to finish. However, in the real world, instructions are cycle-based. No one would send out two instructions at the same time over one port; if they did, there would be more than one port for access. The blocking interface can have multiple ports too.

Please help me with some other real cases where a non-blocking interface has practical meaning for a hardware structure.

Please enlighten me on this, as I am scratching my head right now. I thought of certain instructions from a CPU accessing SRAM and DRAM at the same time, but this can be done with blocking as well, because it would use multiple sockets.

I have read the LRM many times, along with other resources, but no one seems to give a very clear explanation. I am new, so please help!

Thanks in advance.


Well, the answer is a bit more complex. The main difference is that the standard requires that no wait() is allowed during an nb_transport call, while in b_transport it is allowed. So any implementation adhering to the standard guarantees this.

Let's first look at the non-blocking implementation. The tlm_phase values do not denote phases directly but rather time points of the protocol. Actually, you have two phases, request and response, each denoted by two time points (BEGIN_REQ/END_REQ and BEGIN_RESP/END_RESP). So the initiator can indicate the start or end of a phase of a transaction and be sure that the call is not blocked by a call to wait(). This lets you model the behavior and timing of a bus transaction in a fairly granular way and do something else while the transaction is ongoing. You can even have two transactions in flight in parallel, one in the request phase while the other is in the response phase (or even more if you have more phases defined). The transactions are pipelined.

Looking at b_transport, the situation is different. The target can delay the transaction by calling wait() until it is ready to respond. During that time no other transaction can be ongoing; the initiator is blocked and cannot react.

Blocking accesses can be used if the timing of the communication is not of interest or not modeled (the other scenario is loosely timed models, but that's a different story). They are easy to implement and easy to use.

Non-blocking is used if the timing of the communication needs to be modeled in more detail. For example, this allows you to model, simulate, and analyze bus contention, since timing can be attached to all phases of a bus transaction, like grant, address phase, and data phase.

I hope this sheds some light.

-Eyck


Thank you Eyck for the explanation.

Let's take the two parallel transactions as an example. With non-blocking transport, both the 1st and 2nd transactions can start at 0 ns, and each takes 10 ns (1 cycle) to finish, so their end points would be 10 ns and 20 ns respectively.

If I were to use blocking transport, I would send the 1st transaction at 0 ns and wait for it to finish, then send the 2nd one at 10 ns and wait for it to finish at 20 ns. This appears more like a real-world example to me, since, as I was thinking, no one would send two instructions in parallel over one port. A pipelined structure produces output every cycle; if we take the cycle to be 10 ns in this example, blocking transport would be well suited and the timing could be just as precise. My initiator would output a transaction every cycle (pipelined module). I guess what I am saying is that all non-blocking transport can do over blocking transport is make two transactions start at the same time but end at different times, and it is unclear to me what the use of this is.

Looking at it another way, someone might argue that in a real module there could be cases where the top module has two or more outputs occurring at the same time, and the bottom module takes these two as inputs at the same time but processes them in pipelined order; this would be the only use case for denoting the starting points of the transactions. But then the bottom block is stalled for at least one cycle, because the top module issues two transactions in parallel while the bottom block can only process one per cycle.

Best,

Tyler


Consider, for example, modern bus protocols like AMBA AXI with out-of-order transaction completion: the 2nd transaction can terminate before the 1st. It is not possible to model this with a single b_transport call per transaction. I think AT modeling was designed for such cases.


Thanks Roman. I did think of those cases.

The non-blocking implementation I saw uses a queue (FIFO) to store the transactions, which would make them execute in order in this case. Maybe I should explore how to do what you are describing with TLM-2.0. That would be something really interesting to know, and it also makes TLM-2.0 fancier.


If by queue you mean a payload event queue (peq_with_cb_and_phase or peq_with_get), then its purpose is to sort transactions based on their delay annotations.

Considering your example with two transactions: suppose you received the 1st transaction with a delay of 40 ns and the 2nd with a delay of 10 ns. Then you have to process the 2nd transaction before the 1st. A payload event queue can help with this.


I think this all boils down to use cases:

Programmer wants the fastest possible implementation of a virtual platform to develop software on, one that is register-accurate and functionally correct. Solution is to use the Loosely Timed (LT) modeling style. Must maintain functional/register accuracy, but try to reduce anything that slows down simulation speed. Things that slow down simulation speed include I/O (logging messages), context switching, copying data unnecessarily, arbitration and using the bus in general.

Architect wants to know if a particular configuration will work when taking into account all data traffic, which requires relatively accurate timing information on bus accesses. Solution is to use Approximately Timed (AT) modeling style. Speed of simulation is secondary, but speed of model creation is still high. RTL might provide the information, but has too much detail and takes too long to create. Skipping the bus (i.e. using DMI) is inappropriate as we need to see all bus traffic. Arbitration aspects are necessary. We can still minimize I/O somewhat, but some context switching is required to provide valid timing information.

Notice that I did not mention blocking (B) vs non-blocking (NB), because that is not the issue. Blocking style is actually easier to code and avoids context switching required by AT, so it is used for LT style. Interestingly, we try to avoid actual blocking (using wait) when coding b_transport in order to improve performance. Temporal decoupling and DMI are simulation speedup techniques for LT and completely inappropriate for AT.

If your bus supports it, we can reduce the number of NB calls to one, but many buses need to be modeled with more than one call. It is all very protocol dependent. It is also potentially tricky to get all of this correctly coded. Some buses like Arm's AXI require additional phases to properly represent all of the timing information needed.


This is interesting, as I have written something to test it myself. The following is the 1st transaction; the 2nd transaction is similar except it has a delay of 10 ns and address = 2.

After I passed this to nb_transport, my address 1 was lost and only address 2 remained. In fact, anything from the transaction that was sent first but ends later gets overwritten by the 2nd one that ends earlier. I am using peq_with_cb_and_phase. I am not sure why; I am writing my code based on the LRM.

// 1st transaction: address = 1, annotated delay = 30 ns
address = 1;
en = 1;
rw = 0;
wait();

tlm::tlm_generic_payload* trans;
tlm::tlm_phase phase = tlm::BEGIN_REQ;
sc_core::sc_time delay = sc_core::sc_time(30, sc_core::SC_NS);

trans = m_mm.allocate();   // memory-managed payload
trans->acquire();
trans->set_command(rw ? tlm::TLM_READ_COMMAND : tlm::TLM_WRITE_COMMAND);
trans->set_address(address);
trans->set_response_status(tlm::TLM_INCOMPLETE_RESPONSE);


Thank you, sir, for your help.

I am trying to use TLM not just as a bus simulator; I am trying to build systems on top of it with SystemC as well. What do you think TLM-2.0 can achieve at that level? It works simply and easily with buses, but it could be more complex with other things.


Hard to say anything without the complete source code.


The purpose of the TLM-2.0 standard is model interoperability, so that a TLM model developed by one vendor can be used in a VP developed by another vendor. Building systems using the low-level TLM-2.0 APIs can be quite hard.

For device and system modeling you should probably use a high-level framework built on top of the TLM-2.0 standard. If you are looking for something open source, check GreenLib https://www.greensocs.com/docs or SCML https://www.synopsys.com/cgi-bin/slcw/kits/reg.cgi . Large semiconductor vendors may have proprietary TLM frameworks, so check whether one has been developed inside your company.

