How to Handle Data Coming From Parallel Threads

It is quite common for a DUT to have two or more interfaces whose independent monitors send data to a scoreboard at the same simulation time. Because this is done from parallel threads, the order in which the data arrives at the scoreboard is not deterministic. If the order in which the data is processed matters, those parallel threads can become a big headache.

In this post I will present one way of tackling the problem of parallel threads in a UVM SystemVerilog environment.


Let’s consider that we need to verify a very simple FIFO with two interfaces: one for pushing data in the FIFO and one for popping data out of it.

Figure: Device Under Test

The FIFO has a width of 16 bits and a depth of 4. The push and pop communication protocols are very similar and quite simple: a transaction starts when the req signal is asserted and completes when the ack signal is also asserted. At that point data is considered to have a valid value.

Figure: PUSH and POP transactions – with and without wait states
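The handshake described above could be sampled along the following lines. This is only a sketch: the interface name cfs_handshake_if, the signal names and widths, and the choice to sample on the rising clock edge are assumptions made for illustration and are not taken from the actual PUSH and POP agents.

```systemverilog
//Hypothetical interface - names and widths are assumptions
interface cfs_handshake_if(input logic clk);
  logic        req;
  logic        ack;
  logic [15:0] data;
endinterface

//Minimal sampling loop: a transfer is considered complete in the
//clock cycle in which both req and ack are high
module cfs_handshake_sampler(cfs_handshake_if vif);
  always @(posedge vif.clk) begin
    if(vif.req && vif.ack) begin
      //data is valid only at this point
      $display("[%0t] transfer completed, data: %0h", $time, vif.data);
    end
  end
endmodule
```

A cycle in which req is high but ack is still low is a wait state, so nothing is reported for it.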

For this FIFO, our model/scoreboard looks very simple.

Whenever there is some push transaction, we first make sure that there is room in the FIFO in which to place the data. Then, we do a simple push in a SystemVerilog queue:

class cfs_fifo_scoreboard extends uvm_component;
   ...
   //Action triggered when there is a PUSH into the FIFO
   virtual protected function void do_push(cfs_push_item_mon item);
     if(is_full()) begin
       `uvm_error("DUT_ERROR", "Trying to push while FIFO is full")
     end
     
     fifo.push_back(item.data);
   endfunction
   ...   
endclass

Whenever there is a pop transaction, we first make sure that there is some data available in our FIFO model. Then, we compare the value popped from our model with the value received from the DUT:

class cfs_fifo_scoreboard extends uvm_component;
   ...
   //Action triggered when there is a POP from the FIFO
   virtual protected function void do_pop(cfs_pop_item_mon item);
     cfs_fifo_data exp_data;
        
     if(is_empty()) begin
       `uvm_error("DUT_ERROR", "Trying to pop while FIFO is empty")
       //Nothing to compare against, so skip the data check
       return;
     end

     exp_data = fifo.pop_front();
         
     if(exp_data != item.data) begin
       `uvm_error("DUT_ERROR", $sformatf(
         "Data mismatch - expected: %0h, received: %0h", exp_data, item.data))
     end
   endfunction
   ...   
endclass

The only thing that we have to do at this point is to declare some analysis implementation ports (uvm_analysis_imp) in order to get the information from the PUSH and POP agents:

`uvm_analysis_imp_decl(_push)
`uvm_analysis_imp_decl(_pop)

class cfs_fifo_scoreboard extends uvm_component;
  ...
  //Analysis port for getting the pushed data from the PUSH monitor
  uvm_analysis_imp_push#(cfs_push_item_mon, cfs_fifo_scoreboard) port_push;
  
  //Analysis port for getting the popped data from the POP monitor
  uvm_analysis_imp_pop#(cfs_pop_item_mon, cfs_fifo_scoreboard) port_pop;
  
   virtual function void build_phase(uvm_phase phase);
     super.build_phase(phase);
     port_push = new("port_push", this);
     port_pop = new("port_pop", this);
   endfunction

  //Function associated with the analysis port connected to the PUSH monitor
  virtual function void write_push(cfs_push_item_mon item);
    do_push(item);
  endfunction
  
  //Function associated with the analysis port connected to the POP monitor
  virtual function void write_pop(cfs_pop_item_mon item);
    do_pop(item);
  endfunction
  ...
endclass

Here comes the tricky part: the functions write_push() and write_pop() are called from independent threads – one in each of the two agents (PUSH and POP agents). This means that if a push and a pop transaction happen at the same simulation time, we cannot say which of our functions will be called first.

This is also mentioned in the SystemVerilog LRM, in section 4.7 Nondeterminism:

One source of nondeterminism is the fact that active events can be taken off the Active or Reactive event region and processed in any order.
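A minimal way to observe this, independent of UVM, is to trigger two processes from the same clock edge. The module below is only an illustrative sketch (the module name and messages are made up); the two always blocks stand in for the threads of the PUSH and POP monitors.

```systemverilog
module nondeterminism_sketch;
  bit clk;

  always #5 clk = ~clk;

  //Stand-in for the thread of the PUSH monitor
  always @(posedge clk) $display("[%0t] write_push() would be called", $time);

  //Stand-in for the thread of the POP monitor
  always @(posedge clk) $display("[%0t] write_pop() would be called", $time);

  initial #20 $finish;
endmodule
```

Both messages are printed at the same simulation time, but the LRM does not define which one comes first, so different simulators are free to print them in a different order.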

There are two corner case scenarios relevant for our threads order problem:

  1. While the FIFO is empty and there is a push and a pop transaction at the same time, if write_pop() is called before write_push(), our model will complain that there was a POP while the FIFO was empty – so there is nothing to pop out of the FIFO.
  2. While the FIFO is full and there is a push and a pop transaction at the same time, if write_push() is called before write_pop(), our model will complain that there was a PUSH while the FIFO was full – so there is no room to push the data.

Figure: Corner case scenarios

I built a small verification environment on EDA Playground to showcase this problem: How to Handle Data Coming From Parallel Threads 1. You can see that changing the simulator yields different errors, depending on the order in which write_push() and write_pop() are called. There is no problem with the simulators; all of them behave correctly. We just need to handle this nondeterminism in our code.


In order to solve this problem we need to implement the following behavior in our scoreboard/model:

  • When the FIFO is empty we want the push transaction to be processed before the pop transaction
  • When the FIFO is full we want the pop transaction to be processed before the push transaction
  • In any other case we do not care about the order between the two.

One possible solution for our problem is to do the following steps:

  1. Buffer transactions – when the PUSH and POP monitors are sending transactions, do not process them immediately, but buffer them for later use.
  2. Wait for all – wait until all the monitors have had a chance to send the transactions happening at the same simulation time.
  3. Compute a priority – determine for each of the buffered transactions the priority with which it should be processed.
  4. Process transactions – use the priority of the buffered transactions to determine the order in which to process them.

Step #1 – Buffer Transactions

The first step in our small algorithm is to buffer, in a queue, all the transactions sent by the PUSH and POP monitors at the same simulation time. This will allow us, in the next steps, to process them in the order that we need.

class cfs_fifo_scoreboard extends uvm_component;
  ...
  //List of pending transactions to be processed
  protected uvm_object pending_items[$];
      
  //Handle the information coming from the monitors, regardless if
  //it is a push or a pop item
  protected virtual function void handle_item(uvm_sequence_item item);
    pending_items.push_back(item.clone());
  endfunction
  
  //Function associated with the analysis port connected to the PUSH monitor
  virtual function void write_push(cfs_push_item_mon item);
    handle_item(item);
  endfunction
      
  //Function associated with the analysis port connected to the POP monitor
  virtual function void write_pop(cfs_pop_item_mon item);
    handle_item(item);
  endfunction
endclass

Step #2 – Wait For All

In this step we need to wait long enough for all the monitors to send their information to the scoreboard. However, please keep in mind that this “waiting” only concerns transactions happening at the same simulation time.

In our case, the PUSH and POP agents are working on the same clock signal. This means it is safe to assume that the two threads are handled by the simulator in an arbitrary order, but within the same time slot, as part of the events triggered by the same clock edge. So waiting for the next NBA region relative to the first thread handled should give the simulator “enough time” to run both threads.
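The idea of waiting for the NBA region can be tried out without UVM. In the sketch below, the task wait_for_nba() is a hypothetical stand-in that schedules a nonblocking assignment and resumes only after it has been applied, which is roughly what uvm_wait_for_nba_region() does; the module and variable names are assumptions made for illustration.

```systemverilog
module nba_wait_sketch;
  bit clk;
  int nba;
  string pending[$];

  always #5 clk = ~clk;

  //Schedule a nonblocking assignment and resume only after it was applied,
  //that is after the active events of the current time slot were processed
  task automatic wait_for_nba();
    nba <= nba + 1;
    @(nba);
  endtask

  //Two independent threads triggered by the same clock edge
  always @(posedge clk) pending.push_back("push");
  always @(posedge clk) pending.push_back("pop");

  //A third thread on the same clock edge looks at the buffer
  //only after waiting for the NBA region
  always @(posedge clk) begin
    wait_for_nba();
    $display("[%0t] items buffered: %0d", $time, pending.size());
    pending.delete();
  end

  initial #20 $finish;
endmodule
```

Regardless of the order in which the simulator runs the three always blocks, both items should already be in the queue by the time the third thread resumes.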

First, we need to start a task, called parse_items(), each time an item is pushed into the pending_items queue:

class cfs_fifo_scoreboard extends uvm_component;
  ...
  //Handle the information coming from the monitors, regardless if
  //it is a push or a pop item
  protected virtual function void handle_item(uvm_sequence_item item);
    pending_items.push_back(item.clone());
        
    fork
      begin
        parse_items();
      end
    join_none
  endfunction
  ...
endclass

Next, we need to make sure that the job done by parse_items() is done only once for a given simulation time, regardless of how many times parse_items() is called in the same simulation time.

This can be achieved by saving a reference to the process associated with task parse_items() and only allowing the main code of the task to be executed for the first call of the task, when the process is still null:

class cfs_fifo_scoreboard extends uvm_component;
   ...
   //Process associated with task parse_items
   local process process_parse_items;
      
  //Task which waits for all the transactions to be queued
  //and then start parsing them
  protected virtual task parse_items();        
    fork
      begin
        if(process_parse_items == null) begin
          process_parse_items = process::self();

          //Give time to the simulator to call both threads:
          //one from the PUSH monitor and one from the POP monitor
          uvm_wait_for_nba_region();
              
          //TODO: do the parsing - this is implemented in the next steps
          
          //Clear the reference to the process associated with parse_items()
          //task so that in the next clock cycle everything is clean.
          process_parse_items = null;
        end
      end
    join        
  endtask
  ...
endclass

You can make use of uvm_wait_for_nba_region() only if all the agents are working on the same clock signal. If you have a scenario in which the monitors are working, for example, on divided but synchronized clocks then it would be safer to use some physical delay like “#1ps”.
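The two waiting strategies could be wrapped in a single helper, as in the sketch below. The package name, the task wait_for_all_monitors() and its same_clock argument are hypothetical and not part of the original scoreboard; the only UVM call used is uvm_wait_for_nba_region().

```systemverilog
package cfs_wait_sketch_pkg;
  //A "#1ps" delay only works if the time precision is 1ps or finer,
  //otherwise the delay is rounded to the precision and can become zero
  timeunit 1ns;
  timeprecision 1ps;

  import uvm_pkg::*;

  //Hypothetical helper: wait until all monitors had a chance to report
  task automatic wait_for_all_monitors(bit same_clock);
    if(same_clock) begin
      //All agents share one clock: waiting for the NBA region is enough
      uvm_wait_for_nba_region();
    end
    else begin
      //Divided but synchronized clocks: use a small physical delay,
      //much shorter than the fastest clock period
      #1ps;
    end
  endtask
endpackage
```

Whatever delay is chosen, it has to stay well below the shortest clock period, otherwise transactions from the next cycle would end up in the same batch.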

Step #3 – Compute Priority

Next, we need a function which computes the priority of a transaction based on the following variables:

  • the type of the transaction – push or pop
  • “is empty” indicator of the FIFO
  • “is full” indicator of the FIFO

Here is one possible implementation of such a function:

class cfs_fifo_scoreboard extends uvm_component;
  ...
  //Get the priority of an item with which it will be processed by the scoreboard
  protected virtual function int unsigned get_priority(uvm_object item);
     cfs_push_item_mon push_item;
     cfs_pop_item_mon pop_item;
   
    if($cast(push_item, item)) begin
      if(is_full()) begin
        //SCENARIO: PUSH at FULL
        //FIFO is already full so associate a smaller priority than in "POP at FULL"
        //as a pop should be processed first in this case
        return 10;
      end
      else if(is_empty()) begin
        //SCENARIO: PUSH at EMPTY
        //FIFO is empty so associate a higher priority than in "POP at EMPTY"
        //as a pop should be processed second in this case
        return 15;
      end
      else begin
        //FIFO is neither full nor empty so the priority does not really matter
        return 6;
      end
    end
    else if($cast(pop_item, item)) begin
      if(is_full()) begin
        //SCENARIO: POP at FULL
        //FIFO is already full so associate a higher priority than in "PUSH at FULL"
        //as a pop should be processed first in this case
        return 15;
      end
      else if(is_empty()) begin
        //SCENARIO: POP at EMPTY
        //FIFO is empty so associate a smaller priority than in "PUSH at EMPTY"
        //as a pop should be processed second in this case
        return 10;
      end
      else begin
        //FIFO is neither full nor empty so the priority does not really matter
        return 5;
      end
    end
    else begin
      `uvm_fatal("ALGORITHM_ISSUE", $sformatf(
        "Trying to get priority for an unhandled class: %s", item.get_type_name()));
    end
  endfunction
  ...
endclass

There are two important things to notice in the get_priority() function:

  1. The scenario “PUSH at EMPTY” has priority 15, which is greater than the priority of the scenario “POP at EMPTY”, which is 10. This solves the first corner case discussed above: when the FIFO is empty, a PUSH will be processed before a POP happening at the same simulation time.
  2. The scenario “POP at FULL” has priority 15, which is greater than the priority of the scenario “PUSH at FULL”, which is 10. This solves the second corner case discussed above: when the FIFO is full, a POP will be processed before a PUSH happening at the same simulation time.
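The effect of these priority values on the processing order can be checked in isolation with a small sketch. It uses the same numbers as get_priority(), but the module, the enum kind_e and the function get_priority_sketch(), which receives the FIFO state as arguments, are hypothetical simplifications.

```systemverilog
module priority_sort_sketch;
  typedef enum {PUSH, POP} kind_e;

  //Same priority values as in get_priority(), with the FIFO state passed in
  function automatic int unsigned get_priority_sketch(kind_e kind, bit full, bit empty);
    if(kind == PUSH) return full ? 10 : (empty ? 15 : 6);
    else             return full ? 15 : (empty ? 10 : 5);
  endfunction

  initial begin
    kind_e items[$];

    //FIFO is empty and the simulator happened to deliver POP first
    items = {POP, PUSH};
    items.rsort(item) with (get_priority_sketch(item, 0, 1));
    //PUSH has priority 15 and POP has priority 10, so PUSH moves to the front
    $display("empty FIFO: %s is processed first", items[0].name());

    //FIFO is full and the simulator happened to deliver PUSH first
    items = {PUSH, POP};
    items.rsort(item) with (get_priority_sketch(item, 1, 0));
    //POP has priority 15 and PUSH has priority 10, so POP moves to the front
    $display("full FIFO: %s is processed first", items[0].name());
  end
endmodule
```

In both cases the arrival order is the problematic one, and sorting from high to low priority puts the transactions in the order the model needs.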

Step #4 – Process Transactions

The final step is to process transactions in the order imposed by the get_priority() function. We fill in the code of parse_items() so that we first sort pending_items by the result of get_priority() and then we call the appropriate do_push() or do_pop():

class cfs_fifo_scoreboard extends uvm_component;
  ...
  //Task which waits for all the transactions to be queued
  //and then start parsing them
  protected virtual task parse_items();
        
    fork
      begin
        if(process_parse_items == null) begin
          process_parse_items = process::self();
              
          //Give time to the simulator to call both threads:
          //one from the PUSH monitor and one from the POP monitor
          uvm_wait_for_nba_region();
              
          //Sort the transactions based on their priority - from high to low
          pending_items.rsort(item) with (get_priority(item));
              
          while(pending_items.size() > 0) begin
            uvm_object item = pending_items.pop_front();
              
            cfs_push_item_mon push_item;
            cfs_pop_item_mon pop_item;
                
            if($cast(push_item, item)) begin
              do_push(push_item);
            end
            else if($cast(pop_item, item)) begin
              do_pop(pop_item);
            end
            else begin
              `uvm_fatal("ALGORITHM_ISSUE", $sformatf(
                "Trying to parse an unhandled class: %s", item.get_type_name()));
            end
          end
              
          //Clear the reference to the process associated with parse_items()
          //task so that in the next clock cycle everything is clean.
          process_parse_items = null;
        end
      end
    join      
  endtask
  ...
endclass

That’s it! With this algorithm the order in which the simulator calls write_push() and write_pop() no longer affects the scoreboard: the data is processed based on the priority determined by the function get_priority().

If you want to see the complete code running, then check out the project on EDA Playground called How to Handle Data Coming From Parallel Threads 2. Try to change the simulators and see that the order in which the PUSH and POP transactions are processed does not change and it is the one determined by the get_priority() function.


If you have other methods of tackling this problem please share in the comments section as it would be very interesting to see different solutions to this issue.

Hope you found this article useful 🙂

Cristian Slav

One Comment

  • emin says:

    Interesting work, thanks!
    How can we accomplish the same task without checking the process handle? Have you thought about it?

    I wondered what happens when further handle_item() calls are made before your parse_items() has finished. Maybe we could wait until the process handle is null again?

    Let's say that, in parse_items(), uvm_wait_for_nba_region() has been executed and other handle_item() calls have already been made. Your earlier parse_items() would then consume the requests that were registered in the meantime, so the parse_items() call made by the next handle_item() might end up doing nothing, since the queue is already empty.
