Designing Circuit & MIPS code for Search.

Question 1

A fundamental mode sequential logic circuit has 2 inputs X₁ and X₂ and one output Z. The inputs X₁ and X₂ should be such that the pulse X₂ lies entirely within the pulse X₁. Typical correct inputs are shown in Figure 1a.

Figure 1a

An input sequence that does not have this characteristic is said to be illegal. The function of the fundamental mode sequential circuit is to detect any illegal input sequences. The output Z is cleared to logic 0 whenever both inputs X₁ and X₂ are logic 0. The output Z is to be set to logic 1 whenever an illegal input is detected. Once an illegal input is detected, Z remains high until both inputs are 0. A second pulse X₂ within the same pulse X₁ is also illegal. A single pulse on X₁ without a pulse on X₂ is not detected as an illegal input sequence.

Typical sequences are shown in Figures 1b and 1c, together with suggested primitive states:

Figure 1b. Correct input sequence

Figure 1c. Example illegal input sequences

Merge states if possible, and then design a Moore circuit. Ensure there are no race problems or hazards in your implementation.

Question 2

MIPS supports subroutines using the jump and link instruction, jal, where the program jumps to an address and saves the return address (PC + 4) in register $ra ($31). The stack pointer register, $sp (typically $29), points to the area in memory used for the stack.

The following C code subroutine ‘search’ is compiled into MIPS assembly language as shown. v[] is an array and 'n' is the size of the array. Determine what the C code does. Explain how each section of the MIPS code relates to the C code and explain what is happening in the MIPS code and why it is done. Write suitable comments for each line of assembly code (use the exam sheet if you wish). You are required to provide more than simple code translations.

void search (int v[], int n)
{
    int i;
    for (i = 0; i < n; i = i + 1)
    {
        if (v[i] < 0) v[i] = v[i] + 100;
        else v[i] = v[i] - 1;
    }
}

search:	addi	$sp, $sp, -16
	sw	$ra, 12($sp)
	sw	$s2, 8($sp)
	sw	$s1, 4($sp)
	sw	$s0, 0($sp)
	add	$s0, $zero, $zero
loop1:	slt	$t0, $s0, $a1
	beq	$t0, $zero, exit
	sll	$t1, $s0, 2
	add	$t2, $a0, $t1
	lw	$t3, 0($t2)
	slt	$t1, $t3, $zero
	beq	$t1, $zero, else
	addi	$t3, $t3, 100
	j	noelse
else:	addi	$t3, $t3, -1
noelse:	sw	$t3, 0($t2)
	addi	$s0, $s0, 1
	j	loop1
exit:	lw	$s0, 0($sp)
	lw	$s1, 4($sp)
	lw	$s2, 8($sp)
	lw	$ra,12($sp)
	addi	$sp, $sp, 16
	jr	$ra

Suggest three ways in which the performance of a microprocessor (CPU) can be improved. Comment on the use of one of these parameters as a guide to the computer's performance.

Figure 2 in the notes section shows a single cycle data path.

What are the disadvantages of a single cycle data path?

Extend the data path of figure 2 to include the jump and link, jal instruction. Describe how to set any control lines you have added and explain its operation. You may draw on the data path of figure 2 to aid your answer.

Question 3

Figure 3a in the notes section shows a multicycle data path. Figure 3b shows the corresponding (incomplete) finite state machine diagram.

a. What are the motivations behind the multicycle data path architecture?

Describe how a branch (beq) and R format instruction are implemented in this multicycle data path. Note what is happening in each cycle. Complete the state machine diagram.

The program counter in figure 3a is a register with a clock enable. When the enable is set to logic '1' the data at the input is clocked to the output. Design a one bit register with clock enable and clear. You are to use a simple one bit D type flip-flop. Use AND, OR and NOT gates as required. Suggest why the clock should not be gated. Suggest a use for the clear function.

Modify the data path to handle unrecognized instruction and overflow exceptions. You may add two extra registers, exception program counter and cause register. The exception address, C0000000Hex, is to be loaded into the program counter on an exception. Discuss the use of these two registers and how an exception would be handled. Add the extra states needed to the finite state machine diagram and enter control signal settings.

Question 4

For this question consider the pipelined data path of figure 4a.

Explain how the terms 'instruction throughput' and 'instruction execution time' apply when considering pipeline data path performance. A diagram may aid your answer.

b. Find the data dependencies in the following code:

add $2, $2, $7

and $2, $2, $2

or $8, $6, $2

sub $14, $8, $2

sw $15, 100($2)

Use a graphical representation to show possible dependencies. Use the same method to show how forwarding may or may not be used to solve these dependency problems.

Consider the code:

lw $2, 100($5)
sw $2, 200($6)
add $3, $6, $2

Use a graphical representation to show dependencies. Use graphical representations to show a forwarding solution to the dependency problem if possible.

Show the additional forwarding hardware required to implement any forwarding necessary from part b and your forwarding solutions from part c. You may draw on the data path of figure 4b to aid your answer. (The forwarding unit may be shown as a simple block labeled ‘forwarding unit’)

Compare the multicycle and pipeline data paths. Suggest an application where a multicycle data path is preferable to the pipeline data path. Can the multicycle data path have faster execution times than the pipeline data path? Explain your answer.

Explain why more than one exception can be generated in a single clock cycle for the pipelined data path. Explain how you would deal with a multiple exception situation.

Question 5

For this question consider the pipelined data path of figure 4a.

Why is it better to have a high number of pipeline stages? What limits the number of pipeline stages?

Consider the code:

lw $1, 10($3)
add $3, $1, $4

Use a graphical representation to show why we need to stall the pipeline. Describe how the pipeline may be stalled.

c. For the code:

Address Instruction

beq $2, $3, 10
and $10, $1, $5
or $9, $8, $7
add $11, $12, $1

X lw $4, 10($1)

Determine the address X of the lw instruction.

Assume the branch is taken. Describe the flushing operations that take place and explain why they are needed to ensure proper code execution.

Suggest a method of improving the performance of the beq instruction.

d. Consider the following code:

beq	$1, $2, TARGET #branch is taken
lw	$3, 40($4)
add	$3, $3, $3
sw	$3, 40($4)
TARGET: or	$10, $11, $12

For the pipelined data path of figure 4a, can a flush and a stall happen simultaneously? If so what are the consequences? How is the situation changed if the branch resolution takes place in the ID stage?

Question 6

Describe the principles of locality and how they relate to cache systems.

A virtual memory system uses a look up table to translate between virtual memory and physical memory address. How can this system be used to provide protection between multiple user programs using the same CPU system? What special modes does the CPU need to enable an operating system to provide this protection?

Figure 6 shows an implementation of a 16k word direct mapped, one word block size cache. Show an implementation of a 2-way set associative cache (one word blocks). The new cache is to have the same amount of data storage as the direct mapped cache. Explain your design and comment on the cost of increasing the associativity.
Assume a direct mapped, 8 word block, cache. A write back system is in use. A cache read miss has occurred. Assuming a ‘least recently used’ system how does the cache system continue from this point?

A virtual memory address is translated to a real physical memory address via a look up table (page table). A virtual memory system has a 4k byte page size, a virtual byte address of 42 bits and a 32 bit physical byte address. How big (in bytes) is the look up table? Assume 32 bit data and 5 bits of additional control data. Comment on any difficulties in implementing this page table and suggest a solution.

Figure 6

Question 1

void search (int v[], int n)            // function definition: v is the array, n is its size
{
    int i;                              // loop index
    for (i = 0; i < n; i = i + 1)       // visit every element of v
    {
        if (v[i] < 0)                   // is the element negative?
            v[i] = v[i] + 100;          // yes: add 100 to it
        else
            v[i] = v[i] - 1;            // no: subtract 1 from it
    }                                   // end of the for loop
}                                       // end of the function

search: addi $sp, $sp, -16       # $sp = $sp - 16: make room on the stack for four words
        sw   $ra, 12($sp)        # Memory[$sp+12] = $ra: save the return address
        sw   $s2, 8($sp)         # Memory[$sp+8] = $s2: save $s2
        sw   $s1, 4($sp)         # Memory[$sp+4] = $s1: save $s1
        sw   $s0, 0($sp)         # Memory[$sp+0] = $s0: save $s0, which holds i below
        add  $s0, $zero, $zero   # $s0 = 0: i = 0
loop1:  slt  $t0, $s0, $a1       # $t0 = 1 if i < n, else $t0 = 0
        beq  $t0, $zero, exit    # if $t0 == 0 (i >= n), leave the loop
        sll  $t1, $s0, 2         # $t1 = i << 2: byte offset of v[i]
        add  $t2, $a0, $t1       # $t2 = $a0 + $t1: address of v[i]
        lw   $t3, 0($t2)         # $t3 = Memory[$t2+0]: load v[i]
        slt  $t1, $t3, $zero     # $t1 = 1 if v[i] < 0, else $t1 = 0
        beq  $t1, $zero, else    # if $t1 == 0 (v[i] >= 0), go to the else part
        addi $t3, $t3, 100       # then part: $t3 = v[i] + 100
        j    noelse              # skip over the else part
else:   addi $t3, $t3, -1        # else part: $t3 = v[i] - 1
noelse: sw   $t3, 0($t2)         # Memory[$t2+0] = $t3: store the new v[i]
        addi $s0, $s0, 1         # i = i + 1
        j    loop1               # go back to the loop test
exit:   lw   $s0, 0($sp)         # $s0 = Memory[$sp+0]: restore $s0
        lw   $s1, 4($sp)         # $s1 = Memory[$sp+4]: restore $s1
        lw   $s2, 8($sp)         # $s2 = Memory[$sp+8]: restore $s2
        lw   $ra, 12($sp)        # $ra = Memory[$sp+12]: restore the return address
        addi $sp, $sp, 16        # $sp = $sp + 16: release the stack space
        jr   $ra                 # return to the caller

A single C statement such as v[i] = v[i] + 100 has its operand in memory, so it needs several MIPS instructions: a load, an arithmetic operation and a store. Many arithmetic operations in the MIPS architecture use a constant as one of the operands. To avoid a load instruction just to fetch that constant, MIPS provides arithmetic instructions in which one operand is a constant. This quick add instruction with one constant operand is called add immediate, or addi.

addi $sp, $sp, -16

Here addi subtracts 16 from the stack pointer to reserve four words, and the sw instructions store $ra, $s2, $s1 and $s0 in that space. The add instruction places the sum $zero + $zero in $s0, which clears the loop index. The shift left logical instruction, sll, moves all the bits in a word to the left and fills the emptied bits with 0s; shifting by 2 multiplies the index by 4. beq is a decision-making instruction in MIPS assembly language. It stands for branch if equal.

beq $t0, $zero, exit

If the value in $t0 equals the value in $zero, this instruction goes to the statement labelled exit. Otherwise the add instruction adds the values in $a0 and $t1 and stores the result in $t2, and lw loads the word at that address from memory into a register. MIPS compilers use slt, slti, beq, bne and the fixed value 0 to create all the relative conditions: equal, not equal, less than, less than or equal, greater than, and greater than or equal. slt is therefore called a comparison instruction.
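The address arithmetic described above (shift the index left by 2, add the base, load, modify, store) can be restated in C. This is only an illustrative sketch: the helper name element_address is hypothetical, and it assumes 4-byte int elements as on 32-bit MIPS. The search routine is the one from the question with the missing semicolon restored.

```c
#include <stdint.h>

/* Hypothetical helper: byte address of v[i], mirroring
   sll $t1, $s0, 2 followed by add $t2, $a0, $t1.
   Assumes 4-byte words. */
uintptr_t element_address(uintptr_t base, uint32_t i)
{
    return base + ((uintptr_t)i << 2);   /* base + 4 * i */
}

/* The routine from the question: negative elements have 100 added,
   all other elements have 1 subtracted. */
void search(int v[], int n)
{
    for (int i = 0; i < n; i = i + 1) {
        if (v[i] < 0)
            v[i] = v[i] + 100;
        else
            v[i] = v[i] - 1;
    }
}
```

Calling search on the array {-5, 3, 0} leaves it as {95, 2, -1}, which matches the then and else paths of the assembly code.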

Question 2

jr $ra

The instruction above jumps to the address held in register $ra, which returns control to the caller.

Performance is an important attribute when choosing between computers. A microprocessor's performance depends on three factors: response time, bandwidth and throughput. To increase performance, the response time (execution time) for a given piece of work must be decreased. According to Amdahl's law, the speedup is

\[ \text{Speedup} = \frac{1}{(1 - f) + f/s} \]

where f is the fraction of the execution time affected by an improvement and s is the factor by which that fraction is sped up.
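Amdahl's law can be checked numerically with a small C function. This is a minimal sketch using the standard form of the law, speedup = 1 / ((1 - f) + f / s); the function and parameter names are illustrative and do not come from the original answer.

```c
/* f: fraction of the original execution time that the improvement
      affects (between 0 and 1)
   s: factor by which that fraction is sped up (greater than 0) */
double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

For example, doubling the speed of half of the execution time (f = 0.5, s = 2) gives an overall speedup of only about 1.33, which is why speeding up one part of a processor is a limited guide to overall performance.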

A single cycle data path is not efficient. Every instruction takes one clock cycle of the same length, and that length must be long enough for the slowest instruction. Because of this long clock cycle, overall performance is poor.

The jump instruction resembles a branch instruction, but it computes the target PC differently and it is not conditional. As with a branch, the low-order 2 bits of the jump address are always 00. The next lower 26 bits of this 32-bit address come from the 26-bit immediate field in the instruction. The upper 4 bits of the address that replaces the PC come from the PC of the jump instruction plus 4.

The jump can therefore be implemented by storing into the PC the concatenation of:

The upper 4 bits of the current PC + 4.
The 26-bit immediate field of the jump instruction.
The bits 00 (binary).

An additional multiplexer, controlled by the jump control signal, selects this value for the PC. The jump target is obtained by shifting the lower 26 bits of the jump instruction left by 2 bits, which adds 00 as the low-order bits, and then concatenating the upper 4 bits of PC + 4 as the high-order bits. One additional control signal, called Jump, is needed for the additional multiplexer. It is asserted when the opcode is 2. For jal, the return address PC + 4 must also be written into register $ra ($31).
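The concatenation described above can be written out as bit operations. The following C sketch is illustrative only; the function names are hypothetical, and jal_link simply restates the return address (PC + 4) given in the question.

```c
#include <stdint.h>

/* Target of a j or jal instruction located at address pc:
   upper 4 bits of PC + 4, then the 26-bit immediate, then 00. */
uint32_t jump_target(uint32_t pc, uint32_t instruction)
{
    uint32_t upper4 = (pc + 4) & 0xF0000000u;       /* top 4 bits of PC + 4 */
    uint32_t imm26  = instruction & 0x03FFFFFFu;    /* 26-bit immediate field */
    return upper4 | (imm26 << 2);                   /* shifting appends 00 */
}

/* Value that jal writes to $ra ($31): the address of the next instruction. */
uint32_t jal_link(uint32_t pc)
{
    return pc + 4;
}
```

For instance, the instruction word 0x08100004 (opcode 2, immediate 0x100004) at address 0x00400000 gives the target 0x00400010.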

Letting each instruction take only the time it needs is very hard to implement in a single cycle data path, because the processor would have to adjust the clock period during execution according to the instruction. The multicycle data path was introduced to overcome this problem. It allows a faster clock, because each clock period covers only one basic processing step. An instruction may take more than one clock period, and the number of periods varies with the instruction.

An R format instruction takes 4 cycles. In the first cycle the instruction is fetched from memory and saved in the instruction register, and the ALU is used to increment the PC. In the second cycle the registers are read, and the values that the ALU will use are saved in the new registers A and B. In the third cycle the ALU performs the R-type operation and the result is stored in a new register called ALUOut. In the fourth cycle the value in ALUOut is written into the register file. A branch instruction requires 3 cycles, as follows:

Question 3

Cycle 1   IR ← Mem[PC]                              Save the instruction in IR
          PC ← PC + 4                               Increment the PC

Cycle 2   A ← R[rs]                                 Save the register values for the next cycle
          B ← R[rt]                                 (for the comparison)
          ALUOut ← PC + (signextend(imm16) << 2)    Calculate the branch address and place it in ALUOut

Cycle 3   Compare A and B                           If Zero is set, replace PC with ALUOut;
          if Zero then PC ← ALUOut                  otherwise do not change PC
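The cycle-by-cycle register transfers above can be traced in C. This is a simplified sketch, not the data path of figure 3a: there is no memory model, so the instruction fields are passed in directly and the IR load is left out, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t pc, a, b, alu_out;   /* PC and the temporary registers A, B, ALUOut */
    uint32_t reg[32];             /* register file */
} Cpu;

/* beq rs, rt, imm16: returns the number of cycles used (3). */
int run_beq(Cpu *c, unsigned rs, unsigned rt, int16_t imm16)
{
    c->pc += 4;                                              /* cycle 1: PC <- PC + 4 */
    c->a = c->reg[rs];                                       /* cycle 2: A <- R[rs] */
    c->b = c->reg[rt];                                       /*          B <- R[rt] */
    c->alu_out = c->pc + ((uint32_t)(int32_t)imm16 << 2);    /*          branch address */
    if (c->a - c->b == 0)                                    /* cycle 3: Zero set? */
        c->pc = c->alu_out;                                  /*          PC <- ALUOut */
    return 3;
}

/* add rd, rs, rt as an example R format instruction: returns 4 cycles. */
int run_add(Cpu *c, unsigned rs, unsigned rt, unsigned rd)
{
    c->pc += 4;                      /* cycle 1: fetch and increment PC */
    c->a = c->reg[rs];               /* cycle 2: read registers into A and B */
    c->b = c->reg[rt];
    c->alu_out = c->a + c->b;        /* cycle 3: ALU operation into ALUOut */
    c->reg[rd] = c->alu_out;         /* cycle 4: write ALUOut to the register file */
    return 4;
}
```

With PC = 0x100 and equal source registers, a beq with an immediate of 10 leaves PC at 0x104 + 40 = 0x12C; with unequal registers PC stays at 0x104.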

A register is a group of flip-flops, each of which stores one bit of data. The simplest design is a register that consists of just flip-flops, with no other gates in the circuit. The flip-flops share a common clock pulse (frequently through a buffer to reduce power requirements). A special terminal can be used to clear the flip-flops, i.e., to place zeroes in all the bits.
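For the one bit register with clock enable and clear that the question asks for, one common arrangement (an assumption here, not a design stated in this answer) feeds the D input from a 2:1 selection built from AND, OR and NOT gates, so that the clock itself is never gated. The C sketch below models only that combinational logic; the name next_d is hypothetical and a synchronous clear is assumed.

```c
#include <stdbool.h>

/* Value presented to the D input of the flip-flop:
   D = NOT clear AND ((enable AND in) OR (NOT enable AND q))
   q is the current output, in is the new data bit. */
bool next_d(bool q, bool in, bool enable, bool clear)
{
    return !clear && ((enable && in) || (!enable && q));
}
```

With enable low the flip-flop reloads its own output and so holds its value; with clear high it loads 0 regardless of the other inputs.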

Exception Program Counter:

A 32 bit register is used to hold the address of the affected instruction. This register is needed even when exceptions are vectored.

Cause Register:

The cause register records the cause of the exception. In the MIPS architecture this register is 32 bits long, and some of its bits are currently unused. Assume that there is a five bit field that encodes the two kinds of exception, with the value 10 representing an undefined instruction and the value 12 representing arithmetic overflow. When an exception occurs, the processor must store the address of the offending instruction in the exception program counter and transfer control to the operating system at a specified address. Once the operating system has control, it must take suitable action, such as providing a service to the user program. To handle the exception, the operating system must know the reason for it.

The reason for an exception can be communicated by one of the following methods:

Vectored interrupts
A status register (the cause register in the MIPS architecture)
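The sequence of register updates on an exception can be sketched in C. This is a simplified model under stated assumptions: the cause values 10 and 12 and the exception address C0000000 hex come from the text and the question, the PC is assumed to have been incremented already in the fetch cycle (so the offending instruction is at PC - 4), and the type and function names are hypothetical.

```c
#include <stdint.h>

enum {
    CAUSE_UNDEFINED_INSTRUCTION = 10,   /* value used in the text above */
    CAUSE_ARITHMETIC_OVERFLOW   = 12
};

#define EXCEPTION_ADDRESS 0xC0000000u   /* from the question */

typedef struct {
    uint32_t pc, epc, cause;
} ExceptionState;

void raise_exception(ExceptionState *s, uint32_t cause)
{
    s->epc   = s->pc - 4;          /* address of the offending instruction */
    s->cause = cause;              /* why the exception happened */
    s->pc    = EXCEPTION_ADDRESS;  /* continue in the operating system */
}
```

After an overflow in the instruction at 0x400 (so PC holds 0x404), the state becomes EPC = 0x400, cause = 12 and PC = 0xC0000000.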

Pipelining increases the CPU instruction throughput, that is, the number of instructions completed per unit of time. It does not reduce the execution time of an individual instruction; in fact it slightly increases it, because the pipeline control has overhead. The increase in instruction throughput means that a program runs faster and its total execution time is lower.
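The difference between throughput and the execution time of one instruction can be illustrated with a cycle count. The C sketch below assumes an ideal pipeline with no stalls or flushes, and compares it with a non-pipelined machine in which every instruction occupies all k stage times; both function names are hypothetical.

```c
/* Cycles to complete n instructions on an ideal k-stage pipeline:
   the first instruction needs k cycles, each later one finishes
   one cycle after its predecessor. */
unsigned long pipelined_cycles(unsigned long k, unsigned long n)
{
    return n == 0 ? 0 : k + n - 1;
}

/* Cycles when instructions do not overlap at all (assumption:
   each instruction takes k stage times). */
unsigned long unpipelined_cycles(unsigned long k, unsigned long n)
{
    return k * n;
}
```

With 5 stages, a single instruction still takes 5 cycles either way, but 1000 instructions take 1004 cycles when pipelined against 5000 when not.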

In a pipeline, forwarding is a method of avoiding stalls caused by data hazards. Usually the desired result is available at the end of the EX stage, in clock cycle 3, while the following and and or instructions need the value at the beginning of their EX stages. This segment can execute without stalls if the data is forwarded, as soon as it is available, to any units that need it before it can be read from the register file.

Question 4

Let us assume that forwarding is applied only to the EX stage, which performs either an ALU operation or an effective address calculation. A hazard arises when an instruction needs, in its EX stage, a register value that an earlier instruction will write only in its WB stage. The forwarding conditions are expressed in a notation that names the fields of the pipeline registers.
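A forwarding decision of this kind can be sketched as a comparison between the source register needed in EX and the destination registers held in the later pipeline registers. The C code below is an illustrative sketch; the struct, the function name and the return encoding (2 for EX/MEM, 1 for MEM/WB, 0 for the register file) are assumptions made for the example.

```c
#include <stdbool.h>

typedef struct {
    bool     reg_write;   /* the instruction in this stage writes a register */
    unsigned rd;          /* its destination register number */
} StageDest;

/* Decide where an EX-stage source operand should come from. */
unsigned forward_select(unsigned src, StageDest ex_mem, StageDest mem_wb)
{
    /* Most recent result first: the instruction just ahead of us. */
    if (ex_mem.reg_write && ex_mem.rd != 0 && ex_mem.rd == src)
        return 2;
    /* Otherwise the instruction two ahead. */
    if (mem_wb.reg_write && mem_wb.rd != 0 && mem_wb.rd == src)
        return 1;
    /* No hazard: use the value read from the register file. */
    return 0;
}
```

For the pair add $2, $2, $7 followed by and $2, $2, $2, the add is in EX/MEM with destination $2 when the and reaches EX, so both of the and's operands select the EX/MEM value. Register $zero is never forwarded.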

The multicycle data path needs more total execution time, because each instruction finishes before the next one starts. In the pipelined data path the instructions overlap, each in a different stage. Thus, less total execution time is required.

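In a pipelined data path several instructions are executing at the same time, each in a different stage, so more than one exception can be generated in a single clock cycle. The usual approach is to prioritise the exceptions so that the one belonging to the earliest instruction is serviced first.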

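If the number of stages in the pipeline is increased, each stage does less work, so the clock period is shorter and the instruction throughput is higher. The number of stages cannot grow without limit, however: every extra stage adds pipeline register overhead, and a deeper pipeline pays more for hazards and flushes.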

Here the dependence between the load and the instruction that follows it goes backward in time, and this hazard cannot be resolved by forwarding. Thus, in addition to a forwarding unit, a hazard detection unit is required. The hazard detection unit operates during the ID stage, so that it can insert the stall between the load and its use.
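The check made by such a hazard detection unit can be sketched as a single condition on pipeline register fields. The C function below is illustrative; the parameter names describe the fields they stand for and are assumptions, not labels taken from figure 4a.

```c
#include <stdbool.h>

/* Stall when the instruction in EX is a load (it reads memory) and its
   destination register rt is a source of the instruction now in ID. */
bool must_stall(bool id_ex_mem_read, unsigned id_ex_rt,
                unsigned if_id_rs, unsigned if_id_rt)
{
    return id_ex_mem_read &&
           (id_ex_rt == if_id_rs || id_ex_rt == if_id_rt);
}
```

For lw $1, 10($3) followed by add $3, $1, $4, the load's destination $1 matches the add's first source, so must_stall(true, 1, 1, 4) is true and one bubble is inserted.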

The lw instruction’s address is 94.
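The branch target is the address of the instruction after the branch plus the word offset shifted left by 2. The C sketch below shows this calculation; the function name is hypothetical, and the beq address of 50 used in the note that follows is also hypothetical, chosen only because it is the value consistent with the stated answer (the address column is not legible in the question).

```c
#include <stdint.h>

/* Target of a taken branch: (branch address + 4) + (offset in words * 4). */
uint32_t branch_target(uint32_t branch_addr, int16_t offset_words)
{
    return branch_addr + 4 + ((uint32_t)(int32_t)offset_words << 2);
}
```

With an offset of 10 the target is always the beq address plus 44, so a beq at the assumed address 50 gives 50 + 4 + 40 = 94.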

A flush is required when the branch is predicted wrongly. The pipeline keeps fetching as if the branch were not taken. If this prediction is wrong, two instructions have already been fetched and have started to execute incorrectly. Those instructions must be discarded, or flushed, and execution must resume with the right instructions from the branch target address.

Branch prediction can be used to improve the performance of the beq instruction.

During the ID stage the instruction must be decoded, it must be decided whether a bypass to the equality unit is needed, and the equality comparison must be completed, so that if the instruction is a branch the PC can be set to the branch target address. Forwarding of the operands of branches was previously handled by the ALU forwarding logic, but introducing the equality test unit in the ID stage requires new forwarding logic.

The bypassed source operands of a branch can come from either the ALU/MEM or the MEM/WB pipeline latches. The values in a branch comparison are needed during the ID stage but may be produced later in time, so a data hazard can occur and a stall may be needed. Despite these complications, moving the branch execution to the ID stage is an improvement, because it reduces the penalty of a taken branch to just one instruction.

Question 5

The principle of locality states that at any time a program accesses a relatively small part of its address space. There are two types of locality. Temporal locality means that an item that has been referenced is likely to be referenced again soon. Spatial locality means that items whose addresses are close to a referenced item are likely to be referenced soon. In a direct mapped cache, each word in memory is assigned a cache location that depends on the address of the word in memory.

The protection mechanism must ensure that, although multiple processes share the same main memory, one renegade process cannot write into the address space of another user process or into the operating system, either intentionally or unintentionally. The write access bit in the TLB can protect a page from being written. Without this level of protection, computer viruses would be even more widespread.

For the operating system to implement protection in the virtual memory system, the hardware must provide the following three capabilities.

Support at least two modes that indicate whether the running process is a user process or an operating system process, which is variously called a supervisor process, a kernel process or an executive process.
Provide a portion of the processor state that a user process can read but not write. This includes the user/supervisor mode bit, which dictates whether the processor is in user or supervisor mode. The operating system writes these elements with special instructions that are available only in supervisor mode.
Provide mechanisms by which the processor can move from user mode to supervisor mode and vice versa.

Between the direct mapped cache and the fully associative cache there is a middle range of designs called set associative. In a set associative cache there is a fixed number of locations where each block can be placed. A set associative cache with n locations for a block is called an n-way set associative cache. It consists of a number of sets, each of which contains n blocks. Each block in memory maps to a unique set in the cache, given by the index field, and the block can be placed in any element of that set. Set associative placement thus combines direct mapped placement and fully associative placement: a block is directly mapped to a set, and then all the blocks in the set are searched for a match.
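The effect of moving from direct mapped to 2-way set associative placement on the address fields can be worked out in C. The sketch assumes 32-bit byte addresses, 4-byte words and one-word blocks: a 16K-word direct mapped cache then has a 14-bit index and 16-bit tag, while a 2-way cache with the same data storage has 8K sets, a 13-bit index and a 17-bit tag. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t tag;
    uint32_t index;
} CacheFields;

/* Direct mapped, 16K one-word blocks: 2 byte-offset bits,
   14 index bits, 16 tag bits. */
CacheFields direct_mapped_fields(uint32_t addr)
{
    CacheFields f = { addr >> 16, (addr >> 2) & 0x3FFFu };
    return f;
}

/* 2-way set associative, same data storage: 8K sets, so
   2 byte-offset bits, 13 index bits, 17 tag bits. */
CacheFields two_way_fields(uint32_t addr)
{
    CacheFields f = { addr >> 15, (addr >> 2) & 0x1FFFu };
    return f;
}
```

The longer tag, stored and compared for both blocks of a set in parallel, is part of the cost of the increased associativity.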

When a cache miss takes place, a block has to be replaced. How the block is chosen varies from one cache to another. The most commonly used replacement scheme is least recently used (LRU). In LRU, the block that is replaced is the one that has been unused for the longest time. This scheme is used in associative caches; in a direct mapped cache there is only one candidate block, so no choice arises. As the associativity increases, LRU becomes harder to implement. LRU replacement is implemented by keeping track of when each element in a set was used relative to the other elements in the set.

A page fault occurs if the valid bit for a virtual page is off. In that case the operating system takes control. The operating system finds the page in the next level of the hierarchy and decides where to place the requested page in main memory. The virtual address alone does not say where the page is on disk. Thus, in a virtual memory system, the location on disk of each page must be tracked.
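The size of the page table asked for in the question can be estimated with a short calculation. The C sketch below assumes a flat, single-level table with one entry per virtual page, and assumes that each entry (a 20-bit physical page number plus 5 control bits) is stored in one 32-bit word, following the "32 bit data" hint in the question. The function name is hypothetical.

```c
#include <stdint.h>

/* Bytes in a flat page table: one entry for every virtual page. */
uint64_t page_table_bytes(unsigned virtual_bits, unsigned page_offset_bits,
                          unsigned entry_bytes)
{
    uint64_t entries = (uint64_t)1 << (virtual_bits - page_offset_bits);
    return entries * entry_bytes;
}
```

With a 42-bit virtual address and 4k byte pages (12 offset bits) there are 2^30 entries, so page_table_bytes(42, 12, 4) is 2^32 bytes. That equals the whole 32-bit physical address space, which is why a flat table is impractical here.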

Cite This Work

My Assignment Help. (2020). Designing A Fundamental Mode Sequential Circuit And MIPS Assembly Language Subroutine For Search. Retrieved from https://myassignmenthelp.com/free-samples/kne345-design-of-microprocessors-and-microcontrollers.