Draft a Summary of the main topic selected along with at least three related research papers on Intelligent Memories in Computer Architecture.
Answer
Summary of Intelligent Memory
Definition: Advances in semiconductor technology have driven the development of both the microprocessor and Dynamic Random Access Memory (DRAM) (Prizedwriting.ucdavis.edu, 2017). The two technologies, however, have improved along different dimensions. Microprocessor development has focused on increasing speed, while DRAM development has focused on increasing capacity. The resulting mismatch is known as the Processor-Memory Performance Gap, and it creates a bottleneck in the system. The CPU can process data faster than the memory system can deliver it, so the processor must wait for each transfer and the overall execution time grows. During these waits, processor cycles sit idle and the available computing power goes unused. Intelligent Memory is a memory architecture designed to narrow the Processor-Memory Performance Gap (Selman, Aburas & Selman, 2014). It does so by adding processing capability to the memory itself, so that more work is done where the data reside and less data has to cross the processor-memory bus to the CPU.
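The bottleneck described above can be made concrete with a back-of-the-envelope model. The sketch below is a simplified, textbook-style stall model rather than anything taken from the cited sources: it assumes that every cache miss stalls the CPU for a fixed number of cycles, and the function name and all parameter values are hypothetical.

```python
def idle_fraction(compute_cycles: int, memory_accesses: int,
                  miss_rate: float, miss_penalty_cycles: int) -> float:
    """Fraction of all CPU cycles spent idle, waiting on memory.

    Hypothetical model: each access that misses the cache stalls the
    processor for a fixed miss_penalty_cycles.
    """
    stall_cycles = memory_accesses * miss_rate * miss_penalty_cycles
    return stall_cycles / (compute_cycles + stall_cycles)


# Hypothetical workload: 1M compute cycles, 300k memory accesses, 2% misses.
# A larger miss penalty (a wider processor-memory gap) means more idle cycles.
for penalty in (50, 100, 200):
    print(penalty, round(idle_fraction(1_000_000, 300_000, 0.02, penalty), 2))
```

As the miss penalty grows, an increasing share of cycles is spent waiting, which is the idle computing power that Intelligent Memory aims to recover.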
Architectural Description: Four models of Intelligent Memory have been used prominently in research and development: Active Pages, Computational RAM (CRAM), Parallel Processing RAM (PPRAM), and Intelligent RAM (IRAM) (Ahn, Yoo, Mutlu & Choi, 2015). Denser DRAM chips have made Intelligent Memory more practical: 64 MB of RAM now occupies only about 10 sq mm of die area instead of 50 sq mm, which leaves room on the chip for logic. The Active Pages model divides DRAM into equally sized pages and assigns a logic block to each page. Because each logic block has only a small die area, it can implement only simple circuitry, so more complex operations, for example floating-point arithmetic, must still be performed by the CPU. A complete query, however, usually contains both simple and complex operations. Active Pages therefore partitions each task: the complex operations run on the CPU and the simple operations run in memory. CRAM works much like Active Pages in that it also places processing blocks inside the DRAM chips. The CRAM model has an enormous internal bandwidth of 1.1 terabytes per second (Selman, Aburas & Selman, 2014). The connection pins on the memory chips, by contrast, deliver only 270 megabytes per second, so the external bandwidth is roughly 4,000 times lower than the internal bandwidth (1.1 TB/s ÷ 270 MB/s ≈ 4,000).
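The task partitioning that Active Pages relies on can be illustrated with a small sketch. This is a hypothetical model, not the Active Pages programming interface: SIMPLE_OPS, Op, and partition are illustrative names, and the rule that integer and bitwise operations fit in the page logic while floating-point operations do not is an assumption based on the description above.

```python
from dataclasses import dataclass

# Hypothetical: operations cheap enough for the small logic block on each page.
SIMPLE_OPS = {"int_add", "compare", "bit_and", "bit_or", "copy"}


@dataclass
class Op:
    name: str  # operation to perform
    page: int  # index of the DRAM page that holds its operands


def partition(ops: list[Op]) -> tuple[dict[int, list[str]], list[str]]:
    """Split a task into per-page in-memory work and work left for the CPU."""
    in_memory: dict[int, list[str]] = {}
    on_cpu: list[str] = []
    for op in ops:
        if op.name in SIMPLE_OPS:
            # Simple op: run it on the logic block attached to its page.
            in_memory.setdefault(op.page, []).append(op.name)
        else:
            # Complex op (e.g. floating point): too large for the page logic.
            on_cpu.append(op.name)
    return in_memory, on_cpu


query = [Op("compare", 0), Op("fp_mul", 0), Op("int_add", 1), Op("fp_div", 1)]
pages, cpu = partition(query)
print(pages)  # {0: ['compare'], 1: ['int_add']}
print(cpu)    # ['fp_mul', 'fp_div']
```

In this model, each page's list could be executed by its own logic block in parallel while the CPU handles the floating-point work.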
In a DRAM chip, each bit is stored as charge on a tiny capacitor. The issue is that these capacitors store so little charge that the signals on the internal wiring would fade before they leave the chip. To solve this problem, DRAM designers place "sense amplifiers" on the internal wires to amplify the output signals. Each CRAM processing element has two 1-bit registers, X and Y, and an arbitrary-function Arithmetic Logic Unit (ALU) (Prizedwriting.ucdavis.edu, 2017). This ALU accepts two inputs and places its 1-bit output onto the result bus. One of the two ALU input ports (call it 'A' for ease of reference) has three bits, coming from register X, register Y, and the sense amplifier. The other input port, 'B', carries an 8-bit instruction from the global bus. Essentially, the ALU is a multiplexer, with 'A' acting as the 3-bit selector that controls which one of the 8 bits from 'B' is placed onto the result bus. Because 'B' supplies an output value for each of the eight possible selector values, the instruction acts as a truth table, which is why the ALU can compute any Boolean function of its three input bits.
A PPRAM system is made up of multiple PPRAM nodes, each with three parts: a logic block, a memory block, and a communication block. The logic block can be anything from a general-purpose processor to an I/O controller (Selman, Aburas & Selman, 2014). The communication block handles inter-node communication using a special protocol.
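The multiplexer-style ALU described above can be modelled in a few lines. This is a behavioural sketch, not a circuit description from the CRAM literature: the bit ordering of the selector (X as the most significant bit, the sense-amplifier bit as the least significant) and the helper names cram_alu, truth_table, and broadcast are assumptions made for illustration. broadcast models many processing elements receiving the same instruction from the global bus, each applying it to its own data.

```python
from typing import Callable


def cram_alu(x: int, y: int, m: int, instruction: int) -> int:
    """1-bit CRAM-style ALU modelled as an 8-to-1 multiplexer.

    x, y: the 1-bit registers X and Y
    m: the bit read through the sense amplifier
    instruction: the 8-bit value on port 'B' (from the global bus)
    """
    selector = (x << 2) | (y << 1) | m    # port 'A': a 3-bit selector, 0..7
    return (instruction >> selector) & 1  # pass one of the 8 bits of 'B'


def truth_table(fn: Callable[[int, int, int], int]) -> int:
    """Encode any 3-input Boolean function fn(x, y, m) as an 8-bit instruction."""
    code = 0
    for sel in range(8):
        x, y, m = (sel >> 2) & 1, (sel >> 1) & 1, sel & 1
        code |= (fn(x, y, m) & 1) << sel  # bit 'sel' holds fn's output
    return code


def broadcast(instruction: int, xs: list[int], ys: list[int],
              ms: list[int]) -> list[int]:
    """Apply one instruction to many processing elements in lockstep."""
    return [cram_alu(x, y, m, instruction) for x, y, m in zip(xs, ys, ms)]


# Example: every processing element computes X AND (memory bit).
and_xm = truth_table(lambda x, y, m: x & m)
print(and_xm)  # 160 (bits 5 and 7 set)
print(broadcast(and_xm, [1, 1, 0, 1], [0, 1, 0, 0], [1, 0, 1, 1]))  # [1, 0, 0, 1]
```

Encoding the function as data rather than as dedicated gates is what keeps each processing element small enough to sit beside the sense amplifiers.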
Performance: According to Ahn et al. (2015), in IRAM a high-speed bus connects the CPU to the level-1 cache. The on-chip main memory offers performance comparable to that of a level-2 cache, so one level of cache is sufficient. Between the level-1 cache and the DRAM sits the Memory Interface Unit, which IRAM uses to exploit the enormous internal bandwidth of the DRAM.
Results for RADram (the Active Pages system) are obtained as follows. First, a simulator for the RADram system is written. Next, several applications are run on the simulator and their execution times recorded. The same set of applications is then run on a conventional memory system. Finally, the execution times on the conventional memory system are divided by those on the RADram system to obtain speedups. Speedups ranging from 1X to 1,000X were recorded, and most applications tended to perform better as the data size increased.
CRAM results are likewise obtained through a simulator. A set of ten applications is run on this simulator and their execution times recorded. Timing data for the same set of applications are also obtained from a 75 MHz Sun SPARCstation 5. Most of the selected applications showed significant speedup, ranging from 1252X to 41391X [EllWeb]. Compared with CRAM, PPRAM's speedup is insignificant.
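The final step of the RADram methodology, dividing conventional execution times by RADram execution times, can be written as a short function. The application names and timings below are invented placeholders, not results from the RADram study; only the division itself follows the procedure described above.

```python
def speedups(conventional: dict[str, float],
             radram: dict[str, float]) -> dict[str, float]:
    """Speedup per application = conventional time / RADram time."""
    return {app: conventional[app] / radram[app] for app in conventional}


# Hypothetical execution times in seconds (placeholders, not measured data).
conventional_times = {"app_a": 12.0, "app_b": 300.0}
radram_times = {"app_a": 12.0, "app_b": 0.5}

for app, s in speedups(conventional_times, radram_times).items():
    print(f"{app}: {s:.0f}X")  # values above 1X mean RADram is faster
```

The CRAM comparison presumably uses the same ratio, with the SPARCstation timings in place of the conventional-memory timings.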
PPRAMR performance measurements are obtained in the same way as for the previous two models, that is, from a simulator. Unlike those models, however, PPRAMR is not compared with a conventional system; instead, its performance is compared with two kinds of "future" systems: a Multiple-Powerful-processors with Cache-based system (MPC) and a Single-Powerful-processor with Main-memory system (SPM). The results show that PPRAMR achieves a maximum speedup of 1.41X over MPC and 2.22X over SPM on five selected applications (Ahn, Yoo, Mutlu & Choi, 2015).
IRAM uses a different method to obtain performance estimates. Unlike the previous three models, where data are recorded from simulators, IRAM predicts performance from mathematical equations. The procedure has two steps. The first is to collect application performance data, for example clock cycle counts, from a commercial processor. The second is to apply mathematical formulas to determine how IRAM would change the collected performance data.
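The two-step IRAM procedure can be sketched as follows. The formula here is a deliberately simple stand-in chosen for illustration, not the equations used in the IRAM studies: it assumes that the measured cycle count can be split into compute cycles and memory-stall cycles, and that IRAM's on-chip DRAM scales only the stall portion by a latency ratio. The function name, the split, and the ratio are all hypothetical.

```python
def estimate_iram_cycles(measured_cycles: int, memory_stall_cycles: int,
                         latency_ratio: float) -> float:
    """Step 2: re-estimate the cycle count under an assumed IRAM memory latency.

    measured_cycles: total cycles from step 1 (a commercial processor)
    memory_stall_cycles: the part of those cycles spent waiting on memory
    latency_ratio: IRAM memory latency / conventional memory latency
                   (hypothetical; values below 1 mean faster memory)
    """
    compute_cycles = measured_cycles - memory_stall_cycles
    return compute_cycles + memory_stall_cycles * latency_ratio


# Step 1 (hypothetical measurement): 10M cycles, 4M of them memory stalls.
measured, stalls = 10_000_000, 4_000_000
estimated = estimate_iram_cycles(measured, stalls, latency_ratio=0.25)
print(f"estimated speedup: {measured / estimated:.2f}X")
```

Because only the stall cycles shrink, the estimated speedup is bounded by how memory-bound the original application was.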
References
Ahn, J., Yoo, S., Mutlu, O., & Choi, K. (2015, June). PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture. In Computer Architecture (ISCA), 2015 ACM/IEEE 42nd Annual International Symposium on (pp. 336-348). IEEE.
Prizedwriting.ucdavis.edu. (2017). Literature Review - Intelligent Memory | Prized Writing. Retrieved 5 December 2017, from https://prizedwriting.ucdavis.edu/literature-review-intelligent-memory
Selman, A. H., Aburas, A., & Selman, S. (2014). Intelligent memory allocation based on fuzzy logic. Southeast Europe Journal of Soft Computing, 3(1).