SK hynix Memory-Centric Computing System Overview

At Hot Chips 35 in 2023, SK hynix drew on its memory expertise to describe how it intends to meet the major demands of large-scale AI computing. The company presented its work on domain-specific memory for "memory-centric computing", which aims to ease one of the largest challenges in AI compute: the mismatch between memory capacity and bandwidth and the available compute resources. This article summarizes that presentation and clarifies its key points.

Problem Statement

The problem SK hynix set out to address is the very high cost of generative AI inference. That cost covers not only compute but also power consumption, interconnects, and memory, all of which add significantly to total expenditure.

Large Transformer Models' Compute and Memory Demands

For large Transformer models in particular, memory is a primary challenge. Models require large volumes of data and are often constrained by memory capacity and bandwidth.

large language model
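To make the bandwidth constraint concrete, the following back-of-the-envelope sketch may help. It is not from the presentation. It assumes that single-stream token generation reads every model weight once per token, so memory bandwidth divided by model size gives an upper bound on the token rate. The function name, model size, and bandwidth figure are illustrative assumptions.

```python
def max_tokens_per_second(n_params: float, bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on the decode rate if every weight is read once per token."""
    model_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


# Illustrative assumptions: 175e9 parameters stored as 2-byte values,
# served from memory with 2e12 bytes/s of aggregate bandwidth.
rate = max_tokens_per_second(175e9, 2, 2e12)
print(f"{rate:.2f} tokens/s upper bound")  # 5.71 tokens/s upper bound
```

Under these assumptions the bound is set entirely by memory traffic, which is why adding compute alone does not raise it.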

SK hynix argues that the AI hardware industry needs more than plain memory: it needs different types of memory with built-in compute capabilities, including domain-specific memory. The company is positioning memory with compute as a way to add value in the stack.

why domain-specific memory?

"Compute-in-Memory" or SK hynix AiM

SK hynix calls its compute-in-memory approach AiM (Accelerator-in-Memory). In the GDDR6 implementation, each memory tile includes its own 1 GHz processing unit and can provide 512 GB/s of internal bandwidth.

accelerator-in-memory

Matrix-vector multiplication (GEMV) is performed inside memory for AI compute: the weight matrix data comes from the memory tiles, while vector data comes from a global buffer.

Matrix-vector multiplication (GEMV)
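The data flow described above can be sketched in plain Python. This is a minimal functional model, not SK hynix's implementation: the round-robin row layout, the bank count, and all function names are assumptions chosen for illustration. Each "bank" holds some rows of the weight matrix, and every bank reads the same vector from a shared global buffer.

```python
from typing import List

Matrix = List[List[float]]


def split_rows_across_banks(weights: Matrix, n_banks: int) -> List[Matrix]:
    """Assign weight rows to banks round-robin (an assumed layout)."""
    banks: List[Matrix] = [[] for _ in range(n_banks)]
    for i, row in enumerate(weights):
        banks[i % n_banks].append(row)
    return banks


def bank_mac(rows: Matrix, global_buffer: List[float]) -> List[float]:
    """One bank's processing unit: dot product of each local row with the shared vector."""
    return [sum(w * x for w, x in zip(row, global_buffer)) for row in rows]


def gemv_in_memory(weights: Matrix, vector: List[float], n_banks: int) -> List[float]:
    banks = split_rows_across_banks(weights, n_banks)
    global_buffer = list(vector)  # the vector is loaded once and shared by all banks
    partial = [bank_mac(rows, global_buffer) for rows in banks]
    # Undo the round-robin layout: output i sits in bank i % n_banks at slot i // n_banks.
    return [partial[i % n_banks][i // n_banks] for i in range(len(weights))]


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, 1.0]
print(gemv_in_memory(W, x, n_banks=2))  # [3.0, 7.0, 11.0]
```

In this model only the vector and the results cross the memory interface. The weights, which dominate the data volume, never leave their bank.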

AiM uses dedicated memory commands to trigger computation inside the memory.

AiM-specific memory commands

The presentation also covered how memory scales and how much AiM compute large language models require.

Scale-out AiM for large language models

Using this type of AiM presents major challenges: mapping at the software level, designing hardware architectures for AiM, and then providing interfaces. These are key barriers to adoption.

Key techniques for efficient scaling AiMs for LLMs

Mapping Problems to AiM

Research is needed on how to map workloads and algorithms to AiM-capable memory.

practical/efficient mapping: AiM-specific tiling
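As one illustration of what such a mapping might involve, the sketch below tiles a matrix-vector product along the vector dimension, on the assumption that the shared vector buffer can hold only a fixed number of elements at a time. The presentation does not spell out the tiling scheme, so the tile width, the function names, and the accumulation strategy here are all hypothetical.

```python
from typing import Iterator, List, Tuple


def column_tiles(n_cols: int, tile_width: int) -> Iterator[Tuple[int, int]]:
    """Yield [start, end) column ranges no wider than tile_width."""
    for start in range(0, n_cols, tile_width):
        yield start, min(start + tile_width, n_cols)


def tiled_gemv(weights: List[List[float]], vector: List[float],
               tile_width: int) -> List[float]:
    """Compute weights @ vector one vector tile at a time, accumulating partial sums."""
    out = [0.0] * len(weights)
    for lo, hi in column_tiles(len(vector), tile_width):
        buffer = vector[lo:hi]  # the slice that fits the assumed buffer capacity
        for r, row in enumerate(weights):
            out[r] += sum(w * x for w, x in zip(row[lo:hi], buffer))
    return out


W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [1.0, 0.0, 2.0]
print(tiled_gemv(W, x, tile_width=2))  # [7.0, 16.0]
```

The result is independent of the tile width. What changes is how often the buffer is reloaded, which is the kind of trade-off a mapping tool would need to weigh.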

System Architecture and Scalability

System architecture must support scaling and extensibility to handle large models.

Scalable AiM-based System Architecture (1)

Key components of an AiM architecture include an AiM controller, scalable multicast interconnect, routers, compute units (ALUs), and an instruction sequencing controller.

Scalable AiM-based System Architecture (2)

Matrix-vector accumulation functions are critical to AI workloads. AiM manages them with a CISC-like instruction set.
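One way to picture a coarse-grained instruction of this kind is as a single request that a sequencer expands into many low-level memory commands. The sketch below shows that idea only. The mnemonics (`LOAD_VECTOR_TILE`, `MAC_ROW_GROUP`, `READ_RESULT`) and the expansion order are hypothetical and are not SK hynix command names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Command:
    op: str       # hypothetical mnemonic, not an actual AiM command name
    operand: int  # index of the vector tile or row group the command targets


def expand_gemv(n_vector_tiles: int, n_row_groups: int) -> List[Command]:
    """Expand one coarse 'GEMV' instruction into a flat command stream."""
    stream: List[Command] = []
    for t in range(n_vector_tiles):
        # Load the next slice of the input vector into the shared buffer.
        stream.append(Command("LOAD_VECTOR_TILE", t))
        # Accumulate that slice against every group of weight rows.
        for g in range(n_row_groups):
            stream.append(Command("MAC_ROW_GROUP", g))
    # Read back the accumulated results once all tiles are processed.
    for g in range(n_row_groups):
        stream.append(Command("READ_RESULT", g))
    return stream


stream = expand_gemv(n_vector_tiles=2, n_row_groups=3)
print(len(stream))  # 11
```

The host issues one instruction, and the sequencing controller handles the repetitive command stream. That is the sense in which such an instruction set resembles CISC.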

Optimizations

New architectures often contain subtle differences that can be exploited to improve performance.

optimization techniques for scale-out architecture

SK hynix demonstrated a proof-of-concept GDDR6 AiM solution using two FPGAs, moving beyond abstract discussion to a tangible demonstration.

proof-of-concept for scale-out AiM system

AiM Software Stack

system integration options

The proof-of-concept is still at the evaluation stage, with various analyses comparing this approach to conventional solutions. It represents a potential direction for future development.