Writing a CHIP-8 emulator in C with SDL2

chip8 is a CHIP-8 emulator written in C, using SDL2 for rendering and input. CHIP-8 is the classic “hello world” of emulator development: a tiny 1970s virtual machine designed to make game programming approachable on 8-bit microcomputers. The whole machine fits in your head: 4 KB of memory, 16 registers, a 64×32 monochrome display, and just 35 instructions.

That small surface area is exactly why it’s such a good project. You get to build a real fetch-decode-execute CPU loop, do bit-level sprite blitting with collision detection, and wire it all up to a hardware-accelerated window — without drowning in the complexity of a real console.

Features

  • All 35 opcodes implemented with a complete fetch-decode-execute cycle.
  • 64×32 monochrome display rendered through SDL2 with hardware-accelerated scaling (10× by default, into a resizable window).
  • Hex keypad input mapped onto the keyboard in the traditional CHIP-8 layout.
  • Delay and sound timers ticking at 60 Hz.
  • Bundled ROMs (IBM Logo, Zero Demo, Particle) with an interactive selection prompt on startup.
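
The “traditional” keypad layout usually means laying the 4×4 hex pad over the left-hand block of a QWERTY keyboard. Here is a minimal sketch of that lookup, assuming the common 1234/QWER/ASDF/ZXCV arrangement. The project's actual bindings aren't shown in this write-up, and key_to_chip8 is a hypothetical helper, not a function from the repository.

```c
/* Hypothetical helper: map an ASCII key to a CHIP-8 keypad index (0x0-0xF).
 * Assumes the widely used layout below; returns -1 for unmapped keys.
 *
 *   keyboard      keypad
 *   1 2 3 4       1 2 3 C
 *   q w e r  ->   4 5 6 D
 *   a s d f       7 8 9 E
 *   z x c v       A 0 B F
 */
int key_to_chip8(char key)
{
    switch (key) {
    case '1': return 0x1; case '2': return 0x2; case '3': return 0x3; case '4': return 0xC;
    case 'q': return 0x4; case 'w': return 0x5; case 'e': return 0x6; case 'r': return 0xD;
    case 'a': return 0x7; case 's': return 0x8; case 'd': return 0x9; case 'f': return 0xE;
    case 'z': return 0xA; case 'x': return 0x0; case 'c': return 0xB; case 'v': return 0xF;
    default:  return -1;
    }
}
```

An input handler could translate each key event through a table like this and write 1 or 0 into the matching slot of the 16-entry keypad array.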

The whole machine is one struct

Unlike a real console, the entire CHIP-8 state is small enough to read at a glance. Here’s the complete virtual machine:

typedef struct {
    unsigned short opcode;      // Current opcode
    unsigned char* memory;      // 4K memory [4096]
    unsigned char* V;           // 16 8-bit registers [16]
    unsigned short I;           // Index register
    unsigned short pc;          // Program counter
    unsigned char* gfx;         // Graphics buffer [64 * 32]
    unsigned char delayTimer;   // Delay timer
    unsigned char soundTimer;   // Sound timer
    unsigned short* stack;      // Stack [16]
    unsigned short sp;          // Stack pointer
    unsigned char* key;         // Keypad [16]
    unsigned char drawFlag;     // Set when the screen needs a redraw
} Chip8;

The main loop is just as honest — execute one instruction, poll input, redraw if needed, tick the timers, and sleep a little to slow the emulated CPU down to a reasonable speed:

while(true)
{
    chip8_FetchDecodeExecute(&chip8);
    chip8_Input(&chip8);
    chip8_RenderLoop(&chip8);
    chip8_Timer(&chip8);
    usleep(10000); // Slow down execution
}
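
How chip8_Timer holds the timers to 60 Hz isn't shown here. Since the loop sleeps about 10 ms per iteration, it runs closer to 100 times per second, so the timers can't simply count down once per pass. One common approach, sketched below under that assumption, is to accumulate elapsed time and tick once for every 1/60 s that has gone by. TimerSketch and timers_advance are hypothetical names used for illustration only.

```c
#include <stdint.h>

#define TIMER_PERIOD_US 16667u  /* roughly 1/60 s in microseconds */

/* Hypothetical timer state: the two CHIP-8 timers plus an accumulator
 * that remembers how much time has passed since the last 60 Hz tick. */
typedef struct {
    uint8_t  delay_timer;
    uint8_t  sound_timer;
    uint32_t accumulated_us;
} TimerSketch;

/* Call once per loop iteration with the time that iteration took.
 * Returns the number of 60 Hz ticks that were applied. */
unsigned timers_advance(TimerSketch *t, uint32_t elapsed_us)
{
    unsigned ticks = 0;
    t->accumulated_us += elapsed_us;
    while (t->accumulated_us >= TIMER_PERIOD_US) {
        t->accumulated_us -= TIMER_PERIOD_US;
        if (t->delay_timer > 0) t->delay_timer--;  /* timers stop at zero */
        if (t->sound_timer > 0) t->sound_timer--;
        ticks++;
    }
    return ticks;
}
```

Fed 10 ms per iteration, the first call applies no tick and the second applies one, which keeps the timers near 60 Hz no matter how fast the instruction loop itself runs.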

Decoding an opcode is bit surgery

Every CHIP-8 instruction is two bytes, stored big-endian. The first job is to reassemble those two bytes into a 16-bit opcode, then slice it into the bit-fields that different instructions care about: a 12-bit address (nnn), two register indices (x, y), an 8-bit constant (kk), and a 4-bit nibble (n).

void chip8_FetchDecodeExecute(Chip8* chip8)
{
    // Fetch: combine two bytes into one big-endian 16-bit opcode
    chip8->opcode = chip8->memory[chip8->pc] << 8 | chip8->memory[chip8->pc + 1];

    // Decode: carve the opcode into its bit-fields
    unsigned short nnn = chip8->opcode & 0x0FFF;      // 12-bit address
    unsigned char x = (chip8->opcode & 0x0F00) >> 8;  // first register index
    unsigned char y = (chip8->opcode & 0x00F0) >> 4;  // second register index
    unsigned char kk = chip8->opcode & 0x00FF;        // 8-bit constant
    unsigned char n = chip8->opcode & 0x000F;         // 4-bit nibble

    // Execute: dispatch on the high nibble, then on the variant
    switch(chip8->opcode & 0xF000)
    {
        case 0x0000:
            switch(chip8->opcode & 0x00FF)
            {
                case 0x00E0: chip8_cls(chip8); break; // CLS - clear the display
                case 0x00EE: chip8_ret(chip8); break; // RET - return from subroutine
            }
            break;
        // ... 1nnn, 2nnn, 3xkk, and the rest ...
    }

    chip8->pc += 2; // Instructions are 2 bytes, so step to the next one
}

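The dispatch is a nested switch: the outer one keys off the top nibble (0xF000), and where a family of instructions shares that nibble (like the 0x8xy_ arithmetic group), an inner switch picks the exact variant. Advancing pc by two at the end is what makes the program counter walk through memory one instruction at a time. Because that increment runs after every handler, instructions that set pc themselves (jumps, calls, returns) have to take it into account, for example by pushing the address of the CALL itself so that RET plus the increment lands on the instruction after it.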
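
To make the bit slicing concrete, here is a small standalone sketch that applies the same masks and shifts to one opcode. Fields, fetch_opcode, and decode_fields are hypothetical names for illustration; the emulator itself keeps these values in local variables.

```c
#include <stdint.h>

/* Hypothetical container for the decoded bit-fields of one opcode. */
typedef struct {
    uint16_t nnn;  /* lowest 12 bits: address */
    uint8_t  x;    /* low nibble of the high byte: register index */
    uint8_t  y;    /* high nibble of the low byte: register index */
    uint8_t  kk;   /* lowest 8 bits: constant */
    uint8_t  n;    /* lowest 4 bits: nibble */
} Fields;

/* Reassemble two consecutive bytes into one big-endian 16-bit opcode. */
uint16_t fetch_opcode(const uint8_t *memory, uint16_t pc)
{
    return (uint16_t)((memory[pc] << 8) | memory[pc + 1]);
}

/* Slice an opcode into every field; each instruction uses only some. */
Fields decode_fields(uint16_t opcode)
{
    Fields f;
    f.nnn = opcode & 0x0FFF;
    f.x   = (uint8_t)((opcode & 0x0F00) >> 8);
    f.y   = (uint8_t)((opcode & 0x00F0) >> 4);
    f.kk  = (uint8_t)(opcode & 0x00FF);
    f.n   = (uint8_t)(opcode & 0x000F);
    return f;
}
```

For the bytes 0xD1, 0x25 this gives the opcode 0xD125. The top nibble 0xD selects the draw instruction, and the fields come out as x = 1, y = 2, n = 5, so the instruction reads "draw a 5-row sprite at (V1, V2)". The nnn and kk values (0x125 and 0x25) are computed too, but Dxyn simply ignores them.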

Drawing sprites: XOR and the collision flag

The most interesting opcode is Dxyn — draw a sprite. CHIP-8 sprites are 8 pixels wide and n rows tall, and they’re drawn by XOR-ing their bits onto the framebuffer. That XOR is the clever bit: drawing the same sprite twice erases it, which is how the original games did flicker-y animation.

The XOR also gives you free collision detection. If drawing a sprite ever flips a pixel that was already on back to off, something overlapped — so the emulator sets register VF to 1. Games read that flag to know when, say, a missile hit a wall.

void chip8_drw_vx_vy_nibble(Chip8* chip8, unsigned char x, unsigned char y, unsigned char n)
{
    unsigned short px = chip8->V[x] % 64; // wrap X onto the screen
    unsigned short py = chip8->V[y] % 32; // wrap Y onto the screen
    unsigned short height = n;
    unsigned char pixel;

    chip8->V[0xF] = 0; // clear the collision flag

    for(int yline = 0; yline < height; yline++)
    {
        if(py + yline >= 32) break; // clip at the bottom edge
        pixel = chip8->memory[chip8->I + yline]; // one sprite row = one byte

        for(int xline = 0; xline < 8; xline++)
        {
            if(px + xline >= 64) break; // clip at the right edge
            if((pixel & (0x80 >> xline)) != 0) // test each bit, MSB first
            {
                unsigned int index = px + xline + ((py + yline) * 64);
                if(chip8->gfx[index] == 1) // pixel was already on?
                    chip8->V[0xF] = 1;     // -> collision
                chip8->gfx[index] ^= 1;    // XOR the pixel onto the screen
            }
        }
    }

    chip8->drawFlag = 1; // mark the frame dirty
}

The 0x80 >> xline mask is how each of the eight bits in a sprite row gets tested left-to-right (0x80 is 10000000, walked one bit right each iteration).
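
The "draw twice to erase" behaviour and the collision flag are easy to check in isolation. The sketch below strips the draw routine down to a free function over a bare framebuffer. xor_sprite and lit_pixels are hypothetical helpers, not part of the emulator, and the return value stands in for what would be written to VF.

```c
#include <stdint.h>

#define W 64
#define H 32

/* XOR an 8-pixel-wide sprite onto a 1-byte-per-pixel framebuffer,
 * clipping at the right and bottom edges.
 * Returns 1 if any lit pixel was switched off (the value VF would get). */
int xor_sprite(uint8_t *gfx, const uint8_t *sprite, int rows, int px, int py)
{
    int collision = 0;
    for (int row = 0; row < rows && py + row < H; row++) {
        for (int bit = 0; bit < 8 && px + bit < W; bit++) {
            if (sprite[row] & (0x80 >> bit)) {
                uint8_t *p = &gfx[(py + row) * W + px + bit];
                if (*p) collision = 1;  /* about to turn a lit pixel off */
                *p ^= 1;
            }
        }
    }
    return collision;
}

/* Count lit pixels, handy for checking the result. */
int lit_pixels(const uint8_t *gfx)
{
    int count = 0;
    for (int i = 0; i < W * H; i++)
        count += gfx[i];
    return count;
}
```

Drawing the five-byte "0" glyph (0xF0, 0x90, 0x90, 0x90, 0xF0) onto an empty buffer lights 14 pixels and reports no collision. Drawing it again at the same position reports a collision and leaves the buffer empty.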

The font is just bytes that look like numbers

CHIP-8 ships its own built-in font for the hex digits 0 through F, and the trick is delightfully literal: each character is five bytes, and if you read those bytes in binary, the 1 bits are the pixels.

unsigned char const CHIP8_FONTSET[80] = {
    0xF0, 0x90, 0x90, 0x90, 0xF0, // 0
    0x20, 0x60, 0x20, 0x20, 0x70, // 1
    0xF0, 0x10, 0xF0, 0x80, 0xF0, // 2
    // ... through F
};

Take the digit 0: 0xF0, 0x90, 0x90, 0x90, 0xF0. Write out the top four bits of each byte and the shape pops right out:

1111 0xF0
1001 0x90
1001 0x90
1001 0x90
1111 0xF0

A hollow rectangle — a zero. Every glyph in the font is built this way, which means “rendering text” is just running these bytes through the exact same sprite-drawing path as everything else.
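
The same reading works for any glyph, and a few lines of code can do it for you. This is a minimal sketch with a hypothetical helper, glyph_to_text, that turns the top four bits of each of the five bytes into a row of characters.

```c
#include <stdint.h>

/* Render one 5-byte glyph into a text buffer: '#' for a set bit, '.' for a
 * clear one, four columns per row, one '\n' after each row.
 * `out` must hold at least 5 * 5 + 1 = 26 bytes. */
void glyph_to_text(const uint8_t glyph[5], char out[26])
{
    int k = 0;
    for (int row = 0; row < 5; row++) {
        for (int bit = 0; bit < 4; bit++)  /* only the top nibble is used */
            out[k++] = (glyph[row] & (0x80 >> bit)) ? '#' : '.';
        out[k++] = '\n';
    }
    out[k] = '\0';
}
```

Feeding it the bytes for the digit 1 (0x20, 0x60, 0x20, 0x20, 0x70) produces the rows "..#.", ".##.", "..#.", "..#.", ".###": a vertical stroke with a small flag at the top and a base at the bottom.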

Only redraw when something changed

SDL2 handles the window, the GPU-backed texture, and the scaling (SDL_RenderSetLogicalSize lets the emulator pretend the screen is 64×32 while SDL stretches it to fill the window). The render path itself leans on that drawFlag from the draw opcode — there’s no point pushing pixels to the GPU on frames where nothing moved:

void chip8_RenderLoop(Chip8* chip8) // takes the machine, matching the call in the main loop
{
    if(chip8->drawFlag)
    {
        uint32_t pixels[SCREEN_WIDTH * SCREEN_HEIGHT];

        // Expand each 1-byte pixel into a 32-bit color value
        for(int i = 0; i < SCREEN_WIDTH * SCREEN_HEIGHT; i++)
            pixels[i] = chip8->gfx[i] ? 0x0000FF00 : 0x00000000; // green on / black off

        // Last argument is the pitch: bytes per row of the pixel buffer
        SDL_UpdateTexture(canvas, NULL, pixels, SCREEN_WIDTH * sizeof(uint32_t));
        SDL_RenderClear(renderer);
        SDL_RenderCopy(renderer, canvas, NULL, NULL);
        SDL_RenderPresent(renderer);

        chip8->drawFlag = 0; // until the next DRW marks the frame dirty again
    }
}

It’s a classic dirty-flag optimization: the framebuffer is a 1-byte-per-pixel monochrome buffer, and it only gets expanded into 32-bit color values and uploaded to the texture when a DRW instruction actually touched it.
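
Stripped of the SDL calls, the pattern comes down to "convert only if flagged, then clear the flag". A minimal sketch of just that part follows. expand_framebuffer and convert_if_dirty are hypothetical helpers, and the color constants simply reuse the values from the render code above.

```c
#include <stddef.h>
#include <stdint.h>

#define COLOR_ON  0x0000FF00u  /* same "on" value as the render code */
#define COLOR_OFF 0x00000000u

/* Hypothetical helper: expand `count` 1-byte pixels into 32-bit values. */
void expand_framebuffer(const uint8_t *gfx, uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++)
        pixels[i] = gfx[i] ? COLOR_ON : COLOR_OFF;
}

/* Dirty-flag wrapper: do the conversion only when the flag is set, then
 * clear it. Returns 1 if a conversion happened, 0 if the frame was skipped. */
int convert_if_dirty(const uint8_t *gfx, uint8_t *draw_flag,
                     uint32_t *pixels, size_t count)
{
    if (!*draw_flag)
        return 0;       /* nothing changed: skip the conversion */
    expand_framebuffer(gfx, pixels, count);
    *draw_flag = 0;     /* clean until the next draw marks it dirty */
    return 1;
}
```

The flip side of the optimization is that anything which changes the framebuffer has to set the flag, otherwise the change never reaches the screen.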

Tech Stack

  • Language: C
  • Graphics & input: SDL2 (hardware-accelerated renderer, streaming texture, logical scaling)
  • Build: CMake (SDL2 vendored as a git submodule — no system install needed)
  • Display: 64×32 monochrome, 10× scale into a resizable window
  • CPU: all 35 opcodes, 60 Hz delay/sound timers