Matchbox Educable Noughts and Crosses Engine (en.wikipedia.org)

🤖 AI Summary
In 1961 AI researcher Donald Michie (with Roger Chambers) built MENACE, the Matchbox Educable Noughts and Crosses Engine: a mechanical reinforcement-learning machine made of 304 labelled matchboxes and coloured beads that played noughts and crosses (tic-tac-toe). Each matchbox stood for one canonical board state, with rotations and mirror images collapsed into a single box, and each bead colour in its tray stood for one legal move from that state. To choose a move, the operator shook the tray and played the bead that fell into a V-shaped notch. After the game, the beads that had been used were adjusted according to the result: after a win each bead was returned along with three extra beads of the same colour, after a draw it was returned with one extra bead, and after a loss it was removed. MENACE always played as O and learned purely through changes in bead counts, starting with a uniformly random choice among legal moves and becoming biased toward moves that had succeeded in earlier games.

MENACE is significant because it embodied core reinforcement-learning ideas before they were formalised: stochastic action selection, reward-and-punishment updates, symmetry reduction of the state space, and an analogue of weight initialisation in its starting bead counts. In practice it quickly learned optimal play against suboptimal opponents and converged to draws against optimal human play; in Michie's own tournament the results settled into consistent draws after about 20 games. The BOXES algorithm that Michie described influenced later work (including GLEE, with links to Q-learning), and MENACE remains a vivid pedagogical example of how simple, local update rules can produce intelligent behaviour.
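The bead mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not Michie's actual procedure: the class name `Menace`, the 9-character board string, and the constant `INITIAL_BEADS = 4` are assumptions made for the example (the summary only says the starting counts were uniform), and the rotation/mirror collapsing of board states is left out for brevity.

```python
import random

# Assumption: every legal move starts with the same number of beads.
# The value 4 is illustrative; the summary only says "uniform".
INITIAL_BEADS = 4

# Bead changes per outcome, as described in the summary: a win returns the
# bead plus three extra, a draw returns it plus one, a loss removes it.
REWARD = {"win": 3, "draw": 1, "loss": -1}


class Menace:
    """Hypothetical software stand-in for the matchboxes (no symmetry reduction)."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.boxes = {}    # board string -> {move index: bead count}
        self.history = []  # (board, move) pairs used in the current game

    def choose(self, board):
        """Draw one bead at random from the box for this board state."""
        box = self.boxes.setdefault(
            board,
            {i: INITIAL_BEADS for i, cell in enumerate(board) if cell == " "},
        )
        moves = [m for m, count in box.items() if count > 0]
        if not moves:
            return None  # empty box: this sketch treats it as resigning
        weights = [box[m] for m in moves]
        move = self.rng.choices(moves, weights=weights)[0]
        self.history.append((board, move))
        return move

    def learn(self, outcome):
        """Reward or punish every move played in the game just finished."""
        delta = REWARD[outcome]
        for board, move in self.history:
            box = self.boxes[board]
            box[move] = max(0, box[move] + delta)
        self.history.clear()
```

A game loop would call `choose` on each of MENACE's turns and `learn("win")`, `learn("draw")`, or `learn("loss")` once at the end. Because moves are drawn in proportion to bead counts, a move that has been rewarded becomes more likely the next time the same board appears, while one that keeps losing eventually drops out of its box.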