🤖 AI Summary
Weiss, Goldberg and Yahav (ICML 2018) introduce a method to extract deterministic finite automata (DFAs) that describe the state dynamics of trained recurrent neural networks (RNNs) by treating the RNN as a teacher for Angluin’s L* exact-learning algorithm. Membership queries are answered directly by the RNN classifier; equivalence queries (which L* needs but are intractable against an opaque RNN) are approximated using a finite abstraction of the RNN’s continuous state space. When the hypothesized DFA disagrees with the abstraction, the algorithm checks the RNN for a true counterexample; if none exists, the abstraction is refined and the comparison restarts. This loop continues until convergence or until practical time/size limits are hit.
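The query structure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the "RNN" is a stand-in black-box classifier (even parity of 1s), and the equivalence query is approximated by brute-force bounded search rather than the paper's state-space abstraction.

```python
from itertools import product

# Hypothetical stand-in for the trained RNN: a black-box classifier over
# binary strings. Here: accept strings with an even number of 1s.
def rnn_accepts(word):
    return word.count("1") % 2 == 0

# Membership query: answered directly by the network.
def membership_query(word):
    return rnn_accepts(word)

# Approximate equivalence query: search bounded-length words for a
# disagreement between the hypothesis DFA and the network. The actual
# method compares the DFA against a refinable abstraction of the RNN's
# continuous state space instead of enumerating words.
def equivalence_query(dfa_accepts, alphabet="01", max_len=6):
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            if dfa_accepts(word) != rnn_accepts(word):
                return word  # counterexample, verified on the RNN itself
    return None  # no disagreement found within the bound

# A deliberately wrong initial hypothesis ("accept everything") yields
# the shortest disagreeing word.
counterexample = equivalence_query(lambda w: True)  # → "1"
```

L* would fold each such counterexample back into its observation table and propose a new hypothesis DFA, repeating until the equivalence query finds no disagreement.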
The approach is significant because it avoids the brittle, one-shot partitioning used by prior DFA-extraction methods (quantization or k-means clustering), instead performing principled counterexample-driven refinement: partitions are split only when justified by concrete inputs. Key technical properties: the method never returns incorrect counterexamples, every refinement is justified by observed behavior, and every distinct state in the extracted DFA is grounded in an actual input sequence. Empirically, the technique extracts descriptive automata from modern GRU and LSTM models, exposes networks that reached 100% train and test accuracy yet failed to learn the target language (effectively memorizing their data, as revealed by adversarial inputs), and produces adversarial sequences. Convergence to the true RNN language is not guaranteed, however, and extracted DFAs can grow large in some cases.
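The "never returns incorrect counterexamples" property follows from the refinement decision: when the hypothesis DFA and the abstraction disagree on a word, that word is replayed on the RNN itself before anything else happens. A hedged sketch of that decision, with a hypothetical stand-in classifier and a `refine` callback standing in for splitting the abstraction:

```python
# Stand-in black-box network (even parity of 1s); illustrative only.
def rnn_accepts(word):
    return word.count("1") % 2 == 0

def handle_disagreement(word, dfa_accepts, refine):
    """Decide what to do with a word on which the hypothesis DFA and the
    abstraction disagree."""
    if rnn_accepts(word) != dfa_accepts(word):
        return word   # genuine counterexample: hand it back to L*
    refine(word)      # spurious: the abstraction was too coarse, split it
    return None

splits = []
# "11" agrees with the always-accept DFA, so it triggers a refinement:
spurious = handle_disagreement("11", lambda w: True, splits.append)
# "1" truly separates the RNN from the DFA, so it is returned as-is:
genuine = handle_disagreement("1", lambda w: True, splits.append)
```

Because only words that actually separate the RNN from the DFA are returned, every counterexample L* consumes is correct by construction; all other disagreements just make the abstraction finer.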