A gentle introduction to Generative AI: Historical perspective (medium.com)

🤖 AI Summary
Generative models estimate the joint probability distribution of the data, either to improve discriminative tasks or to produce realistic content (AI-generated content, AIGC). Unlike discriminative models, which learn a mapping from inputs to labels for classification, generative models capture richer, more complete features of the data, which is useful for synthesis, simulation, and boosting downstream performance, but they are often over-parameterized and harder to train. They come in unimodal (text-to-text, image-to-image) and multimodal (text-to-image) forms, and can be categorized as PDF-based (learning the density directly, e.g. energy-based models) or cost-based (learning a sampling mechanism, e.g. GANs).

Historically, generative modeling dates back to Gaussian mixture models and hidden Markov models (e.g., for speech and time series); major progress followed deep learning, beginning with deep belief nets around 2006. NLP moved from n-gram language models to RNNs and gated architectures (LSTM/GRU) that capture long-range dependencies, while vision advanced from hand-crafted texture-synthesis methods to GANs, VAEs, and diffusion models (e.g., Stable Diffusion).

The transformer revolution, built on self-attention, unified modalities (GPT/BERT in NLP, ViT in vision) and made large-scale pretraining central: autoregressive (GPT) versus masked (BERT) paradigms for language, and contrastive learning and masked autoencoders for vision. These technical shifts enabled today's powerful multimodal AIGC systems and point to continued convergence of modeling, pretraining, and architecture across AI domains.
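As a worked contrast of the two modeling goals mentioned above (a minimal formulation of my own, not taken from the article): a generative model fits the joint distribution, from which it can both sample new data and recover a classifier via Bayes' rule, while a discriminative model fits only the conditional needed for the task.

```latex
% Generative modeling: fit the joint distribution of inputs x and targets y
\begin{aligned}
p_\theta(x, y) &= p_\theta(x \mid y)\, p_\theta(y)
  && \text{(can synthesize data by sampling } x \sim p_\theta(x \mid y)\text{)} \\
p_\theta(y \mid x) &= \frac{p_\theta(x, y)}{\sum_{y'} p_\theta(x, y')}
  && \text{(classification recovered via Bayes' rule)} \\[4pt]
% Discriminative modeling: fit only the mapping needed for the task
q_\phi(y \mid x) &\approx p(y \mid x)
  && \text{(no model of } p(x)\text{, so no sampling of new data)}
\end{aligned}
```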
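The PDF-based versus cost-based split can be made concrete as follows (notation is mine, not the article's): an energy-based model parameterizes the density itself, paying for it with an intractable normalizer, while a GAN never writes down p(x) and instead learns a sampler judged by an adversarial cost.

```latex
% PDF-based: parameterize the density explicitly (energy-based model)
p_\theta(x) = \frac{\exp\!\big(-E_\theta(x)\big)}{Z_\theta},
\qquad Z_\theta = \int \exp\!\big(-E_\theta(x)\big)\, dx
\quad \text{(intractable normalizer, hence hard training)}

% Cost-based: learn a sampler G(z) judged by a cost, e.g. the GAN minimax objective
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
```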
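To make the autoregressive-vs-masked distinction concrete, here is a minimal, dependency-free Python sketch of how the two pretraining objectives construct their training targets. The token ids, MASK_ID, and masking rate are invented for illustration; real GPT/BERT pipelines differ in many details (tokenizers, 80/10/10 masking, batching, the models themselves).

```python
"""Toy sketch of the two language-model pretraining objectives named above.
All values here are made up; this only shows how the targets differ."""
import random

MASK_ID = 0                         # hypothetical id reserved for the [MASK] token
tokens = [5, 12, 7, 9, 3, 14, 8]    # a pretend tokenized sentence


def autoregressive_pairs(seq):
    """GPT-style objective: at each position, predict the *next* token
    given everything to its left (causal, left-to-right factorization)."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


def masked_pairs(seq, mask_rate=0.15, seed=0):
    """BERT-style objective: hide a random subset of tokens and predict the
    originals from the full (bidirectional) corrupted context."""
    rng = random.Random(seed)
    corrupted, targets = list(seq), {}
    for i in range(len(seq)):
        if rng.random() < mask_rate:
            targets[i] = seq[i]      # remember what was hidden
            corrupted[i] = MASK_ID   # replace it with the mask token
    return corrupted, targets


if __name__ == "__main__":
    for context, nxt in autoregressive_pairs(tokens):
        print(f"context={context} -> predict next token {nxt}")
    # mask_rate raised above the usual 15% only so this tiny example masks something
    corrupted, targets = masked_pairs(tokens, mask_rate=0.3)
    print(f"corrupted={corrupted} -> predict originals at positions {targets}")
```

The key difference the sketch exposes: the autoregressive targets only ever see left context (which is what makes generation natural), while the masked targets condition on both sides of each hidden position (which is what makes BERT-style encoders strong at understanding tasks).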