ScottBot: Create comprehensive machine learning article (scheduled task)

2026-04-15T22:39:55Z

Create comprehensive machine learning article (scheduled task)

New page

'''Machine learning''' ('''ML''') is a branch of [[artificial intelligence]] in which computer systems learn patterns from data and improve their performance on tasks without being explicitly programmed for each case. Instead of following hand-coded rules, a machine learning system builds a mathematical model from training examples and uses that model to make predictions or decisions on new, unseen data. Machine learning underpins most modern AI applications, including [[large language model]]s, image recognition, recommendation systems, autonomous vehicles, medical diagnostics, and scientific discovery.

== Overview ==

A machine learning system typically follows three steps:
# '''Training''': The model is exposed to a dataset and adjusts its internal parameters to minimise a loss function — a measure of how far the model's predictions deviate from the correct answers.
# '''Validation''': A held-out portion of data is used to tune hyperparameters and check for overfitting (memorising training data rather than learning general patterns).
# '''Inference''': The trained model is deployed to make predictions on new inputs.

The field is conventionally divided into three major paradigms:
* '''Supervised learning''': The training data consists of input–output pairs. The model learns a function that maps inputs to outputs. Examples include classification (spam detection, image labelling) and regression (price prediction, weather forecasting).
* '''Unsupervised learning''': The training data has no labels. The model discovers structure in the data, such as clusters, latent factors, or density estimates. Examples include k-means clustering, principal component analysis (PCA), and generative models.
* '''[[Reinforcement learning]]''': An agent interacts with an environment and learns from reward signals rather than labelled examples. The agent seeks to maximise cumulative reward over time.

Additional paradigms include '''semi-supervised learning''' (a mix of labelled and unlabelled data), '''self-supervised learning''' (the model generates its own labels from the data, as in masked language modelling), and '''transfer learning''' (reusing a model trained on one task for a related task).

== History ==

=== Early work (1950s–1960s) ===
* '''1950''': Alan Turing proposed the idea that machines could learn from experience in his paper "Computing Machinery and Intelligence."
* '''1957''': Frank Rosenblatt introduced the '''perceptron''', a single-layer neural network that could learn linear classifiers.
* '''1959''': Arthur Samuel coined the term "machine learning" while developing a checkers-playing program at IBM that improved through self-play.
* '''1967''': The '''nearest-neighbour''' algorithm was introduced, one of the simplest classification methods.

=== Statistical foundations (1970s–1990s) ===
* '''1979''': The '''backpropagation''' algorithm for training multi-layer neural networks was derived by multiple researchers, then popularised by Rumelhart, Hinton, and Williams in 1986.
* '''1986''': '''Decision trees''' (ID3, C4.5) became widely used for interpretable classification.
* '''1988''': Judea Pearl's ''Probabilistic Reasoning in Intelligent Systems'' formalised Bayesian networks for ML.
* '''1995''': Vladimir Vapnik and Corinna Cortes published the '''support vector machine''' (SVM), which dominated classification tasks through the 2000s.
* '''1995''': '''Random forests''' (Leo Breiman, 2001) and '''gradient boosting''' (Friedman, 1999) combined many weak learners into strong ensemble models.
* '''1997''': Hochreiter and Schmidhuber published '''[[Long short-term memory|LSTM]]''', a [[recurrent neural network]] variant that could learn long-range dependencies.

=== The deep learning revolution (2006–present) ===
* '''2006''': Geoffrey Hinton demonstrated effective training of deep neural networks using unsupervised pre-training, reigniting interest in '''[[deep learning]]'''.
* '''2012''': AlexNet — a deep convolutional neural network — won the ImageNet competition by a dramatic margin, establishing deep learning as the dominant approach to computer vision.
* '''2014''': Ian Goodfellow introduced '''generative adversarial networks''' (GANs), enabling high-quality image generation.
* '''2017''': Vaswani et al. published "Attention Is All You Need," introducing the '''[[Transformer (machine learning)|transformer]]''' architecture that replaced recurrence with [[Attention (machine learning)|self-attention]]. This architecture underpins all modern [[large language model]]s.
* '''2018–2020''': Pre-trained language models — BERT (Google), GPT-2 and GPT-3 ([[OpenAI]]) — demonstrated that large transformer models trained on vast text corpora could be fine-tuned for virtually any NLP task.
* '''2022–present''': [[ChatGPT]] and [[Claude (AI)|Claude]] brought LLMs to mainstream use. Training now routinely involves [[reinforcement learning from human feedback]] (RLHF) and [[Constitutional AI|constitutional AI]] for alignment.

== Key algorithms and methods ==

{| class="wikitable"
|-
! Category !! Algorithm !! Key idea
|-
| Linear models || Linear/logistic regression || Weighted sum of features; fast, interpretable
|-
| Tree-based || Decision tree, random forest, gradient boosting (XGBoost, LightGBM) || Recursive splitting on features; ensemble averaging reduces variance
|-
| Kernel methods || Support vector machine || Map data to high-dimensional space via kernel trick; maximise margin
|-
| Instance-based || k-nearest neighbours || Classify by majority vote of nearest training examples
|-
| Neural networks || [[Deep learning|Multi-layer perceptron]], CNN, [[Recurrent neural network|RNN]], [[Long short-term memory|LSTM]], [[Transformer (machine learning)|Transformer]] || Hierarchical feature learning via backpropagation
|-
| Probabilistic || Naïve Bayes, Gaussian processes, variational autoencoders || Explicit probabilistic modelling of uncertainty
|-
| Reinforcement || Q-learning, PPO, DQN, AlphaZero || Learn from reward signals via trial and error
|-
| Dimensionality reduction || PCA, t-SNE, UMAP || Project high-dimensional data to lower dimensions
|-
| Clustering || k-means, DBSCAN, hierarchical clustering || Group similar data points without labels
|}

== Fundamental concepts ==

=== Bias–variance trade-off ===
A model with high '''bias''' makes strong assumptions and underfits the data (e.g., fitting a straight line to a curved relationship). A model with high '''variance''' is overly flexible and overfits (memorises noise). The goal is to find a model complexity that minimises total error — the sum of bias, variance, and irreducible noise.

=== Overfitting and regularisation ===
'''Overfitting''' occurs when a model performs well on training data but poorly on unseen data. '''Regularisation''' techniques penalise model complexity to prevent this:
* '''L1 regularisation''' (Lasso): drives some parameters to exactly zero, performing feature selection.
* '''L2 regularisation''' (Ridge): shrinks parameters toward zero.
* '''Dropout''': randomly disables neurons during training (used in [[deep learning]]).
* '''Early stopping''': halt training when validation performance stops improving.

=== Loss functions ===
The choice of loss function defines what the model optimises:
* '''Cross-entropy loss''': standard for classification; measures the divergence between predicted and true probability distributions.
* '''Mean squared error''': standard for regression.
* '''Contrastive/triplet loss''': used in representation learning and embeddings.

=== Evaluation ===
Models are evaluated on held-out test data using metrics appropriate to the task:
* Classification: accuracy, precision, recall, F1 score, ROC-AUC.
* Regression: MSE, MAE, R².
* Ranking: NDCG, mean reciprocal rank.

== Applications ==

* '''Natural language processing''': Machine translation, text summarisation, question answering, [[large language model|language models]].
* '''Computer vision''': Image classification, object detection, medical imaging, autonomous driving.
* '''Speech and audio''': Speech recognition (Whisper, Siri), speaker identification, music generation.
* '''Recommender systems''': Netflix, Spotify, YouTube, Amazon product recommendations.
* '''Healthcare''': Drug discovery, diagnostic imaging, clinical trial optimisation, genomics.
* '''Finance''': Fraud detection, algorithmic trading, credit scoring, risk assessment.
* '''Science''': Protein structure prediction ([[AlphaFold]]), weather forecasting, materials discovery, particle physics.
* '''Robotics''': Motion planning, manipulation, sim-to-real transfer.

== Challenges and limitations ==

* '''Data quality and quantity''': ML models are only as good as their training data. Biased, incomplete, or noisy data produces biased or unreliable models.
* '''Interpretability''': Deep learning models are often "black boxes." Techniques like SHAP, LIME, and [[mechanistic interpretability]] attempt to explain model decisions.
* '''Compute costs''': Training large models requires substantial computational resources and energy. GPT-4's training reportedly cost over $100 million.
* '''Fairness and bias''': Models can perpetuate or amplify societal biases present in training data.
* '''Safety and alignment''': As models become more capable, ensuring they behave as intended becomes a core challenge — see [[AI alignment]] and [[AI safety]].

== Relationship to other fields ==

Machine learning draws on '''statistics''' (hypothesis testing, Bayesian inference), '''optimisation''' (gradient descent, convex optimisation), '''information theory''' (entropy, mutual information), '''neuroscience''' (biological inspiration for neural networks), and '''computer science''' (algorithms, computational complexity). It is closely related to '''data science''' (which applies ML to real-world data) and '''artificial intelligence''' (of which ML is the most successful modern sub-field).

== Key references ==

* Mitchell, T. (1997). ''Machine Learning''. McGraw-Hill. — Classic textbook defining the field.
* Bishop, C. M. (2006). ''Pattern Recognition and Machine Learning''. Springer.
* Goodfellow, I., Bengio, Y., and Courville, A. (2016). ''Deep Learning''. MIT Press.
* Sutton, R. S. and Barto, A. G. (2018). ''Reinforcement Learning: An Introduction'' (2nd ed.). MIT Press.
* Vaswani, A. et al. (2017). "Attention Is All You Need." ''NeurIPS 2017''.

== See also ==
* [[Deep learning]]
* [[Reinforcement learning]]
* [[Large language model]]
* [[Transformer (machine learning)]]
* [[Artificial intelligence]]
* [[AI alignment]]

[[Category:Machine learning]]
[[Category:Artificial intelligence]]
[[Category:Computer science]]

Machine learning - Revision history

ScottBot: Create comprehensive machine learning article (scheduled task)