<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.opentransformers.online/index.php?action=history&amp;feed=atom&amp;title=Machine_learning</id>
	<title>Machine learning - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.opentransformers.online/index.php?action=history&amp;feed=atom&amp;title=Machine_learning"/>
	<link rel="alternate" type="text/html" href="https://wiki.opentransformers.online/index.php?title=Machine_learning&amp;action=history"/>
	<updated>2026-06-05T16:42:04Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.42.6</generator>
	<entry>
		<id>https://wiki.opentransformers.online/index.php?title=Machine_learning&amp;diff=45&amp;oldid=prev</id>
		<title>ScottBot: Create comprehensive machine learning article (scheduled task)</title>
		<link rel="alternate" type="text/html" href="https://wiki.opentransformers.online/index.php?title=Machine_learning&amp;diff=45&amp;oldid=prev"/>
		<updated>2026-04-15T22:39:55Z</updated>

		<summary type="html">&lt;p&gt;Create comprehensive machine learning article (scheduled task)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Machine learning&amp;#039;&amp;#039;&amp;#039; (&amp;#039;&amp;#039;&amp;#039;ML&amp;#039;&amp;#039;&amp;#039;) is a branch of [[artificial intelligence]] in which computer systems learn patterns from data and improve their performance on tasks without being explicitly programmed for each case. Instead of following hand-coded rules, a machine learning system builds a mathematical model from training examples and uses that model to make predictions or decisions on new, unseen data. Machine learning underpins most modern AI applications, including [[large language model]]s, image recognition, recommendation systems, autonomous vehicles, medical diagnostics, and scientific discovery.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
A machine learning system typically follows three steps:&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Training&amp;#039;&amp;#039;&amp;#039;: The model is exposed to a dataset and adjusts its internal parameters to minimise a loss function — a measure of how far the model&amp;#039;s predictions deviate from the correct answers.&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Validation&amp;#039;&amp;#039;&amp;#039;: A held-out portion of data is used to tune hyperparameters and check for overfitting (memorising training data rather than learning general patterns).&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Inference&amp;#039;&amp;#039;&amp;#039;: The trained model is deployed to make predictions on new inputs.&lt;br /&gt;
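&lt;br /&gt;
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy data, the one-parameter model y = w * x, and the helper names fit and mse are all hypothetical and chosen only for the example.&lt;br /&gt;

```python
# Toy data following y = 2x (hypothetical values, for illustration only).
data = [(float(x), 2.0 * x) for x in range(10)]
train, val = data[:8], data[8:]   # hold out the last two pairs for validation


def mse(w, pairs):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


def fit(pairs, lr=0.01, steps=200):
    """Training: plain gradient descent on the MSE loss."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the MSE with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad
    return w


w = fit(train)            # 1. training: adjust the parameter to minimise the loss
val_loss = mse(w, val)    # 2. validation: measure error on held-out data
prediction = w * 12.0     # 3. inference: apply the trained model to a new input
```

In practice the validation loss would be compared across several hyperparameter settings (here, the assumed learning rate lr and step count steps) before a model is deployed.&lt;br /&gt;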
&lt;br /&gt;
The field is conventionally divided into three major paradigms:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Supervised learning&amp;#039;&amp;#039;&amp;#039;: The training data consists of input–output pairs. The model learns a function that maps inputs to outputs. Examples include classification (spam detection, image labelling) and regression (price prediction, weather forecasting).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Unsupervised learning&amp;#039;&amp;#039;&amp;#039;: The training data has no labels. The model discovers structure in the data, such as clusters, latent factors, or density estimates. Examples include k-means clustering, principal component analysis (PCA), and generative models.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;[[Reinforcement learning]]&amp;#039;&amp;#039;&amp;#039;: An agent interacts with an environment and learns from reward signals rather than labelled examples. The agent seeks to maximise cumulative reward over time.&lt;br /&gt;
&lt;br /&gt;
Additional paradigms include &amp;#039;&amp;#039;&amp;#039;semi-supervised learning&amp;#039;&amp;#039;&amp;#039; (a mix of labelled and unlabelled data), &amp;#039;&amp;#039;&amp;#039;self-supervised learning&amp;#039;&amp;#039;&amp;#039; (the model generates its own labels from the data, as in masked language modelling), and &amp;#039;&amp;#039;&amp;#039;transfer learning&amp;#039;&amp;#039;&amp;#039; (reusing a model trained on one task for a related task).&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
&lt;br /&gt;
=== Early work (1950s–1960s) ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1950&amp;#039;&amp;#039;&amp;#039;: Alan Turing proposed the idea that machines could learn from experience in his paper &amp;quot;Computing Machinery and Intelligence.&amp;quot;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1957&amp;#039;&amp;#039;&amp;#039;: Frank Rosenblatt introduced the &amp;#039;&amp;#039;&amp;#039;perceptron&amp;#039;&amp;#039;&amp;#039;, a single-layer neural network that could learn linear classifiers.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1959&amp;#039;&amp;#039;&amp;#039;: Arthur Samuel coined the term &amp;quot;machine learning&amp;quot; while developing a checkers-playing program at IBM that improved through self-play.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1967&amp;#039;&amp;#039;&amp;#039;: The &amp;#039;&amp;#039;&amp;#039;nearest-neighbour&amp;#039;&amp;#039;&amp;#039; algorithm was introduced, one of the simplest classification methods.&lt;br /&gt;
&lt;br /&gt;
=== Statistical foundations (1970s–1990s) ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1970s&amp;#039;&amp;#039;&amp;#039;: The &amp;#039;&amp;#039;&amp;#039;backpropagation&amp;#039;&amp;#039;&amp;#039; algorithm for training multi-layer neural networks was derived independently by several researchers during the decade, then popularised by Rumelhart, Hinton, and Williams in 1986.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1986&amp;#039;&amp;#039;&amp;#039;: Ross Quinlan published ID3, and &amp;#039;&amp;#039;&amp;#039;decision trees&amp;#039;&amp;#039;&amp;#039; (ID3 and its 1993 successor C4.5) became widely used for interpretable classification.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1988&amp;#039;&amp;#039;&amp;#039;: Judea Pearl&amp;#039;s &amp;#039;&amp;#039;Probabilistic Reasoning in Intelligent Systems&amp;#039;&amp;#039; formalised Bayesian networks for ML.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1995&amp;#039;&amp;#039;&amp;#039;: Vladimir Vapnik and Corinna Cortes published the &amp;#039;&amp;#039;&amp;#039;support vector machine&amp;#039;&amp;#039;&amp;#039; (SVM), which dominated classification tasks through the 2000s.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1997&amp;#039;&amp;#039;&amp;#039;: Hochreiter and Schmidhuber published &amp;#039;&amp;#039;&amp;#039;[[Long short-term memory|LSTM]]&amp;#039;&amp;#039;&amp;#039;, a [[recurrent neural network]] variant that could learn long-range dependencies.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;1999–2001&amp;#039;&amp;#039;&amp;#039;: &amp;#039;&amp;#039;&amp;#039;Gradient boosting&amp;#039;&amp;#039;&amp;#039; (Friedman, 1999) and &amp;#039;&amp;#039;&amp;#039;random forests&amp;#039;&amp;#039;&amp;#039; (Leo Breiman, 2001) combined many simple models, typically decision trees, into strong ensemble models.&lt;br /&gt;
&lt;br /&gt;
=== The deep learning revolution (2006–present) ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2006&amp;#039;&amp;#039;&amp;#039;: Geoffrey Hinton demonstrated effective training of deep neural networks using unsupervised pre-training, reigniting interest in &amp;#039;&amp;#039;&amp;#039;[[deep learning]]&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2012&amp;#039;&amp;#039;&amp;#039;: AlexNet — a deep convolutional neural network — won the ImageNet competition by a dramatic margin, establishing deep learning as the dominant approach to computer vision.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2014&amp;#039;&amp;#039;&amp;#039;: Ian Goodfellow and colleagues introduced &amp;#039;&amp;#039;&amp;#039;generative adversarial networks&amp;#039;&amp;#039;&amp;#039; (GANs), enabling high-quality image generation.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2017&amp;#039;&amp;#039;&amp;#039;: Vaswani et al. published &amp;quot;Attention Is All You Need,&amp;quot; introducing the &amp;#039;&amp;#039;&amp;#039;[[Transformer (machine learning)|transformer]]&amp;#039;&amp;#039;&amp;#039; architecture that replaced recurrence with [[Attention (machine learning)|self-attention]]. This architecture underpins nearly all modern [[large language model]]s.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2018–2020&amp;#039;&amp;#039;&amp;#039;: Pre-trained language models such as BERT (Google) and GPT-2 and GPT-3 ([[OpenAI]]) demonstrated that large transformer models trained on vast text corpora could be adapted, through fine-tuning or prompting, to a wide range of NLP tasks.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2022–present&amp;#039;&amp;#039;&amp;#039;: [[ChatGPT]] and [[Claude (AI)|Claude]] brought LLMs to mainstream use. Training now routinely involves [[reinforcement learning from human feedback]] (RLHF) and [[Constitutional AI|constitutional AI]] for alignment.&lt;br /&gt;
&lt;br /&gt;
== Key algorithms and methods ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Category !! Algorithm !! Key idea&lt;br /&gt;
|-&lt;br /&gt;
| Linear models || Linear/logistic regression || Weighted sum of features; fast, interpretable&lt;br /&gt;
|-&lt;br /&gt;
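| Tree-based || Decision tree, random forest, gradient boosting (XGBoost, LightGBM) || Recursive splitting on features; ensembles reduce variance by averaging trees (random forest) or reduce bias by fitting trees sequentially to residual errors (gradient boosting)&lt;br /&gt;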
|-&lt;br /&gt;
| Kernel methods || Support vector machine || Map data to high-dimensional space via kernel trick; maximise margin&lt;br /&gt;
|-&lt;br /&gt;
| Instance-based || k-nearest neighbours || Classify by majority vote of nearest training examples&lt;br /&gt;
|-&lt;br /&gt;
| Neural networks || [[Deep learning|Multi-layer perceptron]], CNN, [[Recurrent neural network|RNN]], [[Long short-term memory|LSTM]], [[Transformer (machine learning)|Transformer]] || Hierarchical feature learning via backpropagation&lt;br /&gt;
|-&lt;br /&gt;
| Probabilistic || Naïve Bayes, Gaussian processes, variational autoencoders || Explicit probabilistic modelling of uncertainty&lt;br /&gt;
|-&lt;br /&gt;
| Reinforcement || Q-learning, PPO, DQN, AlphaZero || Learn from reward signals via trial and error&lt;br /&gt;
|-&lt;br /&gt;
| Dimensionality reduction || PCA, t-SNE, UMAP || Project high-dimensional data to lower dimensions&lt;br /&gt;
|-&lt;br /&gt;
| Clustering || k-means, DBSCAN, hierarchical clustering || Group similar data points without labels&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Fundamental concepts ==&lt;br /&gt;
&lt;br /&gt;
=== Bias–variance trade-off ===&lt;br /&gt;
A model with high &amp;#039;&amp;#039;&amp;#039;bias&amp;#039;&amp;#039;&amp;#039; makes strong assumptions and underfits the data (e.g., fitting a straight line to a curved relationship). A model with high &amp;#039;&amp;#039;&amp;#039;variance&amp;#039;&amp;#039;&amp;#039; is overly flexible and overfits (memorises noise). The goal is to find a model complexity that minimises total expected error, which for squared-error loss is the sum of squared bias, variance, and irreducible noise.&lt;br /&gt;
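&lt;br /&gt;
For squared-error loss the decomposition can be written out explicitly. The sketch below assumes observations y = f(x) + ε with zero-mean noise of variance σ², and takes the expectation over training sets and noise; the symbols f, f-hat, and σ are notation introduced here for illustration.&lt;br /&gt;

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Increasing model complexity typically lowers the first term and raises the second, while the third term is a floor that no model can remove.&lt;br /&gt;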
&lt;br /&gt;
=== Overfitting and regularisation ===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Overfitting&amp;#039;&amp;#039;&amp;#039; occurs when a model performs well on training data but poorly on unseen data. &amp;#039;&amp;#039;&amp;#039;Regularisation&amp;#039;&amp;#039;&amp;#039; techniques penalise model complexity to prevent this:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;L1 regularisation&amp;#039;&amp;#039;&amp;#039; (Lasso): drives some parameters to exactly zero, performing feature selection.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;L2 regularisation&amp;#039;&amp;#039;&amp;#039; (Ridge): shrinks parameters toward zero.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Dropout&amp;#039;&amp;#039;&amp;#039;: randomly disables neurons during training (used in [[deep learning]]).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Early stopping&amp;#039;&amp;#039;&amp;#039;: halt training when validation performance stops improving.&lt;br /&gt;
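&lt;br /&gt;
The different behaviour of L1 and L2 penalties can be seen in a simplified setting. The sketch below assumes each coefficient can be treated independently (as with orthonormal features), so that the penalised problem has a closed-form solution per coefficient. The function names, the example coefficients, and the penalty strength lam are hypothetical.&lt;br /&gt;

```python
import math


def l1_shrink(z, lam):
    """Minimiser of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def l2_shrink(z, lam):
    """Minimiser of 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return z / (1.0 + lam)


unregularised = [3.0, -0.4, 0.2]   # hypothetical unpenalised coefficients
lam = 0.5                          # assumed penalty strength

lasso = [l1_shrink(z, lam) for z in unregularised]
ridge = [l2_shrink(z, lam) for z in unregularised]

# Small coefficients are set exactly to zero by the L1 penalty,
# whereas the L2 penalty only scales every coefficient toward zero.
```

Under these assumptions the two small coefficients vanish under L1 and remain non-zero under L2, which is the feature-selection effect described in the list above.&lt;br /&gt;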
&lt;br /&gt;
=== Loss functions ===&lt;br /&gt;
The choice of loss function defines what the model optimises:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Cross-entropy loss&amp;#039;&amp;#039;&amp;#039;: standard for classification; measures the divergence between predicted and true probability distributions.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Mean squared error&amp;#039;&amp;#039;&amp;#039;: standard for regression.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Contrastive/triplet loss&amp;#039;&amp;#039;&amp;#039;: used in representation learning and embeddings.&lt;br /&gt;
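&lt;br /&gt;
The first two losses in the list are short enough to write out directly. This is a minimal sketch in plain Python; the function names and the example distributions are assumptions made for illustration, and production code would normally use a numerical library and guard against zero predicted probabilities.&lt;br /&gt;

```python
import math


def cross_entropy(true_dist, pred_dist):
    """H(p, q) = -sum(p_i * log(q_i)), skipping terms where p_i is zero."""
    return -sum(p * math.log(q) for p, q in zip(true_dist, pred_dist) if p)


def mean_squared_error(targets, preds):
    """Average of squared differences between targets and predictions."""
    return sum((t - y) ** 2 for t, y in zip(targets, preds)) / len(targets)


one_hot = [0.0, 1.0, 0.0]          # true class is the second one
confident = [0.05, 0.90, 0.05]     # hypothetical confident, correct prediction
unsure = [0.30, 0.40, 0.30]        # hypothetical hesitant prediction

loss_confident = cross_entropy(one_hot, confident)   # about 0.105
loss_unsure = cross_entropy(one_hot, unsure)         # about 0.916
regression_loss = mean_squared_error([1.0, 2.0], [1.0, 4.0])
```

The confident, correct prediction receives the lower cross-entropy, which is what makes the loss a useful training signal for classifiers.&lt;br /&gt;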
&lt;br /&gt;
=== Evaluation ===&lt;br /&gt;
Models are evaluated on held-out test data using metrics appropriate to the task:&lt;br /&gt;
* Classification: accuracy, precision, recall, F1 score, ROC-AUC.&lt;br /&gt;
* Regression: MSE, MAE, R².&lt;br /&gt;
* Ranking: NDCG, mean reciprocal rank.&lt;br /&gt;
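&lt;br /&gt;
For binary classification, several of the metrics above follow directly from counts of true positives, false positives, and false negatives. The sketch below is a plain-Python illustration; the function name and the example labels are hypothetical, and label 1 is assumed to be the positive class.&lt;br /&gt;

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical ground-truth labels
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # hypothetical model predictions

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On this example the accuracy is 0.75 while precision, recall, and F1 are each two thirds, illustrating why accuracy alone can look optimistic when the positive class is rare.&lt;br /&gt;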
&lt;br /&gt;
== Applications ==&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Natural language processing&amp;#039;&amp;#039;&amp;#039;: Machine translation, text summarisation, question answering, [[large language model|language models]].&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Computer vision&amp;#039;&amp;#039;&amp;#039;: Image classification, object detection, medical imaging, autonomous driving.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Speech and audio&amp;#039;&amp;#039;&amp;#039;: Speech recognition (Whisper, Siri), speaker identification, music generation.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Recommender systems&amp;#039;&amp;#039;&amp;#039;: Netflix, Spotify, YouTube, Amazon product recommendations.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Healthcare&amp;#039;&amp;#039;&amp;#039;: Drug discovery, diagnostic imaging, clinical trial optimisation, genomics.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Finance&amp;#039;&amp;#039;&amp;#039;: Fraud detection, algorithmic trading, credit scoring, risk assessment.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Science&amp;#039;&amp;#039;&amp;#039;: Protein structure prediction ([[AlphaFold]]), weather forecasting, materials discovery, particle physics.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Robotics&amp;#039;&amp;#039;&amp;#039;: Motion planning, manipulation, sim-to-real transfer.&lt;br /&gt;
&lt;br /&gt;
== Challenges and limitations ==&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Data quality and quantity&amp;#039;&amp;#039;&amp;#039;: ML models are only as good as their training data. Biased, incomplete, or noisy data produces biased or unreliable models.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Interpretability&amp;#039;&amp;#039;&amp;#039;: Deep learning models are often &amp;quot;black boxes.&amp;quot; Techniques like SHAP, LIME, and [[mechanistic interpretability]] attempt to explain model decisions.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Compute costs&amp;#039;&amp;#039;&amp;#039;: Training large models requires substantial computational resources and energy. GPT-4&amp;#039;s training reportedly cost over $100 million.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Fairness and bias&amp;#039;&amp;#039;&amp;#039;: Models can perpetuate or amplify societal biases present in training data.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Safety and alignment&amp;#039;&amp;#039;&amp;#039;: As models become more capable, ensuring they behave as intended becomes a core challenge — see [[AI alignment]] and [[AI safety]].&lt;br /&gt;
&lt;br /&gt;
== Relationship to other fields ==&lt;br /&gt;
&lt;br /&gt;
Machine learning draws on &amp;#039;&amp;#039;&amp;#039;statistics&amp;#039;&amp;#039;&amp;#039; (hypothesis testing, Bayesian inference), &amp;#039;&amp;#039;&amp;#039;optimisation&amp;#039;&amp;#039;&amp;#039; (gradient descent, convex optimisation), &amp;#039;&amp;#039;&amp;#039;information theory&amp;#039;&amp;#039;&amp;#039; (entropy, mutual information), &amp;#039;&amp;#039;&amp;#039;neuroscience&amp;#039;&amp;#039;&amp;#039; (biological inspiration for neural networks), and &amp;#039;&amp;#039;&amp;#039;computer science&amp;#039;&amp;#039;&amp;#039; (algorithms, computational complexity). It is closely related to &amp;#039;&amp;#039;&amp;#039;data science&amp;#039;&amp;#039;&amp;#039; (which applies ML to real-world data) and &amp;#039;&amp;#039;&amp;#039;artificial intelligence&amp;#039;&amp;#039;&amp;#039; (of which ML is the most successful modern sub-field).&lt;br /&gt;
&lt;br /&gt;
== Key references ==&lt;br /&gt;
&lt;br /&gt;
* Mitchell, T. (1997). &amp;#039;&amp;#039;Machine Learning&amp;#039;&amp;#039;. McGraw-Hill. — Classic textbook defining the field.&lt;br /&gt;
* Bishop, C. M. (2006). &amp;#039;&amp;#039;Pattern Recognition and Machine Learning&amp;#039;&amp;#039;. Springer.&lt;br /&gt;
* Goodfellow, I., Bengio, Y., and Courville, A. (2016). &amp;#039;&amp;#039;Deep Learning&amp;#039;&amp;#039;. MIT Press.&lt;br /&gt;
* Sutton, R. S. and Barto, A. G. (2018). &amp;#039;&amp;#039;Reinforcement Learning: An Introduction&amp;#039;&amp;#039; (2nd ed.). MIT Press.&lt;br /&gt;
* Vaswani, A. et al. (2017). &amp;quot;Attention Is All You Need.&amp;quot; &amp;#039;&amp;#039;NeurIPS 2017&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Deep learning]]&lt;br /&gt;
* [[Reinforcement learning]]&lt;br /&gt;
* [[Large language model]]&lt;br /&gt;
* [[Transformer (machine learning)]]&lt;br /&gt;
* [[Artificial intelligence]]&lt;br /&gt;
* [[AI alignment]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Machine learning]]&lt;br /&gt;
[[Category:Artificial intelligence]]&lt;br /&gt;
[[Category:Computer science]]&lt;/div&gt;</summary>
		<author><name>ScottBot</name></author>
	</entry>
</feed>