CS198: Introduction to Machine Learning¶
Based on the first chapter of the CS198 course taught by Machine Learning at Berkeley (UC Berkeley).
What is Machine Learning?¶
Machine Learning (ML) is a subfield of Artificial Intelligence (AI) focused on creating algorithms that learn patterns from data. Instead of writing explicit rules for every scenario, machine learning systems discover these rules automatically by analyzing large amounts of data.
In traditional programming, a human developer provides:
- Input data + Rules → Program output
In machine learning, the developer provides:
- Input data + Desired outputs → Learned model (the rules)
This approach allows solutions to be scaled and adapted to complex problems where rule-based systems are too cumbersome or impossible to specify by hand.
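To make this contrast concrete, here is a minimal sketch in plain Python (the data and the deliberately simplistic "learning algorithm" are both invented for illustration): one function encodes a hand-written rule, while the other learns its rule, a threshold, from labeled examples.

```python
# Traditional programming: a human writes the rule explicitly.
def rule_is_positive(x):
    return x > 0

# Machine learning: we provide inputs + desired outputs, and a
# learning algorithm discovers the rule (here, a threshold) itself.
examples = [(-2.0, False), (-1.0, False), (1.0, True), (3.0, True)]

def learn_threshold(data):
    # Toy "learning algorithm": split halfway between the largest
    # negative example and the smallest positive example.
    largest_neg = max(x for x, label in data if not label)
    smallest_pos = min(x for x, label in data if label)
    return (largest_neg + smallest_pos) / 2

threshold = learn_threshold(examples)
print(threshold)              # 0.0
print(rule_is_positive(0.5))  # True (hand-written rule)
print(0.5 > threshold)        # True (learned rule)
```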
Why Machine Learning?¶
- Complexity: Certain tasks (like recognizing objects in images or translating languages) are too complex for fixed, rule-based systems.
- Adaptability: ML systems can adapt to new data without exhaustive manual updates.
- Scalability: With the explosion of data (text, images, audio, sensor data), ML can uncover patterns that are not easily spotted by humans.
- Automation: ML can automate tasks—like predictions, recommendations, and even creative tasks—more efficiently or accurately than purely manual methods.
Types of Machine Learning¶
Supervised Learning¶
Supervised learning deals with labeled data, meaning each training example comes with an associated “correct answer” (often called a “label” or “target”).
- Goal: Learn a function $f(x)$ that maps an input $x$ (e.g., an image) to an output $y$ (e.g., a category or numerical value).
- Examples:
- Classification: Predicting if an email is spam or not (labels: "spam" or "not spam").
- Regression: Predicting house prices (label: numerical value representing price).
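A minimal sketch of both settings using scikit-learn (the tiny datasets are invented for illustration):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: spam (1) vs. not spam (0) from two made-up features,
# e.g. [number of exclamation marks, number of links] in an email.
X_cls = [[5, 3], [0, 0], [4, 2], [1, 0]]
y_cls = [1, 0, 1, 0]
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[6, 3]]))  # predicted label for a new email

# Regression: house price (in $1000s) from square footage, made-up numbers.
X_reg = [[1000], [1500], [2000], [2500]]
y_reg = [200, 280, 360, 440]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[1800]]))  # a numeric price estimate
```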
Unsupervised Learning¶
In unsupervised learning, we do not have labeled data. The system tries to learn the underlying structure or distribution in the data.
- Goal: Discover patterns or groupings without explicitly provided labels or targets.
- Examples:
- Clustering: Grouping customers by similar purchasing behavior.
- Dimensionality Reduction: Compressing data (e.g., using Principal Component Analysis) to visualize or reduce noise.
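A minimal sketch of both techniques using scikit-learn and NumPy (the synthetic "customer" data is invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic "customers": two groups of 50, each described by 5 features.
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(5, 1, (50, 5))])

# Clustering: group similar rows without any labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: compress 5 features to 2 for visualization.
X_2d = PCA(n_components=2).fit_transform(X)
print(labels[:5], labels[-5:])  # two distinct cluster ids
print(X_2d.shape)               # (100, 2)
```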
Reinforcement Learning¶
Reinforcement Learning (RL) involves an agent learning to perform actions in an environment so as to maximize some notion of cumulative reward.
- Goal: Determine the optimal sequence of actions under a reward function.
- Examples:
- Game AI (e.g., learning to play chess or Go).
- Robotics (e.g., learning how to navigate around obstacles).
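A minimal tabular Q-learning sketch in plain Python (the corridor environment, states, and rewards are all invented for illustration; real RL problems use far richer environments):

```python
import random

# Toy environment: states 0..4 on a line, start at 0, reward 1 for
# reaching state 4; actions are 0 = move left, 1 = move right.
GOAL = 4

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# Tabular Q-learning: estimate the value of each (state, action) pair.
Q = [[0.0, 0.0] for _ in range(GOAL + 1)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # hyperparameters, chosen by hand
for _ in range(500):                   # episodes
    state, done = 0, False
    while not done:
        if random.random() < epsilon:  # explore occasionally
            action = random.randrange(2)
        else:                          # otherwise act greedily
            action = 0 if Q[state][0] > Q[state][1] else 1
        nxt, reward, done = step(state, action)
        # Update toward the reward plus the discounted best future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

# The learned policy should be "move right" (1) in every non-goal state.
print([0 if q[0] > q[1] else 1 for q in Q[:-1]])
```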
Vocab¶
Function / Model
- These terms are used interchangeably.
- They refer to the function template we have chosen to use.
Parameter / Weight (and Bias)
- Weights (and biases) are terms used to denote the parameters in some ML models.
Hyperparameter
- A non-learned parameter, such as model size, model class, or training procedure, that helps specify our function.
- Again, we choose these ourselves before we start learning.
Loss Function / Cost Function / Risk Function
- We haven’t introduced these yet, but they will come up; just note that the terms mean the same thing.
“Feature”
- This can refer to pieces of our data (either inputs or learned representations of inputs).
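To tie this vocabulary together, here is a minimal sketch in plain Python (the model, data, and hyperparameter values are all invented for illustration), labeling each term in code:

```python
# Function/model: the template we chose, f(x) = w * x + b.
def f(x, w, b):
    return w * x + b

# Loss/cost/risk function: mean squared error over the data.
def loss(w, b, data):
    return sum((f(x, w, b) - y) ** 2 for x, y in data) / len(data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # features x and labels y
w, b = 0.0, 0.0        # parameters (weight and bias), learned below
lr, steps = 0.05, 200  # hyperparameters, chosen before learning

for _ in range(steps):  # gradient descent on the parameters
    dw = sum(2 * (f(x, w, b) - y) * x for x, y in data) / len(data)
    db = sum(2 * (f(x, w, b) - y) for x, y in data) / len(data)
    w, b = w - lr * dw, b - lr * db

print(round(w, 2), round(b, 2), round(loss(w, b, data), 4))  # w near 2, b near 0
```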
Key Components of a Machine Learning System¶
- Data: High-quality data is essential. This includes:
- Feature representation (how we transform raw data like text or images into numeric vectors)
- Enough samples to capture the variability of the problem
- Model: A mathematical representation of the relationship between input features and outputs. Examples include:
- Linear models (e.g., Linear Regression, Logistic Regression)
- Decision trees, Random Forests
- Neural networks (Deep Learning)
- Loss Function: A function that measures how well the model’s predictions match the actual labels (in supervised learning). Examples:
- Mean Squared Error (regression problems)
- Cross-entropy loss (classification problems)
- Optimization Algorithm: A method for adjusting the model’s parameters to minimize the loss function. Examples:
- Gradient Descent
- Stochastic Gradient Descent
- Adam, RMSProp
- Evaluation Metrics: Quantities to measure model performance:
- Accuracy, Precision, Recall, F1-score (for classification)
- Mean Absolute Error, R-squared (for regression)
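A minimal sketch wiring these components together with scikit-learn (synthetic data; assumes a recent scikit-learn release in which the logistic loss is spelled "log_loss"):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Data: a synthetic dataset standing in for real feature vectors.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Model + loss + optimizer in one object: a linear model trained by
# stochastic gradient descent on the cross-entropy loss.
model = SGDClassifier(loss="log_loss", max_iter=1000, random_state=0)
model.fit(X_train, y_train)

# Evaluation metrics computed on held-out data.
pred = model.predict(X_test)
print(accuracy_score(y_test, pred), f1_score(y_test, pred))
```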
The Machine Learning Process / Pipeline¶
A common blueprint for an ML workflow:
Define the Problem
- Clearly articulate the task (e.g., predict whether a credit card transaction is fraudulent).
- Identify available data and constraints (time, computational resources, etc.).
Collect and Prepare Data
- Gather datasets relevant to the problem.
- Clean and preprocess the data (handle missing values, outliers).
- Split data into training, validation, and test sets (often 70/15/15 or 80/10/10, etc.).
Choose a Model and a Learning Algorithm
- Start with a baseline model (e.g., a simple linear classifier).
- Choose more complex models if necessary (trees, ensembles, neural networks).
Train the Model
- Feed training data to the model.
- Use an optimization algorithm to update model parameters.
Evaluate the Model
- Use validation or test data (unseen during training) to measure performance.
- Calculate appropriate metrics (accuracy for classification, mean squared error for regression, etc.).
Iterate and Improve
- Analyze errors and refine the model (feature engineering, hyperparameter tuning).
- Repeat the process until the model is sufficiently accurate or resources are exhausted.
Deployment
- Integrate the final model into a production environment.
- Monitor performance over time; models may degrade as data evolves.
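A compressed end-to-end sketch of this pipeline using scikit-learn (synthetic data stands in for steps 1–2; the sweep over the hyperparameter C illustrates the iteration step):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Steps 1-2: define the problem and collect data (synthetic here).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Split roughly 70/15/15 into train / validation / test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Steps 3-6: start with a baseline, train, evaluate, and iterate
# over a hyperparameter using the validation set.
best_model, best_val_acc = None, 0.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    if val_acc > best_val_acc:
        best_model, best_val_acc = model, val_acc

# Final check on the untouched test set before deployment.
print(best_val_acc, accuracy_score(y_test, best_model.predict(X_test)))
```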
Generalization: Overfitting, Underfitting, Bias, and Variance¶
A key goal in machine learning is generalization—the ability of a model to perform well on data it has not seen before. Understanding overfitting, underfitting, and the bias-variance trade-off helps us design models that generalize better.
Overfitting¶
- Definition: Occurs when a model learns not only the true patterns in the training data but also noise or random fluctuations.
- Symptoms:
- The model performs extremely well on training data but poorly on new, unseen data.
- High training accuracy but low test accuracy.
- Causes:
- Model is too complex (too many parameters relative to the amount of training data).
- Inadequate regularization or constraints.
- Solutions:
- Regularization techniques (L1, L2, dropout in neural networks).
- Early stopping to prevent over-training.
- Cross-validation to choose model complexity.
- Collect more data (if possible).
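For instance, L2 regularization can tame an over-parameterized model. A minimal sketch with scikit-learn (the noisy sine data and the degree-15 polynomial are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-1, 1, 30)).reshape(-1, 1)
y = np.sin(3 * X).ravel() + rng.normal(0, 0.2, 30)  # noisy samples
X_test = np.linspace(-1, 1, 200).reshape(-1, 1)
y_test = np.sin(3 * X_test).ravel()                 # clean ground truth

# A degree-15 polynomial has far more capacity than 30 noisy points need.
overfit = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X, y)
# L2 regularization (Ridge) penalizes large weights in the same model class.
ridge = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)).fit(X, y)

print(mean_squared_error(y_test, overfit.predict(X_test)))  # typically large
print(mean_squared_error(y_test, ridge.predict(X_test)))    # typically smaller
```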
Underfitting¶
- Definition: Occurs when a model is too simple to capture the underlying trend of the data.
- Symptoms:
- The model performs poorly on both training and test data.
- Low training accuracy and low test accuracy.
- Causes:
- Model is not complex enough to learn the real relationships.
- Insufficient training or not enough relevant features.
- Solutions:
- Increase model complexity (e.g., more parameters or deeper networks).
- Feature engineering to provide the model with more meaningful inputs.
- Reduce regularization if it is too strong.
Bias and Variance¶
The bias-variance trade-off helps explain why models overfit or underfit:
Bias:
- The tendency of a model to consistently learn the wrong relationship by not taking into account all the features/patterns in the data.
- High-bias models are typically too simple (leading to underfitting).
Variance:
- The tendency of a model to learn random noise or fluctuations in the training data rather than the true underlying pattern.
- High-variance models are typically too complex (leading to overfitting).
A good model strikes a balance between bias and variance—it is complex enough to capture important patterns but not so complex that it fits noise.
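A minimal sketch of this trade-off with scikit-learn (noisy sine data invented for illustration): sweep the polynomial degree from too simple to too complex and compare train versus test error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(-1, 1, 40)).reshape(-1, 1)
y = np.sin(3 * X).ravel() + rng.normal(0, 0.2, 40)
X_tr, y_tr = X[::2], y[::2]    # half for training
X_te, y_te = X[1::2], y[1::2]  # half for testing

for degree in [1, 3, 15]:      # too simple / balanced / too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(X_tr))
    test_err = mean_squared_error(y_te, model.predict(X_te))
    print(degree, round(train_err, 3), round(test_err, 3))
# Expect: degree 1 underfits (both errors high, high bias); degree 15
# drives train error near zero but test error up (high variance).
```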
Common Applications¶
- Image Recognition (e.g., classifying dog vs. cat images)
- Natural Language Processing (e.g., text classification, chatbots, language translation)
- Time Series Forecasting (e.g., financial markets, weather prediction)
- Recommendation Systems (e.g., movie/music recommendations)
- Anomaly Detection (e.g., fraud detection, network intrusion detection)
Summary¶
In this introductory module, we covered:
- Core definition of Machine Learning and how it differs from traditional programming.
- Motivations for using ML in various real-world scenarios.
- The three main types of ML: supervised, unsupervised, and reinforcement learning.
- The key components (data, model, loss function, optimizer, metrics) that form any ML system.
- A standardized workflow for building ML solutions (defining problem → collecting data → model training → evaluation → iteration → deployment).
This foundation sets the stage for deeper exploration of topics such as linear models, decision trees, neural networks, and more advanced ML paradigms.
Further Reading¶
- CS198: Machine Learning at Berkeley YouTube Playlist – Official playlist for deeper lectures.
- Stanford CS229 Notes – Additional theoretical foundations and mathematics behind ML.
- Andrew Ng’s Machine Learning Course on Coursera – A classic, beginner-friendly introduction.
- ISL (Introduction to Statistical Learning) – Great for understanding the statistical concepts behind ML.
Next Steps¶
In the next chapter, we’ll dive into the basics of linear models and how to train them (e.g., Linear/Logistic Regression), which form the building blocks of many more advanced ML algorithms.
Disclaimer: This tutorial is a summary and adaptation of the concepts presented in the first chapter of CS198 UCB (Machine Learning at Berkeley). For the most accurate and detailed information, please refer to the official course materials and videos.