
Naive Bayes Classifier

Learn a fast and powerful classification algorithm based on probability and Bayes' Theorem.

Machine Learning · Fundamental · 45 min

Topic 1: What is Naive Bayes?

The Naive Bayes classifier is a supervised learning algorithm used for classification. It's based on Bayes' Theorem from probability theory. It is "naive" because it makes a strong, simplifying assumption about the data.



It's most famous for being the algorithm behind classic spam filters, but it's also used in medical diagnosis, text classification, and more.

Bayes' Theorem: The Core Idea

Bayes' Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. In Machine Learning, it helps us "flip" a probability question:

  • We want to know P(Class | Data). (The probability of a class, given the data.)
  • It's easier to calculate P(Data | Class). (The probability of seeing this data, if we assume it was a certain class.)

Bayes' Theorem gives us the formula to find the first from the second:

Formula 1: P(A | B) = [ P(B | A) × P(A) ] / P(B)

Or, for our classifier:

Formula 2: P(Class | Features) = [ P(Features | Class) × P(Class) ] / P(Features)
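
To make the formula concrete, here is a tiny worked example in plain Python. All of the probabilities are invented for illustration (they match the spam numbers used later in this lesson).

```python
p_spam = 0.20               # P(Spam): the prior
p_word_given_spam = 0.15    # P("prince" | Spam)
p_word_given_ham = 0.001    # P("prince" | Not Spam)

# P("prince"): the denominator, via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' Theorem: P(Spam | "prince")
p_spam_given_word = (p_word_given_spam * p_spam) / p_word
print(f"{p_spam_given_word:.3f}")  # 0.974
```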


Topic 2: The "Naive" Assumption

Calculating P(Features | Class) directly is extremely difficult. It would mean answering, "What is the probability of seeing this exact combination of 50 specific words, given that this is spam?"

To make this possible, Naive Bayes makes one big "naive" assumption: All features are conditionally independent.

This means the model assumes that the presence of one feature does not affect the presence of another, given the class.

Analogy: The Spam Filter

A new email contains the words "prince" and "viagra".

  • A non-naive model would need to know the probability of "prince" and "viagra" appearing together in a spam email.
  • A naive model just assumes they are unrelated. It calculates:
    P("prince" | Spam)
    P("viagra" | Spam)
    ...and then multiplies them together:

    P("prince" and "viagra" | Spam) = P("prince" | Spam) × P("viagra" | Spam)

This assumption is almost always false in the real world (words are not independent). But it simplifies the math so much that the model becomes incredibly fast, and it often works surprisingly well anyway.
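
Here is a minimal sketch of that naive likelihood computation. The per-word probabilities are invented for illustration; the joint likelihood is just their product.

```python
import math

# Hypothetical per-word spam probabilities (values invented for illustration).
p_word_given_spam = {"prince": 0.15, "viagra": 0.30, "hello": 0.05}

def naive_likelihood(words, word_probs):
    """P(words | class) under the naive independence assumption:
    just the product of the individual word probabilities."""
    return math.prod(word_probs[w] for w in words)

print(naive_likelihood(["prince", "viagra"], p_word_given_spam))  # ≈ 0.045
```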


Topic 3: How it "Learns" (By Counting)

The "training" process for Naive Bayes is just fast, simple counting. To classify a new email with "hello" and "prince":

  1. Calculate "Spam" Score:
    • P(Spam): What percentage of all emails in the training data were Spam? (e.g., 20%)
    • P("hello" | Spam): Of all Spam emails, what percentage contained "hello"? (e.g., 5%)
    • P("prince" | Spam): Of all Spam emails, what percentage contained "prince"? (e.g., 15%)
    • Final Score = 0.20 × 0.05 × 0.15 = 0.0015
  2. Calculate "Not Spam" Score:
    • P(Not Spam): What percentage of all emails in the training data were not Spam? (e.g., 80%)
    • P("hello" | Not Spam): Of all non-Spam emails, what percentage contained "hello"? (e.g., 70%)
    • P("prince" | Not Spam): Of all non-Spam emails, what percentage contained "prince"? (e.g., 0.1%)
    • Final Score = 0.80 × 0.70 × 0.001 = 0.00056

The "Spam" score (0.0015) is higher than the "Not Spam" score (0.00056), so the model classifies the email as Spam.


Topic 4: Types of Naive Bayes in `sklearn`

The math changes slightly depending on your feature data, so `sklearn` provides three main types (a usage sketch follows the list):

  • 1. `GaussianNB` (Gaussian Naive Bayes):
    Use this for **continuous numerical features** (like 'sepal length', 'age', 'price').
    It "learns" by assuming the features for each class follow a Gaussian (normal) distribution, and it simply calculates the mean and standard deviation of each feature for each class.
  • 2. `MultinomialNB` (Multinomial Naive Bayes):
    Use this for **discrete counts**. This is the classic model for text classification, where the features are "word counts" (e.g., "prince" appeared 2 times, "hello" appeared 1 time).
  • 3. `BernoulliNB` (Bernoulli Naive Bayes):
    Use this for **binary features** (e.g., 0/1, True/False, Yes/No). In text, this would mean "Does the word 'prince' appear in this document?" (Yes/No), not "How many times?".
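
Below is a minimal usage sketch of the first two variants. The iris data ships with scikit-learn; the spam texts and labels are invented for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# 1. GaussianNB for continuous features (iris measurements).
X, y = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X, y)
print(gnb.predict(X[:3]))  # class predictions for the first three flowers

# 2. MultinomialNB for word counts (toy spam filter).
texts = [
    "free prince money offer",   # spam
    "meeting notes attached",    # ham
    "claim your prize now",      # spam
    "lunch tomorrow maybe",      # ham
]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
counts = vec.fit_transform(texts)          # rows of word counts
mnb = MultinomialNB().fit(counts, labels)
print(mnb.predict(vec.transform(["free prize money"])))  # expect ['spam']
```

For `BernoulliNB`, the pipeline looks the same but the features should be binary; for example, `CountVectorizer(binary=True)` records only whether each word appears, not how many times.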