Build a TensorFlow neural network

An example-based tutorial on how to build a TensorFlow neural network for better accuracy.

You can query a model directly and test the results returned when using different parameter values, either with the Cloud console or by calling the Vertex AI API directly.

Prompt (Freeform)

Teach me how to use TensorFlow to create an accurate neural network with some code examples. Please include insights into how the accuracy of the model is improved by adding layers to it.

Response

Let's dive into creating accurate neural networks with TensorFlow! We'll cover the fundamentals, practical code examples, and how adding layers impacts accuracy.

Understanding Neural Networks and Accuracy

A neural network is a computational model inspired by the structure and function of the human brain. It consists of interconnected nodes (neurons) organized in layers:

  • Input Layer: Receives the raw data.
  • Hidden Layers: Perform complex computations and feature extraction.
  • Output Layer: Produces the final prediction.

Accuracy in a neural network refers to how often its predictions match the actual target values. It's a crucial metric for evaluating a model's performance.
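
As a quick illustration of the metric itself, here is a minimal NumPy sketch with made-up predictions and labels:

import numpy as np

# Made-up predicted classes and true labels for 8 samples
predictions = np.array([3, 1, 4, 1, 5, 9, 2, 6])
labels      = np.array([3, 1, 4, 0, 5, 9, 2, 7])

# Accuracy = number of correct predictions / total number of predictions
accuracy = np.mean(predictions == labels)
print(f"Accuracy: {accuracy:.2f}")   # 6 of 8 match -> 0.75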

Building an Accurate Neural Network with TensorFlow

The key to building an accurate neural network lies in a combination of:

  1. Choosing the Right Architecture: How many layers, neurons per layer, and types of layers.
  2. Effective Data Preprocessing: Cleaning, scaling, and transforming your data.
  3. Appropriate Activation Functions: Introducing non-linearity.
  4. Choosing the Right Loss Function and Optimizer: Guiding the learning process.
  5. Regularization Techniques: Preventing overfitting.
  6. Hyperparameter Tuning: Finding the optimal settings.

Core TensorFlow Components

We'll primarily use tensorflow.keras, a high-level API for building and training neural networks.

  • tf.keras.models.Sequential: A linear stack of layers. Great for simple, sequential models.
  • tf.keras.layers: Contains various types of layers (e.g., Dense, Conv2D, MaxPooling2D, Flatten, Dropout).
  • tf.keras.optimizers: Algorithms that update the model's weights during training (e.g., Adam, SGD).
  • tf.keras.losses: Functions that measure the error between predictions and true values (e.g., categorical_crossentropy, binary_crossentropy, mean_squared_error).
  • model.compile(): Configures the model for training by specifying the optimizer, loss function, and metrics.
  • model.fit(): Trains the model on the given data.
  • model.evaluate(): Evaluates the trained model on unseen data.
  • model.predict(): Generates predictions from the trained model.
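
To show how these pieces fit together before we move to MNIST, here is a generic sketch on synthetic data (the 20-feature binary task and all values here are made up purely for illustration):

import numpy as np
import tensorflow as tf

# Synthetic data purely to exercise the workflow: 1000 samples, 20 features, binary labels
x_demo = np.random.rand(1000, 20).astype('float32')
y_demo = (x_demo.sum(axis=1) > 10).astype('float32')

model_demo = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1, activation='sigmoid')    # single sigmoid output for binary classification
])

model_demo.compile(optimizer='adam',                  # how the weights are updated
                   loss='binary_crossentropy',        # what "error" means for this task
                   metrics=['accuracy'])              # what we report during training

model_demo.fit(x_demo, y_demo, epochs=3, batch_size=32, validation_split=0.2)
loss_demo, acc_demo = model_demo.evaluate(x_demo, y_demo, verbose=0)
predictions_demo = model_demo.predict(x_demo[:5])     # probabilities for the first 5 samples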

Example 1: A Simple Neural Network for Image Classification (MNIST)

Let's start with a classic example: classifying handwritten digits from the MNIST dataset.

1. Setup and Data Loading

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
# Normalize pixel values to be between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode the labels
# For example, 0 becomes [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)

Explanation:

  • We load the MNIST dataset, which contains 28x28 grayscale images of handwritten digits.
  • Normalization: Dividing pixel values by 255 scales them to the range [0, 1]. This helps the model learn more efficiently, as very large or very small input values can cause numerical instability.
  • One-hot Encoding: Since this is a multi-class classification problem (10 digits), we convert the integer labels (0-9) into a binary vector representation. This is the format expected by the categorical_crossentropy loss used below (an alternative that skips this step is sketched next).
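
As a side note (not used in the rest of this tutorial), Keras also offers sparse_categorical_crossentropy, which accepts the original integer labels directly, so the one-hot step can be skipped. A minimal sketch:

# Alternative: keep the integer labels (0-9) and use the "sparse" loss instead
(x_train_raw, y_train_raw), _ = mnist.load_data()
x_train_raw = x_train_raw.astype('float32') / 255.0

model_sparse = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

model_sparse.compile(optimizer='adam',
                     loss='sparse_categorical_crossentropy',  # works with integer labels
                     metrics=['accuracy'])

# model_sparse.fit(x_train_raw, y_train_raw, epochs=1)  # labels stay as integers 0-9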

2. Building a Simple Model (One Hidden Layer)

# Define the model
model_simple = Sequential([
    # Flatten the 28x28 image into a 1D array of 784 pixels
    Flatten(input_shape=(28, 28)),
    # A dense hidden layer with 128 neurons and ReLU activation
    Dense(128, activation='relu'),
    # The output layer with 10 neurons (one for each digit) and softmax activation
    # Softmax ensures probabilities sum to 1 across all classes
    Dense(10, activation='softmax')
])

# Compile the model
model_simple.compile(optimizer='adam',
                     loss='categorical_crossentropy',
                     metrics=['accuracy'])

# Display the model's architecture
model_simple.summary()

Explanation:

  • Sequential: We start with a Sequential model, meaning layers are added one after another.
  • Flatten(input_shape=(28, 28)): The input to a Dense layer must be a 1D vector. This layer takes the 28x28 image and flattens it into a 784-element vector. input_shape is crucial for the first layer.
  • Dense(128, activation='relu'): This is a fully connected (dense) hidden layer.
    • 128: The number of neurons in this layer. More neurons can potentially learn more complex features.
    • activation='relu': The Rectified Linear Unit (ReLU) activation function. It introduces non-linearity, allowing the network to learn more than just linear relationships. ReLU outputs the input if it's positive, and zero otherwise (max(0, x)).
  • Dense(10, activation='softmax'): This is the output layer.
    • 10: There are 10 possible digits (0-9), so we need 10 output neurons.
    • activation='softmax': Softmax is used for multi-class classification. It converts the raw output scores into probabilities for each class, ensuring that the sum of probabilities for all classes is 1.
  • model_simple.compile(...):
    • optimizer='adam': Adam is a popular and effective optimization algorithm that adapts the learning rate during training.
    • loss='categorical_crossentropy': This loss function is suitable for multi-class classification problems with one-hot encoded labels. It measures the difference between the predicted probability distribution and the true distribution.
    • metrics=['accuracy']: We want to track the accuracy of the model during training and evaluation.
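
The string shortcuts above are equivalent to passing explicit objects, which is how you would adjust settings such as the learning rate. A minimal sketch (0.001 is Adam's default learning rate):

# Equivalent compile call with explicit objects instead of string shortcuts
model_simple.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # 0.001 is the default
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy']
)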

3. Training the Model

# Train the model
history_simple = model_simple.fit(x_train, y_train,
                                  epochs=10,       # Number of times to iterate over the entire dataset
                                  batch_size=32,   # Number of samples per gradient update
                                  validation_split=0.2) # Use 20% of training data for validation

# Evaluate the model on the test set
loss_simple, accuracy_simple = model_simple.evaluate(x_test, y_test, verbose=0)

print(f"\nSimple Model Accuracy: {accuracy_simple*100:.2f}%")

Explanation:

  • epochs=10: The model will go through the entire training dataset 10 times. More epochs can lead to better learning, but also risk overfitting.
  • batch_size=32: The model updates its weights after processing every 32 samples. Smaller batch sizes introduce more gradient noise but sometimes lead to better generalization. Larger batch sizes can speed up training but may converge to solutions that generalize less well.
  • validation_split=0.2: This is a crucial step for monitoring performance without touching the final test set. During training, 20% of the x_train and y_train data will be held out and used to evaluate the model's performance after each epoch. This helps detect overfitting.
  • model_simple.evaluate(...): We test the trained model on the completely unseen x_test and y_test data to get a final performance measure.
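
If you plan to train for many epochs, one common option (not used in the runs above) is an EarlyStopping callback, which halts training once the validation loss stops improving. A minimal sketch:

# Optional: stop training automatically when validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',        # watch the validation loss
    patience=3,                # allow 3 epochs without improvement
    restore_best_weights=True  # roll back to the best epoch seen
)

# history_simple = model_simple.fit(x_train, y_train,
#                                   epochs=50,
#                                   batch_size=32,
#                                   validation_split=0.2,
#                                   callbacks=[early_stop])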

How Adding Layers Improves Accuracy: The Power of Hierarchy

Now, let's see what happens when we add more hidden layers. This is where the concept of deep learning truly shines.

Insight: Adding more layers allows the neural network to learn a hierarchical representation of the data. Each layer can learn increasingly abstract and complex features from the output of the previous layer.

Imagine classifying images:

  1. First Hidden Layer: Might learn to detect simple edges, corners, and basic shapes (e.g., horizontal lines, vertical lines).
  2. Second Hidden Layer: Can combine these simple features to recognize more complex patterns like curves, circles, or specific textures.
  3. Third Hidden Layer: Could start recognizing parts of objects, like eyes, wheels, or ears.
  4. Deeper Layers: Can combine these parts to identify entire objects (e.g., a face, a car).

This hierarchical learning allows the network to build a rich understanding of the input data, leading to higher accuracy, especially for complex tasks like image recognition, natural language processing, and speech recognition.

Example 2: A Deeper Model (Two Hidden Layers)

Let's add another hidden layer to our MNIST model.

# Define a deeper model
model_deep = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    # Adding a second hidden layer
    Dense(64, activation='relu'), # Another hidden layer with 64 neurons
    Dense(10, activation='softmax')
])

# Compile the model
model_deep.compile(optimizer='adam',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])

# Display the model's architecture
model_deep.summary()

# Train the deeper model
history_deep = model_deep.fit(x_train, y_train,
                              epochs=10,
                              batch_size=32,
                              validation_split=0.2)

# Evaluate the deeper model
loss_deep, accuracy_deep = model_deep.evaluate(x_test, y_test, verbose=0)

print(f"\nDeeper Model Accuracy: {accuracy_deep*100:.2f}%")

Observation:

You will likely observe that model_deep achieves a slightly higher accuracy on the test set compared to model_simple. This is because the additional hidden layer provides the network with more capacity to learn the intricate patterns present in the MNIST digits. It can build more complex feature representations.

It's helpful to visualize the training and validation accuracy/loss over epochs.

def plot_history(history, title):
    plt.figure(figsize=(12, 4))

    # Plot accuracy
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Train Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.title(f'{title} - Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()

    # Plot loss
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Train Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.title(f'{title} - Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()

    plt.tight_layout()
    plt.show()

# Plot the histories
plot_history(history_simple, "Simple Model")
plot_history(history_deep, "Deeper Model")

Interpreting the Plots:

  • Accuracy: You want to see both training and validation accuracy increasing. If validation accuracy starts to plateau or decrease while training accuracy continues to rise, it's a sign of overfitting.
  • Loss: You want to see both training and validation loss decreasing. Similar to accuracy, if validation loss starts increasing, it indicates overfitting.

Other Factors Influencing Accuracy

While adding layers is a primary way to increase representational power, other factors are critical for accuracy:

  1. Number of Neurons per Layer:

    • Too few neurons: The model might not have enough capacity to learn the underlying patterns (underfitting).
    • Too many neurons: The model might learn the training data too well, including noise, leading to overfitting and poor generalization.
  2. Activation Functions:

    • ReLU: Generally a good choice for hidden layers.
    • Sigmoid/Tanh: Can be used, but might suffer from vanishing gradients in very deep networks.
    • Softmax: Essential for multi-class classification output layers.
    • Sigmoid: Used for binary classification output layers.
  3. Optimizer:

    • Adam, RMSprop, Adagrad: Often perform better than basic SGD due to adaptive learning rates.
    • SGD with Momentum: Can also be very effective.
  4. Regularization Techniques: When overfitting becomes a problem (validation accuracy drops while training accuracy keeps rising):

    • Dropout (tf.keras.layers.Dropout(rate)): Randomly deactivates a fraction of neurons during training. This forces the network to learn redundant representations and prevents over-reliance on specific neurons.
    • L1/L2 Regularization (Weight Decay): Adds a penalty to the loss function based on the magnitude of the weights. This discourages large weights, leading to simpler models (see the sketch after this list).
  5. Dataset Size and Quality: A larger, cleaner, and more representative dataset is fundamental for building accurate models. Garbage in, garbage out!

  6. Data Augmentation: For image data, creating new training samples by applying transformations (rotation, flipping, zooming) can significantly improve robustness and accuracy.
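
As an example of item 4, here is a sketch of L2 regularization applied to the dense layers (the strength 0.001 is an arbitrary illustrative value):

from tensorflow.keras import regularizers

# Dense layers with L2 weight penalties (0.001 is just an illustrative value)
model_l2 = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu',
          kernel_regularizer=regularizers.l2(0.001)),
    Dense(64, activation='relu',
          kernel_regularizer=regularizers.l2(0.001)),
    Dense(10, activation='softmax')
])

model_l2.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])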


Example 3: A More Robust Model with Dropout

Let's add dropout to our deeper model to combat potential overfitting.

# Define a deeper model with Dropout
model_dropout = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    # Add dropout after the first hidden layer
    tf.keras.layers.Dropout(0.2), # 20% of neurons will be randomly dropped
    Dense(64, activation='relu'),
    # Add dropout after the second hidden layer
    tf.keras.layers.Dropout(0.2),
    Dense(10, activation='softmax')
])

# Compile the model
model_dropout.compile(optimizer='adam',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])

# Display the model's architecture
model_dropout.summary()

# Train the model with dropout
history_dropout = model_dropout.fit(x_train, y_train,
                                    epochs=10,
                                    batch_size=32,
                                    validation_split=0.2)

# Evaluate the model with dropout
loss_dropout, accuracy_dropout = model_dropout.evaluate(x_test, y_test, verbose=0)

print(f"\nModel with Dropout Accuracy: {accuracy_dropout*100:.2f}%")

# Plot the history for comparison
plot_history(history_dropout, "Model with Dropout")

Observation:

You might see slightly lower training accuracy with dropout compared to the previous models, but the validation accuracy could be higher or more stable. This indicates that dropout is helping the model generalize better to unseen data by reducing overfitting.


Beyond Dense Layers: Convolutional Neural Networks (CNNs) for Images

For image data, Dense layers are not the most efficient. Convolutional Neural Networks (CNNs) are specifically designed to exploit the spatial structure of images. They use:

  • Conv2D: Convolutional layers that apply learnable filters to detect patterns.
  • MaxPooling2D: Downsamples the feature maps, reducing computational cost and making the model more robust to small variations in image position.
  • Flatten: To transition from convolutional layers to dense layers for classification.

Let's sketch out a CNN architecture (the full training code is omitted here for brevity; a minimal training sketch follows after the block):

# Example of a CNN architecture
model_cnn = Sequential([
    # Convolutional Layer 1
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)), # 32 filters, 3x3 kernel, 1 input channel (grayscale)
    tf.keras.layers.MaxPooling2D((2, 2)), # 2x2 pooling

    # Convolutional Layer 2
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),

    # Flatten for dense layers
    tf.keras.layers.Flatten(),

    # Dense layers for classification
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile and train this model just like the earlier ones; a minimal sketch follows below.
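
If you want to actually train this sketch on MNIST, note that Conv2D expects a channel dimension, so the images loaded earlier must first be reshaped to (28, 28, 1). A minimal sketch:

# Conv2D expects a channel dimension: reshape (28, 28) -> (28, 28, 1)
x_train_cnn = x_train.reshape(-1, 28, 28, 1)
x_test_cnn = x_test.reshape(-1, 28, 28, 1)

model_cnn.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

history_cnn = model_cnn.fit(x_train_cnn, y_train,
                            epochs=5,
                            batch_size=32,
                            validation_split=0.2)

loss_cnn, accuracy_cnn = model_cnn.evaluate(x_test_cnn, y_test, verbose=0)
print(f"\nCNN Accuracy: {accuracy_cnn*100:.2f}%")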

How CNNs improve accuracy for images:

  • Feature Extraction: Convolutional layers automatically learn relevant features (edges, textures, shapes) from the raw pixel data.
  • Spatial Hierarchies: Like deep dense networks, CNNs learn spatial hierarchies of features, from simple to complex.
  • Parameter Sharing: Filters are reused across the entire image, significantly reducing the number of parameters compared to a fully connected network on raw pixels, which helps prevent overfitting and speeds up training.
  • Translation Invariance: Pooling layers make the network less sensitive to the exact position of features in the image.

CNNs typically achieve much higher accuracies on image-related tasks than simple feed-forward dense networks.


Summary and Next Steps

  1. Start Simple: Begin with a Sequential model with a few Dense layers.
  2. Understand Data: Preprocess your data correctly (normalization, encoding).
  3. Choose Loss and Optimizer: Select appropriate ones for your task.
  4. Monitor Validation: Always split your data and monitor validation performance to detect overfitting.
  5. Add Layers/Neurons: Increase model complexity to learn more intricate patterns, but be mindful of overfitting.
  6. Regularize: Use techniques like Dropout if overfitting occurs.
  7. Explore Specialized Layers: For specific data types (images, text), use specialized layers like Conv2D or Embedding.
  8. Hyperparameter Tuning: Experiment with learning rates, batch sizes, the number of layers, neurons, and dropout rates. Tools like KerasTuner can automate this (see the sketch after this list).
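
For item 8, here is a minimal sketch of automated tuning with KerasTuner (assumes the keras-tuner package is installed; the search ranges are illustrative):

import keras_tuner as kt

def build_model(hp):
    # Search over the number of neurons and the learning rate (illustrative ranges)
    model = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(hp.Int('units', min_value=32, max_value=256, step=32), activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='categorical_crossentropy',
        metrics=['accuracy'])
    return model

tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=5)
tuner.search(x_train, y_train, epochs=3, validation_split=0.2)
best_model = tuner.get_best_models(num_models=1)[0]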

Building accurate neural networks is an iterative process of experimentation, analysis, and refinement. TensorFlow, with its flexible APIs, provides the tools to explore and build powerful models. Keep experimenting, and you'll gradually gain intuition about what works best for different problems!

Model: gemini-2.5-pro
Temperature: 0.2
Max output tokens: 8192
TopK: 40.0
TopP: 0.95