[Learn about machine learning from the Keras] — 14. optimizer and learning_rate

Czxdas
Sep 22, 2023


This section discusses how to set the optimizer and learning rate initially, how they operate internally, and their impact on training.

Example from the previous section:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

class SimpleDense(layers.Layer):

    def __init__(self, units=32):
        super(SimpleDense, self).__init__()
        self.units = units

    def build(self, input_shape):
        # Create the layer's weights the first time it sees input.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

model = Sequential([
    SimpleDense(512),
    layers.Dense(10, activation="softmax")
])

import tensorflow.keras.optimizers as optimizers
model.compile(optimizer=optimizers.get({"class_name": "rmsprop",
                                        "config": {"learning_rate": 0.0001}}),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.build(input_shape=(None, 784))

from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Flatten the 28x28 images and scale pixel values to [0, 1].
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

model.fit(train_images, train_labels, epochs=1, batch_size=128)

There are three ways to set the optimizer:

(1)
Pass the optimizer by name. Set the optimizer in model.compile() by passing the string "rmsprop". Keras looks up the class registered under that name, but the resulting instance uses the default learning rate of 0.001.

Refer to the note on this parameter in the Keras official documentation:
learning_rate: Initial value for the learning rate: either a floating point value, or a tf.keras.optimizers.schedules.LearningRateSchedule instance. Defaults to 0.001.
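
As a minimal sketch of method (1), reusing the model from the full example above, only the name is passed, so the learning rate stays at the 0.001 default:

model.compile(optimizer="rmsprop",  # name lookup; RMSprop with default learning_rate=0.001
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])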

Obtaining an optimizer instance from its name relies mainly on the keras.engine.training.Model._get_optimizer function. Internally the name is wrapped into a dict, the corresponding instance is obtained through keras.optimizers.deserialize, and that instance is returned to the model. The name-to-class resolution uses a lookup table, as mentioned in the Compiler chapter; its contents are as follows:

"adadelta": keras.optimizers.adadelta.Adadelta
"adagrad": keras.optimizers.adagrad.Adagrad
"adam": keras.optimizers.adam.Adam
"adamax": keras.optimizers.adamax.Adamax
"experimentaladadelta": keras.optimizers.adadelta.Adadelta
"experimentaladagrad": keras.optimizers.adagrad.Adagrad
"experimentaladam": keras.optimizers.adam.Adam
"experimentalsgd": keras.optimizers.sgd.SGD
"nadam": keras.optimizers.nadam.Nadam
"rmsprop": keras.optimizers.rmsprop.RMSprop
"sgd": keras.optimizers.sgd.SGD
"ftrl": keras.optimizers.ftrl.Ftrl
"lossscaleoptimizer": keras.mixed_precision.loss_scale_optimizer.LossScaleOptimizerV3
"lossscaleoptimizerv3": keras.mixed_precision.loss_scale_optimizer.LossScaleOptimizerV3
"lossscaleoptimizerv1": keras.mixed_precision.loss_scale_optimizer.LossScaleOptimizer
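
As a quick check of this lookup (a sketch assuming TensorFlow 2.x), optimizers.get can also be called directly; the name resolves to the mapped class with its default configuration:

import tensorflow.keras.optimizers as optimizers

opt = optimizers.get("rmsprop")
print(type(opt).__name__)        # RMSprop
print(float(opt.learning_rate))  # 0.001 (the default)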

(2)
Pass a dict with the class name and its config:
optimizers.get( {"class_name": "rmsprop", "config": {"learning_rate": 0.0001} } )
Parameters can be passed in this way; the compile call is modified to pass in learning_rate:

import tensorflow.keras.optimizers as optimizers
model.compile(optimizer=optimizers.get({"class_name": "rmsprop",
                                        "config": {"learning_rate": 0.0001}}),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

(3)
Instantiate the optimizer class directly and pass its parameters:

from tensorflow.keras.optimizers import RMSprop
model.compile(optimizer=RMSprop(learning_rate=0.0001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

In these examples the optimizer is RMSprop. The training runs below show that as the learning rate is adjusted downward, accuracy does not necessarily keep improving.

learning_rate = 0.001 (default): [training output omitted]

learning_rate = 0.0001: [training output omitted]

learning_rate = 0.00001: [training output omitted]
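
The three runs above can be reproduced with a loop like the following sketch. It rebuilds the model for each learning rate so every run starts from fresh weights, and reuses SimpleDense, Sequential, and the MNIST arrays from the full example; exact accuracies will vary between runs:

from tensorflow.keras.optimizers import RMSprop

for lr in (0.001, 0.0001, 0.00001):
    # Fresh model per run so earlier training does not carry over.
    model = Sequential([SimpleDense(512),
                        layers.Dense(10, activation="softmax")])
    model.compile(optimizer=RMSprop(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(train_images, train_labels,
                        epochs=1, batch_size=128, verbose=0)
    print(f"learning_rate={lr}: accuracy={history.history['accuracy'][-1]:.4f}")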

Those are the different ways to set the optimizer, recorded here together with the important effect its learning_rate has during training.
