[Learn about machine learning from the Keras] — 4.Model Fit operation process

6 min read · Sep 21, 2023

Once the model has been compiled successfully, it can be trained by calling the fit method of keras.engine.training.Model.

Keras official website example:

from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential

# Load MNIST, flatten each 28x28 image into 784 values, and scale them to [0, 1].
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

# Two dense layers: a hidden layer and a 10-class softmax output.
model = Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(train_images, train_labels, epochs=5, batch_size=128)

The fit function mainly performs the following tasks:

(1)
The _assert_compile_was_called function checks whether the model’s _is_compiled variable is set to True. If it is not, the model has not been compiled, and an exception is raised.
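
The guard described in (1) can be illustrated with a small stand-in class. This is only a sketch in plain Python: GuardedModel and its error message are hypothetical and are not the Keras implementation; only the names _is_compiled and _assert_compile_was_called come from the text above.

```python
class GuardedModel:
    """Hypothetical stand-in for a model; not the Keras class."""

    def __init__(self):
        self._is_compiled = False  # flipped to True by compile()

    def compile(self, optimizer, loss):
        self.optimizer = optimizer
        self.loss = loss
        self._is_compiled = True

    def _assert_compile_was_called(self):
        # Refuse to train if compile() has not run yet.
        if not self._is_compiled:
            raise RuntimeError("The model must be compiled before training.")

    def fit(self, x, y):
        self._assert_compile_was_called()
        return "training started"


model = GuardedModel()
try:
    model.fit([1.0], [0])          # fit before compile: the guard raises
    fit_error = None
except RuntimeError as exc:
    fit_error = str(exc)

model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
result = model.fit([1.0], [0])     # fit after compile: the guard passes
```

Calling fit before compile ends in the except branch, while the second call goes through.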

(2)
The keras.engine.data_adapter.get_data_handler function passes the fit parameters to a keras.engine.data_adapter.DataHandler instance, which stores them as attributes. To set its _adapter attribute, the handler looks for an appropriate class in the following list:

keras.engine.data_adapter.ListsOfScalarsDataAdapter
keras.engine.data_adapter.TensorLikeDataAdapter
keras.engine.data_adapter.GenericArrayLikeDataAdapter
keras.engine.data_adapter.DatasetAdapter
keras.engine.data_adapter.GeneratorDataAdapter
keras.engine.data_adapter.KerasSequenceAdapter
keras.engine.data_adapter.CompositeTensorDataAdapter
keras.engine.data_adapter.DatasetCreatorAdapter

In this example, the class that can handle the input training data (train_images, train_labels) is keras.engine.data_adapter.TensorLikeDataAdapter. After an instance of this class is created, the parameters are also stored as its attributes. While doing so, the adapter performs some processing and conversion:
(A) Convert the training data (train_images, train_labels) and sample_weights (if any) into tensors.
(B) If batch_size is not set, use the default of 32. The number of batches per epoch is the input shape[0] value (the total number of training samples) divided by the batch size, rounded up.
(C) Slice the tensors by batch_size: the part that divides evenly into full batches is sliced together, and the final partial batch is processed separately for better performance.
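
The arithmetic behind (B) and (C) can be sketched in plain Python. The helper batch_slices is hypothetical and only mimics the bookkeeping (rounding up and treating the last partial batch separately); the real adapter works on tensors and also handles shuffling, which is left out here.

```python
import math


def batch_slices(num_samples, batch_size=None):
    """Hypothetical helper: return (steps_per_epoch, list of (start, stop))."""
    if not batch_size:
        batch_size = 32  # default used when batch_size is not given

    # Number of steps per epoch: division rounded up.
    num_steps = math.ceil(num_samples / batch_size)

    # Full batches cover the evenly divisible part of the data.
    num_full = num_samples // batch_size
    slices = [(i * batch_size, (i + 1) * batch_size) for i in range(num_full)]

    # The remainder, if any, becomes one smaller final batch.
    if num_samples % batch_size:
        slices.append((num_full * batch_size, num_samples))
    return num_steps, slices


steps, slices = batch_slices(60000, 128)
last_start, last_stop = slices[-1]
last_batch_size = last_stop - last_start

default_steps, _ = batch_slices(60000)  # batch_size omitted, so 32 is used
```

For the MNIST example this gives 469 steps per epoch, the last of which holds 96 samples.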

(3)
If the callbacks argument passed in is not already an instance of keras.callbacks.CallbackList, it is wrapped in a keras.callbacks.CallbackList, which acts as the container for the callbacks. The list may already contain a keras.callbacks.ProgbarLogger instance or a keras.callbacks.History instance, both of which inherit from keras.callbacks.Callback. It does not matter if it contains neither: by default, the container creates both, assigns them to its _history and _progbar attributes, and adds them to its list of callbacks. In other words, both callbacks are used during training.
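
The container behavior in (3) can be sketched with toy classes. Every class below (ToyCallback, ToyHistory, ToyProgbar, ToyCallbackList) and the wrap_callbacks helper are hypothetical stand-ins written for illustration; they are not the Keras classes and support only one hook.

```python
class ToyCallback:
    """Hypothetical base class with a single hook."""

    def on_epoch_end(self, epoch, logs=None):
        pass


class ToyHistory(ToyCallback):
    def __init__(self):
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        # Append each logged value to a per-metric list.
        for name, value in (logs or {}).items():
            self.history.setdefault(name, []).append(value)


class ToyProgbar(ToyCallback):
    def __init__(self):
        self.lines = []

    def on_epoch_end(self, epoch, logs=None):
        self.lines.append(f"epoch {epoch}: {logs}")


class ToyCallbackList:
    """Container that guarantees a history and a progress callback exist."""

    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])
        self._history = self._find_or_add(ToyHistory)
        self._progbar = self._find_or_add(ToyProgbar)

    def _find_or_add(self, cls):
        for cb in self.callbacks:
            if isinstance(cb, cls):
                return cb          # reuse the one the caller supplied
        cb = cls()
        self.callbacks.append(cb)  # otherwise create and register one
        return cb

    def on_epoch_end(self, epoch, logs=None):
        for cb in self.callbacks:
            cb.on_epoch_end(epoch, logs)


def wrap_callbacks(callbacks):
    # Leave an existing container alone; wrap anything else.
    if isinstance(callbacks, ToyCallbackList):
        return callbacks
    return ToyCallbackList(callbacks)


container = wrap_callbacks([])
container.on_epoch_end(0, {"loss": 0.5})
```

Even though an empty list was passed in, the container ends up with two callbacks, and both receive the epoch logs.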

(4)
Next, keras.engine.training.Model.make_train_function is called to determine the train_function used for each training step. It works much like a closure: the function is built once and kept in memory, and that same function is reused every time a batch is run.
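
The "build once, reuse for every batch" idea in (4) can be sketched as follows. CachedTrainer is a hypothetical class, and the body of its train_function is a placeholder rather than a real training step; the point is only the caching of a closure on the instance.

```python
class CachedTrainer:
    """Hypothetical class showing a cached, closure-based train function."""

    def __init__(self):
        self.train_function = None
        self.build_count = 0
        self.weight = 0.0

    def make_train_function(self):
        # Reuse the function if it has already been built.
        if self.train_function is not None:
            return self.train_function

        self.build_count += 1

        def train_function(batch):
            # The closure captures self, so it can update the model state.
            # Placeholder "training": move the weight by the batch mean.
            self.weight += sum(batch) / len(batch)
            return {"weight": self.weight}

        self.train_function = train_function
        return self.train_function


trainer = CachedTrainer()
fn_first = trainer.make_train_function()
fn_second = trainer.make_train_function()  # returns the cached function
logs = fn_first([1.0, 3.0])
```

The second call to make_train_function does no new work and hands back the same function object.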

(5)
In this example, epochs=5 means five passes over the training set. Each epoch trains on all 60,000 samples in batches of 128, which gives 469 steps per epoch (468 full batches plus a final batch of 96 samples).
The iteration structure can be described roughly as follows:

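for epoch, iterator in data_handler.enumerate_epochs():
    callbacks.on_epoch_begin(epoch)

    for step in data_handler.steps():
        callbacks.on_train_batch_begin(step)
        logs = train_function(iterator)  # one training step; returns loss and metrics
        end_step = step + data_handler.step_increment
        callbacks.on_train_batch_end(end_step, logs)
        if self.stop_training:
            break

    epoch_logs = copy.copy(logs)  # the last step's logs summarize the epoch
    callbacks.on_epoch_end(epoch, epoch_logs)
    if self.stop_training:
        break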
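
To see how the two stop_training checks interact, here is a small runnable imitation of the loop. LoopModel and StopAfterSteps are hypothetical, the "loss" is just the batch mean, and only one callback hook is modeled, so this is a sketch of the control flow rather than of Keras itself.

```python
class StopAfterSteps:
    """Hypothetical callback: requests a stop after max_steps batches in total."""

    def __init__(self, model, max_steps):
        self.model = model
        self.max_steps = max_steps
        self.seen = 0

    def on_train_batch_end(self, step, logs):
        self.seen += 1
        if self.seen >= self.max_steps:
            self.model.stop_training = True


class LoopModel:
    """Hypothetical model that only reproduces the nested loop."""

    def __init__(self):
        self.stop_training = False

    def fit(self, batches, epochs, callbacks):
        history = []
        for epoch in range(epochs):
            logs = {}
            for step, batch in enumerate(batches):
                logs = {"loss": sum(batch) / len(batch)}  # placeholder step
                for cb in callbacks:
                    cb.on_train_batch_end(step, logs)
                if self.stop_training:
                    break            # leave the batch loop
            history.append((epoch, logs))
            if self.stop_training:
                break                # leave the epoch loop as well
        return history


model = LoopModel()
stopper = StopAfterSteps(model, max_steps=4)
batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
history = model.fit(batches, epochs=5, callbacks=[stopper])
```

With three batches per epoch and a limit of four steps, training ends during the first batch of the second epoch, even though five epochs were requested.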

(6)
train_function in the loop:
In the loop shown in (5), train_function is the call that does the main training work.
train_function executes step_function(model, iterator), which in turn executes keras.engine.training.Model.train_step.
The basic execution sequence inside keras.engine.training.Model.train_step is keras.engine.training.Model.call -> the call of each layer through keras.engine.base_layer (each layer checks compatibility with its input before executing its call).
If the model’s build function has not been executed before training, the model is built at this point, which guarantees that the model is built before actual training starts.

The operation is as follows:

In this example, train_step executes the call of the Sequential model, which runs the keras.engine.sequential._build_graph_network_for_inferred_shape function. First, the shape of the input tensor is set; here it is shape=(None, 784). Next, all added layers are iterated over and the __call__ of each layer is invoked. Taking the first iteration as an example, this __call__ is the one inherited from the parent class of every layer, in keras.engine.base_layer. After it reads its settings and performs the corresponding processing, execution reaches keras.layers.core.dense.call, that is, the call function implemented by the layer itself. Each layer call checks whether the layer has already been built and, if not, performs the build. In this example, the keras.layers.core.dense layer executes keras.layers.core.dense.build. build always receives the shape of the incoming input, because the layer must know the dimensions of the tensor it receives. The keras.engine.base_layer.add_weight function is then used to create and initialize the kernel (weights) and bias (offset) of keras.layers.core.dense, according to the arguments passed when the layer was declared (for details, see the article on the dense layer).

Once every layer is confirmed to be built, the layer’s call function computes the inner product of the input tensor and its own weights (kernel), adds the bias tensor, and finally passes the result through the specified activation function to produce the output tensor. That output tensor then becomes the input tensor of the next layer, whose __call__ is invoked with it, and the process repeats until the output of the last layer is produced. This is what connects the layers to one another.
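
The build-on-first-call behavior and the forward computation can be sketched with plain Python lists. ToyDense is a hypothetical class, not keras.layers.Dense: it fills the kernel with a constant so the result is deterministic, whereas a real layer uses a random initializer, and it supports only the two activations from the example.

```python
import math


class ToyDense:
    """Hypothetical dense layer: output = activation(inputs . kernel + bias)."""

    def __init__(self, units, activation=None):
        self.units = units
        self.activation = activation
        self.built = False

    def build(self, input_dim):
        # The weight shapes depend on the input, so they are created lazily.
        # Assumption: constant initial values, chosen only for determinism.
        self.kernel = [[0.1] * self.units for _ in range(input_dim)]
        self.bias = [0.0] * self.units
        self.built = True

    def __call__(self, inputs):
        if not self.built:
            self.build(len(inputs[0]))  # infer input_dim from the first row
        return self.call(inputs)

    def call(self, inputs):
        outputs = []
        for row in inputs:
            z = [
                sum(x * self.kernel[i][j] for i, x in enumerate(row)) + self.bias[j]
                for j in range(self.units)
            ]
            outputs.append(self._activate(z))
        return outputs

    def _activate(self, z):
        if self.activation == "relu":
            return [max(0.0, v) for v in z]
        if self.activation == "softmax":
            m = max(z)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in z]
            total = sum(exps)
            return [e / total for e in exps]
        return z


stack = [ToyDense(4, "relu"), ToyDense(3, "softmax")]
x = [[1.0, 2.0]]            # one sample with two features
for layer in stack:
    x = layer(x)            # the output of one layer feeds the next
```

Neither layer is told its input size up front: the first infers 2 from the data, and the second infers 4 from the first layer's output.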

Afterwards, the keras.engine.functional._init_graph_network function is executed to reinitialize the graph network for the Sequential model (model.build will be covered in detail later). It then sets keras.engine.sequential.Sequential.input as the model’s input and keras.engine.sequential.Sequential.output as the model’s output. At this point, the weights of the model hold only their initial values. The keras.engine.sequential.Sequential.built and _graph_initialized properties are also set to True to indicate that the model has been built.

Once the keras.engine.sequential._build_graph_network_for_inferred_shape function and the keras.engine.functional._init_graph_network function have run, the model can execute keras.engine.functional._run_internal_graph, which computes the output tensor.

After all the layers are built, the model walks through its layers in order and checks that the input dimensions are compatible with the dimensions each layer expects. If there is no problem, each layer computes the inner product of its input tensor and its weight tensor, adds the bias, and applies its activation function to produce the output tensor. The loss value is then calculated, and tf.GradientTape() together with the optimizer initialized by the model (in this example keras.optimizers.rmsprop.RMSprop) is used to compute the gradients and update the weights.
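
As a rough illustration of the "compute the loss, then let the optimizer update the weights" step, the sketch below applies the textbook RMSprop rule to a single weight. It is an assumption-laden toy: the gradient is derived by hand instead of being recorded by tf.GradientTape, rmsprop_step is a hypothetical helper, and the exact placement of epsilon in Keras may differ from this form.

```python
import math


def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """Hypothetical helper: one textbook RMSprop update for a scalar weight."""
    # Running average of squared gradients.
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    # Scale the step by the root of that average.
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq


# Toy problem: fit pred = w * x to a single target, with squared error loss.
x, y = 2.0, 4.0
w, avg_sq = 0.0, 0.0
losses = []
for _ in range(50):
    pred = w * x                      # forward pass
    loss = (pred - y) ** 2            # loss value
    grad = 2.0 * (pred - y) * x       # d(loss)/dw, derived by hand
    w, avg_sq = rmsprop_step(w, grad, avg_sq, lr=0.01)
    losses.append(loss)
```

Each iteration follows the same order as a training step: forward pass, loss, gradient, weight update.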

The loss value is calculated with the previously created keras.engine.compile_utils.LossesContainer object. In this example, because the model’s loss parameter is set to 'sparse_categorical_crossentropy', the container uses the corresponding keras.losses.sparse_categorical_crossentropy function.
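
What a sparse categorical cross-entropy computes can be shown in a few lines: for each sample, take the predicted probability of the true class (given as an integer label) and return its negative logarithm. The function toy_sparse_cce is a hypothetical plain-Python version for illustration, including its clipping constant; it is not the Keras function.

```python
import math


def toy_sparse_cce(labels, probs, eps=1e-7):
    """Hypothetical helper: per-sample -log(probability of the true class)."""
    losses = []
    for label, p in zip(labels, probs):
        # Clip to avoid log(0); the clipping constant is an assumption.
        clipped = min(max(p[label], eps), 1.0 - eps)
        losses.append(-math.log(clipped))
    return losses


labels = [2, 0]                      # integer class ids, not one-hot vectors
probs = [
    [0.1, 0.2, 0.7],                 # softmax output for sample 0
    [0.5, 0.3, 0.2],                 # softmax output for sample 1
]
per_sample = toy_sparse_cce(labels, probs)
mean_loss = sum(per_sample) / len(per_sample)
```

The "sparse" part refers to the labels being integers such as 2 and 0, which is why train_labels can be passed to fit without one-hot encoding.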

After the loss has been calculated and the optimizer has performed the backward pass and weight update, keras.engine.training.compute_metrics and keras.engine.compile_utils.update_state are executed to update the metrics.
Each time train_function completes a single step, the callback functions of the keras.callbacks.ProgbarLogger object print the metrics to stdout.
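
A metric such as accuracy keeps running totals that are updated after every batch and read out when progress is reported. The sketch below is a hypothetical RunningAccuracy class, and the progress line format is made up for illustration rather than copied from the Keras progress bar.

```python
class RunningAccuracy:
    """Hypothetical metric: fraction of samples whose argmax matches the label."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update_state(self, labels, probs):
        for label, p in zip(labels, probs):
            predicted = max(range(len(p)), key=p.__getitem__)  # argmax
            self.correct += int(predicted == label)
            self.total += 1

    def result(self):
        return self.correct / self.total if self.total else 0.0


metric = RunningAccuracy()
# Batch 1: the first prediction is right, the second is wrong.
metric.update_state([2, 0], [[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
# Batch 2: both predictions are right.
metric.update_state([1, 1], [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]])

progress_line = f"step 2/469 - accuracy: {metric.result():.4f}"
```

Because the state accumulates across batches, the reported value is the accuracy over everything seen so far, not just over the latest batch.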

The above is a rough walk-through of the main steps performed by model.fit(), recorded here for reference.

Written by Czxdas