Posted on 2020-10-16 00:28:39
In my tests, training with ResNet18 runs normally and the loss is computed.
With the MobileNetV2 model, however, an invalid loss occurs and training terminates.

Posted on 2020-10-16 00:32:28
==================================================================================
Total params: 1,222,588
Trainable params: 1,205,436
Non-trainable params: 17,152
________________________________________________________________________________
2020-10-15 16:26:03,654 [INFO] iva.ssd.scripts.train: Number of images in the training dataset:          1527
2020-10-15 16:26:03,655 [INFO] iva.ssd.scripts.train: Number of images in the validation dataset:           248
Epoch 12/100
2020-10-15 16:26:32.618167: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-15 16:26:36.564399: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x7445790
2020-10-15 16:26:36.565127: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-10-15 16:26:36.969943: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-15 16:26:36.971407: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
6/24 [======>.......................] - ETA: 2:05 - loss: nan                               Batch 5: Invalid loss, terminating training

Epoch 00012: saving model to /workspace/mydata/ssd/experiment_dir_unpruned/weights/ssd_mobilenet_v2_epoch_012.tlt
==================================================================================
The above is the error output.
Posted on 2020-10-16 11:57:15
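This is usually caused by a learning rate that is too high, or by having too few training samples.
You can add more training samples or lower the learning rate to improve it.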
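
To illustrate the learning-rate point, here is a minimal toy sketch in plain Python. It is not TLT or SSD code: the function name `train`, the quadratic loss, and the step counts are all assumptions chosen for illustration. It runs gradient descent on loss(w) = w**2 and stops as soon as the loss is no longer finite, which is the same kind of check that produces the "Invalid loss, terminating training" message in the log.

```python
import math

def train(lr, steps=1000, w0=1.0):
    """Toy gradient descent on loss(w) = w**2 (hypothetical example, not TLT code).

    Returns (step_reached, last_loss). Stops early if the loss becomes
    inf or nan, mimicking a terminate-on-invalid-loss check.
    """
    w = w0
    for step in range(steps):
        loss = w * w
        if not math.isfinite(loss):
            # Invalid loss: stop training at this step.
            return step, loss
        grad = 2.0 * w      # d(loss)/dw
        w -= lr * grad      # plain gradient-descent update
    return steps, w * w

# Learning rate too high: each update overshoots and |w| doubles,
# so the loss eventually overflows and training stops early.
high_step, high_loss = train(lr=1.5)

# Lower learning rate: |w| shrinks every step and the loss stays finite.
low_step, low_loss = train(lr=0.1)

print("lr=1.5 -> stopped at step", high_step, "with loss", high_loss)
print("lr=0.1 -> ran", low_step, "steps, final loss", low_loss)
```

In a real training run the same idea applies to the learning-rate values in your training spec: if the loss turns into nan after a few batches, try a smaller maximum learning rate first and check whether the loss stays finite.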