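In engineering projects, datasets usually come in many different formats. To manage them in one consistent way, you can convert everything to a single format. The TFRecord format defined by TensorFlow is a flexible and efficient way to store data.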
First, convert the input files to TFRecord format. As an example, here is the conversion of the MNIST image set:
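```python
from __future__ import print_function

import os

import numpy as np
import tensorflow as tf  # TensorFlow 1.x: tf.contrib and tf.python_io are gone in 2.x
from tensorflow.contrib.learn.python.learn.datasets import mnist

save_dir = 'c:/tmp/data'

# Download MNIST, keeping pixels as uint8 and images shaped 28x28x1 (reshape=False);
# 1000 training images are held out as a validation set
data_sets = mnist.read_data_sets(save_dir, dtype=tf.uint8, reshape=False,
                                 validation_size=1000)
```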
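Here is a concrete picture of `validation_size`. This sketch assumes the TF 1.x implementation, where `read_data_sets` slices the validation set off the front of the 60,000 MNIST training images. It also assumes the three splits come back in a namedtuple ordered `train`, `validation`, `test`. `Datasets` and `split_train_validation` here are stand-ins built from plain lists for illustration. They are not the library's own code.

```python
import collections

# Hypothetical stand-in for the object read_data_sets returns:
# a namedtuple whose fields are ordered train, validation, test
Datasets = collections.namedtuple('Datasets', ['train', 'validation', 'test'])


def split_train_validation(train_items, test_items, validation_size):
    """Hold out the first `validation_size` training items as a validation split."""
    if not 0 <= validation_size <= len(train_items):
        raise ValueError('validation_size must be between 0 and %d' % len(train_items))
    return Datasets(train=train_items[validation_size:],
                    validation=train_items[:validation_size],
                    test=test_items)


# MNIST sizes: 60,000 training and 10,000 test images
ds = split_train_validation(list(range(60000)), list(range(10000)), validation_size=1000)
print(len(ds.train), len(ds.validation), len(ds.test))  # 59000 1000 10000
print(ds[1] is ds.validation)  # True: position 1 is the validation split, not test
```

Because the fields are ordered this way, a split is safest to fetch by name (`ds.test` or `getattr(ds, 'test')`) rather than by position.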
Next, write each split out to its own file:
```python
data_splits = ['train', 'test', 'validation']
for split in data_splits:
    print('Saving ' + split)
    # data_sets is a namedtuple ordered (train, validation, test), so look the split
    # up by name; indexing by position would save the validation data as "test"
    data_set = getattr(data_sets, split)
    filename = os.path.join(save_dir, split + '.tfrecords')
    writer = tf.python_io.TFRecordWriter(filename)
    for index in range(data_set.images.shape[0]):
        # Raw uint8 pixel bytes of one 28x28x1 image
        image = data_set.images[index].tobytes()
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'height': tf.train.Feature(int64_list=tf.train.Int64List(value=[data_set.images.shape[1]])),
                    'width': tf.train.Feature(int64_list=tf.train.Int64List(value=[data_set.images.shape[2]])),
                    'depth': tf.train.Feature(int64_list=tf.train.Int64List(value=[data_set.images.shape[3]])),
                    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(data_set.labels[index])])),
                    'image_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),
                }))
        # Each record in the file is one serialized Example
        writer.write(example.SerializeToString())
    writer.close()
```
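What does `writer.write` actually put on disk? Below is a pure-Python sketch of the record framing. It assumes the uncompressed layout described in TensorFlow's TFRecord documentation:

1. an 8-byte little-endian length,
2. a masked CRC-32C of those 8 bytes,
3. the payload,
4. a masked CRC-32C of the payload.

`crc32c`, `masked_crc` and `encode_record` are hypothetical helper names for illustration. They are not part of TensorFlow's API. In practice `tf.python_io.TFRecordWriter` does all of this for you.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add a constant, as TFRecord does."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(payload: bytes) -> bytes:
    """Frame one record: length, CRC of length, payload, CRC of payload."""
    length = struct.pack('<Q', len(payload))            # uint64, little-endian
    return (length
            + struct.pack('<I', masked_crc(length))     # uint32 masked CRC of the 8 length bytes
            + payload
            + struct.pack('<I', masked_crc(payload)))   # uint32 masked CRC of the payload


# Frame two stand-in payloads (real ones would be example.SerializeToString())
data = b''.join(encode_record(p) for p in [b'first', b'second'])
print(len(data))  # (8 + 4 + 5 + 4) + (8 + 4 + 6 + 4) = 43
```

Each record therefore costs 16 bytes of overhead. The two CRCs let a reader detect a corrupted length field separately from a corrupted payload. The bitwise CRC loop is only for readability and is far slower than TensorFlow's native implementation.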
To read the records back, use `tf.python_io.tf_record_iterator`:
```python
from tensorflow import python_io

filename = os.path.join(save_dir, 'train.tfrecords')
record_iterator = python_io.tf_record_iterator(filename)
serialized_img_example = next(record_iterator)
```
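`tf_record_iterator` returns a Python generator, which is why `next()` hands back the first serialized record. The sketch below reproduces that behaviour in plain Python. It assumes the same uncompressed framing: length, masked CRC of the length, payload, masked CRC of the payload. `iter_records` and its helpers are hypothetical names, not TensorFlow functions. The sketch also checks both CRCs and raises on truncated or corrupted data.

```python
import io
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add a constant, as TFRecord does."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def _read_exact(stream, n):
    """Read exactly n bytes or fail loudly on a truncated file."""
    buf = stream.read(n)
    if len(buf) != n:
        raise ValueError('truncated TFRecord data')
    return buf


def iter_records(stream):
    """Lazily yield each record payload from a binary TFRecord stream."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of file
        if len(header) != 8:
            raise ValueError('truncated TFRecord data')
        (length,) = struct.unpack('<Q', header)
        (length_crc,) = struct.unpack('<I', _read_exact(stream, 4))
        if length_crc != masked_crc(header):
            raise ValueError('length CRC mismatch')
        payload = _read_exact(stream, length)
        (payload_crc,) = struct.unpack('<I', _read_exact(stream, 4))
        if payload_crc != masked_crc(payload):
            raise ValueError('payload CRC mismatch')
        yield payload


# Build a one-record stream by hand, then take the first record with next()
payload = b'serialized example bytes'
header = struct.pack('<Q', len(payload))
stream = io.BytesIO(header + struct.pack('<I', masked_crc(header))
                    + payload + struct.pack('<I', masked_crc(payload)))
first = next(iter_records(stream))
print(first)  # b'serialized example bytes'
```

Because records are read one at a time, a file much larger than memory can still be streamed. `for record in iter_records(f):` walks the whole file in the same way that `for record in record_iterator:` would.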
Parse the record back into an `Example`:
```python
example = tf.train.Example()
example.ParseFromString(serialized_img_example)
# bytes_list.value is a repeated field; the image bytes are its only element
image = example.features.feature['image_raw'].bytes_list.value
label = example.features.feature['label'].int64_list.value[0]
width = example.features.feature['width'].int64_list.value[0]
height = example.features.feature['height'].int64_list.value[0]
```
Restore the image:
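```python
# frombuffer reads the raw bytes directly (np.fromstring is deprecated for binary data)
img_flat = np.frombuffer(image[0], dtype=np.uint8)
# -1 lets NumPy infer the depth (1 for MNIST)
img_reshaped = img_flat.reshape((height, width, -1))
```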
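`reshape((height, width, -1))` lets NumPy infer the depth from the byte count. Without NumPy, the same row-major reconstruction can be sketched as below. `restore_image` is a hypothetical helper that returns nested lists rather than an array. It assumes one byte per pixel channel (uint8), as in the MNIST data above.

```python
def restore_image(raw: bytes, height: int, width: int):
    """Rebuild a height x width x depth nested list from row-major uint8 bytes.

    The depth is inferred from the byte count, like the -1 passed to reshape.
    """
    if height <= 0 or width <= 0 or len(raw) % (height * width):
        raise ValueError('byte count is not a multiple of height * width')
    depth = len(raw) // (height * width)
    # Pixel (r, c) starts at byte offset (r * width + c) * depth
    return [[list(raw[(r * width + c) * depth:(r * width + c + 1) * depth])
             for c in range(width)]
            for r in range(height)]


# A 2x3 grayscale (depth 1) image stored as 6 bytes
img = restore_image(bytes([0, 1, 2, 3, 4, 5]), height=2, width=3)
print(img)  # [[[0], [1], [2]], [[3], [4], [5]]]
```

This is also why `height` and `width` are stored in each record, while `depth` need not be read back. Those two values plus the byte count determine the third dimension.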
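TensorFlow's TFRecord format provides an efficient, uniform way to store data, and it supports both directions. Here `tf.python_io.TFRecordWriter` wrote one serialized `tf.train.Example` per image, and `tf.python_io.tf_record_iterator` read them back.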
Reposted from: http://ndigz.baihongyu.com/