TensorFlow: Checking Variable Placement with the Profiling Tool

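When TensorFlow places variables automatically, it is not obvious whether a given variable ends up on the CPU or the GPU.
Usually this is not a problem, but when I tried to allocate a large variable, I ran out of memory!
Only then did I discover that the embedding variable had been placed on the GPU.
A search showed that TensorFlow ships with its own profiling tool; see: http://stackoverflow.com/questions/37751739/tensorflow-code-optimization-strategy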

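The program at the end of this post produces the following log:
[Image: TensorFlow, checking variable placement with the profiling tool]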

Without further ado, here is the code:
# coding=utf-8
'''
Tests TensorFlow's profiling tool;
the tool can also be used to check where variables are placed.

Reference: http://stackoverflow.com/questions/37751739/tensorflow-code-optimization-strategy

Created on Mar 30, 2017
@author: colinliang
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.python.client import timeline
if __name__ == '__main__':
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess = tf.Session()
    print(u'The CUPTI .so files have moved, so running this program as-is raises an exception! See http://blog.****.net/rtygbwwwerr/article/details/51605835')
    print(u'Workaround: I copied the 3 corresponding files from /usr/local/cuda/extras/CUPTI/lib64 into /usr/local/cuda/lib64')
    from tensorflow.python.ops import partitioned_variables
    partitioner = partitioned_variables.variable_axis_size_partitioner(max_shard_bytes=512)  # note: this limit is in bytes, not in number of floats
    dim = 2
    
#     with tf.device('/cpu:0'):
#         embedding_var=tf.get_variable('embedding_var', shape=[600,dim],partitioner=partitioner, dtype=tf.float32)
    embedding_var = tf.get_variable('embedding_var', shape=[200, dim], partitioner=partitioner, dtype=tf.float32) 
    
    w = tf.get_variable('w', shape=[dim, 10], dtype=tf.float32)
    
#     tf.PartitionedVariable  # once partitioned, the variable is no longer a tf.Variable but a tf.PartitionedVariable made up of a group of tf.Variable objects
    sess.run(tf.global_variables_initializer())
    print('embedding_var: %s' % embedding_var)
#     print('device of embedding_var: %s' % embedding_var.device)  # for a single variable the device can be printed like this, but it only works for variables whose device was specified explicitly

    r = tf.nn.embedding_lookup(embedding_var, [0])
    r = tf.matmul(r, w)
    print(r)
    sess.run(r, options=run_options, run_metadata=run_metadata)
        
#     print(run_metadata)
    tl = timeline.Timeline(run_metadata.step_stats)
    ctf = tl.generate_chrome_trace_format()
    tracing_log = '/tmp/timeline.json'
    with open(tracing_log, 'w') as f:
        f.write(ctf)
    print('DONE')
    print('Open chrome://tracing in Chrome, then load %s to view the trace' % tracing_log)
    exit(0)
    #############################################
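
The trace file written above can also be inspected without opening Chrome. The following is only a rough sketch: it assumes the JSON follows the usual Chrome trace layout, with "process_name" metadata events (ph "M") naming each device and complete events (ph "X") carrying the node name in args["name"]. The helper nodes_by_device and the sample_trace data are hypothetical and hand-written for illustration; they are not real output of the program.

```python
import json


def nodes_by_device(trace):
    """Group node names by the device (process) name they were recorded under."""
    # First pass: metadata events map each pid to a readable device name.
    pid_names = {}
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "M" and event.get("name") == "process_name":
            pid_names[event["pid"]] = event["args"]["name"]

    # Second pass: complete events ("X") are the executed ops.
    grouped = {}
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "X":
            continue
        device = pid_names.get(event["pid"], "pid %s" % event["pid"])
        node = event.get("args", {}).get("name", event.get("name"))
        grouped.setdefault(device, []).append(node)
    return grouped


# Hand-written sample in the assumed layout (NOT real profiler output).
sample_trace = {
    "traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 1,
         "args": {"name": "/job:localhost/replica:0/task:0/gpu:0 Compute"}},
        {"ph": "X", "cat": "Op", "name": "VariableV2", "pid": 1, "tid": 0,
         "ts": 0, "dur": 2,
         "args": {"name": "embedding_var/part_0", "op": "VariableV2"}},
        {"ph": "X", "cat": "Op", "name": "MatMul", "pid": 1, "tid": 0,
         "ts": 5, "dur": 3, "args": {"name": "MatMul", "op": "MatMul"}},
    ]
}

for device, nodes in sorted(nodes_by_device(sample_trace).items()):
    print(device, nodes)

# For the real file, load it first:
#     with open('/tmp/timeline.json') as f:
#         print(nodes_by_device(json.load(f)))
```

If the assumption about the layout holds, any variable shard listed under a device name containing "gpu" was placed on the GPU.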
    

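'''
    Part of the output of print(run_metadata) is shown below.
    embedding_var has been partitioned: each shard is 400 bytes rather than exactly the 512 bytes we specified
    (max_shard_bytes is an upper limit per shard, not an exact size).
    allocator_name is "gpu_bfc", which shows that the variable was allocated on the GPU!
'''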
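
The 400-byte shard size can be reproduced by hand. Below is a minimal sketch of the arithmetic, assuming the partitioner fits as many whole rows as possible under max_shard_bytes and then splits the rows as evenly as it can. estimate_axis_shards is a hypothetical helper written for this post, not a TensorFlow function, and it may not match the library in every corner case.

```python
def estimate_axis_shards(shape, element_bytes, max_shard_bytes, axis=0):
    """Return a list of (rows, bytes) per shard for an even split along `axis`.

    Assumption: mirrors the idea behind variable_axis_size_partitioner,
    not its exact implementation.
    """
    # Bytes taken by one slice (one "row") along the partitioned axis.
    bytes_per_slice = element_bytes
    for i, dim in enumerate(shape):
        if i != axis:
            bytes_per_slice *= dim

    # How many whole slices fit under the byte limit (at least one).
    slices_per_shard = max(1, max_shard_bytes // bytes_per_slice)

    # Ceiling division: number of shards needed to hold all slices.
    num_shards = -(-shape[axis] // slices_per_shard)

    # Spread the slices evenly; the first `extra` shards get one more.
    base, extra = divmod(shape[axis], num_shards)
    rows = [base + 1] * extra + [base] * (num_shards - extra)
    return [(r, r * bytes_per_slice) for r in rows]


# The variable from the program: shape [200, 2], float32 (4 bytes), limit 512.
shards = estimate_axis_shards([200, 2], element_bytes=4, max_shard_bytes=512)
print(shards)
```

For the [200, 2] float32 variable this gives 4 shards of 50 rows and 400 bytes each, which matches the shape (50 x 2) and requested_bytes: 400 in the dump. The allocated_bytes: 512 field is a separate matter: it is what the allocator actually reserved for those 400 requested bytes.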

#     node_stats {
#       node_name: "embedding_var/part_0"
#       all_start_micros: 1490861057718593
#       op_end_rel_micros: 2
#       all_end_rel_micros: 6
#       memory {
#         allocator_name: "gpu_bfc"
#       }
#       output {
#         tensor_description {
#           dtype: DT_FLOAT
#           shape {
#             dim {
#               size: 50
#             }
#             dim {
#               size: 2
#             }
#           }
#           allocation_description {
#             requested_bytes: 400
#             allocated_bytes: 512
#             allocator_name: "gpu_bfc"
#             allocation_id: 24
#             has_single_reference: true
#             ptr: 1117060600064
#           }
#         }
#       }