What is the best way to create a graph using the add_edge_list() method?
I am trying to create a large graph (about 10^6 - 10^7 vertices) with the graph-tool library, and either fill a vertex property with the vertex names or use names instead of vertex indexes. I have:
- a list of names:
['50', '56', '568']
- a set of edges that consists of vertex names instead of vertex indexes:
edge_list = {frozenset({'568', '56'}), frozenset({'56', '50'}), frozenset({'50', '568'})}
Since add_edge_list() can create vertices that do not yet exist in the graph, I am trying to use it to fill an empty graph. It works, but when I try to get a vertex by its name, I get an error saying that there is no vertex with such an index.
Here is the code of my program:
g = grt.Graph(directed=False)
edge_list = {frozenset({'568', '56'}), frozenset({'56', '50'}), frozenset({'50', '568'})}
ids = ['50', '56', '568']
g.add_edge_list(edge_list, hashed=True, string_vals=True)
print(g.vertex('50'))
Error message from print(g.vertex('50')):
ValueError: Invalid vertex index: 50
I want to create the graph:
- using edge_list only;
- with quick access to a vertex by its name;
- optimally in time (and in RAM if possible).
Is there a good way to do this?
Current code:
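g = grt.Graph(directed=False)
g.add_vertex(len(ids))
vprop = g.new_vertex_property("string", vals=ids)
g.vp.user_id = vprop
# ids_dict is not shown in the original post; assumed to map each name to its vertex index
ids_dict = {name: index for index, name in enumerate(ids)}
for vert1, vert2 in edge_list:
    g.add_edge(g.vertex(ids_dict[vert1]), g.vertex(ids_dict[vert2]))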
If you have a dense graph with 10^6 - 10^7 vertices (is it medical data or a social graph? That can change everything), you shouldn't use networkx: it is written in pure Python, so it is roughly 10-100 times slower than graph-tool or igraph. In your case I recommend graph-tool. It is the fastest Python graph processing library (about on par with igraph).
graph-tool behaves differently from networkx. When you create a networkx node, its identifier is whatever you passed when adding it, so you can get the node by that ID. In graph-tool, every vertex ID is an integer from 0 to N-1, where N is the number of vertices:
Each vertex in a graph has an unique index, which is always between 0 and N−1, where N is the number of vertices. This index can be obtained by using the vertex_index attribute of the graph (which is a property map, see Property maps), or by converting the vertex descriptor to an int.
All additional information about the graph, its vertices, or its edges is stored in property maps. When you call .add_edge_list() with hashed=True, it returns a new vertex property map that holds the original vertex values. So in your case you should handle your vertices like this:
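# Assumed import: the original post does not show how the grt alias is defined
import graph_tool.all as grt
# Create graph
g = grt.Graph(directed=False)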
# Create edge list
# Why frozensets? You don't really need them. You can use ordinary sets or tuples
edge_list = {
    frozenset({'568', '56'}),
    frozenset({'56', '50'}),
    frozenset({'50', '568'})
}
# Write returned PropertyMap to a variable!
vertex_ids = g.add_edge_list(edge_list, hashed=True, string_vals=True)
g.vertex(1)
Out [...]: <Vertex object with index '1' at 0x7f3b5edde4b0>
vertex_ids[1]
Out [...]: '56'
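To make it clearer what hashed=True does conceptually, here is a minimal pure-Python sketch. It is not graph-tool's actual implementation, and hash_edges is a hypothetical helper name; it only assumes that names get consecutive indices in the order in which they are first encountered, and it uses a list of tuples so that the order is reproducible.

```python
def hash_edges(edge_list):
    """Map vertex names to consecutive indices in order of first appearance.

    Returns the edges as index pairs plus a list giving the name of each index.
    """
    name_to_index = {}   # name -> index, used only while building
    names_by_index = []  # index -> name, plays the role of the returned property map
    index_edges = []
    for source, target in edge_list:
        pair = []
        for name in (source, target):
            if name not in name_to_index:
                name_to_index[name] = len(names_by_index)
                names_by_index.append(name)
            pair.append(name_to_index[name])
        index_edges.append(tuple(pair))
    return index_edges, names_by_index


# Tuples in a list keep the order fixed; a set of frozensets has no defined order.
edge_list = [('568', '56'), ('56', '50'), ('50', '568')]
index_edges, names_by_index = hash_edges(edge_list)
print(index_edges)     # [(0, 1), (1, 2), (2, 0)]
print(names_by_index)  # ['568', '56', '50']
```

This is why g.vertex('50') fails: '50' is only a value stored for some vertex, while the vertex itself is addressed by its integer index.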
If you want to get a vertex by its ID, you have to build the mapping dict manually (I am not a graph-tool guru, but I could not find a simpler solution):
very_important_mapping_dict = {vertex_ids[i]: i for i in range(g.num_vertices())}
So you can easily get a vertex index:
very_important_mapping_dict['568']
Out [...]: 0
vertex_ids[0]
Out [...]: '568'
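The same inversion can be sketched without graph-tool, which makes it easy to check the round trip. This is only an illustration under stated assumptions: names_by_index is a plain list standing in for the returned property map, and build_name_index is a hypothetical helper, not part of any library. It also raises on duplicate names, since a duplicate would silently overwrite an entry in a dict comprehension.

```python
def build_name_index(names_by_index):
    """Invert an index -> name sequence into a name -> index dict."""
    name_to_index = {}
    for index, name in enumerate(names_by_index):
        if name in name_to_index:
            raise ValueError(f"duplicate vertex name: {name!r}")
        name_to_index[name] = index
    return name_to_index


# Stand-in for the property map; the order matches the output shown above.
names_by_index = ['568', '56', '50']
name_to_index = build_name_index(names_by_index)

print(name_to_index['568'])                  # 0
print(names_by_index[name_to_index['56']])   # 56
```

The dict is built in a single pass over the vertices, and each lookup by name afterwards is a constant-time dict access on average.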