Python: How to stack or overlay histograms using Plotly

Question:

I have two sets of data in separate lists. Each list element has a value from 0 to 100, and elements repeat.

For example:
first_data = [10, 20, 40, 100, ..., 100, 10, 50]
second_data = [20, 50, 50, 10, ..., 70, 10, 100]

I can plot one of these in a histogram using:

import plotly.graph_objects as go
.
.
.

fig = go.Figure()
fig.add_trace(go.Histogram(histfunc='count', x=first_data))
fig.show()

By setting histfunc to 'count', my histogram consists of an x-axis from 0 to 100 and bars showing how many times each value is repeated in first_data.
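As a rough illustration of what those bars represent, the counting can be sketched in plain Python. The values below are placeholders rather than the real lists, and the sketch assumes every distinct value falls into its own bin, which Plotly's automatic binning does not guarantee.

```python
from collections import Counter

# Placeholder values standing in for first_data (not the real list).
first_data = [10, 20, 40, 100, 100, 10, 50]

# Count how many times each value occurs; with one bin per distinct value,
# this is the height a 'count' bar would have.
counts = Counter(first_data)

for value in sorted(counts):
    print(value, counts[value])
```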

My question is: how can I overlay the second set of data on the same axes, using the same 'count' histogram?

Answer:

One way to do this is simply to add another trace; you were nearly there! The dataset used to create these examples can be found in the last section of this post.

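Note:
The following code uses the "lower-level" Plotly API because (personally) I find it more transparent: it lets the user see what is being plotted and why, rather than relying on the convenience modules graph_objects and express.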

from plotly.offline import plot

layout = {}
traces = []

traces.append({'x': data1, 'name': 'D1', 'opacity': 1.0})
traces.append({'x': data2, 'name': 'D2', 'opacity': 0.5})

# For each trace, add elements which are common to both.
for t in traces:
    t.update({'type': 'histogram',
              'histfunc': 'count',
              'nbinsx': 50})

layout['barmode'] = 'overlay'

plot({'data': traces, 'layout': layout})

Output 1:
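Since the title also asks about stacking, here is a minimal sketch of how the same trace dictionaries could be reused with a different barmode. The helper build_histogram_figure and the sample values are hypothetical and only for illustration; the dictionary keys ('type', 'histfunc', 'nbinsx', 'opacity', 'barmode') are the Plotly attributes already used above.

```python
def build_histogram_figure(series, barmode="overlay", nbins=50):
    """Return a Plotly figure dict with one 'count' histogram per series.

    `series` maps a trace name to a list of values. This helper is a
    hypothetical convenience, not part of Plotly.
    """
    traces = []
    for i, (name, values) in enumerate(series.items()):
        traces.append({
            "type": "histogram",
            "histfunc": "count",
            "x": list(values),
            "name": name,
            "nbinsx": nbins,
            # In overlay mode later traces sit on top of earlier ones,
            # so make them semi-transparent.
            "opacity": 1.0 if i == 0 else 0.5,
        })
    return {"data": traces, "layout": {"barmode": barmode}}


# Placeholder values, not the real lists from the question.
sample = {
    "first_data": [10, 20, 40, 100, 100, 10, 50],
    "second_data": [20, 50, 50, 10, 70, 10, 100],
}

overlaid = build_histogram_figure(sample, barmode="overlay")
stacked = build_histogram_figure(sample, barmode="stack")

# To render (requires plotly to be installed):
# from plotly.offline import plot
# plot(overlaid)
```

Building the figure as a plain dict keeps the choice between 'overlay' and 'stack' down to a single layout value, so it is easy to compare both looks on the same data.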

Another option is to plot the curve (Gaussian KDE) of each distribution, as shown here. Note that this method plots the probability density rather than the counts.

X1, Y1 = calc_curve(data1)
X2, Y2 = calc_curve(data2)

traces = []
traces.append({'x': X1, 'y': Y1, 'name': 'D1'})
traces.append({'x': X2, 'y': Y2, 'name': 'D2'})

plot({'data': traces})

Output 2:

The associated calc_curve() function:

from scipy.stats import gaussian_kde

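def calc_curve(data):
    """Calculate probability density."""
    # `data` is expected to be a NumPy array (it uses .min() and .max()).
    min_, max_ = data.min(), data.max()
    # 501 evenly spaced x values covering the data range.
    X = [min_ + i * ((max_ - min_) / 500) for i in range(501)]
    Y = gaussian_kde(data).evaluate(X)
    return X, Y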
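To make "probability density" a little more concrete, here is a dependency-free sketch of a one-dimensional Gaussian KDE. It is a stand-in for illustration, not the scipy implementation: the function name gaussian_kde_1d, the sample values, and the choice of Scott's rule of thumb for the bandwidth are all assumptions.

```python
import math
import statistics


def gaussian_kde_1d(data, xs):
    """Evaluate a simple 1-D Gaussian KDE of `data` at each point in `xs`."""
    n = len(data)
    # Bandwidth from Scott's rule of thumb (an assumption for this sketch).
    h = statistics.stdev(data) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    # Sum one Gaussian bump per data point, then normalise.
    return [norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)
            for x in xs]


# Placeholder values, not the simulated dataset.
data = [10, 20, 40, 100, 100, 10, 50]

# Evaluate on a grid that extends well beyond the data range.
lo, hi = min(data) - 100, max(data) + 100
steps = 2000
step = (hi - lo) / steps
xs = [lo + i * step for i in range(steps + 1)]
ys = gaussian_kde_1d(data, xs)

# Trapezoidal estimate of the area under the curve; it should be close to 1.
area = sum((ys[i] + ys[i + 1]) / 2 * step for i in range(steps))
print(round(area, 3))
```

The area under the curve is (approximately) one regardless of how many data points there are, which is why the y-axis no longer shows counts.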

Option 3 - Plot Bars and Curves:

Or, you can always combine the two methods, using the probability density on the y-axis.

layout = {}
traces = []

traces.append({'x': data1, 'name': 'D1', 'opacity': 1.0})
traces.append({'x': data2, 'name': 'D2', 'opacity': 0.5})

for t in traces:
    t.update({'type': 'histogram',
              'histnorm': 'probability density',
              'nbinsx': 50})

traces.append({'x': X1, 'y': Y1, 'name': 'D1'})
traces.append({'x': X2, 'y': Y2, 'name': 'D2'})

layout['barmode'] = 'overlay'

plot({'data': traces, 'layout': layout})

Output 3:
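The reason the bars and curves can share a y-axis is that 'probability density' rescales each bar to count / (total count * bin width), so the bar areas sum to one. A small sketch of that arithmetic follows; the helper density_histogram, the fixed bin edges, and the sample values are assumptions for illustration, and Plotly picks its own bin edges.

```python
def density_histogram(data, nbins, lo, hi):
    """Return (counts, densities, width) for equal-width bins on [lo, hi]."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in data:
        # Clamp the index so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    n = len(data)
    # Density = count / (number of samples * bin width).
    densities = [c / (n * width) for c in counts]
    return counts, densities, width


# Placeholder values, not the simulated dataset.
data = [10, 20, 40, 100, 100, 10, 50]
counts, densities, width = density_histogram(data, nbins=5, lo=0, hi=100)

# The total bar area (density * width, summed over bins) is 1.
total_area = sum(d * width for d in densities)
print(counts, total_area)
```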

Here is the bit of code used to simulate your dataset of [0, 100] values and to create these examples:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

mms = MinMaxScaler((0, 100))
np.random.seed(4)
data1 = mms.fit_transform(np.random.randn(10000).reshape(-1, 1)).ravel()
data2 = mms.fit_transform(np.random.randn(10000).reshape(-1, 1)).ravel()
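If scikit-learn is not available, the rescaling step can be approximated by hand. The sketch below is an assumption-laden stand-in: min_max_scale is a hypothetical helper, and Python's random module will not reproduce the same numbers as np.random.seed(4).

```python
import random


def min_max_scale(values, new_min=0.0, new_max=100.0):
    """Linearly rescale `values` so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


random.seed(4)
# Standard normal samples, rescaled to the 0-100 range.
raw = [random.gauss(0, 1) for _ in range(10000)]
scaled = min_max_scale(raw)

print(min(scaled), max(scaled))
```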