How can I measure the similarity between two images?
I would like to compare a screenshot of an application (it could be a Web page) with a previously taken screenshot to determine whether the application is displaying itself correctly. I don't want an exact-match comparison, because the appearance could be slightly different (in the case of a Web app, depending on the browser, some elements could be at slightly different locations). It should give a measure of how similar the screenshots are.
Is there a library or tool that already does this? How would you implement it?
This depends entirely on how smart you want the algorithm to be.
For instance, here are some cases it might need to handle:
- a cropped image versus an uncropped image
- an image with text added versus one without
- mirrored images
The simplest algorithm I've seen for this is to apply the following steps to each image:
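- scale it down to something small, like 64x64 or 32x32, disregarding the aspect ratio, using a combining scaling algorithm instead of nearest pixel
- scale the color range so that the darkest color becomes black and the lightest becomes white
- rotate and flip the image so that the lightest color is in the top-left corner, the next darker in the top-right, and the next darker after that in the bottom-left (as far as possible, of course)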
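The second and third steps can be sketched roughly as follows. This is only one possible reading of them, not a reference implementation: it assumes a small, square grayscale thumbnail stored as a list of rows of integers in 0..255, and it judges "lightest corner" by the mean brightness of each quadrant. The function names (`stretch_contrast`, `quadrant_means`, `orientations`, `normalize_orientation`) are made up for this illustration.

```python
def stretch_contrast(img):
    """Rescale gray values so the darkest becomes 0 and the lightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def quadrant_means(img):
    """Mean brightness of the top-left, top-right, bottom-left, bottom-right quadrants."""
    h, w = len(img), len(img[0])

    def mean(rows, cols):
        vals = [img[y][x] for y in rows for x in cols]
        return sum(vals) / len(vals)

    top, bottom = range(0, h // 2), range(h // 2, h)
    left, right = range(0, w // 2), range(w // 2, w)
    return (mean(top, left), mean(top, right),
            mean(bottom, left), mean(bottom, right))


def orientations(img):
    """Yield all 8 rotations and mirror images of a square image."""
    cur = [list(row) for row in img]
    for _ in range(4):
        yield cur
        yield [row[::-1] for row in cur]           # horizontal mirror
        cur = [list(r) for r in zip(*cur[::-1])]   # rotate 90 degrees clockwise


def normalize_orientation(img):
    """Prefer the orientation that is lightest top-left, then top-right, then bottom-left."""
    return max(orientations(img), key=lambda im: quadrant_means(im)[:3])
```

Comparing quadrant means rather than single corner pixels is a design choice made here to keep the orientation stable when one pixel is noisy; the steps above do not prescribe it.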
Edit: A combining scaling algorithm is one that, when scaling 10 pixels down to one, uses a function that takes the colors of all 10 pixels and combines them into a single color. This can be done with simple algorithms like averaging, or with more complex ones like bicubic splines.
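As an illustration of the averaging variant, here is a minimal sketch that shrinks a grayscale image by averaging every source pixel that falls into each output cell. It assumes the image is a list of rows of numbers; `downscale_average` is a hypothetical name, and a real project would more likely use an image library's resize function with an averaging or bicubic filter.

```python
def downscale_average(img, out_w, out_h):
    """Shrink a grayscale image by averaging the source pixels behind each output pixel."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        # Source rows covered by this output row (always at least one).
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            # Source columns covered by this output column.
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

A nearest-pixel scaler would instead keep one of those source pixels and discard the rest, which is why small shifts in the screenshot can change its output much more.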
Then calculate the mean pixel-by-pixel distance between the two images.
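A minimal sketch of that comparison, assuming both images have already been reduced to grayscale thumbnails of the same size (lists of rows of numbers) and taking "distance" to mean the absolute difference between pixel values. The answer does not name a metric, so that choice, like the name `mean_pixel_distance`, is an assumption.

```python
def mean_pixel_distance(a, b):
    """Average absolute difference between corresponding pixels of two images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count
```

With 8-bit gray values the result ranges from 0 (identical thumbnails) to 255 (every pixel as different as possible), so a threshold on it can serve as the similarity measure.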
To look up a possible match in a database, store the pixel colors as individual columns, index a bunch of them (but not all, unless you use a very small image), and run a query that uses a range for each pixel value, i.e. select every image whose pixel values in the small image are within -5 to +5 of the corresponding pixels of the image you want to look up.
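The lookup might be sketched like this with Python's built-in `sqlite3` module. Everything specific here is an assumption made for illustration: the table name `thumbs`, the column names `p0`..`p3`, a 2x2 grayscale thumbnail (to keep the example short), and an index on just the first two pixel columns.

```python
import sqlite3

N_PIXELS = 4    # a 2x2 thumbnail keeps the example short
TOLERANCE = 5   # the +/- 5 range from the description above


def create_table(conn):
    """One integer column per thumbnail pixel, plus an index on a few of them."""
    cols = ", ".join(f"p{i} INTEGER NOT NULL" for i in range(N_PIXELS))
    conn.execute(f"CREATE TABLE thumbs (name TEXT PRIMARY KEY, {cols})")
    conn.execute("CREATE INDEX thumbs_p0_p1 ON thumbs (p0, p1)")


def add_image(conn, name, pixels):
    """Store one thumbnail as a row of pixel values."""
    marks = ", ".join("?" for _ in range(N_PIXELS + 1))
    conn.execute(f"INSERT INTO thumbs VALUES ({marks})", [name, *pixels])


def find_candidates(conn, pixels, tolerance=TOLERANCE):
    """Names of stored images whose every pixel is within +/- tolerance."""
    where = " AND ".join(f"p{i} BETWEEN ? AND ?" for i in range(N_PIXELS))
    params = [bound for p in pixels for bound in (p - tolerance, p + tolerance)]
    rows = conn.execute(
        f"SELECT name FROM thumbs WHERE {where} ORDER BY name", params)
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
create_table(conn)
add_image(conn, "a", [10, 20, 30, 40])
add_image(conn, "b", [12, 18, 33, 44])
add_image(conn, "c", [10, 20, 30, 100])
print(find_candidates(conn, [11, 19, 31, 41]))  # prints ['a', 'b']
```

The query only narrows the set down to candidates; the mean-distance comparison would still be run on whatever it returns.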
This is easy to implement and fairly fast to run, but of course it won't handle most of the more advanced differences. For those you need much more advanced algorithms.