Is there a way to store a million-bit matrix on an FPGA?
I am working towards implementing a channel decoder on an FPGA. Essentially, the problem comes down to this:
1) I have a matrix. I do some computations on the rows. Then I do some computations on the columns.
The decoder basically picks up each row of the matrix, performs some operations, and moves on to the next row. It does the same with the columns.
However, the decoder operates on a 1023 * 1023 matrix, i.e. I have 1023 rows and 1023 columns.
Small test case that works: I first created reg [1022:0] product_code [0:1], i.e. 2 rows and 1023 columns. The output is as expected. However, the LUT utilization comes to approximately 9 percent. Then I increased the size to 10 rows and 1023 columns (reg [1022:0] product_code [0:9]), which works as expected too. But the resource utilization has gone up to 27 percent.
Now my goal is to get to 1023 rows and 1023 columns. It does not even synthesize. Is there a better way to store such a matrix on the FPGA?
I would really appreciate any feedback!
You can find out how much storage an FPGA has from the manufacturer's data sheet. However, those memories are highly configurable.
Thus a 36-bit-wide memory can be used as 36x1 or 18x2 or 4x9 units. Alternatively, you can read units of e.g. 36 bits but split the data yourself into 8 units of 4 bits (which uses 32 of the 36 bits). Process each nibble separately and write the whole word back again.
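As a rough illustration of the read, split, process, write-back idea, here is a minimal sketch. The module name, port names, and the per-nibble operation (a plain bitwise invert) are placeholders I made up; your decoder's real row or column computation would go in their place. It assumes 32 data bits per word, i.e. 8 nibbles.

```verilog
// Sketch only: splits one 32-bit memory word into 8 nibbles,
// applies a placeholder operation to each, and reassembles the word.
// All names here are hypothetical.
module nibble_process (
    input  wire [31:0] word_in,   // word read from the memory
    output wire [31:0] word_out   // word to write back
);
    genvar i;
    generate
        for (i = 0; i < 8; i = i + 1) begin : g_nibble
            // Indexed part-select: bits [4*i+3 : 4*i]
            wire [3:0] nib = word_in[4*i +: 4];
            // Placeholder operation (assumption): replace with the
            // decoder's actual per-entry computation.
            assign word_out[4*i +: 4] = ~nib;
        end
    endgenerate
endmodule
```

All 8 nibbles are handled in parallel here, so one memory read and one memory write cover 8 matrix entries.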
Make sure you are using synchronous memories, as all big memory blocks in all FPGAs are synchronous. If you start using asynchronous memories, the memories must be built from LUTs and you run out very quickly.
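For reference, a synchronous memory is usually described along these lines. This is a generic sketch, not a vendor template: the parameter values and names are assumptions, and whether your tool actually maps it onto block RAM depends on the tool and the device, so check the synthesis report.

```verilog
// Sketch of a synchronous single-port RAM.
// WIDTH, DEPTH, ADDR_W and the port names are illustrative assumptions.
module sync_ram #(
    parameter WIDTH  = 36,
    parameter DEPTH  = 1024,
    parameter ADDR_W = 10          // needs 2**ADDR_W >= DEPTH
) (
    input  wire              clk,
    input  wire              we,   // write enable
    input  wire [ADDR_W-1:0] addr,
    input  wire [WIDTH-1:0]  din,
    output reg  [WIDTH-1:0]  dout
);
    reg [WIDTH-1:0] mem [0:DEPTH-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        // Registered read: data is available one clock after addr.
        // On a write, dout gets the old contents of that address.
        dout <= mem[addr];
    end
endmodule
```

The important part is that the read happens inside the clocked block. A read written as a continuous assignment (assign dout = mem[addr];) is asynchronous and tends to end up in LUTs, which matches the utilization growth described in the question.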
Also beware that your row and column processing must take into account how the data is stored. You can e.g. store the data row-wise. Using nibbles as an example: when you read one 36-bit memory entry, that gives you 8 nibbles of one row. But in column mode one read gives you one entry from each of 8 adjacent columns. So there you should ideally process 8 columns in parallel at the same time.
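To make the row-wise layout concrete, here is a sketch of how the memory address could be formed. It assumes 8 nibbles per word, so a 1023-entry row takes 128 words (the last word has only 7 valid nibbles), and the address is simply the row index concatenated with the word index. The module and signal names are hypothetical.

```verilog
// Sketch only: address mapping for a matrix stored row-wise,
// 8 entries (nibbles) per memory word. Names are hypothetical.
module matrix_addr #(
    parameter ROW_W  = 10,  // 1023 rows fit in 10 bits
    parameter WORD_W = 7    // 128 words per row (8 columns each)
) (
    input  wire [ROW_W-1:0]        row,   // matrix row index
    input  wire [WORD_W-1:0]       word,  // group of 8 adjacent columns
    output wire [ROW_W+WORD_W-1:0] addr   // memory word address
);
    // addr = row * 128 + word, done as a concatenation
    assign addr = {row, word};
endmodule
```

For the row pass you hold row fixed and step word from 0 to 127. For the column pass you hold word fixed and step row from 0 to 1022; every read then delivers the next entry of 8 adjacent columns at once, which is why processing those 8 columns in parallel fits this layout.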