



I generated a list of files by writing the following code:

files = [file for file in Path(main_directory).rglob('*filename*v*.xlsx')]


files[0] = .../2018/filename 2018 v 1.xlsx
files[1] = .../2019/filename 2019 v 5.xlsx
files[2] = .../2020/filename 2020 v 4.xlsx
files[3] = .../2020/filename 2020 v 5.xlsx
files[13] = .../2020/filename 2020 v 10.xlsx

How can I keep only the file with the highest v for each year, so that the output looks like this?

files[0] = .../2018/filename 2018 v 1.xlsx
files[1] = .../2019/filename 2019 v 5.xlsx
files[2] = .../2020/filename 2020 v 10.xlsx

I need the file with the highest v, which is not necessarily the one with the latest modified date, so I cannot select by modification time. I have tried os.path and re, but I am getting nowhere.


Assuming that the filenames for each year are already adjacent in the list (itertools.groupby only groups consecutive items), you can try this:

import re
from itertools import groupby

x = ["2018/filename 2018 v 1.xlsx", "2019/filename 2019 v 5.xlsx", "2020/filename 2020 v 4.xlsx", "2020/filename 2020 v 5.xlsx", "2020/filename 2020 v 10.xlsx"]

# Group consecutive entries by the year that follows "filename "
for year, group in groupby(x, key=lambda p: int(re.findall(r"(?<=filename )\d+", p)[0])):
    # Compare versions as integers so that "v 10" ranks above "v 5"
    print(max(group, key=lambda p: int(re.findall(r"(?<=v )\d+(?=\.xlsx)", p)[0])))


2018/filename 2018 v 1.xlsx
2019/filename 2019 v 5.xlsx
2020/filename 2020 v 10.xlsx
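
If the list is not guaranteed to be ordered by year (rglob makes no promise about order), a dictionary keyed by year avoids the need for groupby altogether. The following is only a sketch: the helper name latest_per_year and the regular expression are my own assumptions, and the pattern assumes every file name ends in "<4-digit year> v <number>.xlsx" as in the examples from the question.

```python
import re
from pathlib import Path

# Assumed naming scheme (not confirmed beyond the examples in the question):
# the file name ends with "<4-digit year> v <number>.xlsx"
PATTERN = re.compile(r"(\d{4}) v (\d+)\.xlsx$")


def latest_per_year(paths):
    """Return {year: path}, keeping the highest version number per year.

    Hypothetical helper; works on str or Path items in any order.
    """
    best = {}  # year -> (version, path)
    for path in paths:
        match = PATTERN.search(Path(path).name)
        if match is None:
            continue  # skip names that do not follow the assumed scheme
        year, version = int(match.group(1)), int(match.group(2))
        # Versions are compared as integers, so 10 ranks above 5
        if year not in best or version > best[year][0]:
            best[year] = (version, path)
    # Drop the version numbers and return the paths ordered by year
    return {year: path for year, (_, path) in sorted(best.items())}


sample = [
    "2020/filename 2020 v 10.xlsx",
    "2018/filename 2018 v 1.xlsx",
    "2020/filename 2020 v 4.xlsx",
    "2019/filename 2019 v 5.xlsx",
    "2020/filename 2020 v 5.xlsx",
]

for year, path in latest_per_year(sample).items():
    print(year, path)
```

With the list from the question, this could be called as latest_per_year(files), and list(result.values()) would give the filtered paths.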