Artifact storage and MLflow on a remote server

Problem description:

I am trying to get MLflow running on another machine in my local network, and I would like to ask for some help because I am stuck.

I have an MLflow tracking server running on a remote machine. It runs under my user account on that machine and was started like this:

```
mlflow server --host 0.0.0.0 --port 9999 --default-artifact-root sftp://<MYUSERNAME>@<SERVER>:<PATH/TO/DIRECTORY/WHICH/EXISTS>
```
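For reference, MLflow's documentation gives SFTP artifact roots the form sftp://user@host/path/to/directory, with a slash rather than an scp-style colon between host and path. In URL syntax a colon after the host introduces a port number, so it is worth checking how the URI in the command above splits. The sketch below uses only Python's standard urllib.parse; the helper name sftp_target and the example user and host are hypothetical, and 22 is simply the usual SSH default port.

```python
from urllib.parse import urlsplit


# Hypothetical helper (not an MLflow API): split an sftp:// artifact root into
# the user, host, port and path an SFTP client would connect with.
def sftp_target(uri: str) -> dict:
    parts = urlsplit(uri)
    if parts.scheme != "sftp":
        raise ValueError(f"not an sftp:// URI: {uri!r}")
    # .port raises ValueError when the text after "host:" is not a number,
    # e.g. for an scp-style "host:relative/path".
    port = parts.port
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": 22 if port is None else port,  # assume the usual SSH port 22
        "path": parts.path,
    }


if __name__ == "__main__":
    print(sftp_target("sftp://alice@example-host/srv/mlflow"))
```

With an absolute path after the colon (host:/path) the port is merely empty and the path survives intact; with a relative path (host:path) reading the port fails.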

The program that should log all the data to the MLflow server looks like this:

```python
from mlflow import log_metric, log_param, log_artifact, set_tracking_uri

if __name__ == "__main__":
    remote_server_uri = '<SERVER>' # this value has been replaced
    set_tracking_uri(remote_server_uri)
    # Log a parameter (key-value pair)
    log_param("param1", 5)

    # Log a metric; metrics can be updated throughout the run
    log_metric("foo", 1)
    log_metric("foo", 2)
    log_metric("foo", 3)

    # Log an artifact (output file)
    with open("output.txt", "w") as f:
        f.write("Hello world!")
    log_artifact("output.txt")
```

The parameters and metrics are transferred to the server, but the artifacts are not. Why is that?

Note on the SFTP part: I can log in via SFTP, and the pysftp package is installed.
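A likely explanation, based on how MLflow handles artifacts when the server is started with a plain --default-artifact-root and does not proxy artifacts: the tracking server only records each run's artifact location, while log_artifact runs on the client, which opens that location itself. Parameters and metrics travel over HTTP to the server, but for an sftp:// root the client machine needs pysftp and password-less SSH access (e.g. key-based) to the server. Also, an experiment's artifact location is fixed when the experiment is created, so runs in an experiment created before the server got this flag (such as the Default experiment) keep writing to the old location. The helper below is hypothetical (not part of MLflow) and just summarizes, per URI scheme, which machine performs the write.

```python
from urllib.parse import urlsplit


# Hypothetical helper (not an MLflow API): given a run's artifact URI, report
# which machine actually performs the write when log_artifact() is called.
def artifact_writer(artifact_uri: str) -> str:
    scheme = urlsplit(artifact_uri).scheme  # urlsplit lowercases the scheme
    if scheme in ("", "file"):
        # Plain paths and file:// URIs are resolved on the *client's* own disk.
        return "client-local-filesystem"
    if scheme == "mlflow-artifacts":
        # Proxied artifacts: the client uploads over HTTP to the tracking server.
        return "tracking-server-proxy"
    # sftp://, s3://, gs://, ...: the client connects to that store directly,
    # so it needs the matching library (e.g. pysftp) and its own credentials.
    return "client-direct-" + scheme


if __name__ == "__main__":
    for uri in ("sftp://alice@example-host/srv/mlflow/0/abc/artifacts",
                "./mlruns/0/abc/artifacts",
                "mlflow-artifacts:/0/abc/artifacts"):
        print(uri, "->", artifact_writer(uri))
```

Inside a run, mlflow.get_artifact_uri() returns the URI that log_artifact writes to, so passing it to such a helper shows which machine needs access. Newer MLflow releases can also proxy uploads through the tracking server (mlflow server --serve-artifacts, with mlflow-artifacts: URIs), in which case the client needs no direct access to the store.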

I don't know whether I will get an answer to my problem, but I did solve it this way:

On the server, I created the directory /var/mlruns and pass it to MLflow via --backend-store-uri file:///var/mlruns.

Then I mount this directory on my local machine under the same path, e.g. via sshfs.
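This workaround is consistent with the client writing artifacts itself: with --backend-store-uri file:///var/mlruns and (presumably) no separate --default-artifact-root, new runs get artifact URIs that are plain paths under /var/mlruns, typically of the form /var/mlruns/<experiment_id>/<run_id>/artifacts, and the client writes to that path on its own disk. Mounting the server's directory at the same path makes those writes land on the server. A small pre-flight check that the mount is actually in place can help; local_artifact_dir_ready below is a hypothetical helper (not an MLflow API) and assumes POSIX paths.

```python
import os
import tempfile
from urllib.parse import unquote, urlsplit


# Hypothetical pre-flight check (not an MLflow API): for a local artifact
# location (plain path or file:// URI), confirm the directory exists on *this*
# machine and is writable, e.g. because /var/mlruns is sshfs-mounted here.
def local_artifact_dir_ready(uri: str) -> bool:
    parts = urlsplit(uri)
    if parts.scheme == "file":
        path = unquote(parts.path)
    elif parts.scheme == "":
        path = uri  # plain filesystem path, use as-is
    else:
        raise ValueError(f"not a local artifact location: {uri!r}")
    return os.path.isdir(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    # On the client this would be e.g. local_artifact_dir_ready("file:///var/mlruns").
    # Demo against a throwaway directory so the snippet runs anywhere (POSIX).
    with tempfile.TemporaryDirectory() as d:
        print(local_artifact_dir_ready("file://" + d))  # True
        print(local_artifact_dir_ready(os.path.join(d, "missing")))  # False
```

If the check returns False on the client, the sshfs mount is probably missing, and artifact writes would either fail or land on the client's own disk instead of the server.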

I don't like this solution, but it solves the problem well enough for now.