Can the sys_execve() system call in the Linux kernel accept both absolute and relative paths?

Question:

Shall sys_execve() in kernel-level code receive an absolute or a relative path for the filename parameter?

sys_execve can take either absolute or relative paths.

Let's verify it in the following ways:

  • an experiment with a raw system call
  • reading the kernel source
  • running GDB on the kernel + QEMU to verify our source analysis

Experiment

main.c

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    syscall(__NR_execve, "../main2.out", NULL, NULL);
}

main2.c

#include <stdio.h>

int main(void) {
    puts("hello main2");
}

Compile and run:

gcc -o main.out main.c
gcc -o ../main2.out main2.c
./main.out

Output:

hello main2

Tested in Ubuntu 16.10.

Kernel source

First, go into the kernel tree and run:

git grep '"\.\."' fs

We focus on fs because we know that execve is defined there.

This immediately gives results like https://github.com/torvalds/linux/blob/v4.9/fs/namei.c#L1759, which clearly indicate that the kernel knows about ..:

/*
 * "." and ".." are special - ".." especially so because it has
 * to be able to know about the current root directory and
 * parent relationships.
 */

We then look at the definition of execve at https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1869, and the first thing it does is call getname() on the input path:

SYSCALL_DEFINE3(execve,
        const char __user *, filename,
        const char __user *const __user *, argv,
        const char __user *const __user *, envp)
{
    return do_execve(getname(filename), argv, envp);
}

getname is defined in fs/namei.c, which is the file where the above ".." quote came from.

I haven't bothered to follow the full call path, but I bet that the name returned by getname ends up going through .. resolution later on (getname itself only copies the path string in from user space).

follow_dotdot looks especially promising.

GDB + QEMU

Reading the source is great, but we can never be sure that the code paths are actually used.

There are two ways to check that:

  • printk, recompile, printk, recompile
  • GDB + QEMU. Setup is a bit rougher, but once done it is pure bliss

First get the setup working as explained at: How to debug the Linux kernel with GDB and QEMU?

Now, we will use two programs:

init.c

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    chdir("d");
    syscall(__NR_execve, "../b.out", NULL, NULL);
}

b.c

#include <unistd.h>
#include <stdio.h>

int main(void) {
    puts("hello");
    sleep(0xFFFFFFFF);
}

The rootfs file structure should look like:

init
b.out
d/

Once GDB is running, we do:

b sys_execve
c
x/s filename

This outputs ../b.out, so we know it is the right syscall.

Now, the interesting ".." comment we had seen before was in a function called walk_component, so let's see if that gets called:

b walk_component
c

Yes, we hit it.

If we read a bit into it, we see a call:

error = handle_dots(nd, nd->last_type);

which sounds promising and does:

static inline int handle_dots(struct nameidata *nd, int type)
{
    if (type == LAST_DOTDOT) {
        if (!nd->root.mnt)
            set_root(nd);
        if (nd->flags & LOOKUP_RCU) {
            return follow_dotdot_rcu(nd);
        } else
            return follow_dotdot(nd);
    }
    return 0;
}

So what is it that sets this type (nd->last_type) to LAST_DOTDOT?

Well, search the source for = LAST_DOTDOT, and we find that link_path_walk is doing it.

And even better: bt says that link_path_walk is a caller, so it will be easy to understand what is going on now.

In link_path_walk, we see:

if (name[0] == '.') switch (hashlen_len(hash_len)) {
    case 2:
        if (name[1] == '.') {
            type = LAST_DOTDOT;

And thus the mystery is solved: the literal string ".." was never the check being done, which foiled our previous greps!

Instead, the two dots were being checked separately (because . is a subcase).