Handling command-line options: using the getopt() and getopt_long() functions

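When we run a program, we usually configure its behavior through command-line arguments. Command-line options and arguments control UNIX programs and tell them how to act. By the time the C runtime startup code (linked in by gcc) calls our entry function main(int argc, char *argv[]), the command line has already been split into separate arguments. argc holds the number of arguments, and argv is an array of pointers to them.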

A program's arguments fall into three kinds: options, the values associated with options, and non-option arguments. For example:

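$ gcc getopt_test.c -o testopt
Here getopt_test.c is a non-option argument, -o is an option, and testopt is the value associated with -o. By Linux convention, an option starts with a single dash followed by a single letter or digit. An option may require an associated value, take none, or accept an optional one; options that take no value can be combined after a single dash, as in ls -al. There are also long options, introduced by two dashes: -o filename and --output filename both specify the output file name. The rest of this post collects what I learned from several English-language resources.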

getopt(): handling short options

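getopt() is declared in the system header unistd.h. Its prototype is:
int getopt( int argc, char *const argv[], const char *optstring );
getopt takes main's argc and argv as its first two arguments. optstring is a list of characters, each standing for a single-character option. A character followed by a colon (:) takes an associated value, given either in the same argv element or as the next argument; two colons (::), a GNU extension, mean the value is optional. Each call returns the next option character found in argv, and optind records the index of the next argv element to be processed. When all options have been handled, getopt returns -1. When it meets an unrecognized option, it returns a question mark (?) and stores the offending character in optopt.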

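If an option requires a value and none is supplied at run time, getopt also returns a question mark (?) and stores the option character in optopt. If the first character of optstring is a colon (:), it returns a colon instead of a question mark in this case.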

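After all options have been processed, optind points at the remaining non-option arguments at the end of argv. In fact, GNU getopt permutes argv as it scans, moving non-option arguments to the end of the array.
The global variables that getopt() sets (declared in unistd.h) include:
optarg: a pointer to the value of the current option, if it has one.
optind: the index in argv of the next argument getopt() will process.
optopt: the option character that caused the most recent error (an unrecognized option, or one missing its required value).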

Here is a simple example that uses getopt:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main( int argc, char **argv) {
      int opt = 0;
      int i = 0;
      const char *optstring = ":vV:h:" ;  // leading ':' reports a missing value as ':'; -v takes no value, -V and -h require one

      for(i = 0; i < argc; i++)
           printf ("%d:%s\n" , i, argv[i]);

      // Handle each option in turn
      while((opt = getopt (argc, argv, optstring)) != -1){
           switch (opt){
           case 'v' :
               printf ("verbose\n" );
               break ;
           case 'V' :
               printf ("option %c:the Version is %s\n" , opt, optarg);
               break ;
           case 'h' :
               printf ("The option %c  is %s...\n" , opt, optarg);
               break ;
           case ':' :
               printf ("Option %c requires a value\n" ,optopt);
               break ;
           case '?' :
               printf ("Unknown option %c\n" ,optopt);
               break ;
          }
     }

      // optind now indexes the first non-option argument
      printf( "After getopt the optind = %d \n" , optind);

      // Print argv again after getopt has finished
      for(i = 0; i < argc; i++)
           printf ("%d:%s\n" , i, argv[i]);

      return 0;
}
Output when run with the arguments arg1 -v -V 2.1 -h help -x arg2:
X:\1.KEEP MOVING\3.C\MyCodes\GetOpt\Debug\GetOpt.exe: invalid option -- x
0:X:\1.KEEP MOVING\3.C\MyCodes\GetOpt\Debug\GetOpt.exe
1:arg1
2:-v
3:-V
4:2.1
5:-h
6:help
7:-x
8:arg2
verbose
option V:the Version is 2.1
The option h  is help...
Unknown option x
After getopt the optind = 7
0:X:\1.KEEP MOVING\3.C\MyCodes\GetOpt\Debug\GetOpt.exe
1:-v
2:-V
3:2.1
4:-h
5:help
6:-x
7:arg1
8:arg2

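As the output shows, once getopt finishes, the non-option arguments have all been moved to the end, and optind indexes the first of them.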

getopt_long(): handling long options

Prototype: int getopt_long (int argc, char *const *argv, const char *shortopts, const struct option *longopts, int *indexptr)
Here is a clear description of the function, quoted from the GNU C Library manual:

Decode options from the vector argv (whose length is argc). The argument shortopts describes the short options to accept, just as it does in getopt. The argument longopts describes the long options to accept (see above).

When getopt_long encounters a short option, it does the same thing that getopt would do: it returns the character code for the option, and stores the option's argument (if it has one) in optarg.

When getopt_long encounters a long option, it takes actions based on the flag and val fields of the definition of that option.

If flag is a null pointer, then getopt_long returns the contents of val to indicate which option it found. You should arrange distinct values in the val field for options with different meanings, so you can decode these values after getopt_long returns. If the long option is equivalent to a short option, you can use the short option's character code in val.

If flag is not a null pointer, that means this option should just set a flag in the program. The flag is a variable of type int that you define. Put the address of the flag in the flag field. Put in the val field the value you would like this option to store in the flag. In this case, getopt_long returns 0.

For any long option, getopt_long tells you the index in the array longopts of the option's definition, by storing it into *indexptr. You can get the name of the option with longopts[*indexptr].name. So you can distinguish among long options either by the values in their val fields or by their indices. You can also distinguish in this way among long options that set flags.

When a long option has an argument, getopt_long puts the argument value in the variable optarg before returning. When the option has no argument, the value in optarg is a null pointer. This is how you can tell whether an optional argument was supplied.

When getopt_long has no more options to handle, it returns -1, and leaves in the variable optind the index in argv of the next remaining argument.
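The flag and indexptr behavior described above can be sketched as follows. The option names (--brief, --output), the variable brief_flag, and the helper last_flag_option_name are all made up for this illustration and are not taken from the manual.

```c
#include <getopt.h>
#include <stddef.h>
#include <unistd.h>

static int brief_flag;   /* hypothetical flag variable set by --brief */

/* Hypothetical helper: parses argv and returns the name of the last
 * flag-setting long option that was seen, or NULL if there was none. */
const char *last_flag_option_name(int argc, char *argv[])
{
    static const struct option longopts[] = {
        { "brief",  no_argument,       &brief_flag, 1   },  /* sets a flag */
        { "output", required_argument, NULL,        'o' },  /* returns 'o' */
        { NULL, 0, NULL, 0 }
    };
    const char *name = NULL;
    int index = 0;
    int c;

    brief_flag = 0;
    optind = 1;   /* restart scanning at argv[1] */
    opterr = 0;   /* keep getopt_long() from printing diagnostics */
    while ((c = getopt_long(argc, argv, "o:", longopts, &index)) != -1) {
        if (c == 0)   /* 0 means a long option stored val into *flag */
            name = longopts[index].name;
    }
    return name;
}
```

After a run with --brief --output x, brief_flag should be 1 and the returned name should be "brief"; the 'o' return value for --output is ignored here to keep the sketch short.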

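The options for getopt_long are defined with struct option:
struct option {
    const char *name;  // name of the long option
    int has_arg;       // no_argument, required_argument, or optional_argument
    int *flag;         // explained in detail above; usually NULL
    int val;           // value to return, or to store in *flag
};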
This structure describes a single long option name for the sake of getopt_long. The argument longopts must be an array of these structures, one for each long option. Terminate the array with an element containing all zeros.

The struct option structure has these fields:
name - This field is the name of the option. It is a string. 
has_arg - This field says whether the option takes an argument. It is an integer, and there are three legitimate values: no_argument, required_argument and optional_argument.
flag ,val - These fields control how to report or act on the option when it occurs.
If flag is a null pointer, then the val is a value which identifies this option. Often these values are chosen to uniquely identify particular long options.
If flag is not a null pointer, it should be the address of an int variable which is the flag for this option. The value in val is the value to store in the flag to indicate that the option was seen.
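To illustrate optional_argument, here is a small sketch with a hypothetical --verbose option whose value is optional. The helper name parse_verbosity and the default level of 1 are assumptions made for this example. With GNU getopt_long an optional value has to be attached to the option, as in --verbose=3 or -v3; a separate word is treated as a non-option argument.

```c
#include <getopt.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if --verbose/-v is absent, 1 if it is given
 * without a value, and the numeric value otherwise. */
int parse_verbosity(int argc, char *argv[])
{
    static const struct option longopts[] = {
        { "verbose", optional_argument, NULL, 'v' },
        { NULL, 0, NULL, 0 }
    };
    int level = 0;
    int c;

    optind = 1;   /* restart scanning at argv[1] */
    opterr = 0;   /* keep getopt_long() from printing diagnostics */
    while ((c = getopt_long(argc, argv, "v::", longopts, NULL)) != -1) {
        if (c == 'v')   /* optarg is NULL when no value was attached */
            level = optarg ? (int) strtol(optarg, NULL, 10) : 1;
    }
    return level;
}
```

Checking optarg against NULL is how the code tells "option given without a value" apart from "option given with a value", which matches the description quoted earlier.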

The explanation above is very clear. Here is a simple example that uses getopt_long:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>

int main( int argc, char **argv){
      const char *short_options = "vhVo:" ;

      const struct option long_options[] = {
              { "verbose", optional_argument , NULL, 'v' },
              { "help", no_argument , NULL, 'h' },
              { "version", no_argument , NULL, 'V' },
              { "output", required_argument , NULL, 'o' },
              {NULL, 0, NULL, 0} ,  /* Required at end of array. */
     };

      for (;;) {
           int c;
          c = getopt_long (argc, argv, short_options, long_options, NULL);  // NULL: the long option index is not needed
           if (c == -1) {
               break ;
          }
           switch (c) {
           case 'h' :
               printf ("The usage of this program...\n" );
               break ;
           case 'v' :
               printf ("set the program's log verbose...\n");
               break ;
           case 'V' :
               printf ("The version is 0.1 ...\n" );
               break ;
           case 'o' :
               printf ("The output file is %s.\n" ,optarg);
               break ;
           case '?' :
               printf ("Invalid option , abort the program.\n");
               exit (EXIT_FAILURE);
           default :  // unexpected
               abort ();
          }
     }

      return 0;
}

Result of a run with the help, verbose, version, and output options, in that order, with outputfile as the value of the output option:
The usage of this program...
set the program's log verbose...
The version is 0.1 ...
The output file is outputfile.

Use case analysis

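In the Open vSwitch source code, every component parses command-line arguments during startup, and they all follow a similar approach. Below is the relevant part of ovsdb-client, which I extracted to make clear what this process does.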
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>
#include <limits.h>


void out_of_memory( void ){
      printf( "virtual memory exhausted\n" );
      abort();
}

// xmalloc ultimately calls the standard C malloc; the wrapper guarantees that
// the allocation either succeeds or terminates the program.
void *xmalloc( size_t size){
    void *p = malloc (size ? size : 1);
    if (p == NULL) {
        out_of_memory();
    }
    return p;
}

char *xmemdup0( const char *p_, size_t length){
    char *p = xmalloc(length + 1);
    memcpy(p, p_, length);
    p[length] = '\0';
    return p;
}

//Duplicates a character string without fail, using xmalloc to obtain memory.
char *xstrdup( const char *s){
    return xmemdup0(s, strlen (s));
}

/* Given the GNU-style long options in 'options', returns a string that may be
 * passed to getopt() with the corresponding short options.  The caller is
 * responsible for freeing the string. */
char *long_options_to_short_options( const struct option options[]){
    char short_options[UCHAR_MAX * 3 + 1];
    char *p = short_options;

    for (; options-> name; options++) {
        const struct option *o = options;
        if (o->flag == NULL && o-> val > 0 && o-> val <= UCHAR_MAX) {
            *p++ = o-> val;
            if (o->has_arg == required_argument) {
                *p++ = ':';
            } else if (o->has_arg == optional_argument) {
                *p++ = ':';
                *p++ = ':';
            }
        }
    }
    *p = '\0';
    // Returning the local char array directly would be invalid, so copy it to the heap and return that pointer.
    return xstrdup(short_options);
}

static void
parse_options( int argc, char *argv[])
{
    enum {
        OPT_BOOTSTRAP_CA_CERT = UCHAR_MAX + 1,
        OPT_TIMESTAMP ,
        DAEMON_OPTION_ENUMS ,
        TABLE_OPTION_ENUMS
    };
    static struct option long_options[] = {
        { "verbose", optional_argument , NULL, 'v' },
        { "help", no_argument , NULL, 'h' },
        { "version", no_argument , NULL, 'V' },
        { "timestamp", no_argument, NULL, OPT_TIMESTAMP },
        {NULL, 0, NULL, 0},
    };

    char *short_options = long_options_to_short_options(long_options);
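    // Once the short options have been derived from the long options, parsing follows the routine shown earlier.
    // Here we only print the short options.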
    printf( "%s\n" ,short_options);

    free(short_options);
}

int main( int argc, char **argv) {
     parse_options(argc, argv);

      return 0;
}
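The extract above stops at printing the derived short options. As a rough sketch of the remaining step, the code below feeds a derived string straight into getopt_long. It is not the Open vSwitch implementation: struct settings, derive_short_options, and parse_settings are hypothetical names, and the derivation writes into a caller-supplied buffer instead of allocating on the heap.

```c
#include <getopt.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

enum { OPT_TIMESTAMP = UCHAR_MAX + 1 };   /* long-only option, no short form */

struct settings { int verbose; int timestamp; };   /* hypothetical result */

/* Writes the short options implied by 'o' into 'out', which must have room
 * for up to three characters per option plus the terminating '\0'. */
static void derive_short_options(const struct option *o, char *out)
{
    for (; o->name; o++) {
        if (o->flag == NULL && o->val > 0 && o->val <= UCHAR_MAX) {
            *out++ = (char) o->val;
            if (o->has_arg != no_argument)
                *out++ = ':';
            if (o->has_arg == optional_argument)
                *out++ = ':';
        }
    }
    *out = '\0';
}

/* Returns 0 on success, -1 if an unknown option was seen. */
int parse_settings(int argc, char *argv[], struct settings *s)
{
    static const struct option long_options[] = {
        { "verbose",   optional_argument, NULL, 'v' },
        { "timestamp", no_argument,       NULL, OPT_TIMESTAMP },
        { NULL, 0, NULL, 0 }
    };
    char short_options[UCHAR_MAX * 3 + 1];
    int status = 0;
    int c;

    derive_short_options(long_options, short_options);   /* yields "v::" */
    s->verbose = 0;
    s->timestamp = 0;
    optind = 1;   /* restart scanning at argv[1] */
    opterr = 0;   /* keep getopt_long() from printing diagnostics */
    while ((c = getopt_long(argc, argv, short_options, long_options, NULL)) != -1) {
        switch (c) {
        case 'v':           s->verbose = 1;   break;
        case OPT_TIMESTAMP: s->timestamp = 1; break;
        default:            status = -1;      break;   /* '?' and anything else */
        }
    }
    return status;
}
```

Because OPT_TIMESTAMP is larger than UCHAR_MAX, it is skipped when the short options are derived, so --timestamp stays a long-only option while --verbose also gets the short form -v.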

References:
1. http://www.gnu.org/software/libc/manual/html_node/Getopt-Long-Options.html
2. http://www.ibm.com/developerworks/cn/aix/library/au-unix-getopt.html
3. http://www.cppblog.com/cuijixin/archive/2010/06/13/117788.html
4. OVS source code