Go Compiler Tidbits 1: How Go Adds Semicolons at the End of Lines

小半 • October 24, 2023, 11:18 AM • WeChat Picks

This article is based on Go 1.17.

When we write Go programs, we do not have to type a semicolon at the end of each line; the compiler adds it for us. So how does Go add these semicolons?
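Before looking at the compiler itself, the effect can be observed from ordinary Go code. The sketch below uses the standard library package go/scanner, which is a separate implementation from the compiler's internal cmd/compile/internal/syntax scanner but follows the same language rule. It reports an automatically inserted semicolon with the literal "\n" and a hand-written one with ";". The helper name insertedSemicolons and the file name example.go are made up for this illustration.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// insertedSemicolons (a hypothetical helper) counts semicolons that go/scanner
// inserted automatically (literal "\n") and semicolons written in the source (";").
func insertedSemicolons(src string) (auto, explicit int) {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // no error handler, default mode

	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON {
			if lit == "\n" {
				auto++ // inserted because of a newline or EOF
			} else {
				explicit++ // typed by the programmer
			}
		}
	}
	return auto, explicit
}

func main() {
	src := "package main\nvar x = 1; var y = 2\n"
	auto, explicit := insertedSemicolons(src)
	fmt.Println(auto, explicit) // prints "2 1"
}
```

The two automatic semicolons come from the newlines after `main` and after `2`; the explicit one is the `;` between the two declarations.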

cmd/compile/internal/syntax/parser.go contains the following struct:

type parser struct {
    ...
    scanner

    ...
}

parser is what performs syntax analysis. Our protagonist today is the lexer, that is, the anonymous (embedded) field inside parser: scanner.
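Because scanner is an anonymous field, its fields and methods are promoted to parser, so the parser can call the scanner's methods as if they were its own. A minimal sketch of that mechanism follows; miniScanner and miniParser are hypothetical stand-ins, not the compiler's real types.

```go
package main

import "fmt"

// miniScanner is a hypothetical stand-in for the compiler's scanner.
type miniScanner struct {
	tok string // the most recently scanned token
}

// next pretends to scan one token.
func (s *miniScanner) next() { s.tok = "name" }

// miniParser embeds miniScanner as an anonymous field, the same way
// parser embeds scanner in the excerpt above.
type miniParser struct {
	miniScanner
}

func main() {
	var p miniParser
	p.next()           // promoted method: shorthand for p.miniScanner.next()
	fmt.Println(p.tok) // promoted field; prints "name"
}
```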

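The scanner struct holds the raw source code, down to each individual character. It also holds the state of lexical analysis: the line and column currently being processed, and the token currently being scanned.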

type scanner struct {
    source // the source code is stored in this struct

    // the current token and related state
    line, col uint
    blank     bool // line is blank up to col
    tok       token
    lit       string   // valid if tok is _Name, _Literal, or _Semi ("semicolon", "newline", or "EOF"); may be malformed if bad is true
    ...
}

scanner has a number of methods; among them, next makes the scanner read the next token:

func (s *scanner) next() {
    ...
    // Skip whitespace; a newline is skipped only when nlsemi is false
    for s.ch == ' ' || s.ch == '\t' || s.ch == '\n' && !nlsemi || s.ch == '\r' {
        // advance to the next character
        s.nextch()
    }
    
    // Handle letters (the start of an identifier or keyword)
    if isLetter(s.ch) || s.ch >= utf8.RuneSelf && s.atIdentChar(true) {
        s.nextch()
        s.ident()
        return
    }
    
    // Handle symbols and numbers.
    // If the current character is:
    switch s.ch {
    // end of file
    case -1:
        if nlsemi {
            s.lit = "EOF"
            s.tok = _Semi
            break
        }
        s.tok = _EOF
    // newline
    case '\n':
        s.nextch()
        s.lit = "newline"
        // set the current token to a semicolon
        s.tok = _Semi

    case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9':
        s.number(false)
    ...
}
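The control flow above can be reproduced in a small standalone program. The following is a heavily simplified sketch, not the compiler's code: toyScanner, scanAll, and the string token kinds are hypothetical. The excerpt does not show where nlsemi is set, so the sketch assumes the rule from the Go language specification, where a newline becomes a semicolon after an identifier, a literal, or a closing `)`, `]`, or `}`. As a further simplification it treats every word as an identifier, whereas in real Go only the keywords break, continue, fallthrough, and return trigger insertion.

```go
package main

import (
	"fmt"
	"strings"
)

// toyScanner is a hypothetical, simplified model of the scanner shown above.
type toyScanner struct {
	src    string
	pos    int
	nlsemi bool   // if set, the next '\n' or EOF becomes a semicolon
	tok    string // token kind: "name", "number", "op", ";", or "EOF"
	lit    string // token text, or "newline"/"EOF" for inserted semicolons
}

func isLetter(c byte) bool { return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || c == '_' }
func isDigit(c byte) bool  { return '0' <= c && c <= '9' }

func (s *toyScanner) next() {
	nlsemi := s.nlsemi
	s.nlsemi = false

	// Skip whitespace; a newline is skipped only if no semicolon is pending.
	for s.pos < len(s.src) {
		c := s.src[s.pos]
		if c == ' ' || c == '\t' || c == '\r' || c == '\n' && !nlsemi {
			s.pos++
			continue
		}
		break
	}

	// End of input: emit a final semicolon if one is pending.
	if s.pos >= len(s.src) {
		if nlsemi {
			s.tok, s.lit = ";", "EOF"
			return
		}
		s.tok, s.lit = "EOF", ""
		return
	}

	start := s.pos
	c := s.src[s.pos]
	switch {
	case c == '\n': // only reached when nlsemi was set
		s.pos++
		s.tok, s.lit = ";", "newline"
	case isLetter(c):
		for s.pos < len(s.src) && (isLetter(s.src[s.pos]) || isDigit(s.src[s.pos])) {
			s.pos++
		}
		s.tok, s.lit = "name", s.src[start:s.pos]
		s.nlsemi = true
	case isDigit(c):
		for s.pos < len(s.src) && isDigit(s.src[s.pos]) {
			s.pos++
		}
		s.tok, s.lit = "number", s.src[start:s.pos]
		s.nlsemi = true
	default: // any other single character is treated as an operator
		s.pos++
		s.tok, s.lit = "op", string(c)
		s.nlsemi = c == ')' || c == ']' || c == '}'
	}
}

// scanAll renders the token stream, marking inserted semicolons as ;(reason).
func scanAll(src string) string {
	s := &toyScanner{src: src}
	var out []string
	for {
		s.next()
		if s.tok == "EOF" {
			break
		}
		if s.tok == ";" {
			out = append(out, ";("+s.lit+")")
		} else {
			out = append(out, s.lit)
		}
	}
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(scanAll("x = 1\ny = x +\n2"))
	// prints: x = 1 ;(newline) y = x + 2 ;(EOF)
}
```

Note that the newline after `+` produces no semicolon, because an operator does not set nlsemi; this is why a Go expression can continue on the next line when the line ends with an operator.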