Go Regular Expressions: Usage and Simple Examples

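I have noticed that many developers who come to Go from scripting languages have complaints about Go's regular expression support. The common view is that it falls short on both flexibility and completeness. Both impressions have an explanation. The flexibility complaint mostly comes from not understanding the design philosophy of the regexp package. The completeness complaint comes from the fact that the regexp package promises matching time linear in the length of the input, so some constructs simply cannot be supported, for example negative lookahead (?!re):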

The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input.

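So at bottom this is not a question of whether the package is "complete enough"; the implementation technique determines how much it can support. More discussion can be found here.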
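A minimal sketch of what this means in practice: an unsupported construct is rejected when the pattern is compiled, not at match time. The helper name `compiles` is made up for this illustration; only `regexp.Compile` is the real API.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether the standard regexp package accepts the pattern.
// Hypothetical helper, not part of the regexp package.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Negative lookahead is Perl syntax that RE2 does not support,
	// so Compile returns an error instead of a *Regexp.
	fmt.Println(compiles(`foo(?!bar)`))

	// An ordinary RE2 pattern compiles without complaint.
	fmt.Println(compiles(`foo[^b]`))
}
```

Because the failure shows up at compile time, `regexp.MustCompile` would panic on the first pattern, which is usually what you want for patterns hard-coded in the program.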

Go regular expressions use RE2 syntax; for the full syntax and what is supported, see [here].

Design principles of the regexp package

Method names in the regexp package follow this pattern:

Find(All)?(String)?(Submatch)?(Index)?

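Methods with All capture every match and return a slice. They generally also take a parameter n as the maximum number of matches (a negative n means no limit).

Methods with String match against a string; those without it match against []byte.

Methods with Submatch return all submatches as a slice. Index 0 holds the match of the whole expression; index n (n > 0) holds the match of the n-th subexpression (group).

Methods with Index return the position of the match. For example, if the return value is loc []int, the matched text is src[loc[0]:loc[1]].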
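To make the naming rule concrete, here is a small sketch that calls one method for each part of the name. The pattern and the sample text are made up for illustration; the methods themselves are from the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative pattern and input, not taken from any real project.
var re = regexp.MustCompile(`(\w+)@(\w+)\.com`)

const text = "alice@example.com, bob@test.com"

func main() {
	// Find + String: the leftmost match, as a string.
	fmt.Println(re.FindString(text))

	// All: every match as a slice; n = -1 means no limit.
	fmt.Println(re.FindAllString(text, -1))

	// Submatch: index 0 is the whole match, 1..n are the groups.
	fmt.Println(re.FindStringSubmatch(text))

	// Index: byte offsets, so text[loc[0]:loc[1]] is the match.
	loc := re.FindStringIndex(text)
	fmt.Println(loc, text[loc[0]:loc[1]])

	// No String in the name: the same search on a []byte.
	fmt.Printf("%s\n", re.Find([]byte(text)))
}
```

The parts combine freely, so `FindAllStringSubmatchIndex` is simply all four options switched on at once.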

Matching Unicode characters

Unicode character class (one-letter name): \pN
Unicode character class: \p{Greek}

    test1 := "123中文汉字abc321"
    reg1 := regexp.MustCompile(`\p{L}+`)
    match := reg1.FindString(test1)
    log.Println(match)

    // Output: 中文汉字abc
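The two class forms listed above can be tried side by side. This is only a sketch: `firstMatch` is a helper I made up, and the second sample string is invented for the Greek case.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in s.
// Hypothetical helper; it recompiles the pattern on every call for brevity.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	s := "123中文汉字abc321"

	// One-letter class name: N is the Unicode number category.
	fmt.Println(firstMatch(`\pN+`, s))

	// Script name in braces: Han characters only, so "abc" is excluded.
	fmt.Println(firstMatch(`\p{Han}+`, s))

	// Upper-case P negates the class: a run of non-letters.
	fmt.Println(firstMatch(`\P{L}+`, s))

	// The Greek script class from the list above.
	fmt.Println(firstMatch(`\p{Greek}+`, "abc αβγ"))
}
```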


Named groups

    test1 := `123中文汉字abc321`
    reg2 := regexp.MustCompile(`(?P<All>(?P<Number>\d+)(?P<Letter>\p{L}+)(?P<Number>\d+))`)
    names := reg2.SubexpNames()
    for k, v := range names {
        log.Printf("%d: %s", k, v)
    }
    matches := reg2.FindStringSubmatch(test1)
    for k, v := range matches {
        log.Printf("%d: %s", k, v)
    }

    // Output:
    // 2016/04/25 16:26:18 0: 
    // 2016/04/25 16:26:18 1: All
    // 2016/04/25 16:26:18 2: Number
    // 2016/04/25 16:26:18 3: Letter
    // 2016/04/25 16:26:18 4: Number
    // 2016/04/25 16:26:18 0: 123中文汉字abc321
    // 2016/04/25 16:26:18 1: 123中文汉字abc321
    // 2016/04/25 16:26:18 2: 123
    // 2016/04/25 16:26:18 3: 中文汉字abc
    // 2016/04/25 16:26:18 4: 321
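Since SubexpNames and FindStringSubmatch return slices with matching indexes, the two can be zipped into a name-to-value map. The sketch below assumes every group name is unique, so it uses the invented names Head, Letter, and Tail instead of repeating Number; `namedGroups` and `groupRe` are names I made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupRe is an illustrative pattern with unique group names.
var groupRe = regexp.MustCompile(`(?P<Head>\d+)(?P<Letter>\p{L}+)(?P<Tail>\d+)`)

// namedGroups pairs each named group with the text it captured.
// Hypothetical helper; the whole-match slot and unnamed groups are skipped.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil // no match at all
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	g := namedGroups(groupRe, "123中文汉字abc321")
	fmt.Println(g["Head"], g["Letter"], g["Tail"])
}
```

With a repeated name such as Number in the example above, a map like this would keep only the last group with that name, which is one reason to stick to the index-based slice in that case.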


Using Reflection in Go the Right Way

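For most of its history Go had no templates (type parameters only arrived in Go 1.18), so scenarios that call for templates elsewhere have often ended up using reflection (the reflect package). Reflection gets addictive once you use it a lot, and some people even develop an inexplicable pecking order around it. Scholars look down on one another, and apparently it has always been the same in fields where you work with your fingers :). Reflection has two problems, so think twice before reaching for it: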

Heavy use of reflection costs some performance.
Clear is better than clever. Reflection is never clear.

Go's type design follows a few basic principles, and understanding them helps you understand what reflection really is:

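1. A variable (more precisely, an interface value) consists of two parts, <type, value>. Once you understand this, you know why nil != nil can happen.
2. A type has a static type and a concrete type. Put simply, the static type is the one you see while writing the code, and the concrete type is the one the runtime sees.
3. Whether a type assertion succeeds depends on the variable's concrete type, not its static type. So a reader variable whose concrete type also implements Write can be type-asserted to a writer.
4. Reflection in Go uses interface{} as its bridge, so it follows principle 3. For example, the reflect package's Kind method reports on the concrete type, not the static type.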
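The principles above can be checked with a few lines each. This is a sketch under my own naming: `MyErr`, `mayFail`, `asWriter`, `concreteType`, and the two sample readers are all invented for the illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"reflect"
	"strings"
)

// MyErr is a made-up error type used only for the nil demonstration.
type MyErr struct{}

func (*MyErr) Error() string { return "my error" }

// mayFail returns a nil *MyErr wrapped in an error interface. The interface
// value is <*MyErr, nil>: its type part is set, so it is not equal to nil.
func mayFail() error {
	var p *MyErr
	return p
}

// asWriter tries to assert a reader to a writer. Success depends on the
// concrete type stored in r, not on the static type io.Reader.
func asWriter(r io.Reader) (io.Writer, bool) {
	w, ok := r.(io.Writer)
	return w, ok
}

// concreteType names the type reflection sees behind the interface.
func concreteType(r io.Reader) string {
	return reflect.TypeOf(r).String()
}

// Two sample readers that share the static type io.Reader.
var (
	bufReader io.Reader = new(bytes.Buffer)       // *bytes.Buffer also has Write
	strReader io.Reader = strings.NewReader("hi") // *strings.Reader has no Write
)

func main() {
	fmt.Println(mayFail() == nil) // false

	_, ok := asWriter(bufReader)
	fmt.Println(ok) // true
	_, ok = asWriter(strReader)
	fmt.Println(ok) // false

	// Reflection reports the concrete type and its kind, never io.Reader.
	fmt.Println(concreteType(bufReader), reflect.TypeOf(bufReader).Kind())
}
```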

Talk is cheap, show some code:

package main

import (
    "fmt"
    "reflect"
)

type T struct {
    A int
    B string
}

func main() {
    t := T{23, "skidoo"}
    tt := reflect.TypeOf(t)
    fmt.Printf("t type:%v\n", tt)
    ttp := reflect.TypeOf(&t)
    fmt.Printf("t type:%v\n", ttp)
    // To set t's value, pass t's address rather than a copy of t.
    // reflect.ValueOf(&t) is only the value of an address and is not settable; .Elem() dereferences it to get the reflect.Value of t itself.
    s := reflect.ValueOf(&t).Elem()
    typeOfT := s.Type()
    for i := 0; i < s.NumField(); i++ {
        f := s.Field(i)
        fmt.Printf("%d: %s %s = %v\n", i,
            typeOfT.Field(i).Name, f.Type(), f.Interface())
    }
}

// Output
// t type:main.T
// t type:*main.T
// 0: A int = 23
// 1: B string = skidoo


行思錄 | Travel Coder

Arch, Coding, Life

Monthly archive: April 2016
