My mistakes in Golang: regular expressions and slices

Tung Nguyen
Oct 29, 2020

Several years ago, I built a service in Golang and, unluckily, introduced two serious performance issues, one with regular expressions and one with slices. The issues do not look complicated, but they took me a lot of time to investigate and fix. In this article, I would like to share my mistakes along with the solutions and benchmarks. Hopefully, it helps you avoid such cases in your own work.

1. Using regular expressions

Mistake

In one release, I used a regular expression to validate whether a date time string meets the RFC3339 standard. An example of an RFC3339 date time is "2017-04-01T22:08:41+00:00". For illustration, I use a simple regular expression here, written as a Go string literal (hence the doubled backslashes): ^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}[+-]\\d{2}:\\d{2}$.

Usually the service takes only around 2% of CPU. During a stress test of the release, however, it took around 50% of CPU, and its response time was extremely slow. At first, I came up with many different hypotheses related to goroutines, caching and even logging. After profiling, however, I was very surprised to find that the regular expression was the root cause of the issue.

Basically the validator was implemented as below:

package validation

import "regexp"

// IsRFC3339V1 recompiles the pattern on every call, which is the mistake.
func IsRFC3339V1(datetime string) bool {
    r, _ := regexp.Compile("^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}[+-]\\d{2}:\\d{2}$")
    return r.MatchString(datetime)
}

Reason

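As in other languages, regular expressions are quite expensive, and compiling one is the most expensive part. The validator above compiles the pattern again on every call, so under load the service spent most of its CPU time on compilation rather than on matching. We should be careful about how we use regular expressions.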

Solution

I switched to the built-in date time parser to check date time strings. This resolved the issue completely and brought CPU usage back to the usual 2%. The new validator is as follows:

package validation

import "time"

// IsRFC3339V2 lets the standard library parse the string instead of using a regular expression.
func IsRFC3339V2(datetime string) bool {
    _, err := time.Parse(time.RFC3339, datetime)
    return err == nil
}
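One thing worth knowing before swapping one validator for the other is that they do not accept exactly the same inputs. The sketch below is only an illustration, not part of the original service: the helper names matchesPattern and parses are made up for this example. It compares the simplified pattern from this article with time.Parse on three sample strings.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// rfc3339Pattern is the simplified pattern from the article, compiled once.
var rfc3339Pattern = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}$`)

// matchesPattern (hypothetical helper) reports whether s has the shape
// the regular expression expects.
func matchesPattern(s string) bool {
	return rfc3339Pattern.MatchString(s)
}

// parses (hypothetical helper) reports whether time.Parse accepts s as RFC3339.
func parses(s string) bool {
	_, err := time.Parse(time.RFC3339, s)
	return err == nil
}

func main() {
	inputs := []string{
		"2017-04-01T22:08:41+00:00", // well-formed
		"2017-13-45T22:08:41+00:00", // right shape, impossible month and day
		"2017-04-01T22:08:41Z",      // "Z" offset, not covered by the pattern
	}
	for _, s := range inputs {
		fmt.Printf("%-28s regex=%-5t parser=%t\n", s, matchesPattern(s), parses(s))
	}
}
```

The pattern only checks the shape of the string, so it lets an impossible month through, while the parser also validates the values and accepts the "Z" form of the offset. Whether that difference matters depends on what the callers send.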

Another Solution

Alternatively, we can compile the regular expression only once and store the compiled result somewhere in the application, then reuse it to validate date time strings without any extra compilation. However, this solution needs a bit more code to manage the compiled regular expression.

package validation

import "regexp"

// IsRFC3339V3 matches against a pattern that the caller compiled once beforehand.
func IsRFC3339V3(r *regexp.Regexp, datetime string) bool {
    return r.MatchString(datetime)
}
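One common way to manage the compiled expression is a package-level variable initialized with regexp.MustCompile. The following is a minimal sketch under that assumption; the names rfc3339Re and IsRFC3339Precompiled are hypothetical and not from the original service.

```go
package main

import (
	"fmt"
	"regexp"
)

// rfc3339Re is compiled once, when the package is initialized.
// MustCompile panics on an invalid pattern, so a typo fails at startup
// instead of being silently ignored.
var rfc3339Re = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}$`)

// IsRFC3339Precompiled is a hypothetical variant of IsRFC3339V3 that uses
// the package-level pattern instead of taking it as a parameter.
func IsRFC3339Precompiled(datetime string) bool {
	return rfc3339Re.MatchString(datetime)
}

func main() {
	fmt.Println(IsRFC3339Precompiled("2017-04-01T22:08:41+00:00"))
	fmt.Println(IsRFC3339Precompiled("not a date"))
}
```

A compiled *regexp.Regexp is safe for concurrent use by multiple goroutines, so a single shared variable should be enough for a service that handles requests in parallel.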

Benchmark

BenchmarkIsRFC3339V1-8     123400    9478 ns/op
BenchmarkIsRFC3339V2-8    3541533     340 ns/op
BenchmarkIsRFC3339V3-8    4844779     244 ns/op

Here, BenchmarkIsRFC3339V1-8, BenchmarkIsRFC3339V2-8 and BenchmarkIsRFC3339V3-8 are the benchmark results of IsRFC3339V1, IsRFC3339V2 and IsRFC3339V3 respectively.

According to the results, each IsRFC3339V1 operation takes 9478 nanoseconds, while IsRFC3339V2 takes 340 nanoseconds and IsRFC3339V3 only 244 nanoseconds. In other words, the built-in date time parser is about 28 times faster than compiling and matching the regular expression on every call, and of course the more complex the regular expression is, the more time the validator takes. A pre-compiled regular expression, however, is even faster than the built-in date time parser: about 39 times faster than IsRFC3339V1.
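The original benchmark code is not shown in this article, so the following is only a sketch of how such a benchmark could be written with the standard testing package. The sample input and the shared pattern constant are assumptions. In a real project the Benchmark functions would live in a _test.go file and run with go test -bench=. instead of from main.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
	"time"
)

const (
	pattern = `^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}$`
	sample  = "2017-04-01T22:08:41+00:00" // assumed benchmark input
)

// compiled is built once and shared by the V3 benchmark.
var compiled = regexp.MustCompile(pattern)

// IsRFC3339V1 compiles the pattern on every call.
func IsRFC3339V1(datetime string) bool {
	r, err := regexp.Compile(pattern)
	if err != nil {
		return false
	}
	return r.MatchString(datetime)
}

// IsRFC3339V2 uses the built-in date time parser.
func IsRFC3339V2(datetime string) bool {
	_, err := time.Parse(time.RFC3339, datetime)
	return err == nil
}

// IsRFC3339V3 uses a pattern compiled by the caller.
func IsRFC3339V3(r *regexp.Regexp, datetime string) bool {
	return r.MatchString(datetime)
}

func BenchmarkIsRFC3339V1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		IsRFC3339V1(sample)
	}
}

func BenchmarkIsRFC3339V2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		IsRFC3339V2(sample)
	}
}

func BenchmarkIsRFC3339V3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		IsRFC3339V3(compiled, sample)
	}
}

func main() {
	// testing.Benchmark runs a single benchmark function outside of "go test".
	fmt.Println("V1:", testing.Benchmark(BenchmarkIsRFC3339V1))
	fmt.Println("V2:", testing.Benchmark(BenchmarkIsRFC3339V2))
	fmt.Println("V3:", testing.Benchmark(BenchmarkIsRFC3339V3))
}
```

The absolute numbers will differ by machine and Go version, so only the ratio between the three variants is worth comparing.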

2. Declaring the size of a slice

Mistake

In another release, I faced one more terrible performance issue. This time, it came from using slices to store variable-length data. In my first days of Go programming, I did not care about the length and capacity of slices. Consequently, I created a new empty slice without declaring a length or capacity and then appended to it item by item. As a result, I got a performance problem during a stress test. Again, profiling showed that the append operation took a lot of time, which of course made my service extremely slow.

The mistake can be demonstrated as below:

package slice

func CreateSliceV1() {
    // No length or capacity: append has to grow the backing array as items arrive.
    s := make([]int64, 0)
    for i := 0; i < 5; i++ {
        s = append(s, 100)
    }
}

Reason

The slices were used to store big objects, each of which needs a large amount of memory. Unfortunately, during the stress test, memory became heavily fragmented, so resizing the slices and allocating memory took time. I checked the function growslice in slice.go from the Golang source and found that, when the capacity is exceeded, Go allocates a new backing array with a larger capacity (roughly double for small slices or about 25% more for large ones, with the exact rule depending on the Go version), then copies the items from the old array to the new one (https://github.com/golang/go/blob/master/src/runtime/slice.go).
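The growth behaviour can be observed without reading the runtime source. The sketch below is an illustration only, and capacityTrace is a hypothetical helper: it appends to an empty slice and records the capacity every time it changes, where each change means a new allocation plus a copy of the existing items.

```go
package main

import "fmt"

// capacityTrace (hypothetical helper) appends n items to an empty slice
// and records every new capacity the slice gets along the way.
func capacityTrace(n int) []int {
	var caps []int
	s := make([]int64, 0)
	prev := cap(s)
	for i := 0; i < n; i++ {
		s = append(s, 100)
		if cap(s) != prev {
			// The capacity changed, so append moved the data
			// to a new, larger backing array.
			prev = cap(s)
			caps = append(caps, prev)
		}
	}
	return caps
}

func main() {
	caps := capacityTrace(5000)
	fmt.Println("reallocations:", len(caps))
	fmt.Println("capacities:   ", caps)
}
```

The exact capacities printed depend on the Go version and the element size, which is why no expected output is listed here.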

Solution

In fact, in most cases I already know the maximum length of a given slice, so I simply declared its length and capacity up front and then assigned items directly by index. This made the problem disappear completely.

The solution does not need much change:

package slice

func CreateSliceV2() {
    // Length and capacity are known up front, so the slice never has to grow.
    s := make([]int64, 5, 5)
    for i := 0; i < 5; i++ {
        s[i] = 100
    }
}
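A related variant, which is not what I used in the service, is to reserve only the capacity and keep append. This can be handy when the maximum size is known but the slice may end up with fewer items. A minimal sketch, with the hypothetical name CreateSliceWithCap:

```go
package main

import "fmt"

// CreateSliceWithCap is a hypothetical variant: length 0, capacity n.
// The slice starts empty, but the backing array is allocated only once.
func CreateSliceWithCap(n int) []int64 {
	s := make([]int64, 0, n)
	for i := 0; i < n; i++ {
		// Each append fits in the reserved capacity, so no reallocation happens.
		s = append(s, 100)
	}
	return s
}

func main() {
	s := CreateSliceWithCap(5)
	fmt.Println(len(s), cap(s), s)
}
```

Compared with assigning by index, this keeps len(s) equal to the number of items actually stored, so a partially filled slice has no trailing zero values.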

Benchmark

BenchmarkCreateSliceV1-4     10000000    184 ns/op
BenchmarkCreateSliceV2-4    300000000    6.42 ns/op

In the benchmark, BenchmarkCreateSliceV1-4 and BenchmarkCreateSliceV2-4 are for CreateSliceV1 and CreateSliceV2 respectively. They show that CreateSliceV1 takes 184 nanoseconds per operation while CreateSliceV2 takes 6.42 nanoseconds per operation. In other words, CreateSliceV2 is about 29 times faster than CreateSliceV1.

PS: I do hope that this article is helpful for newbies to avoid these common mistakes.

