Careful with io.TeeReader and json.Decoder

Some time ago, I was investigating very strange bug in our codebase:

http: ContentLength=513 with Body length 512

It was inside our Go reverse proxy, and what was very suspicious was the exact 1 byte discrepancy.

My suspicion was on gzip middleware, which is often the cause when you forget to set request.ContentLength after compressing the payload. However it turns out it was not the case.

Do you see anything problematic here?

buf := bytes.NewBuffer(make([]byte, 0)) 
reader := io.TeeReader(r.Body, buf)
query, err := json.NewDecoder(reader).Decode(&out)
if err != nil {
    // omited
}
r.Body.Close()
r.Body = ioutil.NopCloser(buf)

From the title of this post you can guess that indeed this is the problem. However it is not so easy to reproduce. Here is the same code (playground link):

package main

import (
	"fmt"
	"encoding/json"
	"strings"
	"bytes"
	"io"
)

func main() {
	data := strings.NewReader("{\"my\": \"jsalksjdnfalsknjdfaskljdnfalskjdnfalkjdnflksjdfksdflkajsndfklasjkladntrglkaj dnrgkljandr glkjandrgkljnadrlkgjn aderlkgjn adelrigbnu aelitbghn aeiljbrg asliejkbrgn alierbng laidjbfnglks djbfg lajkhdbf glkadjbrfg akljdbfg lakjdbfng lakjdbfg lahjkdbfgjndflaksjndflaksjndflkasjndflkasjdbflajshebrflaehrbgalkdjrbglakdjbrgn akldjrbgn lakjdfb glkasdbf gakljdfg adfg nlaksjdnflaksjdnflaksjdnflaksjdsfghnfsfghlaksjdnfsfghslaksjdnflaskjdnffghs fgh laskjdnflaksjdnfsjdnflkasjndflkajsndflkajsndfklnasdfa sdfasdfon\"}\n")
	origLen := data.Len()
	
	var unmarshaled interface{}
	
	copy := new(bytes.Buffer)
	reader := io.TeeReader(data, copy)
	err := json.NewDecoder(reader).Decode(&unmarshaled)
	
	fmt.Printf("%+v, len orig: %d, len new: %d", err, origLen , copy.Len())
}
<nil>, len orig: 512, len new: 512

But the size of the body and data in reader is identical. What is happening here?

I realised that the json decoder may have some optimization that it does not read the entire request body and skips some bytes (like trailing newlines). Which is what I tried here (playground link):

package main

import (
	"fmt"
	"encoding/json"
	"strings"
	"bytes"
	"io"
)

func main() {
	data := strings.NewReader("{\"my\": \"json\"}\n?")
	origLen := data.Len()
	
	var unmarshaled interface{}
	
	copy := new(bytes.Buffer)
	reader := io.TeeReader(data, copy)
	err := json.NewDecoder(reader).Decode(&unmarshaled)
	
	fmt.Printf("%+v, %+v, copy: %+v, len orig: %d, len new: %d", unmarshaled, err, copy.String(), origLen , copy.Len())
}
map[my:json], <nil>, copy: {"my": "json"}
?, len orig: 16, len new: 16

But it WORKS?! WHAT? So what is so special about the 512 bytes (from the error message)?

It turns out, that json decoder creates 512 bytes buffers into which it copies input.

It also looks ahead, to check if it should allocate another 512 bytes, and if next byte is end of object or empty space, then stop reading input.

So if I take the 1st playground example, and add a /n at the end of the JSON, it reproduces the issue (playground link):

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

func main() {
	data := strings.NewReader("{\"my\": \"1jsalksjdnfalsknjdfaskljdnfalskjdnfalkjdnflksjdfksdflkajsndfklasjkladntrglkaj dnrgkljandr glkjandrgkljnadrlkgjn aderlkgjn adelrigbnu aelitbghn aeiljbrg asliejkbrgn alierbng laidjbfnglks djbfg lajkhdbf glkadjbrfg akljdbfg lakjdbfng lakjdbfg lahjkdbfgjndflaksjndflaksjndflkasjndflkasjdbflajshebrflaehrbgalkdjrbglakdjbrgn akldjrbgn lakjdfb glkasdbf gakljdfg adfg nlaksjdnflaksjdnflaksjdnflaksjdsfghnfsfghlaksjdnfsfghslaksjdnflaskjdnffghs fgh laskjdnflaksjdnfsjdnflkasjndflkajsndflkajsndfklnasdfa sdfasdfon\"}\n")
	origLen := data.Len()

	var unmarshaled interface{}

	copy := new(bytes.Buffer)
	reader := io.TeeReader(data, copy)
	err := json.NewDecoder(reader).Decode(&unmarshaled)

	fmt.Printf("%+v, len orig: %d, len new: %d", err, origLen, copy.Len())
}
<nil>, len orig: 513, len new: 512

So it seems like we can’t use json.Decoder together with io.TeeReader to duplicate request body.

I was thinking of creating a bug report, but the documentation of TeeReader states this correctly:

TeeReader returns a Reader that writes to w what it reads from r. All reads from r performed through it are matched with corresponding writes to w. There is no internal buffering - the write must complete before the read completes. Any error encountered while writing is reported as a read error.

So if you don’t read all bytes from r, not all bytes will be written to w. Simple.

To fix this, I switched to ioutil.ReadAll which reads all bytes, which is what we want.