Two identical parse requests can return different results #54

zurk · 2019-01-30T14:28:43Z

Guys, I found one really weird big in bblfsh. I think it is related to daemon itself but cannot be 100% sure. So do not hesitate to move this issue if I am wrong.

Context

Machine learning team really want to have reproducible results everywhere where it is possible.
So we fix all random seeds to have a deterministic code. Suddenly we find out that two identical runs of style-analyzer quality estimation give different results: src-d/style-analyzer#557. Clues lead to bblfsh and I was able to reproduce the bug without our code. Let me show you.

How to reproduce

Run in bash

docker run -d --rm --name style_analyzer_bblfshd --privileged -p 9432\:9432 \
                bblfsh/bblfshd\:v2.11.0 --log-level DEBUG
docker exec style_analyzer_bblfshd bblfshctl driver install \
                javascript docker://bblfsh/javascript-driver\:v2.6.0
# clone repo with data
git clone https://github.com/laravel/telescope
git checkout 6f0a10ec586cfa1a22218b6778bf9c1572b97912

Also checked for v1.2.0 driver version.

Run in python3

from collections import defaultdict
import pickle
import time
import os
import bblfsh

bc = bblfsh.BblfshClient("localhost:9432")
res = defaultdict(lambda: defaultdict(int))
times = defaultdict(list)
iter_n = 10
for i in range(iter_n):
    for (dirpath, dirnames, filenames) in os.walk("telescope"):
        for filename in filenames:
            try:
                t_start = time.time()
                filepath = os.path.join(dirpath, filename)
                if ".git" in filepath:
                    continue
                resp = bc.parse(filename=filepath)
                if len(res[filepath]) > 0 and resp.status not in res[filepath]:
                    print("iter %.3d New status %d for filepath %s. counter: %s " % (
                        i + 1, resp.status, filepath, dict(res[filepath])))
                res[filepath][resp.status] += 1
                times[filepath].append(time.time() - t_start)
            except Exception as e:
                print(filepath, e)
                res[filepath][repr(e)] += 1

I do not think it is important but bblfsh package version is 2.12.6.

My output head

telescope/resources/.DS_Store Content must be UTF-8, ASCII or Base64 encoded
telescope/resources/.DS_Store Content must be UTF-8, ASCII or Base64 encoded
iter 003 New status 2 for filepath telescope/webpack.mix.js. counter: {0: 2} 
telescope/resources/.DS_Store Content must be UTF-8, ASCII or Base64 encoded
telescope/resources/.DS_Store Content must be UTF-8, ASCII or Base64 encoded
iter 004 New status 2 for filepath telescope/resources/js/routes.js. counter: {0: 3} 
iter 004 New status 2 for filepath telescope/resources/js/base.js. counter: {0: 3}

How to interpret

Mainly, if you see the message, that some file gets a new status that means that during the identical requests bblfsh return different statuses. For example, for telescope/webpack.mix.js and telescope/resources/js/routes.js it was able to parse file first but not the second time.

Notes

If you set iter_n to bigger number you get much more output messages.
Sometimes performance for big iter_n become super bad. Can be related to perf: bblfshd 2.9 non responsive after parsing 1000+ files bblfshd#226
If you feed only js files it works great and always parse them for iter_n=100.

So it is important for ML team to have reproducible output from bblfsh.
Please tell me if you are able to reproduce the issue or need more info from my side.

The text was updated successfully, but these errors were encountered:

creachadair · 2019-01-30T16:00:03Z

Out of curiosity, have you seen this happen with languages other than JS? I'm asking since it's not yet clear whether this points to a driver bug (which I suspect) or an SDK bug (which is also possible).

zurk · 2019-01-30T17:35:58Z

No, but it is fast to check, so I did it. Thanks for the question because I found a bug in my code snippet and fix it (first msg updated). But the bug is still on place.

So, I run the test for latest python driver and https://github.com/src-d/style-analyzer repo and it looks fine. The bblfsh output was the same during 28 iterantions and then I stop it.

I think we can safely move this one to https://github.com/bblfsh/javascript-driver/issues

creachadair · 2019-01-30T17:55:57Z

I think we can safely move this one to https://github.com/bblfsh/javascript-driver/issues

I agree, though I don't seem to have the permissions to do that.

zurk · 2019-02-05T12:16:49Z

@smola can I kindly ask you to move this issue to https://github.com/bblfsh/javascript-driver?

bzz · 2019-02-12T16:28:24Z

Was trying to reproduce this with latest Golang v3 client instead of old python v1, to check if this is fixed in latest version using the code from details

go run bblfsh_js.go

and it is not reproduced exactly, but out of 112, there was only a single file telescope/public/app.js which fails and seem to produce consistent but somehow un-deterministic error messages (different order and number of failing checks by type)

First run

transform failed: received 15 errors:
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenStart
\tcheck: key \"extra\": field was not used: parenStart

Another run

transform failed: received 15 errors:
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenStart
\tcheck: key \"extra\": field was not used: parenStart
\tcheck: key \"extra\": field was not used: parenStart
\tcheck: key \"extra\": field was not used: parenStart

Other then that, there is no difference on 10 iterations using latest UASTv2-enabled client.

package main

//git clone https://github.com/laravel/telescope; cd telescope
//git checkout 6f0a10ec586cfa1a22218b6778bf9c1572b97912; cd ..
//go run bblfsh_js.go

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/sanity-io/litter"
	"gopkg.in/bblfsh/client-go.v3"
)

const (
	numberOfIterations = 2
	dataDir            = "telescope"
)

var filesStatuses map[string]map[string]int

func main() {
	litter.Config.Compact = true
	filesStatuses = make(map[string]map[string]int)
	client, err := bblfsh.NewClient("127.0.0.1:9432")
	if err != nil {
		panic(err)
	}

	for i := 0; i < numberOfIterations; i++ {
		processed := parseEveryFileIn(dataDir, client)
		fmt.Printf("Iter: %d done, processed: %d\n", i, processed)
	}

	fmt.Printf("\nTotal files: %d\n", len(filesStatuses))
	for file, statuses := range filesStatuses {
		for _, count := range statuses {
			if count != numberOfIterations {
				fmt.Printf("\t file: %s, %d statuses: %v\n", file, len(statuses), litter.Sdump(statuses))
			}
			break
		}
	}
}

func parseEveryFileIn(dir string, client *bblfsh.Client) int {
	processed := 0
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			fmt.Printf("failed to access the path %q: %v\n", path, err)
			return err
		}
		if info.IsDir() && info.Name() == ".git" { //do not walk .git/*
			return filepath.SkipDir
		}
		if info.IsDir() { //do not process any dir
			return nil
		}

		_, lang, err := client.NewParseRequest().ReadFile(path).UAST()
		status := status(lang, err)

		if _, ok := filesStatuses[path]; ok {
			fs := filesStatuses[path]
			if _, ok := fs[status]; !ok {
				fmt.Printf("\tNew status for %s: '%s'\n", path, status)
			}
			fs[status]++
		} else {
			//fmt.Printf("\tProcessing '%s' got '%s'\n", path, status)
			filesStatuses[path] = make(map[string]int)
			filesStatuses[path][status] = 1
		}

		processed++
		return nil
	})
	if err != nil {
		fmt.Printf("error walking the path %q: %v\n", dir, err)
	}

	return processed
}

func status(lang string, err error) string {
	return fmt.Sprintf("lang:'%s' err:'%s'", lang, err)
}

Next step would be to confirm if this behavior of the latest client version is consistent with new Python client bblfsh/python-client#128

BTW, the offender here is this 2.4mb app.js file and same behavior (randomized errors) is easily reproducible with multiple subsequent runs of bblfsh-cli telescope/public/app.js

dennwc · 2019-02-12T16:42:49Z

@bzz This helps a lot, actually. The output of a failing transformation is non-deterministic (random map iteration). But it is deterministic for a successful one.

Can you please open an issue with the file that generates those errors? Thanks!

creachadair · 2019-02-12T16:57:25Z

I suspect a map-traversal order issue on the error path.

dennwc · 2019-02-12T17:01:35Z

@creachadair Exactly. Although I'm not sure if we want to change it - it will affect performance.

bzz · 2019-02-12T17:19:41Z

I have updated the #54 (comment) above with the link - do you mean we need a separate issue in SDK as well?

Although I'm not sure if we want to change it - it will affect performance.

Am planing to submit a workaround by sorting them first.

dennwc · 2019-02-12T17:26:52Z

@bzz It's JS-related issue since it fails to transform a valid JS AST.

Regarding sorting errors, I'm not against it, but it won't fix the actual issue with the non-deterministic AST output. This is because AST traversal order is non-deterministic in DSL, and it may terminate the transform on fatal errors (read: randomly depending on the AST branch taken).

The random order is intentional since it provides better performance (no allocation or sorting) and provides some level of protection against invalid transforms that rely on iteration order. And it doesn't affect the successful transformation passes.

bzz · 2019-02-12T17:40:21Z

but it won't fix the actual issue

makes sense, thank you for explaining! And TBH I do not think it's an issue at all - as soon parsing consistently fails for the same file, it's not a big deal having a slightly different error messages.

Ok, will just investigate further why it fails for this particular file, if it's consistent between clients and skip the sorting.

And examples of code in here might be useful for future regression testing automation that we can run on every release.

creachadair · 2019-02-12T19:14:53Z

@creachadair Exactly. Although I'm not sure if we want to change it - it will affect performance.

I don't think performance on an error path is a concern: When we get an error, we're already committed to aborting a fairly expensive transaction, so it won't happen in a loop.

In contrast, having to do subtle workarounds to compensate for time-varying output in the "bad" cases is almost certainly not worth it. Reproducibility is really really helpful. :)

zurk · 2019-02-12T21:28:34Z

Completely agree with @creachadair
as a bblfsh user, I'd like to have reproducibility even if it affects performance.

dennwc · 2019-02-12T22:29:46Z

I agree with a general concern about reproducibility. And the successful transformation have this already - in other cases all fixture tests fill fail randomly, which doesn't happen.

@creachadair The problem is that the issue @bzz found is not on an error path. The main code path does this random iteration for performance reasons. Sorting keys for each node on every transform step and for every suboperation will have a very severe performance impact.

The real question is: why it returns different statuses? Are there any cases when it parses the file for the first time and fails the second time? If so - it's a bug in DSL or SDK. Unrelated to iteration order.

creachadair · 2019-02-12T22:35:01Z

@creachadair The problem is that the issue @bzz found is not on an error path. The main code path does this random iteration for performance reasons. Sorting keys for each node on every transform step and for every suboperation will have a very severe performance impact.

The real question is: why it returns different statuses? Are there any cases when it parses the file for the first time and fails the second time? If so - it's a bug in DSL or SDK. Unrelated to iteration order.

I agree it doesn't make sense to sort each time. What I guess I'm not clear about is where order matters: If order is only observed at the point where a result is reported to the caller, then it should be fine to sort the results once, at the point of return (e.g., a gRPC handler maybe).

If order matters at other times, then we should consider using a different data structure. But it sounds like the real issue is just that the final return value is variable, is that correct?

dennwc · 2019-02-12T22:40:59Z

It's still not clear to me if the bug that @bzz found is the same as what @zurk originally reported.

Because if it's the same, it means that the DSL terminates the transform half-way because of an error and then reports different results because of the random iteration order (different branches visited before an error condition).

But if that's not the same, it means that some annotation is flaky and should be fixed.

bzz · 2019-02-22T17:21:06Z

I will get back to this as soon as driver is migrated to latest SDK and missing annotations are added (#57) and post the updated results.

vmarkovtsev · 2019-03-08T14:40:06Z

@bzz We are going to port to 2.7.0 (the current 2.6.0 has some problems) in 🔥src-d/style-analyzer#666 😈 and then we will try to reproduce again. Hopefully, it will be healed automagically with the help of Satan. 🔥

bzz · 2019-03-11T11:30:34Z

Awesome! This is still very useful and is open due to #62 right now so yeah 😈

bzz · 2019-03-13T18:29:11Z

#62 fixed, #68 found and added to v2.7.0 scope - after that, this issue is not reproducible any more and everything works as expected!

bzz · 2019-03-15T12:36:36Z

v2.7.0 is out now!

So far, this issue is not reproducible any more with go-client app from #54 (comment):

$ docker run --rm -it -p 9432:9432 bblfsh/javascript-driver:v2.7.0 --log-level=debug
$ go run bblfsh_js.go
Iter: 0 done, processed: 1528
Iter: 0 done, processed: 1528
Iter: 1 done, processed: 1528
Iter: 2 done, processed: 1528
Iter: 3 done, processed: 1528
Iter: 4 done, processed: 1528

@zurk could you verify that on your side please?

zurk · 2019-03-19T09:51:19Z

Yes, I can not reproduce the bug. Looks good, thank you!

smola transferred this issue from bblfsh/bblfshd Feb 7, 2019

bzz mentioned this issue Feb 7, 2019

Migrate from v1 JS driver to v2 src-d/style-analyzer#578

Closed

bzz added the bug label Feb 7, 2019

vmarkovtsev mentioned this issue Feb 8, 2019

add engineering OKRs for 2019Q1 src-d/okrs#109

Merged

bzz self-assigned this Feb 8, 2019

bzz changed the title ~~Two identical bblfsh parse requests can give different results~~ Two identical parse requests can return different results Feb 12, 2019

bzz mentioned this issue Feb 12, 2019

Parsing fails on valid JS file: extra + parenthesized/parenStart #56

Closed

bzz mentioned this issue Mar 5, 2019

Parsing fails on valid JS: string literal with \0 #62

Closed

bzz added this to the 2.7.0 release milestone Mar 5, 2019

bzz modified the milestones: v2.7.0, v2.7.1 Mar 12, 2019

zurk closed this as completed Mar 19, 2019

zurk mentioned this issue Apr 3, 2019

Remove expected vnodes number calculation and check src-d/style-analyzer#739

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Two identical parse requests can return different results #54

Two identical parse requests can return different results #54

zurk commented Jan 30, 2019

creachadair commented Jan 30, 2019

zurk commented Jan 30, 2019

creachadair commented Jan 30, 2019

zurk commented Feb 5, 2019

bzz commented Feb 12, 2019 •

edited

Loading

dennwc commented Feb 12, 2019 •

edited

Loading

creachadair commented Feb 12, 2019

dennwc commented Feb 12, 2019

bzz commented Feb 12, 2019 •

edited

Loading

dennwc commented Feb 12, 2019

bzz commented Feb 12, 2019 •

edited

Loading

creachadair commented Feb 12, 2019 •

edited

Loading

zurk commented Feb 12, 2019

dennwc commented Feb 12, 2019

creachadair commented Feb 12, 2019

dennwc commented Feb 12, 2019

bzz commented Feb 22, 2019

vmarkovtsev commented Mar 8, 2019 •

edited

Loading

bzz commented Mar 11, 2019

bzz commented Mar 13, 2019 •

edited

Loading

bzz commented Mar 15, 2019

zurk commented Mar 19, 2019

Two identical parse requests can return different results #54

Two identical parse requests can return different results #54

Comments

zurk commented Jan 30, 2019

Context

How to reproduce

My output head

How to interpret

Notes

creachadair commented Jan 30, 2019

zurk commented Jan 30, 2019

creachadair commented Jan 30, 2019

zurk commented Feb 5, 2019

bzz commented Feb 12, 2019 • edited Loading

dennwc commented Feb 12, 2019 • edited Loading

creachadair commented Feb 12, 2019

dennwc commented Feb 12, 2019

bzz commented Feb 12, 2019 • edited Loading

dennwc commented Feb 12, 2019

bzz commented Feb 12, 2019 • edited Loading

creachadair commented Feb 12, 2019 • edited Loading

zurk commented Feb 12, 2019

dennwc commented Feb 12, 2019

creachadair commented Feb 12, 2019

dennwc commented Feb 12, 2019

bzz commented Feb 22, 2019

vmarkovtsev commented Mar 8, 2019 • edited Loading

bzz commented Mar 11, 2019

bzz commented Mar 13, 2019 • edited Loading

bzz commented Mar 15, 2019

zurk commented Mar 19, 2019

bzz commented Feb 12, 2019 •

edited

Loading

dennwc commented Feb 12, 2019 •

edited

Loading

bzz commented Feb 12, 2019 •

edited

Loading

bzz commented Feb 12, 2019 •

edited

Loading

creachadair commented Feb 12, 2019 •

edited

Loading

vmarkovtsev commented Mar 8, 2019 •

edited

Loading

bzz commented Mar 13, 2019 •

edited

Loading