Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Two identical parse requests can return different results #54

Closed
zurk opened this issue Jan 30, 2019 · 22 comments
Closed

Two identical parse requests can return different results #54

zurk opened this issue Jan 30, 2019 · 22 comments
Assignees
Labels
Milestone

Comments

@zurk
Copy link

zurk commented Jan 30, 2019

Guys, I found one really weird big in bblfsh. I think it is related to daemon itself but cannot be 100% sure. So do not hesitate to move this issue if I am wrong.

Context

Machine learning team really want to have reproducible results everywhere where it is possible.
So we fix all random seeds to have a deterministic code. Suddenly we find out that two identical runs of style-analyzer quality estimation give different results: src-d/style-analyzer#557. Clues lead to bblfsh and I was able to reproduce the bug without our code. Let me show you.

How to reproduce

Run in bash

docker run -d --rm --name style_analyzer_bblfshd --privileged -p 9432\:9432 \
                bblfsh/bblfshd\:v2.11.0 --log-level DEBUG
docker exec style_analyzer_bblfshd bblfshctl driver install \
                javascript docker://bblfsh/javascript-driver\:v2.6.0
# clone repo with data
git clone https://github.com/laravel/telescope
git checkout 6f0a10ec586cfa1a22218b6778bf9c1572b97912

Also checked for v1.2.0 driver version.

Run in python3

from collections import defaultdict
import pickle
import time
import os
import bblfsh

bc = bblfsh.BblfshClient("localhost:9432")
res = defaultdict(lambda: defaultdict(int))
times = defaultdict(list)
iter_n = 10
for i in range(iter_n):
    for (dirpath, dirnames, filenames) in os.walk("telescope"):
        for filename in filenames:
            try:
                t_start = time.time()
                filepath = os.path.join(dirpath, filename)
                if ".git" in filepath:
                    continue
                resp = bc.parse(filename=filepath)
                if len(res[filepath]) > 0 and resp.status not in res[filepath]:
                    print("iter %.3d New status %d for filepath %s. counter: %s " % (
                        i + 1, resp.status, filepath, dict(res[filepath])))
                res[filepath][resp.status] += 1
                times[filepath].append(time.time() - t_start)
            except Exception as e:
                print(filepath, e)
                res[filepath][repr(e)] += 1

I do not think it is important but bblfsh package version is 2.12.6.

My output head

telescope/resources/.DS_Store Content must be UTF-8, ASCII or Base64 encoded
telescope/resources/.DS_Store Content must be UTF-8, ASCII or Base64 encoded
iter 003 New status 2 for filepath telescope/webpack.mix.js. counter: {0: 2} 
telescope/resources/.DS_Store Content must be UTF-8, ASCII or Base64 encoded
telescope/resources/.DS_Store Content must be UTF-8, ASCII or Base64 encoded
iter 004 New status 2 for filepath telescope/resources/js/routes.js. counter: {0: 3} 
iter 004 New status 2 for filepath telescope/resources/js/base.js. counter: {0: 3} 

How to interpret

Mainly, if you see the message, that some file gets a new status that means that during the identical requests bblfsh return different statuses. For example, for telescope/webpack.mix.js and telescope/resources/js/routes.js it was able to parse file first but not the second time.

Notes

  1. If you set iter_n to bigger number you get much more output messages.
  2. Sometimes performance for big iter_n become super bad. Can be related to perf: bblfshd 2.9 non responsive after parsing 1000+ files bblfshd#226
  3. If you feed only js files it works great and always parse them for iter_n=100.

So it is important for ML team to have reproducible output from bblfsh.
Please tell me if you are able to reproduce the issue or need more info from my side.

@creachadair
Copy link
Contributor

Out of curiosity, have you seen this happen with languages other than JS? I'm asking since it's not yet clear whether this points to a driver bug (which I suspect) or an SDK bug (which is also possible).

@zurk
Copy link
Author

zurk commented Jan 30, 2019

No, but it is fast to check, so I did it. Thanks for the question because I found a bug in my code snippet and fix it (first msg updated). But the bug is still on place.

So, I run the test for latest python driver and https://github.com/src-d/style-analyzer repo and it looks fine. The bblfsh output was the same during 28 iterantions and then I stop it.

I think we can safely move this one to https://github.com/bblfsh/javascript-driver/issues

@creachadair
Copy link
Contributor

I think we can safely move this one to https://github.com/bblfsh/javascript-driver/issues

I agree, though I don't seem to have the permissions to do that.

@zurk
Copy link
Author

zurk commented Feb 5, 2019

@smola can I kindly ask you to move this issue to https://github.com/bblfsh/javascript-driver?

@smola smola transferred this issue from bblfsh/bblfshd Feb 7, 2019
@bzz bzz added the bug label Feb 7, 2019
@bzz bzz self-assigned this Feb 8, 2019
@bzz bzz changed the title Two identical bblfsh parse requests can give different results Two identical parse requests can return different results Feb 12, 2019
@bzz
Copy link
Contributor

bzz commented Feb 12, 2019

Was trying to reproduce this with latest Golang v3 client instead of old python v1, to check if this is fixed in latest version using the code from details

go run bblfsh_js.go

and it is not reproduced exactly, but out of 112, there was only a single file telescope/public/app.js which fails and seem to produce consistent but somehow un-deterministic error messages (different order and number of failing checks by type)

First run

transform failed: received 15 errors:
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenStart
\tcheck: key \"extra\": field was not used: parenStart

Another run

transform failed: received 15 errors:
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenthesized
\tcheck: key \"extra\": field was not used: parenStart
\tcheck: key \"extra\": field was not used: parenStart
\tcheck: key \"extra\": field was not used: parenStart
\tcheck: key \"extra\": field was not used: parenStart

Other then that, there is no difference on 10 iterations using latest UASTv2-enabled client.

package main

//git clone https://github.com/laravel/telescope; cd telescope
//git checkout 6f0a10ec586cfa1a22218b6778bf9c1572b97912; cd ..
//go run bblfsh_js.go

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/sanity-io/litter"
	"gopkg.in/bblfsh/client-go.v3"
)

const (
	numberOfIterations = 2
	dataDir            = "telescope"
)

var filesStatuses map[string]map[string]int

func main() {
	litter.Config.Compact = true
	filesStatuses = make(map[string]map[string]int)
	client, err := bblfsh.NewClient("127.0.0.1:9432")
	if err != nil {
		panic(err)
	}

	for i := 0; i < numberOfIterations; i++ {
		processed := parseEveryFileIn(dataDir, client)
		fmt.Printf("Iter: %d done, processed: %d\n", i, processed)
	}

	fmt.Printf("\nTotal files: %d\n", len(filesStatuses))
	for file, statuses := range filesStatuses {
		for _, count := range statuses {
			if count != numberOfIterations {
				fmt.Printf("\t file: %s, %d statuses: %v\n", file, len(statuses), litter.Sdump(statuses))
			}
			break
		}
	}
}

func parseEveryFileIn(dir string, client *bblfsh.Client) int {
	processed := 0
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			fmt.Printf("failed to access the path %q: %v\n", path, err)
			return err
		}
		if info.IsDir() && info.Name() == ".git" { //do not walk .git/*
			return filepath.SkipDir
		}
		if info.IsDir() { //do not process any dir
			return nil
		}

		_, lang, err := client.NewParseRequest().ReadFile(path).UAST()
		status := status(lang, err)

		if _, ok := filesStatuses[path]; ok {
			fs := filesStatuses[path]
			if _, ok := fs[status]; !ok {
				fmt.Printf("\tNew status for %s: '%s'\n", path, status)
			}
			fs[status]++
		} else {
			//fmt.Printf("\tProcessing '%s' got '%s'\n", path, status)
			filesStatuses[path] = make(map[string]int)
			filesStatuses[path][status] = 1
		}

		processed++
		return nil
	})
	if err != nil {
		fmt.Printf("error walking the path %q: %v\n", dir, err)
	}

	return processed
}

func status(lang string, err error) string {
	return fmt.Sprintf("lang:'%s' err:'%s'", lang, err)
}

Next step would be to confirm if this behavior of the latest client version is consistent with new Python client bblfsh/python-client#128


BTW, the offender here is this 2.4mb app.js file and same behavior (randomized errors) is easily reproducible with multiple subsequent runs of bblfsh-cli telescope/public/app.js

@dennwc
Copy link
Member

dennwc commented Feb 12, 2019

@bzz This helps a lot, actually. The output of a failing transformation is non-deterministic (random map iteration). But it is deterministic for a successful one.

Can you please open an issue with the file that generates those errors? Thanks!

@creachadair
Copy link
Contributor

I suspect a map-traversal order issue on the error path.

@dennwc
Copy link
Member

dennwc commented Feb 12, 2019

@creachadair Exactly. Although I'm not sure if we want to change it - it will affect performance.

@bzz
Copy link
Contributor

bzz commented Feb 12, 2019

I have updated the #54 (comment) above with the link - do you mean we need a separate issue in SDK as well?

Although I'm not sure if we want to change it - it will affect performance.

Am planing to submit a workaround by sorting them first.

@dennwc
Copy link
Member

dennwc commented Feb 12, 2019

@bzz It's JS-related issue since it fails to transform a valid JS AST.

Regarding sorting errors, I'm not against it, but it won't fix the actual issue with the non-deterministic AST output. This is because AST traversal order is non-deterministic in DSL, and it may terminate the transform on fatal errors (read: randomly depending on the AST branch taken).

The random order is intentional since it provides better performance (no allocation or sorting) and provides some level of protection against invalid transforms that rely on iteration order. And it doesn't affect the successful transformation passes.

@bzz
Copy link
Contributor

bzz commented Feb 12, 2019

but it won't fix the actual issue

makes sense, thank you for explaining! And TBH I do not think it's an issue at all - as soon parsing consistently fails for the same file, it's not a big deal having a slightly different error messages.

Ok, will just investigate further why it fails for this particular file, if it's consistent between clients and skip the sorting.

And examples of code in here might be useful for future regression testing automation that we can run on every release.

@creachadair
Copy link
Contributor

creachadair commented Feb 12, 2019

@creachadair Exactly. Although I'm not sure if we want to change it - it will affect performance.

I don't think performance on an error path is a concern: When we get an error, we're already committed to aborting a fairly expensive transaction, so it won't happen in a loop.

In contrast, having to do subtle workarounds to compensate for time-varying output in the "bad" cases is almost certainly not worth it. Reproducibility is really really helpful. :)

@zurk
Copy link
Author

zurk commented Feb 12, 2019

Completely agree with @creachadair
as a bblfsh user, I'd like to have reproducibility even if it affects performance.

@dennwc
Copy link
Member

dennwc commented Feb 12, 2019

I agree with a general concern about reproducibility. And the successful transformation have this already - in other cases all fixture tests fill fail randomly, which doesn't happen.

@creachadair The problem is that the issue @bzz found is not on an error path. The main code path does this random iteration for performance reasons. Sorting keys for each node on every transform step and for every suboperation will have a very severe performance impact.

The real question is: why it returns different statuses? Are there any cases when it parses the file for the first time and fails the second time? If so - it's a bug in DSL or SDK. Unrelated to iteration order.

@creachadair
Copy link
Contributor

@creachadair The problem is that the issue @bzz found is not on an error path. The main code path does this random iteration for performance reasons. Sorting keys for each node on every transform step and for every suboperation will have a very severe performance impact.

The real question is: why it returns different statuses? Are there any cases when it parses the file for the first time and fails the second time? If so - it's a bug in DSL or SDK. Unrelated to iteration order.

I agree it doesn't make sense to sort each time. What I guess I'm not clear about is where order matters: If order is only observed at the point where a result is reported to the caller, then it should be fine to sort the results once, at the point of return (e.g., a gRPC handler maybe).

If order matters at other times, then we should consider using a different data structure. But it sounds like the real issue is just that the final return value is variable, is that correct?

@dennwc
Copy link
Member

dennwc commented Feb 12, 2019

It's still not clear to me if the bug that @bzz found is the same as what @zurk originally reported.

Because if it's the same, it means that the DSL terminates the transform half-way because of an error and then reports different results because of the random iteration order (different branches visited before an error condition).

But if that's not the same, it means that some annotation is flaky and should be fixed.

@bzz
Copy link
Contributor

bzz commented Feb 22, 2019

I will get back to this as soon as driver is migrated to latest SDK and missing annotations are added (#57) and post the updated results.

@vmarkovtsev
Copy link

vmarkovtsev commented Mar 8, 2019

@bzz We are going to port to 2.7.0 (the current 2.6.0 has some problems) in 🔥src-d/style-analyzer#666 😈 and then we will try to reproduce again. Hopefully, it will be healed automagically with the help of Satan. 🔥

@bzz
Copy link
Contributor

bzz commented Mar 11, 2019

Awesome! This is still very useful and is open due to #62 right now so yeah 😈

@bzz bzz modified the milestones: v2.7.0, v2.7.1 Mar 12, 2019
@bzz
Copy link
Contributor

bzz commented Mar 13, 2019

#62 fixed, #68 found and added to v2.7.0 scope - after that, this issue is not reproducible any more and everything works as expected!

@bzz
Copy link
Contributor

bzz commented Mar 15, 2019

v2.7.0 is out now!

So far, this issue is not reproducible any more with go-client app from #54 (comment):

$ docker run --rm -it -p 9432:9432 bblfsh/javascript-driver:v2.7.0 --log-level=debug
$ go run bblfsh_js.go
Iter: 0 done, processed: 1528
Iter: 0 done, processed: 1528
Iter: 1 done, processed: 1528
Iter: 2 done, processed: 1528
Iter: 3 done, processed: 1528
Iter: 4 done, processed: 1528

@zurk could you verify that on your side please?

@zurk
Copy link
Author

zurk commented Mar 19, 2019

Yes, I can not reproduce the bug. Looks good, thank you!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants