BinSum - Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models

📚 Dataset | 📝 Citation

Important

We are in the process of releasing the dataset, adding more data and implementation details, and improving the documentation. Please stay tuned.

What's New?

  • [Dec. 17, 2023] Our paper is now publicly available on arXiv. We are in the process of releasing the dataset.

Introduction

BinSum is a comprehensive benchmark and dataset of over 557K binary functions, together with a novel method for prompt synthesis and optimization. To gauge LLM performance more accurately, we also propose a new semantic similarity metric that surpasses traditional exact-match approaches. Our extensive evaluation of prominent LLMs, including ChatGPT, GPT-4, Llama 2, and Code Llama, reveals 10 pivotal insights. This evaluation generated 4 billion inference tokens and incurred a total expense of 11,418 US dollars and 873 NVIDIA A100 GPU hours. Our findings highlight both the transformative potential of LLMs in this field and the challenges yet to be overcome.
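The metric itself is defined in the paper; as a rough illustration of the general idea of semantic (rather than exact-match) scoring, the sketch below compares a model-generated summary against a ground-truth summary via sentence-embedding cosine similarity. The embedding model and the example strings are illustrative assumptions, not the paper's configuration.

Python:

# Illustrative sketch only: embedding-based semantic similarity between a
# generated summary and a reference summary. This is not a reimplementation
# of BinSum's metric; see the paper for its exact definition.
from sentence_transformers import SentenceTransformer, util

# Assumption: any general-purpose sentence-embedding model works for the demo.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(generated: str, reference: str) -> float:
    """Cosine similarity of the two summaries' embeddings (in [-1, 1])."""
    emb = model.encode([generated, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# Semantically equivalent summaries score high even with no exact token overlap.
reference = "Computes the CRC32 checksum of the input buffer."
generated = "Calculates a 32-bit cyclic redundancy check over the given data."
print(f"semantic similarity: {semantic_similarity(generated, reference):.3f}")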

Dataset

Coming soon.

Citation

If you find BinSum useful, please consider citing our paper:

BibTeX:

@article{jin2023binary,
  title={Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models},
  author={Xin Jin and Jonathan Larson and Weiwei Yang and Zhiqiang Lin},
  journal={arXiv preprint arXiv:2312.09601},
  year={2023},
}
