CodeSurvey is a framework and tool to survey code repositories for language feature usage, library usage, and more:
- Survey a specific set of repositories, or randomly sample repositories from services like GitHub
- Built-in support for analyzing Python code; extensible to support any language
- Write simple Python functions to define the code features you want to survey; record arbitrary details of feature occurrences
- Supports parallelizization of repository downloading and analysis across multiple processes
- Logging and progress tracking to monitor your survey as it runs
- Inspect the results as Python objects, or in an sqlite database
pip install codesurvey
The CodeSurvey
class can easily be configured to run a survey, such
as to measure how often the math
module is used in a random set of
recently updated Python repositories from GitHub:
from codesurvey import CodeSurvey
from codesurvey.sources import GithubSampleSource
from codesurvey.analyzers.python import PythonAstAnalyzer
from codesurvey.analyzers.python.features import py_module_feature_finder
# Define a FeatureFinder to look for the `math` module in Python code
has_math = py_module_feature_finder('math', modules=['math'])
# Configure the survey
survey = CodeSurvey(
db_filepath='math_survey.sqlite3',
sources=[
GithubSampleSource(language='python'),
],
analyzers=[
PythonAstAnalyzer(
feature_finders=[
has_math,
],
),
],
max_workers=5,
)
# Run the survey on 10 repositories
survey.run(max_repos=10)
# Report on the results
repo_features = survey.get_repo_features(feature_names=['math'])
repo_count_with_math = sum([
1 for repo_feature in repo_features if
repo_feature.occurrence_count > 0
])
print(f'{repo_count_with_math} out of {len(repo_features)} repos use math')
- For more Sources of repositories, see Source docs
- For more Analyzers and FeatureFinders, see Analyzer docs
- For more options and methods for inspecting results, see
CodeSurvey
docs - For details on directly inspecting the sqlite database of survey results see Database docs
- More examples can be found in examples
- Install Poetry dependencies with
make deps
- Documentation:
- Run local server:
make docs-serve
- Build docs:
make docs-build
- Deploy docs to GitHub Pages:
make docs-github
- Docstring style follows the Google style guide
- Run local server:
- Add unit tests