Module detection #41

Open
8 tasks
pionxzh opened this issue Nov 14, 2023 · 8 comments

@pionxzh
Owner

pionxzh commented Nov 14, 2023

Similar to what we already have for Babel runtime detection, consider introducing module detection that helps us transform the code and give each extracted module a better name than module-xxxx.js.

@pionxzh pionxzh added the enhancement New feature or request label Nov 14, 2023
@StringKe

It's usually easier to identify some dependencies by checking them yourself or reading something like LICENSE.txt.

I don't know if this will help, but maybe the developer can identify some dependencies in advance?

/*! For license information please see main.64e92519.js.LICENSE.txt */
/*
      object-assign
      (c) Sindre Sorhus
      @license MIT
      */

/*
object-assign
(c) Sindre Sorhus
@license MIT
*/

/* NProgress, (c) 2013, 2014 Rico Sta. Cruz - http://ricostacruz.com/nprogress
 * @license MIT */

/*!

pica
https://github.com/nodeca/pica

*/

/*!
  Copyright (c) 2018 Jed Watson.
  Licensed under the MIT License (MIT), see
  http://jedwatson.github.io/classnames
*/

/*!
 * PEP v0.5.1 | https://github.com/jquery/PEP
 * Copyright jQuery Foundation and other contributors | http://jquery.org/license
 */

/*!
 * The buffer module from node.js, for the browser.
 *
 * @author   Feross Aboukhadijeh <feross@feross.org> <http://feross.org>
 * @license  MIT
 */

/*!
 * The buffer module from node.js, for the browser.
 *
 * @author   Feross Aboukhadijeh <https://feross.org>
 * @license  MIT
 */

/*! *****************************************************************************
Copyright (c) Microsoft Corporation.

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
***************************************************************************** */

/*! Fabric.js Copyright 2008-2015, Printio (Juriy Zaytsev, Maxim Chernyak) */

/*! ieee754. BSD-3-Clause License. Feross Aboukhadijeh <https://feross.org/opensource> */

/*! js-cookie v3.0.5 | MIT */

/*! regenerator-runtime -- Copyright (c) 2014-present, Facebook, Inc. -- license (MIT): https://github.com/facebook/regenerator/blob/main/LICENSE */

/*! safe-buffer. MIT License. Feross Aboukhadijeh <https://feross.org/opensource> */

/**
 * @license
 * Copyright 2010-2022 Three.js Authors
 * SPDX-License-Identifier: MIT
 */

/**
 * @license
 * Lodash <https://lodash.com/>
 * Copyright OpenJS Foundation and other contributors <https://openjsf.org/>
 * Released under MIT license <https://lodash.com/license>
 * Based on Underscore.js 1.8.3 <http://underscorejs.org/LICENSE>
 * Copyright Jeremy Ashkenas, DocumentCloud and Investigative Reporters & Editors
 */

/**
 * @license React
 * react-dom.production.min.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/**
 * @license React
 * react-jsx-runtime.production.min.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/**
 * @license React
 * react.production.min.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/**
 * @license React
 * scheduler.production.min.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/**
 * @license React
 * use-sync-external-store-shim.development.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/**
 * @license React
 * use-sync-external-store-shim.production.min.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/**
 * @license React
 * use-sync-external-store-shim/with-selector.production.min.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/** @license React v0.20.2
 * scheduler.production.min.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/** @license React v16.13.1
 * react-is.production.min.js
 *
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

/** @preserve
   * Counter block mode compatible with  Dr Brian Gladman fileenc.c
   * derived from CryptoJS.mode.CTR
   * Jan Hruby jhruby.web@gmail.com
   */

/** @preserve
  (c) 2012 by Cédric Mesnil. All rights reserved.
  	Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
  	    - Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
      - Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  	THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
  

@pionxzh
Owner Author

pionxzh commented Nov 14, 2023

This issue exists because we need to know which library we are processing in order to produce appropriate output. A license header can be a good hint for both humans and wakaru. Modern bundlers often strip most information, including method names, so module/function detection is still required. This list won't grow indiscriminately; we will pick high-value targets. 🙏

Developers can still identify a module themselves and rename it.
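
As a rough illustration of using license banners as a hint, here is a minimal sketch that pulls preserved block comments out of a bundle and checks them against a small curated list. The `BANNER_PATTERNS` list and both function names are hypothetical and are not part of wakaru; the comment regex is naive and will also match `/* ... */` sequences inside string literals.

```javascript
// Hypothetical curated list of "high-value targets"; extend as needed.
const BANNER_PATTERNS = [
  { name: "js-cookie", pattern: /js-cookie v(\d+\.\d+\.\d+)/ },
  { name: "lodash", pattern: /Lodash <https:\/\/lodash\.com\/>/ },
  { name: "regenerator-runtime", pattern: /regenerator-runtime/ },
  { name: "three", pattern: /Three\.js Authors/ },
];

// Collect block comments that minifiers usually preserve:
// `/*! ... */`, or any block comment containing @license / @preserve.
function extractBanners(source) {
  const comments = source.match(/\/\*[\s\S]*?\*\//g) ?? [];
  return comments.filter(
    (c) => c.startsWith("/*!") || /@license|@preserve/.test(c)
  );
}

// Return every curated package whose pattern appears in a banner.
// `version` is null when the pattern has no version capture group.
function detectFromBanners(source) {
  const hits = [];
  for (const banner of extractBanners(source)) {
    for (const { name, pattern } of BANNER_PATTERNS) {
      const match = banner.match(pattern);
      if (match) hits.push({ name, version: match[1] ?? null });
    }
  }
  return hits;
}
```

A banner only shows that the library is somewhere in the file, so a real implementation would still need to tie each banner to the module that follows it.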

@0xdevalias

0xdevalias commented Nov 14, 2023

that can help us transform the code and give the extracted module a better name other than module-xxxx.js

This could then also tie in well with some of the ideas for 'unmangling identifiers' that I laid out here:

Theoretically, if we can identify a common open source module, we could pre-process that module to extract its variable/function names, which we could then apply back to the identified module.

I kind of think of this like 'debug symbols' used in compiled binaries.

Technically, though, if you know the module, can get its original source, and have the webpacked version of that code, you could also generate a sourcemap that lets the user map between the two versions of the code.


When I was manually attempting to reverse and identify the modules in #40, a couple of techniques I found useful:

  • searching for Symbol()s
  • searching for React .displayName and similar
  • searching for other arrays of static strings/similar
  • once interesting candidates have been found, searching for them on GitHub code search to try and identify the library/narrow things down
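
The first two techniques above can be partly automated. The following is a minimal sketch, assuming the module source is available as a string; `collectAnchors` is a hypothetical helper name, and the regexes only handle plain single- or double-quoted literals.

```javascript
// Collect candidate "anchor" strings that tend to survive minification.
function collectAnchors(source) {
  // Each regex captures the quote character in group 1 and the text in group 2.
  const grab = (regex) => [...source.matchAll(regex)].map((m) => m[2]);
  return {
    // Symbol("x") and Symbol.for("x") descriptions
    symbols: grab(/\bSymbol(?:\.for)?\(\s*(["'])(.*?)\1\s*\)/g),
    // foo.displayName = "Bar"
    displayNames: grab(/\.displayName\s*=\s*(["'])(.*?)\1/g),
  };
}
```

The collected strings are the kind of candidates you would then paste into a code search to narrow down the library.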

Edit: This might not be useful right now, but I just added a new section to one of my gists with some higher level notes/thoughts on fingerprinting modules, which I might expand either directly or based on how this issue pans out:

While it might be more effort than it's worth, it may also be possible to extract the patterns that wappalyzer was using to identify various libraries, which I made some basic notes on in this revision to the above gist:

@0xdevalias

0xdevalias commented Nov 20, 2023

Within some webpacked code I was looking at (Ref):


We can easily identify a number of the React modules based on their license headers, which also include the original filename:

~/dev/0xdevalias/REDACTED/unpacked/_next/static/chunks/653.js:
 13730        "use strict";
 13731        /**
 13732:        * @license React
 13733         * react-is.production.min.js
 13734         *

~/dev/0xdevalias/REDACTED/unpacked/_next/static/chunks/framework.js:
    5      2920: function (e, n, t) {
    6        /**
    7:        * @license React
    8         * react-dom.production.min.js
    9         *
   ..
 8452      82875: function (e, n, t) {
 8453        /**
 8454:        * @license React
 8455         * react-jsx-runtime.production.min.js
 8456         *
 ....
 8492      99504: function (e, n) {
 8493        /**
 8494:        * @license React
 8495         * react.production.min.js
 8496         *
 ....
 8891      95507: function (e, n) {
 8892        /**
 8893:        * @license React
 8894         * scheduler.production.min.js
 8895         *

~/dev/0xdevalias/REDACTED/unpacked/_next/static/chunks/pages/_app.js:
 47741      93802: function (U, B) {
 47742        "use strict";
 47743:       /** @license React v16.13.1
 47744         * react-is.production.min.js
 47745         *
 .....
 54586        "use strict";
 54587        /**
 54588:        * @license React
 54589         * use-sync-external-store-shim.production.min.js
 54590         *
 .....
 54654        "use strict";
 54655        /**
 54656:        * @license React
 54657         * use-sync-external-store-shim/with-selector.production.min.js
 54658         *
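
Both header styles in the search results above (with and without a version on the `@license React` line) can be matched with one pattern. This is a minimal sketch; `findReactModules` and the derived `moduleName` are hypothetical, and the derived name is only the filename with its build suffix removed, not necessarily the npm package name.

```javascript
// Matches both header styles:
//   /** @license React v16.13.1
//    * react-is.production.min.js
// and
//   /**
//    * @license React
//    * react-dom.production.min.js
const REACT_HEADER = /@license React(?: v(\d+\.\d+\.\d+))?\s*\n\s*\*\s*(\S+)/g;

function findReactModules(source) {
  return [...source.matchAll(REACT_HEADER)].map((m) => ({
    file: m[2],
    // "react-dom.production.min.js" -> "react-dom"
    moduleName: m[2].replace(/\.(production|development)(\.min)?\.js$/, ""),
    // null when the header carries no version
    version: m[1] ?? null,
  }));
}
```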

And at least in this bundled code, statsig-js seems to make its presence known (though this is the only thing in that module):

export default JSON.parse(
  '{"name":"statsig-js","version":"4.32.0","description":"Statsig JavaScript client SDK for single user environments.","main":"dist/index.js","types":"dist/index.d.ts","scripts":{"prepare":"rm -rf build/ && rm -rf dist/ && tsc && webpack","postbuild":"rm -rf build/**/*.map","test":"jest --config=jest-debug.config.js","testForGithubOrRedisEnthusiasts":"jest","test:watch":"jest --watch","build:dryrun":"npx tsc --noEmit","types":"npx tsc"},"files":["build/statsig-prod-web-sdk.js","dist/*.js","dist/*.d.ts","dist/utils/*.js","dist/utils/*.d.ts"],"jsdelivr":"build/statsig-prod-web-sdk.js","repository":{"type":"git","url":"git+https://github.com/statsig-io/js-client-sdk.git"},"author":"Statsig, Inc.","license":"ISC","bugs":{"url":"https://github.com/statsig-io/js-client-sdk/issues"},"keywords":["feature gate","feature flag","continuous deployment","ci","ab test"],"homepage":"https://www.statsig.com","devDependencies":{"@babel/preset-env":"^7.14.9","@babel/preset-typescript":"^7.14.5","@types/jest":"^27.1.0","@types/uuid":"^8.3.1","circular-dependency-plugin":"^5.2.2","core-js":"^3.16.4","jest":"^27.1.0","terser-webpack-plugin":"^5.1.4","ts-jest":"^27.1.0","ts-loader":"^9.2.3","typescript":"^4.2.2","webpack":"^5.75.0","webpack-cli":"^4.10.0"},"dependencies":{"js-sha256":"^0.9.0","uuid":"^8.3.2"},"importSort":{".js, .jsx, .ts, .tsx":{"style":"module","parser":"typescript"}}}'
);
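
An embedded package.json like the one above is easy to detect mechanically. Here is a minimal sketch, assuming the payload is a single-quoted literal without JavaScript escape sequences (payloads that fail to parse are simply skipped); `findEmbeddedPackages` is a hypothetical helper name.

```javascript
// Find JSON.parse('...') calls whose payload looks like a package.json.
function findEmbeddedPackages(source) {
  const results = [];
  const calls = source.matchAll(/JSON\.parse\(\s*'((?:[^'\\]|\\.)*)'\s*\)/g);
  for (const [, raw] of calls) {
    try {
      const data = JSON.parse(raw);
      // Treat an object with string `name` and `version` as a package manifest.
      if (data && typeof data.name === "string" && typeof data.version === "string") {
        results.push({ name: data.name, version: data.version });
      }
    } catch {
      // Not valid JSON as written (e.g. escaped quotes); ignore this call.
    }
  }
  return results;
}
```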

See also:

@0xdevalias

0xdevalias commented Dec 1, 2023

With regards to module detection/similar for React, these might be interesting/useful:

@0xdevalias

I won't copy the content here in full as it was pretty long, but I detailed some of my higher level thoughts around some more 'esoteric' methods that might be applicable to module detection (AST fingerprinting, code similarity, etc) in this comment:
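
To make the fingerprinting idea concrete (this is not taken from the linked comment), here is a token-level stand-in for AST fingerprinting. It is a rough sketch under stated assumptions: identifiers are replaced with a placeholder while keywords, string literals, and `.property` accesses are kept, then the result is hashed with FNV-1a. A real implementation would work on a parsed AST; this regex version ignores template literals, regex literals, and comments. The names `normalize`, `fnv1a`, and `fingerprint` are hypothetical.

```javascript
const KEYWORDS = new Set([
  "function", "return", "var", "let", "const", "if", "else", "for", "while",
  "do", "new", "typeof", "this", "null", "true", "false", "class", "switch",
  "case", "break", "continue", "throw", "try", "catch", "finally", "void",
]);

// Replace mangled identifiers with "$"; keep strings, keywords and `.props`,
// since minifiers usually leave those untouched. Whitespace is dropped.
function normalize(source) {
  return source
    .replace(
      /(["'])(?:\\.|(?!\1).)*\1|\.[A-Za-z_$][\w$]*|[A-Za-z_$][\w$]*/g,
      (token) =>
        /^["'.]/.test(token) || KEYWORDS.has(token) ? token : "$"
    )
    .replace(/\s+/g, "");
}

// 32-bit FNV-1a hash, returned as 8 hex digits.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function fingerprint(source) {
  return fnv1a(normalize(source));
}
```

Two builds of the same function that differ only in local variable names and whitespace produce the same fingerprint, which could then be looked up in a table built from known library releases.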

@0xdevalias

0xdevalias commented Jan 13, 2024

This specific implementation is more related to detecting and injecting into webpack modules at runtime, but it might have some useful ideas/concepts that are applicable at the AST level too:

// ..snip..

export const common = { // Common modules
  React: findByProps('createElement'),
  ReactDOM: findByProps('render', 'hydrate'),

  Flux: findByProps('Store', 'connectStores'),
  FluxDispatcher: findByProps('register', 'wait'),

  i18n: findByProps('Messages', '_requestedLocale'),

  channels: findByProps('getChannelId', 'getVoiceChannelId'),
  constants: findByProps('API_HOST')
};
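
The snippet above does not show how `findByProps` works. A minimal sketch of the idea, assuming the loaded modules are available as a plain object mapping module id to exports (the `makeFinder` wrapper and that registry shape are assumptions, not the original project's API):

```javascript
// Build a findByProps function over a hypothetical registry: id -> exports.
function makeFinder(modules) {
  return function findByProps(...props) {
    for (const moduleExports of Object.values(modules)) {
      const kind = typeof moduleExports;
      // `in` only works on objects and functions; skip everything else.
      if (moduleExports == null || (kind !== "object" && kind !== "function")) {
        continue;
      }
      // The first module exposing every requested property wins.
      if (props.every((prop) => prop in moduleExports)) return moduleExports;
    }
    return undefined;
  };
}
```

The same "which exported property names does this module have" signal is available statically, since property names usually survive minification even when local identifiers do not.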

@0xdevalias

There has recently been a new source of discussion around code fingerprinting and module identification over on the humanify repo in this issue:

Originally posted by @0xdevalias in #74 (comment)
