Access to data from deleted and private GitHub repositories that have forks

Carding Forum · Jul 26, 2024

Truffle Security has published attack scenarios, built on several routine GitHub repository management practices, that make it possible to extract data from deleted repositories that have public forks or were themselves created as forks.

Commits can be accessed by hash across all fork-related repositories because GitHub, to optimize storage and eliminate duplicates, stores all objects from the main repository and its forks together and separates the ownership of commits only logically. This storage model lets you view any commit from any fork through the main repository by explicitly specifying its hash in the URL. For example, a user can fork the torvalds/linux repository and add arbitrary code to the fork, after which that code is available via a direct hash link in the torvalds/linux repository. If a repository is deleted but at least one public fork remains, data from the deleted repository stays available by commit hash.
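To make the "hash in the URL" idea concrete, here is a minimal sketch that only builds the address of a commit page; it performs no network requests. The helper name `commit_url` and the 4 to 40 character check are illustrative assumptions, not something taken from the Truffle Security write-up.

```python
import re

# A commit reference is a hexadecimal string: a full SHA-1 hash is 40
# characters long, and the post mentions 4 as the minimum short form.
_HEX_REF = re.compile(r"[0-9a-f]{4,40}")


def commit_url(owner: str, repo: str, sha: str) -> str:
    """Build the web address of a commit page in the given repository.

    Hypothetical helper for illustration: the owner and repo name the main
    repository, while the hash may belong to a commit made in a fork.
    """
    sha = sha.lower()
    if not _HEX_REF.fullmatch(sha):
        raise ValueError("expected 4 to 40 hexadecimal characters")
    return f"https://github.com/{owner}/{repo}/commit/{sha}"


if __name__ == "__main__":
    # "abcd1234" is a made-up placeholder, not a real commit.
    print(commit_url("torvalds", "linux", "abcd1234"))
```

The point of the sketch is that the repository named in the path and the repository where the commit was actually made do not have to be the same one.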

Three security risk scenarios are described:

* The first scenario involves developers who fork a public repository, add changes, experiment, and then delete the fork. Besides leaking code that was never meant to be published, this is dangerous when working API keys are added to sample files during such experiments. After the fork is deleted, an attacker can still reach the change by its commit hash through the main repository. For example, using this method the researchers found 30 working API keys by examining three heavily forked machine learning repositories.

* The second scenario concerns access to data after the primary repository has been deleted, provided forks of it were created. The example given is a company that accidentally published an employee's private keys in a public repository; those keys gave full access to all of the company's repositories on GitHub. The company deleted the repository through which the leak occurred, but the keys could still be extracted by commit hash through the forks.

* The third scenario relates to projects that develop a basic open version in a public repository and an extended proprietary version in a private one. If a company first developed the project in a private repository, later made that repository public when it opened the code, and kept developing a closed internal or extended version in a private fork, changes added to the private fork can be accessed by commit hash through the public repository. However, only changes added to the private fork before the main repository was made public are accessible: private and public repositories are stored separately, but while both repositories were private their commits were stored together, so those commits stayed in the repository after it was made public.

The trick of accessing commits from forks via a link to the main repository has been known for many years and is periodically used for jokes and to mislead developers (for example, pranksters periodically create the appearance that backdoors have been inserted into the Linux kernel repository on GitHub). To counteract such jokes, GitHub added a warning that the requested commit does not belong to any branch in the current repository and may belong to a fork. At the same time, the ability to access commits by hash across fork-related repositories was itself considered harmless, since accessing data in deleted and private forks requires knowing the commit hash.

Guessing a full commit hash, which is based on the SHA-1 algorithm and consists of 40 hexadecimal characters, is not realistic, but it turned out that this is not required. GitHub supports a shortened form of addressing commits, which lets you address a change by the first few characters of its hash as long as the prefix does not collide with another commit. The minimum length for shortened hash addressing is 4 characters, which corresponds to searching through only 65,536 combinations (16^4). Brute force may not even be necessary: the GitHub API exposes events that third-party projects use to maintain an archive with a complete log of all operations, in which commit hashes remain available even after repositories are deleted.
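The arithmetic behind that comparison can be checked with a short sketch. It only counts candidate prefixes; the function name `prefix_space` and the chosen lengths are illustrative assumptions (7 is included simply as a commonly seen abbreviation length).

```python
HEX_ALPHABET_SIZE = 16      # each hash character is one hexadecimal digit
FULL_SHA1_HEX_LENGTH = 40   # a 160-bit SHA-1 digest written in hexadecimal


def prefix_space(length: int) -> int:
    """Return how many distinct hexadecimal prefixes of this length exist."""
    if not 1 <= length <= FULL_SHA1_HEX_LENGTH:
        raise ValueError("length must be between 1 and 40")
    return HEX_ALPHABET_SIZE ** length


if __name__ == "__main__":
    for length in (4, 7, FULL_SHA1_HEX_LENGTH):
        print(f"{length:>2} hex characters: {prefix_space(length):,} candidates")
```

A 4-character prefix gives 16^4 = 65,536 candidates, while the full 40-character hash gives 16^40 = 2^160, which is why only the shortened form makes enumeration feasible.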
