Design Document: Building Python on Windows
Design documents are not descriptions of the current functionality of Bazel. Always go to the documentation for current information.
Author: Yun Peng
Design document published: 05 September 2016
- zipper now can specify actual path a file is added to
- Create Python executable zip file
- Make python executable zip a real self-extracting binary
- Using stub_template.txt as __main__.py and zip header in python executable zip
- Get rid of python executable zip file header
- Put runfiles tree under ‘runfiles’ directory to avoid conflict
Because we kept finding new problems and coming up with new solutions, some of the implementation in the earlier changes has been superseded by the later ones. Here we present only the final solution.
After providing basic support for C++ and Java on Windows (although not perfect for C++ due to the wrapper scripts we use), Python becomes the last language we need to fix in order to make Bazel no longer experimental on Windows.
Currently, as described on bazel.build, the way py_binary works on Unix is:
A py_binary is an executable Python program consisting of a collection of
.py source files (possibly belonging to other py_library rules),
a *.runfiles directory tree containing all the code and data needed by
the program at run-time, and a stub script
that starts up the program with the correct initial environment and data.
This doesn’t work on Windows, because we don’t have a runfiles tree on Windows.
A runfiles client library (which maps a runfile from its runfiles path to
its real path) doesn’t solve the whole problem, because we also need an
__init__.py file under every Python source directory to make the
directory a recognizable Python package, and it doesn’t make sense to
create __init__.py files outside of a runfiles tree. Our main problem,
therefore, is finding a way to run a Python binary properly on Windows.
Python has been able to execute zip files as scripts since version 2.6.
When the Python interpreter is invoked with a zip file as the first argument,
it executes the __main__.py file in the root directory of the archive.
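The behaviour described above can be checked with a small script. This is only an illustrative sketch: the function name build_and_run_zip and the archive name app.zip are made up for the example and are not part of Bazel.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def build_and_run_zip(message):
    """Create a zip with __main__.py at its root and run it with the interpreter.

    Returns whatever the archive printed to stdout, stripped of whitespace.
    """
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "app.zip")  # hypothetical archive name
        with zipfile.ZipFile(archive, "w") as zf:
            # The interpreter looks for __main__.py in the archive root.
            zf.writestr("__main__.py", "print({!r})\n".format(message))
        # Equivalent to running: python app.zip
        result = subprocess.run(
            [sys.executable, archive],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(build_and_run_zip("hello from inside the zip"))
```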
The idea of the solution is to create a self-extracting zip file which
packages everything that would be in the original runfiles tree and adds
the stub script as the __main__.py file. We tell the stub script whether
it’s in a zip file or not. If it is, it first extracts the zip file to a
temporary directory as the runfiles tree, then sets the correct environment
variables and runs the main Python script. At the end of the execution,
it deletes the temporary directory.
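The extract, run and clean-up sequence can be sketched roughly as follows. This is not the actual stub script: the function run_from_zip, the EXAMPLE_RUNFILES_DIR variable and the use of PYTHONPATH are assumptions chosen for illustration, and the real stub may set up the environment differently.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import zipfile


def run_from_zip(zip_path, main_rel_path, args=()):
    """Extract zip_path into a temporary runfiles tree, run the main script
    from it, and delete the tree afterwards. Returns the script's exit code.
    """
    runfiles_dir = tempfile.mkdtemp(prefix="py_runfiles_")
    try:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(runfiles_dir)

        env = dict(os.environ)
        # Assumption: make the extracted tree importable and tell the program
        # where it lives. Both variable choices are illustrative only.
        existing = env.get("PYTHONPATH")
        env["PYTHONPATH"] = os.pathsep.join(
            [runfiles_dir] + ([existing] if existing else [])
        )
        env["EXAMPLE_RUNFILES_DIR"] = runfiles_dir

        main = os.path.join(runfiles_dir, *main_rel_path.split("/"))
        return subprocess.call([sys.executable, main] + list(args), env=env)
    finally:
        # Always remove the temporary tree, even if the program failed.
        shutil.rmtree(runfiles_dir, ignore_errors=True)
```

Running the main script in a child process keeps the clean-up in a finally block simple: the temporary directory is removed after the child exits, whatever its exit code.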
We did three more things to achieve this goal:
1. Implement new feature in zipper for packaging runfiles tree easily
Creating the zip file is not trivial, since we don’t have a runfiles
tree at all. We must not only archive every runfile at the right path,
but also add an __init__.py file to every directory. To make things
easier, we introduce a new feature in zipper (Bazel’s custom zip tool)
that lets users specify the actual path a file is archived at.
Zipper now supports the following semantics:
zipper cC x.zip [<zip_path>=][<file>]
$ zipper cC x.zip a/b/lib.py                      # Add file a/b/lib.py
$ zipper cC x.zip a/b/__init__.py=                # Add an empty file at a/b/__init__.py
$ zipper cC x.zip a/b/main.py=foo/bar/bin.py      # Add file foo/bar/bin.py as a/b/main.py
With the help of this feature, we can easily control the directory structure in the zip file.
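To make the mapping semantics concrete, here is a small Python emulation of the [<zip_path>=][<file>] argument form using the standard zipfile module. It is not zipper’s implementation; parse_spec and create_zip are hypothetical helper names, and the base_dir parameter is an assumption added so the example can run anywhere.

```python
import os
import zipfile


def parse_spec(spec):
    """Split '[<zip_path>=][<file>]' into (zip_path, source).

    source is None when the spec asks for an empty file ('path=').
    """
    if "=" in spec:
        zip_path, source = spec.split("=", 1)
        return zip_path, (source or None)
    # No '=': the file is archived under its own path.
    return spec, spec


def create_zip(output, specs, base_dir="."):
    """Write each spec into the archive at the path it names."""
    with zipfile.ZipFile(output, "w") as zf:
        for spec in specs:
            zip_path, source = parse_spec(spec)
            if source is None:
                zf.writestr(zip_path, "")  # empty file, e.g. __init__.py
            else:
                zf.write(os.path.join(base_dir, source), zip_path)
```

For the three commands above, parse_spec yields ("a/b/lib.py", "a/b/lib.py"), ("a/b/__init__.py", None) and ("a/b/main.py", "foo/bar/bin.py") respectively.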
2. Build the final python binary
The final Python binary is the zip file with a shebang prepended.
This lets us run the binary in the following two ways:
$ ./bazel-bin/foo/bar/bin
$ python ./bazel-bin/foo/bar/bin
The second way also works from the native Windows command line (cmd.exe). Since the zip file packages everything in the runfiles tree, it can be copied anywhere as a self-contained executable binary.
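A zip file with a shebang in front of it can be produced with a few lines of Python, which may help show why both invocations work. This is a sketch only, not how Bazel builds the binary: write_executable_zip is a hypothetical helper, and the shebang line used here is an assumption, since the exact line Bazel writes is not given above.

```python
import os
import stat
import subprocess
import sys
import tempfile
import zipfile


def write_executable_zip(output, main_source,
                         shebang=b"#!/usr/bin/env python\n"):
    """Write a shebang line followed by a zip archive containing __main__.py."""
    with open(output, "wb") as f:
        f.write(shebang)
        # The archive starts after the shebang; zipfile records offsets
        # relative to the start of the file, so readers still find the entries.
        with zipfile.ZipFile(f, "w") as zf:
            zf.writestr("__main__.py", main_source)
    # Mark the file executable so that ./bin works on Unix-like systems.
    mode = os.stat(output).st_mode
    os.chmod(output, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "bin")
    write_executable_zip(path, "print('running from the zip')\n")
    # Equivalent to the second invocation: python ./bin
    subprocess.check_call([sys.executable, path])
```

The interpreter recognizes the file as a zip archive regardless of its name or the leading shebang line, so passing it to python explicitly works even where the shebang itself is not honoured.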
This flag tells Bazel whether or not it should build a zip file as the final binary. By default, it’s enabled on Windows and disabled on other platforms.