Intro
Probably the best way to introduce this post is to explain a bit of my background, and then describe the problem I’m trying to solve.
Background
I have been using python for data analysis work since about 2017, so around 3 years at the time of writing this post. I work on a small team, and so it’s necessary for us to be able to share code for things like implementing business logic, or connecting to internal data sources. I also maintain an open source package called stats_can that can be used to access Statistics Canada datasets in python.
My current packaging approach
The current way my team shares code is by having a repository with a lib folder in it, and adding that folder to the PYTHONPATH environment variable in Windows.
The current way I build new versions of stats_can is through a cargo cult sequence of steps that I kind of sort of understand.
The problem
For the shared team library all of our stuff is basically in one giant package, broken up into subpackages. This leads to all sorts of problems:
- It’s very difficult to write tests for it.
- There’s no version numbering so it’s impossible to pin code at a particular version.
- We can’t share it easily with other teams, and we really can’t share just one particular subpackage of it with other teams.
- The whole thing just feels very wrong to me. I knew it wasn’t the way to go when I set it up, but I was very new to python and just didn’t have the experience/capacity to find a better way and it worked for the time being.
For stats_can my current system more or less works; it just has two problems:
- I only build conda packages. I’d like to allow pip users to access it but…
- Like I said, the build process is a bit of a house of cards that I barely understand, so adding another build step scares me.
Both of the examples described above are for libraries. I’ve built a couple of small apps, but have even less of an idea the correct way to build/deploy them.
What I’m hoping to do here
Basically I want to figure out the current best practice way to do the following:
- build a library package with versions that can be installed with pip and conda
- deploy those packages to both a privately hosted repository (for work specific stuff) as well as pypi and Anaconda Cloud or conda-forge for public open source stuff
- Originally I was also going to include building user facing (web or CLI) apps but this got pretty long already so I think I’m going to leave that for another post
- Ditto for CI/CD, linting, extensive testing, and all the other things that go into managing a project. Too big to include in this post.
So a library with conda and pip packages, hosted both publicly and privately, means four total ways to manage the library.
How this will progress
I find that most of the packaging guides I’ve read show either how to build a completely trivial project that demonstrates one narrow feature, or some giant project that’s a lot to take in all at once. My aim is to start from a single file script and gradually build it up to the final product that I laid out in the “What I’m hoping to do here” section. I’ll host the repositories for the library/app on GitHub, and use tags to mark the progress of the project through various stages.
The process
Preliminary setup
Create repository
The first step in any project is to make a repository. This one has the uncreative name of ianlibdemo. If you want to follow along at home you can clone it and check out the tag for the associated stage in the tutorial. The state of the repository right after being created in this case can be accessed with git checkout eg01
Set up environment
So I have somewhere to work from, and also to make this process reproducible for others, the next thing I have to do is create an isolated python environment to work in. I’m a conda user so I’ll create an environment.yml file:
name: ianlibdemo_conda_env
dependencies:
- python
Then I’ll create the environment with conda env create -f environment.yml.
There’s absolutely nothing to this environment, which is kind of the point.
Make my super sweet library
Enough talk! Let’s write some code! Well, actually, I’m not going to write any code. The point of this tutorial is to build a package, not write a super awesome library, so I’m just going to copy the demo project used in SciPy 2018 - the sheer joy of packaging. The original code is here. Basically what the module does is take a text file and output a copy with all the words capitalized (except a specified subset).
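To give a feel for what that behaviour amounts to, here is a minimal sketch of the idea. It is not the code from the SciPy tutorial: the function names, the special_words argument and the hard-coded word list are assumptions made for illustration (the real module reads its list from cap_data.txt).

```python
import tempfile
from pathlib import Path


def capitalize_line(line, special_words):
    """Capitalize each word in a line, except the special words."""
    words = []
    for position, word in enumerate(line.split()):
        # The first word is always capitalized; special words elsewhere stay lowercase
        if position > 0 and word.lower() in special_words:
            words.append(word.lower())
        else:
            words.append(word.capitalize())
    return " ".join(words)


def capitalize_file(in_path, out_path, special_words):
    """Write a capitalized copy of in_path to out_path, line by line."""
    lines = Path(in_path).read_text().splitlines()
    capitalized = [capitalize_line(line, special_words) for line in lines]
    Path(out_path).write_text("\n".join(capitalized) + "\n")


# Hypothetical word list, standing in for the contents of cap_data.txt
SPECIAL_WORDS = {"is", "the", "to"}

work_dir = Path(tempfile.mkdtemp())
(work_dir / "in.txt").write_text("this is the lowercase input sentence\n")
capitalize_file(work_dir / "in.txt", work_dir / "out.txt", SPECIAL_WORDS)
print((work_dir / "out.txt").read_text())
```

Words with punctuation attached (like "is,") won't match the word list in this sketch, so they get capitalized like any other word.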
In the root directory of the repository I’ll copy the capital_mod.py file and cap_data.txt. I’ll also create an example_in.txt file that I can use to manually test the capitalize function.
Now I have the following files in my repository:
$ ls
__pycache__/ capital_mod.py example_in.txt LICENSE
cap_data.txt environment.yml README.md
I can test the “package” out from the interactive prompt:
$ python -i
Python 3.8.3 (default, May 19 2020, 06:50:17) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import capital_mod
>>> capital_mod.get_datafile_name()
WindowsPath('C:/Users/ianep/Documents/ianlibdemo/cap_data.txt')
>>> capital_mod.capitalize("example_in.txt", "example_out.txt")
>>> quit()
Everything looks like it ran fine, and if I check in the directory I have a file example_out.txt that is indeed a capitalized version of example_in.txt. If you want to get your repository to this point run git checkout eg02.
So everything works great and we can go home, right?
Run into problems
This is all well and good, but I don’t just want to use this functionality in this folder. The idea is that this is a utility library. Presumably there are all sorts of scripts that I want to add this file capitalization capability to. Maybe I have coworkers I want to share this with, or use it in an app I’m building. As it stands how can I accomplish this?
Some bad ways to solve the problem
Just copy the file everywhere
Fine. It only works from the local directory? I’ll just put a copy of it everywhere I want it. This is pretty clearly a bad idea. It will be annoying to copy the file into every location I might want to use it, if I ever have to update the functionality I will then have to track down every instance of that file and make the change repeatedly, and it violates DRY so any experienced developer that sees me do it will make fun of me. Better not do it this way.
Add it to the path
This is already going to be a really long guide so I don’t want to add too much about the python path directly. This guide by Chris Yeh is the best I’ve found on the python path and import statements, so if you’re curious by all means check that out. Briefly though, let’s demonstrate the two ways we could directly add this “package” to the path, and therefore run it without being in the same directory.
To set the stage I’ve created a new directory separate from the package, and created a text file that I will try and capitalize:
(ianlibdemo_conda_env) Ian@terra ~/Documents/demo_tmp
$ ls
demo_in.txt
(ianlibdemo_conda_env) Ian@terra ~/Documents/demo_tmp
$ cat demo_in.txt
i want to capitalize this text file, but it's in the wrong folder. oh no!
If I just try and do the same steps I did from within the folder it will fail:
>>> import capital_mod
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'capital_mod'
That’s because the folder with capital_mod.py is not on my path.
One way I can solve this is by adding the path to capital_mod.py to my path. Like so:
$ export PYTHONPATH="/c/Users/Ian/Documents/ianlibdemo"
(ianlibdemo_conda_env) Ian@terra ~/Documents/demo_tmp
$ python -i
Python 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 07:34:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import capital_mod
>>> capital_mod.get_datafile_name()
WindowsPath('C:/Users/Ian/Documents/ianlibdemo/cap_data.txt')
>>> capital_mod.capitalize("demo_in.txt", "demo_out.txt")
>>> quit()
(ianlibdemo_conda_env) Ian@terra ~/Documents/demo_tmp
$ cat demo_out.txt
I Want to Capitalize This Text File, But It's In the Wrong Folder. Oh No!
This worked, but I don’t want to have to run that export command every time before I run a script, and sharing this code with other people and telling them to do that every time seems like a hassle. There are ways to permanently add folders to your python path. This guide covers them nicely. But we’re not actually going to go this route so let’s move on.
The slightly less hacky way is to use sys.path from within a python script. Back in my demo directory I can write a python script that looks like this:
import sys
sys.path.append(r"C:\Users\Ian\Documents\ianlibdemo")
import capital_mod
capital_mod.capitalize("demo_in.txt", "demo_out.txt")
We can see that this works as well:
(ianlibdemo_conda_env) Ian@terra ~/Documents/demo_tmp
$ ls
demo_in.txt syspathdemo.py
(ianlibdemo_conda_env) Ian@terra ~/Documents/demo_tmp
$ python syspathdemo.py
(ianlibdemo_conda_env) Ian@terra ~/Documents/demo_tmp
$ ls
demo_in.txt demo_out.txt syspathdemo.py
(ianlibdemo_conda_env) Ian@terra ~/Documents/demo_tmp
$ cat demo_out.txt
I Want to Capitalize This Text File, But It's In the Wrong Folder. Oh No!
This also worked, but I had to import sys, and I had to know the exact path to the library. It’s going to be annoying to have to put that in every script, and if I try and share this code with anyone else they’re going to have to modify it to point to wherever they’ve saved my library code.
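If you want to see the mechanism in isolation, the sketch below reproduces it without touching the real library. The temporary folder and the module name path_demo_mod are hypothetical stand-ins: the module can't be found until its folder is appended to sys.path, and can be imported afterwards.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the library folder: a temp dir holding one module
lib_dir = Path(tempfile.mkdtemp())
(lib_dir / "path_demo_mod.py").write_text("def greet():\n    return 'hello'\n")


def can_import(name):
    """Return True if Python can currently locate a top-level module called name."""
    importlib.invalidate_caches()
    return importlib.util.find_spec(name) is not None


found_before = can_import("path_demo_mod")  # folder is not on sys.path yet
sys.path.append(str(lib_dir))
found_after = can_import("path_demo_mod")  # now the import system can see it

import path_demo_mod

print(found_before, found_after, path_demo_mod.greet())
```

The change only lasts for the running interpreter, which is exactly why every script that wants the library has to repeat it.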
Get hypermodern
As I was working on this guide I discovered a series of articles by Claudio Jolowicz called Hypermodern Python. The series is an opinionated (in a good way) look at how to configure a python project in 2020. It’s excellent and well worth a read, but I can’t completely adopt its recommendations for two related reasons. The first is that it assumes you’re either using a *NIX system or can load WSL2 on your Windows machine. For my work setup neither of those assumptions holds. It also assumes you’re working in the standard python ecosystem and therefore doesn’t reference conda either for environment management or packaging. For the remainder of this guide I’m going to try and follow Claudio’s suggestions where possible, but adapt them to incorporate conda.
Turn our code into a poetry package
Poetry seems to be the current best practice for building python packages. Let’s see if we can get it working with conda.
Poetry init
After adding poetry as a dependency to my conda environment and updating the environment I run poetry init:
$ poetry init
This command will guide you through creating your pyproject.toml config.
Package name [ianlibdemo]:
Version [0.1.0]:
Description []: Python packaging - how does it work?
Author [[Ian Preston] <17241371+ianepreston@users.noreply.github.com>, n to skip]: Ian Preston
License []: GPL-3.0-or-later
Compatible Python versions [^3.7]:
Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your dev dependencies (require-dev) interactively (yes/no) [yes] no
Generated file
[tool.poetry]
name = "ianlibdemo"
version = "0.1.0"
description = "Python packaging - how does it work?"
authors = ["Ian Preston"]
license = "GPL-3.0-or-later"
[tool.poetry.dependencies]
python = "^3.7"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
Do you confirm generation? (yes/no) [yes] yes
At the end of this process I have a pyproject.toml file in the root of my repository with the text listed above inside.
src layout
The root folder of this repository is getting crowded. I’ve got various files that either describe the project or the environment I’m supposed to work on it in intermingled with the actual source code for the package. To address this I’ll make a separate folder for the actual package files, and as recommended by hypermodern python I’ll use src layout.
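As a rough picture of what that means on disk, the sketch below builds the skeleton in a temporary folder and lists it. The exact file set is an assumption: I'm inferring src/ianlibdemo/ from the package name, and guessing that cap_data.txt moves alongside the module since the module looks for it next to itself. Check out the repository tags for the real layout.

```python
import tempfile
from pathlib import Path


def make_src_layout(root, package="ianlibdemo"):
    """Create an empty src-layout skeleton under root and return its file list."""
    package_dir = Path(root) / "src" / package
    package_dir.mkdir(parents=True)
    # Package code and its data file live under src/<package>/ (assumed file set)
    for file_name in ("__init__.py", "capital_mod.py", "cap_data.txt"):
        (package_dir / file_name).touch()
    # Project metadata stays in the repository root
    (Path(root) / "pyproject.toml").touch()
    return sorted(
        path.relative_to(root).as_posix()
        for path in Path(root).rglob("*")
        if path.is_file()
    )


layout = make_src_layout(tempfile.mkdtemp())
for entry in layout:
    print(entry)
```

The point of the extra src level is that the package can no longer be imported by accident just because you happen to be sitting in the repository root; it has to be installed first.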
poetry install
The last step for a basic install is to use poetry to install the package into the environment. Since poetry 1.0 it should be able to detect conda environments and do its installation directly into them based on this PR.
$ poetry install
Updating dependencies
Resolving dependencies... (0.1s)
Writing lock file
No dependencies to install or update
- Installing ianlibdemo (0.1.0)
Seems to work, let’s try that old example that wouldn’t run before:
(ianlibdemo_conda_env) e975360@N2012 /c/tfs/text_demo
$ ls
example_in.txt
(ianlibdemo_conda_env) e975360@N2012 /c/tfs/text_demo
$ python -i
Python 3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from ianlibdemo.capital_mod import capitalize
>>> capitalize("example_in.txt", "example_out.txt")
>>> quit()
(ianlibdemo_conda_env) e975360@N2012 /c/tfs/text_demo
$ cat example_in.txt
these words will all get capitalized, except the ones in that super special text file, like is, or, and a.
(ianlibdemo_conda_env) e975360@N2012 /c/tfs/text_demo
$ cat example_out.txt
These Words Will All Get Capitalized, Except the Ones In That Super Special Text File, Like Is, Or, And A.
Magic! Note that I have to do one more layer of importing from the ianlibdemo package whereas before I was directly importing the capital_mod module, but otherwise we’re gold.
Of course this hasn’t really solved my problem yet, I still don’t have an actual package that other people can install. But still, progress!
poetry build
It turns out that making it to the previous step was essentially all I needed to create a pip installable package. Just running poetry build from the root of the repository creates a dist folder containing an sdist and a wheel.
test the build
Having built this package, how would I install it?
To start the test I’ll create a new empty conda environment and make sure I can’t import the ianlibdemo package.
$ conda create -n pyonly python
...
$ conda activate pyonly
$ python -i
>>> import ianlibdemo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'ianlibdemo'
This verifies that I have a clean environment without that package installed. I can use pip to install it like so:
$ pip install /c/tfs/ianlibdemo/dist/ianlibdemo-0.1.0-py3-none-any.whl
$ python -i
Python 3.8.3 (default, May 19 2020, 06:50:17) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright"
>>> import ianlibdemo
The import ran successfully. I haven’t done a lot of validation that the package works the way I’d expect, but I’ll get to that when we set up testing later. Note that I installed the package using the .whl file that the build process created, but I could have also used the .tar.gz file in the same folder just as easily.
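A wheel is just a zip archive, so if you're curious what actually ended up in the package you can peek inside it with the standard library. The sketch below is self-contained, so it builds a tiny stand-in archive first; the file names inside it are assumptions for illustration, not a listing of the real poetry output. Pointing list_wheel_contents at the real file in dist should work the same way.

```python
import tempfile
import zipfile
from pathlib import Path


def list_wheel_contents(wheel_path):
    """Return the sorted names of the files stored inside a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(wheel.namelist())


# Stand-in archive so the example runs anywhere; not a real poetry build
demo_wheel = Path(tempfile.mkdtemp()) / "ianlibdemo-0.1.0-py3-none-any.whl"
with zipfile.ZipFile(demo_wheel, "w") as archive:
    archive.writestr("ianlibdemo/__init__.py", "")
    archive.writestr("ianlibdemo/capital_mod.py", "")
    archive.writestr("ianlibdemo-0.1.0.dist-info/METADATA", "Name: ianlibdemo\n")

contents = list_wheel_contents(demo_wheel)
for name in contents:
    print(name)
```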
Since we’ve now built a working package this seems like another good place for a checkpoint. To see the state of the project at this point you can run git checkout eg03.
Automate testing
This is already going to be a big post so I’m definitely not going to offer extensive notes on testing, but I’d like to include enough to at least ensure it integrates with the rest of the process, and to save manually testing after each step.
Add a pytest dependency
We want to use pytest for testing, so the first step is to add it as a development dependency. Normally this would be a simple one liner, poetry add --dev pytest, but because of this bug between conda and poetry, at least at the time of this writing I had to install an update of msgpack before I could get it to run. I’ve amended the environment.yml file to include this fix, so between that and hopefully this bug being resolved in time this shouldn’t be an issue for anyone else following this guide. I just wanted to flag what I encountered and how I resolved it.
Write tests
Now in the base of the repository we add a tests folder and add an empty __init__.py file and a test_capitalize.py file. The test file looks like this:
from ianlibdemo import capital_mod


def test_capitalize_file(tmp_path):
    in_file = tmp_path / "in_file.txt"
    in_content = "this is the lowercase input sentence"
    in_file.write_text(in_content)
    out_file = tmp_path / "out_file.txt"
    out_content = "This is the Lowercase Input Sentence\n"
    # Output shouldn't exist before we call the function
    assert not out_file.exists()
    capital_mod.capitalize(in_file, out_file)
    assert out_file.exists()
    assert out_file.read_text() == out_content
Now from the base directory of the repository I can run pytest with poetry run pytest.
To see the project at this stage run git checkout eg04.
Publish to pypi
I’ve built a package and I can test that it works; the next step is to publish it somewhere for others to access. The de facto source for python packages is PyPI. However, since this is just a demo package I don’t want to publish it there, since it will just add clutter. Fortunately, there is a similar location designed exactly for testing out publishing packages, appropriately named Test PyPI.
Set up for publishing
In order to publish packages you need an account. The registration process is straightforward. Note that pypi and test pypi use completely separate databases, and you will need an account for each of them. For now we’re just publishing to test pypi so it’s not an issue, but just something to keep in mind.
Next I want to create an API token. You can just use your username and password to authenticate and publish packages, but tokens are the preferred method. Once you’re logged in you can click on your account, go to account settings, and under API tokens click “add API token”. Give it a descriptive name and save it somewhere secure (I put mine in a LastPass note). As they warn on the page it will only be displayed once, and if you lose it you’ll have to delete it and create a new one.
Now we need to set up the test pypi repository in poetry. From the poetry docs you can see that repositories are added to your poetry config:
poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry config pypi-token.testpypi <your api key>
Note that these configurations are global to poetry, so they’re not saved in your repository. If you switch machines, or (I think) change conda environments since we installed poetry with conda you’ll have to redo these configurations.
Publish
Once this is set up publishing is quite straightforward. If you haven’t already built the package do so with poetry build and then run poetry publish --repository testpypi.
Look at that! There it is!
Pull it back down and test
Let’s just make sure that all worked.
First make a clean conda environment with just pytest for testing and activate it:
conda create -n test_env pytest
conda activate test_env
Navigate to the root of your package folder and try running tests. They should fail, because we don’t have the package installed in this environment:
cd ~/Documents/ianlibdemo
pytest
.
.
.
tests\test_capitalize.py:1: in <module>
from ianlibdemo import capital_mod
E ModuleNotFoundError: No module named 'ianlibdemo'
Now pip install that package and try running tests again:
pip install --index-url https://test.pypi.org/simple/ ianlibdemo
pytest
.
.
.
======================== 1 passed, 1 warning in 0.09s =========================
Looks good!
Publish to a private repository
Not all of the code we develop should be published on the public internet. Some of it you just want accessible to an internal team. I have a private package index running using this docker container - setting that up will be its own post. Once you have that all set up though the process is exactly the same as for the public pypi so I’ll leave it at that for this guide.
None of the steps used to publish this package required changes to the library repository, so you can still use git checkout eg04 to view the state of the repository at this point.
Adding dependencies
One thing I realized I should ensure is that all of this works with libraries that depend on other libraries. Let’s add a dependency on pandas and give that a shot.
Fortunately adding a dependency is easy. Since I want to require pandas I just run poetry add pandas and it’s now a dependency. I added a module called fun_pandas and a test for it in my test suite. After that I rebuilt the package and uploaded it to a repository as described above, pulled it back down and tested it like before and everything worked! It’s nice when that happens.
To see the project at this stage you can run git checkout eg05.
Now do conda
The next thing I want to work out is how to build a conda package. The first step is to add conda-build to my environment. The next step is to define a meta.yaml file to specify how to do the build.
Sort of working build
Rather than just dump the final working file, I think it will be useful to step through from the first version I got working to the final one I’m happy with. A lot of the steps for setting this up are hacky, so seeing what doesn’t work is as important as seeing what does for people who are trying to figure out how to apply this to their own projects.
Here’s the first version of my meta.yaml that actually built:
{% set name = "ianlibdemo" %}
{% set version = "0.2.0" %}
package:
  name: "{{ name|lower }}"
  version: "{{ version }}"
source:
  path: ./dist/{{ name }}-{{ version }}-py3-none-any.whl
build:
  script: "{{ PYTHON }} -m pip install ./{{ name }}-{{ version }}-py3-none-any.whl --no-deps --ignore-installed -vv "
requirements:
  host:
    - pip
    - python
  run:
    - python
    - pandas
test:
  imports:
    - ianlibdemo
From an environment with conda-build installed I can build a package by running conda-build . from the base of the repository. It creates a conda package as a tar.bz2 file in a deeply nested directory. From there I can install it into an environment with something like:
conda install /c/Users/e975360/.conda/envs/conda_build_test/conda-bld/win-64/ianlibdemo-0.2.0-py38_0.tar.bz2
Running pytest in an environment with that package installed resulted in one passed test and one failure for the one requiring pandas. As we’ll see below, that issue will get solved if I can load it to a package repository so I’ll leave that alone at this point.
Issues with this build
- First off, note that it’s called meta.yaml not meta.yml. Despite .yml being the common and preferred extension for this file type (see this SO thread) it has to end with .yaml or conda-build can’t find it.
- Also note that I’m pointing it to the .whl file that I built with poetry, rather than the .tar.gz that’s in the same folder. In theory I should be able to do either, and most examples online point to .tar.gz files, but I got errors about not having poetry in my build environment, and when I tried to add poetry I got a version conflict because apparently the main conda repository only has the python2.7 version of poetry and… it just seemed easier to use the .whl.
- It makes a build that claims to be specific to windows and python 3.8 when in fact this should run on any OS and any python 3.
- I have to repeat the file name in two places.
- I’m specifying the version number in two places now since it’s already in the pyproject.toml file. There’s a risk of them getting out of sync.
- Similar to the version number I have to specify dependencies in this file as well as pyproject.toml (pandas in this case). Unfortunately, since conda packages can have slightly different names than their pypi counterparts, and I have to actually specify python as a dependency here I don’t think there’s an automated way to keep these in sync. Fortunately I don’t expect dependencies to change as often as the package version so this will be less of a burden to manage.
- To do anything with the created package I have to scroll up through a big install log and find the path to the file.
- I get a bunch of build environments and intermediate files created on my machine (maybe this is why the build guide suggests using docker).
Fixing the issues
Setting the build to work for any OS and python is an easy fix. Under the build section you just add one line. The build section now looks like this:
build:
  noarch: python
  script: "{{ PYTHON }} -m pip install ./{{ name }}-{{ version }}-py3-none-any.whl --no-deps --ignore-installed -vv "
Defining the package file in one place is similarly easy. Jinja lets you concatenate variables with the ~ symbol. The updated relevant section looks like this:
{% set name = "ianlibdemo" %}
{% set version = "0.2.0" %}
{% set wheel = name ~ "-" ~ version ~ "-py3-none-any.whl" %}
package:
  name: "{{ name|lower }}"
  version: "{{ version }}"
source:
  path: ./dist/{{ wheel }}
build:
  noarch: python
  script: "{{ PYTHON }} -m pip install ./{{ wheel }} --no-deps --ignore-installed -vv "
Adding a Makefile
The rest of the issues outlined above aren’t directly the result of the meta.yaml file. To resolve them I’ll need to write some scripts, and to tie that all together I’ll use my good friend Make.
To begin I add some boilerplate to the beginning of the file to handle conda environments
# Oneshell means I can run multiple lines in a recipe in the same shell, so I don't have to
# chain commands together with semicolon
.ONESHELL:
# Need to specify bash in order for conda activate to work.
SHELL=/bin/bash
# Note that the extra activate is needed to ensure that the activate floats env to the front of PATH
CONDA_ACTIVATE=source $$(conda info --base)/etc/profile.d/conda.sh ; conda activate ; conda activate
ENV_NAME = ianlibdemo_conda_env
Next I create a python script that will read the version number from pyproject.toml and update the version in meta.yaml with it. I won’t reproduce that script here but it’s in the scripts folder of the ianlibdemo repository.
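For readers who don't want to go digging, here is a rough sketch of how such a sync could work. To be clear, this is not the script from the repository: the function names and the regex approach are assumptions, and the 0.3.0 version is made up for the demo. It works on text so it's easy to try out; a real script would read and write the two files.

```python
import re

# Matches the top-level version line in pyproject.toml, e.g. version = "0.1.0"
PYPROJECT_VERSION = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)
# Matches the Jinja assignment in meta.yaml, e.g. {% set version = "0.2.0" %}
META_VERSION = re.compile(r'(\{%\s*set\s+version\s*=\s*")[^"]+(")')


def read_pyproject_version(pyproject_text):
    """Pull the package version out of the text of a pyproject.toml file."""
    match = PYPROJECT_VERSION.search(pyproject_text)
    if match is None:
        raise ValueError("no version entry found in the pyproject.toml text")
    return match.group(1)


def set_meta_version(meta_text, version):
    """Return the meta.yaml text with its version assignment replaced."""
    new_text, count = META_VERSION.subn(
        lambda m: m.group(1) + version + m.group(2), meta_text
    )
    if count != 1:
        raise ValueError(f"expected one version line in meta.yaml, found {count}")
    return new_text


# Made-up file contents for the demo
pyproject_text = '[tool.poetry]\nname = "ianlibdemo"\nversion = "0.3.0"\n'
meta_text = '{% set name = "ianlibdemo" %}\n{% set version = "0.2.0" %}\n'

synced = set_meta_version(meta_text, read_pyproject_version(pyproject_text))
print(synced)
```

A regex is a blunt tool compared to a proper TOML parser, but it keeps the sketch free of extra dependencies in the build environment.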
Finally I add a target to sync the versions. I can then make that a pre-requisite of building the conda package.
.PHONY: versionsync
versionsync:
	$(CONDA_ACTIVATE) $(ENV_NAME)
	python scripts/version_sync.py
Marking a target as .PHONY means it should be run each time it’s called. By default Make won’t redo a target if a file with that name already exists and is up to date.
Now running make versionsync from the root of the repository will take the version from pyproject.toml and put it in meta.yaml. Eventually I’ll also want to ensure that the python package has been built by poetry before building the conda package.
PS: I documented how you can activate conda environments from within makefiles and bash scripts here. Since I had to refer back to it when doing this I thought it would be helpful to include a pointer.
The next issue I described above is that running conda-build generates the package in some obscure subdirectory and you have to scroll back up through the log file to find it. If I want to upload the package to a repository or install it directly that’s going to be a hassle. Fortunately conda-build comes with a --output flag that you can run to return where your package file would be saved if you actually ran conda-build. Knowing this I can write a small bash script which first builds the package and then uses the --output flag to find the generated package and copy it into my dist directory.
The new part of the Makefile looks like this:
conda:
	$(CONDA_ACTIVATE) $(ENV_NAME)
	bash scripts/conda_build.sh
And the bash script looks like this:
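#!/bin/bash
# Build the package, then ask conda-build where it put the result
conda-build .
CONDA_PACK=$(conda-build . --output)
# Quote the path in case it contains spaces
cp "$CONDA_PACK" dist/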
I’m going to make a cleanup function later to remove all build artifacts so we’ll leave that alone for now.
Publish to a public channel
To publish to an external public conda channel I have to install the anaconda-client package in my environment. The first time I do an upload I will need to log in with anaconda login and provide my username and password.
After that I can add a new recipe to my makefile to publish the package:
conda_ext_pub: conda
	$(CONDA_ACTIVATE) $(ENV_NAME)
	anaconda upload $$(conda-build . --output)
conda_ext_pub depends on the conda recipe so this will build the package first and then upload it to Anaconda.org. After running make conda_ext_pub I can see that the package was indeed published to Anaconda.org:
As with the previous installations I can create a new blank environment with just pytest installed, install this package into it with conda install -c ian.e.preston ianlibdemo and now both my tests pass, as pandas is installed as well.
Publish to a private channel
As with the other private repository, actually setting up the repository is outside the scope of this post. This will assume that you have one created and that packages are stored on some sort of file share that you can access from your build machine. There’s no fancy way to publish conda packages to a private repository. You just drop the package file in the appropriate architecture subfolder (noarch in this case since this is a pure python package) and then run conda index on the repository folder. My server has a file watcher that detects changes and auto runs that, so all we have to do to publish a package is to make sure it’s in the right place. In this example the file share from my local machine is at \\r4001\finpublic\FP&A\channel_test\noarch and the web server is available at http://dml01:8081/.
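Since "publishing" here is really just a file copy, the same step is easy to express in python if you'd rather not rely on shell tools. This is a sketch only: publish_to_channel is a hypothetical helper, the temp folders stand in for the build output and the file share, and it doesn't run conda index (my server's file watcher takes care of that; without one you'd need to run it yourself).

```python
import shutil
import tempfile
from pathlib import Path


def publish_to_channel(package_file, channel_root, subdir="noarch"):
    """Copy a built conda package into the channel's architecture subfolder."""
    target_dir = Path(channel_root) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the new copy inside target_dir
    return Path(shutil.copy2(package_file, target_dir))


# Stand-ins for the real build output and file share so the example runs anywhere
build_dir = Path(tempfile.mkdtemp())
channel_root = Path(tempfile.mkdtemp())
package_file = build_dir / "ianlibdemo-0.2.0-py_0.tar.bz2"
package_file.write_bytes(b"not a real package")

published = publish_to_channel(package_file, channel_root)
print(published.relative_to(channel_root).as_posix())
```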
To set up publishing I add the following to my makefile:
CONDA_LIB_DIR = //r4001/finpublic/FP\&A/channel_test/noarch
.
.
.
conda_int_pub: conda
	$(CONDA_ACTIVATE) $(ENV_NAME)
	cp $$(conda-build . --output) $(CONDA_LIB_DIR)
After that I can install the package into an environment by running conda install -c http://dml01:8081 ianlibdemo.
To see the project at this stage you can run git checkout eg06.
Put it all together
All of the pieces are here, so the final thing to do is to put them all together. I started that process in the last section by creating a makefile, now I just have to finish it up by tying the pip packaging and publishing in with the conda packaging and publishing.
Clean slate
After a package file is built and published we don’t really have any further need for it locally, but it’s not automatically deleted. Let’s make a clean task in Make that will clear out any previous builds. That way any new process can start fresh.
The clean task looks like this:
clean:
	# remove pip packages
	rm -rf ./dist/*
	# remove conda packages and build artifacts
	$(CONDA_ACTIVATE) $(ENV_NAME)
	bash scripts/conda_clean.sh
and conda_clean.sh looks like this:
#!/bin/bash
export CONDA_BLD_PATH=${CONDA_PREFIX}/conda-bld
# Quote the path in case it contains spaces
rm -rf "$CONDA_BLD_PATH"
Full build chain
The last step is to add make tasks to build and publish the pip packages and set them as appropriate dependencies for the conda steps.
First, add a task to build the pip installable package:
pip: clean
	$(CONDA_ACTIVATE) $(ENV_NAME)
	poetry build
Next add tasks to publish to external and internal pip sources:
pip_ext_pub: pip
	$(CONDA_ACTIVATE) $(ENV_NAME)
	poetry publish --repository testpypi
pip_int_pub: pip
	$(CONDA_ACTIVATE) $(ENV_NAME)
	poetry publish --repository localpypi
Finally as an example we can make wrapper tasks that will publish pip and conda packages to external/internal sources:
all_int_pub: pip_int_pub conda_int_pub
	echo "publishing to internal conda and pip repository"
all_ext_pub: pip_ext_pub conda_ext_pub
	echo "publishing to external conda and pip repository"
At this point if you want to build and publish your package you can just run make all_int_pub and it will clear out old build artifacts, build a new pip installable package, upload it to the internal pip package repository, sync the version number with conda, build a conda package and publish that to the internal conda package repository. Not bad!
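If it's not obvious why the steps happen in that order, the sketch below walks the prerequisite chain the way a serial Make run roughly does: build everything a target depends on first, and run each target only once. It's a simplified model, not a Make reimplementation, and the prerequisites listed for the conda target (versionsync and pip) are an assumption based on the description above rather than something shown in the snippets in this post.

```python
def build_order(target, prerequisites):
    """Return the order targets would run in, prerequisites first, each once."""
    order = []

    def visit(name):
        if name in order:
            return  # already scheduled by an earlier target
        for dependency in prerequisites.get(name, []):
            visit(dependency)
        order.append(name)

    visit(target)
    return order


PREREQUISITES = {
    "all_int_pub": ["pip_int_pub", "conda_int_pub"],
    "pip_int_pub": ["pip"],
    "pip": ["clean"],
    "conda_int_pub": ["conda"],
    # Assumption: the finished Makefile makes conda depend on these two
    "conda": ["versionsync", "pip"],
}

steps = build_order("all_int_pub", PREREQUISITES)
print(" -> ".join(steps))
```

Note that pip only shows up once even though two targets depend on it, which is what keeps the clean step from wiping out the wheel halfway through.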
This concludes the changes I’m planning to make in this repository. If you just clone the repository as is you should see it in this state, or you can run git checkout eg07.
Conclusion and next steps.
This guide demonstrated how to turn some python code into an installable package, and distribute that package to internal and external users via pip or conda. At the end of this you should be able to reproduce this process for your own project. But there’s always more to do, so what are some next steps to think about?
First of all, a lot of what we’ve done to set this project up would be broadly applicable to any library built under similar circumstances. It’d be a shame to have to rewrite or copy paste that Makefile into every library you build with minor alterations for example. It would be a good idea to use a templating tool like cookiecutter to automate the files and folder structure that will be consistent across projects. Stay tuned, I’m working on putting that together next.
Next, there’s still lots of aspects of developing and maintaining a library that we haven’t touched. Things like linting, testing, coverage reporting… Take a look at the rest of the Hypermodern Python series for some ideas there.
Finally, I haven’t described how you actually set up an internal package repository for conda or pip packages. I’ll have a follow up post on that coming soon too.
Resources I’ve consulted
This section will serve as a link dump for things I’ve referenced while going through this process. In no particular order they are: