
Boost Your Coding Productivity with this Tool [Tutorial + Code]




Whether you are creating your portfolio of projects to show to the world or working on a team, you want to ensure that your coding style is high quality and well structured. Luckily there is a tool that can help with this, called pre-commit.


In short, pre-commit allows you to identify (and, in some cases, automatically fix) simple issues in your code before making a commit. In addition, you can add different plugins, called hooks, to your pre-commit pipeline and use them for several programming languages. Pre-commit seems to be gaining traction in the data science community, since well-known Python packages such as pandas, scikit-learn and seaborn are using it.


In this tutorial, you will learn the following:

  • Six plugins suitable for a data science project;

  • How to install pre-commit in your repository;

  • How to use pre-commit while coding.

This tutorial will focus primarily on Python, but you can set up such a tool for other languages. You should also have a basic understanding of Git; it is assumed that you already have Git installed on your local machine.


You will find the GitHub repository used for this tutorial here, and all pre-commit configuration lives in a single file called .pre-commit-config.yaml. In our case, it contains six plugins that check all created or modified files every time a git commit is made; unless all checks pass, no code is committed.

 

Let's start with the plugins which we will be using:


1) Black

Black is a code formatter that automatically rewrites your code in a consistent style that complies with PEP 8, the Python style guide. You can look at the black playground to see black in action. The current setup will format both Python scripts and Jupyter notebooks.


2) Flake8

Flake8 is a wrapper around the following three tools:

  • pyflakes which checks Python source files for errors;

  • pycodestyle (formerly known as pep8), a Python style checker that tests code against PEP 8 conventions;

  • mccabe, Ned Batchelder's script for measuring McCabe (cyclomatic) code complexity.

Contrary to black, Flake8 only reports the issues it finds in your code; you have to fix them manually. Flake8 can be rather demanding, since fixing the issues highlighted by this hook may take a non-negligible amount of time. For this reason, here are three options to silence violations:

  1. files that contain the line # flake8: noqa will not be checked at all (I would recommend placing it at the top of your script);

  2. placing # noqa: C901 at the end of the line containing a function's def silences the complexity warning (C901) for that function, whereas # flake8: noqa: C901 on a line of its own ignores that code for the whole file;

  3. you can ignore specific errors on a line by ending it with # noqa: <ERROR_1>,<ERROR_2>,..,<ERROR_N>, e.g., # noqa: D104,D100,D202 (the error codes will pop up in the terminal once you run pre-commit).
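As a minimal sketch of where the line-level comments go (the function, the constant, and the error codes chosen here are illustrative assumptions, not taken from the tutorial repository):

```python
"""Illustrative module showing where flake8 'noqa' comments go."""


def describe(value: int) -> str:  # noqa: C901
    # C901 ("too complex") is reported on the 'def' line, so the comment
    # above silences it for this function only. Note that flake8 only
    # raises C901 when it is run with --max-complexity set.
    if value < 0:
        return "negative"
    if value == 0:
        return "zero"
    if value < 10:
        return "small"
    return "large"


# E501 ("line too long") is silenced for this single line only.
LONG_MESSAGE = "this string is deliberately long so that the line exceeds the default 79-character limit"  # noqa: E501

print(describe(5), len(LONG_MESSAGE) > 79)
```

Listing explicit codes after noqa is usually preferable to a bare # noqa, because any other violation on the same line is still reported.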


3) Pre-commit hooks

pre-commit provides several hooks out-of-the-box. Below is the setting which I've been using so far:

-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.2.0
    hooks:
    - id: check-yaml
    - id: sort-simple-yaml
    - id: trailing-whitespace
    - id: end-of-file-fixer
    - id: check-added-large-files
      args: ['--maxkb=1000']
    - id: check-json
    - id: fix-encoding-pragma
    - id: detect-private-key
    - id: detect-aws-credentials
      args: ['--allow-missing-credentials']
    - id: pretty-format-json
      args: ['--autofix', '--indent=2', '--no-sort-keys']
    - id: name-tests-test
      args: ['--pytest']

The configuration should be self-explanatory, but if you want to know more or come up with your own configuration, you can refer to this link.
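To give a feel for what the simplest of these hooks do, here is a rough standard-library approximation of trailing-whitespace and end-of-file-fixer combined. It is only a sketch of the idea, not the hooks' actual implementation, and the function name fix_whitespace is made up for illustration.

```python
def fix_whitespace(text: str) -> str:
    """Strip trailing whitespace per line and end the text with one newline."""
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop blank lines at the end, then terminate with a single newline.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


messy = "x = 1   \ny = 2\t\n\n\n"
print(repr(fix_whitespace(messy)))  # 'x = 1\ny = 2\n'
```

Like the real hooks, the sketch is idempotent: running it on already clean text returns the text unchanged, which is why a second commit attempt passes.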


4) Pytest

If you have unit tests in your repository designed using pytest, you can include this local hook shown below:

-   repo: local
    hooks:
     -  id: tests
        name: pytest
        entry: pytest
        pass_filenames: false
        language: system
        types: [python]
        stages: [commit]

This hook lets you promptly detect whether a change breaks any code covered by your pytest tests. Running the tests locally is much faster and cheaper than committing first and waiting for your CI/CD tool to run pytest on the server. Note that you need to have pytest installed in your environment to run this hook.


5) mypy

mypy is a static type checker for Python. It relies on type hints, which also improve the readability of your functions by stating which input and output types are expected; for example, you would go


From:

def myfunction(input):
    # do something...
    return output

To:

def myfunction(input: float) -> list[float]:
    # do something...
    return output

In the second case, you know that myfunction expects a float as input and returns a list of floats as output.
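A small runnable variant of the same idea is sketched below; the function repeat is a hypothetical example, not part of the tutorial repository. It also shows why a static checker is useful: Python itself ignores type hints at runtime.

```python
from __future__ import annotations  # allows list[float] in hints on older Pythons


def repeat(value: float, times: int) -> list[float]:
    """Return a list containing `value` repeated `times` times."""
    return [value] * times


print(repeat(1.5, 3))  # [1.5, 1.5, 1.5]

# Python does not enforce hints at runtime, so this call runs without error
# even though "a" is not a float. This is the kind of mismatch a static
# type checker such as mypy is designed to report before the code runs.
print(repeat("a", 2))  # ['a', 'a']
```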


6) Dockerfilelint

This hook checks whether there are any issues in the Dockerfiles living in the repository.

- repo: https://github.com/pryorda/dockerfilelint-precommit-hooks
  rev: v0.1.0
  hooks:
  - id: dockerfilelint
    stages: [commit]

If you don't use Docker, you can simply delete this hook from the config file.

 

Let's now see how to install pre-commit.


After cloning this repository on your machine, you can create a conda environment from the requirements stored in environment.yml as follows (you can also use a virtual environment with pip):

$ conda env create --name my-env --file environment.yml

Once the environment is installed, activate it via

$ conda activate my-env

Then install the Git hooks defined in .pre-commit-config.yaml (described in the previous section):

$ pre-commit install

Note 🚨: you only need to run the command above once per clone. pre-commit reads .pre-commit-config.yaml on every commit, so hooks that you add, remove, or change afterwards are picked up automatically.


✅ And that's it! At this point, you can check that pre-commit is available in your environment by checking its version:

$ pre-commit --version
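The version check confirms that the tool is available, not that the Git hook was written into your clone. As a small sketch, assuming the default hooks location (.git/hooks, i.e. no custom core.hooksPath and a regular, non-worktree clone), you can check for the hook script from Python. The helper name hook_is_installed is made up for illustration.

```python
from pathlib import Path


def hook_is_installed(repo_root: str = ".") -> bool:
    """Return True if a pre-commit hook script exists in the repository."""
    # Assumption: hooks live in the default .git/hooks directory.
    return (Path(repo_root) / ".git" / "hooks" / "pre-commit").is_file()


# Run from the root of your clone; prints True once `pre-commit install` has run.
print(hook_is_installed())
```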
 

Let's see pre-commit in action by creating example_file.py containing some messy code.

x = {  'a':37,'b':42,
'c':True}
if x['c'] is not None and \
 x['a'] > 0 or \
 x['b']  <= 214           :
 z = 'hello '+'world'
else:
 world = 'world'
 a = 'hello {}'.format(world)
 f = rf'hello {world}'
class Foo  (     object  ):
  def f    (self   ):
    return       37*-2
  def g(self, x,y=42):
      return y
custom_formatting = [0,  1,  2,
    3,  
4,  5,
    6,  

7,  8,
]

Let's also create a JSON file called example_json.json containing the following code:

{
"glossary": {"title": "example glossary",
"GlossDiv": {"title": "S","GlossList": {"GlossEntry": {"ID": "SGML","SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML","Abbrev": "ISO 8879:1986",
"GlossDef": {"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML","XML"]
},
"GlossSee": "markup"}}}}}

Finally, in test/scr/test_log.py, change the input value in test_function from x=1 to x=2. Since log(2) ≈ 0.6931, which is not 0, the unit test will fail.

class TestLog:
    """Test natural log function."""

    def test_function(self):
        """Check that log(1) = 0."""
        result = calculate_log(x=1)  # <- change to x=2
        assert result == pytest.approx(0, rel=1e-09, abs=1e-09)
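To see why the modified test fails, here is a standard-library sketch. The body of calculate_log is a hypothetical stand-in for the repository's function (assumed to return the natural logarithm), and math.isclose is used in place of pytest.approx with a comparable tolerance.

```python
import math


def calculate_log(x: float) -> float:
    """Hypothetical stand-in for the repository's function: natural log of x."""
    return math.log(x)


# Original test input: log(1) is 0, so the comparison succeeds.
print(math.isclose(calculate_log(x=1), 0, rel_tol=1e-09, abs_tol=1e-09))  # True

# Modified test input: log(2) is far from 0, so the comparison fails.
print(math.isclose(calculate_log(x=2), 0, rel_tol=1e-09, abs_tol=1e-09))  # False
print(round(calculate_log(x=2), 4))  # 0.6931
```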

Now you can attempt to commit your changes and the new files as usual:

$ git add .
$ git commit -m "add new files"

Because the files we created are messy and the unit test fails, you will see that some of the hooks are reported as Failed.
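Some failing hooks also rewrite the offending files. As a rough illustration of what pretty-format-json does with --indent=2 and --no-sort-keys, here is a standard-library sketch; it approximates the idea rather than reproducing the hook's implementation, the helper name pretty_format is made up, and the hook's exact output may differ in details.

```python
import json


def pretty_format(raw: str, indent: int = 2) -> str:
    """Re-serialise JSON with a fixed indent, keeping the original key order."""
    # json.loads preserves key order, and sort_keys=False mirrors the
    # --no-sort-keys argument used in the hook configuration.
    return json.dumps(json.loads(raw), indent=indent, sort_keys=False) + "\n"


raw = '{"title": "S","GlossSeeAlso": ["GML","XML"]}'
print(pretty_format(raw), end="")
```

The sketch puts every key and every list element on its own line with a two-space indent, which is the kind of layout you will find in example_json.json after the hook has run.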

However, example_file.py and example_json.json will now be properly formatted. Now that we have seen pre-commit in action, we can set x=1 again in test/scr/test_log.py and commit the changes once more:

$ git add .
$ git commit -m "[pre-commit] add new files"

I personally tend to reuse the same message and add [pre-commit] at the beginning, so I know that this commit was made because of a modification made by pre-commit. This time all checks should pass, and you can finally push your code to your remote repository 🎉.



Final notes 📕:

  • If you want to skip all the checks and still commit your code (so that you can push it to your remote repository), you can add the option --no-verify when committing, e.g. git commit --no-verify -m "some text".

  • Every now and then, you can update the Git hooks using the command pre-commit autoupdate. This command updates .pre-commit-config.yaml, namely the version of each hook specified in rev.
