Python for ecologists

Content Contributors: April Wright, Ethan White, John Gosset, Leah Wasser, Mariela Perignon, Tracy Teal

Lesson Maintainers: April Wright, John Gosset, Mateusz Kuzak

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with ecological data in Python.

Lessons

Data

Data for this lesson comes from the Portal Project Teaching Database, available on FigShare.

Specifically, the data files we use in these lessons are:

Requirements

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools, but working through this lesson requires working copies of the software described below. To use these materials most effectively, please make sure to install everything before working through this lesson.

Participants are required to abide by Data Carpentry’s Code of Conduct.

Setting Up Python

Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, so we recommend an all-in-one installer.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.4 is fine).
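
If you are unsure which version you have installed, a minimal sketch like the one below can tell you. It is not part of the lesson materials, and the helper name is_python3 is made up for illustration. Run it with the same Python you plan to use for the lesson.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3."""
    # version_info behaves like a tuple: (major, minor, micro, ...)
    return version_info[0] == 3


# Report the version of the interpreter that is running this script.
print("Python version:", sys.version.split()[0])
print("Python 3.x:", is_python3())
```

If the second line prints False, the python command on your PATH is pointing at a Python 2 installation rather than the Python 3 one.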

We will teach Python using the Jupyter notebook, a programming environment that runs in a web browser. For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).

Windows

  • Download and install Anaconda.
  • Download the default Python 3 installer. Use all of the defaults for installation except make sure to check Make Anaconda the default Python.

Mac OS X

  • Download and install Anaconda.
  • Download the default Python 3 installer. Use all of the defaults for installation.

Linux

We recommend the all-in-one scientific Python installer Anaconda.

  1. Download the installer that matches your operating system and save it in your home folder. Download the default Python 3 installer.
  2. Open a terminal window.
  3. Type
    bash Anaconda-
    and then press tab. The name of the file you just downloaded should appear.
  4. Press enter. You will follow the text-only prompts. When there is a colon at the bottom of the screen press the down arrow to move down through the text. Type yes and press enter to approve the license. Press enter to approve the default location for the files. Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).

Installing the ggplot Python package

ggplot is a Python implementation of the R ggplot2 graphics package. It is not intended to be a feature-for-feature port of ggplot2, but it provides some of ggplot2's functionality in the Python ecosystem.

The easiest way to install ggplot is with the conda package manager, which is included in the Anaconda distribution you installed above.

Windows

  • Open Anaconda Prompt from the Windows menu.
  • In the prompt window, type conda install -c conda-forge ggplot and confirm when prompted.

Mac OS X

  • Open the Terminal app.
  • In the Terminal window, type conda install -c conda-forge ggplot and confirm when prompted.

Linux

  1. Open the default terminal application (on Ubuntu, that is gnome-terminal).
  2. In the terminal, type conda install -c conda-forge ggplot and confirm when prompted.

In some cases, installing ggplot from conda may fail with an error like:

    UnsatisfiableError: The following specifications were found to be in conflict:
      - ggplot -> python3.4*
      - python 3.6*

In that case, try installing ggplot with the pip that ships with Anaconda by running this command in your terminal:

    pip install -U ggplot

Checking that your installation worked

Now it is time to make sure that your Anaconda installation was successful. The Python script check_env.py checks whether Anaconda has been correctly installed on your system. From your terminal, navigate to the directory that contains check_env.py and run the following:

    python check_env.py

If you receive an AssertionError, it will tell you how to correct your installation. Otherwise, it will tell you that your system is good to go and ready for Data Carpentry!
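
If you do not have check_env.py to hand, a rough check along the same lines can be sketched as follows. This is not the actual check_env.py: the function name missing_packages and the list of packages are assumptions, chosen from the tools named in this lesson (pandas, matplotlib, ggplot). It only tests whether each package can be found, without importing it.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that Python cannot locate."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list of packages, based on the tools this lesson mentions.
REQUIRED = ["pandas", "matplotlib", "ggplot"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

Any package reported as missing can be installed with conda or pip as described in the sections above.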

Acknowledgements & Support

Data Carpentry is supported by the Gordon and Betty Moore Foundation and a partnership of several NSF-funded BIO Centers (NESCent, iPlant, iDigBio, BEACON and SESYNC) and Software Carpentry, and is sponsored by the Data Observation Network for Earth (DataONE). The structure and objectives of the curriculum as well as the teaching style are informed by Software Carpentry.

Schedule

Setup: Download files used in the lesson.
00:00 Short Introduction to Programming in Python: What is Python? Why should I learn Python?
00:00 Starting With Data
00:00 Indexing, Slicing and Subsetting DataFrames in Python
01:00 Data Types and Formats
01:30 Combining DataFrames with pandas
01:30 Data workflows and automation
01:30 Plotting with ggplot
01:30 Data Ingest & Visualization - Matplotlib & Pandas
01:30 Accessing SQLite Databases Using Python & Pandas
01:30 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.