Content Contributors: April Wright, Ethan White, John Gosset, Leah Wasser, Mariela Perignon, Tracy Teal
Lesson Maintainers: April Wright, John Gosset, Mateusz Kuzak
Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with ecological data in Python.
Data for this lesson is from the Portal Project Teaching Database - available on FigShare.
Specifically, the data files we use in these lessons are:
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools, but working through this lesson requires working copies of the software described below. To use these materials most effectively, please install everything before working through this lesson.
Participants are required to abide by Data Carpentry’s Code of Conduct.
Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, so we recommend an all-in-one installer.
Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.4 is fine).
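If you are unsure which version you have, a quick check along the following lines may help. This is only a minimal sketch: the helper name is_python3 is made up for illustration and is not part of the lesson materials.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3 (hypothetical helper)."""
    return version_info[0] == 3


# Show the version of the interpreter that is running this script.
print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))

if is_python3():
    print("Python 3.x detected.")
else:
    print("This is not Python 3.x; please install a 3.x version.")
```

Running this with the same python command you plan to use in the lesson shows which interpreter is first on your PATH.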
We will teach Python using the Jupyter notebook, a programming environment that runs in a web browser. For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).
We recommend the all-in-one scientific Python installer Anaconda.
Type bash Anaconda- and then press tab. The name of the file you just downloaded should appear.
Type yes and press enter to approve the license. Press enter to approve the default location for the files. Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
ggplot is a Python implementation of the R ggplot2 graphics package. It is not intended to be a feature-for-feature port of ggplot2, but it provides some of ggplot2's functionality in the Python ecosystem.
The easiest way to install ggplot is via the conda package manager provided in the Anaconda distribution that you installed above.
conda install -c conda-forge ggplot
and accept when prompted for feedback.
In some cases, installing ggplot from conda may fail with an error like:
UnsatisfiableError: The following specifications were found to be in conflict:
  - ggplot -> python3.4*
  - python 3.6*
In that case, try installing ggplot with Anaconda pip by running this command in your terminal:
pip install -U ggplot
Now it is time to make sure that your Anaconda installation was successful. Here you can find a Python script, check_env.py, that checks whether Anaconda has been correctly installed on your system. From your terminal, navigate to the directory that contains check_env.py and execute the following:
python check_env.py
If you receive an AssertionError, it will inform you how to correct your installation. Otherwise, it will tell you that your system is good to go and ready for Data Carpentry!
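To give a rough idea of what an environment check of this kind might look like, here is a minimal sketch. It is not the actual check_env.py script: the function name missing_modules and the list of module names are assumptions chosen for illustration, based on the packages mentioned in this lesson.

```python
import importlib.util

# Assumed list of modules for illustration only; the real check_env.py
# may test different packages and versions.
REQUIRED_MODULES = ["pandas", "matplotlib", "ggplot"]


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

The sketch only looks the modules up without importing them, so it reports whether a package is installed but not whether it works correctly.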
Data Carpentry is supported by the Gordon and Betty Moore Foundation and a partnership of several NSF-funded BIO Centers (NESCent, iPlant, iDigBio, BEACON and SESYNC) and Software Carpentry, and is sponsored by the Data Observation Network for Earth (DataONE). The structure and objectives of the curriculum as well as the teaching style are informed by Software Carpentry.
Setup | Download files used in the lesson. | |
00:00 | Short Introduction to Programming in Python | What is Python? Why should I learn Python? |
00:00 | Starting With Data | |
00:00 | Indexing, Slicing and Subsetting DataFrames in Python | |
01:00 | Data Types and Formats | |
01:30 | Combining DataFrames with pandas | |
01:30 | Data workflows and automation | |
01:30 | Plotting with ggplot | |
01:30 | Data Ingest & Visualization - Matplotlib & Pandas | |
01:30 | Accessing SQLite Databases Using Python & Pandas | |
01:30 | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.