Advanced Python for Data Science Assignment 2

These exercises assume that you are working in pairs to add and modify files in a common repository. The files are available in the data and code directories of the course repository.

You’re working on a large project trying to predict diversity hotspots. Another member of your collaborative team has produced a series of files that contain lists of areas that resulted from a series of modeling exercises. Each filename begins with the word areas and ends with .txt.

You and your teammate decide to keep everything in a common repository and to name it by combining your netIDs with assignment2. For example, if your netIDs are aaa11 and bbb22 you would name the repository aaa11_bbb22_assignment2. Since you’re going to share a repository, you need to agree on some way to communicate with each other. You can choose anything you want: email, IM, GitHub Issues, etc.

One team member should create the repository under the nyu-cds organization on GitHub. Make sure that the second team member has write access to the repository by choosing the Add teams and collaborators button and adding their username to the Collaborators field. Whoever creates the repository will then need to let their team member know that it is available, and both should clone a copy to their computers.

A programmer has whipped up a small Python script called rich_pred.py that takes a single file containing a list of areas, one per line, and returns the area and the predicted richness. The script can be downloaded using the command curl -O https://nyu-cds.github.io/courses/code/rich_pred.py.

One team member should download this script and commit it to their local repository, then push the changes to GitHub using git push from the command line. They should then notify the second team member, who will update their local repository from the command line using git pull and check that they now have a copy of the script.
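
The add, commit, push, and pull round trip described above can be rehearsed offline. What follows is a minimal sketch rather than the required procedure: a local bare repository (remote.git) stands in for GitHub, a one-line placeholder stands in for the downloaded rich_pred.py, and the directory names member1 and member2 and the commit identity are hypothetical.

```shell
# Offline rehearsal: a local bare repository stands in for the GitHub remote.
sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Team Member" GIT_AUTHOR_EMAIL="member@example.com"
export GIT_COMMITTER_NAME="Team Member" GIT_COMMITTER_EMAIL="member@example.com"

git init --quiet --bare remote.git

# First team member: clone the (still empty) repository and seed it.
git clone --quiet remote.git member1 2>/dev/null
cd member1
echo "assignment 2" > README.txt
git add README.txt
git commit --quiet -m "Initial commit"
git push --quiet origin HEAD
cd ..

# Second team member: clone their own copy.
git clone --quiet remote.git member2

# First team member: add the script, commit it, and push.
# In the assignment the file comes from:
#   curl -O https://nyu-cds.github.io/courses/code/rich_pred.py
# Here a placeholder is used so the sketch needs no network access.
cd member1
echo "# placeholder for rich_pred.py" > rich_pred.py
git add rich_pred.py
git commit --quiet -m "Add rich_pred.py"
git push --quiet origin HEAD
cd ..

# Second team member: update the local repository and check for the script.
cd member2
git pull --quiet
ls rich_pred.py
```

Against the real repository the same add, commit, push, and pull commands apply; only the clone URL differs.
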
This is a follow-up question to Version Control 6.

You and your team member decide to split up the work that needs to be done. You should each work in your own local clone of the repository, then share the resulting files by committing them and pushing them to GitHub.

The six data files required for the project are available from https://nyu-cds.github.io/courses/data and are called areas1.txt through areas6.txt. One team member should download all of these files using the curl command, commit them to their repository, and push to GitHub.

While the first team member is downloading the data files, the second team member can work on a script that runs the data files through the Python code and produces a single list of the areas and associated richness predictions from all of the sites combined. This list should be sorted from the smallest area to the largest area and should include only unique values.

You could cut and paste the files together, run them through the Python code, and then do some post-processing to get the list looking right, but new files are going to be showing up constantly, and besides, this can be readily accomplished in one line using the shell. You could use a loop, but since you just need a single list of areas and predictions it’s probably easier to use cat to concatenate all of the files at the beginning.

Once you’ve figured out the necessary shell commands, put them in a text file and save it as predict_richness.sh. Since you’ll need the data files to test the script, you’ll have to wait until your team member lets you know they have been pushed to GitHub. Then you can update your local repository to obtain the files. Test that everything is working by running the script with the command bash predict_richness.sh. Commit the predict_richness.sh script to the local repository and push to GitHub.
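
The concatenate, predict, sort, and de-duplicate idea can be tried on toy data before the real files arrive. This is only a sketch under stated assumptions: the command-line interface of rich_pred.py is not described here, so a hypothetical shell function named predict, with a made-up formula, stands in for it, and the two small area files are invented for illustration.

```shell
# Work in a scratch directory with two tiny, invented area files.
workdir=$(mktemp -d)
cd "$workdir"
printf '10\n200\n' > areas1.txt
printf '200\n5\n' > areas2.txt

# Hypothetical stand-in for rich_pred.py: prints each area followed by a
# made-up "prediction" (half the area). The real script's formula differs.
predict() {
    awk '{ printf "%s %.2f\n", $1, $1 * 0.5 }' "$1"
}

# Concatenate every areas file, run the predictor once over the combined list,
# sort numerically by area, and drop duplicate lines.
cat areas*.txt > all_areas.txt
result=$(predict all_areas.txt | sort -n | uniq)
echo "$result"
```

Because identical areas produce identical output lines, uniq after a numeric sort is enough to leave one line per area. In the actual predict_richness.sh the predict call would be replaced by however rich_pred.py is really invoked.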

Once the script has been made available on GitHub, the first team member should update their repository and run the script. The results should be saved in a file called predicted_diversities.txt, committed to their local repository, and pushed to GitHub.
This is a follow-up question to Version Control 7.

Both team members have been working late and sit down to edit predict_richness.sh. They both open the file in an editor. One reaches for a cup of coffee, and knocks it over onto their computer! At the same time, a cat jumps on the keyboard of the other team member’s computer. In all of the excitement they both somehow delete the contents of the file and save it (go ahead, open it, delete the contents, press save). Thankfully the team has been using version control! The team members take a deep breath and revert the changes using git checkout. Then they reflect on how using version control makes you just like Superman in Superman 1 because it’s like time just went backwards.
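
The recovery step can be rehearsed in a throwaway repository. The sketch below assumes the accidental deletion has been saved but not committed; the repository name, the placeholder file contents, and the commit identity are hypothetical.

```shell
# Throwaway repository with one committed version of the script.
sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Team Member" GIT_AUTHOR_EMAIL="member@example.com"
export GIT_COMMITTER_NAME="Team Member" GIT_COMMITTER_EMAIL="member@example.com"

git init --quiet repo
cd repo
echo 'echo "placeholder pipeline"' > predict_richness.sh
git add predict_richness.sh
git commit --quiet -m "Add predict_richness.sh"

# Simulate the accident: empty the file and save it.
: > predict_richness.sh
git status --short    # the file is listed as modified

# Discard the uncommitted change and restore the last committed version.
git checkout -- predict_richness.sh
restored=$(cat predict_richness.sh)
echo "$restored"
```

The double dash tells git that what follows is a file path rather than a branch name. Recent Git releases also provide git restore for the same purpose.
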
This is a follow-up question to Version Control 8.

A colleague emails one of the team members that some of the parameter values being used in rich_pred.py are incorrect and need to be changed urgently. Unfortunately, in their haste, the colleague accidentally sent an old email to the other team member at the same time.

The first email said “Please go to the line that defines sar_parameters and change it to
```
sar_parameters = [[22.7, 0.3], [1.2, 0.163, 0.009],
                  [14.36, 21.16], [85.91, 42.57],
                  [1082.45, 1.59, 390000000]]
```
”

The second, incorrect, email said “Please go to the line that defines sar_parameters and change it to
```
sar_parameters = [[22.7, 0.3], [1.2, 0.163, 0.010],
                  [14.36, 21.45], [85.91, 42.57],
                  [1082.45, 1.59, 390000000]]
```
”

Now, follow these instructions carefully:
1. Each team member should choose a different version and modify the file. Commit the change to each local repository.
2. Both should try to push the change to the GitHub repository.
3. One team member should get an error indicating that the push was rejected because their local repository is not up to date with the GitHub repository (changes have been pushed since they last updated).
4. Pull those changes down, resolve any conflicts, and if necessary commit the correct version of the file.
5. Repeat the process so that the other team member sees and resolves the conflicts.
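
One round of the steps above can be simulated offline to see what a rejected push and a merge conflict look like. This is a sketch under assumptions, not the assignment's solution: a local bare repository stands in for GitHub, rich_pred.py is reduced to a single placeholder line instead of the real sar_parameters list, and the directory names and commit identity are hypothetical.

```shell
# Offline simulation: a local bare repository stands in for the GitHub remote.
sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Team Member" GIT_AUTHOR_EMAIL="member@example.com"
export GIT_COMMITTER_NAME="Team Member" GIT_COMMITTER_EMAIL="member@example.com"

# Shared starting point: one commit containing a placeholder rich_pred.py.
git init --quiet --bare remote.git
git clone --quiet remote.git member1 2>/dev/null
cd member1
echo 'sar_parameters = "original values"' > rich_pred.py
git add rich_pred.py
git commit --quiet -m "Add rich_pred.py"
git push --quiet origin HEAD
branch=$(git symbolic-ref --short HEAD)
cd ..
git clone --quiet remote.git member2

# Step 1: each member commits a different version of the same line.
cd member1
echo 'sar_parameters = "values from the first email"' > rich_pred.py
git commit --quiet -am "Use parameters from the first email"
cd ../member2
echo 'sar_parameters = "values from the second email"' > rich_pred.py
git commit --quiet -am "Use parameters from the second email"
cd ..

# Steps 2 and 3: the first push succeeds, the second is rejected.
git -C member1 push --quiet origin HEAD
cd member2
if git push --quiet origin HEAD 2>/dev/null; then
    push_result="accepted"
else
    push_result="rejected"
fi
echo "second push: $push_result"

# Step 4: pulling merges the remote change and stops on the conflict.
git pull --quiet --no-rebase origin "$branch" >/dev/null 2>&1
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted files: $conflicted"

# Resolve by writing the correct version, then commit the merge and push.
echo 'sar_parameters = "values from the first email"' > rich_pred.py
git add rich_pred.py
git commit --quiet -m "Merge: keep parameters from the first email"
git push --quiet origin HEAD
```

In a real conflict the file contains both versions between conflict markers, and the resolution is done in an editor rather than by overwriting the file. The sketch covers one round only; step 5 repeats it with the roles swapped.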
