Fixed typos and formatting in DataViz tutorial #207

Open. Wants to merge 6 commits into base: master
8 changes: 4 additions & 4 deletions website/_containers/dataviz/2013-01-03-part-2.md
@@ -28,7 +28,7 @@ Graph our sample data with matplotlib.
- We include the parse function here so we build on the process of parse → plot. We need to parse the data into the list of dictionaries so that we can easily tell matplotlib what and how to plot. We could, however, have imported it from `parse.py`. As a **challenge** to you, try editing away the parse function in `graph.py` and importing it from your `parse.py`.

### Visualize Functions
-Let’s first take a look at a chuck of data that we just parsed to get a better idea of what sort of data we’re working with:
+Let’s first take a look at a chunk of data that we just parsed to get a better idea of what sort of data we’re working with:

```bash
{
@@ -51,7 +51,7 @@ By looking at a snippet of data, we can understand how we can play/visualize it.
**Disclaimer:** As with understanding statistics, correlation does _not_ mean causation. This is a small sample size, not current, and it’s from the point of view of officers reporting incidents. Take everything with a grain of salt!

#### Visualize Days Function
-As we read from the docstring, this will give us a visualization of data by the day of the week. For instance, are SF policy officers more likely to file incidents on Monday versus a Tuesday? Or, tongue-in-cheek, should you stay in your house Friday night versus Sunday morning?
+As we read from the docstring, this will give us a visualization of data by the day of the week. For instance, are SF police officers more likely to file incidents on Monday versus a Tuesday? Or, tongue-in-cheek, should you stay in your house Friday night versus Sunday morning?

You’ll also notice that the `def visualize_days()` function does not take any parameters. An option to explore would be to pass this function already-parsed data. If you feel up to it after understanding this function, explore redefining the function like so: `def visualize_days(parsed_data)`.

@@ -154,7 +154,7 @@ We now tell `matplotlib` to use our `data_list` as data points to plot. The `pyp
plt.plot(data_list)
```

-If you are curious about the `plot()` function, open a `python` prompt in your terminal, then `import matplotlib.pyplot as plt` followed by `help(plt)` and/or `dir(plt)`. Again, to exit out of the Python shell, press `CTRL-D`.
+If you are curious about the `plot()` function, open a `python` prompt in your terminal, then `import matplotlib.pyplot as plt` followed by `help(plt)` and/or `dir(plt)`. Again, to exit out of the Python shell, press `CTRL+D`.

Just creating the variable `day_tuple` for our x-axis isn’t enough — we also have to assign it to our `plt` by using the method `xticks()`:

@@ -342,7 +342,7 @@ For the `plt.xticks()`, the first parameter should look similar to before, but h

Notice how we can pass `xticks()` more parameters than we did before. If you read the [documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xticks) of that function, you can pass it `*args` and `**kwargs`, or arguments and keyword arguments. It mentions that you can pass matplotlib-defined [text properties](http://matplotlib.org/api/artist_api.html#matplotlib.text.Text) for the labels — so that would explain the `**kwargs` element there. If nothing is passed in for `rotation` then it’s set to a default defined in their text properties documentation.
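
How extra keyword arguments travel through `**kwargs` can be sketched in plain Python. The function `set_labels` and its `rotation` default below are hypothetical stand-ins for illustration only, not matplotlib's actual implementation or defaults:

```python
def set_labels(positions, labels, **kwargs):
    """Hypothetical stand-in: pair tick positions with labels and text properties."""
    # Assumed default for this sketch only; matplotlib defines its own defaults.
    properties = {"rotation": 0}
    # Any keyword argument the caller passes overrides the default.
    properties.update(kwargs)
    return [(pos, label, properties) for pos, label in zip(positions, labels)]


# Without extra keywords, the default rotation is used.
print(set_labels([0, 1], ["Fraud", "Forgery/Counterfeiting"]))
# With rotation=45, the keyword lands in kwargs and replaces the default.
print(set_labels([0, 1], ["Fraud", "Forgery/Counterfeiting"], rotation=45))
```

The caller never has to mention `rotation` unless the default is not what they want, which is roughly the convenience that keyword arguments provide.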

-Next, we just add a little bit of spacing to the bottom of the graph so the labels (since some of them are long, like `Forgery/Counterfeiting`). We use the `.subplots_adjust()` function. In matplotlib, you have the ability to render multiple graphs on one window/function, called subplots. With one graph, subplots can be used to adjust the spacing around the graph itself.
+Next, we just add a little bit of spacing to the bottom of the graph so the labels (since some of them are long, like `Forgery/Counterfeiting`). We use the `subplots_adjust()` function. In matplotlib, you have the ability to render multiple graphs on one window/function, called subplots. With one graph, subplots can be used to adjust the spacing around the graph itself.

```python
# Give some more room so the labels aren't cut off in the graph
26 changes: 13 additions & 13 deletions website/_containers/dataviz/2013-01-04-part-1.md
@@ -8,7 +8,7 @@ Parse our sample SF crime data.

## Parse Module Setup

-Open up `parse.py`, found: [new-coder/dataviz/tutorial_source/parse.py](https://github.com/econchick/new-coder/blob/master/dataviz/tutorial_source/parse.py)
+Open up `parse.py`, found at: [new-coder/dataviz/tutorial_source/parse.py](https://github.com/econchick/new-coder/blob/master/dataviz/tutorial_source/parse.py)

[The beginning of the module](https://github.com/econchick/new-coder/blob/master/dataviz/tutorial_source/parse.py#L1-12) is an introduction as well as any copyright and/or license information.

@@ -25,11 +25,11 @@ Open up `parse.py`, found: [new-coder/dataviz/tutorial_source/parse.py](https://
<div class="panel panel-default">
<div class="panel-heading">For the Curious</div>
<div class="panel-body">
-<p>Code that is up on <a href="http://github.com">GitHub</a> does _not_ mean that it is free to use. If you want to use a library, ask the developer if s/he has plans to include a <code>LICENSE</code> file or in the headers of the files if it’s not there already.</p>
+<p>Code that is up on <a href="http://github.com">GitHub</a> does <b>not</b> mean that it is free to use. If you want to use a library, ask the developer if s/he has plans to include a <code>LICENSE</code> file or in the headers of the files if it’s not there already.</p>

<p>If <b>you</b> want to open source your code (yay, go you!), include your desired license either as a separate file or within the preamble/beginning of your code. Licensing your code is simply copying & pasting the required language of a license of your choice into your codebase.</p>

-<p><b>CAUTION!</b> Double check with your employer agreement. Sometimes, especially if you are in any tech-related role, there are statements in your employment contract that stipulates what and when code is actually the employers. It may be only code that is written on their equipment, and/or during work hours. Or it may be any code written during the time of employment. The stipulations can even change across states and countries within a single employer.</p>
+<p><b>CAUTION!</b> Double check with your employer agreement. Sometimes, especially if you are in any tech-related role, there are statements in your employment contract that stipulates what and when code is actually the employers. It may be only code that is written on their equipment and/or during work hours. Or it may be any code written during the time of employment. The stipulations can even change across states and countries within a single employer.</p>

<p><b>FYI</b>: For reference, this tutorial is licensed under the <a href="http://creativecommons.org/licenses/">Creative Commons license</a>, specifically, <a href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0 Unported license</a>, with the code under <a href="http://opensource.org/licenses/Zlib">zlib/libpng</a> simply because it’s short.</p>
</div>
@@ -45,7 +45,7 @@ import csv
`MY_FILE` is defining a global - notice how it’s all caps, a convention for variables we won't be changing. Included in this repo is a sample file to which this variable is assigned.

```python
-MY_FILE = "../data/sample_sfpd_incident_all.csv"
+MY_FILE = "../data/dataviz/sample_sfpd_incident_all.csv"
```

### The Parse Function
@@ -68,7 +68,7 @@ Let’s be good coders and write a documentation-string (doc-string) for future

```python
def parse(raw_file, delimiter):
-    """Parses a raw CSV file to a JSON-line object."""
+    """Parses a raw CSV file to a JSON-like object."""

    return parsed_data
```
@@ -80,18 +80,18 @@ If you are interested in understanding how docstrings work, Python’s PEP (Pyth

The difference between <code>"""docstrings"""</code> and <code># comments</code> has to do with who the reader will be. Within a Python shell, if you call <code>help</code> on a particular function or class, it will return the <code>"""docstring"""</code> that the developer has written.

-There are also documentation programs that look specifically for <code>"""docstrings"""</code> to help the developer automatically produce documentation separated out of the code. Within docstrings, it’s helpful to say imperatively what the function/method or class is supposed to do. Examples of how the documented code should work can also be written in the docstrings (and, subsequently, tested). <code># comments</code>, on the otherhand, are for those reading through the code — the comments are to simply say what a specific piece/line of code is meant to do. Inline <code># comments</code> are always appreciated by those reading through your code. Many developers also litter <code># TODO</code> or <code># FIXME</code> statements for combing through later.
+There are also documentation programs that look specifically for <code>"""docstrings"""</code> to help the developer automatically produce documentation separated out of the code. Within docstrings, it’s helpful to say imperatively what the function/method or class is supposed to do. Examples of how the documented code should work can also be written in the docstrings (and, subsequently, tested). <code># comments</code>, on the other hand, are for those reading through the code — the comments are to simply say what a specific piece/line of code is meant to do. Inline <code># comments</code> are always appreciated by those reading through your code. Many developers also litter <code># TODO</code> or <code># FIXME</code> statements for combing through later.
</div>
</div>
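
The split between docstrings and comments can be sketched as follows. This is a minimal illustration written for Python 3; the placeholder body (an empty `parsed_data` list) is an assumption added only so the snippet runs on its own, and is not the tutorial's implementation:

```python
def parse(raw_file, delimiter):
    """Parses a raw CSV file to a JSON-like object."""
    # TODO: open the file, read it, and build parsed_data
    parsed_data = []  # placeholder so this sketch runs on its own
    return parsed_data


# The docstring is stored on the function; help(parse) displays this same text.
print(parse.__doc__)
```

The `# TODO` comment is only visible to someone reading the source, while the docstring travels with the function object and is what `help(parse)` shows.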


What we have now is a pretty good skeleton - we know what parameters the function will take (<code>raw_file</code> and <code>delimiter</code>), what it is supposed to do (our <code>"""doc-string"""</code>), and what it will return, <code>parsed_data</code>. Notice how the parameters and the return value are descriptive in themselves.

-Let’s sketch out, with comments, how we want this function to take a raw file and give us the format that we want. First, let’s open the file, and the read the file, then build the parsed_data element.
+Let’s sketch out, with comments, how we want this function to take a raw file and give us the format that we want. First, let’s open the file, and then read the file, then build the <code>parsed_data</code> element.

```python
def parse(raw_file, delimiter):
-    """Parses a raw CSV file to a JSON-line object"""
+    """Parses a raw CSV file to a JSON-like object"""

    # Open CSV file

@@ -116,13 +116,13 @@ So we’ve told Python to open the file, now we have to read the file. We have t
csv_data = csv.reader(opened_file, delimiter=delimiter)
```

-Here, <code>csv.reader</code> is a function of the CSV module. We gave it two parameters: opened_file, and delimiter. It’s easy to get confused when parameters and variables share names. In <code>delimiter=delimiter</code>, the first <code>delimiter</code> is referring to the name of the parameter that <code>csv.reader</code> needs; the second <code>delimiter</code> refers to the argument that our <code>parse</code> function takes in.
+Here, <code>csv.reader</code> is a function of the CSV module. We gave it two parameters: <code>opened_file</code>, and <code>delimiter</code>. It’s easy to get confused when parameters and variables share names. In <code>delimiter=delimiter</code>, the first <code>delimiter</code> is referring to the name of the parameter that <code>csv.reader</code> needs; the second <code>delimiter</code> refers to the argument that our <code>parse</code> function takes in.
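
The keyword-versus-variable distinction in `delimiter=delimiter` can be sketched in isolation. The helper name `read_rows` and the in-memory sample text are hypothetical, used here only so the snippet runs without a file on disk (Python 3):

```python
import csv
import io


def read_rows(opened_file, delimiter):
    # Left of "=": the keyword that csv.reader expects.
    # Right of "=": the value of our own function's parameter.
    return list(csv.reader(opened_file, delimiter=delimiter))


# io.StringIO stands in for an opened file in this sketch.
sample = io.StringIO("Category;DayOfWeek\nFRAUD;Tuesday\n")
rows = read_rows(sample, delimiter=";")
print(rows)
```

Renaming the function's own parameter (say, to `sep`) would turn the call into `delimiter=sep`, which may make the two roles easier to tell apart.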

Just to quickly put these two lines (the <code>open</code> call and the <code>csv.reader</code> call) in our <code>parse</code> function:

```python
def parse(raw_file, delimiter):
-    """Parses a raw CSV file to a JSON-line object"""
+    """Parses a raw CSV file to a JSON-like object"""

    # Open CSV file
    opened_file = open(raw_file)
@@ -225,7 +225,7 @@ if __name__ == "__main__":
    main()
```

-it will call the `main()` function. By doing the `name == __main__` check, you can have that code only execute when you want to run the module as a program (via the command line) and not have it execute when someone just wants to import the `parse()` function itself into another Python file. This is referred to as “boilerplate code” — code doesn’t really do anything and yet is necessary.
+it will call the `main()` function. By doing the `__name__ == __main__` check, you can have that code only execute when you want to run the module as a program (via the command line) and not have it execute when someone just wants to import the `parse()` function itself into another Python file. This is referred to as “boilerplate code” — code that doesn’t really do anything and yet is necessary.
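
A minimal sketch of that guard, written for Python 3. The `parse` body here is a hypothetical placeholder that only echoes its arguments and does not open a file. Note that in real code the right-hand side of the comparison is the quoted string `"__main__"`:

```python
def parse(raw_file, delimiter):
    """Placeholder for the tutorial's parse(); it just echoes its arguments."""
    return [{"raw_file": raw_file, "delimiter": delimiter}]


def main():
    new_data = parse("sample.csv", ",")
    print(new_data)


# True when the file is run as a program (python parse.py),
# False when another module does "import parse".
if __name__ == "__main__":
    main()
```

When the file is imported, Python sets `__name__` to the module's name rather than `"__main__"`, so `main()` is skipped while `parse()` is still available to the importer.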

### Putting it to action
So you’ve written the parse function and your `parse.py` file looks like [mine](https://github.com/econchick/new-coder/blob/master/dataviz/tutorial_source/parse.py). Now what? Let’s run it and parse some d*mn files!
@@ -249,7 +249,7 @@ Users/lynnroot/MyProjects/new-coder/dataviz/

Go ahead and save your copy of `parse.py` into `MySourceFiles` (through “Save As” within your text editor). You should see the file in the directory if you return to your terminal and type `ls`.

-To run the python code, you have to tell the terminal to execute the parse.py file with python:
+To run the python code, you have to tell the terminal to execute the <code>parse.py</code> file with python:

```bash
(DataVizProj) $ python parse.py
@@ -266,7 +266,7 @@ The output from the `(DataVizProj) $ python parse.py` should look like a bunch o
'18:59', 'Date': '02/18/2003', 'X': '-122.445006858202', 'Resolution': 'NONE'}]
```

-You see this output because in the `def main()` function, and you explicitly say `print new_data` which feeds to the output of the terminal. You could, for instance, not print the `new_data` variable, and just pass the `new_data` variable to another function. Coincidently, that’s what [Part II]( {{get_url("Part-2-Graph/")}}) and [Part III]( {{get_url("/Part-3-Map/")}}) are about!
+You see this output because in the `def main()` function you explicitly say `print new_data` which feeds to the output of the terminal. You could, for instance, not print the `new_data` variable, and just pass the `new_data` variable to another function. Coincidently, that’s what [Part II]( {{get_url("Part-2-Graph/")}}) and [Part III]( {{get_url("/Part-3-Map/")}}) are about!
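
Passing the parsed data to another function instead of printing it might look like the sketch below (Python 3). The function `count_by_day`, the `DayOfWeek` key, and the hand-written rows are all assumptions for illustration; they are not taken from the tutorial's source file:

```python
from collections import Counter


def count_by_day(parsed_data):
    """Count incidents per day of the week from a list of dictionaries."""
    # "DayOfWeek" is an assumed column name for this sketch.
    return Counter(row["DayOfWeek"] for row in parsed_data)


# Hand-written stand-in for what parse() would return.
new_data = [
    {"DayOfWeek": "Tuesday", "Resolution": "NONE"},
    {"DayOfWeek": "Friday", "Resolution": "NONE"},
    {"DayOfWeek": "Tuesday", "Resolution": "NONE"},
]
counts = count_by_day(new_data)
print(counts["Tuesday"], counts["Friday"])
```

Because the function receives the list of dictionaries as an argument, it does not care whether the data came from `parse()` or from a hand-built sample like this one.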

### Explore further
