Asking for Help With Code
Last updated: January 18, 2021
When asking for help with code, you should always include the following with your question:
- Your code
- Any inputs needed for your code to run
- The output (or errors) you see when you run your code
- What you expect your output to be
If you do all of this, you will have a “reproducible example” of your problem, and enough information for another person to understand how to help you.
Always run your example in a clean environment before asking for help, to make sure that it actually runs on someone else’s system and produces the same problem you are seeing. For Jupyter notebooks, this means creating a new notebook containing only your reproducible example code. For R, you should use reprex (see below).
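For a plain Python script, one way to approximate a clean environment is to run the example in a fresh interpreter process, so that nothing defined in your current session can leak into it. The following is a minimal sketch, not a full equivalent of reprex: the variable name example_code and its contents are hypothetical, and the new process still shares your installed packages.

```python
import subprocess
import sys

# Hypothetical reproducible example, stored as a string for illustration.
example_code = """
values = [1, 2, 3]
print(sum(values) / len(values))
"""

# Run the example in a brand-new interpreter process. Variables and imports
# from the current session are not visible to it.
result = subprocess.run(
    [sys.executable, "-c", example_code],
    capture_output=True,
    text=True,
)

print("exit code:", result.returncode)
print("stdout:", result.stdout)
print("stderr:", result.stderr)
```

If the example only works in your current session (for instance, because it relies on a variable you defined earlier), the fresh process will fail, and the error will appear in stderr.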
Data analysis code
At a high level, the same applies to data analysis code, but you should handle the data in question (the inputs in the list above) in a specific way:
You typically will not want to include your entire dataset because it may be proprietary or too large.
Rather, construct a small sample dataset.
Your bug may in fact be due to malformed data rather than a bug in your analysis code. Constructing a sample dataset may actually help you to figure out if data issues are the source of your problem.
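A few quick checks can help reveal whether malformed data is behind a problem. The sketch below assumes pandas and uses a hypothetical data frame in which one weight was typed as text; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: one weight was entered as text by mistake,
# and one height is missing.
df = pd.DataFrame({
    "weight_kg": [45, "fifty", 54],
    "height_m": [1.9, 2.1, None],
})

# A column that should be numeric but is not reported as a numeric type
# often signals malformed values.
print(df.dtypes)

# Count missing values per column.
print(df.isna().sum())

# Find the rows whose weight is present but cannot be parsed as a number.
parsed = pd.to_numeric(df["weight_kg"], errors="coerce")
bad_rows = df[parsed.isna() & df["weight_kg"].notna()]
print(bad_rows)
```

Rows that turn up in bad_rows are good candidates for your sample dataset, since they may be exactly what triggers the problem.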
Your sample data should have the following characteristics:
- It should be as small as possible while still clearly demonstrating your problem.
- It should use entirely fake values to avoid leaking anything confidential/proprietary from your actual data.
- It should be as simple as possible while still being realistic. For example, if your data has a name variable, there is typically no need [1] to come up with a full name like John C. Smith for each row. Instead, just use A, B, C.
- If possible, generate data inline (more on this below).
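As a rough illustration of these characteristics, the sketch below shrinks a dataset to a few rows and swaps its values for simple placeholders. It assumes pandas; real_df is a hypothetical stand-in for your actual data, and its rows are invented.

```python
import string

import pandas as pd

# Stand-in for a real dataset. These rows are invented for illustration;
# in practice this would be your own (possibly confidential) data.
real_df = pd.DataFrame({
    "name": ["John C. Smith", "Jane Q. Doe", "Sam R. Lee", "Ann B. Ray"],
    "score": [88.5, 92.0, 79.25, 85.0],
})

# Keep only as many rows as the problem needs.
n_rows = 3
sample_df = real_df.head(n_rows).copy()

# Replace realistic values with simple placeholders: A, B, C for names
# and small round numbers for scores.
sample_df["name"] = list(string.ascii_uppercase[:n_rows])
sample_df["score"] = [1.0, 2.0, 3.0]

print(sample_df)
```

After simplifying, rerun your code on sample_df to confirm that the problem still appears. If it disappears, the original values were probably part of the cause.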
Inline sample data
To avoid having to pass around sample .csv files along with your code, it’s helpful to create your sample data in code.
Here’s how you can do this in R (with tidyverse):
library(tidyverse)
df <- tribble(
  ~participant_id, ~favorite_fruit_1, ~favorite_fruit_2, ~favorite_fruit_3,
  1, "Banana", "Apple", "Dragon fruit",
  2, "Apple", "Strawberry", NA,
  3, "Banana", NA, NA,
  4, "Blueberry", "Kiwi", NA
)
Here’s what the df created above looks like:
## # A tibble: 4 x 4
##   participant_id favorite_fruit_1 favorite_fruit_2 favorite_fruit_3
##            <dbl> <chr>            <chr>            <chr>
## 1              1 Banana           Apple            Dragon fruit
## 2              2 Apple            Strawberry       <NA>
## 3              3 Banana           <NA>             <NA>
## 4              4 Blueberry        Kiwi             <NA>
Here’s how to do something similar in Python (using the IPython REPL):
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([{
...: "participant_id": 1,
...: "favorite_fruit_1": "Banana",
...: "favorite_fruit_2": "Apple",
...: "favorite_fruit_3": "Dragon fruit"
...: }, {
...: "participant_id": 2,
...: "favorite_fruit_1": "Apple",
...: "favorite_fruit_2": "Strawberry",
...: "favorite_fruit_3": None
...: }, {
...: "participant_id": 3,
...: "favorite_fruit_1": "Banana",
...: "favorite_fruit_2": None,
...: "favorite_fruit_3": None
...: }, {
...: "participant_id": 4,
...: "favorite_fruit_1": "Blueberry",
...: "favorite_fruit_2": "Kiwi",
...: "favorite_fruit_3": None
...: }])
In [3]: df
Out[3]:
   participant_id  favorite_fruit_1  favorite_fruit_2  favorite_fruit_3
0               1            Banana             Apple      Dragon fruit
1               2             Apple        Strawberry              None
2               3            Banana              None              None
3               4         Blueberry              Kiwi              None
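A more compact alternative, closer in spirit to the R tribble, is to write the sample data as CSV text and parse it in memory. This sketch assumes pandas; the variable name csv_text is hypothetical. Note that with this approach empty cells are read as missing values (NaN) rather than None.

```python
import io

import pandas as pd

# The sample data written inline as CSV text. Empty cells become missing values.
csv_text = """participant_id,favorite_fruit_1,favorite_fruit_2,favorite_fruit_3
1,Banana,Apple,Dragon fruit
2,Apple,Strawberry,
3,Banana,,
4,Blueberry,Kiwi,
"""

# io.StringIO lets read_csv treat the string like a file, so no .csv file
# has to be passed around with the code.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```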
Example help request with R code
I am having trouble with the following code. Here is a reproducible example:
library(tidyverse)
df <- tribble(
~weight_kg, ~height_m,
45, 1.9,
50, 2.1,
54, 2.4
)
df %>% mutate(
bmi = weight / height ^ 2
)
#> Error: object 'weight' not found
I am trying to create a bmi variable, but instead of getting this variable I am getting the error above.
This example has everything it needs:
- My code ✅
- Inputs ✅ (the tribble defines my input data frame)
- Output with errors ✅ (Error: object 'weight' not found)
- Expected output ✅ (the sentence beginning with “I am trying to create a bmi variable…”)
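A help request with Python code could follow the same pattern. The sketch below is an assumed pandas counterpart to the R example, with the same deliberate mistake in the column names. The try/except wrapper is there only so the snippet runs to completion and prints the error; in an actual help request you would usually paste the code and the full traceback as they are.

```python
import pandas as pd

# Inputs: a small inline data frame.
df = pd.DataFrame({
    "weight_kg": [45, 50, 54],
    "height_m": [1.9, 2.1, 2.4],
})

# Code that shows the problem: the column names do not match the data.
try:
    df["bmi"] = df["weight"] / df["height"] ** 2
except KeyError as error:
    # Output: the error you see when you run the code.
    error_message = f"KeyError: {error}"
    print(error_message)

# Expected output: a new bmi column computed from weight_kg and height_m.
```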
Using reprex with R
If you are using R, consider using the reprex package when asking for help with code. It is very helpful for creating reproducible examples in R.
To use reprex:
- Copy the complete set of code for your reproducible example so it’s in your system clipboard.
- Run reprex() in RStudio.
- This will open up a viewer window showing the executed version of your code after running it in a clean environment (i.e., no existing objects or loaded packages). This is key for ensuring reproducibility.
[1] Unless, of course, your issue involves code for parsing the name variable, in which case your sample data for name should be realistic fake names.