Asking for Help With Code

Last updated: January 18, 2021

When asking for help with code, you should always include the following with your question:

  1. Your code
  2. Any inputs needed for your code to run
  3. The output (or errors) you see when you run your code
  4. What you expect your output to be

If you do all of this, you will have a “reproducible example” of your problem, and enough information for another person to understand how to help you.

Before asking for help, always run your example in a clean environment to make sure that it actually runs on someone else’s system and that it reproduces the problem you are seeing. For Jupyter notebooks, this means creating a new notebook that contains only your reproducible example code. For R, you should use reprex (see below).
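For a plain Python script, one way to approximate a clean environment is to run the example in a fresh interpreter process. The sketch below is only an illustration: the helper name run_in_fresh_interpreter and the sample snippet are hypothetical, and a fresh process does not isolate installed packages the way a new virtual environment would.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_fresh_interpreter(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a new Python process so no existing variables or imports leak in."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "example.py"
        script.write_text(code, encoding="utf-8")
        # -I starts Python in isolated mode (ignores PYTHON* variables and the user site directory).
        return subprocess.run(
            [sys.executable, "-I", str(script)],
            capture_output=True,
            text=True,
        )


# Hypothetical example: the snippet relies on a variable that was never defined.
example = "print(total + 1)\n"
result = run_in_fresh_interpreter(example)
print(result.returncode)
print(result.stderr)
```

If the error printed here matches the one you see in your own session, the example is likely self-contained enough to share.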

Data analysis code

The same principles apply to data analysis code, but the data itself (item 2 in the list above) needs special handling:

You typically will not want to include your entire dataset because it may be proprietary or too large.

Rather, construct a small sample dataset.

Your bug may in fact be due to malformed data rather than a bug in your analysis code. Constructing a sample dataset may actually help you to figure out if data issues are the source of your problem.
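As a rough illustration of this point, a few lines of code can scan a small sample for values that would trip up an analysis. This sketch uses only the Python standard library; the column names, the sample rows, and the helper find_bad_rows are made up for the example.

```python
# Hypothetical sample rows; the third one has a malformed height value.
rows = [
    {"weight_kg": "45", "height_m": "1.9"},
    {"weight_kg": "50", "height_m": "2.1"},
    {"weight_kg": "54", "height_m": "n/a"},
]


def find_bad_rows(rows, numeric_columns):
    """Return (row_index, column, value) for every value that cannot be read as a number."""
    problems = []
    for i, row in enumerate(rows):
        for column in numeric_columns:
            value = row.get(column)
            try:
                float(value)
            except (TypeError, ValueError):
                # TypeError covers missing values (None); ValueError covers text like "n/a".
                problems.append((i, column, value))
    return problems


print(find_bad_rows(rows, ["weight_kg", "height_m"]))
```

An empty result suggests the problem is in the analysis code; a non-empty one points at the data.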

Your sample data should have the following characteristics:

  1. It should be as small as possible while still clearly demonstrating your problem.
  2. It should use entirely fake values to avoid leaking anything confidential/proprietary from your actual data.
  3. It should be as simple as possible while still being realistic. For example, if your data has a name variable, there is typically no need[1] to come up with a full name like John C. Smith for each row. Instead, just use A, B, C.
  4. If possible, generate data inline (more on this below).
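One possible way to follow points 2 and 3 for a name-like column is to replace each distinct value with a simple letter. The sketch below is an assumption about how this could be done, not a prescribed method: the rows and the helper anonymize_column are hypothetical, and the other columns would still need fake values of their own.

```python
from string import ascii_uppercase

# Hypothetical rows standing in for data you cannot share.
rows = [
    {"name": "John C. Smith", "score": 3},
    {"name": "Maria Lopez", "score": 5},
    {"name": "John C. Smith", "score": 4},
]


def anonymize_column(rows, column):
    """Replace each distinct value in `column` with A, B, C, ... (supports up to 26 distinct values)."""
    labels = {}
    result = []
    for row in rows:
        value = row[column]
        if value not in labels:
            # Assign the next unused letter to a value seen for the first time.
            labels[value] = ascii_uppercase[len(labels)]
        result.append({**row, column: labels[value]})
    return result


print(anonymize_column(rows, "name"))
```

Repeated values keep the same letter, so any grouping behavior in your real data is preserved in the sample.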

Inline sample data

To avoid having to pass around sample .csv files along with your code, it’s helpful to create your sample data in code.

Here’s how you can do this in R (with tidyverse):

df <- tribble(
  ~participant_id, ~favorite_fruit_1, ~favorite_fruit_2, ~favorite_fruit_3,
  1,               "Banana",          "Apple",           "Dragon fruit",
  2,               "Apple",           "Strawberry",      NA,
  3,               "Banana",          NA,                NA,
  4,               "Blueberry",       "Kiwi",            NA
)

Here’s what the df created above looks like:

## # A tibble: 4 x 4
##   participant_id favorite_fruit_1 favorite_fruit_2 favorite_fruit_3
##            <dbl> <chr>            <chr>            <chr>
## 1              1 Banana           Apple            Dragon fruit
## 2              2 Apple            Strawberry       <NA>
## 3              3 Banana           <NA>             <NA>
## 4              4 Blueberry        Kiwi             <NA>

Here’s how to do something similar in Python (using the IPython REPL):

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([{
   ...:     "participant_id": 1,
   ...:     "favorite_fruit_1": "Banana",
   ...:     "favorite_fruit_2": "Apple",
   ...:     "favorite_fruit_3": "Dragon fruit"
   ...: }, {
   ...:     "participant_id": 2,
   ...:     "favorite_fruit_1": "Apple",
   ...:     "favorite_fruit_2": "Strawberry",
   ...:     "favorite_fruit_3": None
   ...: }, {
   ...:     "participant_id": 3,
   ...:     "favorite_fruit_1": "Banana",
   ...:     "favorite_fruit_2": None,
   ...:     "favorite_fruit_3": None
   ...: }, {
   ...:     "participant_id": 4,
   ...:     "favorite_fruit_1": "Blueberry",
   ...:     "favorite_fruit_2": "Kiwi",
   ...:     "favorite_fruit_3": None
   ...: }])

In [3]: df
Out[3]:
   participant_id favorite_fruit_1 favorite_fruit_2 favorite_fruit_3
0               1           Banana            Apple     Dragon fruit
1               2            Apple       Strawberry             None
2               3           Banana             None             None
3               4        Blueberry             Kiwi             None
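If the sample data already exists as a few lines of CSV, another option is to embed the text in the code and parse it from memory. This sketch uses only the Python standard library rather than pandas and mirrors the fruit data above; treat it as one possible approach.

```python
import csv
import io

# The sample data lives in the code itself, so no separate .csv file is needed.
SAMPLE_CSV = """\
participant_id,favorite_fruit_1,favorite_fruit_2,favorite_fruit_3
1,Banana,Apple,Dragon fruit
2,Apple,Strawberry,
3,Banana,,
4,Blueberry,Kiwi,
"""

# Parse the text as if it were a file; empty fields become None to mark missing values.
rows = [
    {key: (value if value != "" else None) for key, value in row.items()}
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
]

for row in rows:
    print(row)
```

Note that csv.DictReader reads every field as a string, so participant_id is "1" rather than 1 here. With pandas, the same text can be passed to pd.read_csv(io.StringIO(SAMPLE_CSV)) instead.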

Example help request with R code

I am having trouble with the following code. Here is a reproducible example:

library(tidyverse)

df <- tribble(
  ~weight_kg, ~height_m,
  45,         1.9,
  50,         2.1,
  54,         2.4
)
df %>% mutate(
  bmi = weight / height ^ 2
)
#> Error: object 'weight' not found

I am trying to create a bmi variable, but instead of getting this variable I am getting the error above.

This example has everything it needs:

  1. My code ✅
  2. Inputs ✅ (the tribble defines my input data frame)
  3. Output with errors ✅ (Error: object 'weight' not found)
  4. Expected output ✅ (the sentence beginning with “I am trying to create a bmi variable…”)
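The same four elements carry over to other languages. As a rough Python counterpart, the sketch below uses plain dictionaries in place of a data frame and makes the same mistake with the column names. The try/except is only there so the error can be shown as output; in a real help request you would paste the traceback instead.

```python
# Inputs: a small inline dataset with fake values.
rows = [
    {"weight_kg": 45, "height_m": 1.9},
    {"weight_kg": 50, "height_m": 2.1},
    {"weight_kg": 54, "height_m": 2.4},
]

# Code: the keys used here do not match the keys defined above.
try:
    bmi = [row["weight"] / row["height"] ** 2 for row in rows]
    message = f"bmi = {bmi}"
except KeyError as error:
    message = f"KeyError: {error}"

# Output: this is what would be pasted into the help request.
print(message)
# Expected: a list with one bmi value per row.
```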

Using reprex with R

If you are using R, consider using the reprex package when asking for help with code. It is designed for creating reproducible examples in R.

To use reprex:

  1. Copy the complete code for your reproducible example to your system clipboard.
  2. Run reprex() in RStudio.
  3. A viewer window will open showing the executed version of your code, which reprex runs in a clean environment (i.e., no existing objects or loaded packages). This is key for ensuring reproducibility.

  1. Unless, of course, your issue involves code for parsing the name variable, in which case your sample data for name should be realistic fake names.