Having spent a decent amount of time watching both the r and pandas tags on SO, the impression that I get is that pandas questions are less likely to contain reproducible data. This is ...
The Good:

Do include a small example DataFrame, either as runnable code:

    In [1]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

or make it “copy and pasteable” using pd.read_clipboard(sep=r'\s\s+'):

    In [2]: df
    Out[2]:
       A  B
    0  1  2
    1  1  3
    2  4  6

Test it yourself to make sure it works and reproduces the issue.
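If no clipboard is available (on a headless machine, say), the same printed frame can be parsed from a string instead. This is only a sketch: it assumes the printed output shown above, and the variable name `pasted` is made up for illustration. pd.read_clipboard passes the clipboard text on to pd.read_csv, so calling pd.read_csv with the same separator should parse it the same way.

```python
import io

import pandas as pd

# Text as it would appear in a question (columns separated by 2+ spaces).
pasted = """\
   A  B
0  1  2
1  1  3
2  4  6
"""

# A regex separator needs the Python parsing engine.
df = pd.read_csv(io.StringIO(pasted), sep=r"\s\s+", engine="python")
print(df)
```

The two-or-more-spaces separator is what lets column names or values containing a single space survive the round trip.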
Can you reproduce the issue with df = df.head()? If not, fiddle around to see if you can make up a small DataFrame which exhibits the issue you are facing.

But every rule has an exception, the obvious one being performance issues (in which case definitely use %timeit and possibly %prun to profile your code), where you should generate the data rather than paste it. Consider using np.random.seed so we have the exact same frame. Having said that, “make this code fast for me” is not strictly on topic for the site.

df.to_dict is often useful, with the different orient options for different cases. In the example above, I could have grabbed the data and columns from df.to_dict('split').

Explain where the numbers come from.
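A rough sketch of the seeding and df.to_dict advice above follows. The shape (1000, 3) and the column names of the generated frame are assumptions chosen for illustration. A real performance question would use whatever size shows the slowdown.

```python
import numpy as np
import pandas as pd

# Fixed seed so everyone who runs this gets the same numbers.
np.random.seed(0)
big = pd.DataFrame(np.random.randn(1000, 3), columns=["A", "B", "C"])

# Turning an existing small frame into pasteable code via to_dict('split').
small = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])
parts = small.to_dict("split")  # keys: 'index', 'columns', 'data'
print(parts)

# The 'data' and 'columns' entries are enough to rebuild the frame.
rebuilt = pd.DataFrame(parts["data"], columns=parts["columns"])
```

Pasting the printed dictionary into a question gives answerers a frame they can construct in one line.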
Say what’s incorrect about the output you got.

Aside: the answer here is to use df.groupby('A', as_index=False).sum().

If you have datetime columns, apply pd.to_datetime to them for good measure. Sometimes this is the issue itself: they were strings.
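To make the aside concrete, here is a small sketch using the example frame from earlier. It assumes the problem was that a plain groupby moves the grouping column into the index. The `events` frame and its column names are hypothetical, added only to show the pd.to_datetime point.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])

# Default behaviour: the grouping key 'A' becomes the index.
by_index = df.groupby("A").sum()

# as_index=False keeps 'A' as an ordinary column.
by_column = df.groupby("A", as_index=False).sum()

# Hypothetical frame whose dates arrived as strings.
events = pd.DataFrame({"when": ["2021-01-01", "2021-01-02"], "n": [1, 2]})
events["when"] = pd.to_datetime(events["when"])
print(events.dtypes)
```

Printing dtypes before and after the conversion is a quick way to show readers whether a column really holds datetimes.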
The Bad:
The correct way is to include an ordinary DataFrame with a set_index call.

Be specific about how you got the numbers (what are they)… double check they’re correct.
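The set_index advice above could look something like the sketch below. The values and the choice of a two-level index are assumptions for illustration. The point is that the frame is built from plain columns first, so readers can copy the code instead of a printed index they cannot paste.

```python
import pandas as pd

# Build from ordinary columns, then promote 'A' and 'B' to the index.
df = pd.DataFrame(
    [[1, 2, 3], [1, 4, 5], [6, 7, 8]],
    columns=["A", "B", "C"],
).set_index(["A", "B"])
print(df)
```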
On that note, you might also want to include the version of Python, your OS, and any other libraries. You could use pd.show_versions() or the session_info package (which shows loaded libraries and Jupyter/IPython environment).

The Ugly:
- Don’t link to a CSV file we don’t have access to (and ideally don’t link to an external source at all).
  Most data is proprietary, we get that. Make up similar data and see if you can reproduce the problem (something small).
- Don’t explain the situation vaguely in words, like you have a DataFrame which is “large”, mention some of the column names in passing (be sure not to mention their dtypes). Try and go into lots of detail about something which is completely meaningless without seeing the actual context. Presumably no one is even going to read to the end of this paragraph.
  Essays are bad; it’s easier with small examples.
- Don’t include 10+ (100+??) lines of data munging before getting to your actual question.
  Please, we see enough of this in our day jobs. We want to help, but not like this… Cut the intro, and just show the relevant DataFrames (or small versions of them) in the step which is causing you trouble.
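For the first point, one possible way to make up similar data is sketched below. Every column name and value is invented. The idea is only to mirror the column names and dtypes of the real (private) frame in a handful of rows.

```python
import numpy as np
import pandas as pd

# Seeded generator so the made-up numbers are the same for every reader.
rng = np.random.default_rng(42)

fake = pd.DataFrame(
    {
        "customer_id": np.arange(1, 7),
        "region": ["north", "south", "north", "east", "south", "north"],
        "amount": rng.normal(100, 20, size=6).round(2),
        "signup": pd.date_range("2023-01-01", periods=6, freq="D"),
    }
)

# Showing dtypes alongside the data answers a common follow-up question.
print(fake)
print(fake.dtypes)
```

If the problem still reproduces on a frame like this, the real data never needs to leave your machine.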