Having spent a decent amount of time watching both the r and pandas tags on SO, the impression that I get is that pandas questions are less likely to contain reproducible data. This is ...
The syntax is:
a[start:stop] # items start through stop-1
a[start:] # items start through the rest of the array
a[:stop] # items from the beginning through stop-1
a[:] # a copy of the whole array
There is also the step value, which can be used with any of the above:
a[start:stop:step] # start through not past stop, by step
The key point to remember is that the :stop value represents the first value that is not in the selected slice. So, the difference between stop and start is the number of elements selected (if step is 1, the default).
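As a quick check of the rule above, here is a minimal sketch on an illustrative list (the variable names are made up for the example):

```python
a = list(range(10))  # illustrative data: [0, 1, ..., 9]

start, stop = 2, 7
chunk = a[start:stop]      # items at indexes 2, 3, 4, 5, 6
stepped = a[start:stop:2]  # every second item in that range: indexes 2, 4, 6
whole = a[:]               # a new list with the same items (a shallow copy)

# stop is the first index NOT included, so with step 1
# the slice holds stop - start items
assert len(chunk) == stop - start
```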
The other feature is that start or stop may be a negative number, which means it counts from the end of the array instead of the beginning. So:
a[-1] # last item in the array
a[-2:] # last two items in the array
a[:-2] # everything except the last two items
Similarly, step may be a negative number:
a[::-1] # all items in the array, reversed
a[1::-1] # the first two items, reversed
a[:-3:-1] # the last two items, reversed
a[-3::-1] # everything except the last two items, reversed
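To make the negative cases concrete, the same expressions can be run against a small illustrative list:

```python
a = [0, 1, 2, 3, 4]  # illustrative data

last = a[-1]                  # 4
last_two = a[-2:]             # [3, 4]
all_but_last_two = a[:-2]     # [0, 1, 2]

reversed_all = a[::-1]        # [4, 3, 2, 1, 0]
first_two_rev = a[1::-1]      # [1, 0]
last_two_rev = a[:-3:-1]      # [4, 3]
all_but_last_two_rev = a[-3::-1]  # [2, 1, 0]
```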
Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.
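A short sketch of that behaviour, plus one possible way to get the error when you want it. The helper drop_last_strict is hypothetical, written only for this illustration:

```python
a = ["x"]  # a list with a single element

empty = a[:-2]       # slicing past the ends silently gives []
also_empty = a[5:10]
clipped = a[0:100]   # ['x'], the range is clipped to what exists

# Plain indexing, unlike slicing, does raise
try:
    a[5]
    indexing_raised = False
except IndexError:
    indexing_raised = True


def drop_last_strict(seq, n):
    """Return seq without its last n items; raise if seq has fewer than n."""
    if len(seq) < n:
        raise ValueError(f"need at least {n} items, got {len(seq)}")
    # seq[:-0] would be empty, so treat n == 0 separately
    return seq[:-n] if n else seq[:]
```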
Relationship with the slice object
A slice object can represent a slicing operation, i.e.:
a[start:stop:step]
is equivalent to:
a[slice(start, stop, step)]
Slice objects also behave slightly differently depending on the number of arguments, similar to range(), i.e. both slice(stop) and slice(start, stop[, step]) are supported. To skip specifying a given argument, one might use None, so that e.g. a[start:] is equivalent to a[slice(start, None)] or a[::-1] is equivalent to a[slice(None, None, -1)].
While the :-based notation is very helpful for simple slicing, the explicit use of slice() objects simplifies the programmatic generation of slicing.
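A rough sketch of what programmatic generation can look like. The names HEADER, BODY and window are invented for the example:

```python
a = list(range(10))

# The two notations select the same items
assert a[2:8:2] == a[slice(2, 8, 2)]
assert a[3:] == a[slice(3, None)]
assert a[:4] == a[slice(4)]
assert a[::-1] == a[slice(None, None, -1)]

# Slices are ordinary objects, so they can be named and reused
HEADER = slice(0, 2)
BODY = slice(2, None)
record = ["id", "name", 10, 20, 30]
header, body = record[HEADER], record[BODY]


def window(seq, size, offset=0):
    """Build the slice at run time from computed bounds (hypothetical helper)."""
    return seq[slice(offset, offset + size)]
```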
The Good:
Do include a small example DataFrame, either as runnable code:
In [1]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
or make it “copy and pasteable” using pd.read_clipboard(sep=r'\s\s+'):
In [2]: df
Out[2]:
   A  B
0  1  2
1  1  3
2  4  6
Test it yourself to make sure it works and reproduces the issue.
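One way to test the copy-and-paste route without touching the clipboard is to parse the printed frame from a string. This is only a sketch: it assumes that pd.read_csv with the same separator and the Python parsing engine is a fair stand-in for pd.read_clipboard, which needs a real clipboard.

```python
import io

import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])

# The text a reader would copy from the question
text = df.to_string()

# Two or more whitespace characters act as the column separator;
# a regex separator requires the Python engine
parsed = pd.read_csv(io.StringIO(text), sep=r"\s\s+", engine="python")

# The values in A and B should survive the round trip
print(parsed)
```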
Keep it small: can you reproduce the problem with df = df.head()? If not, fiddle around to see if you can make up a small DataFrame which exhibits the issue you are facing.
But every rule has an exception, the obvious one being for performance issues (in which case definitely use %timeit and possibly %prun to profile your code), where you should generate the data in code rather than paste it. Consider using np.random.seed so we have the exact same frame. Having said that, “make this code fast for me” is not strictly on topic for the site.
df.to_dict is often useful, with the different orient options for different cases. In the example above, I could have grabbed the data and columns from df.to_dict('split').
Explain where the numbers come from.
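A minimal sketch of both ideas: a seeded random frame that anyone can regenerate, and a round trip through the 'split' orientation. The frame size and column names are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

# Seeding makes the "random" frame identical on every machine
np.random.seed(0)
big = pd.DataFrame(np.random.randn(1000, 3), columns=list("abc"))

np.random.seed(0)
again = pd.DataFrame(np.random.randn(1000, 3), columns=list("abc"))
same_frame = big.equals(again)

# 'split' separates index, columns and data, which is handy
# for pasting a constructor call into a question
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])
parts = df.to_dict(orient="split")
rebuilt = pd.DataFrame(parts["data"], columns=parts["columns"])
```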
Show what you tried, but say what’s incorrect about the result.
Aside: the answer here is to use df.groupby('A', as_index=False).sum().
If datetime columns matter to the question, apply pd.to_datetime to them for good measure. Sometimes this is the issue itself: they were strings.
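The two points above can be sketched on small frames. The events frame and its when column are invented for the illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])

# By default the grouping key becomes the index
default = df.groupby("A").sum()
# as_index=False keeps A as an ordinary column
flat = df.groupby("A", as_index=False).sum()

# Dates that arrive as strings look fine when printed but are not datetimes
events = pd.DataFrame({"when": ["2021-01-01", "2021-01-02"]})
was_string = isinstance(events["when"].iloc[0], str)
events["when"] = pd.to_datetime(events["when"])
is_datetime = pd.api.types.is_datetime64_any_dtype(events["when"])
```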
The Bad:
The correct way to share a frame with a non-default index is to include an ordinary DataFrame with a set_index call.
Be specific about how you got the numbers (what are they)… double check they’re correct.
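For the set_index advice, a minimal sketch. It assumes the frame in question has a multi-level index; the column names key, sub and val are hypothetical.

```python
import pandas as pd

# Build a plain frame first, then set the index in code,
# so readers can paste and run it as is
df = pd.DataFrame(
    [["x", 1, 10], ["x", 2, 20], ["y", 1, 30]],
    columns=["key", "sub", "val"],
).set_index(["key", "sub"])

value = df.loc[("x", 2), "val"]  # look up one row by its two index levels
```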
On that note, you might also want to include the version of Python, your OS, and any other libraries. You could use pd.show_versions() or the session_info package (which shows loaded libraries and Jupyter/IPython environment).
The Ugly:
Most data is proprietary, we get that. Make up similar data and see if you can reproduce the problem (something small).
Essays are bad; it’s easier with small examples.
Please, we see enough of this in our day jobs. We want to help, but not like this…. Cut the intro, and just show the relevant DataFrames (or small versions of them) in the step which is causing you trouble.