Python Archives - WikiQuora


W3spoint99 (Beginner)

Asked: December 27, 2024 In: Python

How to split a list into equally-sized chunks in Python?


Saralyn (Beginner)

Added an answer on December 27, 2024 at 6:25 am

Here’s a generator that yields evenly-sized chunks:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

import pprint
# In Python 3, slicing a range returns a range, so pass a list to get list chunks.
pprint.pprint(list(chunks(list(range(10, 75)), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

For Python 2, use xrange instead of range:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

For Python 2:

[lst[i:i + n] for i in xrange(0, len(lst), n)]
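The slicing-based versions above rely on len() and slicing, so they only work on sequences. For generators or other one-pass iterables, a minimal sketch could use itertools.islice instead. The name iter_chunks is hypothetical and not part of the original answer:

```python
from itertools import islice


def iter_chunks(iterable, n):
    """Yield successive lists of up to n items from any iterable."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        # islice pulls at most n items; an empty list means the input is exhausted.
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


result = list(iter_chunks((x * x for x in range(7)), 3))
print(result)  # [[0, 1, 4], [9, 16, 25], [36]]
```

As with the slicing version, the last chunk is shorter when the input length is not a multiple of n.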


W3spoint99 (Beginner)

Asked: December 26, 2024 In: Python

How to make good reproducible pandas examples?


Having spent a decent amount of time watching both the r and pandas tags on SO, the impression that I get is that pandas questions are less likely to contain reproducible data. This is ...

Saralyn (Beginner)
Added an answer on December 26, 2024 at 2:02 pm
The Good:

Do include a small example DataFrame, either as runnable code:
In [1]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

or make it “copy and pasteable” using pd.read_clipboard(sep=r'\s\s+').

In [2]: df
Out[2]:
   A  B
0  1  2
1  1  3
2  4  6

Test it yourself to make sure it works and reproduces the issue.

You can format the text for Stack Overflow by highlighting and using Ctrl+K (or prepend four spaces to each line), or place three backticks (```) above and below your code with your code unindented.

I really do mean small. The vast majority of example DataFrames could be fewer than 6 rows [citation needed], and I bet I can do it in 5. Can you reproduce the error with df = df.head()? If not, fiddle around to see if you can make up a small DataFrame which exhibits the issue you are facing.
But every rule has an exception, the obvious one being for performance issues (in which case definitely use %timeit and possibly %prun to profile your code), where you should generate:

df = pd.DataFrame(np.random.randn(100000000, 10))

Consider using np.random.seed so we have the exact same frame. Having said that, “make this code fast for me” is not strictly on topic for the site.

For getting runnable code, df.to_dict is often useful, with the different orient options for different cases. In the example above, I could have grabbed the data and columns from df.to_dict('split').
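As a rough illustration of the to_dict round trip described above, the sketch below reuses the small df from earlier in this answer. The variable names d and rebuilt are only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# orient='split' returns the index, columns and data as separate lists.
d = df.to_dict(orient='split')
print(d)

# Those pieces can be pasted straight back into the DataFrame constructor.
rebuilt = pd.DataFrame(d['data'], index=d['index'], columns=d['columns'])
print(rebuilt)
```

The printed dictionary is plain Python, so it can be pasted into a question without needing access to the original file.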

Write out the outcome you desire (similarly to above):
In [3]: iwantthis
Out[3]:
   A  B
0  1  5
1  4  6

Explain where the numbers come from:

The 5 is the sum of the B column for the rows where A is 1.

Do show the code you’ve tried:
In [4]: df.groupby('A').sum()
Out[4]:
   B
A
1  5
4  6

But say what’s incorrect:

The A column is in the index rather than a column.

Do show you’ve done some research (search the documentation, search Stack Overflow), and give a summary:

The docstring for sum simply states “Compute sum of group values”

The groupby documentation doesn’t give any examples for this.

Aside: the answer here is to use df.groupby('A', as_index=False).sum().
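To make the aside concrete, here is a small sketch comparing the two calls on the example df from this answer. The names indexed and flat are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# Default behaviour: the grouping key 'A' becomes the index of the result.
indexed = df.groupby('A').sum()

# as_index=False keeps 'A' as an ordinary column, matching iwantthis.
flat = df.groupby('A', as_index=False).sum()
print(flat)
```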

If it’s relevant that you have Timestamp columns, e.g. you’re resampling or something, then be explicit and apply pd.to_datetime to them for good measure.
df['date'] = pd.to_datetime(df['date']) # this column ought to be date.

Sometimes this is the issue itself: they were strings.

The Bad:

Don’t include a MultiIndex, which we can’t copy and paste (see above). This is kind of a grievance with Pandas’ default display, but nonetheless annoying:
In [11]: df
Out[11]:
     C
A B
1 2  3
  2  6

The correct way is to include an ordinary DataFrame with a set_index call:

In [12]: df = pd.DataFrame([[1, 2, 3], [1, 2, 6]], columns=['A', 'B', 'C'])

In [13]: df = df.set_index(['A', 'B'])

In [14]: df
Out[14]:
     C
A B
1 2  3
  2  6

Do provide insight to what it is when giving the outcome you want:
   B
A
1  1
5  0

Be specific about how you got the numbers (what are they)… double check they’re correct.

If your code throws an error, do include the entire stack trace. This can be edited out later if it’s too noisy. Show the line number and the corresponding line of your code which it’s raising against.

Pandas 2.0 introduced a number of changes, and Pandas 1.0 before that, so if you’re getting unexpected output, include the version:
pd.__version__

On that note, you might also want to include the version of Python, your OS, and any other libraries. You could use pd.show_versions() or the session_info package (which shows loaded libraries and Jupyter/IPython environment).

The Ugly:

Don’t link to a CSV file we don’t have access to (and ideally don’t link to an external source at all).
df = pd.read_csv('my_secret_file.csv') # ideally with lots of parsing options

Most data is proprietary, we get that. Make up similar data and see if you can reproduce the problem (something small).

Don’t explain the situation vaguely in words, like you have a DataFrame which is “large”, mention some of the column names in passing (be sure not to mention their dtypes). Try and go into lots of detail about something which is completely meaningless without seeing the actual context. Presumably no one is even going to read to the end of this paragraph.
Essays are bad; it’s easier with small examples.

Don’t include 10+ (100+??) lines of data munging before getting to your actual question.
Please, we see enough of this in our day jobs. We want to help, but not like this…. Cut the intro, and just show the relevant DataFrames (or small versions of them) in the step which is causing you trouble.


W3spoint99 (Beginner)

Asked: December 26, 2024 In: Python

How Slicing in Python works?


How does Python’s slice notation (Slicing) work? That is: when I write code like a[x:y:z], a[:], a[::2] etc., how can I understand which elements end up in the slice?

Saralyn (Beginner)
Added an answer on December 26, 2024 at 1:59 pm
This answer was edited.
The syntax is:

a[start:stop]  # items start through stop-1
a[start:]      # items start through the rest of the array
a[:stop]       # items from the beginning through stop-1
a[:]           # a copy of the whole array

There is also the step value, which can be used with any of the above:

a[start:stop:step] # start through not past stop, by step

The key point to remember is that the :stop value represents the first value that is not in the selected slice. So, the difference between stop and start is the number of elements selected (if step is 1, the default).
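A short runnable sketch of the forms above, using a ten-element list chosen only for illustration:

```python
a = list(range(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(a[2:5])    # [2, 3, 4]
print(a[7:])     # [7, 8, 9]
print(a[:3])     # [0, 1, 2]
print(a[1:8:3])  # [1, 4, 7]

# With the default step of 1, the slice length is stop - start.
start, stop = 2, 5
length = len(a[start:stop])
print(length)    # 3

# a[:] builds a new list with the same items.
copy = a[:]
print(copy == a, copy is a)  # True False
```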

The other feature is that start or stop may be a negative number, which means it counts from the end of the array instead of the beginning. So:

a[-1]   # last item in the array
a[-2:]  # last two items in the array
a[:-2]  # everything except the last two items

Similarly, step may be a negative number:

a[::-1]    # all items in the array, reversed
a[1::-1]   # the first two items, reversed
a[:-3:-1]  # the last two items, reversed
a[-3::-1]  # everything except the last two items, reversed
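The negative-index and negative-step forms can be checked against a concrete list. The five-element list here is just an example:

```python
a = [10, 20, 30, 40, 50]

print(a[-1])      # 50
print(a[-2:])     # [40, 50]
print(a[:-2])     # [10, 20, 30]

print(a[::-1])    # [50, 40, 30, 20, 10]
print(a[1::-1])   # [20, 10]
print(a[:-3:-1])  # [50, 40]
print(a[-3::-1])  # [30, 20, 10]
```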

Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.
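A brief sketch of that leniency, contrasted with plain indexing, which does raise. The one-element list and the variable names are illustrative:

```python
a = ['only']

# Slices that reach past the ends are clipped rather than rejected.
short = a[:-2]
beyond = a[5:10]
print(short)   # []
print(beyond)  # []

# Indexing a single out-of-range position raises IndexError instead.
raised = False
try:
    a[5]
except IndexError:
    raised = True
print(raised)  # True
```

If the error is what you want, one option is to check len(a) explicitly before slicing.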

Relationship with the slice object

A slice object can represent a slicing operation, i.e.:

a[start:stop:step]

is equivalent to:

a[slice(start, stop, step)]

Slice objects also behave slightly differently depending on the number of arguments, similar to range(), i.e. both slice(stop) and slice(start, stop[, step]) are supported. To skip specifying a given argument, one might use None, so that e.g. a[start:] is equivalent to a[slice(start, None)] or a[::-1] is equivalent to a[slice(None, None, -1)].

While the :-based notation is very helpful for simple slicing, the explicit use of slice() objects simplifies the programmatic generation of slicing.
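The equivalences above can be verified directly, and naming slice objects is one way to generate slicing programmatically. In this sketch the names HEADER, BODY and record are made up for illustration:

```python
a = list(range(10))

# Each colon form matches its slice() counterpart.
same = [
    a[2:8:2] == a[slice(2, 8, 2)],
    a[3:] == a[slice(3, None)],
    a[:4] == a[slice(4)],
    a[::-1] == a[slice(None, None, -1)],
]
print(all(same))  # True

# Slice objects can be stored, named and reused like any other value.
HEADER = slice(0, 2)
BODY = slice(2, None)
record = ['id', 'name', 1, 2, 3]
print(record[HEADER])  # ['id', 'name']
print(record[BODY])    # [1, 2, 3]

# slice.indices(length) resolves None and negative values for a given length.
resolved = slice(None, None, -1).indices(5)
print(resolved)  # (4, -1, -1)
```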