When to create a generator or return simply a list in Python? [closed]

Question

Closed. This question is off-topic. It is not currently accepting answers.

Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site.

Closed 7 years ago.

Improve this question

I have this kind of instance method:

def list_records(self,race_type,year=0):
    yearnow= datetime.now().year
    yearlist = [yearnow - i for i in range(4)]
    if not year:
        for y in yearlist:
            if self.records:
                yield self.records[utils.dateracetype_to_DBcolumn(y,str(race_type))]
            else:
                yield None
    else:
        if self.records:
            yield self.records[utils.dateracetype_to_DBcolumn(year,str(race_type))]
        else:
            yield None

I am calling it mainly through list comprehension, but I could return it simply as a list, what is the most pythonic way to do this? What would be the reasons to chose one method or the other?
let's say I'll go with a generator, should I turn the 2 last 'yield' into return values as there will be only one value to be returned?
Is there a more optimized way to code this method, as I have the feeling there's too much nested conditions/loop in there?
Is there a Python convention to name generators? something like 'gen...'

Thanks

1) Depends on the context, which is missing in this question. Please take a look at the help center. 3) I've seen much worse. I'd probably create 2 sub-functions for list_records, but whether that makes it any better is a matter of taste. 4) All Python conventions are in the PEP. However, generators are usually not named. Their result is named. How the result was created is irrelevant. The naming you propose smells like Hungarian notation gone wrong and I'd advise against it. — Mast
– Mast ♦, Commented Jul 3, 2018 at 7:50
The exception to naming generators is of-course if you create a function which yeals the result of a generator. In that case it would be totally ok to name the function generate_whatever. — Mast
– Mast ♦, Commented Jul 3, 2018 at 7:53
The context is flexible for this case, I could either manage a list or a generator. Regarding the naming, in a few week I want to remember at first glance that this method has to be used as a generator, but I assume that yield keyword will help me reminding ;) — FTG
– FTG, Commented Jul 3, 2018 at 8:47

Gareth Rees · Accepted Answer · 2018-07-03 09:31:41Z

If you are going to need the whole list for some reason, then make a list. But if you only need one item at a time, use a generator. The reasons for preferring generators are (i) you only need the memory for one item at a time; (ii) if the items are coming from a file or network connection or database then generators give you some parallelism because you can be processing the first item without having to wait for all items to arrive; (iii) generators can handle infinite streams of items.
There is no special convention for naming generators in Python. It is best to name a function after the things it generates.

Instead of:

[yearnow - i for i in range(4)]

write:

range(yearnow, yearnow - 4, -1)

The duplicated code could be avoided like this:

if year:
    year_range = (year,)
else:
    year_now = datetime.now().year
    year_range = range(year_now, year_now - 4, -1)
for y in year_range:
    if self.records:
        yield self.records[utils.dateracetype_to_DBcolumn(y, str(race_type))]
    else:
        yield None

(But I don't recommend implementing it like this, as I'll explain below.)

It is conventional to use None as the default value for an argument like year. A default like year=0 might be confused with a legitimate value for year. For example, in astronomical year numbering there is a year zero, and how would you pass this to your function?
Instead of yielding an exceptional value (here None) when there are no results to generate, just generate no results. The problem with exceptional values is that the caller might forget to check for them, whereas it is usually straightforward to handle the case of no results.
I don't think it's a good idea to use datetime.now().year for the case where the caller doesn't specify a year. The problem is that close to midnight on 31st December, there is a race: the current year might be 2018 just before the call to list_records happens, and 2019 just after. What should list_records return in this case? The Zen of Python says,

In the face of ambiguity, refuse the temptation to guess.

so it would be better to avoid this problem by requiring the caller to specify the year.

The function does two different things: it generates the record for a year (if a year was specified), or it generates records from the last four years (if no year was specified). The single responsibility principle suggests that these two things should be implemented by two functions:

def year_record(self, race_type, year):
    """Return the record for race_type and year.
    Raise IndexError if there are no records.

    """
    return self.records[utils.dateracetype_to_DBcolumn(year, str(race_type))]

def recent_records(self, race_type, year, n=4):
    """Generate the records for race_type for the n years up to (and
    including) year, in reverse order.

    """
    for y in range(year, year - n, -1):
        try:
            yield self.year_record(race_type, y)
        except IndexError:
            pass

If the caller needs to know whether there was a record for each year, then I recommend generating the necessary data explicitly, like this:

def recent_records(self, race_type, year, n=4):
    """Generate the records for race_type for the n years up to (and
    including) year, in reverse order. For each year, generate a tuple
    (year, have_record, record) where have_record is True if there
    is a record for the given year, or False if not.

    """
    for y in range(year, year - n, -1):
        try:
            yield year, True, self.year_record(race_type, y)
        except IndexError:
            yield year, False, None

Then the caller can write:

for year, have_record, record in self.recent_records(...):
    if have_record:
        # ...

Wonderful answer, thanks! Regarding 2) why more range than list comprehension? Because it is a generator itself? — FTG
– FTG, Commented Jul 3, 2018 at 8:22
Another point: I had put a yield None since I need to know when no year record exists to manage the way I will display it (should be '-'). So maybe instead of pass I could write yield None? — FTG
– FTG, Commented Jul 3, 2018 at 9:05

Stack Exchange Network

When to create a generator or return simply a list in Python? [closed]

1 Answer 1

Hot Network Questions

When to create a generator or return simply a list in Python? [closed]

1 Answer 1

Related

Hot Network Questions