Wednesday, October 1, 2008

Parsing a list of numbers in Python

I often find that I need to take a selection of numbers in a range as input. I seem to be using Python more and more these days, so I needed to port this classic function over. I must have done this four weeks ago; I'd been meaning to post it here.

Valid input is a comma-separated list of integers, which may also contain a 'range' written as "x-y", where x and y are both integers.

I tried not to impose any requirement on the order of these integers, or even to assume that the input string is free of bad characters.

Here it is:
#! /usr/local/bin/python

# return a set of selected values when a string in the form:
# 1-4,6
# would return:
# 1,2,3,4,6
# as expected...

def parseIntSet(nputstr=""):
    selection = set()
    invalid = set()
    # tokens are comma separated values
    tokens = [x.strip() for x in nputstr.split(',')]
    for i in tokens:
        try:
            # typically tokens are plain old integers
            selection.add(int(i))
        except ValueError:
            # if not, then it might be a range
            try:
                token = [int(k.strip()) for k in i.split('-')]
                if len(token) > 1:
                    token.sort()
                    # we have items separated by a dash
                    # try to build a valid range
                    first = token[0]
                    last = token[-1]
                    for x in range(first, last+1):
                        selection.add(x)
            except ValueError:
                # not an int and not a range...
                invalid.add(i)
    # Report invalid tokens before returning valid selection
    print "Invalid set: " + str(invalid)
    return selection
# end parseIntSet

print 'Generate a list of selected items!'
nputstr = raw_input('Enter a list of items: ')

selection = parseIntSet(nputstr)
print 'Your selection is: '
print str(selection)
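The script above is Python 2 (print statements, raw_input). Below is a minimal sketch of the same parsing logic for Python 3. It is not the original code: the name parse_int_set is hypothetical, and it returns the invalid tokens alongside the selection instead of printing them, which assumes the caller wants to decide what to do with them.

```python
def parse_int_set(text=""):
    """Return (selection, invalid) for input such as "1-4,6".

    Ranges may be written in either order ("39-37" covers 37..39);
    tokens that are neither an integer nor a range go into `invalid`.
    """
    selection = set()
    invalid = set()
    # tokens are comma separated values
    for token in (t.strip() for t in text.split(",")):
        try:
            # typically tokens are plain old integers
            selection.add(int(token))
            continue
        except ValueError:
            pass
        # otherwise it might be a dash-separated range such as "1-4"
        try:
            bounds = sorted(int(part) for part in token.split("-"))
        except ValueError:
            # not an int and not a range
            invalid.add(token)
            continue
        selection.update(range(bounds[0], bounds[-1] + 1))
    return selection, invalid


selection, invalid = parse_int_set("21-25, 39-37, 4,7, x")
print(sorted(selection))  # [4, 7, 21, 22, 23, 24, 25, 37, 38, 39]
print(invalid)            # {'x'}
```

Sorting the set before printing keeps the output in ascending order, whatever order the set stores its elements in.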

When I was looking for someone else's version to copy, I came across a similar function written in Ruby, in case you need that instead.

5 comments:

ptmcg said...

This is a fun little parsing problem. Couldn't resist trying a pyparsing version. What do you think?
============
from pyparsing import Word,nums,delimitedList

# define basic expressions for an integer or range of integers
integer = Word(nums)
intRange = integer + "-" + integer

# define expression for comma-delimited list of intRange or integer
integerList = delimitedList(intRange | integer, ",")

# convert string to int
integer.setParseAction(lambda t:int(t[0]))

# convert range to list of ints
intRange.setParseAction(lambda t:
range(t[0],t[2]+1) if t[0]>t[2] else range(t[2],t[0]+1) if t[2]>t[0] else t[0])

# sort total list, use set to remove duplicates
integerList.setParseAction(lambda t: sorted(set(t.asList())))

print integerList.parseString("1-4,6,3-2, 11, 8 - 12,5,14-14")
=================

Prints:
[1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14]


-- Paul

Mark said...

Thank you for posting this program. I am not a Python programmer, but I did try out your script and noticed some interesting behavior:

*********************
Generate a list of selected items!
Enter a list of items: 1-4,6-12,41,50-54
Invalid set: set([])
Your selection is:
set([1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 50, 51, 52, 53, 54, 41])
*********************

The 41 comes after the range of numbers 50-54!

Here is another one that did not turn out the way I expected, but the input numbers are somewhat out of order, so that might be the cause:

*********************
Generate a list of selected items!
Enter a list of items: 21-25, 39-37, 4,7
Invalid set: set([])
Your selection is:
set([4, 37, 38, 39, 7, 21, 22, 23, 24, 25])
*********************

This was run with Python 2.5
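Both runs return the right numbers; only the display order looks odd. A Python set is an unordered collection, so the order in which it prints is an implementation detail, not the input order. Sorting when displaying is one simple fix. A minimal sketch, using the selection from the first run above:

```python
# The set returned for the input "1-4,6-12,41,50-54"
selection = {1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 41, 50, 51, 52, 53, 54}

# sorted() accepts any iterable and returns a new ascending list
ordered = sorted(selection)
print(ordered)
# [1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 41, 50, 51, 52, 53, 54]
```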

ptmcg said...

I screwed up the parse action on intRange; it should read:

intRange.setParseAction(lambda t:
    range(t[0],t[2]+1) if t[0]<t[2] else range(t[2],t[0]+1) if t[2]<t[0] else t[0])

With this change, the number ranges should all come back pretty much as you would expect: no duplicates, in ascending order.
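The corrected lambda orders the two endpoints before building the range. The same idea as a plain function, without pyparsing, might look like the sketch below. The name expand_range is hypothetical, and it wraps range() in list() so the result is a real list on Python 3, where range is lazy. Unlike the lambda, it returns a one-element list (rather than a bare integer) when both endpoints are equal.

```python
def expand_range(a, b):
    """Return every integer from a to b inclusive, in ascending order,
    regardless of which endpoint is larger."""
    low, high = min(a, b), max(a, b)
    return list(range(low, high + 1))


print(expand_range(1, 4))    # [1, 2, 3, 4]
print(expand_range(3, 2))    # [2, 3]
print(expand_range(14, 14))  # [14]
```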

Bruce Edge said...

Nice example.

I altered my copy at the end with:

    # Report invalid tokens before returning valid selection
    if len(invalid):
        raise ValueError('Invalid set: {}'.format(str(invalid)))
    return selection
# end parseIntSet


if __name__ == '__main__':

    print 'Generate a list of selected items!'
    nputstr = raw_input('Enter a list of items: ')

    selection = parseIntSet(nputstr)
    print 'Your selection is: '
    print str(selection)

to make it more usable as a module while still allowing command-line use.
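With that change, code that imports the function decides how to handle bad input by catching ValueError. A minimal Python 3 sketch of the idea; the name parse_int_set_strict is hypothetical, and sorting the invalid tokens in the message is an assumption made so the error text is predictable.

```python
def parse_int_set_strict(text):
    """Parse "1-4,6"-style input; raise ValueError listing any bad tokens."""
    selection, invalid = set(), set()
    for token in (t.strip() for t in text.split(",")):
        try:
            # plain integer
            selection.add(int(token))
            continue
        except ValueError:
            pass
        try:
            # dash-separated range, endpoints in either order
            bounds = sorted(int(part) for part in token.split("-"))
            selection.update(range(bounds[0], bounds[-1] + 1))
        except ValueError:
            invalid.add(token)
    if invalid:
        # fail loudly instead of printing, as in the change above
        raise ValueError("Invalid set: {}".format(sorted(invalid)))
    return selection


# A caller using this as a module chooses how to react to bad input.
try:
    parse_int_set_strict("1-3, five, 7")
except ValueError as err:
    print(err)  # Invalid set: ['five']
```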

Jay said...

Hi clayg,

Twelve years after your original post, I'm still finding the solution useful.
It just required a little syntax tweaking to convert Python 2 to Python 3.
Python and open source are amazing.
Thank you and have a nice day!

Jay