Thursday, December 4, 2008

get dates from excel with python xlrd

a1_as_datetime = datetime.datetime(*xlrd.xldate_as_tuple(a1, 0))

UPDATE: Please read the discussion of the second argument to xldate_as_tuple - "datemode" in the comments section of this post before using this example. It is LIKELY that hard-coding the "datemode" option will not meet your long term needs.

I had to piece this line together from two other articles, sorry don't remember which.

Full Example:

>>> import datetime, xlrd
>>> book = xlrd.open_workbook("myfile.xls")
>>> sh = book.sheet_by_index(0)
>>> a1 = sh.cell_value(rowx=0, colx=0)
>>> print "Cell A1 is ", a1
Cell A1 is 39811.0
>>> a1_as_datetime = datetime.datetime(*xlrd.xldate_as_tuple(a1, book.datemode))
>>> print 'datetime: %s' % a1_as_datetime
datetime: 2008-12-29 00:00:00

This might make more sense if you're familiar with xlrd - A Python module for extracting data from MS Excel ™ spreadsheet files.

If you are familiar with xlrd, then the only part really worth discussing is xldate_as_tuple, which will convert the float that excel is using to store the date as something more useful, like a tuple:
(2008, 12, 29, 0, 0, 0)

Note that the first argument to the xldate_as_tuple function is the variable I defined as a1. xldate_as_tuple will not accept a cell reference 'a1' or some such thing - you have to give it the float!

The datetime module has a constructor for dates that requires at minimum three positional arguments:
datetime( year, month, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]])

This would also work:
a1_as_date =*xlrd.xldate_as_tuple(a1, 0)[:3])

You can pass the items of the tuple as positional arguments by prefacing the tuple with an asterisk (wtg python!)


Anonymous said...

Thanks, clayg. Works like a charm. :-)

Unknown said...

Two points:

1. I don't understand the comment about xldate_as_tuple() requiring a float argument -- that's exactly what an "xldate" is. I would regard it as an extremely bad example of API design if it were to accept any thing else. It's a bit like complaining that math.sin() requires a float number of radians -- it borks on math.sin("-123d59m59s" :-)
It's called specialisation -- sin cos tan etc don't have to include all the conversion-to-radians functionality, which is done by separate conversion routines, and they don't need to be upgraded if someone invents a new method of representing angles.

2. Moving to the second argument (datemode): you have put in 0 (zero), with no comment at all. I strongly suggest that you read the xlrd documentation, both the documentation for the xldate_as_tuple function, and section (2) of the general discussion on dates up the front of the documentation.

When using this module's xldate_as_tuple function to convert numbers from a workbook, you must use the datemode attribute of the Book object. If you guess, or make a judgement depending on where you believe the workbook was created, you run the risk of being 1462 days out of kilter."""

clayg said...

Those are both great points John, Thank you!

I commented on my reference to 'a1' just because I thought it added some clarity the very first line in the post:
a1_as_datetime = datetime.datetime(*xlrd.xldate_as_tuple(a1, 0))

I might could have also said:
a1_as_datetime = datetime.datetime(*xlrd.xldate_as_tuple(
).cell_value(rowx=0, colx=0
), xlrd.open_workbook("myfile.xls").datemode))

but I thought it was more clear to just explain that to "cut & paste" that little bit you'd need to have an open workbook, and a sheet, and also have read the contents of some cell into a variable - then replace 'a1' with that variable name.

In regards to the datemode parameter... I probably should have spoken to it more directly. But, both times I've read a date from an excel sheet it was NOT originally created with an old version of Mac Excel - and they were both 1900 based. Once you get an idea of how all this might could be done, I highly encourage you to reference the documentation! I did, and I linked to it above more than once in the original article. But it sure doesn't hurt that you've thrown up a warning about this potential pitfall here as well - thank you.

In retrospect, I concede it would have been more correct, and a better example to replace line 6 with the following:
a1_as_datetime = datetime.datetime(*xlrd.xldate_as_tuple(a1, book.datemode))

My apologies to anyone who thought I was complaining. I was just trying to share some technical experiments and experience that I felt worthy of archiving on the interwebz.

Thanks again for the feedback!

Unknown said...

"""But, both times I've read a date from an excel sheet it was NOT originally created with an old version of Mac Excel - and they were both 1900 based."""

That's a bit like saying that you took out accident insurance 2 hours ago ago and haven't been hit by a truck yet.

Firstly, "old" (version of Mac Excel) is irrelevant; the default on Excel 2008 for the Mac is still the 1904 system.

Secondly, whether the file came from a Mac or PC is quite irrelevant; both families of Excel versions allow the user to change the datemode. If you knew that the file came from a Mac and hard-coded 1 for datemode, you have a problem if the user changed the datemode (or if it originated from a PC, then went to a Mac user who sensibly didn't change the datemode!)

I say again: there is absolutely neither reason nor excuse for hardcoding a datemode without an explicit accompanying qualification.

"""Once you get an idea of how all this might could be done, I highly encourage you to reference the documentation!"""

What does that mean? Who is "you"? What is "all this"?

clayg said...

For those who are reading this "thread" after the fact, I think it's important to note that John is the author of xlrd - and we should all be thankful for his hard work.

I am defiantly grateful. Thank you again John.

I apologize a second time if it seemed like I was "complaining"


harit said...

thanks! it just worked!

yomguy said...

working, thanks for help!

Unknown said...

thanks a lot, it helped

check out this

Anonymous said...

gracias amigo, me sirvio mucho,

jpatel said...

It looses the format of the date - e.g. if the date in the file is 1-1-2018, and second one is 1/2/2018, if I expect to write in csv, as the tuple is (year, month, date, h, m, s) format, there is no way I can get info on the original format of the column.