Reading Excel file stored in S3


dsi...@electionsonline.com

Oct 4, 2017, 7:50:22 PM
to openpyxl-users
I'm trying to pass an Excel file stored in an S3 bucket to load_workbook(), which doesn't seem possible. It appears that load_workbook() will only accept an OS file path for its value, and I cannot first retrieve the object (in this case, the Excel file) from S3, place it in a variable, then pass that variable to load_workbook(). Is that a correct conclusion, or am I missing something obvious? A code sample follows.


from openpyxl import Workbook
from openpyxl import load_workbook
import boto3

# Get document from S3 bucket.
s3 = boto3.resource('s3')

# Get the Excel document from S3.
wbFromS3 = s3.Object('my-bucket-name', 'my-filename.xlsx')

# Load the Excel document into openpyxl.
# This fails: load_workbook() expects a filename or a file-like object,
# not a boto3 Object.
wb = load_workbook(wbFromS3)

Charlie Clark

Oct 5, 2017, 3:08:00 AM
to openpyx...@googlegroups.com
On 05.10.2017 at 01:50, <dsi...@electionsonline.com> wrote:

> I'm trying to pass an Excel file stored in an S3 bucket to
> load_workbook()
> which doesn't seem possible. It appears that load_workbook() will only
> accept an OS filepath for its value and I can not first retrieve the
> object
> (in this case, the Excel file) from S3, place it in a variable, then pass
> that variable to load_workbook(). Is that a correct conclusion, or am I
> missing something obvious?

No, your conclusion is correct: you cannot open objects from an S3 bucket
into openpyxl; you need to copy them to a file system first.
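
A minimal sketch of that copy-to-disk approach, assuming the bucket and key names from the original post and AWS credentials that are already configured (in Lambda, /tmp is the only writable path):

import boto3
from openpyxl import load_workbook

s3 = boto3.client('s3')

# Download the object to local disk first; load_workbook() then
# just sees an ordinary file path.
local_path = '/tmp/my-filename.xlsx'
s3.download_file('my-bucket-name', 'my-filename.xlsx', local_path)

wb = load_workbook(local_path)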

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

dsi...@electionsonline.com

Oct 5, 2017, 9:00:29 AM
to openpyxl-users
Thanks for the speedy response, Charlie! Unfortunately, in my situation, moving the file from S3 to a file system defeats the purpose of using S3 in the first place. The reason is that I want S3 to trigger an AWS Lambda function, written in Python and using openpyxl, that modifies the Excel file and saves it as a TXT file ready for batch import into Amazon Aurora.

I'll keep my fingers crossed that some future release of openpyxl supports reading Excel files stored in S3, but until then I will resort to an entirely different approach.

Charlie Clark

Oct 5, 2017, 9:11:54 AM
to openpyx...@googlegroups.com
On 05.10.2017 at 15:00, <dsi...@electionsonline.com> wrote:

> Thanks for the speedy response, Charlie! Unfortunately, in my situation,
> moving the file from S3 to a file system defeats the purpose of using S3
> in the first place. The reason is that I want S3 to trigger an AWS Lambda
> function, written in Python and using openpyxl, that modifies the Excel
> file and saves it as a TXT file ready for batch import into Amazon Aurora.

Buzzword overload! :-)

> I'll keep my fingers crossed that some future release of openpyxl
> supports the ability to read Excel files stored in S3, but until then
> will resort to an entirely different approach.

That's never going to happen, as it would add a lot of complexity to the library. You
can pass in-memory objects, but my preference would be to use a utility
that presents S3 as a file system to openpyxl, or to anything else for that
matter.
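
For the file-system option, a rough sketch using s3fs, which exposes S3 objects through a file-like interface (s3fs is one possible choice here, not something Charlie names, and the bucket/key below are placeholders):

import s3fs
from openpyxl import load_workbook

# s3fs picks up AWS credentials from the environment, much like boto3 does.
fs = s3fs.S3FileSystem()

# Open the S3 object as a file-like object and hand it to openpyxl.
with fs.open('my-bucket-name/my-filename.xlsx', 'rb') as f:
    wb = load_workbook(f)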

sac...@wirewheel.io

Jul 13, 2018, 6:49:53 PM
to openpyxl-users
This is possible:

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=s3_bucket, Key=object_path)
binary_data = obj['Body'].read()
wb = xl.load_workbook(BytesIO(file))

Best,
Sachin

wand...@uconn.edu

Jul 14, 2018, 1:52:10 PM
to openpyxl-users
Hi Sachin,
I am in the same situation, needing to open and manipulate an Excel file. However, I can't open the Excel file with the code you recommended. Can you please tell me what I should put for wb = xl.load_workbook(BytesIO(file))? I thought file should be the variable binary_data, but it didn't work. Can you please help me with that?

josuem...@gmail.com

Jul 23, 2018, 6:44:13 PM
to openpyxl-users
It works!
Thanks Sachin!

My code:
from io import BytesIO
import boto3
import openpyxl

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket-name', Key='file.xlsx')
binary_data = obj['Body'].read()
wb = openpyxl.load_workbook(BytesIO(binary_data))
sheet = wb.active
print(sheet.cell(row=1, column=1).value)  # to test it works; openpyxl rows/columns are 1-based

Josué.

gagang...@gmail.com

Nov 6, 2018, 1:49:39 PM
to openpyxl-users
Does not work. The function times out

krishna...@gmail.com

Dec 12, 2019, 4:08:40 AM
to openpyxl-users
Hi,

I am trying to read a file from an S3 bucket directly. Instead of that, I am currently downloading the file from the S3 bucket.
Below is my script:
import os, sys, csv,json
from itertools import product
import requests
from requests_aws4auth import AWS4Auth
import boto3
import botocore
import config


s3 = boto3.resource('s3')

ACCESS=config.s3bucket['aws_access_key']
SECRET=config.s3bucket['aws_secret_key']
url_e=config.s3bucket['url']
client = boto3.client(service_name="s3", region_name="us-east-1",
                          endpoint_url=url_e,
                          verify=False,
                          aws_access_key_id = ACCESS,
                          aws_secret_access_key= SECRET)

client.download_file('bucketname','downloadfile.csv','downloadfile.csv') 
print ("Downloading object %s from bucket %s" % ('downloadfile.csv','bucketname'))

#step1: file split into two files
if os.path.exists('11.csv'):
    print('File is previously present and have been deleted !')
    os.remove('11.csv')
if os.path.exists('22.csv'):
    print('File is previously present and have been deleted !')
    os.remove('22.csv')
condition = 0
#lines=0    
header = []
with open('downloadfile.csv', 'r', encoding='UTF-8') as file:
#with open(myfile, 'r', encoding='UTF-8') as file:    
    for lines in file:
        if condition ==0:
        #if lines ==0:
            header = lines.strip('\n').split(',')
        #elif lines <= 2:
        elif condition < 2:
 
Could someone please suggest how to read the file from the S3 bucket directly?

Regards,
Murali.
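
For what it's worth, one way to read the CSV straight from S3 without downloading it is to decode the body returned by get_object(); a sketch using the bucket and key from the script above (for very large files a streaming reader would be preferable):

import csv
import io
import boto3

client = boto3.client('s3')

# Fetch the object and decode it entirely in memory instead of
# writing it to local disk first.
obj = client.get_object(Bucket='bucketname', Key='downloadfile.csv')
body = obj['Body'].read().decode('utf-8')

reader = csv.reader(io.StringIO(body))
header = next(reader)
for row in reader:
    pass  # split/process the rows here, as in the script above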

GeekyDanish

Sep 21, 2020, 8:47:50 AM
to openpyxl-users

from io import BytesIO
import urllib.request
from openpyxl import load_workbook

def read_file():
    # get_file() is assumed to return a URL for the object,
    # e.g. a presigned S3 URL.
    url = get_file()
    with urllib.request.urlopen(str(url)) as response:
        s = response.read()
        wb = load_workbook(filename=BytesIO(s))
        return wb
        # print(sheet['A'])

jeff

Sep 21, 2020, 12:54:36 PM
to openpyxl-users
> Does not work. The function times out

Does your Lambda have a suitable amount of memory configured? You might be hitting max memory, in which case the function doesn't finish and it eventually times out.

maki tachi

Sep 24, 2020, 2:02:02 AM
to openpyxl-users
> Does not work. The function times out 

I was stumbling over the same problem.
I changed the Lambda timeout setting from the default of 3 seconds to 1 minute, and that solved it!


On Tuesday, September 22, 2020 at 1:54:36 UTC+9, jeff wrote:

Miguel Rodriguez

Oct 9, 2020, 12:32:30 PM
to openpyxl-users
How were you able to use openpyxl inside your lambda function? I can't get this to work.

Jeff Froustet

Oct 9, 2020, 12:56:42 PM
to openpyx...@googlegroups.com
You have to include openpyxl in your lambda's deployment package.


Miguel Rodriguez

Oct 9, 2020, 2:11:20 PM
to openpyxl-users
How would I go about doing that? I'm trying to add it to the requirements.txt file with these dependencies:
pandas==1.1.2
et-xmlfile==1.0.1
jdcal==1.4.1
openpyxl==3.0.5

Any help with this will be greatly appreciated

Jeff Froustet

Oct 9, 2020, 2:17:12 PM
to openpyx...@googlegroups.com
You need to get the files for all of those library packages and include them in your lambda deployment package. One approach is to create a virtual environment, install the packages in the ve, and copy the files out of the ve's folder.

Using a deployment framework like serverless with docker makes that whole process much easier, though.


Miguel Rodriguez

Oct 9, 2020, 3:40:13 PM
to openpyxl-users
I think I got the package working now. So I'm trying to grab an Excel sheet from my S3 bucket,
manipulate it with openpyxl, enter some data values from pandas, and then save a copy of the new Excel sheet to my S3 bucket along with a PDF copy.
Would you happen to know how to save the new Excel file into my S3 bucket?

Jeff Froustet

Oct 9, 2020, 3:42:32 PM
to openpyx...@googlegroups.com
I assume you're using the AWS Python SDK, boto3. So you need to use whatever PUT method is supported. If you're using the boto3 client for s3, that would be put_object().


Miguel Rodriguez

Oct 9, 2020, 7:05:19 PM
to openpyxl-users
Yes, I am using boto3 and the put_object function, but my problem is that I read in an Excel file and manipulate it, and I don't know how to save the work I've done and then pass that new file to the S3 bucket.
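
A sketch of that last step along the lines Jeff describes: openpyxl's Workbook.save() accepts a file-like object, so the modified workbook can be written to an in-memory buffer and uploaded with put_object() (bucket and key names below are placeholders):

from io import BytesIO
import boto3
from openpyxl import load_workbook

s3 = boto3.client('s3')

# Read the workbook from S3 into memory, as earlier in the thread.
obj = s3.get_object(Bucket='my-bucket-name', Key='input.xlsx')
wb = load_workbook(BytesIO(obj['Body'].read()))

# ... manipulate the workbook here ...

# Workbook.save() accepts a file-like object, so write to a buffer
# and upload the resulting bytes back to S3.
buffer = BytesIO()
wb.save(buffer)
s3.put_object(Bucket='my-bucket-name', Key='output.xlsx', Body=buffer.getvalue())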