Building a tf.keras.Model for GPT-J


Joy Lunkad

Sep 11, 2021, 11:25:28 AM
to Keras-users
I have built an exact replica of GPT-J in TensorFlow, but the weights released by EleutherAI are sharded, and I cannot figure out how to load them into the model.

I would like to learn how to load pretrained weights into a model. What should I learn?
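Whatever the source format, a common first step is to list every array in the checkpoint with its name and shape, so each one can be matched against the model's own variables (in Keras, each entry of `model.weights` has a `name` and a `shape`). This is a minimal sketch that assumes the weights are stored as NumPy `.npz` archives. The archive here is a tiny in-memory stand-in, not a real GPT-J checkpoint, and both the helper name `describe_npz` and the parameter names are hypothetical.

```python
import io

import numpy as np


def describe_npz(source):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz archive.

    `source` may be a file path or a binary file-like object.
    """
    with np.load(source) as archive:  # NpzFile supports the context-manager protocol
        return {name: (archive[name].shape, str(archive[name].dtype))
                for name in archive.files}


# Build a small stand-in archive in memory (hypothetical parameter names).
buffer = io.BytesIO()
np.savez(buffer,
         **{"embed.w": np.zeros((16, 8), dtype=np.float32),
            "layer_0.attn.q.w": np.zeros((8, 8), dtype=np.float32)})
buffer.seek(0)

for name, (shape, dtype) in describe_npz(buffer).items():
    print(name, shape, dtype)
```

Comparing this listing side by side with `[(w.name, w.shape) for w in model.weights]` usually shows quickly which arrays correspond to which layers.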

Sayak Paul

Sep 11, 2021, 11:45:01 AM
to Joy Lunkad, Keras-users
Hi Joy,

Congratulations on accomplishing that! Usually, there is no fixed set of rules for loading weights saved in other formats, because checkpoints vary a lot in structure. While I cannot provide an exact solution to your problem, I can provide a reference that might be helpful:
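Because checkpoint layouts vary, loading usually comes down to an explicit table mapping each source array to a target variable, plus any layout fix-ups and a strict shape check. This is a minimal sketch of that idea. The helper `map_weights`, the mapping format, and all parameter names are hypothetical. The transpose example assumes a PyTorch-style linear weight of shape `(out_features, in_features)`, whereas a Keras `Dense` kernel has shape `(input_dim, units)`.

```python
import numpy as np


def map_weights(source, mapping, target_shapes):
    """Rename (and optionally transpose) source arrays to match target variables.

    source:        {source_name: np.ndarray}, e.g. loaded from a checkpoint
    mapping:       {target_name: (source_name, transpose_flag)}
    target_shapes: {target_name: shape tuple}, e.g. collected from model.weights
    Returns {target_name: np.ndarray}. Raises KeyError for an unmapped target
    and ValueError for a shape mismatch, so nothing is loaded silently wrong.
    """
    result = {}
    for target_name, expected in target_shapes.items():
        if target_name not in mapping:
            raise KeyError(f"no source array mapped to {target_name!r}")
        source_name, transpose = mapping[target_name]
        array = source[source_name]
        if transpose:
            # For a 2-D weight: (out, in) -> (in, out).
            array = array.T
        if tuple(array.shape) != tuple(expected):
            raise ValueError(f"{source_name!r} -> {target_name!r}: "
                             f"got {array.shape}, expected {tuple(expected)}")
        result[target_name] = array
    return result


source = {"q_proj.weight": np.arange(6, dtype=np.float32).reshape(3, 2),
          "q_proj.bias": np.zeros(3, dtype=np.float32)}
mapping = {"dense/kernel": ("q_proj.weight", True),
           "dense/bias": ("q_proj.bias", False)}
target_shapes = {"dense/kernel": (2, 3), "dense/bias": (3,)}

weights = map_weights(source, mapping, target_shapes)
print(weights["dense/kernel"].shape)  # (2, 3)
```

With a real Keras model, the resulting arrays could then be written in with `layer.set_weights([...])`, which expects arrays in the same order as `layer.get_weights()`, or assigned one variable at a time with `variable.assign(array)`.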


Sayak Paul | sayak.dev

Joy Lunkad

Sep 11, 2021, 12:06:31 PM
to Keras-users
Looks great! I will check whether GPT-J's weights are laid out like that too. GPT-J's weights are sharded rather than stored in a single .npz file, but the approach does look promising.
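When a checkpoint is sharded for model parallelism, each shard typically holds a slice of some parameters and a full copy of others, so the full arrays have to be reassembled before they can be loaded into a single-device model. This is a minimal sketch under that assumption. Which axis each parameter is split along (or whether it is replicated) depends on the code that wrote the checkpoint and must be confirmed there; the `split_axes` table, the parameter names, and the helper `merge_shards` are hypothetical. Each shard is assumed to be already loaded into a dict of NumPy arrays, for example with `np.load`.

```python
import numpy as np


def merge_shards(shards, split_axes):
    """Reassemble full parameters from a list of per-shard dicts.

    shards:     list of {name: np.ndarray}, one dict per shard, in shard order
    split_axes: {name: axis} for parameters split across shards, or
                {name: None} for parameters replicated on every shard
    """
    merged = {}
    for name, axis in split_axes.items():
        pieces = [shard[name] for shard in shards]
        if axis is None:
            # Replicated parameter: every shard should hold an identical copy.
            first = pieces[0]
            if not all(np.array_equal(first, p) for p in pieces[1:]):
                raise ValueError(f"replicated parameter {name!r} differs across shards")
            merged[name] = first
        else:
            # Split parameter: join the slices back together in shard order.
            merged[name] = np.concatenate(pieces, axis=axis)
    return merged


# Simulate two shards: one weight split along axis 1, one replicated vector.
full = np.arange(24, dtype=np.float32).reshape(4, 6)
shards = [{"attn.q.w": part, "ln.scale": np.ones(4, dtype=np.float32)}
          for part in np.split(full, 2, axis=1)]

merged = merge_shards(shards, {"attn.q.w": 1, "ln.scale": None})
print(merged["attn.q.w"].shape)  # (4, 6)
```

Shard order matters: concatenating the slices in the wrong order still produces an array of the right shape, so a shape check alone will not catch that mistake.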