Is the Scratchpad Implementation Using a LUT Standard?


Doug Meyer

Oct 10, 2017, 8:57:09 AM
to linux-ntb
Gents,

As I continue to gain understanding about the NTB code, I am wondering about the use and requirements surrounding scratchpads, at least as I see things in the Switchtec code.

In there, I see that a LUT (LUT0) is used, the size of the LUTs apparently being hard-coded to 64 KiB, for a shared memory window (struct shared_mw) which contains an array of 128 u32 for the scratchpad. Also, it appears that this is what is used both to determine link status and to pass memory address/size information between hosts (ports, peers). I apologize if I have that wrong. Please correct me.
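For reference, here is roughly what I believe that structure looks like (a paraphrased sketch from my reading of the ntb-hw-switchtec code, not a verbatim copy, so the field names may well be off):

#include <linux/types.h>

/* Paraphrased sketch of what lives in the 64 KiB LUT0 window; field
 * names are approximate, not copied from the driver. */
struct shared_mw {
	u32 magic;		/* lets a peer tell a live partner from garbage */
	u32 link_sts;		/* soft link-status word exchanged between peers */
	u64 mw_sizes[128];	/* advertised memory-window address/size info */
	u32 spad[128];		/* the emulated scratchpad registers */
};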

The goal here is just to gain understanding... learn about required APIs vs philosophical decisions vs convenience, etc.

My question is whether this is a fixture of the NTB architecture, or whether it is a convenience in support of something else (that something else being the real requirement)?

In particular:

* But most importantly, I'm wondering about struct shared_mw. Could the Switchtec message registers have been used?
* What do people think about how this technique scales when there are more than two peers? Obviously LUTs are a precious resource, and a LUT per peer shared_mw is expensive. A LUT broken up into many shared_mw is a possibility, though there is always risk of trashing stuff.
* Also, if the LUTs need to be much larger (for application use), then a large chunk of the BAR space could be used for a relatively small structure.
I'd love to hear anyone's thoughts on this.

As an aside, I'm curious why the LUT size is 64 KiB? Was this just a nice number as a starting point?

Thanks again, folks.

Blessings,
Doug

Allen Hubbe

Oct 10, 2017, 10:17:58 AM
to Doug Meyer, linux-ntb
From: Doug Meyer
> Gents,
>
> As I continue to gain understanding about the NTB code, I am wondering about the use and requirements
> surrounding scratchpads, at least as I see things in the Switchtec code.
>
> In there, I see that a LUT (LUT0) is used, the size of the LUTs apparently being hard-coded to 64 KiB,
> for a shared memory window (struct shared_mw) which contains an array of 128 u32 for the scratchpad.
> Also, it appears that this is what is used both to determine link status and to pass memory
> address/size information between hosts (ports, peers). I apologize if I have that wrong. Please
> correct me.

The struct shared_mw is unique to the Switchtec driver, and your interpretation of its mechanism matches my understanding.

Other drivers determine link state from the hardware, and only expose scratchpads if they are implemented in hardware.

> The goal here is just to gain understanding... learn about required APIs vs philosophical decisions vs
> convenience, etc.
>
> My questions are whether this is a fixture of the NTB architecture, or if this is a convenience to
> support something else (the latter being a requirement)?
>
> In particular,
>
> But most importantly, I'm wondering about struct shared_mw. Could the Switchtec message registers have
> been used?

About a year ago when Serge joined the team, we spent a while trying to unify the message and scratchpad api. At the same time, there is a preference to keep ntb.h very light and expose the hardware functionality as directly as possible. We decided to split the apis for scratchpads and message registers. For hardware that supports message registers, it should expose those via the message api.

What this currently implies is that the next layer up driver needs to work with either scratchpads or message registers. If a driver only works with spads, then it is not portable. I would like there to be some library code added to the common ntb bus driver to help with that. Serge is currently making changes to the ntb_transport driver to support multi-port and message registers on IDT. It may only work for the transport driver at first, but I have some hope that it could be transformed into library code.
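To make the portability gap concrete, a client today ends up writing something like the following shim itself (a minimal sketch; the helper name is made up, and the exact ntb_* signatures should be checked against ntb.h):

#include <linux/errno.h>
#include <linux/ntb.h>

/* Minimal sketch only: hide whether the hardware gives us message
 * registers or scratchpads.  xfer_u32_to_peer() is a made-up name. */
static int xfer_u32_to_peer(struct ntb_dev *ntb, int pidx, int idx, u32 val)
{
	if (ntb_msg_count(ntb))
		/* hardware exposes message registers */
		return ntb_peer_msg_write(ntb, pidx, idx, val);

	if (ntb_spad_count(ntb))
		/* otherwise fall back to scratchpads */
		return ntb_peer_spad_write(ntb, pidx, idx, val);

	return -EINVAL;	/* neither facility is present */
}

Library code in the common bus driver would essentially pull that choice (plus the receive side and event handling) out of every client.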

> What do people think about how this technique scales when there are more than two peers? Obviously

That limitation was stated upfront with that driver submission. The Switchtec driver only works with two nodes for now.

Doug Meyer

Oct 10, 2017, 11:52:06 AM
to linux-ntb
Dear Allen,

Thank you for your reply and additional information.

Two things:

* I've added a verbose comment to your words about messages and scratchpads, because it dovetails with a discussion I had over here yesterday.
* I do need to pursue the crux of my initial post further, which is mostly focused on the bottom part (extension of switchtec).

On Tuesday, October 10, 2017 at 7:17:58 AM UTC-7, Allen Hubbe wrote:
From: Doug Meyer
> Gents,
>
[... elided by dmeyer]

>
> But most importantly, I'm wondering about struct shared_mw. Could the Switchtec message registers have
> been used?

About a year ago when Serge joined the team, we spent a while trying to unify the message and scratchpad api.  At the same time, there is a preference to keep ntb.h very light and expose the hardware functionality as directly as possible.  We decided to split the apis for scratchpads and message registers.  For hardware that supports message registers, it should expose those via the message api.  

What this currently implies is that the next layer up driver needs to work with either scratchpads or message registers.  If a driver only works with spads, then it is not portable.  I would like there to be some library code added to the common ntb bus driver to help with that.  Serge is currently making changes to the ntb_transport driver to support multi-port and message registers on IDT.  It may only work for the transport driver at first, but I have some hope that it could be transformed into library code.

This is a very interesting couple of paragraphs. And my reply here could be painful for you and the team because I'm so new to this...

Yesterday I had a discussion about where the switch resources would be managed/tracked, versus where the resource requests were coming from. In some ways it looks like an MVC architecture (I could really be wrong here). To expand on that, ntb.ko exports the abstracted View of the switch as well as the Control API. Clients of ntb.ko (e.g. ntb_transport) use that and export their own View and Control APIs. In this case, the switch's Model is in ntb.ko, with the hardware-specific plug-ins supporting that Model directly by any means necessary.

What I think that implies is that ntb.c manages the Model's resources, but the hardware plug-in needs to be sophisticated about switch resource management strategies (not just message registers and scratchpads, but LUTs, direct-mapped windows, doorbells, etc.), needs to know what can and can't be dynamically configured in the switch, and needs to know what has been reserved during probe/init for infrastructure, so that it has a decent arsenal of ways to fulfill whatever is being asked from above.

Given that, whether the configuration is truly static (resources allocated and fixed at start-of-day and never changed) or dynamic (hot-plug events, application or other fabric management configuring and freeing shared memory, perhaps opening and closing windows onto large NVMe memory space, or who knows what), it is the layer below ntb.ko that has to try to fulfill each request, while it is the layer above ntb.ko that has to manage the overall PCIe fabric presented by the Views on all the connected peers.

The ramifications of this would be that ntb.h (by that I think you mean the upward-facing API for all ntb.ko clients) is the View and Control abstraction, and it can hopefully stay thin, with the cost being that ntb.c and the hardware plug-ins probably get "fat" because they need to not just manage the Model, but pick up all the slack between the abstracted API and the hardware.
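To make the "View and Control" part of that concrete, what I have in mind is the hardware ops table behind ntb.h, which (heavily trimmed and paraphrased from memory, so please don't hold me to the exact names or signatures) looks something like this:

#include <linux/ntb.h>

/* Heavily trimmed, paraphrased sketch of the ntb.h hardware ops table;
 * not the actual declarations. */
struct ntb_dev_ops_sketch {
	/* the "View": what the switch exposes to clients */
	int (*mw_count)(struct ntb_dev *ntb, int pidx);
	u64 (*link_is_up)(struct ntb_dev *ntb);

	/* the "Control": how clients manipulate it */
	int (*mw_set_trans)(struct ntb_dev *ntb, int pidx, int widx,
			    dma_addr_t addr, resource_size_t size);
	int (*spad_write)(struct ntb_dev *ntb, int sidx, u32 val);
	int (*peer_db_set)(struct ntb_dev *ntb, u64 db_bits);
};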

Am I even remotely close?

Also, regarding your portability requirement. On one hand, it sounds like messages and scratchpads have distinct properties that make it advantageous to expose them separately in ntb.h, and yet the portability requirement seems to want them abstracted as a single interface. Can you offer up any further thoughts on this, please?


 
> What do people think about how this technique scales when there are more than two peers? Obviously

That limitation was stated upfront with that driver submission.  The Switchtec driver only works with two nodes for now.

So this section of my email is where I am really hoping to gain understanding...

I realize that the Switchtec driver has the two-peer limitation, and I apologize that I did not make it clear that the context here is my work to remove that limitation (in increments). My current task is to move switchtec and ntb-hw-switchtec (by hook or by crook) to support up to four peers. 

I can achieve this in a brute-force way, which I may have to do given the time allotted for the task, but my hope is to understand what I assume is a lot of good thought that went into the current design. With that understanding, I may be able to choose a near-term approach which is not entirely throw-away code, while also being careful to retain the parts of the design you all have deemed essential (or at least preferred). In addition, perhaps some thought has already been given to how the current design would be extended, and that is why this design was chosen. I don't know, so I'm asking.

This is the only way I know to begin engaging on this. If there is a better or preferred approach, please let me know.

All of the following observations were my attempt to get folks to tell me whether this approach is necessary, sufficient, or something else, as well as to see what thought has been given to scaling up the number of peers. I should have been explicit (sorry). I'm guessing y'all have thought about where you'd like to see the design go and where you see issues.

For now, I'm likely to merely create an array of shared_mw based on partition number and restrict the Switchtec configuration to use only the first four partitions. But if somebody says "I was thinking that this would actually be best done as Y, and it's probably little more work than your near-term hack" then I'm sure going to listen!
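In code terms, the near-term hack would look roughly like this (purely illustrative; the names and fields are made up, not proposed patch content):

#include <linux/ntb.h>

#define SWITCHTEC_NTB_MAX_PEERS	4	/* near-term cap: first four partitions */

/* Illustrative sketch only: one emulated shared_mw slot per peer
 * partition instead of the single pair used today. */
struct switchtec_ntb_sketch {
	struct shared_mw *self_shared;	/* our outbound copy, in one LUT */
	struct shared_mw __iomem *peer_shared[SWITCHTEC_NTB_MAX_PEERS];
					/* one mapped LUT per peer */
	int peer_count;			/* peers actually discovered */
};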

> LUTs are a precious resource, and a LUT per peer shared_mw is expensive. A LUT broken up into many
> shared_mw is a possibility, though there is always risk of trashing stuff.
> Also, if the LUTs need to be much larger (for application use), then a large chunk of the BAR space
> could be used for a relatively small structure.
> I'd love to hear anyone's thoughts on this.
>
> As an aside, I'm curious why the LUT size is 64 KiB? Was this just a nice number as a starting point?
>
> Thanks again, folks.

Thanks again (and again).

Blessings,
Doug

lsgun...@gmail.com

Oct 11, 2017, 12:19:21 PM
to linux-ntb

Obviously LUTs are a precious resource, and a LUT per peer shared_mw is expensive. A LUT broken up into many shared_mw is a possibility, though there is always risk of trashing stuff.

Well, as far as I know, switchtec is the only hardware with LUTs. But they aren't that precious a resource. If I remember correctly, switchtec supports up to 128 of them and up to 48 separate partitions so you could easily have 2 LUTs per peer and still have plenty left over to do other things with. And as far as we know now, there is no other use for LUT windows. Thus, I would say, using one per peer is perfectly acceptable.
 
On Tuesday, October 10, 2017 at 9:52:06 AM UTC-6, Doug Meyer wrote:
Also, regarding your portability requirement. On one hand, it sounds like messages and scratchpads have distinct properties that make it advantageous to expose them separately in ntb.h, and yet the portability requirement seems to want them abstracted as a single interface. Can you offer up any further thoughts on this, please?

Yes, scratchpads and messages are distinct enough that you can't shoe-horn messages into the scratchpad api (I tried this a long time ago). In the end, drivers will really just need some interface to say "transfer this information to peer X" and another interface for the peer to receive the data. I agree with Allen in that we need a library to accomplish this based on what the hardware provides through the NTB api. As it currently is there's a lot of duplication in ntb_transport and ntb_perf for this. 
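Roughly speaking, the library interface would boil down to something like this (purely illustrative; nothing like it exists in the tree yet):

#include <linux/ntb.h>

/* Purely illustrative: a "send this to peer X" abstraction that could be
 * backed by either scratchpads or message registers. */
struct ntb_peer_comm_ops {
	int (*send)(struct ntb_dev *ntb, int pidx, const u32 *buf, int count);
	int (*recv)(struct ntb_dev *ntb, int pidx, u32 *buf, int count);
	void (*data_event)(void *ctx, int pidx);	/* peer posted data */
};

ntb_transport and ntb_perf would then negotiate memory windows through that instead of open-coding the spad/msg handshakes.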


For now, I'm likely to merely create an array of shared_mw based on partition number and restrict the Switchtec configuration to use only the first four partitions. But if somebody says "I was thinking that this would actually be best done as Y, and it's probably little more work than your near-term hack" then I'm sure going to listen!

This was my long term plan too: create one shared_mw per peer. Though, I see no reason to restrict the implementation to 4 partitions. Creating an N-way mapping shouldn't be much harder than a 4-way mapping.

Thanks for your work on this,

Logan

Serge Semin

Oct 11, 2017, 1:41:08 PM
to lsgun...@gmail.com, linux-ntb
On Wed, Oct 11, 2017 at 09:19:20AM -0700, lsgun...@gmail.com <lsgun...@gmail.com> wrote:
>
>
> > Obviously LUTs are a precious resource, and a LUT per peer shared_mw is
> > expensive. A LUT broken up into many shared_mw is a possibility, though
> > there is always risk of trashing stuff.
>
>
> Well, as far as I know, switchtec is the only hardware with LUTs. But they
> aren't that precious a resource. If I remember correctly, switchtec
> supports up to 128 of them and up to 48 separate partitions so you could
> easily have 2 LUTs per peer and still have plenty left over to do other
> things with. And as far as we know now, there is no other use for LUT
> windows. Thus, I would say, using one per peer is perfectly acceptable.
>

You are mistaken. The IDT PCIe switch has LUTs. There are 24 of them available
per NTB on the current hardware. All of them are available for use by the IDT
NTB driver, as are all the MWs with direct address translation. Additionally,
it is only true that one LUT entry per peer is cheap if you have just two NTB
ports per device. What if you have eight or even more of them? In that case
you'd need to reserve as many LUT entries as there are peers you can connect
to, and the reservation doesn't look so inexpensive anymore. As far as I can
see from the Switchtec brief documents, there can be up to 48 NTBs per device.
So in that case we'd lose 48 MWs just to have some stupid scratchpads, which,
by your numbers, is more than a third of all the LUT MWs available to a port.

> On Tuesday, October 10, 2017 at 9:52:06 AM UTC-6, Doug Meyer wrote:
> >
> > Also, regarding your portability requirement. On one hand, it sounds like
> > messages and scratchpads have distinct properties that make it advantageous
> > to expose separately in ntb.h, and yet the portability requirement seems to
> > want to abstract them as a single interface. Can you offer up any further
> > thoughts on this, please?
> >
>
> Yes, scratchpads and messages are distinct enough that you can't shoe-horn
> messages into the scratchpad api (I tried this a long time ago). In the
> end, drivers will really just need some interface to say "transfer this
> information to peer X" and another interface for the peer to receive the
> data. I agree with Allen in that we need a library to accomplish this based
> on what the hardware provides through the NTB api. As it currently is
> there's a lot of duplication in ntb_transport and ntb_perf for this.
>

Such a library could partly be spun off from my implementation of the ntb_perf
driver, which is currently at the debug stage on Dave's Intel hardware (see the
patchset on the mailing list or here: https://github.com/fancer/ntb).
There is a service subsystem in there which encapsulates the use of NTB messages
and NTB scratchpads to set up the memory windows.
Anyone who has promised to implement such a library can use the new ntb_perf
driver as a reference, once we have finally finished debugging it.

And yes, there is no good way to simulate NTB scratchpads using NTB messaging.
I tried it in my first attempt to develop the IDT NTB driver. These interfaces
are too different, and I'd say we shouldn't try, since they are hardware
specifics which should be reflected by the NTB API.

>
> > For now, I'm likely to merely create an array of shared_mw based on
> > partition number and restrict the Switchtec configuration to use only the
> > first four partitions. But if somebody says "I was thinking that this would
> > actually be best done as Y, and it's probably little more work than your
> > near-term hack" then I'm sure going to listen!
> >
>
> This was my long term plan too: create one shared_mw per peer. Though, I
> see no reason to restrict the implementation to 4 partitions. Creating an
> N-way mapping shouldn't be much harder than a 4-way mapping.
>

As I said before, it is not much harder to make the switchtec hardware driver
support N ports than to make it support just four. You'll need to have the port
descriptors array in any case. I did the same in the IDT NTB hardware driver;
you can check it out in the code.

Regards,
-Sergey

> Thanks for your work on this,
>
> Logan
>

Logan Gunthorpe

Oct 11, 2017, 2:03:07 PM
to Serge Semin, linux-ntb
On 11/10/17 11:41 AM, Serge Semin wrote:

> You are mistaken. The IDT PCIe switch has LUTs. There are 24 of them available
> per NTB on the current hardware. All of them are available for use by the IDT
> NTB driver, as are all the MWs with direct address translation. Additionally,
> it is only true that one LUT entry per peer is cheap if you have just two NTB
> ports per device. What if you have eight or even more of them? In that case
> you'd need to reserve as many LUT entries as there are peers you can connect
> to, and the reservation doesn't look so inexpensive anymore. As far as I can
> see from the Switchtec brief documents, there can be up to 48 NTBs per device.
> So in that case we'd lose 48 MWs just to have some stupid scratchpads, which,
> by your numbers, is more than a third of all the LUT MWs available to a port.

Ah, fair, I wasn't aware the IDT had LUT support. Using up a third of the LUT windows does not seem like a problem at all considering we have no other use for them.

Note: the switchtec code also used the shared_mw for link management, not just providing scratchpads. But once the upper layers are reworked to not require scratchpads I'd be fine with removing them from the shared_mw.

Logan

D Meyer

Oct 11, 2017, 2:23:10 PM
to Serge Semin, lsgun...@gmail.com, linux-ntb
On Wed, Oct 11, 2017 at 10:41 AM, Serge Semin <fancer...@gmail.com> wrote:
> On Wed, Oct 11, 2017 at 09:19:20AM -0700, lsgun...@gmail.com <lsgun...@gmail.com> wrote:
>>
>>
>> > Obviously LUTs are a precious resource, and a LUT per peer shared_mw is
>> > expensive. A LUT broken up into many shared_mw is a possibility, though
>> > there is always risk of trashing stuff.
>>
>>
>> Well, as far as I know, switchtec is the only hardware with LUTs. But they
>> aren't that precious a resource. If I remember correctly, switchtec
>> supports up to 128 of them and up to 48 separate partitions so you could
>> easily have 2 LUTs per peer and still have plenty left over to do other
>> things with. And as far as we know now, there is no other use for LUT
>> windows. Thus, I would say, using one per peer is perfectly acceptable.
>>
>
> You are mistaken. The IDT PCIe switch has LUTs. There are 24 of them available
> per NTB on the current hardware. All of them are available for use by the IDT
> NTB driver, as are all the MWs with direct address translation. Additionally,
> it is only true that one LUT entry per peer is cheap if you have just two NTB
> ports per device. What if you have eight or even more of them? In that case
> you'd need to reserve as many LUT entries as there are peers you can connect
> to, and the reservation doesn't look so inexpensive anymore. As far as I can
> see from the Switchtec brief documents, there can be up to 48 NTBs per device.
> So in that case we'd lose 48 MWs just to have some stupid scratchpads, which,
> by your numbers, is more than a third of all the LUT MWs available to a port.

Gents,

First, thanks to Logan and Serge for lots of great information. Y'all
manage to always remind me I'm drinking from a firehose! ;-)

Serge is thinking along the same lines as I am.

Regarding Switchtec, the 96xG3 part has 512 LUTs per Stack and a stack
can have up to eight NTBs. So for that chip, if I'm starting to grasp
this stuff, a machine with 48 NTBs would mean that on a single Stack,
8 (hosts) x 47 (peers) = 376 LUTs would be used up for each of the 8
to have a shared_mw in its own LUT.

Also, the current hard-coded LUT size is 64 KiB, but that can't remain
the case if we want flexibility. If the LUTs were, say, 16 MiB, then the
shared_mw LUTs would use up a massive part of the BAR space.

Oh... A quick side question: for Switchtec, I see that the number of
available LUTs for a BAR is read out of the chip and then rounded down
to a power of two... I'm curious why it's rounded down?

Blessings,
Doug

Logan Gunthorpe

Oct 11, 2017, 2:45:00 PM
to D Meyer, Serge Semin, linux-ntb
On 11/10/17 12:23 PM, D Meyer wrote:
> Regarding Switchtec, the 96xG3 part has 512 LUTs per Stack and a stack
> can have up to eight NTBs. So for that chip, if I'm starting to grasp
> this stuff, a machine with 48 NTBs would mean that on a single Stack,
> 8 (hosts) x 47 (peers) = 376 LUTs would be used up for each of the 8
> to have a shared_mw in its own LUT.

Yes, there are a bunch of annoying restrictions like that, but your example looks correct. There are lots of LUTs to play around with. The bigger restriction is the direct windows, of which you only have 2 per port. Creating a network in ntb_transport (et al) to communicate across 48 partitions is going to be a very hard problem to solve.
> Also, the current hard-coded LUT size is 64 KiB, but that can't remain
> that way to have flexibility. If the LUTs were, say 16 MiB, then the
> shred_mw LUTs use up a massive part of the BAR space.

Yes, all LUTs must be the same size. Plus the LUT space comes before the direct window space in the BAR. So the alignment (and therefore maximum size) of the direct window space depends on the size and the number of LUTs. I believe I chose 32 64k LUTs so that the direct window aligns to 2MB. I originally had the size set to 4k, but this limited the alignment of the direct window.

So the tradeoffs are: if you increase it you waste memory for LUTs that don't need the extra space, and if you decrease it you limit the size of the direct window.
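Back-of-the-envelope, with the numbers as they stand today (illustrative constants, not values read from the chip):

/* 32 LUTs of 64 KiB each occupy the first 2 MiB of the BAR, so the
 * direct window that follows starts 2 MiB aligned. */
#define LUT_COUNT		32
#define LUT_SIZE		(64 * 1024)
#define DIRECT_WIN_OFFSET	(LUT_COUNT * LUT_SIZE)	/* 0x200000 = 2 MiB */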
> Oh... A quick side question: For swithtec, I see that the number of
> available LUTs for a BAR is read out of the chip and then rounded down
> to a power of two... I'm curious why it's rounded down?

This has to do with the alignment I mentioned above. If the number of LUTs is not a power of two, the direct window won't be nicely aligned and you get some annoying restrictions on its size.
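In code it amounts to something like this (sketch only; the example number is made up):

#include <linux/log2.h>

/* Sketch only: round the usable LUT count down to a power of two so
 * that count * size (both powers of two) leaves the direct window on a
 * nice boundary.  rounddown_pow_of_two() is the helper in <linux/log2.h>. */
static unsigned int usable_lut_count(unsigned int hw_luts)
{
	return rounddown_pow_of_two(hw_luts);	/* e.g. 48 -> 32 */
}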

Logan

D Meyer

Oct 12, 2017, 2:05:36 PM
to Logan Gunthorpe, Serge Semin, linux-ntb
Dear Logan,

Thanks so much for the very, very helpful reply!

On Wed, Oct 11, 2017 at 11:44 AM, Logan Gunthorpe <lsgun...@gmail.com> wrote:
> On 11/10/17 12:23 PM, D Meyer wrote:
>>
>> Regarding Switchtec, the 96xG3 part has 512 LUTs per Stack and a stack
>> can have up to eight NTBs. So for that chip, if I'm starting to grasp
>> this stuff, a machine with 48 NTBs would mean that on a single Stack,
>> 8 (hosts) x 47 (peers) = 376 LUTs would be used up for each of the 8
>> to have a shared_mw in its own LUT.
>
> Yes, there are a bunch of annoying restrictions like that, but your example
> looks correct.
> There are lots of LUTs to play around with. The bigger restriction is the
> direct windows, of
> which you only have 2 per port. Creating a network in ntb_transport (et al)
> to communicate across
> 48 partitions is going to be a very hard problem to solve.

Regarding your comment about using direct windows for ntb_transport, I
don't understand why ntb_transport has to use a direct window.
Couldn't it (or some other client) use one or more LUTs? I don't see a
restriction/difference in the Switchtec specification that would
prevent that from being done.

It looks like the current default max direct memory window size is 2
MiB, which could be a LUT (if that were the LUT size). Having a LUT
size of 2 MiB doesn't seem like any sort of reach at all... in
previous hardware that I worked with, we had BARs that were very, very
large. Having 32 LUTs * 2 MiB isn't big when you consider that GPGPU
configurations are using BARs larger than 4 GiB.

Over a year ago, when PCI-SIG proposed changing the BAR maximum to 2^63 to
permit complete access to the entire space, they noted:
> Currently, limiting the resizable BARs to 512 GB means resources are either: a) simply not allocated and left out of the system,
> or b) forced to report a smaller aperture in order to be allocated, but that aperture size is not optimal in all uses of the product
> and may cause software to need to “bank” or otherwise move the aperture at runtime.

Imagine the case where many different clients are using different
capabilities in the fabric: NVMe, GPUs, ntb_transport and its kin,
etc. As far as I can tell, LUTs are a great way to be able to allocate
chunks of BAR space and set up mappings on an as-needed basis.

Also, as far as I understand the Switchtec part, I can reconfigure the
LUT number and size as well as the direct window size (as long as
nothing is currently using them). That's nice too, though it would be
awesome to eventually have a switch that supports richer features that
give greater flexibility as well as dynamic reconfiguration of memory
windows (ya gotta dream!).

Blessings,
Doug

D Meyer

Oct 12, 2017, 2:31:07 PM
to Logan Gunthorpe, Serge Semin, linux-ntb
Dear Logan, everyone,

It seems like there are LOTs of uses for LUTs.

Imagine accessing PiB of storage that uses memory semantics (early
last year PCI-SIG noted a vendor selling an endpoint capable of 2 TB),
a sea of GPGPU cards, etc. in a heterogeneous cloud computing
environment, and wanting to control access to windows of that stuff
being used by multiple applications running in one or more hosts. A
small number of windows are not sufficient. LUTs provide a good method
for being able to establish long-lasting as well as ephemeral
translations and, in conjunction with IOMMU, PCIe's ACS, etc. as well
as various QoS features could provide a good start on data protection
and client performance.

This is much bigger than ntb_transport supporting networks with some
NVMe drives and GPU endpoints. Fun!

LUTs aren't an afterthought. I think they're really valuable.

Blessings,
Doug

Logan Gunthorpe

Oct 12, 2017, 2:34:00 PM
to D Meyer, Serge Semin, linux-ntb
On 12/10/17 12:05 PM, D Meyer wrote:
> Regarding your comment about using direct windows for ntb_transport, I
> don't understand why ntb_transport has to use a direct window.
> Couldn't it (or some other client) use one or more LUTs? I don't see a
> restriction/difference in the Switchtec specification that would
> prevent that from being done.

Yes, true. Though for performance you'd need the LUT windows to be larger.
> It looks like the current default max direct memory window size is 2
> MiB, which could be a LUT (If that was the LUT size). Having a LUT
> size of 2 MiB doesn't seem like any sort of reach at all... in
> previous hardware that I worked with, we had BARs that were very, very
> large. Having 32 LUTs * 2 MiB isn't big when you consider that GPCPU
> configurations are using BARs larger than 4 GiB.

True. You just get tripped up slightly with all the LUTs having to be the same size.

> Imagine the case where many different clients are using different
> capabilities in the fabric. NVMe, GPUs, ntb_transport and it's kin,
> etc. As far as I can tell, LUTs are a great way to be able to allocate
> chunks of BAR space and set up mappings on an as-needed basis.

Yeah, we are a _long_ way off from having multiple clients use the resources. Also, you'll find most kernel developers (especially the big names) will argue against making decisions based on an imagined future. Your presumptions will more than likely be wrong and you'll have wasted everyone's time. Code should be written for today's needs, and if someone comes up with some crazy use for NTB it's their responsibility to figure out how to change the code to handle it and justify the extra complexities to the community.

> Also, as far as I understand the Switchtec part, I can reconfigure the
> LUT number and size as well as the direct window size (as long as
> nothing is currently using them). That's nice too, though it would be
> awesome to eventually have a switch that supports richer features that
> give greater flexibility as well as dynamic reconfiguration of memory
> windows (ya gotta dream!).

Yeah, I also ran up against a bunch of gotchas when dealing with LUT configuration. There are a number of things that could make it nicer for software developers, but I wouldn't hold my breath for them happening.

Logan

Logan Gunthorpe

Oct 12, 2017, 2:38:38 PM
to D Meyer, Serge Semin, linux-ntb
On 12/10/17 12:31 PM, D Meyer wrote:
> Imagine accessing PiB of storage that uses memory semantics (early
> last year PCI-SIG noted a vendor selling an endpoint capable of 2 TB),
> a sea of GPGPU cards, etc. in a heterogeneous cloud computing
> environment, and wanting to control access to windows of that stuff
> being used by multiple applications running in one or more hosts. A
> small number of windows are not sufficient. LUTs provide a good method
> for being able to establish long-lasting as well as ephemeral
> translations and, in conjunction with IOMMU, PCIe's ACS, etc. as well
> as various QoS features could provide a good start on data protection
> and client performance.
>
> This is much bigger than ntb_transport supporting networks with some
> NVMe drives and GPU endpoints. Fun!
>
> LUTs aren't an afterthought. I think they're really valuable.
See my comment in my previous email. Until there's some published code (preferably upstream) that uses them, it's not worth assuming what those uses might be and giving yourself arbitrary restrictions that may never come to pass. The market may never even want what you are imagining.

Logan

