
[PATCH] arch/tile: new multi-core architecture for Linux


Chris Metcalf

May 20, 2010, 2:10:01 AM
At Tilera we have been running Linux 2.6.26 on our architecture for a
while and distributing the sources to our customers. We just sync'ed up
our sources to 2.6.34 and would like to return it to the community more
widely, so I'm hoping to take advantage of the merge window for 2.6.35
to integrate support for our architecture.

The "tile" architecture supports the Tilera chips, both our current
32-bit chips and our upcoming 64-bit architecture. The chips are
multicore, with 64 (or 36) cores per chip on our current product line,
and up to 100 cores on the upcoming 64-bit architecture. They also
include multiple built-in memory controllers, 10 Gb Ethernet, PCIe,
and a number of other I/Os. There's more info at http://www.tilera.com.

The architecture is somewhat MIPS-like, but VLIW, with up to three
instructions per bundle. The system architecture is nicely orthogonal,
with four privilege levels that can be assigned to each of forty-odd
separate protection domains, many with an associated interrupt, e.g.
ITLB/DTLB misses, timer, performance counters, various interrupts
associated with the generic networks that connect the cores, etc.
A hypervisor (kind of like the Alpha PAL) runs at a higher privilege
level to support Linux via software-interrupt calls.

The Linux we ship has some additional performance and functionality
customization in the generic code, but appended is the patch that just
adds the minimum amount of functionality into the platform-independent
code to hook in the tile architecture code in arch/tile. We will
attempt to push the other changes to the platform-independent code
piece by piece, after the initial architecture support is in.
We will also push up the 64-bit TILE-Gx support once that architecture
is fully frozen (e.g. instruction encodings finalized).

We are using the http://www.tilera.com/scm/ web site to push
Tilera-modified sources back up to the community. At the moment, the
arch/tile hierarchy is there (as a bzipped tarball) as well as a copy
of the patch appended to this email. In addition, our gcc, binutils,
and gdb sources are available on the web site. We have not yet started
the community return process for gcc, binutils, and gdb, so they are in
a preliminary form at this point.

The git://www.tilera.com server is up but has no content yet; we
realized this week that we need to upgrade the web server to a 64-bit
kernel to support a decent git server. So although we plan to make the
code available via git in the future, it isn't there yet.

As far as the platform-independent changes go, two of the changes in the
appended patch are uncontroversial, one adding a stanza to MAINTAINERS,
and one adding a line to drivers/pci/Makefile to request "setup-bus.o
setup-irq.o" for tile PCI.

A slightly more interesting one-line change is to <linux/mm.h>,
to support lowmem PAs above the 4GB limit. We use NUMA to manage
the multiple memory controllers attached to the chip, and map some of
each controller into kernel LOWMEM to load-balance memory bandwidth for
kernel-intensive apps. The controllers can each manage up to 16GB, so we
use bits above the 4GB limit in the PA to indicate the controller number.
It turns out that generic Linux almost tolerates this, but requires one
cast in lowmem_page_address() to avoid shifting the high PA bits out of
a 32-bit PFN type.

The final change is just a PCI quirk for our TILEmpower platform, which
explains itself in the comment. This is not a critical change from our
point of view, but without it you can't use the SATA disks attached to
the PCI controller on that platform, so we're hoping it can be accepted
as part of the initial tile architecture submission as well.

I'd appreciate being cc'ed on any comments on the patch or the tile
architecture support, since although I try to follow LKML, the volume
can be somewhat overwhelming.


--- linux-2.6.34/MAINTAINERS 2010-05-16 17:17:36.000000000 -0400
+++ tilera-source/MAINTAINERS 2010-05-17 18:00:12.651112000 -0400
@@ -5436,6 +5436,12 @@
S: Maintained
F: sound/soc/codecs/twl4030*

+TILE ARCHITECTURE
+M: Chris Metcalf <cmet...@tilera.com>
+W: http://www.tilera.com/scm/
+S: Supported
+F: arch/tile/
+
TIPC NETWORK LAYER
M: Jon Maloy <jon....@ericsson.com>
M: Allan Stephens <allan.s...@windriver.com>
--- linux-2.6.34/include/linux/mm.h 2010-05-16 17:17:36.000000000 -0400
+++ tilera-source/include/linux/mm.h 2010-05-17 12:54:33.540145000 -0400
@@ -592,7 +592,7 @@

static __always_inline void *lowmem_page_address(struct page *page)
{
- return __va(page_to_pfn(page) << PAGE_SHIFT);
+ return __va((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT);
}

#if defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL)
--- linux-2.6.34/drivers/pci/Makefile 2010-05-09 21:36:28.000000000 -0400
+++ tilera-source/drivers/pci/Makefile 2010-05-13 15:03:05.615238000 -0400
@@ -49,6 +49,7 @@
obj-$(CONFIG_X86_VISWS) += setup-irq.o
obj-$(CONFIG_MN10300) += setup-bus.o
obj-$(CONFIG_MICROBLAZE) += setup-bus.o
+obj-$(CONFIG_TILE) += setup-bus.o setup-irq.o

#
# ACPI Related PCI FW Functions
--- linux-2.6.34/drivers/pci/quirks.c 2010-05-16 17:17:36.000000000 -0400
+++ tilera-source/drivers/pci/quirks.c 2010-05-17 13:26:22.347178000 -0400
@@ -2094,6 +2094,23 @@
quirk_unhide_mch_dev6);


+/*
+ * The Tilera Blade V1.0 platform needs to set the link speed
+ * to 2.5GT(Giga-Transfers)/s (Gen 1). The default link speed
+ * setting is 5GT/s (Gen 2). 0x98 is the Link Control2 PCIe
+ * capability register of the PEX8624 PCIe switch. The switch
+ * supports link speed auto negotiation. This is expected to
+ * be fixed in the next release of the Blade platform.
+ */
+static void __devinit quirk_tile_blade(struct pci_dev *dev)
+{
+ if (blade_pci) {
+ pci_write_config_dword(dev, 0x98, 0x1);
+ mdelay(50);
+ }
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PLX, 0x8624, quirk_tile_blade);
+
#ifdef CONFIG_PCI_MSI
/* Some chipsets do not support MSI. We cannot easily rely on setting
* PCI_BUS_FLAGS_NO_MSI in its bus flags because there are actually
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Barry Song

May 20, 2010, 4:10:02 AM

This doesn't make sense: you add a u64-style cast, but that changes the
meaning of the pfn. Is a pfn a phys_addr_t? In any case, page_to_pfn can
be redefined in your arch rather than changed in common code.

Your patch doesn't compile, and the subject doesn't match the content
well. I think you need to reorganize the patches.

Linus Torvalds

May 20, 2010, 10:40:03 AM

On Thu, 20 May 2010, Barry Song wrote:

> On Thu, May 20, 2010 at 1:43 PM, Chris Metcalf <cmet...@tilera.com> wrote:
> >
> >  static __always_inline void *lowmem_page_address(struct page *page)
> >  {
> > -       return __va(page_to_pfn(page) << PAGE_SHIFT);
> > +       return __va((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT);
>
> This doesn't make sense: you add a u64-style cast, but that changes the
> meaning of the pfn. Is a pfn a phys_addr_t? In any case, page_to_pfn
> can be redefined in your arch rather than changed in common code.

No, it actually makes a lot of sense.

The PFN may well be 32-bit, but then shifting it by PAGE_SHIFT turns the
PFN from a PFN to a physical address. So the cast makes sense as a way to
make sure that the code allows a 32-bit PFN with a 64-bit physical
address.

So I don't think there's anything tile-specific about it, and it looks
like a sane patch. If anything, it might make some sense to make this an
explicit thing, i.e. have a "pfn_to_phys()" helper, because there's a _lot_
of these things open-coded.

And some of them actually have the cast already. See for example

#define pfn_to_nid(pfn) pa_to_nid(((u64)(pfn) << PAGE_SHIFT))

in the alpha <asm/mmzone.h>. Also:

resource_size_t offset = ((resource_size_t)pfn) << PAGE_SHIFT;

in the powerpc PCI code, or

#define page_to_phys(page) ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT)

in the x86 io code.

In fact, UM has that "pfn_to_phys()" helper already (and has a (phys_t)
cast).

So we do already have a lot of casts (just grep for "pfn.*<<.*SHIFT" and
you'll see them in generic code already), and the new one for tile makes
100% sense. In fact, we should clean up the existing ones.

Linus

Chris Metcalf

May 20, 2010, 3:20:02 PM
On 5/20/2010 1:04 AM, Barry Song wrote:
> On Thu, May 20, 2010 at 1:43 PM, Chris Metcalf <cmet...@tilera.com> wrote:
>
>> --- linux-2.6.34/include/linux/mm.h 2010-05-16 17:17:36.000000000 -0400
>> +++ tilera-source/include/linux/mm.h 2010-05-17 12:54:33.540145000 -0400
>> @@ -592,7 +592,7 @@
>>
>> static __always_inline void *lowmem_page_address(struct page *page)
>> {
>> - return __va(page_to_pfn(page) << PAGE_SHIFT);
>> + return __va((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT);
>>
> This doesn't make sense: you add a u64-style cast, but that changes the
> meaning of the pfn. Is a pfn a phys_addr_t? In any case, page_to_pfn
> can be redefined in your arch rather than changed in common code.
> [...]

> Your patch doesn't compile, and the subject doesn't match the content
> well. I think you need to reorganize the patches.
>

Where do you see the compilation failure? I tested this with the only
other architecture I have handy (x86_64) and it built OK. And by code
inspection, <linux/mm.h> includes <linux/mm_types.h>, which includes
<linux/types.h>, which always provides phys_addr_t suitably (based on
CONFIG_PHYS_ADDR_T_64BIT).

In any case, a better solution might be to #include <linux/pfn.h> in
<linux/mm.h> and write this function as:

static __always_inline void *lowmem_page_address(struct page *page)
{
	return __va(PFN_PHYS(page_to_pfn(page)));
}


Note that PFN_PHYS() is already defined to include the cast to
phys_addr_t. Jeremy Fitzhardinge added the cast in Sep 2008 with a
comment that echoes this discussion:

generic: make PFN_PHYS explicitly return phys_addr_t

PFN_PHYS, as its name suggests, turns a pfn into a physical address.
However, it is a macro which just operates on its argument without
modifying its type. pfns are typed unsigned long, but an unsigned
long may not be long enough to hold a physical address (32-bit systems
with more than 32 bits of physical address).

Make sure we cast to phys_addr_t to return a complete result.


Linus, does this seem like the right generic answer, or would it make
more sense, as you suggested, to try to provide a new pfn_to_phys()
function in the architecture-independent code?

In any case, in the spirit of providing a complete answer, I'll provide
a proper patch in a following email.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Chris Metcalf

May 20, 2010, 3:20:02 PM
This ensures that platforms with lowmem PAs above 32 bits work
correctly by avoiding truncating the PA during a left shift.

Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---
include/linux/mm.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fb19bb9..33bedcf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -13,6 +13,7 @@
#include <linux/debug_locks.h>
#include <linux/mm_types.h>
#include <linux/range.h>
+#include <linux/pfn.h>

struct mempolicy;
struct anon_vma;
@@ -591,7 +592,7 @@ static inline void set_page_links(struct page *page,
enum zone_type zone,

static __always_inline void *lowmem_page_address(struct page *page)
{
- return __va(page_to_pfn(page) << PAGE_SHIFT);
+ return __va(PFN_PHYS(page_to_pfn(page)));
}

#if defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL)

Barry Song

May 21, 2010, 1:00:01 AM
On Fri, May 21, 2010 at 3:10 AM, Chris Metcalf <cmet...@tilera.com> wrote:
> On 5/20/2010 1:04 AM, Barry Song wrote:
>> On Thu, May 20, 2010 at 1:43 PM, Chris Metcalf <cmet...@tilera.com> wrote:
>>
>>> --- linux-2.6.34/include/linux/mm.h     2010-05-16 17:17:36.000000000 -0400
>>> +++ tilera-source/include/linux/mm.h    2010-05-17 12:54:33.540145000 -0400
>>> @@ -592,7 +592,7 @@
>>>
>>>  static __always_inline void *lowmem_page_address(struct page *page)
>>>  {
>>> -       return __va(page_to_pfn(page) << PAGE_SHIFT);
>>> +       return __va((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT);
>>>
>> This doesn't make sense: you add a u64-style cast, but that changes the
>> meaning of the pfn. Is a pfn a phys_addr_t? In any case, page_to_pfn
>> can be redefined in your arch rather than changed in common code.
>> [...]
>> Your patch doesn't compile, and the subject doesn't match the content
>> well. I think you need to reorganize the patches.

Where does the blade_pci symbol come from? A grep finds no matches. Is
it defined somewhere in your local code?
Why not just use #ifdef CONFIG_TILE to cover quirk_tile_blade? And
where is CONFIG_TILE defined?
I guess all of this will be explained by your arch code, but this patch
depends on that code, so it shouldn't be submitted by itself.

Chris Metcalf

May 21, 2010, 11:20:01 AM
On 5/20/2010 9:52 PM, Barry Song wrote:
> On 5/20/2010 1:04 AM, Barry Song wrote:
>>> Your patch doesn't compile, and the subject doesn't match the content
>>> well. I think you need to reorganize the patches.
>>>
> Where does the blade_pci symbol come from? A grep finds no matches. Is
> it defined somewhere in your local code?
> Why not just use #ifdef CONFIG_TILE to cover quirk_tile_blade? And
> where is CONFIG_TILE defined?
>

Oh, I see; I thought you were referring to the other bit of the quoted
patch. Yes, this should be guarded by #ifdef CONFIG_TILE - thanks!

> I guess all of this will be explained by your arch code, but this patch
> depends on that code, so it shouldn't be submitted by itself.
>

The arch code is all at

http://www.tilera.com/scm/linux-2.6.34-arch-tile.tar.bz2

I have been reluctant to send it to LKML as an email patch, since it's
270 files, 87 KLoC, about 2.5 MB. I could break it down into multiple
patches (arch/tile/kernel/, arch/tile/mm, arch/tile/lib, etc.).

I solicit opinions from the community as to what is the best approach :-)

Meanwhile I'll resend the original platform-independent changes (the
MAINTAINERS stanza, one line in the PCI Makefile, and the quirk change)
as updated git-am patches.

Chris Metcalf

May 22, 2010, 12:10:01 AM
On 5/19/2010 10:43 PM, Chris Metcalf wrote:
> At Tilera we have been running Linux 2.6.26 on our architecture for a
> while and distributing the sources to our customers. We just sync'ed up
> our sources to 2.6.34 and would like to return it to the community more
> widely, so I'm hoping to take advantage of the merge window for 2.6.35
> to integrate support for our architecture.
>

As an experiment, I've created a "git format-patch" output file for all
the remaining Tilera-specific changes; Alan took the
lowmem_page_address() change into -mm, so hopefully that will make it
into 2.6.35 as well. I'm reluctant to post all the arch/tile contents
to LKML as a single 3 MB monster email, but you can just cut and paste
the following command to pull it into git:

wget -O - http://www.tilera.com/scm/linux-2.6.34-arch-tile.patch | git am

In practice I could probably email it without causing grief to anyone's
mailer, but in the interests of saving disk and network bandwidth I'll
try this way. There are no changes in this patch that affect any other
architecture.

Thanks!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann

May 23, 2010, 6:10:01 PM
On Saturday 22 May 2010 06:05:19 Chris Metcalf wrote:
> As an experiment, I've created a "git format-patch" output file for all
> the remaining Tilera-specific changes; Alan took the
> lowmem_page_address() change into -mm, so hopefully that will make it
> into 2.6.35 as well. I'm reluctant to post all the arch/tile contents
> to LKML as a single 3 MB monster email, but you can just cut and paste
> the following command to pull it into git:
>
> wget -O - http://www.tilera.com/scm/linux-2.6.34-arch-tile.patch | git am

Thanks for this. I took an initial look at the code, and it looks pretty
good as far as I got, though it's not mergeable for 2.6.35 IMHO. There are
a number of areas where code that should be generic is not, and there is
stuff in there that I think you should submit separately.

> In practice I could probably email it without causing grief to anyone's
> mailer, but in the interests of saving disk and network bandwidth I'll
> try this way. There are no changes in this patch that affect any other
> architecture.

It would help if you can set up an actual git tree to pull from, but
it also works the way you did it. I looked mostly at the header files,
leaving out the device drivers and oprofile intentionally, and I have
not yet found time to look at your arch/tile/{lib,kernel,mm}.

> MAINTAINERS | 6 +
> arch/tile/Kbuild | 4 +
> arch/tile/Kconfig | 533 +
> arch/tile/Kconfig.debug | 49 +
> arch/tile/Makefile | 68 +
> arch/tile/configs/tile_defconfig | 1297 +++
> arch/tile/drivers/Makefile | 23 +
> arch/tile/drivers/bme_mem.c | 408 +
> arch/tile/drivers/bme_mem.h | 61 +
> arch/tile/drivers/eeprom.c | 366 +
> arch/tile/drivers/eeprom.h | 43 +
> arch/tile/drivers/hpi.c | 447 +
> arch/tile/drivers/hpi.h | 59 +
> arch/tile/drivers/i2c.c | 330 +
> arch/tile/drivers/ide-gpio.c | 1505 +++
> arch/tile/drivers/iorpc.c | 483 +
> arch/tile/drivers/iorpc.h | 66 +
> arch/tile/drivers/rshim.c | 245 +
> arch/tile/drivers/rshim.h | 54 +
> arch/tile/drivers/rtc.c | 152 +
> arch/tile/drivers/softuart.c | 306 +
> arch/tile/drivers/srom.c | 409 +
> arch/tile/drivers/srom.h | 66 +
> arch/tile/drivers/tilepci_barmem.c | 320 +
> arch/tile/drivers/tilepci_direct_hv.c | 517 +
> arch/tile/drivers/tilepci_endp.c | 1623 +++
> arch/tile/drivers/tilepci_endp.h | 32 +
> arch/tile/drivers/tilepci_shared_code.c | 1600 +++
> arch/tile/drivers/tilepci_shared_code.h | 1650 +++
> arch/tile/drivers/watchdog.c | 449 +
> arch/tile/drivers/xgbe.h | 179 +
> arch/tile/drivers/xgbe_main.c | 1015 ++
> arch/tile/drivers/xgbe_net.c | 3377 +++++++
> arch/tile/drivers/xgbe_net_fastio.S | 32 +

Most of these device drivers should be reviewed separately
using the appropriate mailing lists. In general we prefer
the drivers to live in drivers/{net,ata,serial,...} than
in arch/.../drivers.

The notable exception is pci, which should go to arch/tile/pci
but still be reviewed in the pci mailing list.

> arch/tile/oprofile/Makefile | 9 +
> arch/tile/oprofile/backtrace.c | 73 +
> arch/tile/oprofile/op_common.c | 352 +
> arch/tile/oprofile/op_impl.h | 37 +

These should probably go through the oprofile list.

> +config TILE
> + def_bool y
> + select HAVE_OPROFILE
> + select HAVE_IDE
> + select GENERIC_FIND_FIRST_BIT
> + select GENERIC_FIND_NEXT_BIT
> + select RESOURCES_64BIT
> + select USE_GENERIC_SMP_HELPERS
> +
> +# FIXME: investigate whether we need/want these options.
> +# select HAVE_GENERIC_DMA_COHERENT
> +# select HAVE_DMA_ATTRS
> +# select HAVE_IOREMAP_PROT
> +# select HAVE_OPTPROBES
> +# select HAVE_REGS_AND_STACK_ACCESS_API
> +# select HAVE_HW_BREAKPOINT
> +# select PERF_EVENTS
> +# select HAVE_USER_RETURN_NOTIFIER

You will want to implement PERF_EVENTS, which replaces OPROFILE
(you can have both though). You should not need HAVE_IDE, which
is deprecated by libata, but you will need to reimplement the
driver. HAVE_REGS_AND_STACK_ACCESS_API is a good one, you
should implement that. HAVE_HW_BREAKPOINT is good, but
requires hardware support.

It is unlikely that you need DMA attributes (unless your PCI
devices want to use nonstandard ordering rules). Similarly,
you hopefully won't need HAVE_GENERIC_DMA_COHERENT.

> +config HOMECACHE
> + bool "Support for dynamic home cache management"
> + depends on TILERA_MDE
> + ---help---
> + Home cache management allows Linux to dynamically adjust
> + which core's (or cores') cache is the "home" for every page
> + of memory. This allows performance improvements on TILEPro
> + (for example, by enabling the default "allbutstack" mode
> + where stack pages are always homed on the core running the
> + task). TILE64 has less performant cache-coherent support,
> + so it is not recommended to disable homecaching for TILE64.
> +
> +config DATAPLANE
> + bool "Support for Zero-Overhead Linux mode"
> + depends on SMP
> + depends on NO_HZ
> + depends on TILERA_MDE
> + ---help---
> + Zero-Overhead Linux mode, also called "dataplane" mode,
> + allows Linux cpus running only a single user task to run
> + without any kernel overhead on that cpu. The normal
> + scheduler tick is disabled, kernel threads such as the
> + softlockup thread are not run, kernel TLB flush IPIs are
> + deferred, vmstat updates are not performed, etc.

These sound like very interesting features that may also be
useful for other architectures. I would recommend splitting them
out into separate patches, by removing the support from the
base architecture patch, and submitting the two patches for these
features for discussion on the linux-kernel and linux-arch
mailing lists.

> +choice
> + depends on EXPERIMENTAL
> + prompt "Memory split" if EMBEDDED
> + default VMSPLIT_3G

I would recommend leaving out this option on your architecture
because of the craziness. If I understand you correctly, the
CPUs are all 64 bit capable, so there is little point in
micro-optimizing the highmem case.

> +config XGBE_MAIN
> + tristate "Tilera GBE/XGBE character device support"
> + default y
> + depends on HUGETLBFS
> + ---help---
> + This is the low-level driver for access to xgbe/gbe/pcie.

This should go to drivers/net/Kconfig.

> +config TILEPCI_ENDP
> + tristate "Tilera PCIE Endpoint Channel Driver"
> + default y
> + depends on !TILEGX
> + ---help---
> + This device is required on Tilera PCI cards; the driver
> + allows Tilera Linux on the chip to communicate with the
> + Intel Linux running on the host.

This driver is likely one of the hardest things to review. I'd
recommend leaving it out of the arch patch for now and submitting
it for a separate review together with the host side driver.

> +config TILE_IDE_GPIO
> + bool "Tilera IDE driver for GPIO"
> + depends on IDE
> + default y
> + ---help---
> + This device provides an IDE interface using the GPIO pins.

replace this with a driver in drivers/ata.

> +config TILE_SOFTUART
> + bool "Tilera Soft UART"
> + default n
> + depends on !TILEGX
> + ---help---
> + This device provides access to the FlexIO UART functionality.
> + It requires a dedicated hypervisor "softuart" driver tile.

I haven't looked at the driver, but it's very likely that you
want to replace it with either a backend for drivers/char/hvc_console.c
or drivers/serial/serial_core.c, modeled after the other drivers
using those interfaces. serial_core is for things that look like
an actual UART, while hvc_console is for abstracted interfaces
that have a simple read/write interface like a hypervisor.

[skipping device drivers]

> diff --git a/arch/tile/feedback/cachepack.c b/arch/tile/feedback/cachepack.c
> new file mode 100644
> index 0000000..7b54348
> --- /dev/null
> +++ b/arch/tile/feedback/cachepack.c
> +#include "file.h"
> +#include <arch/chip.h>
> +#ifdef __KERNEL__
> +#define THREADS_SUPPORTED
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/module.h>
> +#include <linux/spinlock.h>
> +#include <linux/mm.h>
> +#else
> +#include "threads.h"
> +#include "mmap.h"

This file looks like mixed kernel/user code, which is something
we don't normally do. It also does not follow kernel coding style.
I'd suggest splitting the implementation and having the kernel
version only include the necessary code without all the #ifdef
and in normal style.

You could also leave this out for now.

> diff --git a/arch/tile/include/arch/abi.h b/arch/tile/include/arch/abi.h
> new file mode 100644
> index 0000000..18ad6a0
> --- /dev/null
> +++ b/arch/tile/include/arch/abi.h
> @@ -0,0 +1,93 @@
> +// Copyright 2010 Tilera Corporation. All Rights Reserved.
> +//
> +// This program is free software; you can redistribute it and/or
> +// modify it under the terms of the GNU General Public License
> +// as published by the Free Software Foundation, version 2.
> +//
> +// This program is distributed in the hope that it will be useful, but
> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> +// NON INFRINGEMENT. See the GNU General Public License for
> +// more details.
> +
> +//! @file
> +//!
> +//! ABI-related register definitions helpful when writing assembly code.
> +//!

This file uses nonstandard formatting of the comments. Is it
a generated file, or something that needs to be shared with
other projects?

If it is not shared with anything that strictly mandates the
style, I'd recommend moving to regular kernel style.

> +//! Get the current cycle count.
> +//!
> +static __inline unsigned long long
> +get_cycle_count(void)
> +{
> + unsigned int high = __insn_mfspr(SPR_CYCLE_HIGH);
> + unsigned int low = __insn_mfspr(SPR_CYCLE_LOW);
> + unsigned int high2 = __insn_mfspr(SPR_CYCLE_HIGH);
> + if (__builtin_expect(high == high2, 1))
> + {
> +#ifdef __TILECC__
> +#pragma frequency_hint FREQUENT
> +#endif
> + return (((unsigned long long)high) << 32) | low;
> + }
> + do {
> + low = __insn_mfspr(SPR_CYCLE_LOW);
> + high = high2;
> + high2 = __insn_mfspr(SPR_CYCLE_HIGH);
> + } while (high != high2);
> + return (((unsigned long long)high) << 32) | low;
> +}

I would not use these functions directly in driver code.
You could move all of cycle.h to timex.h and rename
get_cycle_count to get_cycles. The other functions
are not used anywhere, so they don't need to be
part of the header.

You should also implement read_current_timer using
this so you can avoid the expensive delay loop
calibration at boot time.

> +//! Delay for a brief period.
> +//!
> +//! As implemented, this function is a six-cycle slow SPR read.
> +//!
> +static __USUALLY_INLINE void
> +cycle_relax(void)
> +{
> + __insn_mfspr(SPR_PASS);
> +}

Another abstraction you can kill by moving this directly
to cpu_relax and calling that from your relax().

> +/* Use __ALWAYS_INLINE to force inlining, even at "-O0". */
> +#ifndef __ALWAYS_INLINE
> +#define __ALWAYS_INLINE __inline __attribute__((always_inline))
> +#endif
> +
> +/* Use __USUALLY_INLINE to force inlining even at "-Os", but not at "-O0". */
> +#ifndef __USUALLY_INLINE
> +#ifdef __OPTIMIZE__
> +#define __USUALLY_INLINE __ALWAYS_INLINE
> +#else
> +#define __USUALLY_INLINE
> +#endif
> +#endif

Please get rid of these abstractions; inlining is already hard
enough with the macros we have in the common code. We have
an __always_inline macro that is defined the same way as yours
and if you can make a good case for your __USUALLY_INLINE,
we can add that as __usually_inline to linux/compiler.h

> diff --git a/arch/tile/include/asm/Kbuild b/arch/tile/include/asm/Kbuild
> new file mode 100644
> index 0000000..c191db6
> --- /dev/null
> +++ b/arch/tile/include/asm/Kbuild
> @@ -0,0 +1,17 @@
> +include include/asm-generic/Kbuild.asm
> +
> +header-y += hardwall.h
> +header-y += memprof.h
> +header-y += ucontext.h
> +header-y += user.h
> +
> +unifdef-y += bme.h
> +unifdef-y += page.h
> +unifdef-y += tilepci.h

note that header-y and unifdef-y are now synonyms,
you can just make them all header-y.

Do you really need to export user.h and page.h?

> +# FIXME: The kernel probably shouldn't provide these to user-space,
> +# but it's convenient for now to do so.
> +unifdef-y += opcode-tile.h
> +unifdef-y += opcode_constants.h
> +unifdef-y += opcode-tile_32.h
> +unifdef-y += opcode_constants_32.h

The comment is right, they should not be exported.

> diff --git a/arch/tile/include/asm/a.out.h b/arch/tile/include/asm/a.out.h
> new file mode 100644
> index 0000000..36ee719
> --- /dev/null
> +++ b/arch/tile/include/asm/a.out.h

Should not be needed, just remove this file.

> --- /dev/null
> +++ b/arch/tile/include/asm/addrspace.h

This file is not referenced anywhere. I'd suggest removing it
until you send code that actually uses it.

> diff --git a/arch/tile/include/asm/asm.h b/arch/tile/include/asm/asm.h
> new file mode 100644
> index 0000000..f064bc4
> --- /dev/null
> +++ b/arch/tile/include/asm/asm.h

Can be removed. syscall_table.S is the only user (of just one
of its macros), so just change that file to not rely on
the header.

> diff --git a/arch/tile/include/asm/atomic.h b/arch/tile/include/asm/atomic.h
> new file mode 100644
> index 0000000..a4f4714
> --- /dev/null
> +++ b/arch/tile/include/asm/atomic.h
> +
> +#ifndef _ASM_TILE_ATOMIC_H
> +#define _ASM_TILE_ATOMIC_H
> +
> +#ifndef __ASSEMBLY__
> +
> +#include <linux/compiler.h>
> +#include <asm/system.h>
> +
> +#define ATOMIC_INIT(i) { (i) }

This file looks mostly generic, and is to a large extent the
same as the existing asm-generic/atomic.h. Could you add an
#ifndef atomic_add_return guard to the definition of that in
the generic file and use that, overriding the functions
that need to be architecture specific on SMP systems?

> diff --git a/arch/tile/include/asm/atomic_32.h b/arch/tile/include/asm/atomic_32.h
> new file mode 100644
> index 0000000..e4f8b4f
> --- /dev/null
> +++ b/arch/tile/include/asm/atomic_32.h
> +#ifndef _ASM_TILE_ATOMIC_32_H
> +#define _ASM_TILE_ATOMIC_32_H
> +
> +#include <arch/chip.h>

It's unclear why part of atomic.h is split out into atomic_32.h,
especially when the file actually contains the definitions for
atomic64_t ;-).

> diff --git a/arch/tile/include/asm/backtrace.h b/arch/tile/include/asm/backtrace.h
> new file mode 100644
> index 0000000..3e65364
> --- /dev/null
> +++ b/arch/tile/include/asm/backtrace.h
> +#ifndef _TILE_BACKTRACE_H
> +#define _TILE_BACKTRACE_H
> +
> +#ifndef _LANGUAGE_ASSEMBLY
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif /* __cplusplus */
> +
> +#ifdef __KERNEL__
> +#include <linux/types.h>
> +#else
> +#include <stdint.h>
> +#include <stdbool.h>
> +#endif

The file backtrace.h is not exported to user space, so you don't
need any of these guards in the kernel. It should also be changed
to follow regular coding style.

> diff --git a/arch/tile/include/asm/bitops.h b/arch/tile/include/asm/bitops.h
> new file mode 100644
> index 0000000..dc3228e
> --- /dev/null
> +++ b/arch/tile/include/asm/bitops.h

This file looks completely generic, but improved over the
asm-generic/bitops/* files by using compiler builtins where
possible.
It would be nice if you could change the generic code to
use the same builtins when possible.

> +#include <linux/compiler.h>
> +#include <asm/atomic.h>
> +#include <asm/system.h>
> +
> +/* Tile-specific routines to support <asm/bitops.h>. */
> +unsigned long _atomic_or(volatile unsigned long *p, unsigned long mask);
> +unsigned long _atomic_andn(volatile unsigned long *p, unsigned long mask);
> +unsigned long _atomic_xor(volatile unsigned long *p, unsigned long mask);
> +
> +/**
> + * set_bit - Atomically set a bit in memory
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + *
> + * This function is atomic and may not be reordered.
> + * See __set_bit() if you do not require the atomic guarantees.
> + * Note that @nr may be almost arbitrarily large; this function is not
> + * restricted to acting on a single-word quantity.
> + */
> +static inline void set_bit(unsigned nr, volatile unsigned long *addr)
> +{
> + _atomic_or(addr + BIT_WORD(nr), BIT_MASK(nr));
> +}

Why not just declare set_bit (and other functions from here)
to be extern?

> +++ b/arch/tile/include/asm/bitsperlong.h
> +
> +# define __BITS_PER_LONG 32

This seems wrong, unless you support _only_ 32 bit user space.

> +#ifndef _ASM_TILE_BME_H
> +#define _ASM_TILE_BME_H
> +
> +#ifndef __KERNEL__
> +#include <stdint.h>
> +#else
> +#include <linux/types.h>
> +#endif

Don't do this, just use the __u32 and similar types in
data structures. The stdint.h types are problematic
in exported kernel headers.

> +/**
> + * Descriptor for user memory attributes.
> + */
> +struct bme_user_mem_desc {
> + void *va; /**< Address of memory. */
> + uint32_t len; /**< Length of memory in bytes. */
> +};

Pointers in ioctl data structures are bad because they
require conversion between 32 bit applications and 64 bit
kernels. Better use a __u64 member or try to avoid the pointers
entirely.
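A sketch of the __u64 variant (the field names are from your patch; the helper is only illustrative):

```c
#include <stdint.h>
#include <assert.h>

/* Fixed-size layout: identical for 32- and 64-bit userspace, with the
 * pointer carried in a 64-bit field (__u64 in an exported header;
 * uint64_t here so the sketch compiles in userspace). */
struct bme_user_mem_desc_fixed {
	uint64_t va;   /* user pointer, cast via (uintptr_t) */
	uint32_t len;  /* length of memory in bytes */
	uint32_t pad;  /* explicit padding keeps sizeof the same everywhere */
};

/* Userspace would fill it in like this: */
static inline void desc_set(struct bme_user_mem_desc_fixed *d,
			    void *p, uint32_t len)
{
	d->va = (uint64_t)(uintptr_t)p;
	d->len = len;
	d->pad = 0;
}
```

The kernel side then uses compat_ptr()/u64_to_user_ptr()-style conversion instead of needing a separate compat ioctl path.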

> +/**
> + * Descriptor for physical memory attributes.
> + */
> +struct bme_phys_mem_desc {
> + uint64_t pa; /**< Physical address of memory. */
> + uint32_t len; /**< Length of memory in bytes. */
> + uint64_t pte; /**< Caching attributes. */
> +};

This data structure has implicit padding. I suspect that this
is ok on your arch, but in general you should make the padding
explicit or avoid it by aligning the members. Just make len
a __u64 here.

The problem is that code that is portable to x86 behaves differently
in 32 and 64 bit mode: x86-32 does not add padding here.
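To illustrate (userspace stand-ins for __u64/__u32; this assumes an LP64 host, where uint64_t is 8-byte aligned):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Layout as posted: on ABIs that align uint64_t to 8 bytes (x86-64,
 * presumably tilegx) the compiler inserts a 4-byte hole after 'len';
 * on x86-32, which aligns uint64_t to 4, it does not. */
struct phys_desc_padded {
	uint64_t pa;
	uint32_t len;
	uint64_t pte;
};

/* Suggested fix: make 'len' 64-bit so the layout has no holes and is
 * identical on every ABI. */
struct phys_desc_fixed {
	uint64_t pa;
	uint64_t len;
	uint64_t pte;
};

/* On an LP64 host the hole is visible at compile time: */
_Static_assert(sizeof(struct phys_desc_padded) == 24, "4-byte hole after len");
_Static_assert(offsetof(struct phys_desc_padded, pte) == 16, "pte pushed out");
_Static_assert(sizeof(struct phys_desc_fixed) == 24, "no hole, by construction");
```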

> +/** Get the number of pages this range of memory covers. */
> +#define BME_IOC_GET_NUM_PAGES _IO(BME_IOC_TYPE, 0x0)
> +
> +/**
> + * Lock the memory so it can be accessed by BME tiles. User must provide
> + * space for the number of pages included in this range. That number may
> + * be obtained by BME_IOC_GET_NUM_PAGES, above.
> + */
> +#define BME_IOC_LOCK_MEMORY _IO(BME_IOC_TYPE, 0x1)

These should actually be _IOWR, not _IO, because you are
passing data structures.
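Something like this, with a made-up magic number since the real BME_IOC_TYPE is not in this hunk:

```c
#include <stdint.h>
#include <assert.h>
#include <linux/ioctl.h>	/* _IOWR() and the _IOC_*() decode macros */

/* Struct as in the patch (the pointer issue is a separate matter). */
struct bme_user_mem_desc {
	void *va;
	uint32_t len;
};

/* Hypothetical magic; use whatever BME_IOC_TYPE actually is. */
#define BME_IOC_TYPE 0xb5

/* _IOWR() encodes both the transfer direction and sizeof(arg) into the
 * command word, so the kernel (and tools like strace) can sanity-check
 * the argument that userspace passes: */
#define BME_IOC_GET_NUM_PAGES _IOWR(BME_IOC_TYPE, 0x0, struct bme_user_mem_desc)
#define BME_IOC_LOCK_MEMORY   _IOWR(BME_IOC_TYPE, 0x1, struct bme_user_mem_desc)
```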

> --- /dev/null
> +++ b/arch/tile/include/asm/bugs.h
> @@ -0,0 +1,22 @@
> +
> +#ifndef _ASM_TILE_BUGS_H
> +#define _ASM_TILE_BUGS_H
> +
> +static inline void check_bugs(void)
> +{
> +}
> +
> +#endif /* _ASM_TILE_BUGS_H */

While this file is trivial, please just use the asm-generic
version anyway. I have a patch (and have had it for
ages) that lets you leave out any files that only contain
a redirect to asm-generic.
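In the meantime, the whole file can collapse to a plain redirect:

```c
/* arch/tile/include/asm/bugs.h */
#include <asm-generic/bugs.h>
```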

> diff --git a/arch/tile/include/asm/checksum.h b/arch/tile/include/asm/checksum.h
> new file mode 100644
> index 0000000..079ab67
> --- /dev/null
> +++ b/arch/tile/include/asm/checksum.h

I believe you can use the asm-generic version here.

> diff --git a/arch/tile/include/asm/compat.h b/arch/tile/include/asm/compat.h
> new file mode 100644
> index 0000000..5703968
> --- /dev/null
> +++ b/arch/tile/include/asm/compat.h

We don't have an architecture using the asm-generic headers
with CONFIG_COMPAT support yet, so tile would be the first
one. I think you should just move this file to
include/asm-generic/compat.h and use that, so future architectures
don't need to define their own.

> +/*
> + * Idle the core for 8 * iterations cycles.
> + * Also make this a compiler barrier, as it's sometimes used in
> + * lieue of cpu_relax(), which has barrier semantics.
> + */
> +static inline void
> +relax(int iterations)
> +{
> + for (/*above*/; iterations > 0; iterations--)
> + cycle_relax();
> + barrier();
> +}

I'd rather not make this part of the interface. Just move this
definition to your spinlock_32.c file and use an open-coded
version in delay.c

> +static inline void
> +dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size,
> + enum dma_data_direction direction)
> +{
> + panic("dma_sync_single_for_cpu");
> +}
> +
> +static inline void
> +dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
> + size_t size, enum dma_data_direction direction)
> +{
> + panic("dma_sync_single_for_device");
> +}

These definitions do not look helpful. If you cannot figure out what
to do here, it may be better to just declare functions without
a definition so you get a link error for drivers that need them
instead of a runtime panic.

Usually you need to do the same thing you do while mapping when
you sync to the device (e.g. a cache flush) and potentially
a cache invalidate when you sync to the CPU.
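For illustration, here is a userspace model of that policy; the cache routines are stubs standing in for whatever (hypothetical) primitives the tile hardware provides:

```c
#include <assert.h>

enum dma_data_direction { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

/* Stub cache-maintenance primitives; counters let us observe the calls. */
static int flushed, invalidated;
static void cache_flush_range(unsigned long pa, unsigned long len) { (void)pa; (void)len; flushed++; }
static void cache_inv_range(unsigned long pa, unsigned long len)   { (void)pa; (void)len; invalidated++; }

/* Flush dirty lines before the device reads the buffer... */
static void dma_sync_single_for_device(unsigned long handle, unsigned long size,
				       enum dma_data_direction dir)
{
	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
		cache_flush_range(handle, size);
}

/* ...and invalidate stale lines before the CPU reads device-written data. */
static void dma_sync_single_for_cpu(unsigned long handle, unsigned long size,
				    enum dma_data_direction dir)
{
	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
		cache_inv_range(handle, size);
}
```

On a coherent interconnect both hooks may legitimately be no-ops, but that is a deliberate decision, not a panic().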

> diff --git a/arch/tile/include/asm/dma.h b/arch/tile/include/asm/dma.h
> new file mode 100644
> index 0000000..002f12a
> --- /dev/null
> +++ b/arch/tile/include/asm/dma.h

The asm-generic version should be enough unless you plan to
support legacy ISA extension cards.

> +#ifndef _ASM_TILE_HARDWALL_H
> +#define _ASM_TILE_HARDWALL_H
> +
> +#include <linux/ioctl.h>
> +
> +struct hardwall_rectangle {
> + int ulhc_x;
> + int ulhc_y;
> + int width;
> + int height;
> +};
> +
> +#define HARDWALL_FILE "/proc/tile/hardwall"

This does not look right, /proc files should not be used with ioctl,
although we have a few existing cases already. You could probably
change this to a misc chardev or a debugfs file.


> +static inline void *memcpy_fromio(void *dst, void *src, int len)
> +{
> + int x;
> + if ((unsigned long)src & 0x3)
> + panic("memcpy_fromio from non dword aligned address");
> + for (x = 0; x < len; x += 4)
> + *(u32 *)(dst + x) = readl(src + x);
> + return dst;
> +}
> +
> +static inline void *memcpy_toio(void *dst, void *src, int len)
> +{
> + int x;
> + if ((unsigned long)dst & 0x3)
> + panic("memcpy_toio to non dword aligned address");
> + for (x = 0; x < len; x += 4)
> + writel(*(u32 *)(src + x), dst + x);
> + return dst;
> +}
> +

panic looks a bit harsh here. Maybe BUG_ON?


> diff --git a/arch/tile/include/asm/kmap_types.h b/arch/tile/include/asm/kmap_types.h
> new file mode 100644
> index 0000000..1480106
> --- /dev/null
> +++ b/arch/tile/include/asm/kmap_types.h

Any reason for having your own copy of this instead of the
generic file?

> diff --git a/arch/tile/include/asm/kvm.h b/arch/tile/include/asm/kvm.h
> new file mode 100644
> index 0000000..7ed6877
> --- /dev/null
> +++ b/arch/tile/include/asm/kvm.h

If you don't support kvm, just remove this file.

> diff --git a/arch/tile/include/asm/mman.h b/arch/tile/include/asm/mman.h
> new file mode 100644
> index 0000000..e448d45
> --- /dev/null
> +++ b/arch/tile/include/asm/mman.h

This looks like you can use the asm-generic/mman.h file.

> +/*
> + * Specify the "home cache" for the page explicitly. The home cache is
> + * the cache of one particular "home" cpu, which is used as a coherence
> + * point for normal cached operations. Normally the kernel chooses for
> + * you, but you can use the MAP_CACHE_HOME_xxx flags to override.
> + *
> + * User code should not use any symbols with a leading "_" as they are
> + * implementation specific and may change from release to release
> + * without warning.
> + *
> + * See the Tilera mmap(2) man page for more details (e.g. "tile-man mmap").
> + */
> +
> +/* Implementation details; do not use directly. */
> +#define _MAP_CACHE_INCOHERENT 0x40000
> +#define _MAP_CACHE_HOME 0x80000
> +#define _MAP_CACHE_HOME_SHIFT 20
> +#define _MAP_CACHE_HOME_MASK 0x3ff
> +#define _MAP_CACHE_MKHOME(n) \
> + (_MAP_CACHE_HOME | (((n) & _MAP_CACHE_HOME_MASK) << _MAP_CACHE_HOME_SHIFT))
> +

Since the file is exported to user space, the map_cache stuff probably
should not be here, but get moved to a different header that
is private to the kernel.

> diff --git a/arch/tile/include/asm/posix_types.h b/arch/tile/include/asm/posix_types.h
> new file mode 100644
> index 0000000..ab71c9c
> --- /dev/null
> +++ b/arch/tile/include/asm/posix_types.h

Anything wrong with the asm-generic version of this file?
You really should not need to define your own version,
because this is relevant to the user ABI.

> diff --git a/arch/tile/include/asm/sembuf.h b/arch/tile/include/asm/sembuf.h
> new file mode 100644
> index 0000000..d4dc7cd
> --- /dev/null
> +++ b/arch/tile/include/asm/sembuf.h

Same here, this is part of the ABI, so please use the generic version.

> diff --git a/arch/tile/include/asm/shmparam.h b/arch/tile/include/asm/shmparam.h
> new file mode 100644
> index 0000000..bc99ff6
> --- /dev/null
> +++ b/arch/tile/include/asm/shmparam.h

and here.

> --- /dev/null
> +++ b/arch/tile/include/asm/sigcontext.h
> +
> +#ifndef _ASM_TILE_SIGCONTEXT_H
> +#define _ASM_TILE_SIGCONTEXT_H
> +
> +/* NOTE: we can't include <linux/ptrace.h> due to #include dependencies. */
> +#include <asm/ptrace.h>
> +
> +/* Must track <sys/ucontext.h> */
> +
> +struct sigcontext {
> + struct pt_regs regs;
> +};

Apparently, neither comment matches the code.

> diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
> new file mode 100644
> index 0000000..c609041
> --- /dev/null
> +++ b/arch/tile/include/asm/spinlock_32.h

This file could just be renamed to spinlock.h, afaict.

> diff --git a/arch/tile/include/asm/stat.h b/arch/tile/include/asm/stat.h
> new file mode 100644
> index 0000000..4d86b4e
> --- /dev/null
> +++ b/arch/tile/include/asm/stat.h

part of the ABI, please don't define your own.

> --- /dev/null
> +++ b/arch/tile/include/asm/timex.h
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright 2010 Tilera Corporation. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation, version 2.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT. See the GNU General Public License for
> + * more details.
> + */
> +
> +#ifndef _ASM_TILE_TIMEX_H
> +#define _ASM_TILE_TIMEX_H
> +
> +#include <arch/cycle.h>
> +
> +/* Use this random value, just like most archs. Mysterious. */
> +#define CLOCK_TICK_RATE 1193180 /* Underlying HZ */

long story. It should however actually be something related to
your frequency, not the time base of the i8253 chip that I hope
you are not using.

> diff --git a/arch/tile/include/asm/unistd.h b/arch/tile/include/asm/unistd.h
> new file mode 100644
> index 0000000..616dc7d
> --- /dev/null
> +++ b/arch/tile/include/asm/unistd.h

Your unistd.h file contains syscall numbers for many calls that
you should not need in a new architecture. Please move to the
asm-generic/unistd.h file instead. There may be a few things you
need to do in libc to get there, but this version is no good.
If you have problems with asm-generic/unistd.h (or any of the other
asm-generic files), feel free to ask me for help.

Arnd

Chris Metcalf

May 24, 2010, 11:30:03 AM
On 5/23/2010 6:08 PM, Arnd Bergmann wrote:
> On Saturday 22 May 2010 06:05:19 Chris Metcalf wrote:
>
>> As an experiment, I've created a "git format-patch" output file for all
>> the remaining Tilera-specific changes [...]

> Thanks for this. I took an initial look at the code and it looks pretty
> good as far as I got though not mergeable for 2.6.35 IMHO.
>

First of all, thank YOU for your review!

Perhaps what we can do is shoot for including a "first round" set of
Tilera support in 2.6.35, which is sufficient to boot the chip up and
work with it, but defer some of the drivers and other features
(oprofile, etc.) for a later merge window.

> It would help if you can set up an actual git tree to pull from, but
> it also works the way you did it.

Hopefully we'll have one by next month sometime. We have to reprovision
our existing web server, so that has to be coordinated with Marketing,
etc. I think for this round we'll have to stick to downloading git
patches, unfortunately.

> Most of these device drivers should be reviewed separately
> using the appropriate mailing lists. In general we prefer
> the drivers to live in drivers/{net,ata,serial,...} than
> in arch/.../drivers.
>
> The notable exception is pci, which should go to arch/tile/pci
> but still be reviewed in the pci mailing list.
>

So this is an interesting question. Currently the "device driver"
support in the arch/tile/drivers directory is for devices which exist
literally only as part of the Tilera silicon, i.e. they are not
separable from the tile architecture itself. For example, the network
driver is tied to the Tilera networking shim DMA engine on the chip.
Does it really make sense to move this to a directory where it is more
visible to other architectures? I can see that it might from the point
of view of code bombings done to network drivers, for example.
Similarly for our other drivers, which are tied to details of the
hypervisor API, etc.

For this first round of Tilera code, I will plan to push only the PCI
driver support (which makes sense to move to its own arch/tile/pci/
directory anyway, since there are half a dozen files there). I'll put
the PCI stuff in its own commit and then cc it to the linux-pci list at
vger.

There is a very minimal hypervisor-API console driver in
arch/tile/kernel/ which I will plan to just leave there for now.

>> arch/tile/oprofile/Makefile | 9 +
>> arch/tile/oprofile/backtrace.c | 73 +
>> arch/tile/oprofile/op_common.c | 352 +
>> arch/tile/oprofile/op_impl.h | 37 +
>>
> These should probably go through the oprofile list.
>

OK. I'll put these in a separate commit as well. These in any case are
not critical for inclusion in the initial batch of Tilera support.

> You will want to implement PERF_EVENTS, which replaces OPROFILE

Yes, we're planning this, and in fact some friendly folks at {large
company I may not be supposed to name} are working on this with us at
the moment. I don't think it will be part of this initial code push,
though.

> (you can have both though). You should not need HAVE_IDE, which
> is deprecated by libata, but you will need to reimplement the
> driver.

I'll file a bug internally on this for us to review. If we make ATA
support a second-round thing anyway, we can do this in a more leisurely
manner.

> HAVE_REGS_AND_STACK_ACCESS_API is a good one, you should implement that.

OK. I think this may be straightforward enough to just do as part of
the first round of code.

> HAVE_HW_BREAKPOINT is good, but requires hardware support.
>

We do have some of this support (though with some skid), but in any case
its use needs to be coordinated with the oprofile/perf_event counters,
so we haven't gotten around to it yet. We have a bug open on this
internally already, though.

> +config HOMECACHE
>> + bool "Support for dynamic home cache management"

>> [...]


>> +config DATAPLANE
>> + bool "Support for Zero-Overhead Linux mode"
>>
>>

> These sound like very interesting features that may also be
> useful for other architectures. I would recommend splitting them
> out into separate patches, by removing the support from the
> base architecture patch, and submitting the two patches for these
> features for discussion on the linux-kernel and linux-arch
> mailing lists.
>

Yes, the intent was to submit them later, since they are more
controversial in that they touch platform-independent code. One thing
you'll notice in our Kconfig is a TILERA_MDE config option. This is
effectively a toggle to allow the same Kconfig to be used for both the
code we're returning to the community now, and for the "full featured"
version that we are hacking freely in our MDE ("multicore development
environment", which is what we call the software we ship with the chip).

My initial model was that we would submit all the arch/tile/ code up to
the community, including the code that couldn't yet be enabled due to
missing architecture-independent support. Adding the
architecture-independent code would then be done in a separate patch
thread. But this leaves the Tilera architecture-dependent code present
in the initial submission. How confusing do you think this situation
would be? I could just run our code through an unifdef to remove things
tagged with CONFIG options that can't be enabled due to missing
architecture-independent support.

>> +choice
>> + depends on EXPERIMENTAL
>> + prompt "Memory split" if EMBEDDED
>> + default VMSPLIT_3G
>>
> I would recommend leaving out this option on your architecture
> because of the craziness. If I understand you correctly, the
> CPUs are all 64 bit capable, so there is little point in
> micro-optimizing the highmem case.
>

No, our current shipping hardware is 32-bit only. The next generation
is 64-bit capable so does not use HIGHMEM and doesn't need to allow the
craziness. I added a "depends on !TILEGX" to disable it in that case.

>> +config XGBE_MAIN
>> + tristate "Tilera GBE/XGBE character device support"
>> + default y
>> + depends on HUGETLBFS
>> + ---help---
>> + This is the low-level driver for access to xgbe/gbe/pcie.
>>
> This should go to drivers/net/Kconfig.
>

Maybe not. This driver is just a character device that allows a user
process to talk to the networking hardware directly. For example, you
might have an eth0 that is just a normal PCI device using the
platform-independent networking code, and then have user-space code
driving the 10 Gb on-chip NICs without involving the kernel networking
stack. The Linux networking support (tagged with XGBE_NET) is layered
on top of this driver.

>> diff --git a/arch/tile/feedback/cachepack.c b/arch/tile/feedback/cachepack.c
>> [...]


>>
> This file looks like mixed kernel/user code, which is something
> we don't normally do. It also does not follow kernel coding style.
> I'd suggest splitting the implementation and having the kernel
> version only include the necessary code without all the #ifdef
> and in normal style.
>
> You could also leave this out for now.
>

Yes, for now I'll just leave this feedback-compilation support out. In
another place we have stack backtracing support that is also shared, but
we can actually just unifdef the file when we install it in the kernel
tree, so there will be some blank lines (to make it easier to use
line-number information on the original source) but no __KERNEL__ ifdefs
in the kernel source.

>> diff --git a/arch/tile/include/arch/abi.h b/arch/tile/include/arch/abi.h
>> [...]


>>
> This file uses nonstandard formatting of the comments. Is it
> a generated file, or something that needs to be shared with
> other projects?
>
> If it is not shared with anything that strictly mandates the
> style, I'd recommend moving to regular kernel style.
>

I'll discuss changing the style with the rest of the Tilera software
team. However, we have generally preferred C99 comments for our own
non-Linux code, and this "arch/tile/include/arch/" directory represents
part of the set of headers that provide access to all the grotty details
of the underlying hardware architecture, so can be used within Linux
code, or hypervisor code, booter, user space, etc etc, with no libc or
kernel header inclusion dependencies.

For what it's worth, there do seem to be plenty of files in the
architecture-dependent parts of the kernel, and drivers, that use C99
comments, so there is some precedent for leaving these files in that
style. (grep "^//" hits 866 files, for example.)

>> +//! Get the current cycle count.
>> +//!
>> +static __inline unsigned long long
>> +get_cycle_count(void)

>> [...]


>>
> I would not use these functions directly in driver code.
> You could move all of cycle.h to timex.h and rename
> get_cycle_count to get_cycles. The other functions
> are not used anywhere, so they don't need to be
> part of the header.
>

This is another artifact of how we are sharing code between our <arch>
headers and Linux. Other parts of our code base use these headers too,
so we export the correct clock-capture algorithm here, then instantiate
it once for Linux, in arch/tile/kernel/time.c. On our 64-bit chip, the
CHIP_HAS_SPLIT_CYCLE() #define is false, so we just directly use the
trivial implementation in <arch/cycle.h>.

> You should also implement read_current_timer using
> this so you can avoid the expensive delay loop
> calibration at boot time.
>

We have the following in <asm/timex.h>, which I think should already do
what you are saying:

#define ARCH_HAS_READ_CURRENT_TIMER
static inline int read_current_timer(unsigned long *timer_value)
{
*timer_value = get_cycle_count_low();
return 0;
}


We actually have a one-line change to init/calibrate.c to use an
arch_calibrate_delay_direct() macro if defined, which avoids even having
to use read_current_timer(), but since that's platform-independent code,
I didn't want to get into it yet.

>> +static __USUALLY_INLINE void
>> +cycle_relax(void)


>>
> Another abstraction you can kill by moving this directly
> to cpu_relax and calling that from your relax().
>

Again, shared code with non-Linux sources.

>> +/* Use __ALWAYS_INLINE to force inlining, even at "-O0". */
>> +#ifndef __ALWAYS_INLINE
>> +#define __ALWAYS_INLINE __inline __attribute__((always_inline))
>> +#endif
>> +
>> +/* Use __USUALLY_INLINE to force inlining even at "-Os", but not at "-O0". */
>> +#ifndef __USUALLY_INLINE
>> +#ifdef __OPTIMIZE__
>> +#define __USUALLY_INLINE __ALWAYS_INLINE
>> +#else
>> +#define __USUALLY_INLINE
>> +#endif
>> +#endif
>>
> Please get rid of these abstraction, inlining is already hard
> enough with the macros we have in the common code.

Yes, I've seen some of the inlining wars go by over the years on Linux
forums. But again, these headers are meant to be used in places that
don't have access to internal Linux headers, while at the same time
being easy to #include within code that does use the Linux headers. We
could do some crazy transformation of our <arch> headers and install
them as "asm" headers for Linux, or something like that, but then it
gets harder to write code that can be used both inside Linux and outside
(say, in a user-mode driver, or in the hypervisor).

> Do you really need to export user.h and page.h?

We definitely don't need user.h any more; for a while we were building
strace to include it, but we haven't been for a while. We do use
<asm/page.h> to get the page size in some places, but we could also
provide that directly via libc in <sys/page.h> and not involve the
kernel. Our build allows tuning the page size but only by recompiling
the hypervisor and Linux both, so we just provide page size as a
constant. (Though getpagesize() still uses the auxv value passed to
user space, just in case we make page size dynamic at some point in the
future.)

>
>> --- /dev/null
>> +++ b/arch/tile/include/asm/addrspace.h
>>
> This file is not referenced anywhere. I'd suggest removing it
> until you send code that actually uses it.
>

OK, I've removed it. I assumed that it was required by architectures,
since it is used in various places in the kernel. I see four drivers
that just include it unconditionally at the moment, though curiously,
they don't seem to use any of the symbols it defines. And there are
four architectures (avr32, m32r, mips, sh) that all provide this header
at the moment, though there doesn't seem to be agreement as to what
symbols it should define.

>> diff --git a/arch/tile/include/asm/asm.h b/arch/tile/include/asm/asm.h
>> new file mode 100644
>> index 0000000..f064bc4
>> --- /dev/null
>> +++ b/arch/tile/include/asm/asm.h
>>
> Can be removed. syscall_table.S is the only user (of just one
> of its macros), so just change that file to not rely on
> the header.
>

Well, true, but it's a good abstraction. I actually was planning to use
_ASM_EXTABLE in some of our assembly code, though I hadn't gotten around
to doing so yet.

>> diff --git a/arch/tile/include/asm/atomic.h b/arch/tile/include/asm/atomic.h


>>
> This file looks mostly generic, and is to a large extent the
> same as the existing asm-generic/atomic.h. Could you add an
> #ifdef atomic_add_return to the definition of that in
> the generic file and use that, overriding the functions
> that need to be architecture specific on SMP systems?
>

Seems like a good idea. I'll look into it. Should I submit the
<asm-generic/atomic.h> change first as an independent change from the
Tilera architecture stuff, or just include it with the Tilera stuff?
Same question for the bitops stuff that you mention later on.

> It's unclear why part of atomic.h is split out into atomic_32.h,
> especially when the file actually contains the definitions for
> atomic64_t ;-).
>

Yeah, that nomenclature does end up a little confusing. We adopted the
x86 confusion of using "_32" for our 32-bit architecture (i386 <=>
tilepro) and "_64" for our 64-bit architecture (x86_64 <=> tilegx). So
here, <asm/atomic_32.h> is the atomic support for our 32-bit
architecture, and <asm/atomic_64.h> is the support for our 64-bit
architecture. However, I unifdef'ed out the things tagged with
"__tilegx__" in our sources, and removed the "*_64.[chS]" files, since
the TILE-Gx support is not 100% until we actually start shipping the
silicon.

>> +static inline void set_bit(unsigned nr, volatile unsigned long *addr)
>> +{
>> + _atomic_or(addr + BIT_WORD(nr), BIT_MASK(nr));
>> +}
>>

> +#include <linux/compiler.h>


> Why not just declare set_bit (and other functions from here)
> to be extern?
>

Two reasons. The first is that by exposing the "nr" value here, the
compiler can often optimize it away completely, or just convert it to an
appropriate constant. If we left it in an extern set_bit() the cpu
would always have to do the shifts and adds. Or, if not a constant, the
compiler can often use an empty slot in one of our "instruction bundles"
leading up to the call to _atomic_or() to hide the construction of the
necessary pointer and constant.
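A rough userspace model shows what the compiler gets to see (the _atomic_or() here is only a non-atomic stub to make the sketch self-contained, not our real SMP-safe routine):

```c
#include <assert.h>

#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
#define BIT_WORD(nr)  ((nr) / BITS_PER_LONG)
#define BIT_MASK(nr)  (1UL << ((nr) % BITS_PER_LONG))

/* Stub for the out-of-line atomic primitive. */
static unsigned long _atomic_or(volatile unsigned long *p, unsigned long mask)
{
	unsigned long old = *p;
	*p |= mask;
	return old;
}

/* Because set_bit() is inline, a call like set_bit(5, w) lets the
 * compiler fold BIT_WORD(5) == 0 and BIT_MASK(5) == 0x20 at compile
 * time; an extern set_bit() would have to do the shift and add with
 * real instructions on every call. */
static inline void set_bit(unsigned nr, volatile unsigned long *addr)
{
	_atomic_or(addr + BIT_WORD(nr), BIT_MASK(nr));
}
```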

>> +++ b/arch/tile/include/asm/bitsperlong.h
>> +
>> +# define __BITS_PER_LONG 32
>>
> This seems wrong, unless you support _only_ 32 bit user space.
>

For the current silicon, we do. For the 64-bit silicon, we support
either flavor, and we use #ifdef __LP64__ to guard this here. But I'm
also unifdef'ing with -U__LP64__ for the sources you're seeing. Perhaps
this just ends up being more, rather than less, confusing!
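For reference, the guarded version that the unifdef stripped looks roughly like:

```c
/* asm/bitsperlong.h, before unifdef -U__LP64__ (sketch): */
#ifdef __LP64__
# define __BITS_PER_LONG 64
#else
# define __BITS_PER_LONG 32
#endif
```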

> with CONFIG_COMPAT support yet, so tile would be the first
> one. I think you should just move this file to
> include/asm-generic/compat.h and use that, so future architectures
> don't need to define their own.
>

Most of it is pretty generic, for sure. Are you comfortable with the
part about registers? We use 64-bit registers in our 32-bit mode, since
for us "compat" mode is just a 32-bit pointer mode, like DEC Alpha's.
So "long long" and "double" are still held in a single 64-bit register
regardless. Here's the relevant part:

/* We use the same register dump format in 32-bit images. */
typedef unsigned long compat_elf_greg_t;
#define COMPAT_ELF_NGREG (sizeof(struct pt_regs) / sizeof(compat_elf_greg_t))
typedef compat_elf_greg_t compat_elf_gregset_t[COMPAT_ELF_NGREG];

>> + * Idle the core for 8 * iterations cycles.
>> + * Also make this a compiler barrier, as it's sometimes used in
>> + * lieue of cpu_relax(), which has barrier semantics.
>> + */
>> +static inline void
>> +relax(int iterations)

>> [...]


>>
> I'd rather not make this part of the interface. Just move this
> definition to your spinlock_32.c file and use an open-coded
> version in delay.c
>

We also use this in spinlock_64.c, which of course you didn't see :-)
We could just move it to asm/spinlock.h and call it __relax() or some
such to suggest that it's not meant to be used by other code. How does
that sound?

> +++ b/arch/tile/include/asm/kmap_types.h
>
> Any reason for having your own copy of this instead of the
> generic file?
>

Yes, it's because we are concerned about chewing up address space. Each
additional km type here requires another page worth of address space per
cpu, and since we are using 64KB pages for TLB efficiency in our
embedded apps, this means 64KB times 64 processors = 4 MB of address
space per km type. (Yes, I've followed the discussions about why large
page sizes are bad for general-purpose computing.)

> This looks like you can use the asm-generic/mman.h file.

No, the bit values for the constants are wrong. We use bits 0x8000 and
up to describe our "homecache" overrides to mmap().

> Since the file is exported to user space, the map_cache stuff probably
> should not be here, but get moved to a different header that
> is private to the kernel.
>

It's part of the optional extended API for mmap() used by Tilera Linux,
so it is actually needed by userspace.

> +++ b/arch/tile/include/asm/posix_types.h
> Anything wrong with the asm-generic version of this file?
>

I somehow missed being aware of the generic version of this (and of
sembuf.h and shmparam.h). It seems likely we can use the generic
posix_types.h, and we can certainly use the generic forms of the others.

>
>> --- /dev/null
>> +++ b/arch/tile/include/asm/sigcontext.h
>> +
>> +#ifndef _ASM_TILE_SIGCONTEXT_H
>> +#define _ASM_TILE_SIGCONTEXT_H
>> +
>> +/* NOTE: we can't include <linux/ptrace.h> due to #include dependencies. */
>> +#include <asm/ptrace.h>
>> +
>> +/* Must track <sys/ucontext.h> */
>> +
>> +struct sigcontext {
>> + struct pt_regs regs;
>> +};
>>
> Apparently, neither comment matches the code.
>

Sorry - can you clarify this comment? I don't see the mismatch.

>
> +++ b/arch/tile/include/asm/spinlock_32.h
>
> This file could just be renamed to spinlock.h, afaict.
>

Yes, well, there's the spinlock_64.h version hiding behind the unifdef
here. :-)

> +++ b/arch/tile/include/asm/stat.h
> part of the ABI, please don't define your own.
>

Unfortunately, changing this would require us to make an incompatible
change to current user-space. It may be possible anyway, since we are
planning a number of transitions for our next major release (jump from
kernel 2.6.26, switch from our current SGI-derived compiler to using
gcc, etc.). I'll discuss this internally.

>> +/* Use this random value, just like most archs. Mysterious. */
>> +#define CLOCK_TICK_RATE 1193180 /* Underlying HZ */
>>
> long story. It should however actually be something related to
> your frequency, not the time base of the i8253 chip that I hope
> you are not using.
>

No, no i8253. But our clock tick rate is controllable dynamically at
boot, so there's certainly no trivial constant that makes sense here.
Should I use the slowest possible frequency here? The fastest? It's
used in some irrelevant drivers, but also in <linux/jiffies.h>, which is
the place that worries me.

> Your unistd.h file contains syscall numbers for many calls that
> you should not need in a new architecture. Please move to the
> asm-generic/unistd.h file instead. There may be a few things you
> need to do in libc to get there, but this version is no good.
> If you have problems with asm-generic/unistd.h (or any of the other
> asm-generic files), feel free to ask me for help.
>

Sounds like we should take this one off-list until I know more precisely
what you're worried about. As far as I know, I did not import any
pointless syscalls. I have a stanza (which of course is unifdef'ed out
of your version) that removes all the foo64() syscalls when used with
64-bit userspace. But I think all the rest are useful.

As for <asm-generic/unistd.h>, I'll look more carefully at it, though of
course using it is also dependent on whether it is reasonable for us to
completely break compatibility with current user-space programs.

Arnd - MANY thanks for your careful review so far. I will implement
what you suggested and await the remainder of your review before
resubmitting patches.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann

May 24, 2010, 3:00:02 PM
On Monday 24 May 2010 17:29:18 Chris Metcalf wrote:
> On 5/23/2010 6:08 PM, Arnd Bergmann wrote:
> >
> Perhaps what we can do is shoot for including a "first round" set of
> Tilera support in 2.6.35, which is sufficient to boot the chip up and
> work with it, but defer some of the drivers and other features
> (oprofile, etc.) for a later merge window.

The most important change in my opinion is to get the system call
ABI straight, by making sure you don't introduce interfaces that
will get in your way later. If you can get the kernel to build using
the asm-generic version of unistd.h and the other exported headers,
as well as leaving out the device drivers, that should work.

I would also like to wait for another opinion before it goes in.
Note that the regular procedure is to have the code reviewed
before the start of the merge window, not in the middle of it!

> > It would help if you can set up an actual git tree to pull from, but
> > it also works the way you did it.
>
> Hopefully we'll have one by next month sometime. We have to reprovision
> our existing web server, so that has to be coordinated with Marketing,
> etc. I think for this round we'll have to stick to downloading git
> patches, unfortunately.

I can see two options for speeding that up. The easiest way would be
to just make the bare git tree available on http, instead of a single
file. If you can rsync or ftp to the web server, that should be
sufficient.

Alternatively, you can apply for an account on master.kernel.org,
if your company policies allow you to do that. That should be possible
within a few days at most and will help others locate your tree.

> > Most of these device drivers should be reviewed separately
> > using the appropriate mailing lists. In general we prefer
> > the drivers to live in drivers/{net,ata,serial,...} than
> > in arch/.../drivers.
> >
> > The notable exception is pci, which should go to arch/tile/pci
> > but still be reviewed in the pci mailing list.
> >
>
> So this is an interesting question. Currently the "device driver"
> support in the arch/tile/drivers directory is for devices which exist
> literally only as part of the Tilera silicon, i.e. they are not
> separable from the tile architecture itself. For example, the network
> driver is tied to the Tilera networking shim DMA engine on the chip.
> Does it really make sense to move this to a directory where it is more
> visible to other architectures?

yes.

> I can see that it might from the point
> of view of code bombings done to network drivers, for example.

Exactly, that is indeed an important point. It's more important
for some subsystems than others, but people generally like to be
able to do things like 'grep all network device drivers'.

> Similarly for our other drivers, which are tied to details of the
> hypervisor API, etc.

Just mark them as 'depends on ARCH_TILE' in Kconfig if you cannot
build the drivers elsewhere. Drivers that do not have a good place
to go elsewhere in the tree can probably go to drivers/tile/ rather
than arch/tile.

> For this first round of Tilera code, I will plan to push only the PCI
> driver support (which makes sense to move to its own arch/tile/pci/
> directory anyway, since there are half a dozen files there). I'll put
> the PCI stuff in its own commit and then cc it to the linux-pci list at
> vger.

ok

> There is a very minimal hypervisor-API console driver in
> arch/tile/kernel/ which I will plan to just leave there for now.

ok. arch/tile/hv might be better if you think that the files will
grow substantially, but kernel is also good.


> > You will want to implement PERF_EVENTS, which replaces OPROFILE
>
> Yes, we're planning this, and in fact some friendly folks at {large
> company I may not be supposed to name} are working on this with us at
> the moment. I don't think it will be part of this initial code push,
> though.

Ok, it's certainly not required.



> > (you can have both though). You should not need HAVE_IDE, which
> > is deprecated by libata, but you will need to reimplement the
> > driver.
>
> I'll file a bug internally on this for us to review. If we make ATA
> support a second-round thing anyway, we can do this in a more leisurely
> manner.

ok

> > HAVE_REGS_AND_STACK_ACCESS_API is a good one, you should implmenent that.
>
> OK. I think this may be straightforward enough to just do as part of
> the first round of code.

good

> > HAVE_HW_BREAKPOINT is good, but requires hardware support.
> >
>
> We do have some of this support (though with some skid), but in any case
> its use needs to be coordinated with the oprofile/perf_event counters,
> so we haven't gotten around to it yet. We have a bug open on this
> internally already, though.

ok

I think the simpler the initial code gets, the better. Anything that
you cannot even compile because of other dependencies just makes
the code harder to review.

> >> +choice
> >> + depends on EXPERIMENTAL
> >> + prompt "Memory split" if EMBEDDED
> >> + default VMSPLIT_3G
> >>
> > I would recommend leaving out this option on your architecture
> > because of the craziness. If I understand you correctly, the
> > CPUs are all 64 bit capable, so there is little point in
> > micro-optimizing the highmem case.
> >
>
> No, our current shipping hardware is 32-bit only. The next generation
> is 64-bit capable so does not use HIGHMEM and doesn't need to allow the
> craziness. I added a "depends on !TILEGX" to disable it in that case.

Ah, I see. If you think people will want to tweak this option then,
it should just stay in.

> >> +config XGBE_MAIN
> >> + tristate "Tilera GBE/XGBE character device support"
> >> + default y
> >> + depends on HUGETLBFS
> >> + ---help---
> >> + This is the low-level driver for access to xgbe/gbe/pcie.
> >>
> > This should go to drivers/net/Kconfig.
> >
>
> Maybe not. This driver is just a character device that allows a user
> process to talk to the networking hardware directly. For example, you
> might have an eth0 that is just a normal PCI device using the
> platform-independent networking code, and then have user-space code
> driving the 10 Gb on-chip NICs without involving the kernel networking
> stack. The Linux networking support (tagged with XGBE_NET) is layered
> on top of this driver.

Ah, I missed the part about this being a character device driver. I meant
that the network driver should go to drivers/net/xgbe/, but it probably
also makes sense to keep it together with the 'main' driver.

My initial impression from the chardev interface here is that it may be
better to do this as a new socket family that lets you open a very-raw
socket on the eth0 to do this instead of a chardev, but that discussion
belongs on the netdev list.

> >> diff --git a/arch/tile/feedback/cachepack.c b/arch/tile/feedback/cachepack.c
> >> [...]
> >>
> > This file looks like mixed kernel/user code, which is something
> > we don't normally do. It also does not follow kernel coding style.
> > I'd suggest splitting the implementation and having the kernel
> > version only include the necessary code without all the #ifdef
> > and in normal style.
> >
> > You could also leave this out for now.
> >
>
> Yes, for now I'll just leave this feedback-compilation support out. In
> another place we have stack backtracing support that is also shared, but
> we can actually just unifdef the file when we install it in the kernel
> tree, so there will be some blank lines (to make it easier to use
> line-number information on the original source) but no __KERNEL__ ifdefs
> in the kernel source.

I've seen the empty lines in some places and found them rather confusing.
I also don't think that you will be able to use the line numbers in the
way you hope to, because of patches that other people apply to their
kernels.

> >> diff --git a/arch/tile/include/arch/abi.h b/arch/tile/include/arch/abi.h
> >> [...]
> >>
> > This file uses nonstandard formatting of the comments. Is it
> > a generated file, or something that needs to be shared with
> > other projects?
> >
> > If it is not shared with anything that strictly mandates the
> > style, I'd recommend moving to regular kernel style.
> >
>
> I'll discuss changing the style with the rest of the Tilera software
> team. However, we have generally preferred C99 comments for our own
> non-Linux code, and this "arch/tile/include/arch/" directory represents
> part of the set of headers that provide access to all the grotty details
> of the underlying hardware architecture, so can be used within Linux
> code, or hypervisor code, booter, user space, etc etc, with no libc or
> kernel header inclusion dependencies.

I see. Many people have tried sharing code between the kernel and
other projects, but because of the churn from random people patching
it, this usually results in eventually giving up and letting them
diverge, or declaring the Linux version to be the master copy and
following our coding style everywhere.

> For what it's worth, there do seem to be plenty of files in the
> architecture-dependent parts of the kernel, and drivers, that use C99
> comments, so there is some precedent for leaving these files in that
> style. (grep "^//" hits 866 files, for example.)

We're slowly getting rid of them ;-)

> >> +//! Get the current cycle count.
> >> +//!
> >> +static __inline unsigned long long
> >> +get_cycle_count(void)
> >> [...]
> >>
> > I would not use these functions directly in driver code.
> > You could move all of cycle.h to timex.h and rename
> > get_cycle_count to get_cycles. The other functions
> > are not used anywhere, so they don't need to be
> > part of the header.
> >
>
> This is another artifact of how we are sharing code between our <arch>
> headers and Linux. Other parts of our code base use these headers too,
> so we export the correct clock-capture algorithm here, then instantiate
> it once for Linux, in arch/tile/kernel/time.c. On our 64-bit chip, the
> CHIP_HAS_SPLIT_CYCLE() #define is false, so we just directly use the
> trivial implementation in <arch/cycle.h>.

I see. In general, I'd still recommend avoiding these headers if they
only add another indirection (like the inline.h), but I understand
your reasoning here, so feel free to ignore my recommendation on this one.

> > You should also implement read_current_timer using
> > this so you can avoid the expensive delay loop
> > calibration at boot time.
> >
>
> We have the following in <asm/timex.h>, which I think should already do
> what you are saying:
>
> #define ARCH_HAS_READ_CURRENT_TIMER
> static inline int read_current_timer(unsigned long *timer_value)
> {
> *timer_value = get_cycle_count_low();
> return 0;
> }

Ok, I missed that.

> We actually have a one-line change to init/calibrate.c to use an
> arch_calibrate_delay_direct() macro if defined, which avoids even having
> to use read_current_timer(), but since that's platform-independent code,
> I didn't want to get into it yet.

I believe the recommended way to do this is to disable
CONFIG_GENERIC_CALIBRATE_DELAY and provide an architecture specific
calibrate_delay function.

> >> +/* Use __ALWAYS_INLINE to force inlining, even at "-O0". */
> >> +#ifndef __ALWAYS_INLINE
> >> +#define __ALWAYS_INLINE __inline __attribute__((always_inline))
> >> +#endif
> >> +
> >> +/* Use __USUALLY_INLINE to force inlining even at "-Os", but not at "-O0". */
> >> +#ifndef __USUALLY_INLINE
> >> +#ifdef __OPTIMIZE__
> >> +#define __USUALLY_INLINE __ALWAYS_INLINE
> >> +#else
> >> +#define __USUALLY_INLINE
> >> +#endif
> >> +#endif
> >>
> > Please get rid of these abstraction, inlining is already hard
> > enough with the macros we have in the common code.
>
> Yes, I've seen some of the inlining wars go by over the years on Linux
> forums. But again, these headers are meant to be used in places that
> don't have access to internal Linux headers, while at the same time
> being easy to #include within code that does use the Linux headers. We
> could do some crazy transformation of our <arch> headers and install
> them as "asm" headers for Linux, or something like that, but then it
> gets harder to write code that can be used both inside Linux and outside
> (say, in a user-mode driver, or in the hypervisor).

Well, I guess the easiest way out for you would be to kill both inline.h
and cycle.h from your kernel code as I suggested. They are reasonably
simple anyway. The only other use is in arch/sim.h and I would guess that
you can just turn that into __inline to avoid further discussion.

> > Do you really need to export user.h and page.h?
>
> We definitely don't need user.h any more; for a while we were building
> strace to include it, but we haven't been for a while. We do use
> <asm/page.h> to get the page size in some places, but we could also
> provide that directly via libc in <sys/page.h> and not involve the
> kernel. Our build allows tuning the page size but only by recompiling
> the hypervisor and Linux both, so we just provide page size as a
> constant. (Though getpagesize() still uses the auxv value passed to
> user space, just in case we make page size dynamic at some point in the
> future.)

You cannot use the kernel headers to export build options to user
space, because that breaks the user ABI -- anything built against
the page.h for one page size will not work reliably on another kernel
as it should.

I've forgotten the details, but I think the only reliable way to
find out the page size from user space is sysconf().

> >> diff --git a/arch/tile/include/asm/asm.h b/arch/tile/include/asm/asm.h
> >> new file mode 100644
> >> index 0000000..f064bc4
> >> --- /dev/null
> >> +++ b/arch/tile/include/asm/asm.h
> >>
> > Can be removed. syscall_table.S is the only user (of just one
> > of its macros), so just change that file to not rely on
> > the header.
> >
>
> Well, true, but it's a good abstraction. I actually was planning to use
> _ASM_EXTABLE in some of our assembly code, though I hadn't gotten around
> to doing so yet.

Then just add it back as you start using it. Unused code is by
definition untested and that means it's likely to be broken anyway.

> >> diff --git a/arch/tile/include/asm/atomic.h b/arch/tile/include/asm/atomic.h
> >>
> > This file looks mostly generic, and is to a large extent the
> > same as the existing asm-generic/atomic.h. Could you add an
> > #ifdef atomic_add_return to the definition of that in
> > the generic file and use that, overriding the functions
> > that need to be architecture specific on SMP systems?
> >
>
> Seems like a good idea. I'll look into it. Should I submit the
> <asm-generic/atomic.h> change first as an independent change from the
> Tilera architecture stuff, or just include it with the Tilera stuff?
> Same question for the bitops stuff that you mention later on.

I would do a separate patch for each header you touch (a combined
one for the bitops), and then do the whole architecture last.

> > It's unclear why part of atomic.h is split out into atomic_32.h,
> > especially when the file actually contains the definitions for
> > atomic64_t ;-).
> >
>
> Yeah, that nomenclature does end up a little confusing. We adopted the
> x86 confusion of using "_32" for our 32-bit architecture (i386 <=>
> tilepro) and "_64" for our 64-bit architecture (x86_64 <=> tilegx). So
> here, <asm/atomic_32.h> is the atomic support for our 32-bit
> architecture, and <asm/atomic_64.h> is the support for our 64-bit
> architecture. However, I unifdef'ed out the things tagged with
> "__tilegx__" in our sources, and removed the "*_64.[chS]" files, since
> the TILE-Gx support is not 100% until we actually start shipping the
> silicon.

Ok, I see. Is there anything confidential in the 64 bit code, or is it
just not stable yet? If you are allowed to show the code already, I'd
suggest also submitting it now, you can always get it working later.

It's probably a good idea to send the 64 bit architecture stuff as
a separate patch, since you've already gone through the work of
splitting it out. Just mark CONFIG_64BIT as 'EXPERIMENTAL' if you
don't consider it ready.

> >> +static inline void set_bit(unsigned nr, volatile unsigned long *addr)
> >> +{
> >> + _atomic_or(addr + BIT_WORD(nr), BIT_MASK(nr));
> >> +}
> >>
> > +#include <linux/compiler.h>
> > Why not just declare set_bit (and other functions from here)
> > to be extern?
> >
>
> Two reasons. The first is that by exposing the "nr" value here, the
> compiler can often optimize it away completely, or just convert it to an
> appropriate constant. If we left it in an extern set_bit() the cpu
> would always have to do the shifts and adds. Or, if not a constant, the
> compiler can often use an empty slot in one of our "instruction bundles"
> leading up to the call to _atomic_or() to hide the construction of the
> necessary pointer and constant.

ok

> >> +++ b/arch/tile/include/asm/bitsperlong.h
> >> +
> >> +# define __BITS_PER_LONG 32
> >>
> > This seems wrong, unless you support _only_ 32 bit user space.
> >
>
> For the current silicon, we do. For the 64-bit silicon, we support
> either flavor, and we use #ifdef __LP64__ to guard this here. But I'm
> also unifdef'ing with -U__LP64__ for the sources you're seeing. Perhaps
> this just ends up being more, rather than less, confusing!

yes.

> > with CONFIG_COMPAT support yet, so tile would be the first
> > one. I think you should just move this file to
> > include/asm-generic/compat.h and use that, so future architectures
> > don't need to define their own.
> >
>
> Most of it is pretty generic, for sure. Are you comfortable with the
> part about registers? We use 64-bit registers in our 32-bit mode, since
> for us "compat" mode is just a 32-bit pointer mode, like DEC Alpha's.
> So "long long" and "double" are still held in a single 64-bit register
> regardless. Here's the relevant part:
>
> /* We use the same register dump format in 32-bit images. */
> typedef unsigned long compat_elf_greg_t;
> #define COMPAT_ELF_NGREG (sizeof(struct pt_regs) / sizeof(compat_elf_greg_t))
> typedef compat_elf_greg_t compat_elf_gregset_t[COMPAT_ELF_NGREG];

Ah, I didn't notice those. Just leave out the elf_greg_t stuff from
asm-generic/compat.h then and put them either in your own compat.h
or into the elf.h, as you like.

> >> + * Idle the core for 8 * iterations cycles.
> >> + * Also make this a compiler barrier, as it's sometimes used in
> >> + * lieu of cpu_relax(), which has barrier semantics.
> >> + */
> >> +static inline void
> >> +relax(int iterations)
> >> [...]
> >>
> > I'd rather not make this part of the interface. Just move this
> > definition to your spinlock_32.c file and use an open-coded
> > version in delay.c
> >
>
> We also use this in spinlock_64.c, which of course you didn't see :-)
> We could just move it to asm/spinlock.h and call it __relax() or some
> such to suggest that it's not meant to be used by other code. How does
> that sound?

Yes, maybe even __spinlock_relax() to be more explicit.

> > +++ b/arch/tile/include/asm/kmap_types.h
> >
> > Any reason for having your own copy of this instead of the
> > generic file?
> >
>
> Yes, it's because we are concerned about chewing up address space. Each
> additional km type here requires another page worth of address space per
> cpu, and since we are using 64KB pages for TLB efficiency in our
> embedded apps, this means 64KB times 64 processors = 4 MB of address
> space per km type. (Yes, I've followed the discussions about why large
> page sizes are bad for general-purpose computing.)

I see, that makes sense. It also puts an end to my plans to unify
all kmap_types.h implementations, but that doesn't need to worry you.

> > This looks like you can use the asm-generic/mman.h file.
>
> No, the bit values for the constants are wrong. We use bits 0x8000 and
> up to describe our "homecache" overrides to mmap().
>
> > Since the file is exported to user space, the map_cache stuff probably
> > should not be here, but get moved to a different header that
> > is private to the kernel.
> >
>
> It's part of the optional extended API for mmap() used by Tilera Linux,
> so it is actually needed by userspace.

Ah, that's unfortunate. How bad would it be for you to come up
with a different ABI for the homecache version? I don't have all
the facts but my feeling is that the mmap API should not be
touched by this and that it better fits into an extension of the
numa syscalls, specifically the set_mempolicy/mbind/move_pages
family.

> > +++ b/arch/tile/include/asm/posix_types.h
> > Anything wrong with the asm-generic version of this file?
> >
>
> I somehow missed being aware of the generic version of this (and of
> sembuf.h and shmparam.h). It seems likely we can use the generic
> posix_types.h, and we can certainly use the generic forms of the others.

ok, good.

> >> --- /dev/null
> >> +++ b/arch/tile/include/asm/sigcontext.h
> >> +
> >> +#ifndef _ASM_TILE_SIGCONTEXT_H
> >> +#define _ASM_TILE_SIGCONTEXT_H
> >> +
> >> +/* NOTE: we can't include <linux/ptrace.h> due to #include dependencies. */
> >> +#include <asm/ptrace.h>
> >> +
> >> +/* Must track <sys/ucontext.h> */
> >> +
> >> +struct sigcontext {
> >> + struct pt_regs regs;
> >> +};
> >>
> > The comments both do not match the code apparently.
> >
>
> Sorry - can you clarify this comment? I don't see the mismatch.

Nevermind.

The first one I just misread. I only saw that the comment said 'cannot
include ptrace.h' but then includes it anyway.

For the second one, I assumed that sys/ucontext.h would include the
definition from asm/ucontext.h, which it does not.

> > +++ b/arch/tile/include/asm/stat.h
> > part of the ABI, please don't define your own.
> >
>
> Unfortunately, changing this would require us to make an incompatible
> change to current user-space. It may be possible anyway, since we are
> planning a number of transitions for our next major release (jump from
> kernel 2.6.26, switch from our current SGI-derived compiler to using
> gcc, etc.). I'll discuss this internally.

I believe that in the process of getting upstream, many things will end
up incompatible, so this is your only chance to ever fix the ABI.



> >> +/* Use this random value, just like most archs. Mysterious. */
> >> +#define CLOCK_TICK_RATE 1193180 /* Underlying HZ */
> >>
> > long story. It should however actually be something related to the
> > your frequency, not the time base of the i8253 chip that I hope
> > you are not using.
> >
>
> No, no i8253. But our clock tick rate is controllable dynamically at
> boot, so there's certainly no trivial constant that makes sense here.
> Should I use the slowest possible frequency here? The fastest? It's
> used in some irrelevant drivers, but also in <linux/jiffies.h>, which is
> the place that worries me.

The drivers all should not be using it, actually. The patch I did for
this apparently got lost somewhere, I'll need to dig it out again.

The calculation in linux/jiffies.h tries to figure out how wrong the
timer tick is because of the mismatch between 1193180 (or 1193182) HZ
and the desired 100/250/1000 HZ frequency, and correct that mismatch.

A reasonable value would be something that is a multiple of the possible
HZ values (100, 250, 1000) and a fraction of the possible hw timer
frequencies.

> > Your unistd.h file contains syscall numbers for many calls that
> > you should not need in a new architecture. Please move to the
> > asm-generic/unistd.h file instead. There may be a few things you
> > need to do in libc to get there, but this version is no good.
> > If you have problems with asm-generic/unistd.h (or any of the other
> > asm-generic files), feel free to ask me for help.
> >
>
> Sounds like we should take this one off-list until I know more precisely
> what you're worried about. As far as I know, I did not import any
> pointless syscalls. I have a stanza (which of course is unifdef'ed out
> of your version) that removes all the foo64() syscalls when used with
> 64-bit userspace. But I think all the rest are useful.
>
> As for <asm-generic/unistd.h>, I'll look more carefully at it, though of
> course using it is also dependent on whether it is reasonable for us to
> completely break compatibility with current user-space programs.

Any change in there would break the user ABI, obviously, though there
are two ways to do that though: You could either keep the existing
numbers so that applications using the limited set can still run on
old kernels or use the numbers from asm-generic/unistd.h, which pretty
much guarantees that every single binary application becomes incompatible.
Note that you also get silent breakage from any change in the ABI
headers (stat.h, types.h, ...), so a clear cut may end up being the
better option if you are already changing the ABI.

Note that the asm-generic version defines 244 numbers, while you have
a total of 313 numbers. You obviously need the extra arch specific
syscalls (e.g. cmpxchg), so we need to reserve some space for those
in the generic header. All the other ones that are in your version but
not in the generic version are very likely not needed (unless I made
a mistake in the generic code).

Specifically:

- anything that needs a '__ARCH_WANT_SYS_*' definition is deprecated
and has been replaced by a new syscall. The exceptions are
RT_SIGACTION, RT_SIGSUSPEND, STAT64 and LLSEEK (the latter only
on 32 bit), these should be changed in some way to invert the
logic.

- You do not need both the 32 bit and 64 bit version of syscalls
taking an off_t/loff_t argument like fcntl. Just define one syscall number
and assign it to one or the other syscall so you always get a
64 bit argument (off_t on 64 bit, loff_t on 32 bit).

- some calls recently got a new version (pipe/pipe2, dup2/dup3). You
only need one in the kernel, while the older one can be implemented
in user space.

- many file based syscalls now have an 'at' version (openat, linkat, ...)
that takes an extra argument, similar to the previously mentioned ones,
you can implement the old behavior in user space.

> Arnd - MANY thanks for your careful review so far. I will implement
> what you suggested and await the remainder of your review before
> resubmitting patches.

You're welcome. I'll also try to have a look at the remaining files
in arch/tile/{lib,mm,kernel} next.

Arnd

Sam Ravnborg
May 24, 2010, 4:30:03 PM

Hi Chris.

Kernel code looked good from a quick browse.

Please explain the need for all the different directories within include/
{arch, hv, netio}

I tried not to repeat comments from Arnd in the below.

arch/tile/Kconfig:
The TILE-specific symbols look like they use several different
naming schemes. In some cases the company name (TILERA) is used,
and in some cases TILE is used, both as prefix and suffix.

Please stick to using TILE_ prefix. And maybe TILEGX_ in the
cases this is relevant.

Keep all the simple settings at the top of the file.
Stuff like:
config ZONE_DMA
def_bool y

config SEMAPHORE_SLEEPERS
def_bool y

These belong at the top of Kconfig - before your first menu.


There are also several TILE-specific options missing the TILE_ prefix.
Like:
config XGBE_MAIN


tristate "Tilera GBE/XGBE character device support"

Drop this:
config XGBE_MAIN


tristate "Tilera GBE/XGBE character device support"

It is better to test for the gcc version and disable the option
only in the cases where it is known to fail.


arch/tile/Makefile:

Do not mess with CC like this:
CC = $(CROSS_COMPILE)gcc

I guess you had to do this to support:
LIBGCC_PATH := `$(CC) -print-libgcc-file-name`

If you follow other archs you could do like this:
LIBGCC_PATH := `$(CC) -print-libgcc-file-name`

This is not needed:
CLEAN_FILES += arch/tile/vmlinux.lds

vmlinux.lds lives in kernel/

help is missing.

arch/tile/kernel/Makefile
I had expected that compiling vmlinux.lds would require knowledge of $(BITS)
like this:
CPPFLAGS_vmlinux.lds := -m$(BITS)


arch/tile/kernel/vmlinux.lds.S
A lot of effort has been put into unifying the different
variants of vmlinux.lds.
Please see the skeleton outlined in include/asm-generic/vmlinux.lds.h

You include sections.lds - but it is empty.
Drop that file.

You include hvglue.ld.
We use *.lds for linker script files - please rename.
The file looks generated?? How and when?

Furthermore, the definitions are not used by vmlinux.lds.S - so drop the include.

arch/tile/initramfs:
Does not look like it belongs in the kernel?


arch/tile/kernel/head_32.S
The file uses:
.section .text.head, "ax"
etc.

Please use the section definitions from include/linux/init.h

arch/tile/include/asm/spinlock.h
Please make this a one-liner when you use the asm-generic version only.
Same goes for byteorder (which includes linux/byteorder/little_endian.h)

In your mail you did not say anything about the checkpatch status.
It is better to make your code reasonably checkpatch-clean _before_
merging. Then you will not be hit by a lot of janitorial patches doing so.

Likewise, please state the sparse status. We do not expect it to be sparse clean.
But getting rid of the obvious issues is good too.


Sam

Chris Metcalf
May 24, 2010, 5:30:03 PM

On 5/24/2010 2:53 PM, Arnd Bergmann wrote:
> I would also like to wait for another opinion before it goes in.
> Note that the regular procedure is to have the code reviewed
> before the start of the merge window, not in the middle of it!
>

Ack! My mistake, sorry. I was under the impression that I should wait
for the churn on the list to die down a bit after the stable release (in
this case 2.6.34) before trying to send big batches of new code into LKML.

>>> Since the file is exported to user space, the map_cache stuff probably
>>> should not be here, but get moved to a different header that
>>> is private to the kernel.
>>>
>>>
>> It's part of the optional extended API for mmap() used by Tilera Linux,
>> so it is actually needed by userspace.
>>
> Ah, that's unfortunate. How bad would it be for you to come up
> with a different ABI for the homecache version? I don't have all
> the facts but my feeling is that the mmap API should not be
> touched by this and that it better fits into an extension of the
> numa syscalls, specifically the set_mempolicy/mbind/move_pages
> family.
>

Interesting idea. I'll consider how straightforward this would be to do.

>> As for <asm-generic/unistd.h>, I'll look more carefully at it, though of
>> course using it is also dependent on whether it is reasonable for us to
>> completely break compatibility with current user-space programs.
>>

I think the discussion internally supports breaking backwards
compatibility; this will after all be aligned with our 3.0 release
eventually, which is when we are also switching compilers to gcc. So
I'll see what is involved in the kernel and libc in switching to
<asm-generic/unistd.h> and get back to you with more detailed comments
if necessary.

> Note that the asm-generic version defines 244 numbers, while you have
> a total of 313 numbers. You obviously need the extra arch specific
> syscalls (e.g. cmpxchg), so we need to reserve some space for those
> in the generic header.

Yes, although cmpxchg is actually a negative syscall value, which we use
to save every last cycle on that path -- it doesn't do any of the usual
syscall processing at all, just basically takes advantage of the kernel
lock infrastructure.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Chris Metcalf
May 24, 2010, 5:40:02 PM

On 5/24/2010 4:22 PM, Sam Ravnborg wrote:
> Kernel code looked good from a quick browse.
>

Glad to hear it, and thanks for taking the time to look it over.

> Please explain the need for all the different directories within include/
> {arch, hv, netio}
>

Those three directories are shared with other components of our system.
The "arch" headers are "core architecture" headers which can be used in
any build environment (Linux, hypervisor, user-code, booter, other
"supervisors" like VxWorks, etc.); they are partly small inline hacks to
use the hardware more easily, and partly just lists of name-to-number
mappings for special registers, etc. The "hv" headers are imported from
the hypervisor code; these headers are "owned" by our hypervisor, and
the ones shipped with Linux are the ones that have to do with how to run
a supervisor under our hypervisor. The "netio" headers are another type
of hypervisor header that have to do with interacting with the network
I/O silicon on the chip (the 10 GbE and 10/100/1000 Mb Ethernet).

> There are also several TILE-specific options missing the TILE_ prefix.
> Like:
> config XGBE_MAIN
> tristate "Tilera GBE/XGBE character device support"
>
> Drop this:
> config XGBE_MAIN
> tristate "Tilera GBE/XGBE character device support"
>
> It is better to test for the gcc version and disable the option
> only in the cases where it is known to fail.
>

Is the "Drop this" comment a cut and paste bug? I'm guessing you were
referring to CONFIG_WERROR, which enables -Werror support. The problem
is that whether or not you can use -Werror really depends on not just
the kernel version and the gcc version, but very likely also what
drivers you have enabled. We always use it internally. I could also
just pull this out completely (and just force it into "make" externally
within our external build process), or move it to a "generic" configure
option. In any case we can't just automate it, unfortunately.

> Do not mess with CC like this:
> CC = $(CROSS_COMPILE)gcc
>
> I guess you had to do this to support:
> LIBGCC_PATH := `$(CC) -print-libgcc-file-name`
>
> If you follow other archs you could do like this:
> LIBGCC_PATH := `$(CC) -print-libgcc-file-name`
>

I'm guessing you meant like what h8300 does, "$(shell
$(CROSS_COMPILE)$(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)". That
seems reasonable.

> arch/tile/kernel/Makefile
> I had expected that compiling vmlinux.lds required knowledge of $(BITS)
> like this:
> CPPFLAGS_vmlinux.lds := -m$(BITS)
>

Our 32-bit chips only do 32-bit. In 64-bit mode we always build the
kernel with an implicit -m64, which is the compiler default.

> arch/tile/kernel/vmlinux.lds.S
> A lot of effort has been put into unifying the different
> variants of vmlinux.lds.
> Please see the skeleton outlined in include/asm-generic/vmlinux.lds.h
>

Yes, I've tried to track this somewhat over kernel releases, but I'll go
back and re-examine it with fresh eyes.

> You include hvglue.ld.
> We use *.lds for linker script files - please rename.
> The file looks generated?? How and when?
>

It's sort of a semi-generated file. We have a test in our regressions
that just tests that this file matches the API for our hypervisor, which
is just calls to physical address 32KB plus 64 bytes per syscall
number. These defined addresses are then used for calls to e.g.
hv_flush_asid() or whatever. The hypervisor API changes occasionally,
at which point we update this file. You don't see it used in
vmlinux.lds since it's just used as plain C calls through the arch/tile/
code.

> arch/tile/initramfs:
> Does not look like it belongs in the kernel?
>

Fair enough. We ship it with the kernel to make it easy for our users
to bootstrap up into a plausible initramfs filesystem, but it's strictly
speaking not part of the kernel, so I'll remove it.

> arch/tile/include/asm/spinlock.h
> Please make this a one-liner when you use the asm-generic version only.
> Same goes for byteorder (which includes linux/byteorder/little_endian.h)
>

I'm not sure what you mean when you say to use the asm-generic version
of spinlock.h, since it's not SMP-ready. Also, I don't see an
asm-generic/byteorder.h, so I'm puzzled there too.

> In your mail you did not say anything about the checkpatch status.
> It is better that you make your code reasonable checkpatch clean _before_
> merging. Then you will not be hit by a lot of janitorial patches doing so.
>

I ran checkpatch over everything I submitted. There are many
complaints, to be sure, but I did a first pass cleaning up everything
that was plausible, so for example all the style issues were fixed, but
things like some uses of volatile, some uses of init_MUTEX, etc., were
not modified. However, I think it's in decent shape from a checkpatch
point of view.

> Likewise please state sparse status. We do not expect it to be sparse clean.
> But getting rid of the obvious issues is good too.
>

I have not run sparse over it. I will do so.

Thanks for your review! Getting this much feedback from LKML is great
-- I really appreciate it.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Sam Ravnborg, May 25, 2010, 1:10:02 AM

> > There is also several TILE specific options missing the TILE_ prefix.
> > Like:
> > config XGBE_MAIN
> > tristate "Tilera GBE/XGBE character device support"
> >
> > Drop this:
> > config XGBE_MAIN
> > tristate "Tilera GBE/XGBE character device support"
> >
> > It is better to test for the gcc version and disable the option
> > only in the cases where it is known to fail.
> >
>
> Is the "Drop this" comment a cut and paste bug?
Yep - sorry.

> I'm guessing you were
> referring to CONFIG_WERROR, which enables -Werror support. The problem
> is that whether or not you can use -Werror really depends on not just
> the kernel version and the gcc version, but very likely also what
> drivers you have enabled. We always use it internally. I could also
> just pull this out completely (and just force it into "make" externally
> within our external build process), or move it to a "generic" configure
> option. In any case we can't just automate it, unfortunately.

As Arnd pointed out, the drivers do not belong in the
arch/tile/* hierarchy.
And we have some architectures that always use -Werror unconditionally.
So for the arch part this way to deal with it should be safe.
And the more we can cover under -Werror the better.

I dunno how you best deal with the drivers.

>
> > Do not mess with CC like this:
> > CC = $(CROSS_COMPILE)gcc
> >
> > I guess you had to do this to support:
> > LIBGCC_PATH := `$(CC) -print-libgcc-file-name`
> >
> > If you follow other archs you could do like this:
> > LIBGCC_PATH := `$(CC) -print-libgcc-file-name`
> >
>
> I'm guessing you meant like what h8300 does, "$(shell
> $(CROSS-COMPILE)$(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)". That
> seems reasonable.

Correct - you are good at guessing :-)

> > arch/tile/include/asm/spinlock.h
> > Please make this a one-liner when you use the asm-generic version only.
> > Same goes for byteorder (which includes linux/byteorder/little_endian.h)
> >
>
> I'm not sure what you mean when you say to use the asm-generic version
> of spinlock.h, since it's not SMP-ready. Also, I don't see an
> asm-generic/byteorder.h, so I'm puzzled there too.

What I wanted to say was that when a header file simply includes
another header file, you should drop all the boilerplate stuff and
let the header file be a single line.
Both spinlock.h and byteorder.h match this.

For the other 15+ header files that simply include another
header file you already follow this style. So this is a small matter
of consistency.

>
> > In your mail you did not say anything about the checkpatch status.
> > It is better that you make your code reasonable checkpatch clean _before_
> > merging. Then you will not be hit by a lot of janitorial patches doing so.
> >
>
> I ran checkpatch over everything I submitted. There are many
> complaints, to be sure, but I did a first pass cleaning up everything
> that was plausible, so for example all the style issues were fixed, but
> things like some uses of volatile, some uses of init_MUTEX, etc., were
> not modified. However, I think it's in decent shape from a checkpatch
> point of view.

Good. Please include this information in your next submission.

Sam

Chris Metcalf, May 25, 2010, 10:00:03 AM

On 5/24/2010 2:53 PM, Arnd Bergmann wrote:
> Note that the asm-generic version defines 244 numbers, while you have
> a total of 313 numbers. You obviously need the extra arch specific
> syscalls (e.g cmpxchg), so we need to reserve some space for those
> in the generic header. All the other ones that are in your version but
> not in the generic version are very likely not needed (unless I made
> a mistake in the generic code).
>

I looked at the diff of the set of syscalls you provide and the ones
we've been using.

Specific questions:

- How do you propose representing the architecture-specific syscalls?
We have three "very special" syscalls that are negative numbers, which I
won't worry about, since they'll be out of the normal numbering
sequence. But we also have a few others (cmpxchg_baddr, raise_fpe,
flush_cache) that we'll need a numbering location for. I see that you
already have an empty block from 244 (today) to 1023; perhaps
architectures should just use 1023 on down? I'll do this for now.

- You renamed __NR__llseek to __NR_llseek, which of course seems pretty
reasonable, but libc expects to see the former (both glibc and uclibc).
Is it worth requiring non-standard libc code? I may just add
__NR__llseek as an alias in my unistd.h for now.

- Are you planning to keep all the ifdef'ed syscalls going forward?
Because honestly, I'd rather just enable __ARCH_WANT_SYSCALL_NO_AT,
etc., and use the kernel implementations, since otherwise I'll have to
go into both uclibc and glibc and add a bunch of extra Tilera-specific
code and then try to push that up to their community, when really I just
want to have the Tilera architecture userspace support be as generic as
possible.

The result seems positive overall; I'm certainly happy to dump, e.g.,
"nice" and "stime", since they have obvious userspace wrappers (and in
fact libc is already geared up to use them if available). And a few
other syscalls in the Tile list aren't even implemented but were just
brought over from x86 "in case", like afs_syscall, putpmsg, and getpmsg,
so I'm happy to abandon them as well. And "sysfs" is commented out of
uclibc, and not present in glibc, so no big loss there. Other than that
I think the set of supported syscalls will only change by a couple --
and more importantly, from my point of view, Tilera gets to stay
automatically synced to any new syscalls added to Linux going forward.
So this is good.

I assume that folks are committing to not changing any of the existing
numbers, ifdefs, etc. in asm-generic/unistd.h; if we're the only
architecture using it, no one might notice until we did. :-)

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann, May 25, 2010, 11:10:02 AM

On Tuesday 25 May 2010, Chris Metcalf wrote:
> I looked at the diff of the set of syscalls you provide and the ones
> we've been using.
>
> Specific questions:
>
> - How do you propose representing the architecture-specific syscalls?
> We have three "very special" syscalls that are negative numbers, which I
> won't worry about, since they'll be out of the normal numbering
> sequence. But we also have a few others (cmpxchg_baddr, raise_fpe,
> flush_cache) that we'll need a numbering location for. I see that you
> already have an empty block from 244 (today) to 1023; perhaps
> architectures should just use 1023 on down? I'll do this for now.

I would keep allocating from the bottom. For now, maybe we should just
reserve 16 arch specific syscall numbers starting at 244, and add

#define __NR_tile_cmpxchg_baddr (__NR_arch_specific_syscall + 0)
#define __NR_tile_raise_fpe (__NR_arch_specific_syscall + 1)
#define __NR_tile_flush_cache (__NR_arch_specific_syscall + 2)

to your own unistd.h.



> - You renamed __NR__llseek to __NR_llseek, which of course seems pretty
> reasonable, but libc expects to see the former (both glibc and uclibc).
> Is it worth requiring non-standard libc code? I may just add
> __NR__llseek as an alias in my unistd.h for now.

That was probably just a mistake on my side. The only other
architecture using the generic version so far is score, so
maybe Chen Liqin can comment on how he dealt with this and
if he depends on the definition now.

> - Are you planning to keep all the ifdef'ed syscalls going forward?
> Because honestly, I'd rather just enable __ARCH_WANT_SYSCALL_NO_AT,
> etc., and use the kernel implementations, since otherwise I'll have to
> go into both uclibc and glibc and add a bunch of extra Tilera-specific
> code and then try to push that up to their community, when really I just
> want to have the Tilera architecture userspace support be as generic as
> possible.

The idea was to only have them around as a transitional helper for
new architectures while getting merged, but nothing should ever
use these in production.

While glibc and uclibc are currently still lacking support for these,
the intention was for both to provide the wrappers in the architecture
independent code like they already do for a lot of other system calls.
Maybe Ulrich can comment on how we would get there, in particular if
he would want to add those helpers to glibc himself or if he would prefer
you to send a patch to do that.

There really should be no code required in glibc to deal with the
generic ABI, other than the parts that deal with the specific register
layout and calling conventions. We're not there yet, but my hope
is that tile is the last architecture that needs to worry about this
and once you get it working with common code, future architectures
just work.

> The result seems positive overall; I'm certainly happy to dump, e.g.,
> "nice" and "stime", since they have obvious userspace wrappers (and in
> fact libc is already geared up to use them if available). And a few
> other syscalls in the Tile list aren't even implemented but were just
> brought over from x86 "in case", like afs_syscall, putpmsg, and getpmsg,
> so I'm happy to abandon them as well. And "sysfs" is commented out of
> uclibc, and not present in glibc, so no big loss there. Other than that
> I think the set of supported syscalls will only change by a couple --
> and more importantly, from my point of view, Tilera gets to stay
> automatically synced to any new syscalls added to Linux going forward.
> So this is good.

ok.

> I assume that folks are committing to not changing any of the existing
> numbers, ifdefs, etc. in asm-generic/unistd.h; if we're the only
> architecture using it, no one might notice until we did. :-)

There is also score using it, but yes, we try very hard not to break
the ABI and any patch modifying these files normally gets posted to
the linux-arch and/or linux-api mailing lists that you should probably
subscribe to as well.

Arnd

Chris Metcalf, May 25, 2010, 11:20:01 AM

On 5/25/2010 11:03 AM, Arnd Bergmann wrote:
> I would keep allocating from the bottom. For now, maybe we should just
> reserve 16 arch specific syscall numbers starting at 244, and add
>
> #define __NR_tile_cmpxchg_baddr (__NR_arch_specific_syscall + 0)
> #define __NR_tile_raise_fpe (__NR_arch_specific_syscall + 1)
> #define __NR_tile_flush_cache (__NR_arch_specific_syscall + 2)
>
> to your own unistd.h.
>

OK.

> The idea was to only have them around as a transitional helper for
> new architectures while getting merged, but nothing should ever
> use these in production.
>

Perhaps the best strategy for Tile for now is to enable the transitional
helpers, and then when glibc no longer requires any of those syscalls,
we can remove them from the kernel. If this happens in the relatively
short term (e.g. before our 3.0 release later this year) all the better,
but for now we can separate this into a first change that preserves most
of the compatibility syscalls, and work towards removing them in a later
release.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann, May 25, 2010, 11:40:01 AM

On Tuesday 25 May 2010, Chris Metcalf wrote:
> > The idea was to only have them around as a transitional helper for
> > new architectures while getting merged, but nothing should ever
> > use these in production.
> >
>
> Perhaps the best strategy for Tile for now is to enable the transitional
> helpers, and then when glibc no longer requires any of those syscalls,
> we can remove them from the kernel. If this happens in the relatively
> short term (e.g. before our 3.0 release later this year) all the better,
> but for now we can separate this into a first change that preserves most
> of the compatibility syscalls, and work towards removing them in a later
> release.

I don't like the idea of adding syscalls first and then disabling them
again. We tried that on score and now we're stuck with the wrong syscall
table there because they never got removed.

Instead, I'd suggest you do the minimal syscall table for upstream and
just carry a private patch to enable the other syscalls until you get
a working glibc/eglibc/uclibc with the official kernel.

Arnd

Thomas Gleixner, May 25, 2010, 4:20:02 PM

Chris,

On Thu, 20 May 2010, Chris Metcalf wrote:

> We are using the http://www.tilera.com/scm/ web site to push
> Tilera-modified sources back up to the community. At the moment, the
> arch/tile hierarchy is there (as a bzipped tarball) as well as a copy
> of the patch appended to this email. In addition, our gcc, binutils,

it would be very helpful for review if you could split your patches
into different topics and send a patch series. I grabbed the
all-in-one patch anyway and looked at irq.c and time.c. Comments are
inlined below.

--- /dev/null
+++ b/arch/tile/kernel/irq.c

+struct tile_irq_desc {
+ void (*handler)(void *);
+ void *dev_id;
+};
+
+struct tile_irq_desc tile_irq_desc[NR_IRQS] __cacheline_aligned;
+
+/**
+ * tile_request_irq() - Allocate an interrupt handling instance.
+ * @handler: the device driver interrupt handler to be called.
+ * @dev_id: a cookie passed back to the handler function.
+ * @index: index into the interrupt handler table to set. It's
+ * derived from the interrupt bit mask allocated by the HV.
+ *
+ * Each device should call this function to register its interrupt
+ * handler. dev_id must be globally unique. Normally the address of the
+ * device data structure is used as the cookie.

Why are you implementing your private interrupt handling
infrastructure ? What's wrong with the generic interrupt handling
code ? Why is each device driver forced to call tile_request_irq()
which makes it incompatible with the rest of the kernel and therefore
unshareable ?

+ */
+void tile_request_irq(void (*handler)(void *), void *dev_id, int index)
+{
+ struct tile_irq_desc *irq_desc;
+
+ BUG_ON(!handler);
+ BUG_ON(index < 0 || index >= NR_IRQS);
+
+ irq_desc = tile_irq_desc + index;
+ irq_desc->handler = handler;
+ irq_desc->dev_id = dev_id;
+}
+EXPORT_SYMBOL(tile_request_irq);
+
+void tile_free_irq(int index)
+{
+ struct tile_irq_desc *irq_desc;
+
+ BUG_ON(index < 0 || index >= NR_IRQS);
+
+ irq_desc = tile_irq_desc + index;
+ irq_desc->handler = NULL;
+ irq_desc->dev_id = NULL;
+}
+EXPORT_SYMBOL(tile_free_irq);

That code lacks any kind of protection and serialization.

+ for (count = 0; pending_dev_intr_mask; ++count) {
+ if (pending_dev_intr_mask & 0x1) {
+ struct tile_irq_desc *desc = &tile_irq_desc[count];
+ if (desc->handler == NULL) {
+ printk(KERN_ERR "Ignoring hv dev interrupt %d;"
+ " handler not registered!\n", count);
+ } else {
+ desc->handler(desc->dev_id);

You check desc->handler, but you happily call the handler while
dev_id might be still NULL. See above.

+/*
+From struct irq_chip (same as hv_interrupt_type):
+ const char name;
+ unsigned int startup - has default, calls enable
+ void shutdown - has default, calls disable
+ void enable - has default, calls unmask
+ void disable - has default, calls mask
+ void ack - required
+ void mask - required
+ void mask_ack - optional - calls mask,ack
+ void unmask - required - optional for some?
+ void eoi - required for fasteoi, percpu
+ void end - not used
+ void set_affinity
+ int retrigger - optional
+ int set_type - optional
+ int set_wake - optional
+ void release - optional
+*/

Please do not replicate the comments from include/linux/irq.h as
they are subject to change.

+/*
+ * Generic, controller-independent functions:
+ */
+
+int show_interrupts(struct seq_file *p, void *v)
+{
+ int i = *(loff_t *) v, j;
+ struct irqaction *action;
+ unsigned long flags;
+
+ if (i == 0) {
+ seq_printf(p, " ");
+ for (j = 0; j < NR_CPUS; j++)
+ if (cpu_online(j))
+ seq_printf(p, "CPU%-8d", j);
+ seq_putc(p, '\n');
+ }
+
+ if (i < NR_IRQS) {
+ raw_spin_lock_irqsave(&irq_desc[i].lock, flags);
+ action = irq_desc[i].action;
+ if (!action)
+ goto skip;
+ seq_printf(p, "%3d: ", i);
+#ifndef CONFIG_SMP
+ seq_printf(p, "%10u ", kstat_irqs(i));
+#else
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", kstat_irqs_cpu(i, j));
+#endif
+ seq_printf(p, " %14s", irq_desc[i].chip->typename);
+ seq_printf(p, " %s", action->name);
+
+ for (action = action->next; action; action = action->next)
+ seq_printf(p, ", %s", action->name);
+
+ seq_putc(p, '\n');
+skip:
+ raw_spin_unlock_irqrestore(&irq_desc[i].lock, flags);
+ }
+ return 0;

So that prints which interrupts ? Now you refer to the generic code,
while above you require the tile-specific one. -ENOSENSE.

+}
+/*
+ * This is used with the handle_level_irq handler for legacy
+ * interrupts.
+ *
+ * These functions can probably be reused with edge sensitive
+ * interrupts.
+ */
+static struct irq_chip chip_irq_legacy = {
+ .typename = "TILE-LEGACY",
+ .mask_ack = chip_mask_ack_level,
+ .disable = chip_disable_interrupt,
+ .eoi = NULL,

No need for NULL initialization

+ .unmask = chip_unmask_level,
+};
+
+static struct irq_chip chip_irq_edge = {
+ .typename = "TILE-EDGE",
+ .mask = chip_mask_edge,
+ .eoi = NULL,

Ditto

+ .ack = chip_ack_edge,
+ .unmask = chip_unmask_edge,
+};
+
+/*
+ * Handler for PCI IRQs. This acts as a shim between the IRQ
+ * framework at the top of this file and the conventional linux framework.
+ * Invoked from tile_dev_intr() as a handler, with interrupts disabled.

Why do you need this shim layer at all ?

+ */
+static void tile_irq_shim(void *dev)
+{
+ int hv_irq = (int)(unsigned long)dev;
+
+
+
+ generic_handle_irq(hv_irq);
+}

--- /dev/null
+++ b/arch/tile/kernel/time.c

+/* How many cycles per second we are running at. */
+static cycles_t cycles_per_sec __write_once;
+static u32 cyc2ns_mult __write_once;
+#define cyc2ns_shift 30

Please do not use fixed shift values. Use the generic functions to
calculate the optimal shift/mult pairs instead.

+cycles_t get_clock_rate() { return cycles_per_sec; }

Eek. Please use proper coding style.

+
+/*
+ * Called very early from setup_arch() to set cycles_per_sec.
+ * Also called, if required, by sched_clock(), which can be even
+ * earlier if built with CONFIG_LOCKDEP (during lockdep_init).
+ * We initialize it early so we can use it to set up loops_per_jiffy.
+ */
+void setup_clock(void)
+{
+ u64 mult;
+
+ if (cyc2ns_mult)
+ return;
+ cycles_per_sec = hv_sysconf(HV_SYSCONF_CPU_SPEED);
+
+ /*
+ * Compute cyc2ns_mult, as used in sched_clock().
+ * For efficiency of multiplication we want this to be a
+ * 32-bit value, so we validate that here. We want as large a
+ * shift value as possible for precision, but too large a
+ * shift would make cyc2ns_mult more than 32 bits. We pick a
+ * constant value that works well with our typical
+ * frequencies, though we could in principle compute the most
+ * precise value dynamically instead. We can't make the shift
+ * greater than 32 without fixing the algorithm.
+ */
+ mult = (1000000000ULL << cyc2ns_shift) / cycles_per_sec;
+ cyc2ns_mult = (u32) mult;
+ BUILD_BUG_ON(cyc2ns_shift > 32);
+ BUG_ON(mult != cyc2ns_mult);

See above.

+}
+
+#if CHIP_HAS_SPLIT_CYCLE()

That should be a CONFIG_TILE_HAS_SPLIT_CYCLE and not a function-like
macro defined somewhere in a header file.

+cycles_t get_cycles()
+{
+ return get_cycle_count();
+}
+#endif
+
+cycles_t clocksource_get_cycles(struct clocksource *cs)
+{
+ return get_cycles();
+}
+
+static struct clocksource cycle_counter_clocksource = {
+ .name = "cycle counter",
+ .rating = 300,
+ .read = clocksource_get_cycles,
+ .mask = CLOCKSOURCE_MASK(64),
+ .flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+/* Called fairly late in init/main.c, but before we go smp. */
+void __init time_init(void)
+{
+ struct clocksource *src = &cycle_counter_clocksource;
+
+ /* Pick an arbitrary time to start us up. */
+ xtime.tv_sec = mktime(1970, 1, 1, 0, 0, 0);
+ xtime.tv_nsec = 0;

Please do not touch xtime. The core code sets it to 1970 already.

+ /* Initialize and register the clock source. */
+ src->shift = 20; /* arbitrary */
+ src->mult = (1000000000ULL << src->shift) / cycles_per_sec;

See above.

+ clocksource_register(src);
+
+ /* Start up the tile-timer interrupt source on the boot cpu. */
+ setup_tile_timer();
+}
+
+
+/*
+ * Provide support for effectively turning the timer interrupt on and
+ * off via the interrupt mask. Make sure not to unmask it while we are
+ * running the timer interrupt handler, to avoid recursive timer
+ * interrupts; these may be OK in some cases, but it's generally cleaner
+ * to reset the kernel stack before starting the next timer interrupt.

Which would already be guaranteed by the generic interrupt code ....
The clockevent callbacks are already called with interrupts
disabled, so why all this magic ?

+ */
+
+/* Track some status about the timer interrupt. */
+struct timer_status {
+ int enabled; /* currently meant to be enabled? */
+ int in_intr; /* currently in the interrupt handler? */
+};
+static DEFINE_PER_CPU(struct timer_status, timer_status);
+
+/* Enable the timer interrupt, unless we're in the handler. */
+static void enable_timer_intr(void)
+{
+ struct timer_status *status = &__get_cpu_var(timer_status);
+ status->enabled = 1;
+ if (status->in_intr)
+ return;
+ raw_local_irq_unmask_now(INT_TILE_TIMER);
+}
+
+/* Disable the timer interrupt. */
+static void disable_timer_intr(void)
+{
+ struct timer_status *status = &__get_cpu_var(timer_status);
+ status->enabled = 0;
+ raw_local_irq_mask_now(INT_TILE_TIMER);
+}
+
+/* Mark the start of processing for the timer interrupt. */
+static void start_timer_intr(void)
+{
+ struct timer_status *status = &__get_cpu_var(timer_status);
+ status->in_intr = 1;
+ disable_timer_intr();
+}
+
+/* Mark end of processing for the timer interrupt, unmasking if necessary. */
+static void end_timer_intr(void)
+{
+ struct timer_status *status = &__get_cpu_var(timer_status);
+ status->in_intr = 0;
+ if (status->enabled)
+ enable_timer_intr();
+}
+
+
+/*
+ * Define the tile timer clock event device. The timer is driven by
+ * the TILE_TIMER_CONTROL register, which consists of a 31-bit down
+ * counter, plus bit 31, which signifies that the counter has wrapped
+ * from zero to (2**31) - 1. The INT_TILE_TIMER interrupt will be
+ * raised as long as bit 31 is set.
+ */
+
+#define MAX_TICK 0x7fffffff /* we have 31 bits of countdown timer */
+
+static int tile_timer_set_next_event(unsigned long ticks,
+ struct clock_event_device *evt)
+{
+ BUG_ON(ticks > MAX_TICK);
+ __insn_mtspr(SPR_TILE_TIMER_CONTROL, ticks);
+ enable_timer_intr();
+ return 0;
+}
+
+/*
+ * Whenever anyone tries to change modes, we just mask interrupts
+ * and wait for the next event to get set.
+ */
+static void tile_timer_set_mode(enum clock_event_mode mode,
+ struct clock_event_device *evt)
+{
+ disable_timer_intr();
+}
+
+static DEFINE_PER_CPU(struct clock_event_device, tile_timer) = {
+ .name = "tile timer",
+ .features = CLOCK_EVT_FEAT_ONESHOT,
+ .min_delta_ns = 1000, /* at least 1000 cycles to fire the interrupt */

That's not cycles. That's nanoseconds ! And please avoid tail comments.

+ .rating = 100,
+ .irq = -1,
+ .set_next_event = tile_timer_set_next_event,
+ .set_mode = tile_timer_set_mode,
+};
+
+void __cpuinit setup_tile_timer(void)
+{
+ struct clock_event_device *evt = &__get_cpu_var(tile_timer);
+
+ /* Fill in fields that are speed-specific. */
+ evt->shift = 20; /* arbitrary */
+ evt->mult = (cycles_per_sec << evt->shift) / 1000000000ULL;

See above.

+ evt->max_delta_ns = (MAX_TICK * 1000000000ULL) / cycles_per_sec;

There is a generic function for this as well. Please use it.

+ /* Mark as being for this cpu only. */
+ evt->cpumask = cpumask_of(smp_processor_id());
+
+ /* Start out with timer not firing. */
+ disable_timer_intr();
+
+ /* Register tile timer. */
+ clockevents_register_device(evt);
+}
+
+/* Called from the interrupt vector. */
+void do_timer_interrupt(struct pt_regs *regs, int fault_num)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+ struct clock_event_device *evt = &__get_cpu_var(tile_timer);
+
+ /* Mask timer interrupts in case someone enables interrupts later. */
+ start_timer_intr();

Nothing enables interrupts in the timer interrupt handler code path.

+ /* Track time spent here in an interrupt context */
+ irq_enter();
+
+ /* Track interrupt count. */
+ __get_cpu_var(irq_stat).irq_timer_count++;
+
+ /* Call the generic timer handler */
+ evt->event_handler(evt);
+
+ /*
+ * Track time spent against the current process again and
+ * process any softirqs if they are waiting.
+ */
+ irq_exit();
+
+ /*
+ * Enable the timer interrupt (if requested) with irqs disabled,
+ * so we don't get recursive timer interrupts.
+ */
+ local_irq_disable();

The code above does _NOT_ reenable interrupts. And if it would, then
you would break irq_exit() assumptions as well.

+ end_timer_intr();
+
+ set_irq_regs(old_regs);
+}
+
+/*
+ * Scheduler clock - returns current time in nanosec units.
+ *
+ * The normal algorithm computes (cycles * cyc2ns_mult) >> cyc2ns_shift.
+ * We can make it potentially more efficient and with a better range
+ * by writing "cycles" as two 32-bit components, "(H << 32) + L" and
+ * then factoring. Here we use M = cyc2ns_mult and S = cyc2ns_shift.
+ *
+ * (((H << 32) + L) * M) >> S =
+ * (((H << 32) * M) >> S) + ((L * M) >> S) =
+ * ((H * M) << (32 - S)) + ((L * M) >> S)
+ */
+unsigned long long sched_clock(void)
+{
+ u64 cycles;
+ u32 cyc_hi, cyc_lo;
+
+ if (unlikely(cyc2ns_mult == 0))
+ setup_clock();

Please initialize stuff _before_ it is called the first time and not
at some arbitrary point conditionally in a hotpath.

+
+ cycles = get_cycles();
+ cyc_hi = (u32) (cycles >> 32);
+ cyc_lo = (u32) (cycles);
+
+ /* Compiler could optimize the 32x32 -> 64 multiplies here. */
+ return ((cyc_hi * (u64)cyc2ns_mult) << (32 - cyc2ns_shift)) +
+ ((cyc_lo * (u64)cyc2ns_mult) >> cyc2ns_shift);
+}
+
+int setup_profiling_timer(unsigned int multiplier)
+{
+ return -EINVAL;
+}

Thanks,

tglx

Arnd Bergmann, May 25, 2010, 5:50:02 PM

Here comes the rest of my review, covering the arch/tile/kernel/ directory.
There isn't much to comment on in arch/tile/mm and arch/tile/lib from my
side, and I still ignored the drivers and oprofile directories.

> diff --git a/arch/tile/kernel/backtrace.c b/arch/tile/kernel/backtrace.c
> new file mode 100644
> index 0000000..3cbb21a
> --- /dev/null
> +++ b/arch/tile/kernel/backtrace.c
> +#ifndef __KERNEL__
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#else
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#define abort() BUG()
> +#endif

Besides being shared kernel/user code (as you already mentioned), this
file looks rather complicated compared to what the other architectures
do.

Is this really necessary because of some property of the architecture
or do you implement other functionality that is not present on existing
archs?

> diff --git a/arch/tile/kernel/compat.c b/arch/tile/kernel/compat.c
> new file mode 100644
> index 0000000..ca6421c
> --- /dev/null
> +++ b/arch/tile/kernel/compat.c
> +/*
> + * Syscalls that take 64-bit numbers traditionally take them in 32-bit
> + * "high" and "low" value parts on 32-bit architectures.
> + * In principle, one could imagine passing some register arguments as
> + * fully 64-bit on TILE-Gx in 32-bit mode, but it seems easier to
> + * adapt the usual convention.
> + */

Yes, that makes sense. You definitely want binary compatibility between
32 bit binaries from a native 32 bit system on TILE-Gx in the syscall
interface.

> +long compat_sys_truncate64(char __user *filename, u32 dummy, u32 low, u32 high)
> +{
> + return sys_truncate(filename, ((loff_t)high << 32) | low);
> +}
> +
> +long compat_sys_ftruncate64(unsigned int fd, u32 dummy, u32 low, u32 high)
> +{
> + return sys_ftruncate(fd, ((loff_t)high << 32) | low);
> +}
> +
> +long compat_sys_pread64(unsigned int fd, char __user *ubuf, size_t count,
> + u32 dummy, u32 low, u32 high)
> +{
> + return sys_pread64(fd, ubuf, count, ((loff_t)high << 32) | low);
> +}
> +
> +long compat_sys_pwrite64(unsigned int fd, char __user *ubuf, size_t count,
> + u32 dummy, u32 low, u32 high)
> +{
> + return sys_pwrite64(fd, ubuf, count, ((loff_t)high << 32) | low);
> +}
> +
> +long compat_sys_lookup_dcookie(u32 low, u32 high, char __user *buf, size_t len)
> +{
> + return sys_lookup_dcookie(((loff_t)high << 32) | low, buf, len);
> +}
> +
> +long compat_sys_sync_file_range2(int fd, unsigned int flags,
> + u32 offset_lo, u32 offset_hi,
> + u32 nbytes_lo, u32 nbytes_hi)
> +{
> + return sys_sync_file_range(fd, ((loff_t)offset_hi << 32) | offset_lo,
> + ((loff_t)nbytes_hi << 32) | nbytes_lo,
> + flags);
> +}
> +
> +long compat_sys_fallocate(int fd, int mode,
> + u32 offset_lo, u32 offset_hi,
> + u32 len_lo, u32 len_hi)
> +{
> + return sys_fallocate(fd, mode, ((loff_t)offset_hi << 32) | offset_lo,
> + ((loff_t)len_hi << 32) | len_lo);
> +}

It may be time to finally provide generic versions of these...
Any work in that direction would be appreciated, but you may also
just keep this code, it's good.
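For reference, the merging that these wrappers all repeat could be factored into a single helper; a minimal userspace sketch of the pattern (the helper name is invented here, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: merge the 32-bit "low" and "high" register
 * halves that a 32-bit ABI uses to pass a 64-bit file offset,
 * mirroring the ((loff_t)high << 32) | low pattern above. */
static inline int64_t merge_loff(uint32_t high, uint32_t low)
{
	return ((int64_t)high << 32) | low;
}
```

A generic implementation could then define each compat wrapper as a thin macro around this one expression.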

> +/*
> + * The 32-bit runtime uses layouts for "struct stat" and "struct stat64"
> + * that match the TILEPro/TILE64 runtime. Unfortunately the "stat64"
> + * layout on existing 32 bit architectures doesn't quite match the
> + * "normal" 64-bit layout, so we have to convert for that too.
> + * Worse, it has an unaligned "st_blocks", so we have to use __copy_to_user().
> + */
> +
> +int cp_compat_stat(struct kstat *kbuf, struct compat_stat __user *ubuf)
> +{
> + compat_ino_t ino;
> +
> + if (!old_valid_dev(kbuf->dev) || !old_valid_dev(kbuf->rdev))
> + return -EOVERFLOW;
> + if (kbuf->size >= 0x7fffffff)
> + return -EOVERFLOW;
> + ino = kbuf->ino;
> + if (sizeof(ino) < sizeof(kbuf->ino) && ino != kbuf->ino)
> + return -EOVERFLOW;
> + if (!access_ok(VERIFY_WRITE, ubuf, sizeof(struct compat_stat)) ||
> + __put_user(old_encode_dev(kbuf->dev), &ubuf->st_dev) ||
> ...

With the asm-generic/stat.h definitions, this is no longer necessary.
Those are defined to be compatible, so you can just call the 64 bit
version of sys_stat in place of the 32 bit sys_stat64.

> +long compat_sys_sched_rr_get_interval(compat_pid_t pid,
> + struct compat_timespec __user *interval)
> +{
> + struct timespec t;
> + int ret;
> + mm_segment_t old_fs = get_fs();
> +
> + set_fs(KERNEL_DS);
> + ret = sys_sched_rr_get_interval(pid, (struct timespec __user *)&t);
> + set_fs(old_fs);
> + if (put_compat_timespec(&t, interval))
> + return -EFAULT;
> + return ret;
> +}

This is relatively ugly and probably identical to the other six copies
of the same function. Someone (not necessarily you) should do this
the right way.

> +
> +ssize_t compat_sys_sendfile(int out_fd, int in_fd, compat_off_t __user *offset,
> + size_t count)
> +{
> + mm_segment_t old_fs = get_fs();
> + int ret;
> + off_t of;
> +
> + if (offset && get_user(of, offset))
> + return -EFAULT;
> +
> + set_fs(KERNEL_DS);
> + ret = sys_sendfile(out_fd, in_fd, offset ? (off_t __user *)&of : NULL,
> + count);
> + set_fs(old_fs);
> +
> + if (offset && put_user(of, offset))
> + return -EFAULT;
> + return ret;
> +}

compat_sys_sendfile will not be needed with the asm-generic/unistd.h definitions,
but I think you will still need a compat_sys_sendfile64, to which the same
applies as to compat_sys_sched_rr_get_interval.

> +/*
> + * The usual compat_sys_msgsnd() and _msgrcv() seem to be assuming
> + * some different calling convention than our normal 32-bit tile code.
> + */

Fascinating, the existing functions are useless, because no architecture
is actually able to call them directly from their sys_call_table.
We should replace those with your version and change the other architectures
accordingly.

> diff --git a/arch/tile/kernel/compat_signal.c b/arch/tile/kernel/compat_signal.c
> new file mode 100644
> index 0000000..e21554e
> --- /dev/null
> +++ b/arch/tile/kernel/compat_signal.c
> +
> +struct compat_sigaction {
> + compat_uptr_t sa_handler;
> + compat_ulong_t sa_flags;
> + compat_uptr_t sa_restorer;
> + sigset_t sa_mask; /* mask last for extensibility */
> +};
> +
> +struct compat_sigaltstack {
> + compat_uptr_t ss_sp;
> + int ss_flags;
> + compat_size_t ss_size;
> +};
> +
> +struct compat_ucontext {
> + compat_ulong_t uc_flags;
> + compat_uptr_t uc_link;
> + struct compat_sigaltstack uc_stack;
> + struct sigcontext uc_mcontext;
> + sigset_t uc_sigmask; /* mask last for extensibility */
> +};

It's been some time since I looked at this stuff, so I'd need help
from others to review it. I sense that it should be simpler though.

> +/*
> + * Interface to /proc and the VFS.
> + */
> +
> +static int hardwall_ioctl(struct inode *inode, struct file *file,
> + unsigned int a, unsigned long b)
> +{
> + struct hardwall_rectangle rect;
> + struct khardwall_rectangle *krect = file_to_rect(file);
> + int sig;
> +
> + switch (a) {
> + case HARDWALL_CREATE:
> + if (udn_disabled)
> + return -ENOSYS;
> + if (copy_from_user(&rect, (const void __user *) b,
> + sizeof(rect)) != 0)
> + return -EFAULT;
> + if (krect != NULL)
> + return -EALREADY;
> + krect = hardwall_create(&rect);
> + if (IS_ERR(krect))
> + return PTR_ERR(krect);
> + _file_to_rect(file) = krect;
> + return 0;
> +
> + case HARDWALL_ACTIVATE:
> + return hardwall_activate(krect);
> +
> + case HARDWALL_DEACTIVATE:
> + if (current->thread.hardwall != krect)
> + return -EINVAL;
> + return hardwall_deactivate(current);
> +
> + case HARDWALL_SIGNAL:
> + if (krect == NULL)
> + return -EINVAL;
> + sig = krect->abort_signal;
> + if (b >= 0)
> + krect->abort_signal = b;
> + return sig;
> +
> + default:
> + return -EINVAL;
> + }
> +}

The hardwall stuff looks like it is quite central to your architecture.
Could you elaborate on what it does?

If it is as essential as it looks, I'd vote for promoting the interface
from an ioctl based one to four real system calls (more if necessary).

> +/* Dump a line of data for the seq_file API to print the hardwalls */
> +static int hardwall_show(struct seq_file *m, void *v)
> +{
> + struct khardwall_rectangle *kr;
> + struct hardwall_rectangle *r;
> + struct task_struct *p;
> + unsigned long flags;
> +
> + if (udn_disabled) {
> + if (ptr_to_index(v) == 0)
> + seq_printf(m, "%dx%d 0,0 pids: 0@0,0\n",
> + smp_width, smp_height);
> + return 0;
> + }
> + spin_lock_irqsave(&hardwall_lock, flags);
> + kr = _nth_rectangle(ptr_to_index(v));
> + if (kr == NULL) {
> + spin_unlock_irqrestore(&hardwall_lock, flags);
> + return 0;
> + }
> + r = &kr->rect;
> + seq_printf(m, "%dx%d %d,%d pids:",
> + r->width, r->height, r->ulhc_x, r->ulhc_y);
> + for_each_hardwall_task(p, &kr->task_head) {
> + unsigned int cpu = cpumask_first(&p->cpus_allowed);
> + unsigned int x = cpu % smp_width;
> + unsigned int y = cpu / smp_width;
> + seq_printf(m, " %d@%d,%d", p->pid, x, y);
> + }
> + seq_printf(m, "\n");
> + spin_unlock_irqrestore(&hardwall_lock, flags);
> + return 0;
> +}

Note that the procfs file format is part of your ABI, and this looks
relatively hard to parse, which may introduce bugs.
For per-process information, it would be better to have a simpler
file in each /proc/<pid>/ directory. Would that work for you?
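To illustrate why the format above is awkward as an ABI: even pulling out just the header fields takes format-specific parsing, and the pid list after "pids:" needs a second hand-rolled pass. A userspace sketch (function name invented here):

```c
#include <assert.h>
#include <stdio.h>

/* Illustration only: extract the rectangle geometry from one line of
 * the hardwall format above, e.g. "8x8 0,0 pids: 123@1,2".
 * Returns the number of fields successfully converted. */
static int parse_hardwall_header(const char *line, int *w, int *h,
				 int *x, int *y)
{
	return sscanf(line, "%dx%d %d,%d pids:", w, h, x, y);
}
```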

> +static int hardwall_open(struct inode *inode, struct file *file)
> +{
> + /*
> + * The standard "proc_reg_file_ops", which we get from
> + * create_proc_entry(), does not include a "flush" op.
> + * We add it here so that we can deactivate on close.
> + * Since proc_reg_file_ops, and all its function pointers,
> + * are static in fs/proc/inode.c, we just copy them blindly.
> + */
> + static struct file_operations override_ops;
> + if (override_ops.open == NULL) {
> + override_ops = *file->f_op;
> + BUG_ON(override_ops.open == NULL);
> + BUG_ON(override_ops.flush != NULL);
> + override_ops.flush = hardwall_flush;
> + } else {
> + BUG_ON(override_ops.open != file->f_op->open);
> + }
> + file->f_op = &override_ops;
> +
> + return seq_open(file, &hardwall_op);
> +}

As you are probably aware of, this is really ugly. Hopefully it
won't be necessary if you can move to actual syscalls.

> +/* Referenced from proc_tile_init() */
> +static const struct file_operations proc_tile_hardwall_fops = {
> + .open = hardwall_open,
> + .ioctl = hardwall_ioctl,
> +#ifdef CONFIG_COMPAT
> + .compat_ioctl = hardwall_compat_ioctl,
> +#endif
> + .flush = hardwall_flush,
> + .release = hardwall_release,
> + .read = seq_read,
> + .llseek = seq_lseek,
> +};

Note that we're about to remove the .ioctl file operation and
replace it with .unlocked_ioctl everywhere. Also, as I mentioned
in the first review round, ioctl on procfs is something you should
never do.

> diff --git a/arch/tile/kernel/hugevmap.c b/arch/tile/kernel/hugevmap.c
> new file mode 100644
> index 0000000..c408666
> --- /dev/null
> +++ b/arch/tile/kernel/hugevmap.c

Not used anywhere apparently. Can you explain what this is good for?
Maybe leave it out for now, until you merge the code that needs it.
I don't see anything obviously wrong with the implementation though.

> diff --git a/arch/tile/kernel/hv_drivers.c b/arch/tile/kernel/hv_drivers.c
> new file mode 100644
> index 0000000..5e69973
> --- /dev/null
> +++ b/arch/tile/kernel/hv_drivers.c

Please have a look at drivers/char/hvc_{rtas,beat,vio,iseries}.c
to see how we do the same for other hypervisors, in a much simpler
way.

> +/*
> + * Interrupt dispatcher, invoked upon a hypervisor device interrupt downcall
> + */
> +void tile_dev_intr(struct pt_regs *regs, int intnum)
> +{
> + int count;
> +
> + /*
> + * Get the device interrupt pending mask from where the hypervisor
> + * has tucked it away for us.
> + */
> + unsigned long pending_dev_intr_mask = __insn_mfspr(SPR_SYSTEM_SAVE_1_3);
> +
> + /* Track time spent here in an interrupt context. */
> + struct pt_regs *old_regs = set_irq_regs(regs);
> + irq_enter();
> +
> + for (count = 0; pending_dev_intr_mask; ++count) {
> + if (pending_dev_intr_mask & 0x1) {
> + struct tile_irq_desc *desc = &tile_irq_desc[count];
> + if (desc->handler == NULL) {
> + printk(KERN_ERR "Ignoring hv dev interrupt %d;"
> + " handler not registered!\n", count);
> + } else {
> + desc->handler(desc->dev_id);
> + }
> +
> + /* Count device irqs; IPIs are counted elsewhere. */
> + if (count > HV_MAX_IPI_INTERRUPT)
> + __get_cpu_var(irq_stat).irq_dev_intr_count++;
> + }
> + pending_dev_intr_mask >>= 1;
> + }

Why the extra indirection for regular interrupts instead of always calling
generic_handle_irq?

> diff --git a/arch/tile/kernel/memprof.c b/arch/tile/kernel/memprof.c
> new file mode 100644
> index 0000000..9424cc5
> --- /dev/null
> +++ b/arch/tile/kernel/memprof.c

I suppose this could get dropped in favor of perf events?

> +/*
> + * These came from asm-tile/io.h, they made the compiler assert when
> + * they were inlined there, but I shouldn't be worried about the
> + * overhead of the function call if they're just calling panic.
> + */
> +
> +u32 inb(u32 addr)
> +{
> + panic("inb not implemented");
> +}
> +EXPORT_SYMBOL(inb);

If you just remove these definitions, you get a link error for any
driver that tries to use these, which is probably more helpful than
the panic.

OTOH, are you sure that you can't just map the PIO calls to mmio functions
like readb plus some fixed offset? On most non-x86 architectures, the PIO
area of the PCI bus is just mapped to a memory range somewhere.
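A minimal sketch of that suggestion, with a plain array standing in for the ioremap()ed PIO aperture (everything here is illustrative; in the kernel the body would be readb(pio_base + port)):

```c
#include <assert.h>
#include <stdint.h>

/* Simulated memory window standing in for the PCI PIO aperture that
 * most non-x86 architectures map into the address space. */
static uint8_t pio_window[65536];

/* inb() implemented as a byte read at a fixed offset into the
 * memory-mapped window, rather than a panic. */
static uint8_t inb_sketch(unsigned int port)
{
	return pio_window[port];
}
```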

> +/****************************************************************
> + *
> + * Backward compatible /proc/pci interface.
> + * This is borrowed from 2.6.9, it was dropped by 2.6.18.
> + *
> + * It's good for debugging, in the absence of lspci, but is not
> + * needed for anything to work.
> + *
> + ****************************************************************/

Does this do anything that you can't do with lspci/setpci?
I'd suggest just dropping this again.

> +/*
> + * Support /proc/PID/pgtable
> + */

Do you have applications relying on this? While I can see
how this may be useful, I don't think we should have a
generic interface like this in architecture specific
code.

It also may be used as an attack vector for malicious applications
that have a way of accessing parts of physical memory.

I think it would be better to drop this interface for now.

> +/* Simple /proc/tile files. */
> +SIMPLE_PROC_ENTRY(grid, "%u\t%u\n", smp_width, smp_height)
> +
> +/* More complex /proc/tile files. */
> +static void proc_tile_seq_strconf(struct seq_file *sf, char* what,
> + uint32_t query)

All of these look like they should be files in various places in
sysfs, e.g. in /sys/devices/system/cpu or /sys/firmware/.
Procfs is not necessarily evil, but most of your uses are for
stuff that actually fits very well into what we have in sysfs.

> +SEQ_PROC_ENTRY(memory)
> +static int proc_tile_memory_show(struct seq_file *sf, void *v)
> +{
> + int node;
> + int ctrl;
> + HV_Coord coord = { 0, 0 };
> + /*
> + * We make two passes here; one through our memnodes to display
> + * which controllers they correspond with, and one through all
> + * controllers to get their speeds. We may not actually have
> + * access to all of the controllers whose speeds we retrieve, but
> + * we get them because they're useful for mcstat, which provides
> + * stats for physical controllers whether we're using them or not.
> + */
> + for (node = 0; node < MAX_NUMNODES; node++) {
> + ctrl = node_controller[node];
> + if (ctrl >= 0)
> + seq_printf(sf, "controller_%d_node: %d\n", ctrl, node);
> + }
> + /*
> + * Note that we use MAX_NUMNODES as the limit for the controller
> + * loop because we don't have anything better.
> + */
> + for (ctrl = 0; ctrl < MAX_NUMNODES; ctrl++) {
> + HV_MemoryControllerInfo info =
> + hv_inquire_memory_controller(coord, ctrl);
> + if (info.speed)
> + seq_printf(sf, "controller_%d_speed: %llu\n",
> + ctrl, info.speed);
> + }
> + return 0;
> +}

This one should probably be split up into files under /sys/devices/system/node/nodeX/

> +#ifdef CONFIG_DATAPLANE
> +SEQ_PROC_ENTRY(dataplane)
> +static int proc_tile_dataplane_show(struct seq_file *sf, void *v)
> +{
> + int cpu;
> + int space = 0;
> + for_each_cpu(cpu, &dataplane_map) {
> + if (space)
> + seq_printf(sf, " ");
> + else
> + space = 1;
> + seq_printf(sf, "%d", cpu);
> + }
> + if (space)
> + seq_printf(sf, "\n");
> + return 0;
> +}
> +#else
> +#define proc_tile_dataplane_init() do {} while (0)
> +#endif

Not sure where in sysfs this would fit best, but I think the format
should match that of the other cpu bitmaps in /sys/devices/system/node.

> +SEQ_PROC_ENTRY(interrupts)
> +static int proc_tile_interrupts_show(struct seq_file *sf, void *v)
> +{
> + int i;
> +
> + seq_printf(sf, "%-8s%8s%8s%8s%8s%8s%8s%8s\n", "",
> + "timer", "syscall", "resched", "hvflush", "SMPcall",
> + "hvmsg", "devintr");
> +
> + for_each_online_cpu(i) {
> + irq_cpustat_t *irq = &per_cpu(irq_stat, i);
> + seq_printf(sf, "%-8d%8d%8d%8d%8d%8d%8d%8d\n", i,
> + irq->irq_timer_count,
> + irq->irq_syscall_count,
> + irq->irq_resched_count,
> + irq->irq_hv_flush_count,
> + irq->irq_call_count,
> + irq->irq_hv_msg_count,
> + irq->irq_dev_intr_count);
> + }
> + return 0;
> +}

Can you merge this with /proc/interrupts?

> +#ifdef CONFIG_FEEDBACK_COLLECT
> +
> +extern void *__feedback_edges_ptr;
> +extern long __feedback_edges_size;
> +extern void flush_my_deferred_graph(void *dummy);
> +
> +ssize_t feedback_read(struct file *file, char __user *buf, size_t size,
> + loff_t *ppos)

This probably belongs into debugfs, similar to what we do
for gcov.

How much of the feedback stuff is generic? It might be good
to put those bits in a common place like kernel/feedback.c
so that other architectures can implement this as well.

> +/*
> + * Support /proc/sys/tile directory
> + */
> +
> +
> +static ctl_table unaligned_table[] = {
> + {
> + .procname = "enabled",
> + .data = &unaligned_fixup,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec
> + },
> + {
> + .procname = "printk",
> + .data = &unaligned_printk,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec
> + },
> + {
> + .procname = "count",
> + .data = &unaligned_fixup_count,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec
> + },
> + {}
> +};
> +
> +
> +static ctl_table tile_root[] = {
> +
> + {
> + .procname = "unaligned_fixup",
> + .mode = 0555,
> + unaligned_table
> + },

Hmm, similar to what sh64 does, yet different.
Not much of a problem though.

> + {
> + .procname = "crashinfo",
> + .data = &show_crashinfo,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec
> + },
> + {}
> +};

How is this different from the existing
exception-trace/userprocess_debug sysctl?
If it is very similar, let's not introduce yet another
name for it but just use the common userprocess_debug.

> +
> +#if CHIP_HAS_CBOX_HOME_MAP()
> +static ctl_table hash_default_table[] = {
> + {
> + .procname = "hash_default",
> + .data = &hash_default,
> + .maxlen = sizeof(int),
> + .mode = 0444,
> + .proc_handler = &proc_dointvec
> + },
> + {}
> +};
> +#endif

This seems to be read-only and coming from a kernel command
line option, so I guess looking at /proc/cmdline would
be a reasonable alternative.

> +long arch_ptrace(struct task_struct *child, long request, long addr, long data)
> +{
> + unsigned long __user *datap;
> + unsigned long tmp;
> + int i;
> + long ret = -EIO;
> +
> +#ifdef CONFIG_COMPAT
> + if (task_thread_info(current)->status & TS_COMPAT)
> + data = (u32)data;
> + if (task_thread_info(child)->status & TS_COMPAT)
> + addr = (u32)addr;
> +#endif
> + datap = (unsigned long __user *)data;
> +
> + switch (request) {
> +
> + case PTRACE_PEEKUSR: /* Read register from pt_regs. */
> + case PTRACE_POKEUSR: /* Write register in pt_regs. */
> + case PTRACE_GETREGS: /* Get all registers from the child. */
> + case PTRACE_SETREGS: /* Set all registers in the child. */
> + case PTRACE_GETFPREGS: /* Get the child FPU state. */
> + case PTRACE_SETFPREGS: /* Set the child FPU state. */

I believe the new way to do this is to implement
CONFIG_HAVE_ARCH_TRACEHOOK and get all these for free.

> + case PTRACE_SETOPTIONS:
> + /* Support TILE-specific ptrace options. */
> + child->ptrace &= ~PT_TRACE_MASK_TILE;
> + tmp = data & PTRACE_O_MASK_TILE;
> + data &= ~PTRACE_O_MASK_TILE;
> + ret = ptrace_request(child, request, addr, data);
> + if (tmp & PTRACE_O_TRACEMIGRATE)
> + child->ptrace |= PT_TRACE_MIGRATE;
> + break;

It may be better to add this to the common code, possibly
in an #ifdef CONFIG_ARCH_TILE, to make sure we never
get conflicting numbers for future PTRACE_O_* values.

> +SYSCALL_DEFINE3(raise_fpe, int, code, unsigned long, addr,
> + struct pt_regs *, regs)

Does this need to be a system call? I thought we already had
other architectures without floating point exceptions in hardware
that don't need this.

> diff --git a/arch/tile/kernel/stack.c b/arch/tile/kernel/stack.c
> new file mode 100644
> index 0000000..3190bc1
> --- /dev/null
> +++ b/arch/tile/kernel/stack.c
> +/* Callback for backtracer; basically a glorified memcpy */
> +static bool read_memory_func(void *result, VirtualAddress address,
> + unsigned int size, void *vkbt)
> +{
> + int retval;
> + struct KBacktraceIterator *kbt = (struct KBacktraceIterator *)vkbt;
> + if (in_kernel_text(address)) {
> + /* OK to read kernel code. */
> + } else if (address >= PAGE_OFFSET) {
> + /* We only tolerate kernel-space reads of this task's stack */
> + if (!in_kernel_stack(kbt, address))
> + return 0;
> + } else if (kbt->pgtable == NULL) {
> + return 0; /* can't read user space in other tasks */
> + } else if (!valid_address(kbt, address)) {
> + return 0; /* invalid user-space address */
> + }
> + pagefault_disable();
> + retval = __copy_from_user_inatomic(result, (const void *)address,
> + size);
> + pagefault_enable();
> + return (retval == 0);
> +}

more backtrace code in stack.c, same comment as above.

> diff --git a/arch/tile/kernel/sys.c b/arch/tile/kernel/sys.c
> new file mode 100644
> index 0000000..97fde79
> --- /dev/null
> +++ b/arch/tile/kernel/sys.c
> +/*
> + * Syscalls that pass 64-bit values on 32-bit systems normally
> + * pass them as (low,high) word packed into the immediately adjacent
> + * registers. If the low word naturally falls on an even register,
> + * our ABI makes it work correctly; if not, we adjust it here.
> + * Handling it here means we don't have to fix uclibc AND glibc AND
> + * any other standard libcs we want to support.
> + */
> +
> +
> +
> +ssize_t sys32_readahead(int fd, u32 offset_lo, u32 offset_hi, u32 count)
> +{
> + return sys_readahead(fd, ((loff_t)offset_hi << 32) | offset_lo, count);
> +}
> +
> +long sys32_fadvise64(int fd, u32 offset_lo, u32 offset_hi,
> + u32 len, int advice)
> +{
> + return sys_fadvise64_64(fd, ((loff_t)offset_hi << 32) | offset_lo,
> + len, advice);
> +}
> +
> +int sys32_fadvise64_64(int fd, u32 offset_lo, u32 offset_hi,
> + u32 len_lo, u32 len_hi, int advice)
> +{
> + return sys_fadvise64_64(fd, ((loff_t)offset_hi << 32) | offset_lo,
> + ((loff_t)len_hi << 32) | len_lo, advice);
> +}

These seem to belong with the other similar functions in compat.c

> +
> +
> +
> +/*
> + * This API uses a 4KB-page-count offset into the file descriptor.
> + * It is likely not the right API to use on a 64-bit platform.
> + */
> +SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
> + unsigned long, prot, unsigned long, flags,
> + unsigned int, fd, unsigned long, off_4k)
> +{
> +#define PAGE_ADJUST (PAGE_SHIFT - 12)
> + if (off_4k & ((1 << PAGE_ADJUST) - 1))
> + return -EINVAL;
> + return sys_mmap_pgoff(addr, len, prot, flags, fd,
> + off_4k >> PAGE_ADJUST);
> +}
> +
> +/*
> + * This API uses a byte offset into the file descriptor.
> + * It is likely not the right API to use on a 32-bit platform.
> + */
> +SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
> + unsigned long, prot, unsigned long, flags,
> + unsigned int, fd, unsigned long, offset)
> +{
> + if (offset & ((1 << PAGE_SHIFT) - 1))
> + return -EINVAL;
> + return sys_mmap_pgoff(addr, len, prot, flags, fd,
> + offset >> PAGE_SHIFT);
> +}

Just use the sys_mmap_pgoff system call directly, rather than
defining your own wrappers. Since that syscall is newer than
asm-generic/unistd.h, that file might need some changes,
together with fixes to arch/score to make sure we don't break
its ABI.
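As a side note, the off_4k handling in the quoted sys_mmap2 boils down to one alignment check plus a shift. A sketch, assuming a 64 KB base page (PAGE_SHIFT of 16 is an assumption here, not taken from the patch):

```c
#include <assert.h>

/* Hypothetical values standing in for the kernel's PAGE_SHIFT and
 * the PAGE_ADJUST computed in the quoted code. */
#define SKETCH_PAGE_SHIFT 16
#define SKETCH_PAGE_ADJUST (SKETCH_PAGE_SHIFT - 12)

/* Convert an mmap2 offset in 4 KB units to a page offset.
 * Returns -1 (standing in for -EINVAL) if not page aligned. */
static long off4k_to_pgoff(unsigned long off_4k)
{
	if (off_4k & ((1UL << SKETCH_PAGE_ADJUST) - 1))
		return -1;
	return (long)(off_4k >> SKETCH_PAGE_ADJUST);
}
```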

> diff --git a/arch/tile/kernel/syscall_table.S b/arch/tile/kernel/syscall_table.S
> new file mode 100644
> index 0000000..7fcd160
> --- /dev/null
> +++ b/arch/tile/kernel/syscall_table.S

This file should be replaced with the C variant, as
arch/score/kernel/sys_call_table.c does.

> diff --git a/arch/tile/kernel/tile-desc_32.c b/arch/tile/kernel/tile-desc_32.c
> new file mode 100644
> index 0000000..771eab1
> --- /dev/null
> +++ b/arch/tile/kernel/tile-desc_32.c
> @@ -0,0 +1,13865 @@
> +/* Define to include "bfd.h" and get actual BFD relocations below. */
> +/* #define WANT_BFD_RELOCS */
> +
> +#ifdef WANT_BFD_RELOCS
> +#include "bfd.h"
> +#define MAYBE_BFD_RELOC(X) (X)
> +#else
> +#define MAYBE_BFD_RELOC(X) -1
> +#endif
> +
> +/* Special registers. */
> +#define TREG_LR 55
> +#define TREG_SN 56
> +#define TREG_ZERO 63
> +
> +#if defined(__KERNEL__) || defined(_LIBC)
> +// FIXME: Rename this.
> +#include <asm/opcode-tile.h>
> +#else
> +#include "tile-desc.h"
> +#endif

It seems that this file fits in the same category as the
backtrace code. Maybe move both away from arch/tile/kernel into a
different directory?

> diff --git a/arch/tile/lib/checksum.c b/arch/tile/lib/checksum.c
> new file mode 100644
> index 0000000..a909a35
> --- /dev/null
> +++ b/arch/tile/lib/checksum.c

Have you tried to use the generic lib/checksum.c implementation?

Chris Metcalf

May 25, 2010, 10:00:02 PM
Thomas, thanks for your feedback. If I don't comment on something you
said it's because you were obviously right and I applied a suitable fix. :-)

On 5/25/2010 4:12 PM, Thomas Gleixner wrote:
> +/**
> + * tile_request_irq() - Allocate an interrupt handling instance.

> [...]
>
> Why are you implementing your private interrupt handling
> infrastructure ? What's wrong with the generic interrupt handling
> code ? Why is each device driver forced to call tile_request_irq()
> which makes it incompatible to the rest of the kernel and therefore
> unshareable ?
>

Our interrupt management code has evolved as we have developed this
code, so I won't present arguments as to why it's perfect the way it is,
but just why it IS the way it is. :-)

The tile irq.c does not replace the generic Linux IRQ management code,
but instead provides a very limited set of virtual interrupts that are
only used by our para-virtualized device drivers, and delivered to Linux
via a single hypervisor downcall that atomically sets "virtual
interrupt" bits in a bitmask. The PCI root complex driver reserves four
of these bits (i.e. irqs) to map real PCI interrupts; they are then fed
forward into the regular Linux IRQ system to manage all "standard"
devices. The other tile-specific para-virtualized drivers that use this
interface are the PCI endpoint code, xgbe network driver, ATA-over-GPIO
driver, and the IPI layer. None of these para-virtualized drivers are
actually shareable with other Linux architectures in any case.

We have an outstanding enhancement request in our bug tracking system to
switch to using the Linux generic IRQs directly, and plan to implement
it prior to our next major release. But we haven't done it yet, and
this code, though somewhat crufty, is reasonably stable. I'm also not
the primary maintainer of this particular piece of code, so I'd rather
wait until that person frees up and have him do it, instead of trying to
hack it myself.

In any case, I'll add commentary material (probably just an edited
version of the explanatory paragraph above) into irq.c so at least it's
clear what's going on.

> +void tile_free_irq(int index)
> +[...]
>
> That code lacks any kind of protection and serialization.
>

Interesting point. As it happens, these calls are all made during boot,
so they are serialized that way. But in principle we could use the xgbe
driver as a module, at least, so you're right; I'll add a spinlock.

> [...]
>
> You check desc->handler, but you happily call the handler while
> dev_id might be still NULL. See above.
>

Assuming we spinlock the irq request/free routines, I think this is
safe, since the unlock memory fence will guarantee visibility of the
fields prior to any attempt to use them. We always allocate the
interrupt, then tell the hypervisor to start delivering them; on device
unload we tell the hypervisor to stop delivering interrupts, then free
it. The "tell the hypervisor" steps use on_each_cpu() and wait, so are
fully synchronous.

> +/*
> + * Generic, controller-independent functions:
> + */
> +
> +int show_interrupts(struct seq_file *p, void *v)
> +[...]
>
> So that prints which interrupts ? Now you refer to the generic code,
> while above you require that tile specific one. -ENOSENSE.
>

Yes, this is confusing. This routine is required by procfs, and it
shows just the PCI interrupts, not the tile irqs. I'll add a comment,
and try to segregate the file into "generic irqs" and "tile irqs" more
obviously, for now. The routine itself will be more sensible once we
integrate our tile_irqs into the generic system.

> +/* How many cycles per second we are running at. */
> +static cycles_t cycles_per_sec __write_once;
> +static u32 cyc2ns_mult __write_once;
> +#define cyc2ns_shift 30
>
> Please do not use fixed shift values. Use the generic functions to
> calculate the optimal shift/mult pairs instead.
>

Thanks; I wasn't aware of these. I'll switch the code over to use them,
and the other helper functions you pointed out.
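For reference, the mult/shift conversion being suggested works like this; a userspace sketch with the multiplier computed directly rather than by the kernel's clocks_calc_mult_shift() (which also picks an optimal shift):

```c
#include <assert.h>
#include <stdint.h>

/* Cycles -> nanoseconds using a precomputed mult/shift pair:
 * ns = (cycles * mult) >> shift. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Compute mult for a given shift and clock frequency so that
 * (cyc * mult) >> shift approximates cyc * 1e9 / freq_hz. */
static uint32_t calc_mult(uint32_t shift, uint64_t freq_hz)
{
	return (uint32_t)(((uint64_t)1000000000 << shift) / freq_hz);
}
```

With a fixed shift of 30, high clock frequencies can make mult lose precision or overflow, which is why the generic helper chooses the shift too.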

> +#if CHIP_HAS_SPLIT_CYCLE()
>
> That should be a CONFIG_TILE_HAS_SPLIT_CYCLE and not a function like
> macro define somewhere in a header file.
>

This is not a configurable option. The <arch/chip.h> header (which is
not a Linux header per se, but one of our "core architecture" headers
that can be used in any programming context) provides a set of
CHIP_xxx() macros. We use a functional macro style because we saw too
many instances of "#ifdef CHIP_FOO_MISSPELLED" where the misspelling
wasn't noticed until much later.
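A small illustration of the safety property described (the macro and its value are made up): with #if CHIP_FOO(), a misspelled name is a hard preprocessor error, because an undefined function-like macro cannot be evaluated in an #if expression, whereas #ifdef CHIP_FOO_MISSPELLED silently evaluates as "not defined".

```c
#include <assert.h>

/* Feature macro in the functional style; 0 here is an assumed value. */
#define CHIP_HAS_SPLIT_CYCLE() 0

#if CHIP_HAS_SPLIT_CYCLE()
static const int has_split_cycle = 1;
#else
static const int has_split_cycle = 0;
#endif
/* #if CHIP_HAS_SPLIT_CYLCE() -- a typo like this would fail to
 * compile instead of silently taking the #else branch. */
```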

> +/*
> + * Provide support for effectively turning the timer interrupt on and
> + * off via the interrupt mask. Make sure not to unmask it while we are
> + * running the timer interrupt handler, to avoid recursive timer
> + * interrupts; these may be OK in some cases, but it's generally cleaner
> + * to reset the kernel stack before starting the next timer interrupt.
>
> Which would already be guaranteed by the generic interrupt code ....
> The clockevent callbacks are already called with interrupts
> disabled, so why all this magic ?
>

The code was written so that it would be robust in the face of the timer
interrupt-path code potentially enabling interrupts, since I couldn't
convince myself it didn't. I'll rip out all that code and replace it
with a couple of BUG() checks instead. Thanks, that's a nice cleanup.

And thanks again for the feedback. It's very helpful.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

liqin...@sunplusct.com

May 25, 2010, 11:30:01 PM
Arnd Bergmann <ar...@arndb.de> wrote on 2010-05-25 23:03:11:

>
> > - You renamed __NR__llseek to __NR_llseek, which of course seems pretty
> > reasonable, but libc expects to see the former (both glibc and uclibc).
> > Is it worth requiring non-standard libc code? I may just add
> > __NR__llseek as an alias in my unistd.h for now.
>
> That was probably just a mistake on my side. The only other
> architecture using the generic version so far is score, so
> maybe Chen Liqin can comment on how he dealt with this and
> if he depends on the definition now.
>

When we ported glibc to the score arch, we fixed the llseek.c file to
remove the underscore before llseek. Maybe it has compatibility problems,
but it works OK with score applications.

34c34
< return (loff_t) (INLINE_SYSCALL (_llseek, 5, fd, (off_t) (offset >> 32),
---
> return (loff_t) (INLINE_SYSCALL (llseek, 5, fd, (off_t) (offset >> 32),


--
liqin

Paul Mundt

May 26, 2010, 1:10:02 AM
On Mon, May 24, 2010 at 11:29:18AM -0400, Chris Metcalf wrote:
> On 5/23/2010 6:08 PM, Arnd Bergmann wrote:
> > The notable exception is pci, which should go to arch/tile/pci
> > but still be reviewed in the pci mailing list.
>
> So this is an interesting question. Currently the "device driver"
> support in the arch/tile/drivers directory is for devices which exist
> literally only as part of the Tilera silicon, i.e. they are not
> separable from the tile architecture itself. For example, the network
> driver is tied to the Tilera networking shim DMA engine on the chip.
> Does it really make sense to move this to a directory where it is more
> visible to other architectures? I can see that it might from the point
> of view of code bombings done to network drivers, for example.
> Similarly for our other drivers, which are tied to details of the
> hypervisor API, etc.

It also depends what precisely your goal with arch/tile/drivers is. In
the sh case I started out with an arch/sh/pci and then migrated to an
arch/sh/drivers/ model when we started having to support various bus
operations similar to PCI. Anything common or shared on the other hand
gets pushed in to drivers/sh/ directly.

These days there is also a drivers/platform/<arch> abstraction which
you could easily use for platform-specific drivers that aren't things
like CPU/board-specific bus operations/fixups.

> >> --- /dev/null
> >> +++ b/arch/tile/include/asm/addrspace.h
> >>
> > This file is not referenced anywhere. I'd suggest removing it
> > until you send code that actually uses it.
> >
>
> OK, I've removed it. I assumed that it was required by architectures,
> since it is used in various places in the kernel. I see four drivers
> that just include it unconditionally at the moment, though curiously,
> they don't seem to use any of the symbols it defines. And there are
> four architectures (avr32, m32r, mips, sh) that all provide this header
> at the moment, though there doesn't seem to be agreement as to what
> symbols it should define.
>

To give a bit of background on this..

All of these platforms provide this header for legacy reasons, and it's
not a road you want to go down. Its primary purpose was to provide
definitions for memory segments, and specifically wrappers for flipping
between them. For platforms that have 1:1 cached/uncached mappings for
lowmem in different segments, old drivers used to commonly toggle the
high bits of an address to determine whether access was cached or not.
These days any driver that has knowledge of memory segmentation is almost
certainly doing something wrong.
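
Concretely (using the classic MIPS KSEG layout purely for illustration): lowmem is visible both through the cached KSEG0 segment and the uncached KSEG1 segment, so old drivers flipped between cached and uncached access with plain bit arithmetic on the high address bits:

```c
#include <assert.h>
#include <stdint.h>

/* Classic MIPS32 segment bases: KSEG0 is cached, KSEG1 is uncached.
 * Both map the same physical lowmem, differing only in bits 29..31. */
#define KSEG0_BASE 0x80000000u
#define KSEG1_BASE 0xa0000000u

/* The sort of cached/uncached toggling old drivers did by hand. */
static inline uint32_t to_uncached(uint32_t kseg0_addr)
{
	return (kseg0_addr & 0x1fffffffu) | KSEG1_BASE;
}

static inline uint32_t to_cached(uint32_t kseg1_addr)
{
	return (kseg1_addr & 0x1fffffffu) | KSEG0_BASE;
}
```

Hiding this inside ioremap() means a driver asks for an uncached mapping instead of computing one itself.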

If you need to support this sort of thing, then you ideally want to hide
the segmentation information in your ioremap() implementation (you can
look at what arch/sh/include/asm/io.h does for its ioremap cases, and we
have a wide variety of corner cases outside of legacy segmentation).

These platforms have also traditionally had these segments bypass the MMU
completely, so while you don't take page faults for lowmem, you can't
reuse parts of the address space in untranslatable holes. Some
architectures (like sh) have dropped the segmentation entirely for
certain MMU modes which permits for things like setting up an uncached
mapping for kernel text without enabling drivers to game the system
without proper remapping.

Chris Metcalf

May 26, 2010, 9:50:02 AM
On 5/25/2010 10:44 PM, liqin...@sunplusct.com wrote:
> Arnd Bergmann <ar...@arndb.de> at 2010-05-25 23:03:11

>>> - You renamed __NR__llseek to __NR_llseek, which of course seems pretty
>>>
>>> reasonable, but libc expects to see the former (both glibc and uclibc).
>>>
>>> Is it worth requiring non-standard libc code? I may just add
>>> __NR__llseek as an alias in my unistd.h for now.
>>>
>> That was probably just a mistake on my side. The only other
>> architecture using the generic version so far is score, so
>> maybe Chen Liqin can comment on how he dealt with this and
>> if he depends on the definition now.
>>
> When we port glibc to score arch, we fixed the llseek.c file,
> remove the underscore before llseek. maybe it has compatible problems,
> but it work ok with score application.
>

This sounds like the right solution for the generic code too, but
presumably it would need some kind of "#if !defined(__NR_llseek) &&
defined(__NR__llseek)" hackery in the llseek.c common code in glibc.
Ulrich, does that seem like the right direction for you?
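
Spelled out, that guard amounts to just a couple of preprocessor lines (a sketch; the 140 below is the i386 _llseek number, used purely for illustration):

```c
#include <assert.h>

/* Simulate an asm/unistd.h that only provides the legacy
 * double-underscore name (140 is the i386 _llseek number). */
#define __NR__llseek 140

/* The guard described above: alias the modern name to the legacy
 * one when only the latter is defined, so llseek.c can use
 * __NR_llseek unconditionally. */
#if !defined(__NR_llseek) && defined(__NR__llseek)
#define __NR_llseek __NR__llseek
#endif
```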

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Chris Metcalf

May 26, 2010, 12:30:03 PM
On 5/25/2010 4:12 PM, Thomas Gleixner wrote:
> +unsigned long long sched_clock(void)
> +{
> + u64 cycles;
> + u32 cyc_hi, cyc_lo;
> +
> + if (unlikely(cyc2ns_mult == 0))
> + setup_clock();
>
> Please initialize stuff _before_ it is called the first time and not
> at some arbitrary point conditionally in a hotpath.
>

Looking more closely at this, the reason for this lazy initialization
was that sched_clock() can be called from lockdep_init(), which runs way
before any tasteful architecture-specific initialization can happen.
Perhaps the correct model is that during the early stages of boot, we
are happy to shift by zero, multiply by zero, and claim the time is zero :-)
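
To spell out the shift-by-zero/multiply-by-zero idea (mult/shift values below are illustrative; the real ones would come from the hypervisor-reported clock rate):

```c
#include <assert.h>
#include <stdint.h>

/* Both zero until clock setup runs, so an early sched_clock() call
 * reads as 0 ns instead of lazily initializing in the hotpath. */
static uint32_t cyc2ns_mult;
static uint32_t cyc2ns_shift;

static uint64_t sched_clock_sketch(uint64_t cycles)
{
	/* ns = cycles * mult >> shift; with mult == 0 this is just 0. */
	return (cycles * cyc2ns_mult) >> cyc2ns_shift;
}

/* Would be called from setup_arch(): e.g. a 1 GHz cycle counter is
 * 1 ns/cycle, expressed as mult == 1 << shift in fixed point. */
static void setup_clock_sketch(void)
{
	cyc2ns_shift = 10;
	cyc2ns_mult = 1u << 10;
}
```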

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann

May 26, 2010, 1:20:01 PM
On Wednesday 26 May 2010 18:22:33 Chris Metcalf wrote:
> On 5/25/2010 4:12 PM, Thomas Gleixner wrote:
> > +unsigned long long sched_clock(void)
> > +{
> > + u64 cycles;
> > + u32 cyc_hi, cyc_lo;
> > +
> > + if (unlikely(cyc2ns_mult == 0))
> > + setup_clock();
> >
> > Please initialize stuff before it is called the first time and not

> > at some arbitrary point conditionally in a hotpath.
> >
>
> Looking more closely at this, the reason for this lazy initialization
> was that sched_clock() can be called from lockdep_init(), which runs way
> before any tasteful architecture-specific initialization can happen.
> Perhaps the correct model is that during the early stages of boot, we
> are happy to shift by zero, multiply by zero, and claim the time is zero :-)

Yes, that is what other architectures do. The time remains zero during
early boot. setup_arch is probably a good place to start the clock.

Arnd

Chris Metcalf

May 26, 2010, 7:10:01 PM
On 5/25/2010 11:21 AM, Arnd Bergmann wrote (in private email):
> I just realized that the the sys_call_table is not yet
> being generated automatically. The code is only present
> in arch/score/kernel/sys_call_table.c.
>
> To do this correctly, you should take that file and
> put it into kernel/sys_call_table.c, configured with
> CONFIG_GENERIC_SYSCALLTABLE, which you then enable
> in your arch/tile/Kconfig.
> The unistd.h is also missing the compat syscall table
> entries. It would be good to extend the macros to cover
> that as well, similar to how it's done in
> arch/powerpc/include/asm/systbl.h.
>

The hard part to applying this approach turned out to be the COMPAT code
for our 64-bit platform. The approach I am using now is to extend
<linux/compat.h> with all the compat syscalls that are not currently
prototyped, and then to include a set of #defines that allow all the
compat syscalls to be invoked as "compat_sys_foo()", e.g.

+/* Standard Linux functions that don't have "compat" versions. */
+#define compat_sys_accept sys_accept
+#define compat_sys_accept4 sys_accept4
+#define compat_sys_access sys_access
+#define compat_sys_acct sys_acct
[...]
+#define compat_sys_uselib sys_uselib
+#define compat_sys_vfork sys_vfork
+#define compat_sys_vhangup sys_vhangup
+#define compat_sys_write sys_write


With that in place, you can then use the "arch/score" mechanism to
generate not just the main syscall table, but the compat table as well,
by doing something like this:

+#undef __SYSCALL
+#define __SYSCALL(nr, call) [nr] = (compat_##call),
+
+void *compat_sys_call_table[__NR_syscalls] = {
+ [0 ... __NR_syscalls-1] = sys_ni_syscall,
+#include <asm/unistd.h>
+};


To make this really work out, I also had to add a __SYSCALL_COMPAT
notion to <asm-generic/unistd.h>; when this is set, the __NR_xxx values
and the __SYSCALL stuff are set up as if for a 32-bit platform, even if
the real platform is 64-bit, so that the header can be used to create
the compat_sys_call_table[] properly.

I fixed a few other minor glitches too, like the fact that we need
sys_fadvise64_64 to be the "primary" syscall even in the 64-bit case
(not sys_fadvise64), and adding an __ARCH_WANT_SYNC_FILE_RANGE2 flag to
the generic ABI so platforms can request the use of that flavor of the
ABI instead. (It makes a difference on our platform.) And I took
Arnd's suggestion and added 16 architecture-specific syscalls from 244
to 259.

Note that it turns out not to be quite right to make the
sys_call_table.c a generic file, at least in our case, since you really
want to allow tweaking the actual syscall functions as part of
generating the sys_call_table[] array. For example, on our 32-bit
platforms some of the 64-bit syscalls need wrappers since otherwise
there is a mismatch between the generic code in libc that splits 64-bit
values into 32-bit registers, and the actual registers pairs used by our
ABI for native 64-bit values. In any case it's only half a dozen lines
of common code. And in compat mode there are additional overrides you
want, such as using sys_newstat() for compat_sys_stat64(), if your
architecture will tolerate it, etc.

I'll send a complete patch later once I've finished digesting all the
various suggestions folks have sent, but this was a big enough piece
that I thought I'd at least summarize the design back to LKML in case
people would care to comment.

Chris Metcalf

May 26, 2010, 9:00:01 PM
On 5/25/2010 5:45 PM, Arnd Bergmann wrote:
> Here comes the rest of my review, covering the arch/tile/kernel/ directory.
> There isn't much to comment on in arch/tile/mm and arch/tile/lib from my
> side, and I still ignored the drivers and oprofile directories.
>

Thanks, that's great. The drivers and oprofile stuff will not be part
of the submission we will make this week anyway, so I think that's OK.

>> diff --git a/arch/tile/kernel/backtrace.c b/arch/tile/kernel/backtrace.c
>> [...]


> this
> file looks rather complicated compared to what the other architectures
> do.
>
> Is this really necessary because of some property of the architecture
> or do you implement other functionality that is not present on existing
> archs?
>

The functionality we implement is to support backtrace of arbitrary
code, as long as it follows a pretty minimalist ABI. This includes
pretty much arbitrarily-optimized code, as well as, of course, code with
no dwarf debug info available. As a result the backtracer is slightly
more complicated, but only for the initial leaf function; after that
it's easy to chain through the call frames.

> Yes, that makes sense. You definitely want binary compatibility between
> 32 bit binaries from a native 32 bit system on TILE-Gx in the syscall
> interface.
>

The thing is, the COMPAT layer on TILE-Gx is actually not providing
TILEPro compatibility, since the architectures are too different --
conceptually similar but with different opcode numbering, etc. Instead
what it's doing is providing a 32-bit pointer ABI, to help porting
crufty old code (this is in fact the primary customer driver), or to
allow more compact representations of pointer-heavy data.

> compat_sys_sendfile will not be needed with the asm-generic/unistd.h definitions,
> but I think you will still need a compat_sys_sendfile64, to which the same
> applies as to compat_sys_sched_rr_get_interval.
>

I think it's the other way around: compat_sys_sendfile64() is just
sys_sendfile64(), but compat_sys_sendfile() needs to exist since it has
to write a 32-bit pointer back to userspace.

>> +static int hardwall_ioctl(struct inode *inode, struct file *file,
>> + unsigned int a, unsigned long b)
>> +{

>> [...]


>>
> The hardwall stuff looks like it is quite central to your architecture.
> Could you elaborate on what it does?
>

It's not "central" but it is an important enabler for access to our
"user network". This is a wormhole-routed mesh network (the UDN, or
user dynamic network) that connects all the cpus. If a task affinitizes
itself to a single cpu (to avoid migration) and opens /dev/hardwall and
does an ioctl on it, it can associate the particular /dev/hardwall file
object with some non-overlapping subrectangle of the whole 8x8 chip (our
cpus are laid out as "tiles" in an 8x8 configuration). It can then do
an "activate" ioctl to get access to that subrectangle of the UDN, from
that cpu. Other threads in that process (or anyone who can share that
file object one way or another, e.g. fork or sendmsg) can then also do
an "activate" ioctl on that file object and also get access, and they
can then exchange messages with very low latency (register file to
register file in a handful of cycles) and high bandwidth (32 bits/cycle
or about 3GB/sec).

The actual "hardwall" refers to the fact that cpus on the periphery of
the allocated subrectangle of cpus set up the router so that they will
get an interrupt if some cpu tries to send a message that would
terminate outside the set of allocated cpus. Doing it this way means
several unrelated tasks could have separate message-passing arenas
(spatially dividing the chip) and whenever the last task holding a
reference to a hardwall file object exits, the OS can drain any messages
from the UDN and deallocate the subrectangle in question.
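
To make the flow concrete, here's a sketch of the user side; the device name is as above, but the ioctl numbers, struct layout, and helper are invented placeholders for illustration, not our real ABI:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Everything below is illustrative: the device name comes from the
 * discussion, but the ioctl numbers and struct layout are made up. */
struct hw_rect { int x, y, width, height; };            /* assumed */
#define HARDWALL_CREATE   _IOW('H', 1, struct hw_rect)  /* assumed */
#define HARDWALL_ACTIVATE _IO('H', 2)                   /* assumed */

/* Claim a subrectangle of the tile grid, then activate this thread's
 * access to the UDN through the resulting "rights" fd. */
static int hardwall_join(const struct hw_rect *r)
{
	int fd = open("/dev/hardwall", O_RDWR);
	if (fd < 0)
		return -1;
	if (ioctl(fd, HARDWALL_CREATE, r) < 0 ||
	    ioctl(fd, HARDWALL_ACTIVATE) < 0) {
		close(fd);
		return -1;
	}
	return fd;  /* shareable via fork() or sendmsg(), as above */
}
```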

> If it is as essential as it looks, I'd vote for promoting the interface
> from an ioctl based one to four real system calls (more if necessary).
>

The notion of using a file descriptor as the "rights" object is pretty
central, so I think a character device will work out well.

> Note that the procfs file format is part of your ABI, and this looks
> relatively hard to parse, which may introduce bugs.
> For per-process information, it would be better to have a simpler
> file in each /proc/<pid>/directory. Would that work for you?
>

Well, the hardwalls aren't exactly per-process anyway, and we don't in
practice use the ASCII output for anything much, so it may not matter
that they're not too parseable. I may just look into making them more
parsable when I convert it to a /dev interface and leave it at that.

I'm planning to defer this in any case, since the UDN interface, though
a nice-to-have, obviously isn't needed to run any standard C code. I'll
make that part of a follow-up patch.

> Note that we're about to remove the .ioctl file operation and
> replace it with .unlocked_ioctl everywhere.
>

OK, for now I'll ensure that we are locking everything internally
correctly. I believe we are already anyway.

> [hugevmap] Not used anywhere apparently. Can you explain what this is good for?


> Maybe leave it out for now, until you merge the code that needs it.
> I don't see anything obviously wrong with the implementation though.
>

I'll omit it; we haven't used it yet. The intent was to provide
guaranteed huge pages for TLB purposes to kernel drivers. Currently we
just start with huge pages where possible, and fragment them if necessary.

>> +++ b/arch/tile/kernel/hv_drivers.c
>>
> Please have a look at drivers/char/hvc_{rtas,beat,vio,iseries}.c
> to see how we do the same for other hypervisors, in a much simpler
> way.
>

Great, thanks for the pointer.

>> diff --git a/arch/tile/kernel/memprof.c b/arch/tile/kernel/memprof.c
>> new file mode 100644
>> index 0000000..9424cc5
>> --- /dev/null
>> +++ b/arch/tile/kernel/memprof.c
>>
> I suppose this could get dropped in favor of perf events?
>

I don't know enough about perf events to be sure, but I don't think so;
the memprof device is intended to provide a stream of information on
things like memory latency and bandwidth. But perhaps it could be wired
into perf events. I'll probably move this to "drivers", and in any case
omit it entirely from the first patch.

> +EXPORT_SYMBOL(inb);
>
> If you just remove these definitions, you get a link error for any
> driver that tries to use these, which is probably more helpful than
> the panic.
>
> OTOH, are you sure that you can't just map the PIO calls to mmio functions
> like readb plus some fixed offset? On most non-x86 architectures, the PIO
> area of the PCI bus is just mapped to a memory range somewhere.
>

I'll try to remove them and see if anything falls over. We don't have
any memory-mapped addresses in the 32-bit architecture, though that
changes with the 64-bit architecture, which introduces IO mappings. For
PCI we actually have to do a hypervisor transaction for reads or writes.

>> +/*
>> + * Support /proc/PID/pgtable
>> + */
>>
> Do you have applications relying on this? While I can see
> how this may be useful, I don't think we should have a
> generic interface like this in architecture specific
> code.
>
> It also may be used as an attack vector for malicious applications
> that have a way of accessing parts of physical memory.
>
> I think it would be better to drop this interface for now.
>

We do find it useful internally, mostly because it shows you what
homecaching is actually in effect for pages in an application. But we
don't rely on it, and it is (to be generous) only semi-tastefully hooked
into the generic code, and the hooks are not present in the code we're
currently trying to return to the community. So I'll remove it for now.

>> +/* Simple /proc/tile files. */
>> +SIMPLE_PROC_ENTRY(grid, "%u\t%u\n", smp_width, smp_height)
>> +
>> +/* More complex /proc/tile files. */
>> +static void proc_tile_seq_strconf(struct seq_file *sf, char* what,
>> + uint32_t query)
>>
> All of these look like they should be files in various places in
> sysfs, e.g. in /sys/devices/system/cpu or /sys/firmware/.
> Procfs is not necessarily evil, but most of your uses are for
> stuff that actually first very well into what we have in sysfs.
>

Interesting possibility. I'll look into it.

>> +SEQ_PROC_ENTRY(interrupts)
>> +static int proc_tile_interrupts_show(struct seq_file *sf, void *v)
>> +{

>> [...]


>>
> Can you merge this with /proc/interrupts?
>

It turns out /proc/interrupts is formatted the wrong way round if you
have 64 processors :-) You want one row per cpu, not one column per cpu!

Also, there are things listed that are not strictly IRQs in the normal
sense (things like TLB flushes and syscalls) which are still good for
assessing where latency glitches might be coming from on a particular cpu.

In any case, this will likely be removed for the first round of
submission, along with all the other /proc stuff.

>> +#ifdef CONFIG_FEEDBACK_COLLECT
>> +[...]


> This probably belongs into debugfs, similar to what we do
> for gcov.
>
> How much of the feedbackl stuff is generic? It might be good
> to put those bits in a common place like kernel/feedback.c
> so that other architectures can implement this as well.
>

Hmm, interesting. The feedback stuff is somewhat generic, at least the
link-ordering piece; it relies on some separate userspace code that
computes cache-conflict information and then lays out all the functions
in a good order based on who calls whom. But I'll be removing it for
now and then re-introducing it later as a separate patch anyway.

>> + .procname = "crashinfo",
>> + .data = &show_crashinfo,
>> + .maxlen = sizeof(int),
>> + .mode = 0644,
>> + .proc_handler = &proc_dointvec
>> + },
>> + {}
>> +};
>>
> How is this different from the existing
> exception-trace/userprocess_debug sysctl?
> If it is very similar, let's not introduce yet another
> name for it but just use the common userprocess_debug.
>

I had made a note of doing this earlier when I was porting our code up
to 2.6.34. For now I'm going to remove the tile-specific thing, and
then later look at using the exception-trace hook. I think they're
pretty similar.

> This seems to be read-only and coming from a kernel command
> line option, so I guess looking at /proc/cmdline would
> be a reasonable alternative.
>

I always find that kind of painful, since you have to parse it exactly
as the kernel would to be sure you're getting it right; strstr() is only
a 99% solution.

> I believe the new way to do this is to implement
> CONFIG_HAVE_ARCH_TRACEHOOK and get all these for free.
>

I'll check that out.

>> +SYSCALL_DEFINE3(raise_fpe, int, code, unsigned long, addr,
>> + struct pt_regs *, regs)
>>
> Does this need to be a system call? I thought we already had
> other architectures without floating point exceptions in hardware
> that don't need this.
>

Hmm, I didn't know about that. Any information would be appreciated. I
guess you could synthesize something that looked like a signal purely in
user-space? But how would debuggers trap it? I'm not sure how it would
work without a system call.

>> diff --git a/arch/tile/kernel/sys.c b/arch/tile/kernel/sys.c
>> [...]


>> +ssize_t sys32_readahead(int fd, u32 offset_lo, u32 offset_hi, u32 count)
>> +{
>> + return sys_readahead(fd, ((loff_t)offset_hi << 32) | offset_lo, count);
>> +}
>> +
>>
>>

> These seem to belong with the other similar functions in compat.c
>

Except they're also used by the 32-bit platform where there is no compat
mode (the compat code DOES use them too, it's true).

> Just use the sys_mmap_pgoff system call directly, rather than
> defining your own wrappers. Since that syscall is newer than
> asm-generic/unistd.h, that file might need some changes,
> together with fixes to arch/score to make sure we don't break
> its ABI.
>

It should be OK. Their sys_mmap2() just tail-calls sys_mmap_pgoff()
anyway, so it should be possible to switch mmap2 in asm-generic/unistd.h
to be mmap_pgoff instead. We'll need some user-space changes (our mmap2
is defined in 4KB units) but that's not hard.

> It seems that this file fits in the same category as the
> backtrace code. Maybe move both away from arch/tile/kernel into a
> different directory?
>

I'll think about it. These are both basically disassembly-related, so
maybe an arch/tile/disasm directory with the tile_desc stuff and the
backtracer? I'm not sure it's really worth moving out of
arch/tile/kernel though.

> Have you tried to use the generic lib/checksum.c implementation?

That sounds good. We only touched do_csum(), which already has an
"#ifndef do_csum" in the generic code.

Thanks for all this!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann

May 27, 2010, 4:50:01 AM
On Thursday 27 May 2010, Chris Metcalf wrote:
> > Yes, that makes sense. You definitely want binary compatibility between
> > 32 bit binaries from a native 32 bit system on TILE-Gx in the syscall
> > interface.
> >
>
> The thing is, the COMPAT layer on TILE-Gx is actually not providing
> TILEPro compatibility, since the architectures are too different --
> conceptually similar but with different opcode numbering, etc. Instead
> what it's doing is providing a 32-bit pointer ABI, to help porting
> crufty old code (this is in fact the primary customer driver), or to
> allow more compact representations of pointer-heavy data.

Ah, interesting. I don't think any architecture does it this way
so far. IIRC, while alpha had some applications built in 32 bit
mode in the early days, those were just using the 64 bit system
calls directly.

Then again, that probably required some rather ugly hacks to get
the libc working, so since we have the compat code in kernel now,
your approach is probably much better.

Are you able to build 32 bit kernels for TILE-Gx as well? It's
probably something you never really want to do for performance
reasons, but I guess you could use that to verify that the
ABI is really compatible.

> > compat_sys_sendfile will not be needed with the asm-generic/unistd.h definitions,
> > but I think you will still need a compat_sys_sendfile64, to which the same
> > applies as to compat_sys_sched_rr_get_interval.
> >
>
> I think it's the other way around: compat_sys_sendfile64() is just
> sys_sendfile64(), but compat_sys_sendfile() needs to exist since it has
> to write a 32-bit pointer back to userspace.

Ah. I guess you're right about compat_sys_sendfile64 not being needed.
Funny enough, parisc, powerpc, s390 and sparc all define it anyway, so
it didn't occur to me that they don't actually need to.

What I meant about compat_sys_sendfile is that you only define it if
the 32 bit ABI contains a reference to sys_sendfile in the first
place. With asm-generic/unistd.h, 32 bit always uses the sys_sendfile64
kernel interface, while for 64 bit the two are identical, so we take
the regular sys_sendfile.

ok, I see. Now you could easily do this with system calls as well:
Instead of the initial ioctl that associates the file descriptor
with a rectangle, you can have a syscall that creates a rectangle
and a file descriptor (using anon_inode_getfd) associated with it,
and returns the fd to user space. This is similar to what we
do for other system call interfaces that operate on their own fds.
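
As a kernel-side sketch of that pattern (the syscall name, struct, and helpers are hypothetical; anon_inode_getfd() itself is the real helper from fs/anon_inodes.c):

```c
/* Hypothetical syscall illustrating the anon-inode fd pattern. */
SYSCALL_DEFINE1(hardwall_create, struct hardwall_rect __user *, urect)
{
	struct hardwall_info *info;
	int fd;

	info = hardwall_alloc(urect);          /* hypothetical helper */
	if (IS_ERR(info))
		return PTR_ERR(info);

	/* Wrap the rectangle in an fd; no filesystem node needed. */
	fd = anon_inode_getfd("[hardwall]", &hardwall_fops, info,
			      O_RDWR | O_CLOEXEC);
	if (fd < 0)
		hardwall_free(info);           /* hypothetical helper */
	return fd;
}
```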

Another alternative might be to combine this with cpusets subsystem,
which has a related functionality. I guess that would be the
preferred way if you expect tile-gx to take over the world and
have lots of applications written to it.
For a niche product, the syscall or ioctl approach does seem
simple enough, and it does not require other users of cpusets
to learn about requirements of your rectangles.

> > Note that the procfs file format is part of your ABI, and this looks
> > relatively hard to parse, which may introduce bugs.
> > For per-process information, it would be better to have a simpler
> > file in each /proc/<pid>/directory. Would that work for you?
> >
>
> Well, the hardwalls aren't exactly per-process anyway, and we don't in
> practice use the ASCII output for anything much, so it may not matter
> that they're not too parseable. I may just look into making them more
> parsable when I convert it to a /dev interface and leave it at that.

On a chardev, a binary interface seems more appropriate than
> a text based one anyway, so you could add another ioctl for this.

> I'm planning to defer this in any case, since the UDN interface, though
> a nice-to-have, obviously isn't needed to run any standard C code. I'll
> make that part of a follow-up patch.

ok

> > Note that we're about to remove the .ioctl file operation and
> > replace it with .unlocked_ioctl everywhere.
> >
>
> OK, for now I'll ensure that we are locking everything internally
> correctly. I believe we are already anyway.

ok. Then please just use .unlocked_ioctl in new drivers.

> > [hugevmap] Not used anywhere apparently. Can you explain what this is good for?
> > Maybe leave it out for now, until you merge the code that needs it.
> > I don't see anything obviously wrong with the implementation though.
> >
>
> I'll omit it; we haven't used it yet. The intent was to provide
> guaranteed huge pages for TLB purposes to kernel drivers. Currently we
> just start with huge pages where possible, and fragment them if necessary.

Ok. Do you use huge pages for backing the linear kernel mapping?
Normally device drivers get huge pages for free in kmalloc and
get_free_pages because all the memory is mapped using the largest
page size anyway.

> > +EXPORT_SYMBOL(inb);
> >
> > If you just remove these definitions, you get a link error for any
> > driver that tries to use these, which is probably more helpful than
> > the panic.
> >
> > OTOH, are you sure that you can't just map the PIO calls to mmio functions
> > like readb plus some fixed offset? On most non-x86 architectures, the PIO
> > area of the PCI bus is just mapped to a memory range somewhere.
> >
>
> I'll try to remove them and see if anything falls over. We don't have
> any memory-mapped addresses in the 32-bit architecture, though that
> changes with the 64-bit architecture, which introduces IO mappings. For
> PCI we actually have to do a hypervisor transaction for reads or writes.

Ok, then I assume that PIO would also be a hypervisor call, right?
If you don't have MMIO on 32 bit, you might want to not define either
PIO (inb, ...) no MMIO (readb, ...) calls there and disable
CONFIG_HAVE_MMIO in Kconfig.



> >> +SEQ_PROC_ENTRY(interrupts)
> >> +static int proc_tile_interrupts_show(struct seq_file *sf, void *v)
> >> +{
> >> [...]
> >>
> > Can you merge this with /proc/interrupts?
> >
>
> It turns out /proc/interrupts is formatted the wrong way round if you
> have 64 processors :-) You want one row per cpu, not one column per cpu!

Yes, interesting observation. I'm sure the Altix folks are suffering from
this a lot.



> Also, there are things listed that are not strictly IRQs in the normal
> sense (things like TLB flushes and syscalls) which are still good for
> assessing where latency glitches might be coming from on a particular cpu.

That's fine, just look at what a current x86 kernel gives you (slightly
cut):
           CPU0       CPU1
  0:   18764948     504980   IO-APIC-edge      timer
  1:     228456       2572   IO-APIC-edge      i8042
  9:    2632595      79544   IO-APIC-fasteoi   acpi
 12:    1094468      43409   IO-APIC-edge      i8042
 16:      82761       1455   IO-APIC-fasteoi   uhci_hcd:usb6, yenta, heci
 28:     908865      85857   PCI-MSI-edge      ahci
 29:       6421      11595   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:    1987682    9057144   Local timer interrupts
SPU:          0          0   Spurious interrupts
CNT:          0          0   Performance counter interrupts
PND:          0          0   Performance pending work
RES:    3598785    3903513   Rescheduling interrupts
CAL:       8848       5944   Function call interrupts
TLB:      31467      18283   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        354        346   Machine check polls
ERR:          0
MIS:          0

Lots of things in there that fit your category.

> > This seems to be read-only and coming from a kernel command
> > line option, so I guess looking at /proc/cmdline would
> > be a reasonable alternative.
> >
>
> I always find that kind of painful, since you have to parse it exactly
> as the kernel would to be sure you're getting it right; strstr() is only
> a 99% solution.

How about making it a module_param then? You can still see it
in /sys/modules/*/parameters then, even if the code is builtin,
but it won't be in the sysctl name space any more.



> >> +SYSCALL_DEFINE3(raise_fpe, int, code, unsigned long, addr,
> >> + struct pt_regs *, regs)
> >>
> > Does this need to be a system call? I thought we already had
> > other architectures without floating point exceptions in hardware
> > that don't need this.
> >
>
> Hmm, I didn't know about that. Any information would be appreciated. I
> guess you could synthesize something that looked like a signal purely in
> user-space? But how would debuggers trap it? I'm not sure how it would
> work without a system call.

I think the C99 standard allows you to not implement SIGFPE at all but
instead rely on applications doing fetestexcept() etc.

> >> diff --git a/arch/tile/kernel/sys.c b/arch/tile/kernel/sys.c
> >> [...]
> >> +ssize_t sys32_readahead(int fd, u32 offset_lo, u32 offset_hi, u32 count)
> >> +{
> >> + return sys_readahead(fd, ((loff_t)offset_hi << 32) | offset_lo, count);
> >> +}
> >> +
> >>
> >>
> > These seem to belong with the other similar functions in compat.c
> >
>
> Except they're also used by the 32-bit platform where there is no compat
> mode (the compat code DOES use them too, it's true).

I see. AFAICT, all other architectures don't need the wrapper in
the 32 bit native case because they define the syscall calling
conventions in libc such that they match what the kernel
expects for a 64 bit argument (typically split in two subsequent
argument slots). Would that work for you as well?

> > Just use the sys_mmap_pgoff system call directly, rather than
> > defining your own wrappers. Since that syscall is newer than
> > asm-generic/unistd.h, that file might need some changes,
> > together with fixes to arch/score to make sure we don't break
> > its ABI.
> >
>
> It should be OK. Their sys_mmap2() just tail-calls sys_mmap_pgoff()
> anyway, so it should be possible to switch mmap2 in asm-generic/unistd.h
> to be mmap_pgoff instead. We'll need some user-space changes (our mmap2
> is defined in 4KB units) but that's not hard.

Hmm, I forgot about the page size. Actually the definition of sys_mmap2
is to use 4KB units on all architectures except ia64, independent
of the real page size. Maybe it's better to keep using sys_mmap/sys_mmap2
after all but then use only one of the two (sys_mmap on 64 bit, sys_mmap2
on 32 bit and compat).

Either way should work though.
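
For illustration, the fixed-4KB-unit convention on a larger-page kernel is just an alignment check plus a shift before calling sys_mmap_pgoff(); the PAGE_SHIFT of 16 below models a hypothetical 64KB-page configuration:

```c
#include <assert.h>

/* mmap2 offsets are in fixed 4 KB units regardless of the kernel
 * page size; convert to page units, rejecting offsets that are not
 * page-aligned. */
#define PAGE_SHIFT  16  /* models a 64 KB-page kernel */
#define MMAP2_SHIFT 12  /* mmap2 unit is 1 << 12 = 4 KB */

/* Returns the pgoff for sys_mmap_pgoff(), or -1 for a bad offset. */
static long mmap2_units_to_pgoff(unsigned long off_4k)
{
	if (off_4k & ((1UL << (PAGE_SHIFT - MMAP2_SHIFT)) - 1))
		return -1;  /* not a multiple of the page size */
	return (long)(off_4k >> (PAGE_SHIFT - MMAP2_SHIFT));
}
```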

> > It seems that this file fits in the same category as the
> > backtrace code. Maybe move both away from arch/tile/kernel into a
> > different directory?
> >
>
> I'll think about it. These are both basically disassembly-related, so
> maybe an arch/tile/disasm directory with the tile_desc stuff and the
> backtracer? I'm not sure it's really worth moving out of
> arch/tile/kernel though.

Ok. If you leave them in the directory, just split them out into a separate
patch on your next submission then.

Arnd

Chris Metcalf

May 27, 2010, 9:40:02 AM
On 5/27/2010 4:41 AM, Arnd Bergmann wrote:
> On Thursday 27 May 2010, Chris Metcalf wrote:
>
>> The thing is, the COMPAT layer on TILE-Gx is actually not providing
>> TILEPro compatibility, since the architectures are too different --
>> conceptually similar but with different opcode numbering, etc. Instead
>> what it's doing is providing a 32-bit pointer ABI, to help porting
>> crufty old code (this is in fact the primary customer driver), or to
>> allow more compact representations of pointer-heavy data.
>>
> [...]

> Are you able to build 32 bit kernels for TILE-Gx as well? It's
> probably something you never really want to do for performance
> reasons, but I guess you could use that to verify that the
> ABI is really compatible.
>

No, we haven't tried to do this. I suppose it would be possible to port
the TILE-Gx kernel to use -m32 mode and HIGHMEM, but I think it would
just uglify the code. :-)


> What I meant about compat_sys_sendfile is that you only define it if
> the 32 bit ABI contains a reference to sys_sendfile in the first
> place. With asm-generic/unistd.h, 32 bit uses always uses the sys_sendfile64
> kernel interface, while for 64 bit the two are identical, so we take
> the regular sys_sendfile.
>

Right, true enough. I'm still building internally with
__ARCH_WANT_SYSCALL_OFF_T, so some extra compat functions are still
needed for linking the kernel. I'll try to remember to unifdef them out
of the code I submit back to the community.

>> The notion of using a file descriptor as the "rights" object is pretty
>> central, so I think a character device will work out well.
>>
> ok, I see. Now you could easily do this with system calls as well:
> Instead of the initial ioctl that associates the file descriptor
> with a rectangle, you can have a syscall that creates a rectangle
> and a file descriptor (using anon_inode_getfd) associated with it,
> and returns the fd to user space. This is similar to what we
> do for other system call interfaces that operate on their own fds.
>

Yes, good point. I'll be holding back this code from the initial patch,
so I can think about it some more. I'm still predisposed to avoid
adding system calls in general, though.

> On a chardev, a binary interface seems more appropriate than
> a text based one anyway, so you could add another ioctl for this.
> Ok. Then please just use .unlocked_ioctl in new drivers.
>

OK, I bombed all our existing drivers to use .unlocked_ioctl. It's
convenient that unlocked_ioctl now has the same signature as compat_ioctl.

> Ok. Do you use huge pages for backing the linear kernel mapping?
> Normally device drivers get huge pages for free in kmalloc and
> get_free_pages because all the memory is mapped using the largest
> page size anyway.
>

We do now. At the time we (semi-speculatively) wrote the hugevmap code,
we didn't. I won't return this code to the community until we actually
use it, in any case.

>>> +EXPORT_SYMBOL(inb);
>>>
>>> If you just remove these definitions, you get a link error for any
>>> driver that tries to use these, which is probably more helpful than
>>> the panic.
>>>
>>> OTOH, are you sure that you can't just map the PIO calls to mmio functions
>>> like readb plus some fixed offset? On most non-x86 architectures, the PIO
>>> area of the PCI bus is just mapped to a memory range somewhere.
>>>
>>>
>> I'll try to remove them and see if anything falls over. We don't have
>> any memory-mapped addresses in the 32-bit architecture, though that
>> changes with the 64-bit architecture, which introduces IO mappings. For
>> PCI we actually have to do a hypervisor transaction for reads or writes.
>>
> Ok, then I assume that PIO would also be a hypervisor call, right?
> If you don't have MMIO on 32 bit, you might want to not define either
> PIO (inb, ...) or MMIO (readb, ...) calls there and disable
> CONFIG_HAVE_MMIO in Kconfig.
>

We don't define CONFIG_HAVE_MMIO, but drivers certainly seem to use
ioread/iowrite methods as well as inb/outb without guarding them with
any particular tests, so we have to provide definitions of some kind for
all of them. I'll confer with our PCI developer to see if we can clean
up the set of definitions in io.h.
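Arnd's suggestion of mapping PIO onto MMIO plus a fixed offset can be sketched in a few lines. This is a userspace model only: `pio_window` stands in for an ioremap()ed PCI I/O window, and the helper names are hypothetical:

```c
#include <stdint.h>

/* On most non-x86 architectures the PCI I/O port space is just a memory
 * window, so inb()/outb() can be implemented as byte loads/stores at a
 * fixed base.  Modeled here with a plain array instead of a real
 * ioremap()ed window. */
static uint8_t pio_window[0x10000];     /* 64KB legacy I/O port space */

static uint8_t model_inb(uint16_t port)
{
        return pio_window[port];        /* readb(PIO_BASE + port) in a kernel */
}

static void model_outb(uint8_t val, uint16_t port)
{
        pio_window[port] = val;         /* writeb(val, PIO_BASE + port) */
}
```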

>>> Does this need to be a system call? I thought we already had
>>> other architectures without floating point exceptions in hardware
>>> that don't need this.
>>>
>>>
>> Hmm, I didn't know about that. Any information would be appreciated. I
>> guess you could synthesize something that looked like a signal purely in
>> user-space? But how would debuggers trap it? I'm not sure how it would
>> work without a system call.
>>
> I think the C99 standard allows you to not implement SIGFPE at all but
> instead rely on applications doing fetestexcept() etc.
>

We use this not for the floating-point operations, but for integer
divide-by-zero. In principle we could use it for floating-point too,
but we currently don't, since generally folks don't expect it there.

>>>> diff --git a/arch/tile/kernel/sys.c b/arch/tile/kernel/sys.c
>>>> [...]
>>>> +ssize_t sys32_readahead(int fd, u32 offset_lo, u32 offset_hi, u32 count)
>>>> +{
>>>> + return sys_readahead(fd, ((loff_t)offset_hi << 32) | offset_lo, count);
>>>> +}
>>>> +
>>>>
>>>>
>>> These seem to belong with the other similar functions in compat.c
>>>
>>>
>> Except they're also used by the 32-bit platform where there is no compat
>> mode (the compat code DOES use them too, it's true).
>>
> I see. AFAICT, all other architectures don't need the wrapper in
> the 32 bit native case because they define the syscall calling
> conventions in libc such that they match what the kernel
> expects for a 64 bit argument (typically split in two subsequent
> argument slots). Would that work for you as well?
>

Yes, we could override this in libc. My assumption was that it was
cleaner to do it in the kernel, since we support uclibc and glibc, and
doing it in the kernel meant only doing it in one place.

>>> Just use the sys_mmap_pgoff system call directly, rather than
>>> defining your own wrappers. Since that syscall is newer than
>>> asm-generic/unistd.h, that file might need some changes,
>>> together with fixes to arch/score to make sure we don't break
>>> its ABI.
>>>
>>>
>> It should be OK. Their sys_mmap2() just tail-calls sys_mmap_pgoff()
>> anyway, so it should be possible to switch mmap2 in asm-generic/unistd.h
>> to be mmap_pgoff instead. We'll need some user-space changes (our mmap2
>> is defined in 4KB units) but that's not hard.
>>
> Hmm, I forgot about the page size. Actually the definition of sys_mmap2
> is to use 4KB units on all architectures except ia64, independent
> of the real page size. Maybe it's better to keep using sys_mmap/sys_mmap2
> after all but then use only one of the two (sys_mmap on 64 bit, sys_mmap2
> on 32 bit and compat).
>

I'll keep it as-is, then. Like the sendfile discussion above, we'll
need both for now, but I'll see if I can unifdef the unwanted ones out
for the community.

>> I'll think about it. These are both basically disassembly-related, so
>> maybe an arch/tile/disasm directory with the tile_desc stuff and the
>> backtracer? I'm not sure it's really worth moving out of
>> arch/tile/kernel though.
>>
> Ok. If you leave them in the directory, just split them out into a separate
> patch on your next submission then.
>

Does this imply separate git commits to our repository, if we want to do
things the Right Way? I always tend to try to commit things in such a
way that everything is always buildable between each commit, and I can't
easily pull out the disassembly-related files from the kernel. On the
other hand I can easily split up a single big git commit-patch into
multiple emails, but then of course it wouldn't apply easily to a "git
am". Guidance?? :-)

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Paul Mundt
May 27, 2010, 9:50:02 AM
On Thu, May 27, 2010 at 03:41:44PM +0200, Geert Uytterhoeven wrote:

> On Thu, May 27, 2010 at 15:30, Chris Metcalf <cmet...@tilera.com> wrote:
> > On 5/27/2010 4:41 AM, Arnd Bergmann wrote:
> >> On Thursday 27 May 2010, Chris Metcalf wrote:
> >>>> +EXPORT_SYMBOL(inb);
> >>>>
> >>>> If you just remove these definitions, you get a link error for any
> >>>> driver that tries to use these, which is probably more helpful than
> >>>> the panic.
> >>>>
> >>>> OTOH, are you sure that you can't just map the PIO calls to mmio functions
> >>>> like readb plus some fixed offset? On most non-x86 architectures, the PIO
> >>>> area of the PCI bus is just mapped to a memory range somewhere.
> >>>>
> >>>>
> >>> I'll try to remove them and see if anything falls over.  We don't have
> >>> any memory-mapped addresses in the 32-bit architecture, though that
> >>> changes with the 64-bit architecture, which introduces IO mappings.  For
> >>> PCI we actually have to do a hypervisor transaction for reads or writes.
> >>>
> >> Ok, then I assume that PIO would also be a hypervisor call, right?
> >> If you don't have MMIO on 32 bit, you might want to not define either
> >> PIO (inb, ...) or MMIO (readb, ...) calls there and disable
> >> CONFIG_HAVE_MMIO in Kconfig.
> >>
> >
> > We don't define CONFIG_HAVE_MMIO, but drivers certainly seem to use
> > ioread/iowrite methods as well as inb/outb without guarding them with
> > any particular tests, so we have to provide definitions of some kind for
> > all of them.  I'll confer with our PCI developer to see if we can clean
> > up the set of definitions in io.h.
>
> It's CONFIG_NO_IOMEM (cfr. s390 and um), which is inverted and turned into
> CONFIG_HAS_IOMEM, to be checked by drivers.
>
Likewise for CONFIG_NO_IOPORT for disabling PIO, although you'll probably
want to conditionalize this on PCI I/O.

Geert Uytterhoeven
May 27, 2010, 9:50:02 AM
On Thu, May 27, 2010 at 15:30, Chris Metcalf <cmet...@tilera.com> wrote:
> On 5/27/2010 4:41 AM, Arnd Bergmann wrote:
>> On Thursday 27 May 2010, Chris Metcalf wrote:
>>>> +EXPORT_SYMBOL(inb);
>>>>
>>>> If you just remove these definitions, you get a link error for any
>>>> driver that tries to use these, which is probably more helpful than
>>>> the panic.
>>>>
>>>> OTOH, are you sure that you can't just map the PIO calls to mmio functions
>>>> like readb plus some fixed offset? On most non-x86 architectures, the PIO
>>>> area of the PCI bus is just mapped to a memory range somewhere.
>>>>
>>>>
>>> I'll try to remove them and see if anything falls over.  We don't have
>>> any memory-mapped addresses in the 32-bit architecture, though that
>>> changes with the 64-bit architecture, which introduces IO mappings.  For
>>> PCI we actually have to do a hypervisor transaction for reads or writes.
>>>
>> Ok, then I assume that PIO would also be a hypervisor call, right?
>> If you don't have MMIO on 32 bit, you might want to not define either
>> PIO (inb, ...) or MMIO (readb, ...) calls there and disable
>> CONFIG_HAVE_MMIO in Kconfig.
>>
>
> We don't define CONFIG_HAVE_MMIO, but drivers certainly seem to use
> ioread/iowrite methods as well as inb/outb without guarding them with
> any particular tests, so we have to provide definitions of some kind for
> all of them.  I'll confer with our PCI developer to see if we can clean
> up the set of definitions in io.h.

It's CONFIG_NO_IOMEM (cfr. s390 and um), which is inverted and turned into
CONFIG_HAS_IOMEM, to be checked by drivers.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Arnd Bergmann
May 27, 2010, 10:20:01 AM
On Thursday 27 May 2010, Chris Metcalf wrote:
> On 5/27/2010 4:41 AM, Arnd Bergmann wrote:
> > Are you able to build 32 bit kernels for TILE-Gx as well? It's
> > probably something you never really want to do for performance
> > reasons, but I guess you could use that to verify that the
> > ABI is really compatible.
> >
>
> No, we haven't tried to do this. I suppose it would be possible to port
> the TILE-Gx kernel to use -m32 mode and HIGHMEM, but I think it would
> just uglify the code. :-)

Probably right, yes.

> >> The notion of using a file descriptor as the "rights" object is pretty
> >> central, so I think a character device will work out well.
> >>
> > ok, I see. Now you could easily do this with system calls as well:
> > Instead of the initial ioctl that associates the file descriptor
> > with a rectangle, you can have a syscall that creates a rectangle
> > and a file descriptor (using anon_inode_getfd) associated with it,
> > and returns the fd to user space. This is similar to what we
> > do for other system call interfaces that operate on their own fds.
> >
>
> Yes, good point. I'll be holding back this code from the initial patch,
> so I can think about it some more. I'm still predisposed to avoid
> adding system calls in general, though.

Well, adding chardevs just for the sake of doing ioctl in place of
a syscall is no better than adding the real syscall for something that
should be one.
It has all the disadvantages of new syscalls but does it in a sneaky way.

> >> I'll try to remove them and see if anything falls over. We don't have
> >> any memory-mapped addresses in the 32-bit architecture, though that
> >> changes with the 64-bit architecture, which introduces IO mappings. For
> >> PCI we actually have to do a hypervisor transaction for reads or writes.
> >>
> > Ok, then I assume that PIO would also be a hypervisor call, right?
> > If you don't have MMIO on 32 bit, you might want to not define either
> > PIO (inb, ...) or MMIO (readb, ...) calls there and disable
> > CONFIG_HAVE_MMIO in Kconfig.
> >
>
> We don't define CONFIG_HAVE_MMIO, but drivers certainly seem to use
> ioread/iowrite methods as well as inb/outb without guarding them with
> any particular tests, so we have to provide definitions of some kind for
> all of them. I'll confer with our PCI developer to see if we can clean
> up the set of definitions in io.h.

As Geert mentioned, I meant CONFIG_HAS_IOMEM. If that is disabled,
no code should ever call any of these functions.

> >> Hmm, I didn't know about that. Any information would be appreciated. I
> >> guess you could synthesize something that looked like a signal purely in
> >> user-space? But how would debuggers trap it? I'm not sure how it would
> >> work without a system call.
> >>
> > I think the C99 standard allows you to not implement SIGFPE at all but
> > instead rely on applications doing fetestexcept() etc.
> >
>
> We use this not for the floating-point operations, but for integer
> divide-by-zero. In principle we could use it for floating-point too,
> but we currently don't, since generally folks don't expect it there.

Ah, I see. That probably makes a lot of sense to present as a signal
the way you do.

> >> Except they're also used by the 32-bit platform where there is no compat
> >> mode (the compat code DOES use them too, it's true).
> >>
> > I see. AFAICT, all other architectures don't need the wrapper in
> > the 32 bit native case because they define the syscall calling
> > conventions in libc such that they match what the kernel
> > expects for a 64 bit argument (typically split in two subsequent
> > argument slots). Would that work for you as well?
> >
>
> Yes, we could override this in libc. My assumption was that it was
> cleaner to do it in the kernel, since we support uclibc and glibc, and
> doing it in the kernel meant only doing it in one place.

That's not the way I meant. There are two options how (any) libc can
implement this:
1. the calling conventions for user function calls and for kernel
function calls are the same, so you don't need to do anything here.
2. the calling conventions are different, so you already need a wrapper
in user space for 64 bit arguments to split them up and you could
do that in exactly the way that the kernel expects to be called.

> >> I'll think about it. These are both basically disassembly-related, so
> >> maybe an arch/tile/disasm directory with the tile_desc stuff and the
> >> backtracer? I'm not sure it's really worth moving out of
> >> arch/tile/kernel though.
> >>
> > Ok. If you leave them in the directory, just split them out into a separate
> > patch on your next submission then.
> >
>
> Does this imply separate git commits to our repository, if we want to do
> things the Right Way? I always tend to try to commit things in such a
> way that everything is always buildable between each commit, and I can't
> easily pull out the disassembly-related files from the kernel. On the
> other hand I can easily split up a single big git commit-patch into
> multiple emails, but then of course it wouldn't apply easily to a "git
> am". Guidance?? :-)

You're right that any commit should result in something that's buildable.
In this case I think you can make an exception because before the first
patch, nothing builds in arch/tile, so you extend that phase to two
or three patches before you get to the first one that's actually
compilable.

Arnd

Chris Metcalf
May 27, 2010, 10:40:02 AM
On 5/27/2010 10:11 AM, Arnd Bergmann wrote:
>>> > > I see. AFAICT, all other architectures don't need the wrapper in
>>> > > the 32 bit native case because they define the syscall calling
>>> > > conventions in libc such that they match what the kernel
>>> > > expects for a 64 bit argument (typically split in two subsequent
>>> > > argument slots). Would that work for you as well?
>>>
>>
>> > Yes, we could override this in libc. My assumption was that it was
>> > cleaner to do it in the kernel, since we support uclibc and glibc, and
>> > doing it in the kernel meant only doing it in one place.
>>
> That's not the way I meant. There are two options how (any) libc can
> implement this:
> 1. the calling conventions for user function calls and for kernel
> function calls are the same, so you don't need to do anything here.
> 2. the calling conventions are different, so you already need a wrapper
> in user space for 64 bit arguments to split them up and you could
> do that in exactly the way that the kernel expects to be called.
>

The issue is that libc support for 64-bit operands on 32-bit platforms
tends to look like "syscall(foo64, arg1, LOW(arg2), HIGH(arg2))". This
naturally passes the arguments in consecutive registers, for a
register-based calling convention like ours. However, invoking
"foo64(arg1, (u64)arg2)" passes the u64 argument in the next consecutive
even/odd numbered pair of registers on our architecture. Arguably this
notion of register alignment isn't particularly helpful, but we opted to
do it this way when we settled on the API. The upshot is that to match
this, userspace needs to do "syscall(foo64, arg1, dummy, LOW(arg2),
HIGH(arg2))". So we need to provide these dummy-argument versions of
the syscall wrappers to all the libcs that we use (currently uclibc,
glibc, and sometimes newlib). Where the 64-bit argument falls naturally
on an even register boundary we don't need to provide any kernel stub.

Basically the scenario is your #2 above, but userspace already has an
implementation of the user-space wrapper in the generic code, and I'm
trying to avoid having to provide a tile-specific version of it.

For reference, here's readahead() in glibc (overridden to be a pure
syscall wrapper for 64-bit architectures):

ssize_t
__readahead (int fd, off64_t offset, size_t count)
{
  return INLINE_SYSCALL (readahead, 4, fd,
                         __LONG_LONG_PAIR ((off_t) (offset >> 32),
                                           (off_t) (offset & 0xffffffff)),
                         count);
}
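The even/odd register-pair constraint described above can be modeled in plain C: a 64-bit argument that would start in an odd 32-bit slot must be padded to the next even slot, so userspace inserts a dummy word. Slot layout and helper names here are illustrative, not the real TILE ABI code:

```c
#include <stdint.h>

/* Model of marshalling syscall arguments into 32-bit register slots when
 * 64-bit values must start on an even slot.  Illustrative only. */
enum { MAX_SLOTS = 8 };

/* Place a 64-bit argument at the next even slot at or after 'slot',
 * inserting a dummy word if needed; returns the next free slot. */
static unsigned put_u64_arg(uint32_t regs[MAX_SLOTS], unsigned slot,
                            uint64_t val)
{
        if (slot & 1)
                regs[slot++] = 0;               /* dummy padding word */
        regs[slot++] = (uint32_t)val;           /* LOW half */
        regs[slot++] = (uint32_t)(val >> 32);   /* HIGH half */
        return slot;
}

/* What a kernel-side wrapper like sys32_readahead() undoes: */
static uint64_t get_u64_arg(const uint32_t regs[MAX_SLOTS], unsigned slot)
{
        return (uint64_t)regs[slot + 1] << 32 | regs[slot];
}
```

(The actual low/high ordering in glibc's `__LONG_LONG_PAIR` depends on endianness; the fixed LOW-then-HIGH order above is a simplification.)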


--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Marc Gauthier
May 27, 2010, 11:00:03 AM
linux-ar...@vger.kernel.org wrote:
> On Thursday 27 May 2010, Chris Metcalf wrote:
>> On 5/27/2010 4:41 AM, Arnd Bergmann wrote:
>>>> Hmm, I didn't know about that. Any information would be
>>>> appreciated. I guess you could synthesize something that looked
>>>> like a signal purely in user-space? But how would debuggers trap
>>>> it? I'm not sure how it would work without a system call.
>>>
>>> I think the C99 standard allows you to not implement SIGFPE at all
>>> but instead rely on applications doing fetestexcept() etc.
>>
>> We use this not for the floating-point operations, but for integer
>> divide-by-zero. In principle we could use it for floating-point too,
>> but we currently don't, since generally folks don't expect it there.
>
> Ah, I see. That probably makes a lot of sense to present as a signal
> the way you do.

FWIW, this can also be done using some recognizable illegal
instruction sequence, if the architecture reserves some opcodes
as always illegal. This makes the division routine (typically part
of libgcc) more independent of OS, which has some merit.
The kernel illegal instruction handler needs to recognize this
sequence and turn it into a SIGFPE instead of SIGILL.

The Xtensa architecture libgcc added this recently, but we haven't
yet added the SIGILL=>SIGFPE code to the kernel.

-Marc
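The scheme described above boils down to the illegal-instruction handler pattern-matching a reserved encoding. A minimal model, where the marker value and function name are invented for illustration (the real Tile and Xtensa encodings differ):

```c
#include <signal.h>
#include <stdint.h>

/* Hypothetical reserved encoding that a libgcc division routine would
 * emit on divide-by-zero; NOT a real Tile or Xtensa opcode. */
#define DIVZERO_MARKER_INSN 0x00DEAD00u

/* What a kernel illegal-instruction handler would do: recognize the
 * marker and report SIGFPE instead of the default SIGILL. */
static int signal_for_illegal_insn(uint32_t insn_bits)
{
        return (insn_bits == DIVZERO_MARKER_INSN) ? SIGFPE : SIGILL;
}
```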

Chris Metcalf
May 27, 2010, 11:10:02 AM
On 5/27/2010 11:02 AM, Arnd Bergmann wrote:
> Ok, I see. No objection to your kernel code then, we just need to
> figure out how to do that with the generic sys_call_table.
>

That turns out to be fairly easy:

#undef __SYSCALL
#define __SYSCALL(nr, call) [nr] = (call),

#ifndef __tilegx__
#define sys_fadvise64 sys32_fadvise64
#define sys_fadvise64_64 sys32_fadvise64_64
#define sys_readahead sys32_readahead
#endif

void *sys_call_table[__NR_syscalls] = {
	[0 ... __NR_syscalls-1] = sys_ni_syscall,
#include <asm/unistd.h>
};
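The snippet relies on two C features: designated initializers and the GCC range-initializer extension, where a later initializer for the same slot overrides an earlier one. A small userspace model:

```c
/* Model of the sys_call_table construction: default every slot to a
 * "not implemented" handler, then override individual slots.  Uses the
 * GCC range-initializer extension ([a ... b] =), as the kernel does. */
static int sys_ni(void)   { return -38; }   /* like sys_ni_syscall: -ENOSYS */
static int sys_real(void) { return 0; }

#define NR_CALLS 8

static int (*call_table[NR_CALLS])(void) = {
        [0 ... NR_CALLS - 1] = sys_ni,  /* default: not implemented */
        [3] = sys_real,                 /* later entry overrides slot 3 */
};
```

(GCC warns about the overridden initializer under -Woverride-init, but the construct is well-defined: the last initializer for a slot wins.)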

Arnd Bergmann
May 27, 2010, 11:10:03 AM
On Thursday 27 May 2010, Chris Metcalf wrote:
> On 5/27/2010 10:11 AM, Arnd Bergmann wrote:
>
> The issue is that libc support for 64-bit operands on 32-bit platforms
> tends to look like "syscall(foo64, arg1, LOW(arg2), HIGH(arg2))". This
> naturally passes the arguments in consecutive registers, for a
> register-based calling convention like ours. However, invoking
> "foo64(arg1, (u64)arg2)" passes the u64 argument in the next consecutive
> even/odd numbered pair of registers on our architecture. Arguably this
> notion of register alignment isn't particularly helpful, but we opted to
> do it this way when we settled on the API. The upshot is that to match
> this, userspace needs to do "syscall(foo64, arg1, dummy, LOW(arg2),
> HIGH(arg2))". So we need to provide these dummy-argument versions of
> the syscall wrappers to all the libcs that we use (currently uclibc,
> glibc, and sometimes newlib). Where the 64-bit argument falls naturally
> on an even register boundary we don't need to provide any kernel stub.

ok, makes sense. IIRC, the s390 architecture has the same requirement,
probably some others as well.

> Basically the scenario is your #2 above, but userspace already has an
> implementation of the user-space wrapper in the generic code, and I'm
> trying to avoid having to provide a tile-specific version of it.
>
> For reference, here's readahead() in glibc (overridden to be a pure
> syscall wrapper for 64-bit architectures):
>
> ssize_t
> __readahead (int fd, off64_t offset, size_t count)
> {
>   return INLINE_SYSCALL (readahead, 4, fd,
>                          __LONG_LONG_PAIR ((off_t) (offset >> 32),
>                                            (off_t) (offset & 0xffffffff)),
>                          count);
> }

Ok, I see. No objection to your kernel code then, we just need to
figure out how to do that with the generic sys_call_table.

Arnd

Chris Metcalf
May 27, 2010, 11:10:03 AM
On 5/27/2010 10:11 AM, Arnd Bergmann wrote:
> You're right that any commit should result in something that's buildable.
> In this case I think you can make an exception because before the first
> patch, nothing builds in arch/tile, so you extend that phase to two
> or three patches before you get to the first one that's actually
> compilable.
>

OK, will do. I'm also planning to do a hard git reset to remove the
original patch, since much of it is no longer wanted anyway. I'll mail
out a diff of what has changed relative to that original submission as
just a regular "diff -ru" so previous reviewers can see what has changed.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann
May 27, 2010, 11:30:02 AM
On Thursday 27 May 2010, Chris Metcalf wrote:
> That turns out to be fairly easy:
>
> #undef __SYSCALL
> #define __SYSCALL(nr, call) [nr] = (call),
>
> #ifndef __tilegx__
> #define sys_fadvise64 sys32_fadvise64
> #define sys_fadvise64_64 sys32_fadvise64_64
> #define sys_readahead sys32_readahead
> #endif
>
> void *sys_call_table[__NR_syscalls] = {
> 	[0 ... __NR_syscalls-1] = sys_ni_syscall,
> #include <asm/unistd.h>
> };
>

Ok. This does mean that you're no longer using a shared
version of the sys_call_table.c file but your own one, but
since the file is so simple, that should not be a problem.

We can think about merging it when we have more architectures
that need a hack like this, which might never happen.

Arnd

Jamie Lokier
May 27, 2010, 4:40:02 PM

They do need it.

For example, on Sparc, compat_sys_sendfile64 takes a 32-bit
compat_size_t argument, and calls sys_sendfile64 with a 64-bit size_t
argument.

I'll be very surprised if 32-bit tile is using 64-bit size_t already :-)

-- Jamie


Arnd Bergmann
May 27, 2010, 5:00:02 PM
On Thursday 27 May 2010 22:34:12 Jamie Lokier wrote:
> > > > compat_sys_sendfile will not be needed with the asm-generic/unistd.h definitions,
> > > > but I think you will still need a compat_sys_sendfile64, to which the same
> > > > applies as to compat_sys_sched_rr_get_interval.
> > > >
> > >
> > > I think it's the other way around: compat_sys_sendfile64() is just
> > > sys_sendfile64(), but compat_sys_sendfile() needs to exist since it has
> > > to write a 32-bit pointer back to userspace.
> >
> > Ah. I guess you're right about compat_sys_sendfile64 not being needed.
> > Funny enough, parisc, powerpc, s390 and sparc all define it anyway, so
> > it didn't occur to me that they don't actually need to.
>
> They do need it.
>
> For example, on Sparc, compat_sys_sendfile64 takes a 32-bit
> compat_size_t argument, and calls sys_sendfile64 with a 64-bit size_t
> argument.

But size_t is unsigned and the upper halves of the argument registers
are always zero-filled in the sparc64 compat syscall entry, so I guess the
conversion is still not necessary there.

PowerPC probably still needs it but doesn't need to do the nasty set_fs()
stuff. s390x needs a proper assembly wrapper and parisc uses the same
method as sparc64 IIRC.

mips and x86 already just call the native sendfile64 syscall.
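The reason compat_sys_sendfile (unlike sendfile64) can't simply be dropped is its in/out offset pointer. A userspace model of the widen/call/narrow dance, with hypothetical names (the real wrapper uses get_user/put_user and returns -EOVERFLOW):

```c
#include <stdint.h>

typedef int32_t compat_off_t;

/* Model of a compat sendfile wrapper: widen the caller's 32-bit offset,
 * run the native 64-bit operation, then narrow the updated offset back,
 * failing if it no longer fits. */
static long model_compat_sendfile(compat_off_t *user_off,
                                  long (*native)(int64_t *off))
{
        int64_t off = *user_off;                /* widen 32 -> 64 */
        long ret = native(&off);

        if (off != (int64_t)(compat_off_t)off)
                return -75;                     /* -EOVERFLOW in the kernel */
        *user_off = (compat_off_t)off;          /* narrow for 32-bit caller */
        return ret;
}

/* Stand-in for the native 64-bit operation: advances the offset. */
static long fake_native_sendfile(int64_t *off)
{
        *off += 4096;
        return 4096;
}
```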

> I'll be very surprised if 32-bit tile is using 64-bit size_t already :-)

** Surprise! **

Chris mentioned that the tile compat support is not actually for the
native 32 bit ABI but for a third ILP32 ABI of the 64 bit Tile-GX
processors. I assume all registers in this mode are 64 bit, and so are the
syscall arguments.

Arnd
--

Chris Metcalf
May 28, 2010, 12:50:02 PM
On 5/25/2010 5:45 PM, Arnd Bergmann wrote:
>> +u32 inb(u32 addr)
>> > +{
>> > + panic("inb not implemented");
>> > +}
>> > +EXPORT_SYMBOL(inb);
>>
> If you just remove these definitions, you get a link error for any
> driver that tries to use these, which is probably more helpful than
> the panic.
>
> OTOH, are you sure that you can't just map the PIO calls to mmio functions
> like readb plus some fixed offset? On most non-x86 architectures, the PIO
> area of the PCI bus is just mapped to a memory range somewhere.
>

I looked at where inb() and friends are used, and although we can get
close to not requiring them for tile, it's not quite there, and will
need some further discussions on the various lists to clean up enough
for our purposes. There are three problems that surface when I try to
remove the inb family:

1. The ide-io-std.c file sets up the default_tp_ops, which we override
in our IDE driver, but we still have to build this file, and it contains
calls to inb, etc.

2. The usb pci quirks file contains some quirks that use inb and outb,
and since we support PCI, we have to try to compile this file.

3. CONFIG_DEVPORT defaults to yes, unless you're M68K.

None of this seems to depend on CONFIG_HAS_IOPORT.

Our PCI driver supports IOMEM read/write, but not IOPORT.

Perhaps something like CONFIG_ARCH_PCI_HAS_NO_IOPORT or some such, and
then we can disable all of the above things if that config option is
present (and CONFIG_HAS_IOPORT is false also?).

For now, I'll just leave the inb/outb implementation as panic() calls.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann
May 28, 2010, 1:20:02 PM
On Friday 28 May 2010, Chris Metcalf wrote:
> I looked at where inb() and friends are used, and although we can get
> close to not requiring them for tile, it's not quite there, and will
> need some further discussions on the various lists to clean up enough
> for our purposes. There are three problems that surface when I try to
> remove the inb family:
>
> 1. The ide-io-std.c file sets up the default_tp_ops, which we override
> in our IDE driver, but we still have to build this file, and it contains
> calls to inb, etc.

It's only needed in the IDE layer though and will go away once you
move to an ATA driver, right?

> 2. The usb pci quirks file contains some quirks that use inb and outb,
> and since we support PCI, we have to try to compile this file.
>
> 3. CONFIG_DEVPORT defaults to yes, unless you're M68K.
>
> None of this seems to depend on CONFIG_HAS_IOPORT.

All three places you have found seem to be actual bugs.



> Our PCI driver supports IOMEM read/write, but not IOPORT.
>
> Perhaps something like CONFIG_ARCH_PCI_HAS_NO_IOPORT or some such, and
> then we can disable all of the above things if that config option is
> present (and CONFIG_HAS_IOPORT is false also?).

That's what CONFIG_NO_IOPORT is supposed to be used for in the
first place, so I think we should just use the existing CONFIG_HAS_IOPORT
symbol to disable the broken code you found. CONFIG_DEVPORT then
not even needs to check for M68K.

Arnd

Chris Metcalf
May 28, 2010, 1:30:03 PM
On 5/28/2010 1:16 PM, Arnd Bergmann wrote:
> On Friday 28 May 2010, Chris Metcalf wrote:
>
>> I looked at where inb() and friends are used, and although we can get
>> close to not requiring them for tile, it's not quite there, and will
>> need some further discussions on the various lists to clean up enough
>> for our purposes. There are three problems that surface when I try to
>> remove the inb family:
>>
>> 1. The ide-io-std.c file sets up the default_tp_ops, which we override
>> in our IDE driver, but we still have to build this file, and it contains
>> calls to inb, etc.
>>
> It's only needed in the IDE layer though and will go away once you
> move to an ATA driver, right?
>

That sounds plausible, though I haven't looked at what's involved with
this yet.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Chris Metcalf
May 28, 2010, 2:00:03 PM
On 5/27/2010 10:52 AM, Marc Gauthier wrote:
>>> We use [a syscall] not for the floating-point operations, but for integer
>>> divide-by-zero. In principle we could use it for floating-point too,
>>> but we currently don't, since generally folks don't expect it there.
>>>
>> Ah, I see. That probably makes a lot of sense to present as a signal
>> the way you do.
>>
> FWIW, this can also be done using some recognizable illegal
> instruction sequence, if the architecture reserves some opcodes
> as always illegal.

We do reserve a range of illegal values, and this is a great idea. I've
removed the syscall from our kernel, and will add support for the
appropriate magic in the trap handler once we pick an encoding and give
it a name in the assembler.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Chris Metcalf
May 28, 2010, 11:40:02 PM
Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---
arch/tile/lib/Makefile | 16 +
arch/tile/lib/__invalidate_icache.S | 106 ++++++
arch/tile/lib/atomic_32.c | 347 +++++++++++++++++++
arch/tile/lib/atomic_asm_32.S | 197 +++++++++++
arch/tile/lib/checksum.c | 102 ++++++
arch/tile/lib/cpumask.c | 51 +++
arch/tile/lib/delay.c | 34 ++
arch/tile/lib/exports.c | 78 +++++
arch/tile/lib/mb_incoherent.S | 34 ++
arch/tile/lib/memchr_32.c | 68 ++++
arch/tile/lib/memcpy_32.S | 628 +++++++++++++++++++++++++++++++++++
arch/tile/lib/memcpy_tile64.c | 271 +++++++++++++++
arch/tile/lib/memmove_32.c | 63 ++++
arch/tile/lib/memset_32.c | 274 +++++++++++++++
arch/tile/lib/spinlock_32.c | 221 ++++++++++++
arch/tile/lib/spinlock_common.h | 64 ++++
arch/tile/lib/strchr_32.c | 66 ++++
arch/tile/lib/strlen_32.c | 36 ++
arch/tile/lib/uaccess.c | 31 ++
arch/tile/lib/usercopy_32.S | 223 +++++++++++++
20 files changed, 2910 insertions(+), 0 deletions(-)
create mode 100644 arch/tile/lib/Makefile
create mode 100644 arch/tile/lib/__invalidate_icache.S
create mode 100644 arch/tile/lib/atomic_32.c
create mode 100644 arch/tile/lib/atomic_asm_32.S
create mode 100644 arch/tile/lib/checksum.c
create mode 100644 arch/tile/lib/cpumask.c
create mode 100644 arch/tile/lib/delay.c
create mode 100644 arch/tile/lib/exports.c
create mode 100644 arch/tile/lib/mb_incoherent.S
create mode 100644 arch/tile/lib/memchr_32.c
create mode 100644 arch/tile/lib/memcpy_32.S
create mode 100644 arch/tile/lib/memcpy_tile64.c
create mode 100644 arch/tile/lib/memmove_32.c
create mode 100644 arch/tile/lib/memset_32.c
create mode 100644 arch/tile/lib/spinlock_32.c
create mode 100644 arch/tile/lib/spinlock_common.h
create mode 100644 arch/tile/lib/strchr_32.c
create mode 100644 arch/tile/lib/strlen_32.c
create mode 100644 arch/tile/lib/uaccess.c
create mode 100644 arch/tile/lib/usercopy_32.S

diff --git a/arch/tile/lib/Makefile b/arch/tile/lib/Makefile
new file mode 100644
index 0000000..ea9c209
--- /dev/null
+++ b/arch/tile/lib/Makefile
@@ -0,0 +1,16 @@
+#
+# Makefile for TILE-specific library files..
+#
+
+lib-y = checksum.o cpumask.o delay.o __invalidate_icache.o \
+ mb_incoherent.o uaccess.o \
+ memcpy_$(BITS).o memchr_$(BITS).o memmove_$(BITS).o memset_$(BITS).o \
+ strchr_$(BITS).o strlen_$(BITS).o
+
+ifneq ($(CONFIG_TILEGX),y)
+lib-y += atomic_32.o atomic_asm_32.o memcpy_tile64.o
+endif
+
+lib-$(CONFIG_SMP) += spinlock_$(BITS).o usercopy_$(BITS).o
+
+obj-$(CONFIG_MODULES) += exports.o
diff --git a/arch/tile/lib/__invalidate_icache.S b/arch/tile/lib/__invalidate_icache.S
new file mode 100644
index 0000000..92e7050
--- /dev/null
+++ b/arch/tile/lib/__invalidate_icache.S
@@ -0,0 +1,106 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ * A routine for synchronizing the instruction and data caches.
+ * Useful for self-modifying code.
+ *
+ * r0 holds the buffer address
+ * r1 holds the size in bytes
+ */
+
+#include <arch/chip.h>
+#include <feedback.h>
+
+#if defined(__NEWLIB__) || defined(__BME__)
+#include <sys/page.h>
+#else
+#include <asm/page.h>
+#endif
+
+#ifdef __tilegx__
+/* Share code among Tile family chips but adjust opcodes appropriately. */
+#define slt cmpltu
+#define bbst blbst
+#define bnezt bnzt
+#endif
+
+#if defined(__tilegx__) && __SIZEOF_POINTER__ == 4
+/* Force 32-bit ops so pointers wrap around appropriately. */
+#define ADD_PTR addx
+#define ADDI_PTR addxi
+#else
+#define ADD_PTR add
+#define ADDI_PTR addi
+#endif
+
+ .section .text.__invalidate_icache, "ax"
+ .global __invalidate_icache
+ .type __invalidate_icache,@function
+ .hidden __invalidate_icache
+ .align 8
+__invalidate_icache:
+ FEEDBACK_ENTER(__invalidate_icache)
+ {
+ ADD_PTR r1, r0, r1 /* end of buffer */
+ blez r1, .Lexit /* skip out if size <= 0 */
+ }
+ {
+ ADDI_PTR r1, r1, -1 /* point to last byte to flush */
+ andi r0, r0, -CHIP_L1I_LINE_SIZE() /* align to cache-line size */
+ }
+ {
+ andi r1, r1, -CHIP_L1I_LINE_SIZE() /* last cache line to flush */
+ mf
+ }
+#if CHIP_L1I_CACHE_SIZE() > PAGE_SIZE
+ {
+ moveli r4, CHIP_L1I_CACHE_SIZE() / PAGE_SIZE /* loop counter */
+ move r2, r0 /* remember starting address */
+ }
+#endif
+ drain
+ {
+ slt r3, r0, r1 /* set up loop invariant */
+#if CHIP_L1I_CACHE_SIZE() > PAGE_SIZE
+ moveli r6, PAGE_SIZE
+#endif
+ }
+.Lentry:
+ {
+ icoh r0
+ ADDI_PTR r0, r0, CHIP_L1I_LINE_SIZE() /* advance buffer */
+ }
+ {
+ slt r3, r0, r1 /* check if buffer < buffer + size */
+ bbst r3, .Lentry /* loop if buffer < buffer + size */
+ }
+#if CHIP_L1I_CACHE_SIZE() > PAGE_SIZE
+ {
+ ADD_PTR r2, r2, r6
+ ADD_PTR r1, r1, r6
+ }
+ {
+ move r0, r2
+ addi r4, r4, -1
+ }
+ {
+ slt r3, r0, r1 /* set up loop invariant */
+ bnezt r4, .Lentry
+ }
+#endif
+ drain
+.Lexit:
+ jrp lr
+
+.Lend___invalidate_icache:
+ .size __invalidate_icache, \
+ .Lend___invalidate_icache - __invalidate_icache
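The loop structure of `__invalidate_icache` above (align both endpoints down to a cache line, then visit each line once) can be sketched in C. This is a hedged illustration only; the 64-byte line size is an assumed stand-in for `CHIP_L1I_LINE_SIZE()`, and the counter stands where the `icoh` instruction would be issued.

```c
#include <assert.h>

#define DEMO_L1I_LINE_SIZE 64UL	/* assumed; the real value is CHIP_L1I_LINE_SIZE() */

/* Sketch of the walk in __invalidate_icache: round the start and the
 * last byte down to line boundaries, then step one line at a time.
 * Returns how many "icoh" operations the assembly loop would issue. */
static unsigned long count_icoh_lines(unsigned long start, unsigned long size)
{
	unsigned long end  = start + size - 1;
	unsigned long line = start & -DEMO_L1I_LINE_SIZE;
	unsigned long last = end & -DEMO_L1I_LINE_SIZE;
	unsigned long n = 0;

	for (; line <= last; line += DEMO_L1I_LINE_SIZE)
		n++;	/* icoh would be issued for this line */
	return n;
}
```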
diff --git a/arch/tile/lib/atomic_32.c b/arch/tile/lib/atomic_32.c
new file mode 100644
index 0000000..be1e8ac
--- /dev/null
+++ b/arch/tile/lib/atomic_32.c
@@ -0,0 +1,347 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/cache.h>
+#include <linux/delay.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <asm/atomic.h>
+#include <arch/chip.h>
+
+/* The routines in atomic_asm.S are private, so we only declare them here. */
+extern struct __get_user __atomic_cmpxchg(volatile int *p,
+ int *lock, int o, int n);
+extern struct __get_user __atomic_xchg(volatile int *p, int *lock, int n);
+extern struct __get_user __atomic_xchg_add(volatile int *p, int *lock, int n);
+extern struct __get_user __atomic_xchg_add_unless(volatile int *p,
+ int *lock, int o, int n);
+extern struct __get_user __atomic_or(volatile int *p, int *lock, int n);
+extern struct __get_user __atomic_andn(volatile int *p, int *lock, int n);
+extern struct __get_user __atomic_xor(volatile int *p, int *lock, int n);
+
+extern u64 __atomic64_cmpxchg(volatile u64 *p, int *lock, u64 o, u64 n);
+extern u64 __atomic64_xchg(volatile u64 *p, int *lock, u64 n);
+extern u64 __atomic64_xchg_add(volatile u64 *p, int *lock, u64 n);
+extern u64 __atomic64_xchg_add_unless(volatile u64 *p,
+ int *lock, u64 o, u64 n);
+
+
+/* See <asm/atomic.h> */
+#if ATOMIC_LOCKS_FOUND_VIA_TABLE()
+
+/*
+ * A block of memory containing locks for atomic ops. Each instance of this
+ * struct will be homed on a different CPU.
+ */
+struct atomic_locks_on_cpu {
+ int lock[ATOMIC_HASH_L2_SIZE];
+} __attribute__((aligned(ATOMIC_HASH_L2_SIZE * 4)));
+
+static DEFINE_PER_CPU(struct atomic_locks_on_cpu, atomic_lock_pool);
+
+/* The locks we'll use until __init_atomic_per_cpu is called. */
+static struct atomic_locks_on_cpu __initdata initial_atomic_locks;
+
+/* Hash into this vector to get a pointer to lock for the given atomic. */
+struct atomic_locks_on_cpu *atomic_lock_ptr[ATOMIC_HASH_L1_SIZE]
+ __write_once = {
+ [0 ... ATOMIC_HASH_L1_SIZE-1] (&initial_atomic_locks)
+};
+
+#else /* ATOMIC_LOCKS_FOUND_VIA_TABLE() */
+
+/* This page is remapped on startup to be hash-for-home. */
+int atomic_locks[PAGE_SIZE / sizeof(int) /* Only ATOMIC_HASH_SIZE is used */]
+ __attribute__((aligned(PAGE_SIZE), section(".bss.page_aligned")));
+
+#endif /* ATOMIC_LOCKS_FOUND_VIA_TABLE() */
+
+static inline int *__atomic_hashed_lock(volatile void *v)
+{
+ /* NOTE: this code must match "sys_cmpxchg" in kernel/intvec.S */
+#if ATOMIC_LOCKS_FOUND_VIA_TABLE()
+ unsigned long i =
+ (unsigned long) v & ((PAGE_SIZE-1) & -sizeof(long long));
+ unsigned long n = __insn_crc32_32(0, i);
+
+ /* Grab high bits for L1 index. */
+ unsigned long l1_index = n >> ((sizeof(n) * 8) - ATOMIC_HASH_L1_SHIFT);
+ /* Grab low bits for L2 index. */
+ unsigned long l2_index = n & (ATOMIC_HASH_L2_SIZE - 1);
+
+ return &atomic_lock_ptr[l1_index]->lock[l2_index];
+#else
+ /*
+ * Use bits [3, 3 + ATOMIC_HASH_SHIFT) as the lock index.
+ * Using mm works here because atomic_locks is page aligned.
+ */
+ unsigned long ptr = __insn_mm((unsigned long)v >> 1,
+ (unsigned long)atomic_locks,
+ 2, (ATOMIC_HASH_SHIFT + 2) - 1);
+ return (int *)ptr;
+#endif
+}
+
+#ifdef CONFIG_SMP
+/* Return whether the passed pointer is a valid atomic lock pointer. */
+static int is_atomic_lock(int *p)
+{
+#if ATOMIC_LOCKS_FOUND_VIA_TABLE()
+ int i;
+ for (i = 0; i < ATOMIC_HASH_L1_SIZE; ++i) {
+
+ if (p >= &atomic_lock_ptr[i]->lock[0] &&
+ p < &atomic_lock_ptr[i]->lock[ATOMIC_HASH_L2_SIZE]) {
+ return 1;
+ }
+ }
+ return 0;
+#else
+ return p >= &atomic_locks[0] && p < &atomic_locks[ATOMIC_HASH_SIZE];
+#endif
+}
+
+void __atomic_fault_unlock(int *irqlock_word)
+{
+ BUG_ON(!is_atomic_lock(irqlock_word));
+ BUG_ON(*irqlock_word != 1);
+ *irqlock_word = 0;
+}
+
+#endif /* CONFIG_SMP */
+
+static inline int *__atomic_setup(volatile void *v)
+{
+ /* Issue a load to the target to bring it into cache. */
+ *(volatile int *)v;
+ return __atomic_hashed_lock(v);
+}
+
+int _atomic_xchg(atomic_t *v, int n)
+{
+ return __atomic_xchg(&v->counter, __atomic_setup(v), n).val;
+}
+EXPORT_SYMBOL(_atomic_xchg);
+
+int _atomic_xchg_add(atomic_t *v, int i)
+{
+ return __atomic_xchg_add(&v->counter, __atomic_setup(v), i).val;
+}
+EXPORT_SYMBOL(_atomic_xchg_add);
+
+int _atomic_xchg_add_unless(atomic_t *v, int a, int u)
+{
+ /*
+ * Note: argument order is switched here since it is easier
+ * to use the first argument consistently as the "old value"
+ * in the assembly, as is done for _atomic_cmpxchg().
+ */
+ return __atomic_xchg_add_unless(&v->counter, __atomic_setup(v), u, a)
+ .val;
+}
+EXPORT_SYMBOL(_atomic_xchg_add_unless);
+
+int _atomic_cmpxchg(atomic_t *v, int o, int n)
+{
+ return __atomic_cmpxchg(&v->counter, __atomic_setup(v), o, n).val;
+}
+EXPORT_SYMBOL(_atomic_cmpxchg);
+
+unsigned long _atomic_or(volatile unsigned long *p, unsigned long mask)
+{
+ return __atomic_or((int *)p, __atomic_setup(p), mask).val;
+}
+EXPORT_SYMBOL(_atomic_or);
+
+unsigned long _atomic_andn(volatile unsigned long *p, unsigned long mask)
+{
+ return __atomic_andn((int *)p, __atomic_setup(p), mask).val;
+}
+EXPORT_SYMBOL(_atomic_andn);
+
+unsigned long _atomic_xor(volatile unsigned long *p, unsigned long mask)
+{
+ return __atomic_xor((int *)p, __atomic_setup(p), mask).val;
+}
+EXPORT_SYMBOL(_atomic_xor);
+
+
+u64 _atomic64_xchg(atomic64_t *v, u64 n)
+{
+ return __atomic64_xchg(&v->counter, __atomic_setup(v), n);
+}
+EXPORT_SYMBOL(_atomic64_xchg);
+
+u64 _atomic64_xchg_add(atomic64_t *v, u64 i)
+{
+ return __atomic64_xchg_add(&v->counter, __atomic_setup(v), i);
+}
+EXPORT_SYMBOL(_atomic64_xchg_add);
+
+u64 _atomic64_xchg_add_unless(atomic64_t *v, u64 a, u64 u)
+{
+ /*
+ * Note: argument order is switched here since it is easier
+ * to use the first argument consistently as the "old value"
+ * in the assembly, as is done for _atomic_cmpxchg().
+ */
+ return __atomic64_xchg_add_unless(&v->counter, __atomic_setup(v),
+ u, a);
+}
+EXPORT_SYMBOL(_atomic64_xchg_add_unless);
+
+u64 _atomic64_cmpxchg(atomic64_t *v, u64 o, u64 n)
+{
+ return __atomic64_cmpxchg(&v->counter, __atomic_setup(v), o, n);
+}
+EXPORT_SYMBOL(_atomic64_cmpxchg);
+
+
+static inline int *__futex_setup(__user int *v)
+{
+ /*
+ * Issue a prefetch to the counter to bring it into cache.
+ * As for __atomic_setup, but we can't do a read into the L1
+ * since it might fault; instead we do a prefetch into the L2.
+ */
+ __insn_prefetch(v);
+ return __atomic_hashed_lock(v);
+}
+
+struct __get_user futex_set(int *v, int i)
+{
+ return __atomic_xchg(v, __futex_setup(v), i);
+}
+
+struct __get_user futex_add(int *v, int n)
+{
+ return __atomic_xchg_add(v, __futex_setup(v), n);
+}
+
+struct __get_user futex_or(int *v, int n)
+{
+ return __atomic_or(v, __futex_setup(v), n);
+}
+
+struct __get_user futex_andn(int *v, int n)
+{
+ return __atomic_andn(v, __futex_setup(v), n);
+}
+
+struct __get_user futex_xor(int *v, int n)
+{
+ return __atomic_xor(v, __futex_setup(v), n);
+}
+
+struct __get_user futex_cmpxchg(int *v, int o, int n)
+{
+ return __atomic_cmpxchg(v, __futex_setup(v), o, n);
+}
+
+/*
+ * If any of the atomic or futex routines hit a bad address (not in
+ * the page tables at kernel PL) this routine is called. The futex
+ * routines are never used on kernel space, and the normal atomics and
+ * bitops are never used on user space. So a fault on kernel space
+ * must be fatal, but a fault on userspace is a futex fault and we
+ * need to return -EFAULT. Note that the context this routine is
+ * invoked in is the context of the "_atomic_xxx()" routines called
+ * by the functions in this file.
+ */
+struct __get_user __atomic_bad_address(int *addr)
+{
+ if (unlikely(!access_ok(VERIFY_WRITE, addr, sizeof(int))))
+ panic("Bad address used for kernel atomic op: %p\n", addr);
+ return (struct __get_user) { .err = -EFAULT };
+}
+
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+static int __init noatomichash(char *str)
+{
+ printk("noatomichash is deprecated.\n");
+ return 1;
+}
+__setup("noatomichash", noatomichash);
+#endif
+
+void __init __init_atomic_per_cpu(void)
+{
+#if ATOMIC_LOCKS_FOUND_VIA_TABLE()
+
+ unsigned int i;
+ int actual_cpu;
+
+ /*
+ * Before this is called from setup, we just have one lock for
+ * all atomic objects/operations. Here we replace the
+ * elements of atomic_lock_ptr so that they point at per_cpu
+ * integers. This seemingly over-complex approach stems from
+ * the fact that DEFINE_PER_CPU defines an entry for each cpu
+ * in the grid, not each cpu from 0..ATOMIC_HASH_SIZE-1. But
+ * for efficient hashing of atomics to their locks we want a
+ * compile time constant power of 2 for the size of this
+ * table, so we use ATOMIC_HASH_SIZE.
+ *
+ * Here we populate atomic_lock_ptr from the per cpu
+ * atomic_lock_pool, interspersing by actual cpu so that
+ * subsequent elements are homed on consecutive cpus.
+ */
+
+ actual_cpu = cpumask_first(cpu_possible_mask);
+
+ for (i = 0; i < ATOMIC_HASH_L1_SIZE; ++i) {
+ /*
+ * Preincrement to slightly bias against using cpu 0,
+ * which has plenty of stuff homed on it already.
+ */
+ actual_cpu = cpumask_next(actual_cpu, cpu_possible_mask);
+ if (actual_cpu >= nr_cpu_ids)
+ actual_cpu = cpumask_first(cpu_possible_mask);
+
+ atomic_lock_ptr[i] = &per_cpu(atomic_lock_pool, actual_cpu);
+ }
+
+#else /* ATOMIC_LOCKS_FOUND_VIA_TABLE() */
+
+ /* Validate power-of-two and "bigger than cpus" assumption */
+ BUG_ON(ATOMIC_HASH_SIZE & (ATOMIC_HASH_SIZE-1));
+ BUG_ON(ATOMIC_HASH_SIZE < nr_cpu_ids);
+
+ /*
+ * On TILEPro we prefer to use a single hash-for-home
+ * page, since this means atomic operations are less
+ * likely to encounter a TLB fault and thus should
+ * in general perform faster. You may wish to disable
+ * this in situations where few hash-for-home tiles
+ * are configured.
+ */
+ BUG_ON((unsigned long)atomic_locks % PAGE_SIZE != 0);
+
+ /* The locks must all fit on one page. */
+ BUG_ON(ATOMIC_HASH_SIZE * sizeof(int) > PAGE_SIZE);
+
+ /*
+ * We use the page offset of the atomic value's address as
+ * an index into atomic_locks, excluding the low 3 bits.
+ * That should not produce more indices than ATOMIC_HASH_SIZE.
+ */
+ BUG_ON((PAGE_SIZE >> 3) > ATOMIC_HASH_SIZE);
+
+#endif /* ATOMIC_LOCKS_FOUND_VIA_TABLE() */
+
+ /* The futex code makes this assumption, so we validate it here. */
+ BUG_ON(sizeof(atomic_t) != sizeof(int));
+}
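On TILEPro the `mm` (masked-merge) lookup in `__atomic_hashed_lock()` above is equivalent to a simple bit-field index. The following is a sketch only: `ATOMIC_HASH_SHIFT` is assumed to be 9 for illustration (the real value comes from `<asm/atomic.h>`), and `demo_locks` stands in for the page-aligned `atomic_locks[]`.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HASH_SHIFT 9	/* assumed; real value is ATOMIC_HASH_SHIFT */
#define DEMO_HASH_SIZE  (1 << DEMO_HASH_SHIFT)

static int demo_locks[DEMO_HASH_SIZE];	/* stands in for atomic_locks[] */

/* Bits [3, 3 + DEMO_HASH_SHIFT) of the address select the lock, so any
 * two addresses in the same 8-byte granule share a lock, matching the
 * masked-merge computation in the code above. */
static int *demo_hashed_lock(volatile void *v)
{
	uintptr_t idx = ((uintptr_t)v >> 3) & (DEMO_HASH_SIZE - 1);
	return &demo_locks[idx];
}
```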
diff --git a/arch/tile/lib/atomic_asm_32.S b/arch/tile/lib/atomic_asm_32.S
new file mode 100644
index 0000000..c0d0585
--- /dev/null
+++ b/arch/tile/lib/atomic_asm_32.S
@@ -0,0 +1,197 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * Support routines for atomic operations. Each function takes:
+ *
+ * r0: address to manipulate
+ * r1: pointer to atomic lock guarding this operation (for FUTEX_LOCK_REG)
+ * r2: new value to write, or for cmpxchg/add_unless, value to compare against
+ * r3: (cmpxchg/xchg_add_unless) new value to write or add;
+ * (atomic64 ops) high word of value to write
+ * r4/r5: (cmpxchg64/add_unless64) new value to write or add
+ *
+ * The 32-bit routines return a "struct __get_user" so that the futex code
+ * has an opportunity to return -EFAULT to the user if needed.
+ * The 64-bit routines just return a "long long" with the value,
+ * since they are only used from kernel space and don't expect to fault.
+ * Support for 16-bit ops is included in the framework but we don't provide
+ * any (x86_64 has an atomic_inc_short(), so we might want to some day).
+ *
+ * Note that the caller is advised to issue a suitable L1 or L2
+ * prefetch on the address being manipulated to avoid extra stalls.
+ * In addition, the hot path is on two icache lines, and we start with
+ * a jump to the second line to make sure they are both in cache so
+ * that we never stall waiting on icache fill while holding the lock.
+ * (This doesn't work out with most 64-bit ops, since they consume
+ * too many bundles, so may take an extra i-cache stall.)
+ *
+ * These routines set the INTERRUPT_CRITICAL_SECTION bit, just
+ * like sys_cmpxchg(), so that NMIs like PERF_COUNT will not interrupt
+ * the code, just page faults.
+ *
+ * If the load or store faults in a way that can be directly fixed in
+ * the do_page_fault_ics() handler (e.g. a vmalloc reference) we fix it
+ * directly, return to the instruction that faulted, and retry it.
+ *
+ * If the load or store faults in a way that potentially requires us
+ * to release the atomic lock, then retry (e.g. a migrating PTE), we
+ * reset the PC in do_page_fault_ics() to the "tns" instruction so
+ * that on return we will reacquire the lock and restart the op. We
+ * are somewhat overloading the exception_table_entry notion by doing
+ * this, since those entries are not normally used for migrating PTEs.
+ *
+ * If the main page fault handler discovers a bad address, it will see
+ * the PC pointing to the "tns" instruction (due to the earlier
+ * exception_table_entry processing in do_page_fault_ics), and
+ * re-reset the PC to the fault handler, atomic_bad_address(), which
+ * effectively takes over from the atomic op and can either return a
+ * bad "struct __get_user" (for user addresses) or can just panic (for
+ * bad kernel addresses).
+ *
+ * Note that if the value we would store is the same as what we
+ * loaded, we bypass the load. Other platforms with true atomics can
+ * make the guarantee that a non-atomic __clear_bit(), for example,
+ * can safely race with an atomic test_and_set_bit(); this example is
+ * from bit_spinlock.h in slub_lock() / slub_unlock(). We can't do
+ * that on Tile since the "atomic" op is really just a
+ * read/modify/write, and can race with the non-atomic
+ * read/modify/write. However, if we can short-circuit the write when
+ * it is not needed, in the atomic case, we avoid the race.
+ */
+
+#include <linux/linkage.h>
+#include <asm/atomic.h>
+#include <asm/page.h>
+#include <asm/processor.h>
+
+ .section .text.atomic,"ax"
+ENTRY(__start_atomic_asm_code)
+
+ .macro atomic_op, name, bitwidth, body
+ .align 64
+STD_ENTRY_SECTION(__atomic\name, .text.atomic)
+ {
+ movei r24, 1
+ j 4f /* branch to second cache line */
+ }
+1: {
+ .ifc \bitwidth,16
+ lh r22, r0
+ .else
+ lw r22, r0
+ addi r23, r0, 4
+ .endif
+ }
+ .ifc \bitwidth,64
+ lw r23, r23
+ .endif
+ \body /* set r24, and r25 if 64-bit */
+ {
+ seq r26, r22, r24
+ seq r27, r23, r25
+ }
+ .ifc \bitwidth,64
+ bbnst r27, 2f
+ .endif
+ bbs r26, 3f /* skip write-back if it's the same value */
+2: {
+ .ifc \bitwidth,16
+ sh r0, r24
+ .else
+ sw r0, r24
+ addi r23, r0, 4
+ .endif
+ }
+ .ifc \bitwidth,64
+ sw r23, r25
+ .endif
+ mf
+3: {
+ move r0, r22
+ .ifc \bitwidth,64
+ move r1, r23
+ .else
+ move r1, zero
+ .endif
+ sw ATOMIC_LOCK_REG_NAME, zero
+ }
+ mtspr INTERRUPT_CRITICAL_SECTION, zero
+ jrp lr
+4: {
+ move ATOMIC_LOCK_REG_NAME, r1
+ mtspr INTERRUPT_CRITICAL_SECTION, r24
+ }
+#ifndef CONFIG_SMP
+ j 1b /* no atomic locks */
+#else
+ {
+ tns r21, ATOMIC_LOCK_REG_NAME
+ moveli r23, 2048 /* maximum backoff time in cycles */
+ }
+ {
+ bzt r21, 1b /* branch if lock acquired */
+ moveli r25, 32 /* starting backoff time in cycles */
+ }
+5: mtspr INTERRUPT_CRITICAL_SECTION, zero
+ mfspr r26, CYCLE_LOW /* get start point for this backoff */
+6: mfspr r22, CYCLE_LOW /* test to see if we've backed off enough */
+ sub r22, r22, r26
+ slt r22, r22, r25
+ bbst r22, 6b
+ {
+ mtspr INTERRUPT_CRITICAL_SECTION, r24
+ shli r25, r25, 1 /* double the backoff; retry the tns */
+ }
+ {
+ tns r21, ATOMIC_LOCK_REG_NAME
+ slt r26, r23, r25 /* is the proposed backoff too big? */
+ }
+ {
+ bzt r21, 1b /* branch if lock acquired */
+ mvnz r25, r26, r23
+ }
+ j 5b
+#endif
+ STD_ENDPROC(__atomic\name)
+ .ifc \bitwidth,32
+ .pushsection __ex_table,"a"
+ .word 1b, __atomic\name
+ .word 2b, __atomic\name
+ .word __atomic\name, __atomic_bad_address
+ .popsection
+ .endif
+ .endm
+
+atomic_op _cmpxchg, 32, "seq r26, r22, r2; { bbns r26, 3f; move r24, r3 }"
+atomic_op _xchg, 32, "move r24, r2"
+atomic_op _xchg_add, 32, "add r24, r22, r2"
+atomic_op _xchg_add_unless, 32, \
+ "sne r26, r22, r2; { bbns r26, 3f; add r24, r22, r3 }"
+atomic_op _or, 32, "or r24, r22, r2"
+atomic_op _andn, 32, "nor r2, r2, zero; and r24, r22, r2"
+atomic_op _xor, 32, "xor r24, r22, r2"
+
+atomic_op 64_cmpxchg, 64, "{ seq r26, r22, r2; seq r27, r23, r3 }; \
+ { bbns r26, 3f; move r24, r4 }; { bbns r27, 3f; move r25, r5 }"
+atomic_op 64_xchg, 64, "{ move r24, r2; move r25, r3 }"
+atomic_op 64_xchg_add, 64, "{ add r24, r22, r2; add r25, r23, r3 }; \
+ slt_u r26, r24, r22; add r25, r25, r26"
+atomic_op 64_xchg_add_unless, 64, \
+ "{ sne r26, r22, r2; sne r27, r23, r3 }; \
+ { bbns r26, 3f; add r24, r22, r4 }; \
+ { bbns r27, 3f; add r25, r23, r5 }; \
+ slt_u r26, r24, r22; add r25, r25, r26"
+
+ jrp lr /* happy backtracer */
+
+ENTRY(__end_atomic_asm_code)
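The bounded exponential backoff in the `tns` retry loop above can be expressed as a small C helper. This is a sketch of the policy only, mirroring the constants in the assembly (32-cycle initial wait, 2048-cycle cap).

```c
#include <assert.h>

/* Sketch of the tns retry backoff policy: double the wait after each
 * failed acquisition, clamped at the maximum.  The 2048-cycle cap
 * mirrors "moveli r23, 2048" in the assembly above; the shift and
 * clamp correspond to the shli/slt/mvnz sequence. */
static unsigned int next_backoff(unsigned int cur_cycles)
{
	unsigned int doubled = cur_cycles << 1;	/* shli r25, r25, 1 */
	return doubled > 2048 ? 2048 : doubled;	/* slt/mvnz clamp */
}
```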
diff --git a/arch/tile/lib/checksum.c b/arch/tile/lib/checksum.c
new file mode 100644
index 0000000..e4bab5b
--- /dev/null
+++ b/arch/tile/lib/checksum.c
@@ -0,0 +1,102 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ * Support code for the main lib/checksum.c.
+ */
+
+#include <net/checksum.h>
+#include <linux/module.h>
+
+static inline unsigned int longto16(unsigned long x)
+{
+ unsigned long ret;
+#ifdef __tilegx__
+ ret = __insn_v2sadu(x, 0);
+ ret = __insn_v2sadu(ret, 0);
+#else
+ ret = __insn_sadh_u(x, 0);
+ ret = __insn_sadh_u(ret, 0);
+#endif
+ return ret;
+}
+
+__wsum do_csum(const unsigned char *buff, int len)
+{
+ int odd, count;
+ unsigned long result = 0;
+
+ if (len <= 0)
+ goto out;
+ odd = 1 & (unsigned long) buff;
+ if (odd) {
+ result = (*buff << 8);
+ len--;
+ buff++;
+ }
+ count = len >> 1; /* nr of 16-bit words.. */
+ if (count) {
+ if (2 & (unsigned long) buff) {
+ result += *(const unsigned short *)buff;
+ count--;
+ len -= 2;
+ buff += 2;
+ }
+ count >>= 1; /* nr of 32-bit words.. */
+ if (count) {
+#ifdef __tilegx__
+ if (4 & (unsigned long) buff) {
+ unsigned int w = *(const unsigned int *)buff;
+ result = __insn_v2sadau(result, w, 0);
+ count--;
+ len -= 4;
+ buff += 4;
+ }
+ count >>= 1; /* nr of 64-bit words.. */
+#endif
+
+ /*
+ * This algorithm could wrap around for very
+ * large buffers, but those should be impossible.
+ */
+ BUG_ON(count >= 65530);
+
+ while (count) {
+ unsigned long w = *(const unsigned long *)buff;
+ count--;
+ buff += sizeof(w);
+#ifdef __tilegx__
+ result = __insn_v2sadau(result, w, 0);
+#else
+ result = __insn_sadah_u(result, w, 0);
+#endif
+ }
+#ifdef __tilegx__
+ if (len & 4) {
+ unsigned int w = *(const unsigned int *)buff;
+ result = __insn_v2sadau(result, w, 0);
+ buff += 4;
+ }
+#endif
+ }
+ if (len & 2) {
+ result += *(const unsigned short *) buff;
+ buff += 2;
+ }
+ }
+ if (len & 1)
+ result += *buff;
+ result = longto16(result);
+ if (odd)
+ result = swab16(result);
+out:
+ return result;
+}
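The `longto16()` helper above folds the running sum into 16 bits using two SIMD sum-of-halfwords instructions; each `sadh_u`/`v2sadu` against zero adds the two (or four) halfwords of its input. In portable C the same double fold, as also done by the generic `lib/checksum.c`, looks like this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Portable sketch of folding a 32-bit running checksum into 16 bits,
 * equivalent in effect to the two SIMD folds in longto16() above. */
static uint16_t fold_to_16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);	/* first fold: may carry out */
	sum = (sum & 0xffff) + (sum >> 16);	/* second fold absorbs the carry */
	return (uint16_t)sum;
}
```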
diff --git a/arch/tile/lib/cpumask.c b/arch/tile/lib/cpumask.c
new file mode 100644
index 0000000..af745b3
--- /dev/null
+++ b/arch/tile/lib/cpumask.c
@@ -0,0 +1,51 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/ctype.h>
+#include <linux/errno.h>
+
+/*
+ * Allow cropping out bits beyond the end of the array.
+ * Move to "lib" directory if more clients want to use this routine.
+ */
+int bitmap_parselist_crop(const char *bp, unsigned long *maskp, int nmaskbits)
+{
+ unsigned a, b;
+
+ bitmap_zero(maskp, nmaskbits);
+ do {
+ if (!isdigit(*bp))
+ return -EINVAL;
+ a = simple_strtoul(bp, (char **)&bp, 10);
+ b = a;
+ if (*bp == '-') {
+ bp++;
+ if (!isdigit(*bp))
+ return -EINVAL;
+ b = simple_strtoul(bp, (char **)&bp, 10);
+ }
+ if (!(a <= b))
+ return -EINVAL;
+ if (b >= nmaskbits)
+ b = nmaskbits-1;
+ while (a <= b) {
+ set_bit(a, maskp);
+ a++;
+ }
+ if (*bp == ',')
+ bp++;
+ } while (*bp != '\0' && *bp != '\n');
+ return 0;
+}
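The distinguishing behavior of `bitmap_parselist_crop()` above is the cropping: a range that runs past `nmaskbits` is silently clipped rather than rejected. A userspace sketch of the same semantics (illustration only; it returns a `uint64_t` mask instead of a kernel bitmap, so it handles at most 64 bits):

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch of the cropping semantics above: "a-b" ranges past
 * nmaskbits are clipped to the last valid bit rather than rejected. */
static int parselist_crop_demo(const char *bp, uint64_t *mask, int nmaskbits)
{
	*mask = 0;
	do {
		char *end;
		unsigned long a, b;

		if (!isdigit((unsigned char)*bp))
			return -1;
		a = strtoul(bp, &end, 10);
		bp = end;
		b = a;
		if (*bp == '-') {
			bp++;
			if (!isdigit((unsigned char)*bp))
				return -1;
			b = strtoul(bp, &end, 10);
			bp = end;
		}
		if (a > b)
			return -1;
		if (b >= (unsigned long)nmaskbits)
			b = nmaskbits - 1;	/* the "crop" */
		for (; a <= b; a++)
			*mask |= 1ULL << a;
		if (*bp == ',')
			bp++;
	} while (*bp != '\0' && *bp != '\n');
	return 0;
}
```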
diff --git a/arch/tile/lib/delay.c b/arch/tile/lib/delay.c
new file mode 100644
index 0000000..5801b03
--- /dev/null
+++ b/arch/tile/lib/delay.c
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/thread_info.h>
+#include <asm/fixmap.h>
+#include <hv/hypervisor.h>
+
+void __udelay(unsigned long usecs)
+{
+ hv_nanosleep(usecs * 1000);
+}
+EXPORT_SYMBOL(__udelay);
+
+void __ndelay(unsigned long nsecs)
+{
+ hv_nanosleep(nsecs);
+}
+EXPORT_SYMBOL(__ndelay);
+
+/* FIXME: should be declared in a header somewhere. */
+EXPORT_SYMBOL(__delay);
diff --git a/arch/tile/lib/exports.c b/arch/tile/lib/exports.c
new file mode 100644
index 0000000..af8e70e
--- /dev/null
+++ b/arch/tile/lib/exports.c
@@ -0,0 +1,78 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * Exports from assembler code and from libtile-cc.
+ */
+
+#include <linux/module.h>
+
+/* arch/tile/lib/usercopy.S */
+#include <linux/uaccess.h>
+EXPORT_SYMBOL(__get_user_1);
+EXPORT_SYMBOL(__get_user_2);
+EXPORT_SYMBOL(__get_user_4);
+EXPORT_SYMBOL(__put_user_1);
+EXPORT_SYMBOL(__put_user_2);
+EXPORT_SYMBOL(__put_user_4);
+EXPORT_SYMBOL(__put_user_8);
+EXPORT_SYMBOL(strnlen_user_asm);
+EXPORT_SYMBOL(strncpy_from_user_asm);
+EXPORT_SYMBOL(clear_user_asm);
+
+/* arch/tile/kernel/entry.S */
+#include <linux/kernel.h>
+#include <asm/processor.h>
+EXPORT_SYMBOL(current_text_addr);
+EXPORT_SYMBOL(dump_stack);
+
+/* arch/tile/lib/__memcpy.S */
+/* NOTE: on TILE64, these symbols appear in arch/tile/lib/memcpy_tile64.c */
+EXPORT_SYMBOL(memcpy);
+EXPORT_SYMBOL(__copy_to_user_inatomic);
+EXPORT_SYMBOL(__copy_from_user_inatomic);
+EXPORT_SYMBOL(__copy_from_user_zeroing);
+
+/* hypervisor glue */
+#include <hv/hypervisor.h>
+EXPORT_SYMBOL(hv_dev_open);
+EXPORT_SYMBOL(hv_dev_pread);
+EXPORT_SYMBOL(hv_dev_pwrite);
+EXPORT_SYMBOL(hv_dev_close);
+
+/* -ltile-cc */
+uint32_t __udivsi3(uint32_t dividend, uint32_t divisor);
+EXPORT_SYMBOL(__udivsi3);
+int32_t __divsi3(int32_t dividend, int32_t divisor);
+EXPORT_SYMBOL(__divsi3);
+uint64_t __udivdi3(uint64_t dividend, uint64_t divisor);
+EXPORT_SYMBOL(__udivdi3);
+int64_t __divdi3(int64_t dividend, int64_t divisor);
+EXPORT_SYMBOL(__divdi3);
+uint32_t __umodsi3(uint32_t dividend, uint32_t divisor);
+EXPORT_SYMBOL(__umodsi3);
+int32_t __modsi3(int32_t dividend, int32_t divisor);
+EXPORT_SYMBOL(__modsi3);
+uint64_t __umoddi3(uint64_t dividend, uint64_t divisor);
+EXPORT_SYMBOL(__umoddi3);
+int64_t __moddi3(int64_t dividend, int64_t divisor);
+EXPORT_SYMBOL(__moddi3);
+#ifndef __tilegx__
+uint64_t __ll_mul(uint64_t n0, uint64_t n1);
+EXPORT_SYMBOL(__ll_mul);
+#endif
+#ifndef __tilegx__
+int64_t __muldi3(int64_t, int64_t);
+EXPORT_SYMBOL(__muldi3);
+uint64_t __lshrdi3(uint64_t, unsigned int);
+EXPORT_SYMBOL(__lshrdi3);
+#endif
diff --git a/arch/tile/lib/mb_incoherent.S b/arch/tile/lib/mb_incoherent.S
new file mode 100644
index 0000000..989ad7b
--- /dev/null
+++ b/arch/tile/lib/mb_incoherent.S
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * Assembly code for invoking the HV's fence_incoherent syscall.
+ */
+
+#include <linux/linkage.h>
+#include <hv/syscall_public.h>
+#include <arch/abi.h>
+#include <arch/chip.h>
+
+#if !CHIP_HAS_MF_WAITS_FOR_VICTIMS()
+
+/*
+ * Invoke the hypervisor's fence_incoherent syscall, which guarantees
+ * that all victims for cachelines homed on this tile have reached memory.
+ */
+STD_ENTRY(__mb_incoherent)
+ moveli TREG_SYSCALL_NR_NAME, HV_SYS_fence_incoherent
+ swint2
+ jrp lr
+ STD_ENDPROC(__mb_incoherent)
+
+#endif
diff --git a/arch/tile/lib/memchr_32.c b/arch/tile/lib/memchr_32.c
new file mode 100644
index 0000000..6235283
--- /dev/null
+++ b/arch/tile/lib/memchr_32.c
@@ -0,0 +1,68 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/module.h>
+
+void *memchr(const void *s, int c, size_t n)
+{
+ /* Get an aligned pointer. */
+ const uintptr_t s_int = (uintptr_t) s;
+ const uint32_t *p = (const uint32_t *)(s_int & -4);
+
+ /* Create four copies of the byte for which we are looking. */
+ const uint32_t goal = 0x01010101 * (uint8_t) c;
+
+ /* Read the first word, but munge it so that bytes before the array
+ * will not match goal.
+ *
+ * Note that this shift count expression works because we know
+ * shift counts are taken mod 32.
+ */
+ const uint32_t before_mask = (1 << (s_int << 3)) - 1;
+ uint32_t v = (*p | before_mask) ^ (goal & before_mask);
+
+ /* Compute the address of the last byte. */
+ const char *const last_byte_ptr = (const char *)s + n - 1;
+
+ /* Compute the address of the word containing the last byte. */
+ const uint32_t *const last_word_ptr =
+ (const uint32_t *)((uintptr_t) last_byte_ptr & -4);
+
+ uint32_t bits;
+ char *ret;
+
+ if (__builtin_expect(n == 0, 0)) {
+ /* Don't dereference any memory if the array is empty. */
+ return NULL;
+ }
+
+ while ((bits = __insn_seqb(v, goal)) == 0) {
+ if (__builtin_expect(p == last_word_ptr, 0)) {
+ /* We already read the last word in the array,
+ * so give up.
+ */
+ return NULL;
+ }
+ v = *++p;
+ }
+
+ /* We found a match, but it might be in a byte past the end
+ * of the array.
+ */
+ ret = ((char *)p) + (__insn_ctz(bits) >> 3);
+ return (ret <= last_byte_ptr) ? ret : NULL;
+}
+EXPORT_SYMBOL(memchr);
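
[As an aside for reviewers: the memchr above leans on two tile-specific
instructions, __insn_seqb (per-byte equality mask) and __insn_ctz
(count trailing zeros). The same word-at-a-time idea can be sketched in
portable C; the seqb_mask helper below is an illustrative stand-in for
__insn_seqb using the classic SWAR zero-byte trick, not code from this
patch, and the byte indexing assumes a little-endian layout as on the
TILE chips.]

```c
#include <stdint.h>
#include <stddef.h>

/* Portable stand-in for the tile seqb instruction: returns a word with
 * bit 8*k set when byte lane k of a and b match.  Lanes above the first
 * match may carry borrow artifacts from the subtraction, so only the
 * lowest set bit is meaningful -- which is all a first-match scan needs.
 */
static uint32_t seqb_mask(uint32_t a, uint32_t b)
{
	uint32_t x = a ^ b;	/* a zero byte in x <=> bytes equal */
	return ((x - 0x01010101u) & ~x & 0x80808080u) >> 7;
}

/* Simplified word-at-a-time memchr over a word-aligned buffer,
 * assuming little-endian byte order.
 */
static const void *memchr_words(const uint32_t *p, int c, size_t nwords)
{
	/* Four copies of the byte we are looking for, as in the patch. */
	const uint32_t goal = 0x01010101u * (uint8_t)c;
	size_t i;

	for (i = 0; i < nwords; i++) {
		uint32_t bits = seqb_mask(p[i], goal);
		if (bits) {
			/* Locate the lowest set bit; divide by 8 for
			 * the byte lane (stand-in for __insn_ctz >> 3).
			 */
			int tz = 0;
			while (!(bits & 1)) {
				bits >>= 1;
				tz++;
			}
			return (const char *)&p[i] + (tz >> 3);
		}
	}
	return NULL;
}
```

The real implementation additionally masks off bytes before the start of
the (possibly unaligned) buffer and rejects matches past the end; the
sketch keeps only the core per-word scan.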
diff --git a/arch/tile/lib/memcpy_32.S b/arch/tile/lib/memcpy_32.S
new file mode 100644
index 0000000..f92984b
--- /dev/null
+++ b/arch/tile/lib/memcpy_32.S
@@ -0,0 +1,628 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * This file shares the implementation of the userspace memcpy and
+ * the kernel's memcpy, copy_to_user and copy_from_user.
+ */
+
+#include <arch/chip.h>
+
+#if CHIP_HAS_WH64() || defined(MEMCPY_TEST_WH64)
+#define MEMCPY_USE_WH64
+#endif
+
+
+#include <linux/linkage.h>
+
+/* On TILE64, we wrap these functions via arch/tile/lib/memcpy_tile64.c */
+#if !CHIP_HAS_COHERENT_LOCAL_CACHE()
+#define memcpy __memcpy_asm
+#define __copy_to_user_inatomic __copy_to_user_inatomic_asm
+#define __copy_from_user_inatomic __copy_from_user_inatomic_asm
+#define __copy_from_user_zeroing __copy_from_user_zeroing_asm
+#endif
+
+#define IS_MEMCPY 0
+#define IS_COPY_FROM_USER 1
+#define IS_COPY_FROM_USER_ZEROING 2
+#define IS_COPY_TO_USER -1
+
+ .section .text.memcpy_common, "ax"
+ .align 64
+
+/* Use this to preface each bundle that can cause an exception so
+ * the kernel can clean up properly. The special cleanup code should
+ * not use these, since it knows what it is doing.
+ */
+#define EX \
+ .pushsection __ex_table, "a"; \
+ .word 9f, memcpy_common_fixup; \
+ .popsection; \
+ 9
+
+
+/* __copy_from_user_inatomic takes the kernel target address in r0,
+ * the user source in r1, and the bytes to copy in r2.
+ * It returns the number of uncopiable bytes (hopefully zero) in r0.
+ */
+ENTRY(__copy_from_user_inatomic)
+.type __copy_from_user_inatomic, @function
+ FEEDBACK_ENTER_EXPLICIT(__copy_from_user_inatomic, \
+ .text.memcpy_common, \
+ .Lend_memcpy_common - __copy_from_user_inatomic)
+ { movei r29, IS_COPY_FROM_USER; j memcpy_common }
+ .size __copy_from_user_inatomic, . - __copy_from_user_inatomic
+
+/* __copy_from_user_zeroing is like __copy_from_user_inatomic, but
+ * any uncopiable bytes are zeroed in the target.
+ */
+ENTRY(__copy_from_user_zeroing)
+.type __copy_from_user_zeroing, @function
+ FEEDBACK_REENTER(__copy_from_user_inatomic)
+ { movei r29, IS_COPY_FROM_USER_ZEROING; j memcpy_common }
+ .size __copy_from_user_zeroing, . - __copy_from_user_zeroing
+
+/* __copy_to_user_inatomic takes the user target address in r0,
+ * the kernel source in r1, and the bytes to copy in r2.
+ * It returns the number of uncopiable bytes (hopefully zero) in r0.
+ */
+ENTRY(__copy_to_user_inatomic)
+.type __copy_to_user_inatomic, @function
+ FEEDBACK_REENTER(__copy_from_user_inatomic)
+ { movei r29, IS_COPY_TO_USER; j memcpy_common }
+ .size __copy_to_user_inatomic, . - __copy_to_user_inatomic
+
+ENTRY(memcpy)
+.type memcpy, @function
+ FEEDBACK_REENTER(__copy_from_user_inatomic)
+ { movei r29, IS_MEMCPY }
+ .size memcpy, . - memcpy
+ /* Fall through */
+
+ .type memcpy_common, @function
+memcpy_common:
+ /* On entry, r29 holds one of the IS_* macro values from above. */
+
+
+ /* r0 is the dest, r1 is the source, r2 is the size. */
+
+ /* Save aside original dest so we can return it at the end. */
+ { sw sp, lr; move r23, r0; or r4, r0, r1 }
+
+ /* Check for an empty size. */
+ { bz r2, .Ldone; andi r4, r4, 3 }
+
+ /* Save aside original values in case of a fault. */
+ { move r24, r1; move r25, r2 }
+ move r27, lr
+
+ /* Check for an unaligned source or dest. */
+ { bnz r4, .Lcopy_unaligned_maybe_many; addli r4, r2, -256 }
+
+.Lcheck_aligned_copy_size:
+ /* If we are copying < 256 bytes, branch to simple case. */
+ { blzt r4, .Lcopy_8_check; slti_u r8, r2, 8 }
+
+ /* Copying >= 256 bytes, so jump to complex prefetching loop. */
+ { andi r6, r1, 63; j .Lcopy_many }
+
+/*
+ *
+ * Aligned 4 byte at a time copy loop
+ *
+ */
+
+.Lcopy_8_loop:
+ /* Copy two words at a time to hide load latency. */
+EX: { lw r3, r1; addi r1, r1, 4; slti_u r8, r2, 16 }
+EX: { lw r4, r1; addi r1, r1, 4 }
+EX: { sw r0, r3; addi r0, r0, 4; addi r2, r2, -4 }
+EX: { sw r0, r4; addi r0, r0, 4; addi r2, r2, -4 }
+.Lcopy_8_check:
+ { bzt r8, .Lcopy_8_loop; slti_u r4, r2, 4 }
+
+ /* Copy odd leftover word, if any. */
+ { bnzt r4, .Lcheck_odd_stragglers }
+EX: { lw r3, r1; addi r1, r1, 4 }
+EX: { sw r0, r3; addi r0, r0, 4; addi r2, r2, -4 }
+
+.Lcheck_odd_stragglers:
+ { bnz r2, .Lcopy_unaligned_few }
+
+.Ldone:
+ /* For memcpy return original dest address, else zero. */
+ { mz r0, r29, r23; jrp lr }
+
+
+/*
+ *
+ * Prefetching multiple cache line copy handler (for large transfers).
+ *
+ */
+
+ /* Copy words until r1 is cache-line-aligned. */
+.Lalign_loop:
+EX: { lw r3, r1; addi r1, r1, 4 }
+ { andi r6, r1, 63 }
+EX: { sw r0, r3; addi r0, r0, 4; addi r2, r2, -4 }
+.Lcopy_many:
+ { bnzt r6, .Lalign_loop; addi r9, r0, 63 }
+
+ { addi r3, r1, 60; andi r9, r9, -64 }
+
+#ifdef MEMCPY_USE_WH64
+ /* No need to prefetch dst, we'll just do the wh64
+ * right before we copy a line.
+ */
+#endif
+
+EX: { lw r5, r3; addi r3, r3, 64; movei r4, 1 }
+ /* Intentionally stall for a few cycles to leave L2 cache alone. */
+ { bnzt zero, .; move r27, lr }
+EX: { lw r6, r3; addi r3, r3, 64 }
+ /* Intentionally stall for a few cycles to leave L2 cache alone. */
+ { bnzt zero, . }
+EX: { lw r7, r3; addi r3, r3, 64 }
+#ifndef MEMCPY_USE_WH64
+ /* Prefetch the dest */
+ /* Intentionally stall for a few cycles to leave L2 cache alone. */
+ { bnzt zero, . }
+ /* Use a real load to cause a TLB miss if necessary. We aren't using
+ * r28, so this should be fine.
+ */
+EX: { lw r28, r9; addi r9, r9, 64 }
+ /* Intentionally stall for a few cycles to leave L2 cache alone. */
+ { bnzt zero, . }
+ { prefetch r9; addi r9, r9, 64 }
+ /* Intentionally stall for a few cycles to leave L2 cache alone. */
+ { bnzt zero, . }
+ { prefetch r9; addi r9, r9, 64 }
+#endif
+ /* Intentionally stall for a few cycles to leave L2 cache alone. */
+ { bz zero, .Lbig_loop2 }
+
+ /* On entry to this loop:
+ * - r0 points to the start of dst line 0
+ * - r1 points to start of src line 0
+ * - r2 >= (256 - 60), only the first time the loop trips.
+ * - r3 contains r1 + 128 + 60 [pointer to end of source line 2]
+ * This is our prefetch address. When we get near the end
+ * rather than prefetching off the end this is changed to point
+ * to some "safe" recently loaded address.
+ * - r5 contains *(r1 + 60) [i.e. last word of source line 0]
+ * - r6 contains *(r1 + 64 + 60) [i.e. last word of source line 1]
+ * - r9 contains ((r0 + 63) & -64)
+ * [start of next dst cache line.]
+ */
+
+.Lbig_loop:
+ { jal .Lcopy_line2; add r15, r1, r2 }
+
+.Lbig_loop2:
+ /* Copy line 0, first stalling until r5 is ready. */
+EX: { move r12, r5; lw r16, r1 }
+ { bz r4, .Lcopy_8_check; slti_u r8, r2, 8 }
+ /* Prefetch several lines ahead. */
+EX: { lw r5, r3; addi r3, r3, 64 }
+ { jal .Lcopy_line }
+
+ /* Copy line 1, first stalling until r6 is ready. */
+EX: { move r12, r6; lw r16, r1 }
+ { bz r4, .Lcopy_8_check; slti_u r8, r2, 8 }
+ /* Prefetch several lines ahead. */
+EX: { lw r6, r3; addi r3, r3, 64 }
+ { jal .Lcopy_line }
+
+ /* Copy line 2, first stalling until r7 is ready. */
+EX: { move r12, r7; lw r16, r1 }
+ { bz r4, .Lcopy_8_check; slti_u r8, r2, 8 }
+ /* Prefetch several lines ahead. */
+EX: { lw r7, r3; addi r3, r3, 64 }
+ /* Use up a caches-busy cycle by jumping back to the top of the
+ * loop. Might as well get it out of the way now.
+ */
+ { j .Lbig_loop }
+
+
+ /* On entry:
+ * - r0 points to the destination line.
+ * - r1 points to the source line.
+ * - r3 is the next prefetch address.
+ * - r9 holds the last address used for wh64.
+ * - r12 = WORD_15
+ * - r16 = WORD_0.
+ * - r17 == r1 + 16.
+ * - r27 holds saved lr to restore.
+ *
+ * On exit:
+ * - r0 is incremented by 64.
+ * - r1 is incremented by 64, unless that would point to a word
+ * beyond the end of the source array, in which case it is redirected
+ * to point to an arbitrary word already in the cache.
+ * - r2 is decremented by 64.
+ * - r3 is unchanged, unless it points to a word beyond the
+ * end of the source array, in which case it is redirected
+ * to point to an arbitrary word already in the cache.
+ * Redirecting is OK since if we are that close to the end
+ * of the array we will not come back to this subroutine
+ * and use the contents of the prefetched address.
+ * - r4 is nonzero iff r2 >= 64.
+ * - r9 is incremented by 64, unless it points beyond the
+ * end of the last full destination cache line, in which
+ * case it is redirected to a "safe address" that can be
+ * clobbered (sp - 64)
+ * - lr contains the value in r27.
+ */
+
+/* r26 unused */
+
+.Lcopy_line:
+ /* TODO: when r3 goes past the end, we would like to redirect it
+ * to prefetch the last partial cache line (if any) just once, for the
+ * benefit of the final cleanup loop. But we don't want to
+ * prefetch that line more than once, or subsequent prefetches
+ * will go into the RTF. But then .Lbig_loop should unconditionally
+ * branch to top of loop to execute final prefetch, and its
+ * nop should become a conditional branch.
+ */
+
+ /* We need two non-memory cycles here to cover the resources
+ * used by the loads initiated by the caller.
+ */
+ { add r15, r1, r2 }
+.Lcopy_line2:
+ { slt_u r13, r3, r15; addi r17, r1, 16 }
+
+ /* NOTE: this will stall for one cycle as L1 is busy. */
+
+ /* Fill second L1D line. */
+EX: { lw r17, r17; addi r1, r1, 48; mvz r3, r13, r1 } /* r17 = WORD_4 */
+
+#ifdef MEMCPY_TEST_WH64
+ /* Issue a fake wh64 that clobbers the destination words
+ * with random garbage, for testing.
+ */
+ { movei r19, 64; crc32_32 r10, r2, r9 }
+.Lwh64_test_loop:
+EX: { sw r9, r10; addi r9, r9, 4; addi r19, r19, -4 }
+ { bnzt r19, .Lwh64_test_loop; crc32_32 r10, r10, r19 }
+#elif CHIP_HAS_WH64()
+ /* Prepare destination line for writing. */
+EX: { wh64 r9; addi r9, r9, 64 }
+#else
+ /* Prefetch dest line */
+ { prefetch r9; addi r9, r9, 64 }
+#endif
+ /* Load seven words that are L1D hits to cover wh64 L2 usage. */
+
+ /* Load the three remaining words from the last L1D line, which
+ * we know has already filled the L1D.
+ */
+EX: { lw r4, r1; addi r1, r1, 4; addi r20, r1, 16 } /* r4 = WORD_12 */
+EX: { lw r8, r1; addi r1, r1, 4; slt_u r13, r20, r15 }/* r8 = WORD_13 */
+EX: { lw r11, r1; addi r1, r1, -52; mvz r20, r13, r1 } /* r11 = WORD_14 */
+
+ /* Load the three remaining words from the first L1D line, first
+ * stalling until it has filled by "looking at" r16.
+ */
+EX: { lw r13, r1; addi r1, r1, 4; move zero, r16 } /* r13 = WORD_1 */
+EX: { lw r14, r1; addi r1, r1, 4 } /* r14 = WORD_2 */
+EX: { lw r15, r1; addi r1, r1, 8; addi r10, r0, 60 } /* r15 = WORD_3 */
+
+ /* Load second word from the second L1D line, first
+ * stalling until it has filled by "looking at" r17.
+ */
+EX: { lw r19, r1; addi r1, r1, 4; move zero, r17 } /* r19 = WORD_5 */
+
+ /* Store last word to the destination line, potentially dirtying it
+ * for the first time, which keeps the L2 busy for two cycles.
+ */
+EX: { sw r10, r12 } /* store(WORD_15) */
+
+ /* Use two L1D hits to cover the sw L2 access above. */
+EX: { lw r10, r1; addi r1, r1, 4 } /* r10 = WORD_6 */
+EX: { lw r12, r1; addi r1, r1, 4 } /* r12 = WORD_7 */
+
+ /* Fill third L1D line. */
+EX: { lw r18, r1; addi r1, r1, 4 } /* r18 = WORD_8 */
+
+ /* Store first L1D line. */
+EX: { sw r0, r16; addi r0, r0, 4; add r16, r0, r2 } /* store(WORD_0) */
+EX: { sw r0, r13; addi r0, r0, 4; andi r16, r16, -64 } /* store(WORD_1) */
+EX: { sw r0, r14; addi r0, r0, 4; slt_u r16, r9, r16 } /* store(WORD_2) */
+#ifdef MEMCPY_USE_WH64
+EX: { sw r0, r15; addi r0, r0, 4; addi r13, sp, -64 } /* store(WORD_3) */
+#else
+ /* Back up the r9 to a cache line we are already storing to
+ * if it gets past the end of the dest vector. Strictly speaking,
+ * we don't need to back up to the start of a cache line, but it's free
+ * and tidy, so why not?
+ */
+EX: { sw r0, r15; addi r0, r0, 4; andi r13, r0, -64 } /* store(WORD_3) */
+#endif
+ /* Store second L1D line. */
+EX: { sw r0, r17; addi r0, r0, 4; mvz r9, r16, r13 }/* store(WORD_4) */
+EX: { sw r0, r19; addi r0, r0, 4 } /* store(WORD_5) */
+EX: { sw r0, r10; addi r0, r0, 4 } /* store(WORD_6) */
+EX: { sw r0, r12; addi r0, r0, 4 } /* store(WORD_7) */
+
+EX: { lw r13, r1; addi r1, r1, 4; move zero, r18 } /* r13 = WORD_9 */
+EX: { lw r14, r1; addi r1, r1, 4 } /* r14 = WORD_10 */
+EX: { lw r15, r1; move r1, r20 } /* r15 = WORD_11 */
+
+ /* Store third L1D line. */
+EX: { sw r0, r18; addi r0, r0, 4 } /* store(WORD_8) */
+EX: { sw r0, r13; addi r0, r0, 4 } /* store(WORD_9) */
+EX: { sw r0, r14; addi r0, r0, 4 } /* store(WORD_10) */
+EX: { sw r0, r15; addi r0, r0, 4 } /* store(WORD_11) */
+
+ /* Store rest of fourth L1D line. */
+EX: { sw r0, r4; addi r0, r0, 4 } /* store(WORD_12) */
+ {
+EX: sw r0, r8 /* store(WORD_13) */
+ addi r0, r0, 4
+ /* Will r2 be > 64 after we subtract 64 below? */
+ shri r4, r2, 7
+ }
+ {
+EX: sw r0, r11 /* store(WORD_14) */
+ addi r0, r0, 8
+ /* Record 64 bytes successfully copied. */
+ addi r2, r2, -64
+ }
+
+ { jrp lr; move lr, r27 }
+
+ /* Convey to the backtrace library that the stack frame is size
+ * zero, and the real return address is on the stack rather than
+ * in 'lr'.
+ */
+ { info 8 }
+
+ .align 64
+.Lcopy_unaligned_maybe_many:
+ /* Skip the setup overhead if we aren't copying many bytes. */
+ { slti_u r8, r2, 20; sub r4, zero, r0 }
+ { bnzt r8, .Lcopy_unaligned_few; andi r4, r4, 3 }
+ { bz r4, .Ldest_is_word_aligned; add r18, r1, r2 }
+
+/*
+ *
+ * unaligned 4 byte at a time copy handler.
+ *
+ */
+
+ /* Copy single bytes until r0 == 0 mod 4, so we can store words. */
+.Lalign_dest_loop:
+EX: { lb_u r3, r1; addi r1, r1, 1; addi r4, r4, -1 }
+EX: { sb r0, r3; addi r0, r0, 1; addi r2, r2, -1 }
+ { bnzt r4, .Lalign_dest_loop; andi r3, r1, 3 }
+
+ /* If source and dest are now *both* aligned, do an aligned copy. */
+ { bz r3, .Lcheck_aligned_copy_size; addli r4, r2, -256 }
+
+.Ldest_is_word_aligned:
+
+#if CHIP_HAS_DWORD_ALIGN()
+EX: { andi r8, r0, 63; lwadd_na r6, r1, 4}
+ { slti_u r9, r2, 64; bz r8, .Ldest_is_L2_line_aligned }
+
+ /* This copies unaligned words until either there are fewer
+ * than 4 bytes left to copy, or until the destination pointer
+ * is cache-aligned, whichever comes first.
+ *
+ * On entry:
+ * - r0 is the next store address.
+ * - r1 points 4 bytes past the load address corresponding to r0.
+ * - r2 >= 4
+ * - r6 is the next aligned word loaded.
+ */
+.Lcopy_unaligned_src_words:
+EX: { lwadd_na r7, r1, 4; slti_u r8, r2, 4 + 4 }
+ /* stall */
+ { dword_align r6, r7, r1; slti_u r9, r2, 64 + 4 }
+EX: { swadd r0, r6, 4; addi r2, r2, -4 }
+ { bnz r8, .Lcleanup_unaligned_words; andi r8, r0, 63 }
+ { bnzt r8, .Lcopy_unaligned_src_words; move r6, r7 }
+
+ /* On entry:
+ * - r0 is the next store address.
+ * - r1 points 4 bytes past the load address corresponding to r0.
+ * - r2 >= 4 (# of bytes left to store).
+ * - r6 is the next aligned src word value.
+ * - r9 = (r2 < 64U).
+ * - r18 points one byte past the end of source memory.
+ */
+.Ldest_is_L2_line_aligned:
+
+	{
+	/* Fewer than a full cache line remains. */
+	bnz r9, .Lcleanup_unaligned_words
+	move r7, r6
+	}
+
+ /* r2 >= 64 */
+
+ /* Kick off two prefetches, but don't go past the end. */
+ { addi r3, r1, 63 - 4; addi r8, r1, 64 + 63 - 4 }
+ { prefetch r3; move r3, r8; slt_u r8, r8, r18 }
+ { mvz r3, r8, r1; addi r8, r3, 64 }
+ { prefetch r3; move r3, r8; slt_u r8, r8, r18 }
+ { mvz r3, r8, r1; movei r17, 0 }
+
+.Lcopy_unaligned_line:
+ /* Prefetch another line. */
+ { prefetch r3; addi r15, r1, 60; addi r3, r3, 64 }
+ /* Fire off a load of the last word we are about to copy. */
+EX: { lw_na r15, r15; slt_u r8, r3, r18 }
+
+EX: { mvz r3, r8, r1; wh64 r0 }
+
+ /* This loop runs twice.
+ *
+ * On entry:
+ * - r17 is even before the first iteration, and odd before
+ * the second. It is incremented inside the loop. Encountering
+ * an even value at the end of the loop makes it stop.
+ */
+.Lcopy_half_an_unaligned_line:
+EX: {
+ /* Stall until the last byte is ready. In the steady state this
+ * guarantees all words to load below will be in the L2 cache, which
+ * avoids shunting the loads to the RTF.
+ */
+ move zero, r15
+ lwadd_na r7, r1, 16
+ }
+EX: { lwadd_na r11, r1, 12 }
+EX: { lwadd_na r14, r1, -24 }
+EX: { lwadd_na r8, r1, 4 }
+EX: { lwadd_na r9, r1, 4 }
+EX: {
+ lwadd_na r10, r1, 8
+ /* r16 = (r2 < 64), after we subtract 32 from r2 below. */
+ slti_u r16, r2, 64 + 32
+ }
+EX: { lwadd_na r12, r1, 4; addi r17, r17, 1 }
+EX: { lwadd_na r13, r1, 8; dword_align r6, r7, r1 }
+EX: { swadd r0, r6, 4; dword_align r7, r8, r1 }
+EX: { swadd r0, r7, 4; dword_align r8, r9, r1 }
+EX: { swadd r0, r8, 4; dword_align r9, r10, r1 }
+EX: { swadd r0, r9, 4; dword_align r10, r11, r1 }
+EX: { swadd r0, r10, 4; dword_align r11, r12, r1 }
+EX: { swadd r0, r11, 4; dword_align r12, r13, r1 }
+EX: { swadd r0, r12, 4; dword_align r13, r14, r1 }
+EX: { swadd r0, r13, 4; addi r2, r2, -32 }
+ { move r6, r14; bbst r17, .Lcopy_half_an_unaligned_line }
+
+ { bzt r16, .Lcopy_unaligned_line; move r7, r6 }
+
+ /* On entry:
+ * - r0 is the next store address.
+ * - r1 points 4 bytes past the load address corresponding to r0.
+ * - r2 >= 0 (# of bytes left to store).
+ * - r7 is the next aligned src word value.
+ */
+.Lcleanup_unaligned_words:
+ /* Handle any trailing bytes. */
+ { bz r2, .Lcopy_unaligned_done; slti_u r8, r2, 4 }
+ { bzt r8, .Lcopy_unaligned_src_words; move r6, r7 }
+
+ /* Move r1 back to the point where it corresponds to r0. */
+ { addi r1, r1, -4 }
+
+#else /* !CHIP_HAS_DWORD_ALIGN() */
+
+ /* Compute right/left shift counts and load initial source words. */
+ { andi r5, r1, -4; andi r3, r1, 3 }
+EX: { lw r6, r5; addi r5, r5, 4; shli r3, r3, 3 }
+EX: { lw r7, r5; addi r5, r5, 4; sub r4, zero, r3 }
+
+ /* Load and store one word at a time, using shifts and ORs
+ * to correct for the misaligned src.
+ */
+.Lcopy_unaligned_src_loop:
+ { shr r6, r6, r3; shl r8, r7, r4 }
+EX: { lw r7, r5; or r8, r8, r6; move r6, r7 }
+EX: { sw r0, r8; addi r0, r0, 4; addi r2, r2, -4 }
+ { addi r5, r5, 4; slti_u r8, r2, 8 }
+ { bzt r8, .Lcopy_unaligned_src_loop; addi r1, r1, 4 }
+
+ { bz r2, .Lcopy_unaligned_done }
+#endif /* !CHIP_HAS_DWORD_ALIGN() */
+
+ /* Fall through */
+
+/*
+ *
+ * 1 byte at a time copy handler.
+ *
+ */
+
+.Lcopy_unaligned_few:
+EX: { lb_u r3, r1; addi r1, r1, 1 }
+EX: { sb r0, r3; addi r0, r0, 1; addi r2, r2, -1 }
+ { bnzt r2, .Lcopy_unaligned_few }
+
+.Lcopy_unaligned_done:
+
+ /* For memcpy return original dest address, else zero. */
+ { mz r0, r29, r23; jrp lr }
+
+.Lend_memcpy_common:
+ .size memcpy_common, .Lend_memcpy_common - memcpy_common
+
+ .section .fixup,"ax"
+memcpy_common_fixup:
+ .type memcpy_common_fixup, @function
+
+ /* Skip any bytes we already successfully copied.
+ * r2 (num remaining) is correct, but r0 (dst) and r1 (src)
+ * may not be quite right because of unrolling and prefetching.
+ * So we need to recompute their values as the address just
+ * after the last byte we are sure was successfully loaded and
+ * then stored.
+ */
+
+ /* Determine how many bytes we successfully copied. */
+ { sub r3, r25, r2 }
+
+ /* Add this to the original r0 and r1 to get their new values. */
+ { add r0, r23, r3; add r1, r24, r3 }
+
+ { bzt r29, memcpy_fixup_loop }
+ { blzt r29, copy_to_user_fixup_loop }
+
+copy_from_user_fixup_loop:
+ /* Try copying the rest one byte at a time, expecting a load fault. */
+.Lcfu: { lb_u r3, r1; addi r1, r1, 1 }
+ { sb r0, r3; addi r0, r0, 1; addi r2, r2, -1 }
+ { bnzt r2, copy_from_user_fixup_loop }
+
+.Lcopy_from_user_fixup_zero_remainder:
+ { bbs r29, 2f } /* low bit set means IS_COPY_FROM_USER */
+ /* byte-at-a-time loop faulted, so zero the rest. */
+ { move r3, r2; bz r2, 2f /* should be impossible, but handle it. */ }
+1: { sb r0, zero; addi r0, r0, 1; addi r3, r3, -1 }
+ { bnzt r3, 1b }
+2: move lr, r27
+ { move r0, r2; jrp lr }
+
+copy_to_user_fixup_loop:
+ /* Try copying the rest one byte at a time, expecting a store fault. */
+ { lb_u r3, r1; addi r1, r1, 1 }
+.Lctu: { sb r0, r3; addi r0, r0, 1; addi r2, r2, -1 }
+ { bnzt r2, copy_to_user_fixup_loop }
+.Lcopy_to_user_fixup_done:
+ move lr, r27
+ { move r0, r2; jrp lr }
+
+memcpy_fixup_loop:
+ /* Try copying the rest one byte at a time. We expect a disastrous
+ * fault to happen since we are in fixup code, but let it happen.
+ */
+ { lb_u r3, r1; addi r1, r1, 1 }
+ { sb r0, r3; addi r0, r0, 1; addi r2, r2, -1 }
+ { bnzt r2, memcpy_fixup_loop }
+ /* This should be unreachable, we should have faulted again.
+ * But be paranoid and handle it in case some interrupt changed
+ * the TLB or something.
+ */
+ move lr, r27
+ { move r0, r23; jrp lr }
+
+ .size memcpy_common_fixup, . - memcpy_common_fixup
+
+ .section __ex_table,"a"
+ .word .Lcfu, .Lcopy_from_user_fixup_zero_remainder
+ .word .Lctu, .Lcopy_to_user_fixup_done
diff --git a/arch/tile/lib/memcpy_tile64.c b/arch/tile/lib/memcpy_tile64.c
new file mode 100644
index 0000000..4f00473
--- /dev/null
+++ b/arch/tile/lib/memcpy_tile64.c
@@ -0,0 +1,271 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/string.h>
+#include <linux/smp.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <asm/fixmap.h>
+#include <asm/kmap_types.h>
+#include <asm/tlbflush.h>
+#include <hv/hypervisor.h>
+#include <arch/chip.h>
+
+
+#if !CHIP_HAS_COHERENT_LOCAL_CACHE()
+
+/* Defined in memcpy.S */
+extern unsigned long __memcpy_asm(void *to, const void *from, unsigned long n);
+extern unsigned long __copy_to_user_inatomic_asm(
+ void __user *to, const void *from, unsigned long n);
+extern unsigned long __copy_from_user_inatomic_asm(
+ void *to, const void __user *from, unsigned long n);
+extern unsigned long __copy_from_user_zeroing_asm(
+ void *to, const void __user *from, unsigned long n);
+
+typedef unsigned long (*memcpy_t)(void *, const void *, unsigned long);
+
+/* Size above which to consider TLB games for performance */
+#define LARGE_COPY_CUTOFF 2048
+
+/* Communicate to the simulator what we are trying to do. */
+#define sim_allow_multiple_caching(b) \
+ __insn_mtspr(SPR_SIM_CONTROL, \
+ SIM_CONTROL_ALLOW_MULTIPLE_CACHING | ((b) << _SIM_CONTROL_OPERATOR_BITS))
+
+/*
+ * Copy memory by briefly enabling incoherent cacheline-at-a-time mode.
+ *
+ * We set up our own source and destination PTEs that we fully control.
+ * This is the only way to guarantee that we don't race with another
+ * thread that is modifying the PTE; we can't afford to try the
+ * copy_{to,from}_user() technique of catching the interrupt, since
+ * we must run with interrupts disabled to avoid the risk of some
+ * other code seeing the incoherent data in our cache. (Recall that
+ * our cache is indexed by PA, so even if the other code doesn't use
+ * our KM_MEMCPY virtual addresses, they'll still hit in cache using
+ * the normal VAs that aren't supposed to hit in cache.)
+ */
+static void memcpy_multicache(void *dest, const void *source,
+ pte_t dst_pte, pte_t src_pte, int len)
+{
+ int idx, i;
+ unsigned long flags, newsrc, newdst, endsrc;
+ pmd_t *pmdp;
+ pte_t *ptep;
+ int cpu = get_cpu();
+
+ /*
+ * Disable interrupts so that we don't recurse into memcpy()
+ * in an interrupt handler, nor accidentally reference
+ * the PA of the source from an interrupt routine. Also
+ * notify the simulator that we're playing games so we don't
+ * generate spurious coherency warnings.
+ */
+ local_irq_save(flags);
+ sim_allow_multiple_caching(1);
+
+ /* Set up the new dest mapping */
+ idx = FIX_KMAP_BEGIN + (KM_TYPE_NR * cpu) + KM_MEMCPY0;
+ newdst = __fix_to_virt(idx) + ((unsigned long)dest & (PAGE_SIZE-1));
+ pmdp = pmd_offset(pud_offset(pgd_offset_k(newdst), newdst), newdst);
+ ptep = pte_offset_kernel(pmdp, newdst);
+ if (pte_val(*ptep) != pte_val(dst_pte)) {
+ set_pte(ptep, dst_pte);
+ local_flush_tlb_page(NULL, newdst, PAGE_SIZE);
+ }
+
+ /* Set up the new source mapping */
+ idx += (KM_MEMCPY0 - KM_MEMCPY1);
+ src_pte = hv_pte_set_nc(src_pte);
+ src_pte = hv_pte_clear_writable(src_pte); /* be paranoid */
+ newsrc = __fix_to_virt(idx) + ((unsigned long)source & (PAGE_SIZE-1));
+ pmdp = pmd_offset(pud_offset(pgd_offset_k(newsrc), newsrc), newsrc);
+ ptep = pte_offset_kernel(pmdp, newsrc);
+ *ptep = src_pte; /* set_pte() would be confused by this */
+ local_flush_tlb_page(NULL, newsrc, PAGE_SIZE);
+
+ /* Actually move the data. */
+ __memcpy_asm((void *)newdst, (const void *)newsrc, len);
+
+ /*
+ * Remap the source as locally-cached and not OLOC'ed so that
+ * we can inval without also invaling the remote cpu's cache.
+ * This also avoids known errata with inv'ing cacheable oloc data.
+ */
+ src_pte = hv_pte_set_mode(src_pte, HV_PTE_MODE_CACHE_NO_L3);
+ src_pte = hv_pte_set_writable(src_pte); /* need write access for inv */
+ *ptep = src_pte; /* set_pte() would be confused by this */
+ local_flush_tlb_page(NULL, newsrc, PAGE_SIZE);
+
+ /*
+ * Do the actual invalidation, covering the full L2 cache line
+ * at the end since __memcpy_asm() is somewhat aggressive.
+ */
+ __inv_buffer((void *)newsrc, len);
+
+ /*
+ * We're done: notify the simulator that all is back to normal,
+ * and re-enable interrupts and pre-emption.
+ */
+ sim_allow_multiple_caching(0);
+ local_irq_restore(flags);
+ put_cpu_no_resched();
+}
+
+/*
+ * Identify large copies from remotely-cached memory, and copy them
+ * via memcpy_multicache() if they look good, otherwise fall back
+ * to the particular kind of copying passed as the memcpy_t function.
+ */
+static unsigned long fast_copy(void *dest, const void *source, int len,
+ memcpy_t func)
+{
+ /*
+ * Check if it's big enough to bother with. We may end up doing a
+ * small copy via TLB manipulation if we're near a page boundary,
+ * but presumably we'll make it up when we hit the second page.
+ */
+ while (len >= LARGE_COPY_CUTOFF) {
+ int copy_size, bytes_left_on_page;
+ pte_t *src_ptep, *dst_ptep;
+ pte_t src_pte, dst_pte;
+ struct page *src_page, *dst_page;
+
+ /* Is the source page oloc'ed to a remote cpu? */
+retry_source:
+ src_ptep = virt_to_pte(current->mm, (unsigned long)source);
+ if (src_ptep == NULL)
+ break;
+ src_pte = *src_ptep;
+ if (!hv_pte_get_present(src_pte) ||
+ !hv_pte_get_readable(src_pte) ||
+ hv_pte_get_mode(src_pte) != HV_PTE_MODE_CACHE_TILE_L3)
+ break;
+ if (get_remote_cache_cpu(src_pte) == smp_processor_id())
+ break;
+ src_page = pfn_to_page(hv_pte_get_pfn(src_pte));
+ get_page(src_page);
+ if (pte_val(src_pte) != pte_val(*src_ptep)) {
+ put_page(src_page);
+ goto retry_source;
+ }
+ if (pte_huge(src_pte)) {
+ /* Adjust the PTE to correspond to a small page */
+ int pfn = hv_pte_get_pfn(src_pte);
+ pfn += (((unsigned long)source & (HPAGE_SIZE-1))
+ >> PAGE_SHIFT);
+ src_pte = pfn_pte(pfn, src_pte);
+ src_pte = pte_mksmall(src_pte);
+ }
+
+ /* Is the destination page writable? */
+retry_dest:
+ dst_ptep = virt_to_pte(current->mm, (unsigned long)dest);
+ if (dst_ptep == NULL) {
+ put_page(src_page);
+ break;
+ }
+ dst_pte = *dst_ptep;
+ if (!hv_pte_get_present(dst_pte) ||
+ !hv_pte_get_writable(dst_pte)) {
+ put_page(src_page);
+ break;
+ }
+ dst_page = pfn_to_page(hv_pte_get_pfn(dst_pte));
+ if (dst_page == src_page) {
+ /*
+ * Source and dest are on the same page; this
+ * potentially exposes us to incoherence if any
+ * part of src and dest overlap on a cache line.
+ * Just give up rather than trying to be precise.
+ */
+ put_page(src_page);
+ break;
+ }
+ get_page(dst_page);
+ if (pte_val(dst_pte) != pte_val(*dst_ptep)) {
+ put_page(dst_page);
+ goto retry_dest;
+ }
+ if (pte_huge(dst_pte)) {
+ /* Adjust the PTE to correspond to a small page */
+ int pfn = hv_pte_get_pfn(dst_pte);
+ pfn += (((unsigned long)dest & (HPAGE_SIZE-1))
+ >> PAGE_SHIFT);
+ dst_pte = pfn_pte(pfn, dst_pte);
+ dst_pte = pte_mksmall(dst_pte);
+ }
+
+		/* All looks good: create a cacheable PTE and copy from it */
+ copy_size = len;
+ bytes_left_on_page =
+ PAGE_SIZE - (((int)source) & (PAGE_SIZE-1));
+ if (copy_size > bytes_left_on_page)
+ copy_size = bytes_left_on_page;
+ bytes_left_on_page =
+ PAGE_SIZE - (((int)dest) & (PAGE_SIZE-1));
+ if (copy_size > bytes_left_on_page)
+ copy_size = bytes_left_on_page;
+ memcpy_multicache(dest, source, dst_pte, src_pte, copy_size);
+
+ /* Release the pages */
+ put_page(dst_page);
+ put_page(src_page);
+
+ /* Continue on the next page */
+ dest += copy_size;
+ source += copy_size;
+ len -= copy_size;
+ }
+
+ return func(dest, source, len);
+}
+
+void *memcpy(void *to, const void *from, __kernel_size_t n)
+{
+ if (n < LARGE_COPY_CUTOFF)
+ return (void *)__memcpy_asm(to, from, n);
+ else
+ return (void *)fast_copy(to, from, n, __memcpy_asm);
+}
+
+unsigned long __copy_to_user_inatomic(void __user *to, const void *from,
+ unsigned long n)
+{
+ if (n < LARGE_COPY_CUTOFF)
+ return __copy_to_user_inatomic_asm(to, from, n);
+ else
+ return fast_copy(to, from, n, __copy_to_user_inatomic_asm);
+}
+
+unsigned long __copy_from_user_inatomic(void *to, const void __user *from,
+ unsigned long n)
+{
+ if (n < LARGE_COPY_CUTOFF)
+ return __copy_from_user_inatomic_asm(to, from, n);
+ else
+ return fast_copy(to, from, n, __copy_from_user_inatomic_asm);
+}
+
+unsigned long __copy_from_user_zeroing(void *to, const void __user *from,
+ unsigned long n)
+{
+ if (n < LARGE_COPY_CUTOFF)
+ return __copy_from_user_zeroing_asm(to, from, n);
+ else
+ return fast_copy(to, from, n, __copy_from_user_zeroing_asm);
+}
+
+#endif /* !CHIP_HAS_COHERENT_LOCAL_CACHE() */
diff --git a/arch/tile/lib/memmove_32.c b/arch/tile/lib/memmove_32.c
new file mode 100644
index 0000000..f09d8c4
--- /dev/null
+++ b/arch/tile/lib/memmove_32.c
@@ -0,0 +1,63 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/module.h>
+
+void *memmove(void *dest, const void *src, size_t n)
+{
+ if ((const char *)src >= (char *)dest + n
+ || (char *)dest >= (const char *)src + n) {
+ /* We found no overlap, so let memcpy do all the heavy
+ * lifting (prefetching, etc.)
+ */
+ return memcpy(dest, src, n);
+ }
+
+ if (n != 0) {
+ const uint8_t *in;
+ uint8_t x;
+ uint8_t *out;
+ int stride;
+
+ if (src < dest) {
+ /* copy backwards */
+ in = (const uint8_t *)src + n - 1;
+ out = (uint8_t *)dest + n - 1;
+ stride = -1;
+ } else {
+ /* copy forwards */
+ in = (const uint8_t *)src;
+ out = (uint8_t *)dest;
+ stride = 1;
+ }
+
+ /* Manually software-pipeline this loop. */
+ x = *in;
+ in += stride;
+
+ while (--n != 0) {
+ *out = x;
+ out += stride;
+ x = *in;
+ in += stride;
+ }
+
+ *out = x;
+ }
+
+ return dest;
+}
+EXPORT_SYMBOL(memmove);
diff --git a/arch/tile/lib/memset_32.c b/arch/tile/lib/memset_32.c
new file mode 100644
index 0000000..8593bc8
--- /dev/null
+++ b/arch/tile/lib/memset_32.c
@@ -0,0 +1,274 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <arch/chip.h>
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/module.h>
+
+
+void *memset(void *s, int c, size_t n)
+{
+ uint32_t *out32;
+ int n32;
+ uint32_t v16, v32;
+ uint8_t *out8 = s;
+#if !CHIP_HAS_WH64()
+ int ahead32;
+#else
+ int to_align32;
+#endif
+
+ /* Experimentation shows that a trivial tight loop is a win up until
+ * around a size of 20, where writing a word at a time starts to win.
+ */
+#define BYTE_CUTOFF 20
+
+#if BYTE_CUTOFF < 3
+ /* This must be at least this big, or some code later
+ * on doesn't work.
+ */
+#error "BYTE_CUTOFF is too small"
+#endif
+
+ if (n < BYTE_CUTOFF) {
+ /* Strangely, this turns out to be the tightest way to
+ * write this loop.
+ */
+ if (n != 0) {
+ do {
+ /* Strangely, combining these into one line
+ * performs worse.
+ */
+ *out8 = c;
+ out8++;
+ } while (--n != 0);
+ }
+
+ return s;
+ }
+
+#if !CHIP_HAS_WH64()
+ /* Use a spare issue slot to start prefetching the first cache
+ * line early. This instruction is free as the store can be buried
+ * in otherwise idle issue slots doing ALU ops.
+ */
+ __insn_prefetch(out8);
+
+ /* We prefetch the end so that a short memset that spans two cache
+ * lines gets some prefetching benefit. Again we believe this is free
+ * to issue.
+ */
+ __insn_prefetch(&out8[n - 1]);
+#endif /* !CHIP_HAS_WH64() */
+
+
+ /* Align 'out8'. We know n >= 3 so this won't write past the end. */
+ while (((uintptr_t) out8 & 3) != 0) {
+ *out8++ = c;
+ --n;
+ }
+
+ /* Align 'n'. */
+ while (n & 3)
+ out8[--n] = c;
+
+ out32 = (uint32_t *) out8;
+ n32 = n >> 2;
+
+ /* Tile input byte out to 32 bits. */
+ v16 = __insn_intlb(c, c);
+ v32 = __insn_intlh(v16, v16);
+
+ /* This must be at least 8 or the following loop doesn't work. */
+#define CACHE_LINE_SIZE_IN_WORDS (CHIP_L2_LINE_SIZE() / 4)
+
+#if !CHIP_HAS_WH64()
+
+ ahead32 = CACHE_LINE_SIZE_IN_WORDS;
+
+ /* We already prefetched the first and last cache lines, so
+ * we only need to do more prefetching if we are storing
+ * to more than two cache lines.
+ */
+ if (n32 > CACHE_LINE_SIZE_IN_WORDS * 2) {
+ int i;
+
+ /* Prefetch the next several cache lines.
+ * This is the setup code for the software-pipelined
+ * loop below.
+ */
+#define MAX_PREFETCH 5
+ ahead32 = n32 & -CACHE_LINE_SIZE_IN_WORDS;
+ if (ahead32 > MAX_PREFETCH * CACHE_LINE_SIZE_IN_WORDS)
+ ahead32 = MAX_PREFETCH * CACHE_LINE_SIZE_IN_WORDS;
+
+ for (i = CACHE_LINE_SIZE_IN_WORDS;
+ i < ahead32; i += CACHE_LINE_SIZE_IN_WORDS)
+ __insn_prefetch(&out32[i]);
+ }
+
+ if (n32 > ahead32) {
+ while (1) {
+ int j;
+
+ /* Prefetch by reading one word several cache lines
+ * ahead. Since loads are non-blocking this will
+ * cause the full cache line to be read while we are
+ * finishing earlier cache lines. Using a store
+ * here causes microarchitectural performance
+ * problems where a victimizing store miss goes to
+ * the head of the retry FIFO and locks the pipe for
+ * a few cycles. So a few subsequent stores in this
+ * loop go into the retry FIFO, and then later
+ * stores see other stores to the same cache line
+ * are already in the retry FIFO and themselves go
+ * into the retry FIFO, filling it up and grinding
+ * to a halt waiting for the original miss to be
+ * satisfied.
+ */
+ __insn_prefetch(&out32[ahead32]);
+
+#if 1
+#if CACHE_LINE_SIZE_IN_WORDS % 4 != 0
+#error "Unhandled CACHE_LINE_SIZE_IN_WORDS"
+#endif
+
+ n32 -= CACHE_LINE_SIZE_IN_WORDS;
+
+ /* Save icache space by only partially unrolling
+ * this loop.
+ */
+ for (j = CACHE_LINE_SIZE_IN_WORDS / 4; j > 0; j--) {
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ }
+#else
+ /* Unfortunately, due to a code generator flaw this
+ * allocates a separate register for each of these
+ * stores, which requires a large number of spills,
+ * which makes this procedure enormously bigger
+ * (something like 70%)
+ */
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ n32 -= 16;
+#endif
+
+ /* To save compiled code size, reuse this loop even
+ * when we run out of prefetching to do by dropping
+ * ahead32 down.
+ */
+ if (n32 <= ahead32) {
+ /* Not even a full cache line left,
+ * so stop now.
+ */
+ if (n32 < CACHE_LINE_SIZE_IN_WORDS)
+ break;
+
+ /* Choose a small enough value that we don't
+ * prefetch past the end. There's no sense
+ * in touching cache lines we don't have to.
+ */
+ ahead32 = CACHE_LINE_SIZE_IN_WORDS - 1;
+ }
+ }
+ }
+
+#else /* CHIP_HAS_WH64() */
+
+ /* Determine how many words we need to emit before the 'out32'
+ * pointer becomes aligned modulo the cache line size.
+ */
+ to_align32 =
+ (-((uintptr_t)out32 >> 2)) & (CACHE_LINE_SIZE_IN_WORDS - 1);
+
+ /* Only bother aligning and using wh64 if there is at least
+ * one full cache line to process. This check also prevents
+ * overrunning the end of the buffer with alignment words.
+ */
+ if (to_align32 <= n32 - CACHE_LINE_SIZE_IN_WORDS) {
+ int lines_left;
+
+ /* Align out32 mod the cache line size so we can use wh64. */
+ n32 -= to_align32;
+ for (; to_align32 != 0; to_align32--) {
+ *out32 = v32;
+ out32++;
+ }
+
+ /* Use unsigned divide to turn this into a right shift. */
+ lines_left = (unsigned)n32 / CACHE_LINE_SIZE_IN_WORDS;
+
+ do {
+ /* Only wh64 a few lines at a time, so we don't
+ * exceed the maximum number of victim lines.
+ */
+ int x = ((lines_left < CHIP_MAX_OUTSTANDING_VICTIMS())
+ ? lines_left
+ : CHIP_MAX_OUTSTANDING_VICTIMS());
+ uint32_t *wh = out32;
+ int i = x;
+ int j;
+
+ lines_left -= x;
+
+ do {
+ __insn_wh64(wh);
+ wh += CACHE_LINE_SIZE_IN_WORDS;
+ } while (--i);
+
+ for (j = x * (CACHE_LINE_SIZE_IN_WORDS / 4); j != 0; j--) {
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ *out32++ = v32;
+ }
+ } while (lines_left != 0);
+
+ /* We processed all full lines above, so only this many
+ * words remain to be processed.
+ */
+ n32 &= CACHE_LINE_SIZE_IN_WORDS - 1;
+ }
+
+#endif /* CHIP_HAS_WH64() */
+
+ /* Now handle any leftover values. */
+ if (n32 != 0) {
+ do {
+ *out32 = v32;
+ out32++;
+ } while (--n32 != 0);
+ }
+
+ return s;
+}
+EXPORT_SYMBOL(memset);
diff --git a/arch/tile/lib/spinlock_32.c b/arch/tile/lib/spinlock_32.c
new file mode 100644
index 0000000..485e24d
--- /dev/null
+++ b/arch/tile/lib/spinlock_32.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <asm/processor.h>
+
+#include "spinlock_common.h"
+
+void arch_spin_lock(arch_spinlock_t *lock)
+{
+ int my_ticket;
+ int iterations = 0;
+ int delta;
+
+ while ((my_ticket = __insn_tns((void *)&lock->next_ticket)) & 1)
+ delay_backoff(iterations++);
+
+ /* Increment the next ticket number, implicitly releasing tns lock. */
+ lock->next_ticket = my_ticket + TICKET_QUANTUM;
+
+ /* Wait until it's our turn. */
+ while ((delta = my_ticket - lock->current_ticket) != 0)
+ relax((128 / CYCLES_PER_RELAX_LOOP) * delta);
+}
+EXPORT_SYMBOL(arch_spin_lock);
+
+int arch_spin_trylock(arch_spinlock_t *lock)
+{
+ /*
+ * Grab a ticket; no need to retry if it's busy, we'll just
+ * treat that the same as "locked", since someone else
+ * will lock it momentarily anyway.
+ */
+ int my_ticket = __insn_tns((void *)&lock->next_ticket);
+
+ if (my_ticket == lock->current_ticket) {
+ /* Not currently locked, so lock it by keeping this ticket. */
+ lock->next_ticket = my_ticket + TICKET_QUANTUM;
+ /* Success! */
+ return 1;
+ }
+
+ if (!(my_ticket & 1)) {
+ /* Release next_ticket. */
+ lock->next_ticket = my_ticket;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(arch_spin_trylock);
+
+void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+ u32 iterations = 0;
+ while (arch_spin_is_locked(lock))
+ delay_backoff(iterations++);
+}
+EXPORT_SYMBOL(arch_spin_unlock_wait);
+
+/*
+ * The low byte is always reserved to be the marker for a "tns" operation
+ * since the low bit is set to "1" by a tns. The next seven bits are
+ * zeroes. The next byte holds the "next" writer value, i.e. the ticket
+ * available for the next task that wants to write. The third byte holds
+ * the current writer value, i.e. the writer who holds the current ticket.
+ * If current == next == 0, there are no interested writers.
+ */
+#define WR_NEXT_SHIFT _WR_NEXT_SHIFT
+#define WR_CURR_SHIFT _WR_CURR_SHIFT
+#define WR_WIDTH _WR_WIDTH
+#define WR_MASK ((1 << WR_WIDTH) - 1)
+
+/*
+ * The last eight bits hold the active reader count. This has to be
+ * zero before a writer can start to write.
+ */
+#define RD_COUNT_SHIFT _RD_COUNT_SHIFT
+#define RD_COUNT_WIDTH _RD_COUNT_WIDTH
+#define RD_COUNT_MASK ((1 << RD_COUNT_WIDTH) - 1)
+
+
+/* Lock the word, spinning until there are no tns-ers. */
+static inline u32 get_rwlock(arch_rwlock_t *rwlock)
+{
+ u32 iterations = 0;
+ for (;;) {
+ u32 val = __insn_tns((int *)&rwlock->lock);
+ if (unlikely(val & 1)) {
+ delay_backoff(iterations++);
+ continue;
+ }
+ return val;
+ }
+}
+
+int arch_read_trylock_slow(arch_rwlock_t *rwlock)
+{
+ u32 val = get_rwlock(rwlock);
+ int locked = (val << RD_COUNT_WIDTH) == 0;
+ rwlock->lock = val + (locked << RD_COUNT_SHIFT);
+ return locked;
+}
+EXPORT_SYMBOL(arch_read_trylock_slow);
+
+void arch_read_unlock_slow(arch_rwlock_t *rwlock)
+{
+ u32 val = get_rwlock(rwlock);
+ rwlock->lock = val - (1 << RD_COUNT_SHIFT);
+}
+EXPORT_SYMBOL(arch_read_unlock_slow);
+
+void arch_write_unlock_slow(arch_rwlock_t *rwlock, u32 val)
+{
+ u32 eq, mask = 1 << WR_CURR_SHIFT;
+ while (unlikely(val & 1)) {
+ /* Limited backoff since we are the highest-priority task. */
+ relax(4);
+ val = __insn_tns((int *)&rwlock->lock);
+ }
+ val = __insn_addb(val, mask);
+ eq = __insn_seqb(val, val << (WR_CURR_SHIFT - WR_NEXT_SHIFT));
+ val = __insn_mz(eq & mask, val);
+ rwlock->lock = val;
+}
+EXPORT_SYMBOL(arch_write_unlock_slow);
+
+/*
+ * We spin until everything but the reader bits (which are in the high
+ * part of the word) are zero, i.e. no active or waiting writers, no tns.
+ *
+ * ISSUE: This approach can permanently starve readers. A reader who sees
+ * a writer could instead take a ticket lock (just like a writer would),
+ * and atomically enter read mode (with 1 reader) when it gets the ticket.
+ * This way both readers and writers will always make forward progress
+ * in a finite time.
+ */
+void arch_read_lock_slow(arch_rwlock_t *rwlock, u32 val)
+{
+ u32 iterations = 0;
+ do {
+ if (!(val & 1))
+ rwlock->lock = val;
+ delay_backoff(iterations++);
+ val = __insn_tns((int *)&rwlock->lock);
+ } while ((val << RD_COUNT_WIDTH) != 0);
+ rwlock->lock = val + (1 << RD_COUNT_SHIFT);
+}
+EXPORT_SYMBOL(arch_read_lock_slow);
+
+void arch_write_lock_slow(arch_rwlock_t *rwlock, u32 val)
+{
+ /*
+ * The trailing underscore on this variable (and curr_ below)
+ * reminds us that the high bits are garbage; we mask them out
+ * when we compare them.
+ */
+ u32 my_ticket_;
+
+ /* Take out the next ticket; this will also stop would-be readers. */
+ if (val & 1)
+ val = get_rwlock(rwlock);
+ rwlock->lock = __insn_addb(val, 1 << WR_NEXT_SHIFT);
+
+ /* Extract my ticket value from the original word. */
+ my_ticket_ = val >> WR_NEXT_SHIFT;
+
+ /*
+ * Wait until the "current" field matches our ticket, and
+ * there are no remaining readers.
+ */
+ for (;;) {
+ u32 curr_ = val >> WR_CURR_SHIFT;
+ u32 readers = val >> RD_COUNT_SHIFT;
+ u32 delta = ((my_ticket_ - curr_) & WR_MASK) + !!readers;
+ if (likely(delta == 0))
+ break;
+
+ /* Delay based on how many lock-holders are still out there. */
+ relax((256 / CYCLES_PER_RELAX_LOOP) * delta);
+
+ /*
+ * Get a non-tns value to check; we don't need to tns
+ * it ourselves. Since we're not tns'ing, we retry
+ * more rapidly to get a valid value.
+ */
+ while ((val = rwlock->lock) & 1)
+ relax(4);
+ }
+}
+EXPORT_SYMBOL(arch_write_lock_slow);
+
+int __tns_atomic_acquire(atomic_t *lock)
+{
+ int ret;
+ u32 iterations = 0;
+
+ BUG_ON(__insn_mfspr(SPR_INTERRUPT_CRITICAL_SECTION));
+ __insn_mtspr(SPR_INTERRUPT_CRITICAL_SECTION, 1);
+
+ while ((ret = __insn_tns((void *)&lock->counter)) == 1)
+ delay_backoff(iterations++);
+ return ret;
+}
+
+void __tns_atomic_release(atomic_t *p, int v)
+{
+ p->counter = v;
+ __insn_mtspr(SPR_INTERRUPT_CRITICAL_SECTION, 0);
+}
diff --git a/arch/tile/lib/spinlock_common.h b/arch/tile/lib/spinlock_common.h
new file mode 100644
index 0000000..8dffebd
--- /dev/null
+++ b/arch/tile/lib/spinlock_common.h
@@ -0,0 +1,64 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ * This file is included into spinlock_32.c or _64.c.
+ */
+
+/*
+ * The mfspr in __spinlock_relax() is 5 or 6 cycles plus 2 for loop
+ * overhead.
+ */
+#ifdef __tilegx__
+#define CYCLES_PER_RELAX_LOOP 7
+#else
+#define CYCLES_PER_RELAX_LOOP 8
+#endif
+
+/*
+ * Idle the core for CYCLES_PER_RELAX_LOOP * iterations cycles.
+ */
+static inline void
+relax(int iterations)
+{
+ for (/*above*/; iterations > 0; iterations--)
+ __insn_mfspr(SPR_PASS);
+ barrier();
+}
+
+/* Perform bounded exponential backoff. */
+static void delay_backoff(int iterations)
+{
+ u32 exponent, loops;
+
+ /*
+ * 2^exponent is how many times we go around the loop,
+ * which takes 8 cycles. We want to start with a 16- to 31-cycle
+ * loop, so we need to go around minimum 2 = 2^1 times, so we
+ * bias the original value up by 1.
+ */
+ exponent = iterations + 1;
+
+ /*
+ * Don't allow exponent to exceed 8, so we have at most 256 loops,
+ * or 2,048 (to roughly 4,095) cycles, as our maximum.
+ */
+ if (exponent > 8)
+ exponent = 8;
+
+ loops = 1 << exponent;
+
+ /* Add a randomness factor so two cpus never get in lock step. */
+ loops += __insn_crc32_32(stack_pointer, get_cycles_low()) &
+ (loops - 1);
+
+ relax(loops);
+}
diff --git a/arch/tile/lib/strchr_32.c b/arch/tile/lib/strchr_32.c
new file mode 100644
index 0000000..c94e6f7
--- /dev/null
+++ b/arch/tile/lib/strchr_32.c
@@ -0,0 +1,66 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/module.h>
+
+#undef strchr
+
+char *strchr(const char *s, int c)
+{
+ int z, g;
+
+ /* Get an aligned pointer. */
+ const uintptr_t s_int = (uintptr_t) s;
+ const uint32_t *p = (const uint32_t *)(s_int & -4);
+
+ /* Create four copies of the byte for which we are looking. */
+ const uint32_t goal = 0x01010101 * (uint8_t) c;
+
+ /* Read the first aligned word, but force bytes before the string to
+ * match neither zero nor goal (we make sure the high bit of each
+ * byte is 1, and the low 7 bits are all the opposite of the goal
+ * byte).
+ *
+ * Note that this shift count expression works because we know shift
+ * counts are taken mod 32.
+ */
+ const uint32_t before_mask = (1 << (s_int << 3)) - 1;
+ uint32_t v = (*p | before_mask) ^ (goal & __insn_shrib(before_mask, 1));
+
+ uint32_t zero_matches, goal_matches;
+ while (1) {
+ /* Look for a terminating '\0'. */
+ zero_matches = __insn_seqb(v, 0);
+
+ /* Look for the goal byte. */
+ goal_matches = __insn_seqb(v, goal);
+
+ if (__builtin_expect(zero_matches | goal_matches, 0))
+ break;
+
+ v = *++p;
+ }
+
+ z = __insn_ctz(zero_matches);
+ g = __insn_ctz(goal_matches);
+
+ /* If we found c before '\0' we got a match. Note that if c == '\0'
+ * then g == z, and we correctly return the address of the '\0'
+ * rather than NULL.
+ */
+ return (g <= z) ? ((char *)p) + (g >> 3) : NULL;
+}
+EXPORT_SYMBOL(strchr);
diff --git a/arch/tile/lib/strlen_32.c b/arch/tile/lib/strlen_32.c
new file mode 100644
index 0000000..f26f88e
--- /dev/null
+++ b/arch/tile/lib/strlen_32.c
@@ -0,0 +1,36 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/module.h>
+
+size_t strlen(const char *s)
+{
+ /* Get an aligned pointer. */
+ const uintptr_t s_int = (uintptr_t) s;
+ const uint32_t *p = (const uint32_t *)(s_int & -4);
+
+ /* Read the first word, but force bytes before the string to be nonzero.
+ * This expression works because we know shift counts are taken mod 32.
+ */
+ uint32_t v = *p | ((1 << (s_int << 3)) - 1);
+
+ uint32_t bits;
+ while ((bits = __insn_seqb(v, 0)) == 0)
+ v = *++p;
+
+ return ((const char *)p) + (__insn_ctz(bits) >> 3) - s;
+}
+EXPORT_SYMBOL(strlen);
diff --git a/arch/tile/lib/uaccess.c b/arch/tile/lib/uaccess.c
new file mode 100644
index 0000000..9ae1825
--- /dev/null
+++ b/arch/tile/lib/uaccess.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/uaccess.h>
+#include <linux/module.h>
+
+int __range_ok(unsigned long addr, unsigned long size)
+{
+ unsigned long limit = current_thread_info()->addr_limit.seg;
+ __chk_user_ptr(addr);
+ return !((addr < limit && size <= limit - addr) ||
+ is_arch_mappable_range(addr, size));
+}
+EXPORT_SYMBOL(__range_ok);
+
+void copy_from_user_overflow(void)
+{
+ WARN(1, "Buffer overflow detected!\n");
+}
+EXPORT_SYMBOL(copy_from_user_overflow);
diff --git a/arch/tile/lib/usercopy_32.S b/arch/tile/lib/usercopy_32.S
new file mode 100644
index 0000000..979f76d
--- /dev/null
+++ b/arch/tile/lib/usercopy_32.S
@@ -0,0 +1,223 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/linkage.h>
+#include <asm/errno.h>
+#include <asm/cache.h>
+#include <arch/chip.h>
+
+/* Access user memory, but use MMU to avoid propagating kernel exceptions. */
+
+ .pushsection .fixup,"ax"
+
+get_user_fault:
+ { move r0, zero; move r1, zero }
+ { movei r2, -EFAULT; jrp lr }
+ ENDPROC(get_user_fault)
+
+put_user_fault:
+ { movei r0, -EFAULT; jrp lr }
+ ENDPROC(put_user_fault)
+
+ .popsection
+
+/*
+ * __get_user_N functions take a pointer in r0, and return 0 in r2
+ * on success, with the value in r0; or else -EFAULT in r2.
+ */
+#define __get_user_N(bytes, LOAD) \
+ STD_ENTRY(__get_user_##bytes); \
+1: { LOAD r0, r0; move r1, zero; move r2, zero }; \
+ jrp lr; \
+ STD_ENDPROC(__get_user_##bytes); \
+ .pushsection __ex_table,"a"; \
+ .word 1b, get_user_fault; \
+ .popsection
+
+__get_user_N(1, lb_u)
+__get_user_N(2, lh_u)
+__get_user_N(4, lw)
+
+/*
+ * __get_user_8 takes a pointer in r0, and returns 0 in r2
+ * on success, with the value in r0/r1; or else -EFAULT in r2.
+ */
+ STD_ENTRY(__get_user_8);
+1: { lw r0, r0; addi r1, r0, 4 };
+2: { lw r1, r1; move r2, zero };
+ jrp lr;
+ STD_ENDPROC(__get_user_8);
+ .pushsection __ex_table,"a";
+ .word 1b, get_user_fault;
+ .word 2b, get_user_fault;
+ .popsection
+
+/*
+ * __put_user_N functions take a value in r0 and a pointer in r1,
+ * and return 0 in r0 on success or -EFAULT on failure.
+ */
+#define __put_user_N(bytes, STORE) \
+ STD_ENTRY(__put_user_##bytes); \
+1: { STORE r1, r0; move r0, zero }; \
+ jrp lr; \
+ STD_ENDPROC(__put_user_##bytes); \
+ .pushsection __ex_table,"a"; \
+ .word 1b, put_user_fault; \
+ .popsection
+
+__put_user_N(1, sb)
+__put_user_N(2, sh)
+__put_user_N(4, sw)
+
+/*
+ * __put_user_8 takes a value in r0/r1 and a pointer in r2,
+ * and returns 0 in r0 on success or -EFAULT on failure.
+ */
+STD_ENTRY(__put_user_8)
+1: { sw r2, r0; addi r2, r2, 4 }
+2: { sw r2, r1; move r0, zero }
+ jrp lr
+ STD_ENDPROC(__put_user_8)
+ .pushsection __ex_table,"a"
+ .word 1b, put_user_fault
+ .word 2b, put_user_fault
+ .popsection
+
+
+/*
+ * strnlen_user_asm takes the pointer in r0, and the length bound in r1.
+ * It returns the length, including the terminating NUL, or zero on exception.
+ * If length is greater than the bound, returns one plus the bound.
+ */
+STD_ENTRY(strnlen_user_asm)
+ { bz r1, 2f; addi r3, r0, -1 } /* bias down to include NUL */
+1: { lb_u r4, r0; addi r1, r1, -1 }
+ bz r4, 2f
+ { bnzt r1, 1b; addi r0, r0, 1 }
+2: { sub r0, r0, r3; jrp lr }
+ STD_ENDPROC(strnlen_user_asm)
+ .pushsection .fixup,"ax"
+strnlen_user_fault:
+ { move r0, zero; jrp lr }
+ ENDPROC(strnlen_user_fault)
+ .section __ex_table,"a"
+ .word 1b, strnlen_user_fault
+ .popsection
+
+/*
+ * strncpy_from_user_asm takes the kernel target pointer in r0,
+ * the userspace source pointer in r1, and the length bound (including
+ * the trailing NUL) in r2. On success, it returns the string length
+ * (not including the trailing NUL), or -EFAULT on failure.
+ */
+STD_ENTRY(strncpy_from_user_asm)
+ { bz r2, 2f; move r3, r0 }
+1: { lb_u r4, r1; addi r1, r1, 1; addi r2, r2, -1 }
+ { sb r0, r4; addi r0, r0, 1 }
+ bz r2, 2f
+ bnzt r4, 1b
+ addi r0, r0, -1 /* don't count the trailing NUL */
+2: { sub r0, r0, r3; jrp lr }
+ STD_ENDPROC(strncpy_from_user_asm)
+ .pushsection .fixup,"ax"
+strncpy_from_user_fault:
+ { movei r0, -EFAULT; jrp lr }
+ ENDPROC(strncpy_from_user_fault)
+ .section __ex_table,"a"
+ .word 1b, strncpy_from_user_fault
+ .popsection
+
+/*
+ * clear_user_asm takes the user target address in r0 and the
+ * number of bytes to zero in r1.
+ * It returns the number of uncopiable bytes (hopefully zero) in r0.
+ * Note that we don't use a separate .fixup section here since we fall
+ * through into the "fixup" code as the last straight-line bundle anyway.
+ */
+STD_ENTRY(clear_user_asm)
+ { bz r1, 2f; or r2, r0, r1 }
+ andi r2, r2, 3
+ bzt r2, .Lclear_aligned_user_asm
+1: { sb r0, zero; addi r0, r0, 1; addi r1, r1, -1 }
+ bnzt r1, 1b
+2: { move r0, r1; jrp lr }
+ .pushsection __ex_table,"a"
+ .word 1b, 2b
+ .popsection
+
+.Lclear_aligned_user_asm:
+1: { sw r0, zero; addi r0, r0, 4; addi r1, r1, -4 }
+ bnzt r1, 1b
+2: { move r0, r1; jrp lr }
+ STD_ENDPROC(clear_user_asm)
+ .pushsection __ex_table,"a"
+ .word 1b, 2b
+ .popsection
+
+/*
+ * flush_user_asm takes the user target address in r0 and the
+ * number of bytes to flush in r1.
+ * It returns the number of unflushable bytes (hopefully zero) in r0.
+ */
+STD_ENTRY(flush_user_asm)
+ bz r1, 2f
+ { movei r2, L2_CACHE_BYTES; add r1, r0, r1 }
+ { sub r2, zero, r2; addi r1, r1, L2_CACHE_BYTES-1 }
+ { and r0, r0, r2; and r1, r1, r2 }
+ { sub r1, r1, r0 }
+1: { flush r0; addi r1, r1, -CHIP_FLUSH_STRIDE() }
+ { addi r0, r0, CHIP_FLUSH_STRIDE(); bnzt r1, 1b }
+2: { move r0, r1; jrp lr }
+ STD_ENDPROC(flush_user_asm)
+ .pushsection __ex_table,"a"
+ .word 1b, 2b
+ .popsection
+
+/*
+ * inv_user_asm takes the user target address in r0 and the
+ * number of bytes to invalidate in r1.
+ * It returns the number of not inv'able bytes (hopefully zero) in r0.
+ */
+STD_ENTRY(inv_user_asm)
+ bz r1, 2f
+ { movei r2, L2_CACHE_BYTES; add r1, r0, r1 }
+ { sub r2, zero, r2; addi r1, r1, L2_CACHE_BYTES-1 }
+ { and r0, r0, r2; and r1, r1, r2 }
+ { sub r1, r1, r0 }
+1: { inv r0; addi r1, r1, -CHIP_INV_STRIDE() }
+ { addi r0, r0, CHIP_INV_STRIDE(); bnzt r1, 1b }
+2: { move r0, r1; jrp lr }
+ STD_ENDPROC(inv_user_asm)
+ .pushsection __ex_table,"a"
+ .word 1b, 2b
+ .popsection
+
+/*
+ * finv_user_asm takes the user target address in r0 and the
+ * number of bytes to flush-invalidate in r1.
+ * It returns the number of not finv'able bytes (hopefully zero) in r0.
+ */
+STD_ENTRY(finv_user_asm)
+ bz r1, 2f
+ { movei r2, L2_CACHE_BYTES; add r1, r0, r1 }
+ { sub r2, zero, r2; addi r1, r1, L2_CACHE_BYTES-1 }
+ { and r0, r0, r2; and r1, r1, r2 }
+ { sub r1, r1, r0 }
+1: { finv r0; addi r1, r1, -CHIP_FINV_STRIDE() }
+ { addi r0, r0, CHIP_FINV_STRIDE(); bnzt r1, 1b }
+2: { move r0, r1; jrp lr }
+ STD_ENDPROC(finv_user_asm)
+ .pushsection __ex_table,"a"
+ .word 1b, 2b
+ .popsection
--
1.6.5.2

Chris Metcalf

May 28, 2010, 11:40:02 PM
Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---
arch/tile/mm/Makefile | 9 +
arch/tile/mm/elf.c | 164 +++++++
arch/tile/mm/extable.c | 30 ++
arch/tile/mm/fault.c | 905 ++++++++++++++++++++++++++++++++++++
arch/tile/mm/highmem.c | 328 ++++++++++++++
arch/tile/mm/homecache.c | 445 ++++++++++++++++++
arch/tile/mm/hugetlbpage.c | 343 ++++++++++++++
arch/tile/mm/init.c | 1082 ++++++++++++++++++++++++++++++++++++++++++++
arch/tile/mm/migrate.h | 50 ++
arch/tile/mm/migrate_32.S | 211 +++++++++
arch/tile/mm/mmap.c | 75 +++
arch/tile/mm/pgtable.c | 566 +++++++++++++++++++++++
12 files changed, 4208 insertions(+), 0 deletions(-)
create mode 100644 arch/tile/mm/Makefile
create mode 100644 arch/tile/mm/elf.c
create mode 100644 arch/tile/mm/extable.c
create mode 100644 arch/tile/mm/fault.c
create mode 100644 arch/tile/mm/highmem.c
create mode 100644 arch/tile/mm/homecache.c
create mode 100644 arch/tile/mm/hugetlbpage.c
create mode 100644 arch/tile/mm/init.c
create mode 100644 arch/tile/mm/migrate.h
create mode 100644 arch/tile/mm/migrate_32.S
create mode 100644 arch/tile/mm/mmap.c
create mode 100644 arch/tile/mm/pgtable.c

diff --git a/arch/tile/mm/Makefile b/arch/tile/mm/Makefile
new file mode 100644
index 0000000..e252aed
--- /dev/null
+++ b/arch/tile/mm/Makefile
@@ -0,0 +1,9 @@
+#
+# Makefile for the linux tile-specific parts of the memory manager.
+#
+
+obj-y := init.o pgtable.o fault.o extable.o elf.o \
+ mmap.o homecache.o migrate_$(BITS).o
+
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
+obj-$(CONFIG_HIGHMEM) += highmem.o
diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
new file mode 100644
index 0000000..818c9be
--- /dev/null
+++ b/arch/tile/mm/elf.c
@@ -0,0 +1,164 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/binfmts.h>
+#include <linux/compat.h>
+#include <linux/mman.h>
+#include <linux/elf.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+
+/* Notify a running simulator, if any, that an exec just occurred. */
+static void sim_notify_exec(const char *binary_name)
+{
+ unsigned char c;
+ do {
+ c = *binary_name++;
+ __insn_mtspr(SPR_SIM_CONTROL,
+ (SIM_CONTROL_OS_EXEC
+ | (c << _SIM_CONTROL_OPERATOR_BITS)));
+
+ } while (c);
+}
+
+static int notify_exec(void)
+{
+ int retval = 0; /* failure */
+ struct vm_area_struct *vma = current->mm->mmap;
+ while (vma) {
+ if ((vma->vm_flags & VM_EXECUTABLE) && vma->vm_file)
+ break;
+ vma = vma->vm_next;
+ }
+ if (vma) {
+ char *buf = (char *) __get_free_page(GFP_KERNEL);
+ if (buf) {
+ char *path = d_path(&vma->vm_file->f_path,
+ buf, PAGE_SIZE);
+ if (!IS_ERR(path)) {
+ sim_notify_exec(path);
+ retval = 1;
+ }
+ free_page((unsigned long)buf);
+ }
+ }
+ return retval;
+}
+
+/* Notify a running simulator, if any, that we loaded an interpreter. */
+static void sim_notify_interp(unsigned long load_addr)
+{
+ size_t i;
+ for (i = 0; i < sizeof(load_addr); i++) {
+ unsigned char c = load_addr >> (i * 8);
+ __insn_mtspr(SPR_SIM_CONTROL,
+ (SIM_CONTROL_OS_INTERP
+ | (c << _SIM_CONTROL_OPERATOR_BITS)));
+ }
+}
+
+
+/* Kernel address of page used to map read-only kernel data into userspace. */
+static void *vdso_page;
+
+/* One-entry array used for install_special_mapping. */
+static struct page *vdso_pages[1];
+
+int __init vdso_setup(void)
+{
+ extern char __rt_sigreturn[], __rt_sigreturn_end[];
+ vdso_page = (void *)get_zeroed_page(GFP_ATOMIC);
+ memcpy(vdso_page, __rt_sigreturn, __rt_sigreturn_end - __rt_sigreturn);
+ vdso_pages[0] = virt_to_page(vdso_page);
+ return 0;
+}
+device_initcall(vdso_setup);
+
+const char *arch_vma_name(struct vm_area_struct *vma)
+{
+ if (vma->vm_private_data == vdso_pages)
+ return "[vdso]";
+#ifndef __tilegx__
+ if (vma->vm_start == MEM_USER_INTRPT)
+ return "[intrpt]";
+#endif
+ return NULL;
+}
+
+int arch_setup_additional_pages(struct linux_binprm *bprm,
+ int executable_stack)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned long vdso_base;
+ int retval = 0;
+
+ /*
+ * Notify the simulator that an exec just occurred.
+ * If we can't find the filename of the mapping, just use
+ * whatever was passed as the linux_binprm filename.
+ */
+ if (!notify_exec())
+ sim_notify_exec(bprm->filename);
+
+ down_write(&mm->mmap_sem);
+
+ /*
+ * MAYWRITE to allow gdb to COW and set breakpoints
+ *
+ * Make sure the vDSO gets into every core dump. Dumping its
+ * contents makes post-mortem fully interpretable later
+ * without matching up the same kernel and hardware config to
+ * see what PC values meant.
+ */
+ vdso_base = VDSO_BASE;
+ retval = install_special_mapping(mm, vdso_base, PAGE_SIZE,
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+ VM_ALWAYSDUMP,
+ vdso_pages);
+
+#ifndef __tilegx__
+ /*
+ * Set up a user-interrupt mapping here; the user can't
+ * create one themselves since it is above TASK_SIZE.
+ * We make it unwritable by default, so the model for adding
+ * interrupt vectors always involves an mprotect.
+ */
+ if (!retval) {
+ unsigned long addr = MEM_USER_INTRPT;
+ addr = mmap_region(NULL, addr, INTRPT_SIZE,
+ MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
+ if (addr > (unsigned long) -PAGE_SIZE)
+ retval = (int) addr;
+ }
+#endif
+
+ up_write(&mm->mmap_sem);
+
+ return retval;
+}
+
+
+void elf_plat_init(struct pt_regs *regs, unsigned long load_addr)
+{
+ /* Zero all registers. */
+ memset(regs, 0, sizeof(*regs));
+
+ /* Report the interpreter's load address. */
+ sim_notify_interp(load_addr);
+}
diff --git a/arch/tile/mm/extable.c b/arch/tile/mm/extable.c
new file mode 100644
index 0000000..4fb0acb
--- /dev/null
+++ b/arch/tile/mm/extable.c
@@ -0,0 +1,30 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+
+int fixup_exception(struct pt_regs *regs)
+{
+ const struct exception_table_entry *fixup;
+
+ fixup = search_exception_tables(regs->pc);
+ if (fixup) {
+ regs->pc = fixup->fixup;
+ return 1;
+ }
+
+ return 0;
+}
diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
new file mode 100644
index 0000000..9b6b92f
--- /dev/null
+++ b/arch/tile/mm/fault.c
@@ -0,0 +1,905 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * From i386 code copyright (C) 1995 Linus Torvalds
+ */
+
+#include <linux/signal.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/ptrace.h>
+#include <linux/mman.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/smp_lock.h>
+#include <linux/interrupt.h>
+#include <linux/init.h>
+#include <linux/tty.h>
+#include <linux/vt_kern.h> /* For unblank_screen() */
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/kprobes.h>
+#include <linux/hugetlb.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+
+#include <asm/system.h>
+#include <asm/pgalloc.h>
+#include <asm/sections.h>
+
+#include <arch/interrupts.h>
+
+/*
+ * Unlock any spinlocks which will prevent us from getting the
+ * message out
+ */
+void bust_spinlocks(int yes)
+{
+ int loglevel_save = console_loglevel;
+
+ if (yes) {
+ oops_in_progress = 1;
+ return;
+ }
+ oops_in_progress = 0;
+ /*
+ * OK, the message is on the console. Now we call printk()
+ * without oops_in_progress set so that printk will give klogd
+ * a poke. Hold onto your hats...
+ */
+ console_loglevel = 15; /* NMI oopser may have shut the console up */
+ printk(" ");
+ console_loglevel = loglevel_save;
+}
+
+static noinline void force_sig_info_fault(int si_signo, int si_code,
+ unsigned long address, int fault_num, struct task_struct *tsk)
+{
+ siginfo_t info;
+
+ if (unlikely(tsk->pid < 2)) {
+ panic("Signal %d (code %d) at %#lx sent to %s!",
+ si_signo, si_code & 0xffff, address,
+ tsk->pid ? "init" : "the idle task");
+ }
+
+ info.si_signo = si_signo;
+ info.si_errno = 0;
+ info.si_code = si_code;
+ info.si_addr = (void __user *)address;
+ info.si_trapno = fault_num;
+ force_sig_info(si_signo, &info, tsk);
+}
+
+#ifndef __tilegx__
+/*
+ * Synthesize the fault a PL0 process would get by doing a word-load of
+ * an unaligned address or a high kernel address. Called indirectly
+ * from sys_cmpxchg() in kernel/intvec.S.
+ */
+int _sys_cmpxchg_badaddr(unsigned long address, struct pt_regs *regs)
+{
+ if (address >= PAGE_OFFSET)
+ force_sig_info_fault(SIGSEGV, SEGV_MAPERR, address,
+ INT_DTLB_MISS, current);
+ else
+ force_sig_info_fault(SIGBUS, BUS_ADRALN, address,
+ INT_UNALIGN_DATA, current);
+
+ /*
+ * Adjust pc to point at the actual instruction, which is unusual
+ * for syscalls normally, but is appropriate when we are claiming
+ * that a syscall swint1 caused a page fault or bus error.
+ */
+ regs->pc -= 8;
+
+ /*
+ * Mark this as a caller-save interrupt, like a normal page fault,
+ * so that when we go through the signal handler path we will
+ * properly restore r0, r1, and r2 for the signal handler arguments.
+ */
+ regs->flags |= PT_FLAGS_CALLER_SAVES;
+
+ return 0;
+}
+#endif
+
+static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
+{
+ unsigned index = pgd_index(address);
+ pgd_t *pgd_k;
+ pud_t *pud, *pud_k;
+ pmd_t *pmd, *pmd_k;
+
+ pgd += index;
+ pgd_k = init_mm.pgd + index;
+
+ if (!pgd_present(*pgd_k))
+ return NULL;
+
+ pud = pud_offset(pgd, address);
+ pud_k = pud_offset(pgd_k, address);
+ if (!pud_present(*pud_k))
+ return NULL;
+
+ pmd = pmd_offset(pud, address);
+ pmd_k = pmd_offset(pud_k, address);
+ if (!pmd_present(*pmd_k))
+ return NULL;
+ if (!pmd_present(*pmd)) {
+ set_pmd(pmd, *pmd_k);
+ arch_flush_lazy_mmu_mode();
+ } else
+ BUG_ON(pmd_ptfn(*pmd) != pmd_ptfn(*pmd_k));
+ return pmd_k;
+}
+
+/*
+ * Handle a fault on the vmalloc or module mapping area
+ */
+static inline int vmalloc_fault(pgd_t *pgd, unsigned long address)
+{
+ pmd_t *pmd_k;
+ pte_t *pte_k;
+
+ /* Make sure we are in vmalloc area */
+ if (!(address >= VMALLOC_START && address < VMALLOC_END))
+ return -1;
+
+ /*
+ * Synchronize this task's top level page-table
+ * with the 'reference' page table.
+ */
+ pmd_k = vmalloc_sync_one(pgd, address);
+ if (!pmd_k)
+ return -1;
+ if (pmd_huge(*pmd_k))
+ return 0; /* support TILE huge_vmap() API */
+ pte_k = pte_offset_kernel(pmd_k, address);
+ if (!pte_present(*pte_k))
+ return -1;
+ return 0;
+}
+
+/* Wait until this PTE has completed migration. */
+static void wait_for_migration(pte_t *pte)
+{
+ if (pte_migrating(*pte)) {
+ /*
+ * Wait until the migrator fixes up this pte.
+ * We scale the loop count by the clock rate so we'll wait for
+ * a few seconds here.
+ */
+ int retries = 0;
+ int bound = get_clock_rate();
+ while (pte_migrating(*pte)) {
+ barrier();
+ if (++retries > bound)
+ panic("Hit migrating PTE (%#llx) and"
+ " page PFN %#lx still migrating",
+ pte->val, pte_pfn(*pte));
+ }
+ }
+}
+
+/*
+ * It's not generally safe to use "current" to get the page table pointer,
+ * since we might be running an oprofile interrupt in the middle of a
+ * task switch.
+ */
+static pgd_t *get_current_pgd(void)
+{
+ HV_Context ctx = hv_inquire_context();
+ unsigned long pgd_pfn = ctx.page_table >> PAGE_SHIFT;
+ struct page *pgd_page = pfn_to_page(pgd_pfn);
+ BUG_ON(PageHighMem(pgd_page)); /* oops, HIGHPTE? */
+ return (pgd_t *) __va(ctx.page_table);
+}
+
+/*
+ * We can receive a page fault from a migrating PTE at any time.
+ * Handle it by just waiting until the fault resolves.
+ *
+ * It's also possible to get a migrating kernel PTE that resolves
+ * itself during the downcall from hypervisor to Linux. We just check
+ * here to see if the PTE seems valid, and if so we retry it.
+ *
+ * NOTE! We MUST NOT take any locks for this case. We may be in an
+ * interrupt or a critical region, and must do as little as possible.
+ * Similarly, we can't use atomic ops here, since we may be handling a
+ * fault caused by an atomic op access.
+ */
+static int handle_migrating_pte(pgd_t *pgd, int fault_num,
+ unsigned long address,
+ int is_kernel_mode, int write)
+{
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+ pte_t pteval;
+
+ if (pgd_addr_invalid(address))
+ return 0;
+
+ pgd += pgd_index(address);
+ pud = pud_offset(pgd, address);
+ if (!pud || !pud_present(*pud))
+ return 0;
+ pmd = pmd_offset(pud, address);
+ if (!pmd || !pmd_present(*pmd))
+ return 0;
+ pte = pmd_huge_page(*pmd) ? ((pte_t *)pmd) :
+ pte_offset_kernel(pmd, address);
+ pteval = *pte;
+ if (pte_migrating(pteval)) {
+ wait_for_migration(pte);
+ return 1;
+ }
+
+ if (!is_kernel_mode || !pte_present(pteval))
+ return 0;
+ if (fault_num == INT_ITLB_MISS) {
+ if (pte_exec(pteval))
+ return 1;
+ } else if (write) {
+ if (pte_write(pteval))
+ return 1;
+ } else {
+ if (pte_read(pteval))
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * This routine is responsible for faulting in user pages.
+ * It passes the work off to one of the appropriate routines.
+ * It returns true if the fault was successfully handled.
+ */
+static int handle_page_fault(struct pt_regs *regs,
+ int fault_num,
+ int is_page_fault,
+ unsigned long address,
+ int write)
+{
+ struct task_struct *tsk;
+ struct mm_struct *mm;
+ struct vm_area_struct *vma;
+ unsigned long stack_offset;
+ int fault;
+ int si_code;
+ int is_kernel_mode;
+ pgd_t *pgd;
+
+ /* on TILE, protection faults are always writes */
+ if (!is_page_fault)
+ write = 1;
+
+ is_kernel_mode = (EX1_PL(regs->ex1) != USER_PL);
+
+ tsk = validate_current();
+
+ /*
+ * Check to see if we might be overwriting the stack, and bail
+ * out if so. The page fault code is a relatively likely
+ * place to get trapped in an infinite regress, and once we
+ * overwrite the whole stack, it becomes very hard to recover.
+ */
+ stack_offset = stack_pointer & (THREAD_SIZE-1);
+ if (stack_offset < THREAD_SIZE / 8) {
+ printk(KERN_ALERT "Potential stack overrun: sp %#lx\n",
+ stack_pointer);
+ show_regs(regs);
+ printk(KERN_ALERT "Killing current process %d/%s\n",
+ tsk->pid, tsk->comm);
+ do_group_exit(SIGKILL);
+ }
+
+ /*
+ * Early on, we need to check for migrating PTE entries;
+ * see homecache.c. If we find a migrating PTE, we wait until
+ * the backing page claims to be done migrating, then we proceed.
+ * For kernel PTEs, we rewrite the PTE and return and retry.
+ * Otherwise, we treat the fault like a normal "no PTE" fault,
+ * rather than trying to patch up the existing PTE.
+ */
+ pgd = get_current_pgd();
+ if (handle_migrating_pte(pgd, fault_num, address,
+ is_kernel_mode, write))
+ return 1;
+
+ si_code = SEGV_MAPERR;
+
+ /*
+ * We fault-in kernel-space virtual memory on-demand. The
+ * 'reference' page table is init_mm.pgd.
+ *
+ * NOTE! We MUST NOT take any locks for this case. We may
+ * be in an interrupt or a critical region, and should
+ * only copy the information from the master page table,
+ * nothing more.
+ *
+ * This verifies that the fault happens in kernel space
+ * and that the fault was not a protection fault.
+ */
+ if (unlikely(address >= TASK_SIZE &&
+ !is_arch_mappable_range(address, 0))) {
+ if (is_kernel_mode && is_page_fault &&
+ vmalloc_fault(pgd, address) >= 0)
+ return 1;
+ /*
+ * Don't take the mm semaphore here. If we fixup a prefetch
+ * fault we could otherwise deadlock.
+ */
+ mm = NULL; /* happy compiler */
+ vma = NULL;
+ goto bad_area_nosemaphore;
+ }
+
+ /*
+ * If we're trying to touch user-space addresses, we must
+ * be either at PL0, or else with interrupts enabled in the
+ * kernel, so either way we can re-enable interrupts here.
+ */
+ local_irq_enable();
+
+ mm = tsk->mm;
+
+ /*
+ * If we're in an interrupt, have no user context or are running in an
+ * atomic region then we must not take the fault.
+ */
+ if (in_atomic() || !mm) {
+ vma = NULL; /* happy compiler */
+ goto bad_area_nosemaphore;
+ }
+
+ /*
+ * When running in the kernel we expect faults to occur only to
+ * addresses in user space. All other faults represent errors in the
+ * kernel and should generate an OOPS. Unfortunately, in the case of an
+ * erroneous fault occurring in a code path which already holds mmap_sem
+ * we will deadlock attempting to validate the fault against the
+ * address space. Luckily the kernel only validly references user
+ * space from well defined areas of code, which are listed in the
+ * exceptions table.
+ *
+ * As the vast majority of faults will be valid we will only perform
+ * the source reference check when there is a possibility of a deadlock.
+ * Attempt to lock the address space, if we cannot we then validate the
+ * source. If this is invalid we can skip the address space check,
+ * thus avoiding the deadlock.
+ */
+ if (!down_read_trylock(&mm->mmap_sem)) {
+ if (is_kernel_mode &&
+ !search_exception_tables(regs->pc)) {
+ vma = NULL; /* happy compiler */
+ goto bad_area_nosemaphore;
+ }
+ down_read(&mm->mmap_sem);
+ }
+
+ vma = find_vma(mm, address);
+ if (!vma)
+ goto bad_area;
+ if (vma->vm_start <= address)
+ goto good_area;
+ if (!(vma->vm_flags & VM_GROWSDOWN))
+ goto bad_area;
+ if (regs->sp < PAGE_OFFSET) {
+ /*
+ * accessing the stack below sp is always a bug.
+ */
+ if (address < regs->sp)
+ goto bad_area;
+ }
+ if (expand_stack(vma, address))
+ goto bad_area;
+
+/*
+ * Ok, we have a good vm_area for this memory access, so
+ * we can handle it..
+ */
+good_area:
+ si_code = SEGV_ACCERR;
+ if (fault_num == INT_ITLB_MISS) {
+ if (!(vma->vm_flags & VM_EXEC))
+ goto bad_area;
+ } else if (write) {
+#ifdef TEST_VERIFY_AREA
+ if (!is_page_fault && EX1_PL(regs->ex1) != USER_PL)
+ printk("WP fault at "REGFMT"\n", regs->pc);
+#endif
+ if (!(vma->vm_flags & VM_WRITE))
+ goto bad_area;
+ } else {
+ if (!is_page_fault || !(vma->vm_flags & VM_READ))
+ goto bad_area;
+ }
+
+ survive:
+ /*
+ * If for any reason at all we couldn't handle the fault,
+ * make sure we exit gracefully rather than endlessly redo
+ * the fault.
+ */
+ fault = handle_mm_fault(mm, vma, address, write);
+ if (unlikely(fault & VM_FAULT_ERROR)) {
+ if (fault & VM_FAULT_OOM)
+ goto out_of_memory;
+ else if (fault & VM_FAULT_SIGBUS)
+ goto do_sigbus;
+ BUG();
+ }
+ if (fault & VM_FAULT_MAJOR)
+ tsk->maj_flt++;
+ else
+ tsk->min_flt++;
+
+ /*
+ * If this was an asynchronous fault,
+ * restart the appropriate engine.
+ */
+ switch (fault_num) {
+#if CHIP_HAS_TILE_DMA()
+ case INT_DMATLB_MISS:
+ case INT_DMATLB_MISS_DWNCL:
+ case INT_DMATLB_ACCESS:
+ case INT_DMATLB_ACCESS_DWNCL:
+ __insn_mtspr(SPR_DMA_CTR, SPR_DMA_CTR__REQUEST_MASK);
+ break;
+#endif
+#if CHIP_HAS_SN_PROC()
+ case INT_SNITLB_MISS:
+ case INT_SNITLB_MISS_DWNCL:
+ __insn_mtspr(SPR_SNCTL,
+ __insn_mfspr(SPR_SNCTL) &
+ ~SPR_SNCTL__FRZPROC_MASK);
+ break;
+#endif
+ }
+
+ up_read(&mm->mmap_sem);
+ return 1;
+
+/*
+ * Something tried to access memory that isn't in our memory map..
+ * Fix it, but check if it's kernel or user first..
+ */
+bad_area:
+ up_read(&mm->mmap_sem);
+
+bad_area_nosemaphore:
+ /* User mode accesses just cause a SIGSEGV */
+ if (!is_kernel_mode) {
+ /*
+ * It's possible to have interrupts off here.
+ */
+ local_irq_enable();
+
+ force_sig_info_fault(SIGSEGV, si_code, address,
+ fault_num, tsk);
+ return 0;
+ }
+
+no_context:
+ /* Are we prepared to handle this kernel fault? */
+ if (fixup_exception(regs))
+ return 0;
+
+/*
+ * Oops. The kernel tried to access some bad page. We'll have to
+ * terminate things with extreme prejudice.
+ */
+
+ bust_spinlocks(1);
+
+ /* FIXME: no lookup_address() yet */
+#ifdef SUPPORT_LOOKUP_ADDRESS
+ if (fault_num == INT_ITLB_MISS) {
+ pte_t *pte = lookup_address(address);
+
+ if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
+ printk(KERN_CRIT "kernel tried to execute"
+ " non-executable page - exploit attempt?"
+ " (uid: %d)\n", current->uid);
+ }
+#endif
+ if (address < PAGE_SIZE)
+ printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference\n");
+ else
+ printk(KERN_ALERT "Unable to handle kernel paging request\n");
+ printk(" at virtual address "REGFMT", pc "REGFMT"\n",
+ address, regs->pc);
+
+ show_regs(regs);
+
+ if (unlikely(tsk->pid < 2)) {
+ panic("Kernel page fault running %s!",
+ tsk->pid ? "init" : "the idle task");
+ }
+
+ /*
+ * More FIXME: we should probably copy the i386 here and
+ * implement a generic die() routine. Not today.
+ */
+#ifdef SUPPORT_DIE
+ die("Oops", regs);
+#endif
+ bust_spinlocks(0);
+
+ do_group_exit(SIGKILL);
+
+/*
+ * We ran out of memory, or some other thing happened to us that made
+ * us unable to handle the page fault gracefully.
+ */
+out_of_memory:
+ up_read(&mm->mmap_sem);
+ if (is_global_init(tsk)) {
+ yield();
+ down_read(&mm->mmap_sem);
+ goto survive;
+ }
+ printk("VM: killing process %s\n", tsk->comm);
+ if (!is_kernel_mode)
+ do_group_exit(SIGKILL);
+ goto no_context;
+
+do_sigbus:
+ up_read(&mm->mmap_sem);
+
+ /* Kernel mode? Handle exceptions or die */
+ if (is_kernel_mode)
+ goto no_context;
+
+ force_sig_info_fault(SIGBUS, BUS_ADRERR, address, fault_num, tsk);
+ return 0;
+}
+
+#ifndef __tilegx__
+
+extern char sys_cmpxchg[], __sys_cmpxchg_end[];
+extern char __sys_cmpxchg_grab_lock[];
+extern char __start_atomic_asm_code[], __end_atomic_asm_code[];
+
+/*
+ * We return this structure in registers to avoid having to write
+ * additional save/restore code in the intvec.S caller.
+ */
+struct intvec_state {
+ void *handler;
+ unsigned long vecnum;
+ unsigned long fault_num;
+ unsigned long info;
+ unsigned long retval;
+};
+
+/* We must release ICS before panicking or we won't get anywhere. */
+#define ics_panic(fmt, ...) do { \
+ __insn_mtspr(SPR_INTERRUPT_CRITICAL_SECTION, 0); \
+ panic(fmt, __VA_ARGS__); \
+} while (0)
+
+void do_page_fault(struct pt_regs *regs, int fault_num,
+ unsigned long address, unsigned long write);
+
+/*
+ * When we take an ITLB or DTLB fault or access violation in the
+ * supervisor while the critical section bit is set, the hypervisor is
+ * reluctant to write new values into the EX_CONTEXT_1_x registers,
+ * since that might indicate we have not yet squirreled the SPR
+ * contents away and thus cannot safely take a recursive interrupt.
+ * Accordingly, the hypervisor passes us the PC via SYSTEM_SAVE_1_2.
+ */
+struct intvec_state do_page_fault_ics(struct pt_regs *regs, int fault_num,
+ unsigned long address,
+ unsigned long info)
+{
+ unsigned long pc = info & ~1;
+ int write = info & 1;
+ pgd_t *pgd = get_current_pgd();
+
+ /* Retval is 1 at first since we will handle the fault fully. */
+ struct intvec_state state = {
+ do_page_fault, fault_num, address, write, 1
+ };
+
+ /* Validate that we are plausibly in the right routine. */
+ if ((pc & 0x7) != 0 || pc < PAGE_OFFSET ||
+ (fault_num != INT_DTLB_MISS &&
+ fault_num != INT_DTLB_ACCESS)) {
+ unsigned long old_pc = regs->pc;
+ regs->pc = pc;
+ ics_panic("Bad ICS page fault args:"
+ " old PC %#lx, fault %d/%d at %#lx\n",
+ old_pc, fault_num, write, address);
+ }
+
+ /* We might be faulting on a vmalloc page, so check that first. */
+ if (fault_num != INT_DTLB_ACCESS && vmalloc_fault(pgd, address) >= 0)
+ return state;
+
+ /*
+ * If we faulted with ICS set in sys_cmpxchg, we are providing
+ * a user syscall service that should generate a signal on
+ * fault. We didn't set up a kernel stack on initial entry to
+ * sys_cmpxchg, but instead had one set up by the fault, which
+ * (because sys_cmpxchg never releases ICS) came to us via the
+ * SYSTEM_SAVE_1_2 mechanism, and thus EX_CONTEXT_1_[01] are
+ * still referencing the original user code. We release the
+ * atomic lock and rewrite pt_regs so that it appears that we
+ * came from user-space directly, and after we finish the
+ * fault we'll go back to user space and re-issue the swint.
+ * This way the backtrace information is correct if we need to
+ * emit a stack dump at any point while handling this.
+ *
+ * Must match register use in sys_cmpxchg().
+ */
+ if (pc >= (unsigned long) sys_cmpxchg &&
+ pc < (unsigned long) __sys_cmpxchg_end) {
+#ifdef CONFIG_SMP
+ /* Don't unlock before we could have locked. */
+ if (pc >= (unsigned long)__sys_cmpxchg_grab_lock) {
+ int *lock_ptr = (int *)(regs->regs[ATOMIC_LOCK_REG]);
+ __atomic_fault_unlock(lock_ptr);
+ }
+#endif
+ regs->sp = regs->regs[27];
+ }
+
+ /*
+ * We can also fault in the atomic assembly, in which
+ * case we use the exception table to do the first-level fixup.
+ * We may re-fixup again in the real fault handler if it
+ * turns out the faulting address is just bad, and not,
+ * for example, migrating.
+ */
+ else if (pc >= (unsigned long) __start_atomic_asm_code &&
+ pc < (unsigned long) __end_atomic_asm_code) {
+ const struct exception_table_entry *fixup;
+#ifdef CONFIG_SMP
+ /* Unlock the atomic lock. */
+ int *lock_ptr = (int *)(regs->regs[ATOMIC_LOCK_REG]);
+ __atomic_fault_unlock(lock_ptr);
+#endif
+ fixup = search_exception_tables(pc);
+ if (!fixup)
+ ics_panic("ICS atomic fault not in table:"
+ " PC %#lx, fault %d", pc, fault_num);
+ regs->pc = fixup->fixup;
+ regs->ex1 = PL_ICS_EX1(KERNEL_PL, 0);
+ }
+
+ /*
+ * NOTE: the one other type of access that might bring us here
+ * are the memory ops in __tns_atomic_acquire/__tns_atomic_release,
+ * but we don't have to check specially for them since we can
+ * always safely return to the address of the fault and retry,
+ * since no separate atomic locks are involved.
+ */
+
+ /*
+ * Now that we have released the atomic lock (if necessary),
+ * it's safe to spin if the PTE that caused the fault was migrating.
+ */
+ if (fault_num == INT_DTLB_ACCESS)
+ write = 1;
+ if (handle_migrating_pte(pgd, fault_num, address, 1, write))
+ return state;
+
+ /* Return zero so that we continue on with normal fault handling. */
+ state.retval = 0;
+ return state;
+}
+
+#endif /* !__tilegx__ */
+
+/*
+ * This routine handles page faults. It determines the address, and the
+ * problem, and then passes it to handle_page_fault() for normal DTLB and
+ * ITLB issues, and for DMA or SN processor faults when we are in user
+ * space. For the latter, if we're in kernel mode, we just save the
+ * interrupt away appropriately and return immediately. We can't do
+ * page faults for user code while in kernel mode.
+ */
+void do_page_fault(struct pt_regs *regs, int fault_num,
+ unsigned long address, unsigned long write)
+{
+ int is_page_fault;
+
+ /* This case should have been handled by do_page_fault_ics(). */
+ BUG_ON(write & ~1);
+
+#if CHIP_HAS_TILE_DMA()
+ /*
+ * If it's a DMA fault, suspend the transfer while we're
+ * handling the miss; we'll restart after it's handled. If we
+ * don't suspend, it's possible that this process could swap
+ * out and back in, and restart the engine since the DMA is
+ * still 'running'.
+ */
+ if (fault_num == INT_DMATLB_MISS ||
+ fault_num == INT_DMATLB_ACCESS ||
+ fault_num == INT_DMATLB_MISS_DWNCL ||
+ fault_num == INT_DMATLB_ACCESS_DWNCL) {
+ __insn_mtspr(SPR_DMA_CTR, SPR_DMA_CTR__SUSPEND_MASK);
+ while (__insn_mfspr(SPR_DMA_USER_STATUS) &
+ SPR_DMA_STATUS__BUSY_MASK)
+ ;
+ }
+#endif
+
+ /* Validate fault num and decide if this is a first-time page fault. */
+ switch (fault_num) {
+ case INT_ITLB_MISS:
+ case INT_DTLB_MISS:
+#if CHIP_HAS_TILE_DMA()
+ case INT_DMATLB_MISS:
+ case INT_DMATLB_MISS_DWNCL:
+#endif
+#if CHIP_HAS_SN_PROC()
+ case INT_SNITLB_MISS:
+ case INT_SNITLB_MISS_DWNCL:
+#endif
+ is_page_fault = 1;
+ break;
+
+ case INT_DTLB_ACCESS:
+#if CHIP_HAS_TILE_DMA()
+ case INT_DMATLB_ACCESS:
+ case INT_DMATLB_ACCESS_DWNCL:
+#endif
+ is_page_fault = 0;
+ break;
+
+ default:
+ panic("Bad fault number %d in do_page_fault", fault_num);
+ }
+
+ if (EX1_PL(regs->ex1) != USER_PL) {
+ struct async_tlb *async;
+ switch (fault_num) {
+#if CHIP_HAS_TILE_DMA()
+ case INT_DMATLB_MISS:
+ case INT_DMATLB_ACCESS:
+ case INT_DMATLB_MISS_DWNCL:
+ case INT_DMATLB_ACCESS_DWNCL:
+ async = &current->thread.dma_async_tlb;
+ break;
+#endif
+#if CHIP_HAS_SN_PROC()
+ case INT_SNITLB_MISS:
+ case INT_SNITLB_MISS_DWNCL:
+ async = &current->thread.sn_async_tlb;
+ break;
+#endif
+ default:
+ async = NULL;
+ }
+ if (async) {
+
+ /*
+ * No vmalloc check required, so we can allow
+ * interrupts immediately at this point.
+ */
+ local_irq_enable();
+
+ set_thread_flag(TIF_ASYNC_TLB);
+ if (async->fault_num != 0) {
+ panic("Second async fault %d;"
+ " old fault was %d (%#lx/%ld)",
+ fault_num, async->fault_num,
+ address, write);
+ }
+ BUG_ON(fault_num == 0);
+ async->fault_num = fault_num;
+ async->is_fault = is_page_fault;
+ async->is_write = write;
+ async->address = address;
+ return;
+ }
+ }
+
+ handle_page_fault(regs, fault_num, is_page_fault, address, write);
+}
+
+
+#if CHIP_HAS_TILE_DMA() || CHIP_HAS_SN_PROC()
+/*
+ * Check an async_tlb structure to see if a deferred fault is waiting,
+ * and if so pass it to the page-fault code.
+ */
+static void handle_async_page_fault(struct pt_regs *regs,
+ struct async_tlb *async)
+{
+ if (async->fault_num) {
+ /*
+ * Clear async->fault_num before calling the page-fault
+ * handler so that if we re-interrupt before returning
+ * from the function we have somewhere to put the
+ * information from the new interrupt.
+ */
+ int fault_num = async->fault_num;
+ async->fault_num = 0;
+ handle_page_fault(regs, fault_num, async->is_fault,
+ async->address, async->is_write);
+ }
+}
+#endif /* CHIP_HAS_TILE_DMA() || CHIP_HAS_SN_PROC() */
+
+
+/*
+ * This routine effectively re-issues asynchronous page faults
+ * when we are returning to user space.
+ */
+void do_async_page_fault(struct pt_regs *regs)
+{
+ /*
+ * Clear thread flag early. If we re-interrupt while processing
+ * code here, we will reset it and recall this routine before
+ * returning to user space.
+ */
+ clear_thread_flag(TIF_ASYNC_TLB);
+
+#if CHIP_HAS_TILE_DMA()
+ handle_async_page_fault(regs, &current->thread.dma_async_tlb);
+#endif
+#if CHIP_HAS_SN_PROC()
+ handle_async_page_fault(regs, &current->thread.sn_async_tlb);
+#endif
+}
+
+void vmalloc_sync_all(void)
+{
+#ifdef __tilegx__
+ /* Currently all L1 kernel pmd's are static and shared. */
+ BUG_ON(pgd_index(VMALLOC_END) != pgd_index(VMALLOC_START));
+#else
+ /*
+ * Note that races in the updates of insync and start aren't
+ * problematic: insync can only get set bits added, and updates to
+ * start are only improving performance (without affecting correctness
+ * if undone).
+ */
+ static DECLARE_BITMAP(insync, PTRS_PER_PGD);
+ static unsigned long start = PAGE_OFFSET;
+ unsigned long address;
+
+ BUILD_BUG_ON(PAGE_OFFSET & ~PGDIR_MASK);
+ for (address = start; address >= PAGE_OFFSET; address += PGDIR_SIZE) {
+ if (!test_bit(pgd_index(address), insync)) {
+ unsigned long flags;
+ struct list_head *pos;
+
+ spin_lock_irqsave(&pgd_lock, flags);
+ list_for_each(pos, &pgd_list)
+ if (!vmalloc_sync_one(list_to_pgd(pos),
+ address)) {
+ /* Must be at first entry in list. */
+ BUG_ON(pos != pgd_list.next);
+ break;
+ }
+ spin_unlock_irqrestore(&pgd_lock, flags);
+ if (pos != pgd_list.next)
+ set_bit(pgd_index(address), insync);
+ }
+ if (address == start && test_bit(pgd_index(address), insync))
+ start = address + PGDIR_SIZE;
+ }
+#endif
+}
diff --git a/arch/tile/mm/highmem.c b/arch/tile/mm/highmem.c
new file mode 100644
index 0000000..1fcecc5
--- /dev/null
+++ b/arch/tile/mm/highmem.c
@@ -0,0 +1,328 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <asm/homecache.h>
+
+#define kmap_get_pte(vaddr) \
+ pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(vaddr), (vaddr)),\
+ (vaddr)), (vaddr))
+
+
+void *kmap(struct page *page)
+{
+ void *kva;
+ unsigned long flags;
+ pte_t *ptep;
+
+ might_sleep();
+ if (!PageHighMem(page))
+ return page_address(page);
+ kva = kmap_high(page);
+
+ /*
+ * Rewrite the PTE under the lock. This ensures that the page
+ * is not currently migrating.
+ */
+ ptep = kmap_get_pte((unsigned long)kva);
+ flags = homecache_kpte_lock();
+ set_pte_at(&init_mm, kva, ptep, mk_pte(page, page_to_kpgprot(page)));
+ homecache_kpte_unlock(flags);
+
+ return kva;
+}
+EXPORT_SYMBOL(kmap);
+
+void kunmap(struct page *page)
+{
+ if (in_interrupt())
+ BUG();
+ if (!PageHighMem(page))
+ return;
+ kunmap_high(page);
+}
+EXPORT_SYMBOL(kunmap);
+
+static void debug_kmap_atomic_prot(enum km_type type)
+{
+#ifdef CONFIG_DEBUG_HIGHMEM
+ static unsigned warn_count = 10;
+
+ if (unlikely(warn_count == 0))
+ return;
+
+ if (unlikely(in_interrupt())) {
+ if (in_irq()) {
+ if (type != KM_IRQ0 && type != KM_IRQ1 &&
+ type != KM_BIO_SRC_IRQ &&
+ /* type != KM_BIO_DST_IRQ && */
+ type != KM_BOUNCE_READ) {
+ WARN_ON(1);
+ warn_count--;
+ }
+ } else if (!irqs_disabled()) { /* softirq */
+ if (type != KM_IRQ0 && type != KM_IRQ1 &&
+ type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
+ type != KM_SKB_SUNRPC_DATA &&
+ type != KM_SKB_DATA_SOFTIRQ &&
+ type != KM_BOUNCE_READ) {
+ WARN_ON(1);
+ warn_count--;
+ }
+ }
+ }
+
+ if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ ||
+ type == KM_BIO_SRC_IRQ /* || type == KM_BIO_DST_IRQ */) {
+ if (!irqs_disabled()) {
+ WARN_ON(1);
+ warn_count--;
+ }
+ } else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
+ if (irq_count() == 0 && !irqs_disabled()) {
+ WARN_ON(1);
+ warn_count--;
+ }
+ }
+#endif
+}
+
+/*
+ * Describe a single atomic mapping of a page on a given cpu at a
+ * given address, and allow it to be linked into a list.
+ */
+struct atomic_mapped_page {
+ struct list_head list;
+ struct page *page;
+ int cpu;
+ unsigned long va;
+};
+
+static DEFINE_SPINLOCK(amp_lock);
+static LIST_HEAD(amp_list);
+
+/*
+ * Combining this structure with a per-cpu declaration lets us give
+ * each cpu an atomic_mapped_page structure per type.
+ */
+struct kmap_amps {
+ struct atomic_mapped_page per_type[KM_TYPE_NR];
+};
+static DEFINE_PER_CPU(struct kmap_amps, amps);
+
+/*
+ * Add a page and va, on this cpu, to the list of kmap_atomic pages,
+ * and write the new pte to memory. Writing the new PTE under the
+ * lock guarantees that it is either on the list before migration starts
+ * (if we won the race), or set_pte() sets the migrating bit in the PTE
+ * (if we lost the race). And doing it under the lock guarantees
+ * that when kmap_atomic_fix_one_pte() comes along, it finds a valid
+ * PTE in memory, iff the mapping is still on the amp_list.
+ *
+ * Finally, doing it under the lock lets us safely examine the page
+ * to see if it is immutable or not, for the generic kmap_atomic() case.
+ * If we examine it earlier we are exposed to a race where it looks
+ * writable earlier, but becomes immutable before we write the PTE.
+ */
+static void kmap_atomic_register(struct page *page, enum km_type type,
+ unsigned long va, pte_t *ptep, pte_t pteval)
+{
+ unsigned long flags;
+ struct atomic_mapped_page *amp;
+
+ flags = homecache_kpte_lock();
+ spin_lock(&amp_lock);
+
+ /* With interrupts disabled, now fill in the per-cpu info. */
+ amp = &__get_cpu_var(amps).per_type[type];
+ amp->page = page;
+ amp->cpu = smp_processor_id();
+ amp->va = va;
+
+ /* For generic kmap_atomic(), choose the PTE writability now. */
+ if (!pte_read(pteval))
+ pteval = mk_pte(page, page_to_kpgprot(page));
+
+ list_add(&amp->list, &amp_list);
+ set_pte(ptep, pteval);
+ arch_flush_lazy_mmu_mode();
+
+ spin_unlock(&amp_lock);
+ homecache_kpte_unlock(flags);
+}
+
+/*
+ * Remove a page and va, on this cpu, from the list of kmap_atomic pages.
+ * Linear-time search, but we count on the lists being short.
+ * We don't need to adjust the PTE under the lock (as opposed to the
+ * kmap_atomic_register() case), since we're just unconditionally
+ * zeroing the PTE after it's off the list.
+ */
+static void kmap_atomic_unregister(struct page *page, unsigned long va)
+{
+ unsigned long flags;
+ struct atomic_mapped_page *amp;
+ int cpu = smp_processor_id();
+ spin_lock_irqsave(&amp_lock, flags);
+ list_for_each_entry(amp, &amp_list, list) {
+ if (amp->page == page && amp->cpu == cpu && amp->va == va)
+ break;
+ }
+ BUG_ON(&amp->list == &amp_list);
+ list_del(&amp->list);
+ spin_unlock_irqrestore(&amp_lock, flags);
+}
+
+/* Helper routine for kmap_atomic_fix_kpte(), below. */
+static void kmap_atomic_fix_one_kpte(struct atomic_mapped_page *amp,
+ int finished)
+{
+ pte_t *ptep = kmap_get_pte(amp->va);
+ if (!finished) {
+ set_pte(ptep, pte_mkmigrate(*ptep));
+ flush_remote(0, 0, NULL, amp->va, PAGE_SIZE, PAGE_SIZE,
+ cpumask_of(amp->cpu), NULL, 0);
+ } else {
+ /*
+ * Rewrite a default kernel PTE for this page.
+ * We rely on the fact that set_pte() writes the
+ * present+migrating bits last.
+ */
+ pte_t pte = mk_pte(amp->page, page_to_kpgprot(amp->page));
+ set_pte(ptep, pte);
+ }
+}
+
+/*
+ * This routine is a helper function for homecache_fix_kpte(); see
+ * its comments for more information on the "finished" argument here.
+ *
+ * Note that we hold the lock while doing the remote flushes, which
+ * will stall any unrelated cpus trying to do kmap_atomic operations.
+ * We could just update the PTEs under the lock, and save away copies
+ * of the structs (or just the va+cpu), then flush them after we
+ * release the lock, but it seems easier just to do it all under the lock.
+ */
+void kmap_atomic_fix_kpte(struct page *page, int finished)
+{
+ struct atomic_mapped_page *amp;
+ unsigned long flags;
+ spin_lock_irqsave(&amp_lock, flags);
+ list_for_each_entry(amp, &amp_list, list) {
+ if (amp->page == page)
+ kmap_atomic_fix_one_kpte(amp, finished);
+ }
+ spin_unlock_irqrestore(&amp_lock, flags);
+}
+
+/*
+ * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap
+ * because the kmap code must perform a global TLB invalidation when
+ * the kmap pool wraps.
+ *
+ * Note that they may be slower than on x86 (etc.) because unlike on
+ * those platforms, we do have to take a global lock to map and unmap
+ * pages on Tile (see above).
+ *
+ * When holding an atomic kmap it is not legal to sleep, so atomic
+ * kmaps are appropriate for short, tight code paths only.
+ */
+void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
+{
+ enum fixed_addresses idx;
+ unsigned long vaddr;
+ pte_t *pte;
+
+ /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
+ pagefault_disable();
+
+ /* Avoid icache flushes by disallowing atomic executable mappings. */
+ BUG_ON(pte_exec(prot));
+
+ if (!PageHighMem(page))
+ return page_address(page);
+
+ debug_kmap_atomic_prot(type);
+
+ idx = type + KM_TYPE_NR*smp_processor_id();
+ vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ pte = kmap_get_pte(vaddr);
+ BUG_ON(!pte_none(*pte));
+
+ /* Register that this page is mapped atomically on this cpu. */
+ kmap_atomic_register(page, type, vaddr, pte, mk_pte(page, prot));
+
+ return (void *)vaddr;
+}
+EXPORT_SYMBOL(kmap_atomic_prot);
+
+void *kmap_atomic(struct page *page, enum km_type type)
+{
+ /* PAGE_NONE is a magic value that tells us to check immutability. */
+ return kmap_atomic_prot(page, type, PAGE_NONE);
+}
+EXPORT_SYMBOL(kmap_atomic);
+
+void kunmap_atomic(void *kvaddr, enum km_type type)
+{
+ unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
+ enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();
+
+ /*
+ * Force other mappings to Oops if they try to access this pte without
+ * first remapping it. Keeping stale mappings around is a bad idea.
+ */
+ if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx)) {
+ pte_t *pte = kmap_get_pte(vaddr);
+ pte_t pteval = *pte;
+ BUG_ON(!pte_present(pteval) && !pte_migrating(pteval));
+ kmap_atomic_unregister(pte_page(pteval), vaddr);
+ kpte_clear_flush(pte, vaddr);
+ } else {
+ /* Must be a lowmem page */
+ BUG_ON(vaddr < PAGE_OFFSET);
+ BUG_ON(vaddr >= (unsigned long)high_memory);
+ }
+
+ arch_flush_lazy_mmu_mode();
+ pagefault_enable();
+}
+EXPORT_SYMBOL(kunmap_atomic);
+
+/*
+ * This API is supposed to allow us to map memory without a "struct page".
+ * Currently we don't support this, though this may change in the future.
+ */
+void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
+{
+ return kmap_atomic(pfn_to_page(pfn), type);
+}
+void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot)
+{
+ return kmap_atomic_prot(pfn_to_page(pfn), type, prot);
+}
+
+struct page *kmap_atomic_to_page(void *ptr)
+{
+ pte_t *pte;
+ unsigned long vaddr = (unsigned long)ptr;
+
+ if (vaddr < FIXADDR_START)
+ return virt_to_page(ptr);
+
+ pte = kmap_get_pte(vaddr);
+ return pte_page(*pte);
+}
diff --git a/arch/tile/mm/homecache.c b/arch/tile/mm/homecache.c
new file mode 100644
index 0000000..52feb77
--- /dev/null
+++ b/arch/tile/mm/homecache.c
@@ -0,0 +1,445 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * This code maintains the "home" for each page in the system.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <linux/bootmem.h>
+#include <linux/rmap.h>
+#include <linux/pagemap.h>
+#include <linux/mutex.h>
+#include <linux/interrupt.h>
+#include <linux/sysctl.h>
+#include <linux/pagevec.h>
+#include <linux/ptrace.h>
+#include <linux/timex.h>
+#include <linux/cache.h>
+#include <linux/smp.h>
+
+#include <asm/page.h>
+#include <asm/sections.h>
+#include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
+#include <asm/homecache.h>
+
+#include "migrate.h"
+
+
+#if CHIP_HAS_COHERENT_LOCAL_CACHE()
+
+/*
+ * The noallocl2 option suppresses all use of the L2 cache to cache
+ * locally from a remote home. There's no point in using it if we
+ * don't have coherent local caching, though.
+ */
+int __write_once noallocl2;
+static int __init set_noallocl2(char *str)
+{
+ noallocl2 = 1;
+ return 0;
+}
+early_param("noallocl2", set_noallocl2);
+
+#else
+
+#define noallocl2 0
+
+#endif
+
+
+
+/* Provide no-op versions of these routines to keep flush_remote() cleaner. */
+#define mark_caches_evicted_start() 0
+#define mark_caches_evicted_finish(mask, timestamp) do {} while (0)
+
+
+
+
+/*
+ * Update the irq_stat for cpus that we are going to interrupt
+ * with TLB or cache flushes. Also handle removing dataplane cpus
+ * from the TLB flush set, and setting dataplane_tlb_state instead.
+ */
+static void hv_flush_update(const struct cpumask *cache_cpumask,
+ struct cpumask *tlb_cpumask,
+ unsigned long tlb_va, unsigned long tlb_length,
+ HV_Remote_ASID *asids, int asidcount)
+{
+ struct cpumask mask;
+ int i, cpu;
+
+ cpumask_clear(&mask);
+ if (cache_cpumask)
+ cpumask_or(&mask, &mask, cache_cpumask);
+ if (tlb_cpumask && tlb_length)
+ cpumask_or(&mask, &mask, tlb_cpumask);
+
+ for (i = 0; i < asidcount; ++i)
+ cpumask_set_cpu(asids[i].y * smp_width + asids[i].x, &mask);
+
+ /*
+ * Don't bother to update atomically; losing a count
+ * here is not that critical.
+ */
+ for_each_cpu(cpu, &mask)
+ ++per_cpu(irq_stat, cpu).irq_hv_flush_count;
+}
+
+/*
+ * This wrapper function around hv_flush_remote() does several things:
+ *
+ * - Provides a return value error-checking panic path, since
+ * there's never any good reason for hv_flush_remote() to fail.
+ * - Accepts a 32-bit PFN rather than a 64-bit PA, which generally
+ * is the type that Linux wants to pass around anyway.
+ * - Centralizes the mark_caches_evicted() handling.
+ * - Canonicalizes the arguments so that zero lengths yield NULL cpumasks.
+ * - Handles deferring TLB flushes for dataplane tiles.
+ * - Tracks remote interrupts in the per-cpu irq_cpustat_t.
+ *
+ * Note that we have to wait until the cache flush completes before
+ * updating the per-cpu last_cache_flush word, since otherwise another
+ * concurrent flush can race, conclude the flush has already
+ * completed, and start to use the page while it's still dirty
+ * remotely (running concurrently with the actual evict, presumably).
+ */
+void flush_remote(unsigned long cache_pfn, unsigned long cache_control,
+ const struct cpumask *cache_cpumask_orig,
+ HV_VirtAddr tlb_va, unsigned long tlb_length,
+ unsigned long tlb_pgsize,
+ const struct cpumask *tlb_cpumask_orig,
+ HV_Remote_ASID *asids, int asidcount)
+{
+ int rc;
+ int timestamp = 0; /* happy compiler */
+ struct cpumask cache_cpumask_copy, tlb_cpumask_copy;
+ struct cpumask *cache_cpumask, *tlb_cpumask;
+ HV_PhysAddr cache_pa;
+ char cache_buf[NR_CPUS*5], tlb_buf[NR_CPUS*5];
+
+ mb(); /* provided just to simplify "magic hypervisor" mode */
+
+ /*
+ * Canonicalize and copy the cpumasks.
+ */
+ if (cache_cpumask_orig && cache_control) {
+ cpumask_copy(&cache_cpumask_copy, cache_cpumask_orig);
+ cache_cpumask = &cache_cpumask_copy;
+ } else {
+ cpumask_clear(&cache_cpumask_copy);
+ cache_cpumask = NULL;
+ }
+ if (cache_cpumask == NULL)
+ cache_control = 0;
+ if (tlb_cpumask_orig && tlb_length) {
+ cpumask_copy(&tlb_cpumask_copy, tlb_cpumask_orig);
+ tlb_cpumask = &tlb_cpumask_copy;
+ } else {
+ cpumask_clear(&tlb_cpumask_copy);
+ tlb_cpumask = NULL;
+ }
+
+ hv_flush_update(cache_cpumask, tlb_cpumask, tlb_va, tlb_length,
+ asids, asidcount);
+ cache_pa = (HV_PhysAddr)cache_pfn << PAGE_SHIFT;
+ if (cache_control & HV_FLUSH_EVICT_L2)
+ timestamp = mark_caches_evicted_start();
+ rc = hv_flush_remote(cache_pa, cache_control,
+ cpumask_bits(cache_cpumask),
+ tlb_va, tlb_length, tlb_pgsize,
+ cpumask_bits(tlb_cpumask),
+ asids, asidcount);
+ if (cache_control & HV_FLUSH_EVICT_L2)
+ mark_caches_evicted_finish(cache_cpumask, timestamp);
+ if (rc == 0)
+ return;
+ cpumask_scnprintf(cache_buf, sizeof(cache_buf), &cache_cpumask_copy);
+ cpumask_scnprintf(tlb_buf, sizeof(tlb_buf), &tlb_cpumask_copy);
+
+ printk(KERN_ERR "hv_flush_remote(%#llx, %#lx, %p [%s],"
+ " %#lx, %#lx, %#lx, %p [%s], %p, %d) = %d\n",
+ cache_pa, cache_control, cache_cpumask, cache_buf,
+ (unsigned long)tlb_va, tlb_length, tlb_pgsize,
+ tlb_cpumask, tlb_buf,
+ asids, asidcount, rc);
+ if (asidcount > 0) {
+ int i;
+ printk(" asids:");
+ for (i = 0; i < asidcount; ++i)
+ printk(" %d,%d,%d",
+ asids[i].x, asids[i].y, asids[i].asid);
+ printk("\n");
+ }
+ panic("Unsafe to continue.");
+}
+
+void homecache_evict(const struct cpumask *mask)
+{
+ flush_remote(0, HV_FLUSH_EVICT_L2, mask, 0, 0, 0, NULL, NULL, 0);
+}
+
+/* Return a mask of the cpus whose caches currently own these pages. */
+static void homecache_mask(struct page *page, int pages,
+ struct cpumask *home_mask)
+{
+ int i;
+ cpumask_clear(home_mask);
+ for (i = 0; i < pages; ++i) {
+ int home = page_home(&page[i]);
+ if (home == PAGE_HOME_IMMUTABLE ||
+ home == PAGE_HOME_INCOHERENT) {
+ cpumask_copy(home_mask, cpu_possible_mask);
+ return;
+ }
+#if CHIP_HAS_CBOX_HOME_MAP()
+ if (home == PAGE_HOME_HASH) {
+ cpumask_or(home_mask, home_mask, &hash_for_home_map);
+ continue;
+ }
+#endif
+ if (home == PAGE_HOME_UNCACHED)
+ continue;
+ BUG_ON(home < 0 || home >= NR_CPUS);
+ cpumask_set_cpu(home, home_mask);
+ }
+}
+
+/*
+ * Return the passed length, or zero if it's long enough that we
+ * believe we should evict the whole L2 cache.
+ */
+static unsigned long cache_flush_length(unsigned long length)
+{
+ return (length >= CHIP_L2_CACHE_SIZE()) ? HV_FLUSH_EVICT_L2 : length;
+}
+
+/* On the simulator, confirm lines have been evicted everywhere. */
+static void validate_lines_evicted(unsigned long pfn, size_t length)
+{
+ sim_syscall(SIM_SYSCALL_VALIDATE_LINES_EVICTED,
+ (HV_PhysAddr)pfn << PAGE_SHIFT, length);
+}
+
+/* Flush a page out of whatever cache(s) it is in. */
+void homecache_flush_cache(struct page *page, int order)
+{
+ int pages = 1 << order;
+ int length = cache_flush_length(pages * PAGE_SIZE);
+ unsigned long pfn = page_to_pfn(page);
+ struct cpumask home_mask;
+
+ homecache_mask(page, pages, &home_mask);
+ flush_remote(pfn, length, &home_mask, 0, 0, 0, NULL, NULL, 0);
+ validate_lines_evicted(pfn, pages * PAGE_SIZE);
+}
+
+
+/* Report the home corresponding to a given PTE. */
+static int pte_to_home(pte_t pte)
+{
+ if (hv_pte_get_nc(pte))
+ return PAGE_HOME_IMMUTABLE;
+ switch (hv_pte_get_mode(pte)) {
+ case HV_PTE_MODE_CACHE_TILE_L3:
+ return get_remote_cache_cpu(pte);
+ case HV_PTE_MODE_CACHE_NO_L3:
+ return PAGE_HOME_INCOHERENT;
+ case HV_PTE_MODE_UNCACHED:
+ return PAGE_HOME_UNCACHED;
+#if CHIP_HAS_CBOX_HOME_MAP()
+ case HV_PTE_MODE_CACHE_HASH_L3:
+ return PAGE_HOME_HASH;
+#endif
+ }
+ panic("Bad PTE %#llx\n", pte.val);
+}
+
+/* Update the home of a PTE if necessary (can also be used for a pgprot_t). */
+pte_t pte_set_home(pte_t pte, int home)
+{
+ /* Check for non-linear file mapping "PTEs" and pass them through. */
+ if (pte_file(pte))
+ return pte;
+
+#if CHIP_HAS_MMIO()
+ /* Check for MMIO mappings and pass them through. */
+ if (hv_pte_get_mode(pte) == HV_PTE_MODE_MMIO)
+ return pte;
+#endif
+
+
+ /*
+ * Only immutable pages get NC mappings. If we have a
+ * non-coherent PTE, but the underlying page is not
+ * immutable, it's likely the result of a forced
+ * caching setting running up against ptrace setting
+ * the page to be writable underneath. In this case,
+ * just keep the PTE coherent.
+ */
+ if (hv_pte_get_nc(pte) && home != PAGE_HOME_IMMUTABLE) {
+ pte = hv_pte_clear_nc(pte);
+ printk(KERN_ERR "non-immutable page incoherently referenced: %#llx\n",
+ pte.val);
+ }
+
+ switch (home) {
+
+ case PAGE_HOME_UNCACHED:
+ pte = hv_pte_set_mode(pte, HV_PTE_MODE_UNCACHED);
+ break;
+
+ case PAGE_HOME_INCOHERENT:
+ pte = hv_pte_set_mode(pte, HV_PTE_MODE_CACHE_NO_L3);
+ break;
+
+ case PAGE_HOME_IMMUTABLE:
+ /*
+ * We could home this page anywhere, since it's immutable,
+ * but by default just home it to follow "hash_default".
+ */
+ BUG_ON(hv_pte_get_writable(pte));
+ if (pte_get_forcecache(pte)) {
+ /* Upgrade "force any cpu" to "No L3" for immutable. */
+ if (hv_pte_get_mode(pte) == HV_PTE_MODE_CACHE_TILE_L3
+ && pte_get_anyhome(pte)) {
+ pte = hv_pte_set_mode(pte,
+ HV_PTE_MODE_CACHE_NO_L3);
+ }
+ } else
+#if CHIP_HAS_CBOX_HOME_MAP()
+ if (hash_default)
+ pte = hv_pte_set_mode(pte, HV_PTE_MODE_CACHE_HASH_L3);
+ else
+#endif
+ pte = hv_pte_set_mode(pte, HV_PTE_MODE_CACHE_NO_L3);
+ pte = hv_pte_set_nc(pte);
+ break;
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+ case PAGE_HOME_HASH:
+ pte = hv_pte_set_mode(pte, HV_PTE_MODE_CACHE_HASH_L3);
+ break;
+#endif
+
+ default:
+ BUG_ON(home < 0 || home >= NR_CPUS ||
+ !cpu_is_valid_lotar(home));
+ pte = hv_pte_set_mode(pte, HV_PTE_MODE_CACHE_TILE_L3);
+ pte = set_remote_cache_cpu(pte, home);
+ break;
+ }
+
+#if CHIP_HAS_NC_AND_NOALLOC_BITS()
+ if (noallocl2)
+ pte = hv_pte_set_no_alloc_l2(pte);
+
+ /* Simplify "no local and no l3" to "uncached" */
+ if (hv_pte_get_no_alloc_l2(pte) && hv_pte_get_no_alloc_l1(pte) &&
+ hv_pte_get_mode(pte) == HV_PTE_MODE_CACHE_NO_L3) {
+ pte = hv_pte_set_mode(pte, HV_PTE_MODE_UNCACHED);
+ }
+#endif
+
+ /* Checking this case here gives a better panic than from the hv. */
+ BUG_ON(hv_pte_get_mode(pte) == 0);
+
+ return pte;
+}
+
+/*
+ * The routines in this section are the "static" versions of the normal
+ * dynamic homecaching routines; they just set the home cache
+ * of a kernel page once, and require a full-chip cache/TLB flush,
+ * so they're not suitable for anything but infrequent use.
+ */
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+static inline int initial_page_home(void) { return PAGE_HOME_HASH; }
+#else
+static inline int initial_page_home(void) { return 0; }
+#endif
+
+int page_home(struct page *page)
+{
+ if (PageHighMem(page)) {
+ return initial_page_home();
+ } else {
+ unsigned long kva = (unsigned long)page_address(page);
+ return pte_to_home(*virt_to_pte(NULL, kva));
+ }
+}
+
+void homecache_change_page_home(struct page *page, int order, int home)
+{
+ int i, pages = (1 << order);
+ unsigned long kva;
+
+ BUG_ON(PageHighMem(page));
+ BUG_ON(page_count(page) > 1);
+ BUG_ON(page_mapcount(page) != 0);
+ kva = (unsigned long) page_address(page);
+ flush_remote(0, HV_FLUSH_EVICT_L2, &cpu_cacheable_map,
+ kva, pages * PAGE_SIZE, PAGE_SIZE, cpu_online_mask,
+ NULL, 0);
+
+ for (i = 0; i < pages; ++i, kva += PAGE_SIZE) {
+ pte_t *ptep = virt_to_pte(NULL, kva);
+ pte_t pteval = *ptep;
+ BUG_ON(!pte_present(pteval) || pte_huge(pteval));
+ *ptep = pte_set_home(pteval, home);
+ }
+}
+
+struct page *homecache_alloc_pages(gfp_t gfp_mask,
+ unsigned int order, int home)
+{
+ struct page *page;
+ BUG_ON(gfp_mask & __GFP_HIGHMEM); /* must be lowmem */
+ page = alloc_pages(gfp_mask, order);
+ if (page)
+ homecache_change_page_home(page, order, home);
+ return page;
+}
+
+struct page *homecache_alloc_pages_node(int nid, gfp_t gfp_mask,
+ unsigned int order, int home)
+{
+ struct page *page;
+ BUG_ON(gfp_mask & __GFP_HIGHMEM); /* must be lowmem */
+ page = alloc_pages_node(nid, gfp_mask, order);
+ if (page)
+ homecache_change_page_home(page, order, home);
+ return page;
+}
+
+void homecache_free_pages(unsigned long addr, unsigned int order)
+{
+ struct page *page;
+
+ if (addr == 0)
+ return;
+
+ VM_BUG_ON(!virt_addr_valid((void *)addr));
+ page = virt_to_page((void *)addr);
+ if (put_page_testzero(page)) {
+ int pages = (1 << order);
+ homecache_change_page_home(page, order, initial_page_home());
+ while (pages--)
+ __free_page(page++);
+ }
+}
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
new file mode 100644
index 0000000..c38570f
--- /dev/null
+++ b/arch/tile/mm/hugetlbpage.c
@@ -0,0 +1,343 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * TILE Huge TLB Page Support for Kernel.
+ * Taken from i386 hugetlb implementation:
+ * Copyright (C) 2002, Rohit Seth <rohit...@intel.com>
+ */
+
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/pagemap.h>
+#include <linux/smp_lock.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/sysctl.h>
+#include <linux/mman.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+
+pte_t *huge_pte_alloc(struct mm_struct *mm,
+ unsigned long addr, unsigned long sz)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pte_t *pte = NULL;
+
+ /* We do not yet support multiple huge page sizes. */
+ BUG_ON(sz != PMD_SIZE);
+
+ pgd = pgd_offset(mm, addr);
+ pud = pud_alloc(mm, pgd, addr);
+ if (pud)
+ pte = (pte_t *) pmd_alloc(mm, pud, addr);
+ BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
+
+ return pte;
+}
+
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd = NULL;
+
+ pgd = pgd_offset(mm, addr);
+ if (pgd_present(*pgd)) {
+ pud = pud_offset(pgd, addr);
+ if (pud_present(*pud))
+ pmd = pmd_offset(pud, addr);
+ }
+ return (pte_t *) pmd;
+}
+
+#ifdef HUGETLB_TEST
+struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
+ int write)
+{
+ struct page *page;
+ struct vm_area_struct *vma;
+ pte_t *pte;
+
+ vma = find_vma(mm, address);
+ if (!vma || !is_vm_hugetlb_page(vma))
+ return ERR_PTR(-EINVAL);
+
+ pte = huge_pte_offset(mm, address);
+
+ /* hugetlb should be locked, and hence, prefaulted */
+ WARN_ON(!pte || pte_none(*pte));
+
+ page = &pte_page(*pte)[(address & ~HPAGE_MASK) >> PAGE_SHIFT];
+
+ WARN_ON(!PageHead(page));
+
+ return page;
+}
+
+int pmd_huge(pmd_t pmd)
+{
+ return 0;
+}
+

+int pud_huge(pud_t pud)
+{
+ return 0;
+}
+
+struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+ pmd_t *pmd, int write)
+{
+ return NULL;
+}
+
+#else
+
+struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
+ int write)
+{
+ return ERR_PTR(-EINVAL);
+}
+
+int pmd_huge(pmd_t pmd)
+{
+ return !!(pmd_val(pmd) & _PAGE_HUGE_PAGE);
+}
+
+int pud_huge(pud_t pud)
+{
+ return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
+}
+
+struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+ pmd_t *pmd, int write)
+{
+ struct page *page;
+
+ page = pte_page(*(pte_t *)pmd);
+ if (page)
+ page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ return page;
+}
+
+struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
+ pud_t *pud, int write)
+{
+ struct page *page;
+
+ page = pte_page(*(pte_t *)pud);
+ if (page)
+ page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
+ return page;
+}
+
+int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+{
+ return 0;
+}
+
+#endif
+
+#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
+static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ struct hstate *h = hstate_file(file);
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+ unsigned long start_addr;
+
+ if (len > mm->cached_hole_size) {
+ start_addr = mm->free_area_cache;
+ } else {
+ start_addr = TASK_UNMAPPED_BASE;
+ mm->cached_hole_size = 0;
+ }
+
+full_search:
+ addr = ALIGN(start_addr, huge_page_size(h));
+
+ for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
+ /* At this point: (!vma || addr < vma->vm_end). */
+ if (TASK_SIZE - len < addr) {
+ /*
+ * Start a new search - just in case we missed
+ * some holes.
+ */
+ if (start_addr != TASK_UNMAPPED_BASE) {
+ start_addr = TASK_UNMAPPED_BASE;
+ mm->cached_hole_size = 0;
+ goto full_search;
+ }
+ return -ENOMEM;
+ }
+ if (!vma || addr + len <= vma->vm_start) {
+ mm->free_area_cache = addr + len;
+ return addr;
+ }
+ if (addr + mm->cached_hole_size < vma->vm_start)
+ mm->cached_hole_size = vma->vm_start - addr;
+ addr = ALIGN(vma->vm_end, huge_page_size(h));
+ }
+}
+
+static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
+ unsigned long addr0, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ struct hstate *h = hstate_file(file);
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma, *prev_vma;
+ unsigned long base = mm->mmap_base, addr = addr0;
+ unsigned long largest_hole = mm->cached_hole_size;
+ int first_time = 1;
+
+ /* don't allow allocations above current base */
+ if (mm->free_area_cache > base)
+ mm->free_area_cache = base;
+
+ if (len <= largest_hole) {
+ largest_hole = 0;
+ mm->free_area_cache = base;
+ }
+try_again:
+ /* make sure it can fit in the remaining address space */
+ if (mm->free_area_cache < len)
+ goto fail;
+
+ /* either no address requested or can't fit in requested address hole */
+ addr = (mm->free_area_cache - len) & huge_page_mask(h);
+ do {
+ /*
+ * Lookup failure means no vma is above this address,
+ * i.e. return with success:
+ */
+ vma = find_vma_prev(mm, addr, &prev_vma);
+ if (!vma)
+ return addr;
+
+ /*
+ * new region fits between prev_vma->vm_end and
+ * vma->vm_start, use it:
+ */
+ if (addr + len <= vma->vm_start &&
+ (!prev_vma || (addr >= prev_vma->vm_end))) {
+ /* remember the address as a hint for next time */
+ mm->cached_hole_size = largest_hole;
+ mm->free_area_cache = addr;
+ return addr;
+ } else {
+ /* pull free_area_cache down to the first hole */
+ if (mm->free_area_cache == vma->vm_end) {
+ mm->free_area_cache = vma->vm_start;
+ mm->cached_hole_size = largest_hole;
+ }
+ }
+
+ /* remember the largest hole we saw so far */
+ if (addr + largest_hole < vma->vm_start)
+ largest_hole = vma->vm_start - addr;
+
+ /* try just below the current vma->vm_start */
+ addr = (vma->vm_start - len) & huge_page_mask(h);
+
+ } while (len <= vma->vm_start);
+
+fail:
+ /*
+ * if hint left us with no space for the requested
+ * mapping then try again:
+ */
+ if (first_time) {
+ mm->free_area_cache = base;
+ largest_hole = 0;
+ first_time = 0;
+ goto try_again;
+ }
+ /*
+ * A failed mmap() very likely causes application failure,
+ * so fall back to the bottom-up function here. This scenario
+ * can happen with large stack limits and large mmap()
+ * allocations.
+ */
+ mm->free_area_cache = TASK_UNMAPPED_BASE;
+ mm->cached_hole_size = ~0UL;
+ addr = hugetlb_get_unmapped_area_bottomup(file, addr0,
+ len, pgoff, flags);
+
+ /*
+ * Restore the topdown base:
+ */
+ mm->free_area_cache = base;
+ mm->cached_hole_size = ~0UL;
+
+ return addr;
+}
+
+unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+ struct hstate *h = hstate_file(file);
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+
+ if (len & ~huge_page_mask(h))
+ return -EINVAL;
+ if (len > TASK_SIZE)
+ return -ENOMEM;
+
+ if (flags & MAP_FIXED) {
+ if (prepare_hugepage_range(file, addr, len))
+ return -EINVAL;
+ return addr;
+ }
+
+ if (addr) {
+ addr = ALIGN(addr, huge_page_size(h));
+ vma = find_vma(mm, addr);
+ if (TASK_SIZE - len >= addr &&
+ (!vma || addr + len <= vma->vm_start))
+ return addr;
+ }
+ if (mm->get_unmapped_area == arch_get_unmapped_area)
+ return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+ pgoff, flags);
+ else
+ return hugetlb_get_unmapped_area_topdown(file, addr, len,
+ pgoff, flags);
+}
+
+static __init int setup_hugepagesz(char *opt)
+{
+ unsigned long ps = memparse(opt, &opt);
+ if (ps == PMD_SIZE) {
+ hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
+ } else if (ps == PUD_SIZE) {
+ hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+ } else {
+ printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
+ ps >> 20);
+ return 0;
+ }
+ return 1;
+}
+__setup("hugepagesz=", setup_hugepagesz);
+
+#endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
new file mode 100644
index 0000000..31b5c09
--- /dev/null
+++ b/arch/tile/mm/init.c
@@ -0,0 +1,1082 @@
+/*
+ * Copyright (C) 1995 Linus Torvalds
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/module.h>
+#include <linux/signal.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/ptrace.h>
+#include <linux/mman.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/poison.h>
+#include <linux/bootmem.h>
+#include <linux/slab.h>
+#include <linux/proc_fs.h>
+#include <linux/efi.h>
+#include <linux/memory_hotplug.h>
+#include <linux/uaccess.h>
+#include <asm/mmu_context.h>
+#include <asm/processor.h>
+#include <asm/system.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/dma.h>
+#include <asm/fixmap.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+#include <asm/sections.h>
+#include <asm/setup.h>
+#include <asm/homecache.h>
+#include <hv/hypervisor.h>
+#include <arch/chip.h>
+
+#include "migrate.h"
+
+/*
+ * We could set FORCE_MAX_ZONEORDER to "(HPAGE_SHIFT - PAGE_SHIFT + 1)"
+ * in the Tile Kconfig, but this generates configure warnings.
+ * Do it here and force people to get it right to compile this file.
+ * The problem is that with 4KB small pages and 16MB huge pages,
+ * the default value doesn't allow us to group enough small pages
+ * together to make up a huge page.
+ */
+#if CONFIG_FORCE_MAX_ZONEORDER < HPAGE_SHIFT - PAGE_SHIFT + 1
+# error "Change FORCE_MAX_ZONEORDER in arch/tile/Kconfig to match page size"
+#endif
+
+#define clear_pgd(pgdptr) (*(pgdptr) = hv_pte(0))
+
+unsigned long VMALLOC_RESERVE = CONFIG_VMALLOC_RESERVE;
+
+DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
+
+/* Create an L2 page table */
+static pte_t * __init alloc_pte(void)
+{
+ return __alloc_bootmem(L2_KERNEL_PGTABLE_SIZE, HV_PAGE_TABLE_ALIGN, 0);
+}
+
+/*
+ * L2 page tables per controller. We allocate these all at once from
+ * the bootmem allocator and store them here. This saves on kernel L2
+ * page table memory, compared to allocating a full 64K page per L2
+ * page table, and also means that in cases where we use huge pages,
+ * we are guaranteed to later be able to shatter those huge pages and
+ * switch to using these page tables instead, without requiring
+ * further allocation. Each l2_ptes[] entry points to the first page
+ * table for the first hugepage-size piece of memory on the
+ * controller; other page tables are just indexed directly, i.e. the
+ * L2 page tables are contiguous in memory for each controller.
+ */
+static pte_t *l2_ptes[MAX_NUMNODES];
+static int num_l2_ptes[MAX_NUMNODES];
+
+static void init_prealloc_ptes(int node, int pages)
+{
+ BUG_ON(pages & (HV_L2_ENTRIES-1));
+ if (pages) {
+ num_l2_ptes[node] = pages;
+ l2_ptes[node] = __alloc_bootmem(pages * sizeof(pte_t),
+ HV_PAGE_TABLE_ALIGN, 0);
+ }
+}
+
+pte_t *get_prealloc_pte(unsigned long pfn)
+{
+ int node = pfn_to_nid(pfn);
+ pfn &= ~(-1UL << (NR_PA_HIGHBIT_SHIFT - PAGE_SHIFT));
+ BUG_ON(node >= MAX_NUMNODES);
+ BUG_ON(pfn >= num_l2_ptes[node]);
+ return &l2_ptes[node][pfn];
+}
+
+/*
+ * What caching do we expect pages from the heap to have when
+ * they are allocated during bootup? (Once we've installed the
+ * "real" swapper_pg_dir.)
+ */
+static int initial_heap_home(void)
+{
+#if CHIP_HAS_CBOX_HOME_MAP()
+ if (hash_default)
+ return PAGE_HOME_HASH;
+#endif
+ return smp_processor_id();
+}
+
+/*
+ * Place a pointer to an L2 page table in a middle page
+ * directory entry.
+ */
+static void __init assign_pte(pmd_t *pmd, pte_t *page_table)
+{
+ phys_addr_t pa = __pa(page_table);
+ unsigned long l2_ptfn = pa >> HV_LOG2_PAGE_TABLE_ALIGN;
+ pte_t pteval = hv_pte_set_ptfn(__pgprot(_PAGE_TABLE), l2_ptfn);
+ BUG_ON((pa & (HV_PAGE_TABLE_ALIGN-1)) != 0);
+ pteval = pte_set_home(pteval, initial_heap_home());
+ *(pte_t *)pmd = pteval;
+ if (page_table != (pte_t *)pmd_page_vaddr(*pmd))
+ BUG();
+}
+
+#ifdef __tilegx__
+
+#if HV_L1_SIZE != HV_L2_SIZE
+# error Rework assumption that L1 and L2 page tables are same size.
+#endif
+
+/* Since pmd_t arrays and pte_t arrays are the same size, just use casts. */
+static inline pmd_t *alloc_pmd(void)
+{
+ return (pmd_t *)alloc_pte();
+}
+
+static inline void assign_pmd(pud_t *pud, pmd_t *pmd)
+{
+ assign_pte((pmd_t *)pud, (pte_t *)pmd);
+}
+
+#endif /* __tilegx__ */
+
+/* Replace the given pmd with a full PTE table. */
+void __init shatter_pmd(pmd_t *pmd)
+{
+ pte_t *pte = get_prealloc_pte(pte_pfn(*(pte_t *)pmd));
+ assign_pte(pmd, pte);
+}
+
+#ifdef CONFIG_HIGHMEM
+/*
+ * This function initializes a certain range of kernel virtual memory
+ * with new bootmem page tables, everywhere page tables are missing in
+ * the given range.
+ */
+
+/*
+ * NOTE: The page tables are allocated contiguously in physical memory,
+ * so we can cache the address of the first one and step through them
+ * without checking the pgd every time.
+ */
+static void __init page_table_range_init(unsigned long start,
+ unsigned long end, pgd_t *pgd_base)
+{
+ pgd_t *pgd;
+ int pgd_idx;
+ unsigned long vaddr;
+
+ vaddr = start;
+ pgd_idx = pgd_index(vaddr);
+ pgd = pgd_base + pgd_idx;
+
+ for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd++, pgd_idx++) {
+ pmd_t *pmd = pmd_offset(pud_offset(pgd, vaddr), vaddr);
+ if (pmd_none(*pmd))
+ assign_pte(pmd, alloc_pte());
+ vaddr += PMD_SIZE;
+ }
+}
+#endif /* CONFIG_HIGHMEM */
+
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+
+static int __initdata ktext_hash = 1; /* .text pages */
+static int __initdata kdata_hash = 1; /* .data and .bss pages */
+int __write_once hash_default = 1; /* kernel allocator pages */
+EXPORT_SYMBOL(hash_default);
+int __write_once kstack_hash = 1; /* if no homecaching, use h4h */
+#endif /* CHIP_HAS_CBOX_HOME_MAP */
+
+/*
+ * CPUs to use for striping the pages of kernel data. If hash-for-home
+ * is available, this is only relevant if kcache_hash sets up the
+ * .data and .bss to be page-homed, and we don't want the default mode
+ * of using the full set of kernel cpus for the striping.
+ */
+static __initdata struct cpumask kdata_mask;
+static __initdata int kdata_arg_seen;
+
+int __write_once kdata_huge; /* if no homecaching, small pages */
+
+
+/* Combine a generic pgprot_t with cache home to get a cache-aware pgprot. */
+static pgprot_t __init construct_pgprot(pgprot_t prot, int home)
+{
+ prot = pte_set_home(prot, home);
+#if CHIP_HAS_CBOX_HOME_MAP()
+ if (home == PAGE_HOME_IMMUTABLE) {
+ if (ktext_hash)
+ prot = hv_pte_set_mode(prot, HV_PTE_MODE_CACHE_HASH_L3);
+ else
+ prot = hv_pte_set_mode(prot, HV_PTE_MODE_CACHE_NO_L3);
+ }
+#endif
+ return prot;
+}
+
+/*
+ * For a given kernel data VA, how should it be cached?
+ * We return the complete pgprot_t with caching bits set.
+ */
+static pgprot_t __init init_pgprot(ulong address)
+{
+ int cpu;
+ unsigned long page;
+ enum { CODE_DELTA = MEM_SV_INTRPT - PAGE_OFFSET };
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+ /* For kdata=huge, everything is just hash-for-home. */
+ if (kdata_huge)
+ return construct_pgprot(PAGE_KERNEL, PAGE_HOME_HASH);
+#endif
+
+ /* We map the aliased pages of permanent text inaccessible. */
+ if (address < (ulong) _sinittext - CODE_DELTA)
+ return PAGE_NONE;
+
+ /*
+ * We map read-only data non-coherent for performance. We could
+ * use neighborhood caching on TILE64, but it's not clear it's a win.
+ */
+ if ((address >= (ulong) __start_rodata &&
+ address < (ulong) __end_rodata) ||
+ address == (ulong) empty_zero_page) {
+ return construct_pgprot(PAGE_KERNEL_RO, PAGE_HOME_IMMUTABLE);
+ }
+
+ /* As a performance optimization, keep the boot init stack here. */
+ if (address >= (ulong)&init_thread_union &&
+ address < (ulong)&init_thread_union + THREAD_SIZE)
+ return construct_pgprot(PAGE_KERNEL, smp_processor_id());
+
+#ifndef __tilegx__
+#if !ATOMIC_LOCKS_FOUND_VIA_TABLE()
+ /* Force the atomic_locks[] array page to be hash-for-home. */
+ if (address == (ulong) atomic_locks)
+ return construct_pgprot(PAGE_KERNEL, PAGE_HOME_HASH);
+#endif
+#endif
+
+ /*
+ * Everything else that isn't data or bss is heap, so mark it
+ * with the initial heap home (hash-for-home, or this cpu). This
+ * includes any addresses after the loaded image; any address before
+ * _einittext (since we already captured the case of text before
+ * _sinittext); and any init-data pages.
+ *
+ * All the LOWMEM pages that we mark this way will get their
+ * struct page homecache properly marked later, in set_page_homes().
+ * The HIGHMEM pages we leave with a default zero for their
+ * homes, but with a zero free_time we don't have to actually
+ * do a flush action the first time we use them, either.
+ */
+ if (address >= (ulong) _end || address < (ulong) _sdata ||
+ (address >= (ulong) _sinitdata &&
+ address < (ulong) _einitdata))
+ return construct_pgprot(PAGE_KERNEL, initial_heap_home());
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+ /* Use hash-for-home if requested for data/bss. */
+ if (kdata_hash)
+ return construct_pgprot(PAGE_KERNEL, PAGE_HOME_HASH);
+#endif
+
+ /*
+ * Otherwise we just hand out consecutive cpus. To avoid
+ * requiring this function to hold state, we just walk forward from
+ * _sdata by PAGE_SIZE, skipping the readonly and init data, to reach
+ * the requested address, while walking cpu home around kdata_mask.
+ * This is typically no more than a dozen or so iterations.
+ */
+ BUG_ON(_einitdata != __bss_start);
+ for (page = (ulong)_sdata, cpu = NR_CPUS; ; ) {
+ cpu = cpumask_next(cpu, &kdata_mask);
+ if (cpu == NR_CPUS)
+ cpu = cpumask_first(&kdata_mask);
+ if (page >= address)
+ break;
+ page += PAGE_SIZE;
+ if (page == (ulong)__start_rodata)
+ page = (ulong)__end_rodata;
+ if (page == (ulong)&init_thread_union)
+ page += THREAD_SIZE;
+ if (page == (ulong)_sinitdata)
+ page = (ulong)_einitdata;
+ if (page == (ulong)empty_zero_page)
+ page += PAGE_SIZE;
+#ifndef __tilegx__
+#if !ATOMIC_LOCKS_FOUND_VIA_TABLE()
+ if (page == (ulong)atomic_locks)
+ page += PAGE_SIZE;
+#endif
+#endif
+ }
+ return construct_pgprot(PAGE_KERNEL, cpu);
+}
+
+/*
+ * This function sets up how we cache the kernel text. If we have
+ * hash-for-home support, normally that is used instead (see the
+ * kcache_hash boot flag for more information). But if we end up
+ * using a page-based caching technique, this option sets up the
+ * details of that. In addition, the "ktext=nocache" option may
+ * always be used to disable local caching of text pages, if desired.
+ */
+
+static int __initdata ktext_arg_seen;
+static int __initdata ktext_small;
+static int __initdata ktext_local;
+static int __initdata ktext_all;
+static int __initdata ktext_nondataplane;
+static int __initdata ktext_nocache;
+static struct cpumask __initdata ktext_mask;
+
+static int __init setup_ktext(char *str)
+{
+ if (str == NULL)
+ return -EINVAL;
+
+ /* If you have a leading "nocache", turn off ktext caching */
+ if (strncmp(str, "nocache", 7) == 0) {
+ ktext_nocache = 1;
+ printk("ktext: disabling local caching of kernel text\n");
+ str += 7;
+ if (*str == ',')
+ ++str;
+ if (*str == '\0')
+ return 0;
+ }
+
+ ktext_arg_seen = 1;
+
+ /* Default setting on Tile64: use a huge page */
+ if (strcmp(str, "huge") == 0)
+ printk("ktext: using one huge locally cached page\n");
+
+ /* Pay TLB cost but get no cache benefit: cache small pages locally */
+ else if (strcmp(str, "local") == 0) {
+ ktext_small = 1;
+ ktext_local = 1;
+ printk("ktext: using small pages with local caching\n");
+ }
+
+ /* Neighborhood cache ktext pages on all cpus. */
+ else if (strcmp(str, "all") == 0) {
+ ktext_small = 1;
+ ktext_all = 1;
+ printk("ktext: using maximal caching neighborhood\n");
+ }
+
+
+ /* Neighborhood ktext pages on specified mask */
+ else if (cpulist_parse(str, &ktext_mask) == 0) {
+ char buf[NR_CPUS * 5];
+ cpulist_scnprintf(buf, sizeof(buf), &ktext_mask);
+ if (cpumask_weight(&ktext_mask) > 1) {
+ ktext_small = 1;
+ printk("ktext: using caching neighborhood %s "
+ "with small pages\n", buf);
+ } else {
+ printk("ktext: caching on cpu %s with one huge page\n",
+ buf);
+ }
+ }
+
+ else if (*str)
+ return -EINVAL;
+
+ return 0;
+}
+
+early_param("ktext", setup_ktext);
+
+
+static inline pgprot_t ktext_set_nocache(pgprot_t prot)
+{
+ if (!ktext_nocache)
+ prot = hv_pte_set_nc(prot);
+#if CHIP_HAS_NC_AND_NOALLOC_BITS()
+ else
+ prot = hv_pte_set_no_alloc_l2(prot);
+#endif
+ return prot;
+}
+
+#ifndef __tilegx__
+static pmd_t *__init get_pmd(pgd_t pgtables[], unsigned long va)
+{
+ return pmd_offset(pud_offset(&pgtables[pgd_index(va)], va), va);
+}
+#else
+static pmd_t *__init get_pmd(pgd_t pgtables[], unsigned long va)
+{
+ pud_t *pud = pud_offset(&pgtables[pgd_index(va)], va);
+ if (pud_none(*pud))
+ assign_pmd(pud, alloc_pmd());
+ return pmd_offset(pud, va);
+}
+#endif
+
+/* Temporary page table we use for staging. */
+static pgd_t pgtables[PTRS_PER_PGD]
+ __attribute__((section(".init.page")));
+
+/*
+ * This maps the physical memory to kernel virtual address space, a total
+ * of max_low_pfn pages, by creating page tables starting from address
+ * PAGE_OFFSET.
+ *
+ * This routine transitions us from using a set of compiled-in large
+ * pages to using some more precise caching, including removing access
+ * to code pages mapped at PAGE_OFFSET (executed only at MEM_SV_INTRPT),
+ * marking read-only data as locally cacheable, striping the remaining
+ * .data and .bss across all the available tiles, and removing access
+ * to pages above the top of RAM (thus ensuring a page fault from a bad
+ * virtual address rather than a hypervisor shoot down for accessing
+ * memory outside the assigned limits).
+ */
+static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
+{
+ unsigned long address, pfn;
+ pmd_t *pmd;
+ pte_t *pte;
+ int pte_ofs;
+ const struct cpumask *my_cpu_mask = cpumask_of(smp_processor_id());
+ struct cpumask kstripe_mask;
+ int rc, i;
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+ if (ktext_arg_seen && ktext_hash) {
+ printk("warning: \"ktext\" boot argument ignored"
+ " if \"kcache_hash\" sets up text hash-for-home\n");
+ ktext_small = 0;
+ }
+
+ if (kdata_arg_seen && kdata_hash) {
+ printk("warning: \"kdata\" boot argument ignored"
+ " if \"kcache_hash\" sets up data hash-for-home\n");
+ }
+
+ if (kdata_huge && !hash_default) {
+ printk("warning: disabling \"kdata=huge\"; requires"
+ " kcache_hash=all or =allbutstack\n");
+ kdata_huge = 0;
+ }
+#endif
+
+ /*
+ * Set up a mask for cpus to use for kernel striping.
+ * This is normally all cpus, but minus dataplane cpus if any.
+ * If the dataplane covers the whole chip, we stripe over
+ * the whole chip too.
+ */
+ cpumask_copy(&kstripe_mask, cpu_possible_mask);
+ if (!kdata_arg_seen)
+ kdata_mask = kstripe_mask;
+
+ /* Allocate and fill in L2 page tables */
+ for (i = 0; i < MAX_NUMNODES; ++i) {
+#ifdef CONFIG_HIGHMEM
+ unsigned long end_pfn = node_lowmem_end_pfn[i];
+#else
+ unsigned long end_pfn = node_end_pfn[i];
+#endif
+ unsigned long end_huge_pfn = 0;
+
+ /* Pre-shatter the last huge page to allow per-cpu pages. */
+ if (kdata_huge)
+ end_huge_pfn = end_pfn - (HPAGE_SIZE >> PAGE_SHIFT);
+
+ pfn = node_start_pfn[i];
+
+ /* Allocate enough memory to hold L2 page tables for node. */
+ init_prealloc_ptes(i, end_pfn - pfn);
+
+ address = (unsigned long) pfn_to_kaddr(pfn);
+ while (pfn < end_pfn) {
+ BUG_ON(address & (HPAGE_SIZE-1));
+ pmd = get_pmd(pgtables, address);
+ pte = get_prealloc_pte(pfn);
+ if (pfn < end_huge_pfn) {
+ pgprot_t prot = init_pgprot(address);
+ *(pte_t *)pmd = pte_mkhuge(pfn_pte(pfn, prot));
+ for (pte_ofs = 0; pte_ofs < PTRS_PER_PTE;
+ pfn++, pte_ofs++, address += PAGE_SIZE)
+ pte[pte_ofs] = pfn_pte(pfn, prot);
+ } else {
+ if (kdata_huge)
+ printk(KERN_DEBUG "pre-shattered huge"
+ " page at %#lx\n", address);
+ for (pte_ofs = 0; pte_ofs < PTRS_PER_PTE;
+ pfn++, pte_ofs++, address += PAGE_SIZE) {
+ pgprot_t prot = init_pgprot(address);
+ pte[pte_ofs] = pfn_pte(pfn, prot);
+ }
+ assign_pte(pmd, pte);
+ }
+ }
+ }
+
+ /*
+ * Set or check ktext_map now that we have cpu_possible_mask
+ * and kstripe_mask to work with.
+ */
+ if (ktext_all)
+ cpumask_copy(&ktext_mask, cpu_possible_mask);
+ else if (ktext_nondataplane)
+ ktext_mask = kstripe_mask;
+ else if (!cpumask_empty(&ktext_mask)) {
+ /* Sanity-check any mask that was requested */
+ struct cpumask bad;
+ cpumask_andnot(&bad, &ktext_mask, cpu_possible_mask);
+ cpumask_and(&ktext_mask, &ktext_mask, cpu_possible_mask);
+ if (!cpumask_empty(&bad)) {
+ char buf[NR_CPUS * 5];
+ cpulist_scnprintf(buf, sizeof(buf), &bad);
+ printk("ktext: not using unavailable cpus %s\n", buf);
+ }
+ if (cpumask_empty(&ktext_mask)) {
+ printk("ktext: no valid cpus; caching on %d.\n",
+ smp_processor_id());
+ cpumask_copy(&ktext_mask,
+ cpumask_of(smp_processor_id()));
+ }
+ }
+
+ address = MEM_SV_INTRPT;
+ pmd = get_pmd(pgtables, address);
+ if (ktext_small) {
+ /* Allocate an L2 PTE for the kernel text */
+ int cpu = 0;
+ pgprot_t prot = construct_pgprot(PAGE_KERNEL_EXEC,
+ PAGE_HOME_IMMUTABLE);
+
+ if (ktext_local) {
+ if (ktext_nocache)
+ prot = hv_pte_set_mode(prot,
+ HV_PTE_MODE_UNCACHED);
+ else
+ prot = hv_pte_set_mode(prot,
+ HV_PTE_MODE_CACHE_NO_L3);
+ } else {
+ prot = hv_pte_set_mode(prot,
+ HV_PTE_MODE_CACHE_TILE_L3);
+ cpu = cpumask_first(&ktext_mask);
+
+ prot = ktext_set_nocache(prot);
+ }
+
+ BUG_ON(address != (unsigned long)_stext);
+ pfn = 0; /* code starts at PA 0 */
+ pte = alloc_pte();
+ for (pte_ofs = 0; address < (unsigned long)_einittext;
+ pfn++, pte_ofs++, address += PAGE_SIZE) {
+ if (!ktext_local) {
+ prot = set_remote_cache_cpu(prot, cpu);
+ cpu = cpumask_next(cpu, &ktext_mask);
+ if (cpu == NR_CPUS)
+ cpu = cpumask_first(&ktext_mask);
+ }
+ pte[pte_ofs] = pfn_pte(pfn, prot);
+ }
+ assign_pte(pmd, pte);
+ } else {
+ pte_t pteval = pfn_pte(0, PAGE_KERNEL_EXEC);
+ pteval = pte_mkhuge(pteval);
+#if CHIP_HAS_CBOX_HOME_MAP()
+ if (ktext_hash) {
+ pteval = hv_pte_set_mode(pteval,
+ HV_PTE_MODE_CACHE_HASH_L3);
+ pteval = ktext_set_nocache(pteval);
+ } else
+#endif /* CHIP_HAS_CBOX_HOME_MAP() */
+ if (cpumask_weight(&ktext_mask) == 1) {
+ pteval = set_remote_cache_cpu(pteval,
+ cpumask_first(&ktext_mask));
+ pteval = hv_pte_set_mode(pteval,
+ HV_PTE_MODE_CACHE_TILE_L3);
+ pteval = ktext_set_nocache(pteval);
+ } else if (ktext_nocache)
+ pteval = hv_pte_set_mode(pteval,
+ HV_PTE_MODE_UNCACHED);
+ else
+ pteval = hv_pte_set_mode(pteval,
+ HV_PTE_MODE_CACHE_NO_L3);
+ *(pte_t *)pmd = pteval;
+ }
+
+ /* Set swapper_pgprot here so it is flushed to memory right away. */
+ swapper_pgprot = init_pgprot((unsigned long)swapper_pg_dir);
+
+ /*
+ * Since we may be changing the caching of the stack and page
+ * table itself, we invoke an assembly helper to do the
+ * following steps:
+ *
+ * - flush the cache so we start with an empty slate
+ * - install pgtables[] as the real page table
+ * - flush the TLB so the new page table takes effect
+ */
+ rc = flush_and_install_context(__pa(pgtables),
+ init_pgprot((unsigned long)pgtables),
+ __get_cpu_var(current_asid),
+ cpumask_bits(my_cpu_mask));
+ BUG_ON(rc != 0);
+
+ /* Copy the page table back to the normal swapper_pg_dir. */
+ memcpy(pgd_base, pgtables, sizeof(pgtables));
+ __install_page_table(pgd_base, __get_cpu_var(current_asid),
+ swapper_pgprot);
+}
+
+/*
+ * devmem_is_allowed() checks to see if /dev/mem access to a certain address
+ * is valid. The argument is a physical page number.
+ *
+ * On Tile, the only valid things for which we can just hand out unchecked
+ * PTEs are the kernel code and data. Anything else might change its
+ * homing with time, and we wouldn't know to adjust the /dev/mem PTEs.
+ * Note that init_thread_union is released to heap soon after boot,
+ * so we include it in the init data.
+ *
+ * For TILE-Gx, we might want to consider allowing access to PA
+ * regions corresponding to PCI space, etc.
+ */
+int devmem_is_allowed(unsigned long pagenr)
+{
+ return pagenr < kaddr_to_pfn(_end) &&
+ !(pagenr >= kaddr_to_pfn(&init_thread_union) ||
+ pagenr < kaddr_to_pfn(_einitdata)) &&
+ !(pagenr >= kaddr_to_pfn(_sinittext) ||
+ pagenr <= kaddr_to_pfn(_einittext-1));
+}
+
+#ifdef CONFIG_HIGHMEM
+static void __init permanent_kmaps_init(pgd_t *pgd_base)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+ unsigned long vaddr;
+
+ vaddr = PKMAP_BASE;
+ page_table_range_init(vaddr, vaddr + PAGE_SIZE*LAST_PKMAP, pgd_base);
+
+ pgd = swapper_pg_dir + pgd_index(vaddr);
+ pud = pud_offset(pgd, vaddr);
+ pmd = pmd_offset(pud, vaddr);
+ pte = pte_offset_kernel(pmd, vaddr);
+ pkmap_page_table = pte;
+}
+#endif /* CONFIG_HIGHMEM */
+
+
+static void __init init_free_pfn_range(unsigned long start, unsigned long end)
+{
+ unsigned long pfn;
+ struct page *page = pfn_to_page(start);
+
+ for (pfn = start; pfn < end; ) {
+ /* Optimize by freeing pages in large batches */
+ int order = __ffs(pfn);
+ int count, i;
+ struct page *p;
+
+ if (order >= MAX_ORDER)
+ order = MAX_ORDER-1;
+ count = 1 << order;
+ while (pfn + count > end) {
+ count >>= 1;
+ --order;
+ }
+ for (p = page, i = 0; i < count; ++i, ++p) {
+ __ClearPageReserved(p);
+ /*
+ * Hacky direct set to avoid unnecessary
+ * lock take/release for EVERY page here.
+ */
+ p->_count.counter = 0;
+ p->_mapcount.counter = -1;
+ }
+ init_page_count(page);
+ __free_pages(page, order);
+ totalram_pages += count;
+
+ page += count;
+ pfn += count;
+ }
+}
+
+static void __init set_non_bootmem_pages_init(void)
+{
+ struct zone *z;
+ for_each_zone(z) {
+ unsigned long start, end;
+ int nid = z->zone_pgdat->node_id;
+
+ start = z->zone_start_pfn;
+ if (start == 0)
+ continue; /* bootmem */
+ end = start + z->spanned_pages;
+ if (zone_idx(z) == ZONE_DMA) {
+ BUG_ON(start != node_start_pfn[nid]);
+ start = node_free_pfn[nid];
+ }
+#ifdef CONFIG_HIGHMEM
+ if (zone_idx(z) == ZONE_HIGHMEM)
+ totalhigh_pages += z->spanned_pages;
+#endif
+ if (kdata_huge) {
+ unsigned long percpu_pfn = node_percpu_pfn[nid];
+ if (start < percpu_pfn && end > percpu_pfn)
+ end = percpu_pfn;
+ }
+#ifdef CONFIG_PCI
+ if (start <= pci_reserve_start_pfn &&
+ end > pci_reserve_start_pfn) {
+ if (end > pci_reserve_end_pfn)
+ init_free_pfn_range(pci_reserve_end_pfn, end);
+ end = pci_reserve_start_pfn;
+ }
+#endif
+ init_free_pfn_range(start, end);
+ }
+}
+
+/*
+ * paging_init() sets up the page tables - note that all of lowmem is
+ * already mapped by head.S.
+ */
+void __init paging_init(void)
+{
+#ifdef CONFIG_HIGHMEM
+ unsigned long vaddr, end;
+#endif
+#ifdef __tilegx__
+ pud_t *pud;
+#endif
+ pgd_t *pgd_base = swapper_pg_dir;
+
+ kernel_physical_mapping_init(pgd_base);
+
+#ifdef CONFIG_HIGHMEM
+ /*
+ * Fixed mappings, only the page table structure has to be
+ * created - mappings will be set by set_fixmap():
+ */
+ vaddr = __fix_to_virt(__end_of_fixed_addresses - 1) & PMD_MASK;
+ end = (FIXADDR_TOP + PMD_SIZE - 1) & PMD_MASK;
+ page_table_range_init(vaddr, end, pgd_base);
+ permanent_kmaps_init(pgd_base);
+#endif
+
+#ifdef __tilegx__
+ /*
+ * Since GX allocates just one pmd_t array worth of vmalloc space,
+ * we go ahead and allocate it statically here, then share it
+ * globally. As a result we don't have to worry about any task
+ * changing init_mm once we get up and running, and there's no
+ * need for e.g. vmalloc_sync_all().
+ */
+ BUILD_BUG_ON(pgd_index(VMALLOC_START) != pgd_index(VMALLOC_END));
+ pud = pud_offset(pgd_base + pgd_index(VMALLOC_START), VMALLOC_START);
+ assign_pmd(pud, alloc_pmd());
+#endif
+}
+
+
+/*
+ * Walk the kernel page tables and derive the page_home() from
+ * the PTEs, so that set_pte() can properly validate the caching
+ * of all PTEs it sees.
+ */
+void __init set_page_homes(void)
+{
+}
+
+static void __init set_max_mapnr_init(void)
+{
+#ifdef CONFIG_FLATMEM
+ max_mapnr = max_low_pfn;
+#endif
+}
+
+void __init mem_init(void)
+{
+ int codesize, datasize, initsize;
+ int i;
+#ifndef __tilegx__
+ void *last;
+#endif
+
+#ifdef CONFIG_FLATMEM
+ if (!mem_map)
+ BUG();
+#endif
+
+#ifdef CONFIG_HIGHMEM
+ /* check that fixmap and pkmap do not overlap */
+ if (PKMAP_ADDR(LAST_PKMAP-1) >= FIXADDR_START) {
+ printk(KERN_ERR "fixmap and kmap areas overlap"
+ " - this will crash\n");
+ printk(KERN_ERR "pkstart: %lxh pkend: %lxh fixstart %lxh\n",
+ PKMAP_BASE, PKMAP_ADDR(LAST_PKMAP-1),
+ FIXADDR_START);
+ BUG();
+ }
+#endif
+
+ set_max_mapnr_init();
+
+ /* this will put all bootmem onto the freelists */
+ totalram_pages += free_all_bootmem();
+
+ /* count all remaining LOWMEM and give all HIGHMEM to page allocator */
+ set_non_bootmem_pages_init();
+
+ codesize = (unsigned long)&_etext - (unsigned long)&_text;
+ datasize = (unsigned long)&_end - (unsigned long)&_sdata;
+ initsize = (unsigned long)&_einittext - (unsigned long)&_sinittext;
+ initsize += (unsigned long)&_einitdata - (unsigned long)&_sinitdata;
+
+ printk(KERN_INFO "Memory: %luk/%luk available (%dk kernel code, %dk data, %dk init, %ldk highmem)\n",
+ (unsigned long) nr_free_pages() << (PAGE_SHIFT-10),
+ num_physpages << (PAGE_SHIFT-10),
+ codesize >> 10,
+ datasize >> 10,
+ initsize >> 10,
+ (unsigned long) (totalhigh_pages << (PAGE_SHIFT-10))
+ );
+
+ /*
+ * In debug mode, dump some interesting memory mappings.
+ */
+#ifdef CONFIG_HIGHMEM
+ printk(KERN_DEBUG " KMAP %#lx - %#lx\n",
+ FIXADDR_START, FIXADDR_TOP + PAGE_SIZE - 1);
+ printk(KERN_DEBUG " PKMAP %#lx - %#lx\n",
+ PKMAP_BASE, PKMAP_ADDR(LAST_PKMAP) - 1);
+#endif
+#ifdef CONFIG_HUGEVMAP
+ printk(KERN_DEBUG " HUGEMAP %#lx - %#lx\n",
+ HUGE_VMAP_BASE, HUGE_VMAP_END - 1);
+#endif
+ printk(KERN_DEBUG " VMALLOC %#lx - %#lx\n",
+ _VMALLOC_START, _VMALLOC_END - 1);
+#ifdef __tilegx__
+ for (i = MAX_NUMNODES-1; i >= 0; --i) {
+ struct pglist_data *node = &node_data[i];
+ if (node->node_present_pages) {
+ unsigned long start = (unsigned long)
+ pfn_to_kaddr(node->node_start_pfn);
+ unsigned long end = start +
+ (node->node_present_pages << PAGE_SHIFT);
+ printk(KERN_DEBUG " MEM%d %#lx - %#lx\n",
+ i, start, end - 1);
+ }
+ }
+#else
+ last = high_memory;
+ for (i = MAX_NUMNODES-1; i >= 0; --i) {
+ if ((unsigned long)vbase_map[i] != -1UL) {
+ printk(KERN_DEBUG " LOWMEM%d %#lx - %#lx\n",
+ i, (unsigned long) (vbase_map[i]),
+ (unsigned long) (last-1));
+ last = vbase_map[i];
+ }
+ }
+#endif
+
+#ifndef __tilegx__
+ /*
+ * Convert from using one lock for all atomic operations to
+ * one per cpu.
+ */
+ __init_atomic_per_cpu();
+#endif
+}
+
+/*
+ * This is for the non-NUMA, single-node SMP system case.
+ * As on x86, we always add hotplugged memory to the
+ * highmem zone for now.
+ */
+#ifndef CONFIG_NEED_MULTIPLE_NODES
+int arch_add_memory(u64 start, u64 size)
+{
+ struct pglist_data *pgdata = &contig_page_data;
+ struct zone *zone = pgdata->node_zones + MAX_NR_ZONES-1;
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+
+ return __add_pages(zone, start_pfn, nr_pages);
+}
+
+int remove_memory(u64 start, u64 size)
+{
+ return -EINVAL;
+}
+#endif
+
+struct kmem_cache *pgd_cache;
+
+void __init pgtable_cache_init(void)
+{
+ pgd_cache = kmem_cache_create("pgd",
+ PTRS_PER_PGD*sizeof(pgd_t),
+ PTRS_PER_PGD*sizeof(pgd_t),
+ 0,
+ NULL);
+ if (!pgd_cache)
+ panic("pgtable_cache_init(): Cannot create pgd cache");
+}
+
+#if !CHIP_HAS_COHERENT_LOCAL_CACHE()
+/*
+ * The __w1data area holds data that is only written during initialization,
+ * and is read-only and thus freely cacheable thereafter. Fix the page
+ * table entries that cover that region accordingly.
+ */
+static void mark_w1data_ro(void)
+{
+ /* Loop over page table entries */
+ unsigned long addr = (unsigned long)__w1data_begin;
+ BUG_ON((addr & (PAGE_SIZE-1)) != 0);
+ for (; addr <= (unsigned long)__w1data_end - 1; addr += PAGE_SIZE) {
+ unsigned long pfn = kaddr_to_pfn((void *)addr);
+ struct page *page = pfn_to_page(pfn);
+ pte_t *ptep = virt_to_pte(NULL, addr);
+ BUG_ON(pte_huge(*ptep)); /* not relevant for kdata_huge */
+ set_pte_at(&init_mm, addr, ptep, pfn_pte(pfn, PAGE_KERNEL_RO));
+ }
+}
+#endif
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+static long __write_once initfree;
+#else
+static long __write_once initfree = 1;
+#endif
+
+/* Select whether to free (1) or mark unusable (0) the __init pages. */
+static int __init set_initfree(char *str)
+{
+ strict_strtol(str, 0, &initfree);
+ printk("initfree: %s free init pages\n", initfree ? "will" : "won't");
+ return 1;
+}
+__setup("initfree=", set_initfree);
+
+static void free_init_pages(char *what, unsigned long begin, unsigned long end)
+{
+ unsigned long addr = (unsigned long) begin;
+
+ if (kdata_huge && !initfree) {
+ printk("Warning: ignoring initfree=0:"
+ " incompatible with kdata=huge\n");
+ initfree = 1;
+ }
+ end = (end + PAGE_SIZE - 1) & PAGE_MASK;
+ local_flush_tlb_pages(NULL, begin, PAGE_SIZE, end - begin);
+ for (addr = begin; addr < end; addr += PAGE_SIZE) {
+ /*
+ * Note we just reset the home here directly in the
+ * page table. We know this is safe because our caller
+ * just flushed the caches on all the other cpus,
+ * and they won't be touching any of these pages.
+ */
+ int pfn = kaddr_to_pfn((void *)addr);
+ struct page *page = pfn_to_page(pfn);
+ pte_t *ptep = virt_to_pte(NULL, addr);
+ if (!initfree) {
+ /*
+ * If debugging page accesses then do not free
+ * this memory but mark them not present - any
+ * buggy init-section access will create a
+ * kernel page fault:
+ */
+ pte_clear(&init_mm, addr, ptep);
+ continue;
+ }
+ __ClearPageReserved(page);
+ init_page_count(page);
+ if (pte_huge(*ptep))
+ BUG_ON(!kdata_huge);
+ else
+ set_pte_at(&init_mm, addr, ptep,
+ pfn_pte(pfn, PAGE_KERNEL));
+ memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
+ free_page(addr);
+ totalram_pages++;
+ }
+ printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
+}
+
+void free_initmem(void)
+{
+ const unsigned long text_delta = MEM_SV_INTRPT - PAGE_OFFSET;
+
+ /*
+ * Evict the dirty initdata on the boot cpu, evict the w1data
+ * wherever it's homed, and evict all the init code everywhere.
+ * We are guaranteed that no one will touch the init pages any
+ * more, and although other cpus may be touching the w1data,
+ * we only actually change the caching on tile64, which won't
+ * be keeping local copies in the other tiles' caches anyway.
+ */
+ homecache_evict(&cpu_cacheable_map);
+
+ /* Free the data pages that we won't use again after init. */
+ free_init_pages("unused kernel data",
+ (unsigned long)_sinitdata,
+ (unsigned long)_einitdata);
+
+ /*
+ * Free the pages mapped from 0xc0000000 that correspond to code
+ * pages from 0xfd000000 that we won't use again after init.
+ */
+ free_init_pages("unused kernel text",
+ (unsigned long)_sinittext - text_delta,
+ (unsigned long)_einittext - text_delta);
+
+#if !CHIP_HAS_COHERENT_LOCAL_CACHE()
+ /*
+ * Upgrade the .w1data section to globally cached.
+ * We don't do this on tilepro, since the cache architecture
+ * pretty much makes it irrelevant, and in any case we end
+ * up having racing issues with other tiles that may touch
+ * the data after we flush the cache but before we update
+ * the PTEs and flush the TLBs, causing sharer shootdowns
+ * later. Even though this is to clean data, it seems like
+ * an unnecessary complication.
+ */
+ mark_w1data_ro();
+#endif
+
+ /* Do a global TLB flush so everyone sees the changes. */
+ flush_tlb_all();
+}
diff --git a/arch/tile/mm/migrate.h b/arch/tile/mm/migrate.h
new file mode 100644
index 0000000..cd45a08
--- /dev/null
+++ b/arch/tile/mm/migrate.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * Structure definitions for migration, exposed here for use by
+ * arch/tile/kernel/asm-offsets.c.
+ */
+
+#ifndef MM_MIGRATE_H
+#define MM_MIGRATE_H
+
+#include <linux/cpumask.h>
+#include <hv/hypervisor.h>
+
+/*
+ * This function is used as a helper when setting up the initial
+ * page table (swapper_pg_dir).
+ */
+extern int flush_and_install_context(HV_PhysAddr page_table, HV_PTE access,
+ HV_ASID asid,
+ const unsigned long *cpumask);
+
+/*
+ * This function supports migration as a "helper" as follows:
+ *
+ * - Set the stack PTE itself to "migrating".
+ * - Do a global TLB flush for (va,length) and the specified ASIDs.
+ * - Do a cache-evict on all necessary cpus.
+ * - Write the new stack PTE.
+ *
+ * Note that any non-NULL pointers must not point to the page that
+ * is handled by the stack_pte itself.
+ */
+extern int homecache_migrate_stack_and_flush(pte_t stack_pte, unsigned long va,
+ size_t length, pte_t *stack_ptep,
+ const struct cpumask *cache_cpumask,
+ const struct cpumask *tlb_cpumask,
+ HV_Remote_ASID *asids,
+ int asidcount);
+
+#endif /* MM_MIGRATE_H */
diff --git a/arch/tile/mm/migrate_32.S b/arch/tile/mm/migrate_32.S
new file mode 100644
index 0000000..f738765
--- /dev/null
+++ b/arch/tile/mm/migrate_32.S
@@ -0,0 +1,211 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * This routine is a helper for migrating the home of a set of pages to
+ * a new cpu. See the documentation in homecache.c for more information.
+ */
+
+#include <linux/linkage.h>
+#include <linux/threads.h>
+#include <asm/page.h>
+#include <asm/types.h>
+#include <asm/asm-offsets.h>
+#include <hv/hypervisor.h>
+
+ .text
+
+/*
+ * First, some definitions that apply to all the code in the file.
+ */
+
+/* Locals (caller-save) */
+#define r_tmp r10
+#define r_save_sp r11
+
+/* What we save where in the stack frame; must include all callee-saves. */
+#define FRAME_SP 4
+#define FRAME_R30 8
+#define FRAME_R31 12
+#define FRAME_R32 16
+#define FRAME_R33 20
+#define FRAME_R34 24
+#define FRAME_R35 28
+#define FRAME_SIZE 32
+
+
+
+
+/*
+ * On entry:
+ *
+ * r0 low word of the new context PA to install (moved to r_context_lo)
+ * r1 high word of the new context PA to install (moved to r_context_hi)
+ * r2 low word of PTE to use for context access (moved to r_access_lo)
+ * r3 high word of PTE to use for context access (moved to r_access_hi)
+ * r4 ASID to use for new context (moved to r_asid)
+ * r5 pointer to cpumask with just this cpu set in it (r_my_cpumask)
+ */
+
+/* Arguments (caller-save) */
+#define r_context_lo_in r0
+#define r_context_hi_in r1
+#define r_access_lo_in r2
+#define r_access_hi_in r3
+#define r_asid_in r4
+#define r_my_cpumask r5
+
+/* Locals (callee-save); must not be more than FRAME_xxx above. */
+#define r_save_ics r30
+#define r_context_lo r31
+#define r_context_hi r32
+#define r_access_lo r33
+#define r_access_hi r34
+#define r_asid r35
+
+STD_ENTRY(flush_and_install_context)
+ /*
+ * Create a stack frame; we can't touch it once we flush the
+ * cache until we install the new page table and flush the TLB.
+ */
+ {
+ move r_save_sp, sp
+ sw sp, lr
+ addi sp, sp, -FRAME_SIZE
+ }
+ addi r_tmp, sp, FRAME_SP
+ {
+ sw r_tmp, r_save_sp
+ addi r_tmp, sp, FRAME_R30
+ }
+ {
+ sw r_tmp, r30
+ addi r_tmp, sp, FRAME_R31
+ }
+ {
+ sw r_tmp, r31
+ addi r_tmp, sp, FRAME_R32
+ }
+ {
+ sw r_tmp, r32
+ addi r_tmp, sp, FRAME_R33
+ }
+ {
+ sw r_tmp, r33
+ addi r_tmp, sp, FRAME_R34
+ }
+ {
+ sw r_tmp, r34
+ addi r_tmp, sp, FRAME_R35
+ }
+ sw r_tmp, r35
+
+ /* Move some arguments to callee-save registers. */
+ {
+ move r_context_lo, r_context_lo_in
+ move r_context_hi, r_context_hi_in
+ }
+ {
+ move r_access_lo, r_access_lo_in
+ move r_access_hi, r_access_hi_in
+ }
+ move r_asid, r_asid_in
+
+ /* Disable interrupts, since we can't use our stack. */
+ {
+ mfspr r_save_ics, INTERRUPT_CRITICAL_SECTION
+ movei r_tmp, 1
+ }
+ mtspr INTERRUPT_CRITICAL_SECTION, r_tmp
+
+ /* First, flush our L2 cache. */
+ {
+ move r0, zero /* cache_pa */
+ move r1, zero
+ }
+ {
+ auli r2, zero, ha16(HV_FLUSH_EVICT_L2) /* cache_control */
+ move r3, r_my_cpumask /* cache_cpumask */
+ }
+ {
+ move r4, zero /* tlb_va */
+ move r5, zero /* tlb_length */
+ }
+ {
+ move r6, zero /* tlb_pgsize */
+ move r7, zero /* tlb_cpumask */
+ }
+ {
+ move r8, zero /* asids */
+ move r9, zero /* asidcount */
+ }
+ jal hv_flush_remote
+ bnz r0, .Ldone
+
+ /* Now install the new page table. */
+ {
+ move r0, r_context_lo
+ move r1, r_context_hi
+ }
+ {
+ move r2, r_access_lo
+ move r3, r_access_hi
+ }
+ {
+ move r4, r_asid
+ movei r5, HV_CTX_DIRECTIO
+ }
+ jal hv_install_context
+ bnz r0, .Ldone
+
+ /* Finally, flush the TLB. */
+ {
+ movei r0, 0 /* preserve_global */
+ jal hv_flush_all
+ }
+
+.Ldone:
+ /* Reset interrupts back how they were before. */
+ mtspr INTERRUPT_CRITICAL_SECTION, r_save_ics
+
+ /* Restore the callee-saved registers and return. */
+ addli lr, sp, FRAME_SIZE
+ {
+ lw lr, lr
+ addli r_tmp, sp, FRAME_R30
+ }
+ {
+ lw r30, r_tmp
+ addli r_tmp, sp, FRAME_R31
+ }
+ {
+ lw r31, r_tmp
+ addli r_tmp, sp, FRAME_R32
+ }
+ {
+ lw r32, r_tmp
+ addli r_tmp, sp, FRAME_R33
+ }
+ {
+ lw r33, r_tmp
+ addli r_tmp, sp, FRAME_R34
+ }
+ {
+ lw r34, r_tmp
+ addli r_tmp, sp, FRAME_R35
+ }
+ {
+ lw r35, r_tmp
+ addi sp, sp, FRAME_SIZE
+ }
+ jrp lr
+ STD_ENDPROC(flush_and_install_context)
diff --git a/arch/tile/mm/mmap.c b/arch/tile/mm/mmap.c
new file mode 100644
index 0000000..f96f4ce
--- /dev/null
+++ b/arch/tile/mm/mmap.c
@@ -0,0 +1,75 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * Taken from the i386 architecture and simplified.
+ */
+
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/limits.h>
+#include <linux/sched.h>
+#include <linux/mman.h>
+#include <linux/compat.h>
+
+/*
+ * Top of mmap area (just below the process stack).
+ *
+ * Leave at least a ~128 MB hole.
+ */
+#define MIN_GAP (128*1024*1024)
+#define MAX_GAP (TASK_SIZE/6*5)
+
+static inline unsigned long mmap_base(struct mm_struct *mm)
+{
+ unsigned long gap = rlimit(RLIMIT_STACK);
+ unsigned long random_factor = 0;
+
+ if (current->flags & PF_RANDOMIZE)
+ random_factor = get_random_int() % (1024*1024);
+
+ if (gap < MIN_GAP)
+ gap = MIN_GAP;
+ else if (gap > MAX_GAP)
+ gap = MAX_GAP;
+
+ return PAGE_ALIGN(TASK_SIZE - gap - random_factor);
+}
+
+/*
+ * This function, called very early during the creation of a new
+ * process VM image, sets up which VM layout function to use:
+ */
+void arch_pick_mmap_layout(struct mm_struct *mm)
+{
+#if !defined(__tilegx__)
+ int is_32bit = 1;
+#elif defined(CONFIG_COMPAT)
+ int is_32bit = is_compat_task();
+#else
+ int is_32bit = 0;
+#endif
+
+ /*
+ * Use the standard layout if the expected stack growth is unlimited
+ * or we are running a native 64-bit process.
+ */
+ if (!is_32bit || rlimit(RLIMIT_STACK) == RLIM_INFINITY) {
+ mm->mmap_base = TASK_UNMAPPED_BASE;
+ mm->get_unmapped_area = arch_get_unmapped_area;
+ mm->unmap_area = arch_unmap_area;
+ } else {
+ mm->mmap_base = mmap_base(mm);
+ mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ mm->unmap_area = arch_unmap_area_topdown;
+ }
+}
diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
new file mode 100644
index 0000000..289e729
--- /dev/null
+++ b/arch/tile/mm/pgtable.c
@@ -0,0 +1,566 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/swap.h>
+#include <linux/smp.h>
+#include <linux/highmem.h>
+#include <linux/slab.h>
+#include <linux/pagemap.h>
+#include <linux/spinlock.h>
+#include <linux/cpumask.h>
+#include <linux/module.h>
+#include <linux/io.h>
+#include <linux/vmalloc.h>
+
+#include <asm/system.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/fixmap.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+#include <asm/homecache.h>
+
+#define K(x) ((x) << (PAGE_SHIFT-10))
+
+/*
+ * The normal show_free_areas() is too verbose on Tile, with dozens
+ * of processors and often four NUMA zones each with high and lowmem.
+ */
+void show_mem(void)
+{
+ struct zone *zone;
+
+ printk("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu"
+ " free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu"
+ " pagecache:%lu swap:%lu\n",
+ (global_page_state(NR_ACTIVE_ANON) +
+ global_page_state(NR_ACTIVE_FILE)),
+ (global_page_state(NR_INACTIVE_ANON) +
+ global_page_state(NR_INACTIVE_FILE)),
+ global_page_state(NR_FILE_DIRTY),
+ global_page_state(NR_WRITEBACK),
+ global_page_state(NR_UNSTABLE_NFS),
+ global_page_state(NR_FREE_PAGES),
+ (global_page_state(NR_SLAB_RECLAIMABLE) +
+ global_page_state(NR_SLAB_UNRECLAIMABLE)),
+ global_page_state(NR_FILE_MAPPED),
+ global_page_state(NR_PAGETABLE),
+ global_page_state(NR_BOUNCE),
+ global_page_state(NR_FILE_PAGES),
+ nr_swap_pages);
+
+ for_each_zone(zone) {
+ unsigned long flags, order, total = 0;
+ long largest_order = -1;
+
+ if (!populated_zone(zone))
+ continue;
+
+ printk("Node %d %7s: ", zone_to_nid(zone), zone->name);
+ spin_lock_irqsave(&zone->lock, flags);
+ for (order = 0; order < MAX_ORDER; order++) {
+ int nr = zone->free_area[order].nr_free;
+ total += nr << order;
+ if (nr)
+ largest_order = order;
+ }
+ spin_unlock_irqrestore(&zone->lock, flags);
+ printk("%lukB (largest %lukB)\n",
+ K(total), largest_order >= 0 ? K(1UL) << largest_order : 0);
+ }
+}
+
+/*
+ * Associate a virtual page frame with a given physical page frame
+ * and protection flags for that frame.
+ */
+static void set_pte_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ pgd = swapper_pg_dir + pgd_index(vaddr);
+ if (pgd_none(*pgd)) {
+ BUG();
+ return;
+ }
+ pud = pud_offset(pgd, vaddr);
+ if (pud_none(*pud)) {
+ BUG();
+ return;
+ }
+ pmd = pmd_offset(pud, vaddr);
+ if (pmd_none(*pmd)) {
+ BUG();
+ return;
+ }
+ pte = pte_offset_kernel(pmd, vaddr);
+ /* <pfn,flags> stored as-is, to permit clearing entries */
+ set_pte(pte, pfn_pte(pfn, flags));
+
+ /*
+ * It's enough to flush this one mapping.
+ * This appears conservative since it is only called
+ * from __set_fixmap.
+ */
+ local_flush_tlb_page(NULL, vaddr, PAGE_SIZE);
+}
+
+/*
+ * Associate a huge virtual page frame with a given physical page frame
+ * and protection flags for that frame. pfn is for the base of the page,
+ * vaddr is what the page gets mapped to - both must be properly aligned.
+ * The pmd must already be instantiated.
+ */
+void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ if (vaddr & (PMD_SIZE-1)) { /* vaddr is misaligned */
+ printk(KERN_WARNING "set_pmd_pfn: vaddr misaligned\n");
+ return; /* BUG(); */
+ }
+ if (pfn & (PTRS_PER_PTE-1)) { /* pfn is misaligned */
+ printk(KERN_WARNING "set_pmd_pfn: pfn misaligned\n");
+ return; /* BUG(); */
+ }
+ pgd = swapper_pg_dir + pgd_index(vaddr);
+ if (pgd_none(*pgd)) {
+ printk(KERN_WARNING "set_pmd_pfn: pgd_none\n");
+ return; /* BUG(); */
+ }
+ pud = pud_offset(pgd, vaddr);
+ pmd = pmd_offset(pud, vaddr);
+ set_pmd(pmd, ptfn_pmd(HV_PFN_TO_PTFN(pfn), flags));
+ /*
+ * It's enough to flush this one mapping.
+ * We flush both small and huge TLB entries to be sure.
+ */
+ local_flush_tlb_page(NULL, vaddr, HPAGE_SIZE);
+ local_flush_tlb_pages(NULL, vaddr, PAGE_SIZE, HPAGE_SIZE);
+}
+
+void __set_fixmap(enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
+{
+ unsigned long address = __fix_to_virt(idx);
+
+ if (idx >= __end_of_fixed_addresses) {
+ BUG();
+ return;
+ }
+ set_pte_pfn(address, phys >> PAGE_SHIFT, flags);
+}
+
+#if defined(CONFIG_HIGHPTE)
+pte_t *_pte_offset_map(pmd_t *dir, unsigned long address, enum km_type type)
+{
+ /* The L2 table need not be page-aligned; add its offset in the page. */
+ pte_t *pte = (pte_t *)((char *)kmap_atomic(pmd_page(*dir), type) +
+ ((pmd_ptfn(*dir) << HV_LOG2_PAGE_TABLE_ALIGN) & ~PAGE_MASK));
+ return &pte[pte_index(address)];
+}
+#endif
+
+/*
+ * List of all pgd's needed so it can invalidate entries in both cached
+ * and uncached pgd's. This is essentially codepath-based locking
+ * against pageattr.c; it is the unique case in which a valid change
+ * of kernel pagetables can't be lazily synchronized by vmalloc faults.
+ * vmalloc faults work because attached pagetables are never freed.
+ * The locking scheme was chosen on the basis of manfred's
+ * recommendations and having no core impact whatsoever.
+ * -- wli
+ */
+DEFINE_SPINLOCK(pgd_lock);
+LIST_HEAD(pgd_list);
+
+static inline void pgd_list_add(pgd_t *pgd)
+{
+ list_add(pgd_to_list(pgd), &pgd_list);
+}
+
+static inline void pgd_list_del(pgd_t *pgd)
+{
+ list_del(pgd_to_list(pgd));
+}
+
+#define KERNEL_PGD_INDEX_START pgd_index(PAGE_OFFSET)
+#define KERNEL_PGD_PTRS (PTRS_PER_PGD - KERNEL_PGD_INDEX_START)
+
+static void pgd_ctor(pgd_t *pgd)
+{
+ unsigned long flags;
+
+ memset(pgd, 0, KERNEL_PGD_INDEX_START*sizeof(pgd_t));
+ spin_lock_irqsave(&pgd_lock, flags);
+
+#ifndef __tilegx__
+ /*
+ * Check that the user interrupt vector has no L2.
+ * It never should for the swapper, and new page tables
+ * should always start with an empty user interrupt vector.
+ */
+ BUG_ON(((u64 *)swapper_pg_dir)[pgd_index(MEM_USER_INTRPT)] != 0);
+#endif
+
+ clone_pgd_range(pgd + KERNEL_PGD_INDEX_START,
+ swapper_pg_dir + KERNEL_PGD_INDEX_START,
+ KERNEL_PGD_PTRS);
+
+ pgd_list_add(pgd);
+ spin_unlock_irqrestore(&pgd_lock, flags);
+}
+
+static void pgd_dtor(pgd_t *pgd)
+{
+ unsigned long flags; /* can be called from interrupt context */
+
+ spin_lock_irqsave(&pgd_lock, flags);
+ pgd_list_del(pgd);
+ spin_unlock_irqrestore(&pgd_lock, flags);
+}
+
+pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+ pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
+ if (pgd)
+ pgd_ctor(pgd);
+ return pgd;
+}
+
+void pgd_free(struct mm_struct *mm, pgd_t *pgd)
+{
+ pgd_dtor(pgd);
+ kmem_cache_free(pgd_cache, pgd);
+}
+
+
+#define L2_USER_PGTABLE_PAGES (1 << L2_USER_PGTABLE_ORDER)
+
+struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
+{
+ int flags = GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO|__GFP_COMP;
+ struct page *p;
+
+#ifdef CONFIG_HIGHPTE
+ flags |= __GFP_HIGHMEM;
+#endif
+
+ p = alloc_pages(flags, L2_USER_PGTABLE_ORDER);
+ if (p == NULL)
+ return NULL;
+
+ pgtable_page_ctor(p);
+ return p;
+}
+
+/*
+ * Free page immediately (used in __pte_alloc if we raced with another
+ * process). We have to correct whatever pte_alloc_one() did before
+ * returning the pages to the allocator.
+ */
+void pte_free(struct mm_struct *mm, struct page *p)
+{
+ pgtable_page_dtor(p);
+ __free_pages(p, L2_USER_PGTABLE_ORDER);
+}
+
+void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte,
+ unsigned long address)
+{
+ int i;
+
+ pgtable_page_dtor(pte);
+ tlb->need_flush = 1;
+ if (tlb_fast_mode(tlb)) {
+ struct page *pte_pages[L2_USER_PGTABLE_PAGES];
+ for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i)
+ pte_pages[i] = pte + i;
+ free_pages_and_swap_cache(pte_pages, L2_USER_PGTABLE_PAGES);
+ return;
+ }
+ for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i) {
+ tlb->pages[tlb->nr++] = pte + i;
+ if (tlb->nr >= FREE_PTE_NR)
+ tlb_flush_mmu(tlb, 0, 0);
+ }
+}
+
+#ifndef __tilegx__
+
+/*
+ * FIXME: needs to be atomic vs hypervisor writes. For now we make the
+ * window of vulnerability a bit smaller by doing an unlocked 8-bit update.
+ */
+int ptep_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+#if HV_PTE_INDEX_ACCESSED < 8 || HV_PTE_INDEX_ACCESSED >= 16
+# error Code assumes HV_PTE "accessed" bit in second byte
+#endif
+ u8 *tmp = (u8 *)ptep;
+ u8 second_byte = tmp[1];
+ if (!(second_byte & (1 << (HV_PTE_INDEX_ACCESSED - 8))))
+ return 0;
+ tmp[1] = second_byte & ~(1 << (HV_PTE_INDEX_ACCESSED - 8));
+ return 1;
+}
+
+/*
+ * This implementation is atomic vs hypervisor writes, since the hypervisor
+ * always writes the low word (where "accessed" and "dirty" are) and this
+ * routine only writes the high word.
+ */
+void ptep_set_wrprotect(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+#if HV_PTE_INDEX_WRITABLE < 32
+# error Code assumes HV_PTE "writable" bit in high word
+#endif
+ u32 *tmp = (u32 *)ptep;
+ tmp[1] = tmp[1] & ~(1 << (HV_PTE_INDEX_WRITABLE - 32));
+}
+
+#endif
+
+pte_t *virt_to_pte(struct mm_struct* mm, unsigned long addr)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ if (pgd_addr_invalid(addr))
+ return NULL;
+
+ pgd = mm ? pgd_offset(mm, addr) : swapper_pg_dir + pgd_index(addr);
+ pud = pud_offset(pgd, addr);
+ if (!pud_present(*pud))
+ return NULL;
+ pmd = pmd_offset(pud, addr);
+ if (pmd_huge_page(*pmd))
+ return (pte_t *)pmd;
+ if (!pmd_present(*pmd))
+ return NULL;
+ return pte_offset_kernel(pmd, addr);
+}
+
+pgprot_t set_remote_cache_cpu(pgprot_t prot, int cpu)
+{
+ unsigned int width = smp_width;
+ int x = cpu % width;
+ int y = cpu / width;
+ BUG_ON(y >= smp_height);
+ BUG_ON(hv_pte_get_mode(prot) != HV_PTE_MODE_CACHE_TILE_L3);
+ BUG_ON(cpu < 0 || cpu >= NR_CPUS);
+ BUG_ON(!cpu_is_valid_lotar(cpu));
+ return hv_pte_set_lotar(prot, HV_XY_TO_LOTAR(x, y));
+}
+
+int get_remote_cache_cpu(pgprot_t prot)
+{
+ HV_LOTAR lotar = hv_pte_get_lotar(prot);
+ int x = HV_LOTAR_X(lotar);
+ int y = HV_LOTAR_Y(lotar);
+ BUG_ON(hv_pte_get_mode(prot) != HV_PTE_MODE_CACHE_TILE_L3);
+ return x + y * smp_width;
+}
+
+void set_pte_order(pte_t *ptep, pte_t pte, int order)
+{
+ unsigned long pfn = pte_pfn(pte);
+ struct page *page = pfn_to_page(pfn);
+
+ /* Update the home of a PTE if necessary */
+ pte = pte_set_home(pte, page_home(page));
+
+#ifdef __tilegx__
+ *ptep = pte;
+#else
+ /*
+ * When setting a PTE, write the high bits first, then write
+ * the low bits. This sets the "present" bit only after the
+ * other bits are in place. If a particular PTE update
+ * involves transitioning from one valid PTE to another, it
+ * may be necessary to call set_pte_order() more than once,
+ * transitioning via a suitable intermediate state.
+ * Note that this sequence also means that if we are transitioning
+ * from any migrating PTE to a non-migrating one, we will not
+ * see a half-updated PTE with the migrating bit off.
+ */
+#if HV_PTE_INDEX_PRESENT >= 32 || HV_PTE_INDEX_MIGRATING >= 32
+# error Must write the present and migrating bits last
+#endif
+ ((u32 *)ptep)[1] = (u32)(pte_val(pte) >> 32);
+ barrier();
+ ((u32 *)ptep)[0] = (u32)(pte_val(pte));
+#endif
+}
+
+/* Can this mm load a PTE with cached_priority set? */
+static inline int mm_is_priority_cached(struct mm_struct *mm)
+{
+ return mm->context.priority_cached;
+}
+
+/*
+ * Add a priority mapping to an mm_context and
+ * notify the hypervisor if this is the first one.
+ */
+void start_mm_caching(struct mm_struct *mm)
+{
+ if (!mm_is_priority_cached(mm)) {
+ mm->context.priority_cached = -1U;
+ hv_set_caching(-1U);
+ }
+}
+
+/*
+ * Validate and return the priority_cached flag. We know if it's zero
+ * that we don't need to scan, since we immediately set it non-zero
+ * when we first consider a MAP_CACHE_PRIORITY mapping.
+ *
+ * We only _try_ to acquire mmap_sem; if we can't acquire it,
+ * since we're in an interrupt context (servicing switch_mm) we don't
+ * worry about it and don't unset the "priority_cached" field.
+ * Presumably we'll come back later and have more luck and clear
+ * the value then; for now we'll just keep the cache marked for priority.
+ */
+static unsigned int update_priority_cached(struct mm_struct *mm)
+{
+ if (mm->context.priority_cached && down_write_trylock(&mm->mmap_sem)) {
+ struct vm_area_struct *vm;
+ for (vm = mm->mmap; vm; vm = vm->vm_next) {
+ if (hv_pte_get_cached_priority(vm->vm_page_prot))
+ break;
+ }
+ if (vm == NULL)
+ mm->context.priority_cached = 0;
+ up_write(&mm->mmap_sem);
+ }
+ return mm->context.priority_cached;
+}
+
+/* Set caching correctly for an mm that we are switching to. */
+void check_mm_caching(struct mm_struct *prev, struct mm_struct *next)
+{
+ if (!mm_is_priority_cached(next)) {
+ /*
+ * If the new mm doesn't use priority caching, just see if we
+ * need the hv_set_caching(), or can assume it's already zero.
+ */
+ if (mm_is_priority_cached(prev))
+ hv_set_caching(0);
+ } else {
+ hv_set_caching(update_priority_cached(next));
+ }
+}
+
+#if CHIP_HAS_MMIO()
+
+/* Map an arbitrary MMIO address, homed according to pgprot, into VA space. */
+void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size,
+ pgprot_t home)
+{
+ void *addr;
+ struct vm_struct *area;
+ unsigned long offset, last_addr;
+ pgprot_t pgprot;
+
+ /* Don't allow wraparound or zero size */
+ last_addr = phys_addr + size - 1;
+ if (!size || last_addr < phys_addr)
+ return NULL;
+
+ /* Create a read/write, MMIO VA mapping homed at the requested shim. */
+ pgprot = PAGE_KERNEL;
+ pgprot = hv_pte_set_mode(pgprot, HV_PTE_MODE_MMIO);
+ pgprot = hv_pte_set_lotar(pgprot, hv_pte_get_lotar(home));
+
+ /*
+ * Mappings have to be page-aligned
+ */
+ offset = phys_addr & ~PAGE_MASK;
+ phys_addr &= PAGE_MASK;
+ size = PAGE_ALIGN(last_addr+1) - phys_addr;
+
+ /*
+ * Ok, go for it..
+ */
+ area = get_vm_area(size, VM_IOREMAP /* | other flags? */);
+ if (!area)
+ return NULL;
+ area->phys_addr = phys_addr;
+ addr = area->addr;
+ if (ioremap_page_range((unsigned long)addr, (unsigned long)addr + size,
+ phys_addr, pgprot)) {
+ remove_vm_area((void *)(PAGE_MASK & (unsigned long) addr));
+ return NULL;
+ }
+ return (__force void __iomem *) (offset + (char *)addr);
+}
+EXPORT_SYMBOL(ioremap_prot);
+
+/* Map a PCI MMIO bus address into VA space. */
+void __iomem *ioremap(resource_size_t phys_addr, unsigned long size)
+{
+ panic("ioremap for PCI MMIO is not supported");
+}
+EXPORT_SYMBOL(ioremap);
+
+/* Unmap an MMIO VA mapping. */
+void iounmap(volatile void __iomem *addr_in)
+{
+ volatile void __iomem *addr = (volatile void __iomem *)
+ (PAGE_MASK & (unsigned long __force)addr_in);
+#if 1
+ vunmap((void * __force)addr);
+#else
+ /* x86 uses this complicated flow instead of vunmap(). Is
+ * there any particular reason we should do the same? */
+ struct vm_struct *p, *o;
+
+ /* Use the vm area unlocked, assuming the caller
+ ensures there isn't another iounmap for the same address
+ in parallel. Reuse of the virtual address is prevented by
+ leaving it in the global lists until we're done with it.
+ cpa takes care of the direct mappings. */
+ read_lock(&vmlist_lock);
+ for (p = vmlist; p; p = p->next) {
+ if (p->addr == addr)
+ break;
+ }
+ read_unlock(&vmlist_lock);
+
+ if (!p) {
+ printk("iounmap: bad address %p\n", addr);
+ dump_stack();
+ return;
+ }
+
+ /* Finally remove it */
+ o = remove_vm_area((void *)addr);
+ BUG_ON(p != o || o == NULL);
+ kfree(p);
+#endif
+}
+EXPORT_SYMBOL(iounmap);
+
+#endif /* CHIP_HAS_MMIO() */

Chris Metcalf
May 28, 2010, 11:40 PM
This change updates MAINTAINERS and provides the top-level
configuration and build files, and the default configuration.

Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---

MAINTAINERS | 6 +
arch/tile/Kbuild | 3 +
arch/tile/Kconfig | 360 +++++++++++
arch/tile/Kconfig.debug | 43 ++
arch/tile/Makefile | 52 ++
arch/tile/configs/tile_defconfig | 1289 ++++++++++++++++++++++++++++++++++++++
6 files changed, 1753 insertions(+), 0 deletions(-)
create mode 100644 arch/tile/Kbuild
create mode 100644 arch/tile/Kconfig
create mode 100644 arch/tile/Kconfig.debug
create mode 100644 arch/tile/Makefile
create mode 100644 arch/tile/configs/tile_defconfig

diff --git a/MAINTAINERS b/MAINTAINERS
index 8e01948..438bfe0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5521,6 +5521,12 @@ F: include/linux/tipc*.h
F: include/net/tipc/
F: net/tipc/

+TILE ARCHITECTURE
+M: Chris Metcalf <cmet...@tilera.com>
+W: http://www.tilera.com/scm/
+S: Supported
+F: arch/tile/
+
TLAN NETWORK DRIVER
M: Samuel Chessman <ches...@tux.org>
L: tlan-...@lists.sourceforge.net (subscribers-only)
diff --git a/arch/tile/Kbuild b/arch/tile/Kbuild
new file mode 100644
index 0000000..a9b9227
--- /dev/null
+++ b/arch/tile/Kbuild
@@ -0,0 +1,3 @@
+
+obj-y += kernel/
+obj-y += mm/
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
new file mode 100644
index 0000000..b311484
--- /dev/null
+++ b/arch/tile/Kconfig
@@ -0,0 +1,360 @@
+# For a description of the syntax of this configuration file,
+# see Documentation/kbuild/config-language.txt.
+
+config MMU
+ def_bool y
+
+config GENERIC_CSUM
+ def_bool y
+
+config GENERIC_HARDIRQS
+ def_bool y
+
+config GENERIC_HARDIRQS_NO__DO_IRQ
+ def_bool y
+
+config GENERIC_IRQ_PROBE
+ def_bool y
+
+config GENERIC_PENDING_IRQ
+ def_bool y
+ depends on GENERIC_HARDIRQS && SMP
+
+config ZONE_DMA
+ def_bool y
+
+config SEMAPHORE_SLEEPERS
+ def_bool y
+
+config CC_OPTIMIZE_FOR_SIZE
+ def_bool y
+
+config HAVE_ARCH_ALLOC_REMAP
+ def_bool y
+
+config HAVE_SETUP_PER_CPU_AREA
+ def_bool y
+
+config NEED_PER_CPU_PAGE_FIRST_CHUNK
+ def_bool y
+
+config SYS_SUPPORTS_HUGETLBFS
+ def_bool y
+
+config GENERIC_TIME
+ def_bool y
+
+config GENERIC_CLOCKEVENTS
+ def_bool y
+
+config CLOCKSOURCE_WATCHDOG
+ def_bool y
+
+# FIXME: tilegx can implement a more efficient rwsem.
+config RWSEM_GENERIC_SPINLOCK
+ def_bool y
+
+# We have a very flat architecture from a migration point of view,
+# so save boot time by presetting this (particularly useful on tile-sim).
+config DEFAULT_MIGRATION_COST
+ int
+ default "10000000"
+
+# We only support gcc 4.4 and above, so this should work.
+config ARCH_SUPPORTS_OPTIMIZED_INLINING
+ def_bool y
+
+config ARCH_PHYS_ADDR_T_64BIT
+ def_bool y
+
+config LOCKDEP_SUPPORT
+ def_bool y
+
+config STACKTRACE_SUPPORT
+ def_bool y
+ select STACKTRACE
+
+config ARCH_DISCONTIGMEM_ENABLE
+ def_bool y
+
+config ARCH_DISCONTIGMEM_DEFAULT
+ def_bool y
+
+config TRACE_IRQFLAGS_SUPPORT
+ def_bool y
+
+config STRICT_DEVMEM
+ def_bool y
+
+# SMP is required for Tilera Linux.
+config SMP
+ def_bool y
+
+# Allow checking for compile-time determined overflow errors in
+# copy_from_user(). There are still unprovable places in the
+# generic code as of 2.6.34, so this option is not really compatible
+# with -Werror, which is more useful in general.
+config DEBUG_COPY_FROM_USER
+ def_bool n
+
+config SERIAL_CONSOLE
+ def_bool y
+
+config HVC_TILE
+ select HVC_DRIVER
+ def_bool y
+
+config TILE
+ def_bool y
+ select GENERIC_FIND_FIRST_BIT
+ select GENERIC_FIND_NEXT_BIT
+ select RESOURCES_64BIT
+ select USE_GENERIC_SMP_HELPERS
+
+# FIXME: investigate whether we need/want these options.
+# select HAVE_IOREMAP_PROT
+# select HAVE_OPTPROBES
+# select HAVE_REGS_AND_STACK_ACCESS_API
+# select HAVE_HW_BREAKPOINT
+# select PERF_EVENTS
+# select HAVE_USER_RETURN_NOTIFIER
+# config NO_BOOTMEM
+# config ARCH_SUPPORTS_DEBUG_PAGEALLOC
+# config HUGETLB_PAGE_SIZE_VARIABLE
+
+
+mainmenu "Linux/TILE Kernel Configuration"
+
+# Please note: TILE-Gx support is not yet finalized; this is
+# the preliminary support. TILE-Gx drivers are only provided
+# with the alpha or beta test versions for Tilera customers.
+config TILEGX
+ depends on EXPERIMENTAL
+ bool "Building with TILE-Gx (64-bit) compiler and toolchain"
+
+config 64BIT
+ depends on TILEGX
+ def_bool y
+
+config ARCH_DEFCONFIG
+ string
+ default "arch/tile/configs/tile_defconfig" if !TILEGX
+ default "arch/tile/configs/tilegx_defconfig" if TILEGX
+
+source "init/Kconfig"
+
+menu "Tilera-specific configuration"
+
+config NR_CPUS
+ int "Maximum number of tiles (2-255)"
+ range 2 255
+ depends on SMP
+ default "64"
+ ---help---
+ Building with 64 is the recommended value, but a slightly
+ smaller kernel memory footprint results from using a smaller
+ value on chips with fewer tiles.
+
+source "kernel/time/Kconfig"
+
+source "kernel/Kconfig.hz"
+
+config KEXEC
+ bool "kexec system call"
+ ---help---
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. It is used
+ to implement the "mboot" Tilera booter.
+
+ The name comes from the similarity to the exec system call.
+
+config COMPAT
+ bool "Support 32-bit TILE-Gx binaries in addition to 64-bit"
+ depends on TILEGX
+ select COMPAT_BINFMT_ELF
+ default y
+ ---help---
+ If enabled, the kernel will support running TILE-Gx binaries
+ that were built with the -m32 option.
+
+config SYSVIPC_COMPAT
+ def_bool y
+ depends on COMPAT && SYSVIPC
+
+# We do not currently support disabling HIGHMEM on tile64 and tilepro.
+config HIGHMEM
+ bool # "Support for more than 512 MB of RAM"
+ default !TILEGX
+ ---help---
+ Linux can use the full amount of RAM in the system by
+ default. However, the address space of TILE processors is
+ only 4 Gigabytes large. That means that, if you have a large
+ amount of physical memory, not all of it can be "permanently
+ mapped" by the kernel. The physical memory that's not
+ permanently mapped is called "high memory".
+
+ If you are compiling a kernel which will never run on a
+ machine with more than 512 MB total physical RAM, answer
+ "false" here. This will result in the kernel mapping all of
+ physical memory into the top 1 GB of virtual memory space.
+
+ If unsure, say "true".
+
+# We do not currently support disabling NUMA.
+config NUMA
+ bool # "NUMA Memory Allocation and Scheduler Support"
+ depends on SMP && DISCONTIGMEM
+ default y
+ ---help---
+ NUMA memory allocation is required for TILE processors
+ unless booting with memory striping enabled in the
+ hypervisor, or with only a single memory controller.
+ It is recommended that this option always be enabled.
+
+config NODES_SHIFT
+ int "Log base 2 of the max number of memory controllers"
+ default 2
+ depends on NEED_MULTIPLE_NODES
+ ---help---
+ By default, 2, i.e. 2^2 == 4 DDR2 controllers.
+ In a system with more controllers, this value should be raised.
+
+# Need 16MB areas to enable hugetlb
+# See build-time check in arch/tile/mm/init.c.
+config FORCE_MAX_ZONEORDER
+ int
+ default 9
+
+choice
+ depends on !TILEGX
+ prompt "Memory split" if EMBEDDED
+ default VMSPLIT_3G
+ ---help---
+ Select the desired split between kernel and user memory.
+
+ If the address range available to the kernel is less than the
+ physical memory installed, the remaining memory will be available
+ as "high memory". Accessing high memory is a little more costly
+ than low memory, as it needs to be mapped into the kernel first.
+ Note that increasing the kernel address space limits the range
+ available to user programs, making the address space there
+ tighter. Selecting anything other than the default 3G/1G split
+ will also likely make your kernel incompatible with binary-only
+ kernel modules.
+
+ If you are not absolutely sure what you are doing, leave this
+ option alone!
+
+ config VMSPLIT_375G
+ bool "3.75G/0.25G user/kernel split (no kernel networking)"
+ config VMSPLIT_35G
+ bool "3.5G/0.5G user/kernel split"
+ config VMSPLIT_3G
+ bool "3G/1G user/kernel split"
+ config VMSPLIT_3G_OPT
+ bool "3G/1G user/kernel split (for full 1G low memory)"
+ config VMSPLIT_2G
+ bool "2G/2G user/kernel split"
+ config VMSPLIT_1G
+ bool "1G/3G user/kernel split"
+endchoice
+
+config PAGE_OFFSET
+ hex
+ default 0xF0000000 if VMSPLIT_375G
+ default 0xE0000000 if VMSPLIT_35G
+ default 0xB0000000 if VMSPLIT_3G_OPT
+ default 0x80000000 if VMSPLIT_2G
+ default 0x40000000 if VMSPLIT_1G
+ default 0xC0000000
+
+source "mm/Kconfig"
+
+config CMDLINE_BOOL
+ bool "Built-in kernel command line"
+ default n
+ ---help---
+ Allow for specifying boot arguments to the kernel at
+ build time. On some systems (e.g. embedded ones), it is
+ necessary or convenient to provide some or all of the
+ kernel boot arguments with the kernel itself (that is,
+ to not rely on the boot loader to provide them.)
+
+ To compile command line arguments into the kernel,
+ set this option to 'Y', then fill in the boot arguments
+ in CONFIG_CMDLINE.
+
+ Systems with fully functional boot loaders (e.g. mboot, or
+ if booting over PCI) should leave this option set to 'N'.
+
+config CMDLINE
+ string "Built-in kernel command string"
+ depends on CMDLINE_BOOL
+ default ""
+ ---help---
+ Enter arguments here that should be compiled into the kernel
+ image and used at boot time. If the boot loader provides a
+ command line at boot time, it is appended to this string to
+ form the full kernel command line, when the system boots.
+
+ However, you can use the CONFIG_CMDLINE_OVERRIDE option to
+ change this behavior.
+
+ In most cases, the command line (whether built-in or provided
+ by the boot loader) should specify the device for the root
+ file system.
+
+config CMDLINE_OVERRIDE
+ bool "Built-in command line overrides boot loader arguments"
+ default n
+ depends on CMDLINE_BOOL
+ ---help---
+ Set this option to 'Y' to have the kernel ignore the boot loader
+ command line, and use ONLY the built-in command line.
+
+ This is used to work around broken boot loaders. This should
+ be set to 'N' under normal conditions.
+
+config VMALLOC_RESERVE
+ hex
+ default 0x1000000
+
+endmenu # Tilera-specific configuration
+
+menu "Bus options"
+
+config NO_IOMEM
+ bool
+ def_bool !PCI
+
+source "drivers/pci/Kconfig"
+
+source "drivers/pci/hotplug/Kconfig"
+
+endmenu
+
+menu "Executable file formats"
+
+# only elf supported
+config KCORE_ELF
+ def_bool y
+ depends on PROC_FS
+
+source "fs/Kconfig.binfmt"
+
+endmenu
+
+source "net/Kconfig"
+
+source "drivers/Kconfig"
+
+source "fs/Kconfig"
+
+source "arch/tile/Kconfig.debug"
+
+source "security/Kconfig"
+
+source "crypto/Kconfig"
+
+source "lib/Kconfig"
diff --git a/arch/tile/Kconfig.debug b/arch/tile/Kconfig.debug
new file mode 100644
index 0000000..a81f0fb
--- /dev/null
+++ b/arch/tile/Kconfig.debug
@@ -0,0 +1,43 @@
+menu "Kernel hacking"
+
+source "lib/Kconfig.debug"
+
+config EARLY_PRINTK
+ bool "Early printk" if EMBEDDED && DEBUG_KERNEL
+ default y
+ help
+ Write kernel log output directly via the hypervisor console.
+
+ This is useful for kernel debugging when your machine crashes very
+ early before the console code is initialized. For normal operation
+ it is not recommended because it looks ugly and doesn't cooperate
+ with klogd/syslogd. You should normally say N here,
+ unless you want to debug such a crash.
+
+config DEBUG_STACKOVERFLOW
+ bool "Check for stack overflows"
+ depends on DEBUG_KERNEL
+ help
+ This option will cause messages to be printed if free stack space
+ drops below a certain limit.
+
+config DEBUG_STACK_USAGE
+ bool "Stack utilization instrumentation"
+ depends on DEBUG_KERNEL
+ help
+ Enables the display of the minimum amount of free stack which each
+ task has ever had available in the sysrq-T and sysrq-P debug output.
+
+ This option will slow down process creation somewhat.
+
+config DEBUG_EXTRA_FLAGS
+ string "Additional compiler arguments when building with '-g'"
+ depends on DEBUG_INFO
+ default ""
+ help
+ Debug info can be large, and flags like
+ `-femit-struct-debug-baseonly' can reduce the kernel file
+ size and build time noticeably. Such flags are often
+ helpful if the main use of debug info is line number info.
+
+endmenu
diff --git a/arch/tile/Makefile b/arch/tile/Makefile
new file mode 100644
index 0000000..07c4318
--- /dev/null
+++ b/arch/tile/Makefile
@@ -0,0 +1,52 @@
+#
+# This file is subject to the terms and conditions of the GNU General Public
+# License. See the file "COPYING" in the main directory of this archive
+# for more details.
+#
+# This file is included by the global makefile so that you can add your own
+# architecture-specific flags and dependencies. Remember to have actions
+# for "archclean" and "archdep" for cleaning up and making dependencies for
+# this architecture.
+
+ifeq ($(CROSS_COMPILE),)
+# If building with TILERA_ROOT set (i.e. using the Tilera Multicore
+# Development Environment) we can set CROSS_COMPILE based on that.
+ifdef TILERA_ROOT
+CROSS_COMPILE = $(TILERA_ROOT)/bin/tile-
+endif
+endif
+
+# If we're not cross-compiling, make sure we're on the right architecture.
+ifeq ($(CROSS_COMPILE),)
+HOST_ARCH = $(shell uname -m)
+ifneq ($(HOST_ARCH),$(ARCH))
+$(error Set TILERA_ROOT or CROSS_COMPILE when building $(ARCH) on $(HOST_ARCH))
+endif
+endif
+
+
+KBUILD_CFLAGS += $(CONFIG_DEBUG_EXTRA_FLAGS)
+
+LIBGCC_PATH := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
+
+# Provide the path to use for "make defconfig".
+KBUILD_DEFCONFIG := $(ARCH)_defconfig
+
+# Used as a file extension when useful, e.g. head_$(BITS).o
+# Not needed for (e.g.) "$(CC) -m32" since the compiler automatically
+# uses the right default anyway.
+export BITS
+ifeq ($(CONFIG_TILEGX),y)
+BITS := 64
+else
+BITS := 32
+endif
+
+head-y := arch/tile/kernel/head_$(BITS).o
+
+libs-y += arch/tile/lib/
+libs-y += $(LIBGCC_PATH)
+
+
+# See arch/tile/Kbuild for content of core part of the kernel
+core-y += arch/tile/
diff --git a/arch/tile/configs/tile_defconfig b/arch/tile/configs/tile_defconfig
new file mode 100644
index 0000000..74a5be3
--- /dev/null
+++ b/arch/tile/configs/tile_defconfig
@@ -0,0 +1,1289 @@
+#
+# Automatically generated make config: don't edit
+# Linux kernel version: 2.6.34
+# Fri May 28 17:51:43 2010
+#
+CONFIG_MMU=y
+CONFIG_GENERIC_CSUM=y
+CONFIG_GENERIC_HARDIRQS=y
+CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
+CONFIG_GENERIC_IRQ_PROBE=y
+CONFIG_GENERIC_PENDING_IRQ=y
+CONFIG_ZONE_DMA=y
+CONFIG_SEMAPHORE_SLEEPERS=y
+CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+CONFIG_HAVE_ARCH_ALLOC_REMAP=y
+CONFIG_HAVE_SETUP_PER_CPU_AREA=y
+CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
+CONFIG_SYS_SUPPORTS_HUGETLBFS=y
+CONFIG_GENERIC_TIME=y
+CONFIG_GENERIC_CLOCKEVENTS=y
+CONFIG_CLOCKSOURCE_WATCHDOG=y
+CONFIG_RWSEM_GENERIC_SPINLOCK=y
+CONFIG_DEFAULT_MIGRATION_COST=10000000
+CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
+CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
+CONFIG_LOCKDEP_SUPPORT=y
+CONFIG_STACKTRACE_SUPPORT=y
+CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
+CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
+CONFIG_TRACE_IRQFLAGS_SUPPORT=y
+CONFIG_STRICT_DEVMEM=y
+CONFIG_SMP=y
+CONFIG_WERROR=y
+# CONFIG_DEBUG_COPY_FROM_USER is not set
+CONFIG_SERIAL_CONSOLE=y
+CONFIG_HVC_TILE=y
+CONFIG_TILE=y
+# CONFIG_TILEGX is not set
+CONFIG_ARCH_DEFCONFIG="arch/tile/configs/tile_defconfig"
+CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
+CONFIG_CONSTRUCTORS=y
+
+#
+# General setup
+#
+CONFIG_EXPERIMENTAL=y
+CONFIG_LOCK_KERNEL=y
+CONFIG_INIT_ENV_ARG_LIMIT=32
+CONFIG_LOCALVERSION=""
+CONFIG_LOCALVERSION_AUTO=y
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_SYSVIPC_SYSCTL=y
+# CONFIG_POSIX_MQUEUE is not set
+# CONFIG_BSD_PROCESS_ACCT is not set
+# CONFIG_TASKSTATS is not set
+# CONFIG_AUDIT is not set
+
+#
+# RCU Subsystem
+#
+CONFIG_TREE_RCU=y
+# CONFIG_TREE_PREEMPT_RCU is not set
+# CONFIG_TINY_RCU is not set
+# CONFIG_RCU_TRACE is not set
+CONFIG_RCU_FANOUT=32
+# CONFIG_RCU_FANOUT_EXACT is not set
+# CONFIG_RCU_FAST_NO_HZ is not set
+# CONFIG_TREE_RCU_TRACE is not set
+# CONFIG_IKCONFIG is not set
+CONFIG_LOG_BUF_SHIFT=17
+# CONFIG_CGROUPS is not set
+# CONFIG_SYSFS_DEPRECATED_V2 is not set
+# CONFIG_RELAY is not set
+# CONFIG_NAMESPACES is not set
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE="usr/contents.txt"
+CONFIG_INITRAMFS_ROOT_UID=0
+CONFIG_INITRAMFS_ROOT_GID=0
+CONFIG_RD_GZIP=y
+# CONFIG_RD_BZIP2 is not set
+# CONFIG_RD_LZMA is not set
+# CONFIG_RD_LZO is not set
+CONFIG_INITRAMFS_COMPRESSION_NONE=y
+# CONFIG_INITRAMFS_COMPRESSION_GZIP is not set
+# CONFIG_INITRAMFS_COMPRESSION_BZIP2 is not set
+# CONFIG_INITRAMFS_COMPRESSION_LZMA is not set
+# CONFIG_INITRAMFS_COMPRESSION_LZO is not set
+CONFIG_SYSCTL=y
+CONFIG_ANON_INODES=y
+CONFIG_EMBEDDED=y
+CONFIG_SYSCTL_SYSCALL=y
+CONFIG_KALLSYMS=y
+# CONFIG_KALLSYMS_ALL is not set
+# CONFIG_KALLSYMS_EXTRA_PASS is not set
+CONFIG_HOTPLUG=y
+CONFIG_PRINTK=y
+CONFIG_BUG=y
+CONFIG_ELF_CORE=y
+CONFIG_BASE_FULL=y
+CONFIG_FUTEX=y
+CONFIG_EPOLL=y
+CONFIG_SIGNALFD=y
+CONFIG_TIMERFD=y
+CONFIG_EVENTFD=y
+CONFIG_SHMEM=y
+CONFIG_AIO=y
+
+#
+# Kernel Performance Events And Counters
+#
+CONFIG_VM_EVENT_COUNTERS=y
+CONFIG_PCI_QUIRKS=y
+CONFIG_SLUB_DEBUG=y
+# CONFIG_COMPAT_BRK is not set
+# CONFIG_SLAB is not set
+CONFIG_SLUB=y
+# CONFIG_SLOB is not set
+CONFIG_PROFILING=y
+CONFIG_OPROFILE=y
+CONFIG_HAVE_OPROFILE=y
+CONFIG_USE_GENERIC_SMP_HELPERS=y
+
+#
+# GCOV-based kernel profiling
+#
+# CONFIG_SLOW_WORK is not set
+# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
+CONFIG_SLABINFO=y
+CONFIG_RT_MUTEXES=y
+CONFIG_BASE_SMALL=0
+CONFIG_MODULES=y
+# CONFIG_MODULE_FORCE_LOAD is not set
+CONFIG_MODULE_UNLOAD=y
+# CONFIG_MODULE_FORCE_UNLOAD is not set
+# CONFIG_MODVERSIONS is not set
+# CONFIG_MODULE_SRCVERSION_ALL is not set
+CONFIG_STOP_MACHINE=y
+CONFIG_BLOCK=y
+CONFIG_LBDAF=y
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_BLK_DEV_INTEGRITY is not set
+
+#
+# IO Schedulers
+#
+CONFIG_IOSCHED_NOOP=y
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+# CONFIG_DEFAULT_DEADLINE is not set
+# CONFIG_DEFAULT_CFQ is not set
+CONFIG_DEFAULT_NOOP=y
+CONFIG_DEFAULT_IOSCHED="noop"
+# CONFIG_INLINE_SPIN_TRYLOCK is not set
+# CONFIG_INLINE_SPIN_TRYLOCK_BH is not set
+# CONFIG_INLINE_SPIN_LOCK is not set
+# CONFIG_INLINE_SPIN_LOCK_BH is not set
+# CONFIG_INLINE_SPIN_LOCK_IRQ is not set
+# CONFIG_INLINE_SPIN_LOCK_IRQSAVE is not set
+CONFIG_INLINE_SPIN_UNLOCK=y
+# CONFIG_INLINE_SPIN_UNLOCK_BH is not set
+CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
+# CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE is not set
+# CONFIG_INLINE_READ_TRYLOCK is not set
+# CONFIG_INLINE_READ_LOCK is not set
+# CONFIG_INLINE_READ_LOCK_BH is not set
+# CONFIG_INLINE_READ_LOCK_IRQ is not set
+# CONFIG_INLINE_READ_LOCK_IRQSAVE is not set
+CONFIG_INLINE_READ_UNLOCK=y
+# CONFIG_INLINE_READ_UNLOCK_BH is not set
+CONFIG_INLINE_READ_UNLOCK_IRQ=y
+# CONFIG_INLINE_READ_UNLOCK_IRQRESTORE is not set
+# CONFIG_INLINE_WRITE_TRYLOCK is not set
+# CONFIG_INLINE_WRITE_LOCK is not set
+# CONFIG_INLINE_WRITE_LOCK_BH is not set
+# CONFIG_INLINE_WRITE_LOCK_IRQ is not set
+# CONFIG_INLINE_WRITE_LOCK_IRQSAVE is not set
+CONFIG_INLINE_WRITE_UNLOCK=y
+# CONFIG_INLINE_WRITE_UNLOCK_BH is not set
+CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
+# CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE is not set
+CONFIG_MUTEX_SPIN_ON_OWNER=y
+
+#
+# Tilera-specific configuration
+#
+CONFIG_NR_CPUS=64
+CONFIG_HOMECACHE=y
+CONFIG_DATAPLANE=y
+CONFIG_TICK_ONESHOT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
+CONFIG_HZ_100=y
+# CONFIG_HZ_250 is not set
+# CONFIG_HZ_300 is not set
+# CONFIG_HZ_1000 is not set
+CONFIG_HZ=100
+CONFIG_SCHED_HRTICK=y
+# CONFIG_KEXEC is not set
+CONFIG_HIGHMEM=y
+CONFIG_NUMA=y
+CONFIG_NODES_SHIFT=2
+CONFIG_FORCE_MAX_ZONEORDER=9
+# CONFIG_VMSPLIT_375G is not set
+# CONFIG_VMSPLIT_35G is not set
+CONFIG_VMSPLIT_3G=y
+# CONFIG_VMSPLIT_3G_OPT is not set
+# CONFIG_VMSPLIT_2G is not set
+# CONFIG_VMSPLIT_1G is not set
+CONFIG_PAGE_OFFSET=0xC0000000
+CONFIG_SELECT_MEMORY_MODEL=y
+# CONFIG_FLATMEM_MANUAL is not set
+CONFIG_DISCONTIGMEM_MANUAL=y
+# CONFIG_SPARSEMEM_MANUAL is not set
+CONFIG_DISCONTIGMEM=y
+CONFIG_FLAT_NODE_MEM_MAP=y
+CONFIG_NEED_MULTIPLE_NODES=y
+CONFIG_PAGEFLAGS_EXTENDED=y
+CONFIG_SPLIT_PTLOCK_CPUS=4
+CONFIG_MIGRATION=y
+CONFIG_PHYS_ADDR_T_64BIT=y
+CONFIG_ZONE_DMA_FLAG=1
+CONFIG_BOUNCE=y
+CONFIG_VIRT_TO_BUS=y
+# CONFIG_KSM is not set
+CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
+# CONFIG_CMDLINE_BOOL is not set
+# CONFIG_FEEDBACK_COLLECT is not set
+CONFIG_FEEDBACK_USE=""
+# CONFIG_HUGEVMAP is not set
+CONFIG_VMALLOC_RESERVE=0x1000000
+CONFIG_HARDWALL=y
+CONFIG_MEMPROF=y
+CONFIG_XGBE_MAIN=y
+CONFIG_NET_TILE=y
+CONFIG_PSEUDO_NAPI=y
+CONFIG_TILEPCI_ENDP=y
+CONFIG_TILE_IDE_GPIO=y
+CONFIG_TILE_SOFTUART=y
+
+#
+# Bus options
+#
+CONFIG_PCI=y
+CONFIG_PCI_DOMAINS=y
+# CONFIG_ARCH_SUPPORTS_MSI is not set
+CONFIG_PCI_DEBUG=y
+# CONFIG_PCI_STUB is not set
+# CONFIG_PCI_IOV is not set
+# CONFIG_HOTPLUG_PCI is not set
+
+#
+# Executable file formats
+#
+CONFIG_KCORE_ELF=y
+CONFIG_BINFMT_ELF=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+# CONFIG_HAVE_AOUT is not set
+# CONFIG_BINFMT_MISC is not set
+CONFIG_NET=y
+
+#
+# Networking options
+#
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM=y
+# CONFIG_XFRM_USER is not set
+# CONFIG_XFRM_SUB_POLICY is not set
+# CONFIG_XFRM_MIGRATE is not set
+# CONFIG_XFRM_STATISTICS is not set
+# CONFIG_NET_KEY is not set
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+# CONFIG_IP_ADVANCED_ROUTER is not set
+CONFIG_IP_FIB_HASH=y
+# CONFIG_IP_PNP is not set
+# CONFIG_NET_IPIP is not set
+# CONFIG_NET_IPGRE is not set
+# CONFIG_IP_MROUTE is not set
+# CONFIG_ARPD is not set
+# CONFIG_SYN_COOKIES is not set
+# CONFIG_INET_AH is not set
+# CONFIG_INET_ESP is not set
+# CONFIG_INET_IPCOMP is not set
+# CONFIG_INET_XFRM_TUNNEL is not set
+CONFIG_INET_TUNNEL=y
+# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
+# CONFIG_INET_XFRM_MODE_TUNNEL is not set
+CONFIG_INET_XFRM_MODE_BEET=y
+# CONFIG_INET_LRO is not set
+# CONFIG_INET_DIAG is not set
+# CONFIG_TCP_CONG_ADVANCED is not set
+CONFIG_TCP_CONG_CUBIC=y
+CONFIG_DEFAULT_TCP_CONG="cubic"
+# CONFIG_TCP_MD5SIG is not set
+CONFIG_IPV6=y
+# CONFIG_IPV6_PRIVACY is not set
+# CONFIG_IPV6_ROUTER_PREF is not set
+# CONFIG_IPV6_OPTIMISTIC_DAD is not set
+# CONFIG_INET6_AH is not set
+# CONFIG_INET6_ESP is not set
+# CONFIG_INET6_IPCOMP is not set
+# CONFIG_IPV6_MIP6 is not set
+# CONFIG_INET6_XFRM_TUNNEL is not set
+# CONFIG_INET6_TUNNEL is not set
+CONFIG_INET6_XFRM_MODE_TRANSPORT=y
+CONFIG_INET6_XFRM_MODE_TUNNEL=y
+CONFIG_INET6_XFRM_MODE_BEET=y
+# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
+CONFIG_IPV6_SIT=y
+# CONFIG_IPV6_SIT_6RD is not set
+CONFIG_IPV6_NDISC_NODETYPE=y
+# CONFIG_IPV6_TUNNEL is not set
+# CONFIG_IPV6_MULTIPLE_TABLES is not set
+# CONFIG_IPV6_MROUTE is not set
+# CONFIG_NETWORK_SECMARK is not set
+# CONFIG_NETFILTER is not set
+# CONFIG_IP_DCCP is not set
+# CONFIG_IP_SCTP is not set
+# CONFIG_RDS is not set
+# CONFIG_TIPC is not set
+# CONFIG_ATM is not set
+# CONFIG_BRIDGE is not set
+# CONFIG_NET_DSA is not set
+# CONFIG_VLAN_8021Q is not set
+# CONFIG_DECNET is not set
+# CONFIG_LLC2 is not set
+# CONFIG_IPX is not set
+# CONFIG_ATALK is not set
+# CONFIG_X25 is not set
+# CONFIG_LAPB is not set
+# CONFIG_ECONET is not set
+# CONFIG_WAN_ROUTER is not set
+# CONFIG_PHONET is not set
+# CONFIG_IEEE802154 is not set
+# CONFIG_NET_SCHED is not set
+# CONFIG_DCB is not set
+
+#
+# Network testing
+#
+# CONFIG_NET_PKTGEN is not set
+# CONFIG_HAMRADIO is not set
+# CONFIG_CAN is not set
+# CONFIG_IRDA is not set
+# CONFIG_BT is not set
+# CONFIG_AF_RXRPC is not set
+# CONFIG_WIRELESS is not set
+# CONFIG_WIMAX is not set
+# CONFIG_RFKILL is not set
+# CONFIG_NET_9P is not set
+
+#
+# Device Drivers
+#
+
+#
+# Generic Driver Options
+#
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+# CONFIG_DEVTMPFS is not set
+CONFIG_STANDALONE=y
+CONFIG_PREVENT_FIRMWARE_BUILD=y
+CONFIG_FW_LOADER=y
+CONFIG_FIRMWARE_IN_KERNEL=y
+CONFIG_EXTRA_FIRMWARE=""
+# CONFIG_DEBUG_DRIVER is not set
+# CONFIG_DEBUG_DEVRES is not set
+# CONFIG_SYS_HYPERVISOR is not set
+# CONFIG_CONNECTOR is not set
+# CONFIG_MTD is not set
+# CONFIG_PARPORT is not set
+CONFIG_BLK_DEV=y
+# CONFIG_BLK_CPQ_DA is not set
+# CONFIG_BLK_CPQ_CISS_DA is not set
+# CONFIG_BLK_DEV_DAC960 is not set
+# CONFIG_BLK_DEV_UMEM is not set
+# CONFIG_BLK_DEV_COW_COMMON is not set
+# CONFIG_BLK_DEV_LOOP is not set
+
+#
+# DRBD disabled because PROC_FS, INET or CONNECTOR not selected
+#
+# CONFIG_BLK_DEV_NBD is not set
+# CONFIG_BLK_DEV_SX8 is not set
+# CONFIG_BLK_DEV_RAM is not set
+# CONFIG_CDROM_PKTCDVD is not set
+# CONFIG_ATA_OVER_ETH is not set
+# CONFIG_BLK_DEV_HD is not set
+CONFIG_MISC_DEVICES=y
+# CONFIG_AD525X_DPOT is not set
+# CONFIG_PHANTOM is not set
+# CONFIG_SGI_IOC4 is not set
+# CONFIG_TIFM_CORE is not set
+# CONFIG_ICS932S401 is not set
+# CONFIG_ENCLOSURE_SERVICES is not set
+# CONFIG_HP_ILO is not set
+# CONFIG_ISL29003 is not set
+# CONFIG_SENSORS_TSL2550 is not set
+# CONFIG_DS1682 is not set
+# CONFIG_C2PORT is not set
+
+#
+# EEPROM support
+#
+# CONFIG_EEPROM_AT24 is not set
+# CONFIG_EEPROM_LEGACY is not set
+# CONFIG_EEPROM_MAX6875 is not set
+# CONFIG_EEPROM_93CX6 is not set
+# CONFIG_CB710_CORE is not set
+CONFIG_HAVE_IDE=y
+CONFIG_IDE=y
+
+#
+# Please see Documentation/ide/ide.txt for help/info on IDE drives
+#
+# CONFIG_BLK_DEV_IDE_SATA is not set
+CONFIG_IDE_GD=y
+CONFIG_IDE_GD_ATA=y
+# CONFIG_IDE_GD_ATAPI is not set
+# CONFIG_BLK_DEV_IDECD is not set
+# CONFIG_BLK_DEV_IDETAPE is not set
+# CONFIG_IDE_TASK_IOCTL is not set
+CONFIG_IDE_PROC_FS=y
+
+#
+# IDE chipset support/bugfixes
+#
+# CONFIG_BLK_DEV_PLATFORM is not set
+
+#
+# PCI IDE chipsets support
+#
+# CONFIG_BLK_DEV_GENERIC is not set
+# CONFIG_BLK_DEV_OPTI621 is not set
+# CONFIG_BLK_DEV_AEC62XX is not set
+# CONFIG_BLK_DEV_ALI15X3 is not set
+# CONFIG_BLK_DEV_AMD74XX is not set
+# CONFIG_BLK_DEV_CMD64X is not set
+# CONFIG_BLK_DEV_TRIFLEX is not set
+# CONFIG_BLK_DEV_CS5520 is not set
+# CONFIG_BLK_DEV_CS5530 is not set
+# CONFIG_BLK_DEV_HPT366 is not set
+# CONFIG_BLK_DEV_JMICRON is not set
+# CONFIG_BLK_DEV_SC1200 is not set
+# CONFIG_BLK_DEV_PIIX is not set
+# CONFIG_BLK_DEV_IT8172 is not set
+# CONFIG_BLK_DEV_IT8213 is not set
+# CONFIG_BLK_DEV_IT821X is not set
+# CONFIG_BLK_DEV_NS87415 is not set
+# CONFIG_BLK_DEV_PDC202XX_OLD is not set
+# CONFIG_BLK_DEV_PDC202XX_NEW is not set
+# CONFIG_BLK_DEV_SVWKS is not set
+# CONFIG_BLK_DEV_SIIMAGE is not set
+# CONFIG_BLK_DEV_SLC90E66 is not set
+# CONFIG_BLK_DEV_TRM290 is not set
+# CONFIG_BLK_DEV_VIA82CXXX is not set
+# CONFIG_BLK_DEV_TC86C001 is not set
+# CONFIG_BLK_DEV_IDEDMA is not set
+
+#
+# SCSI device support
+#
+CONFIG_SCSI_MOD=y
+# CONFIG_RAID_ATTRS is not set
+CONFIG_SCSI=y
+CONFIG_SCSI_DMA=y
+# CONFIG_SCSI_TGT is not set
+# CONFIG_SCSI_NETLINK is not set
+CONFIG_SCSI_PROC_FS=y
+
+#
+# SCSI support type (disk, tape, CD-ROM)
+#
+CONFIG_BLK_DEV_SD=y
+# CONFIG_CHR_DEV_ST is not set
+# CONFIG_CHR_DEV_OSST is not set
+# CONFIG_BLK_DEV_SR is not set
+# CONFIG_CHR_DEV_SG is not set
+# CONFIG_CHR_DEV_SCH is not set
+# CONFIG_SCSI_MULTI_LUN is not set
+CONFIG_SCSI_CONSTANTS=y
+CONFIG_SCSI_LOGGING=y
+# CONFIG_SCSI_SCAN_ASYNC is not set
+CONFIG_SCSI_WAIT_SCAN=m
+
+#
+# SCSI Transports
+#
+# CONFIG_SCSI_SPI_ATTRS is not set
+# CONFIG_SCSI_FC_ATTRS is not set
+# CONFIG_SCSI_ISCSI_ATTRS is not set
+# CONFIG_SCSI_SAS_LIBSAS is not set
+# CONFIG_SCSI_SRP_ATTRS is not set
+CONFIG_SCSI_LOWLEVEL=y
+# CONFIG_ISCSI_TCP is not set
+# CONFIG_SCSI_BNX2_ISCSI is not set
+# CONFIG_BE2ISCSI is not set
+# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
+# CONFIG_SCSI_HPSA is not set
+# CONFIG_SCSI_3W_9XXX is not set
+# CONFIG_SCSI_3W_SAS is not set
+# CONFIG_SCSI_ACARD is not set
+# CONFIG_SCSI_AACRAID is not set
+# CONFIG_SCSI_AIC7XXX is not set
+# CONFIG_SCSI_AIC7XXX_OLD is not set
+# CONFIG_SCSI_AIC79XX is not set
+# CONFIG_SCSI_AIC94XX is not set
+# CONFIG_SCSI_MVSAS is not set
+# CONFIG_SCSI_DPT_I2O is not set
+# CONFIG_SCSI_ADVANSYS is not set
+# CONFIG_SCSI_ARCMSR is not set
+# CONFIG_MEGARAID_NEWGEN is not set
+# CONFIG_MEGARAID_LEGACY is not set
+# CONFIG_MEGARAID_SAS is not set
+# CONFIG_SCSI_MPT2SAS is not set
+# CONFIG_SCSI_HPTIOP is not set
+# CONFIG_LIBFC is not set
+# CONFIG_LIBFCOE is not set
+# CONFIG_FCOE is not set
+# CONFIG_SCSI_DMX3191D is not set
+# CONFIG_SCSI_FUTURE_DOMAIN is not set
+# CONFIG_SCSI_IPS is not set
+# CONFIG_SCSI_INITIO is not set
+# CONFIG_SCSI_INIA100 is not set
+# CONFIG_SCSI_STEX is not set
+# CONFIG_SCSI_SYM53C8XX_2 is not set
+# CONFIG_SCSI_IPR is not set
+# CONFIG_SCSI_QLOGIC_1280 is not set
+# CONFIG_SCSI_QLA_FC is not set
+# CONFIG_SCSI_QLA_ISCSI is not set
+# CONFIG_SCSI_LPFC is not set
+# CONFIG_SCSI_DC395x is not set
+# CONFIG_SCSI_DC390T is not set
+# CONFIG_SCSI_NSP32 is not set
+# CONFIG_SCSI_DEBUG is not set
+# CONFIG_SCSI_PMCRAID is not set
+# CONFIG_SCSI_PM8001 is not set
+# CONFIG_SCSI_SRP is not set
+# CONFIG_SCSI_BFA_FC is not set
+# CONFIG_SCSI_LOWLEVEL_PCMCIA is not set
+# CONFIG_SCSI_DH is not set
+# CONFIG_SCSI_OSD_INITIATOR is not set
+CONFIG_ATA=y
+# CONFIG_ATA_NONSTANDARD is not set
+CONFIG_ATA_VERBOSE_ERROR=y
+CONFIG_SATA_PMP=y
+# CONFIG_SATA_AHCI is not set
+CONFIG_SATA_SIL24=y
+CONFIG_ATA_SFF=y
+# CONFIG_SATA_SVW is not set
+# CONFIG_ATA_PIIX is not set
+# CONFIG_SATA_MV is not set
+# CONFIG_SATA_NV is not set
+# CONFIG_PDC_ADMA is not set
+# CONFIG_SATA_QSTOR is not set
+# CONFIG_SATA_PROMISE is not set
+# CONFIG_SATA_SX4 is not set
+# CONFIG_SATA_SIL is not set
+# CONFIG_SATA_SIS is not set
+# CONFIG_SATA_ULI is not set
+# CONFIG_SATA_VIA is not set
+# CONFIG_SATA_VITESSE is not set
+# CONFIG_SATA_INIC162X is not set
+# CONFIG_PATA_ALI is not set
+# CONFIG_PATA_AMD is not set
+# CONFIG_PATA_ARTOP is not set
+# CONFIG_PATA_ATP867X is not set
+# CONFIG_PATA_ATIIXP is not set
+# CONFIG_PATA_CMD640_PCI is not set
+# CONFIG_PATA_CMD64X is not set
+# CONFIG_PATA_CS5520 is not set
+# CONFIG_PATA_CS5530 is not set
+# CONFIG_PATA_CYPRESS is not set
+# CONFIG_PATA_EFAR is not set
+# CONFIG_ATA_GENERIC is not set
+# CONFIG_PATA_HPT366 is not set
+# CONFIG_PATA_HPT37X is not set
+# CONFIG_PATA_HPT3X2N is not set
+# CONFIG_PATA_HPT3X3 is not set
+# CONFIG_PATA_IT821X is not set
+# CONFIG_PATA_IT8213 is not set
+# CONFIG_PATA_JMICRON is not set
+# CONFIG_PATA_LEGACY is not set
+# CONFIG_PATA_TRIFLEX is not set
+# CONFIG_PATA_MARVELL is not set
+# CONFIG_PATA_MPIIX is not set
+# CONFIG_PATA_OLDPIIX is not set
+# CONFIG_PATA_NETCELL is not set
+# CONFIG_PATA_NINJA32 is not set
+# CONFIG_PATA_NS87410 is not set
+# CONFIG_PATA_NS87415 is not set
+# CONFIG_PATA_OPTI is not set
+# CONFIG_PATA_OPTIDMA is not set
+# CONFIG_PATA_PDC2027X is not set
+# CONFIG_PATA_PDC_OLD is not set
+# CONFIG_PATA_RADISYS is not set
+# CONFIG_PATA_RDC is not set
+# CONFIG_PATA_RZ1000 is not set
+# CONFIG_PATA_SC1200 is not set
+# CONFIG_PATA_SERVERWORKS is not set
+# CONFIG_PATA_SIL680 is not set
+# CONFIG_PATA_SIS is not set
+# CONFIG_PATA_TOSHIBA is not set
+# CONFIG_PATA_VIA is not set
+# CONFIG_PATA_WINBOND is not set
+# CONFIG_PATA_PLATFORM is not set
+# CONFIG_PATA_SCH is not set
+# CONFIG_MD is not set
+# CONFIG_FUSION is not set
+
+#
+# IEEE 1394 (FireWire) support
+#
+
+#
+# You can enable one or both FireWire driver stacks.
+#
+
+#
+# The newer stack is recommended.
+#
+# CONFIG_FIREWIRE is not set
+# CONFIG_IEEE1394 is not set
+# CONFIG_I2O is not set
+CONFIG_NETDEVICES=y
+# CONFIG_DUMMY is not set
+# CONFIG_BONDING is not set
+# CONFIG_MACVLAN is not set
+# CONFIG_EQUALIZER is not set
+CONFIG_TUN=y
+# CONFIG_VETH is not set
+# CONFIG_ARCNET is not set
+# CONFIG_NET_ETHERNET is not set
+CONFIG_NETDEV_1000=y
+# CONFIG_ACENIC is not set
+# CONFIG_DL2K is not set
+# CONFIG_E1000 is not set
+CONFIG_E1000E=y
+# CONFIG_IP1000 is not set
+# CONFIG_IGB is not set
+# CONFIG_IGBVF is not set
+# CONFIG_NS83820 is not set
+# CONFIG_HAMACHI is not set
+# CONFIG_YELLOWFIN is not set
+# CONFIG_R8169 is not set
+# CONFIG_SIS190 is not set
+# CONFIG_SKGE is not set
+# CONFIG_SKY2 is not set
+# CONFIG_VIA_VELOCITY is not set
+# CONFIG_TIGON3 is not set
+# CONFIG_BNX2 is not set
+# CONFIG_CNIC is not set
+# CONFIG_QLA3XXX is not set
+# CONFIG_ATL1 is not set
+# CONFIG_ATL1E is not set
+# CONFIG_ATL1C is not set
+# CONFIG_JME is not set
+# CONFIG_NETDEV_10000 is not set
+# CONFIG_TR is not set
+# CONFIG_WLAN is not set
+
+#
+# Enable WiMAX (Networking options) to see the WiMAX drivers
+#
+# CONFIG_WAN is not set
+# CONFIG_FDDI is not set
+# CONFIG_HIPPI is not set
+# CONFIG_PPP is not set
+# CONFIG_SLIP is not set
+# CONFIG_NET_FC is not set
+# CONFIG_NETCONSOLE is not set
+# CONFIG_NETPOLL is not set
+# CONFIG_NET_POLL_CONTROLLER is not set
+# CONFIG_VMXNET3 is not set
+# CONFIG_ISDN is not set
+# CONFIG_PHONE is not set
+
+#
+# Input device support
+#
+CONFIG_INPUT=y
+# CONFIG_INPUT_FF_MEMLESS is not set
+# CONFIG_INPUT_POLLDEV is not set
+# CONFIG_INPUT_SPARSEKMAP is not set
+
+#
+# Userland interfaces
+#
+# CONFIG_INPUT_MOUSEDEV is not set
+# CONFIG_INPUT_JOYDEV is not set
+# CONFIG_INPUT_EVDEV is not set
+# CONFIG_INPUT_EVBUG is not set
+
+#
+# Input Device Drivers
+#
+# CONFIG_INPUT_KEYBOARD is not set
+# CONFIG_INPUT_MOUSE is not set
+# CONFIG_INPUT_JOYSTICK is not set
+# CONFIG_INPUT_TABLET is not set
+# CONFIG_INPUT_TOUCHSCREEN is not set
+# CONFIG_INPUT_MISC is not set
+
+#
+# Hardware I/O ports
+#
+# CONFIG_SERIO is not set
+# CONFIG_GAMEPORT is not set
+
+#
+# Character devices
+#
+# CONFIG_VT is not set
+CONFIG_DEVKMEM=y
+# CONFIG_SERIAL_NONSTANDARD is not set
+# CONFIG_NOZOMI is not set
+
+#
+# Serial drivers
+#
+# CONFIG_SERIAL_8250 is not set
+
+#
+# Non-8250 serial port support
+#
+# CONFIG_SERIAL_JSM is not set
+# CONFIG_SERIAL_TIMBERDALE is not set
+CONFIG_UNIX98_PTYS=y
+# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
+# CONFIG_LEGACY_PTYS is not set
+CONFIG_HVC_DRIVER=y
+# CONFIG_IPMI_HANDLER is not set
+# CONFIG_HW_RANDOM is not set
+# CONFIG_R3964 is not set
+# CONFIG_APPLICOM is not set
+
+#
+# PCMCIA character devices
+#
+# CONFIG_RAW_DRIVER is not set
+# CONFIG_TCG_TPM is not set
+CONFIG_I2C=y
+CONFIG_I2C_BOARDINFO=y
+CONFIG_I2C_COMPAT=y
+CONFIG_I2C_CHARDEV=y
+CONFIG_I2C_HELPER_AUTO=y
+
+#
+# I2C Hardware Bus support
+#
+
+#
+# PC SMBus host controller drivers
+#
+# CONFIG_I2C_ALI1535 is not set
+# CONFIG_I2C_ALI1563 is not set
+# CONFIG_I2C_ALI15X3 is not set
+# CONFIG_I2C_AMD756 is not set
+# CONFIG_I2C_AMD8111 is not set
+# CONFIG_I2C_I801 is not set
+# CONFIG_I2C_ISCH is not set
+# CONFIG_I2C_PIIX4 is not set
+# CONFIG_I2C_NFORCE2 is not set
+# CONFIG_I2C_SIS5595 is not set
+# CONFIG_I2C_SIS630 is not set
+# CONFIG_I2C_SIS96X is not set
+# CONFIG_I2C_VIA is not set
+# CONFIG_I2C_VIAPRO is not set
+
+#
+# I2C system bus drivers (mostly embedded / system-on-chip)
+#
+# CONFIG_I2C_OCORES is not set
+# CONFIG_I2C_SIMTEC is not set
+# CONFIG_I2C_XILINX is not set
+
+#
+# External I2C/SMBus adapter drivers
+#
+# CONFIG_I2C_PARPORT_LIGHT is not set
+# CONFIG_I2C_TAOS_EVM is not set
+
+#
+# Other I2C/SMBus bus drivers
+#
+# CONFIG_I2C_PCA_PLATFORM is not set
+# CONFIG_I2C_STUB is not set
+# CONFIG_I2C_DEBUG_CORE is not set
+# CONFIG_I2C_DEBUG_ALGO is not set
+# CONFIG_I2C_DEBUG_BUS is not set
+# CONFIG_SPI is not set
+
+#
+# PPS support
+#
+# CONFIG_PPS is not set
+# CONFIG_W1 is not set
+# CONFIG_POWER_SUPPLY is not set
+# CONFIG_HWMON is not set
+# CONFIG_THERMAL is not set
+CONFIG_WATCHDOG=y
+CONFIG_WATCHDOG_NOWAYOUT=y
+
+#
+# Watchdog Device Drivers
+#
+# CONFIG_SOFT_WATCHDOG is not set
+# CONFIG_ALIM7101_WDT is not set
+
+#
+# PCI-based Watchdog Cards
+#
+# CONFIG_PCIPCWATCHDOG is not set
+# CONFIG_WDTPCI is not set
+CONFIG_SSB_POSSIBLE=y
+
+#
+# Sonics Silicon Backplane
+#
+# CONFIG_SSB is not set
+
+#
+# Multifunction device drivers
+#
+# CONFIG_MFD_CORE is not set
+# CONFIG_MFD_88PM860X is not set
+# CONFIG_MFD_SM501 is not set
+# CONFIG_HTC_PASIC3 is not set
+# CONFIG_TWL4030_CORE is not set
+# CONFIG_MFD_TMIO is not set
+# CONFIG_PMIC_DA903X is not set
+# CONFIG_PMIC_ADP5520 is not set
+# CONFIG_MFD_MAX8925 is not set
+# CONFIG_MFD_WM8400 is not set
+# CONFIG_MFD_WM831X is not set
+# CONFIG_MFD_WM8350_I2C is not set
+# CONFIG_MFD_WM8994 is not set
+# CONFIG_MFD_PCF50633 is not set
+# CONFIG_AB3100_CORE is not set
+# CONFIG_LPC_SCH is not set
+# CONFIG_REGULATOR is not set
+# CONFIG_MEDIA_SUPPORT is not set
+
+#
+# Graphics support
+#
+# CONFIG_VGA_ARB is not set
+# CONFIG_DRM is not set
+# CONFIG_VGASTATE is not set
+# CONFIG_VIDEO_OUTPUT_CONTROL is not set
+# CONFIG_FB is not set
+# CONFIG_BACKLIGHT_LCD_SUPPORT is not set
+
+#
+# Display device support
+#
+# CONFIG_DISPLAY_SUPPORT is not set
+# CONFIG_SOUND is not set
+# CONFIG_HID_SUPPORT is not set
+# CONFIG_USB_SUPPORT is not set
+# CONFIG_UWB is not set
+# CONFIG_MMC is not set
+# CONFIG_MEMSTICK is not set
+# CONFIG_NEW_LEDS is not set
+# CONFIG_ACCESSIBILITY is not set
+# CONFIG_INFINIBAND is not set
+CONFIG_RTC_LIB=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_HCTOSYS=y
+CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
+# CONFIG_RTC_DEBUG is not set
+
+#
+# RTC interfaces
+#
+# CONFIG_RTC_INTF_SYSFS is not set
+# CONFIG_RTC_INTF_PROC is not set
+CONFIG_RTC_INTF_DEV=y
+# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
+# CONFIG_RTC_DRV_TEST is not set
+
+#
+# I2C RTC drivers
+#
+# CONFIG_RTC_DRV_DS1307 is not set
+# CONFIG_RTC_DRV_DS1374 is not set
+# CONFIG_RTC_DRV_DS1672 is not set
+# CONFIG_RTC_DRV_MAX6900 is not set
+# CONFIG_RTC_DRV_RS5C372 is not set
+# CONFIG_RTC_DRV_ISL1208 is not set
+# CONFIG_RTC_DRV_X1205 is not set
+# CONFIG_RTC_DRV_PCF8563 is not set
+# CONFIG_RTC_DRV_PCF8583 is not set
+# CONFIG_RTC_DRV_M41T80 is not set
+# CONFIG_RTC_DRV_BQ32K is not set
+# CONFIG_RTC_DRV_S35390A is not set
+# CONFIG_RTC_DRV_FM3130 is not set
+# CONFIG_RTC_DRV_RX8581 is not set
+# CONFIG_RTC_DRV_RX8025 is not set
+
+#
+# SPI RTC drivers
+#
+
+#
+# Platform RTC drivers
+#
+# CONFIG_RTC_DRV_DS1286 is not set
+# CONFIG_RTC_DRV_DS1511 is not set
+# CONFIG_RTC_DRV_DS1553 is not set
+# CONFIG_RTC_DRV_DS1742 is not set
+# CONFIG_RTC_DRV_STK17TA8 is not set
+# CONFIG_RTC_DRV_M48T86 is not set
+# CONFIG_RTC_DRV_M48T35 is not set
+# CONFIG_RTC_DRV_M48T59 is not set
+# CONFIG_RTC_DRV_MSM6242 is not set
+# CONFIG_RTC_DRV_BQ4802 is not set
+# CONFIG_RTC_DRV_RP5C01 is not set
+# CONFIG_RTC_DRV_V3020 is not set
+
+#
+# on-CPU RTC drivers
+#
+# CONFIG_DMADEVICES is not set
+# CONFIG_AUXDISPLAY is not set
+# CONFIG_UIO is not set
+
+#
+# TI VLYNQ
+#
+# CONFIG_STAGING is not set
+
+#
+# File systems
+#
+CONFIG_EXT2_FS=y
+# CONFIG_EXT2_FS_XATTR is not set
+# CONFIG_EXT2_FS_XIP is not set
+CONFIG_EXT3_FS=y
+# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
+CONFIG_EXT3_FS_XATTR=y
+# CONFIG_EXT3_FS_POSIX_ACL is not set
+# CONFIG_EXT3_FS_SECURITY is not set
+# CONFIG_EXT4_FS is not set
+CONFIG_JBD=y
+CONFIG_FS_MBCACHE=y
+# CONFIG_REISERFS_FS is not set
+# CONFIG_JFS_FS is not set
+# CONFIG_FS_POSIX_ACL is not set
+# CONFIG_XFS_FS is not set
+# CONFIG_GFS2_FS is not set
+# CONFIG_OCFS2_FS is not set
+# CONFIG_BTRFS_FS is not set
+# CONFIG_NILFS2_FS is not set
+CONFIG_FILE_LOCKING=y
+CONFIG_FSNOTIFY=y
+CONFIG_DNOTIFY=y
+# CONFIG_INOTIFY is not set
+CONFIG_INOTIFY_USER=y
+# CONFIG_QUOTA is not set
+# CONFIG_AUTOFS_FS is not set
+# CONFIG_AUTOFS4_FS is not set
+CONFIG_FUSE_FS=y
+# CONFIG_CUSE is not set
+
+#
+# Caches
+#
+# CONFIG_FSCACHE is not set
+
+#
+# CD-ROM/DVD Filesystems
+#
+# CONFIG_ISO9660_FS is not set
+# CONFIG_UDF_FS is not set
+
+#
+# DOS/FAT/NT Filesystems
+#
+CONFIG_FAT_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=m
+CONFIG_FAT_DEFAULT_CODEPAGE=437
+CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
+# CONFIG_NTFS_FS is not set
+
+#
+# Pseudo filesystems
+#
+CONFIG_PROC_FS=y
+# CONFIG_PROC_KCORE is not set
+CONFIG_PROC_SYSCTL=y
+CONFIG_PROC_PAGE_MONITOR=y
+CONFIG_SYSFS=y
+CONFIG_TMPFS=y
+# CONFIG_TMPFS_POSIX_ACL is not set
+CONFIG_HUGETLBFS=y
+CONFIG_HUGETLB_PAGE=y
+# CONFIG_CONFIGFS_FS is not set
+CONFIG_MISC_FILESYSTEMS=y
+# CONFIG_ADFS_FS is not set
+# CONFIG_AFFS_FS is not set
+# CONFIG_HFS_FS is not set
+# CONFIG_HFSPLUS_FS is not set
+# CONFIG_BEFS_FS is not set
+# CONFIG_BFS_FS is not set
+# CONFIG_EFS_FS is not set
+# CONFIG_LOGFS is not set
+# CONFIG_CRAMFS is not set
+# CONFIG_SQUASHFS is not set
+# CONFIG_VXFS_FS is not set
+# CONFIG_MINIX_FS is not set
+# CONFIG_OMFS_FS is not set
+# CONFIG_HPFS_FS is not set
+# CONFIG_QNX4FS_FS is not set
+# CONFIG_ROMFS_FS is not set
+# CONFIG_SYSV_FS is not set
+# CONFIG_UFS_FS is not set
+CONFIG_NETWORK_FILESYSTEMS=y
+CONFIG_NFS_FS=m
+CONFIG_NFS_V3=y
+# CONFIG_NFS_V3_ACL is not set
+# CONFIG_NFS_V4 is not set
+# CONFIG_NFSD is not set
+CONFIG_LOCKD=m
+CONFIG_LOCKD_V4=y
+CONFIG_NFS_COMMON=y
+CONFIG_SUNRPC=m
+# CONFIG_RPCSEC_GSS_KRB5 is not set
+# CONFIG_RPCSEC_GSS_SPKM3 is not set
+# CONFIG_SMB_FS is not set
+# CONFIG_CEPH_FS is not set
+# CONFIG_CIFS is not set
+# CONFIG_NCP_FS is not set
+# CONFIG_CODA_FS is not set
+# CONFIG_AFS_FS is not set
+
+#
+# Partition Types
+#
+# CONFIG_PARTITION_ADVANCED is not set
+CONFIG_MSDOS_PARTITION=y
+CONFIG_NLS=y
+CONFIG_NLS_DEFAULT="iso8859-1"
+CONFIG_NLS_CODEPAGE_437=y
+# CONFIG_NLS_CODEPAGE_737 is not set
+# CONFIG_NLS_CODEPAGE_775 is not set
+# CONFIG_NLS_CODEPAGE_850 is not set
+# CONFIG_NLS_CODEPAGE_852 is not set
+# CONFIG_NLS_CODEPAGE_855 is not set
+# CONFIG_NLS_CODEPAGE_857 is not set
+# CONFIG_NLS_CODEPAGE_860 is not set
+# CONFIG_NLS_CODEPAGE_861 is not set
+# CONFIG_NLS_CODEPAGE_862 is not set
+# CONFIG_NLS_CODEPAGE_863 is not set
+# CONFIG_NLS_CODEPAGE_864 is not set
+# CONFIG_NLS_CODEPAGE_865 is not set
+# CONFIG_NLS_CODEPAGE_866 is not set
+# CONFIG_NLS_CODEPAGE_869 is not set
+# CONFIG_NLS_CODEPAGE_936 is not set
+# CONFIG_NLS_CODEPAGE_950 is not set
+# CONFIG_NLS_CODEPAGE_932 is not set
+# CONFIG_NLS_CODEPAGE_949 is not set
+# CONFIG_NLS_CODEPAGE_874 is not set
+# CONFIG_NLS_ISO8859_8 is not set
+# CONFIG_NLS_CODEPAGE_1250 is not set
+# CONFIG_NLS_CODEPAGE_1251 is not set
+# CONFIG_NLS_ASCII is not set
+CONFIG_NLS_ISO8859_1=y
+# CONFIG_NLS_ISO8859_2 is not set
+# CONFIG_NLS_ISO8859_3 is not set
+# CONFIG_NLS_ISO8859_4 is not set
+# CONFIG_NLS_ISO8859_5 is not set
+# CONFIG_NLS_ISO8859_6 is not set
+# CONFIG_NLS_ISO8859_7 is not set
+# CONFIG_NLS_ISO8859_9 is not set
+# CONFIG_NLS_ISO8859_13 is not set
+# CONFIG_NLS_ISO8859_14 is not set
+# CONFIG_NLS_ISO8859_15 is not set
+# CONFIG_NLS_KOI8_R is not set
+# CONFIG_NLS_KOI8_U is not set
+# CONFIG_NLS_UTF8 is not set
+# CONFIG_DLM is not set
+
+#
+# Kernel hacking
+#
+# CONFIG_PRINTK_TIME is not set
+CONFIG_ENABLE_WARN_DEPRECATED=y
+CONFIG_ENABLE_MUST_CHECK=y
+CONFIG_FRAME_WARN=2048
+CONFIG_MAGIC_SYSRQ=y
+# CONFIG_STRIP_ASM_SYMS is not set
+# CONFIG_UNUSED_SYMBOLS is not set
+# CONFIG_DEBUG_FS is not set
+# CONFIG_HEADERS_CHECK is not set
+CONFIG_DEBUG_KERNEL=y
+# CONFIG_DEBUG_SHIRQ is not set
+CONFIG_DETECT_SOFTLOCKUP=y
+# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
+CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
+CONFIG_DETECT_HUNG_TASK=y
+# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
+CONFIG_SCHED_DEBUG=y
+# CONFIG_SCHEDSTATS is not set
+# CONFIG_TIMER_STATS is not set
+# CONFIG_DEBUG_OBJECTS is not set
+# CONFIG_SLUB_DEBUG_ON is not set
+# CONFIG_SLUB_STATS is not set
+# CONFIG_DEBUG_RT_MUTEXES is not set
+# CONFIG_RT_MUTEX_TESTER is not set
+# CONFIG_DEBUG_SPINLOCK is not set
+# CONFIG_DEBUG_MUTEXES is not set
+# CONFIG_DEBUG_LOCK_ALLOC is not set
+# CONFIG_PROVE_LOCKING is not set
+# CONFIG_LOCK_STAT is not set
+CONFIG_DEBUG_SPINLOCK_SLEEP=y
+# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
+CONFIG_STACKTRACE=y
+# CONFIG_DEBUG_KOBJECT is not set
+# CONFIG_DEBUG_HIGHMEM is not set
+CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_VM=y
+# CONFIG_DEBUG_WRITECOUNT is not set
+# CONFIG_DEBUG_MEMORY_INIT is not set
+# CONFIG_DEBUG_LIST is not set
+# CONFIG_DEBUG_SG is not set
+# CONFIG_DEBUG_NOTIFIERS is not set
+# CONFIG_DEBUG_CREDENTIALS is not set
+# CONFIG_RCU_TORTURE_TEST is not set
+# CONFIG_RCU_CPU_STALL_DETECTOR is not set
+# CONFIG_BACKTRACE_SELF_TEST is not set
+# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
+# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
+# CONFIG_FAULT_INJECTION is not set
+# CONFIG_SYSCTL_SYSCALL_CHECK is not set
+# CONFIG_PAGE_POISONING is not set
+CONFIG_RING_BUFFER=y
+CONFIG_RING_BUFFER_ALLOW_SWAP=y
+CONFIG_TRACING_SUPPORT=y
+CONFIG_FTRACE=y
+# CONFIG_IRQSOFF_TRACER is not set
+# CONFIG_SCHED_TRACER is not set
+# CONFIG_ENABLE_DEFAULT_TRACERS is not set
+# CONFIG_BOOT_TRACER is not set
+CONFIG_BRANCH_PROFILE_NONE=y
+# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
+# CONFIG_PROFILE_ALL_BRANCHES is not set
+# CONFIG_KMEMTRACE is not set
+# CONFIG_WORKQUEUE_TRACER is not set
+# CONFIG_BLK_DEV_IO_TRACE is not set
+# CONFIG_RING_BUFFER_BENCHMARK is not set
+# CONFIG_SAMPLES is not set
+CONFIG_EARLY_PRINTK=y
+CONFIG_DEBUG_STACKOVERFLOW=y
+# CONFIG_DEBUG_STACK_USAGE is not set
+CONFIG_DEBUG_EXTRA_FLAGS="-femit-struct-debug-baseonly"
+
+#
+# Security options
+#
+# CONFIG_KEYS is not set
+# CONFIG_SECURITY is not set
+# CONFIG_SECURITYFS is not set
+# CONFIG_DEFAULT_SECURITY_SELINUX is not set
+# CONFIG_DEFAULT_SECURITY_SMACK is not set
+# CONFIG_DEFAULT_SECURITY_TOMOYO is not set
+CONFIG_DEFAULT_SECURITY_DAC=y
+CONFIG_DEFAULT_SECURITY=""
+CONFIG_CRYPTO=y
+
+#
+# Crypto core or helper
+#
+# CONFIG_CRYPTO_FIPS is not set
+CONFIG_CRYPTO_ALGAPI=m
+CONFIG_CRYPTO_ALGAPI2=m
+CONFIG_CRYPTO_RNG=m
+CONFIG_CRYPTO_RNG2=m
+# CONFIG_CRYPTO_MANAGER is not set
+# CONFIG_CRYPTO_MANAGER2 is not set
+# CONFIG_CRYPTO_GF128MUL is not set
+# CONFIG_CRYPTO_NULL is not set
+# CONFIG_CRYPTO_PCRYPT is not set
+# CONFIG_CRYPTO_CRYPTD is not set
+# CONFIG_CRYPTO_AUTHENC is not set
+# CONFIG_CRYPTO_TEST is not set
+
+#
+# Authenticated Encryption with Associated Data
+#
+# CONFIG_CRYPTO_CCM is not set
+# CONFIG_CRYPTO_GCM is not set
+# CONFIG_CRYPTO_SEQIV is not set
+
+#
+# Block modes
+#
+# CONFIG_CRYPTO_CBC is not set
+# CONFIG_CRYPTO_CTR is not set
+# CONFIG_CRYPTO_CTS is not set
+# CONFIG_CRYPTO_ECB is not set
+# CONFIG_CRYPTO_LRW is not set
+# CONFIG_CRYPTO_PCBC is not set
+# CONFIG_CRYPTO_XTS is not set
+
+#
+# Hash modes
+#
+# CONFIG_CRYPTO_HMAC is not set
+# CONFIG_CRYPTO_XCBC is not set
+# CONFIG_CRYPTO_VMAC is not set
+
+#
+# Digest
+#
+# CONFIG_CRYPTO_CRC32C is not set
+# CONFIG_CRYPTO_GHASH is not set
+# CONFIG_CRYPTO_MD4 is not set
+# CONFIG_CRYPTO_MD5 is not set
+# CONFIG_CRYPTO_MICHAEL_MIC is not set
+# CONFIG_CRYPTO_RMD128 is not set
+# CONFIG_CRYPTO_RMD160 is not set
+# CONFIG_CRYPTO_RMD256 is not set
+# CONFIG_CRYPTO_RMD320 is not set
+# CONFIG_CRYPTO_SHA1 is not set
+# CONFIG_CRYPTO_SHA256 is not set
+# CONFIG_CRYPTO_SHA512 is not set
+# CONFIG_CRYPTO_TGR192 is not set
+# CONFIG_CRYPTO_WP512 is not set
+
+#
+# Ciphers
+#
+CONFIG_CRYPTO_AES=m
+# CONFIG_CRYPTO_ANUBIS is not set
+# CONFIG_CRYPTO_ARC4 is not set
+# CONFIG_CRYPTO_BLOWFISH is not set
+# CONFIG_CRYPTO_CAMELLIA is not set
+# CONFIG_CRYPTO_CAST5 is not set
+# CONFIG_CRYPTO_CAST6 is not set
+# CONFIG_CRYPTO_DES is not set
+# CONFIG_CRYPTO_FCRYPT is not set
+# CONFIG_CRYPTO_KHAZAD is not set
+# CONFIG_CRYPTO_SALSA20 is not set
+# CONFIG_CRYPTO_SEED is not set
+# CONFIG_CRYPTO_SERPENT is not set
+# CONFIG_CRYPTO_TEA is not set
+# CONFIG_CRYPTO_TWOFISH is not set
+
+#
+# Compression
+#
+# CONFIG_CRYPTO_DEFLATE is not set
+# CONFIG_CRYPTO_ZLIB is not set
+# CONFIG_CRYPTO_LZO is not set
+
+#
+# Random Number Generation
+#
+CONFIG_CRYPTO_ANSI_CPRNG=m
+CONFIG_CRYPTO_HW=y
+# CONFIG_CRYPTO_DEV_HIFN_795X is not set
+# CONFIG_BINARY_PRINTF is not set
+
+#
+# Library routines
+#
+CONFIG_BITREVERSE=y
+CONFIG_GENERIC_FIND_FIRST_BIT=y
+CONFIG_GENERIC_FIND_NEXT_BIT=y
+CONFIG_GENERIC_FIND_LAST_BIT=y
+# CONFIG_CRC_CCITT is not set
+# CONFIG_CRC16 is not set
+# CONFIG_CRC_T10DIF is not set
+# CONFIG_CRC_ITU_T is not set
+CONFIG_CRC32=y
+# CONFIG_CRC7 is not set
+# CONFIG_LIBCRC32C is not set
+CONFIG_ZLIB_INFLATE=y
+CONFIG_DECOMPRESS_GZIP=y
+CONFIG_HAS_IOMEM=y
+CONFIG_HAS_IOPORT=y
+CONFIG_HAS_DMA=y
+CONFIG_NLATTR=y

Chris Metcalf
May 28, 2010, 11:40:03 PM

Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---
drivers/char/Makefile | 1 +
drivers/char/hvc_tile.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 68 insertions(+), 0 deletions(-)
create mode 100644 drivers/char/hvc_tile.c

diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index f957edf..f77099d 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_RIO) += rio/ generic_serial.o
obj-$(CONFIG_HVC_CONSOLE) += hvc_vio.o hvsi.o
obj-$(CONFIG_HVC_ISERIES) += hvc_iseries.o
obj-$(CONFIG_HVC_RTAS) += hvc_rtas.o
+obj-$(CONFIG_HVC_TILE) += hvc_tile.o
obj-$(CONFIG_HVC_BEAT) += hvc_beat.o
obj-$(CONFIG_HVC_DRIVER) += hvc_console.o
obj-$(CONFIG_HVC_IRQ) += hvc_irq.o
diff --git a/drivers/char/hvc_tile.c b/drivers/char/hvc_tile.c
new file mode 100644
index 0000000..75715b3
--- /dev/null
+++ b/drivers/char/hvc_tile.c
@@ -0,0 +1,67 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * Tilera TILE Processor hypervisor console
+ */
+
+#include <linux/console.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/types.h>
+
+#include <hv/hypervisor.h>
+
+#include "hvc_console.h"
+
+static int hvc_tile_put_chars(uint32_t vt, const char *buf, int count)
+{
+	return hv_console_write((HV_VirtAddr)buf, count);
+}
+
+static int hvc_tile_get_chars(uint32_t vt, char *buf, int count)
+{
+	int i, c;
+
+	for (i = 0; i < count; ++i) {
+		c = hv_console_read_if_ready();
+		if (c < 0)
+			break;
+		buf[i] = c;
+	}
+
+	return i;
+}
+
+static const struct hv_ops hvc_tile_get_put_ops = {
+	.get_chars = hvc_tile_get_chars,
+	.put_chars = hvc_tile_put_chars,
+};
+
+static int __init hvc_tile_console_init(void)
+{
+	extern void disable_early_printk(void);
+	hvc_instantiate(0, 0, &hvc_tile_get_put_ops);
+	add_preferred_console("hvc", 0, NULL);
+	disable_early_printk();
+	return 0;
+}
+console_initcall(hvc_tile_console_init);
+
+static int __init hvc_tile_init(void)
+{
+	hvc_alloc(0, 0, &hvc_tile_get_put_ops, 128);
+	return 0;
+}
+device_initcall(hvc_tile_init);

Chris Metcalf
May 28, 2010, 11:40:03 PM

On May 20 I wrote:
> At Tilera we have been running Linux 2.6.26 on our architecture for a
> while and distributing the sources to our customers. We just sync'ed up
> our sources to 2.6.34 and would like to return it to the community more
> widely, so I'm hoping to take advantage of the merge window for 2.6.35
> to integrate support for our architecture.

Thanks to some much-appreciated volunteer work reviewing that initial
patch, I now have a revised set of patches which I would like to offer
for submission to the mainline.

The largest chunk of work has been adopting the "generic" syscall ABI in
asm-generic. It required some work extending it to a new architecture,
particularly to support both 32- and 64-bit chips, and to support a
32-bit compat layer on the 64-bit architecture. Many thanks to Arnd
Bergmann for working with me on this.

In addition, I have wised up and removed most of the driver material
from this patch. The drivers are not immediately essential for booting
up Tilera Linux, and I plan to feed them back over time after this
initial baseline patch has been accepted.

The patches are in eight commits, the first being the changes to the
"generic" ABI. The remaining seven patches are the tile architecture
code, broken up into somewhat digestible chunks.

Many thanks to the folks who took the time to review parts of
the initial diffs:

Arnd Bergmann <ar...@arndb.de>
Barry Song <21c...@gmail.com>
Sam Ravnborg <s...@ravnborg.org>
Thomas Gleixner <tg...@linutronix.de>
Marc Gauthier <ma...@tensilica.com>
Jamie Lokier <ja...@shareable.org>
and Linus for reviewing my one-line lowmem_page_address() patch :-)

(Sorry for the double-posting to linux-arch and Linus; I included
the Cc's but dropped the "To" when I ran git format-patch the first time.)

Chris Metcalf (8):
Fix up the "generic" unistd.h ABI to be more useful.
arch/tile: infrastructure and configuration-related files.
arch/tile: header files for the Tile architecture.
arch/tile: core kernel/ code.
arch/tile: the kernel/tile-desc_32.c file.
arch/tile: the mm/ directory.
arch/tile: lib/ directory.
arch/tile: hypervisor console driver.

MAINTAINERS | 6 +
arch/tile/Kbuild | 3 +
arch/tile/Kconfig | 360 +
arch/tile/Kconfig.debug | 43 +
arch/tile/Makefile | 52 +
arch/tile/configs/tile_defconfig | 1289 +++
arch/tile/include/arch/abi.h | 93 +
arch/tile/include/arch/chip.h | 23 +
arch/tile/include/arch/chip_tile64.h | 252 +
arch/tile/include/arch/chip_tilepro.h | 252 +
arch/tile/include/arch/interrupts.h | 19 +
arch/tile/include/arch/interrupts_32.h | 304 +
arch/tile/include/arch/sim_def.h | 512 +
arch/tile/include/arch/spr_def.h | 19 +
arch/tile/include/arch/spr_def_32.h | 162 +
arch/tile/include/asm/Kbuild | 3 +
arch/tile/include/asm/asm-offsets.h | 1 +
arch/tile/include/asm/atomic.h | 159 +
arch/tile/include/asm/atomic_32.h | 353 +
arch/tile/include/asm/auxvec.h | 20 +
arch/tile/include/asm/backtrace.h | 193 +
arch/tile/include/asm/bitops.h | 126 +
arch/tile/include/asm/bitops_32.h | 132 +
arch/tile/include/asm/bitsperlong.h | 26 +
arch/tile/include/asm/bug.h | 1 +
arch/tile/include/asm/bugs.h | 1 +
arch/tile/include/asm/byteorder.h | 1 +
arch/tile/include/asm/cache.h | 50 +
arch/tile/include/asm/cacheflush.h | 145 +
arch/tile/include/asm/checksum.h | 24 +
arch/tile/include/asm/compat.h | 308 +
arch/tile/include/asm/cputime.h | 1 +
arch/tile/include/asm/current.h | 31 +
arch/tile/include/asm/delay.h | 34 +
arch/tile/include/asm/device.h | 1 +
arch/tile/include/asm/div64.h | 1 +
arch/tile/include/asm/dma-mapping.h | 106 +
arch/tile/include/asm/dma.h | 25 +
arch/tile/include/asm/elf.h | 169 +
arch/tile/include/asm/emergency-restart.h | 1 +
arch/tile/include/asm/errno.h | 1 +
arch/tile/include/asm/fcntl.h | 1 +
arch/tile/include/asm/fixmap.h | 124 +
arch/tile/include/asm/ftrace.h | 20 +
arch/tile/include/asm/futex.h | 136 +
arch/tile/include/asm/hardirq.h | 47 +
arch/tile/include/asm/highmem.h | 73 +
arch/tile/include/asm/homecache.h | 125 +
arch/tile/include/asm/hugetlb.h | 109 +
arch/tile/include/asm/hv_driver.h | 60 +
arch/tile/include/asm/hw_irq.h | 18 +
arch/tile/include/asm/ide.h | 25 +
arch/tile/include/asm/io.h | 220 +
arch/tile/include/asm/ioctl.h | 1 +
arch/tile/include/asm/ioctls.h | 1 +
arch/tile/include/asm/ipc.h | 1 +
arch/tile/include/asm/ipcbuf.h | 1 +
arch/tile/include/asm/irq.h | 37 +
arch/tile/include/asm/irq_regs.h | 1 +
arch/tile/include/asm/irqflags.h | 267 +
arch/tile/include/asm/kdebug.h | 1 +
arch/tile/include/asm/kexec.h | 53 +
arch/tile/include/asm/kmap_types.h | 43 +
arch/tile/include/asm/linkage.h | 51 +
arch/tile/include/asm/local.h | 1 +
arch/tile/include/asm/memprof.h | 33 +
arch/tile/include/asm/mman.h | 40 +
arch/tile/include/asm/mmu.h | 31 +
arch/tile/include/asm/mmu_context.h | 131 +
arch/tile/include/asm/mmzone.h | 81 +
arch/tile/include/asm/module.h | 1 +
arch/tile/include/asm/msgbuf.h | 1 +
arch/tile/include/asm/mutex.h | 1 +
arch/tile/include/asm/opcode-tile.h | 30 +
arch/tile/include/asm/opcode-tile_32.h | 1597 ++++
arch/tile/include/asm/opcode-tile_64.h | 1597 ++++
arch/tile/include/asm/opcode_constants.h | 26 +
arch/tile/include/asm/opcode_constants_32.h | 480 +
arch/tile/include/asm/opcode_constants_64.h | 480 +
arch/tile/include/asm/page.h | 334 +
arch/tile/include/asm/param.h | 1 +
arch/tile/include/asm/pci-bridge.h | 117 +
arch/tile/include/asm/pci.h | 128 +
arch/tile/include/asm/percpu.h | 24 +
arch/tile/include/asm/pgalloc.h | 119 +
arch/tile/include/asm/pgtable.h | 475 +
arch/tile/include/asm/pgtable_32.h | 117 +
arch/tile/include/asm/poll.h | 1 +
arch/tile/include/asm/posix_types.h | 1 +
arch/tile/include/asm/processor.h | 339 +
arch/tile/include/asm/ptrace.h | 163 +
arch/tile/include/asm/resource.h | 1 +
arch/tile/include/asm/scatterlist.h | 1 +
arch/tile/include/asm/sections.h | 37 +
arch/tile/include/asm/sembuf.h | 1 +
arch/tile/include/asm/setup.h | 32 +
arch/tile/include/asm/shmbuf.h | 1 +
arch/tile/include/asm/shmparam.h | 1 +
arch/tile/include/asm/sigcontext.h | 27 +
arch/tile/include/asm/sigframe.h | 33 +
arch/tile/include/asm/siginfo.h | 30 +
arch/tile/include/asm/signal.h | 31 +
arch/tile/include/asm/smp.h | 126 +
arch/tile/include/asm/socket.h | 1 +
arch/tile/include/asm/sockios.h | 1 +
arch/tile/include/asm/spinlock.h | 24 +
arch/tile/include/asm/spinlock_32.h | 200 +
arch/tile/include/asm/spinlock_types.h | 60 +
arch/tile/include/asm/stack.h | 68 +
arch/tile/include/asm/stat.h | 1 +
arch/tile/include/asm/statfs.h | 1 +
arch/tile/include/asm/string.h | 32 +
arch/tile/include/asm/swab.h | 29 +
arch/tile/include/asm/syscall.h | 79 +
arch/tile/include/asm/syscalls.h | 60 +
arch/tile/include/asm/system.h | 220 +
arch/tile/include/asm/termbits.h | 1 +
arch/tile/include/asm/termios.h | 1 +
arch/tile/include/asm/thread_info.h | 165 +
arch/tile/include/asm/timex.h | 47 +
arch/tile/include/asm/tlb.h | 25 +
arch/tile/include/asm/tlbflush.h | 128 +
arch/tile/include/asm/topology.h | 85 +
arch/tile/include/asm/traps.h | 36 +
arch/tile/include/asm/types.h | 1 +
arch/tile/include/asm/uaccess.h | 578 ++
arch/tile/include/asm/ucontext.h | 1 +
arch/tile/include/asm/unaligned.h | 24 +
arch/tile/include/asm/unistd.h | 47 +
arch/tile/include/asm/user.h | 21 +
arch/tile/include/asm/xor.h | 1 +
arch/tile/include/hv/drv_pcie_rc_intf.h | 38 +
arch/tile/include/hv/hypervisor.h | 2366 +++++
arch/tile/include/hv/syscall_public.h | 42 +
arch/tile/kernel/Makefile | 16 +
arch/tile/kernel/asm-offsets.c | 76 +
arch/tile/kernel/backtrace.c | 634 ++
arch/tile/kernel/compat.c | 183 +
arch/tile/kernel/compat_signal.c | 433 +
arch/tile/kernel/early_printk.c | 109 +
arch/tile/kernel/entry.S | 141 +
arch/tile/kernel/head_32.S | 180 +
arch/tile/kernel/hvglue.lds | 56 +
arch/tile/kernel/init_task.c | 59 +
arch/tile/kernel/intvec_32.S | 2006 ++++
arch/tile/kernel/irq.c | 227 +
arch/tile/kernel/machine_kexec.c | 291 +
arch/tile/kernel/messaging.c | 115 +
arch/tile/kernel/module.c | 257 +
arch/tile/kernel/pci-dma.c | 231 +
arch/tile/kernel/proc.c | 91 +
arch/tile/kernel/process.c | 647 ++
arch/tile/kernel/ptrace.c | 203 +
arch/tile/kernel/reboot.c | 52 +
arch/tile/kernel/regs_32.S | 145 +
arch/tile/kernel/relocate_kernel.S | 280 +
arch/tile/kernel/setup.c | 1497 +++
arch/tile/kernel/signal.c | 359 +
arch/tile/kernel/single_step.c | 656 ++
arch/tile/kernel/smp.c | 202 +
arch/tile/kernel/smpboot.c | 293 +
arch/tile/kernel/stack.c | 485 +
arch/tile/kernel/sys.c | 122 +
arch/tile/kernel/tile-desc_32.c |13826 +++++++++++++++++++++++++++
arch/tile/kernel/time.c | 220 +
arch/tile/kernel/tlb.c | 97 +
arch/tile/kernel/traps.c | 237 +
arch/tile/kernel/vmlinux.lds.S | 98 +
arch/tile/lib/Makefile | 16 +
arch/tile/lib/__invalidate_icache.S | 106 +
arch/tile/lib/atomic_32.c | 347 +
arch/tile/lib/atomic_asm_32.S | 197 +
arch/tile/lib/checksum.c | 102 +
arch/tile/lib/cpumask.c | 51 +
arch/tile/lib/delay.c | 34 +
arch/tile/lib/exports.c | 78 +
arch/tile/lib/mb_incoherent.S | 34 +
arch/tile/lib/memchr_32.c | 68 +
arch/tile/lib/memcpy_32.S | 628 ++
arch/tile/lib/memcpy_tile64.c | 271 +
arch/tile/lib/memmove_32.c | 63 +
arch/tile/lib/memset_32.c | 274 +
arch/tile/lib/spinlock_32.c | 221 +
arch/tile/lib/spinlock_common.h | 64 +
arch/tile/lib/strchr_32.c | 66 +
arch/tile/lib/strlen_32.c | 36 +
arch/tile/lib/uaccess.c | 31 +
arch/tile/lib/usercopy_32.S | 223 +
arch/tile/mm/Makefile | 9 +
arch/tile/mm/elf.c | 164 +
arch/tile/mm/extable.c | 30 +
arch/tile/mm/fault.c | 905 ++
arch/tile/mm/highmem.c | 328 +
arch/tile/mm/homecache.c | 445 +
arch/tile/mm/hugetlbpage.c | 343 +
arch/tile/mm/init.c | 1082 +++
arch/tile/mm/migrate.h | 50 +
arch/tile/mm/migrate_32.S | 211 +
arch/tile/mm/mmap.c | 75 +
arch/tile/mm/pgtable.c | 566 ++
drivers/char/Makefile | 1 +
drivers/char/hvc_tile.c | 67 +
include/asm-generic/unistd.h | 26 +-
include/linux/syscalls.h | 4 +
204 files changed, 49504 insertions(+), 6 deletions(-)

create mode 100644 arch/tile/Kbuild
create mode 100644 arch/tile/Kconfig
create mode 100644 arch/tile/Kconfig.debug
create mode 100644 arch/tile/Makefile
create mode 100644 arch/tile/configs/tile_defconfig
create mode 100644 arch/tile/include/arch/abi.h
create mode 100644 arch/tile/include/arch/chip.h
create mode 100644 arch/tile/include/arch/chip_tile64.h
create mode 100644 arch/tile/include/arch/chip_tilepro.h
create mode 100644 arch/tile/include/arch/interrupts.h
create mode 100644 arch/tile/include/arch/interrupts_32.h
create mode 100644 arch/tile/include/arch/sim_def.h
create mode 100644 arch/tile/include/arch/spr_def.h
create mode 100644 arch/tile/include/arch/spr_def_32.h
create mode 100644 arch/tile/include/asm/Kbuild
create mode 100644 arch/tile/include/asm/asm-offsets.h
create mode 100644 arch/tile/include/asm/atomic.h
create mode 100644 arch/tile/include/asm/atomic_32.h
create mode 100644 arch/tile/include/asm/auxvec.h
create mode 100644 arch/tile/include/asm/backtrace.h
create mode 100644 arch/tile/include/asm/bitops.h
create mode 100644 arch/tile/include/asm/bitops_32.h
create mode 100644 arch/tile/include/asm/bitsperlong.h
create mode 100644 arch/tile/include/asm/bug.h
create mode 100644 arch/tile/include/asm/bugs.h
create mode 100644 arch/tile/include/asm/byteorder.h
create mode 100644 arch/tile/include/asm/cache.h
create mode 100644 arch/tile/include/asm/cacheflush.h
create mode 100644 arch/tile/include/asm/checksum.h
create mode 100644 arch/tile/include/asm/compat.h
create mode 100644 arch/tile/include/asm/cputime.h
create mode 100644 arch/tile/include/asm/current.h
create mode 100644 arch/tile/include/asm/delay.h
create mode 100644 arch/tile/include/asm/device.h
create mode 100644 arch/tile/include/asm/div64.h
create mode 100644 arch/tile/include/asm/dma-mapping.h
create mode 100644 arch/tile/include/asm/dma.h
create mode 100644 arch/tile/include/asm/elf.h
create mode 100644 arch/tile/include/asm/emergency-restart.h
create mode 100644 arch/tile/include/asm/errno.h
create mode 100644 arch/tile/include/asm/fcntl.h
create mode 100644 arch/tile/include/asm/fixmap.h
create mode 100644 arch/tile/include/asm/ftrace.h
create mode 100644 arch/tile/include/asm/futex.h
create mode 100644 arch/tile/include/asm/hardirq.h
create mode 100644 arch/tile/include/asm/highmem.h
create mode 100644 arch/tile/include/asm/homecache.h
create mode 100644 arch/tile/include/asm/hugetlb.h
create mode 100644 arch/tile/include/asm/hv_driver.h
create mode 100644 arch/tile/include/asm/hw_irq.h
create mode 100644 arch/tile/include/asm/ide.h
create mode 100644 arch/tile/include/asm/io.h
create mode 100644 arch/tile/include/asm/ioctl.h
create mode 100644 arch/tile/include/asm/ioctls.h
create mode 100644 arch/tile/include/asm/ipc.h
create mode 100644 arch/tile/include/asm/ipcbuf.h
create mode 100644 arch/tile/include/asm/irq.h
create mode 100644 arch/tile/include/asm/irq_regs.h
create mode 100644 arch/tile/include/asm/irqflags.h
create mode 100644 arch/tile/include/asm/kdebug.h
create mode 100644 arch/tile/include/asm/kexec.h
create mode 100644 arch/tile/include/asm/kmap_types.h
create mode 100644 arch/tile/include/asm/linkage.h
create mode 100644 arch/tile/include/asm/local.h
create mode 100644 arch/tile/include/asm/memprof.h
create mode 100644 arch/tile/include/asm/mman.h
create mode 100644 arch/tile/include/asm/mmu.h
create mode 100644 arch/tile/include/asm/mmu_context.h
create mode 100644 arch/tile/include/asm/mmzone.h
create mode 100644 arch/tile/include/asm/module.h
create mode 100644 arch/tile/include/asm/msgbuf.h
create mode 100644 arch/tile/include/asm/mutex.h
create mode 100644 arch/tile/include/asm/opcode-tile.h
create mode 100644 arch/tile/include/asm/opcode-tile_32.h
create mode 100644 arch/tile/include/asm/opcode-tile_64.h
create mode 100644 arch/tile/include/asm/opcode_constants.h
create mode 100644 arch/tile/include/asm/opcode_constants_32.h
create mode 100644 arch/tile/include/asm/opcode_constants_64.h
create mode 100644 arch/tile/include/asm/page.h
create mode 100644 arch/tile/include/asm/param.h
create mode 100644 arch/tile/include/asm/pci-bridge.h
create mode 100644 arch/tile/include/asm/pci.h
create mode 100644 arch/tile/include/asm/percpu.h
create mode 100644 arch/tile/include/asm/pgalloc.h
create mode 100644 arch/tile/include/asm/pgtable.h
create mode 100644 arch/tile/include/asm/pgtable_32.h
create mode 100644 arch/tile/include/asm/poll.h
create mode 100644 arch/tile/include/asm/posix_types.h
create mode 100644 arch/tile/include/asm/processor.h
create mode 100644 arch/tile/include/asm/ptrace.h
create mode 100644 arch/tile/include/asm/resource.h
create mode 100644 arch/tile/include/asm/scatterlist.h
create mode 100644 arch/tile/include/asm/sections.h
create mode 100644 arch/tile/include/asm/sembuf.h
create mode 100644 arch/tile/include/asm/setup.h
create mode 100644 arch/tile/include/asm/shmbuf.h
create mode 100644 arch/tile/include/asm/shmparam.h
create mode 100644 arch/tile/include/asm/sigcontext.h
create mode 100644 arch/tile/include/asm/sigframe.h
create mode 100644 arch/tile/include/asm/siginfo.h
create mode 100644 arch/tile/include/asm/signal.h
create mode 100644 arch/tile/include/asm/smp.h
create mode 100644 arch/tile/include/asm/socket.h
create mode 100644 arch/tile/include/asm/sockios.h
create mode 100644 arch/tile/include/asm/spinlock.h
create mode 100644 arch/tile/include/asm/spinlock_32.h
create mode 100644 arch/tile/include/asm/spinlock_types.h
create mode 100644 arch/tile/include/asm/stack.h
create mode 100644 arch/tile/include/asm/stat.h
create mode 100644 arch/tile/include/asm/statfs.h
create mode 100644 arch/tile/include/asm/string.h
create mode 100644 arch/tile/include/asm/swab.h
create mode 100644 arch/tile/include/asm/syscall.h
create mode 100644 arch/tile/include/asm/syscalls.h
create mode 100644 arch/tile/include/asm/system.h
create mode 100644 arch/tile/include/asm/termbits.h
create mode 100644 arch/tile/include/asm/termios.h
create mode 100644 arch/tile/include/asm/thread_info.h
create mode 100644 arch/tile/include/asm/timex.h
create mode 100644 arch/tile/include/asm/tlb.h
create mode 100644 arch/tile/include/asm/tlbflush.h
create mode 100644 arch/tile/include/asm/topology.h
create mode 100644 arch/tile/include/asm/traps.h
create mode 100644 arch/tile/include/asm/types.h
create mode 100644 arch/tile/include/asm/uaccess.h
create mode 100644 arch/tile/include/asm/ucontext.h
create mode 100644 arch/tile/include/asm/unaligned.h
create mode 100644 arch/tile/include/asm/unistd.h
create mode 100644 arch/tile/include/asm/user.h
create mode 100644 arch/tile/include/asm/xor.h
create mode 100644 arch/tile/include/hv/drv_pcie_rc_intf.h
create mode 100644 arch/tile/include/hv/hypervisor.h
create mode 100644 arch/tile/include/hv/syscall_public.h
create mode 100644 arch/tile/kernel/Makefile
create mode 100644 arch/tile/kernel/asm-offsets.c
create mode 100644 arch/tile/kernel/backtrace.c
create mode 100644 arch/tile/kernel/compat.c
create mode 100644 arch/tile/kernel/compat_signal.c
create mode 100644 arch/tile/kernel/early_printk.c
create mode 100644 arch/tile/kernel/entry.S
create mode 100644 arch/tile/kernel/head_32.S
create mode 100644 arch/tile/kernel/hvglue.lds
create mode 100644 arch/tile/kernel/init_task.c
create mode 100644 arch/tile/kernel/intvec_32.S
create mode 100644 arch/tile/kernel/irq.c
create mode 100644 arch/tile/kernel/machine_kexec.c
create mode 100644 arch/tile/kernel/messaging.c
create mode 100644 arch/tile/kernel/module.c
create mode 100644 arch/tile/kernel/pci-dma.c
create mode 100644 arch/tile/kernel/proc.c
create mode 100644 arch/tile/kernel/process.c
create mode 100644 arch/tile/kernel/ptrace.c
create mode 100644 arch/tile/kernel/reboot.c
create mode 100644 arch/tile/kernel/regs_32.S
create mode 100644 arch/tile/kernel/relocate_kernel.S
create mode 100644 arch/tile/kernel/setup.c
create mode 100644 arch/tile/kernel/signal.c
create mode 100644 arch/tile/kernel/single_step.c
create mode 100644 arch/tile/kernel/smp.c
create mode 100644 arch/tile/kernel/smpboot.c
create mode 100644 arch/tile/kernel/stack.c
create mode 100644 arch/tile/kernel/sys.c
create mode 100644 arch/tile/kernel/tile-desc_32.c
create mode 100644 arch/tile/kernel/time.c
create mode 100644 arch/tile/kernel/tlb.c
create mode 100644 arch/tile/kernel/traps.c
create mode 100644 arch/tile/kernel/vmlinux.lds.S
create mode 100644 arch/tile/lib/Makefile
create mode 100644 arch/tile/lib/__invalidate_icache.S
create mode 100644 arch/tile/lib/atomic_32.c
create mode 100644 arch/tile/lib/atomic_asm_32.S
create mode 100644 arch/tile/lib/checksum.c
create mode 100644 arch/tile/lib/cpumask.c
create mode 100644 arch/tile/lib/delay.c
create mode 100644 arch/tile/lib/exports.c
create mode 100644 arch/tile/lib/mb_incoherent.S
create mode 100644 arch/tile/lib/memchr_32.c
create mode 100644 arch/tile/lib/memcpy_32.S
create mode 100644 arch/tile/lib/memcpy_tile64.c
create mode 100644 arch/tile/lib/memmove_32.c
create mode 100644 arch/tile/lib/memset_32.c
create mode 100644 arch/tile/lib/spinlock_32.c
create mode 100644 arch/tile/lib/spinlock_common.h
create mode 100644 arch/tile/lib/strchr_32.c
create mode 100644 arch/tile/lib/strlen_32.c
create mode 100644 arch/tile/lib/uaccess.c
create mode 100644 arch/tile/lib/usercopy_32.S
create mode 100644 arch/tile/mm/Makefile
create mode 100644 arch/tile/mm/elf.c
create mode 100644 arch/tile/mm/extable.c
create mode 100644 arch/tile/mm/fault.c
create mode 100644 arch/tile/mm/highmem.c
create mode 100644 arch/tile/mm/homecache.c
create mode 100644 arch/tile/mm/hugetlbpage.c
create mode 100644 arch/tile/mm/init.c
create mode 100644 arch/tile/mm/migrate.h
create mode 100644 arch/tile/mm/migrate_32.S
create mode 100644 arch/tile/mm/mmap.c
create mode 100644 arch/tile/mm/pgtable.c
create mode 100644 drivers/char/hvc_tile.c

Chris Metcalf
May 28, 2010, 11:40:03 PM

Reserve 16 "architecture-specific" syscall numbers starting at 244.

Allow use of the sys_sync_file_range2() API with the generic unistd.h
by specifying __ARCH_WANT_SYNC_FILE_RANGE2 before including it.

Allow using the generic unistd.h to create the "compat" syscall table
by specifying __SYSCALL_COMPAT before including it.

Use sys_fadvise64_64 for __NR3264_fadvise64 in both 32- and 64-bit mode.

Request the appropriate __ARCH_WANT_COMPAT_SYS_xxx values when
some deprecated syscall modes are selected.

As part of this change to fix up the syscalls, also provide a couple
of missing signal-related syscall prototypes in <linux/syscalls.h>.

Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---

include/asm-generic/unistd.h | 26 ++++++++++++++++++++------
include/linux/syscalls.h | 4 ++++
2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/unistd.h b/include/asm-generic/unistd.h
index 6a0b30f..30218b4 100644
--- a/include/asm-generic/unistd.h
+++ b/include/asm-generic/unistd.h
@@ -18,7 +18,7 @@
#define __SYSCALL(x, y)
#endif

-#if __BITS_PER_LONG == 32
+#if __BITS_PER_LONG == 32 || defined(__SYSCALL_COMPAT)
#define __SC_3264(_nr, _32, _64) __SYSCALL(_nr, _32)
#else
#define __SC_3264(_nr, _32, _64) __SYSCALL(_nr, _64)
@@ -241,8 +241,13 @@ __SYSCALL(__NR_sync, sys_sync)
__SYSCALL(__NR_fsync, sys_fsync)
#define __NR_fdatasync 83
__SYSCALL(__NR_fdatasync, sys_fdatasync)
+#ifdef __ARCH_WANT_SYNC_FILE_RANGE2
+#define __NR_sync_file_range2 84
+__SYSCALL(__NR_sync_file_range2, sys_sync_file_range2)
+#else
#define __NR_sync_file_range 84
-__SYSCALL(__NR_sync_file_range, sys_sync_file_range) /* .long sys_sync_file_range2, */
+__SYSCALL(__NR_sync_file_range, sys_sync_file_range)
+#endif

/* fs/timerfd.c */
#define __NR_timerfd_create 85
@@ -580,7 +585,7 @@ __SYSCALL(__NR_execve, sys_execve) /* .long sys_execve_wrapper */
__SC_3264(__NR3264_mmap, sys_mmap2, sys_mmap)
/* mm/fadvise.c */
#define __NR3264_fadvise64 223
-__SC_3264(__NR3264_fadvise64, sys_fadvise64_64, sys_fadvise64)
+__SYSCALL(__NR3264_fadvise64, sys_fadvise64_64)

/* mm/, CONFIG_MMU only */
#ifndef __ARCH_NOMMU
@@ -627,8 +632,14 @@ __SYSCALL(__NR_accept4, sys_accept4)
#define __NR_recvmmsg 243
__SYSCALL(__NR_recvmmsg, sys_recvmmsg)

+/*
+ * Architectures may provide up to 16 syscalls of their own
+ * starting with this value.
+ */
+#define __NR_arch_specific_syscall 244
+
#undef __NR_syscalls
-#define __NR_syscalls 244
+#define __NR_syscalls 260

/*
* All syscalls below here should go away really,
@@ -694,7 +705,8 @@ __SYSCALL(__NR_signalfd, sys_signalfd)
#define __NR_syscalls (__NR_signalfd+1)
#endif /* __ARCH_WANT_SYSCALL_NO_FLAGS */

-#if __BITS_PER_LONG == 32 && defined(__ARCH_WANT_SYSCALL_OFF_T)
+#if (__BITS_PER_LONG == 32 || defined(__SYSCALL_COMPAT)) && \
+ defined(__ARCH_WANT_SYSCALL_OFF_T)
#define __NR_sendfile 1046
__SYSCALL(__NR_sendfile, sys_sendfile)
#define __NR_ftruncate 1047
@@ -740,6 +752,7 @@ __SYSCALL(__NR_getpgrp, sys_getpgrp)
__SYSCALL(__NR_pause, sys_pause)
#define __NR_time 1062
#define __ARCH_WANT_SYS_TIME
+#define __ARCH_WANT_COMPAT_SYS_TIME
__SYSCALL(__NR_time, sys_time)
#define __NR_utime 1063
#define __ARCH_WANT_SYS_UTIME
@@ -801,7 +814,7 @@ __SYSCALL(__NR_fork, sys_ni_syscall)
* Here we map the numbers so that both versions
* use the same syscall table layout.
*/
-#if __BITS_PER_LONG == 64
+#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
#define __NR_fcntl __NR3264_fcntl
#define __NR_statfs __NR3264_statfs
#define __NR_fstatfs __NR3264_fstatfs
@@ -848,6 +861,7 @@ __SYSCALL(__NR_fork, sys_ni_syscall)
#endif
#define __ARCH_WANT_SYS_RT_SIGACTION
#define __ARCH_WANT_SYS_RT_SIGSUSPEND
+#define __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND

/*
* "Conditional" syscalls
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 057929b..d39ddb3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -387,9 +387,13 @@ asmlinkage long sys_init_module(void __user *umod, unsigned long len,
asmlinkage long sys_delete_module(const char __user *name_user,
unsigned int flags);

+asmlinkage long sys_rt_sigaction(int sig, const struct sigaction __user *act,
+ struct sigaction __user *oact,
+ size_t sigsetsize);
asmlinkage long sys_rt_sigprocmask(int how, sigset_t __user *set,
sigset_t __user *oset, size_t sigsetsize);
asmlinkage long sys_rt_sigpending(sigset_t __user *set, size_t sigsetsize);
+asmlinkage long sys_rt_sigsuspend(sigset_t __user *unewset, size_t sigsetsize);
asmlinkage long sys_rt_sigtimedwait(const sigset_t __user *uthese,
siginfo_t __user *uinfo,
const struct timespec __user *uts,
--
1.6.5.2

Arnd Bergmann (May 29, 2010, 7:30:02 AM)

On Saturday 29 May 2010, Chris Metcalf wrote:
> On May 20 I wrote:
> > At Tilera we have been running Linux 2.6.26 on our architecture for a
> > while and distributing the sources to our customers. We just sync'ed up
> > our sources to 2.6.34 and would like to return it to the community more
> > widely, so I'm hoping to take advantage of the merge window for 2.6.35
> > to integrate support for our architecture.
>
> Thanks to some much-appreciated volunteer work reviewing that initial
> patch, I now have a revised set of patches which I would like to offer
> for submission to the mainline.

It seems that you have addressed all my review comments and all the
other comments that I have seen in the best possible ways.
All the controversial parts from the original code are either corrected
or (in case of nonessential drivers) deferred to a future review.
I did not expect this to be possible in such a short time, and it
continues to amaze me.

Consequently, I fully support this series to go into 2.6.35.
To the entire series:

Acked-by: Arnd Bergmann <ar...@arndb.de>

FUJITA Tomonori (May 30, 2010, 11:00:02 PM)

On Fri, 28 May 2010 23:10:39 -0400
Chris Metcalf <cmet...@tilera.com> wrote:

> This omits just the tile-desc_32.c file, which is large enough to
> merit being in a separate commit.
>
> Signed-off-by: Chris Metcalf <cmet...@tilera.com>

(snip)

> diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c
> new file mode 100644
> index 0000000..b1ddc80
> --- /dev/null
> +++ b/arch/tile/kernel/pci-dma.c
> @@ -0,0 +1,231 @@


> +/*
> + * Copyright 2010 Tilera Corporation. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation, version 2.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT. See the GNU General Public License for
> + * more details.

> + */
> +
> +#include <linux/mm.h>

> +#include <linux/dma-mapping.h>
> +#include <linux/vmalloc.h>


> +#include <asm/tlbflush.h>
> +#include <asm/homecache.h>
> +

> +/* Generic DMA mapping functions: */
> +
> +/*
> + * Allocate what Linux calls "coherent" memory, which for us just
> + * means uncached.
> + */
> +void *dma_alloc_coherent(struct device *dev,
> + size_t size,
> + dma_addr_t *dma_handle,
> + gfp_t gfp)
> +{
> + int order;
> + struct page *pg;
> +
> + gfp |= GFP_KERNEL | __GFP_ZERO;
> +
> + order = get_order(size);
> + /* alloc on node 0 so the paddr fits in a u32 */

What does "the paddr fits in a u32" mean? If dev->coherent_dma_mask is
larger than DMA_BIT_MASK(32), can you return an address above it?


> + pg = homecache_alloc_pages_node(0, gfp, order, PAGE_HOME_UNCACHED);
> + if (pg == NULL)
> + return NULL;
> +
> + *dma_handle = page_to_pa(pg);
> + return (void *) page_address(pg);
> +}
> +EXPORT_SYMBOL(dma_alloc_coherent);
> +
> +/*
> + * Free memory that was allocated with dma_alloc_coherent.
> + */
> +void dma_free_coherent(struct device *dev, size_t size,
> + void *vaddr, dma_addr_t dma_handle)
> +{
> + homecache_free_pages((unsigned long)vaddr, get_order(size));
> +}
> +EXPORT_SYMBOL(dma_free_coherent);
> +
> +/*
> + * The map routines "map" the specified address range for DMA
> + * accesses. The memory belongs to the device after this call is
> + * issued, until it is unmapped with dma_unmap_single.
> + *
> + * We don't need to do any mapping, we just flush the address range
> + * out of the cache and return a DMA address.
> + *
> + * The unmap routines do whatever is necessary before the processor
> + * accesses the memory again, and must be called before the driver
> + * touches the memory. We can get away with a cache invalidate if we
> + * can count on nothing having been touched.
> + */
> +
> +
> +/*
> + * dma_map_single can be passed any memory address, and there appear
> + * to be no alignment constraints.
> + *
> + * There is a chance that the start of the buffer will share a cache
> + * line with some other data that has been touched in the meantime.
> + */
> +dma_addr_t dma_map_single(struct device *dev, void *ptr, size_t size,
> + enum dma_data_direction direction)


> +{
> + struct page *page;

> + dma_addr_t dma_addr;
> + int thispage;
> +
> + BUG_ON(!valid_dma_direction(direction));
> + WARN_ON(size == 0);
> +
> + dma_addr = __pa(ptr);
> +
> + /* We might have been handed a buffer that wraps a page boundary */
> + while ((int)size > 0) {
> + /* The amount to flush that's on this page */
> + thispage = PAGE_SIZE - ((unsigned long)ptr & (PAGE_SIZE - 1));
> + thispage = min((int)thispage, (int)size);
> + /* Is this valid for any page we could be handed? */
> + page = pfn_to_page(kaddr_to_pfn(ptr));
> + homecache_flush_cache(page, 0);
> + ptr += thispage;
> + size -= thispage;
> + }
> +
> + return dma_addr;
> +}
> +EXPORT_SYMBOL(dma_map_single);
> +
> +void dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
> + enum dma_data_direction direction)
> +{
> + BUG_ON(!valid_dma_direction(direction));
> +}
> +EXPORT_SYMBOL(dma_unmap_single);
> +
> +int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
> + enum dma_data_direction direction)


> +{
> + int i;
> +

> + BUG_ON(!valid_dma_direction(direction));
> +
> + WARN_ON(nents == 0 || sg[0].length == 0);
> +
> + for (i = 0; i < nents; i++) {
> + struct page *page;
> + sg[i].dma_address = sg_phys(sg + i);
> + page = pfn_to_page(sg[i].dma_address >> PAGE_SHIFT);
> + homecache_flush_cache(page, 0);
> + }

Can you use for_each_sg()?


> + return nents;
> +}
> +EXPORT_SYMBOL(dma_map_sg);
> +
> +void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
> + enum dma_data_direction direction)
> +{
> + BUG_ON(!valid_dma_direction(direction));
> +}
> +EXPORT_SYMBOL(dma_unmap_sg);
> +
> +dma_addr_t dma_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction direction)
> +{
> + BUG_ON(!valid_dma_direction(direction));
> +
> + homecache_flush_cache(page, 0);
> +
> + return page_to_pa(page) + offset;
> +}
> +EXPORT_SYMBOL(dma_map_page);
> +
> +void dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
> + enum dma_data_direction direction)
> +{
> + BUG_ON(!valid_dma_direction(direction));
> +}
> +EXPORT_SYMBOL(dma_unmap_page);
> +
> +void dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
> + size_t size, enum dma_data_direction direction)
> +{
> + BUG_ON(!valid_dma_direction(direction));
> +}
> +EXPORT_SYMBOL(dma_sync_single_for_cpu);
> +
> +void dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
> + size_t size, enum dma_data_direction direction)
> +{
> + unsigned long start = PFN_DOWN(dma_handle);
> + unsigned long end = PFN_DOWN(dma_handle + size - 1);
> + unsigned long i;
> +
> + BUG_ON(!valid_dma_direction(direction));
> + for (i = start; i <= end; ++i)
> + homecache_flush_cache(pfn_to_page(i), 0);
> +}
> +EXPORT_SYMBOL(dma_sync_single_for_device);
> +
> +void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,
> + enum dma_data_direction direction)
> +{
> + BUG_ON(!valid_dma_direction(direction));
> + WARN_ON(nelems == 0 || sg[0].length == 0);
> +}
> +EXPORT_SYMBOL(dma_sync_sg_for_cpu);
> +
> +/*
> + * Flush and invalidate cache for scatterlist.
> + */
> +void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
> + int nelems, enum dma_data_direction direction)


> +{
> + int i;
> +

> + BUG_ON(!valid_dma_direction(direction));
> + WARN_ON(nelems == 0 || sg[0].length == 0);
> +
> + for (i = 0; i < nelems; i++)
> + dma_sync_single_for_device(dev, sg[i].dma_address,
> + sg[i].dma_length, direction);

ditto.

FUJITA Tomonori (May 30, 2010, 11:00:01 PM)

On Fri, 28 May 2010 23:10:07 -0400
Chris Metcalf <cmet...@tilera.com> wrote:

> This includes the relevant Linux headers in asm/; the low-level
> "Tile architecture" headers in arch/, which are
> shared with the hypervisor, etc., and are build-system agnostic;
> and the relevant hypervisor headers in hv/.
>
> Signed-off-by: Chris Metcalf <cmet...@tilera.com>

(snip)

> +++ b/arch/tile/include/asm/dma-mapping.h
> @@ -0,0 +1,106 @@


> +/*
> + * Copyright 2010 Tilera Corporation. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation, version 2.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT. See the GNU General Public License for
> + * more details.
> + */
> +

> +#ifndef _ASM_TILE_DMA_MAPPING_H
> +#define _ASM_TILE_DMA_MAPPING_H
> +
> +/*
> + * IOMMU interface. See Documentation/PCI/PCI-DMA-mapping.txt and
> + * Documentation/DMA-API.txt for documentation.
> + */

Documentation/PCI/PCI-DMA-mapping.txt was renamed to
Documentation/DMA-API-HOWTO.txt. Anyway, I recommend removing the useless
comment.


> +#include <linux/mm.h>
> +#include <linux/scatterlist.h>
> +#include <linux/cache.h>
> +#include <linux/io.h>
> +
> +/*
> + * Note that on x86 and powerpc, there is a "struct dma_mapping_ops"
> + * that is used for all the DMA operations. For now, we don't have an
> + * equivalent on tile, because we only have a single way of doing DMA.
> + */

I think that using "struct dma_mapping_ops" is a good idea even if you
support the single dma ops. You can avoid lots of duplicated code.


(snip)

> --- /dev/null
> --- /dev/null
> +++ b/arch/tile/include/asm/scatterlist.h
> @@ -0,0 +1 @@
> +#include <asm-generic/scatterlist.h>

Did you compile this patchset with 2.6.35-rc1? I think that you need
to define ISA_DMA_THRESHOLD here.

Paul Mundt (May 31, 2010, 3:50:03 AM)

On Fri, May 28, 2010 at 11:09:12PM -0400, Chris Metcalf wrote:
> +config ZONE_DMA
> + def_bool y
> +

Do you really want ZONE_DMA? Looking through the code it seems like you
are just using this in place of ZONE_NORMAL instead of for dealing with
any specific DMA limitations.

> +config CC_OPTIMIZE_FOR_SIZE
> + def_bool y
> +

This is a duplicate of the init/Kconfig entry. If you always want this
enabled you can select it.

> +config CLOCKSOURCE_WATCHDOG
> + def_bool y
> +

Are you also sure that you want this? It doesn't seem like you have any
of the clocksource stability issues that x86 does, so it's not obvious
why you are enabling this.

> +config ARCH_DISCONTIGMEM_ENABLE
> + def_bool y
> +
> +config ARCH_DISCONTIGMEM_DEFAULT
> + def_bool y
> +

Have you considered sparsemem instead?

> +# SMP is required for Tilera Linux.
> +config SMP
> + def_bool y
> +

Forcing on SMP is fairly unusual; do you not support booting UP kernels
at all?

> +config SERIAL_CONSOLE
> + def_bool y
> +

This seems unused and looks like it was just copied over from some other
architecture?

> +config HVC_TILE
> + select HVC_DRIVER
> + def_bool y
> +
> +config TILE
> + def_bool y
> + select GENERIC_FIND_FIRST_BIT
> + select GENERIC_FIND_NEXT_BIT
> + select RESOURCES_64BIT
> + select USE_GENERIC_SMP_HELPERS
> +

RESOURCES_64BIT is more legacy stuff, you don't need this anymore by
virtue of the 64-bit phys_addr_t that you're already forcing on.

> +menu "Bus options"
> +
> +config NO_IOMEM
> + bool
> + def_bool !PCI
> +

Have you inverted the logic here? Judging from your I/O routines it's the
PIO stuff you want disabled, not MMIO. As such, it's NO_IOPORT that you
want. Some of the PCI drivers will still use inb/outb and friends for PCI
IO space so disabling it for the !PCI case is fine.

Chris Metcalf (Jun 3, 2010, 2:00:02 PM)

On 5/31/2010 3:47 AM, Paul Mundt wrote:
> On Fri, May 28, 2010 at 11:09:12PM -0400, Chris Metcalf wrote:
>
>> +config ZONE_DMA
>> + def_bool y
>> +
>>
> Do you really want ZONE_DMA? Looking through the code it seems like you
> are just using this in place of ZONE_NORMAL instead of for dealing with
> any specific DMA limitations.
>

Yes, this dates back to 2.6.18 or so, when I think you had to have it.
In any case I've switched it over to ZONE_NORMAL throughout our code
now, and it seems fine. Thanks.

>> +config CLOCKSOURCE_WATCHDOG
>> + def_bool y
>> +
>>
> Are you also sure that you want this? It doesn't seem like you have any
> of the clocksource stability issues that x86 does, so it's not obvious
> why you are enabling this.
>

Ah, good catch. Thanks; I'm not sure where this config option came
from, but it's gone now.

>> +config ARCH_DISCONTIGMEM_ENABLE
>> + def_bool y
>> +
>> +config ARCH_DISCONTIGMEM_DEFAULT
>> + def_bool y
>> +
>>
> Have you considered sparsemem instead?
>

I looked at both of them a while ago (2.6.18 or 2.6.26, not sure which),
and at the time it seemed easier to do discontig. I vaguely recall
there was some awkwardness with our architecture when I tried to figure
out the sparsemem route. I filed a tracking bug on this issue
internally so we can revisit it at some point.

>> +# SMP is required for Tilera Linux.
>> +config SMP
>> + def_bool y
>> +
>>
> Forcing on SMP is fairly unusual; do you not support booting UP kernels
> at all?
>

We've written the code to try to support UP, but the couple of times
we've tried to build with !SMP, there have been some subtle bugs.
There's no reason we'd ever sell a chip with a single cpu on it (that I
can see, anyway), so investigating failures in this mode isn't very
pressing, and UP support remains disabled.

>> +config SERIAL_CONSOLE
>> + def_bool y
>> +
>>
> This seems unused and looks like it was just copied over from some other
> architecture?
>

Thanks, good catch.

>> +menu "Bus options"
>> +
>> +config NO_IOMEM
>> + bool
>> + def_bool !PCI
>> +
>>
> Have you inverted the logic here? Judging from your I/O routines it's the
> PIO stuff you want disabled, not MMIO. As such, it's NO_IOPORT that you
> want. Some of the PCI drivers will still use inb/outb and friends for PCI
> IO space so disabling it for the !PCI case is fine.
>

If we don't have PCI, we don't have IOMEM, since our 32-bit chips don't
support any kind of direct MMIO. I would also have set NO_IOPORT
unconditionally, but it turns out some generic code (e.g. some IDE
stuff) breaks in this case. At some point I'll investigate this in more
detail, though probably only after we convert our GPIO-based ATA driver
to not use IDE at all.

Thanks for your feedback! I'll put out a [PATCH 9/8] for now to
hopefully wrap this first set of changes up, and I'm also going to get
all this stuff into a GIT repository on kernel.org now that I have an
account there.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann (Jun 3, 2010, 4:50:02 PM)

On Saturday 29 May 2010 13:29:10 Arnd Bergmann wrote:
> On Saturday 29 May 2010, Chris Metcalf wrote:
> >
> > Thanks to some much-appreciated volunteer work reviewing that initial
> > patch, I now have a revised set of patches which I would like to offer
> > for submission to the mainline.
>
> It seems that you have addressed all my review comments and all the
> other comments that I have seen in the best possible ways.
> All the controversial parts from the original code are either corrected
> or (in case of nonessential drivers) deferred to a future review.
> I did not expect this to be possible in such a short time, and it
> continues to amaze me.
>
> Consequently, I fully support this series to go into 2.6.35.
> To the entire series:
>
> Acked-by: Arnd Bergmann <ar...@arndb.de>

Hi Chris,

You evidently didn't make it into -rc1, probably because Linus considered
your submission to be too late, or possibly because some of the bigger
patches got lost in an email filter.

To go forward with your architecture, I suggest that you start adding it
to the linux-next tree. Until you have a git tree, the easiest way
to do that is to put a tarball in quilt format at a http url under
your control, and ask Stephen to include that.

Feel free to add a 'Reviewed-by: Arnd Bergmann <ar...@arndb.de>' to your
existing patches, and do your further work as patches on top of that.

Arnd

Chris Metcalf (Jun 3, 2010, 5:50:02 PM)

This change addresses DMA-related comments by FUJITA Tomonori
<fujita....@lab.ntt.co.jp> and Kconfig-related comments by Paul
Mundt <let...@linux-sh.org>.

Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---

arch/tile/Kconfig | 20 +++-------
arch/tile/include/asm/dma-mapping.h | 6 +---
arch/tile/include/asm/io.h | 65 +++++++++++++++++++++++++++++++++--
arch/tile/include/asm/scatterlist.h | 21 +++++++++++
arch/tile/kernel/pci-dma.c | 23 +++++++-----
arch/tile/kernel/setup.c | 8 ++--
arch/tile/mm/init.c | 2 +-
7 files changed, 108 insertions(+), 37 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index b311484..290ef41 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -20,15 +20,9 @@ config GENERIC_PENDING_IRQ
def_bool y


depends on GENERIC_HARDIRQS && SMP

-config ZONE_DMA
- def_bool y
-
config SEMAPHORE_SLEEPERS
def_bool y

-config CC_OPTIMIZE_FOR_SIZE
- def_bool y
-
config HAVE_ARCH_ALLOC_REMAP
def_bool y

@@ -47,9 +41,6 @@ config GENERIC_TIME
config GENERIC_CLOCKEVENTS
def_bool y

-config CLOCKSOURCE_WATCHDOG
- def_bool y
-


# FIXME: tilegx can implement a more efficent rwsem.

config RWSEM_GENERIC_SPINLOCK
def_bool y
@@ -74,6 +65,8 @@ config STACKTRACE_SUPPORT
def_bool y
select STACKTRACE

+# We use discontigmem for now; at some point we may want to switch
+# to sparsemem (Tilera bug 7996).
config ARCH_DISCONTIGMEM_ENABLE
def_bool y

@@ -97,9 +90,6 @@ config SMP
config DEBUG_COPY_FROM_USER
def_bool n

-config SERIAL_CONSOLE
- def_bool y
-
config HVC_TILE
select HVC_DRIVER
def_bool y
@@ -108,8 +98,8 @@ config TILE
def_bool y
select GENERIC_FIND_FIRST_BIT
select GENERIC_FIND_NEXT_BIT
- select RESOURCES_64BIT
select USE_GENERIC_SMP_HELPERS
+ select CC_OPTIMIZE_FOR_SIZE



# FIXME: investigate whether we need/want these options.

# select HAVE_IOREMAP_PROT
@@ -325,7 +315,9 @@ endmenu # Tilera-specific configuration
menu "Bus options"

config NO_IOMEM
- bool
+ def_bool !PCI
+
+config NO_IOPORT
def_bool !PCI

source "drivers/pci/Kconfig"
diff --git a/arch/tile/include/asm/dma-mapping.h b/arch/tile/include/asm/dma-mapping.h
index 7083e42..cf466b3 100644
--- a/arch/tile/include/asm/dma-mapping.h
+++ b/arch/tile/include/asm/dma-mapping.h
@@ -15,11 +15,6 @@
#ifndef _ASM_TILE_DMA_MAPPING_H
#define _ASM_TILE_DMA_MAPPING_H

-/*
- * IOMMU interface. See Documentation/PCI/PCI-DMA-mapping.txt and
- * Documentation/DMA-API.txt for documentation.
- */
-
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/cache.h>
@@ -29,6 +24,7 @@


* Note that on x86 and powerpc, there is a "struct dma_mapping_ops"

* that is used for all the DMA operations. For now, we don't have an

* equivalent on tile, because we only have a single way of doing DMA.

+ * (Tilera bug 7994 to use dma_mapping_ops.)
*/

#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
diff --git a/arch/tile/include/asm/io.h b/arch/tile/include/asm/io.h
index f6fcf18..8c95bef 100644
--- a/arch/tile/include/asm/io.h
+++ b/arch/tile/include/asm/io.h
@@ -75,6 +75,63 @@ extern void _tile_writew(u16 val, unsigned long addr);
extern void _tile_writel(u32 val, unsigned long addr);
extern void _tile_writeq(u64 val, unsigned long addr);

+#else
+
+/*
+ * The Tile architecture does not support IOMEM unless PCI is enabled.
+ * Unfortunately we can't yet simply not declare these methods,
+ * since some generic code that compiles into the kernel, but
+ * we never run, uses them unconditionally.
+ */
+
+static inline int iomem_panic(void)
+{
+ panic("readb/writeb and friends do not exist on tile without PCI");


+ return 0;
+}
+

+static inline u8 _tile_readb(unsigned long addr)
+{
+ return iomem_panic();
+}
+
+static inline u16 _tile_readw(unsigned long addr)
+{
+ return iomem_panic();
+}
+
+static inline u32 _tile_readl(unsigned long addr)
+{
+ return iomem_panic();
+}
+
+static inline u64 _tile_readq(unsigned long addr)
+{
+ return iomem_panic();
+}
+
+static inline void _tile_writeb(u8 val, unsigned long addr)
+{
+ iomem_panic();
+}
+
+static inline void _tile_writew(u16 val, unsigned long addr)
+{
+ iomem_panic();
+}
+
+static inline void _tile_writel(u32 val, unsigned long addr)
+{
+ iomem_panic();
+}
+
+static inline void _tile_writeq(u64 val, unsigned long addr)
+{
+ iomem_panic();
+}
+
+#endif
+
#define readb(addr) _tile_readb((unsigned long)addr)
#define readw(addr) _tile_readw((unsigned long)addr)
#define readl(addr) _tile_readl((unsigned long)addr)
@@ -125,8 +182,6 @@ static inline void *memcpy_toio(void *dst, void *src, int len)
return dst;
}

-#endif
-
/*
* The Tile architecture does not support IOPORT, even with PCI.
* Unfortunately we can't yet simply not declare these methods,
@@ -134,7 +189,11 @@ static inline void *memcpy_toio(void *dst, void *src, int len)
* we never run, uses them unconditionally.
*/

-extern int ioport_panic(void);
+static inline int ioport_panic(void)
+{
+ panic("inb/outb and friends do not exist on tile");
+ return 0;
+}

static inline u8 inb(unsigned long addr)
{
diff --git a/arch/tile/include/asm/scatterlist.h b/arch/tile/include/asm/scatterlist.h
index 35d786f..c560424 100644
--- a/arch/tile/include/asm/scatterlist.h
+++ b/arch/tile/include/asm/scatterlist.h
@@ -1 +1,22 @@


+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+

+#ifndef _ASM_TILE_SCATTERLIST_H
+#define _ASM_TILE_SCATTERLIST_H
+
+#define ISA_DMA_THRESHOLD (~0UL)
+
#include <asm-generic/scatterlist.h>
+
+#endif /* _ASM_TILE_SCATTERLIST_H */
diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c
index b1ddc80..ed52447 100644
--- a/arch/tile/kernel/pci-dma.c
+++ b/arch/tile/kernel/pci-dma.c
@@ -112,19 +112,20 @@ void dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
}
EXPORT_SYMBOL(dma_unmap_single);

-int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+int dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
enum dma_data_direction direction)
{
+ struct scatterlist *sg;
int i;

BUG_ON(!valid_dma_direction(direction));

- WARN_ON(nents == 0 || sg[0].length == 0);
+ WARN_ON(nents == 0 || sglist->length == 0);

- for (i = 0; i < nents; i++) {
+ for_each_sg(sglist, sg, nents, i) {
struct page *page;
- sg[i].dma_address = sg_phys(sg + i);
- page = pfn_to_page(sg[i].dma_address >> PAGE_SHIFT);
+ sg->dma_address = sg_phys(sg);
+ page = pfn_to_page(sg->dma_address >> PAGE_SHIFT);
homecache_flush_cache(page, 0);
}

@@ -189,17 +190,19 @@ EXPORT_SYMBOL(dma_sync_sg_for_cpu);
/*


* Flush and invalidate cache for scatterlist.

*/
-void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,


int nelems, enum dma_data_direction direction)
{

+ struct scatterlist *sg;
int i;

BUG_ON(!valid_dma_direction(direction));
- WARN_ON(nelems == 0 || sg[0].length == 0);
+ WARN_ON(nelems == 0 || sglist->length == 0);

- for (i = 0; i < nelems; i++)
- dma_sync_single_for_device(dev, sg[i].dma_address,
- sg[i].dma_length, direction);
+ for_each_sg(sglist, sg, nelems, i) {
+ dma_sync_single_for_device(dev, sg->dma_address,
+ sg_dma_len(sg), direction);
+ }
}
EXPORT_SYMBOL(dma_sync_sg_for_device);

diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 333262d..934136b 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -653,14 +653,14 @@ static void __init zone_sizes_init(void)

#ifdef CONFIG_HIGHMEM
if (start > lowmem_end) {
- zones_size[ZONE_DMA] = 0;
+ zones_size[ZONE_NORMAL] = 0;
zones_size[ZONE_HIGHMEM] = end - start;
} else {
- zones_size[ZONE_DMA] = lowmem_end - start;
+ zones_size[ZONE_NORMAL] = lowmem_end - start;
zones_size[ZONE_HIGHMEM] = end - lowmem_end;
}
#else
- zones_size[ZONE_DMA] = end - start;
+ zones_size[ZONE_NORMAL] = end - start;
#endif

/*
@@ -679,7 +679,7 @@ static void __init zone_sizes_init(void)
PFN_UP(node_percpu[i]));

/* Track the type of memory on each node */
- if (zones_size[ZONE_DMA])
+ if (zones_size[ZONE_NORMAL])
node_set_state(i, N_NORMAL_MEMORY);
#ifdef CONFIG_HIGHMEM
if (end != start)
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 31b5c09..125ac53 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -742,7 +742,7 @@ static void __init set_non_bootmem_pages_init(void)
if (start == 0)
continue; /* bootmem */


end = start + z->spanned_pages;

- if (zone_idx(z) == ZONE_DMA) {
+ if (zone_idx(z) == ZONE_NORMAL) {


BUG_ON(start != node_start_pfn[nid]);

start = node_free_pfn[nid];
}
--
1.6.5.2

Chris Metcalf (Jun 3, 2010, 5:50:02 PM)

On 6/3/2010 4:40 PM, Arnd Bergmann wrote:
> You evidently didn't make it into -rc1, probably because Linus considered
> your submission to be too late, or possibly because some of the bigger
> patches got lost in an email filter.
>
> To go forward with your architecture, I suggest that you start adding it
> to the linux-next tree. Until you have a git tree, the easiest way
> to do that is to put a tarball in quilt format at a http url under
> your control, and ask Stephen to include that.
>
> Feel free to add a 'Reviewed-by: Arnd Bergmann <ar...@arndb.de>' to your
> existing patches, and do your further work as patches on top of that.
>

I will plan to push the commits I have mailed to LKML up to a tree on
kernel.org, since I now have an account there, probably tomorrow. I'll
send an email to Stephen with a pointer and see where it goes from there.

And Arnd, many thanks -- it's confusing to navigate the jungle of how
code actually makes it into the kernel, and a little help goes a long way!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Paul Mundt (Jun 3, 2010, 9:00:03 PM)

On Thu, Jun 03, 2010 at 05:32:17PM -0400, Chris Metcalf wrote:
> This change addresses DMA-related comments by FUJITA Tomonori
> <fujita....@lab.ntt.co.jp> and Kconfig-related comments by Paul
> Mundt <let...@linux-sh.org>.
>
> Signed-off-by: Chris Metcalf <cmet...@tilera.com>

Looks good to me. Feel free to add my reviewed-by to the series.

Reviewed-by: Paul Mundt <let...@linux-sh.org>

FUJITA Tomonori (Jun 3, 2010, 9:40:02 PM)

On Thu, 3 Jun 2010 17:32:17 -0400
Chris Metcalf <cmet...@tilera.com> wrote:

> This change addresses DMA-related comments by FUJITA Tomonori

What about the comment on dma_alloc_coherent()?

The rest changes look fine.

Chris Metcalf (Jun 4, 2010, 5:40:01 PM)

On 6/3/2010 4:40 PM, Arnd Bergmann wrote:
> You evidently didn't make it into -rc1, probably because Linus considered
> your submission to be too late, or possibly because some of the bigger
> patches got lost in an email filter.
>
> To go forward with your architecture, I suggest that you start adding it
> to the linux-next tree. Until you have a git tree, the easiest way
> to do that is to put a tarball in quilt format at a http url under
> your control, and ask Stephen to include that.
>

I've set up a GIT tree for the Tilera architecture support here:

git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git

I kept the commit to fix up the "generic" unistd.h ABI and added Arnd's
Acked-by to the commit message, and I combined the v2 Tilera-specific
series of patches along with responses to the v2 patches into a single
commit, with appropriate Acked-by and Reviewed-by based on what folks
had said.

Stephen, in an ideal world you could add this repository to your set of
things you pull from into linux-next, and going forward I would then be
the gatekeeper for "arch/tile/" changes. I'll plan to continue mailing
the diffs to LKML for public review, and push them up to git.kernel.org
after feedback has died down and they are ready to go to linux-next.
Does that sound good to you?

Many thanks!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Stephen Rothwell (Jun 5, 2010, 9:00:02 AM)

Hi Chris,

On Fri, 04 Jun 2010 17:32:52 -0400 Chris Metcalf <cmet...@tilera.com> wrote:
>
> On 6/3/2010 4:40 PM, Arnd Bergmann wrote:
> > You evidently didn't make it into -rc1, probably because Linus considered
> > your submission to be too late, or possibly because some of the bigger
> > patches got lost in an email filter.
> >
> > To go forward with your architecture, I suggest that you start adding it
> > to the linux-next tree. Until you have a git tree, the easiest way
> > to do that is to put a tarball in quilt format at a http url under
> > your control, and ask Stephen to include that.
>
> I've set up a GIT tree for the Tilera architecture support here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git
>
> I kept the commit to fix up the "generic" unistd.h ABI and added Arnd's
> Acked-by to the commit message, and I combined the v2 Tilera-specific
> series of patches along with responses to the v2 patches into a single
> commit, with appropriate Acked-by and Reviewed-by based on what folks
> had said.
>
> Stephen, in an ideal world you could add this repository to your set of
> things you pull from into linux-next, and going forward I would then be
> the gatekeeper for "arch/tile/" changes. I'll plan to continue mailing
> the diffs to LKML for public review, and push them up to git.kernel.org
> after feedback has died down and they are ready to go to linux-next.
> Does that sound good to you?

That sounds pretty good. I have added the master branch from that tree
and it will appear in linux-next from Monday.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgment of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
s...@canb.auug.org.au

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees. You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next. These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc. The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc. If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.

Chris Metcalf
Jun 5, 2010, 9:40:02 AM
On 6/5/2010 8:56 AM, Stephen Rothwell wrote:
> On Fri, 04 Jun 2010 17:32:52 -0400 Chris Metcalf <cmet...@tilera.com> wrote:
>
>> I've set up a GIT tree for the Tilera architecture support here:
>> git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git
>>
>> [...]

>>
> That sounds pretty good. I have added the master branch from that tree
> and it will appear in linux-next from Monday.
>

Thanks, Stephen. One question: has linux-next reached the point where
Linus pulls from it automatically during each merge window, or should I
still ask Linus explicitly to pull for the 2.6.36 merge window?

Stephen Rothwell
Jun 5, 2010, 10:20:02 AM
Hi Chris,

On Sat, 05 Jun 2010 09:30:40 -0400 Chris Metcalf <cmet...@tilera.com> wrote:
>
> Thanks, Stephen. One question: has linux-next reached the point where
> Linus pulls from it automatically during each merge window, or should I
> still ask Linus explicitly to pull for the 2.6.36 merge window?

You need to ask Linus to pull your tree when you are ready.

P.S. I noticed that you just added a commit with no Signed-off-by ...


--
Cheers,
Stephen Rothwell s...@canb.auug.org.au

http://www.canb.auug.org.au/~sfr/

FUJITA Tomonori
Jun 7, 2010, 1:30:01 AM
On Fri, 4 Jun 2010 10:31:17 +0900
FUJITA Tomonori <fujita....@lab.ntt.co.jp> wrote:

> On Thu, 3 Jun 2010 17:32:17 -0400
> Chris Metcalf <cmet...@tilera.com> wrote:
>
> > This change addresses DMA-related comments by FUJITA Tomonori
>
> What about the comment on dma_alloc_coherent()?

I saw that you addressed two issues (coherent_mask and GFP_KERNEL
usage) in linux-next. Both changes look fine to me.

Chris Metcalf
May 18, 2011, 2:10:02 PM
This change introduces a few of the less controversial /proc and
/proc/sys interfaces for tile, along with a sysfs attribute for
something that was originally proposed as a /proc/tile file.

Arnd Bergmann reviewed the initial arch/tile submission, which
included a complete set of all the /proc/tile and /proc/sys/tile
knobs that we had added in a somewhat ad hoc way during initial
development, and provided feedback on where most of them should go.

One knob turned out to be similar enough to the existing
/proc/sys/debug/exception-trace that it was re-implemented to use
that model instead (in a separate commit).

Another knob was /proc/tile/grid, which reported the "grid" dimensions
of a tile chip (e.g. 8x8 processors = 64-core chip). He suggested
looking at sysfs for that, so this change moves that information
to a pair of sysfs attributes (chip_width and chip_height) in the
/sys/devices/system/cpu directory.

The entries that don't seem to have an obvious place in /sys
or elsewhere, and that are added with this patch, are:

/proc/tile/hv
Version information about the running Tilera hypervisor

/proc/tile/hvconfig
Detailed configuration description of the hypervisor config

/proc/tile/board
Information on part numbers, serial numbers, etc., of the
hardware that the kernel is executing on

/proc/tile/switch
The type of control path for the onboard network switch, if any.

/proc/tile/hardwall
Information on the set of currently active hardwalls (note that
the implementation is already present in arch/tile/kernel/hardwall.c;
this change just enables it)

/proc/sys/tile/unaligned_fixup/
Knobs controlling the kernel code to fix up unaligned exceptions

Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---

arch/tile/kernel/Makefile | 2 +-
arch/tile/kernel/proc.c | 178 +++++++++++++++++++++++++++++++++++++++++++++
arch/tile/kernel/sysfs.c | 52 +++++++++++++
3 files changed, 231 insertions(+), 1 deletions(-)
create mode 100644 arch/tile/kernel/sysfs.c

diff --git a/arch/tile/kernel/Makefile b/arch/tile/kernel/Makefile
index b4c8e8e..b4dbc05 100644
--- a/arch/tile/kernel/Makefile
+++ b/arch/tile/kernel/Makefile
@@ -5,7 +5,7 @@
extra-y := vmlinux.lds head_$(BITS).o
obj-y := backtrace.o entry.o init_task.o irq.o messaging.o \
pci-dma.o proc.o process.o ptrace.o reboot.o \
- setup.o signal.o single_step.o stack.o sys.o time.o traps.o \
+ setup.o signal.o single_step.o stack.o sys.o sysfs.o time.o traps.o \
intvec_$(BITS).o regs_$(BITS).o tile-desc_$(BITS).o

obj-$(CONFIG_HARDWALL) += hardwall.o
diff --git a/arch/tile/kernel/proc.c b/arch/tile/kernel/proc.c
index 2e02c41..c871674 100644
--- a/arch/tile/kernel/proc.c
+++ b/arch/tile/kernel/proc.c
@@ -27,6 +27,7 @@
#include <asm/processor.h>
#include <asm/sections.h>
#include <asm/homecache.h>
+#include <asm/hardwall.h>
#include <arch/chip.h>


@@ -88,3 +89,180 @@ const struct seq_operations cpuinfo_op = {
.stop = c_stop,
.show = show_cpuinfo,
};
+
+/*
+ * Support /proc/tile directory
+ */
+
+static struct proc_dir_entry *proc_tile_root;
+
+/*
+ * Define a /proc/tile file which uses a seq_file to provide a more
+ * complex set of data.
+ */
+#define SEQ_PROC_ENTRY(name) \
+ static int proc_tile_##name##_open(struct inode *inode, \
+ struct file *file) \
+ { \
+ return single_open(file, proc_tile_##name##_show, NULL); \
+ } \
+ static const struct file_operations proc_tile_##name##_fops = { \
+ .open = proc_tile_##name##_open, \
+ .read = seq_read, \
+ .llseek = seq_lseek, \
+ .release = single_release, \
+ }; \
+ static void proc_tile_##name##_init(void) \
+ { \
+ struct proc_dir_entry *entry = \
+ create_proc_entry(#name, 0444, proc_tile_root); \
+ if (entry) \
+ entry->proc_fops = &proc_tile_##name##_fops; \
+ }
+
+/* Print to a seq_file the result of hv_confstr(query). */
+static void proc_tile_seq_strconf(struct seq_file *sf, char* what, int query)
+{
+ char tmpbuf[256];
+ char *bufptr = tmpbuf;
+ int buflen = sizeof(tmpbuf);
+ int len = hv_confstr(query, (HV_VirtAddr) bufptr, buflen);
+
+ if (len > buflen) {
+ bufptr = kmalloc(len, GFP_KERNEL);
+ if (!bufptr)
+ return;
+ buflen = len;
+ len = hv_confstr(query, (HV_VirtAddr) bufptr, buflen);
+ }
+
+ bufptr[buflen - 1] = 0;
+ /* Length includes the trailing null, so if it's 1, it's empty. */
+ if (len > 1) {
+ if (what)
+ seq_printf(sf, "%s: %s\n", what, bufptr);
+ else
+ seq_printf(sf, "%s", bufptr);
+ }
+
+ if (bufptr != tmpbuf)
+ kfree(bufptr);
+}
+
+static int proc_tile_hv_show(struct seq_file *sf, void *v)
+{
+ proc_tile_seq_strconf(sf, "version", HV_CONFSTR_HV_SW_VER);
+ proc_tile_seq_strconf(sf, "config_version", HV_CONFSTR_HV_CONFIG_VER);
+ return 0;
+}
+SEQ_PROC_ENTRY(hv)
+
+static int proc_tile_hvconfig_show(struct seq_file *sf, void *v)
+{
+ proc_tile_seq_strconf(sf, NULL, HV_CONFSTR_HV_CONFIG);
+ return 0;
+}
+SEQ_PROC_ENTRY(hvconfig)
+
+static int proc_tile_board_show(struct seq_file *sf, void *v)
+{
+ proc_tile_seq_strconf(sf, "board_part", HV_CONFSTR_BOARD_PART_NUM);
+ proc_tile_seq_strconf(sf, "board_serial", HV_CONFSTR_BOARD_SERIAL_NUM);
+ proc_tile_seq_strconf(sf, "chip_serial", HV_CONFSTR_CHIP_SERIAL_NUM);
+ proc_tile_seq_strconf(sf, "chip_revision", HV_CONFSTR_CHIP_REV);
+ proc_tile_seq_strconf(sf, "board_revision", HV_CONFSTR_BOARD_REV);
+ proc_tile_seq_strconf(sf, "board_description", HV_CONFSTR_BOARD_DESC);
+ proc_tile_seq_strconf(sf, "mezz_part", HV_CONFSTR_MEZZ_PART_NUM);
+ proc_tile_seq_strconf(sf, "mezz_serial", HV_CONFSTR_MEZZ_SERIAL_NUM);
+ proc_tile_seq_strconf(sf, "mezz_revision", HV_CONFSTR_MEZZ_REV);
+ proc_tile_seq_strconf(sf, "mezz_description", HV_CONFSTR_MEZZ_DESC);
+ return 0;
+}
+SEQ_PROC_ENTRY(board)
+
+static int proc_tile_switch_show(struct seq_file *sf, void *v)
+{
+ proc_tile_seq_strconf(sf, "control", HV_CONFSTR_SWITCH_CONTROL);
+ return 0;
+}
+SEQ_PROC_ENTRY(switch)
+
+
+#ifdef CONFIG_HARDWALL
+/* See arch/tile/kernel/hardwall.c for the implementation. */
+SEQ_PROC_ENTRY(hardwall)
+#endif
+
+static int __init proc_tile_init(void)
+{
+ proc_tile_root = proc_mkdir("tile", NULL);
+ if (!proc_tile_root)
+ return 0;
+
+ proc_tile_board_init();
+ proc_tile_switch_init();
+ proc_tile_hv_init();
+ proc_tile_hvconfig_init();
+#ifdef CONFIG_HARDWALL
+ proc_tile_hardwall_init();
+#endif
+
+	return 0;
+}
+arch_initcall(proc_tile_init);
+
+/*
+ * Support /proc/sys/tile directory
+ */
+
+#ifndef __tilegx__ /* FIXME: GX: no support for unaligned access yet */
+static ctl_table unaligned_subtable[] = {
+ {
+ .procname = "enabled",
+ .data = &unaligned_fixup,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec
+ },
+ {
+ .procname = "printk",
+ .data = &unaligned_printk,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec
+ },
+ {
+ .procname = "count",
+ .data = &unaligned_fixup_count,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec
+ },
+ {}
+};
+
+static ctl_table unaligned_table[] = {
+ {
+ .procname = "unaligned_fixup",
+ .mode = 0555,
+ .child = unaligned_subtable
+ },
+ {}
+};
+#endif
+
+static struct ctl_path tile_path[] = {
+ { .procname = "tile" },
+ { }
+};
+
+static int __init proc_sys_tile_init(void)
+{
+#ifndef __tilegx__ /* FIXME: GX: no support for unaligned access yet */
+ register_sysctl_paths(tile_path, unaligned_table);
+#endif
+	return 0;
+}
+arch_initcall(proc_sys_tile_init);
diff --git a/arch/tile/kernel/sysfs.c b/arch/tile/kernel/sysfs.c
new file mode 100644
index 0000000..151deeb
--- /dev/null
+++ b/arch/tile/kernel/sysfs.c
@@ -0,0 +1,52 @@
+/*
+ * Copyright 2011 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * /sys entry support.
+ */
+
+#include <linux/sysdev.h>
+#include <linux/cpu.h>
+#include <linux/smp.h>
+
+static ssize_t chip_width_show(struct sysdev_class *dev,
+ struct sysdev_class_attribute *attr,
+ char *page)
+{
+ return sprintf(page, "%u\n", smp_width);
+}
+static SYSDEV_CLASS_ATTR(chip_width, 0444, chip_width_show, NULL);
+
+static ssize_t chip_height_show(struct sysdev_class *dev,
+ struct sysdev_class_attribute *attr,
+ char *page)
+{
+ return sprintf(page, "%u\n", smp_height);
+}
+static SYSDEV_CLASS_ATTR(chip_height, 0444, chip_height_show, NULL);
+
+
+static int __init create_cpu_entries(void)
+{
+ struct sysdev_class *cls = &cpu_sysdev_class;
+ int err = 0;
+
+ if (!err)
+ err = sysfs_create_file(&cls->kset.kobj,
+ &attr_chip_width.attr);
+ if (!err)
+ err = sysfs_create_file(&cls->kset.kobj,
+ &attr_chip_height.attr);
+
+ return err;
+}
+subsys_initcall(create_cpu_entries);
--
1.6.5.2

Chris Metcalf
May 18, 2011, 2:10:01 PM
This change adds support for /proc/sys/debug/exception-trace to tile.
Like x86 and sparc, by default it is set to "1", generating a one-line
printk whenever a user process crashes. By setting it to "2", we get
a much more complete userspace diagnostic at crash time, including
a user-space backtrace, register dump, and memory dump around the
address of the crash.

Some vestiges of the Tilera-internal version of this support are
removed with this patch (the show_crashinfo variable and the
arch_coredump_signal function). We retain a "crashinfo" boot parameter
which allows you to set the boot-time value of exception-trace.
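As a usage sketch (the sysctl path follows the existing x86/sparc convention, and the boot argument is the one described above; both require a kernel with this patch applied):

```sh
# runtime: switch to the most detailed crash reporting
echo 2 > /proc/sys/debug/exception-trace

# boot time: the equivalent setting on the kernel command line
#   crashinfo=2
```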

Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---

Arnd Bergmann originally requested this (see parent email in thread)
in code review of an early batch of arch/tile code.

arch/tile/include/asm/processor.h | 7 --
arch/tile/include/asm/signal.h | 4 +
arch/tile/kernel/compat_signal.c | 4 +-
arch/tile/kernel/signal.c | 128 +++++++++++++++++++++++++++++++++++-
arch/tile/kernel/single_step.c | 4 +
arch/tile/kernel/traps.c | 1 +
arch/tile/mm/fault.c | 24 ++++---
kernel/sysctl.c | 2 +-
8 files changed, 151 insertions(+), 23 deletions(-)

diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index d6b43dd..34c1e01 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -257,10 +257,6 @@ static inline void cpu_relax(void)
barrier();
}

-struct siginfo;
-extern void arch_coredump_signal(struct siginfo *, struct pt_regs *);
-#define arch_coredump_signal arch_coredump_signal
-
/* Info on this processor (see fs/proc/cpuinfo.c) */
struct seq_operations;
extern const struct seq_operations cpuinfo_op;
@@ -271,9 +267,6 @@ extern char chip_model[64];
/* Data on which physical memory controller corresponds to which NUMA node. */
extern int node_controller[];

-/* Do we dump information to the console when a user application crashes? */
-extern int show_crashinfo;
-
#if CHIP_HAS_CBOX_HOME_MAP()
/* Does the heap allocator return hash-for-home pages by default? */
extern int hash_default;
diff --git a/arch/tile/include/asm/signal.h b/arch/tile/include/asm/signal.h
index 81d92a4..1e1e616 100644
--- a/arch/tile/include/asm/signal.h
+++ b/arch/tile/include/asm/signal.h
@@ -28,6 +28,10 @@ struct pt_regs;
int restore_sigcontext(struct pt_regs *, struct sigcontext __user *);
int setup_sigcontext(struct sigcontext __user *, struct pt_regs *);
void do_signal(struct pt_regs *regs);
+void signal_fault(const char *type, struct pt_regs *,
+ void __user *frame, int sig);
+void trace_unhandled_signal(const char *type, struct pt_regs *regs,
+ unsigned long address, int signo);
#endif

#endif /* _ASM_TILE_SIGNAL_H */
diff --git a/arch/tile/kernel/compat_signal.c b/arch/tile/kernel/compat_signal.c
index dbb0dfc..a7869ad 100644
--- a/arch/tile/kernel/compat_signal.c
+++ b/arch/tile/kernel/compat_signal.c
@@ -317,7 +317,7 @@ long compat_sys_rt_sigreturn(struct pt_regs *regs)
return 0;

badframe:
- force_sig(SIGSEGV, current);
+ signal_fault("bad sigreturn frame", regs, frame, 0);
return 0;
}

@@ -431,6 +431,6 @@ int compat_setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
return 0;

give_sigsegv:
- force_sigsegv(sig, current);
+ signal_fault("bad setup frame", regs, frame, sig);
return -EFAULT;
}
diff --git a/arch/tile/kernel/signal.c b/arch/tile/kernel/signal.c
index 1260321..bedaf4e 100644
--- a/arch/tile/kernel/signal.c
+++ b/arch/tile/kernel/signal.c
@@ -39,7 +39,6 @@

#define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP)))

-
SYSCALL_DEFINE3(sigaltstack, const stack_t __user *, uss,
stack_t __user *, uoss, struct pt_regs *, regs)
{
@@ -78,6 +77,13 @@ int restore_sigcontext(struct pt_regs *regs,
return err;
}

+void signal_fault(const char *type, struct pt_regs *regs,
+ void __user *frame, int sig)
+{
+ trace_unhandled_signal(type, regs, (unsigned long)frame, SIGSEGV);
+ force_sigsegv(sig, current);
+}
+
/* The assembly shim for this function arranges to ignore the return value. */
SYSCALL_DEFINE1(rt_sigreturn, struct pt_regs *, regs)
{
@@ -105,7 +111,7 @@ SYSCALL_DEFINE1(rt_sigreturn, struct pt_regs *, regs)
return 0;

badframe:
- force_sig(SIGSEGV, current);
+ signal_fault("bad sigreturn frame", regs, frame, 0);
return 0;
}

@@ -231,7 +237,7 @@ static int setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
return 0;

give_sigsegv:
- force_sigsegv(sig, current);
+ signal_fault("bad setup frame", regs, frame, sig);
return -EFAULT;
}

@@ -245,7 +251,6 @@ static int handle_signal(unsigned long sig, siginfo_t *info,
{
int ret;

-
/* Are we from a system call? */
if (regs->faultnum == INT_SWINT_1) {
/* If so, check system call restarting.. */
@@ -363,3 +368,118 @@ done:
/* Avoid double syscall restart if there are nested signals. */
regs->faultnum = INT_SWINT_1_SIGRETURN;
}
+
+int show_unhandled_signals = 1;
+
+static int __init crashinfo(char *str)
+{
+ unsigned long val;
+ const char *word;
+
+	if (*str == '\0')
+		val = 2;
+ else if (*str != '=' || strict_strtoul(++str, 0, &val) != 0)
+ return 0;
+ show_unhandled_signals = val;
+ switch (show_unhandled_signals) {
+ case 0:
+ word = "No";
+ break;
+ case 1:
+ word = "One-line";
+ break;
+ default:
+ word = "Detailed";
+ break;
+ }
+ pr_info("%s crash reports will be generated on the console\n", word);
+ return 1;
+}
+__setup("crashinfo", crashinfo);
+
+static void dump_mem(void __user *address)
+{
+ void __user *addr;
+ enum { region_size = 256, bytes_per_line = 16 };
+ int i, j, k;
+ int found_readable_mem = 0;
+
+ pr_err("\n");
+ if (!access_ok(VERIFY_READ, address, 1)) {
+ pr_err("Not dumping at address 0x%lx (kernel address)\n",
+ (unsigned long)address);
+ return;
+ }
+
+ addr = (void __user *)
+ (((unsigned long)address & -bytes_per_line) - region_size/2);
+ if (addr > address)
+ addr = NULL;
+ for (i = 0; i < region_size;
+ addr += bytes_per_line, i += bytes_per_line) {
+ unsigned char buf[bytes_per_line];
+ char line[100];
+ if (copy_from_user(buf, addr, bytes_per_line))
+ continue;
+ if (!found_readable_mem) {
+ pr_err("Dumping memory around address 0x%lx:\n",
+ (unsigned long)address);
+ found_readable_mem = 1;
+ }
+ j = sprintf(line, REGFMT":", (unsigned long)addr);
+ for (k = 0; k < bytes_per_line; ++k)
+ j += sprintf(&line[j], " %02x", buf[k]);
+ pr_err("%s\n", line);
+ }
+ if (!found_readable_mem)
+ pr_err("No readable memory around address 0x%lx\n",
+ (unsigned long)address);
+}
+
+void trace_unhandled_signal(const char *type, struct pt_regs *regs,
+ unsigned long address, int sig)
+{
+ struct task_struct *tsk = current;
+
+ if (show_unhandled_signals == 0)
+ return;
+
+ /* If the signal is handled, don't show it here. */
+ if (!is_global_init(tsk)) {
+ void __user *handler =
+ tsk->sighand->action[sig-1].sa.sa_handler;
+ if (handler != SIG_IGN && handler != SIG_DFL)
+ return;
+ }
+
+ /* Rate-limit the one-line output, not the detailed output. */
+ if (show_unhandled_signals <= 1 && !printk_ratelimit())
+ return;
+
+ printk("%s%s[%d]: %s at %lx pc "REGFMT" signal %d",
+ task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
+ tsk->comm, task_pid_nr(tsk), type, address, regs->pc, sig);
+
+ print_vma_addr(KERN_CONT " in ", regs->pc);
+
+ printk(KERN_CONT "\n");
+
+ if (show_unhandled_signals > 1) {
+ switch (sig) {
+ case SIGILL:
+ case SIGFPE:
+ case SIGSEGV:
+ case SIGBUS:
+ pr_err("User crash: signal %d,"
+ " trap %ld, address 0x%lx\n",
+ sig, regs->faultnum, address);
+ show_regs(regs);
+ dump_mem((void __user *)address);
+ break;
+ default:
+ pr_err("User crash: signal %d, trap %ld\n",
+				sig, regs->faultnum);
+			break;
+ }
+ }
+}

diff --git a/arch/tile/kernel/single_step.c b/arch/tile/kernel/single_step.c
index 86df5a2..4032ca8 100644
--- a/arch/tile/kernel/single_step.c
+++ b/arch/tile/kernel/single_step.c
@@ -186,6 +186,8 @@ static tile_bundle_bits rewrite_load_store_unaligned(
.si_code = SEGV_MAPERR,
.si_addr = addr
};
+ trace_unhandled_signal("segfault", regs,
+ (unsigned long)addr, SIGSEGV);
force_sig_info(info.si_signo, &info, current);
return (tile_bundle_bits) 0;
}
@@ -196,6 +198,8 @@ static tile_bundle_bits rewrite_load_store_unaligned(
.si_code = BUS_ADRALN,
.si_addr = addr
};
+ trace_unhandled_signal("unaligned trap", regs,
+ (unsigned long)addr, SIGBUS);
force_sig_info(info.si_signo, &info, current);
return (tile_bundle_bits) 0;
}
diff --git a/arch/tile/kernel/traps.c b/arch/tile/kernel/traps.c
index 5474fc2..f9803df 100644
--- a/arch/tile/kernel/traps.c
+++ b/arch/tile/kernel/traps.c
@@ -308,6 +308,7 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,
 	info.si_addr = (void __user *)address;
 	if (signo == SIGILL)
info.si_trapno = fault_num;
+ trace_unhandled_signal("trap", regs, address, signo);
force_sig_info(signo, &info, current);
}

diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index 24ca54a..25b7b90 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -43,8 +43,11 @@

#include <arch/interrupts.h>

-static noinline void force_sig_info_fault(int si_signo, int si_code,
- unsigned long address, int fault_num, struct task_struct *tsk)
+static noinline void force_sig_info_fault(const char *type, int si_signo,
+ int si_code, unsigned long address,
+ int fault_num,
+ struct task_struct *tsk,
+ struct pt_regs *regs)
{
siginfo_t info;

@@ -59,6 +62,7 @@ static noinline void force_sig_info_fault(int si_signo, int si_code,
 	info.si_code = si_code;
 	info.si_addr = (void __user *)address;
 	info.si_trapno = fault_num;
+ trace_unhandled_signal(type, regs, address, si_signo);
force_sig_info(si_signo, &info, tsk);
}

@@ -71,11 +75,12 @@ SYSCALL_DEFINE2(cmpxchg_badaddr, unsigned long, address,
struct pt_regs *, regs)
{
if (address >= PAGE_OFFSET)
- force_sig_info_fault(SIGSEGV, SEGV_MAPERR, address,
- INT_DTLB_MISS, current);
+ force_sig_info_fault("atomic segfault", SIGSEGV, SEGV_MAPERR,
+ address, INT_DTLB_MISS, current, regs);
else
- force_sig_info_fault(SIGBUS, BUS_ADRALN, address,
- INT_UNALIGN_DATA, current);
+ force_sig_info_fault("atomic alignment fault", SIGBUS,
+ BUS_ADRALN, address,
+ INT_UNALIGN_DATA, current, regs);

 	/*
 	 * Adjust pc to point at the actual instruction, which is unusual
@@ -471,8 +476,8 @@ bad_area_nosemaphore:
*/
local_irq_enable();

- force_sig_info_fault(SIGSEGV, si_code, address,
- fault_num, tsk);
+ force_sig_info_fault("segfault", SIGSEGV, si_code, address,
+ fault_num, tsk, regs);
return 0;
}

@@ -547,7 +552,8 @@ do_sigbus:
if (is_kernel_mode)
goto no_context;

- force_sig_info_fault(SIGBUS, BUS_ADRERR, address, fault_num, tsk);
+ force_sig_info_fault("bus error", SIGBUS, BUS_ADRERR, address,
+ fault_num, tsk, regs);
return 0;
}

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..aaec934 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1496,7 +1496,7 @@ static struct ctl_table fs_table[] = {

static struct ctl_table debug_table[] = {
#if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_SPARC) || \
- defined(CONFIG_S390)
+ defined(CONFIG_S390) || defined(CONFIG_TILE)
{
.procname = "exception-trace",
.data = &show_unhandled_signals,

Chris Metcalf
May 18, 2011, 2:20:02 PM
Resending with Andi Kleen's current email address (a...@suse.de was in the
git log for the x86 version of show-unhandled-signals).

--

Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann
May 19, 2011, 9:50:01 AM
On Tuesday 17 May 2011, Chris Metcalf wrote:

> /proc/tile/hv
> Version information about the running Tilera hypervisor
>
> /proc/tile/hvconfig
> Detailed configuration description of the hypervisor config
>
> /proc/tile/board
> Information on part numbers, serial numbers, etc., of the
> hardware that the kernel is executing on
>
> /proc/tile/switch
> The type of control path for the onboard network switch, if any.
>
> /proc/tile/hardwall
> Information on the set of currently active hardwalls (note that
> the implementation is already present in arch/tile/kernel/hardwall.c;
> this change just enables it)

These all look like ideal candidates for sysfs attributes under
/sys/hypervisor, doing them one value per file, instead of grouping
them into multiple entries per file.

You can also turn each of these files into one directory under
/sys/hypervisor, with one or more files under it.

This should use sysdev_create_file instead of open-coding it.

Arnd

Chris Metcalf
May 19, 2011, 11:20:02 AM
On 5/19/2011 9:41 AM, Arnd Bergmann wrote:
> These all [below] look like ideal candidates for sysfs attributes under

> /sys/hypervisor, doing them one value per file, instead of grouping
> them into multiple entries per file.
>
> You can also turn each of these files into one directory under
> /sys/hypervisor, with one or more files under it.
>
> On Tuesday 17 May 2011, Chris Metcalf wrote
>> /proc/tile/hv
>> Version information about the running Tilera hypervisor

Yes, for "hv" this does make sense; I've coded it up. I had to add a
"select SYS_HYPERVISOR" for "config TILE" since otherwise tile doesn't
normally get a /sys/hypervisor directory. The upshot is
/sys/hypervisor/version and /sys/hypervisor/config_version files. The
"config_version" can be long (typically in the hundreds of characters) but
should rarely get up to the page size, and it's probably OK to just
truncate it in that case. It looks like Xen also tries to do things in
this directory, but we don't currently support Xen (we're working on KVM
instead) so I won't worry about it.

>> /proc/tile/hvconfig
>> Detailed configuration description of the hypervisor config

I'm concerned about moving this one out of /proc, since it's just (copious)
free text. An "hvconfig" (hypervisor config) file describes hypervisor
driver "dedicated tiles" that run things like network packet or PCIe
ingress/egress processing, etc. In addition it lists hypervisor driver
options, boot flags for the kernel, etc, all kinds of things -- and you
can't really guarantee that it will fit on a 4KB page, though in practice
it usually does. The hypervisor reads this file from the boot stream when
it boots, and then makes it available to Linux not for Linux's use, or even
for programmatic userspace use, but just for end users to be able to review
and verify that the configuration they think they booted is really what
they got, for customer remote debugging, etc. The "remote debugging"
aspect makes truncation to page size a particularly worrisome idea.

>> /proc/tile/board
>> Information on part numbers, serial numbers, etc., of the
>> hardware that the kernel is executing on
>>
>> /proc/tile/switch
>> The type of control path for the onboard network switch, if any.

These two report information about the hardware, not the hypervisor. For
example:

# cat /proc/tile/board
board_part: 402-00002-05
board_serial: NBS-5002-00012
chip_serial: P62338.01.110
chip_revision: A0
board_revision: 2.2
board_description: Tilera TILExpressPro-64, TILEPro64 processor (866 MHz-capable), 1 10GbE, 6 1GbE
# cat /proc/tile/switch
control: mdio gbe/0

The chip_serial and chip_revision can certainly hang off
/sys/devices/system/cpu along with chip_height and chip_width (I've made
this change now) but I don't know where the remaining "board" level
description could go. Note that (as you can see in the source) certain
boards will also include four lines of output with the "mezzanine board"
part number, serial number, revision, and description; this particular
example doesn't have a mezzanine board. The "switch" info is broken out
into a separate file just to make it easier to script some /etc/rc code
that launches a configurator for the Marvell switch on some of our boards,
but is conceptually part of the board info.

>> /proc/tile/hardwall
>> Information on the set of currently active hardwalls (note that
>> the implementation is already present in arch/tile/kernel/hardwall.c;
>> this change just enables it)

This one is not a hypervisor-related file. It just lists information about
the set of Linux hardwalls currently active. Again, it's not primarily
intended for programmatic use, but as a diagnostic tool.

>> diff --git a/arch/tile/kernel/sysfs.c b/arch/tile/kernel/sysfs.c
>> new file mode 100644
>> index 0000000..151deeb
>> --- /dev/null
>> +++ b/arch/tile/kernel/sysfs.c

>> [...]


>> +static int __init create_cpu_entries(void)
>> +{
>> + struct sysdev_class *cls = &cpu_sysdev_class;
>> + int err = 0;
>> +
>> + if (!err)
>> + err = sysfs_create_file(&cls->kset.kobj,
>> + &attr_chip_width.attr);
>> + if (!err)
>> + err = sysfs_create_file(&cls->kset.kobj,
>> + &attr_chip_height.attr);
>> +
>> + return err;
>> +}
> This should use sysdev_create_file instead of open-coding it.

My impression was that I had to associate my new attributes to the
sysdev_class corresponding to "/sys/devices/system/cpu/", since I'm
registering these as top-level items in the cpu directory, e.g.
/sys/devices/system/cpu/chip_width; they are not properties of individual
cpus. It doesn't appear that there is a sys_device corresponding to where
I want to register them.

As always, thanks, Arnd!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann
May 19, 2011, 11:30:01 AM
(adding virtualization mailing list)

On Thursday 19 May 2011, Chris Metcalf wrote:
> On 5/19/2011 9:41 AM, Arnd Bergmann wrote:
> >> /proc/tile/hvconfig
> >> Detailed configuration description of the hypervisor config
>
> I'm concerned about moving this one out of /proc, since it's just (copious)
> free text. An "hvconfig" (hypervisor config) file describes hypervisor
> driver "dedicated tiles" that run things like network packet or PCIe
> ingress/egress processing, etc. In addition it lists hypervisor driver
> options, boot flags for the kernel, etc, all kinds of things -- and you
> can't really guarantee that it will fit on a 4KB page, though in practice
> it usually does. The hypervisor reads this file from the boot stream when
> it boots, and then makes it available to Linux not for Linux's use, or even
> for programmatic userspace use, but just for end users to be able to review
> and verify that the configuration they think they booted is really what
> they got, for customer remote debugging, etc. The "remote debugging"
> aspect makes truncation to page size a particularly worrisome idea.

Since it's not the kernel that is imposing the format here, you could
make it a binary sysfs attribute, which works in the same way as
a proc file and does not have the size limitations.
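For illustration, a binary attribute along those lines might look something like this (a sketch only, against the 2.6.39-era sysfs API; the static buffer size and the exact hv_confstr() plumbing are assumptions, not part of the proposal):

```c
/*
 * Sketch: expose the free-text hypervisor config as a binary sysfs
 * attribute, so reads are not limited to a single PAGE_SIZE show() call.
 * hv_confstr()/HV_CONFSTR_HV_CONFIG are the hypervisor ABI calls
 * discussed in this thread.
 */
static ssize_t hvconfig_bin_read(struct file *filp, struct kobject *kobj,
				 struct bin_attribute *attr,
				 char *buf, loff_t off, size_t count)
{
	static char hvc[8192];	/* illustrative; real code would size this */
	int len = hv_confstr(HV_CONFSTR_HV_CONFIG,
			     (HV_VirtAddr)hvc, sizeof(hvc));

	if (len < 0 || off >= len)
		return 0;
	count = min_t(size_t, count, len - off);
	memcpy(buf, hvc + off, count);
	return count;
}

static struct bin_attribute hvconfig_bin = {
	.attr = { .name = "hvconfig", .mode = 0444 },
	.read = hvconfig_bin_read,
};

/* at init time, e.g.:
 *	err = sysfs_create_bin_file(hypervisor_kobj, &hvconfig_bin);
 */
```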

> >> /proc/tile/board
> >> Information on part numbers, serial numbers, etc., of the
> >> hardware that the kernel is executing on
> >>
> >> /proc/tile/switch
> >> The type of control path for the onboard network switch, if any.
>
> These two report information about the hardware, not the hypervisor. For
> example:
>
> # cat /proc/tile/board
> board_part: 402-00002-05
> board_serial: NBS-5002-00012
> chip_serial: P62338.01.110
> chip_revision: A0
> board_revision: 2.2
> board_description: Tilera TILExpressPro-64, TILEPro64 processor (866 MHz-capable), 1 10GbE, 6 1GbE
> # cat /proc/tile/switch
> control: mdio gbe/0

I think it's ok to have it below /sys/hypervisor, because the information
is provided through a hypervisor ABI, even though it describes something
else. This is more like /sys/firmware, but the boundaries between that
and /sys/hypervisor are not clearly defined when running virtualized anyway.

> The chip_serial and chip_revision can certainly hang off
> /sys/devices/system/cpu along with chip_height and chip_width (I've made
> this change now) but I don't know where the remaining "board" level
> description could go. Note that (as you can see in the source) certain
> boards will also include four lines of output with the "mezzanine board"
> part number, serial number, revision, and description; this particular
> example doesn't have a mezzanine board. The "switch" info is broken out
> into a separate file just to make it easier to script some /etc/rc code
> that launches a configurator for the Marvell switch on some of our boards,
> but is conceptually part of the board info.
>
> >> /proc/tile/hardwall
> >> Information on the set of currently active hardwalls (note that
> >> the implementation is already present in arch/tile/kernel/hardwall.c;
> >> this change just enables it)
>
> This one is not a hypervisor-related file. It just lists information about
> the set of Linux hardwalls currently active. Again, it's not primarily
> intended for programmatic use, but as a diagnostic tool.

same here, I'd still put it into the hypervisor structure.

Arnd

Chris Metcalf
May 20, 2011, 10:30:01 AM
On 5/19/2011 11:22 AM, Arnd Bergmann wrote:
> On Thursday 19 May 2011, Chris Metcalf wrote:
>>>> /proc/tile/board
>>>> Information on part numbers, serial numbers, etc., of the
>>>> hardware that the kernel is executing on
>>>>
>>>> /proc/tile/switch
>>>> The type of control path for the onboard network switch, if any.
>> These two report information about the hardware, not the hypervisor. For
>> example:
>>
>> # cat /proc/tile/board
>> board_part: 402-00002-05
>> board_serial: NBS-5002-00012
>> chip_serial: P62338.01.110
>> chip_revision: A0
>> board_revision: 2.2
>> board_description: Tilera TILExpressPro-64, TILEPro64 processor (866 MHz-capable), 1 10GbE, 6 1GbE
>> # cat /proc/tile/switch
>> control: mdio gbe/0
> I think it's ok to have it below /sys/hypervisor, because the information
> is provided through a hypervisor ABI, even though it describes something
> else. This is more like /sys/firmware, but the boundaries between that
> and /sys/hypervisor are not clearly defined when running virtualized anyway.

I'll create a /sys/hypervisor/board/ and report the attributes there.

>>>> /proc/tile/hardwall
>>>> Information on the set of currently active hardwalls (note that
>>>> the implementation is already present in arch/tile/kernel/hardwall.c;
>>>> this change just enables it)
>> This one is not a hypervisor-related file. It just lists information about
>> the set of Linux hardwalls currently active. Again, it's not primarily
>> intended for programmatic use, but as a diagnostic tool.
> same here, I'd still put it into the hypervisor structure.

Since /proc/tile/hardwall has no connection to the hypervisor whatsoever,
I'm reluctant to put it under /sys/hypervisor.

Perhaps in this case it would be reasonable to just have the hardwall
subsystem put the file in /proc/driver/hardwall, or even /proc/hardwall?
Or I could make the /dev/hardwall char device dump out the ASCII text that
we currently get from /proc/hardwall if you read from it, which is a little
weird but not inconceivable. For example it currently shows things like this:

# cat /proc/tile/hardwall
2x2 1,1 pids: 484@2,1 479@1,1
2x2 0,3 pids:

In this example "2x2 1,1" is a 2x2 grid of cpus starting at grid (x,y)
position (1,1), with task 484 bound to the cpu at (x,y) position (2,1).

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann
May 20, 2011, 10:40:01 AM

On Friday 20 May 2011 16:26:57 Chris Metcalf wrote:
> >>>> /proc/tile/hardwall
> >>>> Information on the set of currently active hardwalls (note that
> >>>> the implementation is already present in arch/tile/kernel/hardwall.c;
> >>>> this change just enables it)
> >> This one is not a hypervisor-related file. It just lists information about
> >> the set of Linux hardwalls currently active. Again, it's not primarily
> >> intended for programmatic use, but as a diagnostic tool.
> > same here, I'd still put it into the hypervisor structure.
>
> Since /proc/tile/hardwall has no connection to the hypervisor whatsoever,
> I'm reluctant to put it under /sys/hypervisor.

Ah, I see. I didn't notice that it was in the other file. You are
absolutely right, this does not belong into /sys/hypervisor and
fits well into procfs, we just need to find the right place.

> Perhaps in this case it would be reasonable to just have the hardwall
> subsystem put the file in /proc/driver/hardwall, or even /proc/hardwall?
> Or I could make the /dev/hardwall char device dump out the ASCII text that
> we currently get from /proc/hardwall if you read from it, which is a little
> weird but not inconceivable. For example it currently shows things like this:
>
> # cat /proc/tile/hardwall
> 2x2 1,1 pids: 484@2,1 479@1,1
> 2x2 0,3 pids:
>
> In this example "2x2 1,1" is a 2x2 grid of cpus starting at grid (x,y)
> position (1,1), with task 484 bound to the cpu at (x,y) position (2,1).

Any chance you can still restructure the information? I would recommend
making it a first-class procfs member, since the data is really per-task.

You can add a conditional entry to tgid_base_stuff[] in fs/proc/base.c
to make it show up for each pid, and then just have the per-task information
in there to do the lookup the other way round:

# cat /proc/484/hardwall
2x2 1,1 @2,1

# cat /proc/479/hardwall
2x2 1,1 @1,1

Arnd

Chris Metcalf
May 20, 2011, 11:10:01 AM

On 5/20/2011 10:37 AM, Arnd Bergmann wrote:
> On Friday 20 May 2011 16:26:57 Chris Metcalf wrote:
>>>>>> /proc/tile/hardwall
>>>>>> Information on the set of currently active hardwalls (note that
>>>>>> the implementation is already present in arch/tile/kernel/hardwall.c;
>>>>>> this change just enables it)
> Ah, I see. I didn't notice that it was in the other file. You are
> absolutely right, this does not belong into /sys/hypervisor and
> fits well into procfs, we just need to find the right place.
>> Perhaps in this case it would be reasonable to just have the hardwall
>> subsystem put the file in /proc/driver/hardwall, or even /proc/hardwall?
>> Or I could make the /dev/hardwall char device dump out the ASCII text that
>> we currently get from /proc/hardwall if you read from it, which is a little
>> weird but not inconceivable. For example it currently shows things like this:
>>
>> # cat /proc/tile/hardwall
>> 2x2 1,1 pids: 484@2,1 479@1,1
>> 2x2 0,3 pids:
>>
>> In this example "2x2 1,1" is a 2x2 grid of cpus starting at grid (x,y)
>> position (1,1), with task 484 bound to the cpu at (x,y) position (2,1).
> Any chance you can still restructure the information? I would recommend
> making it a first-class procfs member, since the data is really per-task.
>
> You can add a conditional entry to tgid_base_stuff[] in fs/proc/base.c
> to make it show up for each pid, and then just have the per-task information
> in there to do the lookup the other way round:
>
> # cat /proc/484/hardwall
> 2x2 1,1 @2,1
>
> # cat /proc/479/hardwall
> 2x2 1,1 @1,1

It's not unreasonable to do what you're suggesting, i.e. "what's this
task's hardwall?", but it's not something that we've come up with any kind
of use case for in the past, so I'm not currently planning to implement
this. If we did, I agree, your solution looks like the right one.

The proposed /proc/tile/hardwall really is intended as system-wide
information. Each hardwall (one line in the output file example above)
corresponds to a "struct file" that may be shared by multiple processes (or
threads). Processes may pass the "struct file" to other processes via fork
(and maybe exec), or by passing it over Unix sockets. Then those processes
can choose a cpu within a hardwall rectangle, affinitize to that cpu only,
"activate" the hardwall fd with an ioctl(), and then get access from the OS
so they can work together within a hardwall to exchange data across the
Tilera "user dynamic network" (a wormhole routed grid network that moves
data at 32 bits/cycle with almost no latency). Processes can create a new
hardwall as long as it doesn't overlap geometrically with any other
existing hardwall on the system.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann
May 20, 2011, 11:20:02 AM

On Friday 20 May 2011 17:00:47 Chris Metcalf wrote:
> > Any chance you can still restructure the information? I would recommend
> > making it a first-class procfs member, since the data is really per-task.
> >
> > You can add a conditional entry to tgid_base_stuff[] in fs/proc/base.c
> > to make it show up for each pid, and then just have the per-task information
> > in there to do the lookup the other way round:
> >
> > # cat /proc/484/hardwall
> > 2x2 1,1 @2,1
> >
> > # cat /proc/479/hardwall
> > 2x2 1,1 @1,1
>
> It's not unreasonable to do what you're suggesting, i.e. "what's this
> task's hardwall?", but it's not something that we've come up with any kind
> of use case for in the past, so I'm not currently planning to implement
> this. If we did, I agree, your solution looks like the right one.

It's fairly easy to aggregate in user space though, we do similar
things for 'lsof' and 'top', which walk all of procfs in order
to show the complete picture. This is obviously more overhead than
walking the lists in the kernel, but still not an expensive
operation, and it keeps the data format much simpler.

Arnd

Arnd Bergmann
May 20, 2011, 4:10:02 PM

On Friday 20 May 2011 17:13:25 Arnd Bergmann wrote:
> On Friday 20 May 2011 17:00:47 Chris Metcalf wrote:
> > > Any chance you can still restructure the information? I would recommend
> > > making it a first-class procfs member, since the data is really per-task.
> > >
> > > You can add a conditional entry to tgid_base_stuff[] in fs/proc/base.c
> > > to make it show up for each pid, and then just have the per-task information
> > > in there to do the lookup the other way round:
> > >
> > > # cat /proc/484/hardwall
> > > 2x2 1,1 @2,1
> > >
> > > # cat /proc/479/hardwall
> > > 2x2 1,1 @1,1
> >
> > It's not unreasonable to do what you're suggesting, i.e. "what's this
> > task's hardwall?", but it's not something that we've come up with any kind
> > of use case for in the past, so I'm not currently planning to implement
> > this. If we did, I agree, your solution looks like the right one.
>
> It's fairly easy to aggregate in user space though, we do similar
> things for 'lsof' and 'top', which walk all of procfs in order
> to show the complete picture. This is obviously more overhead than
> walking the lists in the kernel, but still not an expensive
> operation, and it keeps the data format much simpler.

Another problem with the existing interface is that it doesn't currently
support PID name spaces. That could of course be retrofitted, but having
the data split by pid directory would make it work implicitly.

Another approach would be to have a /proc/hardwall/ directory with
one entry per hardwall instance, and symlinks from /proc/<pid>/hardwall
to the respective file.

Arnd Bergmann
May 24, 2011, 11:40:04 AM

On Thursday 19 May 2011, Arnd Bergmann wrote:
> >
> > # cat /proc/tile/board
> > board_part: 402-00002-05
> > board_serial: NBS-5002-00012
> > chip_serial: P62338.01.110
> > chip_revision: A0
> > board_revision: 2.2
> > board_description: Tilera TILExpressPro-64, TILEPro64 processor (866 MHz-capable), 1 10GbE, 6 1GbE
> > # cat /proc/tile/switch
> > control: mdio gbe/0
>
> I think it's ok to have it below /sys/hypervisor, because the information
> is provided through a hypervisor ABI, even though it describes something
> else. This is more like /sys/firmware, but the boundaries between that
> and /sys/hypervisor are not clearly defined when running virtualized anyway.

A minor point that I meant to bring up but had not gotten to:

When you do a /sys/hypervisor/ interface, put everything into a subdirectory
under /sys/hypervisor with the name of your hypervisor, to avoid naming
conflicts, e.g.

/sys/hypervisor/tilera-hv/board/board_serial

Chris Metcalf
May 25, 2011, 3:20:02 PM

(Resending with no HTML for LKML.)

On 5/20/2011 3:59 PM, Arnd Bergmann wrote:
> Any chance you can still restructure the information? I would recommend
> making it a first-class procfs member, since the data is really per-task.
>
> You can add a conditional entry to tgid_base_stuff[] in fs/proc/base.c
> to make it show up for each pid, and then just have the per-task information
> in there to do the lookup the other way round:
>
> # cat /proc/484/hardwall
> 2x2 1,1 @2,1
>
> # cat /proc/479/hardwall
> 2x2 1,1 @1,1

> Another problem with the existing interface is that it doesn't currently
> support PID name spaces. That could of course be retrofitted, but having
> the data split by pid directory would make it work implicitly.
>
> Another approach would be to have a /proc/hardwall/ directory with
> one entry per hardwall instance, and symlinks from /proc/<pid>/hardwall
> to the respective file.

I went ahead and implemented this, and will send out a v2 patch shortly. I
added the "hardwall" entry to both the tgid_base (since everything is
reflected there) but also to the tid_base_stuff[], since it can be
different (in principle) for different threads.

I played around with using a symlink, but the bottom line seems to be that
if I make it a symlink (via a SYM() macro in the table) it always has to
exist -- so what does it point to when there's no hardwall activated? I
tried making it point to /dev/null, but that just seemed silly. In the end
I made /proc/PID/hardwall a file, either empty, or else containing the
hardwall id.

The actual hardwalls are then in /proc/tile/hardwall/NN, where NN is the
hardwall id. I wrote a very simple hardwall id allocate/free pair; the pid
allocator seemed too tied to task_structs. We only need at most NR_CPUS
hardwall ids, so it's pretty simple to just use a cpumask to hold the set
of allocated hardwall IDs.

The contents of the hardwall ID file are then just a cpulist of the cpus
covered by the hardwall, rather than introducing a new convention (as
quoted above, e.g. "2x2 1,1"). Individual tasks that are in the hardwall
can be found by reading the "hardwall" files, and we can learn where they
are bound in the hardwall by reading the "stat" file as is normal for
learning process affinity.

> When you do a /sys/hypervisor/ interface, put everything into a subdirectory
> under /sys/hypervisor with the name of your hypervisor, to avoid naming
> conflicts, e.g.
>
> /sys/hypervisor/tilera-hv/board/board_serial

I don't see an easy way to put a directory in /sys/hypervisor. It seems
complex to create a kobject and a suitable class, etc., just for a
subdirectory. Or is there something simple I'm missing? I'll keep looking.

I also suspect just "tile" is an adequate subdirectory name here in the
context of /sys/hypervisor, e.g. /sys/hypervisor/tile/version.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Arnd Bergmann
May 25, 2011, 4:30:02 PM

On Wednesday 25 May 2011 21:18:05 Chris Metcalf wrote:
> (Resending with no HTML for LKML.)
>
> On 5/20/2011 3:59 PM, Arnd Bergmann wrote:
> > Any chance you can still restructure the information? I would recommend
> > making it a first-class procfs member, since the data is really per-task.
> >
> > You can add a conditional entry to tgid_base_stuff[] in fs/proc/base.c
> > to make it show up for each pid, and then just have the per-task information
> > in there to do the lookup the other way round:
> >
> > # cat /proc/484/hardwall
> > 2x2 1,1 @2,1
> >
> > # cat /proc/479/hardwall
> > 2x2 1,1 @1,1
> > Another problem with the existing interface is that it doesn't currently
> > support PID name spaces. That could of course be retrofitted, but having
> > the data split by pid directory would make it work implicitly.
> >
> > Another approach would be to have a /proc/hardwall/ directory with
> > one entry per hardwall instance, and symlinks from /proc/<pid>/hardwall
> > to the respective file.
>
> I went ahead and implemented this, and will send out a v2 patch shortly. I
> added the "hardwall" entry to both the tgid_base (since everything is
> reflected there) but also to the tid_base_stuff[], since it can be
> different (in principle) for different threads.

Ok, sounds good.

> I played around with using a symlink, but the bottom line seems to be that
> if I make it a symlink (via a SYM() macro in the table) it always has to
> exist -- so what does it point to when there's no hardwall activated? I
> tried making it point to /dev/null, but that just seemed silly. In the end
> I made /proc/PID/hardwall a file, either empty, or else containing the
> hardwall id.

ok. I suppose you could make a non-hardwall file that you can link to,
but an empty file also sounds ok.

> The actual hardwalls are then in /proc/tile/hardwall/NN, where NN is the
> hardwall id. I wrote a very simple hardwall id allocate/free pair; the pid
> allocator seemed too tied to task_structs. We only need at most NR_CPUS
> hardwall ids, so it's pretty simple to just use a cpumask to hold the set
> of allocated hardwall IDs.

ok.

> The contents of the hardwall ID file are then just a cpulist of the cpus
> covered by the hardwall, rather than introducing a new convention (as
> quoted above, e.g. "2x2 1,1"). Individual tasks that are in the hardwall
> can be found by reading the "hardwall" files, and we can learn where they
> are bound in the hardwall by reading the "stat" file as is normal for
> learning process affinity.

Be careful with listing PID values in the hardwall files, as the PIDs
may not be unique or visible if you combine this with PID name spaces.
I guess the right solution would be to only list the tasks that are
present in the name space of the thread reading the file.

> > When you do a /sys/hypervisor/ interface, put everything into a subdirectory
> > under /sys/hypervisor with the name of your hypervisor, to avoid naming
> > conflicts, e.g.
> >
> > /sys/hypervisor/tilera-hv/board/board_serial
>
> I don't see an easy way to put a directory in /sys/hypervisor. It seems
> complex to create a kobject and a suitable class, etc., just for a
> subdirectory. Or is there something simple I'm missing? I'll keep looking.
>
> I also suspect just "tile" is an adequate subdirectory name here in the
> context of /sys/hypervisor, e.g. /sys/hypervisor/tile/version.

I just checked for other users. The only one I could find was
drivers/xen/sys-hypervisor.c, and it also doesn't use a subdirectory to
identify that hypervisor. It's probably more consistent if you also don't
do it then.

You can create a directory with multiple files using sysfs_create_group()
as the xen code does, but not nested directories.

Arnd

Arnd Bergmann
May 25, 2011, 4:40:02 PM

On Wednesday 25 May 2011 22:31:37 Chris Metcalf wrote:

> On 5/25/2011 4:20 PM, Arnd Bergmann wrote:
> > On Wednesday 25 May 2011 21:18:05 Chris Metcalf wrote:
> >> The contents of the hardwall ID file are then just a cpulist of the cpus
> >> covered by the hardwall, rather than introducing a new convention (as
> >> quoted above, e.g. "2x2 1,1"). Individual tasks that are in the hardwall
> >> can be found by reading the "hardwall" files, and we can learn where they
> >> are bound in the hardwall by reading the "stat" file as is normal for
> >> learning process affinity.
> > Be careful with listing PID values in the hardwall files, as the PIDs
> > may not be unique or visible if you combine this with PID name spaces.
> > I guess the right solution would be to only list the tasks that are
> > present in the name space of the thread reading the file.
>
> Sorry not to be clearer -- I am no longer listing any PID values in the
> hardwall files, for that exact reason. You have to look at
> /proc/*/hardwall (or /proc/*/tasks/*/hardwall) to find the files that are
> in a particular hardwall. This pattern is not one that's normally directly
> useful, though, so I'm happy leaving it to userspace if it's desired.

Ok, thanks for the clarification.

Chris Metcalf
May 25, 2011, 4:40:02 PM

On 5/25/2011 4:20 PM, Arnd Bergmann wrote:
> On Wednesday 25 May 2011 21:18:05 Chris Metcalf wrote:
>> The contents of the hardwall ID file are then just a cpulist of the cpus
>> covered by the hardwall, rather than introducing a new convention (as
>> quoted above, e.g. "2x2 1,1"). Individual tasks that are in the hardwall
>> can be found by reading the "hardwall" files, and we can learn where they
>> are bound in the hardwall by reading the "stat" file as is normal for
>> learning process affinity.
> Be careful with listing PID values in the hardwall files, as the PIDs
> may not be unique or visible if you combine this with PID name spaces.
> I guess the right solution would be to only list the tasks that are
> present in the name space of the thread reading the file.

Sorry not to be clearer -- I am no longer listing any PID values in the
hardwall files, for that exact reason. You have to look at
/proc/*/hardwall (or /proc/*/tasks/*/hardwall) to find the files that are
in a particular hardwall. This pattern is not one that's normally directly
useful, though, so I'm happy leaving it to userspace if it's desired.

>>> When you do a /sys/hypervisor/ interface, put everything into a subdirectory
>>> under /sys/hypervisor with the name of your hypervisor, to avoid naming
>>> conflicts, e.g.
>>>
>>> /sys/hypervisor/tilera-hv/board/board_serial

> I just checked for other users. The only one I could find was
> drivers/xen/sys-hypervisor.c, and it also doesn't use a subdirectory to
> identify that hypervisor. It's probably more consistent if you also don't
> do it then.
>
> You can create a directory with multiple files using sysfs_create_group()
> as the xen code does, but not nested directories.

I'll look into sysfs_create_group(), and then send a revised patch with all
the /proc and /sys changes. Thanks!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Chris Metcalf
May 26, 2011, 12:50:02 PM

This change introduces a few of the less controversial /proc and
/proc/sys interfaces for tile, along with sysfs attributes for
various things that were originally proposed as /proc/tile files.
It also adjusts the "hardwall" proc API.

Arnd Bergmann reviewed the initial arch/tile submission, which
included a complete set of all the /proc/tile and /proc/sys/tile
knobs that we had added in a somewhat ad hoc way during initial
development, and provided feedback on where most of them should go.

One knob turned out to be similar enough to the existing
/proc/sys/debug/exception-trace that it was re-implemented to use
that model instead.

Another knob was /proc/tile/grid, which reported the "grid" dimensions
of a tile chip (e.g. 8x8 processors = 64-core chip). Arnd suggested
looking at sysfs for that, so this change moves that information
to a pair of sysfs attributes (chip_width and chip_height) in the
/sys/devices/system/cpu directory. We also put the "chip_serial"
and "chip_revision" information from our old /proc/tile/board file
as attributes in /sys/devices/system/cpu.

Other information collected via hypervisor APIs is now placed in
/sys/hypervisor. We create a /sys/hypervisor/type file (holding the
constant string "tilera") to be parallel with the Xen use of
/sys/hypervisor/type holding "xen". We create three top-level files,
"version" (the hypervisor's own version), "config_version" (the
version of the configuration file), and "hvconfig" (the contents of
the configuration file). The remaining information from our old
/proc/tile/board and /proc/tile/switch files becomes an attribute
group appearing under /sys/hypervisor/board/.

Finally, after feedback from Arnd Bergmann on the previous
version of this patch, the /proc/tile/hardwall file is split up into
two conceptual parts. First, a directory /proc/tile/hardwall/ which
contains one file per active hardwall, each file named after the
hardwall's ID and holding a cpulist that says which cpus are enclosed by
the hardwall. Second, a /proc/PID file "hardwall" that is either
empty (for non-hardwall-using processes) or contains the hardwall ID.

Lastly, this change adds the /proc/sys/tile/unaligned_fixup/
directory, with knobs controlling the kernel code that handles the
fixup of unaligned exceptions.

Signed-off-by: Chris Metcalf <cmet...@tilera.com>
---

arch/tile/Kconfig | 1 +
arch/tile/include/asm/hardwall.h | 15 +++-
arch/tile/kernel/Makefile | 2 +-
arch/tile/kernel/hardwall.c | 90 ++++++++++++++-----
arch/tile/kernel/proc.c | 73 +++++++++++++++
arch/tile/kernel/sysfs.c | 185 ++++++++++++++++++++++++++++++++++++++
fs/proc/base.c | 9 ++
7 files changed, 347 insertions(+), 28 deletions(-)
create mode 100644 arch/tile/kernel/sysfs.c

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 635e1bf..3f7d63c 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -12,6 +12,7 @@ config TILE
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
select GENERIC_IRQ_SHOW
+ select SYS_HYPERVISOR

# FIXME: investigate whether we need/want these options.
# select HAVE_IOREMAP_PROT

diff --git a/arch/tile/include/asm/hardwall.h b/arch/tile/include/asm/hardwall.h
index 0bed3ec..2ac4228 100644
--- a/arch/tile/include/asm/hardwall.h
+++ b/arch/tile/include/asm/hardwall.h
@@ -40,6 +40,10 @@
#define HARDWALL_DEACTIVATE \
_IO(HARDWALL_IOCTL_BASE, _HARDWALL_DEACTIVATE)

+#define _HARDWALL_GET_ID 4
+#define HARDWALL_GET_ID \
+ _IO(HARDWALL_IOCTL_BASE, _HARDWALL_GET_ID)
+
#ifndef __KERNEL__

/* This is the canonical name expected by userspace. */
@@ -47,9 +51,14 @@

#else

-/* Hook for /proc/tile/hardwall. */
-struct seq_file;
-int proc_tile_hardwall_show(struct seq_file *sf, void *v);
+/* /proc hooks for hardwall. */
+struct proc_dir_entry;
+#ifdef CONFIG_HARDWALL
+void proc_tile_hardwall_init(struct proc_dir_entry *root);
+int proc_pid_hardwall(struct task_struct *task, char *buffer);
+#else
+static inline void proc_tile_hardwall_init(struct proc_dir_entry *root) {}
+#endif

#endif



diff --git a/arch/tile/kernel/Makefile b/arch/tile/kernel/Makefile
index b4c8e8e..b4dbc05 100644
--- a/arch/tile/kernel/Makefile
+++ b/arch/tile/kernel/Makefile
@@ -5,7 +5,7 @@
extra-y := vmlinux.lds head_$(BITS).o
obj-y := backtrace.o entry.o init_task.o irq.o messaging.o \
pci-dma.o proc.o process.o ptrace.o reboot.o \
- setup.o signal.o single_step.o stack.o sys.o time.o traps.o \
+ setup.o signal.o single_step.o stack.o sys.o sysfs.o time.o traps.o \
intvec_$(BITS).o regs_$(BITS).o tile-desc_$(BITS).o

obj-$(CONFIG_HARDWALL) += hardwall.o

diff --git a/arch/tile/kernel/hardwall.c b/arch/tile/kernel/hardwall.c
index 3bddef7..8c41891 100644
--- a/arch/tile/kernel/hardwall.c
+++ b/arch/tile/kernel/hardwall.c
@@ -40,16 +40,25 @@
struct hardwall_info {
struct list_head list; /* "rectangles" list */
struct list_head task_head; /* head of tasks in this hardwall */
+ struct cpumask cpumask; /* cpus in the rectangle */
int ulhc_x; /* upper left hand corner x coord */
int ulhc_y; /* upper left hand corner y coord */
int width; /* rectangle width */
int height; /* rectangle height */
+ int id; /* integer id for this hardwall */
int teardown_in_progress; /* are we tearing this one down? */
};

/* Currently allocated hardwall rectangles */
static LIST_HEAD(rectangles);

+/* /proc/tile/hardwall */
+static struct proc_dir_entry *hardwall_proc_dir;
+
+/* Functions to manage files in /proc/tile/hardwall. */
+static void hardwall_add_proc(struct hardwall_info *rect);
+static void hardwall_remove_proc(struct hardwall_info *rect);
+
/*
* Guard changes to the hardwall data structures.
* This could be finer grained (e.g. one lock for the list of hardwall
@@ -105,6 +114,8 @@ static int setup_rectangle(struct hardwall_info *r, struct cpumask *mask)
r->ulhc_y = cpu_y(ulhc);
r->width = cpu_x(lrhc) - r->ulhc_x + 1;
r->height = cpu_y(lrhc) - r->ulhc_y + 1;
+ cpumask_copy(&r->cpumask, mask);
+ r->id = ulhc; /* The ulhc cpu id can be the hardwall id. */

/* Width and height must be positive */
if (r->width <= 0 || r->height <= 0)
@@ -388,6 +399,9 @@ static struct hardwall_info *hardwall_create(
/* Set up appropriate hardwalling on all affected cpus. */
hardwall_setup(rect);

+ /* Create a /proc/tile/hardwall entry. */
+ hardwall_add_proc(rect);
+
return rect;
}

@@ -645,6 +659,9 @@ static void hardwall_destroy(struct hardwall_info *rect)
/* Restart switch and disable firewall. */
on_each_cpu_mask(&mask, restart_udn_switch, NULL, 1);

+ /* Remove the /proc/tile/hardwall entry. */
+ hardwall_remove_proc(rect);
+
/* Now free the rectangle from the list. */
spin_lock_irqsave(&hardwall_lock, flags);
BUG_ON(!list_empty(&rect->task_head));
@@ -654,35 +671,57 @@ static void hardwall_destroy(struct hardwall_info *rect)
}


-/*
- * Dump hardwall state via /proc; initialized in arch/tile/sys/proc.c.
- */
-int proc_tile_hardwall_show(struct seq_file *sf, void *v)
+static int hardwall_proc_show(struct seq_file *sf, void *v)
{
- struct hardwall_info *r;
+ struct hardwall_info *rect = sf->private;
+ char buf[256];

- if (udn_disabled) {
- seq_printf(sf, "%dx%d 0,0 pids:\n", smp_width, smp_height);
- return 0;
- }
-
- spin_lock_irq(&hardwall_lock);
- list_for_each_entry(r, &rectangles, list) {
- struct task_struct *p;
- seq_printf(sf, "%dx%d %d,%d pids:",
- r->width, r->height, r->ulhc_x, r->ulhc_y);
- list_for_each_entry(p, &r->task_head, thread.hardwall_list) {
- unsigned int cpu = cpumask_first(&p->cpus_allowed);
- unsigned int x = cpu % smp_width;
- unsigned int y = cpu / smp_width;
- seq_printf(sf, " %d@%d,%d", p->pid, x, y);
- }
- seq_printf(sf, "\n");
- }
- spin_unlock_irq(&hardwall_lock);
+ int rc = cpulist_scnprintf(buf, sizeof(buf), &rect->cpumask);
+ buf[rc++] = '\n';
+ seq_write(sf, buf, rc);
return 0;
}

+static int hardwall_proc_open(struct inode *inode,
+ struct file *file)
+{
+ return single_open(file, hardwall_proc_show, PDE(inode)->data);
+}
+
+static const struct file_operations hardwall_proc_fops = {
+ .open = hardwall_proc_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static void hardwall_add_proc(struct hardwall_info *rect)
+{
+ char buf[64];
+ snprintf(buf, sizeof(buf), "%d", rect->id);
+ proc_create_data(buf, 0444, hardwall_proc_dir,
+ &hardwall_proc_fops, rect);
+}
+
+static void hardwall_remove_proc(struct hardwall_info *rect)
+{
+ char buf[64];
+ snprintf(buf, sizeof(buf), "%d", rect->id);
+ remove_proc_entry(buf, hardwall_proc_dir);
+}
+
+int proc_pid_hardwall(struct task_struct *task, char *buffer)
+{
+ struct hardwall_info *rect = task->thread.hardwall;
+ return rect ? sprintf(buffer, "%d\n", rect->id) : 0;
+}
+
+void proc_tile_hardwall_init(struct proc_dir_entry *root)
+{
+ if (!udn_disabled)
+ hardwall_proc_dir = proc_mkdir("hardwall", root);
+}
+
/*
* Character device support via ioctl/close.
@@ -716,6 +755,9 @@ static long hardwall_ioctl(struct file *file, unsigned int a, unsigned long b)
return -EINVAL;
return hardwall_deactivate(current);

+ case _HARDWALL_GET_ID:
+ return rect ? rect->id : -EINVAL;
+
default:
return -EINVAL;
}
diff --git a/arch/tile/kernel/proc.c b/arch/tile/kernel/proc.c
index 2e02c41..62d8208 100644
--- a/arch/tile/kernel/proc.c
+++ b/arch/tile/kernel/proc.c
@@ -27,6 +27,7 @@
#include <asm/processor.h>
#include <asm/sections.h>
#include <asm/homecache.h>
+#include <asm/hardwall.h>
#include <arch/chip.h>


@@ -88,3 +89,75 @@ const struct seq_operations cpuinfo_op = {
.stop = c_stop,
.show = show_cpuinfo,
};
+
+/*
+ * Support /proc/tile directory
+ */
+
+static int __init proc_tile_init(void)
+{
+ struct proc_dir_entry *root = proc_mkdir("tile", NULL);
+ if (root == NULL)
+ return 0;
+
+ proc_tile_hardwall_init(root);

index 0000000..b671a86
--- /dev/null
+++ b/arch/tile/kernel/sysfs.c
@@ -0,0 +1,185 @@
+/*
+ * Copyright 2011 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ *
+ * /sys entry support.
+ */
+
+#include <linux/sysdev.h>
+#include <linux/cpu.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <hv/hypervisor.h>
+
+/* Return a string queried from the hypervisor, truncated to page size. */
+static ssize_t get_hv_confstr(char *page, int query)
+{
+ ssize_t n = hv_confstr(query, (unsigned long)page, PAGE_SIZE - 1);
+ n = n < 0 ? 0 : min(n, (ssize_t)PAGE_SIZE - 1) - 1;
+ if (n)
+ page[n++] = '\n';
+ page[n] = '\0';
+ return n;
+}
+
+static ssize_t chip_width_show(struct sysdev_class *dev,
+ struct sysdev_class_attribute *attr,
+ char *page)
+{
+ return sprintf(page, "%u\n", smp_width);
+}
+static SYSDEV_CLASS_ATTR(chip_width, 0444, chip_width_show, NULL);
+
+static ssize_t chip_height_show(struct sysdev_class *dev,
+ struct sysdev_class_attribute *attr,
+ char *page)
+{
+ return sprintf(page, "%u\n", smp_height);
+}
+static SYSDEV_CLASS_ATTR(chip_height, 0444, chip_height_show, NULL);
+
+static ssize_t chip_serial_show(struct sysdev_class *dev,
+ struct sysdev_class_attribute *attr,
+ char *page)
+{
+ return get_hv_confstr(page, HV_CONFSTR_CHIP_SERIAL_NUM);
+}
+static SYSDEV_CLASS_ATTR(chip_serial, 0444, chip_serial_show, NULL);
+
+static ssize_t chip_revision_show(struct sysdev_class *dev,
+ struct sysdev_class_attribute *attr,
+ char *page)
+{
+ return get_hv_confstr(page, HV_CONFSTR_CHIP_REV);
+}
+static SYSDEV_CLASS_ATTR(chip_revision, 0444, chip_revision_show, NULL);
+
+static ssize_t type_show(struct sysdev_class *dev,
+ struct sysdev_class_attribute *attr,
+ char *page)
+{
+ return sprintf(page, "tilera\n");
+}
+static SYSDEV_CLASS_ATTR(type, 0444, type_show, NULL);
+
+#define HV_CONF_ATTR(name, conf) \
+ static ssize_t name ## _show(struct sysdev_class *dev, \
+ struct sysdev_class_attribute *attr, \
+ char *page) \
+ { \
+ return get_hv_confstr(page, conf); \
+ } \
+ static SYSDEV_CLASS_ATTR(name, 0444, name ## _show, NULL);
+
+HV_CONF_ATTR(version, HV_CONFSTR_HV_SW_VER)
+HV_CONF_ATTR(config_version, HV_CONFSTR_HV_CONFIG_VER)
+
+HV_CONF_ATTR(board_part, HV_CONFSTR_BOARD_PART_NUM)
+HV_CONF_ATTR(board_serial, HV_CONFSTR_BOARD_SERIAL_NUM)
+HV_CONF_ATTR(board_revision, HV_CONFSTR_BOARD_REV)
+HV_CONF_ATTR(board_description, HV_CONFSTR_BOARD_DESC)
+HV_CONF_ATTR(mezz_part, HV_CONFSTR_MEZZ_PART_NUM)
+HV_CONF_ATTR(mezz_serial, HV_CONFSTR_MEZZ_SERIAL_NUM)
+HV_CONF_ATTR(mezz_revision, HV_CONFSTR_MEZZ_REV)
+HV_CONF_ATTR(mezz_description, HV_CONFSTR_MEZZ_DESC)
+HV_CONF_ATTR(switch_control, HV_CONFSTR_SWITCH_CONTROL)
+
+static struct attribute *board_attrs[] = {
+ &attr_board_part.attr,
+ &attr_board_serial.attr,
+ &attr_board_revision.attr,
+ &attr_board_description.attr,
+ &attr_mezz_part.attr,
+ &attr_mezz_serial.attr,
+ &attr_mezz_revision.attr,
+ &attr_mezz_description.attr,
+ &attr_switch_control.attr,
+ NULL
+};
+
+static struct attribute_group board_attr_group = {
+ .name = "board",
+ .attrs = board_attrs,
+};
+
+
+static struct bin_attribute hvconfig_bin;
+
+static ssize_t
+hvconfig_bin_read(struct file *filp, struct kobject *kobj,
+ struct bin_attribute *bin_attr,
+ char *buf, loff_t off, size_t count)
+{
+ static size_t size;
+
+ /* Lazily learn the true size (minus the trailing NUL). */
+ if (size == 0)
+ size = hv_confstr(HV_CONFSTR_HV_CONFIG, 0, 0) - 1;
+
+ /* Check and adjust input parameters. */
+ if (off > size)
+ return -EINVAL;
+ if (count > size - off)
+ count = size - off;
+
+ if (count) {
+ /* Get a copy of the hvc and copy out the relevant portion. */
+ char *hvc;
+
+ size = off + count;
+ hvc = kmalloc(size, GFP_KERNEL);
+ if (hvc == NULL)
+ return -ENOMEM;
+ hv_confstr(HV_CONFSTR_HV_CONFIG, (unsigned long)hvc, size);
+ memcpy(buf, hvc + off, count);
+ kfree(hvc);
+ }
+
+ return count;
+}
+
+static int __init create_sysfs_entries(void)
+{
+ struct sysdev_class *cls = &cpu_sysdev_class;
+ int err = 0;
+
+#define create_cpu_attr(name) \
+ if (!err) \
+ err = sysfs_create_file(&cls->kset.kobj, &attr_##name.attr);
+ create_cpu_attr(chip_width);
+ create_cpu_attr(chip_height);
+ create_cpu_attr(chip_serial);
+ create_cpu_attr(chip_revision);
+
+#define create_hv_attr(name) \
+ if (!err) \
+ err = sysfs_create_file(hypervisor_kobj, &attr_##name.attr);
+ create_hv_attr(type);
+ create_hv_attr(version);
+ create_hv_attr(config_version);
+
+ if (!err)
+ err = sysfs_create_group(hypervisor_kobj, &board_attr_group);
+
+ if (!err) {
+ sysfs_bin_attr_init(&hvconfig_bin);
+ hvconfig_bin.attr.name = "hvconfig";
+ hvconfig_bin.attr.mode = S_IRUGO;
+ hvconfig_bin.read = hvconfig_bin_read;
+ hvconfig_bin.size = PAGE_SIZE;
+ err = sysfs_create_bin_file(hypervisor_kobj, &hvconfig_bin);
+ }
+
+ return err;
+}
+subsys_initcall(create_sysfs_entries);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index dfa5327..3ad615f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -83,6 +83,9 @@
#include <linux/pid_namespace.h>
#include <linux/fs_struct.h>
#include <linux/slab.h>
+#ifdef CONFIG_HARDWALL
+#include <asm/hardwall.h>
+#endif
#include "internal.h"

/* NOTE:
@@ -2894,6 +2897,9 @@ static const struct pid_entry tgid_base_stuff[] = {
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUGO, proc_tgid_io_accounting),
#endif
+#ifdef CONFIG_HARDWALL
+ INF("hardwall", S_IRUGO, proc_pid_hardwall),
+#endif
};

static int proc_tgid_base_readdir(struct file * filp,
@@ -3232,6 +3238,9 @@ static const struct pid_entry tid_base_stuff[] = {
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUGO, proc_tid_io_accounting),
#endif
+#ifdef CONFIG_HARDWALL
+ INF("hardwall", S_IRUGO, proc_pid_hardwall),
+#endif
};

static int proc_tid_base_readdir(struct file * filp,

Arnd Bergmann
May 27, 2011, 10:30:01 AM
On Thursday 26 May 2011, Chris Metcalf wrote:
> This change introduces a few of the less controversial /proc and
> /proc/sys interfaces for tile, along with sysfs attributes for
> various things that were originally proposed as /proc/tile files.
> It also adjusts the "hardwall" proc API.

Looks good to me now, except

> Finally, after some feedback from Arnd Berghamm for the previous

typo ^^^^

Reviewed-by: Arnd Bergmann <ar...@arndb.de>
