Which Windows? Win2k, NT4, or what? They are different ...
> that require more than 2GB of memory
> for the place and route? I have read that Xilinx now supports their
> tools running Linux (using WINE).
Doesn't help you here ...
> Does this new method allow users to
> exceed the 2GB limitations or is this a function of application
> executable? We are looking to complete an upcoming design that may
> require a Virtex 2 (xc2v8000) and xilinx recommends that you need at
> least 3 GB of memory for place and route.
There is/was a special version of NT4 that supports a 3 GB user address space.
> What are most people doing
> to overcome this? Are there any relatively cheap unix based machines
> ($<10,000) that can accomplish this? Any recommendations would be
> greatly appreciated.
No recommendations. Just more questions ... ;-)
cheers
http://www.xilinx.com/prs_rls/software/0225_Em_Linux.html
Austin
Regards
mschre...@yahoo.com (M Schreiber) wrote in message news:<e8caa675.02030...@posting.google.com>...
Note that even with Linux on the PC platform, 2 GB is the limit. There
are people looking into why this is. Stay tuned for the answer. This may be
related to the hardware they are using, or to the build of Linux....
Austin
Are you sure it doesn't relate to the 32-bit address space (4 GB) of the x86
class of processors? Windows, I know, reserves the high 2 GB of the address
space for the operating system, leaving the lower 2 GB for application data.
It may be possible to squeeze the OS space so applications get 3 GB, as a
special version is reputed to do, but the basic limitation is in the CPU addressing.
We've run into memory problems when running PAR for a V2-6000 on Win2K
machines (with more than 2 GB of RAM installed).
Aside from general address-space problems, we've found that Xilinx has
(had?) a problem with its tools: when running PAR, it didn't release
allocated memory when moving from the Place to the Route phase. We've
worked around this by splitting PAR into two separate processes. To do
that, you must change from the GUI design flow to a script design flow
(which you probably have to do anyway in a large design: the GUI doesn't
allow full control of all the required tool options).
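For reference, a minimal sketch of that two-step flow. The option names
(-r = skip routing, -p = skip placement, -w = overwrite) are from memory of
the PAR command-line documentation and may differ between tool versions, and
the file names are just placeholders:

    # step 1: place only, then let the process exit so its memory is freed
    par -w -r design_map.ncd design_placed.ncd design.pcf
    # step 2: route the placed design in a fresh process
    par -w -p design_placed.ncd design_routed.ncd design.pcf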
General memory problem: a 32-bit processor (like all x86 PCs) has 4 GB of
address space. x86 processors have for some years been able to support more
than 4 GB of physical memory, but accessing it is awkward (similar to the
infamous segmented addresses of 16-bit x86).
All MS OSes split the virtual address space seen by each application into two
parts: user space and OS space. In most versions each application gets 2 GB of
address space and the other 2 GB is reserved for the OS.
There are Advanced Server versions of both NT4 and Win2K which give 3 GB to the
user application, but they are VERY expensive (thousands of dollars), which
makes them a costly thing to install on every workstation.
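For what it's worth, this is roughly how that 3 GB option is enabled, as far
as I know; the disk path and executable name below are only placeholders, and
I haven't tried it on the Xilinx tools myself. In boot.ini (NT4 Enterprise /
Win2K Advanced Server only), add the /3GB switch:

    multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows 2000 Advanced Server" /3GB

and the application also has to be marked large-address-aware, e.g. with the
Microsoft editbin tool:

    editbin /LARGEADDRESSAWARE par.exe

Without that flag in the .exe header, the process still only sees 2 GB.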
I've read somewhere that Linux can allocate 3 GB of virtual address-space
for user applications, but I am not familiar with Linux.
The best solution is to move to 64-bit processors: Sun, IBM and HP (and
some others) offer 64-bit workstations now, and a lot of EDA tools are
available for them (including all Xilinx tools). This is also a pretty
expensive solution: workstations are much more expensive than PCs (and
probably slower nowadays), and EDA software for Unix is usually much more
expensive than the same software for Windows (I'm not familiar with Linux pricing).
Right now, we are eagerly waiting for the AMD Hammer - this appears to be
the perfect solution for us Windows users who are trying to build big FPGAs.
The Hammer looks like a GREAT solution, assuming Microsoft will
support it. The cost of the 64-bit extension is remarkably low (a 4%
area penalty, according to AMD), and AMD is already beating Intel on
the performance side. The SMP solution for 2-8 nodes is also nice and
clean.
What I worry about is that M$ will only support the Hammer under NT "Server"
variants, meaning that it wouldn't be accessible for workstation use
and CAD tools.
Also, it will still require recompilation/restructuring to take
advantage of the 64-bit address space.
--
Nicholas C. Weaver nwe...@cs.berkeley.edu
Hopefully Xilinx will start supporting native Linux, so there won't be
a need for any MS OS to run FPGA EDA tools.
Muzaffer Kal
http://www.dspia.com
ASIC/FPGA design/verification consulting specializing in DSP algorithm implementations
And it will be an even better solution if M$ software isn't running
at all ;-) Linux for x86-64 is up and running, AFAIK ;-)
And since Xilinx and most other EDA vendors support Linux now, why
step down to the bluescreen?
And yes, Linux user processes are normally limited to 3 GB of memory
on 32-bit platforms like x86.
Lars
I'm not sure exactly, but there was an article in uk.comp.os.linux
recently which mentioned that by default there are limits on how much RAM
the kernel will use. If you are prepared to recompile your kernel (*) you can
fiddle with
CONFIG_NOHIGHMEM=y
and change it to n (no).
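If it helps, these are the relevant lines in a 2.4 kernel .config as far as I
remember ("High Memory Support" under "Processor type and features"). Note
that they control how much physical RAM the kernel can use, not the
user/kernel address-space split:

    # CONFIG_NOHIGHMEM is not set
    CONFIG_HIGHMEM=y
    CONFIG_HIGHMEM4G=y
    # CONFIG_HIGHMEM64G is not set   (choose this one instead for >4 GB; it enables PAE)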
Alan
(*) I'm of course referring to the Linux kernel here, not Windows :-)
<snip>
--
Alan Fitch
DOULOS Ltd.
Church Hatch, 22 Market Place, Ringwood, Hampshire BH24 1AW, United Kingdom
Tel: +44 1425 471223 Email: alan....@doulos.com
Fax: +44 1425 471573 Web: http://www.doulos.com
So unless you specially fine-tune the Linux kernel (by moving the
0xc0000000 kernel start to something higher, to win some 512 megabytes
or so) and the C library's way of allocating memory and placing shared
libraries, you are probably stuck with the 2 GB limit.
The large memory extension (up to 64 GB of physical memory) will allow you
to keep more processes in memory than would fit in a 4 GB address
space. You could think of this as swapping to RAM instead of swapping
to disk; it has a cost, but it is far faster than disk swapping.
But unless the tools are specially designed to deal with the memory limit
(you are never going to get more than 4 GB of linear memory on a 32-bit
machine), you will have to move to a 64-bit platform like an Alpha, or maybe
the upcoming IA-64 or AMD x86-64 (but that will also require porting of the
tools, because 64-bit pointers don't fit into a 32-bit int :-).
- Erwin
Actually, you seem to support running the tools on WINE, which has been
said to work for quite a while now. WINE is presumably emulating the memory
model of Windows, so the same memory limit would apply.
Hamish
--
Hamish Moffatt VK3SB <ham...@debian.org> <ham...@cloud.net.au>
> The best solution is to move to 64-bit processors: Sun, IBM and HP (and
> some others) offer 64-bit workstations now, and there are a lot of EDA
> tools available (including all Xilinx tools). This is also a pretty
> expensive solution: workstations are much more expensive than PCs
> (probably slower nowaday), and EDA S/W for Unix is usually much more
> expensive than the same S/W for Windows (not familiar with Linux
> pricing).
>
> Right now, we are eagerly waiting for the AMD Hammer - this appears to
> be the perfect solution for us Windows users who are trying to build big
> FPGAs.
That is one solution: use 64-bit pointers. They are available in Linux
nowadays, upcoming for Windows in the future (but perhaps only in special
versions), or available on workstations today.
Another idea would be to use a cluster for P&R. For that, P&R must be
done on independent, separable parts. I don't know how P&R software works,
and I'm afraid it always needs the whole memory region, but if smart
programmers find a way to split P&R into several standalone processes with
only a little communication, the easiest solution is a cluster.
On machines with more than 3 GB of RAM each process can itself use 3 GB,
so several processes together can use the whole memory. It isn't even
necessary to distribute the processes to numerous PCs.
Linux is used very often for clusters; look for MOSIX or Beowulf. In an
office with several FPGA designers, all their PCs can be equipped with e.g.
MOSIX, and if one of them needs to do a P&R cycle, all the PCs work together.
The priority system in Linux will do the rest, so the other workers hardly
notice the heavy load on their machines, because local processes still get
a fast response.
"Fork and forget": use more than one process for P&R and the rest is done
by the OS and the cluster software.
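As a rough illustration of the "fork and forget" idea (only a sketch; the -t
starting-cost-table option is from memory of the PAR manual and the file
names are placeholders): several PAR runs with different cost tables are
already independent, so they can simply be forked and left to the OS or
MOSIX to spread across the machines:

    # start four independent PAR runs, each with a different placer cost table
    for t in 1 2 3 4; do
        par -w -t $t design_map.ncd run_$t.ncd design.pcf &
    done
    wait   # then keep the best-scoring result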
Advantages:
- each PC can be a "normal" one with a reasonable amount of RAM and a not
too expensive processor
- enormous speed-up of P&R
- we will get native Linux tools :-)
Or, why should your processor wait for your next keypress running at
2GHz? ;-)
Bye
Hansi
> to overcome this? Are there any relatively cheap unix based machines
> ($<10,000) that can accomplish this? Any recommendations would be
A Sun Blade 1000 costs about $10k with a single gigabyte of memory, if
memory serves me right. Under Solaris you can run multiple PAR jobs in
parallel using the -m option in PAR. This option is not available
under Windows.
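From memory (so check the manual for your version), the multi-tasking flow
looks roughly like this; the host names are placeholders, and each listed
machine needs the tools installed and remote-shell access:

    # nodes.lst: one machine name per line
    sun01
    sun02
    sun03
    # run e.g. 6 place-and-route iterations, farmed out across those nodes
    par -m nodes.lst -n 6 design_map.ncd results.dir design.pcf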
Petter
--
________________________________________________________________________
Petter Gustad 8'h2B | (~8'h2B) - Hamlet in Verilog http://gustad.com
Ray Andraka wrote:
> It sounds like we are all barking up the wrong tree. The solution is not to
> go to more and more memory...There is something to be said about using low
> cost PCs...instead, Xilinx needs to get the modular flow working so that one
> can truly partition a design and run each partition through the entire
> process, including place and route. The individual completed parts can then
> be stitched together. Until the modular flow is working though, we are stuck
> with the memory limits of the machines we work on.
>
> Johann Glaser wrote:
>
Ray,
What are the limitations on the modular design flow in real life? I've looked at
it and it seems clumsy and awkward but usable as long as the device density
isn't too high; otherwise the limitation to rectangular regions will make layout
difficult.
It also seems that the new X-Y grid system might make laying out the blocks less
fraught?
Because they are supporting the tools under WINE, you keep the Windows
API and its associated limitations.
Just to be clear here, the Xilinx Linux solution is currently limited to
command-line only on WINE. For those unfamiliar with this approach, check
out http://www.winehq.com . Briefly, WINE is a piece of software that sits
between a Windows application and Linux and maps your Windows system calls
to Linux OS calls. The whole point to WINE is that you don't have to change
your Windows application at all - rather you make sure that WINE is smart
enough to handle all the pieces of Windows that your program uses. Running
Xilinx tools on Linux via WINE means that Xilinx made sure that WINE was
able to handle all the system calls that the command-line version of their
tools required. There were probably few (if any) changes made to the
original Xilinx Windows software.
Any limitations that the Xilinx Windows software has under Windows will
still exist on WINE-on-Linux.
BLATANT ALTERA PROMOTIONAL PLUG:
The Altera Quartus II Linux port is a true Linux port tested for
compatibility with RedHat's v7.1 Linux release.
END BLATANT ALTERA PROMOTIONAL PLUG.
-Pete-
>It sounds like we are all barking up the wrong tree. The solution is not to
>go to more and more memory...There is something to be said about using low
>cost PCs...instead, Xilinx needs to get the modular flow working so that one
>can truly partition a design and run each partition through the entire
>process, including place and route. The individual completed parts can then
>be stitched together. Until the modular flow is working though, we are stuck
>with the memory limits of the machines we work on.
Quite apart from the reduction in memory, this would also make PAR
much quicker!
I'm sure some of the algorithms they use must be at least O(N^2), so
breaking the problem into two halves should roughly halve the total time:
two runs of size N/2 cost about 2*(N/2)^2 = N^2/2. (Or something like that.)
BTW, I see a significant speedup for large designs in Synplify if I
use the syn_hier attribute, which stops optimisations across module
boundaries.
Regards,
Allan.
Not to mention that if you can do a good partitioning, you are a long
way towards making the tools able to use SMP and SMT architectures.
> Lars Rzymianowicz <lar...@atoll-net.de> wrote in message
> news:3C85C76F...@atoll-net.de...
> > And since Xilinx and most other EDA vendors support Linux now, why
> > step down to the bluescreen?
>
> Just to be clear here, the Xilinx Linux solution is currently limited to
> command-line only on WINE. For those unfamiliar with this approach, check
[...snip...]
Does anybody know why Xilinx did this rather than porting their
Solaris/HP-UX version to Linux?
> It sounds like we are all barking up the wrong tree. The solution is not to
> go to more and more memory...There is something to be said about using low
> cost PCs...instead, Xilinx needs to get the modular flow working so that one
> can truly partition a design and run each partition through the entire
[snip]
Johann Glaser:
> > Another idea would be to use a cluster for P&R. So, therefore P&R must be
> > done for some independant and seperabel parts. I don't know how P&R
[snip]
I agree with both of you. Even in a hierarchical/modular design
methodology you could benefit greatly from a cluster of cheap PCs to do
your job.
The EDA vendors seem to spend a lot of effort working on their GUIs.
I would like to see some effort spent on parallel synthesis and PAR
tools. The problem is that this is difficult to add later; you have to
design it in from the start.
I found that the tool was just unable to properly place the logic. In
particular it had no idea how to place the interconnecting flip-flops
between the blocks. The modules I defined had very clean interfaces
(of up to 100 bits). P&R times were huge with very poor results.
I conclude that the idea is good but the current implementation is
unusable in my application.
We've found that Xilinx tools running on HP Unix aren't as fast
as those running on a (comparable ?) PC.
I think this is because the tools are written on PCs and ported
to Unix/workstation architecture.
Porting this port back to Linux/PC architecture again might have the
effect of slowing them even more.
Nial (speculating wildly).
> BTW, I see a significant speedup for large designs in Synplify if I
> use the syn_hier attribute, which stops optimisations across module
> boundaries.
How big a speedup? I have wondered about this but never had an
opportunity to measure it on truly large designs.
Presumably the gain increases as the design size goes up. When
syn_hier is asserted I would expect compilation time to be linear
with design size. Without this attribute, is compilation time
quadratic with size, or worse?
If the company coding style includes flops at the outputs of all logic
blocks I would not expect syn_hier to have much effect, though this
style can be pretty hard to enforce for the control-logic part of an
application.
Maybe Ken McElvain, or another Synplicity expert, can comment?
> Petter Gustad wrote:
> >
> > "Peter Ormsby" <faepete.d...@attbi.com> writes:
> >
> > > Lars Rzymianowicz <lar...@atoll-net.de> wrote in message
> > > news:3C85C76F...@atoll-net.de...
> > > > And since Xilinx and most other EDA vendors support Linux now, why
> > > > step down to the bluescreen?
> > >
> > > Just to be clear here, the Xilinx Linux solution is currently limited to
> > > command-line only on WINE. For those unfamiliar with this approach, check
> > [...snip...]
> >
> > Does anybody know why Xilinx did this rather than porting their
> > Solaris/HP-UX version to Linux?
> >
> > Petter
>
> We've found that Xilinx tools running on HP Unix aren't as fast
> as those running on a (comparable ?) PC.
I've found that the Xilinx tools running on some 5-6 year old Sun
machines are not as fast as those running on a 6-month-old PC. I
expect this is due to the slow machines.
Does anybody here have access to a 900MHz UltraSparc III style machine
to do a benchmark against a Windows PC?
> I think this is because the tools are written on PCs and ported
> to Unix/workstation architecture.
I heard some rumors that Xilinx developers use Linux as a development
platform internally, since it doesn't crash as frequently as Windows.
Actually, the command-line tools (ngdbuild, map, par, etc.) appear to
be UNIX-style programs; at least they use -options rather than
/options. The first time I used XACT was on a SPARC/SunOS platform.
> Porting this port back to Linux/PC architecture again might have the
> effect of slowing them even more.
Somebody (here?) reported a speed increase over Windows when they were
running under WINE. This would indicate the contrary, if there were no
measurement errors.
> Actually the command line tools (ngdbuild, map, par etc.) appears to
> be UNIX style programs, at least they use -options rather than
> /options. The first time I used XACT was on a SPARC/SunOS plattform.
The first time I used XACT was on an 8 MHz 8086, running MS-DOS 3.1.
If this was developed on UNIX, the boys and girls at Xilinx sure
knew a thing or two about writing portable code :-)
>Allan Herriman wrote
>
>> BTW, I see a significant speedup for large designs in Synplify if I
>> use the syn_hier attribute, which stops optimisations across module
>> boundaries.
>
>How big a speedup? I have wondered about this but never had an
>opportunity to measure it on truly large designs.
A single syn_hier attribute on one of the top-level modules took the time from 4
hours to about 2.5 hours for one particular design.
>Presumably the gain increases as the design size goes up. When
>syn_hier is asserted I would expect compilation time to be linear
>with design size. Without this attribute, is compilation time
>quadratic with size, or worse?
>
>If the company coding style include flops at the exit of all logic
>blocks I would not expect syn_hier to have much effect, though this
>style can be pretty hard to enforce for the control logic part of an
>application.
Synplify will "optimise" to a certain extent across flip-flops. It
will do things like fanout control and (optionally) pipelining, which
allows it to move combinatorial logic to the other side of a flip-flop
to improve the overall cycle time.
I always have pipelining turned off, though.
Regards,
Allan.
That must have been long before me then. It must have been on a
SparcStation II in the early 1990's.
Regards
Petter Gustad <newsma...@gustad.com> wrote in message news:<m3k7so7...@scimul.dolphinics.no>...