(Note that this sort of thing isn't specific to
Synplicity. Leonardo does a lot of wacky things too.
Sadly, Synplicity is probably the better tool.)
You'd think a simple 56-bit counter would be no
problem. The code below should synthesize to
1 logic level using 56 LUTs, 56 FFs and a carry chain.
Instead, Synplicity adds an extra 57 LUTs and an extra
level of logic.
module Cnt56(K, CE, R, Out);
input K, CE, R;
output [55:0] Out;
reg [55:0] Q;
assign Out= Q;
always @(posedge K) Q <= R ? 0 : CE ? Q+1 : Q;
// if (R) Q <= 0; // This doesn't work either
// else if (CE) Q <= Q+1;
endmodule
Why? Instead of running the R signal directly to the
reset pin of the flip flop, Synplicity merges it into
the equations for Q and CE. So we get:
module Cnt56(K, CE, R, Out);
input K, CE, R;
output [55:0] Out;
reg [55:0] Q;
assign Out= Q;
wire [55:0] Q_plus_1 = Q+1;
wire [55:0] Q_and_R = R ? 0 : Q_plus_1;
wire CE_or_R = CE | R;
always @(posedge K) if (CE_or_R) Q <= Q_and_R;
endmodule
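One way to confirm that the merged form costs area and speed but does not change behaviour is to simulate both descriptions side by side. The testbench below is only a sketch and is not from the original post: the module name cnt_equiv_tb, the random stimulus and the 200-cycle run length are assumptions. The reset mux is written as R ? 0 : Q+1, so that R clears the counter exactly as in the original description.

```verilog
`timescale 1ns/1ps
// Hypothetical self-checking testbench: compares the counter as written
// with the "merged" form (reset folded into the data and enable equations).
module cnt_equiv_tb;
  reg K = 0, CE = 0, R = 1;      // start with reset asserted
  reg [55:0] Qa, Qb;

  // Merged form: reset is part of the D input and of the clock enable.
  wire [55:0] Q_plus_1 = Qb + 1;
  wire [55:0] Q_and_R  = R ? 56'd0 : Q_plus_1;
  wire        CE_or_R  = CE | R;

  always @(posedge K) Qa <= R ? 56'd0 : CE ? Qa + 1 : Qa;  // as written
  always @(posedge K) if (CE_or_R) Qb <= Q_and_R;          // as merged

  always #5 K = ~K;              // free-running clock

  integer i;
  initial begin
    @(posedge K);                // first edge clears both counters
    for (i = 0; i < 200; i = i + 1) begin
      @(negedge K);              // change inputs away from the active edge
      CE = $random;
      R  = ($random % 8 == 0);   // occasional synchronous reset
      @(posedge K); #1;
      if (Qa !== Qb) begin
        $display("MISMATCH at %0t: %h vs %h", $time, Qa, Qb);
        $finish;
      end
    end
    $display("both forms agree over %0d cycles", i);
    $finish;
  end
endmodule
```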
Workaround? I guess I have to instantiate an array of
56 flip flops and connect the signals the correct way.
This is really ugly, and makes my design Xilinx-specific.
If I want to use an Altera chip, I have to re-write the
code.
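For illustration, the workaround might look something like the sketch below. It assumes the Xilinx unisim FDRE primitive (D flip-flop with clock enable and synchronous reset, ports Q, C, CE, R, D) and a tool that accepts Verilog-2001 generate loops; the module name Cnt56_fdre is made up for this example. The increment is left behavioural, and nothing in this thread confirms how any particular synthesizer maps it.

```verilog
// Sketch only: counter with explicitly instantiated Xilinx flip-flops so
// that R goes to the dedicated synchronous reset pin of each FF.
module Cnt56_fdre(K, CE, R, Out);
  input K, CE, R;
  output [55:0] Out;

  wire [55:0] Q;
  wire [55:0] D = Q + 1;         // adder; expected to use the carry chain
  assign Out = Q;

  genvar i;
  generate
    for (i = 0; i < 56; i = i + 1) begin : ff
      // FDRE: R has priority over CE, matching  R ? 0 : CE ? Q+1 : Q
      FDRE ff_i (.Q(Q[i]), .C(K), .CE(CE), .R(R), .D(D[i]));
    end
  endgenerate
endmodule
```

As the post says, this ties the source to one vendor: a different FPGA family would need a different primitive in place of FDRE.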
Which brings me to my point:
High level, abstract synthesis will never work well. I realize
this is an extreme statement, but if today's synthesizers can't
do better than a factor of 2 for really simple code, it's
hard to imagine a synthesizer in the near future that can
compile complex code efficiently.
Don Husby wrote:
On the specific point you are right in that Synplify doesn't appear to
be using resources efficiently in this case. Esp since it does the reset
function by first inverting the reset input and then feeding the
inverted value into the adder LUTs - missing a clear opportunity to
optimise. The MAP program may do this for you later on.
A work-around is to use an async reset.
On the more general case IMO you are missing some of the idea behind
synthesis from HDLs. This is to get the results you need from the most
portable/maintainable/reuseable/retargetable source format. This is very
much in the spirit of the `C' vs. Assembler debate of long ago. Memory
got bigger and cheaper, uPs got faster, and compilers got better =>
Assembler went Dodo for all but a handful of special cases. Similarly
FPGA and ASICs are getting bigger and faster so the inefficiencies of
synthesised code will become less important [synth tools are getting
better as well, as are FPGA/ASIC P&R tools].
Again IMO the question you need to ask is whether the synth'ed result
meets your needs in terms of speed/timing. If not then the second level
question is whether, in a time2market dominated industry, going up a
speed grade is better than spending a lot of time hand-tuning the logic.
It's always possible to hand-craft technology-specific logic that goes
faster - the question is whether it's worth it. YMMV.
A.
"Rick Filipkiewicz" <ri...@algor.co.uk> wrote in message
news:3BC4C920...@algor.co.uk...
>
>
> Don Husby wrote:
>
> > More wacky synthesis results from Synplicity:
> > module Cnt56(K, CE, R, Out);
> > input K, CE, R;
> > output [55:0] Out;
> > reg [55:0] Q;
> > assign Out= Q;
> > always @(posedge K) Q <= R ? 0 : CE ? Q+1 : Q;
> > // if (R) Q <= 0; // This doesn't work either
> > // else if (CE) Q <= Q+1;
> > endmodule
I don't know that much about Verilog, but what I see in your
code is (in my opinion :) ) a synchronous reset, which of course cannot
be fed directly to the asynchronous reset inputs of the flip-flops.
Furthermore, we cannot implement such a huge counter with only 56 LUTs.
Normally, LUTs are, say, 4-input memory blocks, in which case we can do
any logical function of four inputs and one output. However, the highest
bit of the 56-bit counter needs the 55 lower bits in its enable input
plus the one used for enabling the actual counter. This is true for all
the bits, which increases the number of LUTs significantly. I would bet
that an 8-bit counter of the same type would consume about 12 LUTs (without
knowing the target device).
So the counter is actually very optimized; it's the designer who is not
in this case :) Furthermore, counters synthesize very efficiently in
today's tools, if used properly (the HDL code is clear). In any
commercial product I would advise constructing all counters from smaller
counters, say 4-bit counters. It eases the final production testing.
cya,
juza
A.
P.S. Since we started on coding style: ?: is outlawed in C these days - it
should probably go the same way in Verilog!
"Lähteenmäki Jussi" <ju...@cc.tut.fi> wrote in message
news:9q3saf$hqh$1...@news.cc.tut.fi...
> Don Husby <hus...@yahoo.com> wrote:
>
> I don't know that much about Verilog, but what I see in your
> code is (in my opinion :) ) a synchronous reset, which of course cannot
> be fed directly to the asynchronous reset inputs of the flip-flops.
>
Andrew Brown wrote:
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email r...@andraka.com
http://www.andraka.com
"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
"Ray Andraka" <r...@andraka.com> wrote in message
news:3BC5A521...@andraka.com...
>If you have a global reset in your design, Synplicity won't also use the reset
>input to the FF because it doesn't recognize the global reset as being a hidden
>dedicated net. I still have not found a satisfactory workaround for this.
Hi Ray,
It used to recognise the global reset. This led to a problem with
6.0.0 in that it would remove the redundant reset from the EDIF which
in turn meant that it didn't retain the reset value (0 or 1) that you
had asked for. The back end tools would substitute the default (0).
Bad luck if you wanted 1.
AFAIK I was the first user to report this bug (and it took some weeks
to convince them that it was a bug). I wonder if the fix in more
recent versions of Synplify involved removing the ability to recognise
the use of GSR?
Regards,
Allan.
It would have been somewhat impressive if it managed to work
the reset function into the adder LUTs. At least this wouldn't
have cost anything in terms of size. Instead, it inserted
the reset logic after the adder carry chain. The mapper isn't
(and shouldn't be) smart enough to fix this.
> A work-around is to use an async reset.
It would have taken me longer to restructure my logic
to use an async reset than it would to simply instantiate
the 56 flip-flops. Besides, async resets bring bad luck. :)
> On the more general case IMO you are missing some of the idea behind
> synthesis from HDLs. This is to get the results you need from the most
> portable/maintainable/reuseable/retargetable source format.
I'm not missing the point. I'd dearly like to believe the
dogma, but time and time again, I've found that it doesn't
even pay to try it. The reality is that the design entry
is about 20% of the job for a high speed FPGA design.
Far more time is spent trying to figure out how to convince the
tools to implement it sensibly. Usually this involves instantiating
technology-specific library components. I've found that
the fastest way to do a design is to throw away any illusions
of portability, etc., and code in a way that allows good control
of mapping and placement if needed later. And it WILL be needed
for some small percentage of the design.
(If you doubt the 20% number, take the example I gave. It takes
less than a minute to describe a 56-bit counter. It takes tens
of minutes to compile the design, find out that the counter isn't
meeting spec, push into the chip editor or floorplanner
to find out why, and then hack at the code to fix it. Of course
this doesn't count the extra time spent whining about it on
comp.arch.fpga)
> This is very much in the spirit of the `C' vs. Assembler
> debate of long ago.
It's similar, but not the same thing. For an FPGA design,
there is usually a hard performance limit. There is a fixed
clock rate, and a fixed number of resources. When the design
violates those limits, it doesn't work. Software has soft
limits. It has nearly unlimited virtual memory, and time
limited only by the user's patience. A factor of 2 performance
hit doesn't break the software.
I think it is a basic problem that stems from the primitive models: since there is
no global reset pin on them, the global reset can't be shown as a net in the HDL.
The workaround has been to put the GSR into the reset pin on the FFs, which works
only if your local reset is also async (the synthesis ORs them).
I don't know much about anything, but what I see in your answer
(in my opinion :) ) is that you don't know much about common current
FPGAs.
While Don did not mention which FPGAs he is targeting, almost all
FPGAs with 4 input LUTs have something called Carry-Logic, which
means that they can implement a counter at a cost of 1 LUT per bit.
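To make the one-LUT-per-bit claim concrete, here is a behavioural sketch of how an incrementer decomposes onto a carry chain. It is an illustration rather than anything posted in the thread: the module name CntCarrySketch and the signal C are invented, and the mapping described in the comments (dedicated carry mux and XOR beside each LUT, as in Virtex-class slices) is the assumed target. Each bit toggles when all lower bits are 1, and that "all lower bits are 1" term ripples along the carry chain instead of being rebuilt from wide LUT logic at every bit.

```verilog
// Sketch: a counter written bit by bit to expose the carry chain.
// Per bit: sum = Q[i] ^ carry_in, carry_out = Q[i] & carry_in.
module CntCarrySketch(K, CE, R, Out);
  parameter N = 56;
  input K, CE, R;
  output [N-1:0] Out;

  reg  [N-1:0] Q;
  wire [N:0]   C;                // C[i] = carry into bit i
  assign Out  = Q;
  assign C[0] = 1'b1;            // incrementing by one

  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : slice
      // Carry out: would map to the dedicated carry mux, not to a LUT.
      assign C[i+1] = Q[i] & C[i];
      // Sum: dedicated XOR feeding the flip-flop; R and CE use FF pins.
      always @(posedge K)
        if (R)       Q[i] <= 1'b0;
        else if (CE) Q[i] <= Q[i] ^ C[i];
    end
  endgenerate
endmodule
```

Written this way, the highest bit does not need a 55-input enable function: it only sees its own Q and one carry input.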
In Xilinx Virtex, Virtex-E, Virtex-EM, Virtex-II, and Spartan-II parts, all
the flipflops include synchronous or asynchronous reset capability.
Your following recommendation of cutting the counter up into little 4
bit pieces might make sense if you were still playing with 74161
type devices, but it has not been true of RAM based FPGAs ever!
(May still be appropriate for antifuse FPGAs, but no-one uses them).
(in my opinion :) )
Philip
>So the counter is actually very optimized; it's the designer who is not
>in this case :) Furthermore, counters synthesize very efficiently in
>today's tools, if used properly (the HDL code is clear). In any
>commercial product I would advise to construct all counters from smaller
>counters, say 4-bit counters. It eases the final production testing.
>
>cya,
>juza
Philip Freidin
Fliptronics
Sadly, you are right. BUT my inner voice says
DON'T LET THAT SOFTWARE CRAP BE YOUR GUIDE.
I mean, when "Hello World" takes 100 KB in C++, THIS IS REALLY CRAP.
I won't complain about a few LUTs or FFs wasted, and for many designs the
speed limit of the FPGA is far higher than required, but in general I
would like to do good, fast and slim designs. Not this loser C++ crap
;-)
--
MFG
Falk
Simply synchronize it, so it's synchronous again but uses the advantages of
the asynchronous inputs.
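One possible reading of this suggestion, sketched under assumptions: register R once on the clock and drive the flip-flops' asynchronous clear from the registered copy. The names Cnt56_sync_to_async and R_sync are invented here. The counter still clears just after the clock edge that samples R high, but counting resumes one clock later than with a true synchronous reset, so this is not a drop-in replacement.

```verilog
// Sketch: reset synchronized to K, then applied through the async clear.
module Cnt56_sync_to_async(K, CE, R, Out);
  input K, CE, R;
  output [55:0] Out;

  reg [55:0] Q;
  reg        R_sync;             // R sampled on the clock
  assign Out = Q;

  // R_sync only changes right after a rising edge of K.
  always @(posedge K) R_sync <= R;

  // Asynchronous-reset template, driven by the synchronized reset.
  always @(posedge K or posedge R_sync)
    if (R_sync)  Q <= 0;
    else if (CE) Q <= Q + 1;
endmodule
```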
--
MFG
Falk
If C++ or VHDL allows a good, fast, and slim - and solid, reusable -
specification of your design at that high level, that may be worth a
hundred times over any wasted gates and slow speed in the result, so
long as its implementation performance is adequate. In that sense then,
your design continues to be good, fast and slim, probably more so than
ever before.
Specifically on C++, it really is worth pursuing the tenets of that
faith before writing it off - it may change your view of things.
However, if you take heed or not, best of luck!
|Sadly, you are right. BUT my inner voice says
|
|DON'T LET THAT SOFTWARE CRAP BE YOUR GUIDE.
|
|I mean, when "Hello World" takes 100 KB in C++, THIS IS REALLY CRAP.
|I won't complain about a few LUTs or FFs wasted, and for many designs the
|speed limit of the FPGA is far higher than required, but in general I
|would like to do good, fast and slim designs. Not this loser C++ crap
|;-)
|
|--
|MFG
|Falk
Philip Freidin <phi...@fliptronics.com> wrote in message news:<9vjbst4i5ma5f3jn0...@4ax.com>...
The tool does precisely what you ask for - it implements a synchronous
reset since R is not in the sensitivity list.
If you need to reset the counter asynchronously, you must use "an
always statement whose event list contains edge events representing
the clock and asynchronous control variables".
Try this.
module Cnt56(K, CE, R, Out);
input K, CE, R;
output [55:0] Out;
reg [55:0] Q;
assign Out= Q;
always @(posedge K or posedge R)
if (R) Q <= 0;
else if (CE) Q <= Q+1;
endmodule
I do not have Synplicity, but with Leonardo it works perfectly well.
And the global reset is another issue. A tool may recognize it or not,
depending on the code and the tool capabilities. So it may be either
the global (chip wide) reset, or just an asynchronous reset, which can
be assigned to an IO pin, or used locally.
Regards,
Vitaliy
Apparently, more than 1 person is under the impression that I wanted
an asynchronous reset. What I wanted (and what I got) was a
synchronous reset. Unfortunately, the synchronous reset was
implemented using 57 LUTs and an extra level of logic instead of
simply routing the reset signal to the synchronous reset input of the
flip flops.
Sorry I didn't make this clear the first time.
You can't be seriously saying "Hello World" is an appropriate use of C++
methodology. Yeah - so it takes a lot to write a very small program, but
large programs are easier to develop and maintain. Have you ever tried writing a
flop in VHDL? Far too much typing for the application - Verilog every time.
But VHDL is a very structured language which should make larger designs
easier. (Let's not start the VHDL/Verilog war here.)
A.
Andrew Brown wrote:
> This is the sort of case that should be fed directly to synplicity to allow
> them to improve their tools. If we don't tell them about it how can they
> fix it!
>
> A.
The problem here is that
o post IPO, Synplicity's support and response time has gone to the dogs.
They're spending their time on a fool's chase after the ASIC synth market.
o Unlike Xilinx there is no publicly accessible ``bug list''.
o I have had a case outstanding for a long time on a related issue where
``register replication'' works or doesn't depending on the type of set/reset
specified. Synplicity have admitted it's a bug but I cannot get the slightest
info as to when it will be fixed.
I think I've just given up.
A.
"Rick Filipkiewicz" <ri...@algor.co.uk> wrote in message
news:3BC6C30D...@algor.co.uk...
That wouldn't be embedded software then? Much more like FPGA design,
hard real-time limit - very real limits on code and data space... a
factor of 10% can break a highly tuned embedded software solution.
Cheers,
Martin
--
martin.j...@trw.com
TRW Automotive Technical Centre, Solihull, UK
The correct behavior, IMHO, would be for the synthesis to recognize an
explicit global reset, remove it from the netlist, but then put the proper
INIT= attributes on the inferred FFs to properly initialize those that are
set by global reset. The global reset has to be there for the simulation
to match the hardware. The base of this problem is that the global reset
is an invisible net (i.e. it is not on the primitives), which makes it
impossible to correctly simulate without artifices built around the pins
that are there.
Don Husby wrote:
--
Anyone from synplify out there???
A.
What Synplify version/ FPGA / target frequency are you using?
I've seen results like this before, where Synplify seems to
"try too hard" to hit a speed target, replacing a simpler
implementation with a more complex one as you pass a certain
frequency constraint, particularly when long carry chains are
involved.
Taking your counter code:
`define CNT_MSB 55
module Cnt56(K, CE, R, Out);
input K, CE, R;
output [`CNT_MSB:0] Out;
reg [`CNT_MSB:0] Q;
assign Out= Q;
always @(posedge K) Q <= R ? 0 : CE ? Q+1 : Q;
endmodule
And tweaking counter size/target frequency, gives:
            Synplify     Synplify
CNT_MSB     Frequency    LUT count
__________________________________________
55          77           57
55          78           110
31          95           33
31          96           46
31          122          46
31          123          83
when using Synplify 6.2.4 & XCV600E-6.
Below a certain target frequency, it does hit about 1-LUT per
bit, albeit without using the sync. reset.
You could probably fake Synplify out by putting a dummy
frequency or multicycle constraint on the counter, then nuke
it from the .ncf file before running the back-end tools.
Brian
Ken McElvain reads this list, at least sometimes...
Rene
--
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
No. I've given up trying to use GSR, except for post-PAR
simulations.
Although GSR is a completely different issue, it seems
to me that it would almost be trivial to implement it
using the syntax for variable initialization. VHDL allows
a signal (register) to be initialized when declared. Verilog
has the "initial" construct. It should be fairly easy to
infer GSR from these, and it would make simulation agree
with synthesis.
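As a sketch of what this proposal would look like from the designer's side: the counter below has no reset port at all and states its power-up value with an "initial" block. The module name Cnt56_init and the start value are hypothetical, and the thread does not say whether any synthesizer would actually turn this into a GSR/INIT value. This only shows the syntax being proposed.

```verilog
// Sketch: power-up value declared with "initial" instead of a reset net.
module Cnt56_init(K, CE, Out);
  input K, CE;
  output [55:0] Out;

  reg [55:0] Q;
  assign Out = Q;

  // Simulators honour this at time 0; the proposal is that synthesis
  // would infer the matching power-up (GSR) state from it.
  initial Q = 56'h1;

  always @(posedge K) if (CE) Q <= Q + 1;
endmodule
```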
We do appreciate getting small examples that demonstrate potential
improvements. In general there is no need to go to the effort
to produce the structural form. All we need is a short description
of what we missed - "You should have used the synch reset instead of
building it in logic in front of the flip-flop".
We have a large queue of improvements sorted by a function of how
common they are, how much gain we are likely to get and how hard they
are to implement. This doesn't mean that you shouldn't bother to tell
us about your desired improvement. Even if we already have it in our
queue, you are bumping our notion of how common the problem is.
The best way to send us such a test case is via email at
sup...@synplicity.com
Please put both "QOR" and the target FPGA in the subject to
help us route it. The FPGA synthesis development team is pretty large.
Thanks,
Ken McElvain, CTO
> I mean, when the "Hello World" takes 100 Kb in C++, THIS IS REALLY CRAP.
Aw, c'mon. You're talking about what happens when you use Visual C++ to
write a WINDOWS version of "Hello, World."
Write a short hello.cpp for your linux box, that runs on the command
line, and tell me how big it is.
-andy
Jan Gray, Gray Research LLC
(ex VC++ dev)
Ken McElvain wrote:
> We are here...
>
> We do appreciate getting small examples that demonstrate potential
> improvements. In general there is no need to go to the effort
> to produce the structural form. All we need is a short description
> of what we missed - "You should have used the synch reset instead of
> building it in logic in front of the flip-flop".
>
> We have a large queue of improvements sorted by a function of how
> common they are, how much gain we are likely to get and how hard they
> are to implement. This doesn't mean that you shouldn't bother to tell
> us about your desired improvement. Even if we already have it in our
> queue, you are bumping our notion of how common the problem is.
>
> The best way to send us such a test case is via email at
>
> sup...@synplicity.com
>
> Please put both "QOR" and the target FPGA in the subject to
> help us route it. The FPGA synthesis development team is pretty large.
>
> Thanks,
> Ken McElvain, CTO
>
> Andrew Brown wrote:
>
>
Ken,
The problem here is that there is no feedback.
In other words what would help is a publicly accessible database of
problems/inefficiencies and their workarounds/solutions like the Xilinx answers
DB. It sometimes seems that when I (maybe others as well) send in a bug report
it vanishes into a vacuum. I get an acknowledgement that there's a
problem and that's it. It gets worse when, as in a recent test case of mine, I
send in what appears to be a Xilinx MAP problem only to get told by Xilinx that
it's really a synthesis problem and it's been passed over to Synplicity.
This needs to be combined with a listing in the release notes for each version
of all issues (bugs + improvements) that have been fixed. ModelSim is the
paragon here.
I do appreciate that HDL synthesis is complex and there's always the
possibility that fixing one thing will break something else or that my ``bug''
may be unique to me; stemming perhaps from pushing the boundaries of
synthesisability. That information is, in itself, very valuable.
IMO Synplify is the best of the bunch - at least for Xilinx parts - but with
some attention to the issues above you could win the engineer's ultimate
accolade `Synplify ? Great tool, low hassle'.
As a test case that's relevant to this thread since the problem relates to the
reset type - async/sync:
What's the status of bug report #33437 (register replication) ?
No.
> But VHDL is a very structured language which should make larger designs
Yes.
> easier. (Let's not start the VHDL/Verilog war here.)
;-)
--
MFG
Falk
Ahhh, yes ;-)
> Write a short hello.cpp for your linux box, that runs on the command
> line, and tell me how big it is.
I aint got linux, yet ;-)
But using TurboPascal, it takes 4k.
--
MFG
Falk
then your compiler has built-in HelloWorld optimizer!!
utku
heh. funny. i use linux all day long, but let me clear this up. you go ahead
and make your hello world app, but link it *static* and you won't see a nice
little file. you'll be lucky to see hello world take less than 1-2MB.
if you link it dynamic, it doesn't make it any smaller, it just means everyone
gets that one copy of the libc. the linking system on linux/unix leaves much
to be desired. (i remember those days in the 80's of turbo pascal... before
kernel hacking... fun.)
Rick Filipkiewicz wrote:
>
> In other words what would help is a publicly accessible database of
> problems/inefficiencies and their workarounds/solutions like the Xilinx answers
> DB. It sometimes seems that when I (maybe others as well) send in a bug report
> it vanishes into a vacuum. I get an acknowledgement that there's a
> problem and that's it. It gets worse when, as in a recent test case of mine, I
> send in what appears to be a Xilinx MAP problem only to get told by Xilinx that
> it's really a synthesis problem and it's been passed over to Synplicity.
The concept is good and the economic reasons to go this route increase
with the size of the customer base. It costs a lot to put out a
sanitized bug data base that deletes any reference to customer
information that might be sensitive, plus merging duplicate problem
reports, plus editing the info so you can understand it (Notations in
our internal bug system can become somewhat cryptic). For now we
are relying on direct contact with our customer support engineers via
phone or email.
>
> This needs to be combined with a listing in the release notes for each version
> of all issues (bugs + improvements) that have been fixed. ModelSim is the
> paragon here.
>
> I do appreciate that HDL synthesis is complex and there's always the
> possibility that fixing one thing will break something else or that my ``bug''
> may be unique to me; stemming perhaps from pushing the boundaries of
> synthesisability. That information is, in itself, very valuable.
>
> IMO Synplify is the best of the bunch - at least for Xilinx parts - but with
> some attention to the issues above you could win the engineer's ultimate
> accolade `Synplify ? Great tool, low hassle'.
>
> As a test case that's relevant to this thread since the problem relates to the
> reset type - async/sync:
>
> What's the status of bug report #33437 (register replication) ?
>
My understanding is that this was about a failure to replicate
registers with sync resets to improve timing. The fix has been
made and will be included in our next major release (after 7.0).
Here we go...
#include <iostream.h>
int main(void)
{
cout << "Hello World.\n";
}
c++ -o hello++ hello.cpp; strip hello++
-rwxr-xr-x 1 mergler users 3516 Nov 2 14:59 hello++
< 3.5 KB
#include <stdio.h>
int main(void)
{
printf("Hello World.\n");
}
gcc -o hello hello.c; strip hello
-rwxr-xr-x 1 mergler users 3016 Nov 2 15:00 hello
< 3 KB
Iwo
You're cheating. If you do a "file hello" you will see that it's
dynamically linked. Most of the code resides in libraries which are
linked in at run time. Try to add a -static to your compile command
line and check the size of the executable then...
Petter
--
________________________________________________________________________
Petter Gustad 8'h2B | (~8'h2B) - Hamlet in Verilog http://gustad.com
Okay - so the code on the boot sector of a floppy which uses int 10 (I
think) to write is only 512 bytes, int 10 isn't that big either (a few K).
It writes to the screen and could easily write hello world.
The code IS small.
If you want to link in several meg of crap that's up to you.
I didn't want to link in several megs of crap. I just wanted to point
out that the size of any dynamically linked executable can be made
very small, simply by moving most of the code into the library.