Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

embedded forth

Viewed 1,440 times

Dale Scott

unread,
21 Jan 2017, 14:27:07
to
Hi all, I'm being lured by the siren song of Forth for an embedded system data logger project. It's being prototyped using the MSP432 launchpad platform and has thirty-two I2C sensors, receiving ~300 Kbps of data each on all four I2C ports, and streams the data to an SD card using SD over SPI. The basic platform could have application in quite a few projects, and so I thought Forth might be the perfect development environment.
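[A back-of-envelope check on those numbers, as a sketch; it assumes "~300Kbps each" means per I2C port (the wording is ambiguous between per-port and per-sensor) and that the buses run I2C fast mode at 400 kHz:]

```c
#include <stdint.h>

/* Rough data budget for the logger described above.  All figures are
   assumptions taken from the post, not measurements. */

static int total_kbps(int ports, int kbps_per_port) {
    return ports * kbps_per_port;          /* aggregate into the MCU */
}

static int sd_bytes_per_sec(int kbps) {
    return kbps * 1000 / 8;                /* sustained SD write rate */
}

static int bus_load_pct(int kbps_per_port, int bus_khz) {
    return 100 * kbps_per_port / bus_khz;  /* share of the raw bit rate */
}
```

[With 4 ports at 300 kbit/s that is 1200 kbit/s total, i.e. 150 KB/s to the SD card, which SPI handles comfortably; but 300/400 = 75% of the raw fast-mode bit rate per bus, before address/ACK overhead, is tight and worth measuring early.]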

I've been trying to install Mecrisp on a launchpad, but seem to be having tool version problems (an old version of DSLite is needed to load the mecrisp-stellaris bin file, but current XDS110 debug probe firmware no longer seems to support the old version of DSLite). So now I'm looking for where the Mecrisp community (and msp432 sub-community) hang out.

Is this the correct forum for mecrisp forth users? What about MSP432 mecrisp users? (43.oh? Element14?)

Does mecrisp seem a reasonable solution for what I'm trying to achieve?

Thanks!

Dale

Paul Rubin

unread,
21 Jan 2017, 14:31:57
to
Dale Scott <dalesc...@gmail.com> writes:
> Is this the correct forum for mecrisp forth users? What about MSP432
> mecrisp users? (43.oh? Element14?)

This is the best place for all things Forth.

> Does mecrisp seem a reasonable solution for what I'm trying to achieve?

I'd say yes, if you want to use Forth, either because Forth is
interesting or because you're familiar with Forth already. If you're
not used to Forth, using it effectively has a learning curve despite the
language itself being very simple.

ste...@mpeforth.com

unread,
22 Jan 2017, 3:52:02
to
On Saturday, 21 January 2017 20:27:07 UTC+1, Dale Scott wrote:
> Hi all, I'm being lured by the siren song of Forth for an embedded system data
> logger project. It's being prototyped using the MSP432 launchpad platform and
> has thirty-two I2C sensors, receiving ~300Kbps of data each on all four i2c ports,
> and streams the data to an SD card using SD over SPI. The basic platform could
> have application in quite a few projects, and so I thought Forth might be the perfect
> development environment.

If someone is paying wages to get this done, you should really consider using one
the professional tool chains from MPE (http://www.mpeforth.com) or Forth Inc
(http://www.forth.com). The price differences will get lost in the noise of the
project. The use of better tool chains usually saves money in the end.

Stephen
P.S. I'm the managing director at MPE.

Anton Ertl

unread,
22 Jan 2017, 12:13:32
to
Dale Scott <dalesc...@gmail.com> writes:
>Is this the correct forum for mecrisp forth users? What about MSP432
>mecrisp users? (43.oh? Element14?)

Mecrisp is on-topic here. The Mecrisp docs may document the preferred
places for discussions, support requests etc.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

Dale Scott

unread,
22 Jan 2017, 12:43:16
to
Thanks Stephen and Anton for your information. How difficult is it to work with new I2C and SPI hardware devices in Forth? "Difficult" is a loaded question, but are there any generalities? Do protocol stacks exist for I2C? SPI? In general, how would one develop a low-level I2C or SPI device driver in Forth compared to C?

Thanks,
Dale

P.S. At the moment the work is sweat equity, I'm working on a feasibility prototype for a provisional patent a colleague has filed, so $$'s for pro-tools is limited at the moment (but hopefully not forever ;-)).

rickman

unread,
22 Jan 2017, 13:44:22
to
I've worked with Mecrisp, but it was on an ARM based launchpad. I
recall I had trouble loading the download driver as TI never bothered to
get it signed, so installing it required jumping through all manner of
hoops. Instead I used a Raspberry Pi where the appropriate driver
worked just fine.

I ended up controlling the target board remotely from my PC which worked
ok until I needed to reset it. Then I used sneaker net.

I did have trouble figuring out how to use a terminal emulator on the
Pi. It seems a plain command window is actually a "terminal", and the
support for terminal emulators is very crude. The programs I found did
not work as documented and I had to figure out ways around the issues.
It's been a couple of years, so I don't recall many of the details.

--

Rick C

ste...@mpeforth.com

unread,
22 Jan 2017, 15:19:17
to
On Sunday, 22 January 2017 18:43:16 UTC+1, Dale Scott wrote:
> Thanks Stephen and Anton for your information. How difficult is it to work with new
> I2C and SPI hardware devices in Forth? "Difficult" is a loaded question but are there
> any generalities? Do protocol stacks exist for I2C? SPI? In general, how would
> one develop a low-level I2C or SPI device driver in Forth compared to C?

You would write some code, and compile and test interactively. It's the
interactivity that matters. Until you've done it you won't believe it.

To assume a protocol stack for something as simple as I2C and SPI is to
be a bit grandiose - it stems from operating systems not having direct hardware
access. Once you have direct hardware access the world changes. MPE's I2C
and SPI code uses a common set of words regardless of the hardware. For I2C
we often use a simple bit-banged driver that just needs two GPIO pins.
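[Such a two-GPIO bit-banged master is only a handful of primitives in any language. Here is a C sketch of the write path; the gpio functions are stand-ins (timing delays, the ACK clock, and open-drain handling are omitted), and they drive a tiny recorder here so the bit logic can be checked on a PC:]

```c
#include <stdint.h>

static int sda_level = 1, scl_level = 1;   /* simulated bus lines */
static uint8_t captured; static int nbits; /* what a slave would see */

static void set_sda(int v) { sda_level = v; }
static void set_scl(int v) {
    /* sample SDA on the SCL rising edge, as a real slave does */
    if (v && !scl_level && nbits < 8) {
        captured = (uint8_t)((captured << 1) | sda_level);
        nbits++;
    }
    scl_level = v;
}

/* START: SDA falls while SCL is high; STOP: SDA rises while SCL is high */
static void i2c_start(void) { set_sda(1); set_scl(1); set_sda(0); set_scl(0); }
static void i2c_stop(void)  { set_sda(0); set_scl(1); set_sda(1); }

/* Clock out one byte, MSB first; a real driver would then release SDA
   and read the ACK bit on a ninth clock. */
static void i2c_write_byte(uint8_t b) {
    for (int i = 7; i >= 0; i--) {
        set_sda((b >> i) & 1);
        set_scl(1);
        set_scl(0);
    }
}
```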

> P.S. At the moment the work is sweat equity, I'm working on a feasibility
> prototype for a provisional patent a colleague has filed, so $$'s for pro-tools
> is limited at the moment (but hopefully not forever ;-)).

Translates to: we'll pay for hardware but not for software. The downside of this approach
is that you'll waste a vast amount of time writing code that is already present in the
commercial tools. You really want to implement all the SD commands for SPI, debug them
and then test on a range of cards? What next, a FAT file system ...
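[To give a taste of what "implement all the SD commands" entails: every SPI-mode command is a 6-byte frame carrying a CRC7 (polynomial x^7 + x^3 + 1). A sketch of just the framing, with the well-known check values from the SD simplified spec (CMD0's frame ends in 0x95, CMD8 with argument 0x1AA ends in 0x87); real init also needs CMD8/ACMD41 polling, timeouts, and per-card quirks:]

```c
#include <stdint.h>

/* CRC7 over n bytes, MSB first, polynomial x^7 + x^3 + 1 (0x09). */
static uint8_t crc7(const uint8_t *p, int n) {
    uint8_t crc = 0;
    for (int i = 0; i < n; i++) {
        uint8_t d = p[i];
        for (int b = 0; b < 8; b++) {
            crc <<= 1;
            if ((d & 0x80) ^ (crc & 0x80)) crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7f;
}

/* Build the 6-byte SPI-mode frame: 0x40|cmd, 32-bit big-endian
   argument, then CRC7 shifted left with the end bit set. */
static void sd_frame(uint8_t cmd, uint32_t arg, uint8_t out[6]) {
    out[0] = (uint8_t)(0x40 | cmd);
    out[1] = (uint8_t)(arg >> 24); out[2] = (uint8_t)(arg >> 16);
    out[3] = (uint8_t)(arg >> 8);  out[4] = (uint8_t)arg;
    out[5] = (uint8_t)((crc7(out, 5) << 1) | 1);
}
```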

Stephen

Albert van der Horst

unread,
22 Jan 2017, 15:26:19
to
In article <e20554d8-a66d-4a9d...@googlegroups.com>,
Dale Scott <dalesc...@gmail.com> wrote:
>Thanks Stephen and Anton for your information. How difficult is it to work
>with new I2C and SPI hardware devices in Forth? "Difficult" is a loaded question ...

I2C and SPI are not related to Forth but to the particular operating system
or microcontroller you're using. You cannot believe how different I2C
is between the 8051, Atmel, TI MSP430 and the several hundred microcontrollers
neither you nor I have ever heard of.

Almost all programs have been written in a Forth version, but that doesn't
mean you can find them.

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

john

unread,
22 Jan 2017, 16:04:36
to
In article <e20554d8-a66d-4a9d...@googlegroups.com>,
dalesc...@gmail.com says...
>
> P.S. At the moment the work is sweat equity, I'm working on a feasibility prototype for a provisional patent a colleague has filed, so $$'s for pro-tools is limited at the moment (but hopefully not forever ;-)).
>
>

Take stephen's advice and find the funds.


--

john

=========================
http://johntech.co.uk

"Bleeding Edge Forum"
http://johntech.co.uk/forum/

=========================

Paul Rubin

unread,
22 Jan 2017, 16:38:41
to
rickman <gnu...@gmail.com> writes:
> On 1/21/2017 2:27 PM, Dale Scott wrote:
>> It's being prototyped using the MSP432 launchpad platform
> I've worked with Mecrisp, but it was on an ARM based launchpad.

The MSP432 is an ARM based line, whose name resembles the older 16-bit
MSP430 line, presumably to call attention to its low-power emphasis.
The MSP432 launchpad is a nice little board for $12.99 USD

http://www.ti.com/tool/msp-exp432p401r

A 48 MHz Cortex M4F with 256K flash, 64K RAM, a 24-channel ADC and other
I/O galore, all supposedly running at 80 uA/MHz. Impressive.

I agree with Stephen that Dale will likely have an easier time with
MPEforth than Mecrisp, and I'm sympathetic to "Translates to: we'll pay
for hardware but not for software." In reality though, the software is
an OOM more expensive than the hardware so there's a higher entry
barrier.

ste...@mpeforth.com

unread,
22 Jan 2017, 17:08:48
to
Dale said

> P.S. At the moment the work is sweat equity, I'm working on a feasibility
> prototype for a provisional patent a colleague has filed, so $$'s for
> pro-tools is limited at the moment (but hopefully not forever ;-)).

I don't normally wear a hat as cynical as this, but ...

Sweat equity means that you will make a profit later. You have already
chosen your side of the time/money pair. How are you going to pay the
patent lawyer? Any commercial software tool costs are peanuts compared
to the lawyers' fees.

I won't be so grumpy when winter is over.

Stephen

Elizabeth D. Rather

unread,
22 Jan 2017, 17:30:25
to
On 1/22/17 11:04 AM, john wrote:
> In article <e20554d8-a66d-4a9d...@googlegroups.com>,
> dalesc...@gmail.com says...
>>
>> P.S. At the moment the work is sweat equity, I'm working on a feasibility prototype for a provisional patent a colleague has filed, so $$'s for pro-tools is limited at the moment (but hopefully not forever ;-)).
>>
>>
>
> Take stephen's advice and find the funds.

Stephen is right. I'm speaking for FORTH, Inc. here, although I'm now
retired and no longer actually work for them. I recommend that you get
the free evaluation version of SwiftX (FORTH, Inc.'s cross-development
system for embedded controllers), as well as sampling MPE's products. At
the very least you'll get some very detailed documentation as well as a
lot of source code that will answer a lot of your questions and give you
an opportunity to practice your Forth skills and judge the ease of use
of the respective development chains.

Check out FORTH, Inc. at www.forth.com.

Cheers,
Elizabeth

--
Elizabeth R. Conklin
FORTH, Inc.
6080 Center Drive, Suite 600
Los Angeles, CA 90045
USA

Paul Rubin

unread,
22 Jan 2017, 17:55:35
to
ste...@mpeforth.com writes:
> Sweat equity means that you will make a profit later. You have already
> chosen your side of the time/money pair. How are you going to pay the
> patent lawyer? Any commercial software tool costs are peanuts compared
> to the lawyers' fees.

The patent lawyer only gets paid if the project produces promising
results that look worth investing money in. That's much later than the
beginning of development. Commercial software tools require spending
the money BEFORE finding out whether the idea is worthwhile.

Free software and cheap hardware (think of Launchpad/RPi/Arduino-priced
boards knocking out the $200 development boards from earlier) are
attractive partly because you can get an idea and start hacking without
any significant financial or personal commitment. You then make some
progress or perhaps fail to make any, and THEN decide whether the
project looks worth pursuing.

If your idea works, you can then quit your day job, start spending money
on licenses and lawyers, seek outside funding, etc. If not, you haven't
lost anything except a little bit of hacking time that you can think of
as hobby activity. Expensive tools flip the process around too much.

Dale Scott

unread,
22 Jan 2017, 23:41:48
to
> You really want to implement all the SD commands
> for SPI, debug them and then test on a range of
> cards? What next, a FAT file system ...

Not really (wanting to write even more code from scratch), and yes, actually (FAT file system).

I think I will experiment with Forth using the free MPE lite cross-compiler and an MSP430 (non-ARM) launchpad sitting on my shelf, stay the course in C for the time being for my MSP432 (ARM Cortex M4) project, and then migrate the ARM project to Forth down the road.

I already have a working I2C engine, SD-over-SPI (using FatFs) and some low-level I2C device drivers written in C, but only the barest skeleton of an application so far. Is it common (or preferred, or even discouraged) to re-use existing device-level C code? Presumably one would start by creating new words in Forth that simply wrap the C-code device drivers or I2C/SPI library functions.
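[One common pattern for this kind of reuse, sketched below with made-up names and an in-RAM stand-in for the sensor: squeeze the existing C driver behind a flat API in which only integers and pointers cross the boundary, so each entry point maps one-to-one onto a Forth word, e.g. `i2c-read-reg ( addr reg -- val ior )`. Whether a given Forth system can call C at all depends on the toolchain, so treat this as an illustration of the shape, not any vendor's interface:]

```c
#include <stdint.h>

static uint8_t fake_regs[256];   /* stand-in for a real I2C sensor */

/* Thin C-side entry points: integer args, integer status ("ior")
   return, no structs -- easy to bind from Forth or anything else. */
int i2c_write_reg(uint8_t addr, uint8_t reg, uint8_t val) {
    (void)addr;                  /* real code would address the bus */
    fake_regs[reg] = val;
    return 0;                    /* 0 = ok, nonzero = error code */
}

int i2c_read_reg(uint8_t addr, uint8_t reg, uint8_t *val) {
    (void)addr;
    *val = fake_regs[reg];
    return 0;
}
```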

Thanks for the info from everyone. I really appreciate the help coming up to speed.

Cheers,
Dale

hughag...@gmail.com

unread,
22 Jan 2017, 23:45:20
to
On Sunday, January 22, 2017 at 2:04:36 PM UTC-7, john wrote:
> In article <e20554d8-a66d-4a9d...@googlegroups.com>,
> dalesc...@gmail.com says...
> >
> > P.S. At the moment the work is sweat equity, I'm working on a feasibility prototype for a provisional patent a colleague has filed, so $$'s for pro-tools is limited at the moment (but hopefully not forever ;-)).
> >
> >
>
> Take stephen's advice and find the funds.

Testra had a motion-control board based on the Dallas 80C320. The code was written with UR/Forth's cross-compiler, and over time almost all of it had been converted into assembly language. It was too slow --- the competitor's MC68000 provided much higher performance --- the primary application was laser etching, which has to be fast.

Their solution was to build a custom Forth processor based on a Lattice isp1048 PLD. They did not "find the funds" to hire Stephen Pelc --- they found the funds to hire me --- it was very much of a shoestring budget.

This worked though --- everybody involved already knew Forth, so the "sweat equity" was worth something --- there can't be any "coming up to speed" if you are going to be successful.

Paul Rubin

unread,
23 Jan 2017, 0:19:12
to
Dale Scott <dalesc...@gmail.com> writes:
> I think I will experiment with Forth using the free MPE lite
> cross-compiler and an MSP430 (non-ARM) launchpad sitting on my shelf,

Cool, I don't remember what the MPE Lite situation is with the ARM. You
could also check SwiftForth eval versions.

> stay the course in C for the time being for my MSP432 (ARM Cortex M4)
> project, and then migrate the ARM project to Forth down the road.

You might try MicroPython (micropython.org) which supports various
boards including its own, which is quite nice if a bit expensive ($45).
It supports some of the ST Discovery boards but unfortunately the
Launchpads might require some porting.

hughag...@gmail.com

unread,
23 Jan 2017, 0:36:30
to
Can you tell us anything about this provisional patent without giving away all the secrets in the patent? Is performance an issue? What do you have that is worth a patent, that nobody else can figure out on their own? You can't patent the obvious --- it has to be something that nobody else would be likely to figure out on their own.

At Testra they had multiple patents, but had lost interest in getting any more patents --- you have to explain the whole thing in the patent, but then patent trolls will steal your idea, and then you have the expense of suing to protect your secrets that you just gave away (very expensive, and definitely not going to work against Beijing).

Their solution with the MiniForth was that the PLD design can't be reverse-engineered from the PLD, so nobody could build a MiniForth processor except themselves.

That was in 1994. I understand that a lot of processors now have internal non-volatile memory and have a security switch that can be set to prevent people from obtaining the code. Physical security measures such as this are your best bet --- making your code public in a patent and then relying on lawyers to sue after your code has been stolen is like leaving the barn door open and then relying on somebody to go get your horse for you after it has run off.

I'm just saying --- most people who are planning on patenting something are making a huge expensive mistake --- I have never heard of any case in which this actually worked out in a profitable way for the guy with the patent.

rickman

unread,
23 Jan 2017, 0:45:28
to
On 1/22/2017 4:38 PM, Paul Rubin wrote:
> rickman <gnu...@gmail.com> writes:
>> On 1/21/2017 2:27 PM, Dale Scott wrote:
>>> It's being prototyped using the MSP432 launchpad platform
>> I've worked with Mecrisp, but it was on an ARM based launchpad.
>
> The MSP432 is an ARM based line, whose name resembles the older 16-bit
> MSP430 line, presumably to call attention to its low-power emphasis.
> The MSP432 launchpad is a nice little board for $12.99 USD
>
> http://www.ti.com/tool/msp-exp432p401r
>
> A 48 mhz Cortex M4F with 256k flash, 64k ram, 24 channel adc and other
> i/o galore, all supposedly running at 80 uA/Mhz. Impressive.

Yeah, that rings a bell now. I think naming a TI ARM chip MSP4xx is
rather confusing, if not misleading. Don't know that I have an MSP432
board, but I have a couple of TI ARM boards, one is actually a Stellaris
board with a potentiometer and a very small OLED display. I used it in
a contest and wrote a Sudoku solver, level 1 I would say. It could
solve some fairly simple Sudoku puzzles.

I coded this in C and it ended up using most of the 16 kB available on
the board. I find that sort of tool to be a PITA to manage so rather
than try to figure out how to optimize the library usage, etc. I started
to recode it in Forth. I don't think I ever finished that.


> I agree with Stephen that Dale will likely have an easier time with
> MPEforth than Mecrisp, and I'm sympathetic to "Translates to: we'll pay
> for hardware but not for software." In reality though, the software is
> an OOM more expensive than the hardware so there's a higher entry
> barrier.

Certainly in the start up mode where you can use $10 launchpads, yes.
It's not hard to be an OOM more expensive than that, even two OOMs, if I
remember the price of the commercial tools. But all tools have a
learning curve that adds greatly to the cost of using them. In the end it
may swamp out the cost of the tool. Unfortunately Mecrisp is not nearly
as well documented, so the commercial tools are likely more cost
effective there too.

--

Rick C

Dale Scott

unread,
23 Jan 2017, 1:09:47
to
The patent has to do with what is being measured, and how it is analyzed to determine a result (how's that for vague ;-))

Otherwise it's a basic data logger that packs as many I2C sensors into as small a space as possible, then aggregates the samples taken as fast as possible, and either stores the data locally or sends it somewhere else, e.g. by WiFi. I think there could be other people in other domains with similar needs, who might appreciate such a data logger. It was the need to easily scale horizontally that made me think of Forth.
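[The aggregate-then-store shape described here usually reduces to a ring buffer between the sensor side and the SD/WiFi writer. A minimal sketch, with record layout and queue depth invented for illustration; a real MSP432 version would need the usual care about ISR/mainline concurrency:]

```c
#include <stdint.h>

#define QSIZE 256   /* power of two, so wraparound is a cheap mask */

typedef struct {
    uint16_t sample;
    uint8_t  sensor_id;
} record_t;

static record_t q[QSIZE];
static unsigned head, tail;   /* head: producer, tail: consumer */

/* Called from the I2C side; returns 0 when full (caller drops or blocks). */
int q_put(record_t r) {
    if (head - tail == QSIZE) return 0;
    q[head++ & (QSIZE - 1)] = r;
    return 1;
}

/* Called from the SD/WiFi writer; returns 0 when empty. */
int q_get(record_t *r) {
    if (head == tail) return 0;
    *r = q[tail++ & (QSIZE - 1)];
    return 1;
}
```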

Dale Scott

unread,
23 Jan 2017, 1:44:13
to
I'm still not clear about the lineage; it feels like TI may have been finding its way. I think the Stellaris line was intended for high-speed machine control, and I'm not sure where the Beaglebone processor fits (similar performance to the RPi). The MSP432 line pairs the performance of a Cortex M4F with the low power and peripheral set of the 16-bit MSP430 line. TI has done a great job of support, with an Eclipse IDE, BSD-like code licensing, and US$10-ish LaunchPads, but of course all the provided source is C code. Anyway, before getting too far off topic: I think TI has been doing a great job, and although ST released inexpensive ARM parts first, I hope the commercial Forth vendors have plans to support TI as well.

rickman

unread,
23 Jan 2017, 3:43:51
to
Uh, actually you can. There are many ways to reverse engineer
something. A FSM as simple as the PLD you describe would not be so hard
to reverse engineer. Even if it requires monitoring the
instruction/data stream from memory it is far from an impossible or even
difficult job to reverse engineer a simple CPU design, especially when
you have some knowledge of what it is intended to do.

A logic analyzer and a few days would likely do the job.


> That was in 1994. I understand that a lot of processors now have
> internal non-volatile memory and have a security switch that can be
> set to prevent people from obtaining the code. Physical security
> measures such as this are your best bet --- making your code public
> in a patent and then relying on lawyers to sue after your code has
> been stolen is like leaving the barn door open and then relying on
> somebody to go get your horse for you after it has run off.

Trade secrets are fine, but don't count on the simple security fuse. It
is not really a hard thing at all to get past those. Check on the web,
there are lots of examples of people opening chips and reverse
engineering if not outright reading the data inside or the chip design.
The only real security I've seen in ICs is to have a low power static
RAM with battery backup internal to your chip. That is *much* harder to
get around. I see this in RAM based FPGAs.


> I'm just saying --- most people who are planning on patenting
> something are making a huge expensive mistake --- I have never heard
> of any case in which this actually worked out in a profitable way for
> the guy with the patent.

What about the guy who patented the intermittent windshield wiper? It
took him a while, but eventually he won out.

Copyright only protects a specific work, trade secrets can't protect you
from reverse engineering and patents can be expensive to enforce.
That's why I like niche applications where there is enough profit to
keep me happy, but not enough for a competitor to be interested,
especially the big ones. The other way to protect your work is to have
a relationship with your customers that goes beyond just selling stuff.
If you have custom tweaked a design for them, they just plain won't go
out looking for an alternate source, especially if they are large
enough. That's right, I said "large" enough. A larger company won't
*care* about cutting you out to capture your profit. The company I sell
to won't even consider a new design unless it is $100 million in
potential sales. So clearly they won't be interested in the fraction of
that which they would save by cutting me out. Also, large companies
have the problem of right and left hands not knowing what the other is
doing. That lets you get away with a lot. :)

--

Rick C

Paul Rubin

unread,
23 Jan 2017, 3:48:29
to
Dale Scott <dalesc...@gmail.com> writes:
> I'm still not clear about the lineage, it feels like TI may have been
> finding its way. I think the Stellaris line was intended for
> high-speed machine control, I'm not sure where the Beaglebone
> processor fits (similar performance to RPi).

The Beaglebone like the RPi is a mobile phone-like cpu wrapped in a dev
board. It has a lot more i/o than the rpi and its cpu (single core) is
somewhat faster than the single core rpi's, but the new rpis with 4
cores can compute a lot more.

The BB might still be interesting for what you're doing. It's easy to
program under Linux, and there are a pair of fast (200 MHz) realtime
coprocessors that can handle stuff like signal acquisition. They
however are a pain to program and I don't know how many of the BB i/o
pins they can access.

If it's just lots of I2C channels that you're logging, maybe you want an
FPGA to aggregate them all. Or even a GA144, heh.

rickman

unread,
23 Jan 2017, 3:51:44
to
The Stellaris line is really just generic MCU stuff. They were looking
for an edge so they designed in some PWM stuff early on. Later they
branched out a bit before TI bought them.

TI has several ARM lines, each of which has very different applications.
There are RT processors which are redundant, intended for safety
critical systems. The A processors are higher performance intended for
use with an OS running "apps". I suppose they get used some without an
OS, but I don't know what for.

They are all separate lines and there is not much point in worrying
about how they interrelate in my opinion.

Lots of people have inexpensive ARM parts. NXP was at the head of the
list for a long time. Atmel is not to be sneezed at either. Some are
newer and more focused. I don't remember all the names. At one time I
had a spreadsheet of all the common ARM MCU devices that quickly got
too large to manage. I had it up on my gnuarm.com web site where
someone else would post gnuarm tools. He gave up the gnuarm toolset for
another one he liked better and eventually I let the site go. Now it
would cost you well over $1000 to buy that domain name because of the
traffic that still shows up looking to download those tools, lol.

--

Rick C

Anton Ertl

unread,
23 Jan 2017, 4:03:37
to
rickman <gnu...@gmail.com> writes:
>I've worked with Mecrisp, but it was on an ARM based launchpad.

Despite its name, MSP432 has an ARM Cortex-M4 CPU. Isn't marketing
great?

rickman

unread,
23 Jan 2017, 4:15:08
to
Depending on the details of what you are doing, an FPGA would not be
totally out of line. If you really want not just speed, but precise
timing measurement, an FPGA essentially runs everything in true
parallel, not sequential multitasking like a processor does. It would
be a trivial matter to run any number of I2C interfaces. But you may
not really need that if the timing resolution needed is milliseconds
rather than low microseconds.

Can you explain what the risk area is? Is it the utility or the
practicality of the algorithm rather than issues specific to
implementing them in the data logger?

What do you mean by scaling horizontally? Why would Forth make that
easier than in other languages?

--

Rick C

john

unread,
23 Jan 2017, 5:02:39
to
In article <87lgu2e...@nightsong.com>, no.e...@nospam.invalid says...
>
> rickman <gnu...@gmail.com> writes:
> > On 1/21/2017 2:27 PM, Dale Scott wrote:
> >> It's being prototyped using the MSP432 launchpad platform
> > I've worked with Mecrisp, but it was on an ARM based launchpad.
>
> The MSP432 is an ARM based line, whose name resembles the older 16-bit
> MSP430 line, presumably to call attention to its low-power emphasis.
> The MSP432 launchpad is a nice little board for $12.99 USD
>
> http://www.ti.com/tool/msp-exp432p401r
>

From the TI link above:

"You can browse all documentation online with TI Resource Explorer and start
development with the online CCS Cloud IDE."

Presumably TI users have had their brains replaced by cauliflowers.

john

unread,
23 Jan 2017, 5:31:50
to
In article <o64fl2$iim$1...@dont-email.me>, gnu...@gmail.com says...
>
> Copyright only protects a specific work, trade secrets can't protect you
> from reverse engineering and patents can be expensive to enforce.
>
Patents can also bankrupt you down the line if you tread on someone else's
toes... for example by using a poor search done by a patent attorney not getting
paid upfront... Getting a patent for almost anything is pretty easy - it's the
consequences of having it that can be problematic.

Wolfgang Allinger

unread,
23 Jan 2017, 7:00:10
to

On 23 Jan 17 at group /comp/lang/forth in article o64fl2$iim$1...@dont-email.me
<gnu...@gmail.com> (rickman) wrote:

> On 1/23/2017 12:36 AM, hughag...@gmail.com wrote:

>> [keeping SW secrets]

>> Their solution with the MiniForth was that the PLD design can't be
>> reverse-engineered from the PLD, so nobody could build a MiniForth
>> processor except themselves.

> Uh, actually you can. There are many ways to reverse engineer
> something. A FSM as simple as the PLD you describe would not be so hard
> to reverse engineer. Even if it requires monitoring the
> instruction/data stream from memory it is far from an impossible or even
> difficult job to reverse engineer a simple CPU design, especially when
> you have some knowledge of what it is intended to do.

> A logic analyzer and a few days would likely do the job.

I'm sure you never tried reverse engineering Forth code.
BTDT.

In 1985(?) I compiled a target EPROM with a special 16-bit constant for the
testing of a big ADC controlled by an 8051. I gave the EPROM, the source
and a description to the HW developer and told him that I'd saved no copy of
anything. (No RVCS existed. With 9 SW engineers we had a total of 5MB (yes, MB)
of disk space on our central PDP-11.)

He solved the problem with the ADC board. But a month later he showed up
with nearly the same problem, and asked me to tell him the special
constant. Of course I didn't remember, and he had thrown all of my docs away;
only the EPROM was left. PHUGGGH! It cost me and him a hole day, just to
reverse engineer that code to find the constant. The modified init word for the
ADC hadn't changed, but a lot of other definitions had. So I fired up two LAs and
an ICE just to find this routine, because most of the CFAs had changed in the
meantime.

And I had my sources of a released version from some days after the special
one. However, it's a pain in your ass to reverse engineer Forth even if
you have some sources and excellent knowledge of what's going on.

Don't forget to generate headerless targets to make it worse.


>> That was in 1994. I understand that a lot of processors now have
>> internal non-volatile memory and have a security switch that can be
>> set to prevent people from obtaining the code. Physical security
>> measures such as this are your best bet --- making your code public
>> in a patent and then relying on lawyers to sue after your code has
>> been stolen is like leaving the barn door open and then relying on
>> somebody to go get your horse for you after it has run off.

> Trade secrets are fine, but don't count on the simple security fuse. It
> is not really a hard thing at all to get past those. Check on the web,
> there are lots of examples of people opening chips and reverse
> engineering if not outright reading the data inside or the chip design.
> The only real security I've seen in ICs is to have a low power static
> RAM with battery backup internal to your chip. That is *much* harder to
> get around. I see this in RAM based FPGAs.

But if you combine the security fuse with some words doing a lot of stack
juggling of a flag ( DUP DROP ROT ROLL >R R@ R> PICK -ROLL OVER DROP MUL DIV
XOR MOD SHIFT -SHIFT 2* 2/ ... ) on the stack with nonsense data (I think
16 entries are terrible enough), and some seconds after starting the
program, another round of stack juggling... you retrieve the flag and stop the
program: "Checksum ERROR!"

I think they would need several days just to find out that it is Forth,
especially if there is no hint of Forth in your code and docs.

To obfuscate the Forth further, put a string in the target hinting at some very
rare C, Ada or whatever compiler. Just to keep them busy.

For some nasty customers (they wanted evaluation boards before paying, HAHAHA)
I did it. They never stole a program of mine :)

Another trick: put a serial-number chip in a TO-92 package (Dallas DSxxxx?) on the
board and check that number; then you can't even clone the program.

Saludos (to all the sensible ones; for the rest, see sig)
Wolfgang

--
I'm a troll-allergy sufferer living in Paraguay :) reply address is set!
In future I'll argue less with idiots, because they drag me down to
their level and beat me there with their experience! :p
(per old usenet wisdom) iPod, iPhone, iPad, iTunes, iRak, iDiot

Albert van der Horst

unread,
23 Jan 2017, 7:05:43
to
In article <2017Jan2...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>rickman <gnu...@gmail.com> writes:
>>I've worked with Mecrisp, but it was on an ARM based launchpad.
>
>Despite its name, MSP432 has an ARM Cortex-M4 CPU. Isn't marketing
>great?

What do you mean? The similarity with the name MSP430?
I expect that many TI users use the same C APIs on the
MSP430 and MSP432 to address the same peripherals.
For them the peripherals and how to use them are more
important than the underlying processor. There are already
20-bit MSP430s; the noForth team is working on that.
It is a crutch for a 16-bit processor, however, and it
makes sense for TI to migrate to 32 bits.

A 50 MHz MSP432 suggests that we will never see a fast
(>100 MHz) MSP430. :-(

Forthers would like to see a 1000 MHz 6809. That too has
never happened.

>
>- anton

Rod Pemberton

23 Jan 2017, 07:46:36
On Mon, 23 Jan 2017 08:58:00 -0300
"Wolfgang Allinger" <all...@spambog.com> wrote:

> It cost me and him a hole day,

Nice! That's one of the best puns or Freudian slips, ever. The words
"whole" and "hole" are homophones.


Rod Pemberton

Dale Scott

23 Jan 2017, 11:41:31
< an FPGA would not be totally out of line.

The current system is for proof of concept. The I2C sensor being used supports eight different addresses and the MSP432 has four I2C ports, so 32 sensors total will be enough for validation purposes. However, the marketable system will need two hundred to a thousand sensors, and I'm thinking it will probably use an FPGA to connect directly to the I2C sensors. I was thinking of implementing an I2C-to-SPI bridge in the FPGA hardware, and still using an MPU to aggregate two SPI channels, write to the SD card, and provide a UI/API to the outside world. Several FPGAs would be connected to each SPI channel and essentially time-multiplexed.

The IP involved in the project relates to a very specific orientation/positioning of the sensors, and what can be gleaned about the situation from that relationship, and I think the risk in the data logger itself is relatively low. Beyond this specific project, I wonder if there could be opportunities in other fields where aggregating data from many, many sensors is needed, such as a body-suit that monitors physiology for health-science purposes. I have read of current research involving a network of sensors embedded in a woman's bra to diagnose potential tumours. I suspect there may also be opportunities in physical process monitoring, where a string of many closely-spaced sensors measures aspects of "something" (liquid, slurry, or a hard object) moving past the sensors. The physical limitations of the actual sensors involved and how they must be electrically interconnected (e.g. a long narrow strip or a square array) will likely vary considerably from one application to another, although the general concept of aggregating the data (possibly including some domain-specific form of reduction or compression) remains the same. It is this situation where I thought Forth might have special benefit, and provide greater efficiencies over C for customizing the concept to a specific situation (scaling horizontally).

rickman

23 Jan 2017, 14:32:29
On 1/23/2017 11:41 AM, Dale Scott wrote:
> < an FPGA would not be totally out of line.
>
> The current system is for proof of concept. The I2C sensor being used
> supports eight different addresses and the MSP432 has four I2C ports,
> and 32 sensors total will be enough for validation purposes. However
> the marketable system will need two hundred to a thousand sensors,
> and I'm thinking will probably use an FPGA to connect directly to the
> I2C sensors. I was thinking of implementing an I2C-to-SPI bridge in
> the FPGA hardware, and still use an mpu to aggregate two SPI channels
> and write to the SD card, and provide a UI/API to the outside world.
> Several FPGAs would be connected to each SPI channel and essentially
> time-multiplexed.

The number of sensors is not so much an issue as the data rate. I2C
typically uses 7 bit addresses but some have preassigned uses, so there
are 112 usable addresses on each I2C bus. To get 1000 addresses you
will need a minimum of 10 busses if they will support your data rates.
Do you know what data rates you need?
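As a rough sanity check on those numbers, the bus budget can be sketched in a few lines of Python. Only the 112-usable-address figure comes from the post above; the 400 kHz fast-mode clock and the 25% protocol-overhead factor are my assumptions, and rounding up to 10 buses as rickman does simply adds margin on top of the addressing minimum.

```python
# Back-of-the-envelope I2C bus budget for a many-sensor logger.
# 112 usable addresses = 128 seven-bit addresses minus 16 reserved ones.
# Bus clock and overhead factor below are illustrative assumptions.
def buses_needed(n_sensors, per_sensor_bps, bus_clock_hz=400_000,
                 addresses_per_bus=112, overhead=1.25):
    """Minimum bus count, limited by both addressing and bandwidth."""
    by_address = -(-n_sensors // addresses_per_bus)        # ceiling division
    usable_bps = bus_clock_hz * 8 / 9 / overhead           # 1 ACK bit per byte
    by_bandwidth = -(-(n_sensors * per_sensor_bps) // int(usable_bps))
    return max(by_address, by_bandwidth)

# 1000 low-rate sensors: address-limited, ceil(1000/112) buses.
print(buses_needed(1000, 1000))
```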


> The IP involved in the project relates to a very specific
> orientation/positioning of the sensors, and what can be gleaned about
> the situation from that relationship, and I think the risk in the
> data logger itself is relatively low. Beyond this specific project, I
> wonder if there could be opportunities in other fields where
> aggregating data from many many sensors is needed, such as a
> body-suit that monitors physiology for health science purposes. I
> have read of current research involving a network of sensors embedded
> in a women's bra to diagnosis potential tumours. I suspect there may
> also be opportunities in physical process monitoring, where a string
> of many closely-spaced sensors measure aspects of "something"
> (liquid, slurry, or a hard object) moving past the sensors. The
> physical limitations of the actual sensors involved and how they must
> be electrically interconnected (e.g. long narrow or a square array),
> will likely vary considerably from one application to another,
> although the general concept of aggregating the data (possibly
> including some domain-specific form of reduction or compression)
> remains the same. It is this situation where I thought Forth might
> have special benefit, and provide greater efficiencies over C for
> customizing the concept to a specific situation (scaling
> horizontally).

I use Forth for its interactivity and closeness to the hardware. I'm
not sure Forth has any significant advantages in other ways. I don't
know what "scaling horizontally" means.

--

Rick C

rickman

23 Jan 2017, 14:38:38
I'm not sure how having headers makes reverse engineering easier in
compiled code. Anyway, a day to reverse engineer code is not really
that big a deal if you are trying to duplicate a product. A week or two
would be perfectly acceptable.


>>> That was in 1994. I understand that a lot of processors now have
>>> internal non-volatile memory and have a security switch that can be
>>> set to prevent people from obtaining the code. Physical security
>>> measures such as this are your best bet --- making your code public
>>> in a patent and then relying on lawyers to sue after your code has
>>> been stolen is like leaving the barn door open and then relying on
>>> somebody to go get your horse for you after it has run off.
>
>> Trade secrets are fine, but don't count on the simple security fuse. It
>> is not really a hard thing at all to get past those. Check on the web,
>> there are lots of examples of people opening chips and reverse
>> engineering if not outright reading the data inside or the chip design.
>> The only real security I've seen in ICs is to have a low power static
>> RAM with battery backup internal to your chip. That is *much* harder to
>> get around. I see this in RAM based FPGAs.
>
> But if you combine the security fuse with some words with a lot of stack
> juggling a flag ( DUP DROP ROT ROLL >R R@ R> pick -ROLL OVER DROP MUL DIV
> XOR MOD SHIFT -SHIFT 2* 2/ ... ) on the stack with nonsense data (I think
> 16 entries are terrible enough) and some seconds after starting the
> program, another stack juggling... you retrieve the flag and stop the
> program: "Checksum ERROR!"

I'm not sure what you are describing here. The security fuse is
supposed to stop you from being able to read the contents of the program
memory. But it only stops you from *reading* the program memory. There
are other ways of getting the contents if you want to spend some money.


> I think that they need several days just to find out, that it is FORTH,
> especially if you don't have any hint to FORTH in your code and doc.

I'm not sure that is terribly relevant.


> Just to obfuscate FORTH, put a string with a hint to a very rare C, ADA or
> whatsoever Compiler in the target. Just to keep them busy.
>
> For some nasty customers (they want evaluation board before paying HAHAHA)
> I did it. The never stole a program of mine :)
>
> Another trick: put a serial number TO92 (DALLAS DSxxxx?) to the board and
> check this number, then you even couldn't clone the program.

Yeah, I've thought of that, but it is cumbersome and expensive to deal
with that in production.

--

Rick C

rickman

23 Jan 2017, 14:40:16
On 1/23/2017 7:05 AM, Albert van der Horst wrote:
> In article <2017Jan2...@mips.complang.tuwien.ac.at>,
> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> rickman <gnu...@gmail.com> writes:
>>> I've worked with Mecrisp, but it was on an ARM based launchpad.
>>
>> Despite its name, MSP432 has an ARM Cortex-M4 CPU. Isn't marketing
>> great?
>
> What do you mean? The similarity with the name MSP430?
> I expect that many TI users use the same c-API's on the
> MSP430 and MSP432 to address the same peripherals.
> For them the peripherals and how to use them is more
> important then the underlying processor. There are already
> 20 bit MSP430, the noforth team is working on that.
> It is a crutch however for a 16 bit processor and it make
> sense for TI to migrate to 32 bits.
>
> A 50 Mhz MSP432 suggest that we will never see a fast
> (>100 Mhz) MSP430. :-(

What is the allure of a 100 MHz MSP430?

--

Rick C

Dale Scott

23 Jan 2017, 14:53:38
< scaling horizontally

Sorry, I mean from a marketing or product-management perspective, taking a somewhat standard product and adapting it to fit a wide range of customers who have similar high-level needs but who will still need a degree of low-level customization. This is in contrast to scaling vertically, which would mostly involve adding features to satisfy all potential customers within a specific (vertical) market.

rickman

23 Jan 2017, 15:10:19
On 1/23/2017 2:53 PM, Dale Scott wrote:
> < scaling horizontally
>
> Sorry, I mean from a marketing or product-management perspective, taking a somewhat standard product and adapting it to fit a wide range of customers who have similar high-level needs but who will still need a degree of low-level customization. This is in contrast to scaling vertically, which would mostly involve adding features to satisfy all potential customers within a specific (vertical) market.

Ok, I get that. I like using Forth, and many here claim it has huge
advantages over other languages for a variety of reasons. I like it for
two simple reasons: it is interactive, which makes test and debug easier,
and it uses a very simple tool rather than the complex and messy
compile-assemble-link nonsense of languages like C.

I don't see how any of that helps to support feature customization. I
think any language would support that pretty well.

--

Rick C

Wolfgang Allinger

23 Jan 2017, 16:59:51

On 23 Jan 17 at group /comp/lang/forth in article 20170123074719.5c5180ec@_
Sorry I'm a native German :)

Wolfgang Allinger

23 Jan 2017, 16:59:52

On 23 Jan 17 at group /comp/lang/forth in article o65m0o$asp$1...@dont-email.me
<gnu...@gmail.com> (rickman) wrote:

> On 1/23/2017 6:58 AM, Wolfgang Allinger wrote:
>>
>> On 23 Jan 17 at group /comp/lang/forth in article
>> o64fl2$iim$1...@dont-email.me <gnu...@gmail.com> (rickman) wrote:
>>
>>> On 1/23/2017 12:36 AM, hughag...@gmail.com wrote:
>>
>>>> [keeping SW secrets]

>>> A logic analyzer and a few days would likely do the job.
>>
>> I'm shure, you never tried to reverse engineering a FORTH Code.
>> BTDT

>>
>> In 1985(?) I compiled a target EPROM with a special 16bit constant for the
>> [...]
>> only the EPROM left. PHUGGGH! It cost me and him a hole day, just to
>> reengineer that code to find the constant.
>>[...]
>> And I have my sources of a released version some days after the special
>> one. However it's a pain in your ass, to reverse engineer FORTH even if
>> you have some sources and excellent knowledge of whats going on.
>>
>> Don't forget to generate headerless targets to make it worse.

> I'm not sure how having headers makes reverse engineering easier in
> compiled code. Anyway, a day to reverse engineer code is not really
> that big a deal if you are trying to duplicate a product. A week or two
> would be perfectly acceptable.

Remember, it was my own code and I knew what to search for! So it was only
a reverse engineering of 2 bytes. Just finding those bytes cost me one
day!

I'm sure you couldn't reverse engineer this program even in 6 months.

128 kB target for a 64180 with 12 HW interrupts, some of them in the 10-
20 kHz range. If you miss one interrupt, the program will not run
correctly, so your results are nonsense.

Yes, with headers it's a little bit easier. In the described job from 1985
there were no headers in the target, so I couldn't find the word with the
ADC constant just by searching for the name. I knew that name, and even
knew that the constant change was the only modification to this word... but...

I'm sure you never tried reverse engineering FORTH targets.

You have to single-step and find out what and where NEXT is (assuming you
know it's a FORTH target). Then you have to keep track of SP and RP
(assuming you know it's FORTH). If you don't know it's FORTH, you are lost
very soon, especially if you don't realize that it has 2 stacks.

Then you can set a breakpoint on NEXT and try to find out what happened.

That is not trivial on an embedded system with HW interrupts. Likely it
won't work.



>>> Trade secrets are fine, but don't count on the simple security fuse. It
>>> is not really a hard thing at all to get past those. Check on the web,
>>> there are lots of examples of people opening chips and reverse
>>> engineering if not outright reading the data inside or the chip design.

>> But if you combine the security fuse with some words with a lot of stack
>> juggling a flag ( DUP DROP ROT ROLL >R R@ R> pick -ROLL OVER DROP MUL DIV
>> XOR MOD SHIFT -SHIFT 2* 2/ ... ) on the stack with nonsense data (I think
>> 16 entries are terrible enough) and some seconds after starting the
>> program, another stack juggling... you retrieve the flag and stop the
>> program: "Checksum ERROR!"

> I'm not sure what you are describing here. The security fuse is
> supposed to stop you from being able to read the contents of the program
> memory. But it only stops you from *reading* the program memory. There
> are other ways of getting the contents if you want to spend some money.

It's a scenario where someone breaks the fuse bit and reads the contents of
the program memory...

>> I think that they need several days just to find out, that it is FORTH,
>> especially if you don't have any hint to FORTH in your code and doc.

> I'm not sure that is terribly relevant.

If you don't know it's FORTH, you will be lost!



>> Another trick: put a serial number TO92 (DALLAS DSxxxx?) to the board and
>> check this number, then you even couldn't clone the program.

> Yeah, I've thought of that, but it is cumbersome and expensive to deal
> with that in production.

With the right strategy you can handle that in the final test of the
system.

Wolfgang Allinger

23 Jan 2017, 16:59:52

On 23 Jan 17 at group /comp/lang/forth in article o64rhj$6sf$1...@cherry.spenarnc.xs4all.nl
<alb...@cherry.spenarnc.xs4all.nl> (Albert van der Horst) wrote:

> In article <2017Jan2...@mips.complang.tuwien.ac.at>,
> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:

>> Despite its name, MSP432 has an ARM Cortex-M4 CPU. Isn't marketing
>> great?

> A 50 Mhz MSP432 suggest that we will never see a fast
> (>100 Mhz) MSP430. :-(

> Forthers would like to see a 1000 Mhz 6809. That too has
> never happened.

A 25 MHz RTX2000 is/was fun enough :)

Paul Rubin

23 Jan 2017, 17:34:28
I don't recommend using Forth for this project unless you're already
familiar with Forth. Learning a new language and its techniques at the
same time as trying to get a project out the door is asking for trouble.
Stick with what you know for the project, and try out Forth on the side
in a more exploratory way.

Learning new stuff from scratch takes a much different mindset (explore,
try things out, go off in different directions) than completing a
project (drill as directly as you can to the desired end result, look in
the manual or web if there's some needed info that you don't know, but
don't waste time learning more than the exact thing that you need).

A good programmer has to be able to work in each of these mindsets, but
trying to be in both at the same time creates a balancing problem.

rickman

23 Jan 2017, 19:15:38
On 1/23/2017 4:18 PM, Wolfgang Allinger wrote:
>
> On 23 Jan 17 at group /comp/lang/forth in article 20170123074719.5c5180ec@_
> <NeedNotR...@xrsevnneqk.cem> (Rod Pemberton) wrote:
>
>> On Mon, 23 Jan 2017 08:58:00 -0300
>> "Wolfgang Allinger" <all...@spambog.com> wrote:
>
>>> It cost me and him a hole day,
>
>> Nice! That's one of the best puns or Freudian slips, ever. The word's
>> "whole" and "hole" are homophones.
>
> Sorry I'm a native German :)

He's saying it is a play on words. Whole and hole are pronounced the
same. Whole is what you meant (the full day), but by using hole it
sounds like a pun that the day was wasted down a hole. Not bad even if
unintentional. lol

--

Rick C

rickman

23 Jan 2017, 19:48:24
As a comparison, I have a GPS that is quite old. We did have access to
the binary program as there were updates that we could apply. I don't
recall if we knew much about the format, but none of us had the
development tools, just this file. The problem was there was a constant
somewhere in the code that specified numbers for two WAAS satellites.
Those satellites had been taken offline permanently and now the WAAS
feature didn't work.

So we needed to figure out how to fix this.

How long do you think that took?


> 128kB target for a 64180 with 12 HW interrupts, some of them in the 10-
> 20kHz range. If you miss one IR, the programm will not run correct, so
> your results are nonsense.
>
> Yes, with header it's a little bit easier. In the described job of 1985,
> there where no headers in the target, so I can't find the word with the
> ADC constant by just searching for the name. I knew, that name and even
> know, that the constant change was the only mod of this word... but...
>
> I'm shure, you never tried reverse engineering of FORTH targets.

No, I have not reverse engineered a Forth program, but I don't see how
that is any harder than any other binary. It's just code. You have to
follow the execution and find the program structure, the data structure
and how the data is manipulated. It may be tedious, but it is never
impossible. Ask the NSA. What do you think it takes to write the code
that performs the crypto functions in portable equipment that might
become compromised? It is many orders of magnitude beyond what you are
describing.


> You have to single step and find out what and where NEXT is (assumed you
> know its a FORTH Target. Then you have to keep track of SP and RP
> (assumed, you know, it's FORTH) If you don't know, it's FORTH you are lost
> very soon. Especially if you don't realize, that it has 2 stacks.
>
> Then you can set a breakpoint to NEXT an trie to find out, what happened.
>
> That is not trivial on an embedded system with HW interrupts. Likely it
> won't work.

Don't ever fool yourself into thinking something is hard when you haven't
actually tried to crack it. Heck, I have seen so many things cracked
that were intentionally designed to be secure it's not funny. Why do
you think they passed the DMCA in the US? Because they simply *can't*
protect things by obscurity or even intentional encryption, so they made
breaking that encryption a felony... which is like locking your front
door, only keeps honest people honest.


>>>> Trade secrets are fine, but don't count on the simple security fuse. It
>>>> is not really a hard thing at all to get past those. Check on the web,
>>>> there are lots of examples of people opening chips and reverse
>>>> engineering if not outright reading the data inside or the chip design.
>
>>> But if you combine the security fuse with some words with a lot of stack
>>> juggling a flag ( DUP DROP ROT ROLL >R R@ R> pick -ROLL OVER DROP MUL DIV
>>> XOR MOD SHIFT -SHIFT 2* 2/ ... ) on the stack with nonsense data (I think
>>> 16 entries are terrible enough) and some seconds after starting the
>>> program, another stack juggling... you retrieve the flag and stop the
>>> program: "Checksum ERROR!"
>
>> I'm not sure what you are describing here. The security fuse is
>> supposed to stop you from being able to read the contents of the program
>> memory. But it only stops you from *reading* the program memory. There
>> are other ways of getting the contents if you want to spend some money.
>
> It's a scenario when someone break the fuse-bit and reads the content of
> the program memory...

I can't say I follow.


>>> I think that they need several days just to find out, that it is FORTH,
>>> especially if you don't have any hint to FORTH in your code and doc.
>
>> I'm not sure that is terribly relevant.
>
> If you don't know, it's FORTH you will be lost!

Hardly. It's just a bunch of either subroutine calls or jumps through
lists. I think someone would get the hang of that pretty quickly. It's
no rocket science.


>>> Another trick: put a serial number TO92 (DALLAS DSxxxx?) to the board and
>>> check this number, then you even couldn't clone the program.
>
>> Yeah, I've thought of that, but it is cumbersome and expensive to deal
>> with that in production.
>
> With the right strategy you can handle that in the final test of the
> system.

But it is extra time spent on every unit. Time is money and the DSxxxx
parts aren't as cheap as they should be. I've never seen one that is
under $0.50 at any quantity including the one that is barely more than a
serial number.

Besides, this is also not secure. If I have one of these boards, I can
find the key in the FPGA and set my own, or I can read the serial number
in the Dallas part and use my own MCU that outputs the same number in
the same way.

--

Rick C

Elizabeth D. Rather

23 Jan 2017, 21:30:44
On 1/23/17 2:48 PM, rickman wrote:
> On 1/23/2017 4:43 PM, Wolfgang Allinger wrote:
...
>> Yes, with header it's a little bit easier. In the described job of 1985,
>> there where no headers in the target, so I can't find the word with the
>> ADC constant by just searching for the name. I knew, that name and even
>> know, that the constant change was the only mod of this word... but...
>>
>> I'm shure, you never tried reverse engineering of FORTH targets.
>
> No, I have not reverse engineered a Forth program, but I don't see how
> that is any harder than any other binary. It's just code. You have to
> follow the execution and find the program structure, the data structure
> and how the data is manipulated. It may be tedious, but it is never
> impossible. Ask the NSA. What do you think it takes to write the code
> that performs the crypto functions in portable equipment that might
> become compromised? It is many orders of magnitude beyond what you are
> describing.
>
>
>> You have to single step and find out what and where NEXT is (assumed you
>> know its a FORTH Target. Then you have to keep track of SP and RP
>> (assumed, you know, it's FORTH) If you don't know, it's FORTH you are
>> lost
>> very soon. Especially if you don't realize, that it has 2 stacks.
>>
>> Then you can set a breakpoint to NEXT an trie to find out, what happened.

That's assuming there is a recognizable NEXT, which isn't true in all
implementations.

>> That is not trivial on an embedded system with HW interrupts. Likely it
>> won't work.
>
> Don't ever fool yourself in thinking something is hard when you haven't
> actually tried to crack it. Heck, I have seen so many things cracked
> that were intentionally designed to be secure it's not funny. Why do
> you think they passed the DMCA in the US? Because they simply *can't*
> protect things by obscurity or even intentional encryption, so they made
> breaking that encryption a felony... which is like locking your front
> door, only keeps honest people honest.
>
>
...
>>
>> If you don't know, it's FORTH you will be lost!
>
> Hardly. It's just a bunch of either subroutine calls or jumps through
> lists. I think someone would get the hang of that pretty quickly. It's
> no rocket science.

...along with just plain code. But the "bunches" tend to be very short
(a dozen instructions or fewer), and calls/jumps are to a vast number of
places. A program of any complexity will have a few thousand defined
words, with calls all over the place, so it's hard to find a logical
progression of actions. A friend of mine (non-Forth) tried reading a
Forth listing once, looking up the word names. He complained that he
would put his finger in a page and look something up, but then to
understand that he had to look something else up, and he quickly ran out
of fingers. Dealing with a binary program is a lot harder.

Cheers,
Elizabeth

--
Elizabeth D. Rather
FORTH, Inc.
6080 Center Drive, Suite 600
Los Angeles, CA 90045
USA

rickman

23 Jan 2017, 22:23:29
Harder is not the same as hard.

--

Rick C

Mark Wills

24 Jan 2017, 04:12:53
Indeed. At some point it makes sense to move over to a more capable
platform. Moving over to ARM/32 bit has more advantages than just
faster clocks.

One reason for wanting to stick with MSP430 might be the instruction
set if you're a native assembly programmer. In that case, you will
have invested quite a lot of time in learning all the control registers
and the instruction set, so there would be a natural tendency to want
to stick with that, which is understandable. Unfortunately, us assembly
language types are a dying breed. The *vast* majority (I'd wager more
than 90%) of MSP430 programmers (and maybe more generally: microcontroller
programmers) are programming in C or C++, so there's less of an impact
on your software if your underlying CPU changes.

Andrew Haley

24 Jan 2017, 04:14:53
Paul Rubin <no.e...@nospam.invalid> wrote:

> I don't recommend using Forth for this project unless you're already
> familiar with Forth. Learning a new language and its techniques at
> the same time as trying to get a project out the door is asking for
> trouble.

It's pretty much the only way that's ever worked for me.

Andrew.

Albert van der Horst

24 Jan 2017, 08:39:22
My idea for a very low distortion sine generator is held back
by MSP430 clock speed. It is almost a troll question as
there are always more applications possible at higher speeds.

I grant you that if you really need fast processing, you
also need really large memories, or you run out of data
to process. If I want to do OCR on 1200 dpi 32-bit
color pages, I want a run-of-the-mill 8-core x86 with
gigabytes and gigahertz.

Groetjes Albert

>
>--
>
>Rick C

rickman

24 Jan 2017, 19:51:16
On 1/24/2017 8:38 AM, Albert van der Horst wrote:
> In article <o65m3r$asp$2...@dont-email.me>, rickman <gnu...@gmail.com> wrote:
>> On 1/23/2017 7:05 AM, Albert van der Horst wrote:
>>> In article <2017Jan2...@mips.complang.tuwien.ac.at>,
>>> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>>> rickman <gnu...@gmail.com> writes:
>>>>> I've worked with Mecrisp, but it was on an ARM based launchpad.
>>>>
>>>> Despite its name, MSP432 has an ARM Cortex-M4 CPU. Isn't marketing
>>>> great?
>>>
>>> What do you mean? The similarity with the name MSP430?
>>> I expect that many TI users use the same c-API's on the
>>> MSP430 and MSP432 to address the same peripherals.
>>> For them the peripherals and how to use them is more
>>> important then the underlying processor. There are already
>>> 20 bit MSP430, the noforth team is working on that.
>>> It is a crutch however for a 16 bit processor and it make
>>> sense for TI to migrate to 32 bits.
>>>
>>> A 50 Mhz MSP432 suggest that we will never see a fast
>>> (>100 Mhz) MSP430. :-(
>>
>> What is the allure of a 100 MHz MSP430?
>
> My idea for a very low distortion sine generator is held back
> by MSP430 clock speed. It is almost a troll question as
> there are always more applications possible at higher speeds.

So why can't you use a different processor that has more umpfff? I
don't understand why it needs to be an MSP430.

BTW, sine generation is *very* easy to do in a PLD and can be made to
run *very* fast. I did some work on this a while back and a DDS in an
FPGA can be made using multipliers instead of a large lookup table to
minimize phase noise. In essence you can get any noise level in the
digital logic that you require and generate samples at 100s of MHz. :)
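For concreteness, the phase-accumulator DDS idea described above can be sketched in a few lines of Python. This is an illustration, not rickman's FPGA design: the 32-bit accumulator, 256-entry coarse table, and 16-bit interpolation fraction (the "multiplier instead of a large lookup table" part) are my assumptions.

```python
import math

# Minimal DDS sketch: a phase accumulator advances by a tuning word each
# sample; a small coarse sine table plus linear interpolation (one multiply
# per sample) stands in for the huge table a naive DDS would need.
ACC_BITS = 32
TABLE_BITS = 8                               # 256-entry coarse table
TABLE = [math.sin(2 * math.pi * i / (1 << TABLE_BITS))
         for i in range(1 << TABLE_BITS)]

def dds(freq_hz, sample_rate_hz, n):
    tuning = round(freq_hz * (1 << ACC_BITS) / sample_rate_hz)
    acc, out = 0, []
    for _ in range(n):
        idx = acc >> (ACC_BITS - TABLE_BITS)                 # coarse index
        frac = (acc >> (ACC_BITS - TABLE_BITS - 16)) & 0xFFFF
        a, b = TABLE[idx], TABLE[(idx + 1) % len(TABLE)]
        out.append(a + (b - a) * frac / 65536.0)             # interpolate
        acc = (acc + tuning) & ((1 << ACC_BITS) - 1)         # wrap phase
    return out

samples = dds(1000.0, 48_000.0, 48)   # one 1 kHz cycle at 48 kS/s
```

In an FPGA each of these steps becomes a pipeline stage, which is how the sample rate reaches hundreds of MHz.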


> I give you that if you really need fast processing, you
> also need really large memories, or you run out of data
> to process. If I want to OCR based on 1200 dpi 32 bit
> color pages, I want a run of the mill 8 core x86 with
> Gbyte and GHz .

I don't agree. For a given application more data may be desirable, but
there are always apps that need more crunching on the same data set.
For example, I worked on test equipment that did performance testing on
comms. We needed to up the data rate by about 4 fold... and in the same
FPGA! Our *only* technical problem in this was speed. The amount of
data being processed was essentially the same. If the unit had the
required interface we likely would have been asked to up it 4 fold
again, and be told to stuff it! lol They had another product for that
which used a newer and faster... and larger FPGA.

--

Rick C

Albert van der Horst

25 Jan 2017, 05:34:35
I know the msp430 inside out. A sine generator would be an experimental
project. I don't want to learn a new processor for doing that.

>
>BTW, sine generation is *very* easy to do in a PLD and can be made to
>run *very* fast. I did some work on this a while back and a DDS in an
>FPGA can be made using multipliers instead of a large lookup table to
>minimize phase noise. In essence you can get any noise level in the
>digital logic that you require and generate samples at 100s of MHz. :)

That looks like the opposite of my idea, not simple, not elegant,
expensive and a lot of work.

<SNIP>
>
>--
>
>Rick C

Groetjes Albert

rickman

25 Jan 2017, 22:13:10
My initial reply to this must have been lost by a glitch in my newsgroup
server. To make it simple...

If you don't understand my approach (which it would seem you don't if
you think it is not simple, elegant and is expensive), I'm happy to
explain it in more detail. I just don't feel like retyping it at the
moment. I have other typing to do.

What is your approach?

--

Rick C

Ilya Tarasov

26 Jan 2017, 03:19:44
> BTW, sine generation is *very* easy to do in a PLD and can be made to
> run *very* fast. I did some work on this a while back and a DDS in an
> FPGA can be made using multipliers instead of a large lookup table to
> minimize phase noise. In essence you can get any noise level in the
> digital logic that you require and generate samples at 100s of MHz. :)

You may take a look at the CORDIC algorithm as well. It is another example of a module that is not very effective on a generic MCU but perfect for an FPGA. A CORDIC IP core can provide one bit of sin/cos precision per clock cycle (pipelining is allowed too), so it is possible to have a 16- or 32-bit sine wave at a 200+ MHz clock speed. No multipliers are needed.
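For readers who haven't met it, rotation-mode CORDIC can be sketched in Python. In hardware the `2.0 ** -i` scalings are plain bit-shifts (hence "no multipliers"); the 32-iteration count and the use of floats here are illustrative assumptions, and the sketch is valid for angles within roughly ±1.74 rad.

```python
import math

# Rotation-mode CORDIC sketch: each shift-and-add iteration rotates the
# vector toward the residual angle, converging on sin/cos by about one
# bit of precision per step - which is why a pipelined FPGA core can
# emit one result per clock.
def cordic_sincos(angle, iterations=32):
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    k = 1.0                                  # accumulated CORDIC gain
    for i in range(iterations):
        k /= math.sqrt(1 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, angle                  # pre-scale so result is unit length
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward residual angle
        x, y, z = (x - d * y * 2.0 ** -i,    # 2**-i is a shift in hardware
                   y + d * x * 2.0 ** -i,
                   z - d * atans[i])
    return y, x                              # (sin, cos)

s, c = cordic_sincos(math.pi / 6)
```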

Albert van der Horst

26 Jan 2017, 05:37:02
In my book your approach is not simple, if only because it uses
a PLD, and I don't even know what that is.

>
>What is your approach?

Potentially silly and potentially patentable.

rickman

26 Jan 2017, 14:07:59
Ah, that explains a lot. PLD is Programmable Logic Device. In my use I am
referring specifically to FPGAs (Field Programmable Gate Arrays, my
area of specialty), which allow you to design with hardware rather than
software. In essence, when you design PLD hardware it all runs 100% of
the time (so it can achieve humongous speeds), while when writing software
for sequential processors each step is done one at a time by a single
execution unit (typically, with MCUs anyway; otherwise some small number
of execution units), which greatly limits the overall speed. If
sequential execution at the CPU clock speed is good enough, fine. But
you seem to be saying your processor is not fast enough.

That simple fact alone makes me think your algorithm is rather more
complex than other methods. Unless you are creating RF sine waves with
high speed DACs, any MCU would be fast enough to generate an optimum
sine wave at the sample rates the on chip DAC would support.


>> What is your approach?
>
> Potentially silly and potentially patentable.

Ok. What noise level do you expect to achieve? What sort of DAC will
you be using?

I will say, if you aren't aware of existing methods of sine wave
generation, it is not likely your method will be better.

--

Rick C

rickman

26 Jan 2017, 14:15:18
When I see the claim of the CORDIC algorithm not needing multipliers I
have to smile. No, technically the CORDIC algorithm does not need a
single multiplier. But the algorithm performs an operation that is
functionally equivalent to a multiply, N operations on N bit quantities
which is order N**2 complexity. The difference is that multiplication
is typically supported by hardware and is *very* fast while the CORDIC
algorithm must be done step by step and so is much less efficient. In
FPGAs with hard IP multipliers the CORDIC uses much more space than just
using the hard IP. In software the CORDIC takes N steps where N is the
number of bits and a multiply is often a single instruction.

The CORDIC algorithm saw much more use a decade or two ago when
dedicated multipliers were much less common. Now the CORDIC algorithm
is typically used in niche applications where there is some limitation
on using a multiplier.

--

Rick C

Ilya Tarasov

27 Jan 2017, 16:08:05
On Thursday, January 26, 2017 at 22:15:18 UTC+3, rickman wrote:
> On 1/26/2017 3:19 AM, Ilya Tarasov wrote:
> >> BTW, sine generation is *very* easy to do in a PLD and can be made to
> >> run *very* fast. I did some work on this a while back and a DDS in an
> >> FPGA can be made using multipliers instead of a large lookup table to
> >> minimize phase noise. In essence you can get any noise level in the
> >> digital logic that you require and generate samples at 100s of MHz. :)
> >
> > You may take a look on CORDIC algorithm as well. It is another example of a module, not very effective for generic MCU, but perfect for FPGA. CORDIC IP-core can provide one bit of sin/cos per clock cycle (pipelining is allowed too), so it is possible to have 16 or 32-bit sine wave with 200+ MHz clock speed. No multipliers are needed.
>
> When I see the claim of the CORDIC algorithm not needing multipliers I
> have to smile. No, technically the CORDIC algorithm does not need a
> single multiplier. But the algorithm performs an operation that is

The idea of CORDIC is to limit one of the multiplication operands to a power of 2, so the multiplier can be replaced by a shifter.

> functionally equivalent to a multiply, N operations on N bit quantities
> which is order N**2 complexity. The difference is that multiplication
> is typically supported by hardware and is *very* fast while the CORDIC
> algorithm must be done step by step and so is much less efficient. In
> FPGAs with hard IP multipliers the CORDIC uses much more space than just
> using the hard IP. In software the CORDIC takes N steps where N is the
> number of bits and a multiply is often a single instruction.

The pipelined version requires one cycle per output with N stages of pipeline. The size is N times larger, of course.

> The CORDIC algorithm saw much more use a decades or two ago when
> dedicated multipliers were much less common. Now the CORDIC algorithm
> is typically used in niche applications where there is some limitation
> on using a multiplier.

In an FPGA, there is a balance between cells, memory and multipliers. In a DSP system, where DSP blocks are the key components, it is likely you will count every DSP48 block and not be able to spare any of them for sine wave generation. Logic cells may be free, and CORDIC is a good solution in that case.

If you know another effective algorithm with hardware multipliers and exact result (not approximated), please give me a link.

rickman

27 Jan 2017, 17:54:38
On 1/27/2017 4:08 PM, Ilya Tarasov wrote:
> On Thursday, January 26, 2017 at 22:15:18 UTC+3, rickman wrote:
>> On 1/26/2017 3:19 AM, Ilya Tarasov wrote:
>>>> BTW, sine generation is *very* easy to do in a PLD and can be
>>>> made to run *very* fast. I did some work on this a while back
>>>> and a DDS in an FPGA can be made using multipliers instead of a
>>>> large lookup table to minimize phase noise. In essence you can
>>>> get any noise level in the digital logic that you require and
>>>> generate samples at 100s of MHz. :)
>>>
>>> You may take a look on CORDIC algorithm as well. It is another
>>> example of a module, not very effective for generic MCU, but
>>> perfect for FPGA. CORDIC IP-core can provide one bit of sin/cos
>>> per clock cycle (pipelining is allowed too), so it is possible to
>>> have 16 or 32-bit sine wave with 200+ MHz clock speed. No
>>> multipliers are needed.
>>
>> When I see the claim of the CORDIC algorithm not needing
>> multipliers I have to smile. No, technically the CORDIC algorithm
>> does not need a single multiplier. But the algorithm performs an
>> operation that is
>
> The idea of CORDIC is to limit one of multiplication operands by the
> power of 2, so multiplier can be replaced by shifter.

Dig into the algorithm. It is iterative over the bits. Each operation
is equivalent to the stepwise operations in a multiply. Not much
difference. I don't have an implementation handy, so there may be some
simplification, but I think a stepwise multiply is actually simpler.
The difference is that calculating a sine requires up to two
multiplies if you want high resolution and minimum phase jitter.


>> functionally equivalent to a multiply, N operations on N bit
>> quantities which is order N**2 complexity. The difference is that
>> multiplication is typically supported by hardware and is *very*
>> fast while the CORDIC algorithm must be done step by step and so is
>> much less efficient. In FPGAs with hard IP multipliers the CORDIC
>> uses much more space than just using the hard IP. In software the
>> CORDIC takes N steps where N is the number of bits and a multiply
>> is often a single instruction.
>
> Pipelined version require one cycle per output with N stages of
> pipeline. Size is N times larger, of course.

Yep. Very similar to a multiplier.


>> The CORDIC algorithm saw much more use a decades or two ago when
>> dedicated multipliers were much less common. Now the CORDIC
>> algorithm is typically used in niche applications where there is
>> some limitation on using a multiplier.
>
> For FPGA, there is a balance between cells, memory and multipliers.
> For DSP system, there DSP block are key components, it is likely you
> will count every DSP48 block and don't be able to spent some of them
> to sine wave generation. Logic cells may be free and CORDIC is a good
> solution in this case.

Multipliers are function blocks like anything else. There are FPGAs
with over 10,000 multipliers. Think those need to be rationed? Pick
the chip that suits your needs.


> If you know another effective algorithm with hardware multipliers and
> exact result (not approximated), please give me a link.

There's no such thing as "exact" digital trig solutions unless you have
an infinitely long word. What do you mean by exact?

--

Rick C

Paul Rubin

27 Jan 2017, 18:30:00
rickman <gnu...@gmail.com> writes:
> Multipliers are function blocks like anything else. There are FPGAs
> with over 10,000 multipliers. Think those need to be rationed? Pick
> the chip that suits your needs.

Those exist, but are enormous and expensive :).

> There's no such thing as "exact" digital trig solutions unless you
> have an infinitely long word. What do you mean by exact?

Exact would mean all the bits are correct, though typically an error of
1 ULP (unit in the last place) is ok.

Ilya Tarasov

27 Jan 2017, 19:28:30
> > The idea of CORDIC is to limit one of multiplication operands by the
> > power of 2, so multiplier can be replaced by shifter.
>
> Dig into the algorithm. It is iterative over the bits. Each operation
> is equivalent to the stepwise operations in a multiply. Not much
> difference. I don't have an implementation handy, so there may be some
> simplification, but I think a stepwise multiply is actually simpler.

I think you slightly miss the interpretation. Yes, it is based on vector rotation, represented by phase increments/decrements, which requires a pair of multiplications. We can implement, for example, sin(a+b) with a sine table and multipliers, where a is the coarse-grained angle and b the fine-grained one. With 1 and 0.01 degree steps this gives us [0, 0.01, ... 89.99, 90] as the argument. CORDIC does principally the same thing, but rotates by angles whose tangent is a power of 2. Mathematically this is a multiplication, but the implementation can be done with shifts only.

> The difference is to calculate a sine calculation requires up to two
> multiplies if you want high resolution and minimum phase jitter.

With CORDIC we can measure the phase shift between rubidium and cesium atomic clocks :) They have about 1E-11 and 1E-14 precision respectively. Pure Forth system.

> Multipliers are function blocks like anything else. There are FPGAs
> with over 10,000 multipliers. Think those need to be rationed? Pick
> the chip that suits your needs.

Xilinx Virtex US+ has more than 12,000, but the price is very high. DSP blocks start from 8 in the Spartan-6 LX4. When a designer chooses an FPGA, it is not so good to need twice the size just for implementing function generators coupled with MAC blocks.

> > If you know another effective algorithm with hardware multipliers and
> > exact result (not approximated), please give me a link.
>
> There's no such thing as "exact" digital trig solutions unless you have
> an infinitely long word. What do you mean by exact?

All bits, down to the least significant, are correct. For example, a Taylor series does not provide this property.

rickman

27 Jan 2017, 23:58:16
On 1/27/2017 7:28 PM, Ilya Tarasov wrote:
>>> The idea of CORDIC is to limit one of multiplication operands by
>>> the power of 2, so multiplier can be replaced by shifter.
>>
>> Dig into the algorithm. It is iterative over the bits. Each
>> operation is equivalent to the stepwise operations in a multiply.
>> Not much difference. I don't have an implementation handy, so
>> there may be some simplification, but I think a stepwise multiply
>> is actually simpler.
>
> I think you slightly miss the interpretation. Yes, it based on vector
> rotating, represented by phase increment/decrement, which require a
> pair of multiplication. We can implement, for example sin(a+b) with
> sin table and multipliers, where a - coarse-grained angle, and b -
> fine-grained. With 1 and 0.01 degree step this gives us [0, 0.01, ...
> 89.99, 90] as an argument. CORDIC doing principally same thing, but
> rotating by angles with tangent is a power of 2. Mathematically this
> is a multiplication, but implementation can be done with shifs only.

What you aren't grasping is that what you call "shifts only" is
virtually the same as multiplication. When you multiply two numbers
examining one multiplier bit at a time, the product gets shifted each
time and the multiplicand is conditionally added each time. Where is
the difference?

It was touted long and loud that the CORDIC was so much more efficient
than a multiplier when in fact it is essentially the same thing.
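To make the comparison concrete, here is a sketch (illustrative, not anyone's production design) of a bit-serial multiply; each step is a shift plus a conditional add, the same per-bit work as one CORDIC micro-rotation:

```python
def shift_add_multiply(a, b, bits=16):
    """Unsigned bit-serial multiply: examine one bit of b per step and
    conditionally add the shifted multiplicand, just as CORDIC
    conditionally adds a shifted coordinate on each iteration."""
    product = 0
    for i in range(bits):
        if (b >> i) & 1:           # one multiplier bit gates the add
            product += a << i      # multiplicand shifted by step index
    return product
```

Done serially this takes N clocks; unrolled it takes N layers of adders, which is the same time/area trade-off as a word-serial versus pipelined CORDIC.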


>> The difference is to calculate a sine calculation requires up to
>> two multiplies if you want high resolution and minimum phase
>> jitter.
>
> With CORDIC we can measure phase shift between Rubidium and Cesium
> atomic clocks :) They has about 1E-11 and 1E-14 precision
> respectively. Pure Forth system.
>
>> Multipliers are function blocks like anything else. There are
>> FPGAs with over 10,000 multipliers. Think those need to be
>> rationed? Pick the chip that suits your needs.
>
> Xilinx Virtex US+ - more than 12,000. Price is very high. DSP blocks
> started from 8 in Spartan-6LX4. When designer choose an FPGA, it is
> not so good to have twice a size just for implementing function
> generators coupled with MAC blocks.

Uh, it is seldom that a chip is selected because of the multipliers. You have
to be doing some serious math to burn all the multipliers. You ignore
the large LUT count of a CORDIC done at speed. It is essentially order
N^2 which is a *lot* of FPGA fabric.


>>> If you know another effective algorithm with hardware multipliers
>>> and exact result (not approximated), please give me a link.
>>
>> There's no such thing as "exact" digital trig solutions unless you
>> have an infinitely long word. What do you mean by exact?
>
> All bits, up to least significant, are true. For example, Taylor
> series don't provide this property.

That is not a huge problem. Just look up sine generation using trig
identities. If you want a lot of resolution you use multiplies. If you
need less resolution a look up table will suffice. I recall a lot of
work was done on this in the 70s when chips were much more limited and
they needed to cut down the size of the calculations. CORDIC would have
been way too much logic so they coded the trig lookups and multiplies as
tables. Something like sin(a)cos(b) + cos(a)sin(b), where a would be
the msbs of the phase and b is the lsbs. By using separate tables for
the various trig portions with multiplies, you can get as much
resolution as you desire. Two multiplies is nothing to get this
solution and can run very fast.

I evaluated this in a spread sheet and found that once you are working
in the area of 18 bit resolution, the linear interpolations are
virtually perfect because they are over such a short section. I was
able to achieve over 100 dB of SNR with an 18 bit phase word.
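A rough behavioral sketch of that table decomposition (the 9/9 split of the 18-bit phase word and the use of floating-point tables are my assumptions for illustration):

```python
import math

PHASE_BITS = 18
FINE_BITS = 9                        # b = low 9 bits, a = high 9 bits
TWO_PI = 2.0 * math.pi

# sin/cos at the coarse angles a, and at the small residual angles b
COARSE = 2 ** (PHASE_BITS - FINE_BITS)
sin_a = [math.sin(TWO_PI * i / COARSE) for i in range(COARSE)]
cos_a = [math.cos(TWO_PI * i / COARSE) for i in range(COARSE)]
sin_b = [math.sin(TWO_PI * j / 2 ** PHASE_BITS) for j in range(2 ** FINE_BITS)]
cos_b = [math.cos(TWO_PI * j / 2 ** PHASE_BITS) for j in range(2 ** FINE_BITS)]

def table_sine(phase):
    """sin(2*pi*phase/2^18) from four small tables and two multiplies,
    using sin(a+b) = sin(a)cos(b) + cos(a)sin(b)."""
    a = phase >> FINE_BITS               # msbs of the phase word
    b = phase & (2 ** FINE_BITS - 1)     # lsbs of the phase word
    return sin_a[a] * cos_b[b] + cos_a[a] * sin_b[b]
```

Four 512-entry tables and two multiplies cover the full 18-bit phase word; in fixed-point hardware the accuracy is then set by the table word width rather than by the decomposition itself.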

When using CORDIC, how do you change the frequency?

--

Rick C

Albert van der Horst

28 Jan 2017, 5:10:45
In article <o6gj07$ner$1...@dont-email.me>, rickman <gnu...@gmail.com> wrote:
>
>
>> If you know another effective algorithm with hardware multipliers and
>> exact result (not approximated), please give me a link.
>
>There's no such thing as "exact" digital trig solutions unless you have
>an infinitely long word. What do you mean by exact?

My idea has a continuously variable frequency with
0 1st harmonic (2 *)
0 2nd harmonic
0 3rd harmonic
0 4th harmonic
0 5th harmonic
0 6th harmonic
0 7th harmonic
0 8th harmonic
0 9th harmonic (10 *)

The zeroes are theoretical zeroes assuming the digital part
is exact, meaning it assumes the values i*dV for i in [1,N]
are exact. This is within reach of an MSP430.

>--
>
>Rick C

rickman

28 Jan 2017, 6:15:10
On 1/28/2017 5:10 AM, Albert van der Horst wrote:
> In article <o6gj07$ner$1...@dont-email.me>, rickman <gnu...@gmail.com> wrote:
>>
>>
>>> If you know another effective algorithm with hardware multipliers and
>>> exact result (not approximated), please give me a link.
>>
>> There's no such thing as "exact" digital trig solutions unless you have
>> an infinitely long word. What do you mean by exact?
>
> My idea has a continuously varying frequenties with
> 0 1e harmonic (2 *)
> 0 2e harmonic
> 0 3e harmonic
> 0 4e harmonic
> 0 5e harmonic
> 0 6e harmonic
> 0 7e harmonic
> 0 8e harmonic
> 0 9e harmonic (10 *)
>
> The zeroes are a theoretical zero assuming the digital part
> is exact, meaning, it assumes values of i in [1,N] i*dV
> which are exact. This is within reach of an MSP430.

Sorry, I don't follow. This notation does not make sense to me.

--

Rick C

m...@iae.nl

28 Jan 2017, 10:48:38
On Saturday, January 28, 2017 at 11:10:45 AM UTC+1, Albert van der Horst wrote:
[..]
> My idea has a continuously varying frequenties with
> 0 1e harmonic (2 *)
[..]
> 0 9e harmonic (10 *)
>
> The zeroes are a theoretical zero assuming the digital part
> is exact, meaning, it assumes values of i in [1,N] i*dV
> which are exact.

I am very sure you know that a continuously varying signal does not have a discrete spectrum (finite set of harmonics). I guess you want a characteristic that is better described in the time-domain. What is it?

-marcel

Albert van der Horst

28 Jan 2017, 13:48:38
Ok the statement is:
The sine generator has a continously variable frequency.
The sine generator has no distortion in the first few harmonics.

>
>--
>
>Rick C

Groetjes Albert

Albert van der Horst

28 Jan 2017, 13:54:27
In article <b2d958c6-f722-44ef...@googlegroups.com>,
Please restrict lines to 72 characters.

You can complain to everybody who states what the distortion of his
sine generator is, because no sine generator can have been working
since before the big bang. Sheez!

I set my varco to arrive at
1.0101901909101901910910328723987329732978328 kHz

Then I wait an infinite time to stabilize.
Then I measure the harmonics.

>
>-marcel
>

Groetjes Albert

rickman

28 Jan 2017, 14:37:17
Can you explain anything about how it works?

When you say "within reach" of an MSP430, I'm not clear on what that
means. Any processor can implement any algorithm. The issue is speed.

--

Rick C

Ilya Tarasov

28 Jan 2017, 14:58:06

> > I think you slightly miss the interpretation. Yes, it based on vector
> > rotating, represented by phase increment/decrement, which require a
> > pair of multiplication. We can implement, for example sin(a+b) with
> > sin table and multipliers, where a - coarse-grained angle, and b -
> > fine-grained. With 1 and 0.01 degree step this gives us [0, 0.01, ...
> > 89.99, 90] as an argument. CORDIC doing principally same thing, but
> > rotating by angles with tangent is a power of 2. Mathematically this
> > is a multiplication, but implementation can be done with shifs only.
>
> What you aren't grasping is that what you call "shifts only" is
> virtually the same as multiplication. When you multiply two numbers
> examining one multiplier bit at a time, the product gets shifted each
> time and the multiplicand is conditionally added each time. Where is
> the difference?

The difference is only in the amount of resources required for each kind of operation. Shifting costs virtually nothing. If you want to multiply by an arbitrary operand, many shifters must be used, with corresponding AND gates to calculate the partial results, combined by an adder tree. If you know the multiplicand cannot contain a '1' at certain positions, you don't need to calculate those results and the adder tree will be simpler. CORDIC uses the ultimate form of this approach, where only one bit may be '1'. This is very similar to a serial multiplier, where clock cycles are traded for area. So you have two extreme implementations of an N-bit CORDIC: N cycles per result with a small area, or N times the area with 1 cycle.

> It was touted long and loud that the CORDIC was so much more efficient
> than a multiplier when in fact it is essentially the same thing.

Of course, it is not magic, just a rational way to rotate an angle that is friendly to silicon resources.

> Uh, it is seldom a chip is selected because of the multiplier. You have
> to be doing some serious math to burn all the multipliers. You ignore
> the large LUT count of a CORDIC done at speed. It is essentially order
> N^2 which is a *lot* of FPGA fabric.

You may refer to this document
https://www.xilinx.com/support/documentation/ip_documentation/ru/cordic.html
This is not too many resources for a serious system. For example, a 32-bit sin/cos uses 403 LUTs in word-serial mode and 3411 in parallel mode. For Kintex-7, the LUT/DSP ratio is about 200-300.

And yes, math in FPGA can be serious :)

> That is not a huge problem. Just look up sine generation using trig
> identities. If you want a lot of resolution you use multiplies. If you
> need less resolution a look up table will suffice. I recall a lot of
> work was done on this in the 70s when chips were much more limited and
> they needed to cut down the size of the calculations. CORDIC would have
> been way too much logic so they coded the trig lookups and multiplies as
> tables. Something like sin(a)cos(b) + cos(a)sin(b), where a would be
> the msbs of the phase and b is the lsbs. By using separate tables for
> the various trig portions with multiplies, you can get as much
> resolution as you desire. Two multiplies is nothing to get this
> solution and can run very fast.

Oh, now I think I know what you are talking about.

> I evaluated this in a spread sheet and found that once you are working
> in the area of 18 bit resolution, the linear interpolations are
> virtually perfect because they are over such a short section. I was
> able to achieve over 100 dB of SNR with an 18 bit phase word.

120 dB is nice for us, and 160-200 dB is achieved in models, but we are facing jitter and stability problems with quartz. A rubidium atomic clock is nice for this class of precision, and we can see the difference.

> When using CORDIC, how do you change the frequency?

The simplest way is based on a wide phase accumulator and controlling the phase increment value. Also, you can change the base frequency of the overall DSP path in hardware using the dynamic reconfiguration port in the Clock Management Tile (on Xilinx chips).
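That phase-accumulator scheme can be modeled in a few lines (a software sketch; `ACC_BITS = 32` and the floating-point sine are stand-ins for the hardware accumulator and the CORDIC or table stage):

```python
import math

ACC_BITS = 32                          # wide phase accumulator

def dds(phase_delta, n_samples):
    """Add phase_delta to the accumulator every clock. The output
    frequency is f_clk * phase_delta / 2**ACC_BITS, and writing a new
    phase_delta retunes the frequency with no phase discontinuity."""
    mask = (1 << ACC_BITS) - 1
    phase, out = 0, []
    for _ in range(n_samples):
        out.append(math.sin(2.0 * math.pi * phase / (1 << ACC_BITS)))
        phase = (phase + phase_delta) & mask
    return out

eighth = dds(1 << (ACC_BITS - 3), 8)   # f = f_clk/8: one 8-sample period
```

Mapping `phase_delta` (and, if wanted, `phase` itself) to a CPU-visible control register is what lets a soft CPU retune the generator on the fly.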

rickman

28 Jan 2017, 15:36:43
I think you are still not getting it, but I can't tell for sure. The
point is that either the CORDIC or a multiplier uses pretty much the
same resources if designed in the fabric. Don't focus on one
insignificant part (the shifting which is always free in an FPGA). What
is important is what is done with the data after it is shifted.

Both the multiplier and the CORDIC can be done time sequentially using
logic for one layer of operation or can be pipelined with N layers of
logic and registers producing a result on every clock cycle.


>> It was touted long and loud that the CORDIC was so much more
>> efficient than a multiplier when in fact it is essentially the same
>> thing.
>
> Of course, it is not a magic, just a rational way to rotate angle,
> friendly to silicon resorces.
>
>> Uh, it is seldom a chip is selected because of the multiplier. You
>> have to be doing some serious math to burn all the multipliers.
>> You ignore the large LUT count of a CORDIC done at speed. It is
>> essentially order N^2 which is a *lot* of FPGA fabric.
>
> You may refer to this document
> https://www.xilinx.com/support/documentation/ip_documentation/ru/cordic.html
> This is not too much resources for serious system. For example,
> 32-bit sin/cos use 403 LUTs in word serial and 3411 in parallel mode.
> For Kintex-7, the ratio of LUT/DSP is about 200-300.

That is a pretty large usage. I can do the same thing using a handful
of multipliers. In 403 LUTs I can build a CPU. In 3400 LUTs I can
build a RISC pipelined CPU with advanced features.

The FPGAs I use in a product that generate sine waves (among other
things) has a total of 3000 LUTs. Of course it only runs at 8 ksps and
these are 16 bit samples.

That sounds like any other DDS. You either change the phase step or
change the reference frequency. So the phase step size can be changed
on the fly for frequency modulation?

--

Rick C

Ilya Tarasov

28 Jan 2017, 18:05:18
> I think you are still not getting it, but I can't tell for sure. The
> point is that either the CORDIC or a multiplier uses pretty much the
> same resources if designed in the fabric. Don't focus on one
> insignificant part (the shifting which is always free in an FPGA). What
> is important is what is done with the data after it is shifted.

Please specify. Do you mean the approach based on sin(a+b)?
If so, we use sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b), and this requires multiplication of sin and cos values. Is that what you are talking about?

> Both the multiplier and the CORDIC can be done time sequentially using
> logic for one layer of operation or can be pipelined with N layers of
> logic and registers producing a result on every clock cycle.

We need to compare algorithm with algorithm and implementation with implementation. CORDIC is an implementation, while sequentially applying the sin(a+b) formula is an algorithm. Multiplication is an implementation. What algorithm do you use?

> > This is not too much resources for serious system. For example,
> > 32-bit sin/cos use 403 LUTs in word serial and 3411 in parallel mode.
> > For Kintex-7, the ratio of LUT/DSP is about 200-300.
>
> That is a pretty large usage. I can do the same thing using a handful
> of multipliers. In 403 LUTs I can build a CPU. In 3400 LUTs I can
> build a RISC pipelined CPU with advanced features.

This corresponds with what I know. However, area itself is not an indicator. We need to know what else is located inside the FPGA, and what resources are most important to save. With several approaches available, we can consider a workaround when a certain type of FPGA resource is running low.

> The FPGAs I use in a product that generate sine waves (among other
> things) has a total of 3000 LUTs. Of course it only runs at 8 ksps and
> these are 16 bit samples.

What about the performance index of your approach? For example, (bits * samples_per_second)/LUT. We can calculate this from the Xilinx table. In general, if the FPGA is not very large, we may start from serial algorithms.

> > Simplest way based on the wide phase accumulator and control the
> > phase increment value. Also, you can change base frequency of overall
> > DSP path in hardware using dynamic reconfiguration port in Clock
> > Management Tile (I mean Xilinx chips).
>
> That sounds like any other DDS. You either change the phase step or
> change the reference frequency. So the phase step size can be changed
> on the fly for frequency modulation?

Yes, with control registers. If we write phase <= phase + phase_delta; it is possible to connect the phase_delta register to the CPU system bus (and directly control the phase register on the fly too). We use a Forth CPU for control.

Tasks of this kind are very informative. A Forth CPU allows devices to be connected very easily, with no overhead from complex system buses like AXI. In this case, the performance or area of the Forth CPU is not so important, because it is used for a real task with its own value. I'm sure you can imagine the power of a hardware module running on an FPGA. With any CPU its functionality is greatly improved because of the reconfiguration capabilities. With a Forth CPU we can additionally adapt the system to the person who works with the hardware and tightly connect the CPU to the hardware.

m...@iae.nl

28 Jan 2017, 18:49:42
On Saturday, January 28, 2017 at 7:54:27 PM UTC+1, Albert van der Horst wrote:
> In article <b2d958c6-f722-44ef...@googlegroups.com>,
> <m...@iae.nl> wrote:
> >On Saturday, January 28, 2017 at 11:10:45 AM UTC+1, Albert van der Horst wrote:
> >[..]
> >> My idea has a continuously varying frequenties with
> >> 0 1e harmonic (2 *)
> >[..]
> >> 0 9e harmonic (10 *)
> >>
> >> The zeroes are a theoretical zero assuming the digital part
> >> is exact, meaning, it assumes values of i in [1,N] i*dV
> >> which are exact.
> >
> >I am very sure you know that a continuously varying signal
> >does not have a discrete spectrum (finite set of harmonics).
> >I guess you want a characteristic that is better described
> >in the time-domain. What is it?
>
> Please restrict lines to 72 characters.
>
> You can complain to everybody who tells what the distortion of his
> sine generator is, because no sine generator can be working since
> before the big bang. Sheez!
>
Only when they claim an impossibly low distortion.
But indeed, one could claim that a sine oscillator
never produces a sinewave.

> I set my varco to arrive at
> 1.0101901909101901910910328723987329732978328 Khz
>
> Then I wait an infinite time to stabilize.
> Then I measure the harmonics.

That varco will have a drift that is in no way able
to keep up with the 43 decimal digits
of your desired resolution.

I think your wish has to do with the effect
that an immediate frequency change produces
a very objectionable auditory effect, not
unlike a click. Old electronic organs suffered
from this. Gating an oscillator has a similar
problem, as does using extremely steep filters.

-marcel

Paul Rubin

28 Jan 2017, 18:57:41
Ilya Tarasov <ilya74....@gmail.com> writes:
> Please specify. Do you mean approach, based on sin(a+b)?
> If so, we use sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b), and this
> require multiplication of sin and cos values. Is it what you talking
> about?

I think the idea is b is small, so maybe the multiplication can be
approximated by addition, i.e. treat sin(b)=b and cos(b)=1.
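The error of that shortcut is easy to check numerically (a sketch; taking b as one LSB of an 18-bit phase word is just an example size): the approximation error grows roughly as b**2/2, so it is negligible only while b stays small.

```python
import math

def sin_small_angle(a, b):
    """sin(a+b) with sin(b) ~ b and cos(b) ~ 1: one multiply, one add."""
    return math.sin(a) + b * math.cos(a)

b = 2.0 * math.pi / 2 ** 18          # one LSB of an 18-bit phase word
a = math.pi / 4
err = abs(sin_small_angle(a, b) - math.sin(a + b))   # on the order of b**2/2
```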

> [Rick:]
>> In 403 LUTs I can build a CPU.

Rick, can your 403 LUT CPU do sin/cos as quickly as 403 LUTs of CORDIC?

rickman

28 Jan 2017, 21:51:16
On 1/28/2017 6:05 PM, Ilya Tarasov wrote:
>> I think you are still not getting it, but I can't tell for sure.
>> The point is that either the CORDIC or a multiplier uses pretty
>> much the same resources if designed in the fabric. Don't focus on
>> one insignificant part (the shifting which is always free in an
>> FPGA). What is important is what is done with the data after it is
>> shifted.
>
> Please specify. Do you mean approach, based on sin(a+b)? If so, we
> use sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b), and this require
> multiplication of sin and cos values. Is it what you talking about?

I am talking about the fact that you are making a big distinction
between a CORDIC computation and a multiplier when there is only one
real difference. A CORDIC computation will use virtually the same
resources in the FPGA fabric as an equivalent multiply, likely more.
But the difference is that most FPGAs include hard IP multipliers! Hard
IP uses *much* less silicon area than anything done in the fabric.
Altera has recently announced hard IP floating point ALUs.


>> Both the multiplier and the CORDIC can be done time sequentially
>> using logic for one layer of operation or can be pipelined with N
>> layers of logic and registers producing a result on every clock
>> cycle.
>
> We need to compare algorithm with algorithm and implementation with
> implementation. CORDIC is an implementation, where sequental applying
> of sin(a+b) formulae is an algorithm. Multiplication is an
> implementation. What algorithm you use?

At this point I have lost my interest in debating this much further.
CORDIC is not the only method of calculating rotating vectors. It is
useful where you have a lack of computing resources. If multiplication
is readily available other methods compare very favorably.

What exactly is a "Forth CPU"?

--

Rick C

rickman

28 Jan 2017, 21:51:44
How fast can a CORDIC be done?

--

Rick C

Ilya Tarasov

29 Jan 2017, 12:53:28
On Sunday, January 29, 2017 at 2:57:41 UTC+3, Paul Rubin wrote:
> Ilya Tarasov <ilya74....@gmail.com> writes:
> > Please specify. Do you mean approach, based on sin(a+b)?
> > If so, we use sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b), and this
> > require multiplication of sin and cos values. Is it what you talking
> > about?
>
> I think the idea is b is small, so maybe the multiplication can be
> approximated by addition, i.e. treat sin(b)=b and cos(b)=1.

This causes a big difference. It is a very approximate approach.

Ilya Tarasov

29 Jan 2017, 13:20:39
> > Please specify. Do you mean approach, based on sin(a+b)? If so, we
> > use sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b), and this require
> > multiplication of sin and cos values. Is it what you talking about?
>
> I am talking about the fact that you are making a big distinction
> between a CORDIC computation and a multiplier when there is only one
> real difference. A CORDIC computation will use virtually the same
> resources in the FPGA fabric as an equivalent multiply, likely more.
> But the difference is that most FPGAs include hard IP multipliers! Hard
> IP uses *much* less silicon area than anything done in the fabric.
> Altera has recently announced hard IP floating point ALUs.

You still haven't listed your algorithm's details. Maybe it will consume LUTs in other parts, for control structures and so on. Of course I know about hard multipliers (with 10+ years of experience as an approved Xilinx trainer, it is very hard to miss something that important). You need to store angles and sin/cos values and provide datapath control (sequential, pipelined or mixed). If you can guarantee a maximum error of 1 LSB for any data width, that is interesting. If not, it is an approximation, suitable for some purposes.

> At this point I have lost my interest in debating this much further.
> CORDIC is not the only method of calculating rotating vectors. It is
> useful where you have a lack of computing resources. If multiplication
> is readily available other methods compare very favorably.

So what is your method?

> > because of reconfiguration capabilities. With Forth CPU we can
> > additionally adapt this system to the man who works with hardware and
> > tightly connect CPU to hardware.
>
> What exactly is a "Forth CPU"?

I have several designs. The current one is the 7th generation of the microarchitecture, with hardware multithreading, targeting Spartan-6 and 7-series Xilinx FPGAs.

> How fast can a CORDIC be done?

From 175 MHz to 300-400 MHz, depending on the FPGA family, the function calculated, the data width and the serial/parallel architecture.

Albert van der Horst

29 Jan 2017, 13:32:55
In article <6415800d-9ed0-4eef...@googlegroups.com>,
Ilya Tarasov <ilya74....@gmail.com> wrote:
>On Sunday, 29 January 2017 at 2:57:41 UTC+3, Paul Rubin wrote:
>> Ilya Tarasov <ilya74....@gmail.com> writes:
>> > Please specify. Do you mean approach, based on sin(a+b)?
>> > If so, we use sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b), and this
>> > require multiplication of sin and cos values. Is it what you talking
>> > about?
>>
>> I think the idea is b is small, so maybe the multiplication can be
>> approximated by addition, i.e. treat sin(b)=b and cos(b)=1.
>
>This causes a big difference. It is a very rough approximation.

For small b the above becomes
sin(a)*(1-b**2/2) + cos(a)*b
= sin(a) + b*(cos(a) - b*sin(a)/2)

Under no circumstances can the coefficient of b be substantially greater
than one, so there are no numerical problems in sight.
The result will not suffer more than the usual rounding errors.
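The difference between the first-order and second-order truncations is easy to see numerically. A small Python check (Python just for illustration; b is taken as one 8-bit fine step of a quadrant, an assumed figure):

```python
import math

# Compare sin(a+b) ~ sin(a) + b*cos(a)             (first order, cos(b)=1)
# with    sin(a+b) ~ sin(a)*(1 - b*b/2) + b*cos(a) (second order)
def errors(a, b):
    exact = math.sin(a + b)
    first = math.sin(a) + b * math.cos(a)
    second = math.sin(a) * (1 - b * b / 2) + b * math.cos(a)
    return abs(first - exact), abs(second - exact)

b = (math.pi / 2) / 256          # one fine step of a quadrant
e1, e2 = errors(1.0, b)
print(e1, e2)  # second-order error is ~b**3, orders of magnitude smaller
```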

Cecil Bayona

29 Jan 2017, 13:38:57
Do you have a website that discusses the CPUs? I would be interested in
looking at it to get ideas, even though many consider such CPUs a bad idea to begin with.

--
Cecil - k5nwa

Ilya Tarasov

29 Jan 2017, 14:49:45
> > From 175 MHz to 300-400, depends on FPGA famuly, function calculated, datawidth and serial/parallel architecture.
> >
> Do you have a website that discuses the CPUs? I would be interested in
> looking to get ideas which considered by many as a bad idea to begin with.

www.fforum.winglion.ru

Write me your nick for manual approval (too many bots are attempting to register).

I may note there is no 'perfect CPU'. I have no plans to sell my designs or to call on others to take part. It is more effective to explain how to create a CPU and optimize it for a certain purpose.

Ilya Tarasov

29 Jan 2017, 14:53:48
> >This causes a big difference. It is a very rough approximation.
>
> For small b the above becomes
> sin(a)*(1-b**2) + cos(a)*b
> sin(a) +b*(cos(a)-b*sin(a))
>
> Under no circumstances the coefficient of b can be substantially greater
> than one so there are no numerical problems in sight.
> The result will not suffer more than the usual rounding errors.

This is a better approximation, but it just extends the region where the results are quite good. Anyway, many ways to calculate sin/cos may coexist, so the optimal one can be chosen for a given task. We must not avoid exact algorithms just because they require too many resources.

rickman

30 Jan 2017, 1:10:54
I don't know why you label this an approximation. The level of accuracy
is controlled by the tradeoff of how many of the total phase bits you
assign to a and b. Any level of accuracy required can be obtained. You
need to appreciate fully how close to a straight line the small section
of a sine function is over very short angles. It is not at all hard to
get the accuracy within a single lsb.

--

Rick C

Ilya Tarasov

30 Jan 2017, 2:11:37

> I don't know why you label this an approximation. The level of accuracy
> is controlled by the tradeoff of how many of the total phase bits you
> assign to a and b. Any level of accuracy required can be obtained. You
> need to appreciate fully how close to a straight line the small section
> of a sine function is over very short angles. It is not at all hard to
> get the accuracy within a single lsb.

Because of 'for small b'. This is a limitation. Approximation is not a disadvantage by default, but we may face a problem when attempting to improve precision. An exact formula doesn't require any kind of improvement, because it provides full accuracy by default. If it is not exact, we must always answer the question 'how small must b be to obtain the desired accuracy?'. Also, different approximation methods may be applied - what about playing with the polynomial order?

Paul Rubin

30 Jan 2017, 2:34:46
Ilya Tarasov <ilya74....@gmail.com> writes:
> If it is not exact, we must always answer a question 'how small b must
> be to obtain desired accuracy?'.

Why is that a problem? The math is well known and not complicated. And
you can choose the size of b by making big enough lookup tables for a.

> Also, different ways to approximation may be applied - what about
> playing with polynom order?

That's also worthwhile--adding more terms slows things down, but you can
move the coefficients around to control the minimax error over an
interval, generally doing better than the Taylor series of the same
order whose error is worst at one end of the interval. This isn't my
thing but the Wikipedia article on Chebyshev polynomials looks like a
good place to start.
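The effect of node placement can be demonstrated with plain interpolation, no special libraries. A Python sketch (the degree and interval are arbitrary choices for illustration) comparing uniform nodes with Chebyshev nodes for sin on [0, pi/2]:

```python
import math

# Polynomial interpolation of sin on [0, pi/2]: Chebyshev nodes vs uniform
# nodes, evaluated with the Lagrange formula. Chebyshev placement keeps the
# worst-case error down, close to the minimax polynomial of the same degree.
def lagrange(xs, ys, x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def max_err(xs):
    ys = [math.sin(x) for x in xs]
    return max(abs(lagrange(xs, ys, a) - math.sin(a))
               for a in [k * (math.pi / 2) / 1000 for k in range(1001)])

n = 8
lo, hi = 0.0, math.pi / 2
uniform = [lo + k * (hi - lo) / (n - 1) for k in range(n)]
cheb = [(lo + hi) / 2 + (hi - lo) / 2 * math.cos((2 * k + 1) * math.pi / (2 * n))
        for k in range(n)]
print(max_err(uniform), max_err(cheb))  # Chebyshev nodes give a smaller max error
```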

Albert van der Horst

30 Jan 2017, 6:37:30
In article <e8b3a38a-d6b7-41b3...@googlegroups.com>,
Ilya Tarasov <ilya74....@gmail.com> wrote:
>
>> I don't know why you label this an approximation. The level of accuracy
>> is controlled by the tradeoff of how many of the total phase bits you
>> assign to a and b. Any level of accuracy required can be obtained. You
>> need to appreciate fully how close to a straight line the small section
>> of a sine function is over very short angles. It is not at all hard to
>> get the accuracy within a single lsb.
>
>Because of 'for small b'. This is a limitation. Approximation is not a
>disadvantage by default, but we may face a problem when attempting to
>improve precision. An exact formula doesn't require any kind of improvement,
>because it provides full accuracy by default. If it is not exact, we must
>always answer the question 'how small must b be to obtain the desired
>accuracy?'. Also, different approximation methods may be applied - what
>about playing with the polynomial order?

You're fundamentally mistaken. The small angle is only assumed to
analyse rounding errors. Mathematically the formula is exact.
If a represents degrees and b represents minutes, all angles that represent
whole minutes can be calculated mathematically exactly.
A table is needed for the sin and cosine of 0..90 degrees and
for 0..60 minutes.
Instead one could use a table for all whole-second angles up to one degree,
which requires 3600 entries. Now we have the cos and sine for all
angles that are an exact multiple of a second.

I'm sure that you can interpolate linearly within an arc second, but I'm
too lazy to calculate that for you.
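The degrees-plus-minutes scheme is exact up to table rounding, which a short Python check confirms (Python in place of Forth, just to verify the claim):

```python
import math

# Tables for whole degrees (0..90) and whole minutes (0..59); any whole-minute
# angle then follows from sin(a+b) = sin(a)cos(b) + cos(a)sin(b) exactly.
sin_deg = [math.sin(math.radians(d)) for d in range(91)]
cos_deg = [math.cos(math.radians(d)) for d in range(91)]
sin_min = [math.sin(math.radians(m / 60)) for m in range(60)]
cos_min = [math.cos(math.radians(m / 60)) for m in range(60)]

def sin_dm(deg, minute):
    return sin_deg[deg] * cos_min[minute] + cos_deg[deg] * sin_min[minute]

worst = max(abs(sin_dm(d, m) - math.sin(math.radians(d + m / 60)))
            for d in range(90) for m in range(60))
print(worst)  # only floating-point rounding remains, around 1e-16
```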

Albert van der Horst

30 Jan 2017, 6:43:09
In article <87o9ypy...@nightsong.com>,
On my site there is a ubasic program, dminmax.ub, that calculates
Chebyshev polynomials for functions for which you have a prototype
that calculates values in sufficient precision.

WARNING: those polynomials fare badly where there are poles nearby
in the complex plane, such as the arctangent.

rickman

30 Jan 2017, 10:15:27
I don't understand what you are going on about. This will do the job to
any level of accuracy required. Are you saying you don't like thinking
about your work? What you call approximation, I call engineering.

--

Rick C

ctm...@gmail.com

30 Jan 2017, 14:02:42
For very small angles sin x = x (radians) and cos x = 1.

There is also a Newton method for calculating sin and cos.

c-:

Paul Rubin

30 Jan 2017, 14:04:23
ctm...@gmail.com writes:
> There is also a Newton method for calculating sin and cos.

If you have division, rational approximations can do better than
polynomials of similar size, especially around complex poles as Albert
mentions.

Ilya Tarasov

30 Jan 2017, 15:15:36
> You're fundamentally mistaken. The small angle is only assumed to
> analyse rounding errors. Mathematically the formula is exact.
> If a represents degrees and b represents minutes, all angles that represent
> whole minutes can be calculated mathematically exact.
> There is a table needed for the sin and cosine of 0..90 degrees and
> for 0..60 minutes.
> Instead one could use a table for all seconds angles up to one degree,
> which requires 3600 entries.
> angles that are an exact multiple of seconds.

You are trying to retell me a page from my book :)

> I'm sure that you can interpolate linearly with an arc second, but I'm
> too lazy too calculate that for you.

I was not so lazy and calculated the seconds case and compared it to interpolation. The error is small, but not zero. With seconds, we will have 90 degree + 60 minute + 60 second table entries, but 90*60*60 combinations. We can repeat this many times. A combination of tables, angle rotation and interpolation works, but we need to know exactly how well.

Ilya Tarasov

30 Jan 2017, 15:20:20

> I don't understand what you are going on about. This will do the job to
> any level of accuracy required. Are you saying you don't like thinking
> about your work? What you call approximation, I call engineering.

Engineering is a way to solve a task, but the task must be stated first. Let's think about angle rotation. Ok, we have a way to calculate sin(a+b), so we can represent any angle as a sum of several angles, with sin and cos values listed in a table. What is the optimal way to choose those angles? At first glance we may list degrees, minutes and seconds, but is it optimal?

Paul Rubin

30 Jan 2017, 17:09:09
Ilya Tarasov <ilya74....@gmail.com> writes:
> From the first look, we may list degrees, minutes and seconds, but is
> it optimal?

It's convenient, so the question then is whether it's close enough.

Optimal involves non-uniform placement of the interpolation points,
e.g. at the so-called Chebyshev nodes according to Wikipedia. I'd
expect that with non-optimal placement, you can still get the same
accuracy by using more points. That seems ok to me if you have enough
memory for the bigger tables.

Using naive deg/min/sec interpolation (see below) I get a max error of
about 1.56e-4, which is well above 1 ULP in single precision. I think
that can be improved by just moving the interpolation point slightly.
There's too much stuff going on here at the moment for me to figure that
out right now, but this is a start.

================================================================

: dms.1 ( f:x -- f:int f:frac ) fdup f>s s>f ftuck f- ;
: f>dms ( F:degrees -- F:degrees F:minutes F:seconds )
dms.1 60e f* dms.1 60e f* dms.1 f+ ;

: dms>f ( F:d F:m F:s -- F:dd ) 60e f/ f+ 60e f/ f+ ;
: deg>rad ( F:d -- F:rad ) pi f* 180e f/ ;
: rad>deg ( F:rad -- F:deg ) 180e f* pi f/ ;
: min>rad ( F:min -- F:rad ) 60e f/ deg>rad ;
: sec>rad ( F:min -- F:rad ) 60e f/ min>rad ;

\ approximate sin x using d/m/s interpolation
: xsin ( F:deg -- F:approximate sine )
f>dms sec>rad { F: s } min>rad { F: m } deg>rad { F: d }
d fcos m fsin f*
d fsin m fcos f* f+
fdup fcos s f*
f+ ;

: fclearstack fdepth 0 ?do fdrop loop ;
: cs clearstack fclearstack ;

: err ( deg -- e ) fdup xsin fswap deg>rad fsin f/ 1e f- fabs ;
: maxerr ( -- e ) cs 0e 89999 1 do i s>f 1e-3 f* err fmax loop ;

rickman

30 Jan 2017, 17:10:48
I've already explained the division is done in the phase angle
representation. The high order bits are used for the gross angle and
the low order bits are used for the fine angle. No math required. You
can pick how many bits in each to suit your precision requirements. If
you need a formal description of how to do that I will leave you to the
many sources available through a Google search.
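The "no math required" point is that the split is pure bit selection. A Python sketch of the idea (the 16-bit phase and 8-bit coarse width are assumed example figures):

```python
import math

# Splitting a phase word (one quadrant here) into a gross and a fine angle
# is just shifts and masks -- no arithmetic division of the angle is needed.
PHASE_BITS, COARSE_BITS = 16, 8
QUADRANT = math.pi / 2

def split(phase):
    fine_bits = PHASE_BITS - COARSE_BITS
    coarse = phase >> fine_bits             # high bits: gross angle index
    fine = phase & ((1 << fine_bits) - 1)   # low bits: fine angle index
    a = coarse * QUADRANT / (1 << COARSE_BITS)
    b = fine * QUADRANT / (1 << PHASE_BITS)
    return a, b

a, b = split(0xA5C3)
print(abs((a + b) - 0xA5C3 * QUADRANT / (1 << PHASE_BITS)))  # agrees to rounding
```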

--

Rick C

Ilya Tarasov

30 Jan 2017, 17:38:28

> I've already explained the division is done in the phase angle
> representation. The high order bits are used for the gross angle and
> the low order bits are used for the fine angle. No math required. You
> can pick how many bits in each to suit your precision requirements. If
> you need a formal description of how to do that I will leave you to the
> many sources available through a Google search.

Try to think deeper and you will reinvent the idea of CORDIC :) This is what you are trying to 'explain' to me, but the angles in CORDIC are just properly chosen, with no interpolation required. This is like the bisection algorithm compared to randomly selected additions. I don't need to search Google for things I know well enough. Now you are near the starting point, and it is possible to improve your implementation if you really need to.

Albert van der Horst

30 Jan 2017, 21:20:02
In article <87d1f4y...@nightsong.com>,
Interpolating between exact points 1" apart:
The most unfavourable case is interpolating between -.5" and +.5".
1" is 4.8E-6 radians, so both cosines are
1 - .59E-11. The average is the same, which is the
interpolated value for cos(0), which is then .59E-11 off,
a respectable precision.
This can be accomplished with sin and cos tables for the 90 degree
values, 60 minute angles and 60 second angles.
A value can be calculated using 7 multiplications.

So this is no shabby method, although probably Chebyshev
polynomials are better.

rickman

30 Jan 2017, 23:09:31
You still don't get it. CORDIC is implemented using a large amount of
FPGA fabric. Sines can be calculated much more efficiently using hard
IP multipliers which use a much smaller area on the chip and so cost
less in real estate and so chip cost. They also operate very fast.

You seem to be accustomed to designing in the days when multipliers were
either not available or to be avoided and so the "no multiplier" aspect
of the CORDIC algorithm sounded like a good idea. We are way beyond
that with today's chips.

--

Rick C

Paul Rubin

30 Jan 2017, 23:19:25
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> Interpolating between exact points 1" apart: ...
> 1- .59E-11 radians. The average is the same, which is the
> interpolated value for cos(0), which is then .59E-11 off,
> a respectable precision.

That's well within 1 ulp. I only interpolated between minutes, I'll try
putting in seconds. I also didn't use the x**2 term for approximating
the cosine near x=0. I just used cos x = 1.

> This can be accomplished with sin and cos tables for 90 degree values, 60
> minute angles and 60 second angles. A value can be calculated using 7
> multiplications.

Nice.

Andrew Haley

31 Jan 2017, 5:34:14
Really? For sin and cos? I'm looking at Hart et al. and for sin I see
10.67 digits for a degree-5 polynomial and 10.94 digits for a
degree-3/degree-2 ratio. Division is *expensive*, but multiplication
these days is often very cheap.

Andrew.

Ilya Tarasov

31 Jan 2017, 12:32:51

> You still don't get it. CORDIC is implemented using a large amount of
> FPGA fabric. Sines can be calculated much more efficiently using hard
> IP multipliers which use a much smaller area on the chip and so cost
> less in real estate and so chip cost. They also operate very fast.
>
> You seem to be accustomed to designing in the days when multipliers were
> either not available or to be avoided and so the "no multiplier" aspect
> of the CORDIC algorithm sounded like a good idea. We are way beyond
> that with today's chips.

Not only multipliers are cheap. Words are cheap too :)
Ok, let me do your job. Let's create a project for calculating sin with a 16-bit argument. Divide that argument into 8-bit coarse and fine parts. We will need to implement: sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b).

Assume a Kintex-7 Xilinx FPGA. First, we will need to create the sin and cos tables, otherwise the synthesizer will optimize the empty tables down to simple constants.

Here is a Forth script to generate VHDL initialization lines
===================================================================
{
" sin_coarse.dat" NEWFILE TO HF-OUT
256 0 DO
I .F " => " PRINTF
I S>F 256.0 F/ PI F* 2.0 F/ FSIN 65535.0 F* F>S
.F
" ," PRINTF CRF
LOOP

HF-OUT CLOSE

" cos_coarse.dat" NEWFILE TO HF-OUT
256 0 DO
I .F " => " PRINTF
I S>F 256.0 F/ PI F* 2.0 F/ FCOS 65535.0 F* F>S
.F
" ," PRINTF CRF
LOOP

HF-OUT CLOSE

" sin_fine.dat" NEWFILE TO HF-OUT
256 0 DO
I .F " => " PRINTF
I S>F 256.0 F/ PI F* 2.0 F/ 256.0 F/ FSIN 65535.0 F* F>S
.F
" ," PRINTF CRF
LOOP

HF-OUT CLOSE

" cos_fine.dat" NEWFILE TO HF-OUT
256 0 DO
I .F " => " PRINTF
I S>F 256.0 F/ PI F* 2.0 F/ 256.0 F/ FCOS 65535.0 F* F>S
.F
" ," PRINTF CRF
LOOP

HF-OUT CLOSE
}
===================================================================

VHDL code

----------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity top is
Port ( clk : in STD_LOGIC;
phase : in STD_LOGIC_VECTOR (15 downto 0);
result : out STD_LOGIC_VECTOR (31 downto 0));
end top;

architecture Behavioral of top is

type TTable is array(0 to 255) of integer range 0 to 65535;

signal SinCoarse : Ttable := (
0 => 0 ,
-- skipped lines generated by Forth
others => 0);

signal SinFine : Ttable := (
0 => 0 ,
-- skipped lines generated by Forth
others => 0 );

signal CosCoarse : Ttable := (
0 => 65535 ,
-- skipped lines generated by Forth
others => 0);

signal CosFine : Ttable := (
0 => 65535 ,
-- skipped lines generated by Forth
others => 0);

signal sina, sinb, cosa, cosb : integer range 0 to 65535;
signal sin : integer;

begin

-- sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b);

process(clk)
begin
if rising_edge(clk) then
sina <= SinCoarse(to_integer(unsigned(phase(15 downto 8))));
end if;
end process;

process(clk)
begin
if rising_edge(clk) then
sinb <= SinFine(to_integer(unsigned(phase(7 downto 0))));
end if;
end process;

process(clk)
begin
if rising_edge(clk) then
cosa <= CosCoarse(to_integer(unsigned(phase(15 downto 8))));
end if;
end process;

process(clk)
begin
if rising_edge(clk) then
cosb <= CosFine(to_integer(unsigned(phase(7 downto 0))));
end if;
end process;

process(clk)
begin
if rising_edge(clk) then
sin <= sina*cosb + cosa*sinb;
end if;
end process;

result <= std_logic_vector(to_unsigned(sin, 32));

end Behavioral;
=========================================================================

Report Cell Usage:
+------+--------+------+
| |Cell |Count |
+------+--------+------+
|1 |BUFG | 1|
|2 |DSP48E1 | 2|
|3 |LUT2 | 2|
|4 |LUT3 | 2|
|5 |LUT4 | 1|
|6 |LUT5 | 10|
|7 |LUT6 | 145|
|8 |MUXF7 | 64|
|9 |MUXF8 | 32|
|10 |IBUF | 17|
|11 |OBUF | 16|
+------+--------+------+

Finally, this project requires two multipliers, and this follows clearly from the source formula. All tables go to distributed memory, since they are not so large; LUTs are required for this. BRAM usage can be selected by synthesis options, or will be applied automatically when a larger table size is required.

I have several questions here.

1. If you want to go to a 32-bit phase, can we simply change the widths of the angle parts, increase the table sizes and keep this code? How many DSP blocks will be used in this case? What about the memory size?
2. If we go to an (a+b+c) approach, are there any hidden traps? Can we simply double the number of calculating resources, i.e. let d = a+b, calculate sin(a+b), then calculate the final angle as d + c? How many resources will be used in this case?
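A software model makes it easy to answer the accuracy side of such questions before synthesis. Here is a Python sketch of the same 8+8 coarse/fine, 16-bit-table scheme as the VHDL above (the sampling stride is arbitrary):

```python
import math

# Software model of the 8+8 coarse/fine scheme: 16-bit tables,
# sin = sin(a)*cos(b) + cos(a)*sin(b) computed with two multiplies.
SCALE = 65535
Q = math.pi / 2

sin_c = [round(math.sin(i / 256 * Q) * SCALE) for i in range(256)]
cos_c = [round(math.cos(i / 256 * Q) * SCALE) for i in range(256)]
sin_f = [round(math.sin(i / 65536 * Q) * SCALE) for i in range(256)]
cos_f = [round(math.cos(i / 65536 * Q) * SCALE) for i in range(256)]

def sin16(phase):
    a, b = phase >> 8, phase & 0xFF
    return sin_c[a] * cos_f[b] + cos_c[a] * sin_f[b]  # the two multipliers

# worst error in LSBs of the 16-bit result, sampled across the quadrant
worst = max(abs(sin16(p) / SCALE - math.sin(p / 65536 * Q) * SCALE)
            for p in range(0, 65536, 17))
print(worst)  # a couple of LSBs at worst, from the four rounded tables
```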

rickman

4 Feb 2017, 4:22:34
On 1/29/2017 1:20 PM, Ilya Tarasov wrote:
>>> Please specify. Do you mean approach, based on sin(a+b)? If so,
>>> we use sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b), and this
>>> require multiplication of sin and cos values. Is it what you
>>> talking about?
>>
>> I am talking about the fact that you are making a big distinction
>> between a CORDIC computation and a multiplier when there is only
>> one real difference. A CORDIC computation will use virtually the
>> same resources in the FPGA fabric as an equivalent multiply, likely
>> more. But the difference is that most FPGAs include hard IP
>> multipliers! Hard IP uses *much* less silicon area than anything
>> done in the fabric. Altera has recently announced hard IP floating
>> point ALUs.
>
> You still don't list your algorithm details. Maybe it will consume
> LUTs in other parts, for control structures or so on. Of course, I
> know about hard multipliers (with 10+ years experience of approved
> Xilinx trainer it is very hard to miss something important). You need
> to store angles, sin/cos values and provide datapath control
> (sequental, pipelined or mixed). If you still can provide maximum
> error with 1 LSB for any datawidth, it is interesting. If not, it is
> an approximation, suitable for some purposes.

You mean I don't have a fully coded VHDL example at 32 bits word width
to show you? No, I don't. But that is not needed. The control logic
would be trivial compared to thousands of LUTs. The only extra logic
required is on the order of a handful of adders and/or multiplexers.

I took the time to do some calculations which show that to get results
accurate to all but the last bit of a 32-bit word would require a sin()
table with a quarter of a million values. That would put it outside the
realm of most FPGAs and require external memory. However, if external
memory is used, this is not only feasible but simple. The only issue is
to get the step sizes small enough that the two approximations,
sin(b)=b and cos(b)=1.0, apply to within the required accuracy.

I've never seen an application that needed 32 bits of sin/cos. The DDS
work I've done was related to real world signals which are limited to
some 20 to 24 bits of SNR when converting between analog and digital at
very best. I was pretty happy when I showed myself I could get 20 bits
without any real trouble.


>> At this point I have lost my interest in debating this much
>> further. CORDIC is not the only method of calculating rotating
>> vectors. It is useful where you have a lack of computing
>> resources. If multiplication is readily available other methods
>> compare very favorably.
>
> So what is your method?
>
>>> because of reconfiguration capabilities. With Forth CPU we can
>>> additionally adapt this system to the man who works with hardware
>>> and tightly connect CPU to hardware.
>>
>> What exactly is a "Forth CPU"?
>
> I have several designs. Now it is 7th generation of
> microarchitecture, with hardware multithreading, targeting Spartan-6
> and 7-series Xilinx FPGAs.

But that doesn't tell me what a Forth CPU is. I assume you really mean
a stack machine. When you say "hardware multithreading", I expect this
is using the pipeline to facilitate multiple threads rather than
concurrent instructions in a single thread?


>> How fast can a CORDIC be done?
>
> From 175 MHz to 300-400, depends on FPGA famuly, function calculated,
> datawidth and serial/parallel architecture.

Yeah, that would be a typical pipelined speed.

--

Rick C

Ilya Tarasov

4 Feb 2017, 20:54:16
> You mean I don't have a fully coded VHDL example at 32 bits word width
> to show you? No, I don't. But that is not needed. The control logic
> would be trivial compared to thousands of LUTs. The only extra logic
> required is on the order of a handful of adders and/or multiplexers.

So you can't show me code and a utilization report, but you are still claiming the area will be small? How small? What about the 'only extra logic', and how can you estimate the size of the adders/multiplexers for your code?

> I took the time to do some calculations which show to get results
> accurate to all but the last bit of a 32 bit word would require a sin()
> table with a quarter of a million values. That would put it outside the
> realm of most FPGAs and require external memory. However if external
> memory is used, this is not only feasible but simple. The only issue is
> to get the step sizes small enough so the two approximations do apply,
> sin(b)=b and cos(b)=1.0 to within the required accuracy.

Again, I don't see your 'required accuracy'. Can you list your results and explain how you got them?

> I've never seen an application that needed 32 bits of sin/cos. The DDS
> work I've done was related to real world signals which are limited to
> some 20 to 24 bits of SNR when converting between analog and digital at
> very best. I was pretty happy when I showed myself I could get 20 bits
> without any real trouble.

The fact that you don't need it does not mean nobody needs it. ADC resolution matters, but it is not the only factor in choosing sin/cos resolution. Results with single-precision and double-precision floats are not the same for a given ADC. Again, I can write a piece of Forth code to show the difference... but maybe you will switch from thinking to coding?

> > I have several designs. Now it is 7th generation of
> > microarchitecture, with hardware multithreading, targeting Spartan-6
> > and 7-series Xilinx FPGAs.
>
> But that doesn't tell me what a Forth CPU is. I assume you really mean
> a stack machine. When you say "hardware multithreading", I expect this
> is using the pipeline to facilitate multiple threads rather than
> concurrent instructions in a single thread?

Yes, I really mean a stack machine, i.e. words like DUP DROP + - AND OR are executed in one clock cycle with all appropriate modifications to stack pointers etc. Multiple threads are supported by time multiplexing, so one data memory and one ALU are shared between all threads, but each clock cycle belongs to its own thread.
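The time-multiplexing idea can be illustrated with a toy software model: one shared execution unit, per-thread stacks and program counters, and a strict round-robin so each "clock cycle" belongs to one thread. This is only a behavioral sketch in Python, not a description of the actual hardware:

```python
# Toy model of hardware multithreading by time multiplexing: one shared ALU,
# per-thread stacks and program counters, round-robin one thread per "cycle".
class Thread:
    def __init__(self, program):
        self.program, self.pc, self.stack = program, 0, []

def step(t):
    # one clock cycle: execute a single stack-machine word for this thread
    op = t.program[t.pc]; t.pc += 1
    s = t.stack
    if isinstance(op, int): s.append(op)       # literal
    elif op == 'DUP':  s.append(s[-1])
    elif op == 'DROP': s.pop()
    elif op == '+':    s.append(s.pop() + s.pop())
    elif op == '-':    b, a = s.pop(), s.pop(); s.append(a - b)
    elif op == 'AND':  s.append(s.pop() & s.pop())

threads = [Thread([2, 3, '+', 'DUP', '+']),   # (2+3) doubled via DUP +
           Thread([7, 5, '-', 10, 'AND'])]    # (7-5) AND 10
cycle = 0
while any(t.pc < len(t.program) for t in threads):
    t = threads[cycle % len(threads)]          # each cycle belongs to one thread
    if t.pc < len(t.program):
        step(t)
    cycle += 1
print(threads[0].stack, threads[1].stack)      # [10] [2]
```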

hughag...@gmail.com

4 Feb 2017, 23:26:25
On Saturday, February 4, 2017 at 6:54:16 PM UTC-7, Ilya Tarasov wrote:
> Yes, I really mean a stack machine, i.e. words like DUP DROP + - AND OR are executed in one clock cycle with all appropriate modifications to stack pointers etc. Multiple threads are supported by time multiplexing, so one data memory and one ALU are shared between all threads, but each clock cycle belongs to its own thread.

What you have is nothing like the MiniForth (now known as the RACE). What you have is like Paysan's B16 or my Stundurd design.

The MiniForth had an assembly-language. Primitives such as DUP DROP + - AND OR etc. were written in this assembly-language. It was Harvard Architecture --- the machine-code had a different memory from the data and threaded code --- the two memories could be accessed simultaneously.

There was no CALL primitive --- the 16-bit cfa compiled in the threaded code had one bit that indicated if it was a primitive or a colon word.

The machine-code had up to 5 instructions packed into a single opcode, all of which executed in one clock cycle. My assembler would rearrange the instructions in such a way as to pack as many into the opcodes together as possible (minimize how many NOPs had to be inserted) while still guaranteeing that the code did the same thing as if the instructions were assembled one per opcode in the order that they appeared in the source-code.
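The packing idea can be sketched in a few lines. This is not the real MiniForth assembler: it is a much simpler greedy packer that only bundles already-adjacent independent instructions (no reordering), with made-up register names, just to show the dependency constraint:

```python
# Greedily fill bundles of up to 5 slots; an instruction may join the current
# bundle only if it does not touch a register written earlier in that bundle
# (all slots in a bundle are assumed to execute in the same clock cycle).
SLOTS = 5

def pack(instrs):
    # each instr: (name, reads, writes) with reads/writes as sets of registers
    bundles, current, written = [], [], set()
    for name, reads, writes in instrs:
        if len(current) == SLOTS or reads & written or writes & written:
            bundles.append(current)
            current, written = [], set()
        current.append(name)
        written |= writes
    if current:
        bundles.append(current)
    return bundles

prog = [('lit#2', set(), {'T'}),
        ('lit#3', set(), {'N'}),
        ('add',   {'T', 'N'}, {'T'}),
        ('dup',   {'T'}, {'N'}),
        ('store', {'T', 'N'}, set())]
print(pack(prog))  # the two independent literals share one bundle
```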

This is more robust than your method because there can be any number of primitives --- I had hundreds --- you are limited to 32 or 64 primitives (some small number that can't be increased).

P.S. Do you really think that Rickman has actually built an FPGA processor? Or is this all fantasy on his part? Does he seem like he actually knows VHDL, or did he just read a book on it but never attain any actual experience? I know that he's faking his Forth software expertise, which makes me think he is faking his FPGA expertise too.

Ilya Tarasov

5 Feb 2017, 10:53:30
> What you have is nothing like the MiniForth (now known as the RACE). What you have is like Paysan's B16 or my Stundurd design.

Yes and no. For a CPU we can always say 'this is like a finite state machine', or 'your bubble sort program is just John Smith's bubble sort'. For a CPU, the programming model often depends on the microarchitecture, but this is not necessarily so. I said nothing about the microarchitecture.

Also, for an FPGA, targeting the hardware resources is important. For a CPU based on a 10+ year old FPGA, these challenges are different. Some projects cannot be directly retargeted to modern FPGAs because of major differences in the hardware blocks' interfaces and behavior. Some may be less efficient because of new features not used in previous designs.

> The machine-code had up to 5 instructions packed into a single opcode, all of which executed in one clock cycle. My assembler would rearrange the instructions in such a way as to pack as many into the opcodes together as possible (minimize how many NOPs had to be inserted) while still guaranteeing that the code did the same thing as if the instructions were assembled one per opcode in the order that they appeared in the source-code.

> This is more robust than your method because there can be any number of primitives --- I had hundreds --- you are limited to 32 or 64 primitives (some small number that can't be increased).

This is a point to discuss. I have several designs with packed commands. This requires a serializer on the hardware side (not difficult), but some effort on the tool side. I abandoned this idea in the end, because of no visible benefits in real applications.

I use explicit CALL and JMP commands, with the address on top of the data stack, so there is no limitation on the number of words. It also depends on the purpose of the particular CPU design. I prefer not to highlight that I have a CPU design, because what I really have is a design methodology. For a CPU there is no silver bullet, and a certain purpose will lead to a certain set of requirements. A wide command means more memory, but also more bandwidth. A narrow command means more code density, but also more cycles to perform some operations. We cannot be sure which is more important.

> P.S. Do you really think that Rickman has actually built an FPGA processor? Or is this all fantasy on his part? Does he seem like he actually knows VHDL, or did he just read a book on it but never attain any actual experience? I know that he's faking his Forth software expertise, which makes me think he is faking his FPGA expertise too.

I think he can't :) As I tell my students, 'I need 30 seconds to determine your real knowledge, but I need 15 more minutes to show your knowledge to you'. This applies to conferences too. I asked him some questions, and he can find the answers based on my code. If he can't - ok, words are cheap :)