IRC logs for #openrisc Saturday, 2015-03-07

--- Log opened Sat Mar 07 00:00:11 2015
09:43 <stekern> poke53281: I tested your 'mandelpar' against mor1kx's fpu now, got ~500 sec vs ~700 sec with and without -mhard-float
09:43 <stekern> on 4 cores
09:46 <stekern> I modified it to use a paletted fb too
09:55 <stekern> my lk pull-request got merged too
09:56 <bandvig> stekern: good news about lk!!!
09:57 <bandvig> stekern: does 'mandelpar' include a lot of trigonometric functions, logarithms, square roots, pow, etc. (i.e. functions rather than arithmetic)?
09:59 <stekern> bandvig: this is the original code I got from poke53281 http://pastie.org/9692650
10:01 <bandvig> stekern: I couldn't download it. "Sorry, there is no pastie #9692650 or it has been removed. Why not create a new pastie?"
10:02 <stekern> oh, I had it in cache obviously...
10:03 <stekern> http://pastie.org/10007030
10:26 <bandvig> stekern: btw, which SoC do you use for multicore? optimsoc?
10:41 <poke53281> stekern: Great
10:42 <poke53281> But 500 sec vs 700 sec is not that good.
10:43 <poke53281> The code contains the log function. Is this one also executed in hardware?
10:43 <poke53281> Maybe you have to compile musl with hard-float too
10:46 <poke53281> Or is the log calculation provided by gcc and not libc?
10:48 <poke53281> But I hope you like openmp as much as I do.
11:05 <bandvig> stekern: poke53281: yes, it contains log() and (I believe) at least sqrt() as part of abs(complex). The functions aren't supported in hardware. So it should be checked whether these functions (and other ones, of course) are computed with soft-float or hard-float arithmetic.
11:14 <dalias> ?
11:15 <poke53281> The question mark tells me that the software-implemented log and sqrt functions are provided by gcc and not musl :)
11:17 <dalias> no
11:17 <dalias> i meant the question of hard vs soft does not make sense
11:18 <dalias> if you have hard float and you're using it, soft float will not be used for anything
11:18 <dalias> sqrt and log just need to be built up from elementary (hard) float operations rather than having hardware do the whole operation as a unit
11:18 <poke53281> well, hard-float supports only very basic operations. If the program uses other numeric functions, the library that contains these functions should also be compiled with hard-float.
11:18 <dalias> and the way this is done is the same whether the underlying float arithmetic is hard or soft
11:19 <dalias> ah i see. are you thinking of a case where the app was compiled with hard-float but libc was compiled for soft?
11:19 <poke53281> really?
11:19 <poke53281> Yes
11:19 <dalias> yes. sqrt is just a .c file
11:19 <dalias> it doesn't care if +-*/ are implemented with hardware or software
11:20 <poke53281> Yes, the C file doesn't care. But the compiled lib does.
11:21 <dalias> well either way it computes it in the same manner. it's just a matter of whether the +-*/ are optimized
11:21 <poke53281> Yes
11:21 <dalias> that's what i meant
11:21 <dalias> sorry for the confusion
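
A minimal sketch of dalias's point, assuming nothing about the real libm sources: a routine like sqrtf can be written purely in terms of elementary +, -, *, / on floats, and whether each of those elementary operations becomes an lf.* FPU instruction or a soft-float support call is decided when that particular file is compiled (e.g. with or without -mhard-float). A real libm sqrtf must additionally be correctly rounded, as dalias notes further down; this sketch is not.

    // Illustrative only: sqrtf built from elementary float ops via Newton-Raphson.
    // Each +, *, / below compiles to either a hardware FPU instruction or a
    // soft-float library call, depending on the flags used for THIS translation unit.
    static float approx_sqrtf(float x)
    {
        if (x <= 0.0f)
            return 0.0f;                 // skip NaN/negative handling for brevity
        float y = x > 1.0f ? x : 1.0f;   // crude initial guess
        for (int i = 0; i < 20; ++i)
            y = 0.5f * (y + x / y);      // y <- (y + x/y) / 2
        return y;
    }
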
11:22 <poke53281> I wonder why the hard-float unit gives a speedup of only 40%.
11:23 <poke53281> And that could be a reason.
11:23 <dalias> how fast is your hard float?
11:24 <poke53281> I don't know. Ask bandvig and stekern.
11:24 <dalias> unless fpu ops are as fast (or nearly) as integer ops, hard float is probably not going to be a "huge" win
11:28 <poke53281> They are probably slower.
11:28 <dalias> yeah
11:31 <bandvig> dalias: poke53281: :))) I've got a 10...20x speedup on the Whetstone tests which use arithmetic (+-*/) only. But I don't see any improvement for the Whetstone tests which use functions. I use NewLIB.
11:32 <bandvig> Give me several lines, I'll paste the whole table of results.
11:33 <dalias> well you would need to compile newlib with hardfloat
11:34 <poke53281> sorry, I don't use logf and sqrtf
11:35 <poke53281> but log and sqrt, which use double.
11:35 <poke53281> So maybe this is the reason.
11:35 <bandvig> dalias: yes, I believe it is the path for further improvement for 'mandelpar' too
11:36 <bandvig> poke53281: that also has to be corrected
11:37 <poke53281> stekern: Please change every function to the corresponding single-precision floating point function and compile your library with hard-float as well. Then test again.
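
A minimal illustration of the double vs. single precision point (the helper name and the colour formula are made up for the example, not taken from mandelpar): log() and sqrt() take and return double, so calling them on a float drags the whole computation through the double-precision routines, while logf() and sqrtf() stay in single precision.

    #include <math.h>

    // Hypothetical shading helper, only to show the difference being discussed.
    // The first version converts the float to double and uses the double-precision
    // log(); the second stays in single precision throughout.
    float shade_double(float v) { return (float)(log((double)v) / log(2.0)); }
    float shade_single(float v) { return logf(v) / logf(2.0f); }
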
11:39 <bandvig> poke53281: btw, you use abs(complex<float>). Does it involve sqrtf(float) or sqrt(double)?
11:40 <poke53281> don't know
11:41 <poke53281> well, abs includes sqrt
11:41 <poke53281> better would be to calculate the square of it and compare against 4, not 2.
11:41 <bandvig> poke53281: perhaps it could safely be replaced with sqrtf(real(z)*real(z)+imag(z)*imag(z))
11:42 <dalias> poke53281, using tgmath.h could do that automatically :-p
11:42 <dalias> but tgmath.h is hideous
11:42 <dalias> bandvig, cabsf?
11:42 <poke53281> never heard of tgmath.
11:43 <dalias> tgmath.h was a hideous addition in c99
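
For the C++ code in question, the effect dalias describes for C (cabsf(), or tgmath.h picking the float variant for a float argument) comes from overload resolution; a hedged sketch, not a claim about what the original mandelpar actually does:

    #include <complex>

    // std::abs has an overload for std::complex<float> that returns float and,
    // on typical implementations, computes the magnitude in single precision
    // (the C++ counterpart of cabsf); nothing is promoted to double.
    float magnitude(const std::complex<float> &z)
    {
        return std::abs(z);
    }
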
11:44 <poke53281> Well, I usually never use float. It is terribly inaccurate. The mandelpar was never meant to be fast, just a way to test SMP with a parallelized program.
11:46 <dalias> yeah, float is pretty bad for most things
11:46 <dalias> makes sense for audio samples tho
11:46 <bandvig> dalias: perhaps you are right, cabsf(); I'm not very familiar with the complex lib.
11:46 <stekern> bandvig: poke53281: my test was just of the type "something that uses the fpu"
11:47 <poke53281> http://pastie.org/10007124
11:47 <poke53281> try this
11:47 <dalias> btw it's unfortunate when fpus lack a sqrt instruction
11:47 <dalias> sqrt is one of the most expensive ops to do in C
11:48 <dalias> because it needs to be exact/correctly-rounded, not just a good approximation
11:48 <stekern> not so much trying to read out the performance of the fpu
11:48 <bandvig> poke53281: I would make a correction: z.real()*z.real()+z.imag()*z.imag() <= 4.0f, or z.real()*z.real()+z.imag()*z.imag() <= (float)4.
11:49 <poke53281> Ok
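
A sketch of the escape test being discussed (the loop structure and names are assumed, since the mandelpar source itself isn't in the log): comparing the squared magnitude against 4.0f instead of abs(z) against 2 removes the square root from the inner loop entirely.

    #include <complex>

    // Hypothetical escape-time loop. std::norm(z) is exactly
    // z.real()*z.real() + z.imag()*z.imag(), i.e. bandvig's expression,
    // so abs(z) <= 2.0f can be replaced by norm(z) <= 4.0f with no sqrt.
    int iterations(std::complex<float> c, int maxiter)
    {
        std::complex<float> z(0.0f, 0.0f);
        int n = 0;
        while (n < maxiter && std::norm(z) <= 4.0f) {
            z = z * z + c;
            ++n;
        }
        return n;
    }
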
11:51 <stekern> yeah, that shaves off 410 sec of it
11:52 <stekern> (i.e. 90 sec remaining)
11:53 <poke53281> great
11:53 <stekern> and no, I didn't recompile anything else than the actual program with -mhard-float
11:54 <bandvig> dalias: personally I'm interested exactly in float, as I widely use it to implement acquisition/tracking algorithms in digital receivers. Using float speeds up the design cycle many, many times.
11:56 <stekern> vs 170 when compiled with softfloat
11:59 <poke53281> 90 vs 170. Sounds better.
12:01 <poke53281> Ok, so there might still be the log function for the color calculation.
12:01 <poke53281> And the conversion from float to int.
12:02 <stekern> bandvig: did you do any deliberate area optimisations too? I recall that the or1200 fpu was about the same size as or1200, while pfpu32 is only about half the size of mor1kx
12:02 <stekern> poke53281: the color calculation is precalculated in my modification
12:03 <poke53281> Ok
12:03 <bandvig> poke53281: float <-> int conversions are supported in the FPU
12:04 <stekern> http://pastie.org/10007137
12:06 <poke53281> Ok, that means that the effective speedup is around a factor of two.
12:08 <bandvig> stekern: in fact, the FPU was almost completely refactored. In particular, OR1200-FPU uses separate post-normalization units for each operation, while Mor1kX-FPU uses common align and rounding post-operation steps.
12:11 <stekern> ah, ok.
12:12 <stekern> nice work
12:15 <bandvig> Thanks. Additionally, OR1200-FPU uses digit-recurrence division. In Mor1kX-FPU, Goldschmidt division is implemented and the DIV/MUL units share a multiplier.
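
A purely software sketch of the Goldschmidt iteration mentioned here, to show why a Goldschmidt divider can reuse the multiplier (it models the algorithm only, not the actual pfpu32 RTL): both operands are repeatedly multiplied by a correction factor F = 2 - D, which drives the denominator toward 1 and the numerator toward the quotient.

    // Goldschmidt division N/D for a denominator scaled into (0, 2),
    // e.g. a normalized mantissa in [1, 2).  Every step is a pair of
    // multiplications, which is why DIV can share the MUL hardware.
    double goldschmidt_div(double N, double D, int steps = 5)
    {
        for (int i = 0; i < steps; ++i) {
            double F = 2.0 - D;   // correction factor
            N *= F;               // numerator  -> N/D as D -> 1
            D *= F;               // denominator -> 1 (quadratic convergence)
        }
        return N;                 // ~= N/D
    }
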
12:17 <stekern> yeah, I saw your question about that on the list earlier, and the commit
12:18 <bandvig> stekern: it looks like my last post overwrote yours with the new pastie.org reference, didn't it?
12:20 <stekern> sorry, couldn't parse that, what do you mean?
12:25 <bandvig> well, how to say... was http://pastie.org/10007137 your last post with a pastie.org reference? If "yes", don't worry.
12:28 <bandvig> well, the next few lines will contain the Whetstone comparison
12:29 <bandvig> please, be patient
12:29 <bandvig>           Single Precision C/C++ Whetstone Benchmark
12:29 <bandvig> Loop content                soft-float   OR1200 FPU   mor1kx FPU
12:30 <bandvig> N1 floating point (MFLOPS)     0.409        3.200        9.600
12:30 <bandvig> N2 floating point (MFLOPS)     0.336        3.360        6.720
12:30 <bandvig> N3 if then else     (MOPS)     0.000        0.000        0.000
12:30 <bandvig> N4 fixed point      (MOPS)     2.250       31.500       31.500
12:30 <bandvig> N5 sin,cos etc.     (MOPS)     0.019        0.020        0.020
12:31 <bandvig> N6 floating point (MFLOPS)     0.409        2.075        7.706
12:31 <bandvig> N7 assignments      (MOPS)     0.000        0.000        0.000
12:31 <bandvig> N8 exp,sqrt etc.    (MOPS)     0.009        0.009        0.009
12:31 <bandvig> MWIPS                          0.954        1.128        1.156
12:31 <bandvig> done
13:37 <bandvig> well, let's get back to the library build flags. I've disassembled libm.a from NewLIB. I didn't find any lf.* instructions.
13:37 <bandvig> On the other hand, sinf(), for example, is computed by a Taylor series through calls to __kernel_sinf() and __kernel_cosf().
13:38 <bandvig> It means that we have to set the -mhard-float option somewhere in the makefiles to build a hard-float variant of NewLIB's libm.a with lf.* instructions.
13:38 <bandvig> So, could someone advise me how to do it?
13:45 <bandvig> it is interesting that I found l.mul in the disassembled libm.a. It means that at least -mhard-mul was used. Am I right?
14:11 <rcallan> hi. Is there a list of development boards being actively developed on? There seem to be several old webpages and dead links
14:25 <stekern> bandvig: -mhard-mul is default, yes
15:17 <bandvig> stekern: but I haven't found how -mhard-mul is passed on the command line of or1k's gcc while building NewLIB. Do you know?
15:34 <dalias> if it's the compiler default it doesn't need to be provided
15:35 <dalias> -mhard-float presumably uses hard mul unless you do -mno-hard-mul too, no?
16:11 <bandvig> dalias: stekern: Oh, I hadn't understood correctly. I thought "-mhard-mul is provided on the NewLIB build command line by default".
16:15 <bandvig> stekern: btw, "$or1k-elf-gcc --target-help" lists the or1k-specific options, but it doesn't say whether some of them are active by default. Are there other default options?
16:18 <bandvig> dalias: Actually, I don't know about the relations between the or1k-specific options. Is there a description somewhere?
16:18 <dalias> i dunno either
19:59 <stekern> bandvig: I'm not sure if there's a way to see the default options from the command line, but you can get them from here: https://github.com/openrisc/or1k-gcc/blob/or1k/gcc/common/config/or1k/or1k-common.c#L59
20:00 <stekern> and all the MASK_ options can be seen here: https://github.com/openrisc/or1k-gcc/blob/or1k/gcc/config/or1k/or1k.opt
20:00 <stekern> from that you get the Init value of mredzone too
20:06 <bandvig> stekern: Thanks. I'll look at that. And I've got a genius idea. :) Let's add '(default)' to the --target-help output to mark the options activated by default. :)
20:14 <stekern> yeah, but to be honest, I don't have the faintest idea how the --target-help list is generated ;)
--- Log closed Sun Mar 08 00:00:12 2015

Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!