IRC logs for #openrisc Sunday, 2016-09-25

--- Log opened Sun Sep 25 00:00:27 2016
olofkkc5tja: Congratulations on your progress!00:45
kc5tjaThanks!00:50
mor1kx[mor1kx] spacemonkeydelivers closed pull request #40: Introducing PCU (master...master) https://github.com/openrisc/mor1kx/pull/40 08:32
mor1kx[mor1kx] spacemonkeydelivers opened pull request #41: Introducing PCU (master...pcu) https://github.com/openrisc/mor1kx/pull/41 08:40
mor1kx[mor1kx] spacemonkeydelivers opened pull request #42: Fixes for saturating counter branch predictor (master...bpred_fixes) https://github.com/openrisc/mor1kx/pull/42 08:41
mor1kx[mor1kx] spacemonkeydelivers opened pull request #43: Introducing gshare branch predictor (master...gshare) https://github.com/openrisc/mor1kx/pull/43 08:42
bandvigI've downloaded stekern's dhry test (http://oompa.chokladfabriken.org/tmp/dhry/)09:45
bandvigand ran dhry-atlys.bin (converted to a u-boot image) on the two pipelines: CAPPUCCINO and MAROCCHINO.09:45
bandvigInitial results were:09:45
bandvigCAPPUCCINO: DMIPS / DMIPS/MHz:    69 / 1.38000009:45
bandvigMAROCCHINO: DMIPS / DMIPS/MHz:    82 / 1.64000009:45
bandvigTo achieve better results, I rebuilt the benchmark with GCC-5.3.009:45
bandvig(the tool chain was built with the "old" newlib from the openrisc/or1k-src repository,09:46
SMDhome1bandvig: have you modified sources of dhrystone?09:46
bandvigbecause we couldn't build newlib with the "-mhard-float" option: http://juliusbaxter.net/openrisc-irc/%23openrisc.2016-07-07.log.html)09:46
bandvigCompiler command line was:09:46
bandvigor1k-elf-gcc -flto -pipe -O2 -mcmov -mhard-mul -mhard-div -mhard-float -mboard=atlys dhry.c -lm -o dhry_lto.elf09:46
bandvigIn fact, in my configuration both pipelines contain a hardware multiplier, a divider and l.cmov. Also, "link time optimization" was activated: -flto.09:46
bandvigResults:09:47
bandvigCAPPUCCINO: DMIPS / DMIPS/MHz:  183 / 3.66000009:47
bandvigMAROCCHINO: DMIPS / DMIPS/MHz:  197 / 3.94000009:47
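The four result lines above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the 1757 Dhrystones/s divisor is the conventional VAX 11/780 baseline for 1 DMIPS, not something stated in the log. It derives the clock frequency implied by each DMIPS and DMIPS/MHz pair, and the ratio between the rebuilt and pre-compiled scores.

```python
# Hypothetical helpers for sanity-checking the figures quoted in the log.

# Conventional baseline (assumption, not from the log): a VAX 11/780
# scores 1757 Dhrystones per second, which is defined as 1 DMIPS.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dhrystones_per_sec):
    """Convert a raw Dhrystones/s figure into DMIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def implied_clock_mhz(dmips_score, dmips_per_mhz):
    """Clock frequency implied by a (DMIPS, DMIPS/MHz) pair."""
    return dmips_score / dmips_per_mhz


def speedup(after, before):
    """Ratio between two DMIPS scores."""
    return after / before


# (DMIPS, DMIPS/MHz) pairs exactly as reported by bandvig.
REPORTED = {
    "CAPPUCCINO": {"prebuilt": (69, 1.38), "rebuilt": (183, 3.66)},
    "MAROCCHINO": {"prebuilt": (82, 1.64), "rebuilt": (197, 3.94)},
}


def summarize(reported=REPORTED):
    """Return (pipeline, clock before, clock after, speedup) rows."""
    rows = []
    for pipeline, runs in reported.items():
        before, after = runs["prebuilt"], runs["rebuilt"]
        rows.append((
            pipeline,
            round(implied_clock_mhz(*before), 3),
            round(implied_clock_mhz(*after), 3),
            round(speedup(after[0], before[0]), 2),
        ))
    return rows


if __name__ == "__main__":
    for pipeline, mhz_before, mhz_after, ratio in summarize():
        print(f"{pipeline}: {mhz_before} MHz / {mhz_after} MHz, {ratio}x")
```

All four pairs imply the same 50 MHz clock, and the rebuilt binaries score roughly 2.65x (CAPPUCCINO) and 2.40x (MAROCCHINO) higher than the pre-compiled one.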
bandvigSMDhome1: I used exactly the source I downloaded from http://oompa.chokladfabriken.org/tmp/dhry 09:48
bandvigSMDhome1: Moreover, please let me repeat, initially I just used the pre-compiled binary: dhry-atlys.bin09:51
SMDhome1bandvig: I believe the results are not quite correct, because the compiler could have removed some parts of the unused code (nothing got printed)09:51
SMDhome1Could you please compile and run this one: http://fossies.org/linux/privat/old/dhrystone-2.1.tar.gz/ 09:51
bandvigSMDhome1: Does the fossies source contain the OR1K timer-related stuff?09:55
SMDhome1bandvig: nope, it doesn't09:56
wallentoolofk: still don't get the vhdl library issue11:39
wallentoeverything gets compiled into worklib, right?11:39
wallentois there an example somewhere?11:41
wallentowhich fails11:41
wallentoxil_defaultlib I mean11:43
wallentoolofk: added top_module11:52
ZipCPU|Laptopbandvig SMDhome1: There are some curious differences between the oompa version and the one I've been using.13:50
ZipCPU|LaptopI'll note two:13:51
ZipCPU|Laptop1. The function call to test1(10,20) before the routine, guaranteeing that all of the code will be loaded into the cache before starting.13:51
ZipCPU|Laptop2. The initialization of Ch_Loc to zero, allowing the compiler to then optimize based upon it.13:51
ZipCPU|LaptopFurther, the instructions for Dhrystone specifically state that the *two* Dhrystone files must be compiled separately, not as a single file.13:52
ZipCPU|LaptopThis becomes partly a linker test, then, since they need to be placed together without final optimizations available.13:52
ZipCPU|LaptopEnabling the link-time-optimization flag disables this part of the test, and therefore artificially inflates the score.13:52
ZipCPU|LaptopTo be a valid Dhrystone measure, the dhry.c file needs to be split properly into the two component files, dhry1.c and dhry2.c.13:53
ZipCPU|LaptopThese files must then be compiled separately, linked together, and then the test may begin.13:53
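The separate-compilation procedure described above could be scripted roughly as follows. This is a minimal sketch, not a tested build recipe: the flags are copied from bandvig's command line with -flto removed, the file names dhry1.c and dhry2.c are the ones ZipCPU mentions, and the function name and output name are hypothetical. It only constructs the command lines and does not invoke the compiler.

```python
import os

# Flags from bandvig's command line, minus -flto, so that no optimization
# can cross the boundary between the two Dhrystone source files.
COMMON_FLAGS = ["-pipe", "-O2", "-mcmov", "-mhard-mul", "-mhard-div",
                "-mhard-float", "-mboard=atlys"]


def separate_build_commands(cc="or1k-elf-gcc",
                            sources=("dhry1.c", "dhry2.c"),
                            output="dhry.elf"):
    """Return one compile command per source file, then one link command."""
    objects = [os.path.splitext(src)[0] + ".o" for src in sources]
    commands = [
        [cc, *COMMON_FLAGS, "-c", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]
    # Link step: only object files are combined here, without LTO.
    commands.append([cc, *COMMON_FLAGS, *objects, "-lm", "-o", output])
    return commands


if __name__ == "__main__":
    for command in separate_build_commands():
        print(" ".join(command))
```

Printing the commands instead of running them keeps the sketch independent of whether an or1k-elf toolchain is installed.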
bandvigZipCPU|Laptop: the requirement to prevent using LTO sounds strange to me.14:10
bandvigIn fact, if I want to achieve maximum performance, I'm going to use any kind of suitable optimization. And LTO is one of them.14:10
bandvigRegarding caches: the default instruction and data caches are 32 KB each. The dhry binary is ~83.5 KB, so it cannot be cached completely.14:14
Danbandvig: The no-LTO requirement is a consequence of the Dhrystone instructions.  I did not create that requirement.  I'm just trying to make certain that one Dhrystone number can properly be compared to another.14:34
Danbandvig: As for the caches, I think 1) the impact is minimal (if any), 2) that it would only affect the first time through the loop, and 3) that the code that is actually executed is definitely small enough to fit within the respective caches.14:35
DanKeep in mind, a lot of the data requirement is for the statistics reporting at the end.14:35
-!- Dan is now known as ZipCPU14:35
ZipCPUThere are lots of strings, maintained in the code space, for that reporting.  These are not used until after the code has completed.14:36
ZipCPUI'm sure the same is true of the libraries that link with it.14:36
ZipCPUAs I recall, I was able to get the entire test to fit within 1kW (4kB) when I worked with it.14:37
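The size argument in this exchange comes down to a few conversions. The sketch below is illustrative only: the helper names are hypothetical, and the 1 kW figure comes from ZipCPU's own work with the test, so it need not match the OR1K build. It converts 32-bit words to bytes and compares both the whole binary and the estimated test code against a 32 KB cache.

```python
def words_to_bytes(words, bits_per_word=32):
    """Convert a word count to bytes for a given word width."""
    return words * bits_per_word // 8


def fits_in_cache(size_bytes, cache_bytes):
    """True if a region of size_bytes fits entirely in the cache."""
    return size_bytes <= cache_bytes


ICACHE_BYTES = 32 * 1024            # default cache size quoted by bandvig
BINARY_BYTES = int(83.5 * 1024)     # approximate size of the whole dhry binary
TEST_BYTES = words_to_bytes(1024)   # ZipCPU's 1 kW estimate for the test itself

if __name__ == "__main__":
    print("whole binary fits:", fits_in_cache(BINARY_BYTES, ICACHE_BYTES))
    print("test code fits:", fits_in_cache(TEST_BYTES, ICACHE_BYTES))
```

Under these figures the whole binary does not fit in the cache, while a 4 kB region does, which is consistent with both bandvig's and ZipCPU's statements.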
olofkwallento: Add a dependency on libstorage and try this code to see that it fails without library assignment: http://a6cc216f27fa3665.paste.se/ 16:08
olofkAha! Finally found why the or1k-basic test fails. It's because of an uninitialized wire when the FPU is enabled16:27
olofkIf anyone sees bandvig, please ask him if spr_bus_ack_fpu_i is supposed to be connected to something. It's currently not, and that brings in an undefined value which causes havoc deep inside mor1kx16:31
olofkMaybe stekern_ or wallento knows16:32
stekern_why are you guys so obsessed with running dhrystone? I always thought coremark was regarded as a better test. I almost exclusively used that to compare the changes I did to mor1kx.17:11
-!- stekern_ is now known as stekern17:12
ZipCPUstekern: I may be the source of the Dhrystone obsession.17:38
ZipCPUI was hoping to have a comparison between the ZipCPU and mor1kx-generic to present as part of ORCONF this year.17:39
ZipCPUWhile I have downloaded Coremark to examine it, I have not gotten it to run and I may not be able to get it to run without violating its rules.17:39
ZipCPUFor example, coremark depends upon a byte size of 8 bits.  On the ZipCPU, the byte size is 32 bits.17:39
ZipCPUThis makes for all kinds of hassles when trying to port software to the ZipCPU.17:40
ZipCPUStill ... I was looking for a benchmark.17:40
ZipCPUI'm open to alternatives ... ?17:40
ZipCPU|LaptopFor bandvig when he returns: I just ran his code in mor1kx-generic, and got nowhere near the score he's claiming.19:43
ZipCPU|Laptop(Even with the criticisms mentioned above ... I didn't fix those ...)19:43
ZipCPU|LaptopI wonder if the problem is in mor1kx-generic ... that it's somehow not up to the speed of the OR1k ATLYS implementation?19:44
ZipCPU|LaptopOkay, I can now just about reproduce bandvig's work.  Looks like my big problem was not including the -mhard-div compiler flag.22:30
ZipCPU|LaptopHowever, if you combine the two files, as I mentioned above, you get an (illegal) artificial performance boost of perhaps 25% or so.22:31
ZipCPU|LaptopSince it violates the "rules" of Dhrystone, his measure remains inflated, while the one I had calculated had been much too low.22:31
ZipCPU|LaptopOh, and I should point out, I'm making my measurements with mor1kx-generic.  While I did try or1200-generic, it was significantly slower.22:36
kc5tjaGot all of the OP-IMM instructions working (I think), but now I've broken JALR somehow.23:03
--- Log closed Mon Sep 26 00:00:06 2016
