oetwi | Thanks for the input! But i assume that burst accesses as such work - for instance, i ran dhrystone with caches enabled and got the expected results. My guess is that Linux does not mark the UART's memory area as uncacheable. That would result in the putc-loop always reading the UART as busy and never actually sending a character. If that is the case, i would need to know how to correct that behaviour. It | 15:58 |
---|
ZipCPU|Laptop | Hence, for the XuLA2-LX25 SoC I built, I can (using Verilator) run a Dhrystone test-bench and count clocks. | 17:43 |
---|---|---|
kc5tja | re: counting Dhrystones, that's sweet! Ideally, I'd like to get a virtual Kestrel-3 that includes video output at some point, just so I can play around with it, even if it's ultra-slow. | 17:55 |
SMDhome1 | bandvig: have you modified sources of dhrystone? | 09:46 |
---|---|---|
SMDhome1 | Could you, please, compile and run this one: http://fossies.org/linux/privat/old/dhrystone-2.1.tar.gz/ | 09:51 |
ZipCPU|Laptop | Further, the instructions for Dhrystone specifically state that the *two* Dhrystone files must be compiled separately, not as a single file. | 13:52 |
ZipCPU|Laptop | To be a valid Dhrystone measure, the dhry.c file needs to be split properly into the two component files, dhry1.c and dhry2.c. | 13:53 |
Dan | bandvig: The no-LTO requirement is a consequence of the Dhrystone instructions. I did not create that requirement. I'm just trying to make certain that one Dhrystone number can properly be compared to another. | 14:34 |
stekern_ | why are you guys so obsessed with running dhrystone? I always thought coremark was regarded as a better test. I almost exclusively used that to compare the changes I did to mor1kx. | 17:11 |
ZipCPU | stekern: I may be the source of the Dhrystone obsession. | 17:38 |
ZipCPU|Laptop | Since it violates the "rules" of Dhrystone, his measure remains inflated, while the one I had calculated had been much too low. | 22:31 |
ZipCPU|Laptop | kc5tja: I intend to discuss the benefit of pipelining without a cache at ORCONF. Indeed, part of my presentation will show Dhrystone measures with and without pipelining. | 15:30 |
---|---|---|
ZipCPU|Laptop | One more comment on the Dhrystone measure: that is with and without pipelining on the *data* channel. The *instruction* channel is both pipelined and cached as soon as the cache in the CPU is enabled, and hence the CPU is pipelined. (The option connects the two within the ZipCPU.) | 15:37 |
ZipCPU|Laptop | Indeed, I get a rough 50% improvement in my Dhrystone score by implementing pipelining ... even without a data cache. | 15:37 |
ZipCPU|Laptop | Certainly, the only reason why their Dhrystone numbers are above 1.0 is because everything must be in a very well designed cache. | 21:58 |
---|
ZipCPU | Yes, I still have the binaries. They are his binaries of the dhrystone algorithm. Executables that would run on de0 or atlys. | 14:57 |
---|---|---|
ZipCPU | Then something's still not right. 'Cause with the newlib build I have, or1k got a very poor dhrystone score, and because | 15:29 |
SMDhome1 | olofk: clear_ram was introduced due to problems w/ uninitialized ram in dhrystone, so it's required. At least for icarus | 02:16 |
---|
SMDhome1 | olofk: I'm running 100k dhrystone loops on icarus and log file is already 8gb | 14:58 |
---|
SMDhome1 | stekern: I've just noticed that in your dhrystone code all printfs are commented out. If you just add printf("%d", Int_1_loc), the result becomes significantly worse. | 03:12 |
---|---|---|
stekern | I've only used that dhrystone test to make comparisons between changes (and between or1200 and mor1kx) | 05:02 |
ZipCPU | stekern: If you've used dhrystone to make comparisons between or1200 and mor1kx, do you remember what the results were? | 07:52 |
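Editor's sketch: the observation above, that adding a printf changes the score, can be illustrated with a simple amortization model. It assumes the printf cost is a fixed, one-time cost inside the timed region; the function names and all cycle counts below are hypothetical and are not measurements from mor1kx or any other core.

```python
def overhead_fraction(loops, cycles_per_loop, fixed_overhead_cycles):
    """Fraction of the timed region spent in fixed overhead (e.g. printf)."""
    total_cycles = loops * cycles_per_loop + fixed_overhead_cycles
    return fixed_overhead_cycles / total_cycles


def measured_loops_per_cycle(loops, cycles_per_loop, fixed_overhead_cycles):
    """Apparent benchmark rate when the fixed overhead is timed as well."""
    total_cycles = loops * cycles_per_loop + fixed_overhead_cycles
    return loops / total_cycles


if __name__ == "__main__":
    CYCLES_PER_LOOP = 400        # hypothetical cost of one Dhrystone loop
    PRINTF_CYCLES = 2_000_000    # hypothetical one-time printf/UART cost
    for loops in (10, 100_000, 1_000_000):
        frac = overhead_fraction(loops, CYCLES_PER_LOOP, PRINTF_CYCLES)
        print(f"{loops:>9} loops: {frac:.1%} of the time is overhead")
```

Under this model the overhead share shrinks as the loop count grows, so a longer run reports a rate closer to the true per-loop cost. If the printf sits inside the loop body instead, its cost scales with the loop count and is not amortized.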
kc5tja | I guess I don't know too much about Dhrystones. | 13:14 |
---|---|---|
ZipCPU | kc5tja: We were actually comparing Dhrystone MIPS / MHz. It's a clock independent measure of CPU speed. Multiply it by your clock speed, and you get a measure of Dhrystone MIPS. | 13:15 |
ZipCPU | Dhrystone MIPS is a measure of your CPU speed, when compared with a VAX at 1 MHz clock speed, which is deemed to be 1 DMIPS. | 13:16 |
SMDhome1 | I think I've found what's wrong w/ openrisc dhrystone results | 13:59 |
SMDhome1 | In this case we have next options: either we delete printfs or we increase dhrystone loops to eliminate printfs influence | 14:01 |
SMDhome1 | I'm running 1M dhrystone loops now, but for 200k I got better results than ZipCPU | 14:02 |
kc5tja | Another question is which version of Dhrystone is being used. 1.0, 1.1, and 2.1 will all report different values for the same architecture. | 14:03 |
ZipCPU | While the Dhrystone benchmark states that the code must be compilable, must come from GCC, it doesn't necessarily state that the library routines can't be hand-optimized. | 14:39 |
ZipCPU | Well ... not quite. Dhrystone is not meant to be hand optimized. I'm sure there are those that do it, but it's *supposed* to be a measure that includes compiler performance. | 14:40 |
_franck_ | ZipCPU, kc5tja : there are dhrystone numbers here: http://www.juliusbaxter.net/openrisc-irc/search?q=Dhrystone | 15:38 |
kc5tja | Also, it's a common complaint against Dhrystone that you're really testing the compiler's standard library performance more than you are the CPU itself. | 15:47 |
kc5tja | Now, see, I want to find out what Dhrystone ranking I get with my own RISC-V core, as well as with the S64X7. Should be enlightening. :) | 15:55 |
ZipCPU | stekern: You were the one who ran Dhrystone last: Do you have any of the system, software, and/or assembly left behind from when you did it? | 16:16 |
stekern_ | looking through old irc logs, the last dhrystone result I've mentioned seems to be 1.44 | 16:34 |
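Editor's sketch: the DMIPS/MHz arithmetic discussed above can be written out as follows. It assumes the commonly quoted baseline of 1757 Dhrystones per second for a VAX 11/780 (defined as 1 DMIPS); the function names are illustrative and do not come from any Dhrystone distribution.

```python
# Commonly quoted baseline (assumption): a VAX 11/780 runs 1757 Dhrystones
# per second, and that rate is defined as 1 DMIPS.
VAX_DHRYSTONES_PER_SECOND = 1757.0


def dmips_per_mhz(loops, clock_cycles):
    """Clock-independent score from a loop count and a cycle count.

    Dhrystones/s = loops * f / cycles and DMIPS/MHz = Dhrystones/s / 1757
    / (f / 1e6), so the clock frequency f cancels out.
    """
    return loops * 1e6 / (VAX_DHRYSTONES_PER_SECOND * clock_cycles)


def dmips_at_clock(score_per_mhz, clock_mhz):
    """Multiply DMIPS/MHz by the clock speed in MHz to get DMIPS."""
    return score_per_mhz * clock_mhz


if __name__ == "__main__":
    # 1757 loops in one million cycles is exactly the baseline rate.
    print(dmips_per_mhz(loops=1757, clock_cycles=1_000_000))  # 1.0
    # A 1.00 DMIPS/MHz core clocked at 250 MHz.
    print(dmips_at_clock(1.00, 250))  # 250.0
```

Counting clocks in a simulator gives `loops` and `clock_cycles` directly, which is why the per-MHz figure can be obtained without knowing the final clock speed.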
ZipCPU | DHRYSTONE WORKS!!! (UART TOO!) | 13:38 |
---|---|---|
olofk | ZipCPU: I think you can run dhrystone on picorv32 quite easily | 17:36 |
ZipCPU | Oh, and did you get the news that I have dhrystone measures for mor1kx-generic? or that the UART now works on mor1kx-generic? (at least, it works here ...) | 20:31 |
SMDhome1 | ZipCPU have you tried your uart w/ dhrystone? | 14:06 |
---|
SMDeeepc | I still wonder how to get stdout of verilated fusesoc: need to obtain dhrystone results | 12:01 |
---|---|---|
ZipCPU | My immediate goal is simply to run dhrystone. | 13:56 |
ZipCPU | So, one of the things I'll be presenting is a slide showing LUT counts and Dhrystone measures versus capability. | 14:35 |
---|
ZipCPU | Do you know who did the dhrystone benchmark work? | 10:13 |
---|---|---|
jeremybennett | ZipCPU: Good luck with your Dhrystone work | 14:22 |
ZipCPU | I've spent my day working with the Dhrystone benchmark. I've noticed that many CPUs claim DMIPS/MHz as greater than one, some even as high as 1.5. | 20:13 |
---|---|---|
ZipCPU | For example, I found one reference suggesting that OpenRISC can achieve a 1.00 DMIPS/MHz performance, or 250 Dhrystone MIPS when run with a 250MHz clock. | 20:14 |
ZipCPU|Laptop | kc5tja: ssvb: I've been using Dhrystone as a benchmark. It's not perfect, but its drawbacks are all well known. | 14:05 |
---|---|---|
ssvb | ZipCPU: CoreMark claims to be better than Dhrystone according to its FAQ - http://www.eembc.org/coremark/faq.php | 14:22 |
ZipCPU|Laptop | I once broke Dhrystone. Ever since then, my CPU hasn't done as well ... :rofl: | 14:25 |
kc5tja | ZipCPU|Laptop: re: breaking Dhrystone -- hehehe. | 14:33 |
olof | And I know that we have some CPI measurements from dhrystone and other tools somewhere. stekern or bandvig might know more | 09:39 |
---|---|---|
ZipCPU | olof: I wasn't going to go as far as dhrystone for a benchmark. I was specifically looking for the fastest instruction clock cycle. Dhrystone gets ... confusing. | 09:42 |
ZipCPU | Dhrystone depends upon caches, branch execution delay, pipeline delays, compiler success ... it's a decent overall measure, | 09:43 |
SMDwrk | Let's say dhrystone. I use baremetal toolchain, dhrystone sources, compile it to elf and then run it on simulator | 13:26 |
---|
olof | or1k-elf-objcopy -O elf32-or1k --gap-fill 0 dhrystone_10.elf dhrystone_10.elf2 | 02:58 |
---|---|---|
SMDwrk | For that dhrystone binary and current pipeline implementation it gets into infinite loop after "int comp should be 17" | 03:58 |
olof | yep. dhrystone completed | 03:59 |
olof | SMDhome: Running dhrystone in icarus now. Can't see that it fetches any uninitialized data from RAM | 17:08 |
---|---|---|
olof | SMDhome: I'm running fusesoc sim mor1kx-generic --elf-load=dhrystone_10.elf --timeout=500000 | 17:44 |
olof | And even though the simulator is aborted before dhrystone is finished, I see that it's starting at least, since it prints out stuff from the uart on the terminal | 17:45 |
SMDhome | olof, you get infinite loop, check insn trace diff. Dhrystone is finished and it hangs during result print | 23:36 |
SMDhome | Seems like dhrystone hangs on current mor1kx Oo | 11:18 |
---|---|---|
SMDhome | thanks, I'll try to debug dhrystone thing but it would be nice if someone could try to reproduce it | 13:51 |
SMDhome | olofk: thanks, it works now, even with dhrystone bin, but afaik iverilog sim is more precise | 23:19 |
stekern | it was the dhrystone test that showed a decrease (1.26 => 1.24) | 03:29 |
---|---|---|
stekern | speeding up the emptying of the store buffer onto the dbus helped a lot, it cranked up the dhrystone result to 1.44 | 06:33 |
stekern | so ~5% increase for coremark and ~16% increase for dhrystone, I'm pretty pleased with that | 07:55 |
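Editor's sketch: the relative changes quoted above can be checked with a one-line calculation. It assumes the ~16% figure is measured from the 1.24 result to the 1.44 result mentioned in the preceding lines, which the log does not state explicitly.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Assumed baseline: the 1.24 result, improved to 1.44.
    print(f"{percent_change(1.24, 1.44):.1f}%")   # 16.1%
    # The earlier regression from 1.26 to 1.24.
    print(f"{percent_change(1.26, 1.24):.1f}%")   # -1.6%
```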
stekern | all this is just according to dhrystone and coremark though | 15:51 |
---|
stekern | I like when fixing bugs also increases performance, about 0.3% in coremark and 1.3% in dhrystone | 03:44 |
---|
stekern | I get a 1.46->1.50 increase in dhrystone without it | 19:08 |
---|
stekern | let's see about dhrystone, 1.38 was the result before moving the branches | 03:56 |
---|
@stekern | what is that dhrystone vs coremark graph supposed to show? | 21:04 |
---|
juliusb | licensing, noted in the discussion page that the dhrystone and coremark on the board favoured mor1kx | 20:44 |
---|
stekern | oh, and I have got it to run the dhrystone test as fast as gcc-4.8, by tweaking the limit where memcpys vs inserted load/stores goes | 20:49 |
---|
stekern | cmov gives about a 3% increase in dhrystone | 09:47 |
---|
stekern | as far as benchmarks go, I've run dhrystone and coremark on or1200 and mor1kx comparing gcc4.5.1, gcc4.8.0 and llvm | 16:22 |
---|---|---|
stekern | mor1kx is faster in dhrystone than or1200 | 16:28 |
gxti | is there a more expedient way to get software to a ORPSoC on a digilent atlys than using impact? at the rate it's going now it'll be at least 30m, and this is just testing with dhrystone... if anything goes wrong, i have to start over | 01:13 |
---|
79 matches in 2795 log files with 136823 lines (2.5 seconds).