IRC logs for #openrisc Wednesday, 2016-09-14

--- Log opened Wed Sep 14 00:00:11 2016
-!- LoneTech_ is now known as LoneTech03:53
shorneolofk: agree, I think it's better our tree not be so much different from upstream.  I'm looking into how others manage the patches06:53
olofkAnyone got a good idea for an FPGA board with GigE and a couple of user-accessible I/O that can handle ~100MHz LVDS07:14
olofkSmall. Credit-card size would be nice07:14
olofkHmm... come to think of it, a microZed might work, but I would like something that is easier to solder than the Samtec connectors07:15
olofkDE0-Nano-SoC might work07:18
shorneah, this one is a bit too big and not GigE https://www.xilinx.com/products/boards-and-kits/arty.html07:50
olofkYeah, I thought about Arty too, but need faster ethernet07:52
shorneI see some virtex 7 boards with gigE, looks really expensive07:53
shornejust searching xilinx site07:53
shornehttp://www.xilinx.com/products/boards-and-kits/1-3bwl52.html07:56
shornethis looks nice07:56
shornehttp://www.xilinx.com/products/boards-and-kits/1-hlf2sm.html - this looks ok too, $9907:57
shorneactually snowleo looks really nice07:57
olofkWe got some people from Trenz coming to orconf actually07:59
shorneoh, nice08:00
ElIs there a tutorial or documentation to tell me how to create a .jic file from a .sof and a .elf file and program it onto a de0_nano?  I can load a bare-metal file onto my de0_nano using OpenOCD/GDB (from https://github.com/embecosm/chiphack/wiki/OpenRISC-SoC-Practical-Session-Instructions) but can't find how to load this onto the EPCS64 to boot the OR1K and app on power-up.11:12
shornestekern: I have tested that 'minimum patch set of fixes'  https://github.com/stffrdhrn/linux/wiki/commit-batches11:15
shornethe one I listed here openrisc-fixes11:15
shorneBranch is here, https://github.com/stffrdhrn/linux/tree/openrisc-fixes-4.811:16
shorneI even created a signed tag, https://github.com/stffrdhrn/linux/releases/tag/or-tag-test11:19
shorneAnyway, it runs on de0 nano, so I guess it's good :)11:19
shorneThe initramfs stuff I split out to another directory, need to put some docs and make it a project11:20
shornebut it seems to work fine not being in the kernel11:20
olofkOh, missed the guy asking about jic files11:54
olofkIf he comes back, direct him to me or to the mailing list11:54
olofkshorne: Cool. Good job11:55
olofkCan you build initramfs out-of-tree btw?11:55
stekernshorne: this: "pick 09fc079 openrisc: set OUTPUT_FORMAT to elf32-or1k"11:55
stekernis superseded by openrisc: Support both old (or32) and new (or1k) toolchain11:55
olofkAlso, is it possible to use external dts files? I'd like to keep them close to the FuseSoC systems, and eventually be able to create them on the fly matching the hw configuration12:13
stekernat least if you're not building them into the kernel it's possible12:34
stekernbuilding into the kernel kind of defeats the "one kernel to rule them all"12:35
olofkYeah. True13:49
olofkBut I can't come up with any practical way to use them separately for or1k13:50
olofkWe could of course store the kernel in SPI Flash on a board, load the dtb (dtb is a compiled dts, right?) and set r313:51
olofkBut that is just more hassle than loading a custom kernel13:51
olofkAnd you still need to remember to enable all the kernel options you potentially want13:52
olofkIt would of course make more sense if we wanted to provide an image with kernel+rootfs that can be used for multiple boards13:55
olofkshorne: That would be a good end use-case for the stuff you've been talking about today13:55
olofkCan u-boot take a kernel and a device tree from an SPI flash and boot?14:04
ZipCPUolofk: stekern: stekern did some wonderful work optimizing the strcpy and strcmp library functions for aligned string accesses.  These optimized versions have not made it into the newlib library for or1k.  Are there any plans to import them?14:24
stekernolofk: yes14:31
ZipCPUstekern: Were your updated versions assembly optimized, or does there exist C code for them?14:34
stekernit wasn't me that did them, it was olofk14:38
ZipCPUSo ... I should pick on olofk then?  :D14:39
olofkZipCPU: I based it on a C algorithm that I found in some other arch that was originally from some guy at Intel I think14:49
olofkBut I hand-wrote them in asm to take advantage of delay slots and such14:50
ZipCPUThat's kind of what I thought.  I'm hoping to have a copy that works within newlib.  I suppose I could disassemble stekern's executables to reverse engineer them and rebuild them, or I might ask you kindly if you have any plans to update those libraries soon.  ;)  ?14:51
olofkZipCPU: Do you mean that stekern's binaries have optimized strcpy/strcmp routines?14:53
ZipCPUYes.14:53
olofkHmm.. are they built for bare-metal or Linux?14:54
ZipCPUnewlib, so probably bare-metal.14:54
olofkWell, then they must be using the generic newlib routines, I guess14:54
olofkNo optimized versions at all14:54
ZipCPUActually, I don't even know that it was newlib.  I'm just assuming bare-metal and newlib.14:54
olofkWell, that's safe to assume14:55
ZipCPUNo, they were definitely optimized.  I disassembled them and had a peek.14:55
olofkDo you have the binaries somewhere?14:55
olofkBecause stekern and I were (I think) just talking about the optimized memset I wrote for Linux14:55
olofkWhich shouldn't affect strcmp or strcpy, and definitely not anything in newlib14:56
ZipCPUYes, I still have the binaries.  They are his binaries of the dhrystone algorithm.  Executables that would run on de0 or atlys.14:57
olofkFound them in the OpenRISC documentation database14:58
olofkI looked at the one in dhry-de0.dis, but I don't have a clue if they look optimized :)15:01
ZipCPUHeheh ...15:01
ZipCPUdhry-de0.dis is very nicely optimized, or at least the two functions I looked at are--strcpy and strcmp15:02
ZipCPUBoth very nicely check for aligned accesses first, and then (if aligned) operate on 4-bytes at a time.15:02
ZipCPUThe speed up is ... very valuable.15:03
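The aligned fast path ZipCPU describes above (check alignment first, then work on 4 bytes at a time) can be sketched in C roughly as follows. This is a minimal illustration only: it is not the code from dhry-de0.dis, from newlib, or from olofk's assembly, and the names `sketch_strcpy` and `has_zero_byte` are hypothetical. The zero-byte test is the well-known bit trick commonly used by word-at-a-time string routines; whether the disassembled routines use exactly this test is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any of the four bytes in w is 0x00. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Hypothetical sketch of an alignment-checking strcpy. */
char *sketch_strcpy(char *dst, const char *src)
{
    char *d = dst;
    const char *s = src;

    assert(dst != NULL && src != NULL);

    /* Take the word path only if both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        for (;;) {
            uint32_t w;
            /* Aligned word load. Like other word-at-a-time routines,
             * this may read up to 3 bytes past the terminator, but it
             * never leaves the aligned word that holds the terminator. */
            memcpy(&w, s, sizeof w);
            if (has_zero_byte(w))
                break;          /* terminator is somewhere in this word */
            memcpy(d, &w, sizeof w);
            s += sizeof w;
            d += sizeof w;
        }
    }

    /* Byte-wise tail, or the whole string when the pointers are unaligned. */
    while ((*d++ = *s++) != '\0')
        ;
    return dst;
}
```

For strings longer than a few bytes, the loop moves one word per iteration instead of one byte, which is where the speed-up on a 32-bit core such as or1k would come from. A hand-written assembly version could additionally fill delay slots and unroll the loop.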
ZipCPULet's see ... I annotated the disassembly of the memcpy function.15:04
ZipCPUThere was also some loop unrolling taking place there too.15:05
SMDhomecoremark is still running on icarus sim. 70+ hours!15:09
ZipCPUolofk: Here's an annotated disassembly of the memcpy function: https://justpaste.it/yci015:11
_franck_I started to port this one to or1k assembly some time ago: http://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c15:13
_franck_never finished15:13
olofkZipCPU: Looking at the newlib code, they use optimized C algorithms15:28
olofkSo it's a generic newlib thing15:29
ZipCPUThen something's still not right.  'Cause with the newlib build I have, or1k got a very poor dhrystone score, and because15:29
ZipCPUthe disassemblies didn't match at all.15:30
olofkHa! But my hand-optimized memset for linux is half the size of the one newlib produces :)15:31
olofkZipCPU: That's strange15:31
olofkMaybe there is something with the newlib build options. In the source code they use #if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__)15:32
olofkSo it could be that your newlib is built to optimize for size15:32
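The preprocessor condition olofk quotes can be probed with a small C sketch like the one below. It reuses the exact `#if` from the log; the macro `SKETCH_SMALL_ROUTINES` and the function `sketch_string_variant` are hypothetical names for illustration. GCC defines `__OPTIMIZE_SIZE__` when compiling with `-Os`; `PREFER_SIZE_OVER_SPEED` would have to be defined explicitly, for example with `-DPREFER_SIZE_OVER_SPEED`.

```c
/* Same condition as quoted from the newlib sources in the log. */
#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__)
#define SKETCH_SMALL_ROUTINES 1   /* hypothetical: size-optimized path */
#else
#define SKETCH_SMALL_ROUTINES 0   /* hypothetical: speed-optimized path */
#endif

/* Reports which variant the condition selects for THIS compilation. */
const char *sketch_string_variant(void)
{
    return SKETCH_SMALL_ROUTINES ? "size: byte-at-a-time"
                                 : "speed: word-at-a-time";
}
```

Note that this only reflects the flags used for the file being compiled. What matters for strcpy and strcmp is which flags were in effect when the newlib library itself was built, so changing the application's own optimization level would not by itself switch the variant inside a prebuilt libc.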
olofkOtherwise I don't know15:34
olofkWhat do your disassembled functions look like?15:34
ZipCPUHmm ... let me go dig into how to adjust those optimization flags, and see if I can get a different number then.15:39
olofkZipCPU: Do you use -Osomething when you compile?15:45
ZipCPUI found the default "-g -O2", and just changed it to "-O3" ... still haven't found how to set the right preference flags.15:46
ZipCPUNo ... that wasn't the difference.16:21
olofk:/16:22
ZipCPUI'm getting almost identical scores to what I was getting before.16:22
ZipCPUCould it be that mor1kx-generic is somehow ... poorly optimized?16:22
ZipCPUI'm now using -O3 and I've verified that the strcmp, memcpy, and strcpy calls are all the optimized newlib versions16:26
ZipCPUI'm still getting scores less than half of what stekern posted some time ago.16:28
olofkZipCPU: Could be16:34
olofkNot sure how cache size, store buffer, mul/div implementations are set up16:34
ZipCPUmul/div shouldn't have any effect.16:35
ZipCPUCache size and store buffer ... that I don't know.16:35
ZipCPU(Okay, mul/div will have some effect--shouldn't be this much ...)16:35
olofkStore buffer seems to be enabled by default16:40
olofkI and D caches are enabled16:41
olofkThere is only a serial divider, which is optimized by default16:43
olofks/optimized/enabled16:43
olofkNo idea then16:44
olofkOh, I missed that El guy again17:06
shorneZipCPU: there are 2 commits here for optimized routines. https://github.com/stffrdhrn/linux/wiki/commit-batches18:32
shornepick eb6b230 openrisc: Add optimized memcpy routine18:32
ZipCPUshorne: Thanks!18:32
shornepick a728fc8 openrisc: Add optimized memset18:33
shorneone from me one from olofk18:34
shorne_franck_: you can look at my memcpy routine, I also sent it to the kernel list and got some response on it18:34
shorneolofk: for out-of-kernel dts, I'll have a look, it looks like most architectures maintain it in the kernel though18:57
shornestekern: about or32, or1k compile output, thanks, I did seem to remember that, but didn't apply it, will fix18:58
--- Log closed Thu Sep 15 00:00:12 2016
