Past Echos the Future (Part II) - O350 / Chimera Stuff...
Notes on vintage SGI computers specific to Chimera Architecture machines including: Origin / Onyx 350 (O350), Onyx4, Tezro & Fuel
Status: 29 July 2021 - Split Chimera material from "Past Echos the Future (Part I)".
In "playing" about with vintage SGI computers, I used my blog "Past Echos the Future (Part I) - Notes on SGI / IRIX Stuff" to record various software setup and hardware modifications and fixes.
That blog entry has become long and sprawling, so I am splitting it into two or more parts to make it easier to find my own notes.
As per the title, this part is focused on the Chimera architecture based machines: Origin / Onyx 350 (O350), Onyx4, Tezro and Fuel. I will continue to update the original blog for more general and IRIX related material.
L3 Controller & L2 Emulator Linux Setup
The SGI Origin and Onyx 3000 / 300 series machines have L1 (level 1) controllers on each compute node. These can be connected (via USB) to a single L2 (level 2) controller, and L3 (level 3) controllers can access the L2 via the network.
For remote booting and configuration of a set of nodes you can use a hardware L3 controller (SGI console), a Linux L3 controller, or telnet to the L2 controller.
The Linux L3 software is available at a number of places on the web, including here.
The Linux controller requires a very early Linux version (I used Fedora 1, Yarrow) with kernel version 2.4 for USB to work.
I had no success getting the USB to work with VMware, and you need USB to run the L2 emulator as it expects to connect to machines via USB. As I could not get this running on VMware I have installed it on an old PC (it must have a USB port).
One day I will try with KVM / QEMU ... (see here for progress on this)
NOTE: All of the L2 & host L1 RS-232 serial ports are set to: 38,400 baud, 8 data bits, no parity, 1 stop bit (38,400 8-N-1)
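As a rough illustration of the 38,400 8-N-1 setting, here is a minimal Python sketch using only the standard termios module on a Unix-like host. The function name and the /dev/ttyS0 device in the comment are my own assumptions for illustration, not something taken from the SGI tools.

```python
import termios

def set_38400_8n1(attrs):
    """Return a copy of a termios attribute list set to 38,400 baud, 8-N-1.

    `attrs` has the layout returned by termios.tcgetattr():
    [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    """
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    cflag &= ~termios.CSIZE       # clear the character-size bits
    cflag |= termios.CS8          # 8 data bits
    cflag &= ~termios.PARENB      # no parity
    cflag &= ~termios.CSTOPB      # 1 stop bit (CSTOPB set would mean 2)
    cflag |= termios.CLOCAL | termios.CREAD  # ignore modem lines, enable receiver
    return [iflag, oflag, cflag, lflag, termios.B38400, termios.B38400, list(cc)]

# Applying it to a real port would look something like this
# (the device name is hypothetical and depends on your host):
#   import os
#   fd = os.open("/dev/ttyS0", os.O_RDWR | os.O_NOCTTY)
#   termios.tcsetattr(fd, termios.TCSANOW, set_38400_8n1(termios.tcgetattr(fd)))
```

In practice any terminal program set to 38400 8-N-1 does the same job; the sketch just spells out which bits those settings correspond to.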
Recovering a bricked L2 Controller following IRIX 6.5.30 Update
As described earlier, the SGI "big iron" machines have L1 & L2 controllers. There is a bug in IRIX 6.5.30 (L2 firmware version 1.44) which means that if you do an IRIX update and include the L2 update then it will brick the L2 controller. Connecting to the L2, you will see that it is failing because it cannot find a file.
This set of information has been updated based on checking against a working L2 controller. I have also recently re-verified this via a "bricked" L2 from another SGI enthusiast. If you have problems with the instructions then please let me know.
Here is the way to recover the bricked L2:
1. Connect to the L2 via its serial (console) port.
2. Get into the L2 OS (which is a PowerPC BusyBox implementation) by using the "shell" command or just "!".
3. Change directory into /tmp (which is writable) and create a new mount point: "cd /tmp" & "mkdir TMPLIB"
4. Create a new in-memory (temporary) file system and mount it on your new directory: "mount -t tmpfs -o size=800k tmpfs /tmp/TMPLIB"
5. Copy the required sub-directory contents onto the temporary file system: "cp /stand/sysco/lib/* /tmp/TMPLIB"
6. Assign (ifconfig) your L2 an IP address and, if required, a default route (depending on whether your http server is on the local subnet or not): "ifconfig eth0 XXX.XXX.XXX.XXX netmask 255.255.255.0 broadcast XXX.XXX.XXX.255" (assuming /24) & "route add default gw XXX.XXX.XXX.XXX eth0"
7. Get the missing file from a web server via http (or from a tftp server) (the missing file is: libscan.ppclinux.so which is here) and save it into your new temporary lib directory, for example by changing into /tmp/TMPLIB first: "wget http://XXX.XXX.XXX.XXX/<LOC>/libscan.ppclinux.so" (note I am using IP addresses as I have not configured DNS in this case).
8. Create another in-memory (temporary) file system and mount it on top of the existing "/stand/sysco/lib" directory: "mount -t tmpfs -o size=800k tmpfs /stand/sysco/lib" and copy all the TMPLIB files over to this version: "cp /tmp/TMPLIB/* /stand/sysco/lib"
9. Remove "TMPLIB" to free memory (change out of /tmp/TMPLIB first if you are still in it): "umount /tmp/TMPLIB" and exit the shell.
10. You should now be able to proceed with fixing your L2 by doing a flashsc reflash from a USB connected IRIX host using PATCH SG0007149, which has L2 firmware 1.48.
NOTE 1: See man flashsc for flashing instructions
NOTE 2: My other posting on this, which outlines the basic steps
NOTE 3: There might be other more efficient ways to getting updated mount on "/stand/sysco/lib" using mount / remount options, if you know these then please provide feedback for updating.
NOTE 4: This "hack" was originally documented on nekochan by Pymble Software.
NOTE 5: L2 Console port is: 38400,8,N,1
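To save retyping, the shell steps above can be assembled into one list with the addresses filled in. This is only a sketch under stated assumptions: the function and parameter names are mine, the "cd /tmp/TMPLIB" before wget and "cd /" before umount are my additions so the download lands in the right place and the unmount is not blocked, and the example addresses are made up. It only builds the command strings; you still paste them into the L2 shell yourself.

```python
import ipaddress

def l2_recovery_commands(l2_ip, prefix_len, http_server, file_dir, gateway=None):
    """Build the BusyBox command sequence for the bricked-L2 workaround.

    l2_ip / prefix_len : address to give the L2 (e.g. "192.168.1.50", 24)
    http_server        : IP of the web server holding libscan.ppclinux.so
    file_dir           : directory on that server (the <LOC> placeholder)
    gateway            : optional default gateway if the server is off-subnet
    """
    iface = ipaddress.ip_interface(f"{l2_ip}/{prefix_len}")
    net = iface.network
    url = f"http://{http_server}/{file_dir.strip('/')}/libscan.ppclinux.so"

    cmds = [
        "cd /tmp",
        "mkdir TMPLIB",
        "mount -t tmpfs -o size=800k tmpfs /tmp/TMPLIB",
        "cp /stand/sysco/lib/* /tmp/TMPLIB",
        f"ifconfig eth0 {iface.ip} netmask {net.netmask} broadcast {net.broadcast_address}",
    ]
    if gateway:
        cmds.append(f"route add default gw {gateway} eth0")
    cmds += [
        "cd /tmp/TMPLIB",            # assumption: so wget saves into TMPLIB
        f"wget {url}",
        "mount -t tmpfs -o size=800k tmpfs /stand/sysco/lib",
        "cp /tmp/TMPLIB/* /stand/sysco/lib",
        "cd /",                      # assumption: leave TMPLIB before unmounting
        "umount /tmp/TMPLIB",
        "exit",
    ]
    return cmds

# Example with made-up addresses:
for line in l2_recovery_commands("192.168.1.50", 24, "192.168.1.10", "sgi"):
    print(line)
```

The netmask and broadcast values are derived from the prefix length, so this also covers networks that are not /24.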
Recovering an L2 Controller with an L3 Controller
Another potential failure with L2 is that the image gets corrupted. This happened to me when the L2 power supply failed part way through doing a recovery from 6.5.30 bricked L2 (as per above procedure).
In this case, when you connect to the L2 console port it will boot and indicate that it is in recovery mode, and that you need to connect it to an L3 controller via the Console port and run the l2recover program:
Validating L2 Controller Flash image....FAILED!
INVALID IMAGE HEADER CHECKSUM (C55DDEBE)
FATAL ERROR!!! Your L2 controller binary image is corrupted!!
You must perform the L2 Controller flash recovery sequence, which
allows you to download a new L2 controller image via the Console
serial connection on the L2 controller
The typical recovery sequence is:
1) Attach a serial (null modem) cable from the L2 Controller serial port
marked "Console", to the serial port on the L3 Controller.
2) Disconnect any terminal program that may be connected to the
serial port.
3) Execute the l2recover command on the L3 controller:
/usr/cpu/firmware/sysco/l2recover /usr/cpu/firmware/sysco/l2.bin
4) When the command completes, the L2 should reboot. If it does not
then power cycle the L2 to reboot it.
So what is an L3 controller ??
It can just be your Tezro, Fuel or O350 IRIX machine, a null modem cable and the flashsc program.
I did this via Fuel using Serial Port #2 (/dev/ttyd2) plugged into the L2 serial port: "cd /usr/cpu/firmware/sysco" & "flashsc -l2recover --dev /dev/ttyd2 l2.bin"
IRIX / L1 Software Versions
The following log captures the version of L1 software that is provided by the particular IRIX release (note that L2 versions follow L1 versions):
---
--- IRIX 6.5.21
---
# ./flashsc -v l1.bin
./flashsc: (System Controller Flash Utility) - Version 1.0.7
Multi-image binary contains 2 flash images.
Image 0: L1 version 1.22.2, Built 06/17/2003 10:58:26 [1MB image]
Image 1: L1 version 1.22.2, Built 06/17/2003 10:59:41 [2MB image]
---
--- IRIX 6.5.22
---
# ./flashsc -v l1.bin
./flashsc: (System Controller Flash Utility) - Version 1.0.7
Multi-image binary contains 3 flash images.
Image 0: L1 version 1.24.8, Built 09/15/2003 17:07:44 [Base 1MB image]
Image 1: L1 version 1.24.8, Built 09/15/2003 17:08:18 [Fuel/PE 1MB image]
Image 2: L1 version 1.24.8, Built 09/15/2003 17:08:38 [2MB image]
---
--- IRIX 6.5.25
---
# ./flashsc -v l1.bin
./flashsc: (System Controller Flash Utility) - Version 1.2.1
Multi-image binary contains 3 flash images.
Image 0: L1 version 1.30.6, Built 06/16/2004 14:54:58 [Base 1MB image]
Image 1: L1 version 1.30.6, Built 06/16/2004 14:56:19 [Fuel/PE 1MB image]
Image 2: L1 version 1.30.6, Built 06/16/2004 14:56:38 [2MB image]
---
--- IRIX 6.5.29
---
# ./flashsc -v l1.bin
./flashsc: (System Controller Flash Utility) - Version 1.3.8
Multi-image binary contains 5 flash images.
Image 0: L1 version 1.40.5, Built 12/05/2005 14:00:44 [Base 1MB image]
Image 1: L1 version 1.40.5, Built 12/05/2005 14:01:22 [Fuel/PE/O300 1MB image]
Image 2: L1 version 1.40.5, Built 12/05/2005 14:01:32 [MIPS 2MB image]
Image 3: L1 version 1.40.5, Built 12/05/2005 14:01:53 [2MB image]
Image 4: L1 version 1.40.5, Built 12/05/2005 14:03:27 [Linux L1 image]
---
--- IRIX 6.5.30 (Avoid ...)
---
# ./flashsc -v l1.bin
./flashsc: (System Controller Flash Utility) - Version 1.4.1
Multi-image binary contains 7 flash images.
Image 0: L1 version 1.44.0, Built 07/17/2006 18:19:54 [Base 1MB image]
Image 1: L1 version 1.44.0, Built 07/17/2006 18:20:38 [Fuel/PE/O300 1MB image]
Image 2: L1 version 1.44.0, Built 07/17/2006 18:20:50 [MIPS 2MB image]
Image 3: L1 version 1.44.0, Built 07/17/2006 18:21:13 [Legacy 2MB image]
Image 4: L1 version 1.44.0, Built 07/17/2006 18:23:16 [Legacy Linux L1 image]
Image 5: L1 version 1.44.0, Built 07/17/2006 18:21:46 [2MB image]
Image 6: L1 version 1.44.0, Built 07/17/2006 18:23:57 [Linux L1 image]
pink 13# l1cmd ver
L1 1.48.1 (Image A), Built 01/22/2007 11:34:20 [Fuel/PE/O300 1MB image]
---
--- PATCH SG0007149
---
# flashsc -v l1.bin
flashsc: (System Controller Flash Utility) - Version 1.4.1
Multi-image binary contains 7 flash images.
Image 0: L1 version 1.48.1, Built 01/22/2007 11:33:34 [Base 1MB image]
Image 1: L1 version 1.48.1, Built 01/22/2007 11:34:20 [Fuel/PE/O300 1MB image]
Image 2: L1 version 1.48.1, Built 01/22/2007 11:34:34 [MIPS 2MB image]
Image 3: L1 version 1.48.1, Built 01/22/2007 11:34:57 [Legacy 2MB image]
Image 4: L1 version 1.48.1, Built 01/22/2007 11:36:34 [Legacy Linux L1 image]
Image 5: L1 version 1.48.1, Built 01/22/2007 11:35:27 [2MB image]
Image 6: L1 version 1.48.1, Built 01/23/2007 10:17:58 [Linux L1 image]
NOTE: IRIX 6.5.30 L2 is known to brick L2 controller and it has been reported (not confirmed) that L1 has issues with Fuel.
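If you collect a few of these listings, a small parser makes it easier to compare them. The sketch below reads the "Image N: L1 version ..." lines in the format shown in the logs above; the function names are my own, and the 1.44 check is only a hint based on the note above, not an official compatibility test.

```python
import re

# Matches lines such as:
#   Image 1: L1 version 1.24.8, Built 09/15/2003 17:08:18 [Fuel/PE 1MB image]
IMAGE_RE = re.compile(
    r"Image\s+(\d+):\s+L1 version\s+([\d.]+),\s+Built\s+(\S+\s+\S+)\s+\[([^\]]*)\]?"
)

def parse_flashsc_listing(text):
    """Return one dict per flash image found in `flashsc -v l1.bin` output."""
    images = []
    for line in text.splitlines():
        m = IMAGE_RE.search(line)
        if m:
            images.append({
                "index": int(m.group(1)),
                "version": m.group(2),
                "built": m.group(3),
                "label": m.group(4).strip(),
            })
    return images

def bundles_1_44(images):
    """True if any image is a 1.44.x build (the IRIX 6.5.30 firmware level)."""
    return any(img["version"].startswith("1.44.") for img in images)

# Listing copied from the IRIX 6.5.22 entry above.
SAMPLE_6_5_22 = """\
# ./flashsc -v l1.bin
./flashsc: (System Controller Flash Utility) - Version 1.0.7
Multi-image binary contains 3 flash images.
Image 0: L1 version 1.24.8, Built 09/15/2003 17:07:44 [Base 1MB image]
Image 1: L1 version 1.24.8, Built 09/15/2003 17:08:18 [Fuel/PE 1MB image]
Image 2: L1 version 1.24.8, Built 09/15/2003 17:08:38 [2MB image]
"""

for img in parse_flashsc_listing(SAMPLE_6_5_22):
    print(img["index"], img["version"], img["label"])
```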
Causes of "TLB Refill Exception" Error
The "TLB Refill Exception" error results in a system crash and leaves the machine in POD/CAC mode on the console. The error looks like this:
A 000 001c01: *** TLB Refill Exception on node 0
A 000 001c01: *** EPC: 0xa80000000129c874 (0xa80000000129c874)
A 000 001c01: *** Press ENTER to continue.
There appear to be three causes for this alarming failure:
- Booting with an old IRIX release - some SGI machines were introduced after the initial and earlier IRIX 6.5 releases. You need to make sure you are using an IRIX release that includes support for your machine type. If you use IRIX 6.5.22 or higher then you should get all machine types.
- Using a newer O350 IP59_4CPU system board with an older L1 version - to address this you will need to put in an older system board (IP53_4CPU for example), boot, update the L1 SW version, and then put in the new system board (see below for more information).
- Trying to do a SW install on Fuel via the system console port - in this case you can get to the PROM and begin the install, but on running the "inst" command the machine crashes. The remedy is to first go into the PROM command prompt and set the console environment variable to "d" (instead of "g" for "graphics"). You can then do the install via the console. Also be sure not to have a keyboard/mouse plugged in, as the PROM appears to take this as an indication of "workstation" rather than "server". Once you have a working installation you will likely need to do a fuller installation to get the graphics system running; remember to reset the NVRAM variable back to "console=g".
NOTE #1: See this thread on Irix Network for discussion of the Fuel console "TLB Refill Exception" case
NOTE #2: Here is nekochan thread on Fuel and 1.44.0 (6.5.30) L1 issue with Fuel
L1 & L2 Chimera & Dallas Chip Tips
All of SGI Origin 350, Onyx 350, Onyx 4, (Fuel ?) & Tezro use a variation of the "Chimera" systems board. The non graphics servers were code named as "Chimera Server" and the Onyx and Tezro graphics machines as "Chimera Blade"
This means that they all have L1 Controllers and can be managed via an L2 controller.
In fact you can flip a machine's identity by changing its serial number, turning rackmount Tezros into Origins and Origins into rackmount Tezros.
You can also swap "Dallas" chips across from Origins / Tezros into Numalink Routers to get around the security of Numalink Router serials. This allows you to create new and consistent serial numbers across a set of "Chimera" hosts to build up an Origin/Onyx 350 multi-chassis Numalink'ed machine.
To do this you will need to be willing to test various L1 / L2 controller serial and configuration options, some of which might cause a problem with your machine. Most of these are recoverable, but the majority of the information covering the various failure and fix scenarios was documented on "Nekochan".
Here are some clues to potential problems and fixes:
- "TLB Refill Exception" - I got this when I was running an older version of L1 software with a newer (IP59_4CPU - 4 x 1 GHz) Chimera board. The resolution was to put in an older (IP53_4CPU - 4 x 700/800 MHz) version of the Chimera board and then update the L1 version, before putting back the newer board.
- Disabled CPU - another problem is that the machine boots but has disabled CPUs, possibly due to the above problem, which results in the board being disabled, and you cannot re-enable it via the regular PROM monitor command "enableall". In this case you need to boot the machine into POD/DEX/CAC (Power-On Diagnostic / Dirty EXclusive / CAChed) mode by setting the debug flags via L1. This allows you to bypass the PROM boot and hence the disabled CPUs. The debug flag is: "debug 0x10d". This opens up a whole new arcane world of tweaking. The required sequence to revive the machine is to:
- Make sure you are directly connected to the required machines console serial port (38,400-8-N-1) (as it is not possible to do this via L2)
- Set the Debug Flag: "debug 0x10d" (see "More on L1 Debug Flags" below)
- Power Up: "power up"
- Enter Dex mode: "go dex"
- Enter CaC mode : "go cac"
- Clear the logs: "clearalllogs"
- Reinitialise logs: "initalllogs"
- Flush the buffers: "flush"
- Now escape back to L1 (Ctrl-T) and
- Reset debug to 0: "debug 0"
- Returning to console, do a reset: "reset".
Essentially what this is doing is clearing the fault log which resulted in the CPU being disabled.
See example POD/ DEX/ CAC L1 session further below.
I think the only reference to this is in this "Nekonomicon" trace ... but some of this is in the following SGI document "Hardware Quick-reference Booklet (Origin and Onyx2 Series) - HMQ-380-C" see page 174 for "POD Mode Commands".
For tips on flipping Numalink Router serials see pymblesoft.com blog. I used the chip swapping method outlined there by swapping out Numalink Dallas chip and replacing it with one from a Tezro...
See here for pictures of the Dallas DS1742W-120 orientation in the O350 (IP53) and Numalink Router, which should come in handy for those having to do this.
More on L1 Debug Flags ..
The L1 "debug" flags provide a way to control the machine boot process. The original flag values come from a combination of physical and virtual "Dip Switch" settings.
For Chimera machines these are all virtual and can be controlled via the L1 "debug" command. To help with getting valid debug flags (and avoid need to always go to doco and then convert values to hex) I created a little "Dip Switch Calculator" with MS Excel:
The values are documented via "man prom" and more completely in the internal Origin 2000 hardware quick reference guide (see below).
See below for an example of the difference in boot behaviour based on the "debug" settings.
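The arithmetic behind such a "Dip Switch Calculator" is just setting one bit per switch and printing the result in hex. Here is a minimal sketch of that conversion; the function names are mine, bit positions are counted from 0, and what each bit actually means is not encoded here (that still has to come from "man prom" or the quick reference guide).

```python
def flags_from_bits(bits):
    """Combine zero-based switch/bit positions into a single debug value."""
    value = 0
    for bit in bits:
        if bit < 0:
            raise ValueError("bit positions must be >= 0")
        value |= 1 << bit
    return value

def bits_from_flags(value):
    """List the zero-based bit positions that are set in a debug value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

# The POD-mode value used earlier in these notes:
print(bits_from_flags(0x10d))
print(format(flags_from_bits([0, 2, 3, 8]), "#x"))
```

Going in the other direction (from a hex value back to the list of set bits) is handy when reading someone else's "debug" setting.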
Testing Numalink Serial Number Changing
As described above, it is possible to change a Numalink router's identity by putting in an alternate Dallas chip (from a Tezro for instance) and, through some tricky tweaking with the L2, getting it to take on a new serial number.
It has also been observed that if you have a Dallas chip with a flat battery in your Numalink router then it will simply take on the serial number from the connected L2 on startup.
I have also done testing with a Numalink Router using various Dallas chips. The resulting behaviour of the Numalink differs depending on whether you put in a Dallas chip from a Fuel, a Tezro, another Numalink, or a cleared / uninitialised chip.
Behaviour variants include: whether the Numalink Router comes up with its existing Rack / Slot configuration, whether serial security is enabled, or whether it takes on the serial number from the L2 controller.
Here is an example from using a chip from another machine (this could be from either an Onyx4 or the original Numalink). The log shows that it auto-initialised and took on the serial from the L2:
?-192.168.XXX.XXX-L2>config
L2 192.168.XXX.XXX: - ---- (no rack ID set) (LOCAL)
L1 192.168.XXX.XXX:0:0 - ---r-- (no rack and slot ID set)
?-192.168.XXX.XXX-L2>serial
L2 system serial number: not set.
?-192.168.XXX.XXX-L2>192.168.XXX.XXX:0:0 brick rackslot 1 7
000r00:
brick rack set to 001 (takes effect on next L1 reboot/power cycle)
brick slot set to 07 (takes effect on next L1 reboot/power cycle)
?-192.168.XXX.XXX-L2>192.168.XXX.XXX:0:0 reboot_l1
?-192.168.XXX.XXX-L2>INFO: closed connection to 000r00
INFO: opened USB device at b1;p2/0;d6 (/dev/sgil1_0)
?-192.168.XXX.XXX-L2>config
L2 192.168.XXX.XXX: - ---- (no rack ID set) (LOCAL)
L1 192.168.XXX.XXX:0:0 - 001r07
?-192.168.XXX.XXX-L2>001r07
001r07 ATTN: FAN 0 warning limit reached @ 0 RPM.
001r07
001r07 ATTN: Environmental redundancy lost.
l1 log
001r07:
02/07/06 06:28:15 checksum Error - common header initialized
02/07/06 06:28:15 nvram checksum error - initializing core data.
02/07/06 06:28:15 nvram checksum error - initializing extended data.
02/07/06 06:28:15 nvram checksum error - log pointers invalid, log pointers reset
02/07/06 06:28:15 L1 booting 1.42.9
02/07/06 06:28:15 ** fixing invalid SSN value
02/07/06 06:28:15 ** fixing BSN mismatch
02/07/06 06:28:15 USB0: waiting on open
02/07/06 06:28:15 USB0: opened
02/07/06 06:28:15 USB0: registered for events
02/07/06 06:28:15 power up (PANEL)
02/07/06 06:28:15 FAN 0 warning limit reached @ 0 RPM.
02/07/06 06:28:15 Environmental redundancy lost.
02/07/06 06:28:15 L1 booting 1.42.9
02/07/06 06:28:15 USB0: waiting on open
02/07/06 06:28:15 USB0: opened
02/07/06 06:28:15 USB0: registered for events
02/07/06 06:28:15 FAN 0 warning limit reached @ 0 RPM.
02/07/06 06:28:15 Environmental redundancy lost.
?-192.168.XXX.XXX-L2>l1 serial
001r07:
BSN: NYY856 SSN: L0000000 Time: 02/07/2106 06:28:15 Security: OFF
?-192.168.XXX.XXX-L2>
If you look above at the log sequence: "checksum Error - common header initialized ...", you can see the Numalink L1 is reinitialising the Dallas chip. This is the exact same sequence observed on Fuel as well.
I have proved that it is possible to flip a Numalink router by just putting in a cleared Dallas chip and starting it up connected to an L2. Here is the log from putting in a cleared Dallas chip:
10/30/20 23:18:15 L1 booting 1.42.9
10/30/20 23:18:16 USB0: waiting on open
10/30/20 23:18:17 USB0: opened
10/30/20 23:18:16 USB0: registered for events
?-192.168.XXX.XXX-L2>serial all
001r07:
Data Location Value
------------------------------ ------------ --------
Local System Serial Number NVRAM not set
Reference System Serial Number NVRAM
Local Brick Serial Number EEPROM NYY856
Reference Brick Serial Number NVRAM NYY856
EEPROM Product Name Serial Part Number Rev T/W
---------- -------------- ------------- -------------------- --- ------
POWER RPWR NYY856 030_1631_004 C 00
LOGIC ROUTER NXA065 030_1634_004 C 00
?-192.168.XXX.XXX-L2>l1 power up
001r07 ERROR: SerNum:Invalid System Serial Number format. See log for details.
?-192.168.XXX.XXX-L2>l1 log
001r07:
10/30/20 23:16:22 checksum Error - common header initialized
10/30/20 23:16:22 nvram checksum error - initializing core data.
10/30/20 23:16:22 nvram checksum error - initializing extended data.
10/30/20 23:16:23 nvram checksum error - log pointers invalid, log pointers reset
10/30/20 23:16:23 L1 booting 1.42.9
10/30/20 23:16:24 USB0: waiting on open
10/30/20 23:16:24 USB0: opened
10/30/20 23:16:25 USB0: registered for events
10/30/20 23:18:15 L1 booting 1.42.9
10/30/20 23:18:16 USB0: waiting on open
10/30/20 23:18:17 USB0: opened
10/30/20 23:18:16 USB0: registered for events
10/30/20 23:19:44 Invalid SSN format.
10/30/20 23:19:45 SSN:
10/30/20 23:19:45 Numeric portion (last 7 chars) must be 0000000 through 3999999
?-192.168.XXX.XXX-L2>l1 help serial
001r07:
serial
shows secure system serial numbering information only.
serial verify
test the brick's readiness for secure serial numbering.
serial all
show system and brick part/serial numbers.
serial all v|verbose
show system and brick part/serial numbers with EEPROM indexes
serial dimm
show dimm part/serial numbers.
serial dimm v|verbose
show dimm part/serial numbers with extended data and EEPROM indexes.
serial clear
clear the system serial number.
serial <str> <str> <str> <str>
erases and reassigns system serial number using temporary authenticator.
serial security on
enables system serial number security.
?-192.168.XXX.XXX-L2>l1 serial clear
001r01:
INFO: command not supported on bricks that enforce security.
?-192.168.XXX.XXX-L2>l1 serial verify
001r01:
ERROR: SerNum:No assigned System Serial Number. See log for details
In this case it appears to have enabled Serial Security. So likely the only way to get the machine to initialise with a new serial number is by plugging it into a compute node via a Numalink connection.
NOTE: For more testing, see my log on SGI NVRAM replacement, here, on using EEPROM Programmer to clear chip before putting it into SGI Chassis. This also shows how to disable the serial security on Numalink Router.
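The L1 log above states the rule it applies to the system serial number: the last 7 characters must be numeric and in the range 0000000 through 3999999. A minimal sketch of that one check is below; the function name is my own, and it deliberately ignores any rules about the leading characters since the log does not spell those out.

```python
def ssn_numeric_part_ok(ssn):
    """Check the rule quoted in the L1 log: the last 7 characters of the
    system serial number must be digits in the range 0000000-3999999."""
    tail = ssn[-7:]
    if len(tail) != 7 or not (tail.isascii() and tail.isdigit()):
        return False
    return 0 <= int(tail) <= 3999999

# "L0000000" is the SSN reported in the earlier log; the empty string
# stands in for the blank "SSN:" reported with the cleared chip.
for candidate in ("L0000000", "L3999999", "L4000000", ""):
    print(repr(candidate), ssn_numeric_part_ok(candidate))
```

This matches what the log shows: the blank SSN from the cleared chip fails the format check, which is why "power up" is refused.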
LSI SAS3442X-R in O350 & Fuel
Adding an LSI SAS3442X-R board in combination with a SATA SSD to your O350 or Fuel is by far and away the cheapest way to get a significant disk performance boost. Fuel installation is a snap if you buy a "new old stock" retail box, which comes with the board plus cabling with Molex power. Here is an area where the Fuel's "cheap" PC architecture wins hands down.
Within the O350 rack mount server things are a little more complicated, as it does not have Molex power connectors dangling all around the place or slots for disks inside, so you need to be a little more creative to get the power and SAS/SATA cabling sorted.
Here is the low down on these boards. Firstly, they come in various configurations and the naming reflects this:
- SAS3442X == Serial Attached SCSI, 3 Gbit/sec, 4 Internal, 4 External, 2 Connectors, PCI-X (vs PCI-e)
- SAS3080X == SAS, 3 Gbit/sec, 8 internal, PCI-X
- SAS3800X == SAS, 3 Gbit/sec, 8 external (via 2 x SFF-8470 connectors), PCI-X
- X == PCI-X; there are also corresponding PCI-express variants
- -R == RAID, but cards can be flashed to HBA (Initiator Target) mode.
The boards come in a number of versions (ReportsAs | -PartNo | FWVer):
- 1068(A0) | -01A | A0 - avoid these as they do not have updated software available and they cannot be flashed for RAID or IT mode of operation
- 1068(B0) | -01B | B0 - these can be cross flashed (RAID/IT) and should work in SGI's (I have not tested or confirmed this version)
- 1068(B1) | -02C | B1 - these can be cross flashed (RAID/IT) and work well in SGIs
Once you get your board you should flash it to latest firmware / BIOS, using the HBA (IT) firmware rather than RAID firmware as SGI cannot be configured to use RAID.
Flashing the board requires an MS-DOS machine with a PCI-X slot. Neither the Linux nor the Windows version of the flashing tool allows you to erase the flash, which is required to flip some of the cards from RAID to HBA (IT) mode.
The last firmware / BIOS version is:
- 01.33.00.00 - Firmware
- 06.36.00.00 - BIOS
This is available via the Broadcom support site and is in the "P21 Package for Window & Dos". The package includes:
- 3442XRB0.fw - RAID (-R) for 1068(B0) series adaptors
- 3442XRB1.fw - RAID (-R) for 1068(B1) series adaptors
- 3442XTB0.fw - HBA (-IT) for 1068(B0) series adaptors
- 3442XTB1.fw - HBA (-IT) for 1068(B1) series adaptors
- mptsas.rom - BIOS (same for all versions)
NOTE: The only 1068(A0) series firmware I was able to find is in HP Service Pack (SP45154) and after applying this the adaptor will report as SAS3080 (i.e. 8 Internal Ports).
To flash an "-R" board to the latest "-IT" firmware the steps are:
- Find the card - "sasflash -listall"; you only need this if there are multiple boards in the same machine. In my example I do not use the controller number option as I only had one board in the flashing machine
- Erase Flash - "sasflash -o -e 7"; note that this also erases the SAS device address, so be sure to record it first so you can reapply it later
- Flash the new Firmware & BIOS - "sasflash -o -f FMVERt.bX -b mptsas.rom"
- Put the SAS Address back = "sasflash -o -sasadd XXXXXXXXXX"
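Picking the right firmware file is the easy step to get wrong, so here is a small sketch that maps the chip revision and target mode to the file names from the package listing above and then prints the four sasflash commands in order. The function name and the lookup table layout are my own; the commands and flags are only those quoted in the steps above, and the SAS address is left as a placeholder for you to fill in with the value you recorded.

```python
# Firmware files from the P21 package listing above, keyed by
# (chip revision, mode). A0 boards are intentionally absent.
FIRMWARE_FILES = {
    ("B0", "RAID"): "3442XRB0.fw",
    ("B1", "RAID"): "3442XRB1.fw",
    ("B0", "IT"): "3442XTB0.fw",
    ("B1", "IT"): "3442XTB1.fw",
}
BIOS_FILE = "mptsas.rom"  # same BIOS image for all versions

def flash_commands(chip_rev, mode, sas_address):
    """Return the sasflash command sequence for one board.

    chip_rev    : "B0" or "B1" (as reported for the 1068 chip)
    mode        : "IT" (HBA) or "RAID"
    sas_address : the SAS address recorded before erasing
    """
    key = (chip_rev.upper(), mode.upper())
    if key not in FIRMWARE_FILES:
        raise ValueError(f"no firmware file listed for {chip_rev}/{mode}")
    return [
        "sasflash -listall",
        "sasflash -o -e 7",
        f"sasflash -o -f {FIRMWARE_FILES[key]} -b {BIOS_FILE}",
        f"sasflash -o -sasadd {sas_address}",
    ]

# Example: a 1068(B1) board going to IT mode, address left as a placeholder.
for cmd in flash_commands("B1", "IT", "XXXXXXXXXX"):
    print(cmd)
```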
Card should now report new FW/BIOS version:
If you have flashed it to IT (HBA) mode then you should see that it reports as XX.XX.XX.XX-IT in the FW Revision, by going into bios configuration at boot:
Now take board out of MS-DOS flashing machine and put it in your Fuel/O350 and you will see it reports the LSI 1068 at boot:
Here is why you would bother with this...
---
--- 1. Here is diskperf of in built Ultra 160 SCSI boot disk
--- with a Seagate Cheetah U160 spinning disk
---
# diskperf -W -D -c4g -n "fuel/pink UW160 ST336706LW" testfile
#---------------------------------------------------------
# Disk Performance Test Results Generated By Diskperf V1.2
#
# Test name : fuel/pink UW160 ST336706LW
# Test date : Sat Oct 3 22:52:00 2020
# Test machine : IRIX64 pink 6.5 07202013 IP35
# Test type : XFS data subvolume
# Test path : testfile
# Request sizes : min=16384 max=4194304
# Parameters : direct=1 time=10 scale=1.000 delay=0.000
# XFS file size : 4294967296 bytes
#---------------------------------------------------------
# req_size fwd_wt fwd_rd bwd_wt bwd_rd rnd_wt rnd_rd
# (bytes) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s)
#---------------------------------------------------------
16384 2.60 52.58 2.88 2.89 2.32 2.45
32768 4.98 52.84 6.08 6.10 4.38 4.62
65536 9.14 52.85 13.70 13.72 8.03 8.32
131072 15.68 52.51 31.43 28.23 14.27 14.76
262144 24.39 51.92 36.51 36.63 22.76 22.36
524288 33.61 51.29 36.42 36.74 31.47 31.79
1048576 41.44 51.62 46.25 41.76 39.28 37.59
2097152 46.84 51.23 46.36 46.41 45.73 43.57
4194304 50.30 50.96 49.32 49.69 49.13 47.92
---
--- 2. And here is an Octane2 with an ACARD SAS/SATA adaptor
--- with a Samsung 850 EVO SATA SSD
---
# diskperf -W -D -c4g -n "octane2/porcipine scsi/sata/acard 850 EVO" testfile
#---------------------------------------------------------
# Disk Performance Test Results Generated By Diskperf V1.2
#
# Test name : octane2/porcipine scsi/sata/acard 850 EVO
# Test date : Sat Oct 3 22:32:06 2020
# Test machine : IRIX64 porcipine 6.5 07202013 IP30
# Test type : XFS data subvolume
# Test path : testfile
# Request sizes : min=16384 max=4194304
# Parameters : direct=1 time=10 scale=1.000 delay=0.000
# XFS file size : 4294967296 bytes
#---------------------------------------------------------
# req_size fwd_wt fwd_rd bwd_wt bwd_rd rnd_wt rnd_rd
# (bytes) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s)
#---------------------------------------------------------
16384 23.72 20.57 23.65 19.16 23.67 19.08
32768 29.38 24.93 29.36 23.86 29.34 23.73
65536 33.03 29.02 33.02 28.26 33.01 28.15
131072 35.43 30.43 35.40 29.61 35.43 29.49
262144 36.94 31.26 36.93 30.41 36.92 30.32
524288 37.67 31.53 37.66 30.66 37.67 30.56
1048576 37.99 31.43 37.98 30.81 37.98 30.79
2097152 38.18 31.00 38.14 30.55 38.18 30.67
4194304 38.27 30.04 38.28 29.83 38.28 29.89
---
--- 3. Now here is the LSI Logic SAS3442X-R with
--- Samsung 840 EVO SATA SSD
---
# diskperf -W -D -c4g -n "fuel/pink sas3442X-I 840 EVO" test/testfile
#---------------------------------------------------------
# Disk Performance Test Results Generated By Diskperf V1.2
#
# Test name : fuel/pink sas3442X-I 840 EVO
# Test date : Sat Oct 3 22:23:11 2020
# Test machine : IRIX64 pink 6.5 07202013 IP35
# Test type : XFS data subvolume
# Test path : test/testfile
# Request sizes : min=16384 max=4194304
# Parameters : direct=1 time=10 scale=1.000 delay=0.000
# XFS file size : 4294967296 bytes
#---------------------------------------------------------
# req_size fwd_wt fwd_rd bwd_wt bwd_rd rnd_wt rnd_rd
# (bytes) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s)
#---------------------------------------------------------
16384 96.89 105.67 94.87 63.93 94.41 63.93
32768 138.93 150.77 129.34 103.63 135.62 102.92
65536 180.30 194.71 136.40 149.68 59.84 149.72
131072 205.64 225.30 148.95 191.94 61.98 191.70
262144 224.77 245.83 132.71 224.31 57.87 224.69
524288 228.81 258.08 131.87 246.02 59.23 246.12
1048576 224.70 264.18 111.43 257.66 59.37 257.81
2097152 217.02 267.26 109.98 264.21 57.54 264.55
4194304 179.81 268.91 135.53 267.42 57.62 267.30
---
--- 4. For completeness here is the internal UW160 SCSI with
--- ACARD Samsung EVO SSD
---
% diskperf -W -D -c4g -n "fuel/pink UW160 ACARD SSD" testfile
#---------------------------------------------------------
# Disk Performance Test Results Generated By Diskperf V1.2
#
# Test name : fuel/pink UW160 ACARD SSD
# Test date : Wed May 19 20:31:02 2021
# Test machine : IRIX64 pink 6.5 07202013 IP35
# Test type : XFS data subvolume
# Test path : testfile
# Request sizes : min=16384 max=4194304
# Parameters : direct=1 time=10 scale=1.000 delay=0.000
# XFS file size : 4294967296 bytes
#---------------------------------------------------------
# req_size fwd_wt fwd_rd bwd_wt bwd_rd rnd_wt rnd_rd
# (bytes) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s)
#---------------------------------------------------------
16384 95.33 105.98 94.49 77.86 95.72 77.90
32768 137.80 151.31 137.15 119.21 137.45 117.97
65536 179.19 194.74 179.31 165.12 179.27 159.75
131072 210.06 225.61 210.19 204.37 210.19 198.17
262144 230.41 246.06 230.70 233.18 230.73 230.25
524288 243.69 258.75 243.31 251.31 242.97 250.27
1048576 250.07 265.13 249.99 261.13 250.03 260.76
2097152 253.42 268.48 253.62 266.65 253.55 266.26
4194304 254.82 270.33 255.29 269.36 255.19 269.29
So the Fuel's spinning disk results are generally poorer than the Octane2 with SSD, even though the Fuel has a much faster disk sub-system (160 MB/sec for Fuel vs. 40 MB/sec for Octane). In fact the Dual 600 Octane2 is pretty much saturating its 40 MB/sec SCSI bus with the SSD.
But the Fuel with SAS3442X had the best single disk performance I have ever seen on a MIPS SGI machine, and this is very cheap to set up compared to ACARD SCSI to SATA adaptors, which are now selling for prices that are more than an entire Fuel! Get to ebay now to improve your Fuel performance ;-) .
UPDATE: I have since tested the Fuel's internal UW160 with an ACARD ARS-2160 and a Samsung EVO SSD, and this provides even better results than the SAS3442X. As a final point of comparison I need to see how an IRIX SW RAID'ed (striped) XVM volume performs with striping across SAS drives.
NOTE #1: If you find that your sasflash version does not allow you to erase the ROM, then you should go back to an older version.
NOTE #2: Updated Fuel hinv
NOTE #3: LSI internal SAS connector is: SFF-8484 & external connector is: SFF-8470 (both SAS industry standard interfaces)
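To put numbers on the comparison, the sketch below parses the 4 MB (4194304 byte) row from each of the four diskperf runs above and prints the forward-read throughput relative to the stock spinning disk. The row values are copied from the tables above; the labels, function name and choice of baseline are mine.

```python
COLUMNS = ("fwd_wt", "fwd_rd", "bwd_wt", "bwd_rd", "rnd_wt", "rnd_rd")

def parse_row(line):
    """Split one diskperf result row into (request size, {column: MB/s})."""
    fields = line.split()
    return int(fields[0]), dict(zip(COLUMNS, (float(f) for f in fields[1:7])))

# 4194304-byte rows copied from the four test runs above.
ROWS_4M = {
    "Fuel U160 spinning disk": "4194304 50.30 50.96 49.32 49.69 49.13 47.92",
    "Octane2 ACARD + SSD": "4194304 38.27 30.04 38.28 29.83 38.28 29.89",
    "Fuel SAS3442X + SSD": "4194304 179.81 268.91 135.53 267.42 57.62 267.30",
    "Fuel U160 ACARD + SSD": "4194304 254.82 270.33 255.29 269.36 255.19 269.29",
}

def relative_to(baseline, column="fwd_rd"):
    """Return {label: throughput / baseline throughput} for one column."""
    base = parse_row(ROWS_4M[baseline])[1][column]
    return {label: parse_row(row)[1][column] / base for label, row in ROWS_4M.items()}

for label, ratio in relative_to("Fuel U160 spinning disk").items():
    print(f"{label:26s} {ratio:5.2f}x")
```

Changing the column argument (for example to "rnd_wt") shows where the SAS3442X and ACARD results diverge.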
Some Testing with XVM and LSI SAS3442X SAS
Given the good performance results achieved with the LSI SAS3442X, I was curious to see what sort of performance could be achieved using IRIX's XVM (XFS Volume Manager), which allows you to set up striped volumes (RAID 0) for performance and mirrored volumes (RAID 1) for data security.
My initial tests were to set up mirrored drives across 2 x SSD:
$ diskperf -W -D -c4g -n "fuel/pink LSISAS3 EVO 2xSSD Mirror" testfile
#---------------------------------------------------------
# Disk Performance Test Results Generated By Diskperf V1.2
#
# Test name : fuel/pink LSISAS3 EVO 2xSSD Mirror
# Test date : Tue Jun 8 10:59:49 2021
# Test machine : IRIX64 pink 6.5 07202013 IP35
# Test type : XFS data subvolume
# Test path : testfile
# Request sizes : min=16384 max=4194304
# Parameters : direct=1 time=10 scale=1.000 delay=0.000
# XFS file size : 4294967296 bytes
#---------------------------------------------------------
# req_size fwd_wt fwd_rd bwd_wt bwd_rd rnd_wt rnd_rd
# (bytes) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s) (MB/s)
#---------------------------------------------------------
16384 0.03 72.26 0.03 58.82 0.03 59.81
32768 0.06 111.45 0.06 91.18 0.07 87.57
65536 0.13 142.93 0.13 126.77 0.11 123.36
131072 0.25 166.89 0.24 157.57 0.26 151.88
262144 0.52 184.49 0.50 172.30 0.49 173.67
524288 1.00 189.69 1.09 186.79 1.07 191.92
1048576 1.36 195.16 1.75 196.48 1.64 199.43
2097152 3.24 203.30 3.09 219.54 3.20 195.22
4194304 6.08 200.12 5.67 197.19 5.86 200.26
Whooahh ... the impact of mirroring on write performance is huge, to the point of being unacceptable: writes top out at about 6 MB/s, while reads still reach around 200 MB/s.
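To put a number on how lopsided this is, here is a minimal Python sketch that parses the data rows from the run above and prints the forward read-to-write ratio for each request size. The function names are my own and only the column order from the diskperf header is assumed.

```python
# Sketch: summarise the diskperf rows from the 2 x SSD mirror run above.
# Columns follow the diskperf header (all rates in MB/s).
COLUMNS = ("req_size", "fwd_wt", "fwd_rd", "bwd_wt", "bwd_rd", "rnd_wt", "rnd_rd")

MIRROR_RUN = """\
16384 0.03 72.26 0.03 58.82 0.03 59.81
32768 0.06 111.45 0.06 91.18 0.07 87.57
65536 0.13 142.93 0.13 126.77 0.11 123.36
131072 0.25 166.89 0.24 157.57 0.26 151.88
262144 0.52 184.49 0.50 172.30 0.49 173.67
524288 1.00 189.69 1.09 186.79 1.07 191.92
1048576 1.36 195.16 1.75 196.48 1.64 199.43
2097152 3.24 203.30 3.09 219.54 3.20 195.22
4194304 6.08 200.12 5.67 197.19 5.86 200.26
"""


def parse_diskperf(text):
    """Return one dict per data row, skipping '#' header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or fields[0].startswith("#"):
            continue
        row = {"req_size": int(fields[0])}
        row.update(zip(COLUMNS[1:], map(float, fields[1:])))
        rows.append(row)
    return rows


def read_to_write_ratio(row, pattern="fwd"):
    """How many times faster reads are than writes for one access pattern."""
    return row[pattern + "_rd"] / row[pattern + "_wt"]


rows = parse_diskperf(MIRROR_RUN)
for row in rows:
    ratio = read_to_write_ratio(row)
    print(f"{row['req_size']:>8} bytes: forward reads ~{ratio:.0f}x forward writes")
```

Passing "bwd" or "rnd" as the pattern gives the same comparison for the backward and random columns.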
Putting SATA SSD into O350 with LSI SAS3442X
As shown above, the LSI SAS3442X provides a way to put much faster modern SATA SSDs into a Fuel. For O350 servers you need to do a bit of cable juggling, but you can get SATA SSDs installed and accessible via the same front-facing disk bay used for 3.5 inch disks.
While the disks are not hot swappable, this option does allow you to easily add 1 to 4 disks, depending on whether you still want to include one of the hard drives in the bay. This option also allows you to keep your DVD-ROM attached.
So if you have only a single compute module you can have a boot disk, 1 or 2 SATA SSDs and the DVD-ROM.
To provide this I used:
- LSI SAS3442X-R SAS/SATA cable that comes with the card
- A single Molex -> 2 x power out extension cable
- A single Molex -> 2 x SATA power cable
Using the Molex extension cable I was able to get power to the SCSI backplane / DVD-ROM drive and have the extension feed into the drive bay and the SATA power cables. The SATA lines are then just fed through the same way.
While 2 SATA lines are easy, for 4 lines you will have to sacrifice the SCSI disk, as there is not sufficient room for the drives or cables.
Here are pictures of the way to do this:
Once you have extensions plugged together:
- run the SAS lines (flat blue cabling in picture) and SAS power cables from one part of the extension through the fan gap and into the drive bay
- then push the Molex plug into the same fan gap and bring out the other Molex extension to plug into the existing SCSI backplane power
- be careful to keep the existing DVD-ROM mini power cable out of the fan gap, so you can plug it back into the DVD-ROM:
The result is two SAS/SATA lanes being available in the existing drive bay:
O350 (Onyx 350 / Origin 350) Fans
The O350 machines have a number of fans which are reported on as part of the environmental monitoring. Which fans are installed can differ based on the chassis configuration.
The main fan group is powered from the IP53 interface board, which has 4 fan pinouts (on this one, H2J5 has a broken retention clip):
The fan plugin points and what they report as via L1 env are:
H2H6 (EXHST 1) - Four pin for back facing right hand (looking from back) exhaust fan (always present)
H2J5 (EXHST 2) - Four pin for back facing left hand (looking from back) exhaust fan (optional , not present if there is V12 or DM3 installed)
H9J4 (PS) - Three pin for front facing power module fan; the power and monitoring cable runs along the inner side of the chassis in the narrow gap between the edge and the power module enclosure (always present)
H9H6 (ODY) - Three pin for optional V12 / DM3 carrier fan (see picture below). This fan is used instead of the H2J5 (EXHST 2) exhaust fan when a V12 / DM3 is installed.
In addition to the main chassis fans, the O350 also has a fan within the PCI / disk chassis section. This is reported as "PCI 1" & "PCI 2" because, while it is a single unit, it contains two separate fans, each with its own cable, which are connected to the riser board:
And finally the IP59_4CPU 4 x 1GHz Processor Module includes an additional three fans (NODE ZERO - N0 LEFT / N0 CNTR / N0 RIGHT) that are part of the board and need to be removed to get access to installation screws:
Here are the different L1 "env" reports based on alternate chassis configurations:
M200XXXX-001-L2>env
001c01: <<===== O350 with V12
Environmental monitoring is enabled and running.
...
...
Description State Warning RPM Current RPM
--------------- ---------- ----------- -----------
FAN 0 EXHST 1 Enabled 1980 2393
FAN 1 PS Enabled 3200 4821
FAN 2 PCI 1 Enabled 1980 2428
FAN 3 PCI 2 Enabled 1980 2616
FAN 4 ODY Enabled 1679 1985
...
...
001c02: <<===== O350 with DM3
Environmental monitoring is enabled and running.
...
...
Description State Warning RPM Current RPM
--------------- ---------- ----------- -----------
FAN 0 EXHST 1 Enabled 1980 2280
FAN 1 PS Enabled 3200 4963
FAN 2 PCI 1 Enabled 1980 2343
FAN 3 PCI 2 Enabled 1980 2556
FAN 4 ODY Enabled 1679 1814
...
...
001c03: <<===== O350 with IP59_4CPU 4x1GHZ Module
Environmental monitoring is enabled and running.
...
...
Description State Warning RPM Current RPM
--------------- ---------- ----------- -----------
FAN 0 EXHST 1 Enabled 1980 2616
FAN 1 EXHST 2 Enabled 1980 2616
FAN 2 PS Enabled 3200 4066
FAN 3 PCI 1 Enabled 1980 2556
FAN 4 PCI 2 Enabled 1980 2812
FAN 5 N0 LEFT Enabled 1980 4054
FAN 6 N0 CNTR Enabled 1980 3846
FAN 7 N0 RIGHT Enabled 1980 4166
...
...
001r06: <<===== Not an O350 this is Numalink Router
Environmental monitoring is enabled and running.
...
...
Description State Warning RPM Current RPM
--------------- ---------- ----------- -----------
FAN 0 LEFT Enabled 2160 5443
FAN 1 RIGHT Enabled 2160 5532
...
...
NOTE: The "FAN #" index reported varies depending on the configuration, but the fan name (e.g. "EXHST 1", "PS") is consistent.
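Because the index moves around, any script that watches fan speeds is better off keying on the fan name. Here is a small Python sketch of that idea, fed with two of the fan tables from the reports above. The regular expression only recognises the two states that appear in this post ("Enabled" and "Wait Pwr"), which is an assumption about the L1 output, and the helper names are my own.

```python
import re

# Two fan tables copied from the L1 "env" reports above.
V12_REPORT = """\
FAN 0 EXHST 1 Enabled 1980 2393
FAN 1 PS Enabled 3200 4821
FAN 2 PCI 1 Enabled 1980 2428
FAN 3 PCI 2 Enabled 1980 2616
FAN 4 ODY Enabled 1679 1985
"""

IP59_REPORT = """\
FAN 0 EXHST 1 Enabled 1980 2616
FAN 1 EXHST 2 Enabled 1980 2616
FAN 2 PS Enabled 3200 4066
FAN 3 PCI 1 Enabled 1980 2556
FAN 4 PCI 2 Enabled 1980 2812
FAN 5 N0 LEFT Enabled 1980 4054
FAN 6 N0 CNTR Enabled 1980 3846
FAN 7 N0 RIGHT Enabled 1980 4166
"""

# Assumption: only the states seen in this post are handled.
FAN_LINE = re.compile(
    r"^FAN\s+(?P<index>\d+)\s+(?P<name>.+?)\s+(?P<state>Enabled|Wait Pwr)"
    r"\s+(?P<warning>\d+)\s+(?P<current>\d+)\s*$"
)


def parse_fans(text):
    """Map fan name -> index, state, warning RPM and current RPM."""
    fans = {}
    for line in text.splitlines():
        match = FAN_LINE.match(line.strip())
        if match:
            fans[match["name"]] = {
                "index": int(match["index"]),
                "state": match["state"],
                "warning_rpm": int(match["warning"]),
                "current_rpm": int(match["current"]),
            }
    return fans


def below_warning(fans):
    """Names of enabled fans spinning slower than their warning RPM."""
    return sorted(
        name for name, fan in fans.items()
        if fan["state"] == "Enabled" and fan["current_rpm"] < fan["warning_rpm"]
    )


v12 = parse_fans(V12_REPORT)
ip59 = parse_fans(IP59_REPORT)
print("PS fan index:", v12["PS"]["index"], "(V12) vs", ip59["PS"]["index"], "(IP59)")
print("Fans below warning RPM:", below_warning(v12) + below_warning(ip59))
```

The power supply fan shows up as FAN 1 in the V12 chassis and FAN 2 in the IP59 chassis, so looking it up as "PS" is the stable choice.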
Upgrading Fuel CPU PIMM requires IRIX PROM flash
To my knowledge all SGI machines support some level of CPU upgrade/swap. This is typically just an exercise in physically swapping the CPU module and then rebooting. The system then does hardware-based detection and picks up the new CPU speed.
The Fuel is an exception to this: if you put in a new physical CPU it will not automatically boot up at the faster speed. Rather, it relies on software-based settings that are managed via the PROM "flash" program to set the revised CPU parameters.
The revised parameters are set using the "flash -f" option:
# man flash
flash(1M) flash(1M)
NAME
flash - reprogram the flash PROM hardware on Origin and OCTANE machines
SYNOPSIS
flash [ -a ] [ -c ] [ -d ] [ -D ] [ -f ] [ -F ] [ -i ] [ -m module_id ]
[ -n ] [ -o ] [ -p dir_name ] [ -P img_name ] [ -s slot_name ] [ -S ]
[ -v ] [ -V ]
The SGI Origin 3000 server series.
flash [ -a ] [ -d ] [ -D ] [ -f ] [ -F ] [ -b brick_id ]
[ -o ] [ -p dir_name ] [ -P img_name ] [ -S ]
[ -v ] [ -V ]
flash -L
DESCRIPTION
flash allows a user to manage the flash PROMs on the IO and CPU boards of
Origin systems, the base system board on OCTANE systems and CPU boards on
the SGI Origin 3000 server series. Without options, the command flashes
all appropriate boards on the machine with the PROM images found in
/usr/cpu/firmware. Normally, flash is executed automatically during the
installation of a new release of IRIX. A customer should rarely need to
use it directly. You must have superuser privilege to use this command.
...
...
...
-f Specify different (than currently in PROM) configurations
values to be used when the new images are flashed. These
values include the speed of the CPU, hub, and size of the
cache. This option should be used with great care as cause
the machine to freeze and be rendered unusable if incorrect
values are given.
-F Similar to -f except more detailed information is required
and no checking is done in the input values. This is more
dangerous the -f option and the same cautions apply.
...
...
...
-o Override the version checking and flash the PROM even if it
is not newer than what is currently on the PROM.
Here is an example session:
---
--- Test 1
---
# flash -f
No proms need flashing
# flash -f -v
setting default path_name to /usr/cpu/firmware
No proms need flashing
# flash -f -V
Prom version 6.211
No proms need flashing
# flash -F -V
Prom version 6.211
No proms need flashing
---
--- As I have not actually installed an upgraded physical CPU,
--- I need to do a manual override with the -o flag
---
--- Test 2
---
# flash -o -f -V
Enter CPU frequency (MHZ): [400] 600
Enter Hub frequency (MHZ): [200] 200
Enter cache size (in MBs): [4] 4
Enter machine type (0)SN1 (1)SN10 (2)SN11 (3)SN12:
If flash is killed in the middle of execution, the machine
will freeze after it is reset. continuing...
Invalid input... Try again.
Enter machine type (0)SN1 (1)SN10 (2)SN11 (3)SN12: SN11
Invalid input... Try again.
Enter machine type (0)SN1 (1)SN10 (2)SN11 (3)SN12: 2
Info for prom at /hw/module/001c01/node/prom
Prom version 6.211
#
---
--- NOTE: you can get the required system information via hinv
--- for CPU speed, BUS speed and Cache size
--- For SNx machine type, this is reported at boot.
--- I do not know if the SNx varies across Fuel machine versions
--- so you should check specifically for your machine
---
# cd /var/adm
# grep SN SYSLOG
Oct 4 02:33:10 6A:pink unix: Selecting SN11 <<== My Fuel
Oct 4 03:38:37 6A:pink unix: Selecting SN11
Oct 4 03:43:25 6A:pink unix: Selecting SN11
...
Feb 5 16:40:45 6A:pink unix: Selecting SN11
Feb 6 11:35:36 6A:pink unix: Selecting SN11
So how do you know what the appropriate CPU and bus speeds and cache size are?
These can be found by looking at existing machine hinv -mv results. Here is a set covering all of the 500 to 900 MHz Fuels (collected from historical Nekochan hinv threads and my own Fuel):
--- 500 (SGI PN 030-1708-002 / 030-1708-003)
CPU 0 at Module 001c01/Slot 0/Slice A: 500 Mhz MIPS R14000 Processor Chip (enabled)
Processor revision: 2.3. Scache: Size 2 MB Speed 250 Mhz Tap 0xa
HUB in Module 001c01/Slot 0: Revision 2 Speed 200.00 Mhz (enabled)
--- 600 (SGI PN 030-1836-001 / 030-1730-002 / 030-1730-001)
CPU 0 at Module 001c01/Slot 0/Slice A: 600 Mhz MIPS R14000 Processor Chip (enabled)
Processor revision: 2.4. Scache: Size 4 MB Speed 300 Mhz Tap 0xa
HUB in Module 001c01/Slot 0: Revision 2 Speed 200.00 Mhz (enabled)
--- 700 (SGI PN 030-1891-001)
CPU 0 at Module 001c01/Slot 0/Slice A: 700 Mhz MIPS R16000 Processor Chip (enabled)
Processor revision: 2.2. Scache: Size 4 MB Speed 350 Mhz Tap 0xc
HUB in Module 001c01/Slot 0: Revision 2 Speed 200.00 Mhz (enabled)
--- 800 (SGI PN 030-2024-001 / 030-1932-001)
CPU 0 at Module 001c01/Slot 0/Slice A: 800 Mhz MIPS R16000 Processor Chip (enabled)
Processor revision: 2.2. Scache: Size 4 MB Speed 400 Mhz Tap 0xa
HUB in Module 001c01/Slot 0: Revision 2 Speed 200.00 Mhz (enabled)
--- 900 (SGI PN 030-2023-001)
CPU 0 at Module 001c01/Slot 0/Slice A: 900 Mhz MIPS R16000 Processor Chip (enabled)
Processor revision: 3.0. Scache: Size 8 MB Speed 450 Mhz Tap 0xb
HUB in Module 001c01/Slot 0: Revision 2 Speed 200.00 Mhz (enabled)
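If you want to pull the three numeric answers for the "flash -f" prompts (CPU frequency, hub frequency, cache size) out of an hinv listing automatically, something like the following Python sketch would do it. It assumes the hinv -mv line layout shown above, and the function name is just an illustration.

```python
import re

# The 800 MHz listing from above, used as sample input.
HINV_800 = """\
CPU 0 at Module 001c01/Slot 0/Slice A: 800 Mhz MIPS R16000 Processor Chip (enabled)
Processor revision: 2.2. Scache: Size 4 MB Speed 400 Mhz Tap 0xa
HUB in Module 001c01/Slot 0: Revision 2 Speed 200.00 Mhz (enabled)
"""


def flash_values_from_hinv(text):
    """Extract the CPU MHz, hub MHz and cache MB that 'flash -f' asks for."""
    cpu = re.search(r"(\d+) Mhz MIPS R\d+ Processor Chip", text)
    cache = re.search(r"Scache: Size (\d+) MB", text)
    hub = re.search(r"HUB in Module .*? Speed ([\d.]+) Mhz", text)
    if not (cpu and cache and hub):
        raise ValueError("could not find CPU, Scache and HUB lines in hinv text")
    return {
        "cpu_mhz": int(cpu.group(1)),
        "hub_mhz": int(float(hub.group(1))),  # hinv prints e.g. 200.00
        "cache_mb": int(cache.group(1)),
    }


print(flash_values_from_hinv(HINV_800))
```

The secondary cache speed is not one of the flash prompts, so the sketch ignores it.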
As per above, to get the SNx info you should grep "/var/adm/SYSLOG" for "SN" and it will print out what this is for your Fuel.
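As the test session showed, the flash prompt wants the menu number and rejects the name ("SN11" was refused, "2" was accepted). Here is a hedged Python sketch that takes SYSLOG lines and returns both the name and the number to type. The mapping is copied from the prompt text in the session above and the function name is my own.

```python
import re

# Menu order as printed by the flash prompt: (0)SN1 (1)SN10 (2)SN11 (3)SN12
FLASH_MACHINE_TYPES = {"SN1": 0, "SN10": 1, "SN11": 2, "SN12": 3}

SYSLOG_SAMPLE = [
    "Oct 4 02:33:10 6A:pink unix: Selecting SN11",
    "Oct 4 03:38:37 6A:pink unix: Selecting SN11",
    "Feb 6 11:35:36 6A:pink unix: Selecting SN11",
]


def machine_type_from_syslog(lines):
    """Return (name, menu number) from the most recent 'Selecting SNx' line."""
    found = None
    for line in lines:
        match = re.search(r"unix: Selecting (SN\d+)", line)
        if match:
            found = match.group(1)  # later lines override earlier ones
    if found is None:
        raise ValueError("no 'Selecting SNx' line found")
    return found, FLASH_MACHINE_TYPES[found]


name, choice = machine_type_from_syslog(SYSLOG_SAMPLE)
print(f"Machine type {name}: enter {choice} at the flash prompt")
```

On a live system you could feed it the file directly, for example with open("/var/adm/SYSLOG") as the lines argument.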
Here is the log of a real update of my machine. First swap out the slower CPU module and put in the faster one, then run the flash command. I swapped a 600 MHz PIMM for an 800 MHz PIMM, which I am happy to report worked as it should. As a precaution I first ran hinv -mv on the machine and checked the PIMM module against the table above to verify that the PIMM was indeed an 800 MHz one:
# flash -o -f -v
setting default path_name to /usr/cpu/firmware
Enter CPU frequency (MHZ): [400] 800
Enter Hub frequency (MHZ): [200] 200
Enter cache size (in MBs): [4] 4
Enter machine type (0)SN1 (1)SN10 (2)SN11 (3)SN12: 2
m001c01: freq cpu 800000000 freq hub 200000000 mode 549baf85
m001c01: Flashed this prom 14 times
m001c01: Flashing prom data in file /usr/cpu/firmware/ip35prom.img
m001c01: to device /hw/module/001c01/node/prom
m001c01: > Manufacturer code: 0x00
m001c01: > Device code : 0x00
m001c01: > Erasing code sectors (30 to 40 seconds)
m001c01: > Erasure complete and verified
m001c01: PROM Header contains:
m001c01: Magic: 0x4a464b535743534d
m001c01: Version: 6.211
m001c01: Length: 0x169648
m001c01: Segments: 1
m001c01: Segment 0:
m001c01: Name: ip35prom
m001c01: Flags: 0x10
m001c01: Offset: 0x1000
m001c01: Entry: 0xc00000001fc00000
m001c01: Ld Addr: 0xc00000001fc00000
m001c01: True Length: 0x168648
m001c01: True sum: 0x84ae700
m001c01: > Programming Bedrock PROM
m001c01: > Writing 1476168 bytes of data ...
m001c01: > 0/168648 ........
m001c01: > 8000/168648 ........
m001c01: > 10000/168648 ........
m001c01: > 18000/168648 ........
m001c01: > 20000/168648 ........
m001c01: > 28000/168648 ........
m001c01: > 30000/168648 ........
m001c01: > 38000/168648 ........
m001c01: > 40000/168648 ........
m001c01: > 48000/168648 ........
m001c01: > 50000/168648 ........
m001c01: > 58000/168648 ........
m001c01: > 60000/168648 ........
m001c01: > 68000/168648 ........
m001c01: > 70000/168648 ........
m001c01: > 78000/168648 ........
m001c01: > 80000/168648 ........
m001c01: > 88000/168648 ........
m001c01: > 90000/168648 ........
m001c01: > 98000/168648 ........
m001c01: > a0000/168648 ........
m001c01: > a8000/168648 ........
m001c01: > b0000/168648 ........
m001c01: > b8000/168648 ........
m001c01: > c0000/168648 ........
m001c01: > c8000/168648 ........
m001c01: > d0000/168648 ........
m001c01: > d8000/168648 ........
m001c01: > e0000/168648 ........
m001c01: > e8000/168648 ........
m001c01: > f0000/168648 ........
m001c01: > f8000/168648 ........
m001c01: > 100000/168648 ........
m001c01: > 108000/168648 ........
m001c01: > 110000/168648 ........
m001c01: > 118000/168648 ........
m001c01: > 120000/168648 ........
m001c01: > 128000/168648 ........
m001c01: > 130000/168648 ........
m001c01: > 138000/168648 ........
m001c01: > 140000/168648 ........
m001c01: > 148000/168648 ........
m001c01: > 150000/168648 ........
m001c01: > 158000/168648 ........
m001c01: > 160000/168648 ........
m001c01: > 168000/168648 .
m001c01: > 168648/168648
m001c01: > Programmed and verified
Prom version 6.211
m001c01: Verifying:
m001c01: cpu speed 800000000
m001c01: hub speed 200000000
m001c01: other configuration information also verified
m001c01: Compare of file data to in core prom data succeeded
# reboot
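A quick sanity check on the numbers in that log: the progress counter is in hex, and the values are consistent with each other. The Python sketch below just does the arithmetic. Reading "Length" as "Offset" plus "True Length", and the 0x8000 step as a 32 KiB progress chunk, is my interpretation of the log and not something the flash man page states.

```python
# Values copied from the flash log above.
PROM_LENGTH = 0x169648     # "Length" in the PROM header
SEGMENT_OFFSET = 0x1000    # "Offset" of segment 0
TRUE_LENGTH = 0x168648     # "True Length" of segment 0
BYTES_WRITTEN = 1476168    # "Writing 1476168 bytes of data"
STEP = 0x8000              # gap between progress lines (assumed chunk size)
CPU_HZ = 800000000         # "cpu speed" reported at verify time


def progress_offsets(total, step):
    """Offsets at which a dotted progress line is printed."""
    return list(range(0, total, step))


offsets = progress_offsets(TRUE_LENGTH, STEP)

print("True length in decimal:", TRUE_LENGTH)
print("Header length minus offset:", hex(PROM_LENGTH - SEGMENT_OFFSET))
print("Progress lines:", len(offsets), "last at", format(offsets[-1], "x"))
print("CPU speed in MHz:", CPU_HZ // 1_000_000)
```

So the hex "168648" in the progress lines is the same 1476168 bytes announced at the start of the write, and the 46 dotted lines match one line per 0x8000 bytes.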
Finally, what happens if you need to downgrade the CPU speed?
In this case you will need to run "flash -f -o" with the initial (faster) CPU still installed and set the PROM parameters to the speed of the target slower CPU. Once this is done you can shut down your machine and swap the CPU PIMM modules.
If you swap from a faster to a slower CPU PIMM without first doing the PROM update, then your Fuel will not boot. The only known remedy at this point is to put in a CPU of the right speed (or faster) and do the "flash -f -o" update.
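The order of operations is the part that is easy to get wrong, so here it is as a tiny Python sketch. It only encodes the rule described above (upgrade: swap then flash; downgrade: flash then swap). The function name is hypothetical and the same-speed case is my own assumption.

```python
def pimm_swap_plan(installed_mhz, target_mhz):
    """Return the ordered steps for moving a Fuel to a different-speed PIMM."""
    if target_mhz > installed_mhz:
        # Upgrade: the faster CPU runs fine at the old PROM settings,
        # so the swap comes first and the flash second.
        return [
            "shut down and swap in the faster PIMM",
            "boot IRIX (still at the old PROM settings)",
            "flash -o -f with the faster CPU's values",
            "reboot",
        ]
    if target_mhz < installed_mhz:
        # Downgrade: a slower CPU will not boot with faster PROM settings,
        # so the flash must happen while the faster PIMM is still installed.
        return [
            "flash -o -f with the slower CPU's values",
            "shut down",
            "swap in the slower PIMM",
            "boot",
        ]
    # Same speed: assumed to need no PROM change (not tested in this post).
    return ["shut down and swap the PIMM"]


for step in pimm_swap_plan(800, 600):
    print("-", step)
```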
NOTE #1: Fuel CPU replacement discussion thread on "Irix Network". This includes discussion of possible way to change CPU speed parameters via POD/CAC mode, though no one appears to know the exact set of commands. What we do know is that the PROM is held in a flash chip on the Fuel & O350 board and this is not the same as the DALLAS NVRAM component. See picture etc below for "search for PROM chip..."
NOTE #2: Fuel SGI Part Numbers taken from "SGI Depot" Fuel Parts Page
Recovering Damaged Fuel PIMM
The above tip shows how to flash the Fuel PROM to raise or lower the CPU speed to match the installed CPU. The Fuel is the only Chimera-based machine that needs this, and it appears that this is because on the O350 (Origin, Onyx & Tezro) series machines the "Bedrock" ASIC is on the same board as the CPUs. On the Fuel the "Bedrock" ASIC is on the system board and the CPU board is just that: a CPU-only module.
In my case the PIMM connector array got damaged in shipping and many of PINs were squashed:
To fix this I used a pair of very fine tweezers to pull out and straighten the pins. In doing this you need to lift up each squashed pin and then make sure it is straightened so that it stands vertical.
Then sight along the pin array to make sure all the pins line up; if they are nicely aligned, the chance of a successful repair is good.
Good luck to others who need to do this repair and please make sure you protect your PIMM with the right packaging before shipping.
References & Links:
- Irix Network - working to fill the hole left by the demise of Nekochan, populated by many passionate and knowledgeable SGI users
- irix7.com - keeps an archive of lots and lots of original SGI technical documents
- SGI Depot - keeps an archive of various SGI related materials and provides parts. Run by Ian Mapleson, one of the original SGI/IRIX community members and all-round helpful person
- techpub.jurassic.nl - another SGI TechPubs archive; thanks for keeping these high quality documents available (as HTML and PDF), while irix7 above is PDFs
- "Hardware Quick-reference Booklet (Origin and Onyx2 Series) - HMQ-380-C" - this document is for the older Origin / Onyx2, but the POD command documentation (see page 174 for "POD Mode Commands") is still useful for Origin 350 Chimera based systems. These physical / virtual dip switch settings seem to align with what is documented via "man prom".
- HP Server for Flashing - more details on setup to help with flashing disks & Adapters.
- Dallas DS1742W Hacking - my testing on replacement and initialisation of Dallas DS1742W chips
- SGI Fuel L1 Serial Comms - HPE has this old SGI bulletin which clearly states that L1 comms via the internal serial port is 38400 baud and external serial port #1 is 9600 baud.
- Onyx 350 - Racking and Stacking - my rather long blog post on moving all my O350 kit into a new SGI "Hour Glass" rack.
- What is inside that box? - SGI hinv - my hinv reports of mostly Chimera machines
- Chimera "Dip Switch Calculator" - use at your own peril ... as this is only as good as the sketchy documentation (see link above), and also via "man prom". See below for examples of the difference in boot behavior on a Fuel based on the "debug" flag setting.
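The debug value used in the sample session later in this post (0x10d) is a set of dip switch bits, so it can help to see which switch positions are actually on before looking them up in "man prom". This minimal Python sketch only splits a value into bit positions and rebuilds it. It does not try to name what each switch does, and the helper names are my own.

```python
def set_bits(value):
    """Bit positions (0 = least significant) that are set in a switch value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def from_bits(bits):
    """Rebuild a switch value from a list of bit positions."""
    value = 0
    for bit in bits:
        value |= 1 << bit
    return value


bits = set_bits(0x10D)
print("0x10d has switches on at bit positions:", bits)
print("Rebuilt value:", hex(from_bits(bits)))
```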
Sample POD/DEX/CAC Session via Console Serial Port with O350
What is this POD/DEX/CAC stuff ?
- POD - Power-On Diagnostic
- DEX - Dirty EXclusive
- CAC - CAChed
For more info, see the SGI document "Hardware Quick-reference Booklet (Origin and Onyx2 Series) - HMQ-380-C" (page 174, "POD Mode Commands") and the POD/CAC online help. Here is a sample session on a Chimera board via the L1 console serial port:
01c01-L1>help
Commands are:
check fru promver|promversionnode
reset|rst prom try pic
make pwm syscom error
pci * autopower|apwr syscom|junkbus|jb|bedr
partdb cpu nia|ni|ctc nib
iia|ii|cti iib iic iid
config|cfg debug display|dsp button|btn
env fan help|hlp history|hist
l1dbg link log ioport|ioprt
istat l1 leds margin|mgn
network pimm port|prt power|pwr
reset|rst nmi softreset|softrst select|sel
serial sysstate eeprom uart
usb router|rtr service date
nvram security flash reboot_l1
version|ver pbay test|tst scan
fru|pci|prom|node
enter 'hlp <cmd>' for more help on a single command.
001c01-L1>cpu
CPU Present Enabled
--- ------- -------
0A 1 1
0B 1 1
0C 1 1
0D 1 1
001c01-L1>env
Environmental monitoring is enabled and running.
Description State Warning Limits Fault Limits Current
-------------- ---------- ----------------- ----------------- -------
1.8V Wait Pwr 10% 1.62/ 1.98 20% 1.44/ 2.16 0.000
12V Wait Pwr 10% 10.80/ 13.20 20% 9.60/ 14.40 0.125
12V #2 Wait Pwr 10% 10.80/ 13.20 20% 9.60/ 14.40 0.125
3.3V Wait Pwr 10% 2.97/ 3.63 20% 2.64/ 3.96 0.069
12V IO Wait Pwr 10% 10.80/ 13.20 20% 9.60/ 14.40 0.125
5V AUX Wait Pwr 10% 4.50/ 5.50 20% 4.00/ 6.00 5.096
3.3V AUX Wait Pwr 10% 2.97/ 3.63 20% 2.64/ 3.96 3.302
PCI 5V AUX Wait Pwr 10% 4.50/ 5.50 20% 4.00/ 6.00 5.070
PCI 3.3V Wait Pwr 10% 2.97/ 3.63 20% 2.64/ 3.96 0.069
PCI 2.5V Wait Pwr 10% 2.25/ 2.75 20% 2.00/ 3.00 0.000
PCI 5V Wait Pwr 10% 4.50/ 5.50 20% 4.00/ 6.00 0.000
XIO 12V BIAS Wait Pwr 10% 10.80/ 13.20 20% 9.60/ 14.40 0.125
XIO 5V Wait Pwr 10% 4.50/ 5.50 20% 4.00/ 6.00 0.000
XIO 2.5V Wait Pwr 10% 2.25/ 2.75 20% 2.00/ 3.00 0.000
XIO 3.3V AUX Wait Pwr 10% 2.97/ 3.63 20% 2.64/ 3.96 3.302
IP53 3.3V AUX Wait Pwr 10% 2.97/ 3.63 20% 2.64/ 3.96 3.302
IP53 5V AUX Wait Pwr 10% 4.50/ 5.50 20% 4.00/ 6.00 5.070
IP53 12V Wait Pwr 10% 10.80/ 13.20 20% 9.60/ 14.40 0.125
IP53 VCPU Wait Pwr 10% 1.13/ 1.38 20% 1.00/ 1.50 0.000
IP53 SRAM Wait Pwr 10% 2.25/ 2.75 20% 2.00/ 3.00 0.000
IP53 1.5V Wait Pwr 10% 1.35/ 1.65 20% 1.20/ 1.80 0.000
Description State Warning RPM Current RPM
--------------- ---------- ----------- -----------
FAN 0 EXHST 1 Wait Pwr 1980 0
FAN 1 PS Wait Pwr 3200 0
FAN 2 PCI 1 Wait Pwr 1980 0
FAN 3 PCI 2 Wait Pwr 1980 0
FAN 4 ODY Wait Pwr 1679 0
Advisory Critical Fault Current
Description State Temp Temp Temp Temp
----------------- ---------- --------- --------- --------- ---------
0 INTERFACE 0 Wait Pwr [Autofan Control] 75C/167F 18C/ 64F
1 INTERFACE 1 Wait Pwr [Autofan Control] 75C/167F 19C/ 66F
2 INTERFACE 2 Wait Pwr [Autofan Control] 75C/167F 17C/ 62F
3 PCI RISER Wait Pwr [Autofan Control] 75C/167F 17C/ 62F
4 ODYSSEY Wait Pwr [Autofan Control] 75C/167F 17C/ 62F
5 NODE Wait Pwr [Autofan Control] 75C/167F 17C/ 62F
6 BEDROCK Wait Pwr Not currently available
Zone Temp Target Current Zone Fan Curr/Min
Zone Name State Sensors Average Average Index Fan %
--------- -------- ------------ -------- -------- --------- ---------
NODE Wait Pwr 0,1,2,5,6 47C/116F 17C/ 62F 0 18%/ 18%
PS Wait Pwr 0,1,2,5,6 47C/116F 17C/ 62F 1 55%/ 55%
PCI Wait Pwr 3 45C/113F 17C/ 62F 2,3 55%/ 55%
ODY Wait Pwr 4 48C/118F 17C/ 62F 4 55%/ 55%
...
... Set the debug flags and boot up to POD mode...
...
001c01-L1>debug 0x10d
debug switches set to 0x010d
001c01-L1>power up
001c01-L1>
entering console mode 001c01 CPU0, <CTRL_T> to escape to L1
Starting PROM Boot process
hubii_link_good: 8-brick attached to module 001c01.
HUB at 0x0 attached as widget 0xb
001c01/0xb/xbow_arb: nasid= 0x0 xbow_base= 0x9200000000000000
001c01/0xb/xbow_arb: 622 master is 0xb
Check_master: link 11 is master
hubii_link_good: 8-brick attached to module 001c01.
Check_master: link 11 is master
IP35 PROM SGI Version 6.210 built 02:33:51 PM Aug 26, 2004
built for bedrock rev. 1.1 or greater
SN12 Graphics Blade.
Local master CPU A revision: f42
Local slave CPU B revision: f42
Local slave CPU D revision: f42
Local slave CPU C revision: f42
PROM length: 0x1686a8, BSS length: 0xa7a0, flash count: 2
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x1 0x3fc03ff6403
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xc00000001fc02cb0 (0xc00000001fc02cb0)
Configuring memory
Local memory configured: 8192 MB (premium)
*** Warning: System controller debug switches are non-zero (0x10d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
before reading NICHub NIC: 0x62c0e690
SR1 set to 0x6000081690349000
SR0 set to 0x0000000062c0e690
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... Copy PROM (0x90000000188
Done
DONE
Skipping secondary cache diags
Skipping secondary cache diags
Skipping secondary cache diags
Skipping secondary cache diags
CPU B switching stack into UALIAS and invalidating D-cache
CPU A switching stack into UALIAS and invalidating D-cache
CPU C switching stack into UALIAS and invalidating D-cache
CPU D switching stack into UALIAS and invalidating D-cache
CPU B switching into node 0 cached RAM
CPU C switching into node 0 cached RAM
CPU B running cached
CPU C running cached
CPU A switching into node 0 cached RAM
CPU D switching into node 0 cached RAM
CPU A running cached
CPU D running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
Discovering local IO ...................... Check_master: link 11 ir
Check_master: link 11 is master
DONE
CPU A initialized subnode
CPU C initialized subnode
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5893 usec
Waiting for peers to complete discovery.... Discovery results:
ENTRY 0: HUB(62c0e690)
NASID=-1 Mod=1 Flg=0x9500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x62c0e690, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315)Global barrier passed.
Global barrier (line 4348)Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
Local Slave : Waiting for my NASID ...
ENTRY 0: HUB(62c0e690)
NASID=-1 Mod=1 Flg=0x9500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Local Slave : Waiting for my NASID ...
Local Slave : Waiting for my NASID ...
Port 1 connection: Not connected
Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(62c0e690)
NASID=0 Mod=1 Flg=0x9500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU B switching to UALIAS
CPU D switching to UALIAS
CPU C switching to UALIAS
CPU A switching to UALIAS
CPU D running in UALIAS
CPU A running in UALIAS
CPU B running in UALIAS
CPU C running in UALIAS
CPU D Flushing and invalidating caches
CPU C Flushing and invalidating caches
CPU B Flushing and invalidating caches
Changing node ID to 0
Global barrier (line 4823)Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928)Global barrier passed.
CPU B switching to node 0 cached RAM
CPU D switching to node 0 cached RAM
CPU B running cached
CPU D running cached
CPU A switching to node 0 cached RAM
CPU C switching to node 0 cached RAM
CPU A running cached
CPU C running cached
Nasids in partition: +0
Regions in partition: +0
Intializing any CPUless nodes.............. Global barrier (line Gl.
Global barrier (line 7715)Global barrier passed.
DONE
Global barrier (line 5089)Global barrier passed.
hubii_link_good: 8-brick attached to module 001c01.
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(62c0e690)
NASID=0 Mod=1 Flg=0x9500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: FE
Erecting partition fences ................ DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU B flushing cache
CPU D flushing cache
CPU A flushing cache
CPU C flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300)Global barrier passed.
Nasids in partition: +0
Regions in partition: Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
+0
A 000 001c01:
A 000 001c01: *** Entering POD mode on node 0
...
... Get POD Command Help
...
A 000 001c01: POD SysCt Cac> ?
A 000 001c01: Commands may be separated by semicolons, grouped with
A 000 001c01: curly braces, and used in nested loop constructs.
A 000 001c01:
A 000 001c01: Calculator
A 000 001c01: Print hex: px EXPR
A 000 001c01: Print decimal: pd EXPR
A 000 001c01: Print octal: po EXPR
A 000 001c01: Print binary: pb EXPR
A 000 001c01: Look up PROM addr: nm ADDR
A 000 001c01: Hardware Registers
A 000 001c01: Print register(s): pr [GPRNAME [VAL]
A 000 001c01: Print fpreg(s): pf [REGNO]
A 000 001c01: Store register: sr REG VAL
A 000 001c01: Store fpreg: sf REGNO VAL
A 000 001c01: Memory Access
A 000 001c01: Print address: pa ADDR [BITNO]
A 000 001c01: Load byte: lb ADDR [COUNT]
A 000 001c01: Load half-word: lh ADDR [COUNT]
A 000 001c01: Load word: lw ADDR [COUNT]
A 000 001c01: Load double-word: ld ADDR [COUNT]
A 000 001c01: Load ASCII: la ADDR [COUNT]
A 000 001c01: Store byte: sb ADDR [VAL [COUNT]]
A 000 001c01: Store half-word: sh ADDR [VAL [COUNT]]
A 000 001c01: Store word: sw ADDR [VAL [COUNT]]
A 000 001c01: Store double-word: sd ADDR [VAL [COUNT]]
A 000 001c01: Store and verify: sdv ADDR VAL
A 000 001c01: Memory Operations
A 000 001c01: Fill mem w/ byte: memset DST BYTE LEN
A 000 001c01: Copy memory bytes: memcpy DST SRC LEN
A 000 001c01: Cmp memory bytes: memcmp DST SRC LEN
A 000 001c01: Add memory bytes: memsum SRC LEN
A 000 001c01: Memory Testing
A 000 001c01: Mem. sanity test: santest ADDR
A 000 001c01: Dir/prot init: dirinit START LEN
A 000 001c01: Memory clear: meminit START LEN
A 000 001c01: Dir. test/init: dirtest START LEN
A 000 001c01: Memory test/init: memtest START LEN
A 000 001c01: Clear errors: clear
A 000 001c01: Display errors: error
A 000 001c01: Quality mode: qual [1|0]
A 000 001c01: ECC mode: ecc [1|0]
A 000 001c01: Set R10k int mask: im [BYTE]
A 000 001c01: Test error limit: maxerr COUNT
A 000 001c01: Scan dir states: scandir ADDR [LEN]
A 000 001c01: Directory state: dirstate [BASE [LEN [STATE]]]
A 000 001c01: Network and Vectors
A 000 001c01: Vector read: vr VEC VADDR
A 000 001c01: Vector write: vw VEC VADDR VAL
A 000 001c01: Vector exchange: vx VEC VADDR VAL
A 000 001c01: Discover network: disc
A 000 001c01: Dump pcfg struct: pcfg [n:NODE] [v]
A 000 001c01: Get/set node ID: node [[VEC] ID]
A 000 001c01: Set up route: route [VEC NODE]
A 000 001c01: Read router NIC: rnic [VEC]
A 000 001c01: Dump config info.: cfg [n:NODE]
A 000 001c01: Dump route table: rtab [VEC]
A 000 001c01: Dmp/clr rtr stat: rstat VEC
A 000 001c01: Control Structures
A 000 001c01: Reset the system: reset [all]
A 000 001c01: Softreset a node: softreset n:NODE
A 000 001c01: Call subroutine: call ADDR [A0 [A1 [...]]]
A 000 001c01: Inv. cache & jump: jump ADDR [A0 [A1]]
A 000 001c01: Goto slave mode: slave
A 000 001c01: Repeat count: repeat COUNT CMD
A 000 001c01: Repeat forever: loop CMD
A 000 001c01: While loop: while (EXPR) CMD
A 000 001c01: For loop: for (CMD;EXPR;CMD) CMD
A 000 001c01: If statement: if (EXPR) CMD
A 000 001c01: Delay: delay MICROSEC
A 000 001c01: Sleep: sleep SEC
A 000 001c01: Benchmark timing: time CMD
A 000 001c01: Echo string: echo "STRING"
A 000 001c01: Miscellaneous
A 000 001c01: Show PROM version: version
A 000 001c01: Display help: help [CMDNAME]
A 000 001c01: Read hub NIC: nic [n:NODE]
A 000 001c01: Prgm remote PROM: flash NODE [...]
A 000 001c01: Prgm remote PROM with values: fflash NODE [...]
A 000 001c01: Prgm modebits with values: setmodebits NODE [...]
A 000 001c01: TLB and Cache
A 000 001c01: Clear TLB: tlbc [INDEX]
A 000 001c01: Read TLB: tlbr [INDEX]
A 000 001c01: Inv. cache(s): inval [i][d][s]
A 000 001c01: Flush+inv caches: flush
A 000 001c01: Dump dcache tag: dtag line
A 000 001c01: Dump icache tag: dtag line
A 000 001c01: Dump scache tag: stag line
A 000 001c01: Dump dcache line: dline line
A 000 001c01: Dump icache line: iline line
A 000 001c01: Dump scache line: sline line
A 000 001c01: Dump dcache tag: adtag line
A 000 001c01: Dump icache tag: aitag line
A 000 001c01: Dump scache tag: astag line
A 000 001c01: Dump dcache line: adline line
A 000 001c01: Dump icache line: ailine line
A 000 001c01: Dump scache line: asline line
A 000 001c01: Store a scache dword: sscache line taglo taghi
A 000 001c01: Store a scache tag: sstag line taglo taghi [way]
A 000 001c01: Set memory mode: go dex|unc|cac
A 000 001c01: Hub_send_data_err: hubsde
A 000 001c01: Rtr_send_data_err: rtrsde
A 000 001c01: Check local link: chklink
A 000 001c01: Self-test hub: bist le|ae|lr|ar [n:NODE]
A 000 001c01: Self-test router: rbist le|ae|lr|ar VEC
A 000 001c01: Self-test memory: mbist ADDR
A 000 001c01: Disable CPU/MEM: disable n:NODE [SLICE/BANKS]
A 000 001c01: Enable CPU/MEM: enable n:NODE [SLICE/BANKS]
A 000 001c01: Temp. disable: tdisable n:NODE [SLICE]
A 000 001c01: Cache tests
A 000 001c01: Instruction Cache test: icachetest
A 000 001c01: Primary Cache test: dcachetest
A 000 001c01: Secondary Cache test: scachetest
A 000 001c01: compute CPU frequency
A 000 001c01: Generate SAMSUNG WAR
A 000 001c01: I/O PROM
A 000 001c01: List segments: segs [FLAG]
A 000 001c01: Load/exec segment: exec [SEGNAME [FLAG]]
A 000 001c01: Reconfig. memory: reconf
A 000 001c01: Console Selection
A 000 001c01: Use IOC3/IOC4 UART: ioc
A 000 001c01: Use JunkBus UART: junk
A 000 001c01: Use SysCtlr UART: elsc
A 000 001c01: Use Net UART: talk [n:NODE SLICE]
A 000 001c01: Error Registers
A 000 001c01: Dump II CRBs: crb [n:NODE]
A 000 001c01: 137-col wide crb: crbx [n:NODE]
A 000 001c01: Dump PI err spool: dumpspool [n:NODE SLICE]
A 000 001c01: Dump error info: error_dump
A 000 001c01: Dump reset error: reset_dump
A 000 001c01: Dump bridge errs: edump_bri [n:NODE]
A 000 001c01: System Controller
A 000 001c01: System ctlr cmd: sc ["STRING"]
A 000 001c01: Wr sysctlr nvram: scw ADDR [VAL [COUNT]]
A 000 001c01: Rd sysctlr nvram: scr ADDR [COUNT]
A 000 001c01: Rd sysctlr dbgsw: dips
A 000 001c01: Set/get debug sw: dbg [VIRT_VAL PHYS_VAL]
A 000 001c01: Set/get password: pas ["PASW"]
A 000 001c01: Set/get module #: module [NUM]
A 000 001c01: Set/get partition #: partition [NUM]
A 000 001c01: Get module NIC: modnic
A 000 001c01: Debugging
A 000 001c01: Verbose mode: verbose [1|0]
A 000 001c01: Use alt. regs: altregs [NUM]
A 000 001c01: kernel debugging: kdebug [STACKADDR]
A 000 001c01: Use kernel symtab: kern_sym
A 000 001c01: Send NMI to node: nmi n:NODE [SLICE]
A 000 001c01: Why are we here?: why
A 000 001c01: Stack backtrace: btrace [epc sp]
A 000 001c01: Switch to cpu: cpu [[n:NODE] SLICE]
A 000 001c01: Disassemble: dis ADDR [COUNT]
A 000 001c01: Dump mem cfg: dmc [n:NODE]
A 000 001c01: Run FRU analyzer: fru [1(local) | 2(all node)]
A 000 001c01: Environment Variables and Error Log
A 000 001c01: Init. PROM log: initlog [n:NODE]
A 000 001c01: Clear PROM log: clearlog [n:NODE]
A 000 001c01: Init. all PROM logs in system: initalllogs
A 000 001c01: Clear all PROM logs in system: clearalllogs
A 000 001c01: Set variable: setenv [n:NODE] KEY ["STRING"]
A 000 001c01: Remove variable: unsetenv [n:NODE] KEY
A 000 001c01: Print variables: printenv [n:NODE] [KEY]
A 000 001c01: Tail log entries: log [n:NODE] [TAIL_CNT [HEAD_CNT]]
A 000 001c01: power cycle CBrick
A 000 001c01: initialize PLL WAR variables
A 000 001c01: obtain PLL WAR statistics
A 000 001c01: I/O Diagnostics
A 000 001c01: XBow Diagnostic: dgxbow [m<n|h|m>] [n<NODE>]
A 000 001c01: Bridge Diagnostic: dgbrdg [m<n|h|m>] [n<NODE>] [s<slot>]
A 000 001c01: IO7 Conf Spc Diag: dgconf [m<n|h|m>] [n<NODE>] [s<slot>]
A 000 001c01: PCI Bus Diag.: dgpci [m<n|h|m>] [n<NODE>] [s<slot>] [p<P]
A 000 001c01: Serial PIO Diag: dgspio [m<n|h|m|x>] [n<NODE>] [s<slot>] [
A 000 001c01: Serial DMA Diag: dgsdma [m<n|h|m|x>] [n<NODE>] [s<slot>] [
A 000 001c01: Keyb/Mouse Diag: dgpckm [m<n|m>]
...
... Now enter CAC mode ... but we first have to go to DEX
...
A 000 001c01: POD SysCt Cac> go cac
A 000 001c01: Must be in Dex mode before switching to Cac or Unc.
A 000 001c01: POD SysCt Cac> go dex
A 000 001c01:
A 000 001c01: *** Requested DEX mode on node 0
...
... Get DEX Command Help
...
A 000 001c01: POD SysCt Dex> ?
A 000 001c01: Commands may be separated by semicolons, grouped with
A 000 001c01: curly braces, and used in nested loop constructs.
A 000 001c01:
A 000 001c01: Calculator
A 000 001c01: Print hex: px EXPR
A 000 001c01: Print decimal: pd EXPR
A 000 001c01: Print octal: po EXPR
A 000 001c01: Print binary: pb EXPR
A 000 001c01: Look up PROM addr: nm ADDR
A 000 001c01: Hardware Registers
A 000 001c01: Print register(s): pr [GPRNAME [VAL]
A 000 001c01: Print fpreg(s): pf [REGNO]
A 000 001c01: Store register: sr REG VAL
A 000 001c01: Store fpreg: sf REGNO VAL
A 000 001c01: Memory Access
A 000 001c01: Print address: pa ADDR [BITNO]
A 000 001c01: Load byte: lb ADDR [COUNT]
A 000 001c01: Load half-word: lh ADDR [COUNT]
A 000 001c01: Load word: lw ADDR [COUNT]
A 000 001c01: Load double-word: ld ADDR [COUNT]
A 000 001c01: Load ASCII: la ADDR [COUNT]
A 000 001c01: Store byte: sb ADDR [VAL [COUNT]]
A 000 001c01: Store half-word: sh ADDR [VAL [COUNT]]
A 000 001c01: Store word: sw ADDR [VAL [COUNT]]
A 000 001c01: Store double-word: sd ADDR [VAL [COUNT]]
A 000 001c01: Store and verify: sdv ADDR VAL
A 000 001c01: Memory Operations
A 000 001c01: Fill mem w/ byte: memset DST BYTE LEN
A 000 001c01: Copy memory bytes: memcpy DST SRC LEN
A 000 001c01: Cmp memory bytes: memcmp DST SRC LEN
A 000 001c01: Add memory bytes: memsum SRC LEN
A 000 001c01: Memory Testing
A 000 001c01: Mem. sanity test: santest ADDR
A 000 001c01: Dir/prot init: dirinit START LEN
A 000 001c01: Memory clear: meminit START LEN
A 000 001c01: Dir. test/init: dirtest START LEN
A 000 001c01: Memory test/init: memtest START LEN
A 000 001c01: Clear errors: clear
A 000 001c01: Display errors: error
A 000 001c01: Quality mode: qual [1|0]
A 000 001c01: ECC mode: ecc [1|0]
A 000 001c01: Set R10k int mask: im [BYTE]
A 000 001c01: Test error limit: maxerr COUNT
A 000 001c01: Scan dir states: scandir ADDR [LEN]
A 000 001c01: Directory state: dirstate [BASE [LEN [STATE]]]
A 000 001c01: Network and Vectors
A 000 001c01: Vector read: vr VEC VADDR
A 000 001c01: Vector write: vw VEC VADDR VAL
A 000 001c01: Vector exchange: vx VEC VADDR VAL
A 000 001c01: Discover network: disc
A 000 001c01: Dump pcfg struct: pcfg [n:NODE] [v]
A 000 001c01: Get/set node ID: node [[VEC] ID]
A 000 001c01: Set up route: route [VEC NODE]
A 000 001c01: Read router NIC: rnic [VEC]
A 000 001c01: Dump config info.: cfg [n:NODE]
A 000 001c01: Dump route table: rtab [VEC]
A 000 001c01: Dmp/clr rtr stat: rstat VEC
A 000 001c01: Control Structures
A 000 001c01: Reset the system: reset [all]
A 000 001c01: Softreset a node: softreset n:NODE
A 000 001c01: Call subroutine: call ADDR [A0 [A1 [...]]]
A 000 001c01: Inv. cache & jump: jump ADDR [A0 [A1]]
A 000 001c01: Goto slave mode: slave
A 000 001c01: Repeat count: repeat COUNT CMD
A 000 001c01: Repeat forever: loop CMD
A 000 001c01: While loop: while (EXPR) CMD
A 000 001c01: For loop: for (CMD;EXPR;CMD) CMD
A 000 001c01: If statement: if (EXPR) CMD
A 000 001c01: Delay: delay MICROSEC
A 000 001c01: Sleep: sleep SEC
A 000 001c01: Benchmark timing: time CMD
A 000 001c01: Echo string: echo "STRING"
A 000 001c01: Miscellaneous
A 000 001c01: Show PROM version: version
A 000 001c01: Display help: help [CMDNAME]
A 000 001c01: Read hub NIC: nic [n:NODE]
A 000 001c01: Prgm remote PROM: flash NODE [...]
A 000 001c01: Prgm remote PROM with values: fflash NODE [...]
A 000 001c01: Prgm modebits with values: setmodebits NODE [...]
A 000 001c01: TLB and Cache
A 000 001c01: Clear TLB: tlbc [INDEX]
A 000 001c01: Read TLB: tlbr [INDEX]
A 000 001c01: Inv. cache(s): inval [i][d][s]
A 000 001c01: Flush+inv caches: flush
A 000 001c01: Dump dcache tag: dtag line
A 000 001c01: Dump icache tag: dtag line
A 000 001c01: Dump scache tag: stag line
A 000 001c01: Dump dcache line: dline line
A 000 001c01: Dump icache line: iline line
A 000 001c01: Dump scache line: sline line
A 000 001c01: Dump dcache tag: adtag line
A 000 001c01: Dump icache tag: aitag line
A 000 001c01: Dump scache tag: astag line
A 000 001c01: Dump dcache line: adline line
A 000 001c01: Dump icache line: ailine line
A 000 001c01: Dump scache line: asline line
A 000 001c01: Store a scache dword: sscache line taglo taghi
A 000 001c01: Store a scache tag: sstag line taglo taghi [way]
A 000 001c01: Set memory mode: go dex|unc|cac
A 000 001c01: Hub_send_data_err: hubsde
A 000 001c01: Rtr_send_data_err: rtrsde
A 000 001c01: Check local link: chklink
A 000 001c01: Self-test hub: bist le|ae|lr|ar [n:NODE]
A 000 001c01: Self-test router: rbist le|ae|lr|ar VEC
A 000 001c01: Self-test memory: mbist ADDR
A 000 001c01: Disable CPU/MEM: disable n:NODE [SLICE/BANKS]
A 000 001c01: Enable CPU/MEM: enable n:NODE [SLICE/BANKS]
A 000 001c01: Temp. disable: tdisable n:NODE [SLICE]
A 000 001c01: Cache tests
A 000 001c01: Instruction Cache test: icachetest
A 000 001c01: Primary Cache test: dcachetest
A 000 001c01: Secondary Cache test: scachetest
A 000 001c01: compute CPU frequency
A 000 001c01: Generate SAMSUNG WAR
A 000 001c01: I/O PROM
A 000 001c01: List segments: segs [FLAG]
A 000 001c01: Load/exec segment: exec [SEGNAME [FLAG]]
A 000 001c01: Reconfig. memory: reconf
A 000 001c01: Console Selection
A 000 001c01: Use IOC3/IOC4 UART: ioc
A 000 001c01: Use JunkBus UART: junk
A 000 001c01: Use SysCtlr UART: elsc
A 000 001c01: Use Net UART: talk [n:NODE SLICE]
A 000 001c01: Error Registers
A 000 001c01: Dump II CRBs: crb [n:NODE]
A 000 001c01: 137-col wide crb: crbx [n:NODE]
A 000 001c01: Dump PI err spool: dumpspool [n:NODE SLICE]
A 000 001c01: Dump error info: error_dump
A 000 001c01: Dump reset error: reset_dump
A 000 001c01: Dump bridge errs: edump_bri [n:NODE]
A 000 001c01: System Controller
A 000 001c01: System ctlr cmd: sc ["STRING"]
A 000 001c01: Wr sysctlr nvram: scw ADDR [VAL [COUNT]]
A 000 001c01: Rd sysctlr nvram: scr ADDR [COUNT]
A 000 001c01: Rd sysctlr dbgsw: dips
A 000 001c01: Set/get debug sw: dbg [VIRT_VAL PHYS_VAL]
A 000 001c01: Set/get password: pas ["PASW"]
A 000 001c01: Set/get module #: module [NUM]
A 000 001c01: Set/get partition #: partition [NUM]
A 000 001c01: Get module NIC: modnic
A 000 001c01: Debugging
A 000 001c01: Verbose mode: verbose [1|0]
A 000 001c01: Use alt. regs: altregs [NUM]
A 000 001c01: kernel debugging: kdebug [STACKADDR]
A 000 001c01: Use kernel symtab: kern_sym
A 000 001c01: Send NMI to node: nmi n:NODE [SLICE]
A 000 001c01: Why are we here?: why
A 000 001c01: Stack backtrace: btrace [epc sp]
A 000 001c01: Switch to cpu: cpu [[n:NODE] SLICE]
A 000 001c01: Disassemble: dis ADDR [COUNT]
A 000 001c01: Dump mem cfg: dmc [n:NODE]
A 000 001c01: Run FRU analyzer: fru [1(local) | 2(all node)]
A 000 001c01: Environment Variables and Error Log
A 000 001c01: Init. PROM log: initlog [n:NODE]
A 000 001c01: Clear PROM log: clearlog [n:NODE]
A 000 001c01: Init. all PROM logs in system: initalllogs
A 000 001c01: Clear all PROM logs in system: clearalllogs
A 000 001c01: Set variable: setenv [n:NODE] KEY ["STRING"]
A 000 001c01: Remove variable: unsetenv [n:NODE] KEY
A 000 001c01: Print variables: printenv [n:NODE] [KEY]
A 000 001c01: Tail log entries: log [n:NODE] [TAIL_CNT [HEAD_CNT]]
A 000 001c01: power cycle CBrick
A 000 001c01: initialize PLL WAR variables
A 000 001c01: obtain PLL WAR statistics
A 000 001c01: I/O Diagnostics
A 000 001c01: XBow Diagnostic: dgxbow [m<n|h|m>] [n<NODE>]
A 000 001c01: Bridge Diagnostic: dgbrdg [m<n|h|m>] [n<NODE>] [s<slot>]
A 000 001c01: IO7 Conf Spc Diag: dgconf [m<n|h|m>] [n<NODE>] [s<slot>]
A 000 001c01: PCI Bus Diag.: dgpci [m<n|h|m>] [n<NODE>] [s<slot>] [p<P]
A 000 001c01: Serial PIO Diag: dgspio [m<n|h|m|x>] [n<NODE>] [s<slot>] [
A 000 001c01: Serial DMA Diag: dgsdma [m<n|h|m|x>] [n<NODE>] [s<slot>] [
A 000 001c01: Keyb/Mouse Diag: dgpckm [m<n|m>]
...
... Enter CAC Mode
...
A 000 001c01: POD SysCt Dex> go cac
A 000 001c01: Testing/Initializing memory
A 000 001c01: Init PROM text/data (0x9600000001a00000), len 0x16c000
A 000 001c01: Initializing dir/prot
A 000 001c01: Initializing ECC
A 000 001c01: Clearing memory
A 000 001c01: Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 08
A 000 001c01: Done
A 000 001c01: Init PROM bss (0x9600000001b6c000), len 0x8000
A 000 001c01: Initializing dir/prot
A 000 001c01: Initializing ECC
A 000 001c01: Clearing memory
A 000 001c01: Init PROM stack/structures (0x96000000020d0000), len 0x12000
A 000 001c01: Initializing dir/prot
A 000 001c01: Initializing ECC
A 000 001c01: Clearing memory
A 000 001c01: Done
A 000 001c01:
A 000 001c01: *** Requested CAC mode on node 0
...
... Get CAC Command Help
...
A 000 001c01: POD SysCt Cac> ?
A 000 001c01: Commands may be separated by semicolons, grouped with
A 000 001c01: curly braces, and used in nested loop constructs.
A 000 001c01:
A 000 001c01: Calculator
A 000 001c01: Print hex: px EXPR
A 000 001c01: Print decimal: pd EXPR
A 000 001c01: Print octal: po EXPR
A 000 001c01: Print binary: pb EXPR
A 000 001c01: Look up PROM addr: nm ADDR
A 000 001c01: Hardware Registers
A 000 001c01: Print register(s): pr [GPRNAME [VAL]
A 000 001c01: Print fpreg(s): pf [REGNO]
A 000 001c01: Store register: sr REG VAL
A 000 001c01: Store fpreg: sf REGNO VAL
A 000 001c01: Memory Access
A 000 001c01: Print address: pa ADDR [BITNO]
A 000 001c01: Load byte: lb ADDR [COUNT]
A 000 001c01: Load half-word: lh ADDR [COUNT]
A 000 001c01: Load word: lw ADDR [COUNT]
A 000 001c01: Load double-word: ld ADDR [COUNT]
A 000 001c01: Load ASCII: la ADDR [COUNT]
A 000 001c01: Store byte: sb ADDR [VAL [COUNT]]
A 000 001c01: Store half-word: sh ADDR [VAL [COUNT]]
A 000 001c01: Store word: sw ADDR [VAL [COUNT]]
A 000 001c01: Store double-word: sd ADDR [VAL [COUNT]]
A 000 001c01: Store and verify: sdv ADDR VAL
A 000 001c01: Memory Operations
A 000 001c01: Fill mem w/ byte: memset DST BYTE LEN
A 000 001c01: Copy memory bytes: memcpy DST SRC LEN
A 000 001c01: Cmp memory bytes: memcmp DST SRC LEN
A 000 001c01: Add memory bytes: memsum SRC LEN
A 000 001c01: Memory Testing
A 000 001c01: Mem. sanity test: santest ADDR
A 000 001c01: Dir/prot init: dirinit START LEN
A 000 001c01: Memory clear: meminit START LEN
A 000 001c01: Dir. test/init: dirtest START LEN
A 000 001c01: Memory test/init: memtest START LEN
A 000 001c01: Clear errors: clear
A 000 001c01: Display errors: error
A 000 001c01: Quality mode: qual [1|0]
A 000 001c01: ECC mode: ecc [1|0]
A 000 001c01: Set R10k int mask: im [BYTE]
A 000 001c01: Test error limit: maxerr COUNT
A 000 001c01: Scan dir states: scandir ADDR [LEN]
A 000 001c01: Directory state: dirstate [BASE [LEN [STATE]]]
A 000 001c01: Network and Vectors
A 000 001c01: Vector read: vr VEC VADDR
A 000 001c01: Vector write: vw VEC VADDR VAL
A 000 001c01: Vector exchange: vx VEC VADDR VAL
A 000 001c01: Discover network: disc
A 000 001c01: Dump pcfg struct: pcfg [n:NODE] [v]
A 000 001c01: Get/set node ID: node [[VEC] ID]
A 000 001c01: Set up route: route [VEC NODE]
A 000 001c01: Read router NIC: rnic [VEC]
A 000 001c01: Dump config info.: cfg [n:NODE]
A 000 001c01: Dump route table: rtab [VEC]
A 000 001c01: Dmp/clr rtr stat: rstat VEC
A 000 001c01: Control Structures
A 000 001c01: Reset the system: reset [all]
A 000 001c01: Softreset a node: softreset n:NODE
A 000 001c01: Call subroutine: call ADDR [A0 [A1 [...]]]
A 000 001c01: Inv. cache & jump: jump ADDR [A0 [A1]]
A 000 001c01: Goto slave mode: slave
A 000 001c01: Repeat count: repeat COUNT CMD
A 000 001c01: Repeat forever: loop CMD
A 000 001c01: While loop: while (EXPR) CMD
A 000 001c01: For loop: for (CMD;EXPR;CMD) CMD
A 000 001c01: If statement: if (EXPR) CMD
A 000 001c01: Delay: delay MICROSEC
A 000 001c01: Sleep: sleep SEC
A 000 001c01: Benchmark timing: time CMD
A 000 001c01: Echo string: echo "STRING"
A 000 001c01: Miscellaneous
A 000 001c01: Show PROM version: version
A 000 001c01: Display help: help [CMDNAME]
A 000 001c01: Read hub NIC: nic [n:NODE]
A 000 001c01: Prgm remote PROM: flash NODE [...]
A 000 001c01: Prgm remote PROM with values: fflash NODE [...]
A 000 001c01: Prgm modebits with values: setmodebits NODE [...]
A 000 001c01: TLB and Cache
A 000 001c01: Clear TLB: tlbc [INDEX]
A 000 001c01: Read TLB: tlbr [INDEX]
A 000 001c01: Inv. cache(s): inval [i][d][s]
A 000 001c01: Flush+inv caches: flush
A 000 001c01: Dump dcache tag: dtag line
A 000 001c01: Dump icache tag: dtag line
A 000 001c01: Dump scache tag: stag line
A 000 001c01: Dump dcache line: dline line
A 000 001c01: Dump icache line: iline line
A 000 001c01: Dump scache line: sline line
A 000 001c01: Dump dcache tag: adtag line
A 000 001c01: Dump icache tag: aitag line
A 000 001c01: Dump scache tag: astag line
A 000 001c01: Dump dcache line: adline line
A 000 001c01: Dump icache line: ailine line
A 000 001c01: Dump scache line: asline line
A 000 001c01: Store a scache dword: sscache line taglo taghi
A 000 001c01: Store a scache tag: sstag line taglo taghi [way]
A 000 001c01: Set memory mode: go dex|unc|cac
A 000 001c01: Hub_send_data_err: hubsde
A 000 001c01: Rtr_send_data_err: rtrsde
A 000 001c01: Check local link: chklink
A 000 001c01: Self-test hub: bist le|ae|lr|ar [n:NODE]
A 000 001c01: Self-test router: rbist le|ae|lr|ar VEC
A 000 001c01: Self-test memory: mbist ADDR
A 000 001c01: Disable CPU/MEM: disable n:NODE [SLICE/BANKS]
A 000 001c01: Enable CPU/MEM: enable n:NODE [SLICE/BANKS]
A 000 001c01: Temp. disable: tdisable n:NODE [SLICE]
A 000 001c01: Cache tests
A 000 001c01: Instruction Cache test: icachetest
A 000 001c01: Primary Cache test: dcachetest
A 000 001c01: Secondary Cache test: scachetest
A 000 001c01: compute CPU frequency
A 000 001c01: Generate SAMSUNG WAR
A 000 001c01: I/O PROM
A 000 001c01: List segments: segs [FLAG]
A 000 001c01: Load/exec segment: exec [SEGNAME [FLAG]]
A 000 001c01: Reconfig. memory: reconf
A 000 001c01: Console Selection
A 000 001c01: Use IOC3/IOC4 UART: ioc
A 000 001c01: Use JunkBus UART: junk
A 000 001c01: Use SysCtlr UART: elsc
A 000 001c01: Use Net UART: talk [n:NODE SLICE]
A 000 001c01: Error Registers
A 000 001c01: Dump II CRBs: crb [n:NODE]
A 000 001c01: 137-col wide crb: crbx [n:NODE]
A 000 001c01: Dump PI err spool: dumpspool [n:NODE SLICE]
A 000 001c01: Dump error info: error_dump
A 000 001c01: Dump reset error: reset_dump
A 000 001c01: Dump bridge errs: edump_bri [n:NODE]
A 000 001c01: System Controller
A 000 001c01: System ctlr cmd: sc ["STRING"]
A 000 001c01: Wr sysctlr nvram: scw ADDR [VAL [COUNT]]
A 000 001c01: Rd sysctlr nvram: scr ADDR [COUNT]
A 000 001c01: Rd sysctlr dbgsw: dips
A 000 001c01: Set/get debug sw: dbg [VIRT_VAL PHYS_VAL]
A 000 001c01: Set/get password: pas ["PASW"]
A 000 001c01: Set/get module #: module [NUM]
A 000 001c01: Set/get partition #: partition [NUM]
A 000 001c01: Get module NIC: modnic
A 000 001c01: Debugging
A 000 001c01: Verbose mode: verbose [1|0]
A 000 001c01: Use alt. regs: altregs [NUM]
A 000 001c01: kernel debugging: kdebug [STACKADDR]
A 000 001c01: Use kernel symtab: kern_sym
A 000 001c01: Send NMI to node: nmi n:NODE [SLICE]
A 000 001c01: Why are we here?: why
A 000 001c01: Stack backtrace: btrace [epc sp]
A 000 001c01: Switch to cpu: cpu [[n:NODE] SLICE]
A 000 001c01: Disassemble: dis ADDR [COUNT]
A 000 001c01: Dump mem cfg: dmc [n:NODE]
A 000 001c01: Run FRU analyzer: fru [1(local) | 2(all node)]
A 000 001c01: Environment Variables and Error Log
A 000 001c01: Init. PROM log: initlog [n:NODE]
A 000 001c01: Clear PROM log: clearlog [n:NODE]
A 000 001c01: Init. all PROM logs in system: initalllogs
A 000 001c01: Clear all PROM logs in system: clearalllogs
A 000 001c01: Set variable: setenv [n:NODE] KEY ["STRING"]
A 000 001c01: Remove variable: unsetenv [n:NODE] KEY
A 000 001c01: Print variables: printenv [n:NODE] [KEY]
A 000 001c01: Tail log entries: log [n:NODE] [TAIL_CNT [HEAD_CNT]]
A 000 001c01: power cycle CBrick
A 000 001c01: initialize PLL WAR variables
A 000 001c01: obtain PLL WAR statistics
A 000 001c01: I/O Diagnostics
A 000 001c01: XBow Diagnostic: dgxbow [m<n|h|m>] [n<NODE>]
A 000 001c01: Bridge Diagnostic: dgbrdg [m<n|h|m>] [n<NODE>] [s<slot>]
A 000 001c01: IO7 Conf Spc Diag: dgconf [m<n|h|m>] [n<NODE>] [s<slot>]
A 000 001c01: PCI Bus Diag.: dgpci [m<n|h|m>] [n<NODE>] [s<slot>] [p<P]
A 000 001c01: Serial PIO Diag: dgspio [m<n|h|m|x>] [n<NODE>] [s<slot>] [
A 000 001c01: Serial DMA Diag: dgsdma [m<n|h|m|x>] [n<NODE>] [s<slot>] [
A 000 001c01: Keyb/Mouse Diag: dgpckm [m<n|m>]
A 000 001c01: POD SysCt Cac>
...
... Escape back to L1 (Ctrl-T)
...
escaping to L1 system controller
001c01-L1>debug 0x0
debug switches set to 0x0000
returning to console mode 001c01 CPU0, <CTRL_T> to escape to L1
...
... Finally, do a reset to go back to the standard boot process
...
A 000 001c01: POD SysCt Cac> reset
A 000 001c01: Resetting the system...
Starting PROM Boot process
IP35 PROM SGI Version 6.210 built 02:33:51 PM Aug 26, 2004
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... DONE
Discovering local IO ...................... DONE
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5897 usec
Waiting for peers to complete discovery.... DONE
No other nodes present; becoming global master
Global master is /hw/rack/001/bay/01
Intializing any CPUless nodes.............. DONE
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Loading BASEIO prom ....................... DONE
BASEIO PROM Monitor SGI Version 6.210 built 02:30:38 PM Aug 26, 2004 (BE64)
4 CPUs on 1 nodes found.
NVRAM checksum is incorrect: reinitializing.
Automatic update of PROM environment disabled
Graphics diagnostics
Odyssey board #0 found on nasid 0
Running Odyssey xtalk sanity diag...
Board version 1 - Buzz revision 3B
On board sdram size: 128 Mb
Cas latency: CAS 3
4 banks by sdram module
Running Odyssey Buzz registers diag...
Device passed diagnostics
Installing PROM Device drivers ............
On-board (IO9) tigon3 1000BaseT interface
Base I/O Ethernet set to /dev/ethernet/tg0
Installing Graphics Console...
graphics install: searching for pipe 0
Probing IOC4 ATA adapter 2
IOC4 RevId = 83
Detected Vendor id/Product MATSHITA DVD-ROM SR-8178
Walking SCSI Adapter 0, (pci id 3)
1+ Device Vendor Product: ATA SCSIDE BRIDGE320
2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 1 device(s)
Walking SCSI Adapter 1, (pci id 3)
1- 2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 0 device(s)
Initializing PROM Device drivers ..........
Initializing Base I/O Ethernet Interface...Done.
---------------Interface Configuration Summary----------------
ASIC|Revision|MAC Address : 5701|B5|08:00:69:11:e9:d0
Link Negotiation|Advertisement : On|<H10 F10 H100 F100 H1000 F1000>
Link|Speed|Duplex|Rx/Tx FlowCtrl: Up|1000|Full|Off/Off
--------------------------------------------------------------
DONE
escaping to L1 system controller
001c01-L1>
Sample Session with Fuel (Internal) Serial Comms Port
The Fuel has multiple ways to support diagnostic communications:
- L1 USB Port - this is the external USB port just below the Ethernet port and can be used to connect Fuel to an L2 controller
- Internal Serial Port - there is an internal serial port which can also be used to communicate with the L1 controller using a NULL modem serial cable (38400,8,N,1)
- External Serial Port #1 - this is the lower of the two serial ports above the Ethernet port.
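As a rough sketch of the serial settings above (38400, 8, N, 1), this is how a script on a PC could open the L1 console over the null modem cable. It assumes the third-party pyserial package is installed and that the cable shows up as /dev/ttyUSB0; the device name and the helper names are my own placeholders, not anything SGI provides.

```python
# Serial parameters for the Fuel L1 controller, as noted above: 38400 8-N-1.
L1_SERIAL_SETTINGS = {
    "baudrate": 38400,
    "bytesize": 8,     # 8 data bits
    "parity": "N",     # no parity
    "stopbits": 1,     # 1 stop bit
    "timeout": 2,      # seconds to wait for a reply (arbitrary choice)
}


def describe(settings):
    """Return the usual shorthand for the settings, e.g. '38400 8-N-1'."""
    return "{baudrate} {bytesize}-{parity}-{stopbits}".format(**settings)


def open_l1_console(port="/dev/ttyUSB0"):
    """Open the port with pyserial (assumed installed; imported lazily)."""
    import serial  # third-party: pip install pyserial
    return serial.Serial(port, **L1_SERIAL_SETTINGS)


if __name__ == "__main__":
    print(describe(L1_SERIAL_SETTINGS))
    # Hypothetical usage against real hardware (line ending is a guess):
    # with open_l1_console() as console:
    #     console.write(b"ver\r")
    #     print(console.read(200).decode(errors="replace"))
```

Keeping the settings in one dictionary means the same values can be reused for either the internal or the external port by changing only the device name.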
When I first got my Fuel I could communicate with the L1 via USB and an L2, but not via either the internal or external serial ports. To resolve this I had to:
- edit /etc/inittab - to ensure that the ports were defined and not allocated to other uses
- edit /etc/uucp/Devices - to allow use of Serial Port #2 communications at multiple speeds
- edit /etc/ioconfig.conf - to reset the tty config back to tty1 & tty2, as it was reporting the serial ports on tty2 & tty4
- POD/CAC Reset - go into POD/CAC mode and do a reset of the logs to clear errors
Here is an example session via the internal serial port:
001a01-L1>
001a01-L1>
001a01-L1>ver
L1 1.48.1 (Image A), Built 01/22/2007 11:34:20 [Fuel/PE/O300 1MB image]
001a01-L1>?
ERROR: command not found.
001a01-L1>help
Commands are:
* autopower|apwr syscom|junkbus|jb|bedrockbrick
partdb cpu nia|ni|ctc nib
iia|ii|cti iib iic iid
config|cfg debug display|dsp button|btn
env fan help|hlp history|hist
l1dbg link log ioport|ioprt
istat l1 leds margin|mgn
network pimm port|prt power|pwr
reset|rst nmi softreset|softrst select|sel
serial sysstate eeprom uart
usb router|rtr service date
nvram security flash reboot_l1
version|ver pbay test|tst scan
fru|pci|prom|node
enter 'hlp <cmd>' for more help on a single command.
001a01-L1>uart
Baud Read Read Read Read Read Write Write Write
UART Rate State Status Timeouts Breaks Errors State Status Timeouts
---- ---- ----- ------ -------- ------ ------ ----- ------ --------
JNK 0 57692 Connect Suspend 0 0 42 Connect Ready 0
SMP 37500 Connect Ready 0 0 0 Connect Ready 3
001a01-L1>serial
BSN: MSM019 SSN: XX:XX:XX:XX:XX:XX Time: 02/07/2106 06:28:15 Security: OFF
Public Key data in EEPROM is invalid
001a01-L1>usb
Device: 0 Disconnects: 1 Bus Resets: 20
Endpoint State Status Stalls Errors Timeouts
-------- ----- ------ ------ ------ --------
Control Stalled Suspended 30085 0 0
Read Unconfig Ready 0 0 0
Write Unconfig Ready 1 0 0
001a01-L1>power up
001a01-L1>
entering console mode 001a01 CPU0, <CTRL_T> to escape to L1
Starting PROM Boot process
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
Running in DDR mode
*** Mixed standard and premium memory:
*** Treating all as standard.
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... DONE
Discovering local IO ...................... DONE
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5884 usec
Waiting for peers to complete discovery.... DONE
No other nodes present; becoming global master
Global master is /hw/rack/001/bay/01
Intializing any CPUless nodes.............. DONE
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
Loading BASEIO prom ....................... DONE
BASEIO PROM Monitor SGI Version 6.211 built 04:15:20 PM Jan 25, 2008 (BE64)
1 CPUs on 1 nodes found.
Automatic update of PROM environment disabled
PS/2 Keyboard & Mouse diagnostics
Found mouse on port 0
Found keyboard on port 1
PS/2 Keyboard & Mouse diagnostics passed
Graphics diagnostics
Odyssey board #0 found on nasid 0
Running Odyssey xtalk sanity diag...
Board version 1 - Buzz revision 2B
On board sdram size: 128 Mb
Cas latency: CAS 3
4 banks by sdram module
Running Odyssey Buzz registers diag...
Device passed diagnostics
Installing PROM Device drivers ............
Base I/O Ethernet set to /dev/ethernet/ef0
Installing Graphics Console...
graphics install: searching for pipe 0
Walking SCSI Adapter 0, (pci id 1)
1+ Device Vendor Product: ATA Samsung SSD 840
2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 1 device(s)
Walking SCSI Adapter 1, (pci id 1)
1+ Device Vendor Product: SONY SDT-9000
2+ Device Vendor Product: TOSHIBA DVD-ROM SD-M1711
3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 2 device(s)
Initializing PROM Device drivers .......... DONE
escaping to L1 system controller
001a01-L1>env
Environmental monitoring is enabled and running.
Description State Warning Limits Fault Limits Current
-------------- ---------- ----------------- ----------------- -------
12V Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 12.063
12V IO Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 12.063
5V Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.044
3.3V Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.320
2.5V Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.470
1.5V Enabled 10% 1.35/ 1.65 20% 1.20/ 1.80 1.466
5V AUX Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.096
3.3V AUX Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.285
PIMM 12V BIAS Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 12.063
SRAM Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.509
VCPU Enabled 10% 1.44/ 1.76 20% 1.28/ 1.92 1.593
PIMM 1.5V Enabled 10% 1.35/ 1.65 20% 1.20/ 1.80 1.495
PIMM 3.3V AUX Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.268
PIMM 5V AUX Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.096
XIO 12V BIAS Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 12.000
XIO 5V Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.044
XIO 2.5V Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.457
XIO 3.3V AUX Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.285
Description State Warning RPM Current RPM
--------------- ---------- ----------- -----------
FAN 0 EXHAUST Enabled 920 1188
FAN 1 HD Enabled 1560 2393
FAN 2 PCI Enabled 1120 1520
FAN 3 XIO 1 Enabled 1600 2343
FAN 4 XIO 2 Enabled 1600 2220
FAN 5 PS Enabled 1349 30681
Advisory Critical Fault Current
Description State Temp Temp Temp Temp
----------------- ---------- --------- --------- --------- ---------
0 NODE 0 Enabled [Autofan Control] 80C/176F 31C/ 87F
1 NODE 1 Enabled [Autofan Control] 80C/176F 27C/ 80F
2 NODE 2 Enabled [Autofan Control] 80C/176F 25C/ 77F
3 PIMM Enabled [Autofan Control] 80C/176F 38C/100F
4 ODYSSEY Enabled [Autofan Control] 80C/176F 24C/ 75F
5 BEDROCK Enabled [Autofan Control] 85C/185F 29C/ 84F
returning to console mode 001a01 console, <CTRL_T> to escape to L1
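The voltage limits in the env table above look like plain percentages of the nominal rail voltage (10% for warning, 20% for fault). The sketch below recomputes them and checks a few of the readings from my Fuel. The function names are my own, the 10% / 20% rule is inferred from the table rather than taken from SGI documentation, and the 1.6 V nominal for VCPU is worked back from its 1.44 / 1.76 limits.

```python
def limits(nominal, percent):
    """Return (low, high) limits at +/- percent of the nominal voltage."""
    delta = nominal * percent / 100.0
    return round(nominal - delta, 2), round(nominal + delta, 2)


def status(nominal, current):
    """Classify a reading against the 10% warning and 20% fault bands."""
    warn_lo, warn_hi = limits(nominal, 10)
    fault_lo, fault_hi = limits(nominal, 20)
    if not fault_lo <= current <= fault_hi:
        return "FAULT"
    if not warn_lo <= current <= warn_hi:
        return "WARNING"
    return "OK"


# (description, nominal volts, current volts) copied from the env output above
readings = [
    ("12V", 12.0, 12.063),
    ("5V", 5.0, 5.044),
    ("3.3V", 3.3, 3.320),
    ("VCPU", 1.6, 1.593),  # nominal inferred from the 1.44 / 1.76 limits
]

for name, nominal, current in readings:
    print(f"{name:6} warn={limits(nominal, 10)} fault={limits(nominal, 20)} "
          f"current={current} -> {status(nominal, current)}")
```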
Sample POD/DEX/CAC Startup on Fuel
The Fuel shares the same underlying architecture as the O3000 & Chimera machines, and uses the same L1 debug setting ("debug 0x10d") to boot into POD/CAC mode:
Starting PROM Boot process
hubii_link_good: A-brick attached to module 001c01.
HUB at 0x0 attached as widget 0xa
001c01/0xa/xbow_arb: nasid= 0x0 xbow_base= 0x9200000000000000
001c01/0xa/xbow_arb: 622 master is 0xa
Check_master: link 10 is master
hubii_link_good: A-brick attached to module 001c01.
Check_master: link 10 is master
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
built for bedrock rev. 1.1 or greater
running in IP34 mode
Running in DDR mode
Local master CPU A revision: f42
PROM length: 0x168648, BSS length: 0xa7a0, flash count: 16
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x1 0x3fc0fff6403
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xc00000001fc29684 (0xc00000001fc29684)
Configuring memory
Local memory configured: 4096 MB (premium)
*** Warning: System controller debug switches are non-zero (0x10d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
before reading NICHub NIC: 0x52275dad
SR1 set to 0x0000081698349000
SR0 set to 0x0000000052275dad
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x168648
Done
DONE
Skipping secondary cache diags
CPU A switching stack into UALIAS and invalidating D-cache
CPU A switching into node 0 cached RAM
CPU A running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
Discovering local IO ...................... Check_master: link 10 is master
Check_master: link 10 is master
DONE
CPU A initialized subnode
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5886 usec
Waiting for peers to complete discovery.... Discovery results:
ENTRY 0: HUB(52275dad)
NASID=-1 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x52275dad, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315)Global barrier passed.
Global barrier (line 4348)Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
ENTRY 0: HUB(52275dad)
NASID=-1 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(52275dad)
NASID=0 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU A switching to UALIAS
CPU A running in UALIAS
Changing node ID to 0
Global barrier (line 4823)Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928)Global barrier passed.
CPU A switching to node 0 cached RAM
CPU A running cached
Nasids in partition: +0
Regions in partition: +0
Intializing any CPUless nodes.............. Global barrier (line 7714)Global barrier passed.
Global barrier (line 7715)Global barrier passed.
DONE
Global barrier (line 5089)Global barrier passed.
hubii_link_good: A-brick attached to module 001c01.
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(52275dad)
NASID=0 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: FE
Erecting partition fences ................ DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300)Global barrier passed.
Nasids in partition: +0
Regions in partition: +0
A 000 001c01:
A 000 001c01: *** Entering POD mode on node 0
A 000 001c01: POD SysCt Cac>
escaping to L2 system controller
?-192.168.XXX.XXX-L2>debug 0
001a01:
debug switches set to 0x0000
re-entering system console mode (001a01 CPU0), <CTRL_T> to escape to L2
A 000 001c01: POD SysCt Cac> reset
A 000 001c01: Resetting the system...
Search for Fuel / O350 PROM Chip
The Octane, Fuel and O3000, O300 & O350 machines all have a flashable PROM chip that can be flashed and dumped using the IRIX "flash" command. There has been quite a bit of speculation about where the PROM chips are and what type they are. This matters because the PROM can be invalidated by a crash during flashing. Also, if a Fuel has been flashed for a higher CPU speed, this is recorded in the PROM chip, and there does not appear to be any way to reset it to work with a lower speed CPU other than fitting a faster CPU, down-flashing the speed, and only then replacing the faster CPU with the slower model.
So is there a way to program / replace the PROM chip to support recovery of machines?
Here is picture of Fuel system board, in midde right you can see the DALLAS DS1742W-120 (NVRAM and RTC) and next to it a small ATEM EEPROM chip:
And a detail view of the DALLAS and ATEM chips:
The ATEM chip is marked: ATEM 116 AT2404C PC27, which is a 4 Kbit (512 x 8), 2.7 - 5.5 Volt EEPROM.
According to the flash log, the PROM holds 1,476,168 bytes of data.
So can you fit this into 4 Kbits?
The answer is no: 4 Kbits is 512 bytes, which is much smaller than the reported PROM flash data size.
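The size comparison above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the PROM size is the "PROM length: 0x168648" value printed in the boot logs in this post, and the helper name `fits` is made up for illustration.

```python
# Capacity check: could the small EEPROM hold the PROM image?
EEPROM_BITS = 4 * 1024            # "4K" part, organised as 512 x 8
EEPROM_BYTES = EEPROM_BITS // 8   # bits -> bytes
PROM_BYTES = 0x168648             # "PROM length" from the boot log


def fits(image_bytes, device_bytes):
    """Hypothetical helper: True if the image fits in the device."""
    return image_bytes <= device_bytes


print("EEPROM capacity:", EEPROM_BYTES, "bytes")
print("PROM image size:", PROM_BYTES, "bytes")
print("Fits:", fits(PROM_BYTES, EEPROM_BYTES))
print("PROM is roughly", PROM_BYTES // EEPROM_BYTES, "times too large")
```

So the small EEPROM cannot be the main PROM; the image is thousands of times larger than the part.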
More Fuel POD/CAC to try to understand CPU Speed configuration
In trying to find out how the Fuel CPU speed is controlled, I have been looking at various POD/CAC commands to see what they reveal.
?-192.168.XXX.XXX-L2>power up
re-entering system console mode (001a01 CPU0), <CTRL_T> to escape to L2
Starting PROM Boot process
hubii_link_good: A-brick attached to module 001c01.
HUB at 0x0 attached as widget 0xa
001c01/0xa/xbow_arb: nasid= 0x0 xbow_base= 0x9200000000000000
001c01/0xa/xbow_arb: 622 master is 0xa
Check_master: link 10 is master
hubii_link_good: A-brick attached to module 001c01.
Check_master: link 10 is master
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
built for bedrock rev. 1.1 or greater
running in IP34 mode
Running in DDR mode
Local master CPU A revision: f42
PROM length: 0x168648, BSS length: 0xa7a0, flash count: 16
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x1 0x3fc0fff6403
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xc00000001fc02ac0 (0xc00000001fc02ac0)
Configuring memory
Local memory configured: 4096 MB (premium)
*** Warning: System controller debug switches are non-zero (0x10d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
before reading NICHub NIC: 0x52275dad
SR1 set to 0x0000081698349000
SR0 set to 0x0000000052275dad
Testing/Initializing memory ............... DONE
---
--- This section of the diagnostics gives the memory location the PROM is copied to.
--- This is needed to do some memory snooping to see the configuration data in RAM.
---
Copying PROM code to memory ............... Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x168648
Done
DONE
Skipping secondary cache diags
CPU A switching stack into UALIAS and invalidating D-cache
CPU A switching into node 0 cached RAM
CPU A running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
Discovering local IO ...................... Check_master: link 10 is master
Check_master: link 10 is master
DONE
CPU A initialized subnode
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5894 usec
Waiting for peers to complete discovery.... Discovery results:
ENTRY 0: HUB(52275dad)
NASID=-1 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x52275dad, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315)Global barrier passed.
Global barrier (line 4348)Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
ENTRY 0: HUB(52275dad)
NASID=-1 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(52275dad)
NASID=0 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU A switching to UALIAS
CPU A running in UALIAS
Changing node ID to 0
Global barrier (line 4823)Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928)Global barrier passed.
CPU A switching to node 0 cached RAM
CPU A running cached
Nasids in partition: +0
Regions in partition: +0
Intializing any CPUless nodes.............. Global barrier (line 7714)Global barrier passed.
Global barrier (line 7715)Global barrier passed.
DONE
Global barrier (line 5089)Global barrier passed.
hubii_link_good: A-brick attached to module 001c01.
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(52275dad)
NASID=0 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: FE
Erecting partition fences ................ DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300)Global barrier passed.
Nasids in partition: +0
Regions in partition: +0
A 000 001c01:
A 000 001c01: *** Entering POD mode on node 0
---
--- Now let's look at the "ip27conf" area, which has:
--- CPU Speed
--- HUB Speed
---
--- First check the "ASCII marker" with: la
--- Then dump the bytes with: lb
---
A 000 001c01: POD SysCt Cac> la 0x9600000001a00068 8
A 000 001c01: 9600000001a00068: i p 2 7 c o n f
A 000 001c01: POD SysCt Cac> lb 0x9600000001a00068 32
A 000 001c01: 9600000001a00068: 69 70 32 37 63 6f 6e 66 00 00 00 00 2f af 08 00
A 000 001c01: 9600000001a00078: 00 00 00 00 0b eb c2 00 00 00 00 00 00 13 12 d0
A 000 001c01: POD SysCt Cac>
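The 32 bytes dumped by "lb" above can be decoded offline. The sketch below assumes (this is my guess, not something the PROM output states) that the 8-byte "ip27conf" marker is followed by three big-endian 64-bit values. The names `cpu_hz`, `hub_hz` and `third` are labels chosen for illustration. The marker address is 0x68 bytes past the RAM address the PROM was copied to (0x9600000001a00000 in the log).

```python
import struct

# Bytes copied from the "lb 0x9600000001a00068 32" output above.
dump = bytes.fromhex(
    "69 70 32 37 63 6f 6e 66 00 00 00 00 2f af 08 00 "
    "00 00 00 00 0b eb c2 00 00 00 00 00 00 13 12 d0"
)

# Assumed layout: 8-byte ASCII marker, then three big-endian uint64 values.
marker, cpu_hz, hub_hz, third = struct.unpack(">8sQQQ", dump)

# Offset of the marker inside the RAM copy of the PROM.
marker_offset = 0x9600000001A00068 - 0x9600000001A00000

print("marker       :", marker.decode("ascii"))
print("marker offset:", hex(marker_offset))
print("value 1      :", cpu_hz, "(possibly CPU clock in Hz)")
print("value 2      :", hub_hz, "(possibly hub clock in Hz)")
print("value 3      :", third, "(meaning not identified here)")
```

Under that assumption the second value is 200,000,000, which lines up with the "Configured bedrock clock: 200.0 MHz" line in the boot log, and the first is 800,000,000, which would be plausible as a CPU clock in Hz.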
And an example of L1 debug flag impact on Fuel boot
Here is a log of booting the Fuel with debug = 0x10d & debug = 0x7890. This log was captured from the L2 emulator...
---
--- OK, start up with the shortest boot possible: debug == 0x7890
---
001a01:
debug switches set to 0x7890
?-XXX.XXX.XXX.143-L2>l1 power up
?-XXX.XXX.XXX.143-L2>
entering system console mode (001a01 CPU0), <CTRL_T> to escape to L2
*** DIP switch 15 set. Will skip IO and NUMAlink discovery
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
Running in DDR mode
*** Warning: System controller debug switches are non-zero (0x7890)
*** Boot stop requested at Local (1)
*** Giving up global master status
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... DONE
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5886 usec
Waiting for peers to complete discovery.... DONE
No other nodes present; becoming global master
Global master is /hw/rack/001/bay/01
Intializing any CPUless nodes.............. DONE
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
Suppressing error state display (system just powered on).
A 000 001c01:
A 000 001c01: *** Entering POD mode on node 0
A 000 001c01: POD SysCt Cac>
---
--- OK, now escape back to L2, set the debug to 0x10d and
--- reboot the Fuel (via POD/CAC "reset").
--- This results in a much more complete boot process and hence lots
--- more diagnostic output.
---
escaping to L2 system controller
?-XXX.XXX.XXX.143-L2>debug 0x10d
001a01:
debug switches set to 0x010d
re-entering system console mode (001a01 CPU0), <CTRL_T> to escape to L2
A 000 001c01: POD SysCt Cac> reset
A 000 001c01: Resetting the system...
Starting PROM Boot process
hubii_link_good: A-brick attached to module 001c01.
HUB at 0x0 attached as widget 0xa
001c01/0xa/xbow_arb: nasid= 0x0 xbow_base= 0x9200000000000000
001c01/0xa/xbow_arb: 622 master is 0xa
Check_master: link 10 is master
hubii_link_good: A-brick attached to module 001c01.
Check_master: link 10 is master
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
built for bedrock rev. 1.1 or greater
running in IP34 mode
Running in DDR mode
Local master CPU A revision: f42
PROM length: 0x168648, BSS length: 0xa7a0, flash count: 16
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x1 0x3fc0fff6403
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xffffffffbfc00300 (0xc00000001fc00300)
Configuring memory
Local memory configured: 4096 MB (premium)
*** Warning: System controller debug switches are non-zero (0x10d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
before reading NICHub NIC: 0x52275dad
SR1 set to 0x0000081698349000
SR0 set to 0x0000000052275dad
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x168648
Done
DONE
Skipping secondary cache diags
CPU A switching stack into UALIAS and invalidating D-cache
CPU A switching into node 0 cached RAM
CPU A running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
Discovering local IO ...................... Check_master: link 10 is master
Check_master: link 10 is master
DONE
CPU A initialized subnode
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5889 usec
Waiting for peers to complete discovery.... Discovery results:
ENTRY 0: HUB(52275dad)
NASID=-1 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x52275dad, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315)Global barrier passed.
Global barrier (line 4348)Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
ENTRY 0: HUB(52275dad)
NASID=-1 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(52275dad)
NASID=0 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU A switching to UALIAS
CPU A running in UALIAS
Changing node ID to 0
Global barrier (line 4823)Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928)Global barrier passed.
CPU A switching to node 0 cached RAM
CPU A running cached
Nasids in partition: +0
Regions in partition: +0
Intializing any CPUless nodes.............. Global barrier (line 7714)Global barrier passed.
Global barrier (line 7715)Global barrier passed.
DONE
Global barrier (line 5089)Global barrier passed.
hubii_link_good: A-brick attached to module 001c01.
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(52275dad)
NASID=0 Mod=1 Flg=0x9500000 PROM=6.211 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: FE
Erecting partition fences ................ DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300)Global barrier passed.
Nasids in partition: +0
Regions in partition: +0
A 000 001c01:
A 000 001c01: *** Entering POD mode on node 0
A 000 001c01: POD SysCt Cac>
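To compare the two debug values used in the logs above, it helps to see which bits are set in each. The sketch below numbers switches from 1 (bit 0 = switch 1). That numbering is an assumption, inferred from the "DIP switch 15 set" message that appears with 0x7890, where bit 14 is set and bit 15 is clear. The function name `set_switches` is made up for illustration, and the meaning of the other switches is not given in these logs.

```python
def set_switches(value, width=16):
    """Return the 1-based positions of the bits set in a debug-switch word.

    Assumption: switch N corresponds to bit N-1 of the value.
    """
    return [bit + 1 for bit in range(width) if value & (1 << bit)]


for word in (0x7890, 0x010D):
    print(f"0x{word:04x} = {word:016b} -> switches {set_switches(word)}")

# Switches that differ between the short boot and the verbose boot.
changed = set_switches(0x7890 ^ 0x010D)
print("switches that differ:", changed)
```

With this numbering, 0x7890 includes switch 15 (the one the PROM reports as skipping IO and NUMAlink discovery), while 0x10d does not.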
Onyx 350 Console Boot Log
Sometimes you need to do a console-only boot of a graphics workstation (such as the Onyx 350). To do this:
- Take out keyboard and mouse
- Power up
- Get to the System Maintenance Menu (may need to press "ESC" on startup)
- Go into the Command Monitor from the "System Maintenance Menu"
- Then do: "setenv console d" and boot into "single" mode
Here is a log trace of a console boot via L2:
$ telnet L2HOST
Trying 192.168.XXX.140...
Connected to L2HOST.in.DOMAIN.com.
Escape character is '^]'.
Linux 2.4.7-sgil2 (192.168.XXX.140) (ttyp0)
SGI L2 Controller
INFO: connection established to localhost, to quit enter <ctrl-]> <q>
MX00XXXX-001-L2>config
L2 192.168.XXX.140: - 0001 (LOCAL)
L1 192.168.XXX.140:0:0 - 001r06
MX00XXXX-001-L2>INFO: opened USB device at b2;p2/5;d4 (/dev/sgil1_1)
INFO: opened USB device at b2;p2/2;d5 (/dev/sgil1_2)
INFO: opened USB device at b1;p2;d2 (/dev/sgil1_3)
INFO: opened USB device at b1;p1;d3 (/dev/sgil1_4)
INFO: opened USB device at b2;p2/3;d6 (/dev/sgil1_5)
INFO: opened USB device at b2;p2/4;d7 (/dev/sgil1_6)
MX00XXXX-001-L2>config
L2 192.168.XXX.140: - 0001 (LOCAL)
L1 192.168.XXX.140:4:0 - 001c07
L1 192.168.XXX.140:0:0 - 001r06
L1 192.168.XXX.140:3:0 - 001c05
L1 192.168.XXX.140:2:0 - 001c04
L1 192.168.XXX.140:1:0 - 001c03
L1 192.168.XXX.140:5:0 - 001c02
L1 192.168.XXX.140:6:0 - 001c01
MX00XXXX-001-L2>power up
MX00XXXX-001-L2>
entering system console mode (001c01 CPU0), <CTRL_T> to escape to L2
Starting PROM Boot process
Starting PROM Boot process
Starting PROM Boot process
Starting PROM Boot process
Starting PROM Boot process
Starting PROM Boot process
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
IP35 PROM SGI Version 6.211 built 04:16:18 PM Jan 25, 2008
Testing/Initializing memory ........Testing/Initializing memory ........Testing/Initializing memory ........Testing/Initializing memory ........Testing/Initializing memory ........Testing/Initializing memory .................................................. DONE
Copying PROM code to memory ............... DONE
Copying PROM code to memory ............... DONE
DONE
DONE
Copying PROM code to memory ............... DONE
Copying PROM code to memory ............... DONE
Copying PROM code to memory ............... DONE
Copying PROM code to memory ............... DONE
DONE
DONE
DONE
Discovering local IO ...................... DONE
Discovering NUMAlink connectivity ......... Discovering local IO ...................... DONE
Found 7 objects (6 hubs, 1 routers) in 138692 usec
Waiting for peers to complete discovery.... DONE
Discovering NUMAlink connectivity ......... DONE
Found 7 objects (6 hubs, 1 routers) in 52050 usec
Waiting for peers to complete discovery.... Discovering local IO ...................... Discovering local IO ...................... Discovering local IO ...................... DONE
Discovering NUMAlink connectivity ......... DONE
Found 7 objects (6 hubs, 1 routers) in 52051 usec
Waiting for peers to complete discovery.... DONE
Discovering NUMAlink connectivity ......... DONE
Found 7 objects (6 hubs, 1 routers) in 52052 usec
Waiting for peers to complete discovery.... Discovering local IO ......................
pcibus_sanity: WARNING ** Target IOC3 not initialized . . forcing initialization **
DONE
Discovering NUMAlink connectivity ......... DONE
Found 7 objects (6 hubs, 1 routers) in 52050 usec
Waiting for peers to complete discovery.... DONE
Discovering NUMAlink connectivity ......... DONE
Found 7 objects (6 hubs, 1 routers) in 52049 usec
Waiting for peers to complete discovery.... DONE
Global master is /hw/rack/001/bay/01
DONE
Global master is /hw/rack/001/bay/01
DONE
Global master is /hw/rack/001/bay/01
DONE
Global master is /hw/rack/001/bay/01
DONE
Global master is /hw/rack/001/bay/01
DONE
Global master is /hw/rack/001/bay/01
Intializing any CPUless nodes.............. DONE
Checking partitioning information ......... Checking partitioning information ......... Checking partitioning information ......... Checking partitioning information ......... Checking partitioning information ......... Checking partitioning information ......... DONE
DONE
DONE
DONE
DONE
DONE
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local master entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local master entering slave loop
Local slave entering slave loop
Local master entering slave loop
Local master entering slave loop
Loading BASEIO prom ....................... Local master entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
DONE
BASEIO PROM Monitor SGI Version 6.211 built 04:15:20 PM Jan 25, 2008 (BE64)
24 CPUs on 6 nodes found.
Automatic update of PROM environment disabled
Graphics diagnostics
Odyssey board #0 found on nasid 3
Running Odyssey xtalk sanity diag...
Board version 1 - Buzz revision 3B
On board sdram size: 128 Mb
Cas latency: CAS 3
4 banks by sdram module
Running Odyssey Buzz registers diag...
Device passed diagnostics
Installing PROM Device drivers ............
On-board (IO9) tigon3 1000BaseT interface
Base I/O Ethernet set to /dev/ethernet/tg0
Installing Graphics Console...
graphics install: searching for pipe 0
Probing IOC4 ATA adapter 2
IOC4 RevId = 83
Detected Vendor id/Product MATSHITA DVD-ROM SR-8178
Walking SCSI Adapter 0, (pci id 3)
1+ Device Vendor Product: ATA SCSIDE BRIDGE320
2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 1 device(s)
Walking SCSI Adapter 1, (pci id 3)
1- 2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 0 device(s)
Initializing PROM Device drivers ..........
Initializing Base I/O Ethernet Interface...Failed. MII Status Register = 0x7949
Done.
---------------Interface Configuration Summary----------------
ASIC|Revision|MAC Address : 5701|B5|08:00:69:11:e9:d0
Link Negotiation|Advertisement : On|<H10 F10 H100 F100 F1000>
Link|Speed|Duplex|Rx/Tx FlowCtrl: Down|10|Half|Off/Off
--------------------------------------------------------------
DONE
Cannot connect to keyboard -- check the cable.
Cannot open /dev/input/ioc4pckm0 for input
Cannot connect to keyboard -- check the cable.
Cannot open /dev/input/ioc4pckm0 for input
Checking hardware inventory ............... DONE
**** System Configuration and Diagnostics Summary ****
CONFIG:
No. of NODEs enabled = 6
No. of NODEs disabled = 0
No. of CPUs enabled = 24
No. of CPUs disabled = 0
Mem enabled = 49152 MB
Mem disabled = 0 MB
No. of RTRs enabled = 1
No. of RTRs disabled = 0
DIAG RESULTS:
ALL DIAGS PASSED.
**** End System Configuration and Diagnostics Summary ****
System Maintenance Menu
1) Start System
2) Install System Software
3) Run Diagnostics
4) Recover System
5) Enter Command Monitor
Option? 5
Command Monitor. Type "exit" to return to the menu.
>> hinv
System SGI-IP35
8 800 MHz IP35 Processors
16 1.0 GHz IP35 Processors
Main memory size: 49152 Mbytes
Graphics Controller
PCI IOC4: in slot 1, (adapter 0)
PCI IOC4: in slot 1
PCI Gigabit Ethernet (tigon3) Controller 2
PCI IOC4: in slot 1
PCI Gigabit Ethernet (tigon3) Controller 3
PCI IOC4: in slot 1
USB (OHCI interface)
USB (OHCI interface)
PCI Gigabit Ethernet (tigon3) Controller 4
PCI IOC4: in slot 1
PCI Gigabit Ethernet (tigon3) Controller 5
PCI IOC4: in slot 1
USB (OHCI interface)
USB (OHCI interface)
PCI Gigabit Ethernet (tigon3) Controller 6
PCI Gigabit Ethernet (tigon3) Controller 1
Integral SCSI controller 2: Version IOC4 ATA
CDROM: unit 0 on SCSI Controller 2, (cdrom(2,0,7))
Integral SCSI controller 0: Version Qlogic 12160
Disk drive: unit 1 on SCSI Controller 0, (dksc(0,1,0))
Integral SCSI controller 1: Version Qlogic 12160
>> ls
dksc(0,1,8)/:
sgilabel symmon sash
dksc(0,1,0)/:
. .. root JET CLONE BLUE .desktop-O350HOST.in.DOMAIN.com CDROM floppy
nsmail TT_DB INSTALL dumpster Desktop hosts proc lib64 lib32 stand
sbin opt ns lib dev usr var etc hw bin .sh_history debug
.Sgiresources fonts.scale fonts.dir unix tmp
>> ls dksc(0,1,0)/stand
dksc(0,1,0)/stand:
. .. fx
>> setenv console d
>> single
Starting up the system in single user mode...
Loading dksc(0,1,8)/sash: 896+111764+16853+3848 entry: 0xa8000006012a6ee4
6978191+1541488+1205840 entry: 0xa800000600041b10
IRIX Release 6.5 IP35 Version 07202013 System V - 64 Bit
Copyright 1987-2006 Silicon Graphics, Inc.
All Rights Reserved.
mem_alloc: pagesize is 1048576
Inside mem_alloc_init, total pages is 200
mem_alloc: path to MA device is /hw/mem_alloc
mem_alloc: path /hw/mem_alloc added
mem_alloc: name of MA device 0 is 0
mem_alloc: device 0 added under /hw/mem_alloc
mem_alloc: 0 pgs allocated, each 1048576 bytes
priority_lists initialized
Returning 0 from mem_alloc_init
Scanning FireWire bus /hw/module/001c03/IXbrick/xtalk/15/pci-x/1/2/ohci/0 (1 node)
FireWire Node [0]: APPLE COMPUTER INC., iSight
NOTICE: /hw/module/001c02/IXbrick/xtalk/15/pci-x/1/1a/scsi_ctlr/0: 949X fibre channel firmware version 1.3.24.0
NOTICE: /hw/module/001c02/IXbrick/xtalk/15/pci-x/1/1b/scsi_ctlr/0: 949X fibre channel firmware version 1.3.24.0
NOTICE: /hw/module/001c07/IXbrick/xtalk/15/pci-x/1/2/scsi_ctlr/0: 1068 SAS/SATA firmware version 1.33.0.0
NOTICE: /hw/module/001c05/IXbrick/xtalk/15/pci-x/1/2/scsi_ctlr/0: 1068 SAS/SATA firmware version 1.33.0.0
Selecting IO9 baseio
NOTICE: Starting failsoftd
NOTICE: 10 Gigabit Ethernet: xg1, module 001c02, 100 MHz PCIX bus 2 slot 2
dksc9d2vol: Device not ready, spinning up
dksc9d2vol: Device spun up successfully
xvm is processing the failover configuration
NOTICE: XVM mirrors disabled
NOTICE: XVM snapshot disabled
xvminit complete
INIT: SINGLE USER MODE
Type Ctrl-d to proceed with normal startup,
(or give root password for Single User Mode):
Entering Single User Mode
TERM = (iris-tp)
# ls
Desktop
WIN95-root.hdf
dmconf-err-01.txt
dmconf-err-02.txt
dmconf-err-03.txt
dmconf-succcess-01.txt
dmconf-succcess-02.txt
gfxinfo-test-01.txt
gfxinfo-v-01.txt
maya
nvram-01.txt
src
test.sum
xtdigvid_confidence.log1.18-01-22_00.24
#
NOTE: The IP35 (Chimera) PROM reads both the volume header on "dksc(0,1,8)", which holds the "standalone shell" (sash), and the root XFS file-system on "dksc(0,1,0)", which holds the Unix kernel (unix).
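The PROM device names in the log follow the pattern dksc(controller,unit,partition). The hinv line "unit 1 on SCSI Controller 0, (dksc(0,1,0))" shows the first two fields, and the third is the partition (8 for the volume header, 0 for root in this log). A small sketch for splitting such a path, with the function name `parse_dksc` made up for illustration:

```python
import re

# dksc(<controller>,<unit>,<partition>) optionally followed by /file
DKSC = re.compile(r"^dksc\((\d+),(\d+),(\d+)\)(.*)$")


def parse_dksc(path):
    """Split a PROM disk path such as 'dksc(0,1,8)/sash' into its parts."""
    match = DKSC.match(path)
    if match is None:
        raise ValueError(f"not a dksc() path: {path!r}")
    controller, unit, partition = (int(g) for g in match.groups()[:3])
    return {
        "controller": controller,
        "unit": unit,
        "partition": partition,
        "file": match.group(4).lstrip("/"),
    }


for p in ("dksc(0,1,8)/sash", "dksc(0,1,0)/unix"):
    print(p, "->", parse_dksc(p))
```

Both paths point at the same disk (controller 0, unit 1); only the partition differs.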
That is that.
That's all folks .....
NOTE: An O350 Chimera machine with Graphics reports as a "ChiBlade", hence the sword graphics, which are from: "The Complete History Of The Japanese Samurai Sword".