Blog from August, 2011

On Thursday 01 September 2011, between 07:00 and 08:00 local Bremen time, servers and network components will be updated.

Server and network operations will be unreliable. Expect unavailable servers and sudden downtimes of services without prior explicit announcement.

If you have to work on files stored on the server during that time, copy them to your local hard disk before the maintenance hour, work on them locally, and copy them back onto the server once you have finished and the maintenance interval has ended.
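The copy-out and copy-back steps described above can also be scripted. The following is a minimal sketch, assuming Python 3.8 or newer; the function names and the example paths in the comments are hypothetical and not part of any official tooling.

```python
import shutil
from pathlib import Path


def copy_to_local(server_dir: Path, local_dir: Path) -> None:
    """Copy the working files from the server share to the local disk."""
    shutil.copytree(server_dir, local_dir, dirs_exist_ok=True)


def copy_back(local_dir: Path, server_dir: Path) -> None:
    """Copy the locally edited files back onto the server share."""
    shutil.copytree(local_dir, server_dir, dirs_exist_ok=True)


# Hypothetical usage (paths are assumptions for illustration only):
#   before 07:00:  copy_to_local(Path(r"J:\project"), Path(r"C:\work\project"))
#   after 08:00:   copy_back(Path(r"C:\work\project"), Path(r"J:\project"))
```

Existing files with the same name are overwritten in the target directory, so run the copy back only after the maintenance interval has ended and your local edits are complete.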

For more details on the maintenance hours see the Maintenance Hours page.

Network service interruption

The interruption was caused primarily by a design flaw in the physical setup of the virtual infrastructure hardware.
The four machines operating the VMware vSphere system are redundantly connected to mains, with one line running over a UPS. All machines are connected redundantly to two network switches. These switches do not support a redundant power supply connection and are therefore connected to a UPS line.

According to the logging facility of our UPS, a short outage occurred this morning at around 7:15 a.m. It apparently caused no further harm apart from a tripped circuit breaker. The design flaw was that both network switches for the virtual infrastructure were connected to the same breaker downstream of the UPS. As a result, the virtual infrastructure was isolated from the outside.

After the breaker was reset at about 7:45 a.m., the switches reconnected the virtual hosts.

The network switches are now connected to separate circuits to avoid this scenario in the future.
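This kind of shared dependency can be checked mechanically. The following is a minimal sketch, assuming the wiring is recorded as a simple device-to-breaker mapping; the device and breaker names are hypothetical and do not come from the actual installation.

```python
from collections import defaultdict


def shared_breakers(redundant_group, breaker_of):
    """Return the breakers that feed more than one device of a redundant group."""
    devices_by_breaker = defaultdict(list)
    for device in redundant_group:
        devices_by_breaker[breaker_of[device]].append(device)
    return {b: devs for b, devs in devices_by_breaker.items() if len(devs) > 1}


switches = ["switch-a", "switch-b"]  # hypothetical names

# Wiring before the fix: both switches behind the same breaker.
before = {"switch-a": "ups-breaker-1", "switch-b": "ups-breaker-1"}
# Wiring after the fix: one breaker per switch.
after = {"switch-a": "ups-breaker-1", "switch-b": "ups-breaker-2"}

print(shared_breakers(switches, before))  # {'ups-breaker-1': ['switch-a', 'switch-b']}
print(shared_breakers(switches, after))   # {}
```

An empty result means no single breaker can take down the whole redundant group.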

File Server Failure

Due to an unknown error, both nodes of the file server cluster are currently unavailable.

Access to \\storage.jacobs.jacobs-university and to the J:\ and H:\ drives is currently not possible.

Update 09:15

We opened a support case with Dell at the highest priority to help us solve the problem.

Update 16:15

Operational status has been restored. All resources are available again. No stored data was lost.

We'll keep an eye on the system over the weekend.

Sorry for the interruption of service on such an important day!

(info) Trivia: This outage of ~8.5 h lowered the availability of the file server service to ~99.96% over the last 2.5 years.
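The availability figure can be checked with a short calculation. This is a minimal sketch, assuming a year of 365 days and no other downtime during the 2.5 years; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: leap days ignored


def availability(downtime_hours: float, period_hours: float) -> float:
    """Return availability in percent for a given downtime within a period."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


period = 2.5 * HOURS_PER_YEAR  # 21900 hours
print(f"{availability(8.5, period):.2f}%")  # 99.96%
```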

On Thursday 25 August 2011, between 07:00 and 08:00 local Bremen time, servers and network components will be updated.

Server and network operations will be unreliable. Expect unavailable servers and sudden downtimes of services without prior explicit announcement.

If you have to work on files stored on the server during that time, copy them to your local hard disk before the maintenance hour, work on them locally, and copy them back onto the server once you have finished and the maintenance interval has ended.

For more details on the maintenance hours see the Maintenance Hours page.

On Thursday 18 August 2011, between 07:00 and 08:00 local Bremen time, servers and network components will be updated.

Server and network operations will be unreliable. Expect unavailable servers and sudden downtimes of services without prior explicit announcement.

If you have to work on files stored on the server during that time, copy them to your local hard disk before the maintenance hour, work on them locally, and copy them back onto the server once you have finished and the maintenance interval has ended.

For more details on the maintenance hours see the Maintenance Hours page.

On Thursday 11 August 2011, between 07:00 and 08:00 local Bremen time, servers and network components will be updated.

Server and network operations will be unreliable. Expect unavailable servers and sudden downtimes of services without prior explicit announcement.

If you have to work on files stored on the server during that time, copy them to your local hard disk before the maintenance hour, work on them locally, and copy them back onto the server once you have finished and the maintenance interval has ended.

For more details on the maintenance hours see the Maintenance Hours page.

On Thursday 04 August 2011, between 07:00 and 08:00 local Bremen time, servers and network components will be updated.

Server and network operations will be unreliable. Expect unavailable servers and sudden downtimes of services without prior explicit announcement.

If you have to work on files stored on the server during that time, copy them to your local hard disk before the maintenance hour, work on them locally, and copy them back onto the server once you have finished and the maintenance interval has ended.

For more details on the maintenance hours see the Maintenance Hours page.

This is an automatically generated measurement of key performance indicators (KPIs) of Application Provisioning Services for July 2011.

The list of other KPI measurement reports is at KPI Measurements.

KPI Summary Table

| KPIs Specified | Not Measured (warning) | Targets Met (plus) | Targets Failed (minus) |
|---|---|---|---|
| 13 | 0 | 12 | 1 |

KPI Measurements

If the specification is met, the Met column shows a (plus), otherwise a (minus); KPIs that were not measured show a (warning).

| SLA | Host | Service | Target | Measured | Met |
|---|---|---|---|---|---|
| Remote Login Shell Service | login | SSH | 99.000000% | 100.0% | (plus) |
| Teamwork Service | hermia | HTTP | 99.000000% | 99.998% | (plus) |
|  |  | HTTPS | 99.000000% | 97.759% | (minus) |
| Faculty Web Service | facultyweb | FTP | 99.000000% | 100.0% | (plus) |
|  |  | HTTP | 99.000000% | 100.0% | (plus) |
|  |  | HTTPS | 99.000000% | 100.0% | (plus) |
| SJIRA01 | sjira01 | HTTP | 99.000000% | 100.0% | (plus) |
|  |  | HTTPS | 99.000000% | 100.0% | (plus) |
|  |  | SSH | 99.000000% | 100.0% | (plus) |
| Alumni Email Server Service | helena | HTTP | 99.000000% | 100.0% | (plus) |
|  |  | POP3 | 99.000000% | 100.0% | (plus) |
|  |  | IMAP | 99.000000% | 100.0% | (plus) |
|  |  | SMTP | 99.000000% | 100.0% | (plus) |
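The counts in the KPI Summary Table can be reproduced from the measurements above. This is a minimal sketch, assuming a target counts as met when the measured value is greater than or equal to it (the report does not state the exact comparison); the function and variable names are hypothetical.

```python
from collections import Counter

TARGET = 99.0  # percent, the target of every KPI in the table

# (host, service, measured availability in percent), taken from the table above.
measurements = [
    ("login", "SSH", 100.0),
    ("hermia", "HTTP", 99.998),
    ("hermia", "HTTPS", 97.759),
    ("facultyweb", "FTP", 100.0),
    ("facultyweb", "HTTP", 100.0),
    ("facultyweb", "HTTPS", 100.0),
    ("sjira01", "HTTP", 100.0),
    ("sjira01", "HTTPS", 100.0),
    ("sjira01", "SSH", 100.0),
    ("helena", "HTTP", 100.0),
    ("helena", "POP3", 100.0),
    ("helena", "IMAP", 100.0),
    ("helena", "SMTP", 100.0),
]


def status(measured, target=TARGET):
    """Map a measurement to the marker used in the Met column."""
    if measured is None:  # KPI was not measured
        return "warning"
    # Assumption: "met" means measured >= target.
    return "plus" if measured >= target else "minus"


summary = Counter(status(m) for _, _, m in measurements)
print(len(measurements), summary["warning"], summary["plus"], summary["minus"])
# 13 0 12 1
```

The only failed target is HTTPS on hermia, measured at 97.759%.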

Additional Services Information

The following Server Hosting Services are measured as a convenience for the Service Customer to ease service maintenance.
IRC-IT is not responsible for ensuring quality and achieving KPIs of these services.

| SLA | Host | Service | Measured |
|---|---|---|---|
| CampusNet Server Service | scnweb01 | SNMP | 100.0% |
|  | scnapp01 | SNMP | 100.0% |
|  | scnsql01 | SNMP | 100.0% |
| WebServer Service | swebsrv01 | HTTP | 99.949% |
|  |  | HTTPS | 99.954% |
|  |  | SSH | 100.0% |
|  | swebdev01 | HTTP | 100.0% |
|  |  | HTTPS | (not measured) |
|  |  | SSH | 100.0% |
|  | swebsvn01 | HTTP | (not measured) |
|  |  | HTTPS | (not measured) |
|  |  | SSH | 100.0% |
| Schomäcker Server Service | sschomkr01 | SNMP | 100.0% |
|  |  | MSSQL | 100.0% |
| GiroWeb Server Service | sgiroweb01 | SNMP | 100.0% |
|  |  | HTTP | 100.0% |
|  |  | MSSQL | 100.0% |
| Alumni Application Server Service | alumniweb | HTTP | 100.0% |
|  |  | HTTPS | (not measured) |
| Career Service Center Server Service | csc | HTTP | 100.0% |
|  |  | SSH | 100.0% |
| SLA Counseling Center | scouncil | HTTP | (not measured) |
|  |  | SSH | (not measured) |
| ALEA Server Service | alea | HTTP | 100.0% |
| IRC-IR Institutional Repository DSPACE | sdspace | HTTP | 100.0% |
|  |  | HTTPS | (not measured) |
|  |  | SSH | 100.0% |
| SLA Torrent Seeder Geoscience SOPENDTECT | sopendtect | SSH | 100.0% |
| Graduate Student Association Server SLA | sgsa | HTTP | 99.817% |
|  |  | SSH | 99.784% |