Unexpected Site Outage Brief | March 31st - April 1st

Michael

Rank IX
Staff member
Founder 500
Member

Benefactor

15,584
Auburn, California, United States
First Name
Michael
Last Name
Murguia
Member #

0000

Ham/GMRS Callsign
KM6YSL
The Overland Bound site experienced an unexpected, maintenance-related outage from March 31st to April 1st.

The outage occurred because Amazon Web Services terminated our service after flagging some activity related to our access keys as malicious. It was not.

In the background, we are doing major site updates, and these require us to reconfigure backend services at AWS. That reconfiguration is the activity that led to the termination of services.

What ensued was a two-day process of proving the activity was not malicious in order to restore services.

The response by AWS, taken without notice, seems extreme to anyone working to resolve the issue. The subsequent support process was extremely inefficient given the severity of the impact to our community.

We thank you for your patience during this unexpected outage.

Michael
 

El-Dracho

Ambassador, Europe
Moderator
Member
Supporter
Investor

Educator III

13,874
Lampertheim, Germany
First Name
Bjoern
Last Name
Eldracher
Member #

20111

Ham/GMRS Callsign
DO3BE
Thank you for the update, @Michael. Good to see the site up and running again. Thanks to you and the team for the good work to make it happen.
 
  • Like
Reactions: Michael

highboy4x4

Rank V
Member

Enthusiast III

1,872
Naples, FL, USA
First Name
Russ
Last Name
Derr
Member #

32418

Service Branch
Army (ret)
An egregious knee-jerk reaction that severely impacts a person's trust in Amazon's services and support.
My two cents, and not trying to be obtuse, but in the Cyber Security realm I came from, this action, based on what little info we were given, is an inconvenience, yes, but on par with security protocols. Our CS decisions are always difficult, as the fine line between the security and integrity of the enclave and the user experience is at the forefront. And sometimes users suffer. In this day and age, it’s a fact of life.
I will say that it “seems” sketchy to make one pay for a value-added service during an outage. Too many variables to blame AWS outright. Only the team involved in the fix would know the particulars.
(For more info on Cyber Security (CS), look up the MCSE+, CCNP, CISSP and CISM certs.)

If you want change from my two cents, please send your Zelle info and I will send you a penny!!
 
  • Like
Reactions: Michael

lolzhax

Rank III
Member

Enthusiast III

740
Roseville, CA, USA
First Name
Eric
Last Name
Walley
Member #

26397

I also work in network and software security for a large cloud software corp, and a business-impacting outage is a severity 1 case with a 4-hour response time and people on standby to provide hourly updates to the client.

It's simple to match logs to actions taken by the customer. 24 hours of downtime due to a simple misunderstanding is unacceptable in 2025.
 

highboy4x4

I’m with ya in principle all day long but my point was from the outside looking in. I have learned not to jump before looking at root causes. Only Mike and his team know the finer points. And if that includes you, send me your Zelle info!
 
  • Haha
  • Like
Reactions: Michael and lolzhax

Michael

lolzhax said: "I also work in network and software security for a large cloud software corp and a business impacting outage is a severity 1 case with a 4 hour response time, and people on standby to provide hourly updates to the client. It's simple to match logs to actions taken by the customer. 24 hours of downtime due to a simple misunderstanding is unacceptable in 2025."
AGREE!
 

Michael

Ya agree about payment for support. Removed my comment because personal feelings and all, but that was salt in the wound!
 
  • Like
Reactions: highboy4x4