AT&T Network Down

Van Living Forum


INTJohn

I woke this morning to no Wi-Fi connection; just the SOS Emergency Contact signal. It worked fine when I went to sleep.

Immediately wondered: is it on my end (my fone messed up) or on AT&T’s end? Happened to remember there was an AT&T dealer here in Ormond Beach and drove there to find a small crowd waiting at the door for the place to open.

This was my clue that the problem wasn’t on my end. 😂 An employee came outside to give the official status of the issue. I was like, cool, so now I just wait till AT&T gets the thing fixed. Sorta like waiting for a storm power outage to be repaired.

My connection was restored sometime around 11:30 am, but apparently about 25% of the network was still down. I’m sure there were massive repercussions for many, but it had very little impact on me personally, as Wi-Fi and cell fone stuff is not a necessary part of my life. Mostly recreational.
Just my morning’s experience
INTJohn
 
My daughter called to see if I use AT&T. She said the news reported that AT&T customers had known about the maintenance downtime ahead of time. No? Or... you missed the memo? LOL
 
A couple of years ago a software update was done on the Rogers communication system in Canada, which resulted in a complete internet shutdown in most of British Columbia and Alberta. This meant that no one could get cash from an ATM or use their CC at fuel stations; their cards couldn’t be verified. Fortunately, I always carry Canadian cash when I go up north. It took about 3 days to get the internet back online.

A lot of RVers heading to Alaska had said, “I don’t get Canadian cash when going to Alaska because the CC will work everywhere.” I was able to get gas since I had cash. The others just had to sit at the fuel stations for several days.
 
No idea where that came from. Ma

Ha! That's what I read earlier. A professor said he'd bet on human error.
My working years were spent as a software engineer for a major software house. I’ve experienced bad upgrades in the past, and this had all the markings of a poor software rollout. Like I said, been there, done that. Sorry for the size of my reply; it does get a bit detailed from this point.

1) The problem first appeared at 2:00 AM. Most software upgrades are rolled out in the middle of the night because that’s when the fewest people are using the system.

2) The problem spread internally over their network. Again, this is a symptom of a deteriorating software service: once one service gets overwhelmed, other services start shutting down.

3) Because of that, restarting servers failed. Other servers were overloaded, causing connections to fail with denial-of-service errors.

And so it goes. Internally, the folks at AT&T probably spent some time trying to correct the issue until, around noon, they shut things down completely and started rolling their systems back. Or at least that’s what I would have done. I suspect that activity from 2:00 AM forward generated accounting updates, and they were concerned they would lose that data by reverting.

You generally get a period of time from the managers to try to fix the problem with the upgrade, but at some point you’re forced to roll back. Rolling back can be a huge problem because you have to restore a lot of data on all of the servers to make a clean break from the effects of the upgrade, and that can take hours. I suspect the upgrade also required some changes to data structures, and those would have had to be rolled back as well.

The best way to do an upgrade is to have the old system sitting there, idle but ready. That way, if you have to revert, all you have to do is switch servers back. But in order to do that, you would have had to have a replication system in place between the old and new systems, so that any data updates made by users on the new system, such as calls made, accounting, etc., would be replicated to the old system in a format the old system understood. This requires a significant amount of infrastructure to be in place, so they probably didn’t have it. If that is the case, then I suspect that some of the records from 2:00 AM forward are lost, and AT&T will come out with a statement on the order of “There will be no charge for any calls made between 2:00 AM and noon.” We’ll see.
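
To make that “keep the old system warm” idea concrete, here is a minimal sketch, purely hypothetical: nothing in it is AT&T’s actual code, and every name (NewCallRecord, DualWriteStore, etc.) is invented. It shows a dual-write hook that translates each update made on the new system into the old system’s record format, so switching traffic back to the old servers loses nothing.

```python
# Hypothetical sketch of the "keep the old system warm" rollback strategy.
# Record layouts and names are invented for illustration only.

from dataclasses import dataclass

@dataclass
class NewCallRecord:          # format the upgraded system writes
    subscriber_id: str
    started_at: float         # epoch seconds
    duration_s: int
    cell_site: str

@dataclass
class OldCallRecord:          # format the legacy system understands
    subscriber: str
    start_epoch: int
    seconds: int

def to_old_format(rec: NewCallRecord) -> OldCallRecord:
    """Translate a new-format record into something the old system can store."""
    return OldCallRecord(
        subscriber=rec.subscriber_id,
        start_epoch=int(rec.started_at),
        seconds=rec.duration_s,
        # cell_site is dropped: the old schema has no field for it
    )

class DualWriteStore:
    """Write every update to the new system AND replicate it to the old one,
    so flipping traffic back to the old servers loses no data."""

    def __init__(self, new_store: list, old_store: list):
        self.new_store = new_store
        self.old_store = old_store

    def record_call(self, rec: NewCallRecord) -> None:
        self.new_store.append(rec)                 # primary write
        self.old_store.append(to_old_format(rec))  # replication for rollback
```

With something like this running, “rolling back” is just repointing traffic at the old servers. Without it, you are stuck restoring data machine by machine, which is how a 2:00 AM glitch turns into an all-morning outage.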
 
^^^^ continued from my earlier post…. *** Warning detailed stuff ***. Ignore if not interested…

I’m going to assume that there was a difference in the data structures of the old and new systems (which is often the case, and which I suspect, given that it took so long to correct). One way to avoid a slow rollback is to create a level of virtualization in how the data is presented to the software code. This is usually done in the IO complex.

If this is done, then it becomes possible to roll out a new application while versioning the expected presentation of the data. That way you can delay the actual physical conversion until absolutely necessary. But even supporting such virtualization requires that the system was implemented with the idea that the physical representation of the data is not the same as its presentation.
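
As a hedged illustration of that versioned-presentation idea, here is a small sketch of my own (the field names and JSON layout are made up): the IO layer tags each stored record with a format version and always hands the application one canonical shape, so the physical conversion can be deferred and reverted cheaply.

```python
import json

# Hypothetical versioned storage: each raw record carries a "fmt" tag.
# v1 is the old on-disk layout, v2 the new one; application code only ever
# sees the canonical dict returned by read_record().

def read_record(raw_bytes: bytes) -> dict:
    raw = json.loads(raw_bytes)
    fmt = raw.get("fmt", 1)
    if fmt == 1:     # old physical layout
        return {"subscriber": raw["sub"], "minutes": raw["min"]}
    if fmt == 2:     # new physical layout
        return {"subscriber": raw["subscriber_id"],
                "minutes": raw["usage"]["minutes"]}
    raise ValueError(f"unknown record format {fmt}")

def write_record(record: dict, target_fmt: int = 1) -> bytes:
    """Write in whichever physical format is currently 'live'.
    Keeping target_fmt at 1 until the upgrade is proven lets you revert instantly."""
    if target_fmt == 1:
        raw = {"fmt": 1, "sub": record["subscriber"], "min": record["minutes"]}
    else:
        raw = {"fmt": 2, "subscriber_id": record["subscriber"],
               "usage": {"minutes": record["minutes"]}}
    return json.dumps(raw).encode()
```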

I had to use this approach several years ago, when we had a system that could only support 2 terabytes of data and we wanted to go up to the petabyte level. We translated the basic data structure when IO was done, and once the users were satisfied that there were no problems, we could start writing the blocks in the new format, thus expanding the scope to the petabyte level.

The reason we did this is that we wanted to make it possible to revert from the upgrade in minimal time if there were problems in other parts of the software. But to make this work, you have to assume that the worst-case upgrade is possible, and not just be optimistic that everything’s going to be OK.
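
Along the same lines as that 2-terabyte-to-petabyte story, here is a rough sketch under assumed details (the block-header layout and the "committed" flag are mine, not the actual product’s): block numbers are translated at IO time, and the expanded on-disk format is only written once the operator commits, which is what keeps the revert path fast.

```python
import struct

# Hypothetical block headers: the legacy header uses a 32-bit block number
# (about 2 TB with 512-byte blocks), the expanded header a 64-bit one.
# Translation happens at IO time; nothing on disk is rewritten until the
# operator commits to the upgrade.

OLD_HEADER = struct.Struct("<I")   # 32-bit block number
NEW_HEADER = struct.Struct("<Q")   # 64-bit block number

class BlockIO:
    def __init__(self):
        self.committed = False     # flip only after the upgrade is proven good

    def read_block_number(self, header: bytes) -> int:
        # Accept either physical layout, present one logical value to callers.
        if len(header) == OLD_HEADER.size:
            return OLD_HEADER.unpack(header)[0]
        return NEW_HEADER.unpack(header)[0]

    def write_block_number(self, block_no: int) -> bytes:
        # Until committed, keep writing the old layout so reverting is trivial.
        if not self.committed and block_no < 2**32:
            return OLD_HEADER.pack(block_no)
        return NEW_HEADER.pack(block_no)
```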
 