Log of the EC2 and RDS outage that started on 4/21

Saving this here because the log entries look like they will scroll away if I leave them alone.
http://status.aws.amazon.com/


There was an announcement on the Japanese forum as well, but... hmm.


PDT → JST is +16:00. Probably. (There is a quick conversion sketch below.)
The timeline is baffling. Do the sections for the 21st and the 22nd say the same thing?
I wish they would do something about that tooltip-like interface...
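
A minimal sketch of that conversion in Python, assuming fixed offsets (PDT = UTC-7, JST = UTC+9, i.e. +16 hours) and assuming the year is 2011, since the status page omits it:

    from datetime import datetime, timedelta, timezone

    # Fixed offsets: PDT is UTC-7, JST is UTC+9, so PDT -> JST is +16 hours.
    PDT = timezone(timedelta(hours=-7), "PDT")
    JST = timezone(timedelta(hours=+9), "JST")

    def pdt_to_jst(text: str) -> str:
        """Convert a status-page timestamp like 'Apr 24, 3:12 AM' (PDT) to JST."""
        # The year is not shown on the status page; 2011 is assumed here.
        dt = datetime.strptime(text + " 2011", "%b %d, %I:%M %p %Y").replace(tzinfo=PDT)
        return dt.astimezone(JST).strftime("%b %d, %H:%M JST")

    print(pdt_to_jst("Apr 24, 3:12 AM"))  # -> Apr 24, 19:12 JST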


It would be nice if a Japanese translation were announced on the Japanese forum.

The latest from Amazon EC2 (N. Virginia) (as of 04/24 20:14 JST)

Apr 24, 3:12 AM PDT The pace of recovery has begun to level out for the remaining group of stuck EBS volumes that require a more time-consuming recovery process. We continue to make progress and will provide additional updates on status as we work through the remaining volumes.

Apr 24, 3:12 AM PDT (Apr 24, 19:12 JST)
The pace of recovery has started to level off for the remaining group of EBS volumes that need a more time-consuming recovery process.
We will keep making progress and provide further status updates as we work through the remaining volumes.

The latest from Amazon RDS (N. Virginia) (as of 04/24 19:57 JST)

3:26 AM PDT The RDS APIs for the affected Availability Zone have now been restored. We will continue monitoring the service very closely, but at this time RDS is operating normally in all Availability Zones for all APIs and restored Database Instances. Recovery is still underway for a small number of Database Instances in the affected Availability Zone. We expect steady progress in restoring access to this subset of Database Instance to continue as EBS volume recovery continues.

Apr 24, 3:26 AM PDT (Apr 24, 19:26 JST)
The RDS APIs for the affected Availability Zone have been restored.
We will keep monitoring the service closely, but at this time RDS is operating normally in all Availability Zones for all APIs and for the restored database instances.
Recovery is still underway for a small number of database instances in the affected Availability Zone.
We expect steady progress in restoring access to those instances as the EBS volume recovery continues.

Amazon EC2 (N. Virginia) outage log

Today


Apr 22, 2:41 AM PDT We continue to make progress in restoring volumes but don't yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available.
6:18 AM PDT We're starting to see more meaningful progress in restoring volumes (many have been restored in the last few hours) and expect this progress to continue over the next few hours. We expect that we'll reach a point where a minority of these stuck volumes will need to be restored with a more time consuming process, using backups made to S3 yesterday (these will have longer recovery times for the affected volumes). When we get to that point, we'll let folks know. As volumes are restored, they become available to running instances, however they will not be able to be detached until we enable the API commands in the affected Availability Zone.
8:49 AM PDT We continue to see progress in recovering volumes, and have heard many additional customers confirm that they're recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.
2:15 PM PDT In our last post at 8:49am PDT, we said that we anticipated that the majority of volumes "will be recovered over the next 5 to 6 hours." These volumes were recovered by ~1:30pm PDT. We mentioned that a "smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover." We're now starting to work on those. We're also now working to enable customers to be able to launch EBS backed instances and create, delete, attach and detach EBS volumes in the affected Availability Zone. Our current estimate is that this will take 3-4 hours until full access is restored. We will continue to keep everyone updated as we have additional information.
6:27 PM PDT We're continuing to work on restoring the remaining affected volumes. The work we're doing to enable customers to be able to launch EBS backed instances and create, delete, attach and detach EBS volumes in the affected Availability Zone is taking considerably more time than we anticipated. The team is in the midst of troubleshooting a bottleneck in this process and we'll report back when we have more information to share on the timing of this functionality being fully restored.
9:11 PM PDT We wanted to give a more detailed update on the state of our recovery. At this point, we have recovered a large number of the stuck volumes and are in the process of recovering the remainder. We have added significant storage capacity to the cluster, and storage capacity is no longer a bottleneck to recovery. Some portion of these volumes have lost the connection to their instance, and are waiting to be connected before normal operations can resume. In order to re-establish this connection, we need to allow the instances in the affected Availability Zone to access the EC2 control plane service. There are a large number of control plane requests being generated by the system as we re-introduce instances and volumes. The load on our control plane is higher than we anticipated. We are re-introducing these instances slowly in order to moderate the load on the control plane and prevent it from becoming overloaded and affecting other functions. We are currently investigating several avenues to unblock this bottleneck and significantly increase the rate at which we can restore control plane access to volumes and instances-- and move toward a full recovery.

The team has been completely focused on restoring access to all customers, and as such has not yet been able to focus on performing a complete post mortem. Once our customers have been taken care of and are fully back up and running, we will post a detailed account of what happened, along with the corrective actions we are undertaking to ensure this doesn't happen again. Once we have additional information on the progress that is being made, we will post additional updates.
Apr 23, 1:55 AM PDT We are continuing to work on unblocking the bottleneck that is limiting the speed with which we can re-establish connections between volumes and their instances. We will continue to keep everyone updated as we have additional information.
8:54 AM PDT We have made significant progress during the night in manually restoring the remaining stuck volumes, and are continuing to work through the remainder. Additionally we have removed some of the bottlenecks that were preventing us from allowing more instances to re-establish their connection with the stuck volumes, and the majority of those instances and volumes are now connected. We've encountered an additional issue that's preventing the recovery of the remainder of the connections from being established, but are making progress. Once we solve for this bottleneck, we will work on restoring full access for customers to the control plane.
11:54 AM PDT Quick update. We've tried a couple of ideas to remove the bottleneck in opening up the APIs, each time we've learned more but haven't yet solved the problem. We are making progress, but much more slowly than we'd hoped. Right now we're setting up more control plane components that should be capable of working through the backlog of attach/detach state changes for EBS volumes. These are coming online, and we've been seeing progress on the backlog, but it's still too early to tell how much this will accelerate the process for us.

For customers who are still waiting for restoration of the EBS control plane capability in the impacted AZ, or waiting for recovery of the remaining volumes, we understand that no information for hours at a time is difficult for you. We've been operating under the assumption that people prefer us to post only when we have new information. We think enough people have told us that they prefer to hear from us hourly (even if we don't have meaningful new information) that we're going to change our cadence and try to update hourly from here on out.
12:46 PM PDT We have completed setting up the additional control plane components and we are seeing good scaling of the system. We are now processing through the backlog of state changes and customer requests at a very quick rate. Barring any setbacks, we anticipate getting through the remainder of the backlog in the next hour. We will be in a brief hold after that, assessing whether we can proceed with reactivating the APIs.
1:49 PM PDT We've reduced the backlog of outstanding requests significantly and are now holding to assess whether everything looks good to take the next steps toward opening up API access.
2:48 PM PDT We have successfully completed processing the backlog of state changes between our control plane services and the degraded EBS cluster. We are now starting to slowly roll out changes that will re-enable the EBS APIs in the affected zone. Once that happens, requests to attach volumes and detach volumes will begin working. If that goes well, we will open up the zone to the remainder of the EBS APIs, including create volume and create snapshot. In parallel, we are continuing to manually recover the remainder of the stuck volumes. Once API functionality has been restored, we will post that update here.
3:43 PM PDT The API re-enablement is going well. Attach volume and detach volume requests will now work for many of the volumes in the affected zone. We are continuing to work on enabling access to all APIs.
4:42 PM PDT Attach and detach volume requests now work for all volumes that have been recovered. We are still manually recovering volumes in parallel, the APIs will still not work for any volume that has not been recovered yet. We are currently working to enable the ability for customers to create new volumes and snapshots in the affected zone.
5:51 PM PDT We have now fully enabled the create snapshots API in addition to the attach and detach volume APIs. Currently all APIs are enabled except for create volume which we are actively working on. We continue to work on restoring the remaining stuck volumes.
6:56 PM PDT The create volume API is now enabled. At this time all APIs are back up and working normally. We will be monitoring the service very closely, but at this time all EC2 and EBS APIs are operating normally in all Availability Zones. The majority of affected volumes have been recovered, and we are working hard to manually recover the remainder. Please note that if your volume has not yet been recovered, you will still not be able to write to your volume or successfully perform API calls on your volume until we have recovered it.
8:39 PM PDT We continue to see stability in the service and are confident now that the service is operating normally for all API calls and all restored EBS volumes.

We mentioned before that the process to recover this remaining group of stuck volumes will take longer. We are being extra cautious in this recovery effort. We will continue to update you as we have additional information.
9:58 PM PDT Progress on recovering the remaining stuck volumes is proceeding slower than we anticipated. We are currently looking for ways in which we can speed up the process, and will keep you updated.
11:37 PM PDT The process of recovering the remaining stuck volumes continues to proceed slowly. We will continue to keep you updated as we make additional progress.
Apr 24, 1:38 AM PDT We are continuing to recover remaining stuck EBS volumes in the affected Availability Zone, and the pace of volume recovery is now steadily increasing. We will continue to keep you posted with regular updates.
Apr 24, 3:12 AM PDT The pace of recovery has begun to level out for the remaining group of stuck EBS volumes that require a more time-consuming recovery process. We continue to make progress and will provide additional updates on status as we work through the remaining volumes.

4/22

1:41 AM PDT We are currently investigating latency and error rates with EBS volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region.
2:18 AM PDT We can confirm connectivity errors impacting EC2 instances and increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region. Increased error rates are affecting EBS CreateVolume API calls. We continue to work towards resolution.
2:49 AM PDT We are continuing to see connectivity errors impacting EC2 instances, increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region, and increased error rates affecting EBS CreateVolume API calls. We are also experiencing delayed launches for EBS backed EC2 instances in affected availability zones in the US-EAST-1 region. We continue to work towards resolution.
3:20 AM PDT Delayed EC2 instance launches and EBS API error rates are recovering. We're continuing to work towards full resolution.
4:09 AM PDT EBS volume latency and API errors have recovered in one of the two impacted Availability Zones in US-EAST-1. We are continuing to work to resolve the issues in the second impacted Availability Zone. The errors, which started at 12:55AM PDT, began recovering at 2:55am PDT
5:02 AM PDT Latency has recovered for a portion of the impacted EBS volumes. We are continuing to work to resolve the remaining issues with EBS volume latency and error rates in a single Availability Zone.
6:09 AM PDT EBS API errors and volume latencies in the affected availability zone remain. We are continuing to work towards resolution.
6:59 AM PDT There has been a moderate increase in error rates for CreateVolume. This may impact the launch of new EBS-backed EC2 instances in multiple availability zones in the US-EAST-1 region. Launches of instance store AMIs are currently unaffected. We are continuing to work on resolving this issue.
7:40 AM PDT In addition to the EBS volume latencies, EBS-backed instances in the US-EAST-1 region are failing at a high rate. This is due to a high error rate for creating new volumes in this region.
8:54 AM PDT We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.
10:26 AM PDT We have made significant progress in stabilizing the affected EBS control plane service. EC2 API calls that do not involve EBS resources in the affected Availability Zone are now seeing significantly reduced failures and latency and are continuing to recover. We have also brought additional capacity online in the affected Availability Zone and stuck EBS volumes (those that were being remirrored) are beginning to recover. We cannot yet estimate when these volumes will be completely recovered, but we will provide an estimate as soon as we have sufficient data to estimate the recovery. We have all available resources working to restore full service functionality as soon as possible. We will continue to provide updates when we have them.
11:09 AM PDT A number of people have asked us for an ETA on when we'll be fully recovered. We deeply understand why this is important and promise to share this information as soon as we have an estimate that we believe is close to accurate. Our high-level ballpark right now is that the ETA is a few hours. We can assure you that all-hands are on deck to recover as quickly as possible. We will update the community as we have more information.
12:30 PM PDT We have observed successful new launches of EBS backed instances for the past 15 minutes in all but one of the availability zones in the US-EAST-1 Region. The team is continuing to work to recover the unavailable EBS volumes as quickly as possible.
1:48 PM PDT A single Availability Zone in the US-EAST-1 Region continues to experience problems launching EBS backed instances or creating volumes. All other Availability Zones are operating normally. Customers with snapshots of their affected volumes can re-launch their volumes and instances in another zone. We recommend customers do not target a specific Availability Zone when launching instances. We have updated our service to avoid placing any instances in the impaired zone for untargeted requests.
6:18 PM PDT Earlier today we shared our high level ETA for a full recovery. At this point, all Availability Zones except one have been functioning normally for the past 5 hours. We have stabilized the remaining Availability Zone, but recovery is taking longer than we originally expected. We have been working hard to add the capacity that will enable us to safely re-mirror the stuck volumes. We expect to incrementally recover stuck volumes over the coming hours, but believe it will likely be several more hours until a significant number of volumes fully recover and customers are able to create new EBS-backed instances in the affected Availability Zone. We will be providing more information here as soon as we have it.

Here are a couple of things that customers can do in the short term to work around these problems. Customers having problems contacting EC2 instances or with instances stuck shutting down/stopping can launch a replacement instance without targeting a specific Availability Zone. If you have EBS volumes stuck detaching/attaching and have taken snapshots, you can create new volumes from snapshots in one of the other Availability Zones. Customers with instances and/or volumes that appear to be unavailable should not try to recover them by rebooting, stopping, or detaching, as these actions will not currently work on resources in the affected zone.
10:58 PM PDT Just a short note to let you know that the team continues to be all-hands on deck trying to add capacity to the affected Availability Zone to re-mirror stuck volumes. It's taking us longer than we anticipated to add capacity to this fleet. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
2:41 AM PDT We continue to make progress in restoring volumes but don't yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available.
6:18 AM PDT We're starting to see more meaningful progress in restoring volumes (many have been restored in the last few hours) and expect this progress to continue over the next few hours. We expect that we'll reach a point where a minority of these stuck volumes will need to be restored with a more time consuming process, using backups made to S3 yesterday (these will have longer recovery times for the affected volumes). When we get to that point, we'll let folks know. As volumes are restored, they become available to running instances, however they will not be able to be detached until we enable the API commands in the affected Availability Zone.
8:49 AM PDT We continue to see progress in recovering volumes, and have heard many additional customers confirm that they're recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.
2:15 PM PDT In our last post at 8:49am PDT, we said that we anticipated that the majority of volumes "will be recovered over the next 5 to 6 hours." These volumes were recovered by ~1:30pm PDT. We mentioned that a "smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover." We're now starting to work on those. We're also now working to enable customers to be able to launch EBS backed instances and create, delete, attach and detach EBS volumes in the affected Availability Zone. Our current estimate is that this will take 3-4 hours until full access is restored. We will continue to keep everyone updated as we have additional information.
6:27 PM PDT We're continuing to work on restoring the remaining affected volumes. The work we're doing to enable customers to be able to launch EBS backed instances and create, delete, attach and detach EBS volumes in the affected Availability Zone is taking considerably more time than we anticipated. The team is in the midst of troubleshooting a bottleneck in this process and we'll report back when we have more information to share on the timing of this functionality being fully restored.
9:11 PM PDT We wanted to give a more detailed update on the state of our recovery. At this point, we have recovered a large number of the stuck volumes and are in the process of recovering the remainder. We have added significant storage capacity to the cluster, and storage capacity is no longer a bottleneck to recovery. Some portion of these volumes have lost the connection to their instance, and are waiting to be connected before normal operations can resume. In order to re-establish this connection, we need to allow the instances in the affected Availability Zone to access the EC2 control plane service. There are a large number of control plane requests being generated by the system as we re-introduce instances and volumes. The load on our control plane is higher than we anticipated. We are re-introducing these instances slowly in order to moderate the load on the control plane and prevent it from becoming overloaded and affecting other functions. We are currently investigating several avenues to unblock this bottleneck and significantly increase the rate at which we can restore control plane access to volumes and instances-- and move toward a full recovery.

The team has been completely focused on restoring access to all customers, and as such has not yet been able to focus on performing a complete post mortem. Once our customers have been taken care of and are fully back up and running, we will post a detailed account of what happened, along with the corrective actions we are undertaking to ensure this doesn't happen again. Once we have additional information on the progress that is being made, we will post additional updates.
Apr 23, 1:55 AM PDT We are continuing to work on unblocking the bottleneck that is limiting the speed with which we can re-establish connections between volumes and their instances. We will continue to keep everyone updated as we have additional information.
Apr 24, 1:28 AM PDT We are continuing to recover remaining stuck EBS volumes in the affected Availability Zone, and the pace of volume recovery is now steadily increasing. We will continue to keep you posted with regular updates.

4/21

1:41 AM PDT We are currently investigating latency and error rates with EBS volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region.
2:18 AM PDT We can confirm connectivity errors impacting EC2 instances and increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region. Increased error rates are affecting EBS CreateVolume API calls. We continue to work towards resolution.
2:49 AM PDT We are continuing to see connectivity errors impacting EC2 instances, increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region, and increased error rates affecting EBS CreateVolume API calls. We are also experiencing delayed launches for EBS backed EC2 instances in affected availability zones in the US-EAST-1 region. We continue to work towards resolution.
3:20 AM PDT Delayed EC2 instance launches and EBS API error rates are recovering. We're continuing to work towards full resolution.
4:09 AM PDT EBS volume latency and API errors have recovered in one of the two impacted Availability Zones in US-EAST-1. We are continuing to work to resolve the issues in the second impacted Availability Zone. The errors, which started at 12:55AM PDT, began recovering at 2:55am PDT
5:02 AM PDT Latency has recovered for a portion of the impacted EBS volumes. We are continuing to work to resolve the remaining issues with EBS volume latency and error rates in a single Availability Zone.
6:09 AM PDT EBS API errors and volume latencies in the affected availability zone remain. We are continuing to work towards resolution.
6:59 AM PDT There has been a moderate increase in error rates for CreateVolume. This may impact the launch of new EBS-backed EC2 instances in multiple availability zones in the US-EAST-1 region. Launches of instance store AMIs are currently unaffected. We are continuing to work on resolving this issue.
7:40 AM PDT In addition to the EBS volume latencies, EBS-backed instances in the US-EAST-1 region are failing at a high rate. This is due to a high error rate for creating new volumes in this region.
8:54 AM PDT We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.
10:26 AM PDT We have made significant progress in stabilizing the affected EBS control plane service. EC2 API calls that do not involve EBS resources in the affected Availability Zone are now seeing significantly reduced failures and latency and are continuing to recover. We have also brought additional capacity online in the affected Availability Zone and stuck EBS volumes (those that were being remirrored) are beginning to recover. We cannot yet estimate when these volumes will be completely recovered, but we will provide an estimate as soon as we have sufficient data to estimate the recovery. We have all available resources working to restore full service functionality as soon as possible. We will continue to provide updates when we have them.
11:09 AM PDT A number of people have asked us for an ETA on when we'll be fully recovered. We deeply understand why this is important and promise to share this information as soon as we have an estimate that we believe is close to accurate. Our high-level ballpark right now is that the ETA is a few hours. We can assure you that all-hands are on deck to recover as quickly as possible. We will update the community as we have more information.
12:30 PM PDT We have observed successful new launches of EBS backed instances for the past 15 minutes in all but one of the availability zones in the US-EAST-1 Region. The team is continuing to work to recover the unavailable EBS volumes as quickly as possible.
1:48 PM PDT A single Availability Zone in the US-EAST-1 Region continues to experience problems launching EBS backed instances or creating volumes. All other Availability Zones are operating normally. Customers with snapshots of their affected volumes can re-launch their volumes and instances in another zone. We recommend customers do not target a specific Availability Zone when launching instances. We have updated our service to avoid placing any instances in the impaired zone for untargeted requests.
6:18 PM PDT Earlier today we shared our high level ETA for a full recovery. At this point, all Availability Zones except one have been functioning normally for the past 5 hours. We have stabilized the remaining Availability Zone, but recovery is taking longer than we originally expected. We have been working hard to add the capacity that will enable us to safely re-mirror the stuck volumes. We expect to incrementally recover stuck volumes over the coming hours, but believe it will likely be several more hours until a significant number of volumes fully recover and customers are able to create new EBS-backed instances in the affected Availability Zone. We will be providing more information here as soon as we have it.

Here are a couple of things that customers can do in the short term to work around these problems. Customers having problems contacting EC2 instances or with instances stuck shutting down/stopping can launch a replacement instance without targeting a specific Availability Zone. If you have EBS volumes stuck detaching/attaching and have taken snapshots, you can create new volumes from snapshots in one of the other Availability Zones. Customers with instances and/or volumes that appear to be unavailable should not try to recover them by rebooting, stopping, or detaching, as these actions will not currently work on resources in the affected zone.
10:58 PM PDT Just a short note to let you know that the team continues to be all-hands on deck trying to add capacity to the affected Availability Zone to re-mirror stuck volumes. It's taking us longer than we anticipated to add capacity to this fleet. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
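
As a side note on the 6:18 PM PDT workaround above (launch a replacement instance without targeting a specific Availability Zone, and recreate stuck volumes from snapshots in another zone), here is a minimal sketch using boto3, the current AWS SDK for Python (at the time of this outage it would have been boto or the ec2-api-tools). The AMI ID, snapshot ID, instance type, and zone name are placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch a replacement instance WITHOUT targeting a specific Availability Zone:
    # omitting the Placement parameter lets EC2 choose a healthy zone for us.
    ec2.run_instances(
        ImageId="ami-xxxxxxxx",    # placeholder AMI
        InstanceType="m1.small",   # placeholder instance type
        MinCount=1,
        MaxCount=1,
    )

    # If a stuck volume has a snapshot, recreate it in one of the other Availability Zones.
    ec2.create_volume(
        SnapshotId="snap-xxxxxxxx",     # placeholder snapshot
        AvailabilityZone="us-east-1b",  # any zone other than the impaired one
    )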

Amazon RDS (N. Virginia) outage log

Today

Apr 23, 3:07 AM PDT 1:48 AM PDT We are currently investigating connectivity and latency issues with RDS database instances in the US-EAST-1 region.
Apr 23, 2:16 AM PDT We can confirm connectivity issues impacting RDS database instances across multiple availability zones in the US-EAST-1 region.
Apr 23, 3:05 AM PDT We are continuing to see connectivity issues impacting some RDS database instances in multiple availability zones in the US-EAST-1 region. Some Multi AZ failovers are taking longer than expected. We continue to work towards resolution.
Apr 23, 4:03 AM PDT We are making progress on failovers for Multi AZ instances and restore access to them. This event is also impacting RDS instance creation times in a single Availability Zone. We continue to work towards the resolution.
Apr 23, 5:06 AM PDT IO latency issues have recovered in one of the two impacted Availability Zones in US-EAST-1. We continue to make progress on restoring access and resolving IO latency issues for remaining affected RDS database instances.
Apr 23, 6:29 AM PDT We continue to work on restoring access to the affected Multi AZ instances and resolving the IO latency issues impacting RDS instances in the single availability zone.
Apr 23, 8:12 AM PDT Despite the continued effort from the team to resolve the issue we have not made any meaningful progress for the affected database instances since the last update. Create and Restore requests for RDS database instances are not succeeding in US-EAST-1 region.
Apr 23, 10:35 AM PDT We are making progress on restoring access and IO latencies for affected RDS instances. We recommend that you do not attempt to recover using Reboot or Restore database instance APIs or try to create a new user snapshot for your RDS instance - currently those requests are not being processed.
Apr 23, 2:35 PM PDT We have restored access to the majority of RDS Multi AZ instances and continue to work on the remaining affected instances. A single Availability Zone in the US-EAST-1 region continues to experience problems for launching new RDS database instances. All other Availability Zones are operating normally. Customers with snapshots/backups of their instances in the affected Availability zone can restore them into another zone. We recommend that customers do not target a specific Availability Zone when creating or restoring new RDS database instances. We have updated our service to avoid placing any RDS instances in the impaired zone for untargeted requests.
Apr 23, 11:42 PM PDT In line with the most recent Amazon EC2 update, we wanted to let you know that the team continues to be all-hands on deck working on the remaining database instances in the single affected Availability Zone. It's taking us longer than we anticipated. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
Apr 23, 7:08 AM PDT In line with the most recent Amazon EC2 update, we are making steady progress in restoring the remaining affected RDS instances. We expect this progress to continue over the next few hours and we'll keep folks posted.
Apr 23, 2:43 PM PDT We are continuing to make progress in restoring access to the remaining affected RDS instances. We expect this progress to continue over the next few hours and we'll keep folks posted.
Apr 23, 12:00 AM PDT We are continuing to work on restoring access to the remaining affected RDS instances. We expect the restoration process to continue over the next several hours and we'll update folks as we have new information.
Apr 23, 8:45 AM PDT We have made significant progress in resolving stuck IO issues and restoring access to RDS database instances and now have the vast majority of them back operational again. We continue to work on restoring access to the small number of remaining affected instances and we'll update folks as we have new information.
Apr 23, 12:54 PM PDT As we mentioned in our last update at 8:45 AM, we now have the vast majority of affected RDS instances back operational again. Since that post, we have continued to work on restoring access to the small number of remaining affected instances.

RDS uses EBS, and as such, our pace of recovery is dependent on EBS's recovery. As mentioned in the most recent EC2 post, EBS's recovery has gone a bit slower than anticipated in the last few hours. This has slowed down RDS recovery as well. We understand how significant this service interruption is for our affected customers and we are working feverishly to address the impact. We'll update folks as we have new information.

Additionally we have heard from customers that you prefer more frequent updates, even if there has been no meaningful progress. We have heard that feedback, and will try to post hourly updates here. Some of these updates will point to EC2's updates (as they continue to recover the rest of EBS volumes), but we'll post nonetheless.
Apr 23, 2:04 PM PDT We continue to work on restoring access to the small number of remaining affected RDS Database Instances. No additional updates to report for this hour.
Apr 23, 3:06 PM PDT Progress is being made on fully re-enabling the EBS APIs for all Availability Zones in the US East Region. As soon as the EBS APIs are fully restored, we plan to enable all RDS APIs for the affected Availability Zone. In addition, we continue to make slow but steady progress in recovering the small number of remaining affected RDS Database Instances. Please note that all RDS APIs are currently available for other Availability Zones in the US East Region.
Apr 23, 3:55 PM PDT EBS API re-enablement is proceeding well and is ongoing. Per the last update, as soon as all EBS APIs are fully restored, we plan to begin restoring RDS APIs for the affected Availability Zone. We continue to make incremental progress in recovering the small number of remaining affected RDS Database Instances in that zone.
Apr 23, 5:04 PM PDT No significant updates to report for this hour. Full EBS API re-enablement is still ongoing. We are waiting for this to complete to fully enable RDS APIs. As before, we continue to make steady progress on restoring access for the remaining RDS Database Instances that are not yet available.
Apr 23, 6:03 PM PDT All but one of the EBS APIs for the affected Availability Zone have now been restored. When full EBS API restoration is complete, we plan to begin enabling the RDS APIs.
Apr 23, 7:10 PM PDT All EBS APIs are now fully restored and operating normally in all Availability Zones of the US East Region. As a result, we are now in the process of restoring all RDS APIs for the affected zone.
Apr 23, 8:00 PM PDT There are no significant updates for this hour. We continue to work on restoring all RDS APIs for the affected Availability Zone now that EBS APIs are fully restored.
Apr 23, 9:05 PM PDT We continue to work on restoring all RDS APIs for the affected Availability Zone now that EBS APIs are fully restored. This is taking longer than anticipated, and we are moving cautiously. We will continue to report our progress on a regular basis.
Apr 23, 10:01 PM PDT We continue to work on restoring all RDS APIs for the affected Availability Zone, as well as recovering remaining affected DB Instances. We are looking for ways to speed up the process.
Apr 23, 11:39 PM PDT We are still hard at work restoring full RDS API functionality for the affected Availability Zone. Progress is slower than anticipated, but we will continue to update you regularly. Please note that all RDS APIs are operating normally for other Availability Zones within the US-East-1 Region.
Apr 24, 1:32 AM PDT Progress in restoring all RDS API functionality for the affected zone is accelerating. We are currently working through a backlog of stuck RDS API workflows, such as DB Instance reboot or backup requests. As this nears completion, we plan to fully re-enable RDS APIs.
3:26 AM PDT The RDS APIs for the affected Availability Zone have now been restored. We will continue monitoring the service very closely, but at this time RDS is operating normally in all Availability Zones for all APIs and restored Database Instances. Recovery is still underway for a small number of Database Instances in the affected Availability Zone. We expect steady progress in restoring access to this subset of Database Instance to continue as EBS volume recovery continues.

4/22

1:48 AM PDT We are currently investigating connectivity and latency issues with RDS database instances in the US-EAST-1 region.
2:16 AM PDT We can confirm connectivity issues impacting RDS database instances across multiple availability zones in the US-EAST-1 region.
3:05 AM PDT We are continuing to see connectivity issues impacting some RDS database instances in multiple availability zones in the US-EAST-1 region. Some Multi AZ failovers are taking longer than expected. We continue to work towards resolution.
4:03 AM PDT We are making progress on failovers for Multi AZ instances and restore access to them. This event is also impacting RDS instance creation times in a single Availability Zone. We continue to work towards the resolution.
5:06 AM PDT IO latency issues have recovered in one of the two impacted Availability Zones in US-EAST-1. We continue to make progress on restoring access and resolving IO latency issues for remaining affected RDS database instances.
6:29 AM PDT We continue to work on restoring access to the affected Multi AZ instances and resolving the IO latency issues impacting RDS instances in the single availability zone.
8:12 AM PDT Despite the continued effort from the team to resolve the issue we have not made any meaningful progress for the affected database instances since the last update. Create and Restore requests for RDS database instances are not succeeding in US-EAST-1 region.
10:35 AM PDT We are making progress on restoring access and IO latencies for affected RDS instances. We recommend that you do not attempt to recover using Reboot or Restore database instance APIs or try to create a new user snapshot for your RDS instance - currently those requests are not being processed.
2:35 PM PDT We have restored access to the majority of RDS Multi AZ instances and continue to work on the remaining affected instances. A single Availability Zone in the US-EAST-1 region continues to experience problems for launching new RDS database instances. All other Availability Zones are operating normally. Customers with snapshots/backups of their instances in the affected Availability zone can restore them into another zone. We recommend that customers do not target a specific Availability Zone when creating or restoring new RDS database instances. We have updated our service to avoid placing any RDS instances in the impaired zone for untargeted requests.
11:42 PM PDT In line with the most recent Amazon EC2 update, we wanted to let you know that the team continues to be all-hands on deck working on the remaining database instances in the single affected Availability Zone. It's taking us longer than we anticipated. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
7:08 AM PDT In line with the most recent Amazon EC2 update, we are making steady progress in restoring the remaining affected RDS instances. We expect this progress to continue over the next few hours and we'll keep folks posted.
2:43 PM PDT We are continuing to make progress in restoring access to the remaining affected RDS instances. We expect this progress to continue over the next few hours and we'll keep folks posted.
Apr 23, 12:00 AM PDT We are continuing to work on restoring access to the remaining affected RDS instances. We expect the restoration process to continue over the next several hours and we'll update folks as we have new information.

4/21

1:48 AM PDT We are currently investigating connectivity and latency issues with RDS database instances in the US-EAST-1 region.
2:16 AM PDT We can confirm connectivity issues impacting RDS database instances across multiple availability zones in the US-EAST-1 region.
3:05 AM PDT We are continuing to see connectivity issues impacting some RDS database instances in multiple availability zones in the US-EAST-1 region. Some Multi AZ failovers are taking longer than expected. We continue to work towards resolution.
4:03 AM PDT We are making progress on failovers for Multi AZ instances and restore access to them. This event is also impacting RDS instance creation times in a single Availability Zone. We continue to work towards the resolution.
5:06 AM PDT IO latency issues have recovered in one of the two impacted Availability Zones in US-EAST-1. We continue to make progress on restoring access and resolving IO latency issues for remaining affected RDS database instances.
6:29 AM PDT We continue to work on restoring access to the affected Multi AZ instances and resolving the IO latency issues impacting RDS instances in the single availability zone.
8:12 AM PDT Despite the continued effort from the team to resolve the issue we have not made any meaningful progress for the affected database instances since the last update. Create and Restore requests for RDS database instances are not succeeding in US-EAST-1 region.
10:35 AM PDT We are making progress on restoring access and IO latencies for affected RDS instances. We recommend that you do not attempt to recover using Reboot or Restore database instance APIs or try to create a new user snapshot for your RDS instance - currently those requests are not being processed.
2:35 PM PDT We have restored access to the majority of RDS Multi AZ instances and continue to work on the remaining affected instances. A single Availability Zone in the US-EAST-1 region continues to experience problems for launching new RDS database instances. All other Availability Zones are operating normally. Customers with snapshots/backups of their instances in the affected Availability zone can restore them into another zone. We recommend that customers do not target a specific Availability Zone when creating or restoring new RDS database instances. We have updated our service to avoid placing any RDS instances in the impaired zone for untargeted requests.
11:42 PM PDT In line with the most recent Amazon EC2 update, we wanted to let you know that the team continues to be all-hands on deck working on the remaining database instances in the single affected Availability Zone. It's taking us longer than we anticipated. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
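
As a side note on the 2:35 PM PDT workaround above (restore snapshots of affected instances into another zone, without targeting a specific Availability Zone), here is a minimal sketch using boto3, the current AWS SDK for Python; the instance and snapshot identifiers are placeholders:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # Restore a new DB instance from an existing snapshot.  No AvailabilityZone is
    # specified, so, per the note above, RDS avoids placing it in the impaired zone.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier="mydb-restored",        # placeholder instance name
        DBSnapshotIdentifier="rds:mydb-2011-04-20",  # placeholder snapshot identifier
    )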