2017-09-06

Flock 2017: Summary

The trip to Flock 2017 in Cape Cod was a nice excursion where I learned a lot. I had not been able to go to the two previous Flocks in Rochester, NY, or Poland, so I was not up to date on many things. It was very nice to see many people whom I had not seen in 2 years and to catch up on many projects which I had heard of, and even installed servers for, but knew little about in detail.

The days were mostly a blur of going to a couple of talks per day, a lot of hallway track items, and dealing with a couple of outages that needed help. So the following is a shortened summary:

Monday: Day 0

I posted on this earlier. The day was a pretty good one and I got to let someone else drive through Massachusetts traffic.

Tuesday: Day 1

I wanted to make sure I did not sleep through the opening day talks (something I have been known to do), so I got up extra early, had a big breakfast with some guests from Europe, and made it in time to sit up front. Matthew Miller gave a nice talk on the status of Fedora and was able to show some pretty pictures from data I helped collect. After trying to advertise the EPEL state of the union talk, I then went to do some hallway meetings and talked with kernel, FESCo, and various developers about x86_32 support in Fedora. This was so that I could report to the x86 committee at a meeting on 2017-09-06.
Later in the day, I went to see Tom Callaway give a talk on licenses and the importance of a strong liver when dealing with them. It was interesting to see how far we have come in so many years. I had hoped to then go to the Fedora on Windows subsystem talk, as I have been using Cygwin on Windows for years and wanted to see how this worked also. However, a work item came up and I was pretty much booked until later in the evening.

Wednesday: Day 2

Today was the EPEL state of the union talk. I spent the morning working on a blog post about everything I was going to say, only to do a Ctrl-A backspace at the wrong moment. Goodbye writing. I am going to go over the particulars in a different post. The two talks went pretty well, but I need to go over the videos to see what I actually said versus what I think I said. After the talks, I got to ride in a Tesla and also play various boardwalk games at a nice retro playplace. I finally went back and crashed for a bit, but woke up with insomnia until 4am.

Thursday: Day 3

This day got off to a rough start. I was really, really tired and almost fell asleep at the Fedora Infrastructure State of the Union talk. I went back to the room at 1300 for a power nap and woke up after 1700. Went to see if anything was still active and had some more hallway talks about EPEL and other architectures. Finally went back to bed at 2200 and slept soundly.

Friday: Day 4

Had a nice breakfast with most of the Fedora Infrastructure team, and then did a fast jog to catch my bus to the airport. The bus ride was supposed to be 90 minutes, which would allow me 2 hours to get through security. Sadly, the Friday before Labour Day weekend does not lead to a 90 minute bus ride. After a bit over 3 hours, I got to the airport with just enough time for a very last minute dash through security and everything else. I got onto the plane before the doors closed, and was able to fly home to be greeted by the last remnants of Hurricane Harvey. We only had 40 minutes of rain from it, but even as a smidgen of what eastern Texas got, it was incredibly heavy rain and hail. Got home and crashed.

Fedora Project Outage RCA :: DNS Outage 2017-09-06


Early on 2017-09-06, many people attempting to reach fedoraproject.org
found that it had disappeared from the internet. People attempting to
do 'yum/dnf install', browse the website, or carry out other
Internet-related activities were getting various error messages that
the sites no longer existed in DNS. Some people had no difficulty and
were not able to duplicate the problem, but anyone who was using a DNS
server that had DNSSEC checking turned on was unable to get any IP
address lookups related to the site.

The problem was due to a misconfigured record in the registrar's data
about DNS. The previous week, multiple records had been added by the
registrar to the DNS data in the .org. DNS table. The records were the
DNSSEC records for fedorapeople.org, fedorahosted.org, and
fedoraproject.org, and the registrar had added all of them to
fedoraproject.org. rather than each to its correct zone. On seeing
this, I asked for two of the records to be removed, and somehow
confused which one was to stay. This meant that the key meant for
fedorahosted.org. was left on fedoraproject.org. and the keys for
fedoraproject.org and fedorapeople.org were removed.
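
As a rough illustration of why only validating resolvers failed, here
is a toy model in Python. It is a sketch under loud assumptions: the
key tags are made-up numbers (not Fedora's real keys), lookup() is a
hypothetical helper, and real validators compare digests and verify
signatures rather than just matching tags.

```python
# Toy model of DNSSEC delegation checks. Zone names come from the post;
# the key tags are made-up numbers, not Fedora's real keys.
ZONE_KEY_TAG = {
    "fedoraproject.org.": 1111,
    "fedorahosted.org.": 2222,
    "fedorapeople.org.": 3333,
}


def lookup(zone, parent_ds, validating):
    """Return "ANSWER" or "SERVFAIL" for a lookup in `zone`.

    parent_ds maps a zone to the set of key tags that the parent
    (.org.) publishes for it. Only key tags are compared here.
    """
    ds_tags = parent_ds.get(zone, set())
    if not validating or not ds_tags:
        # Non-validating resolvers, and zones with no DS record at all,
        # are answered without any DNSSEC check.
        return "ANSWER"
    if ZONE_KEY_TAG[zone] in ds_tags:
        # At least one published record matches the zone's own key.
        return "ANSWER"
    # A DS record exists but none matches the zone's key: treated as bogus.
    return "SERVFAIL"


ZONE = "fedoraproject.org."
all_three_on_one_zone = {ZONE: {1111, 2222, 3333}}  # the previous week
wrong_key_left = {ZONE: {2222}}                     # after the mistaken removal
corrected = {ZONE: {1111}}                          # after the fix

for label, ds in [("all three", all_three_on_one_zone),
                  ("wrong key", wrong_key_left),
                  ("corrected", corrected)]:
    print(label,
          lookup(ZONE, ds, validating=True),
          lookup(ZONE, ds, validating=False))
```

In this model only the middle state fails, and only for the validating
resolver, which matches what people reported during the outage.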

When the registrar updated its .org. data early UTC on 2017-09-06, DNS
servers like Google's 8.8.8.8 would no longer return any addresses
from Fedora's DNS tables. Other DNS servers also stopped working, and
people who work on DNSSEC at the IETF came in to help in case there
was some other problem going on.

After diagnosing the problem, Fedora IT contacted the registrar and
got the correct DNSSEC keys added to the master table. This cleaned up
the problems with many DNS servers, but some will cache the broken
data for up to the TTL of 24 hours, so users were still having
problems as of 2200 UTC 2017-09-06. A temporary fix is to hard code
the main proxy IP address into /etc/hosts; however, this can cause
problems later if the entry is not removed and the main proxy is down
for maintenance.

I would like to thank the members of the IETF DNSSEC group who took
the time to help us through this problem. I would also like to
apologize to everyone who was disrupted by this.

2017-08-28

Flock 2017: Day 0

Today (2017-08-28) was the day before the official beginning of Flock 2017, which is being held in Cape Cod, Massachusetts. This is the first Flock I have been able to go to in 2 years, so it has been a lot of catching up with old friends.

The day started off pretty well with only the usual planes, trains and automobiles problems. The airport kept having dyslexia problems, sending people to gate C-12 for a flight at C-15, and to C-15 for a flight at C-12. The attendants could not correct the problem because the airport runs the consoles. After an hour of calls and people running back and forth, the signs were finally updated 5 minutes before the flight was ready to board, which then led to the next fun problem for the poor attendant. The plane we were supposed to fly had mechanical difficulties, and the airline had to do a last minute replacement with a slightly smaller plane. This meant that all the seats had to be moved around and new tickets issued for everyone.

The plane flight was pretty uneventful, and when I arrived I ran into Zonker Harris, who was headed to Walden Pond for a bit of sunbathing. This solved the trains and automobiles problem, and we took the Interstate and other roads to Cape Cod. The drive was uneventful, though it did remind me that Massachusetts is the one state that makes turn signals optional car equipment and car horns extra loud.

Flock is being held at a nice golf resort in Hyannis. The room I have is on the second floor, and it was nice to hear seagulls in the distance. For dinner I had a cod dinner at the in-house bar, and tonight I am working on getting better pictures into my Wednesday Flock presentations.

This evening, I am listening to Lynyrd Skynyrd, who are playing next door. I expect Free Bird will be the closing song.

2017-06-22

Problems with EPEL and Fedora mirroring: Many Root Cause Analysis

There was a problem with EPEL and Fedora mirrors for the last 24 hours where people getting updates would get various errors like:

Updateinfo file is not valid XML:

The issue was caused by a problem in the compose, which output the updateinfo file as SQLite rather than XML. The problem was fixed within a couple of hours on the Fedora side, but it has taken a lot longer to fix further downstream.
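
A test for this kind of mix-up can be sketched in Python. This is only an illustration, not the check Fedora actually put in place: classify_metadata is a hypothetical helper, and it assumes the metadata file has already been decompressed into bytes. It relies on the fact that every SQLite database file begins with the 16-byte string "SQLite format 3" followed by a NUL byte.

```python
import xml.etree.ElementTree as ET

# Every SQLite 3 database file starts with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def classify_metadata(payload: bytes) -> str:
    """Return 'sqlite', 'xml', or 'unknown' for decompressed metadata bytes.

    Hypothetical helper for illustration only.
    """
    if payload.startswith(SQLITE_MAGIC):
        return "sqlite"
    try:
        # Raises ParseError if the bytes are not well-formed XML.
        ET.fromstring(payload)
    except ET.ParseError:
        return "unknown"
    return "xml"


if __name__ == "__main__":
    good = b"<updates></updates>"
    bad = SQLITE_MAGIC + b"\x00" * 84  # stand-in for a real database file
    for name, data in [("good", good), ("bad", bad)]:
        print(name, classify_metadata(data))
```

A compose step could refuse to publish a file named as XML unless a check like this classifies it as 'xml'.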

  • Some of the Fedora mirror containers were not updating correctly. We use a Docker container on each proxy to keep the data fresh. Four (?) of the 14 proxies said they were updating but seem not to have done so. These servers were our main IPv6 servers, so people getting updates from these were more affected than other users.
  • Some mirrors only update 1 or 2 times a day (or even less often). This means that your favourite mirror may keep the data for 12 to 48 hours.
  • Some client plugins like to peg to the quickest mirror to try and keep downloads fast. While we may tell you that there are 20 mirrors up to date, the plugin will use the one it got stuff fastest from in the past. This means you can end up going to a 'broken' mirror for a lot longer.
  • Some yum/dnf systems seem to have other options set to keep the bad xml file until it 'ages' out. This means that while an updated xml is there, some systems are still complaining because their box still has the bad copy.
The fixes on the Fedora side are to put in better tests to try and see that this does not happen again. The client side fixes are currently to do either one of the following:

  • yum clean all
  • yum clean metadata
Thank you all for your patience on this problem.

2017-06-07

Call for Papers: Flock to Fedora 2017

In summer, an old engineer's fancy turns to writing paper proposals. For it is time for people to submit papers to https://flocktofedora.org/. This year, Flock is being held in Cape Cod, Massachusetts, from August 29 to September 1. Flock is also aiming to be a 'get-er-done' conference, where the focus will be on workshops in which many people work on software problems together. So do you have something you have wanted to get done in Fedora that needs a bunch of people from around the US and Europe to focus on it? Put together a short proposal and submit it to https://register.flocktofedora.org/ [Oh, and make sure that the people you need to work with know about it, and agree that they want to do it also. Surprise is the opposite of consensus.]

The CFP ends on July 15th 2017. Good luck. I am putting in a proposal for a fast-moving EPEL workshop. For a more complete post on Flock talk/workshop requirements, please see http://blog.linuxgrrl.com/2017/06/08/propose-a-talk-for-flock/

2017-05-30

The steam roller of life

Some days it really feels like you are the last man standing as the zombie horde rolls in, and sometimes it feels like people just seem to scream stop at every little thing. However, a lot of times it just looks like this to everyone else:


The security guard is doing his job and is the hero of his own story (in fact, the DVD has an extra about his family). He is trying to get the 'villains' to stop. Austin Powers is the hero in his story because he is just trying to get to the other side of the room to stop Doctor Evil. The vast gulf between the two shows just how far apart they are and how little danger there really is. It is also a story about how avoidable the inevitable crunch at the end is.

  1. The guard could have stood to the left or right and let the steamroller go by. [The guard could have also shot Austin or something else.]
  2. Austin could have 'swerved to the left or right' just a little and missed the guard. [Or he could have gotten out and gotten there faster.]
OK, so you are thinking "Yes Captain Obvious, that is exactly the humour being shown here.. thank you for breaking it down for us..." The point I am looking at is how often this mirrors our online community problems. Someone is trying to accomplish something, and someone for whatever reason yells stop. (Or someone is meant to keep something stable, and someone is ramming through a new paradigm.) Those of us in the moment get caught up in all the energy, and we forget that all most people outside see is how avoidable the whole confrontation was.
Sometimes we feel that it is better to get run over by the steamroller than take a step left or right. Sometimes we feel that putting the pedal to the metal on the steamroller is going to make this so much faster, and we can't move it to the right or left for a small change. 

2017-05-24

Canaries in a coal mine (apropos nothing)


[This post is brought to you by Matthew Inman. Reading http://theoatmeal.com/comics/believe made me realize I don't listen enough, and Veritasium's https://www.youtube.com/watch?v=UBVV8pch1dM made me realize why thinking is hard. I am writing this to remind myself when I forget and jump on some phrase.]

Various generations ago, part of my family were coal miners, and some of their lore was still passed down many, many years later. One piece of it was about the proverbial canary. A lot of people like to think that they are being a canary when they bring up a problem that they believe will cause great harm, singing louder because they have run out of air.

That isn't what a canary does. The birds in the mines go silent when the air runs out. They may have died or be on the verge of death. They got quieter and quieter, and what the miners listened for was the lack of noise from the birds rather than more noise. Of course, it is very, very hard to hear the birds in the first place, because mines aren't quiet places. There is hammering and shoveling and footsteps echoing down long tubes. So you might think: bring more birds. That just added more distractions, and miners would get into fights because the damn birds never shut up. So the birds were few and far between, and people would have to check up on the birds every now and then to see if they were still kicking. Safer mines would have some old fellow stay near the bird, and if it died or passed out he would begin ringing a bell which could be heard down the hole.

So if analogies were 1:1, the time to worry is not when people are complaining a lot on a mailing list about some change. In fact, if everyone complains, then you could conclude that you have too many birds and not enough miners, so go ahead. The time to worry would be when things have changed but no one complains. Then you probably really need to look at getting out of the mine (or, most likely, you will find it is too late).

However, analogies are rarely 1:1 or even 1:20. People are not birds, and you should pay attention when changes cause a lot of consternation. Listen to why the change is causing problems or pain. Take some time to process it, and see what can be done to either alter the change or find a way for the person who is in pain to get out of pain.