AGI Safety Reading Group

Learn the fundamentals behind AI alignment and the risks posed by advanced AI.
Applications are now closed.
Express interest in future iterations

Understand the motivations and arguments behind the field of AI safety. The program will cover the concept of AGI, fundamental alignment problems, and possible solutions to them. The course follows a curriculum by BlueDot Impact with input from researchers at OpenAI, Alignment Research Center Evals, and FAR AI. View the full curriculum below.

Each week consists of 2-4 hours, split between independent readings and a group discussion. Discussions are held in person on a university campus or in central London and are facilitated by an AI researcher, student, or SAIL organiser with previous experience in AI safety.

Details

  • We haven’t set a specific number of places yet, although we don’t expect to be able to accept everyone who applies.

  • We don’t plan to run any virtual sessions, although we recommend checking out the virtual course by BlueDot Impact. You can also see other local groups here, and if you would like help connecting with local or online groups, you can get in touch with us.

  • The course includes some technical machine learning content, although there will be opportunities to learn the core concepts of ML before the course starts.

  • The discussion sessions will be held in person on the Imperial, UCL, LSE, or KCL campus, or in a coworking space near Farringdon. Further details will be shared upon acceptance onto the course.

  • We currently plan to run another in-person course starting in October this year. You can express interest in joining future iterations by filling out this form.

  • We run socials, hackathons, and other events. You can see all the events we run here.

  • We are also running a reading group on AI governance, which we recommend for anyone interested in governance and policy.

  • The course content is available online here.


Curriculum

  • For those less familiar with machine learning, or who want to review the basics, there is an option to cover foundational concepts relevant to the rest of the course.

  • Artificial general intelligence (AGI) is the core concept of this course. The first week explores what we mean by it and the reasons for thinking that the field of machine learning is heading towards it.

  • This week starts off by focusing on reward misspecification: the phenomenon where our default techniques for training ML models often unintentionally assign high rewards to undesirable behaviour (see the first toy sketch after this curriculum list).

  • Even without reward misspecification, the rewards used during training don’t allow us to control how agents generalise to new situations. This week covers scenarios in which agents placed in new situations behave in competent yet undesirable ways because they learned the wrong goals during earlier training: the problem of goal misgeneralisation (see the second toy sketch after this curriculum list).

  • This week introduces scalable oversight as an approach to preventing reward misspecification, and discusses one scalable oversight proposal: iterated amplification.

  • This week focuses on two more potential alignment techniques proposed to work at scale: debate and training using unrestricted adversarial examples.

  • Our current methods of training capable neural networks give us little insight into how or why they function. This week we cover the field of interpretability, which aims to change this by developing methods for understanding how neural networks think.

  • This week covers AI governance. Beyond the technical approaches covered in the course, we also need strategic and political solutions to manage the safe deployment of potential AGI systems and to deal with new questions and risks arising from the existence of AGI.

  • This week gives you a chance to reflect on the content and how you might apply the ideas to your career plans. You will also have the option to complete a project related to the content. It’s a chance for you to explore your interests and take a step towards contributing to the field yourself.
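
To make week 2’s topic more concrete, here is a minimal, hypothetical sketch of reward misspecification. The cleaning-robot world, the policy names, and the proxy reward are all assumptions invented for this illustration and are not taken from the course materials: the proxy reward counts deposits into a bin, the true objective is the amount of trash actually removed, and a policy that games the proxy earns far more reward without doing any more cleaning.

```python
# A toy illustration of reward misspecification (week 2), not course material.
# The proxy reward counts items deposited into the bin; the true objective is
# the net amount of trash removed from the room.

def run_policy(policy, steps=100):
    """Simulate a policy and return (proxy_reward, trash_actually_removed)."""
    trash_in_room = 10
    proxy_reward = 0       # +1 every time something is dropped into the bin
    trash_removed = 0      # what we actually care about
    for _ in range(steps):
        action = policy(trash_in_room)
        if action == "collect_and_bin" and trash_in_room > 0:
            trash_in_room -= 1
            trash_removed += 1
            proxy_reward += 1
        elif action == "tip_out_and_rebin" and trash_removed > 0:
            # Exploit: tip an item back out of the bin and re-deposit it,
            # triggering the deposit counter without removing any new trash.
            proxy_reward += 1
    return proxy_reward, trash_removed

intended = lambda trash: "collect_and_bin" if trash > 0 else "wait"
exploit = lambda trash: "collect_and_bin" if trash > 0 else "tip_out_and_rebin"

print("intended policy:", run_policy(intended))   # (10, 10)
print("exploit policy: ", run_policy(exploit))    # (100, 10)
```

Running the sketch shows the exploit policy earning roughly ten times the proxy reward of the intended policy while removing exactly the same amount of trash; that gap between reward assigned and behaviour wanted is what reward misspecification refers to.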
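
Similarly for week 3, here is a minimal, hypothetical sketch of goal misgeneralisation, loosely inspired by the well-known "coin at the end of the level" example from the goal misgeneralisation literature. The corridor environment, the tabular Q-learner, and all hyperparameters are assumptions made purely for illustration. Because the coin never moves during training, "go right" and "go to the coin" are indistinguishable on the training distribution, so the learned policy keeps competently heading right at test time even after the coin has moved.

```python
import random

# A toy illustration of goal misgeneralisation (week 3), not course material.
# A tabular Q-learner is trained in a 5-cell corridor where the coin is always
# in the rightmost cell, so "go right" and "go to the coin" look identical
# during training. At test time the coin moves, and the policy still goes right.

N = 5                      # corridor cells 0..4
ACTIONS = (+1, -1)         # right, left

def step(pos, action, coin):
    pos = max(0, min(N - 1, pos + action))
    reward = 1.0 if pos == coin else 0.0
    return pos, reward, pos == coin

# --- training: the coin is always in the rightmost cell ---
Q = {(p, a): 0.0 for p in range(N) for a in ACTIONS}
for _ in range(3000):
    pos, coin = random.randrange(N - 1), N - 1
    for _ in range(20):
        a = random.choice(ACTIONS)                  # random exploration
        nxt, r, done = step(pos, a, coin)
        target = r + 0.9 * max(Q[(nxt, x)] for x in ACTIONS) * (not done)
        Q[(pos, a)] += 0.1 * (target - Q[(pos, a)])
        pos = nxt
        if done:
            break

# --- test: the coin moves to the leftmost cell ---
pos, coin, trajectory = 2, 0, [2]
for _ in range(8):
    a = max(ACTIONS, key=lambda x: Q[(pos, x)])     # greedy learned policy
    pos, _, _ = step(pos, a, coin)
    trajectory.append(pos)
print("test trajectory:", trajectory, "- never reaches the coin at cell", coin)
```

The point of the toy is that nothing went wrong with the reward itself: the policy simply latched onto a goal that happened to coincide with the intended one everywhere it was trained.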