List

What will we be learning?

  1. Introduction to Linux and High Performance Computing (HPC)
  2. Basic navigation in Linux
  3. Managing files/folders
  4. File manipulation and editing
  5. Wildcards and permission
  6. Filtering and searching
  7. Piping & Redirection & Process Management
  8. Submitting task/jobs using SLURM
  9. Installing software

What is Linux?

Linux is an open-source operating system, comparable to Windows and macOS. Here are some key points about Linux:

  1. Open Source Nature:
    • Linux is developed collaboratively by a global community of contributors. Its source code is freely available, allowing anyone to view, modify, and distribute it.
    • Kernel vs. distributions:
      • The Linux kernel is the core component responsible for managing hardware resources.
      • Distributions (e.g., Ubuntu, CentOS, Debian) package the kernel with additional software, creating complete operating systems.
  2. Application of Linux in daily life:
    • Linux powers servers, supercomputers, embedded devices, and even Android smartphones. Many web servers, cloud services, and scientific research clusters run on Linux.
  3. Graphical User Interface (GUI) vs Command Line Interface (CLI)
    • Linux offers two kinds of interface: a command line interface (CLI) and a graphical user interface (GUI).
    • Example of CLI
  4. Why are we learning CLI?
    • Performance and Efficiency
      • Command-line operations are often faster because they run commands directly, without the overhead of a graphical interface.
      • CLI tools consume fewer system resources (e.g., CPU, memory) than resource-intensive GUI applications.
      • The CLI allows fine-grained control and customization, making it ideal for developers and system administrators.
    • Stability and Consistency
      • Core CLI commands remain largely consistent across different systems and distributions (e.g., Ubuntu, Fedora, Arch Linux).
    • Software Development and Programming
      • Linux provides native support for popular languages like Python, C/C++, Java, Perl, Ruby, and more.
      • Developers find a rich ecosystem of libraries and tools for programming purposes.
      • The Linux shell (most commonly Bash) is powerful and versatile.
      • Windows’ command line uses a different syntax. macOS is Unix-based like Linux: its default shell is zsh (since macOS Catalina), which behaves much like Bash, and Bash is also available.
    • Cost and Licensing
      • Most Linux distributions are free and open source.
      • Windows requires a paid license, and macOS is licensed only for Apple hardware.
      • Linux supports a wide range of open-source software.
    • Software/Hardware Compatibility
      • Linux runs well on older hardware.
  5. How does learning the Linux CLI help me in bioinformatics?
    • Learning the Linux Command Line Interface (CLI) is immensely beneficial for anyone working in the field of bioinformatics.
    • Linux CLI provides powerful tools for managing and analyzing biological data files.
    • You can efficiently manipulate text files, perform data extraction, and process large datasets using commands like grep, sed, and awk.
    • Bioinformatics often involves handling diverse data files (FASTA, SAM/BAM, VCF, etc.). Navigating directories, creating folders, and organizing data become second nature with CLI skills.
    • Writing scripts allows you to automate repetitive tasks. Bioinformatics workflows benefit from scripted processes, ensuring consistency and reproducibility.
    • Many bioinformatics tools are command-line based (e.g., minimap2, freebayes, samtools, bwa). Learning the CLI enables you to use these tools effectively.
    • When things go wrong (and they will!), CLI expertise helps diagnose issues.
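
As a rough illustration of the grep, sed, and awk commands mentioned above, the sketch below builds a tiny FASTA file and inspects it. The file name, sequence IDs, and sequences are invented for this example, and the awk step assumes each sequence sits on a single line, which real FASTA files do not guarantee.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: file name and sequences are made up.
set -euo pipefail

workdir=$(mktemp -d)            # scratch directory, so no existing file is touched
fasta="$workdir/example.fasta"  # hypothetical file name

# Write a tiny FASTA file: header lines start with '>', sequence lines follow.
cat > "$fasta" <<'EOF'
>seq1 hypothetical sample A
ACGTACGTAC
>seq2 hypothetical sample B
GGGTTTAA
>seq3 hypothetical sample C
ATAT
EOF

# grep: count the records by counting header lines.
n_records=$(grep -c '^>' "$fasta")
echo "Number of sequences: $n_records"

# sed: print only the sequence IDs (drop '>' and everything after the first space).
ids=$(sed -n 's/^>\([^ ]*\).*/\1/p' "$fasta")
echo "$ids"

# awk: remember the ID on each header line, then print ID and sequence length.
# Assumes one sequence line per record.
lengths=$(awk '/^>/ {id = substr($1, 2); next} {print id "\t" length($0)}' "$fasta")
echo "$lengths"

rm -r "$workdir"                # clean up the scratch directory
```

The same three commands scale from this toy file to files with millions of records, which is one reason they are used so often in bioinformatics scripts.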

What is High Performance Computing (HPC)?

  • HPC refers to the practice of combining computing power to deliver far greater performance than a typical desktop or workstation.
  • It involves using clusters of powerful processors that work in parallel to process massive multi-dimensional data sets (often referred to as “big data”) and solve complex problems at extremely high speeds.
  • There are several HPC systems in Qatar, including physical (on-site) clusters at Sidra Medicine, QCRI, HBKU, and HMC.
  • At Qatar University, we have had access since summer 2022 to HPC on Microsoft Azure, a comprehensive cloud computing platform. Cloud computing means the servers/machines are not located on-site.
  1. Why can’t I just use my laptop or workstation at my desk?

While it’s possible to perform some bioinformatics analyses on a local machine, there are limitations.

  • Laptops/workstations have limited computational resources (CPU, memory, storage). Some bioinformatics tasks (e.g., genome assembly, large-scale alignment) require substantial resources, so complex analyses of large datasets may be slow or impractical. For example, insufficient RAM can slow an analysis down or cause it to fail.
  • Many bioinformatics tools are designed for parallel or multi-threaded processing. HPC clusters or cloud platforms like Azure provide better parallelization capabilities than individual machines.
  • Large-scale bioinformatics projects generate massive amounts of data. Cloud platforms like Azure offer scalable storage solutions, whereas local machines may have limited disk space.
  • Collaborative research often involves sharing data and analyses. Cloud-based solutions allow seamless collaboration and scalability across teams.
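
To see the limits listed above on a particular machine, a minimal sketch like the following reports the CPU cores, total memory, and free disk space. It assumes a Linux system with GNU coreutils and a readable /proc/meminfo; the variable names are chosen for this example only.

```shell
#!/usr/bin/env bash
# Minimal sketch: report the resources of the machine you are logged in to.
# Assumes Linux (nproc from GNU coreutils, memory figures from /proc/meminfo).
set -euo pipefail

# Number of CPU cores available to this shell.
cores=$(nproc)

# Total physical memory in kB, read from the MemTotal line.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_gib=$(( mem_kb / 1024 / 1024 ))   # integer division, so this rounds down

echo "CPU cores available: $cores"
echo "Total memory: about ${mem_gib} GiB"

# Disk space on the file system that holds the current directory.
df -h .
```

Running the same script on a laptop and on an HPC node gives a quick, concrete comparison of what each can handle.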
