The shell is the interface between the user and the operating system kernel. In Linux, popular shells include Bash (Bourne-Again SHell). When working on machine learning projects, the shell allows you to execute commands, run scripts, and manage processes.
The working directory is the current location in the file system where the shell is operating. You can use the pwd (print working directory) command to check the current working directory:
pwd
There are two types of paths in Linux: absolute and relative. An absolute path starts from the root directory (/), while a relative path is resolved against the current working directory. For example, to list the contents of a directory named data in the current working directory, you can use the relative path:
ls data
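To make the difference concrete, the following sketch lists the same directory once by relative path and once by absolute path. It works in a scratch directory created with mktemp; the file name train.csv and the variable names are made up for illustration.

```shell
#!/bin/bash
# Create a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
mkdir "$workdir/data"
touch "$workdir/data/train.csv"   # hypothetical file name

cd "$workdir"

# Relative path: resolved against the current working directory.
relative_listing=$(ls data)

# Absolute path: starts at / and works from any directory.
absolute_listing=$(ls "$workdir/data")

echo "relative: $relative_listing"
echo "absolute: $absolute_listing"
```

Both commands print the same listing here. The relative form stops working as soon as you cd somewhere else, while the absolute form does not.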
cd (Change Directory): Changes the current working directory.
- To move into a directory named datasets:
cd datasets
- To move up one level in the directory tree:
cd ..
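As a quick check of how cd and pwd interact, the sketch below enters a datasets directory and moves back up, recording the working directory at each step. The scratch location and the variable names are assumptions for the example, not part of any real project layout.

```shell
#!/bin/bash
# Set up a scratch directory that contains a datasets subdirectory.
base=$(mktemp -d)
mkdir "$base/datasets"

cd "$base"
start_dir=$(pwd)      # where we begin

cd datasets
inside_dir=$(pwd)     # one level down, inside datasets

cd ..                 # note the space between cd and ..
back_dir=$(pwd)       # back where we started

echo "start:  $start_dir"
echo "inside: $inside_dir"
echo "back:   $back_dir"
```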
ls (List): Lists the contents of a directory.
ls -l # Lists detailed information about files and directories
touch: Creates an empty file, or updates the timestamp of a file that already exists.
touch new_file.txt
mkdir: Creates a new directory.
mkdir new_directory
rm: Removes files and directories. Removal is permanent, so double-check the path first.
rm new_file.txt # Remove a file
rm -r new_directory # Remove a directory recursively
cp (Copy): Copies files and directories (add -r to copy a directory).
cp source_file.txt destination_folder/
mv (Move/Rename): Moves a file or renames it.
mv old_name.txt new_name.txt
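The file commands above are usually combined. Here is a minimal sketch of one such sequence, run inside a scratch directory so that nothing important can be deleted; the directory and file names (raw, processed, samples.csv) are hypothetical.

```shell
#!/bin/bash
# Work inside a scratch directory created just for this example.
project=$(mktemp -d)
cd "$project"

mkdir raw processed                                    # create two directories
touch raw/samples.csv                                  # create an empty placeholder file
cp raw/samples.csv processed/                          # copy it into processed/
mv processed/samples.csv processed/samples_clean.csv   # rename the copy
rm raw/samples.csv                                     # remove the original file
rm -r raw                                              # remove the directory itself

ls -l processed                                        # show what is left in processed/
remaining=$(ls)                                        # top-level entries still present
echo "remaining: $remaining"
```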
ps (Process Status): Displays information about currently running processes.
ps -ef # Shows all processes with full format
top: Provides a real-time view of system processes and resource usage.
top
kill: Terminates a process. First, find the process ID (PID) using ps, then:
kill -9 <PID> # Forcefully terminate a process
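Before reaching for kill -9, it is often worth trying plain kill, which sends SIGTERM and gives the process a chance to shut down cleanly. The sketch below uses a background sleep as a stand-in for a long-running job such as a training run; the 300-second duration is arbitrary.

```shell
#!/bin/bash
# Start a harmless long-running process in the background.
sleep 300 &
pid=$!                 # $! holds the PID of the most recent background job

# Confirm that the process is running; ps -p selects by PID.
ps -p "$pid"

# Ask the process to exit using the default signal (SIGTERM).
kill "$pid"

# Collect the exit status; a process ended by SIGTERM reports 128 + 15 = 143.
status=0
wait "$pid" || status=$?
echo "sleep exited with status $status"
```

If the process ignores SIGTERM, kill -9 <PID> remains available as a last resort.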
apt is used for package management on Debian-based distributions such as Ubuntu. To install Python 3 and pip:
sudo apt update
sudo apt install python3 python3-pip
For Python packages, pip is commonly used. To install a machine-learning library like numpy:
pip install numpy
virtualenv: Creates isolated Python environments.
virtualenv ml_env # Create a virtual environment named ml_env
source ml_env/bin/activate # Activate the virtual environment
deactivate # Leave the virtual environment when finished
grep: Searches for a pattern in a file. Suppose you have a data file data.txt and you want to find all lines containing the word “error”:
grep "error" data.txt
awk: A powerful text-processing language. For example, if you have a CSV file data.csv and you want to print the second column:
awk -F ',' '{print $2}' data.csv
sort: Sorts the content of a file. To sort a file named numbers.txt numerically:
sort -n numbers.txt
uniq: Removes adjacent duplicate lines, which is why the input is usually sorted first.
sort numbers.txt | uniq
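These tools become more useful when chained with pipes. As a small sketch, the script below builds a throwaway CSV (the column names and values are invented for the example), lists the distinct labels in the second column, and counts the rows for one label.

```shell
#!/bin/bash
# Build a tiny CSV to work on; the contents are made up.
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
id,label,score
1,cat,0.91
2,dog,0.85
3,cat,0.78
4,bird,0.66
5,dog,0.59
EOF

# Skip the header (NR > 1), print the label column, then sort and de-duplicate.
labels=$(awk -F ',' 'NR > 1 {print $2}' "$data_file" | sort | uniq)
echo "$labels"

# Count the rows whose label field is exactly "cat"; grep -c prints the number of matching lines.
cat_rows=$(grep -c ',cat,' "$data_file")
echo "cat rows: $cat_rows"
```

For this input the distinct labels are bird, cat, and dog, and two rows carry the cat label. Splitting on commas like this only suits simple CSV files without quoted fields.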
Bash scripts can automate repetitive tasks in machine learning workflows. For example, the following script can create a virtual environment, activate it, and install necessary Python packages:
#!/bin/bash
# Create and activate virtual environment
virtualenv ml_automation_env
source ml_automation_env/bin/activate
# Install packages
pip install numpy pandas scikit-learn
Assuming the script is saved as script.sh, make it executable and then run it:
chmod +x script.sh
./script.sh
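A setup script like this assumes that its tools are already installed. One possible pre-flight check is sketched below; have_cmd is a hypothetical helper name, and the list of tools is an assumption based on the commands used in the script.

```shell
#!/bin/bash
# Return success if the named command can be found on PATH.
have_cmd() {
    command -v "$1" > /dev/null 2>&1
}

# Collect every missing tool instead of stopping at the first one.
missing=""
for tool in python3 pip virtualenv; do
    if ! have_cmd "$tool"; then
        missing="$missing $tool"
    fi
done

if [ -n "$missing" ]; then
    echo "Missing tools:$missing"
else
    echo "All tools found; safe to create the environment."
fi
```

Reporting all missing tools at once saves repeated runs when a fresh machine lacks several of them.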
The Linux command line offers a wide range of tools and techniques that are essential for machine learning. From file management to process management, package and environment management, and data manipulation, these capabilities help streamline the machine-learning workflow. By mastering these Linux command-line skills, machine learning practitioners can save time, increase efficiency, and better manage their projects.