Condemn none: if you can stretch out a helping hand, do so. If you cannot, fold your hands, bless your brothers, and let them go their own way. - Swami Vivekananda

I am trying to stretch a hand here :) !

This tutorial goes over the basics of working with arrays in Python, as well as the various modules you need to get up and running with data analysis. The goal is to give you enough to get started; refer to other resources for more depth and breadth.

Running an Interactive Shell (Mac/Linux)

Check if you have python installed by running python --version in your terminal.

You can start an interactive shell by running the python command in your terminal.

If Python is not installed, refer to the download page. It is usually preinstalled on Ubuntu and on many Macs, although on some systems the command is python3 rather than python.

I like using Jupyter Notebook when coding ML models in Python. It is a Python package that provides a convenient interactive shell that runs in your browser. You can also share your notebooks with friends very conveniently; they only need to have Jupyter Notebook installed. You can install it by running pip install jupyter.

Once Jupyter is installed, you can start the notebook by running jupyter notebook in your shell.

Data Types and Variables

In contrast to statically typed languages like C, C++, and Java, Python has no notion of explicitly declaring the type of a variable. The type is determined at run time from the value the variable currently refers to, hence Python is a dynamically typed language.

You can assign values of different data types to variables:

i = 10 # int
j = 10.123 # floating point
a = "a" # str
b = [1, 2, 3] # list

You can check for the type of a variable by using the type function

i = 10
type(i)
int
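
Because the type belongs to the value rather than to the name, the same variable can be rebound to values of different types. Here is a minimal sketch; the name value is only used for illustration. In a plain script, print(type(...)) shows <class 'int'>, whereas a notebook cell displays int as above.

```python
# The same name can refer to objects of different types over time.
value = 10
print(type(value))   # <class 'int'>

value = "ten"
print(type(value))   # <class 'str'>

value = [1, 0]
print(type(value))   # <class 'list'>

# isinstance is a common way to test a type inside code.
is_list = isinstance(value, list)
print(is_list)       # True
```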

Operators

The usual operators are used in python

  • + : addition
  • - : subtraction
  • * : multiplication
  • % : modulo
  • / : division
  • // : floor division (rounds down toward negative infinity)
  • ** : exponentiation
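
A quick sketch of the arithmetic operators above, using two example values x and y that are not from the original text. The last lines show that // rounds down rather than toward zero, which only matters when the result is negative.

```python
x, y = 7, 2

print(x + y)    # 9
print(x - y)    # 5
print(x * y)    # 14
print(x % y)    # 1
print(x / y)    # 3.5 (division of two ints returns a float)
print(x // y)   # 3
print(x ** y)   # 49

# // rounds toward negative infinity, not toward zero.
neg_floor = -7 // 2
neg_trunc = int(-7 / 2)
print(neg_floor)   # -4
print(neg_trunc)   # -3
```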

Logical, comparison, and bitwise operators

  • ~ : bitwise negation
  • or : logical or
  • and : logical and
  • not : logical not
  • <, <=, >, >=, !=, == : comparisons
  • |, &, ^ : bitwise or, and, xor
  • in : membership test (element in a list, tuple, dictionary, ...)
  • <<, >> : left shift and right shift
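
The operators in this list can be tried out on two small integers. This is only a sketch; x and y are example values chosen so that the binary patterns are easy to follow.

```python
x, y = 6, 3   # binary 110 and 011

# Logical operators combine boolean expressions.
print(x > y and y > 0)   # True
print(x < y or y == 3)   # True
print(not x == y)        # True

# Bitwise operators work on the individual bits of an integer.
print(x | y)    # 7  (111)
print(x & y)    # 2  (010)
print(x ^ y)    # 5  (101)
print(~x)       # -7 (for an int n, ~n equals -n - 1)
print(x << 1)   # 12
print(x >> 1)   # 3

# Membership tests; for a dictionary, in checks the keys.
print(3 in [1, 2, 3])     # True
print("k" in {"k": 1})    # True
```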

Conditional Statements and Control Flow

If/Else Block

It’s useful to be able to check whether a particular statement is True/False prior to executing a code block.

The if else block in python

x = 5  # example value
if x > 10:
    print("if")
elif x > 3:
    print("elif")
else:
    print("else")

Note that the conditions used in if and elif are evaluated as booleans, i.e. each one is treated as either True or False.

For/While Loop

Say we want to add up the integers from 1 to 100. We can do this using a for loop:

cum_sum = 0
for i in range(101):
    cum_sum += i
print(cum_sum)
5050

The range(101) call creates an immutable sequence that starts at 0 and ends at 100. You can also call range with a different start and step size, which default to 0 and 1 respectively: range(start, end, step_size), where end itself is excluded.

The equivalent while loop of the above:

cum_sum = 0
i = 0
while(i < 100):
    i += 1
    cum_sum += i
print(cum_sum)
5050
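
The range(start, end, step_size) form mentioned earlier can be sketched as follows. Wrapping each range in list() is only there to make the generated values visible, and the built-in sum is used as a shortcut for the loops above.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]

# Same result as the for and while loops: 1 + 2 + ... + 100.
total = sum(range(1, 101))
print(total)                   # 5050
```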

Fibonacci using a loop

n = 11
old, new = 0, 1
for i in range(n - 3):
    t = new
    new = old + new
    old = t

print("fib_" + str(n - 2) + ": " + str(old) + " fib_" + str(n - 1) + ": " + str(new))
fib_9: 21 fib_10: 34
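
The same loop can be wrapped in a function. This is a sketch, not part of the original code: the name fib is made up, it counts from fib(1) = 0 as the output above does, and it uses tuple assignment in place of the temporary variable t.

```python
def fib(k):
    """Return the k-th Fibonacci number, counting from fib(1) = 0."""
    old, new = 0, 1
    for _ in range(k - 1):
        # Advance one step: the right-hand side is evaluated first.
        old, new = new, old + new
    return old

print(fib(9), fib(10))   # 21 34
```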

Lists/Tuples/Dicts

We will be dealing with the following data structures:

  • Lists : mutable sequences
  • Tuples : immutable sequences
  • Dicts : mutable collections of <key, value> pairs

Mutability means the data structure can be modified after creation.

Lists

You can create lists using brackets.

a = [1, 2, 3] # list with 3 elements
b = [10] * 10 # list with 10 elements, all of which are 10

print('a = ', a)
print('b = ', b)
a =  [1, 2, 3]
b =  [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

Indexes in Python start from 0. You can access an element of a list with the indexing operator []; for example, a[1] returns the 2nd element, 2.

You can get the length of a list by using the len function.

len(a)
3

It is quite common to run through a list to check whether a particular value is present. Python makes this easy by providing the in operator, which checks whether a value is present in a list. You can also use not in, which checks that a value is not present.

1 in a
True
10 not in a
True

List slicing

Python lists support slice indexing syntax. If you’re not familiar with slice indexing it’s simple…

print(a[start:end])

This prints the elements of a at the indexes in $[start, end)$, i.e. end itself is not included in the set of indexes that are returned.

Either start or end can be left out, for example

print(a[:end]) # will print from index 0 to end exclusive
print(a[start:]) # will print from index start to end of list

In general, when slicing you can set start, end, and step, where

  • start: index at which slicing begins
  • end: index at which slicing stops (exclusive)
  • step: increment between consecutive indexes

a[start:end:step]

a = 'hello world'
print(a[0:3])  # print the first 3 characters
print(a[0:-5]) # print everything except the last 5 characters (note the -5 used as the end value)
print(a[::-1]) # with a step of -1 the slice starts at the last index and
               # walks back to index 0, i.e. end - 1, end - 2, .., 0,
               # which reverses the string in one line!

hel
hello
dlrow olleh
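
The same slicing syntax works on lists. A small sketch, using an example list nums that is not from the original text:

```python
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(nums[2:5])     # [2, 3, 4]
print(nums[:3])      # [0, 1, 2]
print(nums[7:])      # [7, 8, 9]
print(nums[-3:])     # [7, 8, 9]  (negative indexes count from the end)
print(nums[::2])     # [0, 2, 4, 6, 8]
print(nums[1:8:3])   # [1, 4, 7]
print(nums[::-1])    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```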

List Comprehension

Say we want a list of all even numbers from 0 to 100. This can be written in one line using a list comprehension as follows:

zero_to_101 = [i for i in range(101) if i % 2 == 0]
print(zero_to_101)

If you look at the way mathematicians write out a set in math notation, you will see quite a bit of similarity between list comprehensions and that notation. For example, consider the sets $S = \{x^3 : x \in \{0, \dots, 9\}\}$, $A = \{3i : i \in \{0, \dots, 300\}\}$, and $D = \{x : x \in S \text{ and } x \in A\}$.

Each of these sets can be converted into a list comprehension in one line of code

S = [x ** 3 for x in range(0, 10)]
S
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
A = [3 * i for i in range(0, 301)]
first_4_elements = str(A[:4]).strip(']')
last_4_elements =  str(A[-4:]).strip('[')
print(first_4_elements, " ,..., ", last_4_elements)
[0, 3, 6, 9  ,...,  891, 894, 897, 900]
D = [x for x in S if x in A]
print(D)
[0, 27, 216, 729]

List comprehensions are useful one-liners when you want to create masks, sets of indexes, etc.
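
As a rough sketch of the mask and index use cases, assuming a small example list values that is not from the original text:

```python
values = [4, -1, 7, 0, -3]

# Boolean mask: True wherever the element is positive.
mask = [v > 0 for v in values]
print(mask)           # [True, False, True, False, False]

# Indexes of the positive elements.
positive_idx = [i for i, v in enumerate(values) if v > 0]
print(positive_idx)   # [0, 2]

# Apply the mask to keep only the selected elements.
selected = [v for v, keep in zip(values, mask) if keep]
print(selected)       # [4, 7]
```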

Some things to keep in mind

When working with lists you might find yourself needing to create a copy of a list or extend an existing one.

Simply assigning a list to another variable does not copy its contents; it only copies the reference, so you end up with two variables pointing to the same list.

a = [1, 2, 3]
b = a # does not create copy

To create a copy you can either call the list() built-in function or slice the whole list

b = a[:]  
#  OR
b = list(a)
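
The difference between a second reference and a copy shows up as soon as the original list is modified. A minimal sketch; the names alias, copy_slice and copy_list are made up for this example:

```python
a = [1, 2, 3]
alias = a            # second name for the same list
copy_slice = a[:]    # new list with the same elements
copy_list = list(a)  # also a new list

a[0] = 99

print(alias)         # [99, 2, 3]  (sees the change)
print(copy_slice)    # [1, 2, 3]
print(copy_list)     # [1, 2, 3]
print(alias is a, copy_slice is a)   # True False
```

Both a[:] and list(a) make shallow copies, so any nested lists inside are still shared between the original and the copy.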

There are different ways of appending to or extending a list in Python:

  • append
  • extend
  • + operator

extend

The extend method appends each element of an iterable to the end of a list. For example, with a = [1, 2, 3], calling a.extend([1, 2]) adds the elements 1 and 2 to a, giving [1, 2, 3, 1, 2].

Note: an iterator is a stream of data. An iterable is an object that implements an __iter__ method returning an iterator, or a __getitem__ method that returns an item for a given index.
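
To make the note concrete, here is a small sketch using the built-ins iter and next, followed by extend called with a few different iterables. The variable names are chosen only for illustration.

```python
letters = ["a", "b"]    # a list is an iterable
it = iter(letters)      # iter() returns an iterator over it

print(next(it))         # a
print(next(it))         # b
try:
    next(it)
except StopIteration:
    print("iterator is exhausted")

# extend accepts any iterable, not only lists.
target = [1, 2, 3]
target.extend((4, 5))     # tuple
target.extend("ab")       # string, one character at a time
target.extend(range(2))   # range object
print(target)             # [1, 2, 3, 4, 5, 'a', 'b', 0, 1]
```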

append

append differs from extend in that it adds its argument to the end of the list as a single element, even when that argument is itself a list.

a.append([1, 2]) # a will become [1, 2, 3, [1, 2]]
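
Put side by side, the difference is easiest to see in the resulting lengths. A short sketch starting from two identical example lists:

```python
a = [1, 2, 3]
a.append([1, 2])     # adds one element, which happens to be a list
print(a, len(a))     # [1, 2, 3, [1, 2]] 4

b = [1, 2, 3]
b.extend([1, 2])     # adds each element of the argument
print(b, len(b))     # [1, 2, 3, 1, 2] 5
```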

+ operator overloaded

The plus + operator is semantically similar to extend: it adds the elements of its right-hand operand to the end of the list. Note that + requires the other operand to be a list, whereas += (like extend) accepts any iterable.

There is a difference between using + and += when growing a list. The first builds a new list containing the combined elements, whereas the second extends the existing list in place.

a_list = a_list + your_list # builds a new list
a_list += your_list # extends a_list in place
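
One way to observe the difference is to keep a second name pointing at the list before extending it. This is only a sketch; other_name, b_list and other_b are names introduced for the illustration.

```python
a_list = [1, 2]
other_name = a_list          # second reference to the same list
a_list = a_list + [3]        # builds a new list and rebinds a_list
print(a_list)                # [1, 2, 3]
print(other_name)            # [1, 2]  (the old list is untouched)
print(a_list is other_name)  # False

b_list = [1, 2]
other_b = b_list
b_list += [3]                # extends the existing list in place
print(other_b)               # [1, 2, 3]
print(b_list is other_b)     # True
```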

Tuples

Tuples are similar to lists, but they are immutable, i.e. once a tuple has been created you cannot add, remove, or replace its elements.

>>> my_tuple = ("hello", "my", "name")
>>> my_tuple[0]
'hello'

Why use tuples instead of lists?

  • it protects against accidental change. If you know your list shouldn’t change, declare it as a tuple.
  • they are typically a little cheaper than lists to create and store.
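
The protection against accidental change can be seen directly: trying to assign to an element of a tuple raises a TypeError. A minimal sketch with an example tuple point that is not from the original text:

```python
point = (1, 2, 3)

changed = True
try:
    point[0] = 10          # tuples do not support item assignment
except TypeError as err:
    changed = False
    print("cannot modify tuple:", err)

print(point)               # (1, 2, 3)

# Reading works just like a list, and a tuple can be unpacked.
x, y, z = point
print(x + y + z)           # 6
```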

Dictionaries

Like lists, dictionaries are mutable data structures: they can grow and shrink. The difference is in how elements are accessed. In a dictionary, elements are accessed by key, whereas in a list we index using integer positions.

Dictionaries in Python are implemented as hash tables, so looking up a value by its key is fast!

full_name = {} # empty dictionary
full_name["First Name"] = "Sajad"
full_name["Last Name"]  = "Darabi"

print(full_name)
{'First Name': 'Sajad', 'Last Name': 'Darabi'}

We could have initialized the full_name dictionary in one line as well

full_name = {"First Name": "Sajad", "Last Name": "Darabi"}

We can access individual elements by using the corresponding key, for example full_name["First Name"]
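
A few other everyday dictionary operations, sketched on a hypothetical inventory dictionary (the keys and counts are made up for illustration):

```python
inventory = {"apples": 3, "pears": 5}

print(inventory["apples"])          # 3
print("pears" in inventory)         # True  (in checks the keys)
print(inventory.get("kiwis", 0))    # 0     (default for a missing key)

inventory["kiwis"] = 2              # add a new key
del inventory["pears"]              # remove a key

# Iterate over key/value pairs.
for name, count in inventory.items():
    print(name, count)

print(len(inventory))               # 2
```

Accessing a missing key with square brackets raises a KeyError, which is why get with a default is often handy.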

Functions

Pass by reference

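By default in Python, a parameter is passed to a function by reference: the function receives a reference to the same object the caller holds rather than a copy. Reassigning the parameter inside the function, however, only rebinds the local name. Take a look at the function below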

def passbyreference(x):
    print(x, id(x))
    x = 20
    print(x, id(x))

x = 10
print(x, id(x))
passbyreference(x)

We declare a variable x, and the id function returns a unique integer identifying the object that x refers to. If we run the above code we get the following (note: the ids might be different if you run it on your own computer!)

10 140132861012704
10 140132861012704
20 140132861013024

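The id 140132861012704 is the same outside the function and at the start of it, because both names refer to the same object. Once x is reassigned inside the function (x = 20), the local name refers to a different object, so its id changes to 140132861013024. The caller's x is unaffected.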
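
The distinction between reassigning a parameter and modifying the object it refers to can be sketched as follows. The function names rebind and mutate are made up for this illustration.

```python
def rebind(items):
    # Assignment only changes what the local name refers to.
    items = [0]

def mutate(items):
    # A method call changes the object the caller also sees.
    items.append(0)

data = [1, 2]

rebind(data)
print(data)   # [1, 2]

mutate(data)
print(data)   # [1, 2, 0]
```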

Side effects

A function produces a side effect if it modifies the caller's environment in any way in addition to returning a value. Typical side effects are mutating data, printing to the screen, etc.

In Python, if a function uses the in-place extension operator += on a list, it will mutate the original list.

def mutateList(l):
    print(l)
    l += [2, 1]
l = [4, 3]
mutateList(l)
print(l)
[4, 3]
[4, 3, 2, 1]

The function has modified the content of the original list. To avoid this you can pass the list by creating a new copy: mutateList(l[:]).
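
Two ways of keeping the original list intact, as a rough sketch. The first passes a copy as suggested above; the second uses a hypothetical helper named extended that returns a new list and leaves its argument alone.

```python
def mutate_list(l):
    l += [2, 1]              # extends whatever list was passed in

original = [4, 3]
mutate_list(original[:])     # the function only sees a copy
print(original)              # [4, 3]

def extended(l):
    # + builds a new list, so the argument is not modified.
    return l + [2, 1]

result = extended(original)
print(original, result)      # [4, 3] [4, 3, 2, 1]
```

Returning a new value tends to be easier to reason about than relying on callers to remember to copy.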

Using External Modules

We can use modules written by others by importing them into our Python environment.

Let’s use the numpy module, which is often used for processing large lists and matrix data structures.

Make sure you have numpy installed; if you don’t, run pip install numpy in your terminal (you might have to use pip3, depending on how Python is set up on your computer).

import numpy as np
np.random.randn(5, 1) # creates a 5x1 column vector filled with
                      # samples from the standard normal distribution.
array([[ 0.74216898],
       [ 0.36478401],
       [-2.47659455],
       [ 0.25336232],
       [-1.36196602]])

Additional Resources

This should be enough to get you going. For additional resources you can refer to the following tutorials