Muhammad Sabih Ur / Stackoverflow_scrapping / Commits / 0a9fa80e
Commit 0a9fa80e authored Apr 03, 2023 by Muhammad Sabih Ur
Pipeline #6 failed
Showing 1 changed file with 55 additions and 0 deletions
.gitlab-ci.yml (new file, mode 0 → 100644) @ 0a9fa80e
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Browser-like request headers sent with every page fetch
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
data = []
def getQuestions(tag, pgno):
    """Fetch one listing page for `tag` and append its questions to `data`."""
    url = f'https://stackoverflow.com/questions/tagged/{tag}?tab=newest&page={pgno}&pagesize=50'
    try:
        r = requests.get(url, headers=header)
        soup = BeautifulSoup(r.text, 'html.parser')
    except Exception as e:
        print(f"An error occurred: {e}")
        # soup is undefined when the request fails, so skip this page
        return
    questions = soup.find_all('div', {'class': 's-post-summary'})
    for item in questions:
        title_link = item.find('a', {'class': 's-link'})
        question = {
            'title': title_link.text.strip(),
            'description': item.find('div', {'class': 's-post-summary--content-excerpt'}).text.strip(),
            'date': item.find('span', {'class': 'relativetime'})['title'],
            # Assumes the href is site-relative (starts with '/'), so the
            # host is written without a trailing slash to avoid '//'.
            'link': 'https://stackoverflow.com' + title_link['href'],
        }
        data.append(question)
    return

# Total pages we have for python tag: "42473"
# Scrape listing pages 102 through 200
for x in range(102, 201):
    getQuestions('python', x)
df = pd.DataFrame(data)
# print(len(data))
print(df.head())
# Raw string so the backslashes in the Windows path are not read as escapes
df.to_csv(r'F:\StacksOverflow\stacks3.csv', index=False)
print("Done")
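
The script above builds the listing URL with an f-string and each question link by string concatenation. As a rough sketch of an alternative, the standard library's `urllib.parse` can assemble the same URLs. The helper names `build_listing_url` and `absolute_link` are hypothetical and not part of this commit, and the sample href is made up purely for illustration.

```python
from urllib.parse import quote, urlencode, urljoin

BASE_URL = 'https://stackoverflow.com/'  # same host the script uses


def build_listing_url(tag, pgno, pagesize=50):
    """Return the 'newest' listing URL for a tag (hypothetical helper)."""
    # Percent-encode the tag so characters such as '#' stay in the path.
    path = 'questions/tagged/' + quote(tag, safe='')
    query = urlencode({'tab': 'newest', 'page': pgno, 'pagesize': pagesize})
    return urljoin(BASE_URL, path) + '?' + query


def absolute_link(href):
    """Resolve an href against the site root (hypothetical helper).

    urljoin handles hrefs with or without a leading '/', so the result
    does not end up with a doubled slash after the host.
    """
    return urljoin(BASE_URL, href)


if __name__ == '__main__':
    print(build_listing_url('python', 102))
    # Made-up href, used only to show the joining behaviour.
    print(absolute_link('/questions/1/example-title'))
```

Inside `getQuestions`, such helpers could stand in for the f-string and the `'https://stackoverflow.com/' + href` expression without changing the rest of the loop.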